Ch10 Machine Learning: Symbol-Based
Dr. Bernard Chen, Ph.D., University of Central Arkansas, Spring 2011


Page 1:

Ch10 Machine Learning: Symbol-Based

Dr. Bernard Chen, Ph.D., University of Central Arkansas

Spring 2011

Page 2:

Machine Learning Outline

The book presents four chapters on machine learning, reflecting four approaches to the problem:

- Symbol-based
- Connectionist
- Genetic/evolutionary
- Stochastic

Page 3:

Ch.10 Outline

- A framework for Symbol-Based Learning
- ID3 Decision Tree
- Unsupervised Learning

Page 4:

The Framework for Symbol-Based Learning

Page 5:

The Framework Example

Data

The representation:

Size(small) ^ color(red) ^ shape(round)
Size(large) ^ color(red) ^ shape(round)

Page 6:

The Framework Example

A set of operations: based on Size(small) ^ color(red) ^ shape(round), replacing a single constant with a variable produces the generalizations (a small sketch follows):

Size(X) ^ color(red) ^ shape(round)
Size(small) ^ color(X) ^ shape(round)
Size(small) ^ color(red) ^ shape(X)
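To make the operation concrete, here is a small illustrative sketch (not from the slides): a conjunction like the ones above is modeled as an attribute-to-value dict, an assumed representation, and the "replace a single constant with a variable" operator is enumerated directly.

```python
# Sketch of the "replace a single constant with a variable" operator.
# Representing a concept as an attribute -> value dict is an assumption
# made for this illustration.
VAR = "X"  # marker for a variable

def generalizations(concept):
    """Yield each concept obtained by turning one constant into a variable."""
    for attr, value in concept.items():
        if value != VAR:                  # only constants can be generalized
            g = dict(concept)
            g[attr] = VAR
            yield g

ball = {"size": "small", "color": "red", "shape": "round"}
for g in generalizations(ball):
    print(g)
# {'size': 'X', 'color': 'red', 'shape': 'round'}
# {'size': 'small', 'color': 'X', 'shape': 'round'}
# {'size': 'small', 'color': 'red', 'shape': 'X'}
```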

Page 7:

The Framework Example

The concept space: the learner must search this space to find the desired concept. The complexity of this concept space is a primary measure of the difficulty of a learning problem.

Page 8:

The Framework Example

Page 9:

The Framework Example

Heuristic search: based on Size(small) ^ color(red) ^ shape(round), the learner will make that example a candidate "ball" concept; this concept correctly classifies the only positive instance.

If the algorithm is given a second positive instance,

Size(large) ^ color(red) ^ shape(round)

the learner may generalize the candidate "ball" concept to Size(Y) ^ color(red) ^ shape(round) (see the sketch below).
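A hedged sketch of this generalization step (an illustration, not the slides' code): given two positive examples, replace every attribute on which they differ with a variable.

```python
# Generalize two positive examples by replacing each attribute on which
# they disagree with a variable. The dict representation is assumed.
VAR = "Y"

def generalize(a, b):
    return {k: (a[k] if a[k] == b[k] else VAR) for k in a}

e1 = {"size": "small", "color": "red", "shape": "round"}
e2 = {"size": "large", "color": "red", "shape": "round"}
print(generalize(e1, e2))  # {'size': 'Y', 'color': 'red', 'shape': 'round'}
```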

Page 10:

Learning process

The training data is a series of positive and negative examples of the concept: examples of blocks-world structures that fit the category, along with near misses.

The latter are instances that almost belong to the category but fail on one property or relation.

Page 11:

Examples and near misses for the concept arch

Page 12:

Examples and near misses for the concept arch

Page 13:

Examples and near misses for the concept arch

Page 14:

Examples and near misses for the concept arch

Page 15:

Learning process

This approach was proposed by Patrick Winston (1975). The program performs a hill-climbing search on the concept space, guided by the training data.

Because the program does not backtrack, its performance is highly sensitive to the order of the training examples; a bad order can lead the program to dead ends in the search space.

Page 16:

Ch.10 Outline

- A framework for Symbol-Based Learning
- ID3 Decision Tree
- Unsupervised Learning

Page 17:

ID3 Decision Tree

ID3, like candidate elimination, induces concepts from examples.

It is particularly interesting for:

- Its representation of learned knowledge
- Its approach to the management of complexity
- Its heuristic for selecting candidate concepts
- Its potential for handling noisy data

Page 18:

ID3 Decision Tree

Page 19:

ID3 Decision Tree

The previous table can be represented as the following decision tree:

Page 20:

ID3 Decision Tree

In a decision tree, each internal node represents a test on some property.

Each possible value of that property corresponds to a branch of the tree.

Leaf nodes represent classifications, such as low or moderate risk.

Page 21:

ID3 Decision Tree

A simplified decision tree for credit risk management

Page 22:

ID3 Decision Tree

ID3 constructs decision trees in a top-down fashion.

ID3 selects a property to test at the current node of the tree and uses this test to partition the set of examples.

The algorithm recursively constructs a sub-tree for each partition.

This continues until all members of the partition are in the same class.

Page 23:

ID3 Decision Tree

For example, ID3 selects income as the root property for the first step

Page 24:

ID3 Decision Tree

Page 25:

ID3 Decision Tree

How do we select the first node (and the following nodes)?

ID3 measures the information gained by making each property the root of the current subtree.

It picks the property that provides the greatest information gain.

Page 26:

ID3 Decision Tree

If we assume that all the examples in the table occur with equal probability, then:

P(risk is high) = 6/14
P(risk is moderate) = 3/14
P(risk is low) = 5/14

Page 27:

ID3 Decision Tree

Based on the general definition

I(M) = -Σ_{i=1}^{n} p(m_i) log2(p(m_i))

we have

Info(D) = I(6,3,5) = -(6/14) log2(6/14) - (3/14) log2(3/14) - (5/14) log2(5/14) = 1.531

Page 28:

ID3 Decision Tree

Info_income(D) = (4/14) I(4,0,0) + (4/14) I(2,2,0) + (6/14) I(0,1,5) = 0.564

where (taking 0 log2(0) = 0):

I(4,0,0) = -(4/4) log2(4/4) - (0/4) log2(0/4) - (0/4) log2(0/4) = 0
I(2,2,0) = -(2/4) log2(2/4) - (2/4) log2(2/4) - (0/4) log2(0/4) = 1.0
I(0,1,5) = -(0/6) log2(0/6) - (1/6) log2(1/6) - (5/6) log2(5/6) = 0.650

Page 29:

ID3 Decision Tree

The information gain from income is:

Gain(income) = Info(D) - Info_income(D) = 1.531 - 0.564 = 0.967

Similarly:

Gain(credit history) = 0.266
Gain(debt) = 0.063
Gain(collateral) = 0.206
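As a quick check (a sketch, not part of the slides), these values can be reproduced in a few lines of Python, with the counts read off the credit-risk table:

```python
from math import log2

def info(*counts):
    """I(c1, ..., cn): expected information for a class distribution."""
    total = sum(counts)
    # By convention, 0 * log2(0) = 0, so zero counts are skipped.
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

info_D = info(6, 3, 5)                                    # 1.531
info_income = (4/14) * info(4, 0, 0) + (4/14) * info(2, 2, 0) \
            + (6/14) * info(0, 1, 5)                      # 0.564
print(round(info_D, 3), round(info_income, 3),
      round(info_D - info_income, 3))                     # 1.531 0.564 0.966
```

(The unrounded gain is 0.966; the 0.967 above comes from subtracting the rounded intermediate values.)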

Page 30:

ID3 Decision Tree

Since income provides the greatest information gain, ID3 will select it as the root of the tree

Page 31:

Attribute Selection Measure: Information Gain (ID3/C4.5)

Select the attribute with the highest information gain.

Let p_i be the probability that an arbitrary tuple in D belongs to class C_i, estimated by |C_i,D| / |D|.

Expected information (entropy) needed to classify a tuple in D:

Info(D) = -Σ_{i=1}^{m} p_i log2(p_i)

Page 32:

Attribute Selection Measure: Information Gain (ID3/C4.5)

Information needed (after using A to split D into v partitions) to classify D:

Info_A(D) = Σ_{j=1}^{v} (|D_j| / |D|) × I(D_j)

Information gained by branching on attribute A:

Gain(A) = Info(D) - Info_A(D)

Page 33:

ID3 Decision Tree Pseudo Code
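A minimal, self-contained Python sketch of the recursion just described is given below. The data layout (a list of dicts plus the name of the target attribute) is an assumption for illustration.

```python
from math import log2
from collections import Counter

def entropy(rows, target):
    counts = Counter(r[target] for r in rows)
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in counts.values())

def id3(rows, attributes, target):
    classes = {r[target] for r in rows}
    if len(classes) == 1:                  # all members in the same class: leaf
        return classes.pop()
    if not attributes:                     # no tests left: majority class
        return Counter(r[target] for r in rows).most_common(1)[0][0]

    def remainder(a):                      # Info_A(D) for attribute a
        groups = Counter(r[a] for r in rows)
        return sum(cnt / len(rows) *
                   entropy([r for r in rows if r[a] == v], target)
                   for v, cnt in groups.items())

    best = min(attributes, key=remainder)  # min remainder = max information gain
    rest = [a for a in attributes if a != best]
    return {best: {v: id3([r for r in rows if r[best] == v], rest, target)
                   for v in {r[best] for r in rows}}}
```

Applied to the credit-risk table, this selects income at the root, matching the hand calculation above.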

Page 34:

Another Decision Tree Example

NAME   RANK            YEARS   TENURED
Mike   Assistant Prof  3       no
Mary   Assistant Prof  7       yes
Bill   Professor       2       yes
Jim    Associate Prof  7       yes
Dave   Assistant Prof  6       no
Anne   Associate Prof  3       no

Page 35:

Decision Tree Example

Info(Tenured) = I(3,3) = -(3/6) log2(3/6) - (3/6) log2(3/6) = 1

How to calculate log2: log2(x) = log(x) / log(2); for example, log2(12) = 1.07918 / 0.30103 = 3.585. See http://www.ehow.com/how_5144933_calculate-log.html

Convenient calculator: http://web2.0calc.com/

NAME   RANK            YEARS   TENURED
Mike   Assistant Prof  3       no
Mary   Assistant Prof  7       yes
Bill   Professor       2       yes
Jim    Associate Prof  7       yes
Dave   Assistant Prof  6       no
Anne   Associate Prof  3       no

Page 36:

Decision Tree Example

Info_RANK(Tenured) = 3/6 I(1,2) + 2/6 I(1,1) + 1/6 I(1,0) = 3/6 (0.918) + 2/6 (1) + 1/6 (0) = 0.79

- 3/6 I(1,2): "Assistant Prof" covers 3 of the 6 samples, with 1 yes and 2 no's.
- 2/6 I(1,1): "Associate Prof" covers 2 of the 6 samples, with 1 yes and 1 no.
- 1/6 I(1,0): "Professor" covers 1 of the 6 samples, with 1 yes and 0 no's.

NAME   RANK            YEARS   TENURED
Mike   Assistant Prof  3       no
Mary   Assistant Prof  7       yes
Bill   Professor       2       yes
Jim    Associate Prof  7       yes
Dave   Assistant Prof  6       no
Anne   Associate Prof  3       no

Page 37:

Decision Tree Example

Info_YEARS(Tenured) = 1/6 I(1,0) + 2/6 I(0,2) + 1/6 I(0,1) + 2/6 I(2,0) = 0

- 1/6 I(1,0): "years = 2" covers 1 of the 6 samples, with 1 yes and 0 no's.
- 2/6 I(0,2): "years = 3" covers 2 of the 6 samples, with 0 yes's and 2 no's.
- 1/6 I(0,1): "years = 6" covers 1 of the 6 samples, with 0 yes's and 1 no.
- 2/6 I(2,0): "years = 7" covers 2 of the 6 samples, with 2 yes's and 0 no's.

Since Gain(YEARS) = 1 - 0 = 1 exceeds Gain(RANK) = 1 - 0.79 = 0.21, ID3 would select YEARS as the root.

NAME   RANK            YEARS   TENURED
Mike   Assistant Prof  3       no
Mary   Assistant Prof  7       yes
Bill   Professor       2       yes
Jim    Associate Prof  7       yes
Dave   Assistant Prof  6       no
Anne   Associate Prof  3       no

Page 38:

Ch.10 Outline

- A framework for Symbol-Based Learning
- ID3 Decision Tree
- Unsupervised Learning

Page 39:

Unsupervised Learning

The learning algorithms discussed so far implement forms of supervised learning.

They assume the existence of a teacher, some fitness measure, or another external method of classifying training instances.

Unsupervised learning eliminates the teacher and requires that the learners form and evaluate concepts on their own.

Page 40:

Unsupervised Learning

Science is perhaps the best example of unsupervised learning in humans.

Scientists do not have the benefit of a teacher; instead, they propose hypotheses to explain observations.

Page 41:

Unsupervised Learning

The clustering problem starts with (1) a collection of unclassified objects and (2) a means for measuring the similarity of objects.

The goal is to organize the objects into classes that meet some standard of quality, such as maximizing the similarity of objects in the same class.

Page 42:

Unsupervised Learning

Numeric taxonomy is one of the oldest approaches to the clustering problem

A reasonable similarity metric treats each object as a point in n-dimensional space

The similarity of two objects is the Euclidean distance between them in this space

Page 43:

Unsupervised Learning

Using this similarity metric, a common clustering algorithm builds clusters in a bottom-up fashion, also known as agglomerative clustering (see the sketch after this list):

1. Examine all pairs of objects, select the pair with the highest degree of similarity, and mark that pair as a cluster.
2. Define the features of the cluster as some function (such as the average) of the features of the component members, and then replace the component objects with this cluster definition.
3. Repeat this process on the collection of objects until all objects have been reduced to a single cluster.
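A minimal sketch of this agglomerative procedure, assuming points in n-dimensional space, Euclidean distance as the (dis)similarity measure, and the average as the cluster-feature function:

```python
import math

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerate(points):
    """Bottom-up clustering; returns a binary tree of nested tuples."""
    # Each item is (tree, centroid); initially every point is its own cluster.
    items = [(p, p) for p in points]
    while len(items) > 1:
        # Select the most similar (closest) pair of clusters.
        i, j = min(((i, j) for i in range(len(items))
                    for j in range(i + 1, len(items))),
                   key=lambda ij: euclid(items[ij[0]][1], items[ij[1]][1]))
        (t1, c1), (t2, c2) = items[i], items[j]
        merged = ((t1, t2), tuple((x + y) / 2 for x, y in zip(c1, c2)))
        # Replace the two components with the merged cluster definition.
        items = [it for k, it in enumerate(items) if k not in (i, j)] + [merged]
    return items[0][0]

print(agglomerate([(0, 0), (0, 1), (5, 5), (6, 5)]))
# (((0, 0), (0, 1)), ((5, 5), (6, 5)))
```

The returned nesting is exactly the binary tree described next: leaves are instances, and internal nodes are clusters of increasing size.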

Page 44:

Unsupervised Learning

The result of this algorithm is a Binary Tree whose leaf nodes are instances and whose internal nodes are clusters of increasing size

We may also extend this algorithm to objects represented as sets of symbolic features.

Page 45:

Unsupervised Learning

Object1 = {small, red, rubber, ball}
Object2 = {small, blue, rubber, ball}
Object3 = {large, black, wooden, ball}

Measuring similarity as the proportion of features two objects share, this metric computes the similarity values:

Similarity(object1, object2) = 3/4
Similarity(object1, object3) = 1/4
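A one-function sketch of this feature-overlap metric (the set representation and helper are illustrative assumptions):

```python
def similarity(a, b):
    """Proportion of shared features; objects have equal-size feature sets."""
    return len(a & b) / max(len(a), len(b))

o1 = {"small", "red", "rubber", "ball"}
o2 = {"small", "blue", "rubber", "ball"}
o3 = {"large", "black", "wooden", "ball"}
print(similarity(o1, o2), similarity(o1, o3))   # 0.75 0.25
```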

Page 46:

Partitioning Algorithms: Basic Concept

Given k, find a partition into k clusters that optimizes the chosen partitioning criterion.

- Global optimum: exhaustively enumerate all partitions.
- Heuristic methods: the k-means and k-medoids algorithms.
  - k-means (MacQueen '67): each cluster is represented by the center of the cluster.
  - k-medoids or PAM (Partitioning Around Medoids) (Kaufman & Rousseeuw '87): each cluster is represented by one of the objects in the cluster.

Page 47:

The K-Means Clustering Method

Given k, the k-means algorithm is implemented in four steps (a minimal sketch follows the list):

1. Partition the objects into k nonempty subsets.
2. Compute seed points as the centroids of the clusters of the current partition (the centroid is the center, i.e., the mean point, of the cluster).
3. Assign each object to the cluster with the nearest seed point.
4. Go back to step 2; stop when no assignments change.
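A compact sketch of these four steps, assuming one-dimensional numeric data (matching the worked example on the coming slides) and caller-supplied initial centroids:

```python
def k_means(points, centroids, max_iter=100):
    """1-D k-means; returns (clusters, centroids) once assignments stabilize."""
    for _ in range(max_iter):
        # Step 3: assign each object to the nearest seed point.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Step 2: recompute each centroid as the mean of its cluster.
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:    # Step 4: stop when nothing changes
            break
        centroids = new_centroids
    return clusters, centroids
```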

Pages 48-52:

K-means Clustering

(A sequence of figures illustrating successive k-means iterations; images not reproduced.)

Page 53:

The K-Means Clustering Method

(Figure: the k-means method on 2-D data with K = 2. Arbitrarily choose K objects as the initial cluster centers; assign each object to the most similar center; update the cluster means; reassign objects; update the means again, repeating until assignments stabilize.)

Page 54:

Example

Run K-means clustering with 3 clusters (initial centroids: 3, 16, 25) for at least 2 iterations

Page 55:

Example

Centroids:

3 – cluster {2, 3, 4, 7, 9}, new centroid: 5
16 – cluster {10, 11, 12, 16, 18, 19}, new centroid: 14.33
25 – cluster {23, 24, 25, 30}, new centroid: 25.5

Page 56:

Example

Centroids:

5 – cluster {2, 3, 4, 7, 9}, new centroid: 5
14.33 – cluster {10, 11, 12, 16, 18, 19}, new centroid: 14.33
25.5 – cluster {23, 24, 25, 30}, new centroid: 25.5

The assignments and centroids no longer change, so the algorithm has converged.
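As a usage example (relying on the k_means sketch from page 47, not part of the slides), the worked iterations above can be reproduced directly:

```python
# Assumes the k_means function sketched earlier in these notes.
data = [2, 3, 4, 7, 9, 10, 11, 12, 16, 18, 19, 23, 24, 25, 30]
clusters, centroids = k_means(data, [3, 16, 25])
print(clusters)   # [[2, 3, 4, 7, 9], [10, 11, 12, 16, 18, 19], [23, 24, 25, 30]]
print(centroids)  # [5.0, 14.333..., 25.5]
```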

Page 57:

In-Class Practice

Run K-means clustering with 3 clusters (initial centroids: 3, 12, 19) for at least 2 iterations