
Ch 9-2.Machine Learning: Symbol-based[new]


9.3 The ID3 Decision Tree Induction Algorithm

ID3 induces concepts from examples and represents them as decision trees.

A decision tree is a representation that allows us to determine the classification of an object by testing its values for certain properties.

An example problem: estimating an individual's credit risk on the basis of credit history, current debt, collateral, and income. Table 9.1 lists a sample of individuals with known credit risks, and the decision tree of Fig. 9.13 represents the classifications in Table 9.1.


Data from credit history of loan applications (Table 9.1)


A decision tree for credit risk assessment (Fig. 9.13)


9.3 The ID3 Decision Tree Induction Algorithm

In a decision tree:
Each internal node represents a test on some property, such as credit history or debt.
Each possible value of the property corresponds to a branch of the tree, such as high or low.
Leaf nodes represent classifications, such as low or moderate risk.
An individual of unknown type may be classified by traversing the decision tree.

The size of the tree necessary to classify a given set of examples varies according to the order in which properties are tested. Fig. 9.14 shows a tree that is simpler than Fig. 9.13 but that also classifies the examples in Table 9.1.
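To make the traversal concrete, here is a minimal Python sketch. The nested-dictionary representation and the particular branch values are illustrative assumptions; the fragment does not reproduce Fig. 9.13 or Fig. 9.14 exactly.

```python
# A decision tree as nested dicts: each internal node tests one property,
# each branch is one value of that property, and each leaf is a classification.
# The branches below are a hypothetical fragment for illustration only.
credit_tree = {
    "property": "income",
    "branches": {
        "$0 to $15K": "high risk",
        "$15K to $35K": {
            "property": "credit history",
            "branches": {"bad": "high risk", "unknown": "high risk", "good": "moderate risk"},
        },
        "over $35K": {
            "property": "credit history",
            "branches": {"bad": "moderate risk", "unknown": "low risk", "good": "low risk"},
        },
    },
}

def classify(tree, individual):
    """Traverse the tree, testing the individual's value for each property, until a leaf is reached."""
    while isinstance(tree, dict):
        tree = tree["branches"][individual[tree["property"]]]
    return tree

print(classify(credit_tree, {"income": "over $35K", "credit history": "good"}))  # -> low risk
```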


A simplified decision tree (Fig. 9.14)


9.3 ID3 Decision Tree Induction Algorithm

Choice of the optimal tree
Measure: the greatest likelihood of correctly classifying unseen data.
Assumption of the ID3 algorithm: the simplest decision tree that covers all the training examples is the optimal tree.
The rationale for this assumption is the time-honored heuristic of preferring simplicity and avoiding unnecessary assumptions.
Occam's Razor: "It is vain to do with more what can be done with less.... Entities should not be multiplied beyond necessity."


9.3.1 Top-down Decision Tree Induction

The ID3 algorithm constructs the decision tree in a top-down fashion: it selects a property to test at the current node, uses that property to partition the set of examples, and recursively constructs a subtree for each partition. This continues until all members of a partition are in the same class. Because the order of tests is critical, ID3 relies on a criterion for selecting the test.

For example, ID3 constructs Fig. 9.14 from Table 9.1. ID3 selects income as the root property => Fig. 9.15. The partition {1, 4, 7, 11} consists entirely of high-risk examples, and credit history further divides the partition {2, 3, 12, 14} into {2, 3}, {14}, and {12} => Fig. 9.16.


Decision Tree Construction Algorithm
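The pseudocode figure for this slide is not reproduced in the text, so the following is only a Python sketch of the top-down procedure described in Section 9.3.1. The dictionary representation of examples, the majority-class fallback, and the pluggable `select_property` function (e.g. the information-gain criterion of Section 9.3.2) are assumptions made for illustration.

```python
from collections import Counter

def induce_tree(examples, properties, target, select_property):
    """Top-down decision tree induction: pick a test property, partition the examples
    by its values, and recurse on each partition until a partition is single-class."""
    classes = Counter(ex[target] for ex in examples)
    if len(classes) == 1:                       # all members of the partition share one class: leaf
        return next(iter(classes))
    if not properties:                          # no property left to test: fall back to the majority class
        return classes.most_common(1)[0][0]
    prop = select_property(examples, properties, target)   # e.g. the property with greatest information gain
    partitions = {}
    for ex in examples:                         # partition the examples by their value for prop
        partitions.setdefault(ex[prop], []).append(ex)
    remaining = [p for p in properties if p != prop]
    return {"property": prop,
            "branches": {value: induce_tree(part, remaining, target, select_property)
                         for value, part in partitions.items()}}
```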


A partially constructed decision tree (Fig. 9.15)


Another partially constructed decision tree (Fig. 9.16)


9.3.2 Information Theoretic Test Selection

Test selection method
Strategy: use information theory to select the test (property).
Procedure: measure the information gain of each candidate property and pick the property providing the greatest information gain.

Information gain from property P

Let C be the set of training instances, and let property P, with n values, partition C into the subsets \{C_1, C_2, \ldots, C_n\}.

I(C) = \sum_{i} -p(c_i)\,\log_2 p(c_i)
(the total information content of the tree, where p(c_i) is the probability that an instance belongs to class c_i)

E(P) = \sum_{i=1}^{n} \frac{|C_i|}{|C|}\, I(C_i)
(the expected information needed to complete the tree after making P the root)

gain(P) = I(C) - E(P)
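These definitions translate directly into code. Below is a minimal Python sketch; representing the training instances as dictionaries with a class field (here called `target`) is an assumption made for illustration. A `select_property` for the construction sketch above can then simply return `max(properties, key=lambda p: gain(examples, p, target))`.

```python
import math
from collections import Counter

def information(examples, target):
    """I(C): information content of the class distribution of the examples."""
    total = len(examples)
    counts = Counter(ex[target] for ex in examples)
    return -sum(n / total * math.log2(n / total) for n in counts.values())

def expected_information(examples, prop, target):
    """E(P): expected information to complete the tree after partitioning by property P."""
    total = len(examples)
    partitions = {}
    for ex in examples:
        partitions.setdefault(ex[prop], []).append(ex)
    return sum(len(part) / total * information(part, target) for part in partitions.values())

def gain(examples, prop, target):
    """gain(P) = I(C) - E(P)."""
    return information(examples, target) - expected_information(examples, prop, target)
```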


9.3.2 Information Theoretic Test Selection

Partition of Table 9.1 by the income property:
C_1 = \{1, 4, 7, 11\} (income $0 to $15K)
C_2 = \{2, 3, 12, 14\} (income $15K to $35K)
C_3 = \{5, 6, 8, 9, 10, 13\} (income over $35K)

I(\text{Table 9.1}) = -\frac{6}{14}\log_2\frac{6}{14} - \frac{3}{14}\log_2\frac{3}{14} - \frac{5}{14}\log_2\frac{5}{14} = 1.531 \text{ bits}


E(\text{income}) = \frac{4}{14} I(C_1) + \frac{4}{14} I(C_2) + \frac{6}{14} I(C_3) = \frac{4}{14} \times 0.0 + \frac{4}{14} \times 1.0 + \frac{6}{14} \times 0.650 = 0.564 \text{ bits}

gain(\text{income}) = I(\text{Table 9.1}) - E(\text{income}) = 1.531 - 0.564 = 0.967 \text{ bits}
gain(\text{credit history}) = 0.266 \text{ bits}
gain(\text{debt}) = 0.581 \text{ bits}
gain(\text{collateral}) = 0.756 \text{ bits}

Because income provides the greatest information gain, ID3 will select it as the root.
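The arithmetic above can be checked with a few lines of Python; the class counts 6/3/5 and the per-partition values 0.0, 1.0, and 0.650 are taken from the slides.

```python
import math

def info(counts):
    """Information content of a class distribution given as a list of counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

i_table = info([6, 3, 5])                          # 6 high-, 3 moderate-, 5 low-risk examples
e_income = 4/14 * 0.0 + 4/14 * 1.0 + 6/14 * 0.650  # E(income) from the partition information values
print(round(i_table, 3))             # 1.531 bits
print(round(i_table - e_income, 3))  # ~0.966; the slide's 0.967 subtracts the rounded values 1.531 - 0.564
```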


9.5 Knowledge and Learning

Similarity-based learning
Generalization is a function of similarities across training examples; biases are limited to syntactic constraints on the form of the learned knowledge.

Knowledge-based learning
The need for prior knowledge: the most effective learning occurs when the learner already has considerable knowledge of the domain.

Arguments for the importance of knowledge:
Similarity-based learning techniques rely on relatively large amounts of training data; in contrast, humans can form reliable generalizations from as few as a single training instance.
Any set of training examples can support an unlimited number of generalizations, most of which are irrelevant or nonsensical.


9.5.2 Explanation-Based Learning

EBL uses an explicitly represented domain theory to construct an explanation of a training example. By generalizing from the explanation of the instance, EBL can filter noise, select the relevant aspects of experience, and organize the training data into a systematic and coherent structure.


9.5.2 Explanation-Based Learning

Given:
A target concept: a general specification of a goal state.
A training example: an instance of the target.
A domain theory: a set of rules and facts that are used to explain how the training example is an instance of the goal concept.
Operationality criteria: some means of describing the form that concept definitions may take.

Determine:
A new schema that achieves the target concept in a general way.


9.5.2 Explanation-Based Learning

Example
Target concept: a rule used to infer whether an object is a cup
  premise(X) -> cup(X)
Domain theory:
  liftable(X) ^ holds_liquid(X) -> cup(X)
  part(Z, W) ^ concave(W) ^ points_up(W) -> holds_liquid(Z)
  light(Y) ^ part(Y, handle) -> liftable(Y)
  small(A) -> light(A)
  made_of(A, feathers) -> light(A)
Training example: an instance of the goal concept
  cup(obj1), small(obj1), part(obj1, handle), owns(bob, obj1), part(obj1, bottom), part(obj1, bowl), points_up(bowl), concave(bowl), color(obj1, red)
Operationality criteria: target concepts must be defined in terms of observable, structural properties such as part and points_up.


9.5.2 Explanation-Based Learning

Algorithm
Construct an explanation of why the example is indeed an instance of the target concept (Fig. 9.17): a proof that the target concept logically follows from the example. This step eliminates concepts irrelevant to the goal, such as color(obj1, red), and captures the relevant ones.
Generalize the explanation to produce a concept definition, by substituting variables for those constants that are part of the training instance while retaining the constants and constraints that are part of the domain theory.
EBL then defines a new rule whose conclusion is the root of the proof tree and whose premise is the conjunction of its leaves:
  small(X) ^ part(X, handle) ^ part(X, W) ^ concave(W) ^ points_up(W) -> cup(X)
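As an illustration of the operationality criterion, the learned rule can be applied to a new object described only by observable, structural properties. This is a minimal Python sketch; the fact-tuple representation and the object `obj2` are hypothetical.

```python
def is_cup(facts):
    """Learned rule: small(X) ^ part(X, handle) ^ part(X, W) ^ concave(W) ^ points_up(W) -> cup(X).
    `facts` is a set of ground tuples such as ("small", "obj2") or ("part", "obj2", "bowl")."""
    for x in {f[1] for f in facts if f[0] == "small"}:
        if ("part", x, "handle") not in facts:
            continue
        # Look for some part W of X that is concave and points up.
        for f in facts:
            if f[0] == "part" and f[1] == x:
                w = f[2]
                if ("concave", w) in facts and ("points_up", w) in facts:
                    return True
    return False

# Hypothetical new object, described in operational terms only:
obj2 = {("small", "obj2"), ("part", "obj2", "handle"),
        ("part", "obj2", "bowl"), ("concave", "bowl"), ("points_up", "bowl")}
print(is_cup(obj2))  # -> True
```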


Proof that an object, X, is a cup (Fig. 9.17)


9.5.2 Explanation-Based Learning

Benefits of EBL
Selects the relevant aspects of the training instance using the domain theory.
Forms generalizations that are relevant to specific goals and guaranteed to be logically consistent with the domain theory.
Learns from a single instance.
Hypothesizes unstated relationships between its goals and its experience by constructing an explanation.


9.5.3 EBL and Knowledge-Level Learning

Issues in EBL
Objection:
EBL cannot make the learner do anything new; it can only learn rules within the deductive closure of its existing theory.
The sole function of the training instance is to focus the theorem prover on relevant aspects of the problem domain.
EBL is therefore viewed as a form of speed-up learning or knowledge-base reformulation.

Responses to this objection:
EBL takes information implicit in a set of rules and makes it explicit (e.g., a chess game).
Focus on techniques for refining incomplete theories: development of heuristics for reasoning with imperfect theories, etc.
Focus on integrating EBL and SBL: EBL refines the training data where the theory applies, and SBL further generalizes the partially generalized data.


9.6 Unsupervised Learning

Supervised vs. unsupervised learning
Supervised learning assumes the existence of a teacher, a fitness function, or some other external method of classifying training instances.
Unsupervised learning eliminates the teacher: the learner must form and evaluate concepts on its own.

Human scientific practice is perhaps the best example of unsupervised learning. Scientists
propose hypotheses to explain observations,
evaluate their hypotheses using such criteria as simplicity, generality, and elegance, and
test hypotheses through experiments of their own design.


9.6.2 Conceptual Clustering

Given: a collection of unclassified objects and some means of measuring the similarity of objects.
Goal: organize the objects into classes that meet some standard of quality, such as maximizing the similarity of objects within a class.

Numeric taxonomy
The oldest approach to the clustering problem.
Represents an object as a collection of features (a vector of n feature values).
Similarity metric: the Euclidean distance between objects.
Builds clusters in a bottom-up fashion.


9.6.2 Conceptual Clustering

Agglomerative clustering algorithm
Step 1: examine all pairs of objects, select the pair with the highest degree of similarity, and make that pair a cluster.
Step 2: define the features of the cluster as some function of the features of the component members, and replace the component objects with the cluster definition.
Step 3: repeat the process on the collection of objects until all objects have been reduced to a single cluster.

The result of the algorithm is a binary tree whose leaf nodes are instances and whose internal nodes are clusters of increasing size (see the sketch below).
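A minimal Python sketch of the three steps above, under the assumed representation of objects as numeric feature vectors, with Euclidean distance as the similarity measure and the component-wise mean as the merged cluster's feature definition:

```python
import math

def distance(a, b):
    """Euclidean distance between two feature vectors (smaller = more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerate(objects):
    """Return a nested-tuple binary tree: leaves are the objects, internal nodes are clusters."""
    # Each cluster is a pair (tree, features); initially every object is its own cluster.
    clusters = [(obj, obj) for obj in objects]
    while len(clusters) > 1:
        # Step 1: find the pair of clusters with the highest similarity (smallest distance).
        i, j = min(((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
                   key=lambda p: distance(clusters[p[0]][1], clusters[p[1]][1]))
        (t1, f1), (t2, f2) = clusters[i], clusters[j]
        # Step 2: define the new cluster's features and replace the two components with it.
        merged = ((t1, t2), tuple((x + y) / 2 for x, y in zip(f1, f2)))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        # Step 3: repeat until all objects have been reduced to a single cluster.
    return clusters[0][0]

print(agglomerate([(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]))
```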


Hierarchical Clustering

Agglomerative (bottom-up) approach vs. divisive (top-down) approach
(Figure: five objects a, b, c, d, e are merged step by step, agglomeratively, into {a, b}, {d, e}, {c, d, e}, and finally {a, b, c, d, e}; the divisive approach performs the same splits in the reverse order.)


Agglomerative Hierarchical Clustering

Methods for measuring the similarity between clusters:
Single-link: the similarity of two clusters is the similarity of the two closest data points, one from each cluster.
Complete-link: the similarity of two clusters is the similarity of the two most distant data points, one from each cluster.
Group-averaging: a "compromise" between single-link and complete-link.


Agglomerative Hierarchical Clustering

Single-link: for good local coherence.
sim(C_1, C_2) = \max\{\, sim(x, y) : x \in C_1,\ y \in C_2 \,\}


Agglomerative Hierarchical Clustering

Complete-link: for good global cluster quality.
sim(C_1, C_2) = \min\{\, sim(x, y) : x \in C_1,\ y \in C_2 \,\}


Agglomerative Hierarchical Clustering

Group-averaging: neither the maximum similarity nor the minimum similarity of two data points drawn from each cluster, but the average value over all pairs of data points, one from each cluster (see the sketch below).

Efficiency? Single-link, group-averaging < complete-link.
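A small Python sketch of the three inter-cluster similarity measures; the pairwise similarity function `sim` (negative Euclidean distance) is just an illustrative choice:

```python
import math

def sim(a, b):
    """Pairwise similarity between two feature vectors; here, negative Euclidean distance."""
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def single_link(c1, c2):
    """Similarity of the closest pair of objects, one from each cluster."""
    return max(sim(a, b) for a in c1 for b in c2)

def complete_link(c1, c2):
    """Similarity of the most distant pair of objects, one from each cluster."""
    return min(sim(a, b) for a in c1 for b in c2)

def group_average(c1, c2):
    """Average similarity over all pairs of objects, one from each cluster."""
    return sum(sim(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))
```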


K-means Clustering

The K-means algorithm


K-means Clustering

The K-means algorithm (cont'd)


K-means Clustering

The K-means algorithm:
1) Randomly choose k starting points (cluster centers).
2) Assign each data point to the cluster of whichever of the k starting points is closest.
3) Recompute the k starting points from the data assigned to each; if the starting points no longer change, stop the clustering.
4) Go back to step 2).
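A minimal Python sketch of steps 1)-4) above; the 2-D points, the value of k, and the fixed random seed are illustrative assumptions:

```python
import math
import random

def kmeans(points, k, seed=0):
    """K-means: choose k starting centers, assign points to the nearest center,
    recompute the centers, and stop when the centers no longer change."""
    random.seed(seed)
    centers = random.sample(points, k)                      # 1) k random starting points
    while True:
        clusters = [[] for _ in range(k)]
        for p in points:                                    # 2) assign each point to the nearest center
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        new_centers = [tuple(sum(xs) / len(xs) for xs in zip(*cl)) if cl else centers[i]
                       for i, cl in enumerate(clusters)]    # 3) recompute each center as the cluster mean
        if new_centers == centers:                          # stop if the centers did not change
            return centers, clusters
        centers = new_centers                               # 4) otherwise repeat from step 2

centers, clusters = kmeans([(0, 0), (0, 1), (5, 5), (5, 6)], k=2)
print(centers)  # two cluster centers, e.g. (0.0, 0.5) and (5.0, 5.5)
```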


K-means Clustering

Characteristics of the K-means algorithm:
It is fast and easy to implement.
The number of clusters k must be decided in advance.
It can only be used on data for which a "center" (mean point) can be computed.
If an inappropriate value of k is given, meaningless clusters may be produced, or the clustering may never complete.

What if k = 4?