88
Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features (n-D vector in a metric space) Pattern Recognition

Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Embed Size (px)

Citation preview

Page 1: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

• Recognition (classification) = assigning a pattern/object to one of pre-defined classes

• Statictical (feature-based) PR - the pattern is described by features (n-D vector in a metric space)

Pattern Recognition

Page 2: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

• Recognition (classification) = assigning a pattern/object to one of pre-defined classes

• Syntactic (structural) PR - the pattern is described by its structure. Formal language theory (class = language, pattern = word)

Pattern Recognition

Page 3: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

• Supervised PR

– training set available for each class

• Unsupervised PR (clustering)

– training set not available, No. of classes may not be known

Pattern Recognition

Page 4: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

PR system - Training stage

Definition of the

features

Selection of the

training set

Computing the features

of the training set

Classification rule

setup

Page 5: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Desirable properties of the features

• Invariance

• Discriminability

• Robustness

• Efficiency, independence, completeness

Page 6: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Desirable properties of the training set

• It should contain typical representatives of each class including intra-class variations

• Reliable and large enough

• Should be selected by domain experts

Page 7: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Classification rule setup

Equivalent to a partitioning of the feature space

Independent of the particular application

Page 8: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

PR system – Recognition stage

Image acquisition

Preprocessing

Object detection

Computing of

the features

Classification

Class label

Page 9: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

An example – Fish classification

Page 10: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

The features: Length, width, brightness

Page 11: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

2-D feature space

Page 12: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Empirical observation

• For a given training set, we can have

several classifiers (several partitioning

of the feature space)

• The training samples are not always

classified correctly

• We should avoid overtraining of the

classifier

Page 13: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Formal definition of the classifier

• Each class is characterized by its

discriminant function g(x)

• Classification = maximization of g(x)

Assign x to class i iff

• Discriminant functions defines decision

boundaries in the feature space

Page 14: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Minimum distance (NN) classifier

• Discriminant function

g(x) = 1/ dist(x,)

• Various definitions of dist(x

• One-element training set

Page 15: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Voronoi polygons

Page 16: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Minimum distance (NN) classifier

• Discriminant function

g(x) = 1/ dist(x,)

• Various definitions of dist(x • One-element training set Voronoi pol

• NN classifier may not be linear

• NN classifier is sensitive to outliers k-NN classifier

Page 17: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

k-NN classifierFind nearest training points unless k samples

belonging to one class is reached

Page 18: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Discriminant functions g(x) are hyperplanes

Linear classifier

Page 19: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Assumption: feature values are random variables

Statistic classifier, the decission is probabilistic

It is based on the Bayes rule

Bayesian classifier

Page 20: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

The Bayes rule

A posteriori

probability

Class-conditional

probabilityA priori

probability

Total

probability

Page 21: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Bayesian classifier

Main idea: maximize posterior probability

Since it is hard to do directly, we rather maximize

In case of equal priors, we maximize only

Page 22: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Equivalent formulation in terms

of discriminat functions

Page 23: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

How to estimate ?

• Parametric estimate (assuming pdf is

of known form, e.g. Gaussian)

• Non-parametric estimate (pdf is

unknown or too complex)

• From the case studies performed before (OCR, speech recognition)

• From the occurence in the training set

• Assumption of equal priors

Page 24: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Parametric estimate of Gaussian

Page 25: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

d-dimensional Gaussian pdf

Page 26: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

The role of covariance matrix

Page 27: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Two-class Gaussian case in 2D

Classification = comparison of two Gaussians

Page 28: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Two-class Gaussian case – Equal cov. mat.

Linear decision boundary

Page 29: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Equal priors

Classification by minimum Mahalanobis distance

If the cov. mat. is diagonal with equal variances

then we get “standard” minimum distance rule

max

min

Page 30: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Non-equal priors

Linear decision boundary still preserved

Page 31: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

General G case

in 2D

Decision boundary is a hyperquadric

Page 32: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

General G case

in 3D

Decision boundary is a hyperquadric

Page 33: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

More classes, Gaussian case in 2D

Page 34: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

What to do if the classes are not normally

distributed?

• Gaussian mixtures

• Parametric estimation of some other pdf

• Non-parametric estimation

Page 35: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Non-parametric estimation – Parzen window

Page 36: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

The role of the window size

• Small window overtaining

• Large window data smoothing

• Continuous data the size does not matter

Page 37: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

n = 1

n = 10

n = 100

n = ∞

Page 38: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

The role of the window size

Small window Large window

Page 39: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Applications of Bayesian classifier

in multispectral remote sensing

• Objects = pixels

• Features = pixel values in the spectral bands

(from 4 to several hundreds)

• Training set – selected manually by means of thematic maps (GIS), and on-site observation

• Number of classes – typicaly from 2 to 16

Page 40: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Satellite MS image

Page 41: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Other classification methods in RS

• Context-based classifiers

• Shape and textural features

• Post-classification filtering

• Spectral pixel unmixing

Page 42: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Typically for “YES – NO” features

Feature metric is not explicitely defined

Decision trees

Non-metric classifiers

Page 43: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

General decision tree

Page 44: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Binary decision tree

Any decision tree can be replaced by a binary tree

Page 45: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Real-valued features

• Node decisions are in form of inequalities

• Training = setting their parameters

• Simple inequalities stepwise decision

boundary

Page 46: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Stepwise decision boundary

Page 47: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Real-valued features

The tree structure and the form of inequalities

influence both performance and speed.

Page 48: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features
Page 49: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features
Page 50: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

• How to evaluate the performance of the

classifiers?

- evaluation on the training set (optimistic error estimate)

- evaluation on the test set (pesimistic

error estimate)

Classification performance

Page 51: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

• How to increase the performance?

- other features

- more features (dangerous – curse of dimensionality!)

- other (larger, better) traning sets

- other parametric model

- other classifier

- combining different classifiers

Classification performance

Page 52: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Combining (fusing) classifiers

C feature vectors

Bayes rule: max

max

Several possibilities how to do that

Page 53: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Product rule

Assumption: Conditional independence of xj

max

Page 54: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Max-max and max-min rules

Assumption: Equal priors

max (max )

max (min )

Page 55: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Majority vote

• The most straightforward fusion method

• Can be used for all types of classifiers

Page 56: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Training set is not available, No. of classes may not be a priori known

Unsupervised Classification

(Cluster analysis)

Page 57: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

What are clusters?

Intuitive meaning

- compact, well-separated subsets

Formal definition

- any partition of the data into disjoint subsets

Page 58: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

What are clusters?

Page 59: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

How to compare different clusterings?

Variance measure J should be minimized

Drawback – only clusterings with the same

N can be compared. Global minimum

J = 0 is reached in the degenerated case.

Page 60: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Clustering techniques

• Iterative methods

- typically if N is given

• Hierarchical methods

- typically if N is unknown

• Other methods

- sequential, graph-based, branch & bound,

fuzzy, genetic, etc.

Page 61: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Sequential clustering

• N may be unknown

• Very fast but not very good

• Each point is considered only once

Idea: a new point is either added to an existing cluster or it forms a new cluster. The decision is based on the user-defined distance threshold.

Page 62: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Sequential clustering

Drawbacks:

- Dependence on the distance threshold

- Dependence on the order of data points

Page 63: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Iterative clustering methods

• N-means clustering

• Iterative minimization of J

• ISODATA

Iterative Self-Organizing DATa Analysis

Page 64: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

N-means clustering

1. Select N initial cluster centroids.

Page 65: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

N-means clustering

2. Classify every point x according to minimum distance.

Page 66: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

N-means clustering

3. Recalculate the cluster centroids.

Page 67: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

N-means clustering

4. If the centroids did not change then STOP else GOTO 2.

Page 68: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

N-means clustering

Drawbacks

- The result depends on the initialization.

- J is not minimized

- The results are sometimes “intuitively wrong”.

Page 69: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

N-means clustering – An example

Two features, four points, two clusters (N = 2)

Different initializations different clusterings

Page 70: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Initial centroids

N-means clustering – An example

Page 71: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Initial centroids

N-means clustering – An example

Page 72: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Iterative minimization of J

1. Let’s have an initial clustering (by N-means)

2. For every point x do the following:

Move x from its current cluster to another cluster, such that the decrease of J is maximized.

3. If all data points do not move, then STOP.

Page 73: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Example of “wrong” result

Page 74: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

ISODATA

Iterative clustering, N may vary.

Sophisticated method, a part of many statistical software systems.

Postprocessing after each iteration

- Clusters with few elements are cancelled

- Clusters with big variance are divided

- Other merging and splitting strategies can be implemented

Page 75: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Hierarchical clustering methods

• Agglomerative clustering

• Divisive clustering

Page 76: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Basic agglomerative clustering

1. Each point = one cluster

Page 77: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Basic agglomerative clustering

1. Each point = one cluster

2. Find two “nearest” or “most similar” clusters and merge them together

Page 78: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Basic agglomerative clustering

1. Each point = one cluster

2. Find two “nearest” or “most similar” clusters and merge them together

3. Repeat 2 until the stop constraint is reached

Page 79: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Basic agglomerative clustering

Particular implementations of this method differ from each other by

- The STOP constraints

- The distance/similarity measures used

Page 80: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Simple between-cluster distance measures

d(A,B) = d(m1,m2)

d(A,B) = min d(a,b) d(A,B) = max d(a,b)

Page 81: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Other between-cluster distance measures

d(A,B) = Hausdorf distance H(A,B)

d(A,B) = J(AUB) – J(A,B)

Page 82: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Agglomerative clustering – representation

by a clustering tree (dendrogram)

Page 83: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

How many clusters are there?

2 or 4 ?

Clustering is a very subjective task

Page 84: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

How many clusters are there?

• Difficult to answer even for humans

• “Clustering tendency”

• Hierarchical methods – N can be estimated from the complete dendrogram

• The methods minimizing a cost function –

N can be estimated from “knees” in J-N graph

Page 85: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Life time of the clusters

Optimal number of clusters = 4

Page 86: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Optimal number of clusters

Page 87: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

Applications of clustering in image proc.

• Segmentation – clustering in color space

• Preliminary classification of multispectral

images

• Clustering in parametric space – RANSAC,

image registration and matching

Numerous applications are outside image processing area

Page 88: Recognition (classification) = assigning a pattern/object to one of pre-defined classes Statictical (feature-based) PR - the pattern is described by features

References

Duda, Hart, Stork: Pattern Clasification,

2nd ed., Wiley Interscience, 2001

Theodoridis, Koutrombas: Pattern Recognition, 2nd ed., Elsevier, 2003