
Page 1: Pattern Recognition

Pattern Recognition

K-Nearest Neighbor Explained

By Arthur Evans, John Sikorski, and Patricia Thomas

Page 2: Pattern Recognition

Overview

- Pattern Recognition, Machine Learning, Data Mining: how do they fit together?
- Example techniques
- K-Nearest Neighbor explained

Page 3: Pattern Recognition

Data Mining

- Searching through electronically stored data in an automatic way
- Solving problems with already known data
- Essentially, discovering patterns in data
- Has several subsets, from statistics to machine learning

Page 4: Pattern Recognition

Machine Learning

- Construct computer programs that improve with use
- A methodology
- Draws from many fields: Statistics, Information Theory, Biology, Philosophy, Computer Science...
- Several sub-disciplines: Feature Extraction, Pattern Recognition

Page 5: Pattern Recognition

Pattern Recognition

- The operation and design of systems that detect patterns in data
- The algorithmic process
- Applications include image analysis, character recognition, speech analysis, and machine diagnostics

Page 6: Pattern Recognition

Pattern Recognition Process

1. Gather data
2. Determine features to use
3. Extract features
4. Train your recognition engine
5. Classify new instances
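
As a rough illustration, the five steps map onto code like the sketch below; everything in it (the data, the two-feature records, the lazy nearest-neighbor engine) is a hypothetical placeholder, not a prescribed API.

def gather_data():
    # 1. Gather data: made-up (petal count, color, species) records.
    return [(5, 200, "B"), (4, 120, "A"), (3, 130, "A")]

def extract_features(record):
    # 2-3. Determine and extract features: keep petal count and color.
    return record[:2]

def train(records):
    # 4. Train: a lazy learner simply stores (features, label) pairs.
    return [(extract_features(r), r[2]) for r in records]

def classify(model, features):
    # 5. Classify a new instance by its nearest stored neighbor.
    return min(model, key=lambda m: sum((a - b) ** 2 for a, b in zip(m[0], features)))[1]

model = train(gather_data())
print(classify(model, (4, 125)))  # "A"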

Page 7: Pattern Recognition

Artificial Neural Networks

- A type of artificial intelligence that attempts to imitate the way a human brain works
- Creates connections between processing elements, the computer equivalent of neurons
- A supervised technique

Page 8: Pattern Recognition

ANN Continued

- Tolerant of errors in data
- Many applications: speech recognition, analyzing visual scenes, robot control
- Best at interpreting complex real-world sensor data

Page 9: Pattern Recognition

The Brain

- The human brain has about 10^11 neurons
- Each neuron connects to about 10^4 other neurons
- Neurons switch in about 10^-3 seconds, slow compared to a computer at 10^-13 seconds
- The brain recognizes a familiar face in about 10^-1 seconds, only 200-300 cycles at its switch rate
- The brain utilizes MASSIVE parallel processing, considering many factors at once

Page 10: Pattern Recognition

Neural Network Diagram

Page 11: Pattern Recognition

Stuttgart Neural Network Simulator

Page 12: Pattern Recognition

Bayesian Theory

- Deals with statistical probabilities
- One of the best techniques for classifying text
- Requires prior knowledge about the expected probabilities

Page 13: Pattern Recognition

Conveyor Belt Example

- We want to sort apples and oranges on a conveyor belt.
- We notice that 80% of the fruit is oranges, so the prior probability P(w_org) = 0.8.
- Bayesian theory says: decide w_org if P(w_org|x) > P(w_app|x); otherwise decide w_app.
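
To make the decision rule concrete, here is a minimal Python sketch; the class-conditional likelihoods for the observation x are made-up illustrative numbers, not values from the slides.

def posterior(prior, likelihood, evidence):
    # Bayes' rule: P(w|x) = P(x|w) * P(w) / P(x)
    return likelihood * prior / evidence

# Priors: 80% of the fruit on the belt are oranges.
p_org, p_app = 0.8, 0.2

# Hypothetical likelihoods of observing feature x (say, a color
# measurement) under each class.
p_x_given_org, p_x_given_app = 0.3, 0.6

# Total probability of the observation (law of total probability).
p_x = p_x_given_org * p_org + p_x_given_app * p_app

# Decide w_org if P(w_org|x) > P(w_app|x); otherwise decide w_app.
if posterior(p_org, p_x_given_org, p_x) > posterior(p_app, p_x_given_app, p_x):
    print("decide: orange")
else:
    print("decide: apple")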

Page 14: Pattern Recognition

Clustering

- A process of partitioning data into meaningful sub-classes (clusters)
- Most techniques are unsupervised
- Two main categories:
  - Hierarchical: nested classes, displayed as a dendrogram
  - Non-hierarchical: each instance belongs to one and only one cluster; clusters are not related

Page 15: Pattern Recognition

Phylogenetic Tree - Hierarchical

[Dendrogram of Ciliary Neurotrophic Factor across Rattus norvegicus, Mus musculus, Homo sapiens, Equus caballus, Gallus gallus, Oryctolagus cuniculus, and Macaca mulatta.]

Page 16: Pattern Recognition

Non-Hierarchical

Page 17: Pattern Recognition

K-Means Method

[Plot of the initial cluster seeds and the initial cluster boundaries they induce.]

Page 18: Pattern Recognition

After One Iteration

[Plot of the new cluster assignments after one iteration.]
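
These two figures describe the standard k-means loop: seed clusters, assign each point to its nearest seed, re-center, repeat. Below is a minimal Python sketch of that loop, assuming random seeding and squared-Euclidean assignment; the points and k are made-up values.

import random

def kmeans(points, k, iterations=10):
    # Seed the clusters with k randomly chosen points.
    centers = random.sample(points, k)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:
                centers[i] = tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
    return centers, clusters

points = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.3, 4.9), (9.0, 1.0), (8.8, 1.2)]
centers, clusters = kmeans(points, k=3)
print(centers)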

Page 19: Pattern Recognition

Decision Tree

- A flow-chart-like tree structure
- An internal node denotes a test on an attribute (feature)
- A branch represents an outcome of the test
- All records in a branch have the same value for the tested attribute
- A leaf node represents a class label or a class label distribution

Page 20: Pattern Recognition

Example of a Decision Tree

outlook
├─ sunny    → humidity: high → N, normal → P
├─ overcast → P
└─ rain     → windy: true → N, false → P

Decision tree forecast for playing golf (P = play, N = don't play).
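
In code, the tree is just nested attribute tests, one per internal node; this sketch assumes the branch outcomes of the well-known play-golf example.

def play_golf(outlook, humidity, windy):
    # Each internal node tests one attribute; each leaf is a class label.
    if outlook == "sunny":
        return "P" if humidity == "normal" else "N"
    if outlook == "overcast":
        return "P"
    if outlook == "rain":
        return "N" if windy else "P"
    raise ValueError("unknown outlook: " + outlook)

print(play_golf("sunny", "high", windy=False))   # N
print(play_golf("rain", "normal", windy=False))  # P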

Page 21: Pattern Recognition

Instance-Based Learning

- Training consists of simply storing the data
- No generalizations are made; all calculations occur at classification
- Referred to as "lazy learning"
- Can be very accurate, but computationally expensive

Page 22: Pattern Recognition

Instance-Based Methods

- Locally weighted regression
- Case-based reasoning
- Nearest neighbor

Page 23: Pattern Recognition

Advantages

- The training stage is trivial, so the classifier adapts easily to new instances
- Very accurate
- Different "features" may be used for each classification
- Able to model complex data with less complex approximations

Page 24: Pattern Recognition

Difficulties

- All processing is done at query time: computationally expensive
- Determining an appropriate distance metric for retrieving related instances
- Irrelevant features may have a negative impact

Page 25: Pattern Recognition

Case-Based Reasoning

- Does not use Euclidean space; instances are represented as complex logical descriptions
- Examples:
  - Retrieving help desk information
  - Legal reasoning
  - Conceptual design of mechanical devices

Page 26: Pattern Recognition

Case-Based Process

- Based on the idea that current problems are similar to past problems
- Apply matching algorithms to past problem-solution pairs

Page 27: Pattern Recognition

Nearest Neighbor

- Assumes all instances correspond to points in n-dimensional space
- The nearest neighbor is the instance closest in Euclidean space:

  D = sqrt((Ax - Bx)^2 + (Ay - By)^2 + ...)
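
A minimal sketch of that distance and a single-nearest-neighbor lookup; the stored (petal count, color) instances are made-up values.

import math

def euclidean(a, b):
    # D = sqrt((Ax - Bx)^2 + (Ay - By)^2 + ...)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbor(query, instances):
    # instances: (point, label) pairs stored at training time.
    point, label = min(instances, key=lambda inst: euclidean(query, inst[0]))
    return label

training = [((3, 120), "Species A"), ((5, 200), "Species B"), ((4, 130), "Species A")]
print(nearest_neighbor((4, 125), training))  # Species A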

Page 28: Pattern Recognition

Feature Extraction

- Features: unique characteristics that define an object
- The features used depend on the problem you are trying to solve
- Developing a good feature set is more art than science

Page 29: Pattern Recognition

Sample Case – Identify Flower Species

- Consider two features:
  - Petal count: ranges from 3 to 15
  - Color: ranges from 0 to 255
- Assumptions:
  - No two species have exactly the same color
  - Multiple species have the same petal count

Page 30: Pattern Recognition

Graph of Instances

[Scatter plot of color vs. petal count showing the Species A and Species B instances and the query point.]

Page 31: Pattern Recognition

Calculate Distances

[Color vs. petal count plot with distances drawn from the query point to each stored Species A and Species B instance.]

Page 32: Pattern Recognition

Species Is That of the Closest Neighbor

[Color vs. petal count plot highlighting the query's nearest neighbor; the query is assigned that instance's species.]

Page 33: Pattern Recognition

Problems

- The data range for each feature is different
- Noisy data may lead to a wrong conclusion
- One attribute may hold more importance than another

Page 34: Pattern Recognition

Without Normalization

[Color (0-255) vs. petal count (3-15) plotted on raw, unnormalized scales.]

Page 35: Pattern Recognition

Normalized

- Normalize each feature by subtracting its smallest value from every value, then dividing by the resulting largest value: x' = (x - min) / (max - min)
- All values then range from 0 to 1

[Normalized color vs. petal count plot, with both axes running from 0 to 1.]
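
A minimal sketch of this min-max normalization; the sample petal counts are made-up values.

def normalize(values):
    # x' = (x - min) / (max - min) maps every value into [0, 1].
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

petal_counts = [3, 7, 9, 15]
print(normalize(petal_counts))  # [0.0, 0.333..., 0.5, 1.0]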

Page 36: Pattern Recognition

Noise Strategies

- Take an average of the k closest instances (k-nearest neighbor)
- Prune noisy instances

Page 37: Pattern Recognition

K-Nearest Neighbors

- Identify the query as the majority class of its k nearest neighbors (here k = 5)

[Color vs. petal count plot marking the query's five nearest neighbors among the Species A and Species B instances.]
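
A minimal sketch of that majority vote; the normalized training points are made-up values.

import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(query, instances, k=5):
    # Rank the stored (point, label) pairs by distance to the query,
    # then let the k closest vote.
    ranked = sorted(instances, key=lambda inst: euclidean(query, inst[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

training = [((0.10, 0.20), "A"), ((0.20, 0.10), "A"), ((0.15, 0.25), "A"),
            ((0.90, 0.80), "B"), ((0.80, 0.90), "B"), ((0.85, 0.95), "B")]
print(knn_classify((0.2, 0.2), training, k=5))  # A (3 of the 5 votes)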

Page 38: Pattern Recognition

Prune “Noisy” Instances

- Keep track of how often each stored instance correctly predicts a new instance
- When that value drops below a certain threshold, remove the instance from the graph
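
One plausible reading of this bookkeeping in Python; the (correct, attempts) record and the 0.5 threshold are assumptions for illustration, not a specific published algorithm.

def prune(instances, stats, threshold=0.5):
    # stats maps an instance's index to its (correct, attempts) record.
    kept = []
    for i, inst in enumerate(instances):
        correct, attempts = stats.get(i, (0, 0))
        # Untested instances are kept; tested ones must stay above threshold.
        accuracy = correct / attempts if attempts else 1.0
        if accuracy >= threshold:
            kept.append(inst)
    return kept

data = [((0.1, 0.2), "A"), ((0.9, 0.8), "B"), ((0.5, 0.5), "A")]
print(prune(data, stats={2: (1, 10)}))  # drops the noisy third instance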

Page 39: Pattern Recognition

“Pruned” Graph

[Color vs. petal count plot of Species A, Species B, and the query after noisy instances have been removed.]

Page 40: Pattern Recognition

Avoid Overfitting - Occam's Razor

- A: Poor but simple
- B: Good but less simple
- C: Excellent but too data-specific

Page 41: Pattern Recognition

Weights

- Weights are applied to features that are more significant than others in producing accurate predictions: multiply the feature value by the weight.

[Normalized color vs. petal count plot with the petal count axis stretched from 0-1 to 0-2 by a weight of 2.]
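
A minimal sketch of a weighted distance, with petal count weighted 2x as the stretched axis suggests; the weight is an illustrative choice, not a tuned value.

import math

def weighted_euclidean(a, b, weights):
    # Multiply each feature by its weight before measuring distance,
    # so significant features contribute more.
    return math.sqrt(sum((w * (x - y)) ** 2 for w, x, y in zip(weights, a, b)))

# Normalized (petal count, color) points, petal count weighted 2x.
print(weighted_euclidean((0.2, 0.5), (0.6, 0.5), weights=(2, 1)))  # 0.8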

Page 42: Pattern Recognition

Validation

- Used to calculate error rates and the overall accuracy of the recognition engine
- Leave-one-out: use n-1 instances in the classifier, test on the held-out instance, repeat n times
- Holdout: divide the data into n groups, use n-1 groups in the classifier, test on the remaining group, repeat n times
- Bootstrapping: test with a randomly sampled subset of instances
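
A minimal sketch of leave-one-out validation wrapped around a simple nearest-neighbor classifier; the data points are made-up values.

import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_label(query, instances):
    return min(instances, key=lambda inst: euclidean(query, inst[0]))[1]

def leave_one_out_accuracy(instances, classify):
    # Hold out each instance in turn and classify it with the other n-1.
    correct = sum(
        classify(point, instances[:i] + instances[i + 1:]) == label
        for i, (point, label) in enumerate(instances)
    )
    return correct / len(instances)

data = [((0.1, 0.2), "A"), ((0.2, 0.1), "A"), ((0.9, 0.8), "B"), ((0.8, 0.9), "B")]
print(leave_one_out_accuracy(data, nearest_label))  # 1.0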

Page 43: Pattern Recognition

Potential Pattern Recognition Problems

- Are there adequate features to distinguish the different classes?
- Are the features highly correlated?
- Are there distinct subclasses in the data?
- Is the feature space too complex?
