INTRODUCTION TO Machine Learning
Adapted from: ETHEM ALPAYDIN, Lecture Slides for CHAPTER 1: Introduction
Why “Learn”?
Machine learning is programming computers to optimize a performance criterion using example data or past experience.
There is no need to “learn” to calculate payroll. Learning is used when:
- Human expertise does not exist (navigating on Mars)
- Humans are unable to explain their expertise (speech recognition)
- The solution changes over time (routing on a computer network)
- The solution needs to be adapted to particular cases (user biometrics)
What We Talk About When We Talk About “Learning”
- Learning general models from data of particular examples
- Data is cheap and abundant (data warehouses, data marts); knowledge is expensive and scarce.
- Example in retail, from customer transactions to consumer behavior: people who bought “Introduction to Machine Learning (Adaptive Computation and Machine Learning)” also bought “Data Mining: Practical Machine Learning Tools and Techniques” (www.amazon.com)
- Build a model that is a good and useful approximation to the data.
Data Mining
- Retail: Market basket analysis, customer relationship management (CRM)
- Finance: Credit scoring, fraud detection
- Manufacturing: Optimization, troubleshooting
- Medicine: Medical diagnosis
- Telecommunications: Quality-of-service optimization
- Bioinformatics: Motifs, alignment
- Web mining: Search engines
- ...
What is Machine Learning?
Optimize a performance criterion using example data or past experience.
- Role of statistics: Inference from a sample
- Role of computer science: Efficient algorithms to solve the optimization problem, and to represent and evaluate the model for inference
What is Machine Learning? An Abstract
[Diagram: training data is fed to a learning machine (algorithm); the resulting trained machine maps a query to an answer.]
Some Learning Machines
- Linear models
- Kernel methods
- Neural networks
- Markov models
- Decision trees
- ...
Some Areas of Application
[Figure: application domains plotted by number of training examples vs. number of inputs (both axes roughly 10 to 10^5): bioinformatics, ecology, OCR/HWR, market analysis, text categorization, machine vision, system diagnosis.]
Bioinformatics: Biomedical / Biometrics
- Medicine: Screening; diagnosis and prognosis; drug discovery
- Security: Face recognition; signature / fingerprint / iris verification; DNA fingerprinting
Computer / Internet
- Computer interfaces: Troubleshooting wizards, handwriting and speech, brain waves
- Internet: Hit ranking, spam filtering, text categorization, text translation, recommendation
Application Types
Application Types
- Learning associations
- Supervised learning: Classification, Regression
- Unsupervised learning
- Reinforcement learning
Learning Associations
Basket analysis: P(Y | X), the probability that somebody who buys X also buys Y, where X and Y are products/services.
Example: P(pen | notebook) = 0.7
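A sketch of how such a conditional probability is estimated from transaction data. The basket contents below (and the resulting 0.6, rather than the slide’s 0.7) are invented for illustration:

```python
# Hypothetical transaction data: each basket is a set of purchased items.
baskets = [
    {"notebook", "pen"}, {"notebook", "pen"}, {"notebook"},
    {"pen"}, {"notebook", "pen", "eraser"}, {"notebook"},
]

def conditional_support(baskets, x, y):
    """Estimate P(y | x) as the fraction of x-containing baskets that also contain y."""
    with_x = [b for b in baskets if x in b]
    return sum(1 for b in with_x if y in b) / len(with_x)

print(conditional_support(baskets, "notebook", "pen"))  # 3 of 5 -> 0.6
```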
Application Types
- Learning associations
- Supervised learning: Classification, Regression
- Unsupervised learning
- Reinforcement learning
Classification
Example: Credit scoring
Differentiating between low-risk and high-risk customers from their income and savings.
Discriminant: IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk
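A minimal sketch of this two-threshold discriminant; the values of θ1 and θ2 are assumed here, whereas in practice they would be learned from labeled data:

```python
# Assumed thresholds for illustration only (not fitted values).
THETA1, THETA2 = 30_000, 10_000  # income threshold, savings threshold

def credit_risk(income, savings):
    """The slide's rule: low-risk iff income > theta1 AND savings > theta2."""
    if income > THETA1 and savings > THETA2:
        return "low-risk"
    return "high-risk"

print(credit_risk(50_000, 20_000))  # low-risk
print(credit_risk(20_000, 20_000))  # high-risk
```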
Classification: Applications
- Face recognition: Pose, lighting, occlusion (glasses, beard), make-up, hair style
- Character recognition: Different handwriting styles
- Speech recognition: Temporal dependency; use of a dictionary or the syntax of the language
- Sensor fusion: Combine multiple modalities, e.g., visual (lip image) and acoustic for speech
- Medical diagnosis: From symptoms to illnesses
- ...
Face Recognition
[Figure: training examples of a person, and test images.]
AT&T Laboratories, Cambridge UK: http://www.uk.research.att.com/facedatabase.html
Example 1
“Sorting incoming fish on a conveyor according to species using optical sensing”
Species: sea bass or salmon.
Problem Analysis
Set up a camera and take some sample images to extract features:
- Length
- Lightness
- Width
- Number and shape of fins
- Position of the mouth, etc.
This is the set of all suggested features to explore for use in our classifier!
Preprocessing
- Use a segmentation operation to isolate fish from one another and from the background.
- Information from a single fish is sent to a feature extractor, whose purpose is to reduce the data by measuring certain features.
- The features are passed to a classifier.
Classification
- Select the length of the fish as a possible feature for discrimination.
- The length is a poor feature alone!
- Select the lightness as a possible feature.
Classification
- Select the lightness of the fish as a possible feature for discrimination.
- Move the decision boundary toward smaller values of lightness in order to minimize the cost (reduce the number of sea bass that are classified as salmon!)
- This is a task of decision theory.
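The cost-sensitive choice of boundary can be sketched as follows. The lightness values and the misclassification costs are made up; raising the cost of labeling a sea bass as salmon pulls the chosen threshold toward smaller lightness:

```python
# Hypothetical lightness measurements for each species.
salmon = [1.0, 2.0, 2.5, 3.0]
seabass = [3.5, 4.0, 5.0, 6.0]

def cost(threshold, cost_bass_as_salmon=2.0, cost_salmon_as_bass=1.0):
    """Total cost of the rule: lightness < threshold -> salmon, else sea bass."""
    errors_bass = sum(1 for x in seabass if x < threshold)     # bass labeled salmon
    errors_salmon = sum(1 for x in salmon if x >= threshold)   # salmon labeled bass
    return cost_bass_as_salmon * errors_bass + cost_salmon_as_bass * errors_salmon

# Pick the cheapest of a few candidate thresholds.
best = min([2.0, 3.0, 3.25, 4.0], key=cost)
print(best)  # 3.25, which separates the two toy samples at zero cost
```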
- Adopt the lightness and add the width of the fish, giving a feature vector xT = [x1, x2] (x1 = lightness, x2 = width).
- We might add other features that are not correlated with the ones we already have.
Ideally, the best decision boundary should be the one which provides optimal performance, such as in the following figure:
However, our satisfaction is premature, because the central aim of designing a classifier is to correctly classify novel input: this is the issue of generalization!
Example: Particle Classification
Particles on an air filter, of three classes: P1, P2, and P3.
Sample Data Set

Area  Perimeter  Class
3     6          P1
5     7          P1
4     4          P1
7     6          P1
12    11         P2
15    10         P2
14    12         P2
17    13         P2
14    19         P3
13    20         P3
15    22         P3
12    18         P3
...   ...        ...

(Area and Perimeter are the features; Class is the label.)
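One simple way to turn this sample into a classifier is a nearest-centroid rule on the two features. A minimal sketch using the table’s values (nearest-centroid is one illustrative choice, not the method the slides prescribe):

```python
# The (area, perimeter) samples from the table, grouped by class.
data = {
    "P1": [(3, 6), (5, 7), (4, 4), (7, 6)],
    "P2": [(12, 11), (15, 10), (14, 12), (17, 13)],
    "P3": [(14, 19), (13, 20), (15, 22), (12, 18)],
}

# Mean (area, perimeter) per class.
centroids = {
    c: tuple(sum(v) / len(pts) for v in zip(*pts)) for c, pts in data.items()
}

def classify(area, perimeter):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    return min(centroids, key=lambda c: (centroids[c][0] - area) ** 2
                                        + (centroids[c][1] - perimeter) ** 2)

print(classify(4, 5))    # falls in the P1 region
print(classify(14, 20))  # falls in the P3 region
```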
Histogram Analysis
[Figure: histograms of particle counts by area for classes P1 and P2.]
Histogram Analysis
[Figure: histograms of counts by perimeter for P2 and P3, and a scatter plot of perimeter vs. area showing the P1, P2, and P3 clusters.]
Input Space Partitioning
[Figure: two partitionings of the (area, perimeter) input space into regions for P1, P2, and P3.]
Classification Systems
- Sensing: Use of a transducer (camera or microphone). The PR system depends on the bandwidth, resolution, sensitivity, and distortion of the transducer.
- Segmentation and grouping: Patterns should be well separated and should not overlap.
- Feature extraction: Discriminative features, invariant with respect to translation, rotation, and scale.
- Classification: Use the feature vector provided by the feature extractor to assign the object to a category.
- Post-processing: Exploit context (input-dependent information other than the target pattern itself) to improve performance.
The Design Cycle
- Data collection
- Feature choice
- Model choice
- Training
- Evaluation
- Computational complexity
- Data collection: How do we know when we have collected an adequately large and representative set of examples for training and testing the system?
- Feature choice: Depends on the characteristics of the problem domain. Features should be simple to extract, invariant to irrelevant transformations, and insensitive to noise.
- Model choice: We may be unsatisfied with the performance of our fish classifier and want to jump to another class of model.
- Training: Use the data to determine the classifier; there are many different procedures for training classifiers and choosing models.
- Evaluation: Measure the error rate (or performance) and switch from one set of features to another.
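Measuring the error rate on a held-out test set can be sketched like this; the fish-style data and the single-threshold classifier are invented for illustration:

```python
# Hypothetical held-out test set: (lightness, true label) pairs.
test_set = [(1.2, "salmon"), (2.0, "salmon"), (3.8, "bass"),
            (5.1, "bass"), (2.9, "bass")]

def predict(lightness, threshold=3.0):
    """Trivial one-feature classifier, standing in for a trained model."""
    return "salmon" if lightness < threshold else "bass"

# Error rate = fraction of test examples the classifier gets wrong.
errors = sum(1 for x, label in test_set if predict(x) != label)
error_rate = errors / len(test_set)
print(error_rate)  # 1 of 5 misclassified -> 0.2
```

Comparing this number across feature sets (or thresholds) is exactly the evaluation step the slide describes.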
Application Types
- Learning associations
- Supervised learning: Classification, Regression
- Unsupervised learning
- Reinforcement learning
Regression
[Figure: a 3-D plot of temperature over two inputs, and a 2-D scatter of training points.]
[start Matlab demo lecture2.m]
Given example (x, y) pairs, predict y for a new point x.
42
010
2030
40
0
10
20
30
20
22
24
26
Tem
pera
ture
Linear regression
PredictionPrediction
0 200
20
40
43
Regression - Example
Price of a used car:
- x: car attributes
- y: price
- y = g(x | θ), where g( ) is the model and θ its parameters (car type, model, options, ...)
- A linear model: y = wx + w0
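The parameters w and w0 of this linear model can be fit by ordinary least squares. A minimal sketch on made-up used-car-style data (x = age in years, y = price), using the closed-form one-variable solution:

```python
# Hypothetical training data: car age vs. price (roughly linear).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [9.0, 8.1, 6.9, 6.0, 5.0]

# One-variable least squares: w = cov(x, y) / var(x), w0 = mean(y) - w * mean(x).
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
w = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
     / sum((x - mx) ** 2 for x in xs))
w0 = my - w * mx

print(round(w, 3), round(w0, 3))  # w ≈ -1.01, w0 ≈ 10.03
```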
Some Regression Applications
- Navigating a car: Angle of the steering wheel
- Kinematics of a robot arm: Given a target position (x, y), find the joint angles α1 = g1(x, y) and α2 = g2(x, y)
- Response surface design
Response Surface Design
Response surface design is useful for the modeling and analysis of problems in which a response of interest is influenced by several variables and the objective is to optimize this response.
For example: find the levels of temperature (x1) and pressure (x2) that maximize the yield (y) of a process, y = f(x1, x2).
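A sketch of optimizing such a response by grid search. Here f is an invented quadratic surface with a known optimum; in a real response-surface study f is unknown and is itself estimated from designed experiments:

```python
def f(x1, x2):
    """Invented yield surface: peaks at temperature 80, pressure 3.0."""
    return -(x1 - 80) ** 2 - 2 * (x2 - 3) ** 2 + 100

# Candidate levels of temperature and pressure (assumed ranges).
grid = [(x1, x2) for x1 in range(60, 101, 5)
                 for x2 in [2.0, 2.5, 3.0, 3.5, 4.0]]

# Pick the levels that maximize the response.
best = max(grid, key=lambda p: f(*p))
print(best)  # (80, 3.0)
```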
Application Types
- Learning associations
- Supervised learning: Classification, Regression
- Unsupervised learning
- Reinforcement learning
Unsupervised Learning
- Learning “what normally happens”
- No output
- Clustering: Grouping similar instances
- Example applications: Customer segmentation in CRM; image compression (color quantization); bioinformatics (learning motifs)
A motif is a pattern common to a set of nucleic- or amino-acid subsequences that share some biological property of interest (such as being DNA binding sites for a regulatory protein).
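Clustering itself can be sketched with a minimal k-means loop, the kind of algorithm behind color quantization. Here k = 2 on 1-D data; the points and initial centers are invented:

```python
# Two obvious groups of 1-D points, plus deliberately poor initial centers.
points = [1.0, 1.2, 0.8, 8.0, 8.2, 7.8]
centers = [0.0, 10.0]

for _ in range(10):
    # Assignment step: each point joins its nearest center's cluster.
    clusters = [[], []]
    for p in points:
        clusters[min((0, 1), key=lambda i: abs(p - centers[i]))].append(p)
    # Update step: each center moves to the mean of its cluster.
    centers = [sum(c) / len(c) if c else centers[i]
               for i, c in enumerate(clusters)]

print(centers)  # converges to roughly [1.0, 8.0]
```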
Application Types
- Learning associations
- Supervised learning: Classification, Regression
- Unsupervised learning
- Reinforcement learning
Reinforcement Learning
- Learning a policy: A sequence of outputs
- No supervised output, but a delayed reward
- Credit assignment problem
- Game playing
- Robot in a maze
- Multiple agents, partial observability, ...
Reinforcement Learning (RL)
In RL problems, an agent interacts with an environment and iteratively learns a policy, i.e., a mapping from states to actions, so as to maximize a reward signal.
[Diagram: the agent takes action at in the environment, observes the next state st+1 and reward rt+1, and updates V(st) or Q(st, at).]
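The update in the diagram can be sketched as one-step tabular Q-learning, Q(s,a) += α(r + γ·max Q(s',·) − Q(s,a)), on a tiny invented task: two states, and only action 1 in state 0 reaches the goal. The environment, rewards, and hyperparameters are all assumptions for illustration:

```python
import random

random.seed(0)  # make the random exploration reproducible

n_states, n_actions = 2, 2
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma = 0.5, 0.9

def step(s, a):
    """Action 1 in state 0 reaches the goal (state 1, reward 1); else stay, reward 0."""
    if s == 0 and a == 1:
        return 1, 1.0
    return s, 0.0

s = 0
for _ in range(200):
    a = random.randrange(n_actions)  # purely random exploration policy
    s2, r = step(s, a)
    Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
    s = 0 if s2 == 1 else s2         # reset to the start after reaching the goal

print(round(Q[0][1], 2))  # approaches 1.0 = r + gamma * 0 for the goal action
```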
Learning Autonomous Agent
[Diagram: an agent perceives the environment, takes actions, and receives a delayed reward that shapes new behavior.]
Delayed reward: The reward is not given immediately after the agent’s action; usually it is given only after achieving the goal. This delayed reward is the agent’s only clue for learning.
Resources: Datasets
- UCI Repository: http://www.ics.uci.edu/~mlearn/MLRepository.html
- UCI KDD Archive: http://kdd.ics.uci.edu/summary.data.application.html
- Statlib: http://lib.stat.cmu.edu/
- Delve: http://www.cs.utoronto.ca/~delve/
Resources: Journals
- Journal of Machine Learning Research: www.jmlr.org
- Machine Learning
- Neural Computation
- Neural Networks
- IEEE Transactions on Neural Networks
- IEEE Transactions on Pattern Analysis and Machine Intelligence
- Annals of Statistics
- Journal of the American Statistical Association
- ...
Resources: Conferences
- International Conference on Machine Learning (ICML); ICML05: http://icml.ais.fraunhofer.de/
- European Conference on Machine Learning (ECML); ECML05: http://ecmlpkdd05.liacc.up.pt/
- Neural Information Processing Systems (NIPS); NIPS05: http://nips.cc/
- Uncertainty in Artificial Intelligence (UAI); UAI05: http://www.cs.toronto.edu/uai2005/
- Computational Learning Theory (COLT); COLT05: http://learningtheory.org/colt2005/
- International Joint Conference on Artificial Intelligence (IJCAI); IJCAI05: http://ijcai05.csd.abdn.ac.uk/
- International Conference on Neural Networks (Europe); ICANN05: http://www.ibspan.waw.pl/ICANN-2005/
- ...