INTRODUCTION TO Machine Learning
Adapted from: ETHEM ALPAYDIN, Lecture Slides for CHAPTER 1: Introduction
Why “Learn”?
Machine learning is programming computers to optimize a performance criterion using example data or past experience.
There is no need to “learn” to calculate payroll. Learning is used when:
- Human expertise does not exist (navigating on Mars)
- Humans are unable to explain their expertise (speech recognition)
- The solution changes over time (routing on a computer network)
- The solution needs to be adapted to particular cases (user biometrics)
What We Talk About When We Talk About “Learning”
- Learning general models from data of particular examples
- Data is cheap and abundant (data warehouses, data marts); knowledge is expensive and scarce.
- Example in retail, from customer transactions to consumer behavior: people who bought “Introduction to Machine Learning (Adaptive Computation and Machine Learning)” also bought “Data Mining: Practical Machine Learning Tools and Techniques” (www.amazon.com)
- Build a model that is a good and useful approximation to the data.
Data Mining
- Retail: Market basket analysis, customer relationship management (CRM)
- Finance: Credit scoring, fraud detection
- Manufacturing: Optimization, troubleshooting
- Medicine: Medical diagnosis
- Telecommunications: Quality-of-service optimization
- Bioinformatics: Motifs, alignment
- Web mining: Search engines
- ...
What is Machine Learning?
Optimize a performance criterion using example data or past experience.
- Role of statistics: Inference from a sample
- Role of computer science: Efficient algorithms to solve the optimization problem, and to represent and evaluate the model for inference
What is Machine Learning? An Abstract
[Diagram: training data is fed to a learning machine (algorithm); the resulting trained machine maps a query to an answer.]
Some Learning Machines
- Linear models
- Kernel methods
- Neural networks
- Markov models
- Decision trees
- ...
Some Areas of Application
[Figure: application domains plotted by number of training examples vs. number of inputs (both axes roughly 10 to 10^5): bioinformatics, ecology, OCR/HWR, market analysis, text categorization, machine vision, system diagnosis.]
Bioinformatics: Biomedical / Biometrics
- Medicine: Screening; diagnosis and prognosis; drug discovery
- Security: Face recognition; signature / fingerprint / iris verification; DNA fingerprinting
Computer / Internet
- Computer interfaces: Troubleshooting wizards, handwriting and speech, brain waves
- Internet: Hit ranking, spam filtering, text categorization, text translation, recommendation
Application Types
Application Types
- Learning associations
- Supervised learning: Classification, Regression
- Unsupervised learning
- Reinforcement learning
Learning Associations
Basket analysis: P(Y | X), the probability that somebody who buys X also buys Y, where X and Y are products/services.
Example: P(pen | notebook) = 0.7
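A sketch of how such a conditional probability is estimated from transaction data. The basket contents below (and the resulting 0.6, rather than the slide’s 0.7) are invented for illustration:

```python
# Hypothetical transaction data: each basket is a set of purchased items.
baskets = [
    {"notebook", "pen"}, {"notebook", "pen"}, {"notebook"},
    {"pen"}, {"notebook", "pen", "eraser"}, {"notebook"},
]

def conditional_support(baskets, x, y):
    """Estimate P(y | x) as the fraction of x-containing baskets that also contain y."""
    with_x = [b for b in baskets if x in b]
    return sum(1 for b in with_x if y in b) / len(with_x)

print(conditional_support(baskets, "notebook", "pen"))  # 3 of 5 -> 0.6
```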
Application Types
- Learning associations
- Supervised learning: Classification, Regression
- Unsupervised learning
- Reinforcement learning
Classification
Example: Credit scoring
Differentiating between low-risk and high-risk customers from their income and savings.
Discriminant: IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk
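A minimal sketch of this two-threshold discriminant; the values of θ1 and θ2 are assumed here, whereas in practice they would be learned from labeled data:

```python
# Assumed thresholds for illustration only (not fitted values).
THETA1, THETA2 = 30_000, 10_000  # income threshold, savings threshold

def credit_risk(income, savings):
    """The slide's rule: low-risk iff income > theta1 AND savings > theta2."""
    if income > THETA1 and savings > THETA2:
        return "low-risk"
    return "high-risk"

print(credit_risk(50_000, 20_000))  # low-risk
print(credit_risk(20_000, 20_000))  # high-risk
```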
Classification: Applications
- Face recognition: Pose, lighting, occlusion (glasses, beard), make-up, hair style
- Character recognition: Different handwriting styles
- Speech recognition: Temporal dependency; use of a dictionary or the syntax of the language
- Sensor fusion: Combine multiple modalities, e.g., visual (lip image) and acoustic for speech
- Medical diagnosis: From symptoms to illnesses
- ...
Face Recognition
[Figure: training examples of a person, and test images.]
AT&T Laboratories, Cambridge UK: http://www.uk.research.att.com/facedatabase.html
Example 1
“Sorting incoming fish on a conveyor according to species using optical sensing”
Species: sea bass or salmon.
Problem Analysis
Set up a camera and take some sample images to extract features:
- Length
- Lightness
- Width
- Number and shape of fins
- Position of the mouth, etc.
This is the set of all suggested features to explore for use in our classifier!
Preprocessing
- Use a segmentation operation to isolate fish from one another and from the background.
- Information from a single fish is sent to a feature extractor, whose purpose is to reduce the data by measuring certain features.
- The features are passed to a classifier.
Classification
- Select the length of the fish as a possible feature for discrimination.
- The length is a poor feature alone!
- Select the lightness as a possible feature.
Classification
- Select the lightness of the fish as a possible feature for discrimination.
- Move the decision boundary toward smaller values of lightness in order to minimize the cost (reduce the number of sea bass that are classified as salmon!)
- This is a task of decision theory.
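The cost-sensitive choice of boundary can be sketched as follows. The lightness values and the misclassification costs are made up; raising the cost of labeling a sea bass as salmon pulls the chosen threshold toward smaller lightness:

```python
# Hypothetical lightness measurements for each species.
salmon = [1.0, 2.0, 2.5, 3.0]
seabass = [3.5, 4.0, 5.0, 6.0]

def cost(threshold, cost_bass_as_salmon=2.0, cost_salmon_as_bass=1.0):
    """Total cost of the rule: lightness < threshold -> salmon, else sea bass."""
    errors_bass = sum(1 for x in seabass if x < threshold)     # bass labeled salmon
    errors_salmon = sum(1 for x in salmon if x >= threshold)   # salmon labeled bass
    return cost_bass_as_salmon * errors_bass + cost_salmon_as_bass * errors_salmon

# Pick the cheapest of a few candidate thresholds.
best = min([2.0, 3.0, 3.25, 4.0], key=cost)
print(best)  # 3.25, which separates the two toy samples at zero cost
```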
- Adopt the lightness and add the width of the fish, giving a feature vector xT = [x1, x2] (x1 = lightness, x2 = width).
- We might add other features that are not correlated with the ones we already have.
Ideally, the best decision boundary should be the one which provides optimal performance, such as in the following figure:
However, our satisfaction is premature, because the central aim of designing a classifier is to correctly classify novel input: this is the issue of generalization!
Example: Particle Classification
Particles on an air filter, of three classes: P1, P2, and P3.
Sample Data Set

Area  Perimeter  Class
3     6          P1
5     7          P1
4     4          P1
7     6          P1
12    11         P2
15    10         P2
14    12         P2
17    13         P2
14    19         P3
13    20         P3
15    22         P3
12    18         P3
...   ...        ...

(Area and Perimeter are the features; Class is the label.)
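One simple way to turn this sample into a classifier is a nearest-centroid rule on the two features. A minimal sketch using the table’s values (nearest-centroid is one illustrative choice, not the method the slides prescribe):

```python
# The (area, perimeter) samples from the table, grouped by class.
data = {
    "P1": [(3, 6), (5, 7), (4, 4), (7, 6)],
    "P2": [(12, 11), (15, 10), (14, 12), (17, 13)],
    "P3": [(14, 19), (13, 20), (15, 22), (12, 18)],
}

# Mean (area, perimeter) per class.
centroids = {
    c: tuple(sum(v) / len(pts) for v in zip(*pts)) for c, pts in data.items()
}

def classify(area, perimeter):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    return min(centroids, key=lambda c: (centroids[c][0] - area) ** 2
                                        + (centroids[c][1] - perimeter) ** 2)

print(classify(4, 5))    # falls in the P1 region
print(classify(14, 20))  # falls in the P3 region
```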
Histogram Analysis
[Figure: histograms of particle counts by area for classes P1 and P2.]
Histogram Analysis
[Figure: histograms of counts by perimeter for P2 and P3, and a scatter plot of perimeter vs. area showing the P1, P2, and P3 clusters.]
Input Space Partitioning
[Figure: two partitionings of the (area, perimeter) input space into regions for P1, P2, and P3.]
Classification Systems
- Sensing: Use of a transducer (camera or microphone). The PR system depends on the bandwidth, resolution, sensitivity, and distortion of the transducer.
- Segmentation and grouping: Patterns should be well separated and should not overlap.
- Feature extraction: Discriminative features, invariant with respect to translation, rotation, and scale.
- Classification: Use the feature vector provided by the feature extractor to assign the object to a category.
- Post-processing: Exploit context (input-dependent information other than the target pattern itself) to improve performance.
The Design Cycle
- Data collection
- Feature choice
- Model choice
- Training
- Evaluation
- Computational complexity
- Data collection: How do we know when we have collected an adequately large and representative set of examples for training and testing the system?
- Feature choice: Depends on the characteristics of the problem domain. Features should be simple to extract, invariant to irrelevant transformations, and insensitive to noise.
- Model choice: We may be unsatisfied with the performance of our fish classifier and want to jump to another class of model.
- Training: Use the data to determine the classifier; there are many different procedures for training classifiers and choosing models.
- Evaluation: Measure the error rate (or performance) and switch from one set of features to another.
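Measuring the error rate on a held-out test set can be sketched like this; the fish-style data and the single-threshold classifier are invented for illustration:

```python
# Hypothetical held-out test set: (lightness, true label) pairs.
test_set = [(1.2, "salmon"), (2.0, "salmon"), (3.8, "bass"),
            (5.1, "bass"), (2.9, "bass")]

def predict(lightness, threshold=3.0):
    """Trivial one-feature classifier, standing in for a trained model."""
    return "salmon" if lightness < threshold else "bass"

# Error rate = fraction of test examples the classifier gets wrong.
errors = sum(1 for x, label in test_set if predict(x) != label)
error_rate = errors / len(test_set)
print(error_rate)  # 1 of 5 misclassified -> 0.2
```

Comparing this number across feature sets (or thresholds) is exactly the evaluation step the slide describes.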
Application Types
- Learning associations
- Supervised learning: Classification, Regression
- Unsupervised learning
- Reinforcement learning
Regression
[Figure: a 3-D plot of temperature over two inputs, and a 2-D scatter of training points.]
[start Matlab demo lecture2.m]
Given example (x, y) pairs, predict y for a new point x.
42
010
2030
40
0
10
20
30
20
22
24
26
Tem
pera
ture
Linear regression
PredictionPrediction
0 200
20
40
43
Regression - Example
Price of a used car:
- x: car attributes
- y: price
- y = g(x | θ), where g( ) is the model and θ its parameters (car type, model, options, ...)
- A linear model: y = wx + w0
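The parameters w and w0 of this linear model can be fit by ordinary least squares. A minimal sketch on made-up used-car-style data (x = age in years, y = price), using the closed-form one-variable solution:

```python
# Hypothetical training data: car age vs. price (roughly linear).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [9.0, 8.1, 6.9, 6.0, 5.0]

# One-variable least squares: w = cov(x, y) / var(x), w0 = mean(y) - w * mean(x).
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
w = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
     / sum((x - mx) ** 2 for x in xs))
w0 = my - w * mx

print(round(w, 3), round(w0, 3))  # w ≈ -1.01, w0 ≈ 10.03
```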
Some Regression Applications
- Navigating a car: Angle of the steering wheel
- Kinematics of a robot arm: Given a target position (x, y), find the joint angles α1 = g1(x, y) and α2 = g2(x, y)
- Response surface design
Response Surface Design
Response surface design is useful for the modeling and analysis of problems in which a response of interest is influenced by several variables and the objective is to optimize this response.
For example: find the levels of temperature (x1) and pressure (x2) that maximize the yield (y) of a process, y = f(x1, x2).
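A sketch of optimizing such a response by grid search. Here f is an invented quadratic surface with a known optimum; in a real response-surface study f is unknown and is itself estimated from designed experiments:

```python
def f(x1, x2):
    """Invented yield surface: peaks at temperature 80, pressure 3.0."""
    return -(x1 - 80) ** 2 - 2 * (x2 - 3) ** 2 + 100

# Candidate levels of temperature and pressure (assumed ranges).
grid = [(x1, x2) for x1 in range(60, 101, 5)
                 for x2 in [2.0, 2.5, 3.0, 3.5, 4.0]]

# Pick the levels that maximize the response.
best = max(grid, key=lambda p: f(*p))
print(best)  # (80, 3.0)
```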
Application Types
- Learning associations
- Supervised learning: Classification, Regression
- Unsupervised learning
- Reinforcement learning
Unsupervised Learning
- Learning “what normally happens”
- No output
- Clustering: Grouping similar instances
- Example applications: Customer segmentation in CRM; image compression (color quantization); bioinformatics (learning motifs)
A motif is a pattern common to a set of nucleic- or amino-acid subsequences that share some biological property of interest (such as being DNA binding sites for a regulatory protein).
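Clustering itself can be sketched with a minimal k-means loop, the kind of algorithm behind color quantization. Here k = 2 on 1-D data; the points and initial centers are invented:

```python
# Two obvious groups of 1-D points, plus deliberately poor initial centers.
points = [1.0, 1.2, 0.8, 8.0, 8.2, 7.8]
centers = [0.0, 10.0]

for _ in range(10):
    # Assignment step: each point joins its nearest center's cluster.
    clusters = [[], []]
    for p in points:
        clusters[min((0, 1), key=lambda i: abs(p - centers[i]))].append(p)
    # Update step: each center moves to the mean of its cluster.
    centers = [sum(c) / len(c) if c else centers[i]
               for i, c in enumerate(clusters)]

print(centers)  # converges to roughly [1.0, 8.0]
```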
Application Types
- Learning associations
- Supervised learning: Classification, Regression
- Unsupervised learning
- Reinforcement learning
Reinforcement Learning
- Learning a policy: A sequence of outputs
- No supervised output, but a delayed reward
- Credit assignment problem
- Game playing
- Robot in a maze
- Multiple agents, partial observability, ...
Reinforcement Learning (RL)
In RL problems, an agent interacts with an environment and iteratively learns a policy, i.e., a mapping from states to actions, so as to maximize a reward signal.
[Diagram: the agent takes action at in the environment, observes the next state st+1 and reward rt+1, and updates V(st) or Q(st, at).]
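The update in the diagram can be sketched as one-step tabular Q-learning, Q(s,a) += α(r + γ·max Q(s',·) − Q(s,a)), on a tiny invented task: two states, and only action 1 in state 0 reaches the goal. The environment, rewards, and hyperparameters are all assumptions for illustration:

```python
import random

random.seed(0)  # make the random exploration reproducible

n_states, n_actions = 2, 2
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma = 0.5, 0.9

def step(s, a):
    """Action 1 in state 0 reaches the goal (state 1, reward 1); else stay, reward 0."""
    if s == 0 and a == 1:
        return 1, 1.0
    return s, 0.0

s = 0
for _ in range(200):
    a = random.randrange(n_actions)  # purely random exploration policy
    s2, r = step(s, a)
    Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
    s = 0 if s2 == 1 else s2         # reset to the start after reaching the goal

print(round(Q[0][1], 2))  # approaches 1.0 = r + gamma * 0 for the goal action
```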
Learning Autonomous Agent
[Diagram: an agent perceives the environment, takes actions, and receives a delayed reward that shapes new behavior.]
Delayed reward: The reward is not given immediately after the agent’s action; usually it is given only after achieving the goal. This delayed reward is the agent’s only clue for learning.
Resources: Datasets
- UCI Repository: http://www.ics.uci.edu/~mlearn/MLRepository.html
- UCI KDD Archive: http://kdd.ics.uci.edu/summary.data.application.html
- Statlib: http://lib.stat.cmu.edu/
- Delve: http://www.cs.utoronto.ca/~delve/
Resources: Journals
- Journal of Machine Learning Research: www.jmlr.org
- Machine Learning
- Neural Computation
- Neural Networks
- IEEE Transactions on Neural Networks
- IEEE Transactions on Pattern Analysis and Machine Intelligence
- Annals of Statistics
- Journal of the American Statistical Association
- ...
Resources: Conferences
- International Conference on Machine Learning (ICML); ICML05: http://icml.ais.fraunhofer.de/
- European Conference on Machine Learning (ECML); ECML05: http://ecmlpkdd05.liacc.up.pt/
- Neural Information Processing Systems (NIPS); NIPS05: http://nips.cc/
- Uncertainty in Artificial Intelligence (UAI); UAI05: http://www.cs.toronto.edu/uai2005/
- Computational Learning Theory (COLT); COLT05: http://learningtheory.org/colt2005/
- International Joint Conference on Artificial Intelligence (IJCAI); IJCAI05: http://ijcai05.csd.abdn.ac.uk/
- International Conference on Neural Networks (Europe); ICANN05: http://www.ibspan.waw.pl/ICANN-2005/
- ...