1 Supervised Learning
Supervised Learning
Uwe Lämmel
Business School, Institute of Business Informatics
www.wi.hs-wismar.de/~laemmel
2 Supervised Learning
Neural Networks
– Idea
– Artificial Neuron & Network
– Supervised Learning
– Unsupervised Learning
– Data Mining – other Techniques
3 Supervised Learning
Supervised Learning
Feed-Forward Networks
– Perceptron – AdaLinE – TLU
– Multi-layer networks
– Backpropagation algorithm
– Pattern recognition
– Data preparation
Examples
– Bank customer
– Customer relationship
4 Supervised Learning
Connections
– Feed-forward
– Input layer
– Hidden layer
– Output layer
Hopfield network
– Feed-back / auto-associative
– From the (output) layer back to the previous (hidden/input) layer
– All neurons fully connected to each other
5 Supervised Learning
Perceptron – Adaline – TLU
– one layer of trainable links only
– AdaLinE: adaptive linear element
– TLU: threshold logic unit
– a class of neural networks with a special architecture:
...
6 Supervised Learning
Papert, Minsky and Perceptron - History
"Once upon a time two daughter sciences were born to the new science of cybernetics. One sister was natural, with features inherited from the study of the brain, from the way nature does things.
The other was artificial, related from the beginning to the use of computers. …
But Snow White was not dead.
What Minsky and Papert had shown the world as proof was not the heart of the princess; it was the heart of a pig."
Seymour Papert, 1988
7 Supervised Learning
Perception
first step of recognition:
becoming aware of something via the senses
[Figure: picture → mapping layer (fixed 1-to-1 links) → trainable, fully connected links → output layer]
8 Supervised Learning
Perceptron
– input layer: binary input, passed through; no trainable links
– propagation function:
net_j = Σ_i o_i · w_ij
– activation function:
o_j = a_j = 1 if net_j ≥ θ_j, 0 otherwise
A perceptron can learn every function it can represent, in finite time.
(perceptron convergence theorem, F. Rosenblatt)
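As a sketch in Java (illustrative names, not from the slides), the propagation and activation functions translate directly into code:

// Minimal perceptron neuron: net_j = Σ_i o_i*w_ij, output 1 iff net_j >= θ_j.
class PerceptronNeuron {
    double[] w;     // weights w_ij of the links from the input neurons
    double theta;   // threshold θ_j

    PerceptronNeuron(double[] w, double theta) { this.w = w; this.theta = theta; }

    double net(double[] o) {                  // propagation function
        double net = 0.0;
        for (int i = 0; i < w.length; i++) net += o[i] * w[i];
        return net;
    }

    int output(double[] o) {                  // activation function
        return net(o) >= theta ? 1 : 0;
    }
}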
9 Supervised Learning
Linear separable
Neuron j should output 0 iff neurons 1 and 2 have the same value (o1 = o2), and 1 otherwise:
net_j = o1·w1j + o2·w2j
0·w1j + 0·w2j < θ_j
0·w1j + 1·w2j ≥ θ_j
1·w1j + 0·w2j ≥ θ_j
1·w1j + 1·w2j < θ_j
The first inequality gives θ_j > 0; adding the second and third gives w1j + w2j ≥ 2·θ_j > θ_j, which contradicts the fourth. No such weights exist.
[Figure: neuron j with threshold θ_j and inputs from neurons 1 and 2 via weights w1j, w2j]
10 Supervised Learning
Linearly separable
– net_j = o1·w1j + o2·w2j = θ_j describes a line in 2-dimensional space
– the line divides the plane such that (0,1) and (1,0) lie in different half-planes
– the network cannot solve the problem
– a perceptron can represent only some functions:
a neural network representing the XOR function needs hidden neurons
[Figure: the points (0,0), (0,1), (1,0), (1,1) in the (o1, o2) plane and the line o1·w1 + o2·w2 = θ; no single line separates (0,0) and (1,1) from (0,1) and (1,0)]
11 Supervised Learning
Learning is easy
repeat
  while input patterns remain do begin
    read the next input pattern;
    calculate the output;
    for each j in OutputNeurons do
      if oj <> tj then
        if oj = 0 then   { output = 0, but 1 expected }
          for each i in InputNeurons do wij := wij + oi
        else             { output = 1, but 0 expected }
          for each i in InputNeurons do wij := wij - oi;
  end
until desired behaviour
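A runnable Java version of this loop might look as follows (a sketch: patterns and targets are assumed to be arrays of binary training data, w the weight matrix, theta the thresholds):

// Perceptron learning: raise weights if the output is 0 but 1 was expected,
// lower them if the output is 1 but 0 was expected; repeat until stable.
static void train(double[][] w, double[] theta, int[][] patterns, int[][] targets) {
    boolean changed;
    do {
        changed = false;
        for (int p = 0; p < patterns.length; p++) {
            int[] o = patterns[p];
            for (int j = 0; j < theta.length; j++) {
                double net = 0;
                for (int i = 0; i < o.length; i++) net += o[i] * w[i][j];
                int oj = net >= theta[j] ? 1 : 0;
                if (oj != targets[p][j]) {
                    int sign = (oj == 0) ? +1 : -1;   // +o_i or -o_i
                    for (int i = 0; i < o.length; i++) w[i][j] += sign * o[i];
                    changed = true;
                }
            }
        }
    } while (changed);   // "repeat until desired behaviour"
}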
12 Supervised Learning
Exercise
– Decoding
– input: binary code of a digit
– output: unary representation:
as many digits 1 as the digit represents: 5 → 1 1 1 1 1
– architecture:
13 Supervised Learning
Exercise
– Decoding
– input: binary code of a digit
– output: classification:
0 ~ 1st neuron, 1 ~ 2nd neuron, ..., 5 ~ 6th neuron, ...
– architecture:
14 Supervised Learning
Exercises
1. Look at the EXCEL-file of the decoding problem
2. Implement (in PASCAL/Java) a 4-10-Perceptron which transforms a binary representation of a digit (0..9) into a decimal number. Implement the learning algorithm and train the network.
3. Which task can be learned faster?(Unary representation or classification)
15 Supervised Learning
Exercises
5. Develop a perceptron for the recognition of the digits 0..9 (pixel representation). Input layer: 3x7 input neurons. Use the SNNS or JavaNNS.
6. Can we recognize numbers greater than 9 as well?
7. Develop a perceptron for the recognition of capital letters. (input layer 5x7)
16 Supervised Learning
Multi-Layer Perceptron
– several trainable layers
– a two-layer perceptron can classify convex polygons
– a three-layer perceptron can classify arbitrary sets
– this cancels the limits of the single-layer perceptron
multi-layer perceptron = feed-forward network = backpropagation network
17 Supervised Learning
Multi-layer feed-forward network
18 Supervised Learning
Feed-Forward Network
19 Supervised Learning
Evaluation of the net output in a feed forward network
[Figure: a training pattern p is applied to the input layer (o_i = p_i); each hidden neuron N_j computes net_j and o_j = act_j; each output neuron N_k computes net_k and o_k = act_k]
input layer – hidden layer(s) – output layer
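In code, the layer-by-layer evaluation can be sketched as follows (Java, logistic activation assumed; all names are illustrative):

// One layer: o_j = f_act(net_j) with net_j = Σ_i o_i*w_ij (+ bias).
static double[] layerOutput(double[] o, double[][] w, double[] bias) {
    double[] out = new double[bias.length];
    for (int j = 0; j < out.length; j++) {
        double net = bias[j];
        for (int i = 0; i < o.length; i++) net += o[i] * w[i][j];
        out[j] = 1.0 / (1.0 + Math.exp(-net));   // logistic activation
    }
    return out;
}

// Whole network: pattern p → hidden layer (o_j = act_j) → output layer (o_k = act_k).
static double[] feedForward(double[] p, double[][] wIH, double[] bH,
                            double[][] wHO, double[] bO) {
    double[] hidden = layerOutput(p, wIH, bH);
    return layerOutput(hidden, wHO, bO);
}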
20 Supervised Learning
Backpropagation-Learning Algorithm
– supervised learning
– the error is a function of the weights w_i: E(W) = E(w1, w2, ..., wn)
– we are looking for a minimal error
– minimal error = a hollow in the error surface
– backpropagation uses the gradient for weight adaptation
21 Supervised Learning
error curve
[Figure: error surface over two weights, weight1 and weight2]
22 Supervised Learning
Problem
– error in the output layer:
difference between output and teaching output
– error in a hidden layer?
[Figure: network with input layer, hidden layer, and output layer; the teaching output is known only at the output layer]
23 Supervised Learning
Gradient descent
– Gradient:
– vector orthogonal to a surface, pointing in the direction of the steepest slope
– the derivative of a function in a certain direction is the projection of the gradient onto this direction
[Figure: example of an error curve of a single weight w_i]
24 Supervised Learning
Example: Newton-Approximation
– calculation of a root: f(x) = x² − a, here a = 5
– tangent slope: tan α = f′(x) = 2x, and tan α = f(x) / (x − x′), hence x′ = ½(x + a/x)
– x = 2
– x′ = ½(x + 5/x) = 2.25
– x″ = ½(x′ + 5/x′) ≈ 2.2361
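The iteration is easy to check in code (Java sketch):

// Newton's iteration for the root of f(x) = x² − a:  x' = ½(x + a/x).
static double sqrtNewton(double a, double x0) {
    double x = x0;
    for (int i = 0; i < 20 && Math.abs(x * x - a) > 1e-12; i++)
        x = 0.5 * (x + a / x);
    return x;
}
// sqrtNewton(5, 2) produces 2 → 2.25 → 2.2361 → ... → √5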
25 Supervised Learning
Backpropagation - Learning
– gradient-descent algorithm
– supervised learning:
an error signal is used for weight adaptation
– error signal δ:
– teaching output − calculated output, if output neuron
– weighted sum of the error signals of the successors, if hidden neuron
– weight adaptation: w′_ij = w_ij + η · δ_j · o_i
(η: learning rate, δ: error signal)
26 Supervised Learning
Standard Backpropagation Rule
– gradient descent requires the derivative of the activation function
– logistic function:
f_Logistic(x) = 1 / (1 + e^(−x))
f′_act(net_j) = f_act(net_j) · (1 − f_act(net_j)) = o_j · (1 − o_j)
– the error signal δ_j is therefore:
δ_j = o_j · (1 − o_j) · (t_j − o_j), if j is an output neuron
δ_j = o_j · (1 − o_j) · Σ_k δ_k · w_jk, if j is a hidden neuron
– weight adaptation: w′_ij = w_ij + η · δ_j · o_i
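Put together, one training step could look like this Java sketch (built on the layerOutput helper from the feed-forward slide; all names are illustrative):

// One backpropagation step for a network with one hidden layer, using the
// error signals above:
//   output neuron:  δ_k = o_k(1−o_k)(t_k − o_k)
//   hidden neuron:  δ_j = o_j(1−o_j) Σ_k δ_k w_jk
static void backpropStep(double[] p, double[] t, double eta,
                         double[][] wIH, double[] bH, double[][] wHO, double[] bO) {
    double[] h = layerOutput(p, wIH, bH);
    double[] o = layerOutput(h, wHO, bO);

    // error signals of the output layer
    double[] deltaO = new double[o.length];
    for (int k = 0; k < o.length; k++)
        deltaO[k] = o[k] * (1 - o[k]) * (t[k] - o[k]);

    // error signals of the hidden layer: weighted sum of successor signals
    double[] deltaH = new double[h.length];
    for (int j = 0; j < h.length; j++) {
        double sum = 0;
        for (int k = 0; k < o.length; k++) sum += deltaO[k] * wHO[j][k];
        deltaH[j] = h[j] * (1 - h[j]) * sum;
    }

    // weight adaptation w'_ij = w_ij + η·δ_j·o_i (biases as weights of a constant 1)
    for (int j = 0; j < h.length; j++)
        for (int k = 0; k < o.length; k++) wHO[j][k] += eta * deltaO[k] * h[j];
    for (int k = 0; k < o.length; k++) bO[k] += eta * deltaO[k];
    for (int i = 0; i < p.length; i++)
        for (int j = 0; j < h.length; j++) wIH[i][j] += eta * deltaH[j] * p[i];
    for (int j = 0; j < h.length; j++) bH[j] += eta * deltaH[j];
}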
27 Supervised Learning
Backpropagation
– Examples:– XOR (Excel)– Bank Customer
28 Supervised Learning
Backpropagation - Problems
[Figure: error curve with three problem regions A, B, C (explained on the next slide)]
29 Supervised Learning
Backpropagation-Problems
– A: flat plateau
– weight adaptation is slow
– finding a minimum takes a lot of time
– B: oscillation in a narrow gorge
– the step jumps from one side to the other and back
– C: leaving a minimum
– if the modification in one training step is too high, the minimum can be lost
30 Supervised Learning
Solutions: looking at the values
– change the parameters of the logistic function in order to get other values
– the modification of weights depends on the output: if oi = 0, no modification takes place
– if we use binary input, we probably have a lot of zero values: change [0,1] into [−½, ½] or [−1,1]
– use another activation function, e.g. tanh, and use values in [−1,1]
31 Supervised Learning
Solution: Quickprop
– assumption: the error curve is a square function (parabola)
– calculate the vertex of the curve
– slope of the error curve: S(t) = ∂E/∂w_ij(t)
– weight update:
Δw_ij(t) = S(t) / (S(t−1) − S(t)) · Δw_ij(t−1)
[Figure: parabola fitted to the error curve]
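A sketch of the resulting update for a single weight in Java (s is the current slope S(t), sPrev the previous slope S(t−1); purely illustrative):

// Quickprop: jump towards the vertex of the parabola fitted
// through the last two slopes.
static double quickpropDelta(double s, double sPrev, double deltaPrev) {
    double denom = sPrev - s;
    if (denom == 0.0) return 0.0;        // guard: the parabola degenerates
    return (s / denom) * deltaPrev;      // Δw(t) = S(t)/(S(t−1)−S(t)) · Δw(t−1)
}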
32 Supervised Learning
Resilient Propagation (RPROP)
– sign and size of the weight modification are calculated separately; b_ij(t) is the size of the modification:
b_ij(t) = η+ · b_ij(t−1), if S(t−1)·S(t) > 0
b_ij(t) = η− · b_ij(t−1), if S(t−1)·S(t) < 0
b_ij(t) = b_ij(t−1), otherwise
η+ > 1: both slopes have the same sign → "big" step
0 < η− < 1: the slopes differ → "smaller" step
Δw_ij(t) = −b_ij(t), if S(t−1) > 0 and S(t) > 0
Δw_ij(t) = +b_ij(t), if S(t−1) < 0 and S(t) < 0
Δw_ij(t) = −Δw_ij(t−1), if S(t−1)·S(t) < 0 (*)
Δw_ij(t) = −sgn(S(t)) · b_ij(t), otherwise
(*) S(t) is then set to 0 (S(t) := 0); at time t+1 the 4th case will be applied.
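In Java, the case distinctions might be sketched like this for a single weight (the state object keeps b, Δw, and S of the previous step; etaPlus/etaMinus are η+ and η−; illustrative only):

// RPROP: only the sign of the slope S matters; the step size b is adapted separately.
static double rpropDelta(double s, WeightState st, double etaPlus, double etaMinus) {
    if (st.sPrev * s > 0)      st.b *= etaPlus;    // same sign: "big" step
    else if (st.sPrev * s < 0) st.b *= etaMinus;   // sign changed: "smaller" step

    double dw;
    if (st.sPrev * s < 0) {   // the minimum was overshot:
        dw = -st.dwPrev;      // take the last step back
        s = 0;                // S(t) := 0, so the 4th case applies at t+1
    } else {
        dw = -Math.signum(s) * st.b;   // covers the 1st, 2nd, and 4th case
    }
    st.sPrev = s;
    st.dwPrev = dw;
    return dw;
}

static class WeightState { double b = 0.1, dwPrev = 0, sPrev = 0; }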
33 Supervised Learning
Limits of the Learning Algorithm
– it is not a model of biological learning
– there is no teaching output in natural learning
– there are no such error feedbacks in a natural neural network (at least none has been discovered yet)
– training an ANN is rather time-consuming
34 Supervised Learning
Exercise - JavaNNS
– Implement a feed-forward network consisting of 2 input neurons, 2 hidden neurons, and one output neuron. Train the network so that it simulates the XOR function.
– Implement a 4-2-4 network which works like the identity function (encoder-decoder network). Try other versions: 4-3-4, 8-4-8, ... What can you say about the training effort?
35 Supervised Learning
Pattern Recognition
[Figure: input layer → 1st hidden layer → 2nd hidden layer → output layer]
36 Supervised Learning
Example: Pattern Recognition
JavaNNS example: Font
37 Supervised Learning
"font" Example
– input = 24x24 pixel array
– output layer: 75 neurons, one neuron for each character:
– digits
– letters (lower case, capital)
– separators and operator characters
– two hidden layers of 4x6 neurons each
– all neurons of a row of the input layer are linked to one neuron of the first hidden layer
– all neurons of a column of the input layer are linked to one neuron of the second hidden layer
38 Supervised Learning
Exercise
– load the network "font_untrained"
– train the network using various learning algorithms (look at the SNNS documentation for the parameters and their meaning):
– Backpropagation: η = 2.0
– Backpropagation with momentum: η = 0.8, μ = 0.6, c = 0.1
– Quickprop: η = 0.1, μ (max. growth) = 2.0, ν = 0.0001
– Rprop: Δ = 0.6
– use various values for the learning parameter, momentum, and noise:
– learning parameter: 0.2, 0.3, 0.5, 1.0
– momentum: 0.9, 0.7, 0.5, 0.0
– noise: 0.0, 0.1, 0.2
39 Supervised Learning
Attributes – A1: credit history, A2: debt, A3: collateral, A4: income
Example: Bank Customer
• the network architecture depends on the coding of input and output
• how can we code values like good, bad, 1, 2, 3, ...?
40 Supervised Learning
Data Pre-processing
– objectives:
– prospects of better results
– adaptation to algorithms
– data reduction
– trouble shooting
– methods:
– selection and integration
– completion
– transformation:
– normalization
– coding
– filter
41 Supervised Learning
Selection and Integration
– unification of data (different origins)
– selection of attributes/features
– reduction:
– omit obviously non-relevant data:
– all values are equal
– key values
– meaning not relevant
– data protection
42 Supervised Learning
Completion / Cleaning
– missing values:
– ignore / omit the attribute
– add values:
– manually
– a global constant ("missing value")
– the average
– the most probable value
– remove the data set
– noised data
– inconsistent data
43 Supervised Learning
Transformation
– Normalization– Coding– Filter
44 Supervised Learning
Normalization of values
– normalization: equally distributed
– in the range [0,1], e.g. for the logistic function:
act = (x − minValue) / (maxValue − minValue)
– in the range [−1,+1], e.g. for the activation function tanh:
act = (x − minValue) / (maxValue − minValue) · 2 − 1
– logarithmic normalization:
act = (ln(x) − ln(minValue)) / (ln(maxValue) − ln(minValue))
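As Java helpers (a sketch; minValue/maxValue are assumed to be taken from the training data):

// Normalization to [0,1], e.g. as input for the logistic function.
static double norm01(double x, double min, double max) {
    return (x - min) / (max - min);
}

// Normalization to [-1,+1], e.g. as input for tanh.
static double normPm1(double x, double min, double max) {
    return (x - min) / (max - min) * 2 - 1;
}

// Logarithmic normalization (requires x, min, max > 0).
static double normLog(double x, double min, double max) {
    return (Math.log(x) - Math.log(min)) / (Math.log(max) - Math.log(min));
}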
45 Supervised Learning
Binary Coding of nominal values I
– no order relation, n values
– n neurons,
– each neuron represents one and only one value
– example: red, blue, yellow, white, black
1,0,0,0,0  0,1,0,0,0  0,0,1,0,0  ...
– disadvantage:
n neurons necessary, lots of zeros in the input
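A sketch in Java:

// Binary coding of a nominal value: n values → n neurons, exactly one of which is 1.
static double[] oneHot(int valueIndex, int n) {
    double[] code = new double[n];   // lots of zeros ...
    code[valueIndex] = 1.0;          // ... and a single 1
    return code;
}
// example red, blue, yellow, white, black:
// oneHot(0, 5) → {1,0,0,0,0},  oneHot(1, 5) → {0,1,0,0,0},  ...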
46 Supervised Learning
Bank Customer
Are these customers good ones?

customer | credit history | debt | collateral | income
1        | bad            | high | adequate   | 3
2        | good           | low  | adequate   | 2
47 Supervised Learning
The Problem: A Mailing Action
– mailing action of a company: a special offer
– estimated annual income per customer:

customer:       will cancel | will not cancel
gets an offer:  43.80 €     | 66.30 €
gets no offer:  0.00 €      | 72.00 €

– given:
– 10,000 sets of customer data, containing 1,000 cancellers (training)
– problem:
– a test set containing 10,000 customer data sets
– who will cancel? whom to send an offer?
Data Mining Cup 2002
48 Supervised Learning
Mailing Action – Aim?
– no mailing action:
9,000 × 72.00 = 648,000
– everybody gets an offer:
1,000 × 43.80 + 9,000 × 66.30 = 640,500
– maximum (100 % correct classification):
1,000 × 43.80 + 9,000 × 72.00 = 691,800
customer:       will cancel | will not cancel
gets an offer:  43.80 €     | 66.30 €
gets no offer:  0.00 €      | 72.00 €
49 Supervised Learning
Goal Function: Lift
basis: no mailing action: 9,000 · 72.00
goal = extra income:
lift_M = 43.80 · c_M + 66.30 · nk_M − 72.00 · nk_M
(c_M: cancellers in the mailing group M; nk_M: non-cancellers in M)
customer:       will cancel | will not cancel
gets an offer:  43.80 €     | 66.30 €
gets no offer:  0.00 €      | 72.00 €
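The goal function can be computed directly (Java sketch; cM = cancellers in the mailing group, nkM = non-cancellers in the mailing group):

// Lift of a mailing action relative to "no mailing at all":
// cancellers earn 43.80 instead of 0.00, non-cancellers 66.30 instead of 72.00.
static double lift(int cM, int nkM) {
    return 43.80 * cM + 66.30 * nkM - 72.00 * nkM;   // = 43.80·cM − 5.70·nkM
}
// mailing everybody: lift(1000, 9000) = 43,800 − 51,300 = −7,500,
// i.e. 648,000 − 7,500 = 640,500, matching the previous slide.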
50 Supervised Learning
[Figure: the data set: 32 input attributes and a result column; some attributes are marked as important, some contain missing values]
51 Supervised Learning
Feed Forward Network – What to do?
– train the net with the training set (10,000 customers)
– test the net using the test set (another 10,000)
– classify all 10,000 test customers as cancellers or loyal customers
– evaluate the additional income
52 Supervised Learning
Results
data mining cup 2002
neural network project 2004
– gain:
– additional income generated by the mailing action, if the target group is chosen according to the analysis
53 Supervised Learning
Review Students Project
– a copy of the data mining cup:
– real data
– known results
– contest
→ motivation, enthusiasm, better results
• wishes:
– an engineering approach to data mining
– real data for teaching purposes
54 Supervised Learning
Data Mining Cup 2007
– started on April 10
– check-out couponing: who will get a rebate coupon?
– 50,000 data sets for training
55 Supervised Learning
Data
56 Supervised Learning
DMC2007
– ~75 % of the outputs are N(o)
– hence a useful classification rate has to be > 75 %!
– first experiments: no success
– deadline: May 31st
57 Supervised Learning
Optimization of Neural Networks
objectives
– good results in an application: better generalisation (improve correctness)
– faster processing of patterns(improve efficiency)
– good presentation of the results(improve comprehension)
58 Supervised Learning
Ability to generalize
– network too large:
– all training patterns are learned by heart (memorized)
– no ability to generalize
– network too small:
– the rules of pattern recognition cannot be learned
(simple example: perceptron and XOR)
• a trained net can classify data (from the same class as the learning data) that it has never seen before
– this is the aim of every ANN development
59 Supervised Learning
Development of an NN-application
[Flow chart:
build a network architecture
→ input of a training pattern → calculate the network output → compare it to the teaching output
→ if the error is too high: modify the weights and continue training
→ if the quality is good enough: use the test set data → evaluate the output, compare it to the teaching output
→ if the error is too high: change parameters and train again
→ if the quality is good enough: the network is ready]
60 Supervised Learning
Possible Changes
– architecture of the NN:
– size of the network
– shortcut connections
– partially connected layers
– remove/add links
– receptive areas
– find the right parameter values:
– learning parameter
– size of the layers
– using genetic algorithms
61 Supervised Learning
Memory Capacity
– figure out the memory capacity
– change the output layer: output layer := copy of the input layer
– train the network with an increasing number of random patterns:
– the error becomes small: the network stores all patterns
– the error remains: the network cannot store all patterns
– in between: the memory capacity
memory capacity = the number of patterns a network can store without generalisation
62 Supervised Learning
Memory Capacity - Experiment
– the output layer is a copy of the input layer
– training set consisting of n random patterns
– error:
– error = 0:
the network can store more than n patterns
– error >> 0:
the network cannot store n patterns
– memory capacity = n, if error > 0 for n patterns, error = 0 for n−1 patterns, and error >> 0 for n+1 patterns
63 Supervised Learning
Layers Not fully Connected
– partially connected (e.g. 75 %)
– remove links if the weight has been near 0 for several training steps
– build new connections (by chance)
[Figure: connections between two layers: remaining, removed, and newly created links]
64 Supervised Learning
Summary
– feed-forward networks
– the perceptron (has its limits)
– learning is mathematics
– backpropagation is a "backpropagation of error" algorithm:
– works like gradient descent
– activation functions: logistic, tanh
– applications in data mining and pattern recognition:
– data preparation is important
– finding an appropriate architecture