Neural Nets: An Introduction for Mathcad Users
by Eric Edelstein, MathSoft, Inc.
In this article, we will consider modeling a feed-forward network (a special type of weighted directed graph) after the way a brain operates, and begin looking at algorithms that teach the network how to learn.
One of the things that makes humans efficient is our ability to change. This manifests itself in many ways. The first is that our brains need not be completely redesigned just to change our lunch order when we find they're out of octopus sukiyaki. This is non-trivial. Consider the advantages our flexibility has over, for example, a microchip. Its electronic circuits may be able to perform many different operations, but the number is finite and the abilities don't change over time. If you have an OR gate, it must be taken apart and rebuilt if you want an AND gate. If you want it to add numbers, you have to assemble quite a few components.
A human, however, can add to his or her stockpile of abilities without adding brain cells. How does this happen? No one knows exactly, but certain ideas in the theory of learning are getting clearer, and some can be modeled on a computer. We are now in an age where computers can be taught to learn new tricks. That is, a program representing a neural net can be made to learn infinitely many different routines (one at a time). That makes it extremely flexible and, hence, powerful. Neural nets have been created that mimic and anticipate human behavior, run machinery in automated factories, read books aloud, make complex financial decisions, and perform a host of other impressive tasks.
One of the most common tasks required of a neural net is recognizing patterns and reacting to them in some manner; this article demonstrates such a task. The reason for this particular emphasis is that once a neural net can find a pattern, it can start predicting. The art of prediction is an old one, and there are many and varied statistical techniques for making approximate predictions. However, neural nets have been shown to be more accurate on some occasions. Also, unlike a standard statistical program, which allows for one set of analyses, the same neural net can learn to do different analyses on different kinds of data; it just needs to be retrained. The most fundamental difference, however, is one of action: a neural net will not only predict, but will also act in accordance with its prediction, as we shall see later on.
Now, consider the cellular makeup of a brain: neurons. There are billions of neurons, interconnected along axons. The center of a neuron receives stimuli and decides, somehow, whether or not to send a signal to neighboring neurons. If it decides to send out a signal, an electrical burst, it does so through the axons. This is how the brain makes its own predictions and actions. Given this description, the brain can be thought of as a graph, with each neuron's main body represented by a node or vertex and the axons represented by edges.
These graphs (also called networks) contain points, called nodes or vertices; the line segments connecting these nodes are called edges. The endpoints of an edge are its vertices. An orientation of an edge is a choice of starting and ending vertex for the edge. Usually, we draw an arrow on an oriented edge pointing from the initial to the final node. If each edge of the graph has an orientation, the graph is called a directed graph (or digraph, for short).
A graph with this association, and the implications it carries, is called a neural network (or neural net, for short). We shall restrict ourselves to neural nets of a certain form: we assume our neural nets are layered and feed-forward. These are weighted, directed graphs whose nodes can be broken up into discrete vertical layers (that is, the nodes lie on vertical slices through the graph). The orientation of the edges is the same throughout the graph, either left to right or right to left; in this article we will use the convention of left to right. Such a digraph looks like:
With the edge orientation of left to right, the leftmost layer is called the input layer, the rightmost the output layer, and all those in between the hidden layers. Nodes are often called units, making the leftmost ones input units, the rightmost output units, and those in between hidden units.
As mentioned earlier, we are concerned not only with the choice of edges and nodes, but also with their weighting. To determine what the weighting should be, let's return to the brain. If a neuron receives a very small stimulus, it does not fire. Once it receives a significant enough stimulus, however, it fires a complete burst. It follows the all-or-none principle. The cut-off value for stimuli is called a threshold: it is the amount of stimulus for a particular neuron below which no reaction signal will be sent.
In modeling graphs after brains, we associate to each node, $\nu$, a threshold value, $\tau_\nu$, rather like the switching threshold of a transistor in a logic gate.
The axon connections may be very strong or weak. That is, the signal sent from one neuron to another via a particular axon may be completely passed on, or it may be impeded. This can be thought of as the strength of the connection. The degree of connection between two neurons reflects how interdependent they are, and this strength is used to define the weights on the edges of the neural net. If the weight on an edge between two nodes is close to zero, then we can think of these two units as having little effect on each other. If, on the other hand, the weight is large in absolute value, then the units' effect on each other is strong. The weight on the edge from vertex $i$ to vertex $\nu$ is denoted $w_{i\nu}$.
At this point we've completed the fundamental association between a simplified brain and a layered feed-forward neural net. Let's summarize it in a table:
It remains only to show how signals are passed along. Let's say that we're in the middle of a neural net at a vertex, $\nu$. It would look something like this:
Here $x_1$ through $x_4$ represent the strengths of the impulses that have been sent to this node, $\nu$. The effect of $x_1$ on $\nu$ will be determined by the strength of the connection, $w_{1\nu}$, so by defining our weights correctly, the effect of $x_1$ on $\nu$ will be the product $w_{1\nu} x_1$. Taking the other incoming impulses into consideration, $\nu$ sees the following impulse:
$$\sum_{i=1}^{4} x_i w_{i\nu}$$
The reaction at $\nu$ to this impulse must be determined. First we must see if the incoming signal passes the threshold test. To do this, subtract the threshold from the impulse and determine whether the result is positive or negative. Then a response function of some kind, called the activation function, acts on the impulse, provided it is above the threshold level. We perform these two steps as one by assuming some structure on the function: the activation function treats positive and negative numbers differently. That is, its values for positive inputs correspond to the neuron firing, and its values for negative inputs correspond to non-firing. With this, we find that the response at $\nu$ to the stimulus is:
$$f\left(\sum_{i=1}^{4} x_i w_{i\nu} - \tau_\nu\right)$$
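The response formula can be sketched in Python as follows. This is a minimal sketch, not the article's Mathcad worksheet: the names `step` and `node_response` are hypothetical, the step activation f(x) = (x > 0) is an assumption (the node fires only for strictly positive input), and the four impulse and weight values are made up for illustration.

```python
from typing import Callable, Sequence


def step(x: float) -> float:
    """All-or-none activation (assumed): 1.0 means fire, 0.0 means do not fire."""
    return 1.0 if x > 0 else 0.0


def node_response(
    inputs: Sequence[float],
    weights: Sequence[float],
    threshold: float,
    activation: Callable[[float], float] = step,
) -> float:
    """Response at node nu: f(sum_i x_i * w_{i nu} - tau_nu)."""
    if len(inputs) != len(weights):
        raise ValueError("need exactly one weight per incoming edge")
    impulse = sum(x * w for x, w in zip(inputs, weights))  # total incoming impulse
    return activation(impulse - threshold)                # threshold test + activation


# Hypothetical impulses x1..x4 and weights w_{1 nu}..w_{4 nu}; the impulse is 2.0.
x = [1.0, 0.0, 2.0, 1.0]
w = [0.5, 1.0, 1.0, -0.5]
print(node_response(x, w, threshold=3.0))  # 2.0 - 3.0 < 0, so 0.0
print(node_response(x, w, threshold=1.0))  # 2.0 - 1.0 > 0, so 1.0
```

Passing the activation as a parameter keeps the threshold test and the summation fixed while letting you swap in a different response function later.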
A typical activation function might be the step function

$$f(x) = (x > 0),$$

which, in Mathcad's Boolean notation, equals 1 when $x > 0$ and 0 otherwise.

[Plot: f(x) for x from -5 to 5]
This is an example of an all-or-none response. Note what it looks like when applied in a neural net with a threshold value $\tau_\nu = 3$:
$$f(x - \tau_\nu) = (x - \tau_\nu > 0)$$

[Plot: f(x - τ_ν) against x, with the step shifted to x = 3]
To get an idea of what's going on geometrically, let's consider two impulses going to a unit with the same threshold of 3. Let's say one edge has a weight of a half, and the other a quarter: $w_1 = 0.5$, $w_2 = 0.25$.
$$f(x_1, x_2) = (x_1 w_1 + x_2 w_2 - \tau_\nu > 0)$$
Evaluating $f$ on the grid $i = -5, \dots, 10$ and $j = -5, \dots, 10$ gives the matrix $M_{i+5,\, j+5} = f(i, j)$.
The z-axis describes the node's reaction output to the two stimuli $x_1$ and $x_2$, plotted in the x-y plane.
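The surface can also be reproduced numerically. This is a minimal Python sketch, assuming the same strict step (firing only when the argument is greater than 0); `fires` is a hypothetical name, and printing `#` for firing and `.` for not firing is only a stand-in for the 3-D plot. Each printed row is one value of $x_1$ (the index i), each column one value of $x_2$ (the index j).

```python
# Two impulses into one unit: weights w1 = 0.5, w2 = 0.25, threshold tau = 3.
w1, w2, tau = 0.5, 0.25, 3.0


def fires(x1: float, x2: float) -> int:
    """Step response (x1*w1 + x2*w2 - tau > 0), returned as 1 or 0."""
    return int(x1 * w1 + x2 * w2 - tau > 0)


# Mirror the Mathcad matrix: M[i+5][j+5] = f(i, j) for i, j = -5..10.
M = [[fires(i, j) for j in range(-5, 11)] for i in range(-5, 11)]

# The unit fires on one side of the line 0.5*x1 + 0.25*x2 = 3.
for i in range(-5, 11):
    print(f"{i:3d} " + "".join("#" if M[i + 5][j + 5] else "." for j in range(-5, 11)))
```

The firing region is a half-plane bounded by the line $0.5 x_1 + 0.25 x_2 = 3$, so the plotted surface is a flat step that jumps from 0 to 1 across that line.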
[Figure: surface plot of M]
[Figure: the neural net to the left of the node ν]
[Figure: the whole net, all the way from left side to right side]
Now that we know how a single node reacts to stimuli, we can determine the values of the output units for a given choice of input values. We consider a very simple neural net:
There are two input nodes, 1 and 2, three hidden units, 3, 4, and 5, and one outputnode 6.
Let's assign some weights to the edges.
$$(w_{13}, w_{14}, w_{24}, w_{25}, w_{36}, w_{46}, w_{56}) = (1, 1, 1, 1, 1, -2, 1)$$
We must decide upon an activation function. Let's choose the step function $f(x) = (x > 0)$, which is 1 for positive $x$ and 0 otherwise.
Pick threshold values: $\tau_3 = 0$, $\tau_4 = 1.5$, $\tau_5 = 0$, $\tau_6 = 0.5$. Now pick the input values: $y_1 = 1$, $y_2 = 1$.
For the input layer we assume the thresholds are zero and the activation function is the identity, so that the signal put into node 1 is the same as the signal coming out of node 1.
$$y_3 = f(y_1 w_{13} - \tau_3), \quad y_4 = f(y_1 w_{14} + y_2 w_{24} - \tau_4), \quad y_5 = f(y_2 w_{25} - \tau_5)$$
$$y_6 = f(y_3 w_{36} + y_4 w_{46} + y_5 w_{56} - \tau_6)$$
The value of the output unit for this input pattern is $y_6 = 0$.
Do you recognize this binary function? (Hint: It's one of the standard logical operations.)
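Feeding all four binary input patterns through the net is one way to check your answer. This is a minimal Python sketch, assuming f(x) = (x > 0) for the hidden and output units, inputs of 0 and 1, and $w_{46} = -2$: the minus sign seems to have been lost in the extracted listing, and -2 is a value that reproduces the stated $y_6 = 0$ (with +2, $y_6$ would be 1). The name `forward` is hypothetical.

```python
def f(x: float) -> int:
    """Step activation (assumed): fire (1) only for strictly positive input."""
    return 1 if x > 0 else 0


# Edge weights keyed by (from node, to node); w46 = -2 is an assumption.
w = {(1, 3): 1, (1, 4): 1, (2, 4): 1, (2, 5): 1, (3, 6): 1, (4, 6): -2, (5, 6): 1}
tau = {3: 0.0, 4: 1.5, 5: 0.0, 6: 0.5}


def forward(y1: int, y2: int) -> int:
    """Propagate the inputs left to right through the net and return y6."""
    y3 = f(y1 * w[1, 3] - tau[3])
    y4 = f(y1 * w[1, 4] + y2 * w[2, 4] - tau[4])
    y5 = f(y2 * w[2, 5] - tau[5])
    return f(y3 * w[3, 6] + y4 * w[4, 6] + y5 * w[5, 6] - tau[6])


# Print the full truth table of the net.
for y1 in (0, 1):
    for y2 in (0, 1):
        print(y1, y2, "->", forward(y1, y2))
```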
Learning in Neural Nets
Let's now consider how to change the net. Thinking of the graph as a brain, it seems clear that as learning goes on, the vertices (neurons) aren't going to go wandering about. That is, as we learn, the cellular structure of the brain can't move around very much. It has been found, however, that as we learn, the chemical structure of the brain does change in small, local ways. When we learn to do something, or not to do something else, various connections between neurons are either strengthened or weakened. This corresponds to a change in the edge weights of our network. We start with the simplest type of neural net, a two-layered, feed-forward net, and show how the weight changes take place. Since there are only two layers, and every feed-forward net has both an input and an output layer, there can be no hidden units.
We say that a layered graph is fully connected if every node in each layer is connected to every node in the next layer to the right. It generally looks like:
Note that nodes in one layer aren't connected to any other nodes in the same layer. This is always the case in layered neural nets.
There is a routine that we can carry out so that the neural net can figure out what the weights on the edges should be to realize a certain set of fixed reactions. We feed the net specific inputs with known desired outputs, compare the network's output with the desired output, and change the weights accordingly. This routine is repeated until all outputs are correct for all inputs.
Essentially, this can be thought of as a pattern recognition problem. Let's say that we have a 2-layer neural net with two input nodes and one output node. We might want to teach the net to produce the result (1 AND 2) at the output node 3, using the following logic table:

Input 1   Input 2   Output 3
0         0         0
0         1         0
1         0         0
1         1         1
The net must be trained to recognize the pattern (1,1) as 1 and the other three as 0, in the same way as you put a name to a face.
For problems of this type it is often convenient to talk about input and output patterns. We've already mentioned that the input can be thought of as a pattern; the output can be thought of as one as well. Consider a big neural net with one input node, some hidden units, and 64 output nodes arranged as an 8 by 8 square. We could train the net so that, given an input of 0, it sends 1's to the outermost units of the square and 0's to all others. We could, in addition, teach it that, given an input of 1, it should send 1's to the fourth and fifth columns of the square of output units and 0's to the others. It would look like:
The 0's have been left out for clarity. The ellipse in the middle represents the hidden units, and the square represents the output units in an 8 by 8 square. As you can see, the output now represents a pattern in the visual sense: it looks like the numeral for the input (well, sort of). Note that this is completely equivalent to learning the action of a function.
As far as the computer is concerned, the neural net is a function

$$f: \mathbb{R} \to \mathbb{R}^{64}$$

with the property that f(0) is the square-border pattern above and f(1) is the two-column pattern.
In this way we realize that pattern recognition and learning the action of a fixed function are thesame in principle.
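To make the function view concrete, the two target patterns can be written down as data. This is a minimal Python sketch: `target_pattern` and `flatten` are hypothetical names, rows and columns are indexed from 0 (so the fourth and fifth columns are indices 3 and 4), and flattening row by row is one assumed way to read the 8 by 8 square as a vector in $\mathbb{R}^{64}$.

```python
Grid = list[list[int]]


def target_pattern(x: int) -> Grid:
    """Desired 8x8 output for input x (only x = 0 and x = 1 are specified)."""
    if x not in (0, 1):
        raise ValueError("only inputs 0 and 1 are specified")
    if x == 0:
        # Input 0: 1's on the outermost ring of the square.
        return [[int(r in (0, 7) or c in (0, 7)) for c in range(8)] for r in range(8)]
    # Input 1: 1's in the fourth and fifth columns (indices 3 and 4).
    return [[int(c in (3, 4)) for c in range(8)] for _ in range(8)]


def flatten(grid: Grid) -> list[int]:
    """Read the square row by row as a vector with 64 entries."""
    return [v for row in grid for v in row]


for x in (0, 1):
    for row in target_pattern(x):
        print(" ".join("1" if v else " " for v in row))  # 0's left blank, as in the figure
    print()
```

Learning this function means finding weights so that the net's 64 outputs match `flatten(target_pattern(x))` for each input x.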
With this in mind, there is a learning algorithm that teaches the two-layered feed-forward neural net to recognize patterns. It works as follows:
2-Layer Feed-Forward Learning for Binary Input, One Binary Output, and All Thresholds Equal
Note: By "binary" in this section, we mean the set {-1, 1} (we use -1 instead of the usual 0). Assume we start with the edges having random weights assigned to them. Then, given an input pattern I (some sequence of -1's and 1's), there is an output pattern Z (a number, though in general not the correct one) and a corresponding desired output pattern O (also a number). The weights going out from the vth input unit must be changed by adding

$$\Delta w_v = \varepsilon \left[ O \, I_v \left(1 - (O = Z)\right) \right],$$

so that

$$w_v \leftarrow w_v + \varepsilon \left[ O \, I_v \left(1 - (O = Z)\right) \right],$$

where the Boolean $(O = Z)$ is 1 if the net's output matches the desired output and 0 otherwise, and $\varepsilon$ is a small increment. We find the direction of the change from the $O I_v (1 - (O = Z))$ part; the step size is given by $\varepsilon$. Note that if the net's output is the ideal desired output (i.e., it has learned to identify that pattern or function correctly), then O = Z. In this case $\Delta w = 0$ for the net, so no changes will take place. This follows the "if it ain't broke, don't fix it" principle of higher computer science. Since learning a function usually consists of correctly identifying several patterns (one pattern for each point in its domain), we would like to see this net learn several different patterns concurrently. This is one of the real advantages of the neural net model: it can learn several different things without changing its basic structure. You can have a neural net learn the AND function and then, with a change of weights, learn the OR function. No new circuitry is needed, and the underlying graph is completely identical.
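The update rule translates almost term for term into code. This is a minimal Python sketch with hypothetical names; the Mathcad Boolean $(O = Z)$ becomes `int(O == Z)`. The example values ($\varepsilon = 0.3$, starting weights (1, 0, 2), input (1, -1, 1), desired output -1, actual output 1) are the ones used in the worked example later in this article.

```python
from typing import Sequence


def weight_change(eps: float, O: int, Z: float, I: Sequence[int]) -> list[float]:
    """Delta w_v = eps * O * I_v * (1 - (O = Z)) for every input unit v."""
    wrong = 1 - int(O == Z)  # 1 if the net's output is wrong, 0 if it is right
    return [eps * O * I_v * wrong for I_v in I]


def update(w: Sequence[float], eps: float, O: int, Z: float, I: Sequence[int]) -> list[float]:
    """Return w_v + Delta w_v for every v."""
    return [w_v + d for w_v, d in zip(w, weight_change(eps, O, Z, I))]


w = [1.0, 0.0, 2.0]
# A correct output leaves the weights alone ("if it ain't broke...").
print(all(d == 0 for d in weight_change(0.3, O=-1, Z=-1, I=(1, -1, -1))))  # True
# A wrong output moves each weight by eps * O * I_v.
new_w = update(w, 0.3, O=-1, Z=1, I=(1, -1, 1))
print([round(v, 10) for v in new_w])  # [0.7, 0.3, 1.7]
```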
The more patterns we try to make the net learn, the more likely it is to incorrectly remember a previously learned pattern. Luckily, the weights won't have changed much (with $\varepsilon$ small), so we keep training and retraining. In certain cases (in particular, when some choice of weights classifies every pattern correctly), it has been proven that this method must converge to recognizing all of the patterns in a finite number of steps. This problem is very much like the tent peg problem: it's easy to nail in one peg, but while nailing the second peg, you've loosened the first, which then has to be rehammered.
One final improvement before continuing. Since we want to be able to change the thresholds as the network learns, we treat them as weights on new edges. To do this, we add a new node for each different threshold in the net. When we give the net its input patterns, we make sure the value 1 goes to the nodes providing threshold values; the weight on an edge connecting such a vertex to the next layer then works as a threshold. Let's try an example. Say we want the computer to come up with a neural net that will produce an AND function. We start with a net that has two input nodes, one output node, and no hidden units. This is only a guess: in general it is a difficult problem to know how many units are needed to solve your problem, and whether it is solvable by these methods at all. Let's assume that all thresholds remain equal to one another throughout learning. In this case it is sufficient to add only one input node (which will always get an input value of 1). The network looks like this:
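The threshold-as-weight trick is easy to check numerically. This is a minimal Python sketch, assuming the convention that the extra node's edge carries the weight $-\tau$, so that adding $1 \times (-\tau)$ is the same as subtracting $\tau$; the function names and the sample weights are hypothetical. In the example that follows, the learned weight $w_0$ plays this role directly.

```python
from itertools import product
from typing import Callable, Sequence

Activation = Callable[[float], float]


def with_threshold(x: Sequence[float], w: Sequence[float], tau: float, f: Activation) -> float:
    """Original form: f(sum_i x_i w_i - tau)."""
    return f(sum(xi * wi for xi, wi in zip(x, w)) - tau)


def with_threshold_node(x: Sequence[float], w: Sequence[float], tau: float, f: Activation) -> float:
    """Threshold as an edge: an extra input fixed at 1 whose edge weight is -tau."""
    x_ext = [1.0, *x]   # the threshold node always receives the value 1
    w_ext = [-tau, *w]  # its outgoing edge weight acts as the threshold
    return f(sum(xi * wi for xi, wi in zip(x_ext, w_ext)))


def step(v: float) -> float:
    return 1.0 if v > 0 else 0.0


# Both forms agree on every binary input pattern in {-1, 1}^2.
w, tau = [0.5, 0.25], 0.5
for x in product((-1.0, 1.0), repeat=2):
    print(x, with_threshold(x, w, tau, step), with_threshold_node(x, w, tau, step))
```

Once the threshold is just another weight, the learning rule can adjust it exactly as it adjusts every other edge, with no special case.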
With a little foresight, and a hunch based on our choice of the binary system as {-1, 1}, we choose the activation function accordingly:
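$$f(x) = \frac{x}{|x| + (x = 0)}$$

so that $f(0) = 0$, $f(x) = 1$ for $x > 0$, and $f(x) = -1$ for $x < 0$. For example, $f(5) = 1$ and $f(-5) = -1$.

[Plot: f(x) against x]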
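We start with the weights set randomly. Let's try

$$w_0 = 1, \quad w_1 = 0, \quad w_2 = 2, \quad \varepsilon = 0.3,$$

with the index $k = 0, 1, 2$ running over the three weights.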
For this network, the output $Z$ is given by

$$Z(\nu_0, \nu_1, \nu_2) = f(\nu_0 w_0 + \nu_1 w_1 + \nu_2 w_2)$$
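The activation function and the output Z are easy to try out directly. This is a minimal Python sketch that mirrors the Mathcad definitions, with the Boolean $(x = 0)$ written as `(x == 0)`; the function names follow the article, but passing the weights as a tuple `w` is an assumption of this sketch. It evaluates Z on the four input patterns used below, with the starting weights.

```python
def f(x: float) -> float:
    """f(x) = x / (|x| + (x = 0)): -1 for x < 0, 0 at x = 0, 1 for x > 0."""
    return x / (abs(x) + (x == 0))  # True counts as 1, so f(0) = 0 / 1


def Z(nu0: float, nu1: float, nu2: float, w: tuple[float, float, float]) -> float:
    """Net output; nu0 is the constant 1 fed to the threshold node."""
    w0, w1, w2 = w
    return f(nu0 * w0 + nu1 * w1 + nu2 * w2)


w = (1.0, 0.0, 2.0)  # starting weights w0, w1, w2
patterns = [(1, -1, -1), (1, 1, -1), (1, -1, 1), (1, 1, 1)]
print(f(5), f(-5), f(0))             # 1.0 -1.0 0.0
print([Z(*p, w) for p in patterns])  # [-1.0, -1.0, 1.0, 1.0]
```

Since the desired outputs for AND are -1, -1, -1, 1, only the third pattern is misclassified by the starting weights.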
We begin with the first pattern (1, -1, -1). This has an ideal output of -1:

$$I = (1, -1, -1)^T, \quad O = -1$$

The actual output is $Z_1 = Z(1, -1, -1) = -1$.

The change of weights: $\varepsilon\, O\, I_k \left(1 - (O = Z_1)\right) = (0, 0, 0)$

Change the weights: $w_k \leftarrow w_k + \varepsilon\, O\, I_k \left(1 - (O = Z_1)\right)$

New weights: $w_0 = 1$, $w_1 = 0$, $w_2 = 2$
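The second pattern is (1, 1, -1). This has an ideal output of -1:

$$I = (1, 1, -1)^T, \quad O = -1$$

The actual output is $Z_2 = Z(1, 1, -1) = -1$.

The change of weights: $\varepsilon\, O\, I_k \left(1 - (O = Z_2)\right) = (0, 0, 0)$

Change the weights: $w_k \leftarrow w_k + \varepsilon\, O\, I_k \left(1 - (O = Z_2)\right)$

New weights: $w_0 = 1$, $w_1 = 0$, $w_2 = 2$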
The third pattern is (1, -1, 1). This has an ideal output of -1:

$$I = (1, -1, 1)^T, \quad O = -1$$

The actual output is $Z_3 = Z(1, -1, 1) = 1$.

The change of weights: $\varepsilon\, O\, I_k \left(1 - (O = Z_3)\right) = (-0.3, 0.3, -0.3)$

Change the weights: $w_k \leftarrow w_k + \varepsilon\, O\, I_k \left(1 - (O = Z_3)\right)$

New weights: $w_0 = 0.7$, $w_1 = 0.3$, $w_2 = 1.7$
The fourth pattern is (1, 1, 1). This has an ideal output of +1:

$$I = (1, 1, 1)^T, \quad O = 1$$

The actual output is $Z_4 = Z(1, 1, 1) = 1$.

The change of weights: $\varepsilon\, O\, I_k \left(1 - (O = Z_4)\right) = (0, 0, 0)$

Change the weights: $w_k \leftarrow w_k + \varepsilon\, O\, I_k \left(1 - (O = Z_4)\right)$

New weights: $w_0 = 0.7$, $w_1 = 0.3$, $w_2 = 1.7$
At this point we've made a pass through each pattern exactly once. We repeat this procedure several times, until the weights stabilize. To do this, change the initial assignments of the weights to the edges (where the big red arrow is), then page down to see what the new weights should be.
Eventually, you will see that the vector of weight changes is zero for every pattern. At this point the weights stop changing, and the output will be the desired output for each pattern. Starting with w0 = 1, w1 = 0, and w2 = 2, this should take six complete passes.
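The repeated passes can be automated. This is a minimal Python sketch, not the article's worksheet: `train` and `PATTERNS` are hypothetical names, the weights are updated after each pattern (as in the pass above), and a full pass with no weight change is taken as the stopping point. Starting from w0 = 1, w1 = 0, w2 = 2 with $\varepsilon = 0.3$, it stops on the sixth pass, the first pass with no changes.

```python
from typing import Sequence

# (input pattern I, desired output O); the leading 1 feeds the threshold node.
PATTERNS = [((1, -1, -1), -1), ((1, 1, -1), -1), ((1, -1, 1), -1), ((1, 1, 1), 1)]


def f(x: float) -> float:
    """Activation f(x) = x / (|x| + (x = 0))."""
    return x / (abs(x) + (x == 0))


def train(w: Sequence[float], eps: float = 0.3, max_passes: int = 100) -> tuple[list[float], int]:
    """Sweep through PATTERNS until a full pass makes no weight change."""
    w = list(w)
    for n in range(1, max_passes + 1):
        changed = False
        for I, O in PATTERNS:
            Z = f(sum(I_v * w_v for I_v, w_v in zip(I, w)))
            if O != Z:  # the factor 1 - (O = Z) is 1 only for a wrong output
                w = [w_v + eps * O * I_v for w_v, I_v in zip(w, I)]
                changed = True
        if not changed:
            return w, n  # n counts the final, change-free pass too
    raise RuntimeError("weights did not stabilize within max_passes")


weights, passes = train([1.0, 0.0, 2.0])
print(passes, [round(v, 6) for v in weights])  # 6 [-0.5, 0.9, 1.1]
```

With these final weights the net computes AND on {-1, 1} inputs: $Z = f(-0.5 + 0.9 x_1 + 1.1 x_2)$ is +1 only when both inputs are +1.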
In Future Issues:
BIG, Multilayered neural nets, Gradient Descent Learning, and Back Propagation Learning.