Artificial Neural Networks

Lyle N. Long
Professor of Aerospace Engineering
Director, Institute for Computational Science
http://www.personal.psu.edu/lnl
[email protected]
The Pennsylvania State University, University Park, PA 16802

Seminar for the Institute for Computational Science Seminar Series, Oct. 2005
Introduction
• There is increasing interest in neuroscience, intelligent systems, artificial intelligence, robots, ...
• Neural networks are just one approach in this area.
• Can we build neural networks the size of the human brain, or larger?
• For neuroscience, can we develop systems that emulate the human brain?
• For intelligent systems, can we build enormous neural networks (for pattern recognition, nonlinear function approximation, etc.)?
• If we can develop them, how long would it take to train them? After all, it takes about 18 years to train a human...
• Could these become “conscious”?
Uses of Neural Networks
• Pattern recognition
• Function approximation
• Scientific classification
• Control
• Cognitive models
• ...

For linear applications or linear equations you don’t really need ANNs, but for nonlinear applications or equations they are quite valuable. Most things are nonlinear, and linear theory and linear algebra have been used too long.
Neural Networks
Two motivations, with different priorities:
• Engineering applications → intelligent systems (priority: algorithm efficiency and accuracy)
• Brain modeling / simulation → cognitive modeling and neuroscience (priority: biological plausibility)
Aston Martin DB9
“The 2005 DB9 contains the first onboard neural network in an engine control module. Unlike traditional computer systems that need to be programmed for each step, neural networks are programs modeled on the way human brains learn and adapt. The DB9's module keeps tabs on engine combustion performance with a sophisticated software program that compares actual engine performance to the design specifications.”
http://media.ford.com/newsroom/feature_display.cfm?release=18677
Types of ANNs
• Multi-layer perceptrons (MLP)
• Spiking
• Adaptive Resonance Theory (ART)
• Recurrent
• ...
Introduction (cont.)
• Massively parallel computers are approaching the power of the human brain.
• Can we develop neural networks which work well on these machines?
• It is well known that artificial neural networks do not scale well.
Creatures and Technology
(figure from H. Moravec, comparing the computational capability of creatures and machines, with the IBM BlueGene/L marked)
Human Brain vs. Supercomputers
Human brain:
• ~100 billion neurons (10^11); for comparison, a rat cortex has about 30 million
• ~1000 synapses per neuron (10^14 total)
• Roughly 100 terabytes (10^14 bytes) of data storage
• Capable of roughly 1000 teraflops (10^15 operations per second)

IBM BlueGene/L computer (DOE):
• 65,536 processors (~10^13 transistors)
• 33 terabytes of RAM (10^13 bytes)
• 137 teraflops (10^14 operations per second)

NASA’s Columbia computer:
• 10,240 Itanium processors (1.5 GHz)
• 10 terabytes of RAM (10^13 bytes)
• 40 teraflops (10^13 operations per second)
• Theoretically more capable than the brain of a monkey, and nearing human capability...
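The 100-terabyte figure follows from the synapse count if one assumes roughly one byte of storage per synapse (that per-synapse assumption is mine, not stated on the slide):

$10^{14}\ \text{synapses} \times 1\ \text{byte/synapse} = 10^{14}\ \text{bytes} \approx 100\ \text{TB}$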
Brains
Human Neocortex
• Outer layer of the brain, about the size and shape of a wrinkled napkin
• Exists primarily in mammals (somewhat in birds and reptiles)
• Responsible for perception, language, imagination, mathematics, art, music, planning, ...
• Six layers and a columnar structure
• Approximately 30 billion neurons
• Size ≈ 50 cm x 50 cm x 2 mm
• About 50,000 neurons / mm^2 of sheet
(from: T. Dean)
Neocortex Area
Human: 2500 cm2
Monkey: 250 cm2
Cat: 83 cm2
Rat: 6 cm2
Neurons: Dendrites and Axons
(figure: information travels via electrical signals; labeled scale about 3 microns)
Synapses
• In dendrites and axons, signals propagate electrically.
• In synapses, information propagates via chemical neurotransmitters.
• There are roughly 10^14 synapses in human brains.
(figure: a neuron, a synapse, and neurotransmitters)
Synapses
• “You are your synapses” (LeDoux).
• Your memories, emotions, etc. are stored in your synapses.
• Learning occurs via changes to the synapses.
• Some synapses are set at birth, while others are trained.
• Human memory is a product of the synapses, and the learning process is often referred to as Hebbian learning (after D. O. Hebb, a Canadian neuroscientist; The Organization of Behavior, Wiley, 1949).
Synapses (cont.)
• Synapses can be inhibitory or excitatory.
• Glutamate (excitatory) and GABA (gamma-aminobutyric acid, inhibitory) are the primary neurotransmitters in the synapse.
• The human abilities to hear, remember, fear, and desire are all related to glutamate and GABA in the synapses.
• Drugs can directly affect synapses:
  - Valium works by enhancing GABA.
  - Prozac prevents the removal of serotonin from the synaptic space.
  - LSD acts on serotonin receptors.
  - Cocaine and amphetamines affect norepinephrine and dopamine levels in synapses.
Synapses (cont.)
• Some artificial neural networks (ANNs), e.g. spiking NNs, are more biologically plausible than others.
• Some ANNs (e.g. those that use backpropagation) are better suited to engineering applications such as nonlinear function approximation, pattern recognition, and nonlinear control.
• The complex chemistry that occurs in human synapses is very crudely approximated in ANNs (e.g. as weights).
Consciousness
• What is consciousness? Can a computer become conscious?
• See the book by LeDoux (Synaptic Self):
  - “Consciousness can be thought of as the product of underlying cognitive processes.”
  - “… we are never aware of processing, but only of the consequences of processing.”
  - “You are your synapses.”
• See the book by Hawkins and Blakeslee (On Intelligence): your neocortex is reading this text!
Consciousness (cont.)
See the book by Dennett (Consciousness Explained):
“Human consciousness is itself a huge complex of memes (or more exactly, meme-effects in brains) that can best be understood as the operation of a “Von Neumannesque” virtual machine implemented in the parallel architecture of a brain that was not designed for any such activities. The powers of this virtual machine vastly enhance the underlying powers of the organic hardware on which it runs...”
Wikipedia.org: “In casual use, the term meme often refers to any piece of information passed from one mind to another.”
Moore’s Law
• For about 50 years we have seen computer performance double every 18 months.
• Intel expects this to continue until at least 2011.
• The number of transistors per chip area has doubled every 18 months:
  - 1979: Intel 8088, 29,000 transistors
  - 2000: Intel Pentium 4, 42,000,000 transistors
  - 2011: Intel expects 20 billion transistors per chip (maybe 128 processors per chip)
Cycle Times
• Typical cycle times in the brain are on the order of 20 milliseconds (2E-2 s).
• The IBM BlueGene/L computer uses 700 MHz chips, which corresponds to a 1.4-nanosecond cycle time (1.4E-9 s).
• Cognitive neuroscientists trying to emulate the human brain often limit their cycle times to 20 ms.
• Engineers trying to build intelligent systems would be very happy to have performance and cycle times better than human brains!
IBM BlueGene/L (2005)
• 65,536 processors
• 33 terabytes of RAM (10^13 bytes)
• 137 teraflops (10^14 operations per second)
http://www.research.ibm.com/journal/rd49-23.html
Future Computers
• Intel expects the first petaflop (10^15 ops/second) machine to appear by about 2009; this is human-brain-level computing power (it might cost $150M and require 8 MW of power).
• By 2011 there could be several petaflop computers.
• By 2011 there could be supercomputers several times more powerful than human brains.

S. Wheat, AIAA Paper No. 2005-7148, InfoTech@Aerospace Conference, Sept. 2005
Very Rough Number of Inputs to the Brain

Item                       Number
Hair cells in cochlea      O(10^4)
Skin nerve endings         O(10^6)
Retinal rods               O(10^8)
Retinal cones              O(10^6)
Olfactory receptor cells   O(10^6)
Taste buds                 O(10^4)
Fibers in optic nerve      O(10^6)
Artificial Neural Networks: Artificial vs. Actual
A crude model:
• Inputs ~ dendrites
• Outputs ~ axons
• Weights ~ synapses
Forward Propagation: A Single Artificial Neuron

(figure: inputs $y_1 \ldots y_4$ with weights $w_1 \ldots w_4$ feeding a single neuron, which computes $x_j$ and outputs $y = f(x_j)$)

$x_j = \sum_i y_i w_{ij}$
$y_j = f(x_j)$
$f(x_j) = \frac{2}{1 + e^{-x_j}} - 1$   (sigmoid function)
$E = 0.5 \sum_i (y_i - d_i)^2$   (d is the desired output)

The weights ($w_{ij}$) store the “knowledge” and need to be trained.
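As a minimal sketch (not the code from the AIAA paper), the forward pass for one neuron can be written in a few lines of C++ using the sigmoid above; the function names are illustrative:

```cpp
#include <cmath>
#include <vector>

// Bipolar sigmoid from the slide: f(x) = 2/(1 + e^-x) - 1
double sigmoid(double x) {
    return 2.0 / (1.0 + std::exp(-x)) - 1.0;
}

// Forward propagation for a single neuron: weighted sum, then activation.
double forward(const std::vector<double>& y, const std::vector<double>& w) {
    double xj = 0.0;
    for (std::size_t i = 0; i < y.size(); ++i)
        xj += y[i] * w[i];   // x_j = sum_i y_i * w_ij
    return sigmoid(xj);      // y_j = f(x_j)
}
```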
Neural Network
• A 7-3-3 network: number of weights = 3 * 7 + 3 * 3 = 30.
• Imagine a huge network, e.g. 1,000,000 – 100 – 10: number of weights = 100,001,000 (~1 gigabyte).
• And usually all neurons in a layer are connected to all neurons in the next layer.
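As a sanity check on those counts, here is a small illustrative program (my own, assuming fully connected layers); at 8 bytes per double-precision weight the huge network needs ~0.8 GB, consistent with the ~1 gigabyte above:

```cpp
#include <cstdio>

// Weights in a fully connected network = sum of products of
// consecutive layer sizes.
long long numWeights(const int* layers, int n) {
    long long total = 0;
    for (int i = 0; i + 1 < n; ++i)
        total += (long long)layers[i] * layers[i + 1];
    return total;
}

int main() {
    int small[] = {7, 3, 3};            // 7*3 + 3*3 = 30
    int huge[]  = {1000000, 100, 10};   // 100,001,000
    std::printf("small: %lld weights\n", numWeights(small, 3));
    std::printf("huge:  %lld weights (~%.1f GB at 8 bytes each)\n",
                numWeights(huge, 3), numWeights(huge, 3) * 8.0 / 1e9);
    return 0;
}
```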
Backward Propagation

$w_{ij} = w'_{ij} + (1 - \alpha)\,\eta\, e_j x_i + \alpha\,(w'_{ij} - w''_{ij})$
$e_j = y_j (1 - y_j)(d_j - y_j)$

where $\alpha$ is the momentum factor (e.g. 0.5), $\eta$ is the learning rate (e.g. 0.1), and primes denote weights from previous iterations.

Given a set of inputs and corresponding outputs, we use backpropagation to adjust the weights (to learn the dataset).
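A minimal sketch of that update rule in C++ (illustrative names; the `Weight` struct keeps the previous value needed by the momentum term):

```cpp
// Error term for an output neuron: e_j = y_j*(1 - y_j)*(d_j - y_j)
double errorTerm(double yj, double dj) {
    return yj * (1.0 - yj) * (dj - yj);
}

// One weight update with momentum:
//   w_ij = w' + (1 - alpha)*eta*e_j*x_i + alpha*(w' - w'')
struct Weight {
    double current;   // w'  (this iteration)
    double previous;  // w'' (previous iteration)
};

void updateWeight(Weight& w, double ej, double xi,
                  double alpha = 0.5,  // momentum factor (slide: e.g. 0.5)
                  double eta   = 0.1)  // learning rate   (slide: e.g. 0.1)
{
    double next = w.current
                + (1.0 - alpha) * eta * ej * xi
                + alpha * (w.current - w.previous);
    w.previous = w.current;
    w.current  = next;
}
```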
Scalable Character Test Set (4 output values)

As a small example, let’s assume we want an ANN to detect which of the following four letters (A, B, C, or D) is being displayed, and let’s represent each letter using 15 pixels (3x5).
Digitizing Character Pixels (an “A”)

= 0 1 0 1 0 1 1 1 1 1 0 1 1 0 1

Note: you don’t have to use just 0 or 1; you could use values from 0.0 to 1.0 (i.e. shades of grey) or colors.
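In code, the digitized “A” is simply the 3x5 bitmap flattened row by row into a 15-element input vector (a sketch; the row-major ordering is my assumption):

```cpp
#include <array>

// The 3x5 "A" glyph, flattened row by row; 1 = dark pixel, 0 = light.
// Grey levels in [0, 1] would work just as well.
const std::array<double, 15> letterA = {
    0, 1, 0,
    1, 0, 1,
    1, 1, 1,
    1, 0, 1,
    1, 0, 1
};
```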
Small Character Test Set Example (4 output values)
• 15 inputs (3x5 pixels)
• 5 hidden neurons
• 4 outputs (A, B, C, and D)
• 75 input-to-hidden weights (i.e. 15x5) + 20 hidden-to-output weights (i.e. 5x4) = 95 weights total (380 bytes)
ANN Training Process
• Set all weights to random values.
• Using a large training set of data, show the network one example at a time (where we know what the output should be), and adjust the weights each time using forward and backward propagation.
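To make the whole process concrete, here is a minimal, self-contained one-hidden-layer MLP in C++ following the steps above. It uses the logistic sigmoid (so the error term e_j = y_j(1-y_j)(d_j-y_j) from the backpropagation slide applies) and omits the momentum term for brevity; the class and function names are illustrative, not from the AIAA paper.

```cpp
#include <cmath>
#include <cstdlib>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

double logistic(double x) { return 1.0 / (1.0 + std::exp(-x)); }
double randWeight()       { return 2.0 * std::rand() / RAND_MAX - 1.0; }

struct Net {
    Mat w1, w2;          // input->hidden and hidden->output weights
    Vec hidden, output;

    Net(int nIn, int nHid, int nOut)
        : w1(nHid, Vec(nIn)), w2(nOut, Vec(nHid)),
          hidden(nHid), output(nOut) {
        for (auto& row : w1) for (auto& w : row) w = randWeight(); // step 1
        for (auto& row : w2) for (auto& w : row) w = randWeight();
    }

    const Vec& feedForward(const Vec& in) {
        for (std::size_t j = 0; j < hidden.size(); ++j) {
            double x = 0.0;
            for (std::size_t i = 0; i < in.size(); ++i) x += in[i] * w1[j][i];
            hidden[j] = logistic(x);
        }
        for (std::size_t k = 0; k < output.size(); ++k) {
            double x = 0.0;
            for (std::size_t j = 0; j < hidden.size(); ++j) x += hidden[j] * w2[k][j];
            output[k] = logistic(x);
        }
        return output;
    }

    // One backpropagation step toward the desired outputs d.
    void backPropagate(const Vec& in, const Vec& d, double eta = 0.1) {
        Vec eOut(output.size()), eHid(hidden.size());
        for (std::size_t k = 0; k < output.size(); ++k)   // e_k = y(1-y)(d-y)
            eOut[k] = output[k] * (1.0 - output[k]) * (d[k] - output[k]);
        for (std::size_t j = 0; j < hidden.size(); ++j) { // error pushed back
            double s = 0.0;
            for (std::size_t k = 0; k < output.size(); ++k) s += eOut[k] * w2[k][j];
            eHid[j] = hidden[j] * (1.0 - hidden[j]) * s;
        }
        for (std::size_t k = 0; k < output.size(); ++k)
            for (std::size_t j = 0; j < hidden.size(); ++j)
                w2[k][j] += eta * eOut[k] * hidden[j];
        for (std::size_t j = 0; j < hidden.size(); ++j)
            for (std::size_t i = 0; i < in.size(); ++i)
                w1[j][i] += eta * eHid[j] * in[i];
    }
};
```

Training is then a loop over the examples: call feedForward on each input and backPropagate on the corresponding target, repeating until the error is acceptably small.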
Example: Training for an “A”

Input (15 pixels): 0 1 0 1 0 1 1 1 1 1 0 1 1 0 1
Desired output: 1 0 0 0

Feed forward (determine the output values for this input), then back-propagate (adjust the weights to reduce the error).
Example: Use of Network after Training

Given some inputs, suppose the four outputs are: 0.85 0.19 0.23 0.54

Since output 1 is the largest, the input is likely to be an “A”.
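In practice one simply takes the index of the largest output as the classification; a small sketch (hypothetical helper name):

```cpp
#include <algorithm>
#include <vector>

// Classify by the index of the largest output value.
// For outputs {0.85, 0.19, 0.23, 0.54}, this returns 0, i.e. "A".
int classify(const std::vector<double>& outputs) {
    return static_cast<int>(
        std::max_element(outputs.begin(), outputs.end()) - outputs.begin());
}
```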
Scalable Character Test Set (48 output values)
• Inputs = 15 pixels
• Outputs = 48 characters
• If we use 10 hidden neurons, then: weights = 48*10 + 10*15 = 630 (2520 bytes)
How Long to Train? How Large Should the Training Set Be?
• Very difficult to say ... it depends on the problem.
• How many hidden neurons to use?
  - log2(no. of inputs)?
  - The average of the no. of inputs and outputs?
• How many hidden layers?
  - Need at least one to capture nonlinear effects.
  - People seldom use more than one.
  - The human brain has roughly six layers in the neocortex.
• Need to avoid overtraining and undertraining the network.
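The two rules of thumb above are easy to evaluate; for the 15-input, 48-output character set (a quick illustrative calculation, not a recommendation from the slides):

```cpp
#include <cmath>
#include <cstdio>

int main() {
    int nIn = 15, nOut = 48;   // the 48-character test set above
    // Two rules of thumb for the number of hidden neurons:
    std::printf("log2(inputs)        : %.1f\n", std::log2((double)nIn));
    std::printf("avg(inputs, outputs): %.1f\n", 0.5 * (nIn + nOut));
    return 0;
}
// Prints roughly 3.9 and 31.5 -- the heuristics can disagree widely.
```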
Training an ANN

(figure: training error vs. iterations; error on the vertical axis from 0 to 4, iterations on the horizontal axis from 0 to 3000)
Recurrent ANN

Feedback is used in the human brain. Recurrent ANNs are very valuable for time-series applications.
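A minimal sketch of one recurrent (Elman-style) time step, where the previous hidden state is fed back in alongside the new input; this is illustrative only, not the network from the slides:

```cpp
#include <cmath>
#include <vector>

// One recurrent step: h_t = tanh(W_in * x_t + W_rec * h_{t-1})
std::vector<double> recurrentStep(const std::vector<double>& input,
                                  const std::vector<double>& prevHidden,
                                  const std::vector<std::vector<double>>& wIn,
                                  const std::vector<std::vector<double>>& wRec)
{
    std::vector<double> h(prevHidden.size());
    for (std::size_t j = 0; j < h.size(); ++j) {
        double x = 0.0;
        for (std::size_t i = 0; i < input.size(); ++i)
            x += wIn[j][i] * input[i];        // feedforward contribution
        for (std::size_t k = 0; k < prevHidden.size(); ++k)
            x += wRec[j][k] * prevHidden[k];  // feedback from time t-1
        h[j] = std::tanh(x);
    }
    return h;
}
```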
Our Parallel ANN Approach
• Inputs feed into each column.
• But in the hidden layers, not all neurons are connected to all other neurons.

Long, L.N. and Gupta, A., “Scalable Massively Parallel Artificial Neural Networks,” AIAA Paper No. 2005-7168, Sept. 2005.
http://www.personal.psu.edu//lnl/papers/aiaa20057168.pdf
Object-Oriented (C++) Artificial Neural Network (ANN)
• Serial code: C++
• Parallel code: C++ combined with MPI

Long, L.N. and Gupta, A., “Scalable Massively Parallel Artificial Neural Networks,” AIAA Paper No. 2005-7168, Sept. 2005.
http://www.personal.psu.edu//lnl/papers/aiaa20057168.pdf
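For flavor, a generic C++/MPI fragment of the kind such a code might use. Note this is a plain data-parallel sketch (each rank accumulates weight updates, which are then summed across processors), whereas the paper's actual decomposition is column-based with partial connectivity:

```cpp
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    std::vector<double> deltaW(100000, 0.0);  // local weight updates

    // ... each rank accumulates deltaW from its share of the training set ...

    // Sum the updates from all processors in place, then average.
    MPI_Allreduce(MPI_IN_PLACE, deltaW.data(), (int)deltaW.size(),
                  MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    for (double& d : deltaW) d /= size;

    MPI_Finalize();
    return 0;
}
```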
Serial ANN Training Time

Table 1. Training time required for the serial ANN

Resolution   Total Weights   Iterations   Training time (sec)
1            1890            37824        5.4
2            3240            64800        13.1
4            8640            172800       76
5            12690           253776       157
8            30240           604800       842
9            37890           757776       1313
Long, L.N. and Gupta, A., “Scalable Massively Parallel Artificial Neural Networks,” AIAA Paper No. 2005-7168, Sept. 2005.http://www.personal.psu.edu//lnl/papers/aiaa20057168.pdf
ANN Training (serial code)

(figure: percent correct vs. number of trainings, 0 to 20,000, for networks with 2016, 5184, and 27072 weights)
Long, L.N. and Gupta, A., “Scalable Massively Parallel Artificial Neural Networks,” AIAA Paper No. 2005-7168, Sept. 2005.http://www.personal.psu.edu//lnl/papers/aiaa20057168.pdf
ANN Training (parallel code)
Long, L.N. and Gupta, A., “Scalable Massively Parallel Artificial Neural Networks,” AIAA Paper No. 2005-7168, Sept. 2005.http://www.personal.psu.edu//lnl/papers/aiaa20057168.pdf
Training Time (scaling processors with weights)

(figure: training time in seconds, 200 to 800, and number of weights, 1.0E+05 to 3.0E+07, vs. number of processors, 0 to 600)
Long, L.N. and Gupta, A., “Scalable Massively Parallel Artificial Neural Networks,” AIAA Paper No. 2005-7168, Sept. 2005.http://www.personal.psu.edu//lnl/papers/aiaa20057168.pdf
Forward Propagation Time (scaling processors with weights)

(figure: forward propagation time in seconds, 0 to 5.00E-04, and number of weights, 1.0E+05 to 3.0E+07, vs. number of processors, 0 to 600)
Long, L.N. and Gupta, A., “Scalable Massively Parallel Artificial Neural Networks,” AIAA Paper No. 2005-7168, Sept. 2005.http://www.personal.psu.edu//lnl/papers/aiaa20057168.pdf
Large ANN Cases

Processors   Inputs    Neurons   Neurons per     Weights         Percent   Memory      CPU Time
                                 Hidden Layer                    Correct   Used (GB)   (sec.)
16           37,500    1584      256             9,613,376       100 %     0.08        246
64           150,000   6272      1024            153,652,480     100 %     1.20        2489
500          600,000   25,000    4000            2,400,106,384   89 %      19.0        6238

The largest case has ~10 times fewer synapses than a rat, and trained in under two hours.
Long, L.N. and Gupta, A., “Scalable Massively Parallel Artificial Neural Networks,” AIAA Paper No. 2005-7168, Sept. 2005.http://www.personal.psu.edu//lnl/papers/aiaa20057168.pdf
Conclusions
• ANNs are widely used in practical applications.
• Large ANNs will be quite interesting.
• We have developed parallel object-oriented C++ software to simulate very large neural networks.
• More tests are needed, but the performance results are very promising.
References
1. Long, L.N. and Gupta, A., “Scalable Massively Parallel Artificial Neural Networks,” AIAA Paper No. 2005-7168, Sept. 2005.
2. Mitchell, T.M., Machine Learning, McGraw-Hill, NY, 1997.
3. Rumelhart, D.E. and McClelland, J.L., Parallel Distributed Processing, MIT Press, 1986.
4. LeDoux, J., Synaptic Self, Penguin, New York, 2002.
5. Hawkins, J. and Blakeslee, S., On Intelligence, Times Books, New York, 2004.
6. Dean, T., “A Computational Model of the Cerebral Cortex,” AAAI Conference, Pittsburgh, 2005.
7. Mountcastle, V.B., “Introduction to the special issue on computation in cortical columns,” Cerebral Cortex, Vol. 13, No. 1, 2003.
8. Mumford, D., “On the computational architecture of the neocortex II: The role of cortico-cortical loops,” Biological Cybernetics, Vol. 66, 1992.
9. Dennett, D.C., Consciousness Explained, Back Bay, Boston, 1991.
10. Kurzweil, R., The Age of Spiritual Machines, Penguin, NY, 1999.
11. Moravec, H., Robot: Mere Machine to Transcendent Mind, Oxford University Press, November 1998.
12. Gerstner, W. and Kistler, W.M., Spiking Neuron Models, Cambridge Univ. Press, Cambridge, 2002.
13. http://www.nas.nasa.gov/Resources/Systems/columbia.html
14. Haykin, S., Neural Networks: A Comprehensive Foundation, 2nd Ed., Prentice-Hall, 1999.
15. Werbos, P.J., “Backpropagation: Basics and New Developments,” The Handbook of Brain Theory and Neural Networks, MIT Press, First Edition, 1995.