Artificial Neural Networks

Lyle N. Long
Professor of Aerospace Engineering
Director, Institute for Computational Science
http://www.personal.psu.edu/lnl
[email protected]
The Pennsylvania State University, University Park, PA 16802

Seminar for the Institute for Computational Science Seminar Series, Oct. 2005
Introduction
• There is increasing interest in neuroscience, intelligent systems, artificial intelligence, robots, ...
• Neural networks are just one approach in this area.
• Can we build neural networks the size of the human brain, or larger?
• For neuroscience, can we develop systems that emulate the human brain?
• For intelligent systems, can we build enormous neural networks (for pattern recognition, nonlinear function approximation, etc.)?
• If we can develop them, how long would it take to train them? After all, it takes about 18 years to train a human...
• Could these become “conscious”?
Uses of Neural Networks
• Pattern recognition
• Function approximation
• Scientific classification
• Control
• Cognitive models
• ...

For linear applications or linear equations you don’t really need ANNs, but for nonlinear applications or equations they are quite valuable. Most things are nonlinear, and linear theory and linear algebra have been used too long.
Neural Networks
Two motivations, with different priorities:
• Engineering applications → intelligent systems (priority: algorithm efficiency and accuracy)
• Brain modeling / simulation → cognitive modeling and neuroscience (priority: biological plausibility)
Aston Martin DB9
“The 2005 DB9 contains the first onboard neural network in an engine control module. Unlike traditional computer systems that need to be programmed for each step, neural networks are programs modeled on the way human brains learn and adapt. The DB9's module keeps tabs on engine combustion performance with a sophisticated software program that compares actual engine performance to the design specifications.”
http://media.ford.com/newsroom/feature_display.cfm?release=18677
Types of ANNs
• Multi-layer perceptrons (MLP)
• Spiking
• Adaptive Resonance Theory (ART)
• Recurrent
• ...
Introduction (cont.)
• Massively parallel computers are approaching the power of the human brain.
• Can we develop neural networks which work well on these machines?
• It is well known that artificial neural networks do not scale well.
Creatures and Technology
(figure from H. Moravec, comparing the computational capability of creatures and machines, with the IBM BlueGene/L marked)
Human Brain vs. Supercomputers
Human brain:
• ~100 billion neurons (10^11); for comparison, a rat cortex has about 30 million
• ~1000 synapses per neuron (10^14 total)
• Roughly 100 terabytes (10^14 bytes) of data storage
• Capable of roughly 1000 teraflops (10^15 operations per second)

IBM BlueGene/L computer (DOE):
• 65,536 processors (~10^13 transistors)
• 33 terabytes of RAM (10^13 bytes)
• 137 teraflops (10^14 operations per second)

NASA’s Columbia computer:
• 10,240 Itanium processors (1.5 GHz)
• 10 terabytes of RAM (10^13 bytes)
• 40 teraflops (10^13 operations per second)
• Theoretically more capable than the brain of a monkey, and nearing human capability...
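The 100-terabyte figure follows from the synapse count if one assumes roughly one byte of storage per synapse (that per-synapse assumption is mine, not stated on the slide):

$10^{14}\ \text{synapses} \times 1\ \text{byte/synapse} = 10^{14}\ \text{bytes} \approx 100\ \text{TB}$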
Brains
Human Neocortex
• Outer layer of the brain, about the size and shape of a wrinkled napkin
• Exists primarily in mammals (somewhat in birds and reptiles)
• Responsible for perception, language, imagination, mathematics, art, music, planning, ...
• Six layers and a columnar structure
• Approximately 30 billion neurons
• Size ≈ 50 cm x 50 cm x 2 mm
• About 50,000 neurons / mm^2 of sheet
(from: T. Dean)
Neocortex Area
Human: 2500 cm2
Monkey: 250 cm2
Cat: 83 cm2
Rat: 6 cm2
Neurons: Dendrites and Axons
(figure: information travels via electrical signals; labeled scale about 3 microns)
Synapses
• In dendrites and axons, signals propagate electrically.
• In synapses, information propagates via chemical neurotransmitters.
• There are roughly 10^14 synapses in human brains.
(figure: a neuron, a synapse, and neurotransmitters)
Synapses
• “You are your synapses” (LeDoux).
• Your memories, emotions, etc. are stored in your synapses.
• Learning occurs via changes to the synapses.
• Some synapses are set at birth, while others are trained.
• Human memory is a product of the synapses, and the learning process is often referred to as Hebbian learning (after D. O. Hebb, a Canadian neuroscientist; The Organization of Behavior, Wiley, 1949).
Synapses (cont.)
• Synapses can be inhibitory or excitatory.
• Glutamate (excitatory) and GABA (gamma-aminobutyric acid, inhibitory) are the primary neurotransmitters in the synapse.
• The human abilities to hear, remember, fear, and desire are all related to glutamate and GABA in the synapses.
• Drugs can directly affect synapses:
  - Valium works by enhancing GABA.
  - Prozac prevents the removal of serotonin from the synaptic space.
  - LSD acts on serotonin receptors.
  - Cocaine and amphetamines affect norepinephrine and dopamine levels in synapses.
Synapses (cont.)
• Some artificial neural networks (ANNs), e.g. spiking NNs, are more biologically plausible than others.
• Some ANNs (e.g. those that use backpropagation) are better suited to engineering applications such as nonlinear function approximation, pattern recognition, and nonlinear control.
• The complex chemistry that occurs in human synapses is very crudely approximated in ANNs (e.g. as weights).
Consciousness
• What is consciousness? Can a computer become conscious?
• See the book by LeDoux (Synaptic Self):
  - “Consciousness can be thought of as the product of underlying cognitive processes.”
  - “… we are never aware of processing, but only of the consequences of processing.”
  - “You are your synapses.”
• See the book by Hawkins and Blakeslee (On Intelligence): your neocortex is reading this text!
Consciousness (cont.)
See the book by Dennett (Consciousness Explained):
“Human consciousness is itself a huge complex of memes (or more exactly, meme-effects in brains) that can best be understood as the operation of a “Von Neumannesque” virtual machine implemented in the parallel architecture of a brain that was not designed for any such activities. The powers of this virtual machine vastly enhance the underlying powers of the organic hardware on which it runs...”
Wikipedia.org: “In casual use, the term meme often refers to any piece of information passed from one mind to another.”
Moore’s Law
• For about 50 years we have seen computer performance double every 18 months.
• Intel expects this to continue until at least 2011.
• The number of transistors per chip area has doubled every 18 months:
  - 1979: Intel 8088, 29,000 transistors
  - 2000: Intel Pentium 4, 42,000,000 transistors
  - 2011: Intel expects 20 billion transistors per chip (maybe 128 processors per chip)
Cycle Times
• Typical cycle times in the brain are on the order of 20 milliseconds (2E-2 s).
• The IBM BlueGene/L computer uses 700 MHz chips, which corresponds to a 1.4-nanosecond cycle time (1.4E-9 s).
• Cognitive neuroscientists trying to emulate the human brain often limit their cycle times to 20 ms.
• Engineers trying to build intelligent systems would be very happy to have performance and cycle times better than human brains!
IBM BlueGene/L (2005)
• 65,536 processors
• 33 terabytes of RAM (10^13 bytes)
• 137 teraflops (10^14 operations per second)
http://www.research.ibm.com/journal/rd49-23.html
Future Computers
• Intel expects the first petaflop (10^15 ops/second) machine to appear by about 2009; this is human-brain-level computing power (it might cost $150M and require 8 MW of power).
• By 2011 there could be several petaflop computers.
• By 2011 there could be supercomputers several times more powerful than human brains.

S. Wheat, AIAA Paper No. 2005-7148, InfoTech@Aerospace Conference, Sept. 2005
Very Rough Number of Inputs to the Brain

Item                       Number
Hair cells in cochlea      O(10^4)
Skin nerve endings         O(10^6)
Retinal rods               O(10^8)
Retinal cones              O(10^6)
Olfactory receptor cells   O(10^6)
Taste buds                 O(10^4)
Fibers in optic nerve      O(10^6)
Artificial Neural Networks: Artificial vs. Actual
A crude model:
• Inputs ~ dendrites
• Outputs ~ axons
• Weights ~ synapses
Forward Propagation: A Single Artificial Neuron

(figure: inputs $y_1 \ldots y_4$ with weights $w_1 \ldots w_4$ feeding a single neuron, which computes $x_j$ and outputs $y = f(x_j)$)

$x_j = \sum_i y_i w_{ij}$
$y_j = f(x_j)$
$f(x_j) = \frac{2}{1 + e^{-x_j}} - 1$   (sigmoid function)
$E = 0.5 \sum_i (y_i - d_i)^2$   (d is the desired output)

The weights ($w_{ij}$) store the “knowledge” and need to be trained.
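As a minimal sketch (not the code from the AIAA paper), the forward pass for one neuron can be written in a few lines of C++ using the sigmoid above; the function names are illustrative:

```cpp
#include <cmath>
#include <vector>

// Bipolar sigmoid from the slide: f(x) = 2/(1 + e^-x) - 1
double sigmoid(double x) {
    return 2.0 / (1.0 + std::exp(-x)) - 1.0;
}

// Forward propagation for a single neuron: weighted sum, then activation.
double forward(const std::vector<double>& y, const std::vector<double>& w) {
    double xj = 0.0;
    for (std::size_t i = 0; i < y.size(); ++i)
        xj += y[i] * w[i];   // x_j = sum_i y_i * w_ij
    return sigmoid(xj);      // y_j = f(x_j)
}
```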
Neural Network
• A 7-3-3 network: number of weights = 3 * 7 + 3 * 3 = 30.
• Imagine a huge network, e.g. 1,000,000 – 100 – 10: number of weights = 100,001,000 (~1 gigabyte).
• And usually all neurons in a layer are connected to all neurons in the next layer.
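As a sanity check on those counts, here is a small illustrative program (my own, assuming fully connected layers); at 8 bytes per double-precision weight the huge network needs ~0.8 GB, consistent with the ~1 gigabyte above:

```cpp
#include <cstdio>

// Weights in a fully connected network = sum of products of
// consecutive layer sizes.
long long numWeights(const int* layers, int n) {
    long long total = 0;
    for (int i = 0; i + 1 < n; ++i)
        total += (long long)layers[i] * layers[i + 1];
    return total;
}

int main() {
    int small[] = {7, 3, 3};            // 7*3 + 3*3 = 30
    int huge[]  = {1000000, 100, 10};   // 100,001,000
    std::printf("small: %lld weights\n", numWeights(small, 3));
    std::printf("huge:  %lld weights (~%.1f GB at 8 bytes each)\n",
                numWeights(huge, 3), numWeights(huge, 3) * 8.0 / 1e9);
    return 0;
}
```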
Backward Propagation

$w_{ij} = w'_{ij} + (1 - \alpha)\,\eta\, e_j x_i + \alpha\,(w'_{ij} - w''_{ij})$
$e_j = y_j (1 - y_j)(d_j - y_j)$

where $\alpha$ is the momentum factor (e.g. 0.5), $\eta$ is the learning rate (e.g. 0.1), and primes denote weights from previous iterations.

Given a set of inputs and corresponding outputs, we use backpropagation to adjust the weights (to learn the dataset).
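A minimal sketch of that update rule in C++ (illustrative names; the `Weight` struct keeps the previous value needed by the momentum term):

```cpp
// Error term for an output neuron: e_j = y_j*(1 - y_j)*(d_j - y_j)
double errorTerm(double yj, double dj) {
    return yj * (1.0 - yj) * (dj - yj);
}

// One weight update with momentum:
//   w_ij = w' + (1 - alpha)*eta*e_j*x_i + alpha*(w' - w'')
struct Weight {
    double current;   // w'  (this iteration)
    double previous;  // w'' (previous iteration)
};

void updateWeight(Weight& w, double ej, double xi,
                  double alpha = 0.5,  // momentum factor (slide: e.g. 0.5)
                  double eta   = 0.1)  // learning rate   (slide: e.g. 0.1)
{
    double next = w.current
                + (1.0 - alpha) * eta * ej * xi
                + alpha * (w.current - w.previous);
    w.previous = w.current;
    w.current  = next;
}
```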
Scalable Character Test Set (4 output values)

As a small example, let’s assume we want an ANN to detect which of the following four letters (A, B, C, or D) is being displayed, and let’s represent each letter using 15 pixels (3x5).
Digitizing Character Pixels (an “A”)

= 0 1 0 1 0 1 1 1 1 1 0 1 1 0 1

Note: you don’t have to use just 0 or 1; you could use values from 0.0 to 1.0 (i.e. shades of grey) or colors.
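In code, the digitized “A” is simply the 3x5 bitmap flattened row by row into a 15-element input vector (a sketch; the row-major ordering is my assumption):

```cpp
#include <array>

// The 3x5 "A" glyph, flattened row by row; 1 = dark pixel, 0 = light.
// Grey levels in [0, 1] would work just as well.
const std::array<double, 15> letterA = {
    0, 1, 0,
    1, 0, 1,
    1, 1, 1,
    1, 0, 1,
    1, 0, 1
};
```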
Small Character Test Set Example (4 output values)
• 15 inputs (3x5 pixels)
• 5 hidden neurons
• 4 outputs (A, B, C, and D)
• 75 input-to-hidden weights (i.e. 15x5) + 20 hidden-to-output weights (i.e. 5x4) = 95 weights total (380 bytes)
ANN Training Process
• Set all weights to random values.
• Using a large training set of data, show the network one example at a time (where we know what the output should be), and adjust the weights each time using forward and backward propagation.
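To make the whole process concrete, here is a minimal, self-contained one-hidden-layer MLP in C++ following the steps above. It uses the logistic sigmoid (so the error term e_j = y_j(1-y_j)(d_j-y_j) from the backpropagation slide applies) and omits the momentum term for brevity; the class and function names are illustrative, not from the AIAA paper.

```cpp
#include <cmath>
#include <cstdlib>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

double logistic(double x) { return 1.0 / (1.0 + std::exp(-x)); }
double randWeight()       { return 2.0 * std::rand() / RAND_MAX - 1.0; }

struct Net {
    Mat w1, w2;          // input->hidden and hidden->output weights
    Vec hidden, output;

    Net(int nIn, int nHid, int nOut)
        : w1(nHid, Vec(nIn)), w2(nOut, Vec(nHid)),
          hidden(nHid), output(nOut) {
        for (auto& row : w1) for (auto& w : row) w = randWeight(); // step 1
        for (auto& row : w2) for (auto& w : row) w = randWeight();
    }

    const Vec& feedForward(const Vec& in) {
        for (std::size_t j = 0; j < hidden.size(); ++j) {
            double x = 0.0;
            for (std::size_t i = 0; i < in.size(); ++i) x += in[i] * w1[j][i];
            hidden[j] = logistic(x);
        }
        for (std::size_t k = 0; k < output.size(); ++k) {
            double x = 0.0;
            for (std::size_t j = 0; j < hidden.size(); ++j) x += hidden[j] * w2[k][j];
            output[k] = logistic(x);
        }
        return output;
    }

    // One backpropagation step toward the desired outputs d.
    void backPropagate(const Vec& in, const Vec& d, double eta = 0.1) {
        Vec eOut(output.size()), eHid(hidden.size());
        for (std::size_t k = 0; k < output.size(); ++k)   // e_k = y(1-y)(d-y)
            eOut[k] = output[k] * (1.0 - output[k]) * (d[k] - output[k]);
        for (std::size_t j = 0; j < hidden.size(); ++j) { // error pushed back
            double s = 0.0;
            for (std::size_t k = 0; k < output.size(); ++k) s += eOut[k] * w2[k][j];
            eHid[j] = hidden[j] * (1.0 - hidden[j]) * s;
        }
        for (std::size_t k = 0; k < output.size(); ++k)
            for (std::size_t j = 0; j < hidden.size(); ++j)
                w2[k][j] += eta * eOut[k] * hidden[j];
        for (std::size_t j = 0; j < hidden.size(); ++j)
            for (std::size_t i = 0; i < in.size(); ++i)
                w1[j][i] += eta * eHid[j] * in[i];
    }
};
```

Training is then a loop over the examples: call feedForward on each input and backPropagate on the corresponding target, repeating until the error is acceptably small.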
Example: Training for an “A”

Input (15 pixels): 0 1 0 1 0 1 1 1 1 1 0 1 1 0 1
Desired output: 1 0 0 0

Feed forward (determine the output values for this input), then back-propagate (adjust the weights to reduce the error).
Example: Use of Network after Training

Given some inputs, suppose the four outputs are: 0.85 0.19 0.23 0.54

Since output 1 is the largest, the input is likely to be an “A”.
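In practice one simply takes the index of the largest output as the classification; a small sketch (hypothetical helper name):

```cpp
#include <algorithm>
#include <vector>

// Classify by the index of the largest output value.
// For outputs {0.85, 0.19, 0.23, 0.54}, this returns 0, i.e. "A".
int classify(const std::vector<double>& outputs) {
    return static_cast<int>(
        std::max_element(outputs.begin(), outputs.end()) - outputs.begin());
}
```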
Scalable Character Test Set (48 output values)
• Inputs = 15 pixels
• Outputs = 48 characters
• If we use 10 hidden neurons, then: weights = 48*10 + 10*15 = 630 (2520 bytes)
How Long to Train? How Large Should the Training Set Be?
• Very difficult to say ... it depends on the problem.
• How many hidden neurons to use?
  - log2(no. of inputs)?
  - The average of the no. of inputs and outputs?
• How many hidden layers?
  - Need at least one to capture nonlinear effects.
  - People seldom use more than one.
  - The human brain has roughly six layers in the neocortex.
• Need to avoid overtraining and undertraining the network.
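The two rules of thumb above are easy to evaluate; for the 15-input, 48-output character set (a quick illustrative calculation, not a recommendation from the slides):

```cpp
#include <cmath>
#include <cstdio>

int main() {
    int nIn = 15, nOut = 48;   // the 48-character test set above
    // Two rules of thumb for the number of hidden neurons:
    std::printf("log2(inputs)        : %.1f\n", std::log2((double)nIn));
    std::printf("avg(inputs, outputs): %.1f\n", 0.5 * (nIn + nOut));
    return 0;
}
// Prints roughly 3.9 and 31.5 -- the heuristics can disagree widely.
```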
Training an ANN

(figure: training error vs. iterations; error on the vertical axis from 0 to 4, iterations on the horizontal axis from 0 to 3000)
Recurrent ANN

Feedback is used in the human brain. Recurrent ANNs are very valuable for time-series applications.
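A minimal sketch of one recurrent (Elman-style) time step, where the previous hidden state is fed back in alongside the new input; this is illustrative only, not the network from the slides:

```cpp
#include <cmath>
#include <vector>

// One recurrent step: h_t = tanh(W_in * x_t + W_rec * h_{t-1})
std::vector<double> recurrentStep(const std::vector<double>& input,
                                  const std::vector<double>& prevHidden,
                                  const std::vector<std::vector<double>>& wIn,
                                  const std::vector<std::vector<double>>& wRec)
{
    std::vector<double> h(prevHidden.size());
    for (std::size_t j = 0; j < h.size(); ++j) {
        double x = 0.0;
        for (std::size_t i = 0; i < input.size(); ++i)
            x += wIn[j][i] * input[i];        // feedforward contribution
        for (std::size_t k = 0; k < prevHidden.size(); ++k)
            x += wRec[j][k] * prevHidden[k];  // feedback from time t-1
        h[j] = std::tanh(x);
    }
    return h;
}
```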
Our Parallel ANN Approach
• Inputs feed into each column.
• But in the hidden layers, not all neurons are connected to all other neurons.

Long, L.N. and Gupta, A., “Scalable Massively Parallel Artificial Neural Networks,” AIAA Paper No. 2005-7168, Sept. 2005.
http://www.personal.psu.edu//lnl/papers/aiaa20057168.pdf
Object-Oriented (C++) Artificial Neural Network (ANN)
• Serial code: C++
• Parallel code: C++ combined with MPI

Long, L.N. and Gupta, A., “Scalable Massively Parallel Artificial Neural Networks,” AIAA Paper No. 2005-7168, Sept. 2005.
http://www.personal.psu.edu//lnl/papers/aiaa20057168.pdf
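For flavor, a generic C++/MPI fragment of the kind such a code might use. Note this is a plain data-parallel sketch (each rank accumulates weight updates, which are then summed across processors), whereas the paper's actual decomposition is column-based with partial connectivity:

```cpp
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    std::vector<double> deltaW(100000, 0.0);  // local weight updates

    // ... each rank accumulates deltaW from its share of the training set ...

    // Sum the updates from all processors in place, then average.
    MPI_Allreduce(MPI_IN_PLACE, deltaW.data(), (int)deltaW.size(),
                  MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    for (double& d : deltaW) d /= size;

    MPI_Finalize();
    return 0;
}
```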
Serial ANN Training Time

Table 1. Training time required for the serial ANN

Resolution   Total Weights   Iterations   Training time (sec)
1            1890            37824        5.4
2            3240            64800        13.1
4            8640            172800       76
5            12690           253776       157
8            30240           604800       842
9            37890           757776       1313
Long, L.N. and Gupta, A., “Scalable Massively Parallel Artificial Neural Networks,” AIAA Paper No. 2005-7168, Sept. 2005.http://www.personal.psu.edu//lnl/papers/aiaa20057168.pdf
ANN Training (serial code)

(figure: percent correct vs. number of trainings, 0 to 20,000, for networks with 2016, 5184, and 27072 weights)
Long, L.N. and Gupta, A., “Scalable Massively Parallel Artificial Neural Networks,” AIAA Paper No. 2005-7168, Sept. 2005.http://www.personal.psu.edu//lnl/papers/aiaa20057168.pdf
ANN Training (parallel code)
Long, L.N. and Gupta, A., “Scalable Massively Parallel Artificial Neural Networks,” AIAA Paper No. 2005-7168, Sept. 2005.http://www.personal.psu.edu//lnl/papers/aiaa20057168.pdf
Training Time (scaling processors with weights)

(figure: training time in seconds, 200 to 800, and number of weights, 1.0E+05 to 3.0E+07, vs. number of processors, 0 to 600)
Long, L.N. and Gupta, A., “Scalable Massively Parallel Artificial Neural Networks,” AIAA Paper No. 2005-7168, Sept. 2005.http://www.personal.psu.edu//lnl/papers/aiaa20057168.pdf
Forward Propagation Time (scaling processors with weights)

(figure: forward propagation time in seconds, 0 to 5.00E-04, and number of weights, 1.0E+05 to 3.0E+07, vs. number of processors, 0 to 600)
Long, L.N. and Gupta, A., “Scalable Massively Parallel Artificial Neural Networks,” AIAA Paper No. 2005-7168, Sept. 2005.http://www.personal.psu.edu//lnl/papers/aiaa20057168.pdf
Large ANN Cases

Processors   Inputs    Neurons   Neurons per     Weights         Percent   Memory      CPU Time
                                 Hidden Layer                    Correct   Used (GB)   (sec.)
16           37,500    1584      256             9,613,376       100 %     0.08        246
64           150,000   6272      1024            153,652,480     100 %     1.20        2489
500          600,000   25,000    4000            2,400,106,384   89 %      19.0        6238

The largest case has ~10 times fewer synapses than a rat, and trained in under two hours.
Long, L.N. and Gupta, A., “Scalable Massively Parallel Artificial Neural Networks,” AIAA Paper No. 2005-7168, Sept. 2005.http://www.personal.psu.edu//lnl/papers/aiaa20057168.pdf
Conclusions
• ANNs are widely used in practical applications.
• Large ANNs will be quite interesting.
• We have developed parallel object-oriented C++ software to simulate very large neural networks.
• More tests are needed, but the performance results are very promising.
References
1. Long, L.N. and Gupta, A., “Scalable Massively Parallel Artificial Neural Networks,” AIAA Paper No. 2005-7168, Sept. 2005.
2. Mitchell, T.M., Machine Learning, McGraw-Hill, NY, 1997.
3. Rumelhart, D.E. and McClelland, J.L., Parallel Distributed Processing, MIT Press, 1986.
4. LeDoux, J., Synaptic Self, Penguin, New York, 2002.
5. Hawkins, J. and Blakeslee, S., On Intelligence, Times Books, New York, 2004.
6. Dean, T., “A Computational Model of the Cerebral Cortex,” AAAI Conference, Pittsburgh, 2005.
7. Mountcastle, V.B., “Introduction to the special issue on computation in cortical columns,” Cerebral Cortex, Vol. 13, No. 1, 2003.
8. Mumford, D., “On the computational architecture of the neocortex II: The role of cortico-cortical loops,” Biological Cybernetics, Vol. 66, 1992.
9. Dennett, D.C., Consciousness Explained, Back Bay, Boston, 1991.
10. Kurzweil, R., The Age of Spiritual Machines, Penguin, NY, 1999.
11. Moravec, H., Robot: Mere Machine to Transcendent Mind, Oxford University Press, November 1998.
12. Gerstner, W. and Kistler, W.M., Spiking Neuron Models, Cambridge Univ. Press, Cambridge, 2002.
13. http://www.nas.nasa.gov/Resources/Systems/columbia.html
14. Haykin, S., Neural Networks: A Comprehensive Foundation, 2nd Ed., Prentice-Hall, 1999.
15. Werbos, P.J., “Backpropagation: Basics and New Developments,” The Handbook of Brain Theory and Neural Networks, MIT Press, First Edition, 1995.