From Neurons to Neural Networks

Jeff Knisley, East Tennessee State University
Mathematics of Molecular and Cellular Biology Seminar
Institute for Mathematics and its Applications, April 2, 2008
Outline of the Talk

- Brief Description of the Neuron
- A "Hot-Spot" Dendritic Model
- Classical Hodgkin-Huxley (HH) Model
- A Recent Approach to HH Nonlinearity
- Artificial Neural Nets (ANN's)
  - 1957 - 1969: Perceptron Models
  - 1980's - soon: MLP's and Others
  - 1990's - : Neuromimetic (Spiking) Neurons
Components of a Neuron

[Figure: labeled diagram of a neuron showing the dendrites, nucleus, soma, axon, myelin sheaths, and synaptic terminals.]
Pre-Synaptic to Post-Synaptic

If the threshold is exceeded, then the neuron "fires," sending a signal along its axon.
Signal Propagation along the Axon

- Signal is electrical
  - Membrane depolarization from resting -70 mV
  - Myelin acts as an insulator
- Propagation is electro-chemical
  - Sodium channels open at breaks in the myelin
  - Much higher external sodium ion concentrations
  - Potassium ions "work against" sodium
  - Chloride and other influences are also very important
- Rapid depolarization at these breaks
  - Signal travels faster than if only electrical
Signal Propagation along the Axon

[Figure: bands of +/- charge reversal propagating along the axon at successive breaks in the myelin.]
Action Potentials

Sodium ion channels open and close, which causes potassium ion channels to open and close.
Action Potentials

[Figure: a model "spike" alongside an actual recorded spike train.]
Post-Synaptic may be Subthreshold

Signals decay at the soma if below a certain threshold. Models begin with a section of a dendrite.
Derivation of the Model
Some Assumptions

- Assume the neuron separates R^3 into 3 regions: interior (i), exterior (e), and the boundary membrane surface (m)
- Assume E_l is the electric field and B_l is the magnetic flux density, where l = e, i
- Maxwell's Equations: assume magnetic induction is negligible, so

    \nabla \times \mathbf{E}_l = -\frac{\partial \mathbf{B}_l}{\partial t} = 0

- Thus E_e = -\nabla V_e and E_i = -\nabla V_i for potentials V_l, l = i, e
Current Densities j_i and j_e

- Let \sigma_l = conductivity 2-tensor, l = i, e
  - Intracellular: homogeneous; small radius
  - Extracellular: ion populations!
- Ohm's Law (local): \mathbf{j}_l = \sigma_l \mathbf{E}_l
- Charges (ions) collect on the outside of the boundary surface (especially Na+)
- \nabla \cdot \mathbf{j}_e \propto I_m, where I_m = membrane currents. Thus,

    \nabla \cdot \mathbf{j}_i = 0 \implies \nabla^2 V_i = 0
    \nabla \cdot (\sigma_e \nabla V_e) \propto I_m
Assume: Circular Cross-sections

Let V = V_i - V_e - V_rest be the membrane potential difference, and let R_m, R_i, C be the membrane resistance, intracellular resistance, and membrane capacitance, respectively. Let I_syn be a "catch-all" for ion channel activity. The membrane current is

    I_m = C \frac{\partial V}{\partial t} + \frac{V}{R_m} + I_{ion}

and balancing it against the axial current yields Lord Kelvin's Cable Equation:

    \frac{\partial}{\partial x} \left( \frac{d}{4 R_i(x)} \frac{\partial V}{\partial x} \right) = C \frac{\partial V}{\partial t} + \frac{V}{R_m(x)} + I_{ion} + I_{syn}
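A minimal numerical illustration of the cable equation may help; the sketch below integrates the passive cable (I_ion = I_syn = 0) with explicit finite differences and no-flux ends. All parameter values are illustrative, not taken from the talk.

```python
import numpy as np

# Explicit finite-difference sketch of the passive cable equation
#   (d / (4 R_i)) V_xx = C V_t + V / R_m
# with no ionic/synaptic currents.  Parameter values are illustrative.

d, Ri, Rm, C = 1e-4, 100.0, 1e4, 1e-6   # diameter, axial/membrane resistance, capacitance
L, nx, T, nt = 0.1, 101, 0.005, 5000    # cable length, grid points, duration, steps
dx, dt = L / (nx - 1), T / nt
a = d / (4 * Ri)                        # axial coefficient

V = np.zeros(nx)
V[0] = 10.0                             # initial depolarization at one end (mV)

for _ in range(nt):
    Vxx = np.zeros(nx)
    Vxx[1:-1] = (V[2:] - 2 * V[1:-1] + V[:-2]) / dx**2
    Vxx[0], Vxx[-1] = Vxx[1], Vxx[-2]   # crude sealed-end (no-flux) boundaries
    V += dt * (a * Vxx - V / Rm) / C    # forward-Euler update

print("membrane potential profile:", V[::20])
```

The initial depolarization spreads along the cable while decaying through the membrane, which is the qualitative behavior the linear cable model predicts below threshold.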
Dimensionless Cables

Let x = X \sqrt{R_m d / (4 R_i)} and let \tau_m = R_m C be constant. Then

    \frac{\partial^2 V}{\partial X^2} - \tau_m \frac{\partial V}{\partial t} - V = R_m (I_{syn} + I_{ion})

Tapered cylinders: Z instead of X and a taper constant K:

    \frac{\partial^2 V}{\partial Z^2} + K \frac{\partial V}{\partial Z} - \tau_m \frac{\partial V}{\partial t} - V = R_m (I_{syn} + I_{ion})
Rall's Theorem for Untapered Dendrites

If at each branching the parent diameter and the daughter cylinder diameters satisfy

    d_{parent}^{3/2} = \sum_{j \in daughters} d_j^{3/2}

then the dendritic tree can be reduced to a single equivalent cylinder.
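A quick numerical illustration of the 3/2-power condition; the daughter diameters here are hypothetical.

```python
# Quick check of Rall's 3/2-power branching condition.  The daughter
# diameters are hypothetical; the first computation finds the parent
# diameter that makes the branch point exactly reducible.
daughters = [1.0, 1.3, 0.8]                      # daughter diameters (um), illustrative
parent = sum(d ** 1.5 for d in daughters) ** (2 / 3)
print(f"equivalent-cylinder parent diameter: {parent:.3f} um")

# A branching that violates the condition cannot be collapsed exactly:
lhs, rhs = 2.0 ** 1.5, sum(d ** 1.5 for d in daughters)
print("parent of diameter 2.0 satisfies Rall's condition:", abs(lhs - rhs) < 1e-9)
```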
Dendritic Models

[Figure: a full arbor model attached to the soma, collapsed to a tapered equivalent cylinder.]

Tapered Equivalent Cylinder

- Rall's theorem (modified for taper) allows us to collapse the tree to an equivalent cylinder
- Assume "hot spots" at x_0, x_1, ..., x_m

[Figure: equivalent cylinder from the soma at 0 to the end at l, with hot spots marked at x_0, x_1, ..., x_m.]
Ion Channel Hot Spots

(Poznanski) I_j is due to the ion channel(s) at the j-th hot spot:

    \frac{R_m d}{4 R_i} \frac{\partial^2 V}{\partial x^2} - R_m C \frac{\partial V}{\partial t} - V = -\sum_{j=1}^{n} I_j(t)\, \delta(x - x_j)

plus boundary conditions and initial conditions. The Green's function G(x, x_j, t) is the solution to the hot-spot equation with I_j as a point source and all other inputs = 0; that is, the Green's function is the solution to the Equivalent Cylinder model.
Equivalent Cylinder Model (I_ion = 0)

For the Tapered Equivalent Cylinder Model, the equation is of the form

    \frac{\partial^2 V}{\partial Z^2} + F(Z) \frac{\partial V}{\partial Z} - \tau_m \frac{\partial V}{\partial t} - V = 0

with boundary and initial conditions

    \frac{\partial V}{\partial X}(L, t) = 0        (no current through the end)

    \frac{\partial V}{\partial X}(0, t) = \frac{\tanh L}{\rho} \left( \tau_s \frac{\partial V}{\partial t}(0, t) + V(0, t) \right)        (soma)

    V(x, 0) = steady state from constant current

Alternatively, at the soma: V(0, t) = V_clamp (voltage clamp).
Properties

- Spectrum consists solely of non-negative eigenvalues
- Eigenvectors are orthogonal in Voltage Clamp
- Eigenvectors are not orthogonal in the original problem
- Solutions are multi-exponential decays:

    V(X, t) = \sum_{k=1}^{\infty} C_k(X)\, e^{-t/\tau_k}

- Linear models are useful for subthreshold activation, assuming the nonlinearities (I_ion) are not arbitrarily close to the soma (and there are no electric field (ephaptic) effects)
Somatic Voltage Recording

[Figure: somatic voltage trace over 0-10 ms showing an experimental artifact, a multi-exponential decay with ionic channel effects, and saturation to steady state.]
Hodgkin-Huxley: Ionic Currents

- 1963 Nobel Prize in Medicine
- Cable equation plus ionic currents (I_syn)
- From numerous voltage clamp experiments with the squid giant axon (0.5-1.0 mm in diameter)
- Produces action potentials
- Ionic channels:
  - n = potassium activation variable
  - m = sodium activation variable
  - h = sodium inactivation variable
Hodgkin-Huxley Equations

    \frac{d}{4 R_i} \frac{\partial^2 V}{\partial x^2} - C \frac{\partial V}{\partial t} = g_l (V - V_l) + \bar{g}_K n^4 (V - V_K) + \bar{g}_{Na} m^3 h (V - V_{Na})

    \frac{\partial n}{\partial t} = \alpha_n (1 - n) - \beta_n n, \quad
    \frac{\partial m}{\partial t} = \alpha_m (1 - m) - \beta_m m, \quad
    \frac{\partial h}{\partial t} = \alpha_h (1 - h) - \beta_h h

where any V with a subscript is constant, any g with a bar is constant, and each of the \alpha's and \beta's is of similar form:

    \alpha_n(V) = \frac{10 - V}{100 \left[ e^{(10 - V)/10} - 1 \right]}, \quad
    \beta_n(V) = \frac{e^{-V/80}}{8}
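For concreteness, here is a forward-Euler sketch of the space-clamped HH equations (no cable term), using the classical published rate functions and conductances; the applied current and step size are illustrative.

```python
import numpy as np

# Forward-Euler sketch of the space-clamped Hodgkin-Huxley equations.
# Rate functions and parameters are the classical published HH values
# (V in mV relative to rest, t in ms); I_app and dt are illustrative.

def an(V): return 0.01 * (10 - V) / (np.exp((10 - V) / 10) - 1)
def bn(V): return 0.125 * np.exp(-V / 80)
def am(V): return 0.1 * (25 - V) / (np.exp((25 - V) / 10) - 1)
def bm(V): return 4.0 * np.exp(-V / 18)
def ah(V): return 0.07 * np.exp(-V / 20)
def bh(V): return 1.0 / (np.exp((30 - V) / 10) + 1)

C, gK, gNa, gl = 1.0, 36.0, 120.0, 0.3          # uF/cm^2 and mS/cm^2
VK, VNa, Vl = -12.0, 115.0, 10.6                # mV (HH sign convention)

dt, T, I_app = 0.01, 50.0, 10.0                 # ms, ms, uA/cm^2 (illustrative)
V, n, m, h = 0.0, 0.32, 0.05, 0.6               # near resting values

for step in range(int(T / dt)):
    I_ion = gK * n**4 * (V - VK) + gNa * m**3 * h * (V - VNa) + gl * (V - Vl)
    V += dt * (I_app - I_ion) / C
    n += dt * (an(V) * (1 - n) - bn(V) * n)
    m += dt * (am(V) * (1 - m) - bm(V) * m)
    h += dt * (ah(V) * (1 - h) - bh(V) * h)

print(f"V({T} ms) = {V:.2f} mV")  # a spike train appears if I_app is suprathreshold
```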
HH combined with "Hot Spots"

The solution to the equivalent cylinder with hot spots is

    V(x, t) = V_{initial} + \sum_{j=0}^{n} \int_0^t G(x, x_j, t - \tau)\, I_j(\tau)\, d\tau

where I_j is the restriction of V to the j-th "hot spot". At a hot spot, V satisfies an ODE of the form

    C \frac{\partial V}{\partial t} = g_l (V_l - V) + \bar{g}_K n^4 (V_K - V) + \bar{g}_{Na} m^3 h (V_{Na} - V)

where m, n, and h are functions of V.
Brief Description of an Approach to HH Ion Channel Nonlinearities

- Goal: accessible approximations that still produce action potentials
- Can be addressed using Linear Embedding, which is closely related to the method of Turning Variables
  - Maps a finite-degree polynomially nonlinear dynamical system into an infinite-degree linear system
  - The result is an infinite-dimensional linear system which is as unmanageable as the original nonlinear equation
    - Non-normal operators with continua of eigenvalues
    - Difficult to project back to the nonlinear system (convergence and stability are thorny)
- But the approach still has some value (action potentials)
The Hot-Spot Model "Qualitatively"

    V(0, t) = \sum_{j=0}^{n} \int_0^t G(0, x_j, t - \tau)\, I_j(\tau)\, d\tau

The I_j are inputs from other neurons and from ion channels; G comes from the subthreshold model (Rall equivalent cylinder or full arbor).

Key features: summation of synaptic inputs. If V(0, t) is large, an action potential travels down the axon.
Artificial Neural Network (ANN)

- Made of artificial neurons, each of which
  - Sums inputs x_i from other neurons
  - Compares the sum to a threshold
  - Sends a signal to other neurons if above the threshold
- Synapses have weights
  - Model relative ion collections
  - Model efficacy (strength) of the synapse
Artificial Neuron

- w_{ij} = synaptic weight between the i-th and j-th neurons
- \theta_j = threshold of the j-th neuron
- \sigma = "firing function" that maps state to output

Each neuron forms the weighted sum of its inputs x_1, ..., x_n and applies the nonlinear firing function:

    s_i = \sum_j w_{ij} x_j, \quad x_i = \sigma(s_i - \theta_i)
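A minimal sketch of this neuron, assuming a hard-threshold firing function; weights, threshold, and inputs are illustrative values.

```python
import numpy as np

# Minimal sketch of the artificial neuron above: weighted sum of inputs,
# then a firing function applied to (s - theta).  All values illustrative.

def fire(s):                       # hard-threshold firing function
    return 1.0 if s >= 0 else 0.0

w = np.array([0.5, -0.3, 0.8])     # synaptic weights w_ij
theta = 0.2                        # threshold
x = np.array([1.0, 0.0, 1.0])      # inputs from other neurons

s = w @ x                          # s_i = sum_j w_ij x_j
output = fire(s - theta)           # x_i = sigma(s_i - theta_i)
print(f"s = {s:.2f}, output = {output}")
```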
First Generation: 1957 - 1969

Best understood in terms of classifiers: partition a data space into regions containing data points of the same classification. The regions are predictions of the classification of new data points.
Simple Perceptron Model

Given 2 classes, Reference and Sample, with weights w_1, w_2, ..., w_n:

    Output = 1 if from sample, 0 if from reference

- The firing function (activation function) has only two values, 0 or 1
- "Learning" is by incremental updating of the weights using a linear learning rule
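A sketch of the incremental linear learning rule on a toy linearly separable problem (logical AND); the data, learning rate, and epoch count are illustrative.

```python
import numpy as np

# Sketch of the perceptron's incremental linear learning rule on a toy
# linearly separable problem (logical AND).  All values illustrative.

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1], dtype=float)      # AND is linearly separable

w, theta, lr = np.zeros(2), 0.0, 0.1
for epoch in range(20):
    for xi, target in zip(X, y):
        out = 1.0 if w @ xi - theta >= 0 else 0.0
        w += lr * (target - out) * xi        # linear weight update
        theta -= lr * (target - out)         # threshold updated the same way

print("weights:", w, "threshold:", theta)
print("predictions:", [1.0 if w @ xi - theta >= 0 else 0.0 for xi in X])
```

The same loop never converges for XOR, which is the limitation discussed on the next slide.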
Perceptron Limitations

- Cannot do XOR (1969, Minsky and Papert)
  - Data must be linearly separable
- 1970's: ANN's "Wilderness Experience" - only a handful working and very "un-neuron-like"

Support Vector Machine: Perceptron on a Feature Space

- Data is projected into a high-dimensional Feature Space and separated with a hyperplane
- Choice of Feature Space (kernel) is key
- Predictions are based on the location of the hyperplane
Second Generation: 1981 - Soon

- Big ideas from other fields
  - J. J. Hopfield compares neural networks to Ising spin glass models, and uses statistical mechanics to prove that ANN's minimize a total energy functional
  - Cognitive psychology provides new insights into how neural networks learn
- Big ideas from Math
  - Kolmogorov's Theorem AND sigmoidal firing functions
Firing Functions are Sigmoidal

    \sigma_j(s - \theta_j) = \frac{1}{1 + e^{-\kappa_j (s - \theta_j)}}
3 Layer Neural Network

[Figure: input, hidden, and output layers. The output layer may consist of a single neuron; the hidden layer is usually much larger.]
Multilayer Network

Each hidden neuron computes \xi_j = \sigma(\mathbf{w}_j^t \mathbf{x} - \theta_j) from the inputs x_1, ..., x_n, and the output is

    out = \sum_{j=1}^{N} \alpha_j \xi_j = \sum_{j=1}^{N} \alpha_j\, \sigma(\mathbf{w}_j^t \mathbf{x} - \theta_j)
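A sketch of this forward pass with the sigmoidal firing function from the previous slide; all sizes and numeric values are illustrative random numbers.

```python
import numpy as np

# Sketch of the multilayer network's forward pass:
#   out = sum_j alpha_j * sigma(w_j . x - theta_j)
# with a sigmoidal firing function.  All values illustrative.

def sigma(s, kappa=1.0):                 # sigmoidal firing function
    return 1.0 / (1.0 + np.exp(-kappa * s))

rng = np.random.default_rng(0)
n, N = 4, 8                              # inputs, hidden neurons
W = rng.normal(size=(N, n))              # rows are the w_j
theta = rng.normal(size=N)               # hidden thresholds
alpha = rng.normal(size=N)               # output weights

x = rng.normal(size=n)
xi = sigma(W @ x - theta)                # hidden-layer activations xi_j
out = alpha @ xi                         # network output
print(f"out = {out:.4f}")
```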
Hilbert's Thirteenth Problem

- Original: "Are there continuous functions of 3 variables that are not representable by a superposition of composition of functions of 2 variables?"
- Modern: Can a continuous function of n variables on a bounded domain of n-space be written as sums of compositions of functions of 1 variable?
Kolmogorov's Theorem

Modified version: any continuous function f of n variables can be written

    f(s_1, \ldots, s_n) = \sum_{i=1}^{2n+1} h_i \left( \sum_{j=1}^{n} \omega_{ij}\, g_j(s_j) \right)

where only the h's and \omega's depend on f (that is, the g's are fixed).
Cybenko (1989)

Let \sigma be any continuous sigmoidal function, and let \mathbf{x} = (x_1, \ldots, x_n). If f is absolutely integrable over the n-dimensional unit cube, then for all \varepsilon > 0, there exists a (possibly very large) integer N and vectors \mathbf{w}_1, \ldots, \mathbf{w}_N such that

    \left| f(\mathbf{x}) - \sum_{j=1}^{N} \alpha_j\, \sigma(\mathbf{w}_j^T \mathbf{x} - \theta_j) \right| < \varepsilon

where \alpha_1, \ldots, \alpha_N and \theta_1, \ldots, \theta_N are fixed parameters.
Multilayer Network (MLP's)

As before, \xi_j = \sigma(\mathbf{w}_j^t \mathbf{x} - \theta_j) and

    out = \sum_{j=1}^{N} \alpha_j\, \sigma(\mathbf{w}_j^t \mathbf{x} - \theta_j)

which is exactly the form appearing in Cybenko's theorem.
ANN as a Universal Classifier

- Designs a function f : Data -> Classes
  - Example: f(Red) = 1, f(Blue) = 0
  - The support of f defines the regions
- Data is used to train (i.e., design) the function f

[Figure: supp(f) shaded as the region classified as 1.]
Example - Predicting Trees that are or are not RNA-like

Class        D         d-t       d-a       d-L       d-D       Lamb-2   E-ratio   Randics
RNALike      0.333333  0.666667  0.666667  0.5       0.666667  0.2679   0.8       2.914214
RNALike      0.333333  0.5       0.5       0.5       0.666667  0.3249   1         2.770056
RNALike      0.5       0.5       0.5       0.5       0.5       0.382    1         2.80806
RNALike      0.166667  0.333333  0.5       0.833333  0.833333  1        2         2.236068
NotRNALike   0.333333  0.333333  0.333333  0.666667  0.666667  0.4384   1.2       2.642734
NotRNALike   0.333333  0.333333  0.333333  0.666667  0.666667  0.4859   1.4       2.56066

- Construct graphical invariants
- Train the ANN using known RNA-trees
- Predict the others
2nd Generation: Phenomenal Success

- Data mining of micro-array data
- Stock and commodities trading: ANN's are an important part of "computerized trading"
- Post office mail sorting
- "This tiny 3-dimensional artificial neural network, modeled after neural networks in the human brain, is helping machines better visualize their surroundings."
The Mars Rovers

- An ANN decides between "rough" and "smooth"
  - "Rough" and "smooth" are ambiguous
  - Learning is via many "examples"
- And a neural network can lose up to 10% of its neurons without significant loss in performance!
ANN Limitations

- Overfitting: e.g., if the training set is "unbalanced"
  - Overfitting may produce isolated regions
- Mislabeled data can lead to slow (or no) convergence or incorrect results
- Hard margins: no "fuzzing" of the boundary
Problems on the Horizon

- Limitations are becoming very limiting
  - Trained networks are often poor learners (and self-learners are hard to train)
  - In real neural networks, more neurons imply better networks (not so in ANN's)
  - Temporal data is problematic: ANN's have no concept, or a poor concept, of time
- "Hybridized ANN's" are becoming the rule
  - SVM's are probably the tool of choice at present
  - SOFM's, Fuzzy ANN's, Connectionism
Third Generation: 1997 -

- Back to bio: Spiking Neural Networks (SNN)
  - Asynchronous, action-potential driven ANN's have been around for some time
  - SNN's show "promise," but results beyond current ANN's have been elusive
- Simulating the actual HH equations (neuromimetic) has to date not been enough
  - Time is both a promise and a curse
- A possible approach: use current dendritic models to modify existing ANN's
ANN's with Multiple Time Scales

An SNN that reduces to an ANN and preserves the Kolmogorov Theorem. The solution to the equivalent cylinder with hot spots is

    V(0, t) = V_{initial} + \sum_{j=0}^{n} \int_0^t G(0, x_j, t - \tau)\, I_j(\tau)\, d\tau

where I_j is the restriction of V to the j-th "hot spot". Equivalent artificial neuron:

    s_i(t) = \sum_{j \neq i} \int_0^t \omega_j(t - \tau)\, x_j(\tau)\, d\tau
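A sketch of the equivalent artificial neuron's state as a discrete convolution; the exponential kernel \omega and the 0/1 input trains are assumptions made for illustration.

```python
import numpy as np

# Sketch of the "equivalent artificial neuron": its state is the
# convolution of a temporal kernel omega with the other neurons' outputs,
#   s_i(t) = sum_{j != i} integral_0^t omega_j(t - tau) x_j(tau) dtau.
# The exponential kernel and random 0/1 input trains are illustrative.

dt, T = 0.1, 10.0                         # time step and duration (ms)
t = np.arange(0, T, dt)
tau_k = 2.0                               # assumed decay constant for omega
omega = np.exp(-t / tau_k)                # illustrative kernel omega_j(t)

rng = np.random.default_rng(1)
x = rng.integers(0, 2, size=(3, t.size))  # three 0/1 input trains x_j

# Discrete convolution, truncated to [0, T]:
s = sum(np.convolve(omega, xj)[: t.size] * dt for xj in x)
print(f"s_i at t = {T} ms: {s[-1]:.3f}")
```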
Incorporating MultiExponentials

G(0, x, t) is often a multi-exponential decay. In terms of the time constants \tau_k,

    s = \sum_{k=1}^{\infty} \sum_{j} w_{jk} \left( e^{-t/\tau_k} \int_0^t e^{u/\tau_k}\, x_j(u)\, du \right)

- The w_{jk} are synaptic "weights"
- The \tau_k come from electrotonic and morphometric data
  - Rate of taper, length of dendrites
  - Branching, capacitance, resistance
Approximation and Simplification

If x_j(u) \approx 1 or x_j(u) \approx 0, then

    s = \sum_{k=1}^{\infty} \sum_{j=1}^{n} w_{jk} \left( 1 - e^{-t/\tau_k} \right) x_j(t)

A special case (k is a constant):

    s = \sum_{j=1}^{n} \left( w_j + p_j \left( 1 - e^{-kt} \right) \right) x_j

- t = 0 yields the standard neural net model
- Standard neural net as initial steady state
- Modify with a time-dependent transient
Artificial Neuron

- \theta_j = threshold of the j-th neuron
- w_{ij}, p_{ij} = synaptic weights
- \sigma = "firing function" that maps state to output

The state and the nonlinear firing function are now

    s_i = \sum_{j=1}^{n} \left( w_{ij} + p_{ij} \left( 1 - e^{-kt} \right) \right) x_j, \quad x_i = \sigma(s_i - \theta_i)
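A sketch of this neuron with steady-state weights w and transient weights p; the logistic \sigma, the constant k, and all numeric values are illustrative.

```python
import numpy as np

# Sketch of the artificial neuron with a time-dependent transient:
#   s_i(t) = sum_j (w_ij + p_ij * (1 - exp(-k t))) x_j
# At t = 0 this is the standard neuron with weights w; as t -> infinity
# the effective weights become w + p.  All values illustrative.

def sigma(s):
    return 1.0 / (1.0 + np.exp(-s))

w = np.array([0.5, -0.3, 0.8])      # steady-state weights w_ij
p = np.array([0.2, 0.4, -0.5])      # transient weights p_ij
theta, k = 0.2, 1.0
x = np.array([1.0, 0.0, 1.0])

for t in [0.0, 1.0, np.inf]:
    s = (w + p * (1 - np.exp(-k * t))) @ x
    print(f"t = {t}: s = {s:.3f}, output = {sigma(s - theta):.3f}")
```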
Steady State and Transient

- Sensitivity and soft margins
  - t = 0 is a perceptron with weights w_{ij}
  - t = \infty is a perceptron with weights w_{ij} + p_{ij}
  - For all t in (0, \infty), a traditional ANN with weights between w_{ij} and w_{ij} + p_{ij}
- The transient is a perturbation scheme
  - Many predictions over time (soft margins)
- Algorithm
  - Partition the training set into subsets
  - Train at t = 0 on the initial subset
  - Train at t > 0 values on the other subsets
Training the Network

Define an energy function

    E = \frac{1}{2} \sum_{i=1}^{n} \left( \alpha_i \xi_i - \pi_i \right)^2

- The \pi vectors are the information to be "learned"
- Neural networks minimize energy
- The "information" in the network is equivalent to the minima of the total squared energy function
Back Propagation

Minimize the energy E: choose the w_j and \alpha_j so that

    \frac{\partial E}{\partial w_{ij}} = 0, \quad \frac{\partial E}{\partial \alpha_j} = 0

In practice, this is hard. Back propagation with a continuous sigmoidal: feed forward, calculate E, and modify the weights,

    \alpha_j^{new} = \alpha_j + \lambda\, \xi_j \delta_j, \quad \delta_j = y_j (1 - y_j)(\pi_j - y_j)

    w_{jk}^{new} = w_{jk} + \lambda\, \xi_j (1 - \xi_j)\, \alpha_j \delta_j\, x_k

Repeat until E is sufficiently close to 0.
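A sketch of this training loop for the two-layer network out = \sum_j \alpha_j \sigma(w_j \cdot x - \theta_j), written as plain gradient descent on E; the single training pair, network sizes, and learning rate are illustrative.

```python
import numpy as np

# Sketch of back propagation for the two-layer network
#   out = sum_j alpha_j * sigma(w_j . x - theta_j),
# minimizing E = (1/2)(out - pi)^2 by gradient descent.
# Training pair, sizes, and learning rate are illustrative.

def sigma(s):
    return 1.0 / (1.0 + np.exp(-s))

rng = np.random.default_rng(2)
n, N, lam = 3, 5, 0.1
W, theta, alpha = rng.normal(size=(N, n)), rng.normal(size=N), rng.normal(size=N)
x, pi = np.array([1.0, 0.5, -1.0]), 0.7        # one training pair

for step in range(2000):
    xi = sigma(W @ x - theta)                  # feed forward
    out = alpha @ xi
    delta = pi - out                           # error signal
    grad_hidden = delta * alpha * xi * (1 - xi)
    alpha += lam * delta * xi                  # output-weight update
    W += lam * np.outer(grad_hidden, x)        # hidden-weight update
    theta -= lam * grad_hidden                 # threshold update

print(f"E = {0.5 * (pi - alpha @ sigma(W @ x - theta)) ** 2:.2e}")
```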
Back Propagation with Transient

- Train the network initially (choose the w_j and \alpha_j)
- Each "synapse" is given a transient weight p_{ij}, updated by

    p_{output,j}^{new} = p_{output,j} + \lambda\, \xi_j \delta_j \left( 1 - e^{-kt} \right)

    p_{hidden,jk}^{new} = p_{hidden,jk} + \lambda\, \xi_j (1 - \xi_j)\, \alpha_j \delta_j\, x_k \left( 1 - e^{-kt} \right)

Algorithm addressing over-fitting/sensitivity:
- Weights must be given random initial values
- The weights p_{ij} are also given random initial values
- Separate training of the w_j and \alpha_j and of the p_{ij} ameliorates over-fitting during the training sequence
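A sketch of the transient phase under these assumptions: the t = 0 network (W, \theta, \alpha) is held fixed and only the transient weights are trained, with the (1 - e^{-kt}) factor scaling both the effective weights and the updates. All numeric values are illustrative.

```python
import numpy as np

# Sketch of training the transient weights p after the t = 0 network is
# fixed: effective weights are w + p*(1 - exp(-k t)), and the p's receive
# back-propagation updates scaled by (1 - exp(-k t)).  All values
# (sizes, time sample, data) are illustrative.

def sigma(s):
    return 1.0 / (1.0 + np.exp(-s))

rng = np.random.default_rng(3)
n, N, lam, k = 3, 5, 0.1, 1.0
W, theta, alpha = rng.normal(size=(N, n)), rng.normal(size=N), rng.normal(size=N)
P = 0.01 * rng.normal(size=(N, n))                # random initial transients
p_out = 0.01 * rng.normal(size=N)

x, pi, t = np.array([1.0, 0.5, -1.0]), 0.2, 1.0   # later-time training pair
f = 1 - np.exp(-k * t)                            # transient factor

for step in range(2000):
    xi = sigma((W + f * P) @ x - theta)           # feed forward, effective weights
    out = (alpha + f * p_out) @ xi
    delta = pi - out
    grad_hidden = delta * (alpha + f * p_out) * xi * (1 - xi)
    p_out += lam * delta * xi * f                 # transient output-weight update
    P += lam * np.outer(grad_hidden, x) * f       # transient hidden-weight update

print(f"out(t={t}) = {(alpha + f * p_out) @ sigma((W + f * P) @ x - theta):.3f}")
```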
Observations/Results

- Spiking does occur
  - But only if the network is properly "initiated"
  - The spikes only resemble action potentials
- This is one approach to SNN's
  - Not likely to be the final word
  - Other real-neuron features may be necessary (e.g., tapering axons can limit the frequency of action potentials; also, branching!)
- This approach does show promise in handling temporal information
Any Questions?
Thank you!