
IEEE TRANSACTIONS ON CYBERNETICS, VOL. 43, NO. 1, FEBRUARY 2013

Biologically Inspired SNN for Robot Control

Eric Nichols, Liam J. McDaid, and Nazmul Siddique, Senior Member, IEEE

Abstract—This paper proposes a spiking-neural-network-based robot controller inspired by the control structures of biological systems. Information is routed through the network using facilitating dynamic synapses with short-term plasticity. Learning occurs through long-term synaptic plasticity, which is implemented using the temporal difference learning rule to enable the robot to learn to associate the correct movement with the appropriate input conditions. The network self-organizes to provide memories of environments that the robot encounters. A Pioneer robot simulator with laser and sonar proximity sensors is used to verify the performance of the network on a wall-following task, and the results are presented.

Index Terms—Dynamic synapses, self-organization, spiking neural network (SNN), temporal difference (TD) learning rule.

I. INTRODUCTION

MANY MOBILE robot applications require the capability to maneuver in complex environments, and to this end, robots need to perceive and respond to events in their environment. Biological organisms perform such tasks remarkably well, and so researchers have focused on the development of models based on biological nervous systems. Due to their functional similarity to biological neurons, spiking neural networks (SNNs) can simulate elementary processes in the brain, including neural information processing, plasticity, and learning. SNNs provide a biologically inspired way of manipulating data for different sensory modalities and computations [1], and it has been shown that the timing of spikes is significant in neuronal information processing [2]. By mimicking the chemical and electrical functions of the peripheral and central nervous systems, SNNs can provide more biologically realistic behaviors than could be achieved through other methods, and it is for this reason that an SNN is used in this paper. Spiking neurons, which convey information by individual spikes, are computationally more powerful than sigmoidal neural networks, and hence SNNs have found a broad range of applications such as signal processing, real-world data classification, speech recognition, spatial navigation, trajectory tracking, path planning, decision making and action selection, and motor control.

The Blue Brain Project [3] holds that the brain, from the molecular level to the emergence of intelligence, can be understood by testing hypotheses on a computer simulation. The Cortex Project [4] is using SNNs in an attempt to understand how complex high-level attributes can emerge from the low-level components of our brains. Hopfield reported an interesting model of a hippocampus-like network that can learn mental maps of the environment and enables movement planning within a given environment [5]. The network consists of a set of place cells with all-to-all excitatory connections. Spike-timing-dependent potentiation is used to strengthen connections between cells activated in close temporal proximity. This mechanism is subsequently used for path planning. Several models of spiking neurocontrollers have been proposed for trajectory tracking and set-point control tasks. Di Paolo [6] applied an evolutionary approach to train an SNN controller in a mobile robot navigation task by evolving only the plasticity models and the time properties of each neuron. The study demonstrated that SNN controllers with spike-timing-dependent plasticity (STDP) rules were able to reach a stable state more rapidly and reliably than their rate-based counterparts under the same conditions. The research in this field is still at an embryonic stage, with many elements not understood. Examples of the incomplete understanding of the brain include the belief that more than 100 different neurotransmitters may exist in the brain, the function of many of which is not fully understood [7]. There may also be quantum effects in the synapses [8] and neurons [9], [10] of the brain, but to what extent is not known.

Manuscript received June 22, 2011; revised January 5, 2012 and April 7, 2012; accepted April 28, 2012. Date of publication June 18, 2012; date of current version January 11, 2013. This work was supported by the Intelligent Systems Research Centre, University of Ulster Magee Campus. This paper was recommended by Associate Editor C.-M. Lin.

The authors are with the Intelligent Systems Research Centre, University of Ulster Magee Campus, BT48 7JL Derry, U.K. (e-mail: ericjnichols@gmail.com; [email protected]; [email protected]).

Digital Object Identifier 10.1109/TSMCB.2012.2200674

SNNs have the ability to learn and make decisions despite the incomplete understanding of neurological functionality, and to this end, wall-following applications using SNN controllers have been reported. Hybrid systems have been published where a genetic algorithm (GA), based on the evolutionary traits of a species, modifies the properties of an SNN. A wall-following SNN with synaptic values that are modified by a GA was created in [11]. A better response was reported using the SNN than was obtained by using a fuzzy logic (FL) controller, and the SNN also outperformed a standard GA controller. A GA was also used in another publication [12] to optimize the connectivity from two neurons in one layer to ten neurons in the following layer for the purpose of robotic movement without colliding with a wall. A simulated robot was used, and the hybrid SNN performed better than a GA on its own. A hybrid SNN and FL controller was implemented in [13], where a simulated robot learned behaviors based on an FL rule set and the behaviors were implemented using spiking neurons. Experiments demonstrated successful obstacle avoidance and target-following behaviors.

Hybrid systems offer a speed advantage over the slow process of simulating SNNs in software, providing a faster pace of synaptic modification, connectivity, and learned behaviors. Although functional systems have been created using SNNs combined with other methods, hybrid systems are unrealistic for use in modeling an organism's nervous system. For example, GAs do not modify values in a nervous system during a single life span, and FL is without a basis in synaptic and neuronal structures.

Wall-following applications using an SNN without a hybrid system have been reported in the literature. A static SNN structure was used in [14], consisting of 13 neurons divided across three layers. Experiments were carried out on a simulated robot and show successful wall-following behavior. Another SNN uses the STDP rule to change its synaptic efficacy in a biologically inspired manner [15]. The SNN interacts with its environment and consequently learns to navigate a room by moving toward or away from objects. The SNN in [16] is also capable of learning by using biologically inspired long-term synaptic plasticity, and experiments show that the SNN learned to navigate through a maze. The SNNs previously described have a learning capability, and motor decisions are made through their network structures. However, they can be implemented in a more biologically precise manner with such mechanisms as short-term plasticity, self-organization, and clustering of neurons for different tasks.

While the wall-following task can be performed using simple if–then rules, the task can also be used to test whether higher level behavior emerges from a lower level (SNN) architecture. This paper applies an SNN to a wall-following task to test whether information is routed through the network correctly via facilitating synapses and whether the functionality of decision making through learned expectations can also be seen. The SNN in this paper uses 1-D sensors for inputs, which limits its range of applications.

The main objectives of the current research are to implement short-term and long-term synaptic plasticity. Biologically based routing of information, actuated with the use of facilitating dynamic synapses, is implemented with short-term plasticity. Dynamic synapses have been used in prior publications [17], [18]. However, those implementations used if–then rules to switch between facilitation and depression at each synapse. The parameters of the synapses in this paper do not switch by contrived rules. Instead, they are implemented in a more biological way to produce the proper routing of spikes through the network. Long-term synaptic plasticity is implemented using the temporal difference (TD) rule to learn wall-following behavior. There are examples of the TD learning rule being applied to SNN applications [19], [20]. However, it has not been implemented for synaptic modification in an SNN to learn the expectation of movements in a wall-following task. The TD parameters required for this task are unique to a wall-following application and are described in detail. The focus of this paper is therefore on developing a robot controller with the objective of biological precision in its design.

The rest of this paper is organized as follows. The spiking neuron model is presented in Section II. The self-organizing SNN is subsequently described in Section III. A simulator of the Pioneer robot is used to test the SNN and is discussed in Section IV. Experimental results and analysis are shown in Section V, and after a discussion in Section VI, this paper is concluded in Section VII.

II. NEURON MODEL

Computational models of neurons such as Hodgkin and Huxley [21], Izhikevich [22], and FitzHugh–Nagumo [23] exist with disparate amounts of biological realism. The leaky integrate-and-fire (LIF) neuron model is chosen for use in this paper because it requires minimal computation while being similar to biological neurons in vivo and in vitro [24]. The LIF model is expressed in [25] as

\tau_{mem} \frac{du}{dt} = -u + R_{in} I_{tot}(t) \qquad (1)

where τmem (= 40 ms) is the membrane time constant and Rin (= 100 MΩ) is the input resistance. u is the displacement of the neuron voltage from the resting potential, and Itot(t) is the total current from all synapses, given by

I_{tot} = \sum_{i=1}^{n} I_{syn_i} \qquad (2)

where Isyn is the current from an individual synapse and is defined by

I_{syn} = y \cdot A_{SE} \qquad (3)

where the absolute synaptic efficacy (ASE) is the total electrical charge (measured in nanoamperes in this paper) that the synapse can provide to the postsynaptic neuron. y is the active fraction of a synapse's neurotransmitters binding to postsynaptic receptors, causing ion channels to open and a current of ions to flow into the postsynaptic neuron until the channels close [26].
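As a concrete illustration, the following minimal sketch integrates (1) with the stated τmem and Rin values. The 1-ms Euler step, the 15-mV firing threshold, and the reset to the resting potential are illustrative assumptions; the paper does not specify them.

```c
/* Minimal sketch of Euler integration of Eq. (1).
   Assumptions: 1-ms step; a 15-mV threshold and reset to the resting
   potential, neither of which is given in the text. */
#include <stdio.h>

#define TAU_MEM 0.040   /* membrane time constant: 40 ms */
#define R_IN    100e6   /* input resistance: 100 MOhm */

/* One Euler update of membrane potential u (V) driven by i_tot (A). */
double lif_step(double u, double i_tot, double dt) {
    return u + (-u + R_IN * i_tot) / TAU_MEM * dt;   /* Eq. (1) */
}

int main(void) {
    double u = 0.0;
    const double thresh = 0.015, dt = 0.001;  /* hypothetical threshold */
    for (int ms = 0; ms < 200; ms++) {
        u = lif_step(u, 0.2e-9, dt);          /* constant 0.2-nA input */
        if (u >= thresh) { printf("spike at %d ms\n", ms + 1); u = 0.0; }
    }
    return 0;
}
```

With a constant 0.2-nA input, the steady-state potential is 20 mV, so the sketch produces its first spike after roughly 55 ms, consistent with the 40-ms membrane time constant.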

Repetitive activation of synapses can result in an increased postsynaptic current in excitatory facilitating synapses or a decreased current in depressing synapses [27]. Kinetic equations describing y in depressing synapses were formulated in [25] and are expressed as

\frac{dx}{dt} = \frac{z}{\tau_{rec}} - U_{SE} \cdot x \cdot sp \qquad (4)

\frac{dy}{dt} = -\frac{y}{\tau_{in}} + U_{SE} \cdot x \cdot sp \qquad (5)

\frac{dz}{dt} = \frac{y}{\tau_{in}} - \frac{z}{\tau_{rec}} \qquad (6)

where sp is a binary value denoting the presence of a spike, as implemented in [17]. The parameters x and z are the fractions of neurotransmitters in the recovered and inactive states, respectively. The recovery time constant is τrec, and the time constant for the neurotransmitters to inactivate is τin. USE is the utilization of synaptic efficacy, which is the maximum fraction of resources that can be activated.

USE is allowed to grow with successive input spikes in facilitating synapses. Referring to (7), U¹SE is a running total of USE over time, reflecting the accumulation and depletion of calcium ions [28]; it replaces USE in (4) and (5) and is described in [25] as

\frac{dU^{1}_{SE}}{dt} = -\frac{U^{1}_{SE}}{\tau_{facil}} + U_{SE} \cdot \left(1 - U^{1}_{SE}\right) \cdot sp \qquad (7)

where τfacil is the facilitating time constant.


Facilitating synapses route information in biological SNNs based on the temporal interspike interval (ISI). The ISI is the amount of time that passes from one spike to the following spike on the same connection between neurons. This facilitating synaptic routing is shown in [29] using the neocortex of a rat, where spikes from a single axon cause a different response when passed to two separate facilitating synapses. They report that the values of USE, τrec, and τfacil differ at various biological facilitating synapses, and this difference results in the output of a neuron affecting its connected neurons with disparate amounts of voltage. This routing mechanism is implemented in the proposed SNN to increase the biological plausibility of the network and thereby increase the biological precision of the controller.

To find the appropriate USE, τrec, and τfacil values for an ISI range, a program was coded in C with a spike train input into a facilitating synapse for several seconds. After a couple of seconds (time for y to grow and settle), the maximum y was recorded and added to a Matlab file. This was repeated for every ISI through the given range. The file was opened in Matlab, and the curve of y over the ISI range was plotted. The USE, τrec, and τfacil values were adjusted, and the process was repeated until an ideal apex of the plotted curve was found. The τin facilitating value and every depressing value in the proposed SNN are sourced from the phenomenological synapses in [25].
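A minimal sketch of this parameter sweep is given below. It assumes 1-ms Euler integration of (4)–(7) and a τin of 3 ms taken as an assumption from the phenomenological synapses in [25]; the printed maxima of y correspond to one point each on curves such as those in Fig. 3.

```c
/* Sweep the ISI for one facilitating synapse, Eqs. (4)-(7).
   Assumptions: 1-ms Euler steps; tau_in = 3 ms; the U_SE, tau_rec, and
   tau_facil values are those quoted for the synapse at neuron D. */
#include <stdio.h>

int main(void) {
    const double dt = 0.001;                       /* 1-ms step, in s */
    const double U_SE = 0.05, tau_rec = 0.200,
                 tau_facil = 0.915, tau_in = 0.003;
    for (int isi = 160; isi <= 200; isi++) {       /* ISI in ms */
        double x = 1.0, y = 0.0, z = 0.0, u1 = 0.0, y_max = 0.0;
        for (int ms = 0; ms < 6000; ms++) {        /* 6-s spike train */
            int sp = (ms % isi == 0);
            if (sp) u1 += U_SE * (1.0 - u1);       /* Eq. (7), spike term */
            u1 -= (u1 / tau_facil) * dt;           /* Eq. (7), decay term */
            double rel = sp ? u1 * x : 0.0;        /* u1 replaces U_SE */
            double dx = (z / tau_rec) * dt - rel;  /* Eq. (4) */
            double dy = -(y / tau_in) * dt + rel;  /* Eq. (5) */
            double dz = (y / tau_in - z / tau_rec) * dt;  /* Eq. (6) */
            x += dx; y += dy; z += dz;
            if (ms > 2000 && y > y_max) y_max = y; /* let y settle first */
        }
        printf("ISI %d ms: max y = %.4f\n", isi, y_max);
    }
    return 0;
}
```

Plotting the recorded maxima of y against the ISI reproduces the shape of the curves in Fig. 3; the three parameters are then tuned until the apex sits at the desired ISI.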

III. SELF-ORGANIZING SNN

The application of wall following requires knowledge of distance and orientation to implement the appropriate movement. A distance arbitrarily chosen to be too close to a wall requires orienting away from the wall, and a distance chosen to be too far from a wall necessitates orienting toward the wall. A distance chosen to be ideal performs wall following with forward movement if the orientation is parallel to the wall. Otherwise, forward and turning movements toward or away from the wall continue the wall-following process if oriented away from or toward the wall, respectively. When approaching a wall, a choice has to be made regarding whether to wall follow with the wall on the left side or with the wall on the right side. This has implications for appropriate movement, as moving toward or away from a wall can require movement in different directions depending on which side the wall is on. Additionally, the robot must acquire knowledge regarding the expectations from different movements so that the correct movement can be implemented from those learned expectations.

The proposed self-organizing SNN incorporates these elements and builds upon previous work [18] where the SNN continually self-organizes its architecture for every novel environmental condition encountered by the robot. The SNN is shown in Fig. 1 and consists of five layers. Sensor receptors convert environmental information into spike trains. Layer 1 separates sensor receptor inputs into distances and orientation. Layer 2 fuses the front and side distance information. Layer 3 fuses distance and orientation data, while layer 4 isolates the data necessary for the robot to perform discrete actions. Learning takes place at synapses in layer 5, which contains three neurons, each leading to a different movement. The connectivity from the first layer to the fifth layer is self-organized in that there is no connectivity at the beginning of the network's life and connections are formed as the robot experiences its environment. Conversely, the connectivity from the receptors to layer 1 and from layer 5 to the motors is hard wired. Synapses used in this paper are based on biological synapses [25] with short-term plasticity responses. Facilitating synapses stimulate neurons in layers 1 to 4 with the function of activating a unique layer 4 neuron for different environmental conditions and task decisions. The neurons in layer 4 cooperate with depressing synapses that activate layer 5 neurons without waiting for y to increase.

Fig. 1. Self-organizing SNN. There are four sensor receptors: three for distances and one for orientation. Connections between layers 1 and 5 are self-organized at run time. Layer 5 has three neurons for different movements: left, forward, and right.

Fig. 2. Connectivity from the front distance receptor to two neurons in layer 1.

Note that a neuron is defined as being active in this paper when its output spike frequency is higher than that of all other neurons in its cluster. A neuron is otherwise defined as being inactive, because facilitating synapses filter out high-ISI spike trains in subsequent layers.

A. Layer 1

There are 11 neurons in layer 1, and each is hard wired to one of the four sensor receptors. The dfront receptor, shown in Fig. 2, receives input regarding the distance from the front of the robot to an object. It converts the information into a spike train leading to two neurons, D and E, in layer 1.

The percentage of active neurotransmitters [y in (5)] in the synapses at neurons D and E is shown in Fig. 3, where plasticity dependent on the ISI can be seen. Every layer 1 synapse has a USE value of 0.05. The synapse associated with neuron D has τrec = 200 ms and τfacil = 915 ms, which results in a higher active state with smaller ISIs, and these values are shown in Fig. 3(a). The synapse associated with neuron E has τrec = 804 ms and τfacil = 4000 ms, causing a higher active state with larger ISIs, as shown in Fig. 3(b).

Fig. 3. Fraction of active neurotransmitters (y) at layer 1 front synapses over a range of 160–200-ms ISIs. (a) Synapse at neuron D. (b) Synapse at neuron E.

Fig. 4. Connectivity from the left and right distance receptors to neurons in layer 1.

The ASE values at these synapses are set to a strength where one presynaptic spike results in one output spike from the postsynaptic neuron if greater than 16.76% of neurotransmitters are utilized. A smaller utilization of synaptic resources requires at least one additional presynaptic spike for a postsynaptic spike to occur. The synaptic parameters consequently result in neuron D being active when the robot is less than 0.9 m from an object [the ISI is less than 178 ms, using (15)], and neuron E is active for distances greater than or equal to 0.9 m (ISI greater than or equal to 178 ms). The value 0.9 m (178-ms ISI) is arbitrarily chosen to be an acceptable distance from the front of the robot to the closest sensed object.

Six neurons in layer 1 receive inputs from two receptors sensitive to the distance between the robot and the closest object on each of its sides. Three neurons are connected to the dleft receptor, and three neurons are connected to the dright receptor. Fig. 4 shows the connectivity from these receptors to neurons labeled A, B, C and F, G, H.

The synapses in the pathways to neurons A and F have τrec = 200 ms and τfacil = 915 ms. The active state with these parameters is shown for ISIs in the range of 160–200 ms in Fig. 5(a), where the synaptic parameters show increased activity with decreasing ISIs. The synapses associated with neurons B and G have τrec = 505 ms and τfacil = 1455 ms. These values were chosen to ensure that the synapses stimulating neurons B and G become maximally active at an ISI of 180 ms, as shown in Fig. 5(b). The synapses in the pathways to neurons C and H have τrec = 826 ms and τfacil = 4000 ms, causing a higher active state with increasing ISIs, as shown in Fig. 5(c).

Fig. 5. Fraction of active neurotransmitters (y) at layer 1 synapses from dleft, dright, and orient receptors over a range of 160–200-ms ISIs. (a) Synapses at neurons A, F, and I. (b) Synapses at neurons B, G, and J. (c) Synapses at neurons C, H, and K.

Fig. 6. Connectivity from the orientation receptor to the three neurons in layer 1.

The ASE values at the side synapses are such that one presynaptic spike results in one output spike from the postsynaptic neuron if the amount of neurotransmitter utilized exceeds 16.728%, and further presynaptic spikes are required for a postsynaptic spike if fewer neurotransmitters are active. The side synaptic parameters consequently result in neurons A and F being active when the respective left and right sides of the robot are less than 0.9 m (ISI less than 178 ms) from an object. Neurons B and G are active for middle distances greater than or equal to 0.9 m and less than or equal to 1.1 m (ISI greater than or equal to 178 ms and less than or equal to 182 ms). Neurons C and H are active for far distances greater than 1.1 m (ISI greater than 182 ms) from an object. Distances greater than or equal to 0.9 m (178-ms ISI) and less than or equal to 1.1 m (182-ms ISI) are arbitrarily chosen to be acceptable proximities from the sides of the robot to the nearest sensed object.

The orientation receptor, orient in Fig. 6, is sensitive to the side of the robot that is closest to an object, and a linear ISI from the receptor is given by

ISI = \begin{cases} 160~\text{ms} & \text{if orient is at the front} \\ 180~\text{ms} & \text{if orient is at the wall-follow side} \\ 200~\text{ms} & \text{if orient is at the back} \end{cases} \qquad (8)

The receptor stimulates a synapse at three neurons labeled I, J, and K, as can be seen in Fig. 6. The synapses at each neuron have the same parameters as each side distance synapse, and these synaptic values activate one of the three neurons for every possible environmental condition. Neuron I is active when the front of the robot is oriented toward the nearest wall. Neuron J is active when the robot is alongside the wall, and neuron K is active when the robot's orientation is facing away from the wall. The ISI values in (8) are chosen to produce the maximum active state y for each of the three synapses, as shown in Fig. 5.

Fig. 7. Layer 1 to layer 2 self-organizing structure.

Fig. 8. Fraction of layer 2 synaptic neurotransmitters that are active (y) over a range of 160–320-ms ISIs.

B. Layer 2: Front and Side Distance Fusion

Layer 2 consists of left and right clusters of neurons. The fusion of two different distance receptors occurs in this layer. The left cluster processes information from the front and left side of the robot, while the right cluster processes information from the front and right side. This mapping is similar to the human nervous system, where information from the front and left of the visual field is passed to the visual cortex in one hemisphere of the brain and information from the front and right is processed in the visual cortex of the other hemisphere. None of the neurons in layer 2 is connected at the beginning of the network's execution.

The connectivity into layer 2 is shown in Fig. 7, where a random neuron from each receptor represents the active layer 1 neurons. When two neurons are active from receptors dleft and dfront that have not previously been active together, these neurons are connected to an unconnected left layer 2 neuron. Likewise, two active neurons from receptors dright and dfront are connected to an unconnected right layer 2 neuron the first time that these layer 1 neurons are active together. This self-organization of the connectivity is how the network learns and retains knowledge regarding its immediate environment.
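A minimal sketch of this wiring rule for the left cluster is shown below; the array sizes and index bookkeeping are illustrative assumptions, since the text does not prescribe a data structure.

```c
/* Self-organizing layer-2 wiring (left cluster): the first time a pair of
   layer-1 neurons from dleft and dfront is active together, the pair is
   connected to a fresh, previously unconnected layer-2 neuron. */
#include <stdbool.h>

#define N_LEFT  3                      /* dleft layer-1 neurons: A, B, C */
#define N_FRONT 2                      /* dfront layer-1 neurons: D, E */

static bool seen[N_LEFT][N_FRONT];     /* has this pair fired together? */
static int  wired[N_LEFT][N_FRONT];    /* layer-2 neuron for the pair */
static int  next_free;                 /* next unconnected layer-2 neuron */

/* Call each time step with the currently active dleft and dfront neurons;
   returns the left layer-2 neuron driven by this input pair. */
int layer2_left(int left_active, int front_active) {
    if (!seen[left_active][front_active]) {
        seen[left_active][front_active] = true;
        wired[left_active][front_active] = next_free++;  /* self-organize */
    }
    return wired[left_active][front_active];
}
```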

The facilitating synaptic parameters associated with layer 2 synapses are USE = 0.03, τrec = 600 ms, and τfacil = 1950 ms. The percentage of active neurotransmitters using these values over a range of ISIs is shown in Fig. 8, where the activity can be seen to be at a maximum value, with little change, when the ISI is greater than or equal to 160 ms and less than or equal to 200 ms. There is a depression in activity when the ISI is greater than 200 ms. These synaptic parameters filter out information from layer 1 neurons that are not active and maintain information where there are two connected and active neurons in layer 1.

Fig. 9. Example layer 2 to layer 3 self-organizing structure.

Fig. 10. Fraction of active synaptic neurotransmitters (y) in layer 3 over a range of 160–320-ms ISIs.

C. Layer 3: Distance and Orientation Fusion

The fusion of two different sensor types (distance and orientation) occurs in layer 3. Neurons in this layer begin without connectivity. Fig. 9 shows connections to layer 3, where a random neuron from the orientation receptor and a random neuron from each side of layer 2 represent the active orientation and layer 2 neurons. Neurons from the orientation receptor and the left layer 2 cluster that are active simultaneously for the first time are connected to an unconnected left layer 3 neuron. Connectivity likewise occurs to an unconnected right layer 3 neuron on the first occasion that a neuron from the orientation receptor and a neuron from the right layer 2 cluster are simultaneously active. This results in a unique layer 3 output firing state for every experienced distance and orientation of the robot to the wall.

Every synapse in layer 3 has the following parameters: USE = 0.05, τrec = 250 ms, and τfacil = 4000 ms. These values produce a greater depression for large ISIs compared to the layer 2 parameters, as can be seen in Fig. 10. The USE in this layer is less than the layer 2 USE, as the percentage of activity is greater; this results in one active layer 3 neuron on the left side and one active layer 3 neuron on the right side with ISIs between 160 and 200 ms.


Fig. 11. Internal decision process output ISIs to two neurons L and M, as shown in Fig. 1.

Fig. 12. Fraction of active neurotransmitters (y) at the decision synapses over a range of 160–200-ms ISIs. (a) Neuron L synapse. (b) Neuron M synapse.

D. Layer 4: Decision Activation

The optimal movement of the robot depends on the task to be performed. Movement to the left brings the robot closer to the wall if the robot is following a wall on its left side, whereas movement to the right brings the robot closer to the wall if the robot is following a wall on its right side. Therefore, given the same sensor input, different movements should be performed for different tasks. It is therefore necessary to activate the cluster of neurons that will implement a decided task and not activate clusters carrying out other tasks. Fig. 1 shows layer 4 containing two clusters of neurons, one for following a wall on the robot's left side and the other for wall following on its right side.

To activate the respective cluster, spikes are sent from an internal decision to two neurons in order to specify the side of the robot on which to wall follow. Fig. 11 shows the process of activating each decision, where both neurons L and M receive spikes at an ISI of 160 ms when the robot's task is to follow the wall on its left side and 200 ms when the task is to follow on its right side.

Note that the internal decision process receives no input. The decision as to which action to perform is made outside of the network. In future work, this decision could be implemented as a judgment by the network (i.e., the wall is closer to one side) or as an instruction such as a voice-activated direction. In the current work, the decision ISI is chosen prior to the SNN's execution.

The short-term synaptic parameters for synapses that stimulate neurons L and M and layer 4 neurons have USE = 0.05. The synapse in the pathway to neuron L has parameters τrec = 250 ms and τfacil = 980 ms, and Fig. 12(a) shows more activity at an ISI of 160 ms than at an ISI of 200 ms. The synapse associated with neuron M has τrec = 820 ms and τfacil = 4000 ms, and Fig. 12(b) shows these parameters to cause more activity with an ISI of 200 ms than with 160 ms. These synaptic parameters result in an activation of neuron L when the decision is to follow a wall on the left side, and neuron M is active when the decision is to follow on the right side.

Fig. 13. Layer 3 to layer 4 self-organizing structure.

Fig. 14. Fraction of active layer 4 neurotransmitters (y) over a range of 160–320 ms.

The neurons in layer 4 begin without connectivity. When a connection is formed between a layer 2 neuron and a layer 3 neuron, the newly connected layer 3 neuron connects to its nearest geographical neighbor in layer 4. A connection is also concurrently formed to the layer 4 neuron from the internal decision neuron stimulating the related cluster of neurons. This is shown in Fig. 13, where layer 3 neurons on the left and right sides are connected to a layer 4 neuron and the decision neurons stimulating each cluster are also connected to the layer 4 neurons. A unique neuron in layer 4 is consequently active for each unique front proximity, side proximity, wall orientation, and task decision.

Synapses stimulating layer 4 neurons from layer 3 neuron outputs have facilitating parameters of τrec = 10 ms and τfacil = 2000 ms. The ASE value at these synapses is such that active inputs greatly increase the postsynaptic potential, and the synaptic parameters provide a significant decrease in activity with rising ISIs, which filters out large-ISI spike trains, as shown in Fig. 14.

The synaptic parameters from the internal decision neurons are the same as those of the synapses into layer 3, and their effect in relation to the presynaptic ISI is shown in Fig. 10. The activity of these synapses increases with increasing ISIs of 160–200 ms, which counters the decreasing activity over the same range of ISIs in the synapses from layer 3 neurons. In every layer 4 synapse, from decision and layer 3 neurons, larger ISIs cause reduced activity, which filters out large-ISI trains.


E. Layer 5: Motor Control

There are three neurons in layer 5, where each controls a different movement: forward, left turn, and right turn. On the first occasion that a layer 4 neuron outputs a spike, that neuron becomes fully connected to every layer 5 neuron. The network then begins to learn which movement to perform when the same layer 4 neuron outputs subsequent spikes. This is achieved by learning what to expect from every movement given the task, distance, and orientation. As a result of the connectivity and the mutual exclusivity of left and right turns, the robot can learn five different movements: forward, forward–left, forward–right, left turn, and right turn.

F. Layer 6: Learning

In the motor areas of the brain, learning is influenced by continuous feedback from sensory receptors [30]. It is stated in [31] that "There is now overwhelming evidence, both from direct measurements of dopaminergic spike trains and from fast electrochemical measurements of dopamine transients, that phasic changes in dopamine delivery carry a reward prediction error signal." Dopaminergic neurons in the midbrain receive sensory inputs and supply dopamine neurotransmitters to nonlocal synapses involved in the control of voluntary movement [32]. When dopamine is released into a synaptic cleft following a spike, it binds to a dopamine receptor and becomes "a neuromodulator that alters the responses of target neurons to other neurotransmitters" [33]. After unbinding from its receptor, dopamine can be carried away into the bloodstream. To maintain a constant level of dopamine, dopaminergic neurons output a baseline amount of the neurotransmitter. Every motor movement has an expected outcome, and if the outcome is better than expected, a larger than baseline amount of dopamine is released, causing long-term potentiation [34]. A worse than expected outcome results in a less than baseline amount of dopamine release, causing long-term depression. The dopaminergic system provides a means of learning the expectation from movements based on sensory feedback after a motor output.

Dopamine influences the value of ASE, as activating the dopaminergic neurotransmitter in synapses results in a higher postsynaptic voltage. An increase in midbrain dopaminergic neuron activity corresponds to rewarding behavior, while a decrease of dopamine is a result of erroneous behavior [32], [35]. The reward and suppression of dopamine have been reported to precisely conform to the behavior of the TD learning rule [31]. Comparable results between the TD model and the phasic activities in midbrain dopamine neurons are shown in [36].

The TD learning rule is based upon the expected value V offuture rewards r and can be expressed as

V_t = r_{t+1} + \sum_{k=t+2}^{\infty} \gamma^{k} r_k \qquad (9)

where γ is a discount rate between zero and one. V is updated at every time step t, where a reward for movement is calculated and future rewards are modified. The TD rule, denoted as δt, is written in [37] as

\delta_t = r(S_t) + \gamma V(S_t) - V(S_{t-1}) \qquad (10)

where S is the state at each t. γ is a discount rate set to 0.99, which has been found to reproduce dopamine neuron outputs [38].

At each t, ASE is modified by

A_{SE}(t+1) = A_{SE}(t) + \eta \delta_t \qquad (11)

where η is the learning rate coefficient, set to 10 nA in these experiments. ASE is strengthened when the reward is greater than the expectation and weakened when the reward is less. There is no modification to ASE when the expectation matches the reward.

Note that the reward r is unique to the learning context. Examples of TD learning in an SNN include robotic grid mapping by Potjans et al. [19], where r is dependent on finding the quickest path to a target square in a 5 × 5 grid. Another example is environmental learning for location-dependent memory by Kubota and Wakisaka [20], where r is calculated from specific emotions occurring to a robot. Long-term synaptic plasticity is implemented in this paper where r is progressively larger as the robot moves closer to a target, which needs to be defined for the purpose of learning to wall follow.

Forward movement brings the robot closer to objects in front of it. It is therefore not ideal for the robot to move forward when the front of the robot is close to a wall. After a forward movement during learning, the robot receives its full reward (r = 1.0) if the front of the robot is far from a wall. Far is arbitrarily set to 0.9 m for the experiments. Closer distances from the front of the robot to an object result in a smaller reward described by

r_{forward} = 0.1 + d_{front} \qquad (12)

The side of the robot should be at the correct alignment with the wall for turning movement. The r for connections from layer 4 neurons to each of the layer 5 neurons causing turns to the left and to the right is dependent on the distance. The robot scans the wall-following side region (dleft or dright in Fig. 18) and the dfront region and uses the closest proximity for the distance. If the robot is too close to the wall, then the robot's ideal orientation is facing away from the wall. If the robot is at an arbitrary optimal distance from the wall, then the robot's wall-following side should be alongside the wall. If the robot is too far from the wall, then the robot should be oriented toward the wall.

Orientation sensors in Fig. 17 are used to calculate r for connections to the turning neurons. When the sensor in closest proximity to a wall matches the ideal orientation, the robot receives its full reward (r = 1.0). Otherwise, r becomes progressively smaller for each sensor that separates the ideal sensor from the closest sensor. This is shown in Fig. 15, where the robot's ideal orientation is for the wall to be on its left side when the robot is learning to wall follow on its left and is at the optimal distance from a wall.


Fig. 15. Orientation values during learning. The values are in the range of zero to seven, and the values shown here are for wall following on the left side when the robot is at the ideal distance from an object.

The sensor with the minimum reading, smin, is given a value from zero to seven, and the reward for the turning neurons is then calculated as

r_{turn} = s_{min}/7 \qquad (13)
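The two reward signals can be summarized in a short sketch, given below under the assumption that dfront is the clamped front distance in meters and smin is the 0–7 orientation value of Fig. 15, with 7 denoting the ideal orientation:

```c
/* Reward signals for learning, per Eqs. (12) and (13). The cap at 1.0
   encodes the full reward granted once the front distance reaches the
   arbitrary 0.9-m "far" threshold (0.1 + 0.9 = 1.0). */
double reward_forward(double dfront_m) {
    double r = 0.1 + dfront_m;          /* Eq. (12) */
    return (r > 1.0) ? 1.0 : r;
}

double reward_turn(int s_min) {
    return (double)s_min / 7.0;         /* Eq. (13): 7 = ideal orientation */
}
```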

After every movement, r is given a value based on the proximity to the forward or turning target, becoming progressively smaller as the robot moves further from the target or larger as the target approaches. The state reward r(S) is in the range of zero to one and is calculated before the first movement and every second thereafter.

V is the summation of every future r, and because a future r has not been experienced, expectations are made based on r(St) and r(St−1). V begins as a constant before the first movement at t = 0 and is subsequently updated after every second according to

V(S_t) = \begin{cases} 1.98 & t = 0 \\ 5r(S_t) - 3r(S_{t-1}) & t = 1 \\ 2r(S_t) - r(S_{t-1}) & t = 2 \\ 0 & t = 3 \end{cases} \qquad (14)

The beginning value of 1.98 was arbitrarily chosen to be a little less than two-thirds of three rewards; however, it can be different. The TD learning rule for ASE modification is given by (10) and begins at t = 1. Up to three calculations are performed for every layer 4 to layer 5 connection. V(S3) is set to zero after the third and final second of learning, as no future expectations remain.
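The following sketch traces one such three-second learning sequence for a single layer 4 to layer 5 connection, combining (10), (11), and (14); the reward values in r[] are illustrative, not taken from the experiments.

```c
/* One three-second learning sequence for a layer-4 to layer-5 synapse,
   per Eqs. (10), (11), and (14). The rewards r[0..3] are assumed to have
   been measured after each second of movement; the values here are
   illustrative only. */
#include <stdio.h>

#define GAMMA 0.99   /* discount rate */
#define ETA   10.0   /* learning rate, in nA */

double value(int t, const double r[]) {           /* Eq. (14) */
    switch (t) {
        case 0:  return 1.98;
        case 1:  return 5.0 * r[1] - 3.0 * r[0];
        case 2:  return 2.0 * r[2] - r[1];
        default: return 0.0;                      /* t = 3: no future */
    }
}

int main(void) {
    double r[4] = {0.5, 0.6, 0.8, 1.0};  /* example rewards */
    double ase = 100.0;                  /* initial ASE */
    for (int t = 1; t <= 3; t++) {
        double delta = r[t] + GAMMA * value(t, r)
                     - value(t - 1, r);  /* Eq. (10) */
        ase += ETA * delta;              /* Eq. (11) */
        printf("t=%d delta=%.4f ASE=%.3f\n", t, delta, ase);
    }
    return 0;
}
```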

The robot learns what to expect from moving in three directions, one direction at a time. There is full connectivity between a newly connected layer 4 neuron and the three layer 5 neurons. However, during the learning process from each layer 4 neuron, only one of the three connections is active, and the synapse at the active connection is subsequently modified after movement. The movements last for 3 s but can be interrupted after each second if the environment changes. This is important for the eligibility trace to properly reward the layer 4 to layer 5 synapse causing a given movement. To explain further, suppose that the robot is too close to the wall and at the same time turning away from the wall. This positions the robot at the target distance and orientation from the wall, and therefore, the robot would be fully rewarded, as this is the correct movement. As the environment has changed, a different layer 4 neuron would fire. Further turning movement leads to a negative reward, as the robot is turning away from its target. To implement the correct eligibility trace, there are 3 s of movement in any unlearned direction unless the environment changes, whereby the learning stops. The learning process continues at a layer 5 synapse that has had its learning interrupted on the next occasion that a spike passes the synapse. The ASE update after every second of learning is described by (11).

During the learning of forward movements and after the ASE updates at t = 3, the connection to the forward neuron is cut where the expectation is less than the original expectation, ASE(t3) < ASE(t0). The connection is otherwise maintained.

Left and right turning movements are mutually exclusive and not collectively exhaustive; either one or neither can occur, but not both. After the learning sequences are complete from a single layer 4 neuron to both the layer 5 left and right neurons, the strengths of the ASE values are compared. Both turning connections are cut where the difference between the ASE values is small, as the two turns cancel each other. Otherwise, the connection with the weaker ASE is cut.
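A sketch of this pruning rule is given below; the cutoff EPS for a "small" difference is a hypothetical placeholder, as the text does not quantify it.

```c
/* Pruning of the two turning connections after both learning sequences
   from one layer-4 neuron complete. EPS is a hypothetical threshold (nA)
   for "the difference between each ASE is small". */
#define EPS 1.0

enum prune { CUT_BOTH, CUT_LEFT, CUT_RIGHT };

enum prune prune_turns(double ase_left, double ase_right) {
    double diff = ase_left - ase_right;
    if (diff > -EPS && diff < EPS) return CUT_BOTH;   /* turns cancel */
    return (diff < 0.0) ? CUT_LEFT : CUT_RIGHT;       /* cut weaker ASE */
}
```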

Learning occurs online for each novel environment. Neurons receiving input from sensor receptors become active over a range of ISIs. Different ISIs within any given range can consequently result in synaptic modification that is not sourced from the global minimum for an active layer 4 to layer 5 connection. However, movement in the wrong direction will always result in a lower ASE than the ASE from movement in the ideal direction, and the range of ISIs activating each neuron attached to a sensor receptor is small enough that local minima are not a problem.

A movement is required after a single spike from a layer 4 neuron so that, after processing the decision of movement, the network can focus on the task of looking for sensor changes. To achieve a layer 5 output spike, depressing synapses are used with standard depressing synaptic parameters [25]. The connectivity between layer 5 neurons and the wheel motors is hard wired, where each of the three neurons moves the robot in the appropriate direction: forward, left turn, and right turn. Movement begins at the firing of a layer 5 neuron (or neurons), and that movement continues until the robot senses a change in the environment that will cause a change in the active layer 1 neurons.

Additional spikes from layer 5 neurons are not required for repeated movement to continue, and this has biological foundations in a central pattern generator (CPG). There are two items in the robot's working memory: processing for a motor decision and searching for changes in the environment. In biological working memory, the focus of attention can contain only one item at any time [39]. Motor processing can be achieved while the mind focuses on other tasks, such as walking while conversing, as a result of the CPG [40], [41]. When the motor region of the brain sends signals to the spinal cord for repetitive movement, that movement is repeated until another signal is sent to the CPG to change the movement, freeing the brain's attention to focus on other items. Similarly, in this SNN, when a motor decision is made, that movement is continued while the task of searching for changes in the environment has the focus of attention. When a change occurs, the focus returns to the SNN for a motor decision based on the present environment.

Fig. 16. Pioneer 3 robot schema with laser and sonar inputs and motor output.

Fig. 17. Pioneer robot's 16 sonar sensors shown with the angle of each sensor.

IV. PIONEER ROBOT

A simulated Pioneer 3 robot, shown in Fig. 16, is used to verify the performance of the self-organizing SNN. The robot provides the SNN with sonar and laser inputs, and the SNN provides the robot with motor actions. A description of these three aspects (sonar, laser, and motor) of the robot is given next to provide an understanding of the SNN's connection to the robot's external environment.

There are two sonar arrays used to ascertain the orientation of the robot to the wall. One array is at the front of the robot, and one is at the rear. Each array contains eight sonar sensors, as shown in Fig. 17. The sensors are positioned around the perimeter of the robot, with each separated by a 20° interval at the front and back and 40° intervals to the side sensors [42]. The 16 sonar sensors surrounding the robot give a rough 360° proximity to the surrounding environment. The sensors return their proximity to objects in the form of a metric distance. The sensor that is closest to the wall reveals the orientation of the wall to the robot, i.e., at the robot's front, at its back, or alongside. Equation (8) uses this information to compute an ISI from the orient receptor. The Pioneer robot also has a laser scanner that is capable of detecting proximities to objects. However, the laser scanner is not able to sense the back half of the robot, which is necessary for judging the robot's orientation to the wall.

Fig. 18. Pioneer robot laser scanning regions dleft, dfront, and dright.

A laser is attached to the top of the robot and is positioned near the front of the robot to facilitate scanning a range of 0°–180° from right to left. As with the sonar, the laser reading returns the metric proximity to an object. There is an advantage to scanning with the laser rather than the sonar in that the sonar sensors' positioning is intermittent and therefore does not provide a full sense of the surrounding area, particularly at close range, while the laser scans every half degree. A more accurate reading of the environment is therefore obtained with the laser, particularly when the robot is close to an object, as in the wall-following environments.

The task of wall following requires knowledge regarding the distance from the wall. The area in front of the robot is arbitrarily divided into three sensor regions that the laser scanner reads, as shown in Fig. 18.

The minimum reading from each region is used to give the SNN the closest proximity from the laser scanner to an object for each of the three regions. The sensor readings represent the distance to an object in meters. The Pioneer robot is sensitive to distances up to 9 m. For the purpose of wall following, distances greater than 2 m do not need to be distinguished, and therefore, the distance is set to 2 m where the sensor reading is greater.

The laser input is converted into a linear ISI spike train by each of the three d receptors using the following mapping:

ISI = 20d + 160 \qquad (15)

resulting in inputs into layer 1 with ISIs in the range of 160–200 ms.
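As a brief sketch, the receptor encoding then reduces to clamping the reading and applying (15); the function name is illustrative:

```c
/* Distance-receptor encoding: clamp the laser reading at 2 m (distances
   beyond 2 m are not distinguished) and map meters to an ISI via Eq. (15),
   giving 160 ms at 0 m and 200 ms at 2 m. */
double distance_to_isi_ms(double d_m) {
    if (d_m > 2.0) d_m = 2.0;
    return 20.0 * d_m + 160.0;
}
```

This mapping places the 0.9-m and 1.1-m decision boundaries of Section III at 178 ms and 182 ms, respectively.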

The robot has three wheels. A nonpowered rotating caster wheel at the rear of the robot supports the weight of the robot and provides balance while enabling movement in every direction. The two other wheels, on the left and right sides of the robot, are driven by motors. Each motorized wheel can be moved in the forward and backward directions at varying speeds, although the speed is constant in this paper. In combination, the two wheels can maneuver the robot to any position on a floor.

V. EXPERIMENTAL RESULTS AND ANALYSIS

The simulated environment consists of three parallel walls perpendicular to one wall. The middle parallel wall is considerably shorter than the others, as can be seen in Fig. 19. The SNN was tested with three experiments. In the first experiment, the SNN was given the task of learning to wall follow on the right side of the robot. The objective of this task was to ascertain whether the robot learned to navigate within the enclosure.

Fig. 19. Robot with a decision to follow a wall with the wall on its right side from the starting point, SP.

The robot was positioned alongside a wall with the wall to the robot's right side. Every distance sense was experienced during the experiment, and connections were formed from layer 1 to six neurons in both the left and right clusters of layer 2. Twenty-three layer 3 neurons formed connections to layer 2, with ten in the left cluster and 13 in the right cluster. Each of these neurons was then connected to its nearest geographical neighbor in layer 4. As the internal decision was to follow on the right side, the two decision neurons received spikes at an ISI of 200 ms, which activated the neuron stimulating the follow-on-right-side cluster of neurons. Thirteen layer 4 neurons in the cluster fired during the experiment and were each fully connected to the layer 5 neurons. Twelve neurons completed learning the expectation from the three movements. The other neuron did not experience its environment enough times to try all three movements. Ten layer 4 neurons connected in the left cluster, and as none of these fired, no connections formed between them and layer 5.

Of the 12 layer 4 neurons that completed learning, two neurons learned to move the robot forward. Four layer 4 neurons learned to turn the robot left, and none learned to turn the robot right. Three neurons learned to move the robot in a forward and left direction, and three neurons learned to move the robot forward and right. Fig. 20 shows the learned ASE distribution on the y-axis from every neuron in the layer 4 follow-on-right-side cluster. The + symbol represents the ASE value at the forward neuron, o represents the ASE at the neuron causing a left turn, and x is the value for the synapses at the right turn neuron. ASE values at cut connections and at connections where learning has not completed are not shown.

Fig. 20. Learned ASE distribution from each of the 13 follow-on-right-side cluster of layer 4 neurons to the layer 5 neurons causing forward (+), left (o), and right (x) movements during experiment 1.

Fig. 21 shows the ASE modification from the eighth layer 4 neuron stimulating the three layer 5 neurons during the learning sequences in experiment 1. The solid line represents the ASE to the forward neuron. The dotted line (highest ASE at learning epoch 1) is the ASE to the right neuron, and the dashed line (lowest ASE at epoch 1) is the ASE to the left neuron. All ASE values are initialized to 100. The movements (forward, left turn, and right turn) are selected in a random order. The layer 4 to forward layer 5 ASE progressed from 100 to 110 and then to 109.9 using the TD learning rule. There was no error at t = 3, as the learning rule had converged on the correct expectation. The connection was maintained after t = 3 because the ASE was not weaker than it had been at t = 0. The ASE to the right neuron progressed using the TD learning rule from 100 to 114.243, then to 109.857, and finally to 108.429. Smaller updates occurred as the robot progressively learned the expectation from turning to its right. The ASE to the left neuron decreased to 97.2429, and the receptor inputs changed enough to stop the learning sequence. The weight did not change again until both left and right turns had completed. When this occurred, the ASE to the left neuron was 11.186 less than the ASE to the right neuron, leading to the left connection being cut. On each occasion that this layer 4 neuron fires after learning, depressing synaptic connections to the forward and right neurons cause postsynaptic potentiation, and the robot consequently moves in a forward–right direction.

Fig. 21. ASE values from every movement over each epoch of learning from a single layer 4 neuron.

Fig. 22. Follow-on-the-right-side proximity in meters over 1500 s. The boundaries of close distance (0.8 m) and far distance (1.2 m) are outlined on the y-axis.

Fig. 23. Self-organized SNN structure after experiment 1.

The proximity of the robot to the wall is shown in Fig. 22 for a duration of 1500 s (25 min). The target distance is shown inside two horizontal lines at 0.8 and 1.2 m. The robot was outside the target distance for the majority of the first 500 s as the process of learning began. Most of the learning had concluded by 800 s. The robot strayed outside the target area twice immediately before 1000 s, and this coincides with the 180° turn in Fig. 19. However, the robot had previously learned the correct action when away from a wall and quickly returned to the target distance. During the final 500 s, the robot did not stray outside the target distance, as the robot had learned the correct movements for each of its different orientations to the wall when at the target distance.

The robot successfully adapted to the new environment and learned to wall follow on its right side. Fig. 19 shows the robot's path as it learned the correct decisions for movement in the first experiment, and Fig. 23 shows the SNN connectivity at the end of the experiment.
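As a rough way to quantify plots like Fig. 22, one can measure the fraction of samples that fall inside the 0.8-1.2 m target band. A small sketch follows, using an illustrative proximity trace rather than the experiment's logged data.

# Fraction of a proximity trace inside the 0.8-1.2 m target band.
# The sample trace below is illustrative, not logged data.
CLOSE, FAR = 0.8, 1.2  # band boundaries in meters

def fraction_in_band(proximity_log):
    hits = sum(CLOSE <= d <= FAR for d in proximity_log)
    return hits / len(proximity_log)

print(fraction_in_band([0.7, 0.9, 1.0, 1.1, 1.3, 1.0]))  # 0.666...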

Fig. 24. Robot with the given decision to wall follow with the wall on its left side.

Fig. 25. Follow-on-the-left-side proximity in meters over 1500 s. The boundaries of close distance (0.8 m) and far distance (1.2 m) are outlined on the y-axis.

The second experiment involved wall following on the left side, and the robot's path is shown in Fig. 24. The connectivity was maintained from the previous experiment. Seven layer 3 neurons were connected to layer 2 neurons during the second experiment; five of these neurons were on the left side, and two were on the right side. With the connectivity added in the first experiment, this results in 30 connected neurons in layers 3 and 4. The internal decision in this experiment was to follow the wall on the left side, and so, the decision neurons received spikes at an ISI of 160 ms. This caused only the left side of layer 4 to become active. Twelve layer 4 neurons completed learning. Three layer 4 neurons fired, but not often enough for the expectation of every movement to be learned, and they consequently remained fully connected. Two neurons connected on the right side of layers 3 and 4 while the robot was following on the left; because only the left cluster was active, they did not fire and remained with no connection to layer 5.
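The routing by ISI (200 ms for follow-on-right, 160 ms here for follow-on-left) relies on facilitating dynamic synapses, whose facilitation builds higher under shorter intervals. A minimal sketch of that effect follows; the exponential-decay model, time constant, and per-spike increment are assumptions, not the paper's model parameters.

import math

# Facilitation decays between spikes and jumps at each spike, so a
# shorter inter-spike interval (ISI) sustains a higher level.
TAU = 0.3        # facilitation decay time constant in seconds (assumed)
INCREMENT = 0.2  # per-spike facilitation jump (assumed)

def facilitation_after(n_spikes, isi):
    f = 0.0
    for _ in range(n_spikes):
        f = f * math.exp(-isi / TAU) + INCREMENT
    return f

# The 160-ms train settles at a higher facilitation level than the
# 200-ms train, which is the cue that selects the left or right
# layer 4 cluster.
print(facilitation_after(10, 0.160))  # ~0.48
print(facilitation_after(10, 0.200))  # ~0.41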

The proximity of the robot to the wall while following on the left side is shown in Fig. 25. The robot was outside the target distance for the majority of the first 400 s as the process of learning to wall follow on the left side began. The robot stayed within the target distance from 400 to 800 s, when it reached the 180° turn shown in Fig. 24. During this turn, the robot twice went outside the target distance; however, it quickly returned using the knowledge it acquired when previously far from the wall. At 1100 s, the robot was very close to a wall at its front and left sides and had to learn the correct movements to return to the target distance and orientation from the wall. The robot did not subsequently stray from its target.


Fig. 26. Learned ASE distribution from each of the 15 neurons in the follow-on-left-side cluster of layer 4 to the layer 5 neurons causing forward (+), left (o), and right (x) movements during experiment 2.

Fig. 26 shows the learned ASE (y-axis) distribution for every layer 4 neuron. Of the 12 layer 4 neurons that completed learning to wall follow on the left side, two neurons learned to move the robot forward. No neurons learned that a left turn has the greatest expectation. However, four layer 4 neurons learned to turn the robot to the right. Three neurons learned to move the robot in a forward and left direction, and three layer 4 neurons learned to move the robot forward and right.

The connectivity of the self-organizing SNN at the end of the second experiment is shown in Fig. 27.

Many researchers in mobile robotics use a cul-de-sac environment as a benchmark experiment for wall-following behavior. The final experiment involved a room containing two paths into cul-de-sacs. The robot's movements were chaotic when it moved toward the first cul-de-sac, as the expectation from every movement had to be learned. The SNN had learned many expectations before the robot reached the second cul-de-sac, and it successfully wall followed into this subsequent cul-de-sac. The path of the robot's movement in this experiment is shown in Fig. 28. It is also evident from this experiment that the robot successfully escaped both cul-de-sacs, which demonstrates the robustness of the proposed control method.

VI. DISCUSSION

The SNN does not distinguish between a wall and any obstacle or other object. The proximity sensors only report that an object is at a particular distance from the robot. Thus, the SNN's approach to an object, such as a box or a table, is the same as its approach to a wall in the experiments, and the robot moves around it on its left or right side.

The room used in the experiment caused the robot to encounter every possible distance range, and this knowledge was learned and retained in the full connectivity of neurons between layer 1 and layer 2. There were 30 unique distances and orientations denoted by the connected neurons in layer 3. This large number of distances and angles is due to the robot placing itself in many different proximities and orientations during the learning process.

The distances that the robot needs to observe for wall following relate to whether the front of the robot is sufficiently far from an object, whether the wall-following side is at an ideal distance, and whether the robot's trajectory is appropriate given its orientation.

Fig. 27. Self-organized SNN structure after experiment 2.

Fig. 28. Robot following a wall into and out of two cul-de-sacs.

This knowledge is passed through only two distance neurons for the front and three neurons each for the side distances and orientation. Under the proposed design, the robot only knows whether an object at the front is close or far; whether the side of the robot is close to, a medium distance from, or far from an object; and whether its orientation is facing, alongside, or away from an object. Adding further neurons to layer 1 will give varying degrees of close, ideal, and far; the robot should then be able to adjust itself more precisely to an object, and its trajectories will not be as crooked as those shown in the experiments.
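A minimal sketch of this coarse receptor coding follows. The side bands borrow the 0.8 and 1.2 m target boundaries from Figs. 22 and 25; the front and orientation thresholds are invented for illustration.

def front_receptor(d):
    # Two front classes only: close or far (threshold assumed).
    return "close" if d < 1.0 else "far"

def side_receptor(d):
    # Three side classes; 0.8/1.2 m taken from the target band.
    if d < 0.8:
        return "close"
    return "medium" if d <= 1.2 else "far"

def orientation_receptor(angle_deg):
    # Facing, alongside, or away from the wall (thresholds assumed).
    if angle_deg < -15:
        return "facing"
    return "alongside" if angle_deg <= 15 else "away"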

The SNN is capable of dealing with noise in the sensor data. Noise introduced to the d or orient receptors can only corrupt the activity in the SNN when the neurons receiving input from the sensors become active at a borderline area of proximity. For example, suppose the robot is too close to a wall and very near the border between the too close and target-distance regions. Noise could cause a neuron representing the target distance to become active and render a too close neuron inactive. However, if this causes the robot to move incorrectly further into the too close region, the following reading will be far enough into that region that noise cannot push it into the target distance. There is also sufficient room within each region that the robot would still be far enough from an object for a layer 5 neuron to move it away. Figs. 19 and 24 show that the robot was able to carry out its tasks successfully without problems from noise, uncertainty, and imprecision.

VII. CONCLUSION

This paper has presented a biologically inspired SNN for robotic control with sensor fusion, dynamic synapses for routing, self-organization, and TD learning. Experiments show that the robot learned correct movements through long-term synaptic modification. The SNN selected the behavior with the greatest expectation using its self-organizing structure of memories of the environments it experienced. Facilitating synapses routed information correctly through the SNN, and the robot successfully learned to wall follow with the wall on each of its sides.

The proposed SNN is able to sense its environment and perform a task using expectations learned from prior experiences in the same environment. Humans are able to take this a step further and recall an exact map of a room without being in the room. This would be a very interesting direction for future work on artificial SNNs. From an exact map of a room, a comparison could be made against current sensor input to ascertain which room a robot is in or whether elements within a room have been moved.

High-level algorithms such as the Kalman filter are able to predict the trajectory of a dynamic object. This is not incorporated into the proposed SNN, as the authors want to implement their work at a low level and the way that the brain makes predictions is not fully understood. In general, for an SNN to accomplish this, a feedback loop may be required to give a temporal dimension to the data, from which a prediction might be made. If this can be accomplished in future work, it would improve the proposed SNN by projecting the movements of dynamic objects and moving the robot to avoid them.
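For reference, the kind of prediction meant here is illustrated below with one predict step of a Kalman filter under an assumed constant-velocity model. This is not part of the proposed SNN; the time step, noise covariance, and state values are hypothetical.

import numpy as np

dt = 0.1                          # time step in seconds (assumed)
F = np.array([[1.0, dt],
              [0.0, 1.0]])        # constant-velocity transition: [position, velocity]
Q = 0.01 * np.eye(2)              # process noise covariance (assumed)

x = np.array([2.0, -0.5])         # object 2 m away, closing at 0.5 m/s (illustrative)
P = np.eye(2)                     # current estimate covariance

x_pred = F @ x                    # predicted state at t + dt
P_pred = F @ P @ F.T + Q          # predicted covariance
print(x_pred)                     # [1.95 -0.5]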

The SNN presented in this paper is shown by the wall-following tests to function under the limited computational resources of software. The proposed SNN is slower than other wall-following applications due to the temporal cost of computing each neuron and the dynamic state of every synapse sequentially. Implementing the SNN in hardware, with every neuron and synapse functioning in parallel (as occurs in the brain), will improve the speed of the computation.

The next item of future work is to port the proposed SNN, as is, onto a field-programmable gate array attached to a real robot.

Following the transfer onto hardware, the SNN can ultimately be used for more demanding applications. Increasing the number of layer 1 neurons in the current network design, as suggested in Section VI, will allow processing of a richer sensor such as a camera, which increases the robot's sensitivity to its environment. It will then be possible for the SNN to learn more advanced behaviors based on object recognition, rather than the distance recognition currently in place. The decision as to which side of the wall to follow could be improved by adding a sense such as sound, whereby the robot performs an action in response to an aural instruction. More motor actions with more precise movement can be implemented with additional output neurons. These additions can use the mechanisms described in this paper, such as short-term plasticity for routing and long-term plasticity for learning. However, increasing the biological realism of the SNN is only recommended in hardware due to the added computational cost.


Eric Nichols was born in Arlington, MA. He received the B.Sc.(Hons.) degree in computer science and the Dipl. in industrial studies, the M.Sc. degree in computing and intelligent systems, and the Ph.D. degree in engineering from the University of Ulster Magee Campus, Derry, U.K., in 2006, 2007, and 2011, respectively.

He was with Symantec Limited, Dublin. He is currently a Consultant with the University of Ulster. His research interests include neural field theory, spiking neural networks, and synaptic plasticity.

Liam J. McDaid received the B.Eng.(Hons.) degree in electrical and electronics engineering in 1985 and the Ph.D. degree in solid-state devices from the University of Liverpool, Liverpool, U.K.

He is currently a Reader with the School of Computing and Intelligent Systems, University of Ulster Magee Campus, Derry, U.K. He is a Guest Editor for a special topic entitled "Biophysically Based Computational Models of Astrocyte–Neuron Coupling and their Functional Significance" to appear in Frontiers in Neuroscience, and he has coauthored over 100 publications in his career to date. He is a Founder Member of the Nanoelectronics Research Group, Intelligent Systems Research Centre, University of Ulster Magee Campus. His main research interest is software/hardware implementations of neural-based computational systems, and he has several research grants in this domain. His ultimate vision is to understand and model the mechanisms that underpin self-repair in the human brain, thus providing the blueprint for advanced architectures that exhibit a fault-tolerant capability well beyond that of existing computational systems.

Nazmul Siddique (M'99–SM'08) received the Dipl.-Ing. degree in cybernetics and automation engineering from Dresden University of Technology, Dresden, Germany, in 1989, the M.Sc.Eng. degree in computer science and engineering from Bangladesh University of Engineering and Technology, Dhaka, Bangladesh, in 1995, and the Ph.D. degree in intelligent control from the Department of Automatic Control and Systems Engineering, The University of Sheffield, Sheffield, U.K., in 2003.

Since 2001, he has been a Lecturer with the School of Computing and Intelligent Systems, University of Ulster Magee Campus, Derry, U.K. Prior to this, he was with the Computer Science and Engineering Discipline, Khulna University, Khulna, Bangladesh. His research interests relate to cybernetics, intelligent systems, computational intelligence, stochastic systems, Markov modeling, and neuro-fuzzy-evolutionary systems. He is an Editor of the Journal of Behavioural Robotics and an Associate Editor of Engineering Letters and is also on the Editorial Advisory Board of the International Journal of Neural Systems.

Dr. Siddique is a member of the executive committee of the IEEE Systems, Man, and Cybernetics Society, United Kingdom and Republic of Ireland chapter, and has been involved in organizing many international conferences. He has published over 110 research papers in the broad areas of intelligent control, computational intelligence, and robotics. He has coauthored seven book chapters and two books to be published by John Wiley and Springer in 2012. He has guest edited five special issues of reputed journals on cybernetic intelligence, computational intelligence, neural networks, and robotics. He has secured funding from the European Union and the Daiwa Anglo-Japanese Foundation for various research projects.