View
215
Download
0
Category
Preview:
Citation preview
+ model ARTICLE IN PRESS
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
PROOFMirror neurons and imitation: A computationally guided review
Erhan Oztop a,b,*, Mitsuo Kawato a,b, Michael Arbib c
a JST-ICORP Computational Brain Project, Kyoto, Japanb ATR, Computational Neuroscience Laboratories, 2-2-2 Hikaridai Seika-cho, Soraku-gun, Kyoto 619-0288, Japan
c Computer Science, Neuroscience and USC Brain Project, University of Southern California, Los Angeles, CA 90089-2520, USA
Abstract
Neurophysiology reveals the properties of individual mirror neurons in the macaque while brain imaging reveals the presence of ‘mirror
systems’ (not individual neurons) in the human. Current conceptual models attribute high level functions such as action understanding, imitation,
and language to mirror neurons. However, only the first of these three functions is well-developed in monkeys. We thus distinguish current
opinions (conceptual models) on mirror neuron function from more detailed computational models. We assess the strengths and weaknesses of
current computational models in addressing the data and speculations on mirror neurons (macaque) and mirror systems (human). In particular, our
mirror neuron system (MNS), mental state inference (MSI) and modular selection and identification for control (MOSAIC) models are analyzed in
more detail. Conceptual models often overlook the computational requirements for posited functions, while too many computational models adopt
the erroneous hypothesis that mirror neurons are interchangeable with imitation ability. Our meta-analysis underlines the gap between conceptual
and computational models and points out the research effort required from both sides to reduce this gap.
q 2006 Published by Elsevier Ltd.
Keywords: Mirror neuron; Action understanding; Imitation; Language; Computational model
T88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
ORREC1. Introduction
Many neurons in the ventral premotor area F5 in macaque
monkeys show activity in correlation with the grasp1 type
being executed (Rizzolatti, 1988). A subpopulation of these
neurons, the mirror neurons (MNs), exhibit multi-modal
properties responding to the observation of goal directed
movements performed by another monkey or an experimenter
(e.g. precision or power grasping) for grasps more or less
congruent with those associated with the motor activity of the
neuron (Gallese, Fadiga, Fogassi, & Rizzolatti, 1996;
Rizzolatti, Fadiga, Gallese, & Fogassi, 1996). The same area
includes auditory mirror neurons (Kohler et al., 2002) that
respond not only to the view but also to the sound of actions
with typical sounds (e.g. breaking a peanut, tearing paper).
The actions associated with mirror neurons in the monkey seem
UNC
0893-6080/$ - see front matter q 2006 Published by Elsevier Ltd.
doi:10.1016/j.neunet.2006.02.002
* Corresponding author. Address: Department of Cognitive Neuroscience,
ATR, Computational Neuroscience Laboratories, 2-2-2 Hikaridai Seika-cho,
Soraku-gun, Kyoto 619-0288, Japan. Tel.: C81 774 95 1215; fax: C81 774 95
1236.
E-mail address: erhan@atr.jp (E. Oztop).1 We restrict our discussion to hand-related neurons; F5 contains mouth-
related neurons as well.
NN 2126—13/3/2006—02:10—RAGHAVAN—203067—XML MODEL 5+ – pp. 1–18
105
106
107
108
109
110
111
112
113
114
ED to be transitive, i.e. to involve action upon an object and apply
even to an object just recently hidden from view (e.g. Umilta
et al., 2001).
It is not possible to find individual mirror neurons in humans
since electrophysiology is only possible in very rare cases and
at specific brain sites in humans. Therefore, one usually talks
about a ‘mirror region’ or a ‘mirror system’ for grasping
identified by brain imaging (PET, fMRI, MEG, etc.). Other
regions of the brain may support mirror systems for other
classes of actions. An increasing number of human brain
mapping studies now refer to a mirror system (although not all
are conclusive). Collectively these data indicate that action
observation activates certain regions involved in the execution
of actions of the same class. However, in contrast to monkeys,
intransitive actions have also been shown to activate motor
regions in humans. The existence of a (transitive and
intransitive) mirror system in the human brain has also been
supported by behavioral experiments illustrating the so-called
‘motor interference’ effect where observation of a movement
degrades the performance of a concurrently executed incon-
gruent movement (Brass, Bekkering, Wohlschlager, & Prinz,
2000; Kilner, Paulignan, & Blakemore, 2003; see also Sauser
& Billard, this issue, for functional models addressing this
phenomenon). Because of the overlapping neural substrate for
action execution and observation in humans as well as other
primates, many researchers have attributed high level cognitive
Neural Networks xx (xxxx) 1–18
www.elsevier.com/locate/neunet
T
E. Oztop et al. / Neural Networks xx (xxxx) 1–182
+ model ARTICLE IN PRESS
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
UNCORRECfunctions to MNs such as imitation (e.g. Carr, Iacoboni,
Dubeau, Mazziotta, & Lenzi, 2003; Miall, 2003), action
understanding (e.g. Umilta et al., 2001), intention attribution
(Iacoboni et al., 2005) and—on the finding of a mirror system
for grasping in or near human Broca’s area—(evolution of)
language (Rizzolatti & Arbib, 1998).2
We stress that, although statements are often made about
mirror neurons in humans, we have data only on what might be
called mirror systems in humans—connected regions that are
active in imaging studies both when the subject observes an
action from some set and executes an action from that set, but
not during an appropriate set of control tasks. Brain imaging
results show that mirror regions in human may be associated
with imitation and language (Carr et al., 2003; Fadiga,
Craighero, Buccino, & Rizzolatti, 2002; Iacoboni et al.,
1999; Skipper, Nusbaum, & Small, 2005), but there are no
corresponding data on mirror neurons. Moreover, monkeys do
not imitate (but see below) or learn language and so any
account of the role of mirror neurons in imitation and language
must include an account of the evolution of the human mirror
system (Rizzolatti & Arbib, 1998) or at least the biological
triggers that can unleash in monkeys a rudimentary imitation
capability that goes beyond those they normally exhibit, though
still being quite limited compared to those of humans
(Kumashiro et al., 2003). We thus argue that imitation and
language are not inherent in a macaque-like mirror system but
instead depend on the embedding of circuitry homologous to
that of the macaque in more extended systems within the
human brain.
A general pitfall in conceptual modeling is that an innocent
looking phrase thrown in the description may render the model
implausible or trivial from a computational perspective, hiding
the real difficulty of the problem. For example, terms like
‘direct matching’ and ‘resonance’ are used as if they were
atomic processes that allow one to build hypotheses about
higher cognitive functions of mirror neurons (Gallese, Keysers,
& Rizzolatti, 2004; Rizzolatti, Fogassi, & Gallese, 2001). One
must explain the cortical mechanisms which support the
several processing stages that transform retinal stimulation
caused by an action observation into the mirror neuron
responses. Another issue is to clarify what is encoded by the
mirror neuron activity. Is it the motor command, the meaning
or the intention of the observed action? In an attempt to explain
the multiplicity of functions attributed to mirror neurons, it has
been recently speculated that different set of mirror neurons are
involved in different aspects of the observed action (Rizzolatti,
2005).
We will review various ‘conceptual models’ that pay little
attention to this crucial reservation, and then review several
computational models in some detail: a learning architecture
with parametric biases (Tani, Ito, & Sugita, 2004); a genetic
algorithm model which develops networks for imitation while
2 Recent reviews of the mirror neuron and mirror system literature are
provided by Buccino, Binkofski, and Riggio (2004), Fadiga and Craighero
(2004) and Rizzolatti and Craighero (2004).
NN 2126—13/3/2006—02:10—RAGHAVAN—203067—XML MODEL 5+ – pp. 1–18
ED PROOF
yielding mirror neurons as a by-product of the evolutionary
process (Borenstein & Ruppin, 2005); the mirror neuron
system (MNS) model that can learn to ‘mirror’ via self-
observation of grasp actions (Oztop & Arbib, 2002) and is
closely linked to macaque behavior and (somewhat more
loosely) neurophysiology; and models which are not restricted
in this fashion: the mental state inference (MSI) model that
builds on the forward model hypothesis of mirror neurons
(Oztop, Wolpert, & Kawato, 2005), the modular selection and
identification for control (MOSAIC) model that utilizes
multiple predictor–controller pairs (Haruno, Wolpert, &
Kawato, 2001; Wolpert & Kawato, 1998), and the imitation
architecture of Demiris and Hayes (2002) and Demiris and
Johnson (2003).
2. Mirror neurons and action understanding
Mirror neurons, when initially discovered in macaques,
were thought to be involved in action recognition (Fogassi
et al., 1992; Gallese et al., 1996; Rizzolatti et al., 1996) though
this laid the basis for later work ascribing a role in imitation to
the human mirror system. Although the term ‘action under-
standing’ was often used, the exact meaning of ‘understanding’
as used here is not clear. It can range from ‘act according to
what you see’, to ‘infer the intentions/mental states leading to
the observed action’. In fact, the neurophysiological data
simply show that a mirror neuron fires both when the monkey
executes a certain action and when he observes more or less
congruent actions. In these experiments, he is given no
opportunity to show by his behavior that he understands the
action in either of the above senses.
Gallese and Goldman (1998) suggested that the purpose of
MNs is to enable an organism to detect certain mental states of
observed conspecifics via mental simulation. According to this
view, mirror neurons could be the precursor of mind-reading
ability, being compatible with the simulation theory
hypothesis.3 Again, this involves considerable extrapolation
beyond the available data. In particular, mind-reading might
involve a quite separate mirror system for facial expression as
much as a mirror system for manual actions. Although the
suggestion has achieved some positive reception, no details
have been provided on how this could be implemented as a
computational model. (The MSI model does address this issue,
see below).
In spite of the computational differences between recog-
nition (and imitation) of facial gestures and recognition
(imitation) of hand actions, many cognitive neuroscientists
address them both under a ‘generic mirror system’. The insula
has been found to be a common face-emotion region for both
production and understanding of facial gestures and emotions
(Carr et al., 2003; Wicker et al., 2003). Although it is tempting
to consequently take the insula as formed by instantiating
3 Two predominant accounts of mindreading exist in the literature. ‘Theory
theory’ asserts that mental states are represented as inferred conjectures of a
naive theory whereas according to ‘simulation theory’, mental states of others
are represented by representing their states in terms of one’s own.
225
226
227
228
T
E. Oztop et al. / Neural Networks xx (xxxx) 1–18 3
+ model ARTICLE IN PRESS
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
UNCORREC
an F5-like mirror system for emotion processing, there are
crucial differences in how these circuits develop in infancy
which shed doubt on the idea of a generic neural mirror
mechanism to unify social cognition (Gallese et al., 2004).
Manual actions can be compared visually and vocalizations
can be compared auditorially—the commonality being the
matching of an observed action with the output from an internal
motor representation through a comparison in the same
domain. Unlike hand actions, one’s own facial gestures can
only be seen with the help of a reflective material or otherwise
must be inferred. In fact, infants may learn much about their
own facial expressions from the propensity of caregivers to
imitate the child. We argue that the learning mechanism of
facial imitation is different from hand imitation, which
involves learning via social interaction of the kinds we
suggested elsewhere (Oztop et al., 2005):
You eat A/you have face expression X (visual).
I eat A/I feel disgust Y (internal state).
Therefore, X (visual) must be Y (feeling of disgust).
The elaboration of this mechanism in the brain is beyond the
scope of this review. However, the important message here is
that manual action understanding and facial emotion under-
standing pose different problems to the primate brain, which
might have found solutions in different organizational
principles.
3. Mirror neurons and imitation
Learning by imitation is an important part of human motor
behavior, which requires complex set of mechanisms (Schaal,
Ijspeert, & Billard, 2003). Wolpert, Doya, and Kawato (2003)
underline some of those as (i) mapping the sensory variables
into corresponding motor variables, (ii) compensation for the
physical difference of the imitator from the demonstrator, and
(iii) understanding the intention (goal) causing the observed
movement. Many cognitive neuroscientists view imitation as
mediated by mirror neurons in humans. Although this is a
plausible hypothesis, we stress again that this is not within the
normal repertoire of the macaque mirror system (but note the
discussion below of Kumashiro et al., 2003) and so must—if
true—rest on evolutionary developments in the mirror system.
It should also be emphasized that there is a considerable
amount of literature addressing imitation without explicit
reference to mirror neurons (e.g. see Byrne & Russon, 1998).
Since most mirror neurons are found in motor areas it is
reasonable to envision a motor control role for mirror neurons.
One possibility is that these neurons implement an internal
model for control. Current evidence suggests that the central
nervous system uses internal models for movement planning,
control, and learning (Kawato & Wolpert, 1998; Wolpert &
Kawato, 1998). A forward model is one that predicts the
sensory consequences of a motor command (Miall & Wolpert,
1996; Wolpert, Ghahramani, & Flanagan, 2001); while an
inverse model transforms a desired sensory state into a motor
command that can achieve it. The proposal of Arbib and
Rizzolatti (1997) that mirror neurons may be involved in
inverse modeling plays a central role in recent hypotheses
NN 2126—13/3/2006—02:10—RAGHAVAN—203067—XML MODEL 5+ – pp. 1–18
ED PROOF
about the neural mechanisms of imitation, which were
accelerated by the increasing number of functional brain
imaging studies.
One recent proposal is that the mirror neurons may provide
not only an inverse model but a forward model of the body
which can generate action candidates in the superior temporal
sulcus (STS) where neurons have been found with selectivity
for biological movement (e.g. of arms, whole body) (Carr et al.,
2003; Iacoboni et al., 1999). The idea is that STS acts as a
comparator that can be used within a search mechanism that
finds the mirror neuron-projected action code that matches the
observed action best that code is in turn used for subsequent
imitation. It is further suggested that the STS-F5 circuit can be
run in the reverse direction (inverse modeling) to map the
observed action into motor codes (mirror activity) so that a
rough motor representation of the observed act becomes
available for imitation. According to this hypothesis, the F5-
STS circuitry must be capable of producing detailed visual
representation of the self-actions. From a computational point
of view, if we accept that an observed act can be transformed
into motor codes or if we accept the availability of an elaborate
motor/visual forward model then imitation becomes trivial.
However, the above conceptual model limits imitation to
actions already in the observer’s repertoire. It is one thing to
recognize a familiar action and quite another to see a novel
action and consequently add it to one’s repertoire. Another
concern is that the mirror neurons found with electrophysio-
logical recordings in monkeys are limited to goal directed
actions. Indeed, behavioral studies show that chimpanzees
cannot handle imitation tasks which do not involve any target
objects (Myowa-Yamakoshi & Matsuzawa, 1999). In their
efforts to release rudimentary imitation capability in monkeys
(Kumashiro et al., 2003) used four tests based on objects
(cotton-separation, knob-touching, latched box opening and
removing the lid from a conical tube) and (with highly variable
success) three tests based on facilitating use of a specific
effector (tongue-protrusion model, hand-clench and thumb-
extension) and two based on the movement of a hand relative to
the body (hand to nose, hand clap, and hand to ear).
Miall (2003) suggested amending the conceptual model of
Iacoboni et al. by including the cerebellum. He proposed that
the forward and inverse computations required can be carried
out by the cerebellum and PPC (posterior parietal cortex). The
cerebellum has often been considered the likely candidate for
(forward and inverse) internal models (Kawato & Gomi, 1992;
Wolpert, Miall, & Kawato, 1998), but an alternative view
(related to Miall’s) is that cerebellar models act in parallel with
models implemented in cerebral cortex, rather than replacing
them (Arbib, Erdi, & Szentagothai, 1998). However, the
computational problem is still there: how the retinal image is
transformed to motor commands in a precise way (inverse
problem), and how a precise visual description of one’s own
body is mentally produced given a motor command (forward
problem). For inverse and forward models whose sensory data
are limited to the visual domain, the problems are quite severe
when the whole body is considered because one cannot
completely observe all of one’s own body. However,
E. Oztop et al. / Neural Networks xx (xxxx) 1–184
+ model ARTICLE IN PRESS
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
observation of the hand in action is possible, enabling forward
and inverse model learning. Presumably, the brain does not rely
on visual data alone, but integrates it with proprioceptive cues.
Although inverse learning is harder than forward learning, in
general this does not pose a huge problem, and by using certain
invariant representations in the visual domain, the learned
internal models can be applied to other individuals’
movements.4
T
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
UNCORREC4. Mirror neurons and language
Rizzolatti and Arbib (1998) built on evidence that macaque
F5 is homologous to human Broca’s area (an area often
associated with speech production) and that human brain
imaging reveals a mirror system for grasping in or around
Broca’s area to propose that brain mechanisms supporting
language in humans evolved atop a primitive mirror neuron
system similar to that found in monkeys. According to their
mirror system hypothesis, human Broca’s area’s mirror system
properties provide the evolutionary basis for language parity
(the approximate meaning-equivalence of an utterance for both
speaker and hearer). Arbib (2002, 2005) has expanded the
hypothesis into seven evolutionary stages:
S1: Simple grasping,
S2: A mirror system for grasping,
S3: A simple imitation system for grasping,
S4: A complex imitation system for grasping,
S5: Protosign, a hand-based communication system,
S6: Protospeech, a vocalization-based communication
system,
S7: Language, which required little or no biological evolution
beyond S6, but which resulted from cultural evolution in
Homo sapiens.
We do not have space here to comment on all the above
steps but one quite distinctive feature of the analysis of stage
S2 deserves attention. It asks the question ‘why do mirror
neurons exist?’ and answers ‘because mirror neurons are
(originally) involved in motor control’. These neurons are
located in the premotor cortex, interleaved with other motor
neurons, where distal hand movements are controlled. Indeed,
one may argue that the visual feedback required for manual
dexterity—based on observation of the relation of hand to goal
object—provided mechanisms that were exapted in primate
and hominid evolution first for action recognition and then for
imitation.
Although it is a daunting task to computationally realize this
evolutionary model in its complete form (an attempt not
without its critics—see the commentaries pro and con in Arbib
(2005)), the transitions from one stage to the next can be
potentially studied from a computational perspective.
4 In general it is not possible to infer the full dynamics from a kinematics
observation, but we may assume some approximate solution that yields similar
kinematics when applied by the observer.
NN 2126—13/3/2006—02:10—RAGHAVAN—203067—XML MODEL 5+ – pp. 1–18
ED PROOF
5. Computational models involving MNs
We still lack a systematic neurophysiological study that
correlates mirror neuron activity with the kinematics of the
monkey or the demonstrator which would allow the compu-
tational modeler to test ideas about such correlations. More-
over, if (as we believe) the mirror properties of these neurons—
as distinct from the conditions which make their acquisition
possible—are not innate, then the study of the developmental
course of these neurons and their function would test
computational models of development that could help us better
understand the functional role of MNs.
A general but wrong assumption in many computational
studies of imitation is that mirror neurons are responsible for
generating actions (and even sometimes that area F5 is
composed only of mirror neurons). Indeed, F5 can be
anatomically subdivided into two distinct regions, one containing
the mirror neurons, and one containing canonical neurons. The
latter are like mirror neurons in their motor properties, but do
not respond to action observation. Muscimol injections
(muscimol causes reversible neural inactivation) to the part
of area F5 that includes mirror neurons do not impair grasping
and control ability, but causes only a slowing down of the
action (Fogassi et al., 2001).5 However, when the area that
includes the canonical neurons is the target of injection, the
hand shaping preceding grasping is impaired and the hand
posture is not appropriate for the object size and shape (Fogassi
et al., 2001).
In the following sections, we will review various
computational studies that relate (often at some conceptual
remove) to mirror neurons. Unfortunately, most of the
modeling is targeted at imitation. Only one, the MNS model
of Oztop and Arbib (2002) directly claims to be a model for
mirror neurons (although it does not provide computational
modules for motor control).
6. A dynamical system approach
The first model we review, due to Jun Tani et al. (2004), is
aimed at learning, imitation and autonomous behavior
generation. The proposed network is a generative learning
architecture called recurrent neural network with parametric
biases (RNNPB). In this architecture, the spatio-temporal
patterns are associated with so-called parametric bias vectors
(PB). RNNPB self-organizes the mapping between PBs and the
spatio-temporal patterns (behaviors) during the learning phase.
From a functional point of view the goal of RNNPB is similar
to dynamical movement primitive learning (Ijspeert, Naka-
nishi, & Schaal, 2003; Schaal, Peters, Nakanishi, & Ijspeert,
2004) in that the behaviors are learned as dynamical systems.
However, in the dynamical movement primitive approach,
dynamical systems are not constructed from scratch as
5 This may at first appear inconsistent with the view that mirror neurons may
assist the feedback control for dexterous movements, but note that the
muscimol studies were only carried out for highly familiar grasps whose
successful completion would require little if any visual feedback.
453
454
455
456
Hidden Layer
Sensory Motor (input) Parametric Bias Vector Context Units (input)
Sensory Motor (prediction) Context Units (output)
w
W
c(t+1)
c(t)
Sensory Motor ( t+1)
Sensory Motor (t)
E. Oztop et al. / Neural Networks xx (xxxx) 1–18 5
+ model ARTICLE IN PRESS
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
in RNNPB; but rather (core) primitive dynamical systems are
adapted to match the demonstrated movements using local
learning techniques (Ijspeert et al., 2003; Schaal et al., 2004).
In this article, we focus on RNNPB as the representative of
dynamical system approaches since the authors have already
hinted that RNNPB captures some properties of the mirror
neurons (Tani et al., 2004). The RNNPB has three operational
modes; we review each of them starting from the learning
mode.
PB valueexternally setFig. 2. The RNNPB in behavior generation mode. Given a fixed PB vector (thick
arrow) the network produces the corresponding stored sensory–motor stream.
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
6.1. Learning mode
The learning is performed in an off-line fashion by
providing the sensory-motor training stimuli (e.g. two
trajectories—one for the position of a moving hand and the
other for the joint angles of the arm) for each behavior in the
training set. The goal of the training is twofold (see Fig. 1): (1)
to adapt weight sets w and W such that the network becomes a
time series predictor for the sensory-motor stimuli, and (2) to
create PB vectors for each training behavior. Both adaptations
are based on the prediction error; the weights w and W are
adapted by back-propagation over all the training patterns as in
a usual recurrent neural network. The PB vectors, however, are
updated separately for each training pattern for reducing the
prediction error. Furthermore, the modulation of PB vectors are
kept slow to obtain a fixed PB vector for each learnt behavior.
TE
540
541
542
543
544
545
546
547
548
549
550
551
552
C
6.2. Action generation mode
After learning, the model represents a set of behaviors as
dynamical systems tagged by the PB vectors created during the
learning phase. The behaviors are generated via the associated
PB vectors. Given a fixed PB vector, the network autono-
mously produces a sensory–motor stream corresponding to the
behavior associated with the PB vector (see Fig. 2). Note that
the behavior generation mode of RNNPB requires the sensory–
motor prediction to be fed back into the sensory–motor input as
shown in Fig. 2.
UNCORRE
Hidden Layer
Sensory Motor (input) Parametric Bias Vector Context Units (input)
Sensory Motor (prediction) Context Units (output)
Sensory Motor (t)(Training Data)
Sensory Motor ( t+1)(Training Data) –
+ Prediction Error (Used to updateParametric Bias Vector and w & W)
w
W
c(t+1)
c(t)
PB values for thesensory-motor data
Fig. 1. The RNNPB network in learning phase. The weights W and w are
adapted so as to reduce the prediction error over all the behaviors to be learnt
(temporal sensory–motor patterns). The PB vectors are also updated to reduce
the prediction error, albeit separately for each behavior and at a slower rate to
ensure representative PB values for each behavior.
NN 2126—13/3/2006—02:10—RAGHAVAN—203067—XML MODEL 5+ – pp. 1–18
D PROOF
6.3. Action recognition mode
The task of the network in this mode is to observe an
ongoing behavior (sensory data) and compute a PB vector that
is associated with a behavior that matches the observed one as
much as possible. The arrival of a sensory input generates a
prediction of the next sensory stimulus at the output layer.
Then the actual next sensory input is compared with this
prediction creating a prediction error (see Fig. 3). The
prediction error is back-propagated to the PB vectors (i.e. PB
vectors are updated such that the prediction error is reduced).
The actual computation of the PB vectors is performed using
the so-called regression window of the past steps so that the
change of PB vectors can be constrained to be smooth over the
window. The readers are referred to Tani et al. (2004) for
further details of this step. If sensory input matches one of the
learned behaviors, the PB vectors tend to converge to the
values determined during learning (Tani et al., 2004). Note that
in this mode, the feedback from sensory–motor output to
sensory-motor input is restricted only to the motor component.
6.4. Relation to MNs and imitation
The model has been shown to allow a humanoid robot
to imitate and learn actions via demonstration. The logical link
Hidden Layer
Sensory Motor (input) Parametric Bias Vector Context Units (input)
Sensory Motor (prediction) Context Units (output)
Sensory Data (t)
Sensory Data (t+1) –
+ Prediction Error (Used to updateParametric Bias Vector)
MotorPrediction
c(t+1)
c(t)
PB valuecorresponding to thesensory observation
Fig. 3. The RNNPB in behavior recognition mode. In the recognition mode the
sensory input is obtained from external observation (thick arrows). The
feedback from sensory–motor output to sensory–motor input is restricted only
to the motor component. The prediction error is used to compute the parametric
bias vector corresponding to the incoming sensory data.
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
E. Oztop et al. / Neural Networks xx (xxxx) 1–186
+ model ARTICLE IN PRESS
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
to mirror neurons comes from the fact that the system works as
both a behavior recognizer and generator after learning. PB
units are tightly linked with the behavior being executed or
observed. During execution, a fixed PB vector selects one of
the stored motor patterns. For recognition PB unit outputs
iteratively converge to the action observed. Although mirror
neurons do not determine the action to be executed in monkeys
(Fogassi et al., 2001), the firing patterns of mirror neurons are
correlated with the action being executed. Thus, PB vector
units may be considered analogous to mirror neurons. Ito and
Tani (2004) suggest that the PB units’ activities should be
under control of a higher mechanism to avoid unwanted
imitation, such as for dangerous movements. One prediction, or
rather a question posed to neurophysiology, is what happens
when a dangerous movement is observed by a monkey.
Although it is known that (initially) F5 mirror neurons do not
respond to unfamiliar actions, no data on the parietal mirror
neurons exists to rule out this possibility. However, we again
emphasize that mirror response does not automatically involve
movement imitation and so it is unlikely that the monkey
mirror neuron system is ‘an inhibited imitation system’; rather
imitation ability must have been developed on top of the mirror
neuron system along the course of primate evolution.
T
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
REC7. Motor learning and imitation: a modular architecture
The RNNBP of the previous section represents multiple
behaviors as a distributed code. Demiris et al. (2002, 2003)
chose the opposite approach representing each behavior as a
separate module in their proposed imitation system. Following
the organization of the MOSAIC model (Wolpert & Kawato,
1998) (see Section 12), the key structure of the proposed
architecture is formed by a battery of behaviors (modules)
paired with forward models, where each behavior module
receives information about the current state (and possibly the
target goal), and outputs the motor commands that is necessary
to achieve the associated behavior (see Fig. 4). A forward
model receives output of the paired behavior module and
estimates the next state which is fed back to the behavior
module for parameter adjustments. A behavior is similar to an
‘inverse model’, although inverse models do not usually utilize
UNCORBehavior 1 ForwardModel 1
–+Current State
Current State
Next State
Next State
MotorOutput
Behavior N ForwardModel N
–+
next stateprediction
next stateprediction
MotorOutput
Fig. 4. The imitation architecture proposed by Demiris et al. (2002, 2003) is
composed of a set of paired behavior and forward models. During imitation
mode, the comparison of the predicted next state (of the demonstrator) with the
actual observed state gives an indication of which behavior module should be
active for the correct imitation of the observed action.
NN 2126—13/3/2006—02:10—RAGHAVAN—203067—XML MODEL 5+ – pp. 1–18
D PROOF
feedback, but output commands in a feed-forward manner.
However, the boundary between behavior and inverse model is
not a rigid one (Demiris & Hayes, 2002).
The architecture implements imitation by assuming that the
demonstrator’s current state (e.g. joint angles of a robot) is
available to it. When the demonstrator executes a behavior, the
perceived states are fed into the imitator’s available behavior
modules in parallel which generate motor commands that are
sent to the forward models. The forward models predict the
next state based on the incoming motor commands, which are
then compared with the actual demonstrator’s state at the next
time step. The error signal resulting from this comparison is
used to derive a confidence value for each behavior (module).
The behavior with the highest confidence value (i.e. the one
that best matches the demonstrator’s behavior) is selected for
imitation. When an observed behavior is not in the existing
repertoire, none of the existing behaviors reach a high
confidence value, thus indicating that a new behavior should
be added to the existing behavior set. This is achieved by
extracting representative postures while the unknown behavior
is demonstrated, and constructing a behavior module (e.g. a
PID controller) to go through the representative postures
extracted. This computational procedure to estimate the other
agent’s behavioral module by simulating one’s own forward
model and controller is essentially identical to the proposal by
Doya, Katagiri, Wolpert, and Kawato (2000). Demiris and
Simmons (this issue) describe a hierarchical architecture that
employs similar principles at its core. Although at a conceptual
level, this architecture has strong parallels with the MOSAIC
model (Haruno et al., 2001; Wolpert & Kawato, 1998),
MOSAIC takes learning and control as the core focus by
providing explicit learning mechanisms (see Section 12).
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
E7.1. Relation to MNs and imitation
The architecture can be related to mirror neurons because
the behavior modules are active during both movement
generation and observation. However, all the modules are run
in parallel in the proposed architecture, so it is more reasonable
to take the confidence values as the mirror neuron responses.
Demiris et al. (2002, 2003) arrived at several predictions about
mirror neurons albeit considering ‘imitation-ability’ and
‘mirror neuron activity’ interchangeable, while we think they
must be analyzed separately. One interesting prediction which
has also been predicted by the MNS model (Oztop & Arbib,
2002) (see Section 10) is the following. ‘A mirror neuron
which is active during the demonstration of an action should
not be active (or possibly be less active) if the demonstration is
done at speeds unattainable by the monkey’. A further
prediction states that ‘mirror neurons that remain active for a
period of time after the end of the demonstration are encoding
more complex sequences that incorporate the demonstration as
their first part’ (Demiris & Hayes, 2002). The other
predictions—implied by the structure of the architecture—are
‘the existence of other goal directed mirror neurons and the
trainability of new mirror neurons’.
T
Motor Code
Others (Vestibular,Auditory etc.)
Vision
Somatosensory
Fig. 5. A generic associative memory for an agent (a robot or an organism).
When an agent generates a movement using motor code, the sensed stimuli are
associated with this code. At a later time a partial representation of associated
stimuli (e.g. vision) can be used to retrieve the whole (including the motor
code). The connectivity among the units representing different modalities could
be full or sparse.
E. Oztop et al. / Neural Networks xx (xxxx) 1–18 7
+ model ARTICLE IN PRESS
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
UNCORREC
8. An evolutionary approach
The evolutionary algorithms that Borenstein and Ruppin
(2005) used to explain mirror neurons and imitation are quite
different from the previous approaches. Evolutionary algor-
ithms incorporate aspects of natural selection (survival of the
fittest) to solve an optimization problem. An evolutionary
algorithm maintains a population of structures (‘individuals’)
that evolves according to rules of selection, recombination,
mutation and survival. A shared ‘world’ determines the fitness
or performance of each individual and identifies the optimiz-
ation problem. Each ‘generation’ is composed of fitter
individuals and their variants, while fewer not-fit individuals
and their variants are allowed to reproduce their traits. After
many generations one expects to find a set of high performing
individuals which represent close-to-optimal solutions to the
original problem. Within this framework, Borenstein and
Ruppin (2005) defined individuals as simple neuro-controllers
that could sense the state of the world and the action of a
teaching agent (inputs) and generate actions (outputs).
Individuals generated output with a simple 1-hidden-layer
feedforward neural network. The fitness of an individual was
defined as a random mapping from world state to actions,
which was kept fixed for the lifetime of an individual. The
evolutionary encoding (genes) determined the properties of
individual network connections (synapses): type of learning,
initial strength, either inhibitory or excitatory type, and rate of
plasticity.
8.1. Relation to MNs and imitation
The simulation with populations of 200 individuals
evolving for 2000 generations showed interesting results. The
best individuals developed neural controllers that could learn to
imitate the teacher. Furthermore, the analysis of the units in the
hidden layer of these neuro-controllers revealed units which
were active both when observing the teacher and when
executing the correct action, although not all the actions
were mirrored. The conclusion drawn was that there is an
‘essential link between the ability to imitate and a mirror
system’. Despite the interest of the demonstration that evolving
neural circuitry to imitate yields something like mirror neurons
as a byproduct, we note again that even though monkeys have
mirror neurons they are not natural imitators. Thus, the
‘evolution’ of mirror systems on the basis of ‘evolutionary
pressure’ to imitate does not seem to capture the time course of
primate evolution.
9. Associative memory hypothesis of mirror neurons
In this section, we avoid choosing a single architecture for
extensive review since the core mechanism employed in all the
candidate models relies on a very simple principle deriving
from the classical view of Hebbian synaptic plasticity in the
cerebral cortex. Implementation of this view results in
connectionist architectures referred as associative or content
addressable memories (Hassoun, 1993), to which the models
NN 2126—13/3/2006—02:10—RAGHAVAN—203067—XML MODEL 5+ – pp. 1–18
ED PROOFreviewed in this section more or less conform. The crucial
feature of an associative memory is that a partial representation
of a stored pattern is sufficient to reconstruct the whole. In
general a neural network which does not distinguish input and
output channels can be considered as an associative memory
(with possibly hidden units). Fig. 5 schematizes a possible
association that can be established when a biological or an
artificial agent acts. The association can take place among the
motor code, the somatosensory, vestibular, auditory and visual
stimuli sensed when the movement takes place with the
execution of the motor code. If we hypothesize that mirror
neurons are part of a similar mechanism then the mirror neuron
responses could be explained: when the organism generates
motor commands the representation of this command and the
sensed (somatosensory, visual and auditory) effects of the
command are associated within the mirror neuron system. Then
at a later time when the system is presented with a stimulus that
partially matches one of the stored patterns (i.e. vision or
audition of an action alone) the associated motor command
representation is retrieved automatically. This representation
can be used (with additional circuitry) to mimic the observed
movement. This line of thought has been explored through
robotic implementations of imitation using a range of
associative memory architectures. Elshaw, Weber, Zochios,
and Wermter (2004) implemented an associator network based
on the Helmholtz machine (Dayan, Hinton, Neal, & Zemel,
1995) where the motor action codes were associated with
vision and language representations. The learned association
enabled neurons of the hidden layer of the network to behave as
mirror neurons; the hidden units could become active with one
of motor, vision or language inputs. Kuniyoshi, Yorozu, Inaba,
and Inoue (2003) used a spatio-temporal associative memory
called the ‘non-monotone neural net’ (Morita, 1996) to
associate self generated arm movements of a robot with the
local visual flow generated. Billard and Mataric (2001) used
the DRAMA architecture (Billard & Hayes, 1999), which is a
time-delay recurrent neural network with Hebbian update
E. Oztop et al. / Neural Networks xx (xxxx) 1–188
+ model ARTICLE IN PRESS
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
dynamics at the core of their biologically inspired imitation
architecture. Oztop, Chaminade, Cheng, and Kawato (2005)
used an extension of a Hopfield net utilizing product terms to
implement a hand posture imitation system using a robotic
hand.
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
9.1. Relation to MNs and imitation
In spite of the differences in the implementation, the
common property among the aforementioned associative
memory models is the multi-modal activation of the associative
memory/network units. Thus, when these models are
considered as models of mirror neurons (note that not all
models claim to be models for mirror neurons, as the main
focus is on imitation) then the explanation of the existence of
mirror neurons becomes phenomenological rather than
functional (see Section 13). For example, the models of
Kuniyoshi et al. (2003) and Oztop et al. (2005) use self-
observation as the principle for bootstrapping imitation and
formation of units that respond to self-actions and observations
of others. We refer to this type approach as ASSOC so that we
can collectively refer to these models in Section 13, where we
propose a taxonomy based on the modeling methodology.
T
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
REC
10. Mirror neuron system (MNS) model:
a developmental view
The models reviewed so far related to the mirror neurons
indirectly, through imitation. Here, we present a computational
model with anatomically justified connectivity, which directly
explores how MNs develop during infancy. It is quite unlikely
that the MNs are innate because mirror neurons have been
observed for tearing paper, for instance (Kohler et al., 2002).
With this observation, the MNS model (Oztop & Arbib, 2002)
takes a developmental point of view and explains how the
mirror neurons are developed during infancy. The main
hypothesis of the model is that the temporal profile of the
features an infant experiences during self-executed grasps
provides the training stimuli for the mirror neuron system to
develop.6 Thus, developmentally, grasp learning precedes
initial mirror neuron formation.7 Although MNS proposes
that MNs are initially evolved to support motor control, it does
not provide computational mechanisms showing this.
R900901
902
903
904
CO10.1. The ModelMNS is a systems level model of the (monkey) mirror
neuron system for grasping. The computational focus of the
UN6 Only grasp related visual mirror neurons were addressed. A subsequent
study (Bonaiuto et al., 2005) has introduced a recurrent network learning
architecture that not only reproduces key results of Oztop and Arbib (2002) but
also addresses the data of Umilta et al. (2001) on grasping of recently obscured
objects and of Kohler et al. (2002) on audiovisual mirror neurons.7 Note again that monkeys have a mirror system but do not imitate. It is thus a
separate question to ask “How, in primates that do imitate, does the imitation
system build (both structurally and temporally) on the mirror system?”.
NN 2126—13/3/2006—02:10—RAGHAVAN—203067—XML MODEL 5+ – pp. 1–18
905
906
907
908
ED PROOF
model is the development of mirror neurons by self-
observation; the motor production component of the system
is assumed to be in place and not modeled using neural
modules. The schemas8 (Arbib, 1981) of the model are
implemented with different level of granularity. Conceptually
those schemas correspond to brain regions as follows (see
Fig. 6). The inferior premotor cortex plays a crucial role when
the monkey itself reaches for an object. Within the inferior
premotor cortex area F4 is located more caudally than area F5,
and appears to be primarily involved in the control of proximal
movements (Gentilucci et al., 1988), whereas the neurons of F5
are involved in distal control (Rizzolatti et al., 1988). Areas IT
(inferotemporal cortex) and cIPS (caudal intraparietal sulcus)
provide visual input concerning the nature of the observed
object and the position and orientation of the object’s surfaces,
respectively, to AIP. The job of AIP is to extract the
affordances the object offers for grasping. By affordance we
mean the object properties that are relevant for grasping such as
the width, height and orientation. The upper diagonal in Fig. 6
corresponds to the basic pathway AIP/F5 canonical/M1
(primary motor cortex) for distal (reach) control. The lower
right diagonal (MIP/LIP/VIP/F4) of Fig. 6 provides the
proximal (reach) control portion of the MNS model. The
remaining modules of Fig. 6 constitute the sensory processing
(STS and area 7a) and the core mirror circuit (F5 mirror and
area 7b).
Mirror neurons do not fire when the monkey sees the hand
movement or the object in isolation; the sight of the hand
moving appropriately to grasp or manipulate a seen (or recently
seen) object is necessary for the mirror neurons tuned to the
given action to fire (Umilta et al., 2001). This requires schemas
for the recognition of the shape of the hand and the analysis of
its motion (performed by STS in the model), and for the
analysis of the hand-object relation (Fig. 7a). The information
gathered at STS and areas 7a are captured in the ‘hand state’ at
any instant during movement observation and serves as an
input to the core mirror circuit (F5 mirror and area Fig. 7b).
Although visual feedback control was not built into MNS, the
hand state components track the position of hand and fingers
relative to the object’s affordance (see Oztop and Arbib (2002)
for the full definition of the hand state) and can thus be used in
monitoring the successful progress of a grasping action
supporting motor control. The crucial point is that the
information provided by the hand state allows action
recognition because relations encoded in the hand state form
an invariant of the action regardless of the agent of the action.
This allows self-observation to train a system that can be used
for detecting the actions of others and recognizing them as one
of the actions of the self.
During training, the motor code represented by active F5
canonical neurons was used as the training signal for the
core mirror circuit to enable mirror neurons to learn which
8 A schema refers to a functional unit that can be instantiated as a modular
unit, or as a mode of operation of a network of modules, to fulfill a desired
input/output requirement (Arbib, 1981; Arbib et al., 1998).
909
910
911
912
OFFig. 6. A schematic view of the mirror neuron system (MNS) model. The MNS model learning mechanisms and simulations focus on the core mirror circuit marked
by the central diagonal rectangle (7b and F5 mirror), see text for details (Oztop & Arbib, 2002; reproduced with kind permission of Springer Science and Business
Media).
E. Oztop et al. / Neural Networks xx (xxxx) 1–18 9
+ model ARTICLE IN PRESS
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
hand-object trajectories corresponded to the canonically
encoded grasps. We reiterate that the input to the F5 mirror
neurons is not the visual stimuli as created by the hand and the
object in the visual field but the ‘hand state trajectory’
(trajectory of the relation of the hand and the object) extracted
from these stimuli. Thus, training tunes the F5 mirror neurons
to respond to hand-object relational trajectories independent of
the owner of the action (‘self’ or ‘other’).
T1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
UNCORREC10.2. Relation to MNs
The focus of the simulations was the 7b-F5 complex (core
mirror circuit). The input and outputs of this circuit was
computed using various schemas providing a context to
analyze the circuit. The core mirror circuit was implemented
as a feedforward neural network (1-hidden layer back-
propagation network with sigmoidal activation units; hidden
layer: area 7b; output layer: F5 mirror) responding to
increasingly long initial segments of the hand-state trajectory.
The network could be trained to recognize the grasp type from
the hand state trajectory, with correct classification often being
achieved well before the hand reached the object. For the
preprocessing and training details the reader is referred to
Oztop and Arbib (2002).
Despite the use of a non-physiological neural network,
simulations with the model generated a range of predictions
about mirror neurons that suggest new neurophysiological
experiments. Notice that the trained network responded not
only to hand state trajectories from the training set, but also
showed interesting responses to novel grasping modes. For
example Fig. 7 shows one prediction of the MNS model. An
ambiguous precision pinch grasp activates multiple neurons
NN 2126—13/3/2006—02:10—RAGHAVAN—203067—XML MODEL 5+ – pp. 1–18
ED PRO(power and precision grasp responsive neurons) during the
early portion of the movement observation. Only later does the
activity of the precision pinch neuron dominate and the power
grasp neuron’s activity diminish.
Other predictions were derived from the spatial perturbation
experiment where the hand did not reach the goal (i.e. a ‘fake’
grasp), and the altered kinematics experiment where the hand
moved with constant velocity profile. The former case showed
a non-sharp decrease in the mirror neuron activity while the
latter showed a sharp decrease. The reader is referred to Oztop
and Arbib (2002) for the details and other simulation
experiments.
Recently, Bonaiuto, Rosta, and Arbib (2005) developed the
MNS2 model, a new version of the MNS model of action
recognition learning by mirror neurons of the macaque brain,
using a recurrent architecture that is biologically more
plausible than that of the original model. Moreover, MNS2
extends the capacity of the model to address data on audio–
visual mirror neurons (Kohler et al., 2002) and on response of
mirror neurons when the target object was recently visible but
is currently hidden (Umilta et al., 2001).
11. The mental state inference (MSI) model: forward
model hypothesis for MNs
The anatomical location (i.e. premotor cortex) and motor
response of mirror neurons during grasping suggest that the
fundamental function of mirror neurons may be rooted in grasp
control. The higher cognitive functions of mirror neurons, then
should be seen as a later utilization of this system, augmented
with additional neural circuits. Although MNS of the previous
T
Powergrasp
Normalized time
1.00.0
Firi
ng r
ate
0.0
1.0
(a)
(b)
Precisiongrasp
Fig. 7. Power and precision grasp resolution. (a) The left panel shows the initial
configuration of the hand while the right panel shows the final configuration of
the hand, with circles showing positions of the wrist in consecutive frames of
the trajectory. (b) The distinctive feature of this trajectory is that the hand
initially opens wide to accommodate the length of the object, but then thumb
and forefinger move into position for a precision grip. Even though the model
had been trained only on precision grips and power grips separately, its
response to this input reflects the ambiguities of this novel trajectory—the
curves for power and precision cross towards the end of the action, showing the
resolution of the initial ambiguity by the network. (Oztop & Arbib, 2002,
reproduced with kind permission of Springer Science and Business Media).
E. Oztop et al. / Neural Networks xx (xxxx) 1–1810
+ model ARTICLE IN PRESS
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1061
1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088
1089
1090
1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
1102
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
1118
1119
1120
section adopted this view, it did not model the motor
component, which is addressed by the MSI model.
C 11211122
1123
1124
1125
1126
1127
1128
1129
1130
1131
1132
1133
1134
1135
1136
1137
1138
1139
1140
UNCORRE11.1. Visual feedback control of grasping and the forward
model hypothesis for the mirror neurons
The mental state inference (MSI) model builds upon a visual
feedback circuit involving the parietal and motor cortices, with
a predictive role assigned to mirror neurons in area F5. For
understanding others’ intentions, this circuit is extended into a
mental state inference mechanism (Oztop et al., 2005). The
global functioning of the model for visual feedback control
proceeds as follows. The parietal cortex extracts visual features
relevant to the control of a particular goal-directed action (X,
the control variable) and relays this information to the
premotor cortex. The premotor cortex computes the motor
signals to match the parietal cortex output (X) to the desired
neural code (Xdes) relayed by prefrontal cortex. The ‘desired
change’ generated by the premotor cortex is relayed to
dynamics related motor centers for execution (Fig. 8, upper
panel). The F5 mirror neurons implement a forward prediction
NN 2126—13/3/2006—02:10—RAGHAVAN—203067—XML MODEL 5+ – pp. 1–18
ED PROOF
circuit (forward model) estimating the sensory consequences of
F5 motor output related to manipulation, thus compensating for
the sensory delays involved in a visual feedback circuit. This is
in contrast to the generally suggested idea that mirror neurons
serve solely to retrieve an action representation that matches
the observed movement. During observation mode, these F5
mirror neurons are used to create motor imagery or mental
simulation of the movement for mental state inference (see
below). Although MSI does not specify the region within
parietal cortex that performs control variable computation,
recent findings suggest that a more precise delineation is
possible. Experiments with macaque monkeys indicate that
parietal area PF may be involved in monitoring the relation of
the hand with respect to an object during grasping. Some of the
PF neurons that do not respond to vision of objects become
active when the monkey (without any arm movement) watches
movies of moving hands (of the experimenter or the monkey)
for manipulation, suggesting that the neural responses may
reflect the visual feedback during observed hand movements
(Murata, 2005). It is also possible that a part of AIP may be
involved in monitoring grasping as shown by transcranial
magnetic stimulation (TMS) with humans (Tunik, Frey, &
Grafton, 2005). As in the MNS model, area F5 (canonical) is
involved in converting the parietal output (PF/AIP) into motor
signals, which are used by primary motor cortex and spinal
cord for actual muscle activation. In other words, area F5 non-
mirror neurons implement a control policy (assumed to be
learned earlier) to reduce the error represented by area PF/AIP
output.
11.2. Mental state inference
The ability to predict enables the feedback circuit of Fig. 8
(upper panel) to be extended into a system for inferring the
intentions of others based on the kinematics of goal directed
actions (see Fig. 8 lower panel). In fact, the full MSI model
involves a ‘mental simulation loop’ that is built around a
forward model (Blakemore & Decety, 2001; Wolpert &
Kawato, 1998), which in turn is used by a ‘mental state
inference loop’ to estimate the intentions of others. The MSI
model is described for generic goal-directed actions, however
here we look at the model in relation to a tool grasping
framework where two agents can each grasp a virtual hammer
with different intentions (holding, nailing or prying a nail).
Depending on the planned subsequent use of a hammer,
grasping requires differential alignment of the hand and the
thumb. Thus, the kinematics of the action provides information
about the intention of the actor. For this task, the mental state
was modeled as the intention in grasping the hammer. Within
this framework an observer ‘guesses’ the target object (one of
various objects in the demonstrator’s workspace) and the type
of grasp and produce an appropriate F5 motor signal that is
inhibited for actual muscle activation but used by the forward
model (MNs) (see Fig. 8 lower panel). With the sensory
outcome predicted by the MNs, the movement can be
simulated as if it were executed in an online feedback mode.
The match of the simulated sensations of a simulated
TD P
ROOF
Fig. 8. Upper panel: the MSI model is based on the illustrated visual feedback control organization. Lower panel: observer’s mental state inference mechanism.
Mental simulation of movement is mediated by utilizing the sensory prediction from the forward model and by inhibiting motor output. The difference module
computes the difference between the visual control parameters of the simulated movement and the observed movement. The mental state estimate indicates the
current guess of the observer about the mental state of the actor. The difference output is used to update the estimate or to select the best mental state. (Adapted from
Oztop et al., 2005).
Knuckle vector
Handle vector
Distance to handle
Via point forhandle grasping
Hammer normal
Distance tometal headVia point for
A
B
Handle Grasping
Metal-head Grasping
E. Oztop et al. / Neural Networks xx (xxxx) 1–18 11
+ model ARTICLE IN PRESS
1141
1142
1143
1144
1145
1146
1147
1148
1149
1150
1151
1152
1153
1154
1155
1156
1157
1158
1159
1160
1161
1162
1163
1164
1165
1166
1167
1168
1169
1170
1171
1172
1173
1174
1175
1176
1177
1178
1179
1180
1181
1182
1183
1184
1185
1186
1187
1188
1189
1190
1191
1192
1193
1194
1195
1196
1197
1198
1199
1200
1201
1202
1203
1204
1205
1206
1207
1208
1209
1210
1211
1212
1213
1214
1215
1216
1217
1218
1219
1220
1221
1222
1223
1224
1225
1226
1227
1228
1229
1230
1231
1232
1233
1234
1235
1236
1237
1238
1239
1240
1241
1242
1243
1244
ORRECmovement with the sensation of observed movement will then
signal the correctness of the guess. The simulated mental
sensations and actual perception of movement is compared in a
mental state search mechanism. If the observer model ‘knows’
the possible mental states in terms of discrete items an
exhaustive search in the mental state space can be performed.
However, if the mental state space is not discrete then a
gradient based search strategy must be applied. The mental
state correction (i.e. the gradient) requires the parietal output
(PF) based errors (the Difference box in Fig. 8) to be converted
into ‘mental state’ space adjustments, for which a stochastic
gradient search can be applied (see Oztop et al. (2005) for the
details).
CPalm normalmetal-head grasping
Fig. 9. (A) The features used for nailing task (orientation and normalized distance)
are depicted in the right two arm drawings. The path of the hand is constrained with
appropriate via-points avoiding collision. The arm drawing on the left shows an
example of a handle grasping for driving a nail. The prying task is same as A except
7 that the Handle vector points towards the opposite direction (not shown). (B) The
features extracted for metal-head grasping is depicted (conventions are the same as
the upper panel). (Adapted from Oztop et al., 2005).
1245
1246
1247
1248
1249
1250
1251
1252
1253
1254
UN11.3. Relation to MNs and Imitation
A tool-use experiment was set-up in a kinematics simulation
where two agents could grasp a virtual hammer. The visual
parameters used to implement the feedback servo for grasping
(i.e. normalized distance and the orientation difference—see
Fig. 9) were object centered and provided generalization
regardless of the owner of the action (self vs. other). The time
NN 2126—13/3/2006—02:10—RAGHAVAN—203067—XML MODEL 5+ – pp. 1–18
Evarying (mental simulation)!(observation) matrix shown in
Fig. 10 represents the dynamics of an agent observing an actor
performing tasks of holding, nailing and prying (rows of the
matrix). Each column of the matrix represents the belief of the
observer as to whether the other is holding, nailing or prying.
TD P
ROOF
Fig. 10. The degree of similarity between visually extracted control variables and control variables obtained by mental simulation can be used to infer the intention of
an actor. Each subplot shows the probability that the observed movement (rows) is the same as the mentally simulated one (columns). The horizontal axis represents
the simulation time from movement start to end. The control variables extracted for the comparison are based on the mentally simulated movement. Thus, for
example, the first column inferences require the control parameters for holding (normalized distance to metal head and the angle between the palm normal and
hammer plane). The convergence to unity of the belief curves on the main diagonal indicates correct mental state inference. (Adapted from Oztop et al., 2005).
E. Oztop et al. / Neural Networks xx (xxxx) 1–1812
+ model ARTICLE IN PRESS
1255
1256
1257
1258
1259
1260
1261
1262
1263
1264
1265
1266
1267
1268
1269
1270
1271
1272
1273
1274
1275
1276
1277
1278
1279
1280
1281
1282
1283
1284
1285
1286
1287
1288
1289
1290
1291
1292
1293
1294
1295
1296
1297
1298
1299
1300
1301
1302
1303
1304
1305
1306
1307
1308
1309
1310
1311
1312
1313
1314
1315
1316
1317
1318
1319
1320
1321
1322
1323
1324
1325
1326
1327
1328
1329
1330
1331
1332
1333
1334
1335
1336
1337
1338
1339
1340
1341
1342
1343
1344
1345
1346
1347
1348
1349
1350
1351
1352
1353
1354
1355
1356
1357
1358
1359
1360
CORRECEach cell shows the degree of similarity between mentally
simulated movement and the observed movement from move-
ment onset to movement end (as a belief or probability). The
observer can infer the mental state of the actor before the midpoint
of the observed movements as evidenced by the convergence of
belief curves to unity along the diagonal plots. Thus, although not
the focus of the study, the MSI model offers a basic imitation
ability that is based on reproducing the inferred intention (mental
state) of an observer. However, the actions that can be imitated are
thus limited to the ones in the existing repertoire and may not
respect the full details of the observed act. With MSI the dual
activation of MNs (forward model) is explained by the automatic
engagement of mental state inference during an action
observation, and by the forward prediction task undertaken by
the MNs for motor control during action execution.
N 13611362
1363
1364
1365
1366
1367
U12. Modular selection and identification for control
(MOSAIC) model
The MOSAIC model (Haruno et al., 2001; Wolpert &
Kawato, 1998) was introduced initially for motor control,
NN 2126—13/3/2006—02:10—RAGHAVAN—203067—XML MODEL 5+ – pp. 1–18
Eproviding mechanisms for decentralized automatic module
selection so as to achieve best control for the current task. In
this sense, compared to the earlier models surveyed,
MOSAIC is a sophisticated motor control architecture. The
key ingredients of MOSAIC are modularity and the
distributed cooperation and competition of the internal
models. The basic functional units of the model are multiple
predictor–controller (forward–inverse model) pairs where
each pair competes to contribute to the overall control (cf.
Jacobs, Jordan, Nowlan, and Hinton (1991), where the
emphasis is on selection of a single processor). The
controllers with better predicting forward models (i.e. with
higher responsibility signals) become more influential in the
overall control (Fig. 11). The responsibility signals are
computed with the normalization step shown in Fig. 11 based
on the prediction errors of the forward models via the
softmax function (see Table 1, last row). The responsibility
signals are constrained to be between 0 and 1, and add up to
1, so that that they can be considered as probabilities
indicating the likelihood of the controllers being effective for
the current task.
1368
T
Controller 1
Controller 2
Controller 3
Predictor 1
Predictor 2
Predictor 3
Body
norm
aliz
atio
n
+
+
+
+
x
x
x
Motorcommand
Sensory feedback (state)
Predicted state
Predictionerror
Responsibility signalsDesiredtrajectory
Fig. 11. The functioning of the MOSAIC model in the control mode. The responsibility signals indicate how well the control modules are suited for the control task at
hand. The overall control output is the sum of the output of controller modules as weighted by the responsibility signals.
E. Oztop et al. / Neural Networks xx (xxxx) 1–18 13
+ model ARTICLE IN PRESS
1369
1370
1371
1372
1373
1374
1375
1376
1377
1378
1379
1380
1381
1382
1383
1384
1385
1386
1387
1388
1389
1390
1391
1392
1393
1394
1395
1396
1397
1398
1399
1400
1401
1402
1403
1404
1405
1406
1407
1408
1409
1410
1411
1412
1413
1414
1415
1416
1417
1418
1419
1420
1421
1422
1423
1424
1425
1426
1427
1428
1429
1430
1431
1432
1433
1434
1435
1436
1437
1438
1439
1440
1441
1442
1443
1444
1445
1446
1447
1448
1449
1450
1451
1452
1453
1454
1455
1456
1457
1458
1459
1460
1461
The aim of motor control is to produce motor commands
G(t) at time t such that a desired state9 xdes(t) is attained by the
controlled system dynamics j. The net motor output G of the
MOSAIC model is determined by a set of adaptive controller–
predictor pairs (ji,fi) via the responsibility signals li which are
computed using the predictor outputs and the current state of
the system. The equations given in Table 1 describe the control
mechanism more rigorously (for simplicity, we use a discrete
time representation). The adaptive nature of the controller–
predictor pairs is shown with semicolons as ji($;wi) fi($;yi)meaning that ji and fi are functions determined by the
parameters wi and yi, which are typically the weights of a
function approximator or a neural network.
Rather than presenting the details of how the controller–
predictor pairs can be adapted (trained) for a variety of the
tasks we note that MOSAIC is described without strict
attachment to a particular learning method, so it is possible
to derive various learning algorithms for adapting controller–
predictor pairs. In particular, gradient descent (Wolpert &
Kawato, 1998) and expectation maximization (Haruno et al.,
2001) learning algorithms have been derived and applied for
motor control learning.
Table 1
The equations describing the control function of the MOSAIC model
Dynamics of the controlled system xðtC1ÞZJðxðtÞ;GðtÞÞ
The MOSAIC (net) control output GðtC1ÞZP
i lijiðxdðtC1Þ;xðtÞ;wiÞ
1462
1463
1464
1465
1466
1467
1468
1469
1470
1471
1472
1473
1474
1475
1476
NCORREC12.1. Imitation and action recognition with MOSAICAlthough MOSAIC was initially proposed for motor
control, it is possible to utilize it for imitation and action
recognition. This dual use of the model establishes some
parallels between the model and the mirror neuron system. The
realization of imitation (and action recognition) with MOSAIC
requires three stages: first, the visual information of the actor’s
movement must be converted into a format that can be used as
inputs to the motor system of the imitator (Wolpert et al.,
2003). This requires that the visual processing system extracts
variables akin to state (e.g. joint angles) which can be fed to the
imitator’s MOSAIC as the ‘desired state’ of the demonstrator
(Wolpert et al., 2003). The second stage is that each controller
U9 The term ‘state’ generally represents the vector of variables that arenecessary to encapsulate the history of the system as a basis for describing the
system’s response to the external inputs, which then involves specification of
current output and of the updating of the state. For a point mass physical system
the state combines the position and velocity of the mass.
NN 2126—13/3/2006—02:10—RAGHAVAN—203067—XML MODEL 5+ – pp. 1–18
OOF
generates the motor command required to achieve the observed
trajectory (i.e. the desired trajectory obtained from the
observation). In this ‘observation mode’, the outputs of the
controllers are not used for actual movement generation, but
serve as input to the predictors paired with the controllers (see
Fig. 12). Thus, the next likely states (of the observer) become
available as the output of the forward predictions. These
predictions then can be compared with the demonstrator’s
actual next state to provide prediction errors that indicate, via
responsibility signals, which of the controller modules of the
imitator must be active to generate the movement observed
(Wolpert et al., 2003).
ED PR12.2. Relation to MNs and imitation
The responsibility signals are computed by (softmax)
normalizing the prediction errors as shown in Fig. 12. Notice
that the responsibility signals can be treated as symbolic
representation describing the observed (continuous) action.
The temporal stream of responsibility signals representing the
observed action then can be used immediately or stored for
later reproduction of the observed action. Simulations with the
MOSAIC model indicate that the aforementioned imitation
mechanism can be used to imitate the task of swinging up a one
degree of freedom jointed stick against gravity through
observation of successful swing ups (Doya et al., 2000).
Within the MOSAIC framework the output of predictors
might be considered analogous to mirror neuron activity. This
would be compatible with the view of the MSI model, where it
is suggested that the mirror neurons may implement a motor-
to-sensory forward model. It is however necessary to point out
one difference. The MSI model deals with motor control
relying only on visual input and kinematics (leaving the
Individual controller (inverse model)
outputs
uiðtÞZjiðxdðtC1Þ;xðtÞ;wiÞ
Individual predictions (forward
model)
xiðtC1ÞZfiðxðtÞ;uðtÞ;yiÞ
The responsibility signals liðtÞZeðxðtÞKxi ðtÞÞ
2 =d2
Pk
eðxðtÞKxk ðtÞÞ2 =d2
1477
1478
1479
1480
1481
1482
T
Controller 1
Controller 2
Controller 3
Predictor 1
Predictor 2
Predictor 3
ACTOR
norm
aliz
atio
n+
+
+
Current state
Predicted state Prediction error
Responsibilitysignals
DesiredTrajectory State trajectory
Fig. 12. The functioning of the MOSAIC model in the observation mode is illustrated. For imitation, the responsibility signals indicate which of the controller
modules must be active to generate the movement observed.
E. Oztop et al. / Neural Networks xx (xxxx) 1–1814
+ model ARTICLE IN PRESS
1483
1484
1485
1486
1487
1488
1489
1490
1491
1492
1493
1494
1495
1496
1497
1498
1499
1500
1501
1502
1503
1504
1505
1506
1507
1508
1509
1510
1511
1512
1513
1514
1515
1516
1517
1518
1519
1520
1521
1522
1523
1524
1525
1526
1527
1528
1529
1530
1531
1532
1533
1534
1535
1536
1537
1538
1539
1540
1541
1542
1543
1544
1545
1546
1547
1548
1549
1550
1551
1552
1553
1554
1555
1556
1557
1558
1559
1560
1561
1562
1563
1564
1565
1566
1567
1568
1569
1570
1571
1572
1573
1574
1575
dynamics to the lower motor areas). In contrast, MOSAIC is a
true control system that deals with dynamics. Thus, the nature
of forward models in the two architectures is slightly different.
The output of the forward model required by MSI is in visual-
like coordinates (e.g. the orientation difference of the hand axis
and the target object), whereas for MOSAIC the output of the
forward models are more closely related to the intrinsic
variables of the controlled limb (e.g. joint position and
velocities). However, it is possible to envision an additional
dynamics-to-visual forward model that can take MOSAIC
forward model output and convert it to some extrinsic
coordinates (e.g. distance to the goal). In a way, the MSI’s
forward model can be such an integrated prediction circuit
implemented by several brain areas.
A final note here is that the internal models envisioned in the
neuroscience literature (e.g. Carr et al., 2003; Iacoboni et al.,
1999; Miall, 2003) are usually at a much higher level than the
internal models of the MOSAIC or the MSI model introduced
here, which are much harder to learn from a computational
point of view.
1576
1577
1578
1579
1580
1581
1582
1583
1584
1585
1586
1587
1588
1589
1590
1591
1592
1593
1594
1595
1596
UNCORREC13. A Taxonomy of models based on modeling methodology
When the system to be modeled is complex it is often
necessary to focus on one or two features of the system in any
one model. The focus of course is partly determined by the
modeling methodology followed by the modeler. Here, we
present a taxonomy of modeling methodologies one can
follow, and compare the models we have presented
accordingly.
The utility of a model increases with its generality and
ability to explain and predict observed and unobserved
behavior of the system modeled. The validity and utility of a
model is leveraged when all the known facts are incorporated
into the model. This is called data driven modeling where the
modeler’s task is to develop a computational mechanism
(equations and computer simulations) that replicates the
observed data with the hope that some interesting, non-trivial
predictions can be made. This is the main modeling approach
for cellular level neuron modeling. Although one would expect
NN 2126—13/3/2006—02:10—RAGHAVAN—203067—XML MODEL 5+ – pp. 1–18
ED PROOF
that the neurophysiological data collected so far is to be widely
used as the basis for computational modeling, we unfortunately
lack sufficient quantitative data on the neurophysiology of the
mirror system. Most mirror system (related) modeling assumes
the generic properties of mirror neurons to build imitation
systems rather than addressing hard data. We have included
this type of model in the taxonomy because there is certain
utility of those models as they lead to questions about the
relation between mirror neurons and imitation.
Models based on evolutionary algorithms have been used in
modeling the behaviors of organisms, and developing neural
circuits to achieve a prespecified goal (e.g. central pattern
generators) in a simplified simulated environment. Although
this type of modeling, in general, does not make use of the
available data, the model of Borenstein and Ruppin (2005)
suggests how mirror neurons might have come to be involved
in imitation or other cognitive tasks. However, as we have
already noted, ‘real’ evolution may have exapted the mirror
system to support imitation, rather than starting from the need
to imitate and ‘discovering’ mirror neurons as a necessary tool.
An evolutionary point of view can also be adopted to build
models that do not employ evolutionary algorithms, which start
off by postulating a logical reason for the existence of the
mirror neurons. The logic can be based on the location of the
mirror neurons, or on the known general properties of neural
function. The former logic dictates that mirror neurons must be
involved in motor control. The latter logic (phenomenal) draws
on Hebbian plasticity mechanisms and dictates that represen-
tations of contingent events are associated in the cerebral
cortex. Fig. 13 illustrates how the models we have presented
fall into our taxonomy. However, note that this taxonomy
should not be taken as defining sharp borders between models.
Models focusing on imitation can be cast as being developed
following a ‘reason of existence’. For example DRAMA
architecture (Billard & Hayes, 1999) employs a Hebbian like
learning mechanism and thus can be considered in the ASSOC
category in Fig. 13. Similarly, although no motor control role
was emphasized for mirror neurons in Demiris’s imitation
system (Demiris & Johnson, 2003), it employs mechanisms
similar to those of the MOSAIC model.
UNCORRECTED PROOF
Modeling Methodology
AssumeExistence
Reason forExistence
MNS&
MNS2
MOSAIC
MSI
Data DrivenEvolutionaryAlgorithm
Phenomenal Motor ControlArtificial and
PhysicalImitation Systems
Virtual World andAgent Systems
?Borenstein &Ruppin
AnatomyPhysiology Behavior
ASSOC Demiris
RNNPB
DRAMA
Fig. 13. A taxonomy of modeling methodologies and the relation of the models presented. Dashed arrows indicate that the linked models are similar or can be cast to
be similar (see text).
Table 2
A very brief summary of models presented in terms of biological relevance, architecture and the relevant results and predictions
Biological relevance Architecture Results/predictions
RNNBP N/A (general) Distributed representation Imitation, action recognition
Recurrent network with a
complex generative model
There should be MN inhibition for undesired
action imitation
MOSAIC and
Demiris
N/A (general) Modular, localist Demiris: no
module learning
Modular control allows imitation and action
recognition
MOSAIC: Modules learn with
EM or gradient based learning
MN response limited to certain speed range
Existence of subaction encoding units
Borenstein and
Ruppin
N/A (evolutionary model) Virtual world, agent system Imitation and MN formation is favored by natural
selection
MSI Based on general organization of system level
anatomy (premotor and parietal)
Connectionist schemas Mental state Inference
Kinematics of observed action should correlate
with mirror activity
ASSOC and
DRAMA
N/A (general. However, Billard & Mataric, 2003
embedded DRAMA in a biologically inspired
imitation architecture)
Recurrent, Hopfield like Imitation and MN formation
Self-observation may be involved in formation of
MNs and development of imitation
MNS Uses system level anatomy (premotor and parietal)
and addresses mirror neuron firing
Hidden layer back-propagation
network with supporting schemas
Prediction on neural firing patterns of MNs for:
Altered kinematics
Spatial mismatch
Novel object grasping
MNS2 Uses system level anatomy (premotor and parietal)
and addresses mirror neuron firing
Recurrent back-propagation
network for visual input; Hebbian
synapses for auditory input;
working memory and dynamic
remapping
Addresses data on mirror neurons for grasping
including audio–visual mirror neurons and
response to grasps with hidden end-state
NN 2126—13/3/2006—02:10—RAGHAVAN—203067—XML MODEL 5+ – pp. 1–18
E. Oztop et al. / Neural Networks xx (xxxx) 1–18 15
+ model ARTICLE IN PRESS
1597
1598
1599
1600
1601
1602
1603
1604
1605
1606
1607
1608
1609
1610
1611
1612
1613
1614
1615
1616
1617
1618
1619
1620
1621
1622
1623
1624
1625
1626
1627
1628
1629
1630
1631
1632
1633
1634
1635
1636
1637
1638
1639
1640
1641
1642
1643
1644
1645
1646
1647
1648
1649
1650
1651
1652
1653
1654
1655
1656
1657
1658
1659
1660
1661
1662
1663
1664
1665
1666
1667
1668
1669
1670
1671
1672
1673
1674
1675
1676
1677
1678
1679
1680
1681
1682
1683
1684
1685
1686
1687
1688
1689
1690
1691
1692
1693
1694
1695
1696
1697
1698
1699
1700
1701
1702
1703
1704
1705
1706
1707
1708
1709
1710
T
E. Oztop et al. / Neural Networks xx (xxxx) 1–1816
+ model ARTICLE IN PRESS
1711
1712
1713
1714
1715
1716
1717
1718
1719
1720
1721
1722
1723
1724
1725
1726
1727
1728
1729
1730
1731
1732
1733
1734
1735
1736
1737
1738
1739
1740
1741
1742
1743
1744
1745
1746
1747
1748
1749
1750
1751
1752
1753
1754
1755
1756
1757
1758
1759
1760
1761
1762
1763
1764
1765
1766
1767
1768
1769
1770
1771
1772
1773
1774
1775
1776
1777
1778
1779
1780
1781
1782
1783
1784
1785
1786
1787
1788
1789
1790
1791
1792
1793
1794
1795
1796
1797
1798
1799
1800
1801
1802
1803
1804
1805
1806
1807
1808
1809
1810
1811
1812
1813
1814
1815
1816
1817
1818
1819
1820
1821
1822
1823
1824
UNCORREC
We now contrast and define common characteristic of some
of the models presented in this article with the hope that future
(especially data driven) modeling can embrace a larger set of
mirror neuron functions. Note that the taxonomy we present is
orthogonal to the granularity of the modeling (i.e. cell, cell
population, brain region or functional/abstract). The MOSAIC
model in its original form does not advocate any brain structure
for its functional components. Similarly the associative
memory models of the mirror system (ASSOC in Fig. 13) in
general do not strongly adhere to any brain area. Therefore, we
can consider them as ‘general models’. On the other hand, MSI
and—to a larger extent—MNS associate (macaque) brain
regions with the functional components. The two models are
compatible in terms of brain areas supporting the mirror
system. MNS conceptually accepts that mirror neurons must
have a role in motor control. However, it does not provide
mechanisms to simulate this because the focus of the modeling
in MNS is the development of mirror neurons. (MNS 2 extends
MNS by addressing data on audiovisual mirror neurons and on
grasps with hidden end-state). The key to both MNS and MSI is
the object-centered representation of the actions and the self-
observation principle, which together allow actions to be
recognized irrespective of the agent performing it. However,
there is one difference: in MSI, self-observation is purposeful;
it is used to implement a visual feedback control loop for action
execution. On the other hand, the MNS model, despite the
differences in the adaptation mechanisms, resembles the
associative memory models that operate on the principle of
association of stimuli produced as the result of movement
execution. The purposeful self-observation (visual feedback)
principle of the MSI model establishes the logical link between
the MSI and the MOSAIC model in spite the differences of two
models in terms of motor control ability. The MOSAIC model
is crafted for true motor control (i.e. it considers dynamics)
whereas MSI deals only with kinematics. Table 2 provides a
succinct account of the presented models by enlisting the key
properties and results relevant for this article.
14. Conclusion
In spite the accumulating evidence that humans are
endowed with a mirror system (Buccino et al., 2001; Hari
et al., 1998; Iacoboni et al., 1999), it is still an open question
how our brains make use of this system. Is it really used for
imitation or mental state estimation? Or is it simply an action
recognition system? A predominant assumption among
computational modelers is that human mirror neurons subserve
imitation. Although there are many imitation models based on
this view, there are virtually no studies addressing the
assumption itself. We need biologically grounded compu-
tational models to justify this view; the models of this sort
should address the computational requirements and the
possible evolutionary changes in neural circuitry necessary to
allow mirror neurons to undertake an important role for
imitation.
Our view is that, the location of mirror neurons (MNs)
indicates that the function of mirror neurons must be rooted in
NN 2126—13/3/2006—02:10—RAGHAVAN—203067—XML MODEL 5+ – pp. 1–18
ED PROOF
motor control. We emphasize that future computational models
of MNs that share the same view, regardless of whether they
address imitation or not, must explain the dual role of the
mirror system by showing computationally that MNs perform a
useful function for motor control. Note that this need not mean
that all mirror neurons in the human brain that are
‘evolutionary cousins’ of macaque mirror neurons for grasping
need themselves be involved in manual control—one can
accept that mirror neurons for language have this cousinage
without denying that lesions can differentially yield aphasia
and apraxia (see Barrett, Foundas, & Heilman, 2005, and also
the Response in Arbib, 2005).
Imitation is just one way to look at the problem; the mirror
neurons must also be analyzed and modeled in an imitation-
decoupled way. For example, is it possible that MNs are simply
the consequence of Hebbian learning, i.e. an automatic
association of the corollary discharge and the subsequent
sensory stimuli generated as the associative memory hypoth-
esis claims? The MNS (Oztop & Arbib, 2002) and MNS2
(Bonaiuto et al., 2005) models postulate that some extra
structure is required, both to constrain the variables relevant for
the system, and to track trajectories of these relevant variables.
Most models assume that the relevant variables are indeed
supplied as input. The work of Kumashiro et al. (2003) reminds
us that in fact the mirror system must be augmented by an
attentional system that can ensure that the appropriate variables
concerning agent and object are made available.
The MNS model shows how mirror neurons may learn to
recognize the hand-state trajectories (hand-object relation-
ships) for an action already within the repertoire. We have
made clear that extra machinery is required to go from a novel
observed action to a motor control regime which will satisfy it.
MOSAIC approaches this by generating a sequence of
responsibility signals representing the observed action as a
string of segments of known actions whose visual appearance
will match the observed behavior. Arbib and Rizzolatti (1997)
set forth the equation ActionZMovementCGoal to stress the
importance of seeing movements in relation to goals, rather
than treating them in isolation. Arbib (2002) then argued that
humans master complex imitation by approximating novel
actions by a combination (sequential as well as co-temporal) of
known actions, and then making improvements both by
attending to missing subactions, and by tuning and coordinat-
ing the resulting substructures. This is much the same as what
Wohlschlager, Gattis, and Bekkering (2003) call goal-directed
imitation. On their view, the imitator does not imitate the
observed movement as a whole, but rather decomposes it into
hierarchically ordered aspects (Byrne & Russon, 1998), with
the highest aspect becoming the imitator’s main goal while
others become subgoals. The main goal activates the motor
schema that is most strongly associated with the achievement
of that goal. Of course, there is no ‘magic’ in complex imitation
which automatically yields the right hierarchical decompo-
sition of a movement. Rather, it may be the success or failure of
a ‘high-level approximation’ of the observed action that leads
to attention to crucial subgoals which were not observed at first,
and thus leads, perhaps somewhat circuitously, to successful
T
E. Oztop et al. / Neural Networks xx (xxxx) 1–18 17
+ model ARTICLE IN PRESS
1825
1826
1827
1828
1829
1830
1831
1832
1833
1834
1835
1836
1837
1838
1839
1840
1841
1842
1843
1844
1845
1846
1847
1848
1849
1850
1851
1852
1853
1854
1855
1856
1857
1858
1859
1860
1861
1862
1863
1864
1865
1866
1867
1868
1869
1870
1871
1872
1873
1874
1875
1876
1877
1878
1879
1880
1881
1882
1883
1884
1885
1886
1887
1888
1889
1890
1891
1892
1893
1894
1895
1896
1897
1898
1899
1900
1901
1902
1903
1904
1905
1906
1907
1908
1909
1910
1911
1912
1913
1914
1915
1916
1917
1918
imitation. But the point remains that this process is in general
much faster than the time-consuming extraction of statistical
regularities in Byrne’s (2003) ‘imitation by (implicit) behavior
parsing’—we might refer to complex (goal-directed) imitation
as imitation by explicit behavior parsing. Finally, we note that
this process is postulated to result in a new motor schema
(forwardCinverse model) which may be linked to those
previously available, but which now constitutes a new action
which is henceforth available to further refinement of inverse
and forward models separate from those which have been
acquired before.
Although a decade has passed since the first reports of
mirror neurons came out, the reciprocal lack of full knowledge
(or interest) between the sides of conceptual and computational
modeling is evident. To close this gap experimentalists should
conduct experiments that involve quantitative measurement,
e.g. relating neuronal activity to synchronized kinematics
recordings of the experimenter and monkey during action
demonstration and execution. This could shed light on the
debate between those who believe a mirror neuron encodes a
specific action, and those who seek to understand how mirror
neurons may provide population codes for action-related
variables. The correlation between the discharge profiles of
mirror neurons with various visual feedback parameters will
provide modelers invaluable information to construct models
that capture neurophysiological facts. Likewise, we do not
know anything about the developmental stages of mirror
neurons. Are they innate? Probably not, so what circuitry and
adaptation mechanisms are involved? To get answers for these
questions, computational modeling that can provide a causally
complete account of mirror neurons and the larger system of
which they are part is crucial. Only then would it be possible to
develop models of motor control, imitation, mental state
inference, etc. that assign various roles to mirror neurons and
their interactions with diverse brain regions in an empirically
justified way.
1919
1920
1921
1922
1923
1924
1925
1926
RRECAcknowledgements
Writing of this paper was supported in part by JST-ICORP
Computational Brain Project and in part by NIH under grant 1
P20 RR020700-01 to the USC/UT Center for the Interdisci-
plinary Study of Neuroplasticity and Stroke Rehabilitation
(ISNSR). We thank Jun Tani for his feedback on Section 6.
19271928
1929
1930
1931
1932
1933
1934
1935
1936
1937
1938
UNCOReferences
Arbib, M. (2002). The mirror system, imitation, and the evolution of language.
In C. Nehaniv, & K. Dautenhahn (Eds.), Imitation in animals and artifacts
(pp. 229–280). Cambridge, MA: MIT Press.
Arbib, M., & Rizzolatti, G. (1997). Neural expectations: A possible
evolutionary path from manual skills to language. Communication and
Cognition, 29, 393–424.
Arbib, M. A. (1981). Perceptual structures and distributed motor control. In
V. B. Brooks (Vol. Ed.), Handbook of physiology, section 2: The nervous
system. Motor control, part 1: Vol. II (pp. 1449–1480). Bethesda, MD:
American Physiological Society.
NN 2126—13/3/2006—02:10—RAGHAVAN—203067—XML MODEL 5+ – pp. 1–18
ED PROOF
Arbib, M. A. (2005). From monkey-like action recognition to human language:
An evolutionary framework for neurolinguistics. Behavioral and Brain
Sciences, 28(2), 105–124 (discussion 125–167).
Arbib, M. A., Erdi, P., & Szentagothai, J. (1998). Neural organization:
Structure, function, and dynamics. Cambridge, MA: MIT Press.
Barrett, A. M., Foundas, A. L., & Heilman, K. M. (2005). Speech and gesture
are mediated by independent systems. Behavioral and Brain Sciences, 28,
125–126.
Billard, A., & Hayes, G. (1999). DRAMA, a connectionist architecture for
control and learning in autonomous robots. Adaptive Behavior, 7(1), 35–63.
Billard, A., & Mataric, M. J. (2001). Learning human arm movements by
imitation: Evaluation of a biologically inspired connectionist architecture.
Robotics and Autonomous Systems, 37(2–3), 145–160.
Blakemore, S. J., & Decety, J. (2001). From the perception of action to the
understanding of intention. Nature Reviews Neuroscience, 2(8), 561–567.
Bonaiuto, J., Rosta, E., & Arbib, M. A. (2005). Recognizing invisible actions,
workshop on modeling natural action selection. Workshop on modeling
natural action selection, Edinburgh.
Borenstein, E., & Ruppin, E. (2005). The evolution of imitation and mirror
neurons in adaptive agents. Cognitive Systems Research, 6(3).
Brass, M., Bekkering, H., Wohlschlager, A., & Prinz, W. (2000). Compatibility
between observed and executed finger movements: Comparing symbolic,
spatial, and imitative cues. Brain and Cognition, 44, 124–143.
Buccino, G., Binkofski, F., Fink, G. R., Fadiga, L., Fogassi, L., Gallese, V.,
et al. (2001). Action observation activates premotor and parietal areas in a
somatotopic manner: An fMRI study. European Journal of Neuroscience,
13(2), 400–404.
Buccino, G., Binkofski, F., & Riggio, L. (2004). The mirror neuron system and
action recognition. Brain and Language, 89(2), 370–376.
Byrne, R. W. (2003). Imitation as behaviour parsing. Philosophical
Transactions of the Royal Society of London Series B—Biological Sciences,
358(1431), 529–536.
Byrne, R. W., & Russon, A. E. (1998). Learning by imitation: A hierarchical
approach. Behavioral and Brain Sciences, 21(5), 667–684 (discussion
684–721).
Carr, L., Iacoboni, M., Dubeau, M.-C., Mazziotta, J. C., & Lenzi, G. L. (2003).
Neural mechanisms of empathy in humans: A relay from neural systems for
imitation to limbic areas. PNAS, 100(9), 5497–5502.
Dayan, P., Hinton, G. E., Neal, R. M., & Zemel, R. S. (1995). The Helmholtz
machine. Neural Computation, 7(5), 889–904.
Demiris, Y., & Hayes, G. (2002). Imitation as a dual-route process featuring
predictive and learning components: A biologically-plausible compu-
tational model. In K. Dautenhahn, & C. Nehaniv (Eds.), Imitation in
animals and artifacts. Cambridge, MA: MIT Press.
Demiris, Y., & Johnson, M. (2003). Distributed, predictive perception of
actions: A biologically inspired robotics architecture for imitation and
learning. Connection Science, 15(4).
Doya, K., Katagiri, K., Wolpert, D., & Kawato, M. (2000). Recognition and
imitation of movement patterns by a multiple predictor–controller
architecture. Technical Report of IEICE TL2000-11, (33–40).
Elshaw, M., Weber, C., Zochios, A., & Wermter, S. (2004). An associator
network approach to robot learning by imitation through vision, motor
control and language. International joint conference on neural networks,
Budapest, Hungary.
Fadiga, L., & Craighero, L. (2004). Electrophysiology of action representation.
Journal of Clinical Neurophysiology, 21(3), 157–169.
Fadiga, L., Craighero, L., Buccino, G., & Rizzolatti, G. (2002). Speech
listening specifically modulates the excitability of tongue muscles: A TMS
study. European Journal of Neuroscience, 15(2), 399–402.
Fogassi, L., Gallese, V., Buccino, G., Craighero, L., Fadiga, L., & Rizzolatti, G.
(2001). Cortical mechanism for the visual guidance of hand grasping
movements in the monkey—a reversible inactivation study. Brain, 124,
571–586.
Fogassi, L., Gallese, V., Dipellegrino, G., Fadiga, L., Gentilucci, M., Luppino,
G., et al. (1992). Space coding by premotor cortex. Experimental Brain
Research, 89(3), 686–690.
Gallese, V., Fadiga, L., Fogassi, L., & Rizzolatti, G. (1996). Action recognition
in the premotor cortex. Brain, 119, 593–609.
T
E. Oztop et al. / Neural Networks xx (xxxx) 1–1818
+ model ARTICLE IN PRESS
1939
1940
1941
1942
1943
1944
1945
1946
1947
1948
1949
1950
1951
1952
1953
1954
1955
1956
1957
1958
1959
1960
1961
1962
1963
1964
1965
1966
1967
1968
1969
1970
1971
1972
1973
1974
1975
1976
1977
1978
1979
1980
1981
1982
1983
1984
1985
1986
1987
1988
1989
1990
1991
1992
1993
1994
1995
1996
1997
1998
1999
2000
2001
2002
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
2025
2026
2027
2028
2029
2030
2031
2032
2033
2034
2035
2036
2037
2038
2039
2040
2041
2042
2043
CORRECGallese, V., & Goldman, A. (1998). Mirror neurons and the simulation theory
of mind-reading. Trends in Cognitive Sciences, 2(12), 493–501.
Gallese, V., Keysers, C., & Rizzolatti, G. (2004). A unifying view of the basis
of social cognition. Trends in Cognitive Sciences, 8(9), 396–403.
Hari, R., Forss, N., Avikainen, S., Kirveskari, E., Salenius, S., & Rizzolatti, G.
(1998). Activation of human primary motor cortex during action
observation: A neuromagnetic study. Proceedings of the National Academy
of Sciences of the United States of America, 95(25), 15061–15065.
Haruno, M., Wolpert, D. M., & Kawato, M. (2001). Mosaic model for
sensorimotor learning and control. Neural Computation, 13(10),
2201–2220.
Hassoun, M. (1993). Associative neural memories: Theory and implementation.
New York: Oxford University Press.
Iacoboni, M., Molnar-Szakacs, I., Gallese, V., Buccino, G., Mazziotta, J. C., &
Rizzolatti, G. (2005). Grasping the intentions of others with one’s own
mirror neuron system. PLoS Biology, 3(3), e79.
Iacoboni, M., Woods, R. P., Brass, M., Bekkering, H., Mazziotta, J. C., &
Rizzolatti, G. (1999). Cortical mechanisms of human imitation. Science,
286(5449), 2526–2528.
Ijspeert, A., Nakanishi, J., & Schaal, S. (2003). Learning attractor landscapes
for learning motor primitives. In S. Becker, S. Thrun, & K. Obermayer
(Vol. Eds.), Advances in neural information processing systems: Vol. 15
(pp. 1547–1554). Cambridge, MA: MIT Press.
Ito, M., & Tani, J. (2004). On-line imitative interaction with a humanoid robot
using a dynamic neural network model of a mirror system. Adaptive
Behavior, 12(2), 93–115.
Jacobs, R. A., Jordan, M. I., Nowlan, S. J., & Hinton, G. E. (1991). Adaptive
mixtures of local experts. Neural Computation, 3, 79–87.
Kawato, M., & Gomi, H. (1992). A computational model of 4 regions of the
cerebellum based on feedback-error learning. Biological Cybernetics,
68(2), 95–103.
Kawato, M., & Wolpert, D. (1998). Internal models for motor control. Sensory
Guidance of Movement, 218, 291–307.
Kilner, J. M., Paulignan, Y., & Blakemore, S. J. (2003). An interference effect
of observed biological movement on action. Current Biology, 13, 522–525.
Kohler, E., Keysers, C., Umilta, M. A., Fogassi, L., Gallese, V., & Rizzolatti,
G. (2002). Hearing sounds, understanding actions: Action representation in
mirror neurons. Science, 297(5582), 846–848.
Kumashiro, M., Ishibashi, H., Uchiyama, Y., Itakura, S., Murata, A., & Iriki, A.
(2003). Natural imitation induced by joint attention in Japanese monkeys.
International Journal of Psychophysiology, 50(1–2), 81–99.
Kuniyoshi,Y., Yorozu, Y., Inaba, M., & Inoue, H. (2003). From visuo-motor
self learning to early imitation—a neural architecture for humanoid
learning. International conference on robotics & automation, Taipei,
Taiwan.
Miall, R. C. (2003). Connecting mirror neurons and forward models.
Neuroreport, 14(17), 2135–2137.
Miall, R. C., & Wolpert, D. M. (1996). Forward models for physiological motor
control. Neural Networks, 9(8), 1265–1279.
Morita, M. (1996). Memory and learning of sequential patterns by
nonmonotone neural networks. Neural Networks, 9(8), 1477–1489.
Murata, A. (2005). Function of mirror neurons originated from motor control
system (in Japanese). Journal of Japanese Neural Network Society, 12(1),
52–60.
Myowa-Yamakoshi, M., & Matsuzawa, T. (1999). Factors influencing
imitation of manipulatory actions in chimpanzees (Pan troglodytes).
Journal of Comparative Psychology, 113(2), 128–136.
UN
NN 2126—13/3/2006—02:11—RAGHAVAN—203067—XML MODEL 5+ – pp. 1–18
ED PROOF
Oztop, E., & Arbib, M. A. (2002). Schema design and implementation of
the grasp-related mirror neuron system. Biological Cybernetics, 87(2),
116–140.
Oztop, E., Chaminade, T., Cheng, G., Kawato, M. (2005). Imitation
bootstrapping: Experiments on a robotic hand. IEEE-RAS international
conference on humanoid robots, Tsukuba, Japan.
Oztop, E., Wolpert, D., & Kawato, M. (2005). Mental state inference using
visual control parameters. Brain Research Cognitive Brain Research,
22(2), 129–151.
Rizzolatti, G. (1988). Functional organization of inferior area 6 in the macaque
monkey. II. Area F5 and the control of distal movements. Experimental
Brain Research, 71(3), 491–507.
Rizzolatti, G. (2005). The mirror neuron system and its function in humans.
Anatomy and Embryology (Berl), (1–3).
Rizzolatti, G., & Arbib, M. A. (1998). Language within our grasp. Trends in
Neurosciences, 21(5), 188–194.
Rizzolatti, G., & Craighero, L. (2004). The mirror-neuron system. Annual
Review of Neuroscience, 27, 169–192.
Rizzolatti, G., Fadiga, L., Gallese, V., & Fogassi, L. (1996). Premotor cortex
and the recognition of motor actions. Cognitive Brain Research, 3(2),
131–141.
Rizzolatti, G., Fogassi, L., & Gallese, V. (2001). Neurophysiological
mechanisms underlying the understanding and imitation of action. Nature
Reviews Neuroscience, 2(9), 661–670.
Schaal, S., Ijspeert, A., & Billard, A. (2003). Computational approaches to
motor learning by imitation. Philosophical Transactions of the Royal
Society of London. Series B, Biological Sciences, 358(1431), 537–547.
Schaal, S., Peters, J., Nakanishi, J., & Ijspeert, A. (2004). Learning movement
primitives. International symposium on robotics research, Ciena, Italy.
Skipper, J. I., Nusbaum, H. C., & Small, S. L. (2005). Listening to talking faces:
Motor cortical activation during speech perception. Neuroimage, 25(1),
76–89.
Tani, J., Ito, M., & Sugita, Y. (2004). Self-organization of distributedly
represented multiple behavior schemata in a mirror system: Reviews of
robot experiments using RNNPB. Neural Networks, 17(8–9), 1273–1289.
Tunik, E., Frey, S. H., & Grafton, S. T. (2005). Virtual lesions of the anterior
intraparietal area disrupt goal-dependent on-line adjustments of grasp.
Nature Neuroscience, 8(4), 505–511.
Umilta, M. A., Kohler, E., Gallese, V., Fogassi, L., Fadiga, L., Keysers, C.,
et al. (2001). I know what you are doing: A neurophysiological study.
Neuron, 31(1), 155–165.
Wicker, B., Keysers, C., Plailly, J., Royet, J. P., Gallese, V., & Rizzolatti, G.
(2003). Both of us disgusted in my insula: The common neural basis of
seeing and feeling disgust. Neuron, 40(3), 655–664.
Wohlschlager, A., Gattis, M., & Bekkering, H. (2003). Action generation and
action perception in imitation: An instance of the ideomotor principle.
Philosophical Transactions of the Royal Society of London. Series B,
Biological Sciences, 358(1431), 501–515.
Wolpert, D. M., Doya, K., & Kawato, M. (2003). A unifying computational
framework for motor control and social interaction. Philosophical
Transactions of the Royal Society of London. Series B, Biological Sciences,
358(1431), 593–602.
Wolpert, D. M., Ghahramani, Z., & Flanagan, J. R. (2001). Perspectives and
problems in motor learning. Trends in Cognitive Sciences, 5(11), 487–494.
Wolpert, D. M., & Kawato, M. (1998). Multiple paired forward and inverse
models for motor control. Neural Networks, 11(7–8), 1317–1329.
Wolpert, D. M., Miall, R. C., & Kawato, M. (1998). Internal models in the
cerebellum. Trends in Cognitive Sciences, 2(9), 338–347.
20442045
2046
2047
2048
2049
2050
2051
2052
Recommended