
Page 1:

Cognitive Neuroscience Robotics

Exploring Infants' Social Cognitive Development through Robotics

Yukie Nagai, Graduate School of Engineering, Osaka University

Public lecture of "Cognitive Interaction Design," December 5, 2014, Kyoto Institute of Technology

Page 2:

Development of joint attention [Nagai et al., 2003; 2006; Nagai, 2005]

Imitation based on mirror neuron system [Nagai et al., 2011; Kawai et al., 2012]

Saliency-based attention for action learning [Nagai et al., 2008; 2010]

Infant-caregiver interaction [Nagai et al., 2012]

Page 3:

Cognitive Developmental Robotics [Asada et al., 2001; 2009; Lungarella et al., 2003]

•  Exploring the principles of cognitive development by designing robots that develop and learn like human infants.
–  Bridging neuroscience and developmental psychology

–  From an analytical approach to a constructive approach

Page 4:

Cognitive development

What principle gives rise to such diverse cognitive functions?

Language use, imitation, self-other cognition, theory of mind, joint attention, goal-directed actions, intention understanding, altruistic behavior

Developmental principle (innate function)?

Page 5:

Key to cognitive development: predictive learning of sensorimotor information

•  Predictability (contingency) $P_i$ = the influence that one's own action $a(t)$ has on the change of the sensory state $s_i(t) \rightarrow s_i(t+1)$ for an event $i$:

$$P_i(s_i, a) = p\bigl(s_i(t+1) \mid s_i(t), a(t)\bigr) \qquad (t: \text{time}) \qquad \text{[Nagai, in press]}$$

1. Discovery and control of the self: $P_{self} \approx 1.0$

2. Self-other cognition: $P_{self} > P_{other} > 0.0$

3. Interaction with others: $P_{other_0}, \ldots, P_{other_m} \approx 0.9$ and $P_{other_{m+1}}, \ldots, P_{other_{n-1}} \approx 0.0$, with $\bar{P}_{other} = \frac{1}{n}\sum_{n} P_{other_n}$
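To make the definition of predictability concrete, here is a minimal Python sketch, not taken from the original work, that estimates $P_i$ from a discretized sensorimotor sequence; the function name, the frequency-counting estimator, and the toy self/other signals are illustrative assumptions.

```python
import numpy as np
from collections import defaultdict

def estimate_predictability(s, a):
    """Estimate how well s(t+1) can be predicted from (s(t), a(t)) in a
    discrete sensorimotor sequence, averaged over observed contexts.

    s : 1-D array of discretized sensory states s_i(t)
    a : 1-D array of discretized actions a(t), same length as s
    """
    counts = defaultdict(lambda: defaultdict(int))   # (s_t, a_t) -> {s_{t+1}: n}
    for t in range(len(s) - 1):
        counts[(s[t], a[t])][s[t + 1]] += 1

    # Predictability of each context = probability of its most likely outcome.
    per_context = []
    for outcomes in counts.values():
        total = sum(outcomes.values())
        per_context.append(max(outcomes.values()) / total)
    return float(np.mean(per_context))

# Toy example: self-generated motion is perfectly contingent on the action,
# whereas another agent's motion is only weakly contingent.
rng = np.random.default_rng(0)
actions = rng.integers(0, 2, size=5000)
s_self  = np.roll(actions, 1)                       # follows own action
s_other = np.where(rng.random(5000) < 0.3, np.roll(actions, 1),
                   rng.integers(0, 2, size=5000))   # mostly independent

print("P_self  ~", estimate_predictability(s_self, actions))   # about 1.0
print("P_other ~", estimate_predictability(s_other, actions))  # clearly lower
```

In this toy setting the self-generated signal yields a predictability near 1.0 and the other-generated signal a clearly lower value, matching the ordering $P_{self} > P_{other}$ above.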

Page 6:

Today's topics

1.  Models of infant cognitive development based on predictive learning

–  Self-other cognition

–  Goal-directed actions

–  Joint attention

2.  Mechanisms of autism spectrum disorder
–  Social impairments caused by atypical sensitivity to prediction errors

Page 7:

Development of infants' self-other cognition

(Adapted from “The Baby Human 2” Discovery Channel)

Page 8:

Development of infants' self-other cognition

[Rochat & Morgan, 1995]

[Bahrick & Watson, 1985]

[Rochat & Striano, 2002]

Page 9:

Hypothesis: self-other cognition based on predictability [Nagai et al., ICDL-EpiRob 2011]

•  Self-other discrimination based on the spatiotemporal predictability of sensorimotor information
–  Self = high predictability, other = low predictability

•  Emergence of the mirror neuron system through perceptual refinement
–  Early: undifferentiated self and other → later: discrimination of self and other

[Figure: occurrence probability plotted over temporal and spatial predictability; the self concentrates at high predictability, the other at lower predictability]

Page 10:

[Embedded excerpt from G. Rizzolatti et al., Cognitive Brain Research 3 (1996) 131-141, pp. 132-133: behavioral paradigm and Fig. 2, "Visual and motor responses of a mirror neuron" — the neuron discharges when the experimenter grasps food with his hand (A) and when the monkey itself grasps in darkness (C), but not during observation of grasping with a tool (B).]

Mirror neuron system (MNS)

•  A population of neurons discovered in monkey area F5 (ventral premotor cortex) that respond like a "mirror" [Rizzolatti et al., 1996]

•  Firing conditions of the MNS
–  When executing an action oneself
–  When observing another individual performing the same action

•  Roles of the MNS
–  Understanding others' actions
–  Imitation
–  etc.

[Embedded excerpt from G. Rizzolatti et al., Cognitive Brain Research 3 (1996) 131-141, pp. 131-132: introduction and recording methods, and Fig. 1, "Lateral view of the monkey brain," showing the anatomical localization of the recorded neurons in frontal agranular cortex (area F5).]

Page 11:

Hypothesis: self-other cognition based on predictability [Nagai et al., ICDL-EpiRob 2011]

•  Self-other discrimination based on the spatiotemporal predictability of sensorimotor information
–  Self = high predictability, other = low predictability

•  Emergence of the mirror neuron system through perceptual refinement
–  Early: undifferentiated self and other → later: discrimination of self and other

[Figure: (1) immature perception → undifferentiated self and other; (2) intermediate stage; (3) mature perception → discrimination of self and other, shown as distributions of occurrence probability over temporal and spatial predictability]

Page 12:

Emergence mechanism of the MNS: "early" development

[Model diagram: visual input and motor output are associated through a still undifferentiated representation of self and other]

[Nagai et al., ICDL-EpiRob 2011]

Page 13:

Emergence mechanism of the MNS: "late" development

[Model diagram: visual input and motor output are now linked to separate representations of self-movement and other-movement, connected by the MNS]

[Nagai et al., ICDL-EpiRob 2011]

Page 14:

Result 1: differentiation of self and other through visual refinement

[Result figure: as vision develops, an initially undifferentiated self-other representation separates into a motor representation of the self and a motor representation of the other]

[Nagai et al., ICDL-EpiRob 2011]

Page 15:

Result 2: MNS acquired in the sensorimotor map

[Result figure: motor representations of the self and the other and the self's motor commands in the acquired sensorimotor map, (a) with visual development and (b) without visual development; the label "weak MNS" marks one of the conditions]

[Nagai et al., ICDL-EpiRob 2011]

Page 16:

Development of infants' goal-directed actions

[Carpenter et al., 2005]

[Bekkering et al., 2000]

[Embedded excerpts from Bekkering, Wohlschläger, & Gattis (2000), pp. 156-158, including Fig. 1, "An illustration of the six hand movements used as target actions in Experiment 1," and Fig. 2, "Percentage of errors for the different conditions in Experiments 1-3": children imitated ipsilateral ear-touching movements largely correctly but frequently replaced contralateral movements with ipsilateral ones, suggesting that they reproduce the goal of the modelled action rather than its exact movement.]

[Embedded excerpts from Carpenter, Call, & Tomasello (2005), pp. F15-F17, including Figs. 1-5: 12- and 18-month-olds who watched an adult make a toy mouse hop or slide to a location copied the action style mainly when no house marked the end location, but copied the end location itself when houses were present, indicating that infants preferentially reproduce the goal of an observed action.]

Goal-match Style-match

Page 17:

Hypothesis: acquisition of goal-directed actions through predictive learning [Park, Kim, & Nagai, SAB 2014]

•  Through predictive learning of sensorimotor information, the goal of an action is acquired before its path
–  Action = goal + path
–  Predictive learning = minimization of prediction error

•  Goal: large time scale → large prediction error → learned early

•  Path: small time scale → small prediction error → learned late

[Figure: a reaching movement from an initial posture (Init) along a path (Path) to a goal (Goal)]

Page 18:

Action learning model using RNNPB

•  RNNPB = Recurrent Neural Network with Parametric Bias [Tani & Ito, 2003]

–  Multiple time series are represented by a single RNN, each identified by its PB values
–  Trained with back-propagation through time

[Network diagram: the sensorimotor input S(t), A(t), the context units C(t), and the PB units feed the RNN, which outputs the prediction S(t+1), A(t+1) and the next context C(t+1); the PB space (PB1, PB2) is unorganized before learning and becomes structured after learning]

[Park, Kim, & Nagai, SAB 2014]
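As a rough illustration of the mechanism above, the following is a minimal sketch of an RNN with Parametric Bias trained by back-propagation through time; it is not the authors' implementation, and the layer sizes, the toy sine-wave trajectories, and the training loop are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RNNPB(nn.Module):
    """Minimal RNN with Parametric Bias: one network represents several
    sensorimotor sequences, each identified by its own learnable PB vector."""
    def __init__(self, io_dim, hidden_dim, pb_dim, n_sequences):
        super().__init__()
        self.rnn = nn.RNNCell(io_dim + pb_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, io_dim)
        # One PB vector per training sequence, optimized jointly with the weights.
        self.pb = nn.Parameter(torch.zeros(n_sequences, pb_dim))

    def forward(self, seq, seq_id):
        # seq: (T, io_dim) sensorimotor sequence; predict each next step.
        h = torch.zeros(1, self.rnn.hidden_size)
        pb = self.pb[seq_id].unsqueeze(0)
        preds = []
        for t in range(seq.shape[0] - 1):
            h = self.rnn(torch.cat([seq[t:t + 1], pb], dim=1), h)
            preds.append(self.out(h))           # prediction of seq[t+1]
        return torch.cat(preds, dim=0)

# Two toy trajectories stand in for the demonstrated reaching movements.
sequences = [amp * torch.sin(torch.linspace(0.0, 6.28, 50)).unsqueeze(1)
             for amp in (1.0, 0.5)]
model = RNNPB(io_dim=1, hidden_dim=20, pb_dim=2, n_sequences=len(sequences))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

# Back-propagation through time: minimize the one-step prediction error over
# all sequences; both the shared weights and the PB vectors are updated.
for epoch in range(200):
    loss = sum(nn.functional.mse_loss(model(seq, i), seq[1:])
               for i, seq in enumerate(sequences))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(model.pb.detach())  # the two sequences should end up with distinct PB values
```

After training, an action can be generated by fixing a PB vector and running the network closed-loop on its own predictions, so selecting a point in PB space amounts to selecting which learned action to reproduce.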

Page 19:

Goal-directed actions to be learned, reproducing [Carpenter et al., 2005]

•  Reaching with a two-link arm robot (6 action types)
–  2 goals: A, B
–  3 paths: straight line, sine wave ×1, sine wave ×2

•  Demonstration through kinesthetic teaching [Park, Kim, & Nagai, SAB 2014]

Page 20:

Result 1: developmental change of the PB values and the acquired actions (t = 0)

[Park, Kim, & Nagai, SAB 2014]

Legend: desired vs. acquired actions; A, B: goals; 0, 1, 2: paths

Page 21:

Result 1: developmental change of the PB values and the acquired actions (t = 1,000)

[Park, Kim, & Nagai, SAB 2014]

Legend: desired vs. acquired actions; A, B: goals; 0, 1, 2: paths

Page 22:

Result 1: developmental change of the PB values and the acquired actions (t = 2,000)

[Park, Kim, & Nagai, SAB 2014]

Legend: desired vs. acquired actions; A, B: goals; 0, 1, 2: paths

Page 23:

Result 1: developmental change of the PB values and the acquired actions (t = 10,000)

[Park, Kim, & Nagai, SAB 2014]

Legend: desired vs. acquired actions; A, B: goals; 0, 1, 2: paths

Page 24:

Result 1: developmental change of the PB values and the acquired actions (t = 15,000)

[Park, Kim, & Nagai, SAB 2014]

Legend: desired vs. acquired actions; A, B: goals; 0, 1, 2: paths

Page 25:

Result 1: developmental change of the PB values and the acquired actions (t = 100,000)

[Park, Kim, & Nagai, SAB 2014]

Legend: desired vs. acquired actions; A, B: goals; 0, 1, 2: paths

Page 26:

Result 1: developmental change of the PB values and the acquired actions (t = 200,000)

[Park, Kim, & Nagai, SAB 2014]

Legend: desired vs. acquired actions; A, B: goals; 0, 1, 2: paths

Page 27:

Result 2: action generation by the robot

•  Middle of learning (t = 3,500) → only the goal is reproduced

•  Late in learning (t = 200,000) → both the goal and the path are reproduced

[Figure: action B1 generated at t = 3,500 and at t = 200,000; legend: desired action] [Park, Kim, & Nagai, SAB 2014]

Page 28:

Development of infants' joint attention

(Adapted from “The Baby Human 2” Discovery Channel)

Page 29:

Hypothesis: development of joint attention through saliency-based gaze and predictive learning

[Nagai et al., Conn. Sci. 2003]

•  Saliency-based gaze: detection of the visual stimuli relevant to joint attention (e.g., faces, hands, objects)

•  Predictive learning of sensorimotor information: acquisition of the correlations between sensory and motor information through the experience of saliency-based gaze

[Nagai & Rohlfing, TAMD 2009]

Page 30:

Developmental model of joint attention

[Model diagram: from the camera image I(t) and the gaze direction θ(t), a saliency-based gaze module outputs a gaze shift Δθ_sal(t+1) and a predictive-learning module outputs Δθ_pre(t+1); a gate whose balance shifts with learning time selects between them to produce the motor output, i.e., the gaze shift Δθ(t+1)]

[Nagai et al., Conn. Sci. 2003]
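The gating idea can be sketched as follows in Python; this is only a schematic stand-in for the model in [Nagai et al., Conn. Sci. 2003], and the brightest-pixel "saliency," the linear gating schedule, and the predictor interface are illustrative assumptions.

```python
import numpy as np

def joint_attention_step(image, theta, predictor, step, total_steps):
    """One gaze update of a (hypothetical) joint-attention learner: blend a
    saliency-driven gaze shift with a learned prediction, where the gate
    shifts from saliency toward prediction as learning proceeds."""
    # Saliency-based gaze: look at the most conspicuous image location
    # (here simply the brightest pixel, standing in for a saliency map).
    salient = np.unravel_index(np.argmax(image), image.shape)
    d_theta_sal = np.array(salient, dtype=float) - theta

    # Prediction-based gaze: output of the learned sensorimotor model.
    d_theta_pre = predictor(image, theta)

    # Gate: the weight on the learned prediction grows with learning time.
    w = step / total_steps
    d_theta = (1.0 - w) * d_theta_sal + w * d_theta_pre
    return theta + d_theta

# Usage with a dummy predictor that always suggests a small shift up and left.
dummy_predictor = lambda image, theta: np.array([-1.0, -1.0])
frame = np.random.default_rng(0).random((24, 32))     # fake camera image I(t)
theta = np.array([12.0, 20.0])                        # current gaze direction
print(joint_attention_step(frame, theta, dummy_predictor, step=100, total_steps=1000))
```

Early on the robot mostly follows salient stimuli such as the caregiver's face and nearby objects, which is exactly the experience the predictor is trained on; late in learning the gate hands control over to the learned prediction, so the gaze shift comes to follow the caregiver's gaze, i.e., joint attention.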

Page 31:

Result 1: learning curves with multiple objects in the scene

[Nagai et al., Conn. Sci. 2003]

Page 32:

Result 2: generation of joint attention

[Nagai et al., Conn. Sci. 2003]

Page 33:

Today's topics

1.  Models of infant cognitive development based on predictive learning

–  Self-other cognition

–  Goal-directed actions

–  Joint attention

2.  Mechanisms of autism spectrum disorder
–  Social impairments caused by atypical sensitivity to prediction errors

Page 34:

Autism spectrum disorder (ASD)

•  Conventional view: a disorder of sociality
•  New hypothesis: difficulty in integrating sensorimotor information

[Happé & Frith, 2006; Ayaya & Kumagaya, 2008]

[Behrmann et al., 2006]

[Embedded excerpt from M. Behrmann et al., Neuropsychologia 44 (2006) 110-129, p. 116, including Fig. 2: global/local identification of compound letter stimuli; control participants showed a global advantage, whereas autistic participants were faster at local identification and showed strong local-to-global interference, indicating a local processing bias.]

[Illustration from Ayaya & Kumagaya, 2008: an array of bodily and emotional states — chilly, irritated, sad, moving stomach, headache, non-focused, powerless, pain, itchy, hungry]

Page 35:

Hypothesis: a mechanism of ASD based on predictive learning

•  Atypical sensitivity to the prediction error of sensorimotor information shapes internal models that differ from those of typically developing people.
➔ Impairments of social communication [Ayaya & Kumagaya, 2008; Nagai, in press]

[Figure: fitting an internal model to sensorimotor information — typically developing people have a proper tolerance for prediction error, whereas people with ASD have a smaller or larger tolerance]
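The effect of such a tolerance can be illustrated with a toy model, which is purely hypothetical and not the model from [Nagai, in press]: a running estimate of a sensory signal that is updated only when the prediction error exceeds a tolerance.

```python
import numpy as np

def track_signal(observations, tolerance, lr=0.2):
    """Track a 1-D sensory signal with a simple internal model (a running
    estimate) that is updated only when the prediction error exceeds
    `tolerance`. Returns the sequence of model predictions."""
    estimate, history = 0.0, []
    for obs in observations:
        error = obs - estimate
        if abs(error) > tolerance:      # only "surprising" input drives learning
            estimate += lr * error
        history.append(estimate)
    return np.array(history)

rng = np.random.default_rng(0)
true_signal = np.concatenate([np.zeros(100), np.ones(100)])   # the world changes at t = 100
obs = true_signal + 0.3 * rng.standard_normal(200)            # noisy sensory input

for tol in (0.0, 0.5, 2.0):            # hypersensitive / proper / insensitive
    pred = track_signal(obs, tol)
    jitter = np.abs(np.diff(pred[:100])).mean()    # reacts to noise before the change
    lag_error = np.abs(pred[100:] - 1.0).mean()    # adapts to the change afterwards
    print(f"tolerance={tol:.1f}  noise-driven jitter={jitter:.3f}  post-change error={lag_error:.3f}")
```

A hypersensitive learner (tolerance 0.0) lets every noisy sample perturb its model, an insensitive one (tolerance 2.0) never revises the model when the world actually changes, and only an intermediate tolerance yields an internal model that is both stable and adaptive.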

Page 36:

ASD sensory experience simulator

•  Reproduces the atypical perceptual characteristics of ASD on a head-mounted display
–  Visual and auditory hypersensitivity, impaired binaural integration, etc.

•  Quantitatively evaluates the perceptual characteristics of ASD

•  Aims to clarify their causal relation to social impairments

[Device: head-mounted display with a camera, a microphone, and earphones]

[Qin, Nagai, Kumagaya, Ayaya, & Asada, ICDL-EpiRob 2014]

Page 37:

Example of visual hypersensitivity reproduced on the HMD (visual noise proportional to sound intensity)
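A minimal sketch of this kind of effect is shown below; it is not the implementation of [Qin, Nagai, Kumagaya, Ayaya, & Asada, ICDL-EpiRob 2014], and the white-noise overlay, the RMS loudness measure, and the gain are illustrative assumptions.

```python
import numpy as np

def add_audio_driven_noise(frame, audio_chunk, gain=2.0, rng=None):
    """Overlay white noise on a video frame with amplitude proportional to the
    loudness (RMS) of the concurrent audio chunk, roughly mimicking visual
    noise that grows with sound intensity.

    frame       : HxWx3 uint8 image from the HMD camera
    audio_chunk : 1-D float array of audio samples in [-1, 1] from the microphone
    """
    rng = rng or np.random.default_rng()
    loudness = np.sqrt(np.mean(audio_chunk ** 2))        # RMS sound intensity
    noise = rng.standard_normal(frame.shape) * 255.0 * gain * loudness
    noisy = np.clip(frame.astype(float) + noise, 0, 255)
    return noisy.astype(np.uint8)

# Usage on a dummy frame with a quiet and a loud audio chunk.
rng = np.random.default_rng(0)
frame = np.full((120, 160, 3), 128, dtype=np.uint8)
quiet = 0.01 * rng.standard_normal(1024)
loud = 0.5 * rng.standard_normal(1024)
print(add_audio_driven_noise(frame, quiet, rng=rng).std(),
      add_audio_driven_noise(frame, loud, rng=rng).std())   # noise grows with loudness
```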

Page 38:

Visual experiences of people with ASD

[Simulated views of ASD visual experience, labeled "× 3 persons," "× 2 persons," and "× 1 person"]

Andy Warhol (1928-1987)

Page 39:

Cognitive Neuroscience Robotics

Summary

Page 40:

Key to cognitive development: predictive learning of sensorimotor information

•  Predictability (contingency) $P_i$ = the influence that one's own action $a(t)$ has on the change of the sensory state $s_i(t) \rightarrow s_i(t+1)$ for an event $i$:

$$P_i(s_i, a) = p\bigl(s_i(t+1) \mid s_i(t), a(t)\bigr) \qquad (t: \text{time}) \qquad \text{[Nagai, in press]}$$

1. Discovery and control of the self: $P_{self} \approx 1.0$

2. Self-other cognition: $P_{self} > P_{other} > 0.0$

3. Interaction with others: $P_{other_0}, \ldots, P_{other_m} \approx 0.9$ and $P_{other_{m+1}}, \ldots, P_{other_{n-1}} \approx 0.0$, with $\bar{P}_{other} = \frac{1}{n}\sum_{n} P_{other_n}$

Page 41:

Predictive learning as a basic function of the brain

[Embedded paper: K. Friston & S. Kiebel, "Predictive coding under the free-energy principle," Phil. Trans. R. Soc. B 364, 1211-1221 (2009). Perception is cast as the inversion of hierarchical dynamical generative models of the sensorium, of the form $y = g(x, v, \theta) + z$, $\dot{x} = f(x, v, \theta) + w$, under a free-energy bound on the model evidence; the brain is argued to have the anatomical and physiological infrastructure to implement this inversion.]

[Embedded paper: J. M. Kilner, K. J. Friston, & C. D. Frith, "Predictive coding: an account of the mirror neuron system," Cognitive Processing 8, 159-166 (2007). The mirror neuron system is interpreted within a predictive-coding framework based on empirical Bayes: the most likely cause of an observed action is inferred by minimizing prediction error at all levels of the cortical hierarchy engaged during action observation.]

[Embedded excerpt: K. Friston, International Journal of Psychophysiology 83 (2012) 248-252, including Fig. 1: neuronal architectures for predictive coding and active inference, in which descending predictions suppress prediction errors and proprioceptive prediction errors drive classical reflex arcs.]
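For readers unfamiliar with predictive coding, here is a minimal single-level sketch in the spirit of the schemes cited above; it is not Friston's formulation, and the toy generative model, precisions, and learning rate are illustrative assumptions.

```python
import numpy as np

def predictive_coding_step(mu, sensory, g, g_prime, prior_mu, pi_s=1.0, pi_p=1.0, lr=0.05):
    """One gradient step of single-level predictive coding: the latent estimate
    mu is updated to reduce the precision-weighted prediction errors on the
    sensory input and on the prior."""
    eps_s = sensory - g(mu)          # sensory prediction error
    eps_p = mu - prior_mu            # prior prediction error
    d_mu = pi_s * g_prime(mu) * eps_s - pi_p * eps_p
    return mu + lr * d_mu

# Toy generative model: sensation = mu**2 plus a little noise.
g, g_prime = (lambda m: m ** 2), (lambda m: 2 * m)
sensory = 4.0 + 0.1 * np.random.default_rng(0).standard_normal()
mu = 1.0
for _ in range(200):
    mu = predictive_coding_step(mu, sensory, g, g_prime, prior_mu=1.5)
print(mu)   # settles near 2, the cause that best explains the sensation under the prior
```

The estimate settles where the precision-weighted sensory and prior prediction errors balance, the same error-minimization principle that the models in this talk apply to sensorimotor learning.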

Page 42:

Future issues

•  How far can predictive learning alone explain cognitive development?

•  How can internal representations beyond sensorimotor information be acquired?

Language use, imitation, self-other cognition, theory of mind, joint attention, goal-directed actions, intention understanding, altruistic behavior

Predictive learning + ?

Page 43:

Thank you!

Osaka University
•  Prof. Minoru Asada
•  Prof. Koh Hosoda
•  Researchers and students of the Asada Laboratory

Kyoto University
•  Prof. Masako Myowa

The University of Tokyo
•  Project Lecturer Shin-ichiro Kumagaya
•  Project Researcher Satsuki Ayaya

Bielefeld University
•  Dr. Katharina Rohlfing

KAIST
•  Jun-Cheol Park

[email protected] http://cnr.ams.eng.osaka-u.a.jp/~yukie/