Predictability within a word: Evidence from Finnish compounds

Raymond Bertram and Jukka Hyönä, University of Turku

& Alexander Pollatsek

University of Massachusetts at Amherst

Presentation at ECEM13, 18.8.2005, 11.10-11.30, Bern, Switzerland

Predictability within a word

Exp. 0 (N = 6 Finns)

dooms...... => doomsday (4x)

ware...... => warehouse (6x)

Predictability across words

Balota, Pollatsek, Rayner (1985)

The doctor told Fred that drinking would damage his LIVER/HEART very quickly.

Assessment of predictability:

a. How well a word fits into the sentence (scale 1-5)

4.47 for highly predictable words

2.32 for less predictable words

b. Completion of the frame: The doctor told Fred that drinking would damage his ...

highly predictable words given 64 % of the time

less predictable words given < 1 % of the time

Predictability across words

Balota, Pollatsek, Rayner (1985)

The doctor told Fred that drinking would damage his LIVER/HEART very quickly.

Target                        FFD      GAZE     %2fix
Less predictable (heart)      225      264      .22
Highly predictable (liver)    216      232      .09
Predictability effect         +9 ms    +32 ms   .13
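The effect row is simply the less predictable mean minus the highly predictable mean for each measure. A minimal sketch of that arithmetic, using only the condition means from the slide (the dictionary layout and function name are illustrative assumptions):

```python
# Condition means as reported on the slide (Balota, Pollatsek & Rayner, 1985).
# FFD and GAZE are in ms; %2fix is the proportion of trials with a second fixation.
means = {
    "less predictable (heart)": {"FFD": 225, "GAZE": 264, "%2fix": 0.22},
    "highly predictable (liver)": {"FFD": 216, "GAZE": 232, "%2fix": 0.09},
}


def predictability_effect(means):
    """Return less-predictable minus highly-predictable, per measure."""
    low = means["less predictable (heart)"]
    high = means["highly predictable (liver)"]
    # Rounding only removes floating-point noise in the proportion.
    return {measure: round(low[measure] - high[measure], 2) for measure in low}


effect = predictability_effect(means)
print(effect)
```

A positive value means the less predictable target took longer (or drew a second fixation more often).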

Predictability across words: interim summary

highly predictable words are easier to process than less predictable words

predictability effects appear early in the eye movement record => first fixation duration (or %2fix)

=> predictability is an integral part of lexical access; it isn’t confined to post-lexical checking processes

Predictability within words

Focus of this study: can we find predictability effects within words as well? => Finnish compounds

For compound words, e.g. alttari/taulu ’altarpiece’ or aktivisti/liike ’activist movement’, given the first constituent ...

alttari ...

aktivisti ...

... how predictable is the second constituent?

aktivisti + liike, ryhmä, klubi

aktivisti has a small family


alttari + hartaus, kaappi, laite, seinä, taulu, vaate, huone, kehys, osa, poika, komero, rakenne, taulumaalari, liina, kuva, kokonaisuus, syvennys, kaide, rakennelma, maalaus

... whereas alttari has a large family

Predictability within words

=> the second constituent of alttari/taulu is less predictable than the second constituent of aktivisti/liike

=> Does the left constituent family size (our initial operationalization of 2nd constituent predictability) affect compound word processing?

For example: will liike in aktivisti/liike be processed faster than taulu in alttari/taulu?

Predictability within words

Earlier hints that family size affects compound word processing:

1. Hyönä, Bertram, Pollatsek (2004): Compound word study in which first constituent frequency was manipulated, while keeping 2nd constituent and whole-word frequency constant.

High Frequency (HF) 1st constituent: NEWS/PAPER vs. Low Frequency (LF) 1st constituent: STRAW/BERRY

(HF < LF: involvement of 1st constituent)

1a. The man saw the NEWSPAPER and picked it up from the rack.

1b. The man saw the STRAWBERRY and picked it up from the bush.

High Frequency (HF) 1st constituent: NEWS/PAPER

Low Frequency (LF) 1st constituent: STRAW/BERRY

Measure               HF     LF
FFD                   221    231 *
Gaze Constituent 1    243    285 *
Gaze Whole Word       555    622 *
Gaze Constituent 2    329    311 *


Same pattern of results found in Hyönä & Pollatsek (1998, 2000)

In sum, a clear positive frequency effect of the 1st constituent overall and in early stages, but a reverse frequency effect on the second constituent => Why?

Frequency as such may be a factor, but why would a second constituent (paper) be processed slowly when the first constituent (news) is of high frequency? (This goes against the foveal load hypothesis.)

It’s more likely that it is due to the wealth of possible compounds that can be formed with news!

straw + berry (+ man, flower)

news + paper (+ man, agency, cast, desk, magazine, release, etc.)
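Left constituent family size can be read as the number of distinct compounds that share a first constituent. A minimal sketch of that count, using a toy lexicon built only from the compounds named above (the function name is an illustrative assumption, and the "etc." means the real family of news is larger than this list):

```python
from collections import defaultdict

# Toy lexicon: (first constituent, second constituent) pairs from the slide.
compounds = [
    ("straw", "berry"), ("straw", "man"), ("straw", "flower"),
    ("news", "paper"), ("news", "man"), ("news", "agency"), ("news", "cast"),
    ("news", "desk"), ("news", "magazine"), ("news", "release"),
]


def left_family_sizes(compounds):
    """Count distinct second constituents for every first constituent."""
    families = defaultdict(set)
    for first, second in compounds:
        families[first].add(second)
    return {first: len(seconds) for first, seconds in families.items()}


sizes = left_family_sizes(compounds)
print(sizes)
```

On this toy list, straw has a family of 3 and news a family of 7, which illustrates why a high-frequency first constituent tends to constrain the second constituent less.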

Family size and 1st constituent frequency

[Figure: family size plotted against first constituent frequency]

Constraint hypothesis

the second constituent of compounds with low-frequency first constituents is more constrained than the second constituent of compounds with high-frequency first constituents

this will lead to faster processing of the second constituent when the 1st constituent is of low frequency

The family size experiment

Even though post-hoc analyses are suggestive, family size and first constituent frequency are confounded in the earlier-mentioned studies =>

Family size experiment: manipulating family size while controlling for first constituent frequency

And everything else ...

Lexical statistics

                        Large family size    Small family size
Example                 alttari/taulu        aktivisti/liike
N                       20                   20
FamSize                 71.5                 2.8
Freq. 1con              12.7                 13.8
Freq. 2con              160.5                168.5
Freq. ww                0.9                  0.8
Length ww               12.7                 12.8
Length 1con             7.3                  7.2
Average bigram freq.    7.4                  7.4

The family size experiment: method

Participants: N = 31 native Finns

Apparatus: EyeLink II

Materials: 8 items in practice session, 40 target items, 60 fillers. Matched target words in sentence frames that are the same up to target + 1:

Small family size: Ulla toivoi, että VIITTOMA/KIELI olisi kansalaisopiston seuraavan vuoden opinto-ohjelmassa. ‘Ulla hoped that sign language would be in next year’s community college curriculum.’

Large family size: Ulla toivoi, että ALTTARI/TAULU olisi ripustettu vähän korkeammalle, jotta se näkyisi takariviin asti. ‘Ulla hoped that the altarpiece would be hung a bit higher, so that it could be seen all the way to the back row.’

Procedure: Participants were asked to paraphrase every 5th sentence


Hypothesis 1: the right constituent of compounds whose left constituent has a Small Family Size (SFS compounds) will be processed faster/more efficiently than the right constituent of compounds whose left constituent has a Large Family Size (LFS compounds)

thus liike in aktivisti/liike will be processed faster than taulu in alttari/taulu


Measures one can consider to assess this hypothesis (fixation numbers refer to an illustrated fixation sequence on alttari/taulu):

second fixation duration (fix. 2): S < L

2nd constituent gaze duration (fix. 2+3): S < L

2nd constituent total reading time (fix. 2, 3, 5): S < L
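The three measures differ only in which fixations they sum. A minimal sketch, assuming a fixation record in which fixation 1 lands on the first constituent, fixations 2 and 3 on the second, fixation 4 regresses to the first, and fixation 5 returns to the second; the durations and function names are hypothetical and serve only to illustrate the definitions:

```python
# Hypothetical fixation record on one compound, in temporal order:
# (constituent fixated, duration in ms). All durations are invented.
fixations = [(1, 220), (2, 200), (2, 40), (1, 50), (2, 25)]


def second_fixation_duration(fixations):
    """Duration of the second fixation on the word (fix. 2)."""
    return fixations[1][1] if len(fixations) > 1 else None


def gaze_duration(fixations, constituent):
    """Sum of fixations from first entering the constituent until leaving it."""
    total, entered = 0, False
    for region, duration in fixations:
        if region == constituent:
            entered = True
            total += duration
        elif entered:
            break  # first pass on this constituent has ended
    return total


def total_reading_time(fixations, constituent):
    """Sum of all fixations on the constituent, including later returns."""
    return sum(duration for region, duration in fixations if region == constituent)


print(second_fixation_duration(fixations))  # fix. 2
print(gaze_duration(fixations, 2))          # fix. 2 + 3
print(total_reading_time(fixations, 2))     # fix. 2 + 3 + 5
```

With this record, gaze duration on the second constituent stops at the regression in fixation 4, whereas total reading time also includes fixation 5.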

Large Family Size (LFS): ALTTARI/TAULU

Small Family Size (SFS): AKTIVISTI/LIIKE

Measure                  LFS    SFS        p1     p2
2nd fix. duration        211    199 *      .03    .01
2nd constituent gaze     243    220 (*)    .02    .08
2nd constituent total    266    246 (*)    .05    .21

Conclusion

Stronger constraint of small family size leads to faster/more efficient processing of the second constituent

=> faster generation of the second constituent for aktivisti/liike than for alttari/taulu!


Hypothesis 2a: the first constituent will be processed equally fast in both conditions (aktivisti in aktivisti/liike = alttari in alttari/taulu)

Hypothesis 2b: SFS compounds will be processed faster than LFS compounds (aktivisti/liike < alttari/taulu)

Measure                  LFS     SFS        p1     p2
Nr. of fix. on 1st       1.61    1.70       .07    .24
1st constituent gaze     338     355        .10    .27
1st constituent total    379     412 (*)    .01    .13

Conclusion

There is a tendency to process the first constituent of LFS compounds faster than that of SFS compounds

=> No whole-word gaze duration effect. In the end, it takes about an equal amount of time to process SFS and LFS compounds

Gaze whole word: LFS 572, SFS 584 (p1, p2 > .20)

Discussion

Possible explanations for faster recognition of the first constituent in LFS compounds:

first phase of 1st constituent recognition (familiarization phase) faster for constituents with large families (see E-Z Reader)

easier to parse out first constituents with large families

General Conclusions

Predictability of the second constituent can be quantified by family size (whether this is the best operationalization remains to be seen).

Within-word predictability is similar to predictability effects across words: faster processing of the predictable constituent/word than the unpredictable one

1st constituent processing seems to benefit from large families

Whatever is predictable, at least the last slide is ... (if not formally then conceptually)

KIITOS!!! (Thank you!)


alttari + hartaus, kaappi, laite, seinä, taulu, vaate, huone, kehys, osa, poika, komero, rakenne, taulumaalari, liina, kuva, kokonaisuus, syvennys, kaide, rakennelma, maalaus

alttaritaulu: freq = 60

20 times alttariXXXX with frequency 1

Probability of encountering alttaritaulu given alttari is 75%
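The 75% figure follows from the frequencies above: 60 occurrences of alttaritaulu out of 60 + 20 × 1 occurrences of any alttari compound. A minimal sketch of that token-based conditional probability (the placeholder names for the 20 other compounds and the function name are illustrative assumptions):

```python
# Frequencies as stated on the slide: alttaritaulu occurs 60 times and
# 20 other alttari- compounds occur once each. The "other_N" keys are
# placeholders, not actual Finnish words.
family_freqs = {"taulu": 60}
family_freqs.update({f"other_{i}": 1 for i in range(1, 21)})


def second_constituent_probability(family_freqs, second):
    """P(second constituent | first constituent), weighted by token frequency."""
    return family_freqs[second] / sum(family_freqs.values())


p = second_constituent_probability(family_freqs, "taulu")
print(p)  # 60 / 80
```

This shows how a first constituent can have a large family by type count while one second constituent still dominates by token frequency, which is one reason family size may not be the best operationalization of predictability.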

The family size experiment

Measures one can consider to assess this hypothesis (fixation numbers refer to an illustrated fixation sequence on alttari/taulu):

second fixation duration (fix. 2): S < L

2nd constituent gaze duration (fix. 2+3): S < L

2nd constituent total reading time (fix. 2, 3, 5): S < L

skipping rate of 2nd constituent: S > L

number of fixations on second constituent: S < L

The family size experiment: results

Hypothesis 2a: the first constituent will be processed equally fast in both conditions (aktivisti in aktivisti/liike = alttari in alttari/taulu)

Measure                              LFS     SFS     Diff. (LFS - SFS)    p1         p2
TFL (3rd fixation location)          8.70    7.51    1.19                 <.001 *    .02 *
% regressions to 1st constituent     0.17    0.23    -0.06                <.01 *     >.20
Regressive fixation time             41      58      -17                  <.01 *     >.20
Gaze duration constituent 1          338     355     -17                  .11        >.20
Total reading time constituent 1     379     412     -33                  <.01 *     .13
Nr. of fixations on constituent 1    1.61    1.70    -0.09                .07        .24


Measure               LFS     SFS     p1     p2
Skipping rate 2nd     .236    .268    .12    .51
Nr. of fix. on 2nd    1.18    1.15    .16    .47

Measure              LFS     SFS        p1      p2
Regr. to C1          .17     .23 (*)    .01     .21
Regr. fix. time      41      58 (*)     .01     .21
3rd fix. location    8.70    7.51 *     .001    .02

Discussion

Possible explanations for faster recognition of the first constituent in LFS compounds:

first phase of 1st constituent recognition (familiarization phase) faster for constituents with large families (see E-Z Reader)

more global activation for first constituents with many family members (cf. VLD studies by Baayen et al. for simplex words)

easier to parse out constituents with large families

Meaning integration of the 2 constituents sometimes easier for LFS compounds (reflected in regression data).
