Language change in individuals and populations

Richard A Blythe

with Gareth Baxter, Bill Croft, Alistair Jones, Simon Kirby, Alan McKane, Kenny Smith and Kevin Stadler

CANES Seminar – Nov 25th 2015

Large populations, long timescales

Dunn et al 2011; Maurits & Griffiths 2014

…

State = feature, e.g. SVO vs OVS

see also Reali & Griffiths 2009, Smith & Wonnacott 2010, …

Small populations, short timescales

Input Output

                Noun-Adj     Adj-Noun
Noun-Num        443 (52%)    32 (4%)
Num-Noun        149 (17%)    227 (27%)

Culbertson et al 2012 reporting data from WALS 2008

What happens in between?

572 LANGUAGE, VOLUME 83, NUMBER 3 (2007)

matic shift between 1971 and 1984. The trajectories of these nine individuals are designated by arrows connecting their 1971 with their 1984 data points.

Figure 3. Individual percentages of [R]/([R] + [r]) for the 32 panel speakers for 1971 and 1984. Trajectories plotted for all speakers who showed a significant difference between the two years.

At the bottom of the graph, we see two groups of the 1971 speakers in the ellipses: a tight cluster of seven individuals under age thirty, none of whom uses more than 10% [R], and another group of five speakers between the ages of thirty-five and fifty, and who range between 0% and 17%. Ten of these twelve people form the stable group of Table 11 in their use of the conservative form. Their 1984 values are displayed in the two dotted ellipses to the right of each of the 1971 groups. Two of them, however, Lysiane B. (007) and Alain L. (104), make substantial changes, abandoning the virtually categorical conservative pattern they displayed in 1971. Along with the other mid-range speakers, their data appears individually in Table 12. Lysiane and Alain are among the nine speakers whose trajectories appear in Fig. 3.

A majority of mid-range speakers in 1971 (seven out of ten) had moved to the categorical or near-categorical use of innovative [R] by 1984. They are people we call 'later adopters' of the innovative variant. Five of the seven are young; five of the seven are male. As a group, we would characterize their behavior as catching up with their peers, the early adopters. The next three people listed in Table 12 are those mid-range speakers who were stable, somewhat older, behaving more as we had expected older individuals to behave. Of note is that one of them, Andre L. (065), a professional actor age twenty-seven when we met him in 1971, was the only speaker who exhibited stylistic variation (Sankoff & Blondeau 2008). In 1995 he maintained virtually the same overall level of [R] as he had in 1984: 69%. At the bottom of the table are the two individuals who moved from virtually categorical use of [r] to dominant use of [R], occupying the upper part of the variable range. Lysiane B. (007) is a case of exceptional upward social mobility, a twenty-four-year-old factory worker when we met her in 1971, a businesswoman in 1984, and when last interviewed in 1995, a successful realtor.


Montreal French /r/: [R] vs [r] (Sankoff & Blondeau 2007)

Language learning and use in populations: the Wright-Fisher-Moran-Iterated-Learning paradigm

[Figure: a population of agents, each in a state (0, 1 or 3), at generations t and t+1; arrows labelled p00, p01 and q3 give acquisition probabilities]

Fisher 1930, Wright 1931, Moran 1958, Boyd & Richerson 1985, Smith 2009, …

Let $\lambda_i$ be the probability that each offspring acquires state $i$, given the state of the parent population.

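If acquisition events are independent, the numbers $n_i$ of offspring in each state at time $t+1$ follow a multinomial distribution:

$$P(n_0, n_1, \ldots, n_k; t+1) = \frac{N!}{n_0!\, n_1! \cdots n_k!}\, \lambda_0^{n_0} \lambda_1^{n_1} \cdots \lambda_k^{n_k}$$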

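With $x_i = n_i/N$, and $\Delta x_i$ the change in $x_i$ over one generation:

$$\langle \Delta x_i \rangle = \lambda_i - x_i, \qquad \langle \Delta x_i \Delta x_j \rangle = \frac{\lambda_i(\delta_{i,j} - \lambda_j)}{N} + (\lambda_i - x_i)(\lambda_j - x_j)$$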

The only consistent continuous-time limit is if $\lambda_i - x_i \sim 1/N$.
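The two moment formulas can be checked numerically. The sketch below draws many independent one-generation multinomial updates and compares the sample moments of $\Delta x_i$ with the expressions above. The values of `x`, `lam` and `N` are hypothetical, and `lam` is held fixed rather than computed from the parent population, which is all the one-step formulas require.

```python
import random

def one_generation(lam, n_agents, rng):
    """Each of n_agents offspring independently acquires state i with probability lam[i]
    (a multinomial draw). Returns the new frequencies x_i = n_i / N."""
    states = rng.choices(range(len(lam)), weights=lam, k=n_agents)
    counts = [0] * len(lam)
    for s in states:
        counts[s] += 1
    return [c / n_agents for c in counts]

def empirical_moments(x, lam, n_agents, samples=20000, seed=0):
    """Monte Carlo estimates of <dx_i> and <dx_i dx_j> over one generation."""
    rng = random.Random(seed)
    k = len(x)
    mean = [0.0] * k
    second = [[0.0] * k for _ in range(k)]
    for _ in range(samples):
        new = one_generation(lam, n_agents, rng)
        dx = [new[i] - x[i] for i in range(k)]
        for i in range(k):
            mean[i] += dx[i] / samples
            for j in range(k):
                second[i][j] += dx[i] * dx[j] / samples
    return mean, second

def predicted_moments(x, lam, n_agents):
    """Analytic moments: <dx_i> = lam_i - x_i and
    <dx_i dx_j> = lam_i (delta_ij - lam_j) / N + (lam_i - x_i)(lam_j - x_j)."""
    k = len(x)
    mean = [lam[i] - x[i] for i in range(k)]
    second = [[lam[i] * ((1.0 if i == j else 0.0) - lam[j]) / n_agents
               + (lam[i] - x[i]) * (lam[j] - x[j])
               for j in range(k)] for i in range(k)]
    return mean, second

# Hypothetical example: current frequencies x and acquisition probabilities lam.
x = [0.3, 0.4, 0.3]
lam = [0.2, 0.5, 0.3]
N = 50
emp_mean, emp_second = empirical_moments(x, lam, N)
th_mean, th_second = predicted_moments(x, lam, N)
for i in range(len(x)):
    print(f"<dx_{i}>: simulated {emp_mean[i]:+.4f}, predicted {th_mean[i]:+.4f}")
```

The $1/N$ term is the sampling noise from a finite population; the second term is the squared deterministic drift, which is why a sensible continuous-time limit needs the drift itself to be of order $1/N$.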


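$$\frac{\partial}{\partial t} P(\vec{x}, t) = \sum_n a_n \hat{D}_n P(\vec{x}, t) + \frac{1}{2N} \sum_{i,j} \frac{\partial^2}{\partial x_i \partial x_j}\, x_i(\delta_{i,j} - x_j)\, P(\vec{x}, t)$$

The population size $N$ appears only in the prefactor $1/(2N)$ of the second term.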

[Figure: variant frequency, x, against time, t]

For $\lambda_i = x_i$:

$$\frac{\partial}{\partial t} P(x, t) = \frac{1}{N} \frac{\partial^2}{\partial x^2}\, x(1 - x)\, P(x, t)$$

The time until loss (fixation) increases with N

Agents learn from a randomly-chosen parent
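A minimal simulation sketch of this neutral case, assuming discrete, non-overlapping generations (a Wright-Fisher update): each agent copies a randomly chosen member of the previous generation, so it adopts variant 1 with probability equal to that variant's current frequency. The function names, the starting frequency of 0.5 and the population sizes are illustrative choices, not taken from the talk.

```python
import random
import statistics

def time_to_absorption(n_agents, x0, rng):
    """Neutral Wright-Fisher dynamics: every agent copies a randomly chosen parent,
    so each adopts variant 1 with probability equal to its current frequency.
    Returns the number of generations until variant 1 is lost or fixed."""
    count = round(x0 * n_agents)
    generations = 0
    while 0 < count < n_agents:
        x = count / n_agents
        count = sum(rng.random() < x for _ in range(n_agents))
        generations += 1
    return generations

def mean_absorption_time(n_agents, x0=0.5, runs=200, seed=1):
    """Average absorption time over independent runs with a fixed random seed."""
    rng = random.Random(seed)
    return statistics.mean(time_to_absorption(n_agents, x0, rng) for _ in range(runs))

sizes = [10, 20, 40]
times = [mean_absorption_time(n) for n in sizes]
for n, t in zip(sizes, times):
    print(f"N = {n:3d}: mean generations until loss or fixation = {t:.1f}")
```

Doubling the population roughly doubles the mean waiting time, which is the sense in which the time until loss (fixation) grows with $N$.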


Two key results from the neutral theory of evolution (Kimura 1984)

If the probability of innovation (mutation) is constant for each individual (universal), the overall rate of innovation increases with N

Does the rate of historical language change depend on the number of speakers?

Lexical gain and loss (Bromham et al 2015)

Count gain and loss of cognates (words with the same form and function)

γ              Lower 95% CL    Max likelihood    Upper 95% CL
Gain           0.145           0.29              0.435
Loss           -0.194          -0.12             -0.048
Gain & Loss    -0.092          -0.03             0.033

Macroscopic Poisson process

$A$ varies according to language pair; $N$ is the current language size

$\omega(k \to k \pm 1) = A N^{\gamma}$ is the best fit
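A minimal sketch of how an exponent $\gamma$ in $\omega = A N^{\gamma}$ can be estimated by maximum likelihood. It assumes each language contributes an observed count $k$ of events over a period $T$, independently Poisson with mean $A N^{\gamma} T$, and, as a simplification, a single shared $A$ (on the slide, $A$ varies by language pair). The data are synthetic, generated with hypothetical values $A = 0.5$ and $\gamma = 0.29$; $A$ is profiled out in closed form and $\gamma$ is found by grid search.

```python
import math
import random

def poisson_sample(lam, rng):
    """Draw a Poisson(lam) variate by Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def profile_loglik(gamma, counts, sizes, durations):
    """Poisson log-likelihood of the counts under omega = A * N**gamma, with A set to its
    maximum-likelihood value for this gamma. The constant -log(k!) terms are dropped."""
    exposure = [n ** gamma * t for n, t in zip(sizes, durations)]
    a_hat = sum(counts) / sum(exposure)
    loglik = 0.0
    for k, e in zip(counts, exposure):
        rate = a_hat * e
        loglik += (k * math.log(rate) if k > 0 else 0.0) - rate
    return loglik

def fit_gamma(counts, sizes, durations):
    """Grid search for the maximum-likelihood gamma over [-1, 1] in steps of 0.01."""
    grid = [g / 100 for g in range(-100, 101)]
    return max(grid, key=lambda g: profile_loglik(g, counts, sizes, durations))

# Synthetic (hypothetical) data: 2000 languages, sizes log-uniform between 10^2 and 10^6.
rng = random.Random(42)
true_a, true_gamma = 0.5, 0.29
sizes = [10 ** rng.uniform(2, 6) for _ in range(2000)]
durations = [1.0] * len(sizes)
counts = [poisson_sample(true_a * n ** true_gamma * t, rng) for n, t in zip(sizes, durations)]
gamma_hat = fit_gamma(counts, sizes, durations)
print(f"estimated gamma = {gamma_hat:.2f} (true value {true_gamma})")
```

Under such a model, $\gamma = 0.29$ would mean that a language ten times larger gains cognates about $10^{0.29} \approx 1.95$ times faster, while $\gamma = 0$ corresponds to a rate independent of size.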

Article grammaticalisation cycles

Stage  Definite                    Indefinite
0      None: 243 (39%)             None: 296 (55%)
1      Demonstrative: 69 (11%)     “One”: 112 (21%)
2      Distinct word: 216 (35%)    Distinct word: 102 (19%)
3      Affix: 92 (15%)             Affix: 24 (4%)

WALS 2014

Historical data: 51 languages, 6 areal groups

(Europe, Mideast, S&SW Asia, E Asia, Mesoamerica, S America)

Changes recorded over periods lasting 500 to 5000 years

Also made estimates of historical population sizes

Blythe & Croft, in prep.

Changes in state modelled as a Poisson process

γ                                 Lower 95% CL    Max likelihood    Upper 95% CL
Definite                          -0.316          -0.213            -0.101
Definite excluding Mesoamerica    -0.266          -0.0706           0.136
Indefinite                        -0.13           0.0389            0.211

In the last two cases, where the 95% interval includes γ = 0, a constant-rate model is a marginally better fit

Current distribution over the world’s languages (cf. Maurits & Griffiths 2014)


$$\omega(i \to i+1) = \frac{A N^{\gamma}}{f_i}$$

where $f_i$ is the current fraction of languages in state $i$
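Why the current distribution can fix the rates: if every language moves one way round a cycle 0 → 1 → 2 → 3 → 0, then at stationarity the flux $f_i\,\omega(i \to i+1)$ is the same for every state, so $\omega(i \to i+1) \propto 1/f_i$. The sketch below checks this, assuming a four-state one-way cycle and using the WALS 2014 definite-article counts quoted earlier (243, 69, 216 and 92 languages for None, Demonstrative, Distinct word and Affix); assigning these to stages 0 to 3 is an assumption. It integrates the master equation from an arbitrary start and confirms it settles on the observed frequencies.

```python
def relative_rates(counts):
    """Forward rates omega_i proportional to 1/f_i, normalised so the slowest rate is 1
    (the overall scale A * N**gamma cancels out of the ratios)."""
    total = sum(counts)
    freqs = [c / total for c in counts]
    raw = [1.0 / f for f in freqs]
    slowest = min(raw)
    return freqs, [r / slowest for r in raw]

def relax(rates, p, dt=0.001, steps=20000):
    """Euler-integrate the master equation of a one-way cycle:
    dp_i/dt = omega_{i-1} p_{i-1} - omega_i p_i, with indices taken round the cycle."""
    k = len(rates)
    for _ in range(steps):
        flux = [rates[i] * p[i] for i in range(k)]
        p = [p[i] + dt * (flux[i - 1] - flux[i]) for i in range(k)]
    return p

# Definite-article counts for assumed stages 0..3 (WALS 2014, as quoted in the talk).
counts = [243, 69, 216, 92]
freqs, rates = relative_rates(counts)
final = relax(rates, [1.0, 0.0, 0.0, 0.0])  # start with every language in stage 0
for i, (f, r, q) in enumerate(zip(freqs, rates, final)):
    print(f"stage {i}: observed f = {f:.3f}, relative rate = {r:.2f}, relaxed p = {q:.3f}")
```

The rarest stage (here the demonstrative stage) must be the one languages leave fastest.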

Macroscopic changes in state are at most weakly affected by language size

Thought experiment

1. Take the set of language changes in the sample

2. Assign a common population size N to each

3. Determine the likelihood of the set of changes within a microscopic model, P(N)

4. It shouldn’t matter (too much) what value of N is chosen in step 2


Looking for less than one order of magnitude variation in P(N) across four orders of magnitude in N
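The criterion in the last step can be stated as a small test. This is a sketch: `p_of_n` stands for whatever likelihood a microscopic model yields in step 3, the starting size $N = 10^2$ is arbitrary, and the two example curves are hypothetical shapes chosen only to show one that passes and one that fails.

```python
import math

def spread_in_decades(p_of_n, n_min=1e2, decades=4, points_per_decade=10):
    """log10(max P / min P) over N from n_min to n_min * 10**decades on a log-spaced grid."""
    n_values = [n_min * 10 ** (d / points_per_decade)
                for d in range(decades * points_per_decade + 1)]
    values = [p_of_n(n) for n in n_values]
    return math.log10(max(values) / min(values))

def passes_criterion(p_of_n, **kwargs):
    """True if P(N) varies by less than one order of magnitude across the range of N."""
    return spread_in_decades(p_of_n, **kwargs) < 1.0

def flat_tail(n):
    """Hypothetical P(N) that approaches a constant at large N."""
    return 1.0 + 50.0 / n

def power_tail(n):
    """Hypothetical P(N) falling off as N**-2."""
    return n ** -2

print(passes_criterion(flat_tail), passes_criterion(power_tail))  # prints: True False
```

A curve with an $N^{-2}$ tail varies by eight orders of magnitude over four decades of $N$, so any model producing it would make the likelihood strongly dependent on the arbitrary choice in step 2.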

                  Model A                    Model B         Model C
Parents           1                          2               1
Cycle mechanism   Biased mutation + noise    Novelty bias    Interaction between features
Free params       2                          2               2
Fixed params      0                          1               1
Tail              Geometric                  Power (-2)      Power (-2)

[Figure: P(N) for each model]

Effect size 10x larger than observed


$a_n$ specify strengths of cognitive biases, assumed universal

For any statistic, $X(\vec{a}, \tau, N) = F(N\vec{a}, \tau/N) \sim F(N\vec{a}, 0)$

Asymptotically flat P(N)

What if biases aren’t universal?

Consequences of scaling law

Each a priori unknown parameter contributes 1/N to the tail of P(N)

A parameter needs to be known to within 0.1% to count as “known”
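One way to see where a factor of $1/N$ per unknown parameter can come from, as a sketch assuming a smooth prior density $\pi(a)$ over an unknown bias $a$, a function $F(b, 0)$ that is integrable in $b$, and the large-$N$ form of the scaling law. Substituting $b = Na$:

$$P(N) = \int \mathrm{d}a\, \pi(a)\, F(Na, 0) = \frac{1}{N} \int \mathrm{d}b\, \pi\!\left(\frac{b}{N}\right) F(b, 0) \approx \frac{\pi(0)}{N} \int \mathrm{d}b\, F(b, 0)$$

With $k$ such parameters the same substitution gives a factor $N^{-k}$. The approximation $\pi(b/N) \approx \pi(0)$ fails only if the prior is narrower than the range of $b/N$ over which $F$ is appreciable, that is, if the parameter is already known very precisely.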


[Figure: agents at generations t and t+1, with acquisition probabilities p00(σ) and p01(σ) that depend on a parameter σ; the case σ = 0 is marked]

Cognitive universals generate variation in linguistic behaviour

Variation is propagated by shared values associated with specific behaviour

Croft 2000

Other evidence

Rapid dialect formation (Baxter et al 2009)

S-curve pattern of change (Blythe & Croft 2012)
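A toy sketch of why a shared bias yields an S-curve while neutral copying does not. It assumes, hypothetically, that every learner weights the innovative variant by a factor $1 + \sigma$, so that in expectation its frequency evolves as $x' = x(1+\sigma)/(1+\sigma x)$. This is the deterministic, large-population limit only (drift is ignored), and it is not the model analysed in Blythe & Croft 2012. The odds $x/(1-x)$ grow by exactly $1+\sigma$ per generation, giving a logistic curve; with $\sigma = 0$ the expected frequency does not move.

```python
def expected_trajectory(x0, sigma, generations):
    """Iterate x -> x(1+sigma)/(1+sigma*x): the expected frequency of a variant that every
    learner favours by a factor (1 + sigma). sigma is a hypothetical, shared bias parameter."""
    xs = [x0]
    for _ in range(generations):
        x = xs[-1]
        xs.append(x * (1 + sigma) / (1 + sigma * x))
    return xs

def logistic(x0, sigma, t):
    """Closed form of the same recursion: the odds grow as (1+sigma)**t, a logistic curve."""
    growth = (1 + sigma) ** t
    return x0 * growth / (1 - x0 + x0 * growth)

curve = expected_trajectory(x0=0.01, sigma=0.1, generations=120)
flat = expected_trajectory(x0=0.01, sigma=0.0, generations=120)
print(f"sigma = 0.1: x goes from {curve[0]:.2f} to {curve[-1]:.2f}")
print(f"sigma = 0:   x remains {flat[-1]:.2f} in expectation")
```

The change is slow while the variant is rare, fastest near $x = 1/2$, and slow again as it approaches fixation.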

Origin of shared biases?

Local majority rule? Memory?

Homophily?

Population-level change does not follow straightforwardly from individual-level change

Better understanding of the role of population size is essential
