
ANALYZING THE LOCALIZATION OF LANGUAGE FEATURES WITH COMPLEX SYSTEMS TOOLS AND PREDICTING LANGUAGE VITALITY

Samuel Omlin, University of Lausanne, Switzerland (samuel.omlin@unil.ch)

INTERNATIONAL CONFERENCE “COGNITIVE MODELING IN LINGUISTICS” (CML-2010), Dubrovnik, Croatia

Romansh – an endangered language

“Allegra, miu num ei Alfons Camiu. Jeu vivel en la biala Swizzera. Per discletg sundel jeu in dils davos 35'000 che discuoren aunc bein romontsch. Denton ei il prighel fetg gronds, che quei bi lungatg sto murir.”


“Hello. My name is Alfons Camiu. I live in Switzerland. I am one of the 35'000 people who speak Romansh as their native language. Unfortunately, the language of my people is in danger of dying out.”

About half of the world’s languages are endangered

Language competition

Business doctrine: “location, location, location”

Can this doctrine be applied to the survival of languages?

Literature study

The role played by the geographic situation of a language in its ultimate survival, and in particular the role played by the linguistic structure of the languages neighboring it, is still unclear.


Is the extinction of minority languages in competition with stronger ones inevitable, or is stable coexistence possible under certain circumstances?

UNESCO: assessing language vitality

None of the criteria directly involves geography.

Focus

What is the relation between the vitality of a minority language and the linguistic structure of the languages neighboring it?

Method

Adaptation of a mathematical method that originates in the economic sciences and has been used with empirical success to identify optimal locations for commercial stores

Presentation Outline

Identifying optimal business locations

Measuring the spatial distribution of linguistic features

Predicting language vitality

Modeling and sample

Results and conclusions


Modeling and sample


Sample summary

• 105 living languages

• 186 linguistic communities in Eurasia with independent vitality

• 31 of these linguistic communities have an associated vitality grade


M index

Quantifies the geographic aggregation and dispersion tendencies of pairs of store categories

Imaginary city: 16 stores (figure; legend: butcher, bakery, other store)

Do butcher stores “attract” bakeries?

Step 1: definition of neighborhood

(Figure legend: butcher (A), bakery (B), other store)

Draw a disk of radius r (here 100 m) around each store s of category A.

Step 2

Pick a store s of category A (labeled s1 in the figure).

Step 3

Count the total number of stores in its neighborhood, $n(s)$:

$n(s_1) = 3$

Step 4

…count the number of B stores in its neighborhood, $n_B(s)$:

$n_B(s_1) = 2$

Step 5

…compute the local concentration of B stores in its neighborhood:

$$c_B(s) = \frac{n_B(s)}{n(s)}, \qquad c_B(s_1) = \frac{2}{3}$$

Step 6

Then, count the total number of stores in the entire city, $N$:

$N = 16$

Step 7

…count the number of B stores in the entire city, $N_B$:

$N_B = 4$

Step 8

…compute the overall concentration of B stores in the entire city:

$$C_B = \frac{N_B}{N} = \frac{4}{16} = \frac{1}{4}$$

Step 9

Compare the local concentration of B stores with their overall concentration:

$$\frac{c_B(s_1)}{C_B} = \frac{2/3}{1/4} = \frac{8}{3}$$

Step 10

Compute this ratio for all the other A stores in the city as well, for instance for $s_2$:

$$n(s_2) = 6, \quad n_B(s_2) = 2, \quad c_B(s_2) = \frac{1}{3}, \quad \frac{c_B(s_2)}{C_B} = \frac{1/3}{1/4} = \frac{4}{3}$$

Step 11

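Finally, compute the average of this ratio over all $N_A$ stores of category A in the city:

$$M_{AB} = \frac{1}{N_A} \sum_{s \in A} \frac{c_B(s)}{C_B}$$

For our example (A: butcher; B: bakery), averaging the ratios of $s_1$ and $s_2$:

$$M_{AB} = \frac{1}{2}\left(\frac{8}{3} + \frac{4}{3}\right) = 2$$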
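The walkthrough above can be sketched in Python. This is a minimal illustration, not the author's or Jensen's implementation: stores are hypothetical `(x, y, category)` tuples, the focal store is assumed not to count in its own neighborhood, and stores with an empty neighborhood are skipped (the slides settle neither point). The last line reuses the counts read off the imaginary city.

```python
from fractions import Fraction
from math import dist

def neighborhood_counts(stores, s, category_b, r):
    """Return (n(s), n_B(s)) for the store at index s.

    Assumption: the focal store itself is not part of its own neighborhood.
    """
    xs, ys, _ = stores[s]
    n = n_b = 0
    for i, (x, y, cat) in enumerate(stores):
        if i == s or dist((xs, ys), (x, y)) > r:
            continue  # skip the focal store and stores outside the disk
        n += 1
        if cat == category_b:
            n_b += 1
    return n, n_b

def m_index_from_counts(local_counts, N, N_B):
    """Average of c_B(s) / C_B over the A stores, given (n(s), n_B(s)) pairs."""
    C_B = Fraction(N_B, N)  # overall concentration of B stores
    # Local concentration c_B(s) = n_B(s) / n(s); empty neighborhoods are skipped.
    ratios = [Fraction(n_b, n) / C_B for n, n_b in local_counts if n > 0]
    return sum(ratios) / len(ratios)

def m_index(stores, category_a, category_b, r):
    """M index of category_a with respect to category_b for disk radius r."""
    N = len(stores)
    N_B = sum(1 for _, _, cat in stores if cat == category_b)
    counts = [neighborhood_counts(stores, s, category_b, r)
              for s, (_, _, cat) in enumerate(stores) if cat == category_a]
    return m_index_from_counts(counts, N, N_B)

# Imaginary city: s1 -> (3, 2), s2 -> (6, 2); N = 16 stores, N_B = 4 bakeries.
print(m_index_from_counts([(3, 2), (6, 2)], N=16, N_B=4))  # prints 2
```

Exact fractions keep the toy arithmetic readable; for a real city, floats would do.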

Imaginary city: answer

Do butcher stores (A) “attract” bakeries (B)?

$M_{AB} = 2$: next to the butcher stores, the local concentration of bakeries is on average twice as high as the overall concentration. => Butcher stores tend to “attract” bakeries.

M index interpretation

Under the hypothesis of pure randomness, $E[M_{AB}] = 1$ for all $r > 0$.

=> $M_{AB}$ allows quantifying deviations from purely random configurations:

$M_{AB} > 1$: A tends to “attract” B

$M_{AB} < 1$: A tends to “repulse” B

Location quality

Location “quality” index for a commercial activity A at a point (x, y): essentially the sum of all quantified attraction and repulsion tendencies exerted by the stores in the point’s neighborhood
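One possible reading of this definition is sketched below. It is an assumption for illustration, not necessarily Jensen's exact index: each store of category B within distance r of the point contributes log M_AB, so that attraction (M_AB > 1) adds to the quality and repulsion (M_AB < 1) subtracts from it. The M values in the usage lines are hypothetical.

```python
from math import dist, log

def location_quality(point, stores, category_a, m, r):
    """Hypothetical location quality for activity category_a at point (x, y).

    stores: list of (x, y, category) tuples.
    m: dict mapping (category_a, category_b) -> M index; pairs missing from m
       are treated as neutral (M = 1, so log M = 0).
    """
    q = 0.0
    for x, y, cat in stores:
        if dist(point, (x, y)) <= r:
            # Attraction (M > 1) adds, repulsion (M < 1) subtracts.
            q += log(m.get((category_a, cat), 1.0))
    return q

# Hypothetical M values: butchers attract bakeries and repel other butchers.
m = {("butcher", "bakery"): 2.0, ("butcher", "butcher"): 0.5}
stores = [(0, 0, "bakery"), (30, 40, "bakery"), (60, 80, "butcher")]
print(location_quality((0, 0), stores, "butcher", m, r=100))
```

Taking logarithms makes attraction and repulsion symmetric around the random case M = 1.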


Adapted M index

Quantifies the tendencies of typological language features to aggregate or disperse

Adapted M index

A: “2.5.3. SIMPLE SENTENCE -> marginal constructions -> Affective”

B: “2.1.4. SYLLABLE -> the element following the vowel -> not more than one consonant”

Neighborhood of a linguistic community

Defined as the set of communities overlapping its area once that area is enlarged by a buffer of size r (here r = 1 degree ≈ 110 km)
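A minimal sketch of this neighborhood definition, under a loudly simplifying assumption: each community's area is a hypothetical disk `(lon, lat, radius)` in degrees, and distances are computed on a flat lon/lat plane, which ignores that a degree of longitude shrinks away from the equator. Real community polygons would instead need a GIS buffer-and-intersection test.

```python
from math import hypot

def neighborhood(communities, c, r=1.0):
    """Indices of communities overlapping community c's area enlarged by buffer r.

    communities: list of (lon, lat, radius) disks, all in degrees (hypothetical).
    r: buffer size in degrees (1 degree is roughly 110 km).
    Assumption: community c is not listed in its own neighborhood.
    """
    x0, y0, r0 = communities[c]
    # Two disks overlap when their centers are closer than the sum of their
    # radii; the buffer simply enlarges c's radius by r.
    return [i for i, (x, y, ri) in enumerate(communities)
            if i != c and hypot(x - x0, y - y0) < r0 + r + ri]

communities = [(0.0, 0.0, 0.5), (1.8, 0.0, 0.4), (5.0, 0.0, 0.5)]
print(neighborhood(communities, 0))  # prints [1]
```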

Particularity

The concentration of a feature is determined by adding up numbers of speakers rather than by simply counting communities.
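The difference between the two ways of measuring concentration can be illustrated with a few hypothetical communities, each a dict with a speaker count and a set of features (the field names are illustrative only):

```python
def concentration_by_speakers(communities, feature):
    """Share of speakers belonging to communities that manifest the feature."""
    total = sum(c["speakers"] for c in communities)
    with_feature = sum(c["speakers"] for c in communities if feature in c["features"])
    return with_feature / total

def concentration_by_count(communities, feature):
    """Share of communities that manifest the feature (the unweighted alternative)."""
    return sum(1 for c in communities if feature in c["features"]) / len(communities)

# One large community with feature B and two small ones without it (hypothetical).
communities = [
    {"speakers": 900_000, "features": {"B"}},
    {"speakers": 50_000, "features": set()},
    {"speakers": 50_000, "features": set()},
]
print(concentration_by_speakers(communities, "B"))  # 0.9
print(concentration_by_count(communities, "B"))     # about 0.333
```

Weighting by speakers lets one large community dominate, whereas counting treats every community equally.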

Example

Does feature A “attract” feature B?

Example: answer

Does feature A “attract” feature B?

$M_{AB} \approx 0.001$: next to speakers manifesting feature A, the local concentration of speakers using feature B is on average about a thousandth of the overall concentration. => Feature A tends to “repulse” feature B.


Location quality

Location quality of a feature: the average ability of the feature to coexist with the features manifested by the communities in its neighborhood

Location quality of a linguistic community: the aggregated location quality indexes of its features
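A minimal sketch of these two definitions. Assumptions not stated on the slides: coexistence ability is taken to be the C index from the additional slides, the feature-level quality is a plain (unweighted) mean over the features manifested in the neighborhood, and the community-level aggregation is also a plain mean. All feature names and C values are hypothetical.

```python
from statistics import mean

def feature_location_quality(feature, neighbor_features, c_index):
    """Mean coexistence ability of `feature` with the features in its neighborhood.

    neighbor_features: features manifested by the neighboring communities
                       (a feature may appear once per community).
    c_index: dict mapping (feature, other_feature) -> C index value.
    """
    return mean(c_index[(feature, other)] for other in neighbor_features)

def community_location_quality(features, neighbor_features, c_index):
    """Aggregate (here: the mean) of the location qualities of the community's features."""
    return mean(feature_location_quality(f, neighbor_features, c_index) for f in features)

c_index = {("A", "B"): 2.0, ("A", "C"): 0.5, ("D", "B"): 1.0, ("D", "C"): 1.0}
print(feature_location_quality("A", ["B", "C"], c_index))            # 1.25
print(community_location_quality(["A", "D"], ["B", "C"], c_index))   # 1.125
```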

Predicting language vitality

For the 31 minority communities to which I could assign a vitality grade, I related this grade to the community’s location quality.


Location quality and vitality

Spearman’s rank correlation: 0.62 (p-value: 0.00009)
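For reference, Spearman's rank correlation is Pearson's correlation computed on ranks, which suits ordinal data such as vitality grades. A dependency-free sketch follows (tied values receive their average rank); in practice `scipy.stats.spearmanr` computes the same coefficient and also returns a p-value. The inputs below are toy values, not the study's data.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rank correlation: Pearson's correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)

# Toy data: location quality scores and ordinal vitality grades (higher = safer).
quality = [0.8, 1.1, 1.3, 0.9, 1.6]
vitality = [1, 2, 4, 2, 5]
print(spearman(quality, vitality))
```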

Conclusions

• The degree of endangerment of the minority languages considered appears to be related to the linguistic structure of their neighboring languages.


• It has been outlined how to combine

- Jazyki mira

- World Language Mapping System

- Atlas of the World’s Languages in Danger

in order to conduct quantitative linguistic studies that involve geographic parameters.


• This is the first study to integrate realistic linguistic features into the description of languages in competition.


• The approach is a promising tool for gaining more knowledge about the mechanisms that control the geographical distribution of linguistic features.

Acknowledgement

• Dr Vladimir Polyakov, organizing committee

• Professor Valery Solovyev, organizing committee

• Dr Søren Wichmann, Department of Linguistics, Max Planck Institute for Evolutionary Anthropology, Germany

Support (1/2)

• Dr Aris Xanthos, section of Linguistics, section of Information Technologies and Mathematical Methods, University of Lausanne (UNIL), Switzerland

• Professor François Golay and Dr Stéphane Joost, Geographic Information Systems Laboratory (LASIG), Swiss Federal Institute of Technology Lausanne (EPFL), Switzerland

Support (2/2)

• Professor Pablo Jensen, Laboratory of Physics, French National Center for Scientific Research (CNRS), France

• Professor François Pellegrino and Dr Fermín Moscoso del Prado Martín, ‘Dynamique Du Langage’ Laboratory, French National Center for Scientific Research (CNRS), France

References

Jazyki mira (Languages of the World) (1993-2004). Moscow: Academia & Indrik. [Online]. Available: http://ww.dblang.ru/en

Jensen, P. (2006). Network-based predictions of retail store commercial categories and optimal locations. Phys. Rev. E 74(3), 035101(R). [Online]. Available: http://dx.doi.org/10.1103/PhysRevE.74.035101


Jensen, P. (2009). Analyzing the Localization of Retail Stores with Complex Systems Tools. In Adams, N. M., Robardet, C., Siebes, A. & Boulicaut, J-F. (Eds.), Advances in Intelligent Data Analysis VIII: 8th International Symposium on Intelligent Data Analysis, Lecture Notes in Computer Science, Vol. 5772/2009, 10–20. Berlin Heidelberg: Springer-Verlag. [Online]. Available: http://dx.doi.org/10.1007/978-3-642-03915-7_2


Moseley, C. (Ed.) (2009). Atlas of the World’s Languages in Danger. Unesco. [Online]. Available: http://www.unesco.org/culture/en/endangeredlanguages/atlas

World Language Mapping System (2010). Global Mapping International & SIL International. [Online]. Available: http://www.gmi.org/wlms

Additional slides

Step 1: definition of neighborhood

Draw the neighborhood of every community c manifesting feature A.

Step 2

Pick a community (c) manifesting feature A.

Step 3

Add up the number of speakers of all communities in its neighborhood, $n(c)$:

$n(c) \approx 54$ million

Step 4

…add up the number of speakers of the communities manifesting feature B in its neighborhood, $n_B(c)$:

$n_B(c) = 0$

Step 5

…compute the local concentration (in speakers) of communities manifesting feature B in its neighborhood:

$$c_B(c) = \frac{n_B(c)}{n(c)} = 0$$

Step 6

Then, add up the number of speakers of all communities in the entire sample region, $N$:

$N \approx 931$ million

Step 7

…add up the number of speakers of all communities manifesting feature B in the entire sample region, $N_B$:

$N_B \approx 93$ million

Step 8

…compute the overall concentration of communities manifesting feature B in the entire sample region:

$$C_B = \frac{N_B}{N} \approx \frac{93}{931} \approx \frac{1}{10}$$

Step 9

Compare the local concentration of communities manifesting feature B with its overall concentration:

$$\frac{c_B(c)}{C_B} \approx \frac{0}{1/10} = 0$$

Step 10

Compute this ratio for all the other communities manifesting feature A in the sample region as well.

The computer does the work…

Step 11

Finally, compute the average of this ratio over all communities manifesting feature A in the sample region, weighted by their number of speakers $w(c)$:

$$M_{AB} = \frac{\sum_{c \in A} w(c)\, c_B(c) / C_B}{\sum_{c \in A} w(c)}$$

For our example: $M_{AB} \approx 0.001$
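Step 11 can be sketched as a speaker-weighted average of the ratios from Steps 3 to 9. In this illustration each community manifesting A is a hypothetical tuple `(speakers, n, n_B)`; communities with an empty neighborhood are skipped, which is an assumption the slides do not address. The first usage line plugs in the worked community's neighborhood totals with a made-up own speaker count.

```python
def adapted_m_index(a_communities, N, N_B):
    """Speaker-weighted mean of (n_B(c) / n(c)) / (N_B / N) over communities with feature A.

    a_communities: iterable of (speakers, n, n_b) tuples, one per community c
    manifesting feature A, where n and n_b are speaker totals in c's neighborhood.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for speakers, n, n_b in a_communities:
        if n == 0:
            continue  # assumption: empty neighborhoods carry no information
        ratio = (n_b * N) / (n * N_B)  # equals (n_b / n) / (N_B / N)
        weighted_sum += speakers * ratio
        total_weight += speakers
    return weighted_sum / total_weight

# Worked community (own speaker count hypothetical): ratio 0, as in Step 9.
print(adapted_m_index([(1_000_000, 54_000_000, 0)], 931_000_000, 93_000_000))  # 0.0
# Two hypothetical communities with ratios 5 and 1, weighted 100 and 300 speakers.
print(adapted_m_index([(100, 10, 5), (300, 20, 2)], N=1000, N_B=100))  # 2.0
```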


Interpretation of the adapted M index

Under the hypothesis of pure randomness, $E[M_{AB}] = 1$ for all $r > 0$.

=> Like Jensen’s $M_{AB}$, the adapted $M_{AB}$ allows quantifying deviations from purely random configurations:

$M_{AB} > 1$: A tends to “attract” B

$M_{AB} < 1$: A tends to “repulse” B

Coexistence ability of features

The spatial distribution of commercial activities seems to reveal interactions that favor or disfavor the successful local coexistence of certain activities.

The spatial distribution of language features alone, however, cannot directly reveal which features can successfully coexist and which cannot.

Coexistence ability of features

We can quantify interactions favoring or disfavoring successful coexistence between features by considering only communities that are probably not endangered when computing the M index.

C index: coexistence ability of features

$C_{AB}$ is the adapted M index $M_{AB}$, computed as above, where the considered linguistic communities are only the ones that are probably not endangered.
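A sketch of this restriction, assuming communities are stored in a dict keyed by a hypothetical id, neighborhoods are precomputed, and the set of probably endangered communities is given. Dropping those communities removes them everywhere: from the A communities averaged over, from the neighborhood totals n(c) and n_B(c), and from the sample totals N and N_B. The toy data are invented for illustration.

```python
def adapted_m_index(communities, neighbors, a, b):
    """Speaker-weighted adapted M index of feature a with respect to feature b.

    communities: dict id -> {"speakers": int, "features": set of feature names}
    neighbors: dict id -> ids of the communities in that community's neighborhood
    Assumes feature b is present somewhere (N_B > 0) and that at least one
    community with feature a has a non-empty neighborhood.
    """
    N = sum(c["speakers"] for c in communities.values())
    N_B = sum(c["speakers"] for c in communities.values() if b in c["features"])
    weighted_sum = total_weight = 0.0
    for cid, c in communities.items():
        if a not in c["features"]:
            continue
        # Only neighbors still present in `communities` are counted.
        neigh = [communities[j] for j in neighbors[cid] if j in communities]
        n = sum(x["speakers"] for x in neigh)
        if n == 0:
            continue  # assumption: empty neighborhoods are skipped
        n_b = sum(x["speakers"] for x in neigh if b in x["features"])
        weighted_sum += c["speakers"] * (n_b * N) / (n * N_B)
        total_weight += c["speakers"]
    return weighted_sum / total_weight

def c_index(communities, neighbors, a, b, probably_endangered):
    """Adapted M index restricted to communities that are probably not endangered."""
    safe = {cid: c for cid, c in communities.items() if cid not in probably_endangered}
    return adapted_m_index(safe, neighbors, a, b)

communities = {
    1: {"speakers": 100, "features": {"A"}},
    2: {"speakers": 100, "features": {"B"}},
    3: {"speakers": 800, "features": {"C"}},
}
neighbors = {1: [2, 3], 2: [1], 3: [1]}
print(adapted_m_index(communities, neighbors, "A", "B"))  # about 1.11
print(c_index(communities, neighbors, "A", "B", probably_endangered={3}))  # 2.0
```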

Method

City: a heterogeneous geographic space (with parks, streams, etc.) that is home to a network of commercial activities

World: a heterogeneous geographic space (with seas, mountains, lakes, etc.) that is home to a network of languages or, more precisely, of linguistic features
