27 September 2005. Complexity, Language and Computation. Alessandro Lenci, Università di Pisa, Dipartimento di Linguistica, Via Santa Maria, 36, 56100 Pisa,

27 September 2005

Complexity, Language and Computation

Alessandro Lenci

Università di Pisa, Dipartimento di Linguistica
Via Santa Maria, 36, 56100 Pisa, Italy


Outline

Complex dynamical systems
  complexity, information and probability
  measuring complexity
  emergent properties
  language as a complex system

Neural networks as complex dynamical systems
  emergent syntactic properties

Meaning as a complex system

What is a system?

A (dynamical) system is a set of aspects of the world that change over time
  the state of a system at time t1 is the way the aspects that compose it appear at time t1
  the set of states the system can be in is its state space

The behavior of a system is the change of its state over time
  the behavior of a system is a trajectory in the state space

The state space

States of the system = {s1, s2, s3, s4, …}

[Figure: state-space diagram showing the states s1, s2, s3, s4]

Complexity and organization (Collier & Hooker 1999)

The complexity of a system depends on the amount of information needed to describe its states and its behavior

The organization of a system depends on the interdependencies and correlations among its components and on their degree of (non)linearity

Complexity and organization

[Figure: diagram with axes "organization" and "complexity". Regions: simple and poorly organized systems; simple and moderately organized systems; complex and poorly organized systems; complex and highly organized systems. Examples shown: gases, crystals, living beings, cognitive systems, LANGUAGE]

Complexity and information

A complex object requires more information to be described

The amount of information needed to describe a system depends on:
  the number of its possible states
  the regularity (predictability) of its dynamics
  the degree of randomness with which its states occur

Uncertainty and information

Information is the reduction of uncertainty
  if a random event occurs, we have gained information
  the more uncertain an event is, the more information we gain by learning that it has occurred

Entropy is the measure of the amount of information, or uncertainty, of a random variable
  a system can be described as a random variable (W)
  the possible states of the system are the values of the variable, which has an associated probability distribution p
  at each instant ti, p gives the probability that the system is in a certain state

Uncertainty and information

Entropy is a measure of the uncertainty of a system
  it measures how hard it is to predict which state the system is in at a certain instant ti

What does the degree of uncertainty depend on?
  the number of possible alternative states
    rolling a die = 6 possible outcomes
    drawing a card = 52 possible outcomes
    drawing a card has a higher degree of uncertainty!!
  the probability distribution over the states
    if the states are uniformly probable, it is harder to predict which one will occur at a given moment (for an equal number of states)
    cf. rolling a fair die vs. rolling a die in which 6 is known to be twice as likely to come up as each of the other numbers

Pointwise entropy

Entropy is measured in bits (binary digits)

Suppose that
  at each instant ti a message must be transmitted to communicate which state the system is in at ti
  the message must be in binary code (a string of 0s and 1s)

Pointwise entropy (information) of a state
  the number of bits needed to transmit (= describe) that the system is in state s

$h(s) = -\log_2 p(s)$

Entropy

In general, a binary number of n digits can encode at most $2^n$ messages
  a binary number of 2 digits can encode 4 different messages: 00, 01, 10, 11

If W has n possible states (all equiprobable), the number of bits needed to encode a state is $\log_2 n$
  $h(s) = \log_2 n$
  if the states of the system are equiprobable, $p(s) = 1/n$ and $n = 1/p(s)$
  hence $h(s) = \log_2 (1/p(s)) = -\log_2 p(s)$

  if W has 1 possible state, h(s) = 0 bits
  if W has 2 possible states, h(s) = 1 bit
  if W has 4 possible states, h(s) = 2 bits
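The relation between the probability of a state and the number of bits needed to describe it can be checked with a minimal Python sketch. The function name `pointwise_entropy` is chosen here for illustration and does not come from the slides.

```python
import math


def pointwise_entropy(p):
    """Bits needed to describe a state of probability p: h(s) = log2(1/p(s)) = -log2 p(s)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in the interval (0, 1]")
    return math.log2(1.0 / p)


# With n equiprobable states, p(s) = 1/n, so h(s) = log2 n.
for n in (1, 2, 4, 8):
    print(n, "equiprobable states:", pointwise_entropy(1 / n), "bits per state")
```

Rarer states get larger values: a state with probability 1/8 costs three bits, while a certain state (p = 1) costs none.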

Entropy of the system

The entropy of a variable W is the average number of bits needed to encode its possible states

$h(W) = -\sum_{s \in V(W)} P(s) \log_2 P(s)$

If W has n equiprobable possible states, the entropy of the system equals the pointwise entropy
  W = 4 equiprobable states (p(s) = 1/4)
  $h(W) = -(1/4 \cdot \log_2 1/4 + 1/4 \cdot \log_2 1/4 + 1/4 \cdot \log_2 1/4 + 1/4 \cdot \log_2 1/4)$
  $h(W) = -(1/4 \cdot (-2) + 1/4 \cdot (-2) + 1/4 \cdot (-2) + 1/4 \cdot (-2))$
  $h(W) = -(-1/2 - 1/2 - 1/2 - 1/2) = -(-2) = 2$ bits $(= \log_2 4)$

Entropy increases as the number of possible states grows
  W = 8 equiprobable states: $h(W) = \log_2 8 = 3$ bits

Entropy

W = drawing a word from a text (the outcomes are not equiprobable!!)
  V(W) = {il, cane, mangia, gatto}
  P(cane) = 1/4, P(il) = 1/2, P(mangia) = 1/8, P(gatto) = 1/8
  $h(W) = -(1/4 \cdot \log_2 1/4 + 1/2 \cdot \log_2 1/2 + 1/8 \cdot \log_2 1/8 + 1/8 \cdot \log_2 1/8)$
  $h(W) = -(0.25 \cdot (-2) + 0.5 \cdot (-1) + 0.125 \cdot (-3) + 0.125 \cdot (-3))$
  $h(W) = -(-0.5 - 0.5 - 0.375 - 0.375) = 1.75$ bits
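The same computation can be written as a short Python sketch. The function name `entropy` and the state labels `s1` to `s4` are illustrative choices; the word probabilities are the ones given on the slide.

```python
import math
from typing import Mapping


def entropy(dist: Mapping[str, float]) -> float:
    """Average number of bits per state: h(W) = -sum over s of P(s) * log2 P(s)."""
    if abs(sum(dist.values()) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # States with probability 0 contribute nothing to the sum.
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


words = {"il": 1 / 2, "cane": 1 / 4, "mangia": 1 / 8, "gatto": 1 / 8}
uniform4 = {s: 1 / 4 for s in ("s1", "s2", "s3", "s4")}

print("word distribution:", entropy(words), "bits")
print("4 equiprobable states:", entropy(uniform4), "bits")
```

Both distributions have four possible states, but the skewed word distribution needs fewer bits on average (1.75) than the uniform one (2).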

Entropy is the average number of bits needed to describe the states of the system

Entropy makes it possible to determine the optimal code for describing a system
  the most probable (most frequent) states are described using shorter messages
  the least probable states are described using longer messages
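One standard way to build such a code is Huffman's algorithm. The slides do not name a specific coding method, so the sketch below is only an illustration of the principle, applied to the four-word distribution used above (P(il) = 1/2, P(cane) = 1/4, P(mangia) = P(gatto) = 1/8). The function name `huffman_code` is hypothetical.

```python
import heapq
from typing import Dict


def huffman_code(probs: Dict[str, float]) -> Dict[str, str]:
    """Build a binary prefix code with Huffman's algorithm."""
    # Heap entries: (probability, tie-breaker, {state: partial codeword}).
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)  # least probable subtree
        p1, _, c1 = heapq.heappop(heap)  # second least probable subtree
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]


probs = {"il": 1 / 2, "cane": 1 / 4, "mangia": 1 / 8, "gatto": 1 / 8}
code = huffman_code(probs)
avg_len = sum(probs[s] * len(code[s]) for s in probs)

for state in probs:
    print(state, code[state])
print("average message length:", avg_len, "bits")
```

The most frequent word gets a one-bit message and the two rarest words get three-bit messages, so the average message length is 1.75 bits, which matches the entropy of this distribution.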

Entropy

For an equal number of possible outcomes, the less uniform the probability distribution, the lower the entropy
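A small Python sketch makes the comparison concrete, using the earlier examples of the fair die, the die in which 6 is twice as likely as each other number, and the card draw. The variable names are illustrative.

```python
import math


def entropy(probs):
    """h(W) = -sum of P(s) * log2 P(s) over the possible states."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


fair_die = [1 / 6] * 6                # six equiprobable outcomes
loaded_die = [1 / 7] * 5 + [2 / 7]    # 6 is twice as likely as each other number
card_draw = [1 / 52] * 52             # 52 equiprobable outcomes

print("fair die:  ", round(entropy(fair_die), 3), "bits")
print("loaded die:", round(entropy(loaded_die), 3), "bits")
print("card draw: ", round(entropy(card_draw), 3), "bits")
```

The loaded die has the same six outcomes as the fair die but a slightly lower entropy (about 2.52 bits against about 2.58), while the card draw, with many more equiprobable outcomes, has the highest entropy of the three (about 5.70 bits).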

Entropy and organization

Entropy increases as the number of possible states of a system increases

For an equal number of possible states, entropy decreases if
  the structure and organization of the system increase
  the predictability of the dynamics of the system increases

[Figure: entropy scale, annotated with:]
• Greater "redundancy of information"
• Regularity in the dynamics of the system
• Existence of recurring schemas and patterns in the sequence of states, etc.

Organized systems

Organization is the coordination and interrelation of the parts of a system that makes its functioning possible

Organization requires the existence of "redundancies"
  structural regularities, constraints, recurring patterns, schematisms

An organized system is not a maximally complex system
  structural organization reduces the complexity (entropy) of the system

Living organisms are highly organized complex systems

Self-organized systems

Self-organized systems are able to find states of stable organization (structure) autonomously

The organization and the structures (constraints) of the system are emergent properties that result from the nonlinear dynamics among the elements of the system
  the system has a macro-organization that emerges as a result of the dynamics of its microstructure
  the emergent constraints are new with respect to the microstructural constraints

Living organisms are autonomous, adaptive and anticipative self-organized systems

Self-organization

stipulated organization vs. emergent organization (B. MacWhinney)

distributed systems

Emergent properties: the hexagonal shape of honeycomb cells (Bates 1999)

Linear dynamics

The dynamics of the system are additive
  the global behavior of the system is just the sum of the contributions of each component
  small changes produce small effects

[Figure: plot of the linear function $y = mx + c$, with x from -15 to 15 and y from -50 to 50]

Nonlinear dynamics

The dynamics of the system are not additive
  the global result of the system is not the simple sum of its components
  small changes can produce large effects

[Figure: plot of the sigmoid function $y = \frac{1}{1 + e^{-x}}$, with x from -15 to 15 and y from 0 to 1.2]
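The contrast between the two kinds of dynamics can be sketched in Python with the linear function y = mx + c and the sigmoid y = 1/(1 + e^(-x)). The slope `m = 3.0`, the intercept `c = 0.0`, the helper `effect` and the sample points are assumptions made only for this illustration.

```python
import math


def linear(x, m=3.0, c=0.0):
    """Linear dynamics: y = m*x + c (m and c are arbitrary illustrative values)."""
    return m * x + c


def sigmoid(x):
    """Nonlinear dynamics: y = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def effect(f, x, dx=1.0):
    """Change in the output caused by moving the input from x to x + dx."""
    return f(x + dx) - f(x)


for x in (-0.5, 9.5):
    print("x =", x,
          "| linear effect:", round(effect(linear, x), 6),
          "| sigmoid effect:", round(effect(sigmoid, x), 6))
```

For the linear function, the same change in the input always produces the same change in the output. For the sigmoid, a unit change around 0 moves the output by roughly a quarter of its range, while the same change around 10 has almost no effect.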

Language as a complex system

"Language is simply the result of a number of tweaks and twiddles each of which may in fact be quite minor, but which in the aggregate and through interaction yield what appears to be a radically new behavior" (Elman 1999)

"We define grammar as the class of possible solutions to the problem of mapping back and forth between a high-dimensional meaning space with universal properties and a low-dimensional channel that unfolds in time, heavily constrained by limits of information processing. […] This is a constraint satisfaction problem and also a dimension reduction problem. In problems like this complex solutions are likely to emerge that are not directly predictable from any individual component" (Bates and Goodman 1999)

Language as a complex system

Grammar is an emergent property of the cognitive system, the product of the nonlinear interaction of a complex set of factors

neural interactionism
  grammar is realized in highly interconnected networks of neurons
  high integration of different types of cognitive information: sensorimotor, syntactic, semantic, pragmatic, etc.

social interactionism
  grammar lives in the social network of communicative interactions

Language as a complex system (continued)

Linguistic functionalism

"Usage-based" approach to language acquisition
  linguistic knowledge is acquired through general processes of cognitive categorization and schematization

Neo-Piagetian constructivist / interactionist epistemology

Grammar is an inherently probabilistic system
  frequency effects, gradience of grammatical structures

Overcoming some traditional dichotomies
  competence vs. performance
  lexicon vs. grammar
  rote learning vs. rule-based learning
  type vs. token

Language and nonlinear dynamics

"lexical burst" (Bates and Goodman 1997)

Language and nonlinear dynamics

"U-shaped" curve (Pinker, Rumelhart, McClelland, Plunkett, Bowerman, etc.)

[Figure: learning curve; horizontal axis: time (1 to 6); vertical axis: % errors (30 to 100)]

Neural networks as complex systems

A neural network is a complex dynamical system
  computation is the result of the nonlinear interaction of a large number of neurons
  the network evolves its state over time until it reaches a stable state
  it self-organizes its behavior in response to external stimuli
  it is sensitive to the statistical distribution of the inputs
  it exhibits nonlinear developmental processes
  it produces emergent high-level properties

Neural computation

[Figure: network with input units, hidden units and output units; detail of a unit i that receives the activities aj over weighted connections (w), computes its net input and applies the activation function to produce ai]

Each unit has an activity level (a), which varies during the computation
  typically a real value between 0 and 1

The connections have a weight (a positive or negative number)

The network learns by modifying the weights of the connections

The units integrate the input they receive from the preceding layers
  $\mathit{netinput}_i = \sum_j a_j w_{ij}$

Each unit i is associated with an activation function that turns the input received from the preceding units into an activity level ai

The activation function is typically nonlinear (sigmoid)
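The computation performed by a single unit can be sketched in a few lines of Python. The function names and the example activities and weights are hypothetical values chosen for illustration. The sigmoid is used as the activation function, as the slide indicates is typical.

```python
import math


def net_input(activities, weights):
    """Net input of unit i: the sum over j of a_j * w_ij."""
    return sum(a * w for a, w in zip(activities, weights))


def sigmoid(x):
    """Nonlinear activation function with values between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def unit_activity(activities, weights):
    """Activity a_i of unit i given the activities of the preceding units."""
    return sigmoid(net_input(activities, weights))


a_prev = [1.0, 0.0, 0.5]   # activities a_j of three preceding units (illustrative)
w_i = [0.8, -1.2, 0.4]     # weights w_ij of the connections into unit i (illustrative)

print("net input:", net_input(a_prev, w_i))
print("activity: ", round(unit_activity(a_prev, w_i), 4))
```

Learning, which the slide describes as the modification of the connection weights, is not covered by this sketch.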

Syntax as an emergent property (Elman 1990)

Simple Recurrent Network (SRN): represents events that follow one another in time

[Figure: network architecture with 31 input units, 150 hidden units, 150 context units and 31 output units]
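A forward pass through a network of this shape can be sketched in plain Python. This is an untrained toy model under stated assumptions, not Elman's implementation: the weight range, the random seed, the initial context value of 0.5 and all names are choices made for this illustration. Only the layer sizes (31, 150, 150, 31) come from the slide.

```python
import math
import random

N_IN, N_HID, N_OUT = 31, 150, 31  # layer sizes shown on the slide


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def random_matrix(rows, cols, rng):
    # Small random weights; the range is an arbitrary choice for this sketch.
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def layer(weights, inputs):
    """Activity of each unit: sigmoid of the weighted sum of its inputs."""
    return [sigmoid(sum(w * a for w, a in zip(row, inputs))) for row in weights]


class SimpleRecurrentNetwork:
    def __init__(self, seed=0):
        rng = random.Random(seed)
        # Hidden units receive the input units and the context units together.
        self.w_hidden = random_matrix(N_HID, N_IN + N_HID, rng)
        self.w_output = random_matrix(N_OUT, N_HID, rng)
        self.context = [0.5] * N_HID  # assumed neutral starting context

    def step(self, x):
        hidden = layer(self.w_hidden, list(x) + self.context)
        output = layer(self.w_output, hidden)
        self.context = hidden  # context units copy the hidden layer for the next step
        return output


net = SimpleRecurrentNetwork()
word = [0] * N_IN
word[3] = 1  # a localist input vector with a single active unit
first = net.step(word)
second = net.step(word)  # same input, but a different context
print(len(first), first != second)
```

Because the context units carry the previous hidden state, presenting the same input twice does not give the same output: the response depends on what came before.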

Syntax as an emergent property

Word Prediction Task
  at each instant ti, a word wi in a sentence is presented
  the network must learn to output (predict) the word wj that follows wi in the sentence

Localist encoding of the input
  each word (type) is encoded as a sequence of 31 bits, only one of which is different from 0
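The localist encoding and the input/target pairs of the prediction task can be sketched as follows. The four-word vocabulary and the sentence are toy examples (the slide uses 31-bit vectors), and the function names are hypothetical.

```python
def one_hot(index, size):
    """Localist encoding: a vector of `size` bits with a single bit set to 1."""
    vec = [0] * size
    vec[index] = 1
    return vec


def prediction_pairs(sentence, vocab):
    """(input, target) pairs: each word is paired with the word that follows it."""
    size = len(vocab)
    ids = [vocab.index(w) for w in sentence]
    return [(one_hot(i, size), one_hot(j, size)) for i, j in zip(ids, ids[1:])]


vocab = ["il", "cane", "mangia", "gatto"]            # toy vocabulary
sentence = ["il", "cane", "mangia", "il", "gatto"]   # toy sentence
pairs = prediction_pairs(sentence, vocab)

for x, y in pairs:
    print(x, "->", y)
```

Note that the same word type ("il") always receives the same input vector, whatever its position in the sentence. Any sensitivity to context has to arise inside the network.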

Syntax as an emergent property

Grammatical properties emerge from the distributed representations of the hidden units, as a result of the self-organization of the network
  properties emerging from the statistical regularities (redundancies) in the sequences of words

Syntax as an emergent property: type vs. token

The representation of words is inherently "context sensitive"

The Context in Concepts: evidence from cognitive psychology

Concepts include situational (contextual) information
  settings, events, situations of use, etc.

Concepts are highly "tuned" to specific contexts of use

Different dimensions of a concept are activated in different contexts

"Situation effects" occur across a wide variety of cognitive tasks
  similarity judgments are highly context-dependent

Conceptual representations are context-sensitive and context-dependent

(Barsalou, Elman, McRae, et al.)

Towards a Context-Sensitive Lexicon

Semantic properties of nouns will be acquired by inspecting a sufficiently large number of linguistic contexts

distributionally-based methods for word meaning acquisition

Lexical representations will be built out of context data

Goal

To apply computational techniques to bootstrap multidimensional and context-sensitive lexical representations

Semantic Spaces

Gärdenfors (2000)
  conceptual spaces as a framework for conceptual representations and cognitive semantics

Words can be represented as regions in an n-dimensional semantic space

[Figure: color semantic space with the dimensions hue, saturation and brightness; regions for red, pink, blue, brown and violet]

Carving the Semantic Space of Nouns

The semantic space of nouns is usually characterized as a class taxonomy

[Figure: taxonomy tree with the nodes entity, concrete_object, location, animal, artifact, abstraction]

The primacy of taxonomical structures in the noun system has been radically downgraded in recent cognitive psychology

The organization of the conceptual space is largely based on the roles that nouns have in events and situations
  thematic relatedness (Lin & Murphy 2001)

Carving the Semantic Space of Nouns

The events in which objects are involved provide the structuring dimensions to represent the semantics of nouns

Two major criteria to structure the event space

1. the type of event in which objects occur

2. the roles of objects in events

Nouns can be represented as regions in the space of events

The Dimensions of the Event Space: event classes

7 major event classes
  correspond to basic cognitive domains for events
  typical top classes in semantic lexicons (e.g. WordNet, SIMPLE)

ACT dormire “to sleep”, bere “to drink”, lavorare “to work”, etc.

CHANGE aprire “to open”, aumentare “to rise”, sciogliere “to melt”, etc.

CREATION costruire “to build”, creare “to create”, fondare “to found”, etc.

COGNITION pensare “to think”, vedere “to see”, leggere “to read”, etc.

COMMUNICATION dire “to say”, dichiarare “to declare”, affermare “to affirm”, etc.

POSSESSION dare “to give”, possedere “to possess”, comprare “to buy”, etc.

SPACE arrivare “to arrive”, correre “to run”, abitare “to live”, etc.

The Dimensions of the Event Space: object roles in events

Two basic roles of objects in events
  subject of event (S)
    e.g. The President read the report: "The President" is the subject of COGNITION
  direct object of event (O)
    e.g. The President read the report: "the report" is the direct object of COGNITION

event class      subject of event      object of event
ACT              <ACT, S>              <ACT, O>
CHANGE           <CHANGE, S>           <CHANGE, O>
CREATION         <CREATION, S>         <CREATION, O>
COGNITION        <COGNITION, S>        <COGNITION, O>
COMMUNICATION    <COMMUNICATION, S>    <COMMUNICATION, O>
POSSESSION       <POSSESSION, S>       <POSSESSION, O>
SPACE            <SPACE, S>            <SPACE, O>

The Dimensions of the Event Space

Nouns are represented as regions in a 14-dimensional event semantic space

[Figure: nouns located in the event space along the dimensions <COGNITION, O>, <COMMUNICATION, S> and <POSSESSION, O>; nouns shown: newspaper, book, dictionary, car]

Locating Nouns in the Event Space from Corpus Distributions

The position of a noun with respect to a dimension <C, r> is statistically correlated with the number of verb types belonging to the event class C with which the noun occurs in a corpus with role r

[Figure: verb-noun pairs linked by subject (subj) and object (obj) relations. Verbs and their event classes: own: POSSESSION, read: COGNITION, buy: POSSESSION, say: COMMUNICATION. Nouns: president, newspaper, bank, book]

First Experiment

Training set
  25,000 triples <verb, noun, role> extracted from an Italian corpus (general and economic newspapers)
    <leggere, libro, o>, <correre, cavallo, s>, etc.
    automatic extraction with manual revision
  the verb in each triple has been assigned to one of the 7 event classes
    <leggere: COGNITION, libro, o>, <correre: SPACE, cavallo, s>
    the SIMPLE Italian lexicon acted as the background lexical resource for verb class assignment

First Experiment

CLASS (Allegrini, Montemagni, Pirrelli 2000)
  distributionally-based machine learning method to estimate
    association scores between a noun and a verb
    similarity scores between two nouns

The CLASS algorithm has been extended to compute the association score between nouns and event classes

For each noun n, event class C and role r, we computed the association score AS(n, C, r)
  AS is estimated from the number of triples <v, n, r> in the training set such that v ∈ C
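The slides do not give the formula used by CLASS, so the following Python sketch is only a count-based stand-in for the idea: for each noun, count the distinct verb types of each event class it occurs with in each role, then normalize. The normalization step, the toy triples and every name in the code are assumptions made for this illustration. The verb-to-class assignments follow the examples listed in the slides.

```python
from collections import defaultdict

EVENT_CLASSES = ["ACT", "CHANGE", "CREATION", "COGNITION",
                 "COMMUNICATION", "POSSESSION", "SPACE"]
ROLES = ["s", "o"]
DIMENSIONS = [(c, r) for c in EVENT_CLASSES for r in ROLES]  # 7 classes x 2 roles

# Verb classes as exemplified in the slides.
VERB_CLASS = {"leggere": "COGNITION", "consultare": "COGNITION",
              "comprare": "POSSESSION", "possedere": "POSSESSION",
              "correre": "SPACE", "dire": "COMMUNICATION"}

# Toy <verb, noun, role> triples, invented for illustration only.
TRIPLES = [("leggere", "libro", "o"), ("consultare", "libro", "o"),
           ("leggere", "libro", "o"), ("comprare", "libro", "o"),
           ("correre", "cavallo", "s"), ("dire", "governo", "s")]


def noun_vector(noun, triples):
    """Count distinct verb types per <class, role>, then normalize to sum to 1.

    This is a simple stand-in, not the association score computed by CLASS.
    """
    verbs = defaultdict(set)
    for verb, n, role in triples:
        if n == noun and verb in VERB_CLASS:
            verbs[(VERB_CLASS[verb], role)].add(verb)
    counts = [len(verbs[d]) for d in DIMENSIONS]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


libro = noun_vector("libro", TRIPLES)
for dim, value in zip(DIMENSIONS, libro):
    if value > 0:
        print(dim, round(value, 3))
```

Counting verb types rather than tokens means that the repeated triple <leggere, libro, o> is counted once, in line with the slide's statement that position correlates with the number of verb types.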

Putting Nouns into Semantic Spaces

A noun is represented as a 14-dimensional real-valued vector
  each value determines the position of the noun with respect to a certain semantic dimension in the event space

[Figure: chart of the values of libro "book" on the 14 dimensions Act_S, Act_O, Change_S, Change_O, Creat_S, Creat_O, Cog_S, Cog_O, Comm_S, Comm_O, Poss_S, Poss_O, Space_S, Space_O; vertical axis from 0 to 0.16]


A noun is represented as a 14-dimensional real-valued vector
  each value determines the position of the noun with respect to a certain semantic dimension in the event space

[Figure: chart of the values of governo "government" on the 14 dimensions Act_S, Act_O, Change_S, Change_O, Creat_S, Creat_O, Cog_S, Cog_O, Comm_S, Comm_O, Poss_S, Poss_O, Space_S, Space_O; vertical axis from 0 to 0.16]


Similar nouns tend to share close regions in the semantic space

[Figure: chart comparing banca "bank" and governo "government" on the 14 dimensions Act_S, Act_O, Change_S, Change_O, Creat_S, Creat_O, Cog_S, Cog_O, Comm_S, Comm_O, Poss_S, Poss_O, Space_S, Space_O; vertical axis from 0 to 0.16]


Less similar nouns are more distant in the semantic space

[Figure: chart comparing libro "book" and governo "government" on the 14 dimensions Act_S, Act_O, Change_S, Change_O, Creat_S, Creat_O, Cog_S, Cog_O, Comm_S, Comm_O, Poss_S, Poss_O, Space_S, Space_O; vertical axis from 0 to 0.16]

Local Semantic Similarity

Similarity relations between nouns change depending on the semantic dimension

[Figure: chart comparing idea "idea" and libro "book" on the 14 dimensions Act_S, Act_O, Change_S, Change_O, Creat_S, Creat_O, Cog_S, Cog_O, Comm_S, Comm_O, Poss_S, Poss_O, Space_S, Space_O; vertical axis from 0 to 0.16]


Similarity relations between nouns change depending on the semantic dimension

[Figure: chart comparing denaro "money" and libro "book" on the 14 dimensions Act_S, Act_O, Change_S, Change_O, Creat_S, Creat_O, Cog_S, Cog_O, Comm_S, Comm_O, Poss_S, Poss_O, Space_S, Space_O; vertical axis from 0 to 0.16]

Local Semantic Similarity: the emergence of semantic dynamics

"Time flies": space domain

[Figure: chart comparing tempo "time" and macchina "car, machine" on the 14 dimensions Act_S, Act_O, Change_S, Change_O, Creat_S, Creat_O, Cog_S, Cog_O, Comm_S, Comm_O, Poss_S, Poss_O, Space_S, Space_O; vertical axis from 0 to 0.16]

Local Semantic Similarity: the emergence of semantic dynamics

"Time is money": possession domain

[Figure: chart comparing tempo "time" and denaro "money" on the 14 dimensions Act_S, Act_O, Change_S, Change_O, Creat_S, Creat_O, Cog_S, Cog_O, Comm_S, Comm_O, Poss_S, Poss_O, Space_S, Space_O; vertical axis from 0 to 0.16]


Each semantic dimension determines its own similarity space
  similar nouns tend to converge towards similar value distributions along particular semantic dimensions

Experiment
  given a target noun n, find with CLASS the nouns most similar to n with respect to a particular semantic dimension
  the similarity of two nouns ni and nk with respect to a given semantic dimension <C, r> is estimated from the number of verb types belonging to C that they share with role r

There is no global similarity space for nouns
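The idea of a similarity that is relative to one dimension can be illustrated with a raw count of shared verb types. The actual CLASS similarity score is not specified in the slides, so this count is only an assumed stand-in. The triples below are toy data invented for the example, the function names are hypothetical, and the verb classes follow the examples given in the slides.

```python
# Verb classes as exemplified in the slides.
VERB_CLASS = {"dare": "POSSESSION", "comprare": "POSSESSION",
              "possedere": "POSSESSION", "correre": "SPACE",
              "arrivare": "SPACE"}

# Toy <verb, noun, role> triples, invented for illustration only.
TRIPLES = [("dare", "tempo", "o"), ("comprare", "tempo", "o"),
           ("correre", "tempo", "s"), ("arrivare", "tempo", "s"),
           ("dare", "denaro", "o"), ("comprare", "denaro", "o"),
           ("possedere", "denaro", "o"),
           ("correre", "macchina", "s"), ("arrivare", "macchina", "s"),
           ("comprare", "macchina", "o")]


def verb_types(noun, event_class, role, triples):
    """Verb types of class `event_class` that occur with `noun` in role `role`."""
    return {v for v, n, r in triples
            if n == noun and r == role and VERB_CLASS.get(v) == event_class}


def local_similarity(n1, n2, event_class, role, triples):
    """Number of verb types of the class shared by the two nouns in that role."""
    return len(verb_types(n1, event_class, role, triples)
               & verb_types(n2, event_class, role, triples))


for other in ("denaro", "macchina"):
    print("tempo /", other,
          "| <POSSESSION, o>:", local_similarity("tempo", other, "POSSESSION", "o", TRIPLES),
          "| <SPACE, s>:", local_similarity("tempo", other, "SPACE", "s", TRIPLES))
```

In this toy data, tempo is closer to denaro on <POSSESSION, o> and closer to macchina on <SPACE, s>, so which noun counts as its nearest neighbor depends on the dimension chosen.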


tempo "time"

<Space, S>                    <Possession, O>
acqua        0.0128054        casa        0.0127473
macchina     0.0107913        credito     0.010111
paese        0.010286         titolo      0.00946314
strada       0.0102453        miliardo    0.00944679
inflazione   0.00975182       lavoro      0.00931217

libro "book"

<Cognition, O>                <Possession, O>
parola       0.0120119        casa        0.0199175
situazione   0.0116718        denaro      0.0150771
problema     0.0116064        quota       0.0149145
verita'      0.010701         fiducia     0.012468
ruolo        0.0104088        tempo       0.0119739

Similarity spaces relative to specific semantic dimensions

The Shape of Semantic Spaces

Events determine similarity spaces for nouns that cannot be easily mapped onto standard taxonomies

libro "book"

<leggere: COGNITION, o> "to read"
  musica       0.0016810800
  domanda      0.0016467700
  previsione   0.0014409200
  carta        0.0014409200
  pensiero     0.0012808200
  numero       0.0012808200
  norma        0.0012808200
  contenuto    0.0012808200
  discorso     0.0009606160

<consultare: COGNITION, o> "to consult"
  medico       0.0009546520
  dizionario   0.0009466520
  avvocato     0.0009424520
  orologio     0.0002386630

Discrete vs. Continuous Representations

The dimensions (e.g. POSSESSION, ACT, SPACE, etc.) structuring the semantic space "look like" traditional conceptual primitives, but they are radically different
  in standard representations these primitives are assigned to nouns in a dichotomic (YES/NO) way
    a noun n either has or does not have a certain feature or conceptual function
  in the event space representations, semantic dimensions are assigned to nouns in a gradient, continuous way
    e.g. two nouns n1 and n2 can have the same feature POSSESSION but to different degrees

Semantic Representations as Complex Objects

It is possible to design semantic representations that are
  inherently context-dependent
    positions in the semantic space are determined and conditioned by the way words distribute in contexts
  multidimensional
  naturally polysemous
    polysemy emerges out of semantic representations (cf. also Elman 1995, 2004)
    semantic dynamics are directly related to the structure of representations

Meanings as Emergent Systems

Meanings are systems of dimensions that
  structure the semantic space
  organize (linguistic, but also sensory) contextual data
    distributional data alone are not enough!!
  guide and constrain semantic change
  emerge out of usage distribution
  make explicit various types of lexical relations
  provide an explicit representation of word semantic content

Searching for Semantic Spaces

Investigating which semantic dimensions provide the best structure for the semantic space
  empirical verification of models of conceptualization
  computational analysis as a probe into semantic organization
  to explore and simulate dynamics in the lexicon

A research program for computational linguistics in cognitive semantics

Some conclusions

Complexity in language means
  functionalism
  high parallelism and integration of linguistic constraints
  probabilistic nature
  communicative use as the root of linguistic competence
  overcoming dichotomies typical of classical models (e.g. lexicon vs. grammar)
  integration of non-specific constraints: cognitive, biological and social
  action of systemic constraints
