
Acquisition of Semantic Classes for Adjectives from Distributional Evidence

Gemma Boleda

Universitat Pompeu Fabra, Barcelona

general picture

• automatic classification of adjectives
  – Catalan
• according to broad semantic characteristics
• clustering
  – syntactic evidence

motivation

• Lexical Acquisition
  – infer properties of words
  – lexical bottleneck
• both symbolic and statistical approaches
• adjectives
  – determining NP reference
    • the French general
  – establishing properties of entities
    • this maimai is round and sweet

motivation

• initial motivation: POS-tagging
  – 55% of the remaining ambiguity involves adjectives
  – general francès: ‘French general’ or ‘general French’?
• observations
  – general tendencies in the syntactic behaviour of adjectives
  – ... which correspond to broad semantic properties
• generalisation: best at semantic level
  – low-level tasks (POS-tagging)
  – initial schema for lexical semantic representation

approach

• no general, well-established semantic classification
  – have to build and test ours!
• clustering: unsupervised technique
  – groups objects according to feature distribution
  – does not depend on pre-classification
  – provides insight into the nature of the data
• shallow approach to syntax: n-grams
  – limited syntactic distribution
  – local relationship to arguments
  => test feasibility

rodó ‘round’ 0.4 0.4 0.2

dolç ‘sweet’ 0.5 0.4 0.1

francès ‘French’ 0.1 0.6 0.3

italià ‘Italian’ 0.05 0.5 0.45
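
The four rows above can serve as a toy illustration of clustering by feature distribution. The sketch below is a minimal, assumed implementation of k-means with cosine similarity in plain Python. The function names, the choice of seed rows and the fixed iteration count are illustrative assumptions, not details taken from the slides, and the three columns are treated as unnamed features.

```python
import math

def normalise(v):
    """Scale a vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return sum(a * b for a, b in zip(normalise(u), normalise(v)))

def cosine_kmeans(vectors, seeds, iterations=10):
    """k-means with cosine similarity; `seeds` are row indices used as initial centroids."""
    centroids = [normalise(vectors[s]) for s in seeds]
    labels = []
    for _ in range(iterations):
        # Assignment step: each vector goes to the most similar centroid.
        labels = [
            max(range(len(centroids)), key=lambda c: cosine(v, centroids[c]))
            for v in vectors
        ]
        # Update step: centroid = normalised mean of its unit-length members.
        for c in range(len(centroids)):
            members = [normalise(v) for v, lab in zip(vectors, labels) if lab == c]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[c] = normalise(mean)
    return labels

adjectives = ["rodó", "dolç", "francès", "italià"]
vectors = [
    [0.4, 0.4, 0.2],
    [0.5, 0.4, 0.1],
    [0.1, 0.6, 0.3],
    [0.05, 0.5, 0.45],
]
# Seeding with the first and last row is an arbitrary, illustrative choice.
labels = cosine_kmeans(vectors, seeds=[0, 3])
for adjective, label in zip(adjectives, labels):
    print(adjective, label)
```

With these seeds, rodó and dolç end up in one cluster and francès and italià in the other.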

outline

• adjective syntax and semantic classification
• methodology
• experiment 1
• experiment 2
• partial conclusions
• outlook: rest of the thesis

Boleda, Badia, Batlle (2004)

adjective syntax

• default function: noun modifier (92%)
  – right of the noun (default position: 72%)
  – some to the left (‘epithets’: 28%)
• predicative uses infrequent (7%), but significant

two-way classification

• number of arguments
  – unary: pilota vermella ‘red ball’
  – binary: professor gelós de la Maria ‘teacher jealous of Maria’
• ontological kind (Ontological Semantics)
  – basic: vermell ‘red’
  – object: malaltia pulmonar ‘pulmonary disease’ (=> lung)
  – event: propietat constitutiva ‘constitutive property’ (=> constitutes)

Ontological Semantics

• coverage (ordinary cases)
• machine tractability
• explicit model of world: ontology
  – vermell => attribute::colour::red(x)
  – pulmonar => related-to::lung(x)
  – constitutiu => event::benef::constitute(x)
• however: no commitment to particular framework

rationale

• observation: syntactic preferences correspond to semantic properties

• hypothesis: we can use syntactic features to infer semantic classes

outline

• adjective syntax and semantic classification
• methodology
• experiment 1
• experiment 2
• conclusions and future work

data and procedure

• 2283 adjectives occurring more than 50 times in a 16-million-word Catalan corpus
• lemma and morphological info
• cluster the whole set
  – perform different tasks on different subsets
• tuning subset: choose features
• Gold Standard: evaluation and analysis

features and feature selection

• features:
  – empirically chosen from blind distribution
  – double bigram, simplified POS-representation

ella diu que la pilota vermella és seva
she says that the ball red is hers
-3ey -2dd -1cn +1ve

• tuning subset: 100 adjectives
  – choose features (distribution)

Fig. A: Feature selection
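
The positional labels in the example (-3ey -2dd -1cn +1ve) can be read as the simplified POS tags of the three tokens to the left and the one token to the right of the adjective. The sketch below shows one way to produce such labels, under that reading. The function name and the window parameters are hypothetical, the tag `aj` for the adjective is inferred from the `+1aj` feature used later in the slides, and `xx` is a placeholder for tags the slides do not give.

```python
def context_features(tagged, index, left=3, right=1):
    """Return positional POS features around the token at `index`.

    `tagged` is a list of (word, tag) pairs. A feature is the signed
    offset followed by the tag, e.g. "-1cn" for a common noun
    immediately to the left of the target.
    """
    features = []
    offsets = list(range(-left, 0)) + list(range(1, right + 1))
    for offset in offsets:
        position = index + offset
        if 0 <= position < len(tagged):
            _, tag = tagged[position]
            features.append(f"{offset:+d}{tag}")
    return features

# Tags ey, dd, cn, ve come from the slide; "xx" marks tags not given there.
sentence = [
    ("ella", "xx"), ("diu", "xx"), ("que", "ey"), ("la", "dd"),
    ("pilota", "cn"), ("vermella", "aj"), ("és", "ve"), ("seva", "xx"),
]
features = context_features(sentence, index=5)
print(" ".join(features))
```

Counting these labels over all occurrences of an adjective would give its feature distribution.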

analysis

• Gold Standard
  – 80 adjectives
  – annotated by 3 human judges, acceptable agreement (92% and 84%, kappa .72 and .74)
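
Kappa corrects raw agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The slides do not say which kappa variant was used with three judges, so the sketch below assumes Cohen's kappa for one pair of judges. The function name and the ten toy labels are invented for illustration and are not data from the study.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the annotators' label proportions.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Invented toy annotations (b = basic, o = object, e = event).
judge_a = ["b", "b", "b", "b", "o", "o", "o", "e", "e", "e"]
judge_b = ["b", "b", "b", "o", "o", "o", "o", "e", "e", "b"]
kappa = cohen_kappa(judge_a, judge_b)
print(round(kappa, 2))
```

Here the judges agree on 8 of 10 items (p_o = 0.8) and chance agreement is p_e = 0.34, so kappa is well below the raw agreement.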

experiment 1: unary / binary

• final evaluation: 10 features, raw percentage
  – clustering algorithm: k-means (cosine)
• predictions:
  – binary adjectives cooccur with prepositions more frequently than unary ones
  – unary adjectives are more flexible
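
One plausible reading of "raw percentage" is that each feature value is the share of an adjective's occurrences in which the feature is present. The sketch below assumes that reading. The function name and all counts are invented for illustration; the slides give no such figures.

```python
def feature_percentages(feature_counts, n_occurrences):
    """Turn per-feature counts into percentages of the adjective's occurrences."""
    return {
        feature: 100.0 * count / n_occurrences
        for feature, count in feature_counts.items()
    }

# Invented counts for two hypothetical adjectives, 200 occurrences each.
unary_like = feature_percentages({"-1cn": 150, "+1prep": 30}, 200)
binary_like = feature_percentages({"-1cn": 60, "+1prep": 120}, 200)
print(unary_like)
print(binary_like)
```

Under the first prediction above, an adjective with a high `+1prep` percentage would be a candidate for the binary class.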

unary / binary: results

• agreement with Gold Standard:
  – 97%, kappa = 0.87
  – comparable to humans
• features:

cl       high     low
0 (un)   -1cn     +1prep
1 (bin)  +1prep   (-1cn)

Fig. B: Clusters vs. unary/binary (unary in yellow, binary in red)
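
Comparing clusters with the Gold Standard requires deciding which cluster stands for which class. The slides do not state how this was done; the sketch below assumes the simplest option, labelling each cluster with its majority gold class and counting the matches. The function name and the toy labels are invented for illustration.

```python
from collections import Counter, defaultdict

def cluster_agreement(clusters, gold):
    """Map each cluster to its majority gold class and compute agreement."""
    by_cluster = defaultdict(list)
    for cluster, label in zip(clusters, gold):
        by_cluster[cluster].append(label)
    # Majority class per cluster.
    mapping = {
        cluster: Counter(labels).most_common(1)[0][0]
        for cluster, labels in by_cluster.items()
    }
    hits = sum(mapping[c] == g for c, g in zip(clusters, gold))
    return mapping, hits / len(gold)

# Invented toy data: 10 adjectives, 2 clusters.
clusters = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
gold = ["unary"] * 6 + ["binary"] + ["binary", "binary", "unary"]
mapping, agreement = cluster_agreement(clusters, gold)
print(mapping, agreement)
```

The resulting cluster labels can then be fed to a kappa computation in the same way as a human judge's labels.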

experiment 2: basic / object / event

• final evaluation: 32 features, normalisation
  – clustering algorithm: k-means (cosine)
• predictions:
  – basic adjectives are flexible, work as epithets, occur in predicative contexts, appear further from the noun
  – object adjectives appear rigidly after the noun
  – event adjectives tend to occur in predicative positions and do not act as epithets

basic / object / event: results

• agreement with Gold Standard:
  – 73%, kappa = 0.56
  – lower than humans
• features:

cl       high     low
0 (obj)  -1cn     -1ve
1 (ev)   +1prep
2 (bas)  -1co     +1aj

Fig. C: Clusters vs. basic/event/object (object in yellow, event in orange, basic in red)

basic/object/event: error analysis

• something has gone wrong!
  – characterisation of event adjectives

Fig. C: Clusters vs. basic/event/object

Fig. D: Clusters vs. unary/binary

Figure annotations: binary!; unary event adjectives; basic adjectives with an object reading (polysemy); binary event adjectives
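
Figures C and D compare the same clusters against two different classifications. A cross-tabulation is one simple way to make that comparison in code. The sketch below is illustrative only: the function name is hypothetical and the eight toy items are invented to mimic the pattern described in the error analysis, not taken from the experiment.

```python
from collections import Counter

def cross_tabulate(clusters, classes):
    """Count how many items fall into each (cluster, class) cell."""
    return Counter(zip(clusters, classes))

# Invented toy data: one cluster id and two gold labels per adjective.
clusters = [0, 0, 1, 1, 1, 2, 2, 2]
kind = ["object", "object", "event", "event", "basic", "basic", "basic", "event"]
arity = ["unary", "unary", "binary", "binary", "binary", "unary", "unary", "unary"]

by_kind = cross_tabulate(clusters, kind)
by_arity = cross_tabulate(clusters, arity)
for cell, count in sorted(by_arity.items()):
    print(cell, count)
```

In this toy data cluster 1 is purely binary but mixed with respect to ontological kind, which is the sort of pattern the two figures are used to spot.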

partial conclusions

• overall, results seem to back up:
  – use of syntax-semantics interface for adjectives
  – linguistic predictions as to relevant features and differences across classes
  – shallow approach
• unary / binary: piece of cake
  – few binary adjectives, but worth spotting (denote relationships)

partial conclusions

• basic / object / event: need reworking
  – object adjectives seem to be the most robust class
  – variation in basic adjectives (default class), polysemy
  – event adjectives: seem to behave much like basic adjectives with respect to features chosen => redefine class?

outlook: rest of the thesis

• rethink classification
• redefine features in light of results
• integrate polysemy judgments into the experiment and analysis
• perform experiments with other corpora

classification

• what to do with event adjectives? cf.:
  – constitutiu ‘constitutive’ (“active”)
  – legible ‘readable’ (“passive”)
  – reproductor ‘reproducing’ (“active, habituality”)
• yet another parameter: gradability
  – important for adjectives
  – should be easy to induce

better blind distribution or self-defined features?

empirical accurate sparseness objective
blind: X X ?
self: X? (depends on method) X

• n-grams: sparseness, selection
• other features?
  – account for different levels of description

polysemy

• crucial aspect, explains much of results
• difficult to integrate!
  – meaningless kappa values
• alternatives?
  – clearer definition of polysemy within task
  – specific tests
  – other resources: dictionary?

other resources

• CUCWeb (208 million words): http://www.catedratelefonica.upf.es
• test whether “more data is better data” (Mercer and Church 1993: 18-19)
  – advantages and challenges of Web corpora
• current results: for a verb subcategorisation experiment, results are 12 points lower than with a smaller, balanced, controlled corpus
