Acquisition of Semantic Classes for Adjectives from
Distributional Evidence
Gemma Boleda
Universitat Pompeu Fabra
Barcelona
general picture
• automatic classification of adjectives
  – Catalan
• according to broad semantic characteristics
• clustering
  – syntactic evidence
motivation
• Lexical Acquisition
  – infer properties of words
  – lexical bottleneck
• both symbolic and statistical approaches
• adjectives
  – determining NP reference
    • the French general
  – establishing properties of entities
    • this maimai is round and sweet
motivation
• initial motivation: POS-tagging
  – 55% remaining ambiguity involves adjectives
    general francès: ‘French general’ or ‘general French’?
• observations
  – general tendencies in syntactic behaviour of adjectives
  – ... which correspond to broad semantic properties
• generalisation: best at semantic level
  – low-level tasks (POS-tagging)
  – initial schema for lexical semantic representation
approach
• no general, well-established semantic classification
  – have to build and test ours!
• clustering: unsupervised technique
  – groups objects according to feature distribution
  – does not depend on pre-classification
  – provides insight into the nature of the data
• shallow approach to syntax: n-grams
  – limited syntactic distribution
  – local relationship to arguments
  => test feasibility
rodó ‘round’ 0.4 0.4 0.2
dolç ‘sweet’ 0.5 0.4 0.1
francès ‘French’ 0.1 0.6 0.3
italià ‘Italian’ 0.05 0.5 0.45
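The four toy vectors above can be compared directly. The following is a minimal sketch, assuming each row gives an adjective's relative frequency over three distributional features (the slide does not name the columns) and using cosine similarity, the measure named for the k-means runs. The helper names `cosine` and `nearest` are illustrative only.

```python
import math

# Toy vectors copied from the slide; the three columns are unlabeled there.
VECTORS = {
    "rodó": (0.4, 0.4, 0.2),
    "dolç": (0.5, 0.4, 0.1),
    "francès": (0.1, 0.6, 0.3),
    "italià": (0.05, 0.5, 0.45),
}

def cosine(u, v):
    """Cosine of the angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest(word):
    """Return the other adjective whose vector is most similar to `word`."""
    others = (w for w in VECTORS if w != word)
    return max(others, key=lambda w: cosine(VECTORS[word], VECTORS[w]))

for adjective in VECTORS:
    print(adjective, "->", nearest(adjective))
```

On these values the two property adjectives pair up with each other, and so do the two nationality adjectives, which is the kind of grouping a clustering algorithm is expected to recover.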
outline
• adjective syntax and semantic classification
• methodology
• experiment 1
• experiment 2
• partial conclusions
• outlook: rest of the thesis
Boleda, Badia, Batlle (2004)
adjective syntax
• default function: noun modifier (92%)
  – right of the noun (default position: 72%)
  – some to the left (‘epithets’: 28%)
• predicative uses infrequent (7%), but significant
two-way classification
• number of arguments
  – unary: pilota vermella ‘red ball’
  – binary: professor gelós de la Maria ‘teacher jealous of Maria’
• ontological kind (Ontological Semantics)
  – basic: vermell ‘red’
  – object: malaltia pulmonar ‘pulmonary disease’ (=> lung)
  – event: propietat constitutiva ‘constitutive property’ (=> constitutes)
Ontological Semantics
• coverage (ordinary cases)
• machine tractability
• explicit model of world: ontology
  – vermell => attribute::colour::red(x)
  – pulmonar => related-to::lung(x)
  – constitutiu => event::benef::constitute(x)
• however: no commitment to particular framework
rationale
• observation: syntactic preferences correspond to semantic properties
• hypothesis: we can use syntactic features to infer semantic classes
data and procedure
• 2283 adjectives occurring more than 50 times in a 16-million-word Catalan corpus
• lemma and morphological info
• cluster the whole set
  – perform different tasks on different subsets
• tuning subset: choose features
• Gold Standard: evaluation and analysis
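The frequency cut-off above can be sketched as follows. This is a minimal illustration, assuming the corpus is available as (lemma, POS tag) pairs and that adjectives carry the tag `aj`; both the input format and the function name `frequent_adjectives` are assumptions, not a description of the actual pipeline.

```python
from collections import Counter

def frequent_adjectives(tagged_tokens, min_count=50, adj_tag="aj"):
    """Return the adjective lemmas that occur more than `min_count` times.

    `tagged_tokens` is an iterable of (lemma, pos) pairs (assumed format).
    """
    counts = Counter(lemma for lemma, pos in tagged_tokens if pos == adj_tag)
    return {lemma for lemma, n in counts.items() if n > min_count}

# Hypothetical mini-corpus: only "vermell" passes the >50 threshold.
corpus = [("vermell", "aj")] * 51 + [("rodó", "aj")] * 50 + [("pilota", "cn")] * 100
print(frequent_adjectives(corpus))
```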
features and feature selection
• features:
  – empirically chosen from blind distribution
  – double bigram, simplified POS-representation
ella diu que la pilota vermella és seva
she says that the ball red is hers
-3ey -2dd -1cn +1ve
• tuning subset: 100 adjectives
  – choose features (distribution)
Fig. A: Feature selection
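The positional features in the example sentence (-3ey, -2dd, -1cn, +1ve) can be produced with a small window function. This is a sketch under stated assumptions: the window of three tokens to the left and one to the right is chosen only to reproduce the four features shown on the slide, and the tags marked `??` are placeholders for words whose tags the slide does not give.

```python
def context_features(tags, i, left=3, right=1):
    """Positional POS features around position i, e.g. '-1cn' or '+1ve'."""
    feats = []
    for offset in range(-left, right + 1):
        j = i + offset
        if offset == 0 or not 0 <= j < len(tags):
            continue  # skip the adjective itself and positions outside the sentence
        feats.append(f"{offset:+d}{tags[j]}")
    return feats

# ella diu que la pilota vermella és seva
words = ["ella", "diu", "que", "la", "pilota", "vermella", "és", "seva"]
# Tags for que, la, pilota, és come from the slide; '??' marks unknown tags.
tags = ["??", "??", "ey", "dd", "cn", "aj", "ve", "??"]

print(context_features(tags, words.index("vermella")))
```

Counting these features over all occurrences of an adjective, and dividing by the number of occurrences, would give one distribution per adjective.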
analysis
• Gold Standard
  – 80 adjectives
  – annotated by 3 human judges, acceptable agreement (92 and 84%, .72 and .74 kappa)
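Percentage agreement and kappa can both be computed from two judges' label lists. The sketch below assumes Cohen's kappa for a pair of judges; the slide does not say which kappa variant was used, and the labels in the example are invented.

```python
from collections import Counter

def observed_agreement(a, b):
    """Proportion of items on which the two judges give the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohen_kappa(a, b):
    """Cohen's kappa: agreement corrected for chance, (p_o - p_e) / (1 - p_e).

    Undefined (division by zero) when chance agreement p_e equals 1.
    """
    n = len(a)
    p_o = observed_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[label] * count_b[label] for label in set(a) | set(b)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical judgments on ten adjectives.
judge_1 = ["un"] * 8 + ["bin"] * 2
judge_2 = ["un"] * 7 + ["bin"] * 3
print(observed_agreement(judge_1, judge_2), cohen_kappa(judge_1, judge_2))
```

Kappa is lower than raw agreement whenever one class dominates, which is why both figures are reported.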
experiment 1: unary / binary
• final evaluation: 10 features, raw percentage
  – clustering algorithm: k-means (cosine)
• predictions:
  – binary adjectives cooccur with prepositions more frequently than unary ones
  – unary adjectives are more flexible
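A k-means variant with cosine similarity can be sketched in a few lines. This is an illustrative implementation, not the tool used in the experiments: vectors are scaled to unit length so that the dot product equals the cosine, and the two-feature toy data (standing in for -1cn and +1prep frequencies) is invented.

```python
import math
import random

def normalise(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)

def cosine_kmeans(vectors, k, iters=20, seed=0):
    """Assign each vector to one of k clusters by cosine similarity."""
    rng = random.Random(seed)
    data = [normalise(v) for v in vectors]
    centroids = rng.sample(data, k)  # initialise with k of the data points
    labels = [0] * len(data)
    for _ in range(iters):
        # Assignment step: on unit vectors the dot product is the cosine.
        labels = [
            max(range(k), key=lambda c: sum(a * b for a, b in zip(v, centroids[c])))
            for v in data
        ]
        # Update step: mean of the members, renormalised to unit length.
        for c in range(k):
            members = [v for v, label in zip(data, labels) if label == c]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[c] = normalise(mean)
    return labels

# Hypothetical vectors: (share of -1cn contexts, share of +1prep contexts).
toy = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.1], [0.2, 0.8], [0.1, 0.9], [0.15, 0.7]]
print(cosine_kmeans(toy, k=2))
```

Cluster numbers are arbitrary, so only the grouping matters, not which group is labelled 0 or 1.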
unary / binary: results
• agreement with Gold Standard:
  – 97%, kappa = 0.87
  – comparable to humans
• features:
cl high low
0 (un) -1cn +1prep
1 (bin) +1prep (-1cn)
Fig. B: Clusters vs. unary/binary
unary (yellow)
binary (red)
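Comparing clusters with the Gold Standard requires deciding which cluster stands for which class. The sketch below assumes a simple majority mapping (each cluster takes the most frequent gold class among its members); the slides do not state how the correspondence was established, and the data is invented.

```python
from collections import Counter, defaultdict

def map_clusters(clusters, gold):
    """Label each cluster with the most frequent gold class among its members."""
    by_cluster = defaultdict(Counter)
    for c, g in zip(clusters, gold):
        by_cluster[c][g] += 1
    return {c: counts.most_common(1)[0][0] for c, counts in by_cluster.items()}

def agreement(clusters, gold):
    """Proportion of items whose mapped cluster label matches the gold class."""
    mapping = map_clusters(clusters, gold)
    hits = sum(mapping[c] == g for c, g in zip(clusters, gold))
    return hits / len(gold)

# Hypothetical cluster assignments and gold classes for six adjectives.
clusters = [0, 0, 0, 1, 1, 0]
gold = ["un", "un", "un", "bin", "bin", "bin"]
print(map_clusters(clusters, gold), agreement(clusters, gold))
```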
experiment 2: basic / object / event
• final evaluation: 32 features, normalisation
  – clustering algorithm: k-means (cosine)
• predictions:
  – basic adjectives are flexible, work as epithets, occur in predicative contexts, appear further from the noun
  – object adjectives appear rigidly after the noun
  – event adjectives tend to occur in predicative positions and do not act as epithets
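The two feature representations mentioned for the experiments ("raw percentage" and "normalisation") can be contrasted in code. This is only a sketch: raw percentage is read here as each feature count divided by the adjective's total, and normalisation is assumed to be a per-feature z-score across adjectives, which the slides do not confirm.

```python
import statistics

def raw_percentages(counts):
    """Turn one adjective's feature counts into proportions of its total."""
    total = sum(counts)
    return [c / total for c in counts]

def zscore_columns(rows):
    """Standardise each feature (column) across adjectives (rows).

    Assumed reading of 'normalisation'; constant columns are mapped to 0.0.
    """
    cols = list(zip(*rows))
    means = [statistics.mean(col) for col in cols]
    sds = [statistics.pstdev(col) for col in cols]
    return [
        [(x - m) / s if s else 0.0 for x, m, s in zip(row, means, sds)]
        for row in rows
    ]

print(raw_percentages([2, 6, 2]))
print(zscore_columns([[1, 10], [3, 10]]))
```

Standardising per feature keeps very frequent contexts from dominating the similarity computation, which is one plausible motive for normalising when 32 features are used.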
basic / object / event: results
• agreement with Gold Standard:
  – 73%, kappa = 0.56
  – lower than humans
• features:
cl high low
0 (obj) -1cn -1ve
1 (ev) +1prep
2 (bas) -1co +1aj
Fig. C: Clusters vs. basic/event/object
object (yellow)
event (orange)
basic (red)
basic/object/event: error analysis
• something has gone wrong!
  – characterisation of event adjectives
Fig. C: Clusters vs. basic/event/object
Fig. D: Clusters vs. unary/binary
binary!
unary event adjectives
basic adjectives with an object reading (polysemy)
binary event adjectives
partial conclusions
• overall, results seem to back up:
  – use of syntax-semantics interface for adjectives
  – linguistic predictions as to relevant features and differences across classes
  – shallow approach
• unary / binary: piece of cake
  – few binary adjectives, but worth spotting (denote relationships)
partial conclusions
• basic / object / event: need reworking
  – object adjectives seem to be the most robust class
  – variation in basic adjectives (default class), polysemy
  – event adjectives: seem to behave much like basic adjectives with respect to features chosen => redefine class?
outlook: rest of the thesis
• rethink classification
• redefine features in light of results
• integrate polysemy judgments into the experiment and analysis
• perform experiments with other corpora
classification
• what to do with event adjectives? cp.:
  – constitutiu ‘constitutive’ (“active”)
  – legible ‘readable’ (“passive”)
  – reproductor ‘reproducing’ (“active, habituality”)
• yet another parameter: gradability
  – important for adjectives
  – should be easy to induce
better blind distribution or self-defined features?
empirical accurate sparseness objective
blind X X ?
self X? (depends on method) X
• n-grams: sparseness, selection
• other features?
  – account for different levels of description
polysemy
• crucial aspect, explains much of results
• difficult to integrate!
  – meaningless kappa values
• alternatives?
  – clearer definition of polysemy within task
  – specific tests
  – other resources: dictionary?
other resources
• CUCWeb (208 million words): http://www.catedratelefonica.upf.es
• test whether “more data is better data” (Mercer and Church 1993: 18-19)
  – advantages and challenges of Web corpora
• current results: for the verb subcategorisation experiment, results are 12 points lower than with a smaller, balanced, controlled corpus