Stefan Evert, IMS - Uni Stuttgart
Brigitte Krenn, ÖFAI Wien
Evaluation of Association Measures
We want to
• determine how well a given association measure (AM) identifies collocations:
  • for which types of collocation?
  • in which corpora (domain, size)?
  • for high-frequency versus low-frequency data?
• compare the outcomes of different association measures
• We have
  • differently ranked lists of collocation candidates
• We need
  • true collocation data for comparison, e.g.
    • collocation lexica
    • a list of true collocations occurring in the extraction corpus
Problems & Inconveniences
Using collocation lexica for evaluation
• will not tell us how well an AM worked on a particular corpus
• only tells us that
  • some of the reference collocations also occur in our base data, and
  • the AM has found them
Problems & Inconveniences
Using a list of true collocations occurring in the extraction corpus
• requires a good deal of hand-annotation
• requires “objective” criteria for distinguishing collocational from non-collocational word combinations in our candidate list
Our Approach
• Evaluation of lexical association measures (AMs) against a manually identified reference corpus of true collocations (TPs)
• Evaluation based on the full reference set
• Precise, linguistically motivated definition of TPs
• Evaluation of results based on precision and recall graphs
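Evaluating against the full reference set means that precision and recall can be computed for every n-best list, not just for one fixed cut-off. A minimal sketch, assuming the ranked candidate list has already been annotated with TP/non-TP labels; the function name `pr_curve` and the input format are hypothetical:

```python
from typing import List, Tuple

def pr_curve(ranked_is_tp: List[bool], total_tps: int) -> List[Tuple[int, float, float]]:
    """Return (n, precision, recall) for every n-best list of a ranked candidate list.

    ranked_is_tp[i] is True if the candidate at rank i+1 is a TP according to
    the manually annotated reference set; total_tps is the number of TPs in
    the full reference set.
    """
    points = []
    tps_so_far = 0
    for n, is_tp in enumerate(ranked_is_tp, start=1):
        tps_so_far += int(is_tp)  # TPs among the n highest-ranked candidates
        points.append((n, tps_so_far / n, tps_so_far / total_tps))
    return points
```

Plotting the precision (or recall) values against n yields one curve per AM, so several measures can be compared on the same candidate list.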
For Further Discussion
• Significance testing for AMs is an important but still open question
• There is potential for fine-tuning AMs to a specific data set and a particular type of collocation to be extracted (Krenn, Evert 2001)
Evaluation Experiments
Data
Extraction corpora
• newspaper: 8 million words, Frankfurter Rundschau Corpus (ECI Multilingual Corpus 1)
• newsgroup: 10 million words, FLAG corpus (LT-DFKI)
Data
• Base data:
  • list of PP-verb pairs, i.e. (PN,V)-combinations
• Collocation types:
  • support verb constructions (FVG)
  • figurative expressions (figur)
Examples
zur Verfügung stellen (FVG)   at_the availability put   “make available”
am Herzen liegen (figur)      at_the heart lie          “have at heart”
Support Verb Constructions FVG
• verb-object collocations
  • function as predicates
  • can be paraphrased by main verbs
  • NP-verb or PP-verb
• verbal collocate (function verb / light verb / support verb)
  • main verb
  • conveys Aktionsart and causativity
Support Verb Constructions FVG
• nominal collocate
  • abstract noun
  • often deverbal or deadjectival
  • contributes the core meaning
• (prepositional collocate)
• verbal and nominal collocate together determine the argument structure of the collocation
FVG Examples
pred. phrase   verb      Aktionsart  caus.  translation
in Betrieb     gehen     incho       -      ‘go into operation’
               nehmen    incho       +      ‘put into operation’
               setzen    incho       +      ‘start up’
               sein      neutral     -      ‘be running’
               bleiben   contin      -      ‘keep on running’
               lassen    contin      +      ‘keep (sth) running’
(incho = inchoative, contin = continuative, caus. = causative)
FVG Examples
pred. phrase    verb      Aktionsart  caus.  translation
außer Betrieb   gehen     termin      -      ‘go out of service’
                nehmen    termin      +      ‘take out of service’
                setzen    termin      +      ‘stop’
                sein      neutral     -      ‘be out of order’
                bleiben   contin      -      ‘stay out of order’
                lassen    contin      +      ‘keep out of order’
(termin = terminative, contin = continuative, caus. = causative)
Figurative Expressions (figur)
• not restricted to NP/PP-verb combinations
• figurative reinterpretation of the literal meaning required (e.g., unter die Haut gehen, ‘get under one's skin’)
• nouns: concrete
• verbs: often causative/non-causative alternation, e.g., auf Eis legen ‘put on ice’ vs. auf Eis liegen ‘be on ice’
Decision Tree: FVG versus figur
Frequency Distributions
PP-verb pairs (full forms)
corpus      instances   candidates
newspaper   453,861     372,121
newsgroup   912,287     631,140
Frequency Distributions
PP-verb candidates with
corpus      f >= 3    f >= 5    f >= 10
newspaper   10,396    2,853     743
newsgroup   --        4,795     1,029
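Instances are PP-verb tokens in the corpus; candidates are distinct (PN,V) pair types, and each threshold counts the candidates whose joint frequency f reaches that value. A minimal sketch, assuming the pairs have already been extracted from the corpus; `frequency_summary` and the toy pairs are hypothetical:

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple

Pair = Tuple[str, str]  # (PN, V), e.g. ("in Betrieb", "gehen")

def frequency_summary(occurrences: Iterable[Pair],
                      thresholds: Sequence[int] = (3, 5, 10)) -> Dict[str, int]:
    """Count instances (tokens), candidates (types) and candidates with f >= t."""
    freq = Counter(occurrences)  # candidate type -> joint frequency f
    summary = {
        "instances": sum(freq.values()),  # every extracted PP-verb occurrence
        "candidates": len(freq),          # distinct (PN, V) pair types
    }
    for t in thresholds:
        summary[f"f>={t}"] = sum(1 for f in freq.values() if f >= t)
    return summary

occ = [("zur Verfügung", "stellen")] * 5 + [("in Betrieb", "gehen")] * 3 + [("am Dienstag", "sagte")]
print(frequency_summary(occ))
# {'instances': 9, 'candidates': 3, 'f>=3': 2, 'f>=5': 1, 'f>=10': 0}
```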
Frequency Distributions
baseline precision
corpus              total            figur          FVG
newspaper, f >= 3   12.31% (1,280)   5.47% (569)    6.84% (711)
newsgroup, f >= 5   12.45% (597)     5.38% (258)    7.07% (339)
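The baseline percentages equal the number of TPs divided by the size of the candidate list in the frequency table (e.g. 1,280 / 10,396 ≈ 12.31%); this is also the expected precision of a randomly ordered list at any n. A small check, with a hypothetical helper name:

```python
def baseline_precision(num_tps: int, num_candidates: int) -> float:
    """Proportion of TPs in the whole candidate list; a random ranking has
    this expected precision for every n-best list."""
    return num_tps / num_candidates

# TP totals from the baseline table, list sizes from the frequency table
# (newspaper: f >= 3, newsgroup: f >= 5)
for corpus, tps, size in [("newspaper", 1280, 10396), ("newsgroup", 597, 4795)]:
    print(f"{corpus}: {baseline_precision(tps, size):.2%}")
# newspaper: 12.31%
# newsgroup: 12.45%
```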
Combination of Properties in the Candidate Lists
newspaper, f >= 3: FVG
newspaper, f >= 3: figur
newsgroup, f >= 5: FVG, figur
newspaper, f >= 3: FVG, figur
Evaluation Procedure
Source corpus → candidate list
t-score   candidate pair
1.992     ab Dienstag bietet
1.992     ab Donnerstag bietet
1.986     ab Freitag bietet
1.578     ab Jahren beginnt
1.652     ab Jahren bietet
1.672     ab Jahren eingeladen
2.440     ab Jahren geeignet
1.596     ab Jahren heißt
1.731     ab Jahren käthi
1.992     ab Jahren tanzen
2.947     ab Jahren treffen
1.999     ab Juni restauriert
1.705     ab Mark finden
1.717     ab Mark kostet
1.719     ab Mark zu_finden
1.723     ab März bietet+an
1.724     ab Mittwoch bietet
1.998     ab Notierungen nutzen
1.999     ab Notierungen zu_nutzen
2.449     ab November einladen
...
Evaluation Procedure
Significance list (candidates ranked by t-score)
rank  t-score  candidate pair
1.    19.218   um Uhr beginnt
2.    13.523   bis Uhr geöffnet
3.    12.751   zur Verfügung stehen
4.    11.724   zur Verfügung gestellt
5.    11.147   zur Verfügung stellen
6.    10.465   ums Leben gekommen
7.    10.008   zur Verfügung steht
8.    9.782    auf Programm stehen
9.    9.700    in Anspruch genommen
10.   9.473    auf Tagesordnung stehen
11.   8.978    am Dienstag sagte
12.   8.931    am Montag sagte
13.   8.613    auf Seite lesen
14.   8.600    auf Kürzungen behält vor
15.   8.423    auf Programm steht
16.   8.395    im Mittelpunkt steht
17.   8.298    in Regionalausgabe erscheint
18.   8.289    an Stelle melden
19.   8.282    auf Seite zeigen
20.   8.269    zur Verfügung zu_stellen
...
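Sorting the candidate list by AM score yields the significance list. A minimal sketch, assuming the t-score form commonly used for collocation extraction, t = (O − E)/√O with E computed from the marginal frequencies; the helper names and the toy frequencies are hypothetical, not taken from the Frankfurter Rundschau data:

```python
import math
from typing import Dict, List, Tuple

def t_score(o11: int, f1: int, f2: int, n: int) -> float:
    """t-score (O - E) / sqrt(O), with E = f1 * f2 / n the joint frequency
    expected if the PN and the verb occurred independently."""
    expected = f1 * f2 / n
    return (o11 - expected) / math.sqrt(o11)

def significance_list(cands: Dict[str, Tuple[int, int, int]],
                      n: int) -> List[Tuple[int, float, str]]:
    """Turn a candidate list into (rank, score, pair) tuples, best score first.
    cands maps each pair to (joint frequency, PN frequency, verb frequency)."""
    scored = sorted(((t_score(o, f1, f2, n), pair)
                     for pair, (o, f1, f2) in cands.items()),
                    key=lambda sp: sp[0], reverse=True)
    return [(rank, score, pair) for rank, (score, pair) in enumerate(scored, start=1)]

# Toy frequencies (hypothetical), sample size n = 1000
print(significance_list({"A": (16, 20, 50), "B": (4, 100, 20), "C": (9, 300, 10)}, n=1000))
# [(1, 3.75, 'A'), (2, 2.0, 'C'), (3, 1.0, 'B')]
```

Any other AM can be plugged in in place of `t_score`; only the sort key changes.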
Evaluation Procedure: N-best Lists
rank  t-score  candidate pair
1.    19.218   um Uhr beginnt
2.    13.523   bis Uhr geöffnet
3.    12.751   zur Verfügung stehen
4.    11.724   zur Verfügung gestellt
5.    11.147   zur Verfügung stellen
6.    10.465   ums Leben gekommen
7.    10.008   zur Verfügung steht
8.    9.782    auf Programm stehen
9.    9.700    in Anspruch genommen
10.   9.473    auf Tagesordnung stehen
11.   8.978    am Dienstag sagte
12.   8.931    am Montag sagte
13.   8.613    auf Seite lesen
14.   8.600    auf Kürzungen behält vor
15.   8.423    auf Programm steht
16.   8.395    im Mittelpunkt steht
17.   8.298    in Regionalausgabe erscheint
18.   8.289    an Stelle melden
19.   8.282    auf Seite zeigen
20.   8.269    zur Verfügung zu_stellen
...
20-best list: 11 true positives, 9 false positives
precision: 11/20 = 55%
total: 1,280 TPs
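The 20-best figures can be restated as precision and recall; the recall value is derived here from the stated total of 1,280 TPs:

```latex
\begin{aligned}
\text{precision}(20) &= \frac{\text{TPs among the 20 highest-ranked candidates}}{20} = \frac{11}{20} = 55\% \\
\text{recall}(20)    &= \frac{\text{TPs among the 20 highest-ranked candidates}}{\text{all TPs}} = \frac{11}{1280} \approx 0.86\%
\end{aligned}
```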
Precision Graph: PNV full forms
Baseline: Random Selection
Precision Graphs
Precision Graphs
Precision Graphs
Recall Graphs
Precision/Recall
Precision Graphs: Newspaper, FVG + figur
Precision Graphs: Newspaper
FVG figur
Precision Graphs: AdjN
Precision Graphs: AdjN
Precision/Recall: AdjN
Frequency Layers: AdjN Data
f >= 5
2 <= f < 5
Frequency Layers: PNV Data
f >= 10
3 <= f < 5
Lemmas vs. Word Forms (PNV)
lemmas, f >= 3
word forms, f >= 3
Text Type and Domain (PNV)
newsgroup discussions
newspaper
comparison for non-lemmatised candidates
The MI Mystery (FVG)
region of high "local precision" for 4.0 < MI < 7.5
Further particularities of the newspaper data
• candidates with MI > 7.5 occur much more often than expected under the independence assumption
• but very few FVG are among them
• the data do not support the usual argument against MI, namely that it overestimates pairs with low joint and marginal frequencies
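Assuming the usual definition of MI as the base-2 logarithm of observed (O) over expected (E) joint frequency, with marginal frequencies f_PN, f_V and sample size N (the formula is not spelled out on these slides), the MI thresholds translate into ratios of observed to expected frequency:

```latex
\begin{aligned}
\mathrm{MI} &= \log_2 \frac{O}{E}, \qquad E = \frac{f_{PN}\, f_V}{N} \\
\mathrm{MI} > 7.5 &\iff \frac{O}{E} > 2^{7.5} \approx 181 \\
4.0 < \mathrm{MI} < 7.5 &\iff 16 < \frac{O}{E} < 2^{7.5}
\end{aligned}
```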
Optimized MI
• score: |MI - 5.75|, the distance from the midpoint of 4.0 and 7.5
• accounts for the concentration of FVG
  • in the range 4.0 <= MI <= 7.5
  • in the newspaper test data
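A minimal sketch of the optimized score, assuming MI = log2(O/E) and that candidates are ranked by ascending distance, so that those closest to the centre of the FVG-rich region come first. The helper names and toy frequencies are hypothetical, and the constant 5.75 is tuned to the newspaper data rather than a general setting:

```python
import math
from typing import Dict, List, Tuple

MI_CENTRE = 5.75  # midpoint of the FVG-rich region 4.0 <= MI <= 7.5 (newspaper data)

def mutual_information(o11: int, f1: int, f2: int, n: int) -> float:
    """Pointwise MI: log2 of observed over expected joint frequency (E = f1 * f2 / n)."""
    return math.log2(o11 * n / (f1 * f2))

def optimized_mi_ranking(cands: Dict[str, Tuple[int, int, int]],
                         n: int) -> List[Tuple[str, float]]:
    """Rank pairs by |MI - 5.75|; a smaller distance means a better rank."""
    scored = [(pair, abs(mutual_information(o, f1, f2, n) - MI_CENTRE))
              for pair, (o, f1, f2) in cands.items()]
    return sorted(scored, key=lambda item: item[1])

# Toy frequencies (hypothetical) chosen so that MI(A) = 6, MI(B) = 10, MI(C) = 4
N = 2 ** 20
cands = {"A": (1, 128, 128), "B": (16, 128, 128), "C": (1, 256, 256)}
print(optimized_mi_ranking(cands, N))
# [('A', 0.25), ('C', 1.75), ('B', 4.25)]
```

Plain MI would rank B first; the optimized score moves it to the end because its MI lies above the FVG-rich region.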
Summary of Results
• Best measures:
  • t-score and frequency are best for identifying PP-verb collocations (FVG, figur)
  • log-likelihood, t-score, and the Fisher, binomial and multinomial p-values work well for AdjN
Summary of Results
• Reproducibility of results for different text types:
  • precision results from the newsgroup data are comparable to those from the newspaper data
• Strong evidence that identical classes of collocations are similarly distributed in different types of corpora
Summary of Results
• Differences in suitability of AMs for identifying particular collocation types:
  • (PN,V)-candidates with a high MI score are less likely to be FVG
  • log-likelihood is not well suited for identifying FVG
  • but is better suited for identifying figur
Summary of Results
• Experimental results based either on a small number of best-scoring candidates or on more than the first 50% of the significance lists (SLs) are unreliable
Conclusion on AMs
Optimal results do not necessarily come from statistical considerations but from tuning on a particular data set.
Vast Land: Lowest-frequency Data
• lowest-frequency data (hapax legomena, dis legomena, ...) are a serious challenge for all statistical approaches
• typical solution: cut-off thresholds
  • Evert/Krenn used cut-off thresholds in the evaluation to reduce manual annotation work
• need to estimate the number of TPs among the excluded lowest-frequency candidates
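The slides leave open how this estimate is made. One straightforward option, assumed here rather than taken from Evert/Krenn, is to annotate a random sample of the excluded candidates by hand and extrapolate the sample proportion, with a normal-approximation 95% interval. All names are hypothetical; `is_tp` stands in for the manual annotation:

```python
import math
import random
from typing import Callable, Sequence, Tuple

def estimate_excluded_tps(excluded: Sequence[str],
                          is_tp: Callable[[str], bool],
                          sample_size: int,
                          seed: int = 0) -> Tuple[float, float, float]:
    """Estimate how many TPs hide among candidates below the cut-off threshold.

    Draws a simple random sample without replacement, annotates it with is_tp
    (a stand-in for manual annotation) and scales the sample proportion up to
    the full excluded set. Returns (estimate, lower, upper); the bounds come
    from a normal-approximation 95% interval for the proportion
    (finite-population correction ignored).
    """
    rng = random.Random(seed)
    sample = rng.sample(list(excluded), sample_size)
    p = sum(1 for cand in sample if is_tp(cand)) / sample_size
    half_width = 1.96 * math.sqrt(p * (1 - p) / sample_size)
    total = len(excluded)
    lower = max(0.0, p - half_width) * total
    upper = min(1.0, p + half_width) * total
    return p * total, lower, upper
```

Only the sample has to be annotated, which is the point of the exercise; a larger sample narrows the interval at the cost of more annotation work.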