Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

Page 1: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

Evaluation of Association Measures

Page 2: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

We want to

• assess how feasible a given association measure (AM) is in practice for identifying collocations: which types of collocation? which corpora (domain, size)? high-frequency versus low-frequency data?

• compare the outcomes of different association measures

Page 3: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

• We have
  • differently ranked collocation candidates

• We need
  • true collocation data for comparison, e.g.
    • collocation lexica
    • a list of true collocations occurring in the extraction corpus

Page 4: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

Problems & Inconveniences

Using collocation lexica for evaluation
• will not tell us how well an AM worked on a particular corpus
• only tells us that
  • some of the reference collocations also occur in our base data and
  • the AM has found them

Page 5: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

Problems & Inconveniences

Using a list of true collocations occurring in the extraction corpus
• requires a good deal of hand-annotation
• requires “objective” criteria for distinguishing collocational from non-collocational word combinations in our candidate list

Page 6: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

Our Approach

• Evaluation of lexical association measures (AMs) against a manually identified reference corpus of true collocations (true positives, TPs)

• Evaluation based on the full reference set

• Precise, linguistically motivated definition of TPs

• Evaluation of results based on recall and precision graphs

Page 7: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

For Further Discussion

• Testing for significance of AMs is an important but still open question

• There is potential for fine-tuning AMs to a specific data set and a particular type of collocation to be extracted (Krenn, Evert 2001)

Page 8: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

Evaluation Experiments

Page 9: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

Data

Extraction corpora
• newspaper: 8 million words, Frankfurter Rundschau Corpus (ECI Multilingual Corpus 1)
• newsgroup: 10 million words, FLAG corpus (LT-DFKI)

Page 10: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

Data

• Base data:
  • list of PP-verb pairs ~ (PN,V)-combinations

• Collocation types:
  • support verb constructions (FVG)
  • figurative expressions (figur)

Page 11: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

Examples

zur Verfügung stellen (FVG)
  at_the availability put
  “make available”

am Herzen liegen (figur)
  at_the heart lie
  “have at heart”

Page 12: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

Support Verb Constructions FVG

• verb-object collocation
  • function as predicates
  • can be paraphrased by main verbs
  • NP-verb or PP-verb

• verbal collocate (function verb / light verb / support verb)
  • main verb
  • conveys Aktionsart and causativity

Page 13: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

Support Verb Constructions FVG

• nominal collocate
  • abstract noun
  • often deverbal or deadjectival
  • contributes the core meaning

• (prepositional collocate)

• verbal and nominal collocate together determine the argument structure of the collocation

Page 14: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

FVG Examples

pred. phrase: in Betrieb

verb      Aktionsart  caus  translation
gehen     incho       -     ‘go into operation’
nehmen    incho       +     ‘put into operation’
setzen    incho       +     ‘start up’
sein      neutral     -     ‘be running’
bleiben   contin      -     ‘keep on running’
lassen    contin      +     ‘keep (sth) running’

Page 15: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

FVG Examples

pred. phrase: ausser Betrieb

verb      Aktionsart  caus  translation
gehen     termin      -     ‘go out of service’
nehmen    termin      +     ‘take out of service’
setzen    termin      +     ‘stop’
sein      neutral     -     ‘be out of order’
bleiben   contin      -     ‘stay out of order’
lassen    contin      +     ‘keep out of order’

Page 16: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

Figurative Expressions (figur)

• not restricted to NP/PP-verb
• figurative reinterpretation of literal meaning required, e.g., unter die Haut gehen (“get under one's skin”)
• nouns: concrete
• verbs: often causative-noncausative alternation, e.g., auf Eis legen (“put on ice”) vs. auf Eis liegen (“be on ice”)

Page 17: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

Decision Tree: FVG versus figur

Page 18: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

Frequency Distributions

PP-verb pairs (full forms)

corpus      instances   candidates
newspaper   453,861     372,121
newsgroup   912,287     631,140

Page 19: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

Frequency Distributions

PP-verb pairs (instances) with frequency f at or above a threshold

corpus      f >= 3    f >= 5    f >= 10
newspaper   10,396    2,853     743
newsgroup   --        4,795     1,029
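The thresholds in this table can be read as a simple filter over pair frequencies. The following is a minimal sketch of such a filter; the function name `candidates_above` and the toy pair counts are assumptions made for illustration and are not taken from the corpora.

```python
from collections import Counter


def candidates_above(pair_counts, min_freq):
    """Keep only the pair types whose frequency is at least min_freq."""
    return {pair: f for pair, f in pair_counts.items() if f >= min_freq}


# Hypothetical toy data: each tuple is one (PP, verb) instance.
instances = (
    [("zur Verfügung", "stellen")] * 5
    + [("am Herzen", "liegen")] * 3
    + [("ab Mark", "kostet")]
)
pair_counts = Counter(instances)

for threshold in (3, 5, 10):
    kept = candidates_above(pair_counts, threshold)
    print(f"f >= {threshold}: {len(kept)} candidate types")
```

Raising the threshold can only shrink the candidate set, which is why the counts in each table row fall from left to right.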

Page 20: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

Frequency Distributions

baseline precision

corpus              total            figur          FVG
newspaper, f >= 3   12.31% (1,280)   5.47% (569)    6.84% (711)
newsgroup, f >= 5   12.45% (597)     5.38% (258)    7.07% (339)
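The baseline precision values appear to be the number of true positives divided by the number of candidates at the given threshold (10,396 for newspaper with f >= 3 and 4,795 for newsgroup with f >= 5, as listed on the previous slide). A small sketch that checks this arithmetic; the function name `baseline_precision` is an assumption:

```python
def baseline_precision(true_positives, candidates):
    """Share of true positives among all candidates, as a percentage."""
    return 100.0 * true_positives / candidates


# Counts copied from the slides: candidates at the threshold, then TPs per class.
data = {
    "newspaper, f >= 3": (10_396, {"total": 1_280, "figur": 569, "FVG": 711}),
    "newsgroup, f >= 5": (4_795, {"total": 597, "figur": 258, "FVG": 339}),
}

for corpus, (candidates, tps) in data.items():
    for label, tp in tps.items():
        print(f"{corpus:18s} {label:6s} {baseline_precision(tp, candidates):.2f}%")
```

This is the precision one would expect from picking candidates at random, so it serves as the reference line for the precision graphs.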

Page 21: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

Combination of Properties in the Candidate Lists

[Figure panels: newspaper, f >= 3, FVG · newspaper, f >= 3, figur · newsgroup, f >= 5, FVG and figur · newspaper, f >= 3, FVG and figur]

Page 22: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

Evaluation Procedure

Source corpus -> candidate list

t-score   Candidate pair
1.992     ab Dienstag bietet
1.992     ab Donnerstag bietet
1.986     ab Freitag bietet
1.578     ab Jahren beginnt
1.652     ab Jahren bietet
1.672     ab Jahren eingeladen
2.440     ab Jahren geeignet
1.596     ab Jahren heißt
1.731     ab Jahren käthi
1.992     ab Jahren tanzen
2.947     ab Jahren treffen
1.999     ab Juni restauriert
1.705     ab Mark finden
1.717     ab Mark kostet
1.719     ab Mark zu_finden
1.723     ab März bietet+an
1.724     ab Mittwoch bietet
1.998     ab Notierungen nutzen
1.999     ab Notierungen zu_nutzen
2.449     ab November einladen
...       ...
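The slides do not say how the t-scores in the candidate list were computed. As a rough sketch, the widely used formulation is t = (O - E) / sqrt(O), where O is the observed pair frequency and E = f1 · f2 / N is the frequency expected under independence. The function name, parameter names and counts below are assumptions for illustration only.

```python
import math


def t_score(observed, f_pn, f_verb, n):
    """t-score in the common (O - E) / sqrt(O) formulation (an assumption here).

    observed: joint frequency of the (PN, V) pair
    f_pn, f_verb: marginal frequencies of the PN part and of the verb
    n: total number of pair instances
    """
    if observed <= 0:
        raise ValueError("observed frequency must be positive")
    expected = f_pn * f_verb / n  # expected frequency under independence
    return (observed - expected) / math.sqrt(observed)


# Hypothetical counts, not taken from the corpora.
print(round(t_score(observed=10, f_pn=20, f_verb=50, n=10_000), 3))
```

Because the score is bounded by sqrt(O), frequent pairs tend to rank high.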

Page 23: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

Evaluation Procedure

Significance list

Rank   t-score   Candidate pair
1.     19.218    um Uhr beginnt
2.     13.523    bis Uhr geöffnet
3.     12.751    zur Verfügung stehen
4.     11.724    zur Verfügung gestellt
5.     11.147    zur Verfügung stellen
6.     10.465    ums Leben gekommen
7.     10.008    zur Verfügung steht
8.     9.782     auf Programm stehen
9.     9.700     in Anspruch genommen
10.    9.473     auf Tagesordnung stehen
11.    8.978     am Dienstag sagte
12.    8.931     am Montag sagte
13.    8.613     auf Seite lesen
14.    8.600     auf Kürzungen behält vor
15.    8.423     auf Programm steht
16.    8.395     im Mittelpunkt steht
17.    8.298     in Regionalausgabe erscheint
18.    8.289     an Stelle melden
19.    8.282     auf Seite zeigen
20.    8.269     zur Verfügung zu_stellen
...    ...       ...

Page 24: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

Evaluation Procedure: N-best Lists

20-best list: ranks 1 to 20 of the significance list (t-scores from 19.218 for um Uhr beginnt down to 8.269 for zur Verfügung zu_stellen)

• 11 true positives
• 9 false positives
• precision: 11/20 = 55%
• total: 1,280 TPs
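The arithmetic on this slide generalises to any list length n: precision is the share of true positives among the n best candidates, and recall is their share of all 1,280 TPs. A minimal sketch follows. The function name is an assumption, and the position of the true positives within the list is invented, since the slide only gives the counts.

```python
def precision_recall_at_n(ranked_labels, n, total_true_positives):
    """Precision and recall of the n best candidates of a ranked list.

    ranked_labels: booleans in rank order, True for a true positive.
    """
    top = ranked_labels[:n]
    hits = sum(top)
    return hits / len(top), hits / total_true_positives


# 11 TPs and 9 FPs among the first 20 candidates; which ranks are TPs is
# made up here for illustration.
labels = [True] * 11 + [False] * 9

precision, recall = precision_recall_at_n(labels, n=20, total_true_positives=1_280)
print(f"precision = {precision:.0%}, recall = {recall:.2%}")
```

Plotting these two values for increasing n gives the precision and recall graphs shown on the following slides.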

Page 25: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

Precision Graph: PNV full forms

Page 26: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

Baseline: Random Selection

Page 27: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

Precision Graphs

Page 28: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

Precision Graphs

Page 29: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

Precision Graphs

Page 30: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

Recall Graphs

Page 31: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

Precision/Recall

Page 32: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

Precision Graphs: Newspaper, FVG + figur

Page 33: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

Precision Graphs: Newspaper

[Figure panels: FVG · figur]

Page 34: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

Precision Graphs: AdjN

Page 35: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

Precision Graphs: AdjN

Page 36: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

Precision/Recall: AdjN

Page 37: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

Frequency Layers: AdjN Data

[Figure panels: f >= 5 · 2 <= f < 5]

Page 38: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

Frequency Layers: PNV Data

[Figure panels: f >= 10 · 3 <= f < 5]

Page 39: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

Lemmas vs. Word Forms (PNV)

[Figure panels: lemmas, f >= 3 · word forms, f >= 3]

Page 40: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

Text Type and Domain (PNV)

news group discussions

newspaper

comparison for non-lemmatised candidates

Page 41: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

The MI Mystery (FVG)

region of high "local precision" for 4.0 < MI < 7.5

Page 42: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

Further particularities of the newspaper data

• candidates with MI > 7.5 are more frequent than expected under the independence assumption

• but there are very few FVG among them

• the data do not support the usual argument against MI, i.e. that it overestimates candidates with low joint and marginal frequencies

Page 43: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

optimized MI

• | MI - 5.75 |

• accounts for the FVG concentration
  • among candidates with 4.0 <= MI <= 7.5
  • in the newspaper test data
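One way to read the expression | MI - 5.75 | is as a distance from the centre of the 4.0 to 7.5 band, so that candidates with a smaller distance rank higher. The slide gives only the expression itself, so the ranking direction, the function names and the toy scores below are assumptions. MI is taken here in its usual pointwise form, log2(O / E).

```python
import math


def mutual_information(observed, f_pn, f_verb, n):
    """Pointwise MI, log2(O / E), with E the frequency expected under independence."""
    expected = f_pn * f_verb / n
    return math.log2(observed / expected)


def distance_from_centre(mi, centre=5.75):
    """Tuned score | MI - centre |; 5.75 is the midpoint of the band 4.0 to 7.5."""
    return abs(mi - centre)


# Hypothetical MI scores for four candidates (labels are placeholders).
mi_scores = {"a": 2.0, "b": 5.5, "c": 9.0, "d": 6.5}

# Assumed reading: candidates closest to the centre of the band come first.
ranked = sorted(mi_scores, key=lambda cand: distance_from_centre(mi_scores[cand]))
print(ranked)
```

Such a score is tuned to one data set and one collocation type, which is the point the slide makes about the newspaper test data.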

Page 44: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

Summary of Results

• Best measures:
  • t-score / frequency best for identifying PP-verb collocations (FVG, figur)
  • log-likelihood, t-score, Fisher, binomial and multinomial p-value work well for AdjN

Page 45: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

Summary of Results

• Reproducibility of results for different text types:
  • Precision results from newsgroup data comparable to newspaper data

• Strong evidence that identical classes of collocations are similarly distributed in different types of corpora

Page 46: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

Summary of Results

• Differences in suitability of AMs to identify particular collocation types:
  • (PN,V)-candidates with high MI score are less likely to be FVG
  • Log-likelihood not well suited for identifying FVG
  • but better suited for identifying figur

Page 47: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

Summary of Results

• Experimental results based either on a small number of best-scoring candidates or on more than the first 50% of the significance lists (SLs) are unreliable

Page 48: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

Conclusion on AMs

Optimal results do not necessarily come from a statistical discussion, but from tuning on a particular data set.

Page 49: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Evaluation of Association Measures

Vast Land: Lowest-frequency Data

• lowest-frequency data (hapax legomena, dis legomena, ...) are a serious challenge for all statistical approaches

• typical solution: cut-off thresholds
  • Evert/Krenn used cut-off thresholds in evaluation to reduce manual annotation work

• need to estimate number of TPs among excluded lowest-frequency candidates
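The slides leave open how the number of TPs among the excluded candidates should be estimated. One possible approach, sketched here purely as an assumption and not as the authors' method, is to annotate a random sample of the excluded candidates and scale the observed proportion up to the whole excluded set. The names `estimate_true_positives` and `annotate` are hypothetical.

```python
import random


def estimate_true_positives(excluded, annotate, sample_size, seed=0):
    """Estimate the number of TPs in `excluded` from a random sample.

    excluded: list of candidates below the frequency cut-off
    annotate: callable returning True if a candidate is a true collocation
              (stands in for manual annotation)
    sample_size: number of candidates drawn without replacement
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = rng.sample(excluded, sample_size)
    hits = sum(1 for candidate in sample if annotate(candidate))
    return hits * len(excluded) / sample_size


# Toy stand-in for annotation: every tenth candidate counts as a TP.
toy_excluded = list(range(1_000))
toy_annotate = lambda candidate: candidate % 10 == 0

print(estimate_true_positives(toy_excluded, toy_annotate, sample_size=200))
```

The estimate is only as good as the sample: with a low TP rate among hapax and dis legomena, a small sample may contain very few hits, so the sample size would need to be chosen with that in mind.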