Error analysis of Word Sense Disambiguation

Ruben Izquierdo, Marten Postma, Piek Vossen

VU Amsterdam

Motivation

Word Sense Disambiguation is still an unsolved problem

Error Analysis

Perform error analysis on previous WSD evaluations to test our hypothesis

Senseval-2: all-words task

Senseval-3: all-words task

Semeval2007: all-words task (#17)

Semeval2010: all-words on specific domain (#17)

Semeval2013: multilingual all-words WSD and entity linking (#12)


Motivation

Some “propagated” errors

Errors on monosemous

Errors caused by PoS tags

Multiwords and phrasal verbs

Little attention has been paid to the real problem

WSD is not one problem but N problems

Our hypothesis

Context is not modeled properly in general

Systems rely too much on the most frequent sense


Monosemous errors


Competition            Monosemous    Wrong   Examples
Senseval2              499 (20.9%)   37.5%   gene.n (suppressor_gene.n), chance.a (chance.n), next.r (next.a)
Senseval3              334 (16.6%)   44.1%   Datum.n (data.n), making.n (make.v), out_of_sight (sight)
Semeval2007            25 (5.5%)     11.1%   get_stuck.v, lack.v, write_about.v
Semeval2010            31 (2.2%)     97.9%   Tidal_zone.n, pine_marten.n, roe_deer.n, cordgrass.n
Semeval2013 (lemmas)   348 (21.1%)   1.9%    Private_enterprise, developing_country, narrow_margin
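A minimal sketch of how the monosemous counts above could be computed. It assumes that "Wrong" means the share of monosemous tokens that received an incorrect answer; the `Token` type, the `sense_inventory` dictionary and the sense keys are hypothetical names for illustration, not the format of the released data.

```python
from typing import NamedTuple, Optional


class Token(NamedTuple):
    lemma_pos: str           # e.g. "pine_marten.n"
    gold: str                # gold sense key
    answer: Optional[str]    # system answer, None if the token was skipped


def monosemous_error_rate(tokens, sense_inventory):
    """Return (number of monosemous tokens, their share of all tokens, their error rate)."""
    # A token is monosemous when its lemma has exactly one sense in the inventory.
    mono = [t for t in tokens if len(sense_inventory.get(t.lemma_pos, ())) == 1]
    if not mono:
        return 0, 0.0, 0.0
    wrong = sum(1 for t in mono if t.answer != t.gold)
    return len(mono), len(mono) / len(tokens), wrong / len(mono)


# Toy data (invented for illustration only).
inventory = {
    "pine_marten.n": ["pine_marten%1"],
    "lack.v": ["lack%1"],
    "water.n": ["water%1", "water%2"],
}
tokens = [
    Token("pine_marten.n", "pine_marten%1", "marten%1"),  # monosemous, wrong
    Token("lack.v", "lack%1", "lack%1"),                  # monosemous, correct
    Token("water.n", "water%2", "water%1"),               # polysemous
    Token("water.n", "water%1", "water%1"),               # polysemous
]
result = monosemous_error_rate(tokens, inventory)
```

An error on a monosemous token cannot be a disambiguation error, so a non-zero rate here points to upstream problems such as lemmatisation, PoS tagging or multiword detection.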

Most Frequent Sense

When the correct sense is NOT the most frequent sense

Systems still assign mostly the MFS

Senseval2

799 tokens are not MFS

84% systems still assign the MFS

Most “failed” words due to MFS bias

Senseval2, Senseval3

say.v, find.v, take.v, have.v, cell.n, church.n

Semeval2010

area.n, nature.n, connection.n, water.n, population.n
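The MFS bias described on this slide can be measured roughly as follows. This is a sketch under assumptions: it reads the reported percentage as the share of system answers that equal the MFS on tokens whose gold sense is not the MFS, and the `mfs_of` mapping and the instance tuples are hypothetical structures.

```python
def mfs_bias(instances, mfs_of):
    """Return (number of non-MFS tokens, share of system answers on them that are the MFS).

    instances: iterable of (lemma_pos, gold_sense, list_of_system_answers)
    mfs_of:    dict mapping lemma_pos to its most frequent sense
    """
    non_mfs_tokens = 0
    answers = 0
    mfs_answers = 0
    for lemma_pos, gold, system_answers in instances:
        mfs = mfs_of[lemma_pos]
        if gold == mfs:
            continue  # only tokens whose correct sense is NOT the MFS are of interest
        non_mfs_tokens += 1
        for answer in system_answers:
            answers += 1
            if answer == mfs:
                mfs_answers += 1
    share = mfs_answers / answers if answers else 0.0
    return non_mfs_tokens, share


# Toy data (invented for illustration only).
mfs_of = {"say.v": "say%1", "cell.n": "cell%1"}
instances = [
    ("say.v", "say%2", ["say%1", "say%1", "say%2"]),
    ("cell.n", "cell%1", ["cell%1", "cell%1", "cell%1"]),  # gold is the MFS, skipped
    ("cell.n", "cell%2", ["cell%1", "cell%2", "cell%1"]),
]
n_tokens, share = mfs_bias(instances, mfs_of)
```

In the toy data, two tokens have a non-MFS gold sense and four of the six answers given for them are the MFS.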


Analysis per PoS-tag


Analysis per polysemy class


[Figure: polysemy classes (Poly. C.) Low, Medium and High, by number of senses; marks at 2, 6 and 15 senses]

Analysis per frequency class


Most difficult words


Expected vs. Observed difficulties

Calculate per sentence

The “expected” difficulty

Average polysemy, sentence length, average word length




The “observed” difficulty

The average error rate over the real participant outputs

We should expect:

Harder sentences: higher error rate

Easier sentences: lower error rate
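The comparison of expected and observed difficulty could be sketched as below. The slides do not say how the three features are combined, so this sketch keeps them separate and correlates each one with the observed error rate; the function names, the `n_senses` lookup and the use of Pearson correlation are assumptions for illustration.

```python
from math import sqrt


def expected_features(sentence, n_senses):
    """Per-sentence 'expected difficulty' features.

    sentence: list of word strings
    n_senses: dict word -> number of senses (hypothetical lookup; unknown words count as 1)
    """
    polysemy = [n_senses.get(word, 1) for word in sentence]
    return {
        "avg_polysemy": sum(polysemy) / len(sentence),
        "length": len(sentence),
        "avg_word_length": sum(len(word) for word in sentence) / len(sentence),
    }


def observed_error_rate(gold, system_outputs):
    """Average error rate of all systems on one sentence.

    gold:           list of gold senses, one per token
    system_outputs: list of answer lists, one list per system
    """
    total = wrong = 0
    for answers in system_outputs:
        for g, a in zip(gold, answers):
            total += 1
            wrong += a != g
    return wrong / total


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


# Toy data (invented for illustration only).
features = expected_features(["water", "is", "cold"], {"water": 4, "is": 2, "cold": 3})
error = observed_error_rate(["a", "b"], [["a", "b"], ["a", "x"]])
r = pearson([1, 2, 3], [2, 4, 6])
```

If context were exploited well, a feature such as average polysemy should correlate positively with the observed error rate across sentences.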







• The context is probably not exploited properly
• Expected “easy” sentences SHOULD show low error rates
• Occurrences of the same word in different contexts have similar error rates
• The difficulty of a word depends more on its polysemy than on the context where it appears
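One way to check whether occurrences of the same word have similar error rates across contexts is to look at the spread of per-occurrence error rates for each lemma. This is only a sketch: the input format and the choice of the population standard deviation as the spread measure are assumptions, not the analysis behind the slides.

```python
from collections import defaultdict
from statistics import mean, pstdev


def per_lemma_spread(occurrences):
    """Return {lemma_pos: (mean error rate, standard deviation)}.

    occurrences: iterable of (lemma_pos, error_rate), where error_rate is the
    share of systems that answered that single occurrence incorrectly.
    """
    by_lemma = defaultdict(list)
    for lemma_pos, rate in occurrences:
        by_lemma[lemma_pos].append(rate)
    return {lemma: (mean(rates), pstdev(rates)) for lemma, rates in by_lemma.items()}


# Toy data (invented for illustration only).
spread = per_lemma_spread([
    ("say.v", 0.8), ("say.v", 0.8),      # same error rate in both contexts
    ("water.n", 0.0), ("water.n", 1.0),  # error rate depends strongly on context
])
```

A small standard deviation for most lemmas would match the observation that the context has little influence on how hard an occurrence is.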


WSD Corpora

http://github.com/rubenIzquierdo/wsd_corpora



System Outputs

https://github.com/rubenIzquierdo/sval_systems




Error analysis of Word Sense Disambiguation

Ruben Izquierdo

Marten Postma

Piek Vossen

[email protected]

http://github.com/rubenIzquierdo/wsd_corpora

http://github.com/rubenIzquierdo/sval_systems



