
Automatic Eurovoc Indexing: Results and Evaluations

Bruno Pouliquen

Lang Tech group, JRC, European Commission

Ispra-Italy

http://www.jrc.cec.eu.int/langtech

Addressing the Language Barrier Problem in the Enlarged EU

Automating Eurovoc Descriptor Assignment

Contents

• Viewing the results
– Browser
– Exports
– Validation interface

• Evaluation method
– Test set
– Precision/Recall
– Evaluation interface

• Results

Browser

For a given text:
• The original text
• The pre-processed text
• Keywords in the text (associates)
• Eurovoc descriptors manually assigned
• Eurovoc descriptors assigned automatically
– With context
• Access to parallel texts

Browser example:

COMMISSION DECISION of 8 September 1997 on the temporary suspension of imports of pistachios and certain products derived from pistachios originating in or consigned from Iran (Text with EEA relevance) (97/613/EC)

THE COMMISSION OF THE EUROPEAN COMMUNITIES,

Having regard to the Treaty establishing the European Community,

Having regard to Council Directive 93/43/EEC of 14 June 1993 on the hygiene of foodstuffs (1), and in particular Article 10 thereof,

Whereas pistachios originating in or consigned from Iran are in many cases contaminated with excessive levels of Aflatoxin B1;

Whereas the Scientific Committee for Food has noted that Aflatoxin B1, even at extremely low doses, causes cancer of the liver and in addition it is genotoxic;

Whereas this constitutes a serious threat to public health within the Community and it is imperative to adopt urgently protective measures at Community level;

Whereas, in the absence, at this time, of sanitary guarantees from the Iranian authorities, it is necessary to suspend imports of pistachios and certain products derived from pistachios originating in or consigned from Iran;

…

Pre-processed text

Keywords (associates) in text

Histogram

Keywords occurring in the text

Eurovoc descriptors assigned

Descriptors assigned manually

Descriptors assigned automatically

Export of the results

• XML file containing the assignment

<assignment>
  <descriptor ID="1006020102000000" COSINE="0.20" OKAPI="8.83">PRESIDENCY OF THE EC COUNCIL</descriptor>
  <descriptor ID="1016030000000000" COSINE="0.17" OKAPI="9.08">EUROPEAN UNION</descriptor>
  <descriptor ID="1006040100000000" COSINE="0.15" OKAPI="9.63">PRESIDENT</descriptor>
  <descriptor ID="2826020000000000" COSINE="0.14" OKAPI="7.82">SOCIAL POLICY</descriptor>
  <descriptor ID="1011020102000000" COSINE="0.14" OKAPI="8.22">PRINCIPLE OF SUBSIDIARITY</descriptor>
  ...
</assignment>
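A minimal sketch of how such an export could be read back, assuming the element and attribute names shown in the export (`assignment`, `descriptor`, `ID`, `COSINE`, `OKAPI`). The helper names `read_assignment` and `top_descriptors` are hypothetical and are not part of the JRC tools; the sample omits the elided part of the export.

```python
import xml.etree.ElementTree as ET

# Sample copied from the export shown on the slide (elided part left out).
EXPORT = """<assignment>
  <descriptor ID="1006020102000000" COSINE="0.20" OKAPI="8.83">PRESIDENCY OF THE EC COUNCIL</descriptor>
  <descriptor ID="1016030000000000" COSINE="0.17" OKAPI="9.08">EUROPEAN UNION</descriptor>
  <descriptor ID="1006040100000000" COSINE="0.15" OKAPI="9.63">PRESIDENT</descriptor>
  <descriptor ID="2826020000000000" COSINE="0.14" OKAPI="7.82">SOCIAL POLICY</descriptor>
  <descriptor ID="1011020102000000" COSINE="0.14" OKAPI="8.22">PRINCIPLE OF SUBSIDIARITY</descriptor>
</assignment>"""


def read_assignment(xml_text):
    """Return one dict per <descriptor> element, in document order."""
    root = ET.fromstring(xml_text)
    rows = []
    for node in root.iter("descriptor"):
        rows.append({
            "id": node.get("ID"),
            "label": (node.text or "").strip(),
            "cosine": float(node.get("COSINE")),
            "okapi": float(node.get("OKAPI")),
        })
    return rows


def top_descriptors(rows, k, score="cosine"):
    """Return the k descriptors with the highest value of the chosen score."""
    return sorted(rows, key=lambda row: row[score], reverse=True)[:k]


rows = read_assignment(EXPORT)
for row in top_descriptors(rows, 3, score="okapi"):
    print(row["id"], row["okapi"], row["label"])
```

Because the two scores do not always agree on the order (PRESIDENT has the highest OKAPI value but only the third-highest COSINE value in this sample), the sketch leaves the choice of ranking score to the caller.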

Validation interface: overview

“Financial Instrument for Fisheries Guidance” is in the text

This text was previously indexed with this descriptor

“fish”, “fisherman”, “fishery”, “conservation” and “fishery_resources” are in the text

Validation interface: example of good assignment

Validation interface: example of bad assignment on a small text

Strangely, “Austria” was manually assigned

“UN convention” was manually assigned

Validation interface: another example of bad assignment...

Evaluation method

• A test set is built
– Not used for training
– Should be representative

• After training, we automatically compare the manually assigned descriptors with the automatically assigned ones
– Depending on the rank (number of descriptors)
– Depending on the various parameters and formulae

• Use precision/recall
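The comparison described above can be sketched as follows. This is an illustration only: it assumes that precision and recall are computed per document on the top-ranked automatic descriptors and then averaged over the test set, which the slides do not state. The function names and the toy descriptor labels are hypothetical.

```python
def precision_recall_at_rank(ranked_auto, manual, rank):
    """Score the top `rank` automatic descriptors against the manual ones."""
    top = ranked_auto[:rank]
    hits = sum(1 for descriptor in top if descriptor in manual)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(manual) if manual else 0.0
    return precision, recall


def evaluate(test_set, max_rank):
    """Average precision and recall over all test documents, per rank."""
    if not test_set:
        return []
    results = []
    for rank in range(1, max_rank + 1):
        scores = [precision_recall_at_rank(auto, manual, rank)
                  for auto, manual in test_set]
        mean_p = sum(p for p, _ in scores) / len(scores)
        mean_r = sum(r for _, r in scores) / len(scores)
        results.append((rank, mean_p, mean_r))
    return results


# Toy test set (hypothetical labels): each entry pairs the ranked automatic
# descriptors of one document with its set of manually assigned descriptors.
TEST_SET = [
    (["fishery", "fish", "trade", "austria"], {"fishery", "trade", "conservation"}),
    (["import", "iran", "health"], {"import", "health"}),
]

for rank, p, r in evaluate(TEST_SET, 3):
    print(f"rank {rank}: precision={p:.3f} recall={r:.3f}")
```

As more descriptors are kept per document, recall can only stay level or rise, while precision tends to fall; this is the trade-off that the rank parameter controls.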

Evaluation results

• Use precision/recall (here: English)

Rank  Precision  Recall  Prec RT  Rec RT  F1-measure
1     76.418     14.560  82.890   15.886  24.45
2     68.617     25.800  76.463   28.051  37.49
3     61.820     34.531  71.011   37.823  44.31
4     57.114     42.279  67.509   45.850  48.58
5     51.489     47.496  63.209   51.279  49.41
6     47.015     51.843  59.427   55.776  49.31
7     43.364     55.687  56.130   59.453  48.75
8     40.027     58.660  53.147   62.378  47.58
9     36.879     60.782  50.325   64.738  45.90

F1-measure: combines precision and recall (harmonic mean)
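As a worked check, the harmonic mean F1 = 2PR / (P + R) can be recomputed from the Precision and Recall columns of the English results. The sketch below uses the figures from that table; the function name `f1_measure` is an assumption made for illustration. The recomputed values agree with the reported F1 column to within about 0.01.

```python
def f1_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (rank, precision, recall, reported F1) from the English results table.
TABLE = [
    (1, 76.418, 14.560, 24.45),
    (2, 68.617, 25.800, 37.49),
    (3, 61.820, 34.531, 44.31),
    (4, 57.114, 42.279, 48.58),
    (5, 51.489, 47.496, 49.41),
    (6, 47.015, 51.843, 49.31),
    (7, 43.364, 55.687, 48.75),
    (8, 40.027, 58.660, 47.58),
    (9, 36.879, 60.782, 45.90),
]

for rank, p, r, reported in TABLE:
    print(rank, round(f1_measure(p, r), 3), reported)

# Rank at which the recomputed F1 is highest.
best_rank = max(TABLE, key=lambda row: f1_measure(row[1], row[2]))[0]
print("best rank:", best_rank)
```

On these figures the F1-measure peaks at rank 5, where precision and recall are closest to each other.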

Evaluation interface

Graph showing precision/recall/F-measure depending on the number of descriptors

Results across languages

[Bar chart: results per language (En, Es, Da, De, Fi, Fr, It, Nl, Pt, Sv, Li, Hu), vertical axis from 0 to 60, with two series: with pre-processing (French: stop words only) and without pre-processing]

Validation: expert judgment

• Expert judgment on automatic assignment:
– ‘G’ for good descriptor
– ‘NT’ for good, but a narrower term (NT) would have been better
– ‘BT’ for good, but its broader term (BT) would have been better
– ‘?’ for unknown / not possible to make a judgement in the time available
– ‘B’ for clearly bad
– ‘S’ for semantically related, but wrong

Manual Evaluation of the Assignment

Manual Evaluation

Manual Evaluation of manual assignment

Manual Evaluation - Overview

Manual Evaluation of Automatic Assignment

• Correct descriptors compared to benchmark of manual assignment

English: 65 / 78 = 83%
Spanish: 69 / 87 = 80%