Evaluating an ‘off-the-shelf’ POS-tagger on Early Modern German text

Silke Scheible, Richard Jason Whitt, Martin Durrell, and Paul Bennett

The GerManC project

School of Languages, Linguistics, and Cultures

University of Manchester (UK)

Overview

• Motivation
• The GerManC corpus
• POS-tagger and tagset
• Challenges
• Results

Motivation

• Goal:
– POS-tagged version of GerManC corpus
• Problems:
– No specialised tagger available for Early Modern German (EMG)
– Limited funds: manual annotation not feasible for whole corpus
• Question:
– How well does an ‘off-the-shelf’ tagger for modern German perform on Early Modern German data?
• Tagger evaluation requires gold-standard data
• Idea:
– Develop gold-standard subcorpus of GerManC
– Use subcorpus to test and adapt modern NLP tools
– Create historical text processing pipeline
• Results useful for other small humanities-based projects wishing to add POS annotations to EMG data

The GerManC corpus

• Purpose: Studies of development and standardisation of German language

• Texts published between 1650 and 1800

• Sample corpus (2,000 words per text)
• Total corpus size: ca. 1 million words
• Aims to be “representative”

• Eight genres
– Orally-oriented: dramas, newspapers, letters, sermons
– Print-oriented: narrative prose, humanities texts, science & medicine texts, legal texts

• Three periods
– 1650-1700
– 1700-1750
– 1750-1800

• Five regions
– North German
– West Central German
– East Central German
– West Upper German
– East Upper German

• Three 2,000-word files per genre/period/region

• Total size: ca. 1 million words

Gold-standard subcorpus: GerManC-GS

• One 2,000-word file per genre and period from North German region: 24 files
• > 50,000 tokens
• Annotated by two historical linguists
• Gold standard POS tags, lemmas, and normalised word forms
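
The 24-file figure follows from the sampling grid: eight genres times three periods, with one North German file per cell. A minimal sketch of that grid is given here; the short genre labels are hypothetical names chosen for illustration, not the corpus's own identifiers.

```python
from itertools import product

# Hypothetical short labels for the eight genres listed on the slides.
GENRES = [
    "drama", "newspaper", "letter", "sermon",        # orally-oriented
    "narrative", "humanities", "science", "legal",   # print-oriented
]
PERIODS = ["1650-1700", "1700-1750", "1750-1800"]
WORDS_PER_FILE = 2000  # nominal sample size per file

# One North German file for every genre/period combination.
gs_cells = [
    (genre, period, "North German")
    for genre, period in product(GENRES, PERIODS)
]

print(len(gs_cells))                   # number of files in the grid
print(len(gs_cells) * WORDS_PER_FILE)  # nominal word count of the grid
```

The nominal word count is only a sampling target; the slide reports more than 50,000 tokens for the annotated subcorpus.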

POS-tagger

• TreeTagger (Schmid, 1994)
• Statistical, decision tree-based POS tagger
• Parameter file for modern German supplied with the tagger
• Trained on German newspaper corpus
• STTS tagset
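
When TreeTagger is run with its -token and -lemma options, it prints one token per line followed by the tag and lemma, separated by tabs. The sketch below reads such output into tuples so it can be compared with gold-standard tags. The function name, the tuple type, and the sample lines are invented for illustration and are not taken from GerManC.

```python
from typing import NamedTuple


class TaggedToken(NamedTuple):
    token: str
    pos: str
    lemma: str


def parse_treetagger_output(text: str) -> list[TaggedToken]:
    """Parse 'token<TAB>tag<TAB>lemma' lines into TaggedToken tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        token, pos, lemma = line.split("\t")
        rows.append(TaggedToken(token, pos, lemma))
    return rows


# Invented sample output (assumption: three tab-separated columns).
sample = "Der\tART\tdie\nKomet\tNN\tKomet\nkommt\tVVFIN\tkommen\n"
tagged = parse_treetagger_output(sample)
print([t.pos for t in tagged])
```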

STTS-EMG

1. PIAT (merged with PIDAT): Indefinite determiner, as in ‘viele solche Bemerkungen’ (‘many such remarks’)

2. NA: Adjectives used as nouns, as in ‘der Gesandte’ (‘the ambassador’)

3. PAVREL: Pronominal adverb used as relative, as in ‘die Puppe, damit sie spielt’ (‘the doll with which she plays’)

4. PTKREL: Indeclinable relative particle, as in ‘die Fälle, so aus Schwachheit entstehen’ (‘the cases which arise from weakness’)

5. PWAVREL: Interrogative adverb used as relative, as in ‘der Zaun, worüber sie springt’ (‘the fence over which she jumps’)

6. PWREL: Interrogative pronoun used as relative, as in ‘etwas, was er sieht’ (‘something which he sees’)
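
The six STTS-EMG changes can be written down as a small lookup structure. The sketch below is one possible representation, using the glosses from the slides. Mapping a modern tagger's PIDAT output onto PIAT before comparing it with gold tags is an assumption about how such a comparison might be set up, not a description of the project's procedure.

```python
# Tags added in STTS-EMG, with the glosses given on the slides.
STTS_EMG_NEW_TAGS = {
    "NA": "adjective used as noun",
    "PAVREL": "pronominal adverb used as relative",
    "PTKREL": "indeclinable relative particle",
    "PWAVREL": "interrogative adverb used as relative",
    "PWREL": "interrogative pronoun used as relative",
}

# Standard STTS tags that are merged into another tag in STTS-EMG.
STTS_EMG_MERGED = {"PIDAT": "PIAT"}


def to_stts_emg(tag: str) -> str:
    """Map a standard STTS tag onto STTS-EMG (only the PIDAT merge changes)."""
    return STTS_EMG_MERGED.get(tag, tag)


def is_new_category(tag: str) -> bool:
    """True for tags that exist only in STTS-EMG."""
    return tag in STTS_EMG_NEW_TAGS


print(to_stts_emg("PIDAT"), is_new_category("PTKREL"))
```

A tagger trained on the standard STTS tagset cannot emit any of the five new tags, so tokens carrying them in the gold standard would count as errors under this setup.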

POS-tagging in GerManC-GS

• New categories account for 2% of all tokens

• Inter-annotator agreement (IAA) on POS-tagging task: 91.6%
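
The slides do not say how the agreement figure was computed. Assuming it is raw percentage agreement (the share of tokens to which both annotators gave the same tag), a minimal sketch looks like this. The two tag sequences are invented for illustration.

```python
def percentage_agreement(tags_a, tags_b):
    """Share of positions (in percent) where two annotators chose the same tag."""
    if len(tags_a) != len(tags_b):
        raise ValueError("both annotators must tag the same tokens")
    if not tags_a:
        raise ValueError("cannot compute agreement on an empty sequence")
    matches = sum(a == b for a, b in zip(tags_a, tags_b))
    return 100.0 * matches / len(tags_a)


# Invented example: the annotators disagree on the first of four tokens.
annotator_1 = ["PTKREL", "APPR", "NN", "VVFIN"]
annotator_2 = ["ADV", "APPR", "NN", "VVFIN"]
print(percentage_agreement(annotator_1, annotator_2))  # 75.0
```

Raw agreement does not correct for chance; a chance-corrected measure such as Cohen's kappa would give a lower figure on the same data.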

Challenges: Tokenisation issues

• Clitics:
– hastu → has|tu: hast du (‘have you’)
– wirstu → wirs|tu: wirst du (‘will you’)
• Multi-word tokens:
– obgleich/KOUS vs. ob/KOUS gleich/ADV (‘even though’)
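
The clitic split shown on the slide (has|tu, wirs|tu) can be imitated with a toy rule. The sketch below is an assumption-laden heuristic, not the project's tokeniser: it splits any token ending in "-stu" after the "s", and it would need a lexicon check before being trusted on real text.

```python
import re

# Toy heuristic: a verb form ending in "-stu" hides the pronoun "du".
_CLITIC_STU = re.compile(r"^(?P<verb>\w+s)(?P<pron>tu)$", re.IGNORECASE)


def split_clitic(token):
    """Split 'hastu' into ['has', 'tu']; return other tokens unchanged."""
    match = _CLITIC_STU.match(token)
    if match is None:
        return [token]
    return [match.group("verb"), match.group("pron")]


for word in ["hastu", "wirstu", "obgleich"]:
    print(word, split_clitic(word))
```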

Challenges: Spelling variation

• Spelling not standardised:
– Comet → Komet
– auff → auf
– nachdeme → nachdem
– kömpt → kommt
– Bothenbrodt → Botenbrot
– differiret → differiert
– beßer → besser
– kehme → käme
– trucken → trockenen
– gepressett → gepreßt
– büxen → Büchsen

• All spelling variants in GerManC-GS normalised to a modern standard
– Assess what effect spelling variation has on the performance of automatic tools
– Help improve automated processing?
• Important for:
– Automatic tools (POS tagger!)
– Accurate corpus search
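
In GerManC-GS the normalised forms were added by the annotators. As a stand-in for that annotation layer, the sketch below uses a plain lookup table built from some of the variant pairs listed on the earlier slide, and computes the proportion of tokens that normalisation changes. The table, function names, and sample tokens are illustrative assumptions.

```python
# Variant -> modern form, taken from the examples on the spelling slide.
NORMALISATION = {
    "Comet": "Komet",
    "auff": "auf",
    "nachdeme": "nachdem",
    "Bothenbrodt": "Botenbrot",
    "differiret": "differiert",
    "beßer": "besser",
    "kehme": "käme",
    "gepressett": "gepreßt",
    "büxen": "Büchsen",
}


def normalise(tokens):
    """Replace known variants with their modern form; keep other tokens."""
    return [NORMALISATION.get(token, token) for token in tokens]


def proportion_normalised(tokens):
    """Fraction of tokens whose form is changed by normalisation."""
    changed = sum(o != n for o, n in zip(tokens, normalise(tokens)))
    return changed / len(tokens)


sample_tokens = ["Der", "Comet", "steht", "auff"]  # invented token sequence
print(normalise(sample_tokens))
print(proportion_normalised(sample_tokens))  # 0.5
```

A lookup table only covers variants it has already seen, which is one reason automated normalisation is listed as future work.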

[Figure: proportion of normalised word tokens plotted against time]

Questions

• What is the “off-the-shelf” performance of the TreeTagger on historical data from the EMG period?

• Can the results be improved by running the tool on normalised data?

Results

TreeTagger accuracy on original vs. normalised input

            Original data    Normalised data
Accuracy    69.6%            79.7%
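
The two accuracy figures can be read either as an absolute gain or as a reduction in the error rate. The arithmetic below uses only the two reported percentages; the variable names are illustrative.

```python
ACC_ORIGINAL = 69.6    # reported accuracy on original spelling (%)
ACC_NORMALISED = 79.7  # reported accuracy on normalised spelling (%)

# Absolute gain in percentage points.
gain_points = round(ACC_NORMALISED - ACC_ORIGINAL, 1)

# Error rates before and after normalisation.
error_original = round(100 - ACC_ORIGINAL, 1)
error_normalised = round(100 - ACC_NORMALISED, 1)

# Share of the original errors that normalisation removes.
relative_error_reduction = round(
    100 * (error_original - error_normalised) / error_original, 1
)

print(gain_points)               # 10.1
print(relative_error_reduction)  # 33.2
```

So normalised input adds about 10 percentage points of accuracy, which corresponds to removing roughly a third of the tagging errors.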

Improvement through normalisation over time

[Figure: tagger performance plotted against publication date]

Effects of spelling normalisation on POS tagger performance

[Table: for normalised tokens, effect of using original (O) vs. normalised (N) input on tagger accuracy; +: correctly tagged, -: incorrectly tagged]
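
A breakdown of this kind can be produced by comparing, for each normalised token, the tag assigned on the original form and the tag assigned on the normalised form against the gold tag. The sketch below shows one way to tally the four O/N outcomes. The function name and all sample data are invented, and the counts it prints are not the study's results.

```python
from collections import Counter


def normalisation_effect(gold, tags_orig, tags_norm, was_normalised):
    """Tally O/N outcomes ('+' correct, '-' incorrect) over normalised tokens."""
    counts = Counter()
    for g, o, n, changed in zip(gold, tags_orig, tags_norm, was_normalised):
        if not changed:
            continue  # only tokens with a differing normalised form count
        key = ("O+" if o == g else "O-", "N+" if n == g else "N-")
        counts[key] += 1
    return counts


# Invented four-token example; the last token was not normalised.
gold = ["NN", "APPR", "VVFIN", "ADJA"]
tags_orig = ["NE", "APPR", "NN", "ADJA"]
tags_norm = ["NN", "APPR", "VVFIN", "ADJA"]
was_normalised = [True, True, True, False]

effect = normalisation_effect(gold, tags_orig, tags_norm, was_normalised)
print(dict(effect))
```

The ("O-", "N+") cell counts tokens where normalisation fixed a tagging error, and ("O+", "N-") counts tokens where it introduced one.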

Comparison with “modern” results

• Performance of TreeTagger on modern data: ca. 97% (Schmid, 1995)
• Current results seem low
• But:
– Modern accuracy figure: evaluation of tagger on the text type it was developed on (newspaper text)
– IAA higher for modern German (98.6%)

Conclusion

• Substantial amount of manual post-editing required

• Normalisation layer can improve results by about 10 percentage points (69.6% to 79.7%), but so far only half of all annotations have a positive effect

Future work

• Adapt normalisation scheme to account for more cases

• Automate normalisation (Jurish, 2010)
• Retrain state-of-the-art POS taggers
– Evaluation?
• Provide detailed information about annotation quality to research community

Thank you!

Martin.Durrell@manchester.ac.uk

Paul.Bennett@manchester.ac.uk

Silke.Scheible@manchester.ac.uk

Richard.Whitt@manchester.ac.uk

http://tinyurl.com/germanc
