Page 1: Evaluating an ‘off-the-shelf’  POS-tagger on Early Modern German text

Evaluating an ‘off-the-shelf’ POS-tagger on Early Modern German text

Silke Scheible, Richard Jason Whitt, Martin Durrell, and Paul Bennett

The GerManC project
School of Languages, Linguistics, and Cultures
University of Manchester (UK)

Page 2: Evaluating an ‘off-the-shelf’  POS-tagger on Early Modern German text

Overview

• Motivation
• The GerManC corpus
• POS-tagger and tagset
• Challenges
• Results

Page 5: Evaluating an ‘off-the-shelf’  POS-tagger on Early Modern German text

Motivation

• Goal:
  – POS-tagged version of GerManC corpus
• Problems:
  – No specialised tagger available for Early Modern German (EMG)
  – Limited funds: manual annotation not feasible for the whole corpus
• Question:
  – How well does an ‘off-the-shelf’ tagger for modern German perform on Early Modern German data?

Page 8: Evaluating an ‘off-the-shelf’  POS-tagger on Early Modern German text

Motivation

• Tagger evaluation requires gold-standard data
• Idea:
  – Develop gold-standard subcorpus of GerManC
  – Use subcorpus to test and adapt modern NLP tools
  – Create historical text processing pipeline
• Results useful for other small humanities-based projects wishing to add POS annotations to EMG data

Page 9: Evaluating an ‘off-the-shelf’  POS-tagger on Early Modern German text

The GerManC corpus

Page 10: Evaluating an ‘off-the-shelf’  POS-tagger on Early Modern German text

The GerManC corpus

• Purpose: studies of the development and standardisation of the German language
• Texts published between 1650 and 1800
• Sample corpus (2,000 words per text)
• Total corpus size: ca. 1 million words
• Aims to be “representative”

Page 11: Evaluating an ‘off-the-shelf’  POS-tagger on Early Modern German text

The GerManC corpus

• Eight genres
  – Orally-oriented: dramas, newspapers, letters, sermons
  – Print-oriented: narrative prose, humanities texts, science & medicine texts, legal texts

Page 12: Evaluating an ‘off-the-shelf’  POS-tagger on Early Modern German text

The GerManC corpus

• Three periods
  – 1650-1700
  – 1700-1750
  – 1750-1800

Page 13: Evaluating an ‘off-the-shelf’  POS-tagger on Early Modern German text

The GerManC corpus

• Five regions
  – North German
  – West Central German
  – East Central German
  – West Upper German
  – East Upper German

Page 14: Evaluating an ‘off-the-shelf’  POS-tagger on Early Modern German text

The GerManC corpus

• Three 2,000-word files per genre/period/region
• Total size: ca. 1 million words

Page 15: Evaluating an ‘off-the-shelf’  POS-tagger on Early Modern German text

Gold-standard subcorpus: GerManC-GS

• One 2,000-word file per genre and period from the North German region → 24 files
• > 50,000 tokens
• Annotated by two historical linguists
• Gold-standard POS tags, lemmas, and normalised word forms
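The sampling design can be enumerated directly. The following is a minimal sketch, assuming the genre, period and region labels from the corpus slides; all variable names are illustrative and not part of the project's tooling.

```python
from itertools import product

# Labels taken from the corpus slides; names are illustrative only.
GENRES = [
    "drama", "newspaper", "letter", "sermon",
    "narrative prose", "humanities", "science & medicine", "legal",
]
PERIODS = ["1650-1700", "1700-1750", "1750-1800"]
REGIONS = [
    "North German", "West Central German", "East Central German",
    "West Upper German", "East Upper German",
]
FILES_PER_CELL = 3  # full corpus: three files per genre/period/region

# Full corpus: every genre/period/region cell holds three sample files.
full_cells = list(product(GENRES, PERIODS, REGIONS))
full_files = len(full_cells) * FILES_PER_CELL

# GerManC-GS: one file per genre and period, North German region only.
gs_files = [(genre, period, "North German")
            for genre, period in product(GENRES, PERIODS)]

print(len(full_cells), full_files, len(gs_files))  # 120 360 24
```

The gold-standard count of 24 matches the figure on the slide (8 genres × 3 periods × 1 region).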

Page 16: Evaluating an ‘off-the-shelf’  POS-tagger on Early Modern German text

POS-tagger

• TreeTagger (Schmid, 1994)
• Statistical, decision tree-based POS tagger
• Parameter file for modern German supplied with the tagger
• Trained on German newspaper corpus
• STTS tagset

Page 17: Evaluating an ‘off-the-shelf’  POS-tagger on Early Modern German text

STTS-EMG

1. PIAT (merged with PIDAT): Indefinite determiner, as in ‘viele solche Bemerkungen’ (‘many such remarks’)

Page 18: Evaluating an ‘off-the-shelf’  POS-tagger on Early Modern German text

STTS-EMG

2. NA: Adjectives used as nouns, as in ‘der Gesandte’ (‘the ambassador’)

Page 19: Evaluating an ‘off-the-shelf’  POS-tagger on Early Modern German text

STTS-EMG

3. PAVREL: Pronominal adverb used as relative, as in ‘die Puppe, damit sie spielt’ (‘the doll with which she plays’)
4. PTKREL: Indeclinable relative particle, as in ‘die Fälle, so aus Schwachheit entstehen’ (‘the cases which arise from weakness’)

Page 20: Evaluating an ‘off-the-shelf’  POS-tagger on Early Modern German text

STTS-EMG

5. PWAVREL: Interrogative adverb used as relative, as in ‘der Zaun, worüber sie springt’ (‘the fence over which she jumps’)
6. PWREL: Interrogative pronoun used as relative, as in ‘etwas, was er sieht’ (‘something which he sees’)
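The six tagset changes listed on the STTS-EMG slides can be held in a small lookup table, for example to measure how often the added categories occur. This is a sketch only: the descriptions paraphrase the slides, and the tag sequence is a hypothetical annotation of one slide example, not project data.

```python
# Changes to STTS listed on the slides (descriptions paraphrased from them).
STTS_EMG_CHANGES = {
    "PIAT": "indefinite determiner (merged with PIDAT)",
    "NA": "adjective used as noun",
    "PAVREL": "pronominal adverb used as relative",
    "PTKREL": "indeclinable relative particle",
    "PWAVREL": "interrogative adverb used as relative",
    "PWREL": "interrogative pronoun used as relative",
}

# PIAT already exists in standard STTS; the other five tags are additions.
NEW_TAGS = set(STTS_EMG_CHANGES) - {"PIAT"}


def share_of_new_tags(tags):
    """Return the fraction of tokens carrying one of the added tags."""
    if not tags:
        return 0.0
    return sum(tag in NEW_TAGS for tag in tags) / len(tags)


# Hypothetical tagging of 'die Fälle, so aus Schwachheit entstehen'.
example = ["ART", "NN", "$,", "PTKREL", "APPR", "NN", "VVFIN"]
print(f"{share_of_new_tags(example):.3f}")  # 0.143
```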

Page 21: Evaluating an ‘off-the-shelf’  POS-tagger on Early Modern German text

POS-tagging in GerManC-GS

• New categories account for 2% of all tokens
• Inter-annotator agreement (IAA) on POS-tagging task: 91.6%
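The slides do not say which agreement measure lies behind the IAA figure. As a sketch under the assumption that it is raw percentage agreement (the share of tokens on which both annotators chose the same tag), it could be computed as follows; the two tag sequences are invented for illustration.

```python
def percentage_agreement(tags_a, tags_b):
    """Return the percentage of positions where both annotators agree."""
    if len(tags_a) != len(tags_b):
        raise ValueError("annotations must cover the same tokens")
    if not tags_a:
        raise ValueError("no tokens to compare")
    matches = sum(a == b for a, b in zip(tags_a, tags_b))
    return 100.0 * matches / len(tags_a)


# Invented example: the annotators disagree on one of five tokens.
annotator_1 = ["ART", "NA", "VVFIN", "APPR", "NN"]
annotator_2 = ["ART", "NN", "VVFIN", "APPR", "NN"]
print(percentage_agreement(annotator_1, annotator_2))  # 80.0
```

Chance-corrected measures such as Cohen's kappa would give lower values than raw agreement on the same data.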

Page 22: Evaluating an ‘off-the-shelf’  POS-tagger on Early Modern German text

Challenges: Tokenisation issues

• Clitics:
  – hastu: hast du (‘have you’)
  – wirstu: wirst du (‘will you’)

Page 25: Evaluating an ‘off-the-shelf’  POS-tagger on Early Modern German text

Challenges: Tokenisation issues

• Clitics:
  – has|tu: hast du (‘have you’)
  – wirs|tu: wirst du (‘will you’)
• Multi-word tokens:
  – obgleich/KOUS vs. ob/KOUS gleich/ADV (‘even though’)
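The clitic segmentation shown above (has|tu, wirs|tu) can be approximated with a simple pattern. This is a hypothetical heuristic for illustration, not the project's method: it treats any token ending in -stu as a verb plus enclitic ‘du’, so it will also split unrelated words with that ending.

```python
import re

# Hypothetical heuristic: a form ending in -stu is read as verb + enclitic 'du'.
CLITIC_STU = re.compile(r"^(?P<verb>\w+s)(?P<clitic>tu)$", re.IGNORECASE)


def split_clitic(token):
    """Split e.g. 'hastu' into ['has', 'tu']; return [token] if no match."""
    match = CLITIC_STU.match(token)
    if match is None:
        return [token]
    return [match.group("verb"), match.group("clitic")]


def normalise_clitic(parts):
    """Map ['has', 'tu'] to ['hast', 'du']; leave other input unchanged."""
    if len(parts) == 2 and parts[1].lower() == "tu":
        return [parts[0] + "t", "du"]
    return parts


for form in ["hastu", "wirstu", "obgleich"]:
    parts = split_clitic(form)
    print(form, parts, normalise_clitic(parts))
```

Splitting first and normalising second mirrors the slide, where the segmented form (has|tu) is kept alongside the modern equivalent (hast du).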

Page 26: Evaluating an ‘off-the-shelf’  POS-tagger on Early Modern German text

Challenges: Spelling variation

• Spelling not standardised:
  – Comet → Komet
  – auff → auf
  – nachdeme → nachdem
  – koͤmpt → kommt
  – Bothenbrodt → Botenbrot
  – differiret → differiert
  – beßer → besser
  – kehme → käme
  – trucken → trockenen
  – gepressett → gepreßt
  – büxen → Büchsen
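Pairs like these can be applied as a lookup table. In GerManC-GS the normalised forms were added as part of the annotators' gold-standard annotation, so the following is only a sketch of how such a table might be used; it contains a subset of the pairs from the slide, and the sample sentence is invented.

```python
# Subset of the original -> normalised pairs listed on the slide.
NORMALISATION = {
    "Comet": "Komet",
    "auff": "auf",
    "nachdeme": "nachdem",
    "Bothenbrodt": "Botenbrot",
    "differiret": "differiert",
    "beßer": "besser",
    "kehme": "käme",
    "büxen": "Büchsen",
}


def normalise(tokens, table=NORMALISATION):
    """Replace each token by its normalised form if the table has one."""
    return [table.get(token, token) for token in tokens]


def proportion_normalised(tokens, table=NORMALISATION):
    """Return the fraction of tokens whose form changes under normalisation."""
    if not tokens:
        return 0.0
    changed = sum(o != n for o, n in zip(tokens, normalise(tokens, table)))
    return changed / len(tokens)


# Invented sample sentence, for illustration only.
sample = ["der", "Comet", "steht", "auff", "dem", "Weg"]
print(normalise(sample))
print(f"{proportion_normalised(sample):.2f}")  # 0.33
```

A plain lookup cannot handle unseen variants or context-dependent cases, which is why the deck lists automated normalisation as future work.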

Page 27: Evaluating an ‘off-the-shelf’  POS-tagger on Early Modern German text

Challenges: Spelling variation

• All spelling variants in GerManC-GS normalised to a modern standard
  → Assess what effect spelling variation has on the performance of automatic tools
  → Help improve automated processing?
• Important for:
  – Automatic tools (POS tagger!)
  – Accurate corpus search

Page 28: Evaluating an ‘off-the-shelf’  POS-tagger on Early Modern German text

Challenges: Spelling variation

Figure: Proportion of normalised word tokens plotted against time

Page 29: Evaluating an ‘off-the-shelf’  POS-tagger on Early Modern German text

Questions

• What is the “off-the-shelf” performance of the TreeTagger on historical data from the EMG period?
• Can the results be improved by running the tool on normalised data?

Page 30: Evaluating an ‘off-the-shelf’  POS-tagger on Early Modern German text

Results

TreeTagger accuracy on original vs. normalised input:

            Original data   Normalised data
Accuracy    69.6%           79.7%
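Token-level accuracy is the share of tokens whose predicted tag equals the gold-standard tag. The sketch below shows that calculation on invented tag sequences, then derives the absolute gain and the relative error reduction from the two figures reported above; the function and variable names are illustrative.

```python
def accuracy(gold, predicted):
    """Return the percentage of tokens tagged identically to the gold standard."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted tags must align token by token")
    if not gold:
        raise ValueError("no tokens to evaluate")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return 100.0 * correct / len(gold)


# Invented example: three of four tags match the gold standard.
print(accuracy(["ART", "NN", "VVFIN", "ADV"], ["ART", "NE", "VVFIN", "ADV"]))  # 75.0

# Figures reported on the slide (per cent).
ACC_ORIGINAL = 69.6
ACC_NORMALISED = 79.7

# Absolute gain in percentage points.
gain_points = ACC_NORMALISED - ACC_ORIGINAL
# Share of the original tagging errors removed by normalisation.
error_reduction = gain_points / (100.0 - ACC_ORIGINAL)

print(f"{gain_points:.1f} points, {error_reduction:.1%} fewer errors")
# 10.1 points, 33.2% fewer errors
```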

Page 31: Evaluating an ‘off-the-shelf’  POS-tagger on Early Modern German text

Improvement through normalisation over time

Figure: Tagger performance plotted against publication date

Page 32: Evaluating an ‘off-the-shelf’  POS-tagger on Early Modern German text

Effects of spelling normalisation on POS tagger performance

Figure: For normalised tokens, effect of using original (O) vs. normalised (N) input on tagger accuracy (+: correctly tagged; -: incorrectly tagged)
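The O/N comparison amounts to a 2×2 cross-tabulation over the tokens that were normalised: each token is counted by whether the tagger was right on the original form and whether it was right on the normalised form. A minimal sketch with invented tags (the function name and data are hypothetical):

```python
from collections import Counter


def cross_tabulate(gold, tagged_original, tagged_normalised):
    """Count tokens by correctness (+/-) on original (O) and normalised (N) input."""
    if not (len(gold) == len(tagged_original) == len(tagged_normalised)):
        raise ValueError("all three tag sequences must align token by token")
    table = Counter()
    for g, o, n in zip(gold, tagged_original, tagged_normalised):
        key = ("O+" if o == g else "O-", "N+" if n == g else "N-")
        table[key] += 1
    return table


# Invented tags for four normalised tokens, one per outcome.
gold = ["NN", "APPR", "VVFIN", "ADJD"]
on_original = ["NE", "APPR", "VVFIN", "NN"]
on_normalised = ["NN", "APPR", "VVPP", "NN"]

for outcome, count in sorted(cross_tabulate(gold, on_original, on_normalised).items()):
    print(outcome, count)
```

The (O-, N+) cell holds the tokens where normalisation helped, (O+, N-) those where it hurt, and the two diagonal cells those where it made no difference to correctness.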

Page 35: Evaluating an ‘off-the-shelf’  POS-tagger on Early Modern German text

Comparison with “modern” results

• Performance of TreeTagger on modern data: ca. 97% (Schmid, 1995)
• Current results seem low
• But:
  – Modern accuracy figure: evaluation of tagger on the text type it was developed on (newspaper text)
  – IAA higher for modern German (98.6%)

Page 36: Evaluating an ‘off-the-shelf’  POS-tagger on Early Modern German text

Conclusion

• Substantial amount of manual post-editing required
• Normalisation layer can improve results by ca. 10 percentage points (69.6% → 79.7%), but so far only half of all normalisation annotations have a positive effect

Page 37: Evaluating an ‘off-the-shelf’  POS-tagger on Early Modern German text

Future work

• Adapt normalisation scheme to account for more cases
• Automate normalisation (Jurish, 2010)
• Retrain state-of-the-art POS taggers → Evaluation?
• Provide detailed information about annotation quality to research community

Page 38: Evaluating an ‘off-the-shelf’  POS-tagger on Early Modern German text

Thank you!

http://tinyurl.com/germanc