Evaluating an ‘off-the-shelf’ POS-tagger on Early Modern German text
Silke Scheible, Richard Jason Whitt, Martin Durrell, and Paul Bennett
The GerManC project, School of Languages, Linguistics, and Cultures, University of Manchester (UK)

Overview

• Motivation
• The GerManC corpus
• POS-tagger and tagset
• Challenges
• Results

Motivation

• Goal:
  – POS-tagged version of GerManC corpus
• Problems:
  – No specialised tagger available for Early Modern German (EMG)
  – Limited funds: manual annotation not feasible for whole corpus
• Question:
  – How well does an ‘off-the-shelf’ tagger for modern German perform on Early Modern German data?

• Tagger evaluation requires gold standard data
• Idea:
  – Develop gold-standard subcorpus of GerManC
  – Use subcorpus to test and adapt modern NLP tools
  – Create historical text processing pipeline
• Results useful for other small humanities-based projects wishing to add POS annotations to EMG data

The GerManC corpus

• Purpose: Studies of development and standardisation of German language

• Texts published between 1650 and 1800

• Sample corpus (2,000 words per text)
• Total corpus size: ca. 1 million words
• Aims to be “representative”

• Eight genres
  – Orally-oriented: dramas, newspapers, letters, sermons
  – Print-oriented: narrative prose, humanities texts, science & medicine texts, legal texts

• Three periods
  – 1650-1700
  – 1700-1750
  – 1750-1800

• Five regions
  – North German
  – West Central German
  – East Central German
  – West Upper German
  – East Upper German

• Three 2,000-word files per genre/period/region

• Total size: ca. 1 million words

Gold-standard subcorpus: GerManC-GS

• One 2,000-word file per genre and period from North German region: 24 files (8 genres × 3 periods)
• > 50,000 tokens
• Annotated by two historical linguists
• Gold standard POS tags, lemmas, and normalised word forms

POS-tagger

• TreeTagger (Schmid, 1994)
• Statistical, decision tree-based POS tagger
• Parameter file for modern German supplied with the tagger
• Trained on German newspaper corpus
• STTS tagset

STTS-EMG

1. PIAT (merged with PIDAT): Indefinite determiner, as in ‘viele solche Bemerkungen’ (‘many such remarks’)

2. NA: Adjectives used as nouns, as in ‘der Gesandte’ (‘the ambassador’)

3. PAVREL: Pronominal adverb used as relative, as in ‘die Puppe, damit sie spielt’ (‘the doll with which she plays’)

4. PTKREL: Indeclinable relative particle, as in ‘die Fälle, so aus Schwachheit entstehen’ (‘the cases which arise from weakness’)

5. PWAVREL: Interrogative adverb used as relative, as in ‘der Zaun, worüber sie springt’ (‘the fence over which she jumps’)

6. PWREL: Interrogative pronoun used as relative, as in ‘etwas, was er sieht’ (‘something which he sees’)

POS-tagging in GerManC-GS

• New categories account for 2% of all tokens

• Inter-annotator agreement (IAA) on POS-tagging task: 91.6%
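
The slides do not say how the agreement figure was computed. As a minimal sketch, assuming plain token-level percentage agreement between two annotators (the function name and the toy tag sequences below are illustrative, not taken from the project):

```python
def percentage_agreement(tags_a, tags_b):
    """Percentage of tokens to which both annotators assigned the same tag."""
    if len(tags_a) != len(tags_b):
        raise ValueError("both annotators must tag the same token sequence")
    if not tags_a:
        raise ValueError("cannot compute agreement on an empty sequence")
    matches = sum(a == b for a, b in zip(tags_a, tags_b))
    return 100.0 * matches / len(tags_a)


# Toy example: the annotators disagree on one of five tokens (NA vs. NN).
annotator_1 = ["ART", "NA", "VVFIN", "PTKREL", "ADV"]
annotator_2 = ["ART", "NN", "VVFIN", "PTKREL", "ADV"]
agreement = percentage_agreement(annotator_1, annotator_2)
print(f"{agreement:.1f}%")
```

Chance-corrected measures such as Cohen's kappa are a common alternative; raw agreement is used here only because it is the simplest reading of a percentage figure.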

Challenges: Tokenisation issues

• Clitics:
  – hastu → has|tu: hast du (‘have you’)
  – wirstu → wirs|tu: wirst du (‘will you’)
• Multi-word tokens:
  – obgleich/KOUS vs. ob/KOUS gleich/ADV (‘even though’)
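
The clitic split shown above (has|tu, wirs|tu) can be sketched as a small tokenisation rule. This is an illustrative heuristic only, assuming that any token ending in "stu" is a verb with an enclitic "tu"; it is not the procedure used for GerManC-GS, and the function names are hypothetical:

```python
def split_clitic(token):
    """Split a verb form with enclitic 'tu' (= 'du') into two tokens.

    Assumption: every token longer than three characters that ends in
    'stu' is such a form. A real tokeniser would need a vetted word list.
    """
    if len(token) > 3 and token.lower().endswith("stu"):
        return [token[:-2], token[-2:]]
    return [token]


def tokenise(tokens):
    """Apply the clitic rule to every token of a pre-split sentence."""
    result = []
    for token in tokens:
        result.extend(split_clitic(token))
    return result


print(tokenise(["hastu", "es", "gesehen"]))
print(tokenise(["wirstu", "kommen"]))
```

Each half of the split can then receive its own POS tag and normalised form (hast, du).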

Challenges: Spelling variation

• Spelling not standardised:
  – Comet → Komet
  – auff → auf
  – nachdeme → nachdem
  – koͤmpt → kommt
  – Bothenbrodt → Botenbrot
  – differiret → differiert
  – beßer → besser
  – kehme → käme
  – trucken → trockenen
  – gepressett → gepreßt
  – büxen → Büchsen

• All spelling variants in GerManC-GS normalised to a modern standard
  – Assess what effect spelling variation has on the performance of automatic tools
  – Help improve automated processing?
• Important for:
  – Automatic tools (POS tagger!)
  – Accurate corpus search
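
To illustrate how a normalisation layer could feed an automatic tool, here is a minimal sketch that maps historical spellings to modern forms before tagging. It assumes a context-free lookup table, which is a simplification: in GerManC-GS the normalised forms are manual per-token annotations. The table holds a few of the pairs listed above, and the names `NORMALISATION` and `normalise` are hypothetical:

```python
# A few historical -> modern spelling pairs taken from the examples above.
NORMALISATION = {
    "Comet": "Komet",
    "auff": "auf",
    "nachdeme": "nachdem",
    "differiret": "differiert",
    "beßer": "besser",
    "kehme": "käme",
}


def normalise(tokens, table=NORMALISATION):
    """Replace each token by its modern form if the table lists one."""
    return [table.get(token, token) for token in tokens]


original = ["Der", "Comet", "kehme", "auff", "uns", "zu"]
normalised = normalise(original)
print(normalised)
```

The normalised token sequence, rather than the original one, would then be passed to the tagger.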

Figure: Proportion of normalised word tokens plotted against time
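
The quantity plotted in this figure can be computed along the following lines. This is a sketch under stated assumptions: each record is a (year, original form, normalised form) triple, the three corpus periods are treated as half-open intervals with 1800 counted in the last one, and the records below are toy data made up for illustration, not corpus figures:

```python
from collections import defaultdict

PERIODS = [(1650, 1700), (1700, 1750), (1750, 1800)]


def period_label(year):
    """Map a publication year to one of the three corpus periods."""
    for index, (start, end) in enumerate(PERIODS):
        is_last = index == len(PERIODS) - 1
        if start <= year < end or (is_last and year == end):
            return f"{start}-{end}"
    raise ValueError(f"year {year} lies outside the corpus range")


def proportion_normalised(records):
    """Fraction of tokens per period whose normalised form differs."""
    changed = defaultdict(int)
    total = defaultdict(int)
    for year, original, normalised in records:
        label = period_label(year)
        total[label] += 1
        if original != normalised:
            changed[label] += 1
    return {label: changed[label] / total[label] for label in total}


# Toy records only; years and counts are invented for the example.
toy_records = [
    (1659, "auff", "auf"), (1659, "und", "und"),
    (1722, "beßer", "besser"), (1722, "der", "der"),
    (1722, "die", "die"), (1722, "das", "das"),
    (1790, "Komet", "Komet"),
]
proportions = proportion_normalised(toy_records)
print(proportions)
```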

Questions

• What is the “off-the-shelf” performance of the TreeTagger on historical data from the EMG period?

• Can the results be improved by running the tool on normalised data?

Results

TreeTagger accuracy on original vs. normalised input

            Original data    Normalised data
Accuracy    69.6%            79.7%
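
Accuracy figures of this kind are typically obtained by comparing the tagger output with the gold-standard tags token by token. A minimal sketch, assuming both sequences are already aligned (the function name and the toy tag sequences are illustrative; only the two percentages at the end come from the table above):

```python
def tagging_accuracy(gold_tags, predicted_tags):
    """Percentage of tokens whose predicted tag equals the gold tag."""
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("gold and predicted sequences must be aligned")
    if not gold_tags:
        raise ValueError("cannot compute accuracy on an empty sequence")
    correct = sum(g == p for g, p in zip(gold_tags, predicted_tags))
    return 100.0 * correct / len(gold_tags)


# Toy example: tagger output on original vs. normalised input.
gold = ["ART", "NN", "VVFIN", "APPR", "NN"]
tagged_original = ["ART", "NE", "VVFIN", "ADV", "NN"]
tagged_normalised = ["ART", "NN", "VVFIN", "ADV", "NN"]
acc_original = tagging_accuracy(gold, tagged_original)
acc_normalised = tagging_accuracy(gold, tagged_normalised)

# Difference between the two reported figures, in percentage points.
reported_gain = round(79.7 - 69.6, 1)
print(acc_original, acc_normalised, reported_gain)
```

The reported gain is an absolute difference in percentage points, not a relative improvement.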

Improvement through normalisation over time

Figure: Tagger performance plotted against publication date

Effects of spelling normalisation on POS tagger performance

Figure: For normalised tokens, effect of using original (O) or normalised (N) input on tagger accuracy (+: correctly tagged; -: incorrectly tagged)
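
The four-way breakdown behind this figure (O+/N+, O+/N-, O-/N+, O-/N-) can be tallied as follows. This is a sketch assuming four aligned sequences: gold tags, tagger output on original input, tagger output on normalised input, and a flag marking tokens whose spelling was normalised. All names and the toy data are illustrative:

```python
from collections import Counter


def normalisation_effect(gold, tagged_orig, tagged_norm, was_normalised):
    """Count tagging outcomes on original/normalised input.

    Only tokens flagged as normalised are counted. Keys look like
    'O-/N+': wrong on original input, correct on normalised input.
    """
    counts = Counter()
    for g, o, n, changed in zip(gold, tagged_orig, tagged_norm, was_normalised):
        if not changed:
            continue
        key = ("O+" if o == g else "O-") + "/" + ("N+" if n == g else "N-")
        counts[key] += 1
    return counts


# Toy data: four normalised tokens, one per outcome, plus one unchanged token.
gold = ["NN", "VVFIN", "APPR", "ADJA", "NN"]
tagged_orig = ["NE", "VVFIN", "ADV", "ADJA", "NN"]
tagged_norm = ["NN", "VVFIN", "ADV", "NN", "NN"]
was_normalised = [True, True, True, True, False]
effect = normalisation_effect(gold, tagged_orig, tagged_norm, was_normalised)
print(dict(effect))
```

The O-/N+ cell counts the cases where normalisation helps; O+/N- counts the cases where it hurts.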

Comparison with “modern” results

• Performance of TreeTagger on modern data: ca. 97% (Schmid, 1995)
• Current results seem low
• But:
  – Modern accuracy figure: evaluation of tagger on the text type it was developed on (newspaper text)
  – IAA higher for modern German (98.6%)

Conclusion

• Substantial amount of manual post-editing required

• Normalisation layer can improve results by about 10 percentage points (69.6% → 79.7%), but so far only half of all normalisation annotations have a positive effect

Future work

• Adapt normalisation scheme to account for more cases
• Automate normalisation (Jurish, 2010)
• Retrain state-of-the-art POS taggers → Evaluation?
• Provide detailed information about annotation quality to research community

Thank you!


http://tinyurl.com/germanc