ParaMor: Finding Paradigms Across Morphology

Christian Monson

Carnegie Mellon

Turkish Morphology – Beads on a String

götür  ül       m         üyor                 sun
take   passive  negative  present progressive  2nd person singular

"You are not being taken"

Applications of Computational Morphology

• Machine Translation
  – Turkish-English (Oflazer, 2007)
  – Czech-English (Goldwater and McClosky, 2005)
• Speech Recognition
  – Finnish (Creutz, 2006)
• Information Retrieval

Challenges of Computational Morphology

• Time Consuming for a New Language
  – Kemal Oflazer estimates
    • 3-4 months to build basic Turkish analyzer
    • Plus lexicon development and maintenance
• Expertise Needed
  – Greenlandic
    • Official language of Greenland
    • Agglutinative Inuit language
    • 50,000 speakers
    • Per Langaard

The Solution

• Raw Text
• Unsupervised Morphology Induction

ParaMor – Paradigm Morphology

Outline: Identify (Search, Cluster, Filter), Segment, Evaluation, Results

• ParaMor
  – Unsupervised morphology induction system
• Paradigm
  – The natural structure of morphology

Paradigms – The Structure of Morphology

Slot             Morpheme  Gloss
Stem             götür     take
Voice            ül        passive
Polarity         m         negative
Tense & Mood     üyor      present progressive
                 yecek     future
Person & Number  sun       2nd person singular
                 um        1st person singular
                 Ø         3rd person singular
                 uz        1st person plural

Paradigms: the alternating suffixes within a slot, e.g. üyor / yecek and um / Ø / uz

• Paradigm
  – Set of mutually replaceable strings
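
The notion of mutually replaceable strings can be illustrated with a small sketch. This is not part of ParaMor: it simply concatenates one string per slot, using only the surface strings shown on the slides. The `SLOTS` layout and the function name `word_forms` are assumptions made for this example, and the sketch ignores Turkish morphophonology, so not every generated string is a well-formed Turkish word.

```python
from itertools import product

# One list of mutually replaceable strings per slot, as shown on the slides.
# The empty string stands for the null suffix Ø.
SLOTS = [
    ["götür"],                 # stem: take
    ["ül"],                    # voice: passive
    ["m"],                     # polarity: negative
    ["üyor", "yecek"],         # tense & mood
    ["sun", "um", "", "uz"],   # person & number
]

def word_forms(slots):
    """Concatenate one string from each slot, keeping the slot order."""
    return ["".join(choice) for choice in product(*slots)]

forms = word_forms(SLOTS)
print(len(forms))   # 8 combinations: 2 tense choices x 4 person choices
print(forms[0])     # götürülmüyorsun
```

Swapping one string for another within a slot changes the word form while every other slot stays fixed, which is the sense in which the strings of a paradigm are mutually replaceable.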

The ParaMor Algorithm

• Identify suffix paradigms in 3 steps
  1. Search for candidate paradigms
  2. Cluster candidates modeling the same paradigm
  3. Filter
• Segment words
  – Using the discovered paradigms

Search for Candidate Paradigms

• All character boundaries are candidate morpheme boundaries

• Begin search with the most frequent word-final string
  – Spanish: s (10662 stems), e.g. autorizaciones, buscabamos, costas, importadoras, vallas, …
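
A minimal sketch of how the starting point of the search could be found, assuming the vocabulary is a plain list of word types. The function name `word_final_counts` and the toy vocabulary are assumptions for illustration. The sketch treats every character boundary as a candidate morpheme boundary and counts how many words end in each string.

```python
from collections import Counter

def word_final_counts(vocab):
    """Count how many distinct words end in each non-empty word-final string."""
    counts = Counter()
    for word in set(vocab):
        # Every character boundary is a candidate morpheme boundary;
        # the stem (word[:cut]) is kept non-empty.
        for cut in range(1, len(word)):
            counts[word[cut:]] += 1
    return counts

# Toy vocabulary built from the example words on the slide, plus "costa".
vocab = ["autorizaciones", "buscabamos", "costas",
         "importadoras", "vallas", "costa"]
counts = word_final_counts(vocab)
seed, stem_count = counts.most_common(1)[0]
print(seed, stem_count)
```

On this toy vocabulary the most frequent word-final string is `s`, mirroring the Spanish example.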

• Identify the most frequent mutually replaceable string
  – Stems that occur with one suffix in a paradigm will likely occur with other suffixes in that paradigm
  – Spanish: s (10662 stems) → Ø s (5501 stems)
• Stop adding suffixes
  – When the most frequent mutually replaceable string severely decreases the stem count
  – Spanish: Ø s (5501 stems) → Ø r s (287 stems)
• Move on to the next most frequent word-final string
  – Spanish: a (8981 stems)

Search for Candidate Paradigms

Spanish search paths (stem count in parentheses):

s (10662) → Ø s (5501) → Ø r s (287)
a (8981) → a o (2304) → a o os (1410) → a as o os (892)
n (6051) → Ø n (1874) → Ø n r (509) → Ø do n r (354) → Ø da das do dos n ndo r ron (118)
es (2751) → Ø es (874)
an (1786) → a an (1049) → a an ar (413) → a an ar ó (353) → a ada adas ado ados an ar aron ó (149)
rado (167) → rada rado (89) → rada rado rados (67) → rada radas rado rados (53) → ra rada radas rado rados ran rar raron ró (23)
...
strado (15) → strada strado (12) → strada strado stró (9) → strada strado strar stró (8) → strada stradas strado strar stró (7)
...
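
The search steps above can be sketched as a greedy loop. This is a simplified illustration, not ParaMor's actual implementation: the function names, the alphabetical tie-breaking, and the stopping threshold `min_ratio` are all assumptions, since the slides only say the search stops when the stem count "severely decreases".

```python
from collections import Counter

def stems_for(vocab, suffix):
    """Non-empty stems t such that t + suffix is a word in vocab."""
    if suffix == "":
        return set(vocab)
    return {w[: -len(suffix)] for w in vocab
            if w.endswith(suffix) and len(w) > len(suffix)}

def grow_paradigm(vocab, seed, min_ratio=0.5):
    """Greedily grow a candidate paradigm from a seed word-final string.

    min_ratio is a hypothetical threshold: a suffix is added only if at
    least this fraction of the current stems also occurs with it.
    """
    vocab = set(vocab)
    suffixes = {seed}
    stems = stems_for(vocab, seed)
    while stems:
        # For each other suffix, count how many current stems take it.
        tally = Counter()
        for word in vocab:
            for cut in range(1, len(word) + 1):
                if word[:cut] in stems and word[cut:] not in suffixes:
                    tally[word[cut:]] += 1
        if not tally:
            break
        # Most frequent mutually replaceable string; ties broken alphabetically.
        best, support = min(tally.items(), key=lambda kv: (-kv[1], kv[0]))
        if support < min_ratio * len(stems):
            break  # adding it would cut the stem count too sharply
        suffixes.add(best)
        stems = {t for t in stems if t + best in vocab}
    return sorted(suffixes), stems

toy_vocab = {"costa", "costas", "costar", "valla", "vallas", "casa", "casas"}
paradigm, paradigm_stems = grow_paradigm(toy_vocab, "s")
print(paradigm, sorted(paradigm_stems))
```

On the toy vocabulary the search adds the null suffix (written `""`) to `s` and then stops, because `r` occurs with only one of the three stems.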

Cluster Candidates per Paradigm

• Candidate: 15 suffixes: a aba ada adas ado ados an ando ar aron arse ará arán aría ó
  – 22 stems: anunci, aplic, apoy, celebr, concentr, …
  – 330 covered types
• Candidate: 15 suffixes: a aba ada adas ado ados an ando ar ara aron arse ará arán ó
  – 23 stems: anunci, apoy, confirm, consider, declar, …
  – 345 covered types
• Merged cluster (cosine similarity: 0.664): 16 suffixes: a aba ada adas ado ados an ando ar ara aron arse ará arán aría ó
  – 451 covered types
• Candidate: 15 suffixes: a aba aban ada adas ado ados an ando ar aron arse ará arán ó
  – 25 stems: anunci, aplic, apoy, celebr, consider, …
  – 375 covered types
• Merged cluster (cosine similarity: 0.715): 17 suffixes: a aba aban ada adas ado ados an ando ar ara aron arse ará arán aría ó
  – 532 covered types
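
The slides do not state how the cosine similarity is computed. One reading that reproduces both reported values treats each candidate as the set of word types it covers (stem + suffix) and uses the set cosine |X ∩ Y| / sqrt(|X| · |Y|). The sketch below checks this against the counts on the slide, assuming the covered types of a merged cluster are the union of the covered types of its parts. The function names are hypothetical.

```python
import math

def covered_types(stems, suffixes):
    """All word types formed by pairing each stem with each suffix."""
    return {stem + suffix for stem in stems for suffix in suffixes}

def set_cosine(x, y):
    """Cosine similarity between two sets viewed as binary vectors."""
    return len(x & y) / math.sqrt(len(x) * len(y))

def cosine_from_counts(n_x, n_y, n_union):
    """Same measure, given only |X|, |Y| and |X ∪ Y|."""
    n_intersection = n_x + n_y - n_union  # inclusion-exclusion
    return n_intersection / math.sqrt(n_x * n_y)

# 330 and 345 covered types merging into 451 covered types
print(round(cosine_from_counts(330, 345, 451), 3))  # 0.664
# 375 and 451 covered types merging into 532 covered types
print(round(cosine_from_counts(375, 451, 532), 3))  # 0.715
```

Note also that each candidate's covered-type count equals its number of suffixes times its number of stems (15 × 22 = 330, 15 × 23 = 345, 15 × 25 = 375).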

Filter Candidate Paradigms

• 2 types of filtering
  1. Remove small unclustered candidate paradigms
  2. Remove candidates modeling unlikely morpheme boundaries (Harris, 1955)
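
The slides do not say how unlikely morpheme boundaries are detected. As a rough illustration of the idea usually credited to Harris (1955), the sketch below computes letter successor variety: the number of distinct characters that can follow a given word prefix. Positions where many continuations are possible are more plausible morpheme boundaries than positions where only one is. The function name and the toy vocabulary are assumptions, and ParaMor's actual filter may use a different statistic.

```python
def successor_variety(vocab, prefix):
    """Number of distinct continuations of `prefix` in the vocabulary.

    The end of a word counts as one possible continuation, marked "#".
    """
    followers = set()
    for word in vocab:
        if word.startswith(prefix):
            followers.add(word[len(prefix)] if len(word) > len(prefix) else "#")
    return len(followers)

toy_vocab = {"administrada", "administradas", "administrado",
             "administrar", "administró"}
for prefix in ("admin", "administr", "administrada"):
    print(prefix, successor_variety(toy_vocab, prefix))
```

In the toy vocabulary only one character can follow `admin`, while two different continuations follow `administr`, so a boundary after `administr` is the more plausible of the two.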

Segment Words Using Paradigms

Example: administradas

• Paradigm: a ada adas ado ados an ar aron ó ...
  – administrada also occurs
  – administr +adas
• Paradigm: a as o os
  – administrada also occurs
  – administrad +as
• Paradigm: Ø s
  – administrada (with Ø) also occurs
  – administrada +s

• Old way: Separate alternative analysis
  – administr +adas, administrad +as, administrada +s
• New way: Augment the current segmentation
  – administr +ad +as
  – administr +ad +a +s
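
The segmentation step can be sketched as follows. This is a simplified reading of the slides rather than ParaMor's actual code: a paradigm proposes a boundary before a suffix when swapping that suffix for another member of the same paradigm yields a word that is also in the vocabulary, and the boundaries proposed by all paradigms are combined into a single segmentation. The function names and data layout are assumptions.

```python
def boundaries(word, paradigms, vocab):
    """Positions in `word` where some paradigm proposes a morpheme boundary."""
    cuts = set()
    for paradigm in paradigms:
        for suffix in paradigm:
            # Skip the null suffix and suffixes that would leave no stem.
            if not suffix or len(suffix) >= len(word) or not word.endswith(suffix):
                continue
            stem = word[: -len(suffix)]
            # The suffix must be replaceable by another suffix of the paradigm.
            if any(stem + other in vocab for other in paradigm if other != suffix):
                cuts.add(len(stem))
    return sorted(cuts)

def segment(word, paradigms, vocab):
    """Augment one segmentation with every proposed boundary."""
    pieces, previous = [], 0
    for cut in boundaries(word, paradigms, vocab):
        pieces.append(word[previous:cut])
        previous = cut
    pieces.append(word[previous:])
    return " +".join(pieces)

paradigms = [
    {"a", "ada", "adas", "ado", "ados", "an", "ar", "aron", "ó"},
    {"a", "as", "o", "os"},
    {"", "s"},  # "" stands for the null suffix Ø
]
vocab = {"administrada", "administradas"}
print(segment("administradas", paradigms, vocab))  # administr +ad +a +s
print(segment("administrada", paradigms, vocab))   # administr +ad +a
```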

Morpho Challenge 2007

• Peer operated competition
  – For unsupervised morphology induction algorithms
• 4 languages
  – English
  – German
  – Finnish
  – Turkish

ParaMor in Morpho Challenge 2007

• Developed on Spanish
  – ParaMor’s free parameters were frozen

2 Methods of Evaluation

1. Linguistic
   Segmentations compared to a morphologically analyzed lexicon

   Word           Analysis             Answer
   administradas  administr +ad +a +s  administrar +Adj +Fem +Pl
   administrada   administr +ad +a     administrar +Adj +Fem
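
The linguistic results in this talk are reported as F1. The slides do not describe how precision and recall are derived from the comparison with the analyzed lexicon, so the sketch below shows only the standard definition of F1 as the harmonic mean of precision and recall. The function name and the example values are illustrative, not taken from the evaluation.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical values, for illustration only.
print(f1_score(0.6, 0.4))
```

Because it is a harmonic mean, F1 is pulled toward the lower of the two values, so a system cannot score well by over-segmenting or under-segmenting alone.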

2. Task based
   Information retrieval
   – Short two-sentence queries
   – About international news topics
   – Binary relevance assessments
   – About 50 queries and 20K relevance judgements for each language
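
The IR results are reported as average precision. A minimal sketch of the standard definition for one query with binary relevance judgements is given below. The function name and document identifiers are hypothetical, and the exact computation used in Morpho Challenge 2007 is not given in the slides.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked result list under binary relevance.

    Precision is taken at the rank of each relevant document retrieved,
    then averaged over all relevant documents for the query.
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)

# Relevant documents retrieved at ranks 1 and 3: (1/1 + 2/3) / 2
print(average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"}))
```

Averaging this value over all queries for a language gives a single score per system.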

Linguistic Evaluation

[Bar chart: F1 (axis 20 to 60) for English, German, Finnish, and Turkish. Systems compared: Bernhard 2, Morfessor, ParaMor, ParaMor & Morfessor. Bar labels legible in the extraction: 47.2, 50.6, 50.7, 60.8, 56.3, 52.9, 53.4, 48.2, 48.5, 24.7, 52.0.]

IR Evaluation (TF/IDF)

[Bar chart: Average Precision (axis 20 to 35) for English, German, and Finnish. Systems compared: Morfessor, Morfessor Baseline, McNamee, ParaMor, ParaMor & Morfessor. Reference line "No Morphological Analysis", labeled 27.0, 30.7, and 32.0 on successive builds. Bar labels legible in the extraction: 28.9, 29.3, 26.4, 38.3 (38.8 on the final build), 38.2, 32.1, 41.2, 37.2.]

ParaMor: State-of-the-Art Unsupervised Morphology Induction System

• Combined system among the best in Morpho Challenge 2007
• Consistent across languages
• Better than no morphology
  – Task based (IR) measure

Many Future Directions

• Improve Performance
  – F1 of 50-60% is state-of-the-art!
  – Inflection classes
  – Morphophonology
• Beyond beads-on-a-string
