Mallet & MaxEnt POS Tagging
Shallow Processing Techniques for NLP (Ling570)
November 16, 2011
Roadmap
- Mallet
  - Classifiers
  - Testing
  - Resources
- HW #8
- MaxEnt POS Tagging
  - POS tagging as classification
  - Feature engineering
  - Sequence labeling
Mallet Commands
- Mallet command types:
  - Data preparation
  - Data/model inspection
  - Training
  - Classification
- Command-line scripts: shell scripts that
  - Set up the Java environment
  - Invoke the Java programs
- --help lists the command-line parameters for each script
Mallet Data
- Mallet data instances:
  - instance_id label f1 v1 f2 v2 …
- Stored in an internal binary format: "vectors"
  - The binary format is used by the learners and decoders
  - Text files must be converted to the binary format
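For illustration, two hypothetical instance lines in this format (the IDs, labels, and counts are invented; the words echo the language-ID examples used later in these slides):

  doc1 en the 2 book 1 i 1
  doc2 de der 2 buch 1

The import-svmlight command shown under General Use below expects the same information as feature:value pairs, e.g. "the 2" written as "the:2".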
Building & Accessing Models � bin/mallet train-classifier --input data.vector --trainer
classifiertype –input data.vector- -training-portion 0.9 --output-classifier OF � Builds classifier model
� Can also store model, produce scores, confusion matrix, etc
Building & Accessing Models � bin/mallet train-classifier --input data.vector --trainer
classifiertype --training-portion 0.9 --output-classifier OF � Builds classifier model
� Can also store model, produce scores, confusion matrix, etc
� --trainer: MaxEnt, DecisionTree, NaiveBayes, etc
Building & Accessing Models � bin/mallet train-classifier --input data.vector --trainer
classifiertype - -training-portion 0.9 --output-classifier OF � Builds classifier model
� Can also store model, produce scores, confusion matrix, etc
� --trainer: MaxEnt, DecisionTree, NaiveBayes, etc
� --report: train:accuracy, test:f1:en
Building & Accessing Models � bin/mallet train-classifier --input data.vector --trainer
classifiertype - -training-portion 0.9 --output-classifier OF � Builds classifier model
� Can also store model, produce scores, confusion matrix, etc
� --trainer: MaxEnt, DecisionTree, NaiveBayes, etc
� --report: train:accuracy, test:f1:en
� Can also use pre-split training & testing files � e.g. output of vectors2vectors
� --training-file, --testing-file
Building & Accessing Models � bin/mallet train-classifier --input data.vector --trainer
classifiertype - -training-portion 0.9 --output-classifier OF � Builds classifier model
� Can also store model, produce scores, confusion matrix, etc � --trainer: MaxEnt, DecisionTree, NaiveBayes, etc � --report: train:accuracy, test:f1:en
� Confusion Matrix, row=true, column=predicted accuracy=1.0 � label 0 1 |total � 0 de 1 . |1 � 1 en . 1 |1 � Summary. train accuracy mean = 1.0 stddev = 0 stderr = 0 � Summary. test accuracy mean = 1.0 stddev = 0 stderr = 0
Accessing Classifiers
- classifier2info --classifier maxent.model
  - Prints out the contents of the model file, e.g.:

  FEATURES FOR CLASS en
  <default> -0.036953801963395115
  book 0.004605219133228236
  the 0.24270652500835088
  i 0.004605219133228236
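Reading this output (the standard MaxEnt view, assuming <default> acts as a per-class bias): the score of a class is its default weight plus the weights of the instance's active features times their values, and class probabilities come from normalizing the exponentiated scores:

  score(\text{en} \mid x) = \lambda_{\text{default}} + \sum_{f \in x} \lambda_f \, v_f

  P(\text{en} \mid x) = \exp(score(\text{en} \mid x)) \, / \, \sum_c \exp(score(c \mid x))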
Testing
- Use new data to test a previously built classifier
- bin/mallet classify-svmlight --input testfile --output outputfile --classifier maxent.model
  - Also works on instance files and directories: classify-file, classify-dir
  - Prints a class,score matrix:

  inst_id  class1 score1  class2 score2
  array:0  en 0.995       de 0.0046
  array:1  en 0.970       de 0.0294
  array:2  en 0.064       de 0.935
  array:3  en 0.094       de 0.905
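A minimal sketch (Python) for reading this output and taking each instance's highest-scoring class; it assumes the whitespace-separated format shown above and the outputfile name from the command:

  # pick the argmax class per instance from
  # "inst_id class1 score1 class2 score2 ..." lines
  with open("outputfile") as f:
      for line in f:
          fields = line.split()
          inst_id, rest = fields[0], fields[1:]
          # pair up (class, score) columns and keep the best-scoring class
          pairs = [(rest[i], float(rest[i + 1])) for i in range(0, len(rest), 2)]
          best_class, best_score = max(pairs, key=lambda p: p[1])
          print(inst_id, best_class, best_score)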
General Use
- bin/mallet import-svmlight --input svmltrain.vectors.txt --output svmltrain.vectors
  - Builds the binary representation from feature:value pairs
- bin/mallet train-classifier --input svmltrain.vectors --trainer MaxEnt --output-classifier svml.model
  - Trains a MaxEnt classifier and stores the model
- bin/mallet classify-svmlight --input svmltest.vectors.txt --output - --classifier svml.model
  - Tests on the new data
Other Information
- Website: download and documentation (such as it is)
  - http://mallet.cs.umass.edu
- API tutorial:
  - http://mallet.cs.umass.edu/mallet-tutorial.pdf
- Local guide (refers to the older version 0.4):
  - http://courses.washington.edu/ling572/winter07/homework/mallet_guide.pdf
HW #8
Goals
- Get experience with Mallet
  - Import data
  - Build and evaluate classifiers
- Build your own text classification systems with Mallet
  - 20 Newsgroups data
  - Build your own feature extractor
  - Train and test classifiers
Text Classification
- Q1: Build representations of the 20 Newsgroups data
  - Use Mallet's built-in functions:
  - text2vectors --input dropbox…/20_newsgroups/* --skip-headers --output news3.vectors
- Q2: Do the same thing, but build your own features
Feature Creation
- Skip headers
  - Read data only from the first blank line onward
- Simple tokenization:
  - Convert non-alphabetic characters (anything outside [a-zA-Z]) to whitespace
  - Convert everything to lowercase
  - Split tokens on whitespace
- Feature values
  - Frequencies of tokens in documents (see the sketch below)
Example
Xref: cantaloupe.srv.cs.cmu.edu misc.headlines:41568 talk.politics.guns:53293
…
Lines: 38

hambidge@bms.com wrote:
: In article <…@magpie.linknet.com>, manes@magpie.linknet.com (Steve Manes) writes:

(Due to F. Xia)
Tokenized Example
Original:
  hambidge@bms.com wrote:
  : In article <…@magpie.linknet.com>, manes@magpie.linknet.com (Steve Manes) writes:
Tokenized:
  hambidge bms com wrote
  In article c psog c magpie linknet com manes magpie linknet com stevemanes writes

(Due to F. Xia)
Example Feature Vector
- guns a:11 about:2 absurd:1 again:1 an:1 and:5 any:2 approaching:1 are:5 argument:1 article:1 as:5 associates:1 at:1 average:2 bait:1 …

(Due to F. Xia)
MaxEnt POS Tagging
N-gram POS Tagging

  \hat{t}_1^n = \arg\max_{t_1^n} P(t_1^n \mid w_1^n) = \arg\max_{t_1^n} \prod_i P(w_i \mid t_i) \, P(t_i \mid t_{i-n+1}^{i-1})

  Bigram model:  \prod_i P(w_i \mid t_i) \, P(t_i \mid t_{i-1})

  Trigram model: \prod_i P(w_i \mid t_i) \, P(t_i \mid t_{i-1}, t_{i-2})
MaxEnt POS Tagging
- POS tagging as classification
  - What are the inputs? What units are classified?
    - Words
  - What are the classes?
    - POS tags
  - What information should we use?
    - Consider the n-gram model
POS Feature Representation
- Feature templates
  - What feature templates correspond to a trigram POS model?
    - Current word: w0
    - Previous two tags: t-2 t-1
  - What other feature templates could be useful?
    - More word context
      - Previous: w-1; pre-previous: w-2; next: w+1; …
      - Word bigram: w-1 w0
    - Backoff tag context: t-1
Feature Templates
- Time flies like an arrow

              w-1    w0     w-1 w0       w+1    t-1   y
  x1 (Time)   <s>    Time   <s> Time     flies  BOS   N
  x2 (flies)  Time   flies  Time flies   like   N     N
  x3 (like)   flies  like   flies like   an     N     V
In Mallet, the rows above become:

  N prevW=<s>:1 currW=Time:1 precurrW=<s>-Time:1 postW=flies:1 preT=BOS:1
  N prevW=Time:1 currW=flies:1 precurrW=Time-flies:1 postW=like:1 preT=N:1
  V prevW=flies:1 currW=like:1 precurrW=flies-like:1 postW=an:1 preT=N:1
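A minimal sketch (Python; names are illustrative) of generating such lines from a gold-tagged training sentence, using the templates above:

  def emit_instances(words, tags):
      """One Mallet instance line per token, one feature per template."""
      for i, (w, t) in enumerate(zip(words, tags)):
          prev_w = words[i - 1] if i > 0 else "<s>"
          post_w = words[i + 1] if i + 1 < len(words) else "</s>"  # assumed end marker
          prev_t = tags[i - 1] if i > 0 else "BOS"  # gold tag at training time
          feats = [f"prevW={prev_w}:1", f"currW={w}:1",
                   f"precurrW={prev_w}-{w}:1", f"postW={post_w}:1",
                   f"preT={prev_t}:1"]
          print(t, " ".join(feats))

  # tags for "an"/"arrow" are assumed; the first three lines match the slide
  emit_instances(["Time", "flies", "like", "an", "arrow"],
                 ["N", "N", "V", "DT", "N"])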
MaxEnt Feature Template
- Words:
  - Current word: w0
  - Previous word: w-1
  - Word two back: w-2
  - Next word: w+1
  - Next next word: w+2
- Tags:
  - Previous tag: t-1
  - Previous tag pair: t-2 t-1
- How many features? 5|V| + |T| + |T|^2
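For a concrete sense of scale (the numbers are illustrative, not from the slides): with a vocabulary of |V| = 40,000 word types and a tagset of |T| = 45 tags,

  5|V| + |T| + |T|^2 = 5·40,000 + 45 + 2,025 = 202,070 feature types.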
Unknown Words
- How can we handle unknown words?
  - Assume rare words in training are similar to unknown words at test time
- What similarities can we exploit?
  - The link between spelling/morphology and POS:
    - -able → JJ
    - -tion → NN
    - -ly → RB
    - Case: John → NP, etc.
Representing Orthographic Patterns
- How can we represent morphological patterns as features? (sketched below)
  - Character sequences
    - Which sequences? Prefixes/suffixes
      - e.g. suffix(wi)=ing or prefix(wi)=well
  - Specific characters or character types
    - Which?
      - is-capitalized
      - is-hyphenated
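A minimal sketch of these orthographic features (Python; the length cap of 4 and the feature names follow the well-heeled example on a later slide):

  def ortho_features(word, max_len=4):
      """Prefix/suffix and character-type features for one word."""
      feats = []
      for k in range(1, min(max_len, len(word)) + 1):
          feats.append(f"pref={word[:k]}:1")
          feats.append(f"suff={word[-k:]}:1")
      if word[:1].isupper():
          feats.append("is-capitalized:1")
      if "-" in word:
          feats.append("is-hyphenated:1")
      return feats

  print(ortho_features("well-heeled"))
  # pref=w ... pref=well, suff=d ... suff=eled, is-hyphenated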
MaxEnt Feature Set
[figure: the slide's full feature-set table is not recoverable from the extracted text]
Rare Words & Features � Intuition:
� Rare words = infrequent words in training � What qualifies as “Rare”?
Rare Words & Features � Intuition:
� Rare words = infrequent words in training � What qualifies as “Rare”? 5 in paper
� Uncommon words better represented by spelling
Rare Words & Features � Intuition:
� Rare words = infrequent words in training � What qualifies as “Rare”? 5 in paper
� Uncommon words better represented by spelling � Spelling could generalize
� Specific words would be undertrained
� Intuition: � Rare features = features less than X times in training
Rare Words & Features � Intuition:
� Rare words = infrequent words in training � What qualifies as “Rare”? 5 in paper
� Uncommon words better represented by spelling � Spelling could generalize � Specific words would be undertrained
� Intuition: � Rare features = features less than X times in training � Infrequent features unlikely to be informative � Skip
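A minimal sketch of the rare-word switch (Python; the threshold of 5 follows the paper as noted above, everything else is illustrative):

  from collections import Counter

  RARE_THRESHOLD = 5  # "rare" = fewer than 5 training occurrences

  def word_features(word, train_counts):
      """Word-identity feature when frequent; spelling features when rare."""
      if train_counts[word] >= RARE_THRESHOLD:
          return [f"currW={word}:1"]
      # back off to spelling; one suffix feature here for brevity, in
      # practice use the full orthographic set sketched earlier
      return [f"suff={word[-3:]}:1"]

  counts = Counter({"the": 1000, "well-heeled": 1})
  print(word_features("the", counts))          # ['currW=the:1']
  print(word_features("well-heeled", counts))  # ['suff=led:1']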
Examples
- well-heeled: a rare word

  JJ prevW=about:1 prev2W=stories-about:1 nextW=communities:1 next2W=and:1 pref=w:1 pref=we:1 pref=wel:1 pref=well:1 suff=d:1 suff=ed:1 suff=led:1 suff=eled:1 is-hyphenated:1 preT=IN:1 pre2T=NNS-IN:1
Finding Features
- In training, where do features come from?
  - The words and gold-standard tags of the annotated training data
- Where do features come from in testing?
  - Word features are observed, but tag features come from the classification of the prior word

              w-1    w0     w-1 w0       w+1    t-1   y
  x1 (Time)   <s>    Time   <s> Time     flies  BOS   N
  x2 (flies)  Time   flies  Time flies   like   N     N
  x3 (like)   flies  like   flies like   an     N     V
Sequence Labeling
Sequence Labeling
- Goal: Find the most probable labeling of a sequence
- Many sequence labeling tasks:
  - POS tagging
  - Word segmentation
  - Named entity tagging
  - Story/spoken sentence segmentation
  - Pitch accent detection
  - Dialog act tagging
Solving Sequence Labeling
- Direct: Use a sequence labeling algorithm
  - e.g. HMM, CRF, MEMM
- Via classification: Use a classification algorithm
  - Issue: What about tag features?
    - Features that use class labels depend on the classification itself
  - Solutions:
    - Don't use features that depend on class labels (loses information)
    - Use another process to generate the class labels, then use them
    - Perform incremental classification to get labels, and use those labels as features for instances later in the sequence (sketched below)
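A minimal sketch of that incremental option (Python; classify stands in for any trained classifier, e.g. the MaxEnt model above, so all names here are illustrative):

  def tag_sequence(words, classify):
      """Greedy left-to-right tagging: each prediction becomes the
      preT feature of the next instance, mirroring the templates above."""
      tags = []
      for i, w in enumerate(words):
          feats = {
              "prevW": words[i - 1] if i > 0 else "<s>",
              "currW": w,
              "postW": words[i + 1] if i + 1 < len(words) else "</s>",
              "preT": tags[i - 1] if i > 0 else "BOS",  # predicted, not gold
          }
          tags.append(classify(feats))  # classify: feature dict -> best tag
      return tags

  # toy stand-in classifier, for demonstration only
  toy = lambda f: "V" if f["preT"] == "N" and f["currW"] == "like" else "N"
  print(tag_sequence(["Time", "flies", "like", "an", "arrow"], toy))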