
Named Entity Tagging Thanks to Dan Jurafsky, Jim Martin, Ray Mooney, Tom Mitchell for slides


Outline

Named Entities and the basic idea
IOB Tagging
A new classifier: Logistic Regression
• Linear regression
• Logistic regression
• Multinomial logistic regression = MaxEnt
Why classifiers aren’t as good as sequence models
A new sequence model:
• MEMM = Maximum Entropy Markov Model

Named Entity Tagging

Slide from Jim Martin

CHICAGO (AP) — Citing high fuel prices, United Airlines said Friday it has increased fares by $6 per round trip on flights to some cities also served by lower-cost carriers. American Airlines, a unit of AMR, immediately matched the move, spokesman Tim Wagner said. United, a unit of UAL, said the increase took effect Thursday night and applies to most routes where it competes against discount carriers, such as Chicago to Dallas and Atlanta and Denver to San Francisco, Los Angeles and New York.

Named Entity Tagging

The same text with the named entities marked in brackets:

[CHICAGO] (AP) — Citing high fuel prices, [United Airlines] said Friday it has increased fares by $6 per round trip on flights to some cities also served by lower-cost carriers. [American Airlines], a unit of [AMR], immediately matched the move, spokesman [Tim Wagner] said. [United], a unit of [UAL], said the increase took effect Thursday night and applies to most routes where it competes against discount carriers, such as [Chicago] to [Dallas] and [Atlanta] and [Denver] to [San Francisco], [Los Angeles] and [New York].

Slide from Jim Martin

Named Entity Recognition

Find the named entities and classify them by type
Typical approach
• Acquire training data
• Encode using IOB labeling
• Train a sequential supervised classifier
• Augment with pre- and post-processing using available list resources (census data, gazetteers, etc.)

Slide from Jim Martin

Temporal and Numerical Expressions

Temporals
• Find all the temporal expressions
• Normalize them based on some reference point
Numerical Expressions
• Find all the expressions
• Classify by type
• Normalize

Slide from Jim Martin

NE Types

Slide from Jim Martin

NE Types: Examples

Slide from Jim Martin

Ambiguity

Slide from Jim Martin

Biomedical Entities

Disease
Symptom
Drug
Body Part
Treatment
Enzyme
Protein
Difficulty: discontiguous or overlapping mentions
• Abdomen is soft, nontender, nondistended, negative bruits

NER Approaches

As with partial parsing and chunking there are two basic approaches (and hybrids)
Rule-based (regular expressions)
• Lists of names
• Patterns to match things that look like names
• Patterns to match the environments that classes of names tend to occur in
ML-based approaches
• Get annotated training data
• Extract features
• Train systems to replicate the annotation

Slide from Jim Martin

ML Approach

Slide from Jim Martin

Encoding for Sequence Labeling

We can use IOB encoding:

… United Airlines said Friday it has increased
  B_ORG  I_ORG    O    O      O  O   O

the move , spokesman Tim   Wagner said.
O   O    O O         B_PER I_PER  O

How many tags?
For N classes we have 2*N+1 tags
• An I and B for each class and one O for no-class
Each token in a text gets a tag
Can use simpler IO tagging if what?
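The IOB encoding above can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: the function names `iob_encode` and `tagset`, and the span format `(start, end, label)` with an exclusive end index, are assumptions made for the example.

```python
def iob_encode(tokens, spans):
    """Return one IOB tag per token.

    spans: list of (start, end, label) token indices, end exclusive
    (a hypothetical input format chosen for this sketch).
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = "B_" + label          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = "I_" + label          # remaining tokens of the entity
    return tags


def tagset(labels):
    """All tags for N classes: a B and an I per class plus one O (2*N + 1)."""
    tags = ["O"]
    for label in labels:
        tags += ["B_" + label, "I_" + label]
    return tags


tokens = "United Airlines said Friday it has increased".split()
tags = iob_encode(tokens, [(0, 2, "ORG")])
print(list(zip(tokens, tags)))
print(len(tagset(["ORG", "PER", "LOC"])))  # 2*3 + 1 = 7
```

Every token receives exactly one tag, which is what lets NER be treated as a per-token sequence labeling problem.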

NER Features

Slide from Jim Martin

How to do NE tagging?

Classifiers
• Naïve Bayes
• Logistic Regression
Sequence Models
• HMMs
• MEMMs
• CRFs
Sequence models work better

Linear Regression

Example from Freakonomics (Levitt and Dubner 2005)
• Fantastic/cute/charming versus granite/maple
Can we predict price from # of adjs?

Multiple Linear Regression

Predicting values:

In general:

Let’s pretend an extra “intercept” feature f0 with value 1
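The formulas on this slide were images and did not survive extraction. A standard way to write the multiple linear regression prediction, assuming weights $w_i$ and feature values $f_i$ (notation chosen here to match the later slides, not taken from the original), is:

```latex
\begin{align*}
y &= w_0 + \sum_{i=1}^{N} w_i f_i \\
  &= \sum_{i=0}^{N} w_i f_i \qquad (\text{with } f_0 = 1) \\
  &= w \cdot f
\end{align*}
```

Folding the intercept into the sum as feature $f_0 = 1$ is what allows the prediction to be written as a single dot product.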

Learning in Linear Regression

Consider one instance x_j
We’d like to choose weights to minimize the difference between predicted and observed value for x_j:

This is an optimization problem that turns out to have a closed-form solution
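The cost formula itself is missing from the extracted slide. Assuming the usual sum-of-squared-errors cost (an assumption, though it is the cost for which the least-squares closed form applies), with $M$ training instances and $N$ features, it can be written as:

```latex
\begin{align*}
y^{(j)}_{\text{pred}} &= \sum_{i=0}^{N} w_i f^{(j)}_{i} \\
\text{cost}(W) &= \sum_{j=1}^{M} \left( y^{(j)}_{\text{pred}} - y^{(j)}_{\text{obs}} \right)^{2}
\end{align*}
```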

Put the feature vectors from the training set into a matrix X, one row per observation f^(i)
Put the observed values in a vector y
Formula that minimizes the cost:

$W = (X^T X)^{-1} X^T \vec{y}$
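The closed-form solution can be tried out on a toy problem. The sketch below uses only the Python standard library and solves the equivalent linear system $(X^T X) W = X^T y$ by elimination instead of forming the inverse explicitly; the helper names and the data are made up for illustration.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a @ w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_linear_regression(features, y):
    """Weights minimizing squared error: solves (X^T X) W = X^T y."""
    X = [[1.0] + list(row) for row in features]  # prepend intercept f0 = 1
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(x * t for x, t in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)


# Made-up data generated from y = 2 + 3 * x, so the fit should recover
# weights close to [2, 3].
weights = fit_linear_regression([[0.0], [1.0], [2.0], [3.0]],
                                [2.0, 5.0, 8.0, 11.0])
print(weights)
```

In practice a numerical library would be used for this step; the hand-written solver is here only to keep the example self-contained.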

Logistic Regression

But in these language problems we are doing classification
• Predicting one of a small set of discrete values
Could we just use linear regression for this?

Logistic regression

Not possible: the result doesn’t fall between 0 and 1
Instead of predicting prob, predict ratio of probs:
• but still not good: the ratio lies between 0 and ∞, while a linear function can take any value between −∞ and ∞
So how about if we predict the log:

Logistic regression

Solving this for p(y=true)
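The algebra for this step is not in the extracted text. Starting from the log of the ratio of probabilities set equal to the linear score $w \cdot f$ (the symbols are assumed here, following the linear regression slides), the derivation runs:

```latex
\begin{align*}
\ln\!\left(\frac{p(y=\text{true}\mid x)}{1 - p(y=\text{true}\mid x)}\right) &= w \cdot f \\
\frac{p(y=\text{true}\mid x)}{1 - p(y=\text{true}\mid x)} &= e^{w \cdot f} \\
p(y=\text{true}\mid x) &= \frac{e^{w \cdot f}}{1 + e^{w \cdot f}} = \frac{1}{1 + e^{-w \cdot f}}
\end{align*}
```

The last expression is the logistic function applied to $w \cdot f$, which always lies strictly between 0 and 1.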

Logistic function

Logistic Regression

How do we do classification?

Or:

Or back to explicit sum notation:
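The classification rule can be sketched as follows: choosing "true" when p(y=true|x) > p(y=false|x) is the same as checking whether the ratio of the two exceeds 1, which reduces to checking whether the weighted sum of features is positive. The weights and feature values below are made up; the function names are illustrative only.

```python
import math


def logistic(z):
    """Logistic function 1 / (1 + e^-z), mapping any real score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def p_true(weights, features):
    """p(y=true | x) for weights w and features f (f[0] is the intercept, 1)."""
    return logistic(sum(w * f for w, f in zip(weights, features)))


def classify(weights, features):
    """Choose true when p(true) > p(false), i.e. when w . f > 0."""
    return sum(w * f for w, f in zip(weights, features)) > 0


w = [-1.0, 2.0]   # made-up weights, not trained
f = [1.0, 1.5]    # intercept feature plus one real-valued feature
print(p_true(w, f), classify(w, f))
```

Because the logistic function is monotonic and equals 0.5 at zero, thresholding the probability at 0.5 and thresholding the score at 0 always agree.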

Multinomial logistic regression

Multiple classes:

One change: indicator functions f(c,x) instead of real values
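A minimal sketch of the multiclass case, assuming the usual form in which p(c|x) is proportional to the exponential of the weighted sum of indicator features f_i(c, x), normalized over all classes. The three indicator features and their weights are hypothetical and untrained.

```python
import math


def maxent_probs(weights, feature_fns, classes, x):
    """p(c | x) proportional to exp(sum_i w_i * f_i(c, x)), normalized over classes."""
    scores = {c: sum(w * f(c, x) for w, f in zip(weights, feature_fns))
              for c in classes}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


# Hypothetical indicator features: each returns 1 or 0 for a (class, token) pair.
feature_fns = [
    lambda c, x: 1 if c == "PER" and x[0].isupper() else 0,
    lambda c, x: 1 if c == "O" and x.islower() else 0,
    lambda c, x: 1 if c == "ORG" and x.endswith("Airlines") else 0,
]
weights = [1.0, 2.0, 3.0]  # made-up values
probs = maxent_probs(weights, feature_fns, ["PER", "ORG", "O"], "Wagner")
print(probs)
```

Each indicator fires only for one class, so a single weight per feature plays the role of a class-specific weight.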

Estimating the weight

Gradient Iterative Scaling

Features

Summary so far

Naïve Bayes Classifier
Logistic Regression Classifier
• Sometimes called MaxEnt classifiers

How do we apply classification to sequences?

Sequence Labeling as Classification

Classify each token independently but use as input features, information about the surrounding tokens (sliding window).

John saw the saw and decided to take it to the table.

The classifier is applied to each token in turn and outputs:
NNP VBD DT NN CC VBD TO VB PRP IN DT NN

Slide from Ray Mooney
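A sliding-window feature extractor for this setup might look like the sketch below. The feature names, the window size of 2, and the `<PAD>` marker for positions outside the sentence are all assumptions made for illustration; the slides do not specify them.

```python
def window_features(tokens, i, size=2):
    """Features for token i drawn from a window of `size` tokens on each side."""
    feats = {
        "word": tokens[i].lower(),
        "is_capitalized": tokens[i][0].isupper(),
    }
    for offset in range(-size, size + 1):
        if offset == 0:
            continue
        j = i + offset
        inside = 0 <= j < len(tokens)
        # Key looks like "word[-1]" or "word[+2]"
        feats["word[%+d]" % offset] = tokens[j].lower() if inside else "<PAD>"
    return feats


tokens = "John saw the saw and decided to take it to the table .".split()
feats = window_features(tokens, 3)  # the second "saw"
print(feats)
```

The two occurrences of "saw" get different feature dictionaries because their neighbors differ, which is what gives an independent per-token classifier a chance to tag them differently.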

Using Outputs as Inputs

Better input features are usually the categories of the surrounding tokens, but these are not available yet
Can use category of either the preceding or succeeding tokens by going forward or back and using previous output

Slide from Ray Mooney

Forward Classification

John saw the saw and decided to take it to the table.

Tagging left to right, with the tags already assigned to the preceding tokens available as features, the classifier outputs in turn (the frames shown stop at “take”):
NNP VBD DT NN CC VBD TO VB

Slide from Ray Mooney
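Forward classification can be sketched as a greedy left-to-right loop that feeds each predicted tag back in as a feature for the next token. The `toy_classify` function is a hypothetical stand-in for a trained classifier (a lookup table plus one rule), used only so the example runs.

```python
def forward_tag(tokens, classify):
    """Tag left to right, passing the previously predicted tag to the classifier."""
    tags = []
    for token in tokens:
        prev_tag = tags[-1] if tags else "<START>"
        tags.append(classify(token, prev_tag))
    return tags


def toy_classify(token, prev_tag):
    """Hypothetical stand-in for a trained classifier."""
    lexicon = {"John": "NNP", "the": "DT", "and": "CC", "decided": "VBD",
               "to": "TO", "take": "VB"}
    if token == "saw":
        # The previous output disambiguates: after a determiner, "saw" is a noun.
        return "NN" if prev_tag == "DT" else "VBD"
    return lexicon.get(token, "NN")


tokens = "John saw the saw and decided to take".split()
print(forward_tag(tokens, toy_classify))
```

Each decision here is final once made, so an early mistake propagates to later tokens; that limitation is what motivates the sequence models discussed later.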

Backward Classification

Disambiguating “to” in this case would be even easier backward.

John saw the saw and decided to take it to the table.

Tagging right to left, with the tags already assigned to the following tokens available as features, the classifier outputs (shown here in sentence order):
NNP VBD DT VBD CC VBD TO VB PRP IN DT NN

Slide from Ray Mooney

NER as Sequence Labeling

Why classifiers aren’t as good as sequence models

Problems with using Classifiers for Sequence Labeling

It’s not easy to integrate information from hidden labels on both sides
We make a hard decision on each token
• We’d rather choose a global optimum
• The best labeling for the whole sequence
• Keeping each local decision as just a probability, not a hard decision

Probabilistic Sequence Models

Probabilistic sequence models allow integrating uncertainty over multiple, interdependent classifications and collectively determine the most likely global assignment
Two standard models
• Hidden Markov Model (HMM)
• Conditional Random Field (CRF); the Maximum Entropy Markov Model (MEMM) is a simplified version of CRF

HMMs vs. MEMMs

Slide from Jim Martin

HMM (top) and MEMM (bottom)

Viterbi in MEMMs

We condition on the observation AND the previous state:
HMM decoding:
Which is the HMM version of:
MEMM decoding:
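The decoding equations were images and are missing from the extracted text. In their standard textbook form (the notation here is an assumption: $T$ is the tag sequence, $W$ the word sequence, $v_t(j)$ the Viterbi value of state $s_j$ at time $t$, and $o_t$ the observation at time $t$), they read:

```latex
\begin{align*}
\text{HMM:}\quad \hat{T} &= \operatorname*{argmax}_{T} P(T \mid W)
  = \operatorname*{argmax}_{T} P(W \mid T)\, P(T) \\
  &= \operatorname*{argmax}_{T} \prod_{i} P(\text{word}_i \mid \text{tag}_i)\, P(\text{tag}_i \mid \text{tag}_{i-1}) \\
\text{MEMM:}\quad \hat{T} &= \operatorname*{argmax}_{T} P(T \mid W)
  = \operatorname*{argmax}_{T} \prod_{i} P(\text{tag}_i \mid \text{word}_i, \text{tag}_{i-1}) \\[1ex]
\text{HMM Viterbi:}\quad v_t(j) &= \max_{i}\; v_{t-1}(i)\, P(s_j \mid s_i)\, P(o_t \mid s_j) \\
\text{MEMM Viterbi:}\quad v_t(j) &= \max_{i}\; v_{t-1}(i)\, P(s_j \mid s_i, o_t)
\end{align*}
```

The HMM factors the joint probability into a likelihood and a prior, while the MEMM models the posterior of each tag directly, conditioned on both the observation and the previous state.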

Decoding in MEMMs
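A compact Viterbi decoder for an MEMM is sketched below, under the assumption that the local model is available as a function returning P(state | previous state, observation). The `toy_local_prob` model and its numbers are made up; a real system would use a trained MaxEnt classifier and work with log probabilities to avoid underflow on long sequences.

```python
def memm_viterbi(observations, states, local_prob):
    """Most probable state sequence under an MEMM.

    local_prob(state, prev_state, obs) is assumed to return
    P(state | prev_state, obs); prev_state is "<START>" at position 0.
    """
    # v[s] = probability of the best path ending in state s at the current step
    v = {s: local_prob(s, "<START>", observations[0]) for s in states}
    backpointers = []
    for obs in observations[1:]:
        new_v, pointers = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: v[p] * local_prob(s, p, obs))
            new_v[s] = v[best_prev] * local_prob(s, best_prev, obs)
            pointers[s] = best_prev
        v = new_v
        backpointers.append(pointers)
    # Follow the backpointers from the best final state.
    path = [max(states, key=lambda s: v[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return list(reversed(path))


def toy_local_prob(state, prev_state, obs):
    """Hypothetical local model over B_PER, I_PER, O (made-up numbers)."""
    capitalized = obs[0].isupper()
    if prev_state in ("B_PER", "I_PER") and capitalized:
        dist = {"B_PER": 0.1, "I_PER": 0.8, "O": 0.1}
    elif capitalized:
        dist = {"B_PER": 0.7, "I_PER": 0.0, "O": 0.3}
    else:
        dist = {"B_PER": 0.05, "I_PER": 0.0, "O": 0.95}
    return dist[state]


tags = memm_viterbi("spokesman Tim Wagner said".split(),
                    ["B_PER", "I_PER", "O"], toy_local_prob)
print(tags)
```

Unlike the greedy forward classifier, this keeps the best score for every state at every position and commits to a labeling only at the end, so it returns the globally best sequence under the local model.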

Evaluation Metrics

Precision

Precision: how many of the names we returned are really names?
Recall: how many of the names in the database did we find?

$\text{Precision} = \dfrac{\text{Number of correct names given by system}}{\text{Total number of names given by system}}$

$\text{Recall} = \dfrac{\text{Number of correct names given by system}}{\text{Total number of actual names in the text}}$

F-measure

F-measure is a way to combine these:

More generally:
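The two formulas are missing from the extracted slide. In the standard definitions, with $P$ for precision, $R$ for recall, and $\beta$ weighting recall relative to precision, they are:

```latex
\begin{align*}
F_1 &= \frac{2PR}{P + R} \\
F_\beta &= \frac{(\beta^2 + 1)\,PR}{\beta^2 P + R}
\end{align*}
```

Setting $\beta = 1$ in the general form gives $F_1$; values of $\beta$ above 1 favor recall and values below 1 favor precision.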

F-measure

Harmonic mean is the reciprocal of arithmetic mean of reciprocals:

Hence F-measure is:
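As a worked example, the sketch below computes precision, recall, and F-measure over sets of entity spans and confirms that F1 equals the harmonic mean of the two. Representing entities as `(start, end, type)` tuples and counting only exact matches as correct are assumptions made for this example; the gold and predicted sets are made up.

```python
def precision_recall_f(predicted, gold, beta=1.0):
    """Entity-level precision, recall and F-measure.

    predicted, gold: sets of (start, end, type) entity spans.
    """
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (b2 + 1) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


gold = {(0, 2, "ORG"), (11, 13, "PER"), (20, 21, "LOC"), (25, 26, "LOC")}
predicted = {(0, 2, "ORG"), (11, 13, "PER"), (20, 21, "ORG")}
p, r, f1 = precision_recall_f(predicted, gold)
harmonic_mean = 2 / (1 / p + 1 / r)  # reciprocal of the mean of reciprocals
print(p, r, f1, harmonic_mean)
```

Here the system returns three names, two of them correct, out of four actual names, so precision is 2/3, recall is 1/2, and F1 is 4/7.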

Outline

Named Entities and the basic idea
IOB Tagging
A new classifier: Logistic Regression
• Linear regression
• Logistic regression
• Multinomial logistic regression = MaxEnt
Why classifiers aren’t as good as sequence models
A new sequence model:
• MEMM = Maximum Entropy Markov Model