04/20/23 CPSC503 Winter 2009 1
CPSC 503: Computational Linguistics
Lecture 7, Giuseppe Carenini
Knowledge-Formalisms Map

Formalisms:
• State Machines (and prob. versions): Finite State Automata, Finite State Transducers, Markov Models
• Rule systems (and prob. versions): e.g., (Prob.) Context-Free Grammars
• Logical formalisms (First-Order Logics); AI planners

Linguistic levels: Morphology, Syntax, Semantics, Pragmatics (Discourse and Dialogue)

Markov Models:
• Markov Chains -> n-grams
• Hidden Markov Models (HMM)
• MaxEntropy Markov Models (MEMM)
Today 30/9
• Hidden Markov Models:
  – definition
  – the three key problems (only one in detail)
• Part-of-speech tagging:
  – What it is
  – Why we need it
  – How to do it
HMMs (and MEMMs): intro

They are probabilistic sequence classifiers / sequence labelers: they assign a class/label to each unit in a sequence.

Used extensively in NLP:
• Part-of-speech tagging, e.g., Brainpower_NN ,_, not_RB physical_JJ plant_NN ,_, is_VBZ now_RB a_DT firm_NN 's_POS chief_JJ asset_NN ._.
• Partial parsing: [NP The HD box] that [NP you] [VP ordered] [PP from] [NP Shaw] [VP never arrived].
• Named entity recognition: [John Smith PERSON] left [IBM Corp. ORG] last summer.
Hidden Markov Model (State Emission)

[Figure: a sample state-emission HMM with states s1–s4, output symbols a, b, i, start probabilities Start -> s1 = .6 and Start -> s2 = .4, and transition/emission probabilities on the arcs.]
04/20/23 CPSC503 Winter 2007 6
Hidden Markov Model

Formal specification as a five-tuple (S, K, Π, A, B):

• Set of states: S = {s_1, …, s_N}
• Output alphabet: K = {k_1, …, k_M} = {1, …, M}
• Initial state probabilities: Π = {π_i}, i ∈ S
• State transition probabilities: A = {a_ij}, i, j ∈ S, with Σ_{j=1}^{N} a_ij = 1
• Symbol emission probabilities: B = {b_i(o_t)}, i ∈ S, o_t ∈ K, with Σ_{k=1}^{M} b_i(k) = 1
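The five-tuple can be written down directly as plain Python dictionaries. A minimal sketch; the numeric values below are illustrative placeholders, not the exact probabilities from the slide's diagram:

```python
# A small state-emission HMM as a five-tuple (S, K, Pi, A, B).
# All probability values here are made up for illustration.
S = ["s1", "s2"]                      # set of states
K = ["a", "b", "i"]                   # output alphabet
Pi = {"s1": 0.6, "s2": 0.4}           # initial state probabilities pi_i
A = {"s1": {"s1": 0.3, "s2": 0.7},    # state transition probabilities a_ij
     "s2": {"s1": 0.5, "s2": 0.5}}
B = {"s1": {"a": 0.4, "b": 0.5, "i": 0.1},  # emission probabilities b_i(o)
     "s2": {"a": 0.1, "b": 0.5, "i": 0.4}}

# Sanity checks: each distribution must sum to 1.
assert abs(sum(Pi.values()) - 1.0) < 1e-9
for i in S:
    assert abs(sum(A[i].values()) - 1.0) < 1e-9
    assert abs(sum(B[i].values()) - 1.0) < 1e-9
```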
Three fundamental questions for HMMs

1. Decoding: finding the probability of an observation sequence. Given a model μ = (A, B, Π), compute P(O | μ).
   • brute force, or the Forward/Backward algorithms (Manning/Schütze, 2000: 325)
2. Finding the most likely state sequence: X̂ = argmax_X P(X | O, μ)
   • Viterbi algorithm
3. Training: find the model parameters which best explain the observations: argmax_μ P(O_training | μ)
Computing the probability of an observation sequence O = o_1 … o_T

Sum over X, where X ranges over all sequences of T states:

P(O | μ) = Σ_X P(O, X | μ) = Σ_X P(O | X, μ) P(X | μ)

where

P(O | X, μ) = Π_{t=1}^{T} b_{X_t}(o_t)
P(X | μ) = π_{X_1} Π_{t=2}^{T} a_{X_{t-1} X_t}

so that

P(O | μ) = Σ_X π_{X_1} b_{X_1}(o_1) Π_{t=2}^{T} a_{X_{t-1} X_t} b_{X_t}(o_t)

e.g., P(b, i | sample HMM)
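The sum above can be computed literally for a small model. A minimal Python sketch, using an assumed two-state toy HMM (the slide's four-state example probabilities are not fully recoverable here, so these values are made up):

```python
from itertools import product

# Toy state-emission HMM (illustrative values, not the slide's example).
S = ["s1", "s2"]
Pi = {"s1": 0.6, "s2": 0.4}
A = {"s1": {"s1": 0.3, "s2": 0.7},
     "s2": {"s1": 0.5, "s2": 0.5}}
B = {"s1": {"a": 0.4, "b": 0.5, "i": 0.1},
     "s2": {"a": 0.1, "b": 0.5, "i": 0.4}}

def brute_force_likelihood(O):
    """P(O | mu): sum P(O, X | mu) over all N**T state sequences X."""
    total = 0.0
    for X in product(S, repeat=len(O)):
        p = Pi[X[0]] * B[X[0]][O[0]]                # pi_{X1} * b_{X1}(o1)
        for t in range(1, len(O)):
            p *= A[X[t - 1]][X[t]] * B[X[t]][O[t]]  # a_{X_{t-1} X_t} * b_{X_t}(o_t)
        total += p
    return total

print(brute_force_likelihood(["b", "i"]))  # ≈ 0.143
```

Each of the 2² = 4 state sequences contributes one product; for realistic N and T this enumeration is hopeless, which motivates the forward procedure below.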
Decoding Example
Manning/Schütze, 2000: 327
Computing P(b, i): T = 2 observations, N = 4 states. Enumerate all N^T = 16 state sequences X_1 X_2 and sum their probabilities:

s1, s1 = 0 ?
s1, s2 = 1 * .1 * .6 * .3
s1, s4 = 1 * .5 * .6 * .7
s2, s4 = 0 ?
……….

Complexity: brute force is exponential, since the sum ranges over all N^T state sequences.
The forward procedure

Define α_t(i) = P(o_1 o_2 … o_t, X_t = i | μ): the probability of emitting o_1 … o_t and being in state i at time t.

1. Initialization: α_1(i) = π_i b_i(o_1), 1 ≤ i ≤ N
2. Induction: α_{t+1}(j) = [Σ_{i=1}^{N} α_t(i) a_ij] b_j(o_{t+1}), 1 ≤ j ≤ N
3. Total: P(O | μ) = Σ_{i=1}^{N} α_T(i)

Complexity: O(N² T), versus the exponential brute-force sum.
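The three steps translate almost line for line into Python. A sketch over a two-state toy model (assumed values, not the slide's diagram); on the same model and observations, brute-force enumeration gives the identical total, which makes a convenient check:

```python
# Toy state-emission HMM (illustrative values).
S = ["s1", "s2"]
Pi = {"s1": 0.6, "s2": 0.4}
A = {"s1": {"s1": 0.3, "s2": 0.7},
     "s2": {"s1": 0.5, "s2": 0.5}}
B = {"s1": {"a": 0.4, "b": 0.5, "i": 0.1},
     "s2": {"a": 0.1, "b": 0.5, "i": 0.4}}

def forward(O):
    """P(O | mu) via the forward procedure: O(N^2 T) instead of O(N^T)."""
    # 1. Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = {i: Pi[i] * B[i][O[0]] for i in S}
    # 2. Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
    for o in O[1:]:
        alpha = {j: sum(alpha[i] * A[i][j] for i in S) * B[j][o] for j in S}
    # 3. Total: P(O | mu) = sum_i alpha_T(i)
    return sum(alpha.values())

print(forward(["b", "i"]))  # ≈ 0.143
```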
Three fundamental questions for HMMs

1. Decoding: finding the probability of an observation sequence. Given a model μ = (A, B, Π), compute P(O | μ).
   • brute force, or the Forward or Backward algorithm
2. Finding the most likely state sequence: X̂ = argmax_X P(X | O, μ)
   • Viterbi algorithm
3. Training: find the model parameters which best explain the observations: argmax_μ P(O_training | μ)

If interested in the details of the Backward algorithm and the next two questions, read Sections 6.4-6.5.
Today 30/9
• Hidden Markov Models:
  – definition
  – the three key problems (only one in detail)
• Part-of-speech tagging:
  – What it is, why we need it…
  – Word classes (tags)
    • Distribution
    • Tagsets
  – How to do it
    • Rule-based
    • Stochastic
Parts of Speech Tagging: What

Input:
• Brainpower, not physical plant, is now a firm's chief asset.

Output:
• Brainpower_NN ,_, not_RB physical_JJ plant_NN ,_, is_VBZ now_RB a_DT firm_NN 's_POS chief_JJ asset_NN ._.

Tag meanings:
• NNP (proper noun, sing.), RB (adverb), JJ (adjective), NN (noun, sing. or mass), VBZ (verb, 3sg pres), DT (determiner), POS (possessive ending), . (sentence-final punctuation)
Parts of Speech Tagging: Why?

• Part of speech (word class, morphological class, syntactic category) gives a significant amount of information about the word and its neighbors. Useful in the following NLP tasks:
  – As a basis for (partial) parsing
  – Information retrieval
  – Word-sense disambiguation
  – Speech synthesis
Parts of Speech

• Eight basic categories:
  – noun, verb, pronoun, preposition, adjective, adverb, article, conjunction
• These categories are based on:
  – morphological properties (the affixes they take)
  – distributional properties (what other words can occur nearby)
  – e.g., green: "It is so…", "both…", "The… is"
• Not semantics!
Parts of Speech

• Two kinds of category:
  – Closed class (generally function words): prepositions, articles, conjunctions, pronouns, determiners, auxiliaries, numerals. Very short, frequent and important.
  – Open class: nouns (proper/common; mass/count), verbs, adjectives, adverbs. Objects, actions, events, properties.
• If you run across an unknown word…?? (It almost certainly belongs to an open class.)
PoS Distribution

• Parts of speech follow a typical distribution in language. Counting word types by ambiguity:
  – 1 PoS: ~35k words
  – 2 PoS: ~4k words
  – >2 PoS: ~4k words
• Ambiguous types are a minority, but they are unfortunately very frequent in running text…
• …but luckily the different tags associated with a word are not equally likely.
Sets of Parts of Speech: Tagsets

• Most commonly used:
  – 45-tag Penn Treebank
  – 61-tag C5
  – 146-tag C7
• The choice of tagset depends on the application (do you care about distinguishing between "to" as a preposition and "to" as an infinitive marker?)
• Accurate tagging can be done even with large tagsets
PoS Tagging

Input text:
• Brainpower, not physical plant, is now a firm's chief asset. …………

-> Tagger (using a dictionary: word_i -> set of tags from the tagset) ->

Output:
• Brainpower_NN ,_, not_RB physical_JJ plant_NN ,_, is_VBZ now_RB a_DT firm_NN 's_POS chief_JJ asset_NN ._. ……….
Tagger Types

• Rule-based ~ '95
• Stochastic:
  – HMM tagger ~ >= '92
  – Transformation-based tagger (Brill) ~ >= '95
  – MEMM (Maximum Entropy Markov Models) ~ >= '97 (if interested, sec. 6.6-6.8)
Rule-Based (ENGTWOL '95)

1. A lexicon transducer returns, for each word, all possible morphological parses.
2. A set of ~3,000 constraints is applied to rule out inappropriate PoS.

Step 1: sample I/O for "Pavlov had shown that salivation…"

Pavlov  N SG PROPER
had     HAVE V PAST SVO
        HAVE PCP2 SVO
shown   SHOW PCP2 SVOO
……
that    ADV
        PRON DEM SG
        CS
……..

Sample constraint: the adverbial "that" rule.

Given input: "that"
If   (+1 A/ADV/QUANT)   ; the next word is an adjective, adverb or quantifier
     (+2 SENT-LIM)      ; and the word after that is a sentence boundary
     (NOT -1 SVOC/A)    ; and the previous word is not a verb like "consider"
Then eliminate non-ADV tags
Else eliminate ADV tag
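The constraint format above can be mimicked directly. A sketch in Python, assuming a `candidates` function that returns the set of possible tags at a token position (and `SENT-LIM` past the sentence end); the tag names follow the slide, everything else is hypothetical:

```python
def adverbial_that_rule(candidates, pos):
    """ENGTWOL-style constraint for 'that' at position pos (sketch).

    candidates(k) -> set of possible tags at token position k;
    returns {'SENT-LIM'} beyond the sentence boundary.
    """
    if (candidates(pos + 1) & {"A", "ADV", "QUANT"}        # +1 A/ADV/QUANT
            and candidates(pos + 2) == {"SENT-LIM"}        # +2 SENT-LIM
            and not (candidates(pos - 1) & {"SVOC/A"})):   # NOT -1 SVOC/A
        return {"ADV"}                    # eliminate non-ADV tags
    return candidates(pos) - {"ADV"}      # else eliminate the ADV tag
```

For a sentence like "it isn't that odd", the word after "that" is an adjective and the sentence then ends, so only the adverbial reading of "that" survives.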
HMM Stochastic Tagging

• Tags correspond to HMM states.
• Words correspond to the HMM alphabet symbols.

Tagging: given a sequence of words (observations), find the most likely sequence of tags (states). But this is exactly the most-likely-state-sequence problem, solved by Viterbi!

We need the state transition and symbol emission probabilities:
1) From a hand-tagged corpus
2) No tagged corpus: parameter estimation (Forward/Backward, aka Baum-Welch)
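Since tagging reduces to the most-likely-state-sequence problem, a Viterbi tagger fits in a few lines. A minimal sketch; the tiny hand-set probabilities below stand in for relative frequencies estimated from a hand-tagged corpus:

```python
def viterbi_tag(words, tags, Pi, A, B):
    """argmax_X P(X | O, mu): most likely tag sequence for the words."""
    # Each trellis column maps tag -> (best path probability, backpointer).
    trellis = [{t: (Pi[t] * B[t].get(words[0], 0.0), None) for t in tags}]
    for w in words[1:]:
        prev = trellis[-1]
        col = {}
        for t in tags:
            s = max(tags, key=lambda s: prev[s][0] * A[s][t])  # best predecessor
            col[t] = (prev[s][0] * A[s][t] * B[t].get(w, 0.0), s)
        trellis.append(col)
    # Backtrace from the most probable final tag.
    best = max(tags, key=lambda t: trellis[-1][t][0])
    path = [best]
    for col in reversed(trellis[1:]):
        best = col[best][1]
        path.append(best)
    return path[::-1]

# Hand-set toy parameters (in practice: counts from a tagged corpus).
tags = ["DT", "NN", "VB"]
Pi = {"DT": 0.8, "NN": 0.1, "VB": 0.1}
A = {"DT": {"DT": 0.0, "NN": 0.9, "VB": 0.1},
     "NN": {"DT": 0.1, "NN": 0.3, "VB": 0.6},
     "VB": {"DT": 0.4, "NN": 0.3, "VB": 0.3}}
B = {"DT": {"the": 1.0},
     "NN": {"dog": 0.6, "runs": 0.4},
     "VB": {"runs": 0.9, "dog": 0.1}}

print(viterbi_tag(["the", "dog", "runs"], tags, Pi, A, B))  # ['DT', 'NN', 'VB']
```

The dynamic program keeps only the best path into each state at each step, so the cost is O(N²T) rather than enumerating all tag sequences.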
Evaluating Taggers

• Accuracy: percent correct (most current taggers: 96-97%). Test on unseen data!
• Human ceiling: agreement rate of humans on the classification task (96-97%)
• Unigram baseline: assign each token the class it occurred in most frequently in the training set (e.g., race -> NN). (~91%)
• What is causing the errors? Build a confusion matrix…
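The unigram baseline is a few lines over any tagged corpus. A sketch with a made-up toy training set (list of word/tag pairs):

```python
from collections import Counter, defaultdict

def unigram_baseline(tagged_corpus):
    """Map each word to its most frequent training tag (the ~91% baseline)."""
    counts = defaultdict(Counter)
    for word, tag in tagged_corpus:
        counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

# Tiny made-up training data: 'race' appears twice as NN, once as VB.
corpus = [("the", "DT"), ("race", "NN"), ("ended", "VBD"),
          ("a", "DT"), ("race", "NN"), ("to", "TO"), ("race", "VB")]
tagger = unigram_baseline(corpus)
print(tagger["race"])  # NN
```

The baseline ignores context entirely, which is exactly why the HMM tagger's use of transition probabilities buys the remaining accuracy.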
Confusion Matrix

• Look at a confusion matrix
• Precision?
• Recall?
Error Analysis (textbook)

• Look at a confusion matrix
• See what errors are causing problems:
  – Noun (NN) vs. proper noun (NNP) vs. adjective (JJ)
  – Preterite (VBD) vs. participle (VBN) vs. adjective (JJ)
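Building the matrix from gold and predicted tag sequences is direct, and per-tag precision and recall then answer the two questions above. A sketch with made-up tag sequences:

```python
from collections import Counter

def confusion_matrix(gold, pred):
    """Count (gold tag, predicted tag) pairs."""
    return Counter(zip(gold, pred))

def precision_recall(cm, tag):
    """Precision and recall for one tag, read off the confusion matrix."""
    tp = cm[(tag, tag)]
    predicted = sum(n for (g, p), n in cm.items() if p == tag)
    actual = sum(n for (g, p), n in cm.items() if g == tag)
    return (tp / predicted if predicted else 0.0,
            tp / actual if actual else 0.0)

# Made-up example: one JJ mis-tagged as NN, one VBD mis-tagged as VBN.
gold = ["NN", "VBD", "JJ", "NN", "DT"]
pred = ["NN", "VBN", "NN", "NN", "DT"]
cm = confusion_matrix(gold, pred)
print(precision_recall(cm, "NN"))  # precision 2/3, recall 1.0
```

The JJ -> NN confusion hurts NN precision but not NN recall, which is exactly the kind of asymmetry the matrix makes visible.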
Knowledge-Formalisms Map (next three lectures)

Formalisms:
• State Machines (and prob. versions): Finite State Automata, Finite State Transducers, Markov Models
• Rule systems (and prob. versions): e.g., (Prob.) Context-Free Grammars
• Logical formalisms (First-Order Logics); AI planners

Linguistic levels: Morphology, Syntax, Semantics, Pragmatics (Discourse and Dialogue)
Next Time

• Read Chapter 12 (Syntax and Context-Free Grammars)