Speech, NLP and the Web
Pushpak Bhattacharyya
CSE Dept.,
IIT Bombay
Lecture 1-4: Introduction, POS
21 July, 2014
Basic information
Slot 4: Mon 11.30, Tue 8.30, Thu 9.30 AM
Venue: F.C. Kohli auditorium
TA team: Aditya, Geetanjali, Sandeep, Sagar, Naman
[email protected] [email protected]
Course notes: http://www.cse.iitb.ac.in/~pb/cs626-2014
No midsem, end sem; assignments and paper reading for new entrants; projects for others
NLP - a foundation: Noisy Channel Model
Sequence W is transformed into sequence T: W → T
T* = argmax_T P(T|W) = argmax_T P(T) · P(W|T)
W* = argmax_W P(W|T) = argmax_W P(W) · P(T|W)
5 representative problems using noisy channel modeling
- Statistical Spell Checking
- Automatic Speech Recognition
- Part of Speech Tagging: discussed in detail in subsequent classes
- Probabilistic Parsing
- Statistical Machine Translation
Some general observations
A* = argmax_A [P(A|B)]
   = argmax_A [P(A) · P(B|A)]
Computing and using P(A) and P(B|A) both need:
(i) looking at the internal structures of A and B
(ii) making independence assumptions
(iii) putting together a computation from smaller parts
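A minimal sketch of this decomposition in Python, with a spell-checking flavour (the candidate words, priors and channel probabilities below are invented toy values, not from the lecture):

```python
# Noisy channel decoding: choose the A that maximizes P(A) * P(B|A)
# for an observed B. All numbers are illustrative toy values.

priors = {"their": 0.6, "there": 0.4}        # P(A): language model
channel = {                                   # P(B|A): channel/error model
    ("thier", "their"): 0.7,
    ("thier", "there"): 0.1,
}

def decode(observed, candidates):
    # A* = argmax_A P(A) * P(B|A)
    return max(candidates,
               key=lambda a: priors[a] * channel.get((observed, a), 0.0))

print(decode("thier", ["their", "there"]))    # -> 'their'
```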
Corpus
A collection of text, called a corpus, is used for collecting various language data
With annotation: more information, but manual-labor intensive
Practice: label automatically; correct manually
The famous Brown Corpus contains 1 million tagged words.
Switchboard: very famous corpus; 2400 conversations, 543 speakers, many US dialects, annotated with orthography and phonetics
What is NLP
Branch of AI
2 Goals:
- Science Goal: understand the way language operates
- Engineering Goal: build systems that analyse and generate language; reduce the man-machine gap
Perspectivising NLP: areas of AI and their inter-dependencies
[Figure: areas of AI and their inter-dependencies - Search, Vision, Planning, Machine Learning, Knowledge Representation, Logic, Expert Systems, Robotics, NLP]
NLP: Two pictures
[Picture 1: NLP alongside Vision and Speech, approached through Statistics and Probability + Knowledge Based methods]
[Picture 2: the NLP Trinity - three axes: Problem (Morph Analysis, Part of Speech Tagging, Parsing, Semantics), Algorithm (HMM, MEMM, CRF), Language (Hindi, Marathi, English, French)]
NLP Architecture
Morphology → POS tagging → Chunking → Parsing → Semantics Extraction → Discourse and Coreference
(increased complexity of processing at each stage)
A famous sentence (1/2)
Buffalo buffaloes Buffalo buffaloes buffalo buffalo Buffalo buffaloes Buffalo buffaloes buffalo
A famous sentence (2/2)
Buffalo buffaloes Buffalo buffaloes buffalo buffalo Buffalo buffaloes Buffalo buffaloes buffalo
Buffalo:
Animal
City
bully
NLP: multilayered, multidimensional
[Left: the processing stack - Morphology, POS tagging, Chunking, Parsing, Semantics, Discourse and Coreference - with increasing complexity of processing]
[Right: the NLP Trinity - Problem × Algorithm × Language, as in the earlier picture]
Multilinguality: Indian situation
Major streams:
- Indo-European
- Dravidian
- Sino-Tibetan
- Austro-Asiatic
Some languages are ranked within the top 20 in the world in terms of the populations speaking them:
- Hindi and Urdu: 5th (~500 million)
- Bangla: 7th (~300 million)
- Marathi: 14th (~70 million)
NLP architecture and stages of processing: ambiguity at every stage
- Phonetics and phonology
- Morphology
- Lexical Analysis
- Syntactic Analysis
- Semantic Analysis
- Pragmatics
- Discourse
Phonetics: processing of speech sound and associated challenges
- Homophones: bank (finance) vs. bank (river bank)
- Near-homophones: maatraa vs. maatra (Hindi)
- Word boundary:
  - aajaayenge: aa jaayenge (will come) or aaj aayenge (will come today)
  - I got [ua] plate
- His research is in human languages
- Disfluency: ah, um, ahem etc.
- (near-homophone trouble) The king of Abu Dhabi expired and there was national mourning for 7 days. Some children were playing in the evening when a person chided them, "Do not play; it is mourning time". The children said, "No, it is evening time and we will play".
Morphology
Word formation rules from root words
Nouns: Plural (boy-boys); Gender marking (czar-czarina)
Verbs: Tense (stretch-stretched); Aspect (e.g. perfective sit-had sat); Modality (e.g. request khaanaa khaaiie)
Crucial first step in NLP
Languages rich in morphology: e.g., Dravidian, Hungarian, Turkish
Languages poor in morphology: Chinese, English
Languages with rich morphology have the advantage of easier processing at higher stages of processing
A task of interest to computer science: Finite State Machines for Word Morphology
Lexical Analysis
Dictionary and word properties
dog
noun (lexical property)
takes -s in plural (morph property)
animate (semantic property)
4-legged (semantic property)
carnivore (semantic property)
Lexical Disambiguation
Part of Speech disambiguation:
- Dog as a noun (animal)
- Dog as a verb (to pursue)
Sense disambiguation:
- Dog (as animal)
- Dog (as a very detestable person)
- The chair emphasised the need for adult education
Very common in day-to-day communication:
- Satellite channel ad: Watch what you want, when you want (two senses of watch)
Ground breaking ceremony/research
(ToI: 14/1/14) India eradicates polio, says WHO
Technological developments bring in new terms, additional meanings/nuances for existing terms
- Justify, as in 'justify the right margin' (word processing context)
- Xeroxed: a new verb
- Digital Trace: a new expression
- Communifaking: pretending to talk on mobile when you are actually not
- Discomgooglation: anxiety/discomfort at not being able to access the internet
- Helicopter Parenting: over-parenting
- Obamagain, Obamacare, Modinomics
Ambiguity of Multiwords
The grandfather kicked the bucket after suffering from cancer.
This job is a piece of cake
Put the sweater on
He is the dark horse of the match
Google translations of the above sentences render these multiwords literally.
Ambiguity of Named Entities
- Bengali sentence, English gloss: Government is restless at home. (*) Intended: Chanchal Sarkar is at home ('chanchal' = restless, 'sarkar' = government; here a person's name)
- Amsterdam airport: Baby Changing Room
- Hindi newspaper name, English gloss: everyday bold world; actually the name of a Hindi newspaper in Indore
- High degree of overlap between NEs and MWEs
- Treat differently: transliterate, do not translate
Structure
[Parse tree for 'I like mangoes': S → NP VP; NP → I; VP → V NP; V → like; NP → mangoes]
Structural Ambiguity
- Scope:
  1. The old men and women were taken to safe locations: (old (men and women)) vs. ((old men) and women)
  2. No smoking areas will allow Hookas inside
- Preposition Phrase Attachment:
  - I saw the boy with a telescope (who has the telescope?)
  - I saw the mountain with a telescope (world knowledge: a mountain cannot be an instrument of seeing)
- Very ubiquitous; newspaper headline: 20 years later, BMC pays father 20 lakhs for causing son's death
Garden pathing
- The only minus possibly was the need to face the audience more and more insightful question answer
- The old man the boat
- The horse raced past the garden fell
Semantic Analysis
Representation in terms of predicate calculus / semantic nets / frames / conceptual dependencies and scripts
John gave a book to Mary:
Give action: Agent: John, Object: Book, Recipient: Mary
Challenge: ambiguity in semantic role labeling
- (Eng) Visiting aunts can be a nuisance
- (Hin) aapko mujhe mithaai khilaanii padegii (roughly: 'you will have to treat me to sweets' or 'I will have to treat you to sweets'; ambiguous in Marathi and Bengali too, not in Dravidian languages)
Coreference: challenge
Binding of co-referring nouns and pronouns:
- The monkey ate the banana, because it was hungry
- The monkey ate the banana, because it was ripe and sweet
- The monkey ate the banana, because it was lunch time
Pragmatics
Very hard problem: model user intention
- Tourist (in a hurry, checking out of the hotel, motioning to the service boy): Boy, go upstairs and see if my sandals are under the divan. Do not be late. I just have 15 minutes to catch the train.
- Boy (running upstairs and coming back panting): yes sir, they are there.
World knowledge:
- WHY INDIA NEEDS A SECOND OCTOBER (ToI, 2/10/07)
Discourse
Processing of a sequence of sentences. Mother to John:
John, go to school. It is open today. Should you bunk? Father will be very angry.
- Ambiguity of 'open'
- bunk what?
- Why will the father be angry?
Complex chain of reasoning and application of world knowledge.
Ambiguity of 'father': father as parent or father as headmaster
Complexity of Connected Text
John was returning from school dejected; today was the math test
He couldn't control the class
Teacher shouldn't have made him responsible
After all, he is just a janitor
Textual Humour (1/2)
1. Teacher (angrily): did you miss the class yesterday?
   Student: not much
2. A man coming back to his parked car sees the sticker "Parking fine". He goes and thanks the policeman for appreciating his parking skill.
3. John: I got a Jaguar car for my unemployed youngest son.
   Jack: That's a great exchange!
37/131
Textual Humour (2/2)
A teacher-student exchange:
Teacher: What do you think is the capital of Ethiopia?
Student: What do you think?
Teacher (angrily): I do not think, I know!
Student: I do not think I know.
Example of Application of Noisy Channel Model: Probabilistic Speech Recognition (Isolated Word) [8]
Problem definition: given a sequence of speech signals, identify the words.
2 steps:
- Segmentation (Word Boundary Detection)
- Identify the word
Isolated word recognition: identify W given SS (speech signal):
W^ = argmax_W P(W | SS)
Identifying the word
- P(SS|W) = likelihood, called the phonological model: intuitively more tractable!
- P(W) = prior probability, called the language model
W^ = argmax_W P(W | SS)
   = argmax_W P(W) · P(SS | W)
P(W) is estimated as:
P(W) ≈ #(times W appears in the corpus) / #(words in the corpus)
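A small sketch of this estimate in Python (the corpus and the acoustic likelihoods P(SS|W) are invented toy values, purely for illustration):

```python
from collections import Counter

corpus = "the cat sat on the mat the cat ran".split()
counts = Counter(corpus)
total = sum(counts.values())

def p_w(word):
    # P(W) ~ #(W appears in the corpus) / #(words in the corpus)
    return counts[word] / total

# Invented likelihoods P(SS|W) for one observed speech signal SS.
p_ss_given_w = {"cat": 0.4, "mat": 0.3, "ran": 0.1}

# W* = argmax_W P(W) * P(SS|W)
w_star = max(p_ss_given_w, key=lambda w: p_w(w) * p_ss_given_w[w])
print(w_star)   # -> 'cat'
```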
Ambiguities in the context of P(SS|W) or P(W|SS)
Concerns:
- Sound → Text ambiguity
  - whether v/s weather
  - right v/s write
  - bought v/s bot
- Text → Sound ambiguity
  - read (present tense) v/s read (past tense)
  - lead (verb) v/s lead (noun)
Primitives
- Phonemes (sound)
- Syllables
- ASCII bytes (machine representation)
Phonemes
Standardized by the IPA (International Phonetic Alphabet) convention:
- /t/: sound of t in tag
- /d/: sound of d in dog
- /D/: sound of th in the
Syllables
Advise (verb) vs. advice (noun): ad-vise, ad-vice
A syllable consists of:
1. Nucleus
2. Onset
3. Coda
Pronunciation Dictionary
P(SS|W) = P(t o m ae t o | word is 'tomato') = product of arc probabilities
[Figure: pronunciation automaton for the word 'tomato' - states s1…s7, arcs t → o → m → {ae (0.73) | aa (0.27)} → t → o → end; all other arc probabilities 1.0]
Foundational question
Generative vs. Discriminative
How are two entities matched?
Entity A and Entity B: Match(A, B):
- Two entities match iff their parts match: Match(Parts(A), Parts(B))
- Two entities match iff their properties match: Match(Properties(A), Properties(B))
This is the heart of discriminative vs. generative scoring.
Books, Journals, Proceedings
Main Text(s):
- Natural Language Understanding: James Allen
- Speech and NLP: Jurafsky and Martin
- Foundations of Statistical NLP: Manning and Schutze
Other References:
- Statistical NLP: Charniak
Journals: Computational Linguistics, Natural Language Engineering, AI, AI Magazine, IEEE SMC
Conferences: ACL, EACL, COLING, MT Summit, EMNLP, IJCNLP, HLT, ICON, SIGIR, WWW, ICML, ECML
Allied Disciplines
- Philosophy: semantics, meaning of meaning, logic (syllogism)
- Linguistics: study of syntax, lexicon, lexical semantics etc.
- Probability and Statistics: corpus linguistics, testing of hypotheses, system evaluation
- Cognitive Science: computational models of language processing, language acquisition
- Psychology: behavioristic insights into language processing, psychological models
- Brain Science: language processing areas in the brain
- Physics: information theory, entropy, random fields
- Computer Sc. & Engg.: systems for NLP
Day wise schedule (1/4)
- Day-1: Introduction: NLP as playground for rule based and statistical techniques
  - Before break: complete NLP architecture, ambiguity, start of POS tagging
  - After break: NLTK (open source Python based framework of comprehensive NLP tools), POS tagging assignment
- Day-2: Shallow parsing
  - Before break: morph analysis and synthesis (segmentation, inflection, declension, derivation etc.), rule-based vs. statistical NLU comparison with POS tagging as case study, Hidden Markov Model and Viterbi algorithm
  - After break: POS tagging assignment continued
Day wise schedule (2/4)
- Day-3: Syntactic parsing
  - Before break: parsing, classical and statistical, theory and techniques
  - After break: hands-on with probabilistic parser
- Day-4: Semantics
  - Before break: rule-based NLU: case study of semantic graph generation through Universal Networking Language (UNL)
  - After break: continue POS tagging and parsing assignments
Day wise schedule (3/4)
- Day-5: Lexical resources
  - Before break: Wordnet, ConceptNet, FrameNet, VerbNet etc.
  - After break: hands-on with lexical resources, NELL, NEIL
- Day-6: Information extraction, text classification and basic search
  - Before break: Named Entity Recognition, text entailment, Lucene, Nutch etc.
  - After break: NER hands-on, basic search, Open IE system
Day wise schedule (4/4)
- Day-7: Affective NLP (cognitive and culture specific NLP)
  - Before break: sentiment analysis, pragmatics, intent recognition (sarcasm, thwarting), eye-tracking
  - After break: machine learning techniques with sentiment analysis as target
- Day-8: Deep Learning
  - Before break: word vectors and embedding, neural nets, neural language models
  - After break: discussion on deep learning tool
- Day-9 and 10: Projects and quiz
Summary
- Both Linguistics and Computation are needed: Linguistics is the eye, Computation the body
- The pipeline Phenomenon → Formalization → Technique → Experimentation → Evaluation → Hypothesis Testing has accorded to NLP the prestige it commands today
- A natural-science-like approach
- Neither theory building nor data-driven pattern finding can be ignored
Part of Speech Tagging
With Hidden Markov Model
NLP Trinity
[Figure: the NLP Trinity - Problem (Morph Analysis, Part of Speech Tagging, Parsing, Semantics) × Algorithm (HMM, MEMM, CRF) × Language (Hindi, Marathi, English, French)]
Part of Speech Tagging
- POS tagging: attaches to each word in a sentence a part of speech tag from a given set of tags called the Tag-Set
- Standard Tag-set: Penn Treebank (for English)
Example
``_`` The_DT mechanisms_NNS that_WDT make_VBP traditional_JJ hardware_NN are_VBP really_RB being_VBG obsoleted_VBN by_IN microprocessor-based_JJ machines_NNS ,_, ''_'' said_VBD Mr._NNP Benton_NNP ._.
Where does POS tagging fit in
Morphology → POS tagging → Chunking → Parsing → Semantics Extraction → Discourse and Coreference
(increased complexity of processing)
Example to illustrate the complexity of POS tagging
POS tagging is disambiguation
That_F former_J Sri_Lanka_N skipper_N and_F ace_J batsman_N Aravinda_De_Silva_N is_F a_F man_N of_F few_J words_N was_F very_R much_R evident_J on_F Wednesday_N when_F the_F legendary_J batsman_N ,_F who_F has_V always_R let_V his_N bat_N talk_V ,_F struggled_V to_F answer_V a_F barrage_N of_F questions_N at_F a_F function_N to_F promote_V the_F cricket_N league_N in_F the_F city_N ._F
Tags: N (noun), V (verb), J (adjective), R (adverb) and F (other, i.e., function words).
POS disambiguation
- That_F/N/J (that can be complementizer (can be put under F), demonstrative (can be put under J) or pronoun (can be put under N))
- former_J
- Sri_N/J Lanka_N/J (Sri Lanka together qualify the skipper)
- skipper_N/V (skipper can be a verb too)
- and_F
- ace_J/N (ace can be both J and N; Nadal served an ace)
- batsman_N/J (batsman can be J as it qualifies Aravinda De Silva)
- Aravinda_N De_N Silva_N is_F a_F
- man_N/V (man can be a verb too, as in 'man the boat')
- of_F few_J
- words_N/V (words can be a verb too, as in 'he words his speeches beautifully')
Behaviour of 'That'
- That man is known by the company he keeps. (Demonstrative)
- Man that is known by the company he keeps, gets a good job. (Pronoun)
- That man is known by the company he keeps, is a proverb. (Complementizer)
- Chaotic systems: systems where a small perturbation in input causes a large change in output
POS disambiguation (contd.)
- was_F very_R much_R evident_J on_F Wednesday_N
- when_F/N (when can be a relative pronoun (put under N) as in 'I know the time when he comes')
- the_F legendary_J batsman_N
- who_F/N
- has_V always_R let_V his_N
- bat_N/V
- talk_V/N
- struggled_V/N
- answer_V/N
- barrage_N/V
- questions_N/V
- function_N/V
- promote_V cricket_N league_N city_N
Mathematics of POS tagging
Argmax computation (1/2)
Best tag sequence = T* = argmax P(T|W) = argmax P(T) · P(W|T)   (by Bayes theorem)
P(T) = P(t0=^ t1 t2 … tn+1=.)
     = P(t0) · P(t1|t0) · P(t2|t1,t0) · P(t3|t2,t1,t0) · … · P(tn|tn-1…t0) · P(tn+1|tn…t0)
     = P(t0) · P(t1|t0) · P(t2|t1) · … · P(tn|tn-1) · P(tn+1|tn)
     = ∏_{i=0}^{n+1} P(ti|ti-1)   (bigram assumption)
Argmax computation (2/2)
P(W|T) = P(w0|t0…tn+1) · P(w1|w0,t0…tn+1) · P(w2|w1,w0,t0…tn+1) · … · P(wn|w0…wn-1,t0…tn+1) · P(wn+1|w0…wn,t0…tn+1)
Assumption: a word is determined completely by its tag. This is inspired by speech recognition.
P(W|T) = P(w0|t0) · P(w1|t1) · … · P(wn+1|tn+1)
       = ∏_{i=0}^{n+1} P(wi|ti)
       = ∏_{i=1}^{n+1} P(wi|ti)   (lexical probability assumption; the i=0 term P(^|^) is 1)
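A minimal sketch of this scoring in Python (the transition and lexical probabilities are invented toy values; '^' and '.' are the start and end tags):

```python
# Score P(T) * P(W|T) for one tagged sentence under the bigram and
# lexical probability assumptions.

trans = {("^", "N"): 0.6, ("N", "V"): 0.4, ("V", "."): 0.3}   # P(t_i | t_i-1)
lex   = {("people", "N"): 1e-3, ("laugh", "V"): 1e-3}          # P(w_i | t_i)

def score(words, tags):
    # tags includes the boundary tags ^ and . around the real tags
    p = 1.0
    for prev, cur in zip(tags, tags[1:]):
        p *= trans.get((prev, cur), 0.0)          # bigram assumption
    for w, t in zip(words, tags[1:-1]):
        p *= lex.get((w, t), 0.0)                 # lexical assumption
    return p

print(score(["people", "laugh"], ["^", "N", "V", "."]))   # 7.2e-08
```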
Generative Model
^_^ People_N Jump_V High_R ._.
[Lattice: ^ followed by the candidate tags of each word (e.g., People: N or V), with bigram probabilities between adjacent tags and lexical probabilities linking each tag to its word]
This model is called a generative model. Here words are observed from tags as states. This is similar to HMM.
Typical POS tag steps
- Implementation of Viterbi: unigram, bigram
- Five-fold evaluation
- Per-POS accuracy
- Confusion matrix
Per POS Accuracy for the Bigram Assumption
[Bar chart: per-tag accuracy on a 0 to 1.2 scale, one bar per tag: AJ0, AJ0-NN1, AJ0-VVG, AJC, AT0, AV0-AJ0, AVP-PRP, AVQ-CJS, CJS, CJS-PRP, CJT-DT0, CRD-PNI, DT0, DTQ, ITJ, NN1, NN1-NP0, NN1-VVG, NN2-VVZ, NP0-NN1, PNI, PNP, PNX, PRP, PRP-CJS, TO0, VBB, VBG, VBN, VDB, VDG, VDN, VHB, VHG, VHN, VM0, VVB-NN1, VVD-AJ0, VVG, VVG-NN1, VVN, VVN-VVD, VVZ-NN2]
Screen shot of typical Confusion Matrix

          AJ0  AJ0-AV0  AJ0-NN1  AJ0-VVD  AJ0-VVG  AJ0-VVN  AJC  AJS   AT0   AV0  AV0-AJ0  AVP
AJ0      2899    20       32        1        3        3      0    0     18    35     27      1
AJ0-AV0    31    18        2        0        0        0      0    0      0     1     15      0
AJ0-NN1   161     0      116        0        0        0      0    0      0     0      1      0
AJ0-VVD     7     0        0        0        0        0      0    0      0     0      0      0
AJ0-VVG     8     0        0        0        2        0      0    0      1     0      0      0
AJ0-VVN     8     0        0        3        0        2      0    0      1     0      0      0
AJC         2     0        0        0        0        0     69    0      0    11      0      0
AJS         6     0        0        0        0        0      0   38      0     2      0      0
AT0       192     0        0        0        0        0      0    0   7000    13      0      0
AV0       120     8        2        0        0        0     15    2     24  2444     29     11
AV0-AJ0    10     7        0        0        0        0      0    0      0    16     33      0
AVP        24     0        0        0        0        0      0    0      1    11      0    737
HMM
[Figure: the NLP Trinity again, with HMM highlighted on the Algorithm axis against the Problem and Language axes]
A Motivating Example: colored ball choosing
- Urn 1: # of Red = 30, # of Green = 50, # of Blue = 20
- Urn 2: # of Red = 10, # of Green = 40, # of Blue = 50
- Urn 3: # of Red = 60, # of Green = 10, # of Blue = 30
Example (contd.)
Given the transition probability table:
       U1    U2    U3
U1    0.1   0.4   0.5
U2    0.6   0.2   0.2
U3    0.3   0.4   0.3
and the emission probability table:
       R     G     B
U1    0.3   0.5   0.2
U2    0.1   0.4   0.5
U3    0.6   0.1   0.3
Observation: RRGGBRGR
State sequence: ?? Not so easily computable.
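As a concrete sketch, the two tables can be held as Python dictionaries and any candidate state sequence scored against the observation (the uniform initial distribution is an assumption; the slide does not give one):

```python
A = {"U1": {"U1": 0.1, "U2": 0.4, "U3": 0.5},   # transition P(next | current)
     "U2": {"U1": 0.6, "U2": 0.2, "U3": 0.2},
     "U3": {"U1": 0.3, "U2": 0.4, "U3": 0.3}}
B = {"U1": {"R": 0.3, "G": 0.5, "B": 0.2},      # emission P(color | urn)
     "U2": {"R": 0.1, "G": 0.4, "B": 0.5},
     "U3": {"R": 0.6, "G": 0.1, "B": 0.3}}
pi = {"U1": 1/3, "U2": 1/3, "U3": 1/3}          # assumed uniform start

def joint(states, obs):
    # P(S, O) = pi(s1)*B(s1,o1) * prod_i A(s_i-1, s_i) * B(s_i, o_i)
    p = pi[states[0]] * B[states[0]][obs[0]]
    for prev, cur, o in zip(states, states[1:], obs[1:]):
        p *= A[prev][cur] * B[cur][o]
    return p

print(joint(["U3", "U1", "U2", "U2", "U2", "U3", "U1", "U3"], list("RRGGBRGR")))
```

Finding the best state sequence means maximizing this quantity over all 3^8 candidates, which is what the Viterbi algorithm later does efficiently.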
Diagrammatic representation (1/2)
[State diagram: states U1, U2, U3 with the transition probabilities above on the arcs (e.g., U1→U2: 0.4, U2→U1: 0.6, U3→U3: 0.3), and emission probabilities attached to each state (e.g., U1: R 0.3, G 0.5, B 0.2)]
Diagrammatic representation (2/2)
[Same diagram with emission and transition probabilities multiplied onto each arc; e.g., the U1→U1 arc carries R: 0.1×0.3 = 0.03, G: 0.1×0.5 = 0.05, B: 0.1×0.2 = 0.02]
Classic problems with respect to HMM
1. Given the observation sequence, find the possible state sequences: Viterbi algorithm
2. Given the observation sequence, find its probability: forward/backward algorithm
3. Given the observation sequence, find the HMM parameters: Baum-Welch algorithm
Illustration of Viterbi
- The start and end are important in a sequence.
- Subtrees get eliminated due to the Markov assumption.
POS Tagset: N (noun), V (verb), O (other) [simplified]; ^ (start) and . (end) [start & end states]
Illustration of Viterbi (contd.)
Lexicon:
  people: N, V
  laugh: N, V
  …
Corpora for training:
  ^ w11_t11 w12_t12 w13_t13 … w1k_1_t1k_1 .
  ^ w21_t21 w22_t22 w23_t23 … w2k_2_t2k_2 .
  …
  ^ wn1_tn1 wn2_tn2 wn3_tn3 … wnk_n_tnk_n .
Inference
[Partial sequence graph: ^ → {N, V} → {N, V} → .]
Transition probability table:
       ^     N     V     O     .
^      0    0.6   0.2   0.2    0
N      0    0.1   0.4   0.3   0.2
V      0    0.3   0.1   0.3   0.3
O      0    0.3   0.2   0.3   0.2
.      1     0     0     0     0
This transition table will change from language to language due to language divergences.
Lexical Probability Table
Size of this table = (# POS tags in tagset) × (vocabulary size), where vocabulary size = # unique words in the corpus.
       ^     people    laugh    ...
^      1       0         0      ... 0
N      0     1×10^-3   1×10^-5  ...
V      0     1×10^-6   1×10^-3  ...
O      0       0       1×10^-9  ...
.      1       0         0      0
Inference (contd.)
New sentence: ^ people laugh .
p(^ N N . | ^ people laugh .)
= (0.6 × 1×10^-3) × (0.1 × 1×10^-5) × (0.2 × 1)
where each factor is (transition probability × lexical probability), ending with the transition into '.'.
The other candidate sequences (^ N V ., ^ V N ., …) are scored the same way.
Computational Complexity
If we have to get the probability of each sequence and then find the maximum among them, we would run into an exponential number of computations. If |S| = #states (tags + ^ + .) and |O| = length of sentence (words + ^ + .), then #sequences = |S|^(|O|-2).
But a large number of partial computations can be reused using Dynamic Programming.
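A brute-force sketch in Python of exactly this exponential enumeration, using the transition and lexical tables from the slides above (the Viterbi algorithm below replaces the enumeration with dynamic programming):

```python
from itertools import product

tags = ["N", "V", "O"]
trans = {("^","N"):0.6, ("^","V"):0.2, ("^","O"):0.2,
         ("N","N"):0.1, ("N","V"):0.4, ("N","O"):0.3, ("N","."):0.2,
         ("V","N"):0.3, ("V","V"):0.1, ("V","O"):0.3, ("V","."):0.3,
         ("O","N"):0.3, ("O","V"):0.2, ("O","O"):0.3, ("O","."):0.2}
lex = {("people","N"):1e-3, ("people","V"):1e-6,
       ("laugh","N"):1e-5, ("laugh","V"):1e-3}

def seq_prob(words, tagging):
    path = ["^"] + list(tagging) + ["."]
    p = 1.0
    for a, b in zip(path, path[1:]):
        p *= trans.get((a, b), 0.0)     # transition probabilities
    for w, t in zip(words, tagging):
        p *= lex.get((w, t), 0.0)       # lexical probabilities
    return p

words = ["people", "laugh"]
# |tags|^len(words) candidate sequences: exponential in sentence length.
best = max(product(tags, repeat=len(words)), key=lambda tg: seq_prob(words, tg))
print(best, seq_prob(words, best))      # ('N', 'V') 7.2e-08
```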
Dynamic Programming
[Tree over '^ people laugh .': from ^, the arcs into N, V, O carry 0.6 × 1.0 = 0.6, 0.2 and 0.2. Expanding the N node on 'people' (lexical probability 10^-3):
 N→N: 0.6 × 0.1 × 10^-3 = 6×10^-5
 N→V: 0.6 × 0.4 × 10^-3 = 2.4×10^-4
 N→O: 0.6 × 0.3 × 10^-3 = 1.8×10^-4
 N→.: 0.6 × 0.2 × 10^-3 = 1.2×10^-4]
No need to expand N4 and N5 because they will never be a part of the winning sequence.
Computational Complexity (contd.)
- Retain only those N / V / O nodes which end in the highest sequence probability.
- Now, complexity reduces from |S|^|O| to |S|·|O|.
- Here, we followed the Markov assumption of order 1.
Points to ponder wrt HMM andViterbi
Viterbi Algorithm
- Start with the start state.
- Keep advancing sequences that are maximum amongst all those ending in the same state.
Viterbi Algorithm
Claim: we do not need to draw all the subtrees in the algorithm.
Tree for the sentence: ^ People laugh .
[Tree: from ^, arcs to N (0.6), V (0.2), O (0.2); after 'People' the surviving scores are 0.06×10^-3, 0.24×10^-3 and 0.18×10^-3; after 'laugh' they are 0.06×10^-6, 0.02×10^-6 and 0.06×10^-6; the remaining branches score 0.]
Effect of shifting probability mass
Will a word always be given the same tag? No. Consider:
- ^ people the city with soldiers . (i.e., populate)
- ^ quickly people the city .
In the first sentence 'people' is most likely to be tagged as noun, whereas in the second, probability mass will shift and 'people' will be tagged as verb, since it occurs after an adverb.
Tail phenomenon and language phenomenon
- Long tail phenomenon: probability is very low but not zero over a large observed sequence.
- Language phenomenon:
  - 'people', which is predominantly tagged as noun, displays a long tail phenomenon.
  - 'laugh' is predominantly tagged as verb.
Viterbi phenomenon (Markov process)
[Two competing nodes before reading 'LAUGH': N1 with score 6×10^-5 and N2 with score 6×10^-8.]
In the next step all the probabilities will be multiplied by identical probabilities (lexical and transition). So the children of N2 will always have probability less than the children of N1, and N2 need not be expanded.
What does P(A|B) mean?
P(A|B) = P(B|A) if P(A) = P(B)
P(A|B) means??
- Causality?? B causes A??
- Sequentiality?? A follows B?
Back to the Urn Example
Here:
S = {U1, U2, U3}; V = {R, G, B}
For observation O = {o1 … on} and state sequence Q = {q1 … qn},
π_i = P(q1 = Ui) is the initial state probability.
A = transition probabilities:
       U1    U2    U3
U1    0.1   0.4   0.5
U2    0.6   0.2   0.2
U3    0.3   0.4   0.3
B = emission probabilities:
       R     G     B
U1    0.3   0.5   0.2
U2    0.1   0.4   0.5
U3    0.6   0.1   0.3
Observations and states
        O1 O2 O3 O4 O5 O6 O7 O8
OBS:    R  R  G  G  B  R  G  R
State:  S1 S2 S3 S4 S5 S6 S7 S8
Si = U1/U2/U3; a particular state
S: state sequence; O: observation sequence
S* = best possible state (urn) sequence
Goal: maximize P(S*|O) by choosing the best S
Goal
Maximize P(S|O), where S is the state sequence and O is the observation sequence:
S* = argmax_S P(S|O)
False Start
OBS:    R  R  G  G  B  R  G  R
State:  S1 S2 S3 S4 S5 S6 S7 S8
P(S|O) = P(S1…S8 | O1…O8)
       = P(S1|O) · P(S2|S1, O) · P(S3|S2, S1, O) · … · P(S8|S7…S1, O)
By the Markov assumption (a state depends only on the previous state):
P(S|O) = P(S1|O) · P(S2|S1, O) · P(S3|S2, O) · … · P(S8|S7, O)
Bayes Theorem
P(A|B) = P(A) · P(B|A) / P(B)
P(A): prior; P(B|A): likelihood
argmax_S P(S|O) = argmax_S P(S) · P(O|S)
State Transitions Probability
P(S) = P(S1…S8)
     = P(S1) · P(S2|S1) · P(S3|S2, S1) · P(S4|S3, S2, S1) · … · P(S8|S7…S1)
By the Markov assumption (k=1):
P(S) = P(S1) · P(S2|S1) · P(S3|S2) · P(S4|S3) · … · P(S8|S7)
Observation Sequence Probability
P(O|S) = P(O1|S1…S8) · P(O2|O1, S1…S8) · … · P(O8|O1…O7, S1…S8)
Assumption: the ball drawn depends only on the urn chosen:
P(O|S) = P(O1|S1) · P(O2|S2) · … · P(O8|S8)
Therefore:
P(S) · P(O|S)
= P(S1) · P(S2|S1) · P(S3|S2) · … · P(S8|S7) · P(O1|S1) · P(O2|S2) · … · P(O8|S8)
Grouping terms
        O0 O1 O2 O3 O4 O5 O6 O7 O8
Obs:    ε  R  R  G  G  B  R  G  R
State:  S0 S1 S2 S3 S4 S5 S6 S7 S8 S9
P(S) · P(O|S)
= [P(O0|S0)·P(S1|S0)] · [P(O1|S1)·P(S2|S1)] · [P(O2|S2)·P(S3|S2)] · [P(O3|S3)·P(S4|S3)] · [P(O4|S4)·P(S5|S4)] · [P(O5|S5)·P(S6|S5)] · [P(O6|S6)·P(S7|S6)] · [P(O7|S7)·P(S8|S7)] · [P(O8|S8)·P(S9|S8)]
We introduce the states S0 and S9 as initial and final states respectively. After S8 the next state is S9 with probability 1, i.e., P(S9|S8) = 1. O0 is an ε-transition.
Introducing useful notation
        O0 O1 O2 O3 O4 O5 O6 O7 O8
Obs:    ε  R  R  G  G  B  R  G  R
State:  S0 S1 S2 S3 S4 S5 S6 S7 S8 S9
[Chain diagram: S0 -ε→ S1 -R→ S2 -R→ S3 -G→ S4 -G→ S5 -B→ S6 -R→ S7 -G→ S8 -R→ S9]
Define: P(Ok|Sk) · P(Sk+1|Sk) = P(Sk -Ok→ Sk+1)
Probabilistic FSM
[Two states S1 and S2; each arc is labelled (symbol: probability):
S1→S1 (a1: 0.1), (a2: 0.2); S1→S2 (a1: 0.3), (a2: 0.4);
S2→S1 (a1: 0.2), (a2: 0.3); S2→S2 (a1: 0.3), (a2: 0.2)]
The question here is: what is the most likely state sequence given the output sequence seen?
Developing the tree
[Tree: Start with P = 1.0 at S1 and 0.0 at S2.
Reading a1: from S1, arcs (S1: 0.1, S2: 0.3); from S2, arcs (S1: 0.2, S2: 0.3); accumulated scores 1.0×0.1 = 0.1 and 0.3 (the S2-rooted branches stay 0.0).
Reading a2: 0.1×0.2 = 0.02, 0.1×0.4 = 0.04, 0.3×0.3 = 0.09, 0.3×0.2 = 0.06.]
Choose the winning sequence per state per iteration.
Tree structure (contd.)
[Continuing with S1 = 0.09 and S2 = 0.06 after a1-a2:
reading a1: 0.09×0.1 = 0.009, 0.09×0.3 = 0.027, 0.06×0.2 = 0.012, 0.06×0.3 = 0.018;
reading a2: 0.027×0.3 = 0.0081, 0.027×0.2 = 0.0054, 0.012×0.4 = 0.0048, 0.012×0.2 = 0.0024.]
The problem being addressed by this tree is S* = argmax_S P(S | a1-a2-a1-a2, μ),
where a1-a2-a1-a2 is the output sequence and μ the model or the machine.
Path found (working backward): S1 → S2 → S1 → S2 → S1, reading a1, a2, a1, a2.
Problem statement: find the best possible sequence
S* = argmax_S P(S | O, μ)
where S = state sequence, O = output sequence, and μ = model or machine, with
μ = {S0, S, A, T}: start symbol, state collection, alphabet set, and transitions;
T is defined by the arc probabilities P(Si -ak→ Sj).
Tabular representation of the tree
Latest symbol observed →
Ending state   ε     a1                               a2             a1               a2
S1             1.0   (1.0×0.1, 0.0×0.2) = (0.1, 0.0)  (0.02, 0.09)   (0.009, 0.012)   (0.0024, 0.0081)
S2             0.0   (1.0×0.3, 0.0×0.3) = (0.3, 0.0)  (0.04, 0.06)   (0.027, 0.018)   (0.0048, 0.0054)
Note: every cell records the winning probability ending in that state; the larger value in each cell is the winning sequence probability ending in that state, and its position in the tuple tells which predecessor it came from. The final winner is 0.0081, ending in S1 and reached via S2 (indicated by the 2nd element of the tuple); going backward from it, we recover the sequence.
Algorithm
(following James Allen, Natural Language Understanding (2nd edition), Benjamin Cummings (pub.), 1995)
Given:
1. The HMM, which means:
   a. Start state: S1
   b. Alphabet: A = {a1, a2, … ap}
   c. Set of states: S = {S1, S2, … Sn}
   d. Transition probability P(Si -ak→ Sj), which is equal to P(Sj, ak | Si)
2. The output string a1 a2 … aT
To find:
The most likely sequence of states C1 C2 … CT which produces the given output sequence, i.e.,
C1 C2 … CT = argmax_C [P(C | a1, a2, … aT, μ)]
Algorithm (contd.)
Data structures:
1. An N×T array called SEQSCORE to maintain the winner sequence always (N = #states, T = length of output sequence)
2. Another N×T array called BACKPTR to recover the path
Three distinct steps in the Viterbi implementation:
1. Initialization
2. Iteration
3. Sequence identification
1. Initialization
SEQSCORE(1,1) = 1.0
BACKPTR(1,1) = 0
For i = 2 to N do
    SEQSCORE(i,1) = 0.0
[expressing the fact that the first state is S1]

2. Iteration
For t = 2 to T do
    For i = 1 to N do
        SEQSCORE(i,t) = Max over j = 1..N of [SEQSCORE(j, t-1) × P(Sj -ak→ Si)]
        BACKPTR(i,t) = the index j that gives the Max above
3. Sequence Identification
C(T) = the i that maximizes SEQSCORE(i,T)
For i from (T-1) to 1 do
    C(i) = BACKPTR[C(i+1), (i+1)]

Optimizations possible:
1. BACKPTR can be 1×T
2. SEQSCORE can be T×2
Homework: compare this with A* and Beam Search. Reason for the comparison: both of them work for finding and recovering sequences.
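A runnable Python rendering of the three steps, as a sketch, on the two-state probabilistic FSM from the earlier slides (S1 is the start state):

```python
# trans[(i, sym, j)] = P(Si --sym--> Sj), from the PFSM slides.
trans = {("S1","a1","S1"):0.1, ("S1","a1","S2"):0.3,
         ("S1","a2","S1"):0.2, ("S1","a2","S2"):0.4,
         ("S2","a1","S1"):0.2, ("S2","a1","S2"):0.3,
         ("S2","a2","S1"):0.3, ("S2","a2","S2"):0.2}
states = ["S1", "S2"]

def viterbi(output):
    T = len(output)
    # 1. Initialization: all probability mass on the start state S1.
    seqscore = {("S1", 0): 1.0, ("S2", 0): 0.0}
    backptr = {}
    # 2. Iteration: push the best incoming score into each state.
    for t, sym in enumerate(output, start=1):
        for i in states:
            j = max(states, key=lambda p: seqscore[(p, t-1)] * trans[(p, sym, i)])
            seqscore[(i, t)] = seqscore[(j, t-1)] * trans[(j, sym, i)]
            backptr[(i, t)] = j
    # 3. Sequence identification: follow BACKPTR from the best final state.
    best_final = max(states, key=lambda s: seqscore[(s, T)])
    path = [best_final]
    for t in range(T, 0, -1):
        path.append(backptr[(path[-1], t)])
    return list(reversed(path)), seqscore[(best_final, T)]

print(viterbi(["a1", "a2", "a1", "a2"]))
# (['S1', 'S2', 'S1', 'S2', 'S1'], 0.0081) -- matches the table above
```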
Viterbi Algorithm for the Urn problem (first two symbols)
[Trellis: from start state S0, initial arcs into U1, U2, U3 with probabilities 0.5, 0.3, 0.2. On reading each R, every urn's score is multiplied by emission and transition probabilities along its outgoing arcs; only the best-scoring arc into each urn (starred, e.g., 0.075* and 0.048*) is retained for the next stage.]
Markov process of order > 1 (say 2)
The same theory works.
P(S) · P(O|S)
= P(O0|S0) · P(S1|S0) · [P(O1|S1)·P(S2|S1,S0)] · [P(O2|S2)·P(S3|S2,S1)] · [P(O3|S3)·P(S4|S3,S2)] · [P(O4|S4)·P(S5|S4,S3)] · [P(O5|S5)·P(S6|S5,S4)] · [P(O6|S6)·P(S7|S6,S5)] · [P(O7|S7)·P(S8|S7,S6)] · [P(O8|S8)·P(S9|S8,S7)]
We introduce the states S0 and S9 as initial and final states respectively. After S8 the next state is S9 with probability 1, i.e., P(S9|S8,S7) = 1. O0 is an ε-transition.
        O0 O1 O2 O3 O4 O5 O6 O7 O8
Obs:    ε  R  R  G  G  B  R  G  R
State:  S0 S1 S2 S3 S4 S5 S6 S7 S8 S9
Probability of observation sequence
Why probability of observation sequence?: the language modeling problem
1. P(The sun rises in the east)
2. P(The sun rise in the east): less probable because of grammatical mistake.
3. P(The svn rises in the east): less probable because of lexical mistake.
4. P(The sun rises in the west): less probable because of semantic mistake.
Probabilities computed in the context of corpora.
Uses of language model
1. Detect well-formedness: lexical, syntactic, semantic, pragmatic, discourse
2. Language identification: given a piece of text, what language does it belong to?
   - Good morning: English
   - Guten Morgen: German
   - Bonjour: French
3. Automatic speech recognition
4. Machine translation
How to compute P(o0 o1 o2 o3 … om)?
P(O) = Σ_S P(O, S) = Σ_S P(S) · P(O|S)   (marginalization over state sequences)
Consider the observation sequence:
Obs:    O0  O1  O2  …  Om
State:  S0  S1  S2  …  Sm  Sm+1
where the Si represent the state sequence.
Computing P(o0 o1 o2 o3 … om)
P(O, S) = P(S) · P(O|S)
P(S)   = P(S0) · P(S1|S0) · P(S2|S1) · … · P(Sm+1|Sm)
P(O|S) = P(O0|S0) · P(O1|S1) · … · P(Om|Sm)
So, grouping terms as before:
P(O, S) = P(S0) · [P(O0|S0)·P(S1|S0)] · [P(O1|S1)·P(S2|S1)] · … · [P(Om|Sm)·P(Sm+1|Sm)]
Forward and Backward Probability Calculation
Forward probability F(k,i)
Define F(k,i) = probability of being in state Si having seen o0 o1 o2 … ok:
F(k,i) = P(o0 o1 o2 … ok, Si)
With m as the length of the observed sequence and N states:
P(observed sequence) = P(o0 o1 o2 … om)
                     = Σ_{p=0..N} P(o0 o1 o2 … om, Sp)
                     = Σ_{p=0..N} F(m, p)
Forward probability (contd.)
F(k, q) = P(o0 o1 o2 … ok, Sq)
        = P(o0 o1 o2 … ok-1, ok, Sq)
        = Σ_{p=0..N} P(o0 o1 o2 … ok-1, Sp, ok, Sq)
        = Σ_{p=0..N} P(o0 o1 o2 … ok-1, Sp) · P(ok, Sq | o0 o1 o2 … ok-1, Sp)
        = Σ_{p=0..N} F(k-1, p) · P(ok, Sq | Sp)
        = Σ_{p=0..N} F(k-1, p) · P(Sp -ok→ Sq)
Obs:    O0 O1 O2 O3 … Ok Ok+1 … Om-1 Om
State:  S0 S1 S2 S3 … Sp Sq  …  Sm  Sfinal
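This recurrence as a Python sketch, on the urn HMM from earlier (the uniform initial distribution is again an assumption; the arc probability is P(Sp -ok→ Sq) = P(ok|Sp)·P(Sq|Sp)):

```python
A = {"U1": {"U1": 0.1, "U2": 0.4, "U3": 0.5},   # transitions
     "U2": {"U1": 0.6, "U2": 0.2, "U3": 0.2},
     "U3": {"U1": 0.3, "U2": 0.4, "U3": 0.3}}
B = {"U1": {"R": 0.3, "G": 0.5, "B": 0.2},      # emissions
     "U2": {"R": 0.1, "G": 0.4, "B": 0.5},
     "U3": {"R": 0.6, "G": 0.1, "B": 0.3}}
pi = {u: 1/3 for u in A}                        # assumed uniform start

def p_obs_forward(obs):
    F = dict(pi)                    # before any symbol is emitted
    for o in obs:
        # F(k, q) = sum_p F(k-1, p) * P(o|p) * P(q|p)   [arc Sp --o--> Sq]
        F = {q: sum(F[p] * B[p][o] * A[p][q] for p in A) for q in A}
    return sum(F.values())          # P(O): marginalize the final state

print(p_obs_forward(list("RRGGBRGR")))
```

Only |S| values are carried from stage to stage, which is where the savings over path enumeration come from.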
Backward probability B(k,i)
Define B(k,i) = probability of seeing ok ok+1 ok+2 … om given that the state was Si:
B(k,i) = P(ok ok+1 ok+2 … om | Si)
With m as the length of the whole observed sequence:
P(observed sequence) = P(o0 o1 o2 … om)
                     = P(o0 o1 o2 … om | S0)
                     = B(0, 0)
Backward probability (contd.)
B(k, p) = P(ok ok+1 ok+2 … om | Sp)
        = P(ok+1 ok+2 … om, ok | Sp)
        = Σ_{q=0..N} P(ok+1 ok+2 … om, ok, Sq | Sp)
        = Σ_{q=0..N} P(ok, Sq | Sp) · P(ok+1 ok+2 … om | ok, Sq, Sp)
        = Σ_{q=0..N} P(ok+1 ok+2 … om | Sq) · P(ok, Sq | Sp)
        = Σ_{q=0..N} B(k+1, q) · P(Sp -ok→ Sq)
Obs:    O0 O1 O2 O3 … Ok Ok+1 … Om-1 Om
State:  S0 S1 S2 S3 … Sp Sq  …  Sm  Sfinal
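And the backward recurrence as a matching sketch (same urn HMM and assumed uniform start; beta is built right to left, with B(m+1, q) = 1):

```python
A = {"U1": {"U1": 0.1, "U2": 0.4, "U3": 0.5},
     "U2": {"U1": 0.6, "U2": 0.2, "U3": 0.2},
     "U3": {"U1": 0.3, "U2": 0.4, "U3": 0.3}}
B = {"U1": {"R": 0.3, "G": 0.5, "B": 0.2},
     "U2": {"R": 0.1, "G": 0.4, "B": 0.5},
     "U3": {"R": 0.6, "G": 0.1, "B": 0.3}}
pi = {u: 1/3 for u in A}                  # assumed uniform start

def backward(obs):
    beta = {q: 1.0 for q in A}            # B(m+1, q) = 1 past the last symbol
    for o in reversed(obs):
        # B(k, p) = sum_q P(o|p) * P(q|p) * B(k+1, q)
        beta = {p: sum(B[p][o] * A[p][q] * beta[q] for q in A) for p in A}
    return beta                           # beta[p] = B(0, p)

b0 = backward(list("RRGGBRGR"))
print(sum(pi[p] * b0[p] for p in A))      # equals P(O) from the forward pass
```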
How Forward Probability Works
Goal of forward probability: to find P(O) [the probability of the observation sequence].
E.g. ^ People laugh .
[Trellis: ^ → {N, V} → {N, V} → .]
Transition and Lexical Probability Tables
Transition:
       ^     N     V     .
^      0    0.7   0.3    0
N      0    0.2   0.6   0.2
V      0    0.6   0.2   0.2
.      1     0     0     0
Lexical (first column: the boundary symbol):
       ^/.   People   Laugh
^      1      0        0
N      0     0.8      0.2
V      0     0.1      0.9
.      1      0        0
Inefficient computation:
P(O) = Σ over all state sequences S of Π_i P(Oi|Si) · P(Si+1|Si) = Σ_S Π_i P(Si -Oi→ Si+1)
Computation in various paths of the tree
(observation: ^ People Laugh .)
Path 1: ^ N N .   P(Path1) = (1.0×0.7) × (0.8×0.2) × (0.2×0.2)
Path 2: ^ N V .   P(Path2) = (1.0×0.7) × (0.8×0.6) × (0.9×0.2)
Path 3: ^ V N .   P(Path3) = (1.0×0.3) × (0.1×0.6) × (0.2×0.2)
Path 4: ^ V V .   P(Path4) = (1.0×0.3) × (0.1×0.2) × (0.9×0.2)
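A quick check of these four path probabilities in Python, using the transition and lexical tables above (their sum is P(O), which the forward algorithm computes without enumerating paths):

```python
from itertools import product

trans = {"^": {"N": 0.7, "V": 0.3},
         "N": {"N": 0.2, "V": 0.6, ".": 0.2},
         "V": {"N": 0.6, "V": 0.2, ".": 0.2}}
lex = {"N": {"People": 0.8, "Laugh": 0.2},
       "V": {"People": 0.1, "Laugh": 0.9}}
words = ["People", "Laugh"]

total = 0.0
for t1, t2 in product("NV", repeat=2):
    # each hop contributes P(transition); each word P(word | source tag)
    p = trans["^"][t1] * lex[t1][words[0]] * trans[t1][t2] \
        * lex[t2][words[1]] * trans[t2]["."]
    print(["^", t1, t2, "."], p)
    total += p
print("P(O) =", total)
```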
Computations on the Trellis
F = accumulated F × output probability × transition probability
F(N, 1) = 1.0 × 0.7 = 0.7;  F(V, 1) = 1.0 × 0.3 = 0.3
F(N, 2) = F(N,1) × (0.8×0.2) + F(V,1) × (0.1×0.6)
F(V, 2) = F(N,1) × (0.8×0.6) + F(V,1) × (0.1×0.2)
F(., 3) = F(N,2) × (0.2×0.2) + F(V,2) × (0.9×0.2)
Number of Multiplications
Tree:
- Each path has 5 multiplications + 1 addition.
- There are 4 paths in the tree.
- Therefore, a total of 20 multiplications and 3 additions.
Trellis:
- F(N,1) and F(V,1): 1 multiplication each.
- Each F at the next stage = F(N) × (1 mult) + F(V) × (1 mult): 4 multiplications + 1 addition.
- Similarly for the other two F values: 4 multiplications and 1 addition each.
- So, a total of 14 multiplications and 3 additions.
Complexity
Let |S| = #states and |O| = observation length - |{^, .}|.
- Stage 1 of the trellis: |S| multiplications.
- Stage 2 of the trellis: |S| nodes; each node needs computation over |S| arcs.
  - Each arc: 1 multiplication.
  - Accumulated F: 1 more multiplication.
  - Total: 2|S|² multiplications.
- The same for each stage before reading '.'.
- At the final stage ('.'): 2|S| multiplications.
Therefore, total multiplications = |S| + 2|S|²(|O| - 1) + 2|S|.
Summary: Forward Algorithm
1. Accumulate F over each stage of the trellis.
2. Take the sum of the F values multiplied by the probabilities of the final ('.') transitions.
3. Complexity = |S| + 2|S|²(|O| - 1) + 2|S|
              = 2|S|²|O| - 2|S|² + 3|S|
              = O(|S|² · |O|)
i.e., linear in the length of the input and quadratic in the number of states.
Exercise
1. Backward probability:
   a) Derive the backward algorithm.
   b) Compute its complexity.
2. Express P(O) in terms of both forward and backward probability.
Possible project topics (will
keep adding) Scrabble: auto-completion of words
(human vs. m/c)
Humour detection using wordnet(incongruity theory)
Multistage POS tagging
Reading List
- TnT (http://www.aclweb.org/anthology-new/A/A00/A00-1031.pdf)
- Brill Tagger (http://delivery.acm.org/10.1145/1080000/1075553/p112-brill.pdf?ip=182.19.16.71&acc=OPEN&CFID=129797466&CFTOKEN=72601926&__acm__=1342975719_082233e0ca9b5d1d67a9997c03a649d1)
- Hindi POS Tagger built by IIT Bombay (http://www.cse.iitb.ac.in/pb/papers/ACL-2006-Hindi-POS-Tagging.pdf)
- Projection (http://www.dipanjandas.com/files/posInduction.pdf)