Cs626 Lect1to4 Intro Pos


    Speech, NLP and the Web

    Pushpak Bhattacharyya

    CSE Dept.,

    IIT Bombay

    Lecture 1-4: Introduction, POS

    21 July, 2014


    Basic information

    Slot 4: Mon- 11.30, Tue- 8.30, Thu- 9.30AM

    Venue: F.C. Kohli auditorium

    TA team: Aditya, Geetanjali, Sandeep, Sagar, Naman

    [email protected]

    [email protected] [email protected]

    [email protected]

    [email protected]

    Course notes: http://www.cse.iitb.ac.in/~pb/cs626-2014

No midsem or endsem; assignments and paper reading for new entrants, projects for others


    NLP- a foundation: Noisy Channel Model

Sequence W is transformed into sequence T (channel picture: W → noisy channel → T)

T* = argmax_T P(T|W) = argmax_T P(T)·P(W|T)

W* = argmax_W P(W|T) = argmax_W P(W)·P(T|W)


    5 representative problems

    using noisy channel modeling

    Statistical Spell Checking

Automatic Speech Recognition

Part of Speech Tagging: discussed in detail in subsequent classes

    Probabilistic Parsing

    Statistical Machine Translation


Some general observations

A* = argmax_A [P(A|B)] = argmax_A [P(A)·P(B|A)]

Computing and using P(A) and P(B|A) both need

(i) looking at the internal structures of A and B

(ii) making independence assumptions

(iii) putting together a computation from smaller parts (a minimal illustration in code follows)
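Below is a minimal sketch of this argmax in code; the candidate set, prior P(A) and channel probabilities P(B|A) are made-up illustrative numbers, not values from the course.

```python
# Noisy-channel decoding sketch: pick A* = argmax_A P(A) * P(B|A) for an observed B.
prior = {"right": 0.6, "write": 0.3, "rite": 0.1}     # P(A): language model (illustrative)
likelihood = {                                        # P(B|A): channel model (illustrative)
    ("rait", "right"): 0.5,
    ("rait", "write"): 0.4,
    ("rait", "rite"): 0.6,
}

def decode(observed_b):
    return max(prior, key=lambda a: prior[a] * likelihood.get((observed_b, a), 0.0))

print(decode("rait"))   # 'right': 0.6*0.5 = 0.30 beats 0.3*0.4 and 0.1*0.6
```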


    Corpus

A collection of text, called a corpus, is used for collecting various language data

With annotation: more information, but manual-labour intensive

Practice: label automatically; correct manually

The famous Brown Corpus contains 1 million tagged words.

Switchboard: a very famous corpus; 2,400 conversations, 543 speakers, many US dialects, annotated with orthography and phonetics


    What is NLP

    Branch of AI

2 Goals

Science Goal: Understand the way language operates

Engineering Goal: Build systems that analyse and generate language; reduce the man-machine gap


Perspectivising NLP: Areas of AI and their inter-dependencies

(Diagram: the inter-dependent areas of AI: Search, Vision, Planning, Machine Learning, Knowledge Representation, Logic, Expert Systems, Robotics, NLP.)


NLP: Two pictures

(Diagram 1: NLP alongside Vision and Speech.)

(Diagram 2: the NLP Trinity: Problem (Morph Analysis, Part of Speech Tagging, Parsing, Semantics) x Language (Hindi, Marathi, English, French) x Algorithm (HMM, MEMM, CRF); Statistics and Probability + Knowledge Based approaches.)


NLP Architecture (increasing complexity of processing):

Morphology → POS tagging → Chunking → Parsing → Semantics Extraction → Discourse and Coreference


    A famous sentence (1/2)

Buffalo buffaloes Buffalo buffaloes buffalo buffalo Buffalo buffaloes Buffalo buffaloes buffalo


    A famous sentence (2/2)

Buffalo buffaloes Buffalo buffaloes buffalo buffalo Buffalo buffaloes Buffalo buffaloes buffalo

    Buffalo:

    Animal

    City

    bully


NLP: multilayered, multidimensional

(The architecture stack (Morphology → POS tagging → Chunking → Parsing → Semantics → Discourse and Coreference, with increasing complexity of processing) and the NLP Trinity diagram (Problem x Language x Algorithm) shown together.)


Multilinguality: the Indian situation

Major streams:

Indo-European

Dravidian

Sino-Tibetan

Austro-Asiatic

Some languages are ranked within the top 20 in the world in terms of the populations speaking them

Hindi and Urdu: 5th (~500 million)

Bangla: 7th (~300 million)

Marathi: 14th (~70 million)


NLP architecture and stages of processing: ambiguity at every stage

Phonetics and phonology

Morphology

Lexical Analysis

Syntactic Analysis

Semantic Analysis

Pragmatics

Discourse


Phonetics: processing of speech sound and associated challenges

Homophones: bank (finance) vs. bank (river bank)

Near Homophones: maatraa vs. maatra (Hindi)

Word Boundary: (aajaayenge) aa jaayenge (will come) or aaj aayenge (will come today); I got [ua] plate

His research is in human languages

Disfluency: ah, um, ahem etc.

(Near-homophone trouble) The king of Abu Dhabi expired and there was national mourning for 7 days. Some children were playing in the evening when a person chided them, "Do not play; it is mourning time". The children said, "No, it is evening time and we will play".


NLP Architecture (increasing complexity of processing):

Morphology → POS tagging → Chunking → Parsing → Semantics Extraction → Discourse and Coreference


    Morphology

    Word formation rules from root words

    Nouns: Plural (boy-boys); Gender marking (czar-czarina)

Verbs: Tense (stretch-stretched); Aspect (e.g. perfective sit-had sat); Modality (e.g. request khaanaa-khaaiie)

Crucial first step in NLP

Languages rich in morphology: e.g., Dravidian, Hungarian, Turkish

Languages poor in morphology: Chinese, English

Languages with rich morphology have the advantage of easier processing at higher stages of processing

A task of interest to computer science: Finite State Machines for Word Morphology


    Lexical Analysis

    Dictionary and word properties

dog

noun (lexical property)

takes -s in plural (morph property)

animate (semantic property)

4-legged (-do-)

carnivore (-do-)
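A minimal sketch of such a lexicon entry as a data structure (the property names below are illustrative, not a standard from the course):

```python
# Dictionary entry for "dog" with lexical, morphological and semantic properties.
lexicon = {
    "dog": {
        "lexical": "noun",
        "morph": "takes -s in plural",
        "semantic": ["animate", "4-legged", "carnivore"],
    }
}
print(lexicon["dog"]["semantic"])
```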


    Lexical Disambiguation

Part of Speech Disambiguation

Dog as a noun (animal)

Dog as a verb (to pursue)

Sense Disambiguation

Dog (as animal)

Dog (as a very detestable person)

The chair emphasised the need for adult education

Very common in day-to-day communications

Satellite Channel Ad: Watch what you want, when you want (two senses of watch)

    Ground breaking ceremony/research

    (ToI: 14/1/14) India eradicates polio, says WHO


Technological developments bring in new terms, and additional meanings/nuances for existing terms

Justify, as in "justify the right margin" (word-processing context)

Xeroxed: a new verb

Digital Trace: a new expression

Communifaking: pretending to talk on a mobile when you are actually not

Discomgooglation: anxiety/discomfort at not being able to access the internet

Helicopter Parenting: over-parenting

    Obamagain, Obama care, modinomics


    Ambiguity of Multiwords

    The grandfather kicked the bucket after suffering from cancer.

    This job is a piece of cake

    Put the sweater on

    He is the dark horse of the match

Google translations of the above sentences (the Hindi output is not recoverable in this transcript).


    Ambiguity of Named Entities

Bengali → English: "Government is restless at home." (*) Intended: "Chanchal Sarkar is at home" (the person's name was translated literally)

Amsterdam airport: Baby Changing Room

Hindi → English: "everyday bold world"; actually the name of a Hindi newspaper in Indore

High degree of overlap between NEs and MWEs

Treat differently: transliterate, do not translate


NLP Architecture (increasing complexity of processing):

Morphology → POS tagging → Chunking → Parsing → Semantics Extraction → Discourse and Coreference


Structure

Parse tree for "I like mangoes": (S (NP I) (VP (V like) (NP mangoes)))


    Structural Ambiguity

Scope

1. The old men and women were taken to safe locations: (old (men and women)) vs. ((old men) and women)

2. No smoking areas will allow Hookas inside

Preposition Phrase Attachment

I saw the boy with a telescope (who has the telescope?)

I saw the mountain with a telescope (world knowledge: a mountain cannot be an instrument of seeing)

Very ubiquitous; newspaper headline: "20 years later, BMC pays father 20 lakhs for causing son's death"


    Garden pathing

The only minus possibly was the need to face the audience more and more insightful question answer

    The old man the boat

    The horse raced past the garden fell


NLP Architecture (increasing complexity of processing):

Morphology → POS tagging → Chunking → Parsing → Semantics Extraction → Discourse and Coreference


    Semantic Analysis

    Representation in terms of

Predicate Calculus / Semantic Nets / Frames / Conceptual Dependencies and Scripts

John gave a book to Mary

Give action: Agent: John, Object: Book, Recipient: Mary

Challenge: ambiguity in semantic role labeling

(Eng) Visiting aunts can be a nuisance

(Hin) aapko mujhe mithaai khilaanii padegii (ambiguous in Marathi and Bengali too; not in Dravidian languages)


    Coreference: challenge

Binding of co-referring nouns and pronouns

The monkey ate the banana, because it was hungry

The monkey ate the banana, because it was ripe and sweet

The monkey ate the banana, because it was lunch time


NLP Architecture (increasing complexity of processing):

Morphology → POS tagging → Chunking → Parsing → Semantics Extraction → Discourse and Coreference


    Pragmatics

Very hard problem: model user intention

Tourist (in a hurry, checking out of the hotel, motioning to the service boy): Boy, go upstairs and see if my sandals are under the divan. Do not be late. I just have 15 minutes to catch the train.

Boy (running upstairs and coming back panting): Yes sir, they are there.

    World knowledge

    WHY INDIA NEEDS A SECOND OCTOBER (ToI,2/10/07)


NLP Architecture (increasing complexity of processing):

Morphology → POS tagging → Chunking → Parsing → Semantics Extraction → Discourse and Coreference


    Discourse

Processing of a sequence of sentences. Mother to John:

John, go to school. It is open today. Should you bunk? Father will be very angry.

Ambiguity of "open"

Bunk what? Why will the father be angry?

Complex chain of reasoning and application of world knowledge

Ambiguity of "father": father as parent or father as headmaster


    Complexity of Connected Text

John was returning from school dejected; today was the math test

He couldn't control the class

Teacher shouldn't have made him responsible

After all, he is just a janitor


    Textual Humour (1/2)

1. Teacher (angrily): Did you miss the class yesterday? Student: Not much.

2. A man coming back to his parked car sees the sticker "Parking fine". He goes and thanks the policeman for appreciating his parking skill.

3. John: I got a Jaguar car for my unemployed youngest son. Jack: That's a great exchange!


    Textual Humour (2/2)

    A teacher-student exchange

Teacher: What do you think is the capital of Ethiopia?

    Student: What do you think?

Teacher (angrily): I do not think I know

Student: I do not think I know


Example of Application of Noisy Channel Model: Probabilistic Speech Recognition (Isolated Word) [8]

Problem Definition: Given a sequence of speech signals, identify the words.

    2 steps :

    Segmentation (Word Boundary Detection)

    Identify the word

    Isolated Word Recognition :

    Identify W given SS (speech signal)

W* = argmax_W P(W | SS)


    Identifying the word

P(SS|W) = likelihood, called the phonological model; intuitively more tractable!

P(W) = prior probability, called the language model

W* = argmax_W P(W | SS) = argmax_W P(W)·P(SS | W)

P(W) ≈ (# times W appears in the corpus) / (# words in the corpus)
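A minimal sketch of this relative-frequency estimate of P(W); the toy corpus string is made up for illustration:

```python
# Maximum-likelihood estimate P(W) = count(W) / total number of words in the corpus.
from collections import Counter

corpus = "the sun rises in the east and the sun sets in the west".split()
counts = Counter(corpus)
total = len(corpus)

def p_word(w):
    return counts[w] / total

print(p_word("the"), p_word("sun"))   # 4/13 and 2/13
```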


Ambiguities in the context of P(SS|W) or P(W|SS)

Concerns

Sound → Text ambiguity

whether v/s weather

right v/s write

bought v/s bot

Text → Sound ambiguity

    read (present tense) v/s read (past tense)

    lead (verb) v/s lead (noun)


Primitives

Phonemes (sound)

    Syllables

    ASCII bytes (machine representation)


Phonemes

Standardized by the IPA (International Phonetic Alphabet) convention

    /t/ sound of t in tag

    /d/ sound of d in dog

    /D/ sound of the


    Syllables

Advise (verb) vs. Advice (noun): ad-vise, ad-vice

A syllable consists of:

1. Nucleus

2. Onset

3. Coda


    Pronunciation Dictionary

P(SS|W) = P(t o m ae t o | Word is tomato) = product of arc probabilities

(Pronunciation automaton for "tomato": states s1 … s7 emit t, o, m, then branch to "ae" with probability 0.73 or "aa" with probability 0.27, then t, o, end; all other arcs have probability 1.0.)
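A minimal sketch of the product-of-arc-probabilities computation for the two pronunciations, using the arc values above:

```python
# P(SS|W) as the product of arc probabilities along one path of the pronunciation automaton.
def path_probability(arc_probs):
    p = 1.0
    for prob in arc_probs:
        p *= prob
    return p

# /t o m ae t o/ takes the 0.73 branch; /t o m aa t o/ takes the 0.27 branch.
print(path_probability([1.0, 1.0, 1.0, 0.73, 1.0, 1.0]))  # 0.73
print(path_probability([1.0, 1.0, 1.0, 0.27, 1.0, 1.0]))  # 0.27
```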


    Foundational question

Generative vs. Discriminative


    How are two entities matched?

Entity A and Entity B: Match(A,B)?

    Two entities match iff their parts match Match(Parts(A), Parts(B))

    Two entities match iff their properties match Match(Properties(A), Properties(B))

    Heart of discriminative vs. generative scoring.


    Books, Journals, Proceedings

Main Text(s): Natural Language Understanding: James Allen

    Speech and NLP: Jurafsky and Martin

    Foundations of Statistical NLP: Manning and Schutze

    Other References: Statistical NLP: Charniak

Journals: Computational Linguistics, Natural Language Engineering, AI, AI Magazine, IEEE SMC

Conferences: ACL, EACL, COLING, MT Summit, EMNLP, IJCNLP, HLT, ICON, SIGIR, WWW, ICML, ECML


    Allied Disciplines

Philosophy: Semantics, Meaning of meaning, Logic (syllogism)

Linguistics: Study of Syntax, Lexicon, Lexical Semantics etc.

Probability and Statistics: Corpus Linguistics, Testing of Hypotheses, System Evaluation

Cognitive Science: Computational Models of Language Processing, Language Acquisition

Psychology: Behaviouristic insights into Language Processing, Psychological Models

Brain Science: Language Processing Areas in the Brain

Physics: Information Theory, Entropy, Random Fields

Computer Sc. & Engg.: Systems for NLP


    Day wise schedule (1/4)

Day-1: Introduction: NLP as playground for rule-based and statistical techniques

Before break: Complete NLP architecture, Ambiguity, start of POS tagging

After break: NLTK (open-source Python-based framework of comprehensive NLP tools), POS tagging assignment

Day-2: Shallow parsing

Before break: Morph analysis and synthesis (segmentation, inflection, declension, derivation etc.), Rule-based vs. Statistical NLU comparison with POS tagging as case study, Hidden Markov Model and Viterbi algorithm

After break: POS tagging assignment continued


    Day wise schedule (2/4)

Day-3: Syntactic Parsing

Before break: Parsing, classical and statistical, theory and techniques

After break: Hands-on with probabilistic parser

Day-4: Semantics

Before break: Rule-based NLU: case study of semantic graph generation through Universal Networking Language (UNL)

After break: continue POS tagging and Parsing assignments


    Day wise schedule (3/4)

Day-5: Lexical resources

Before break: Wordnet, ConceptNet, FrameNet, VerbNet etc.

After break: Hands-on with Lexical Resources, NELL, NEIL

Day-6: Information Extraction, Text classification and basic search

Before break: Named Entity Recognition, Text Entailment, Lucene, Nutch etc.

After break: NER hands-on, basic search, Open IE system


    Day wise schedule (4/4)

Day-7: Affective NLP (cognitive and culture-specific NLP)

Before break: Sentiment Analysis, Pragmatics, Intent recognition (Sarcasm, Thwarting), Eye-Tracking

After break: Machine learning techniques with sentiment analysis as target

Day-8: Deep Learning

Before break: Word vectors and embedding, Neural Nets, Neural language models

After break: Discussion on deep learning tool

    Day-9 and 10: Projects and quiz


    Summary

    Both Linguistics and Computation needed

Linguistics is the eye, Computation the body

The methodology Phenomenon → Formalization → Technique → Experimentation → Evaluation → Hypothesis Testing has accorded to NLP the prestige it commands today

Natural-science-like approach

Neither theory building nor data-driven pattern finding can be ignored


    Part of Speech Tagging

    With Hidden Markov Model


NLP Trinity

(Diagram: the NLP Trinity: Problem (Morph Analysis, Part of Speech Tagging, Parsing, Semantics) x Language (Hindi, Marathi, English, French) x Algorithm (HMM, MEMM, CRF).)


Part of Speech Tagging

POS Tagging: attaches to each word in a sentence a part-of-speech tag from a given set of tags called the Tag-Set

Standard Tag-set: Penn Treebank (for English).


    Example

The_DT mechanisms_NNS that_WDT make_VBP traditional_JJ hardware_NN are_VBP really_RB being_VBG obsoleted_VBN by_IN microprocessor-based_JJ machines_NNS ,_, said_VBD Mr._NNP Benton_NNP ._.


    Where does POS tagging fit in

Morphology → POS tagging → Chunking → Parsing → Semantics Extraction → Discourse and Coreference (increasing complexity of processing)


Example to illustrate complexity of POS tagging


    POS tagging is disambiguation

That_F former_J Sri_Lanka_N skipper_N and_F ace_J batsman_N Aravinda_De_Silva_N is_F a_F man_N of_F few_J words_N was_F very_R much_R evident_J on_F Wednesday_N when_F the_F legendary_J batsman_N ,_F who_F has_V always_R let_V his_N bat_N talk_V ,_F struggled_V to_F answer_V a_F barrage_N of_F questions_N at_F a_F function_N to_F promote_V the_F cricket_N league_N in_F the_F city_N ._F

N (noun), V (verb), J (adjective), R (adverb) and F (other, i.e., function words).


POS disambiguation

That_F/N/J (that can be a complementizer (can be put under F), demonstrative (can be put under J) or pronoun (can be put under N))

former_J

Sri_N/J Lanka_N/J (Sri Lanka together qualify the skipper)

skipper_N/V (skipper can be a verb too)

and_F ace_J/N (ace can be both J and N; Nadal served an ace)

batsman_N/J (batsman can be J as it qualifies Aravinda De Silva)

Aravinda_N De_N Silva_N is_F a_F

man_N/V (man can be a verb too, as in "man the boat")

of_F few_J

words_N/V (words can be a verb too, as in "he words his speeches beautifully")


Behaviour of "That"

That man is known by the company he keeps. (Demonstrative)

Man that is known by the company he keeps, gets a good job. (Pronoun)

That man is known by the company he keeps, is a proverb. (Complementizer)

Chaotic systems: systems where a small perturbation in input causes a large change in output


POS disambiguation (contd.)

was_F very_R much_R evident_J on_F Wednesday_N

when_F/N (when can be a relative pronoun (put under N), as in "I know the time when he comes")

the_F legendary_J batsman_N

who_F/N

has_V always_R let_V his_N

bat_N/V

talk_V/N

struggle_V/N

answer_V/N

barrage_N/V

question_N/V

function_N/V

promote_V cricket_N league_N city_N


    Mathematics of POS tagging


    Argmax computation (1/2)

Best tag sequence = T* = argmax P(T|W) = argmax P(T)·P(W|T) (by Bayes' theorem)

P(T) = P(t0=^ t1 t2 … tn+1=.)

= P(t0)·P(t1|t0)·P(t2|t1,t0)·P(t3|t2,t1,t0) … P(tn|tn−1…t0)·P(tn+1|tn…t0)

= P(t0)·P(t1|t0)·P(t2|t1) … P(tn|tn−1)·P(tn+1|tn)

= ∏_{i=0}^{n+1} P(ti|ti−1)   (Bigram Assumption)


    Argmax computation (2/2)

P(W|T) = P(w0|t0…tn+1)·P(w1|w0,t0…tn+1)·P(w2|w1,w0,t0…tn+1) … P(wn|w0…wn−1,t0…tn+1)·P(wn+1|w0…wn,t0…tn+1)

Assumption: A word is determined completely by its tag. This is inspired by speech recognition.

= P(w0|t0)·P(w1|t1) … P(wn+1|tn+1)

= ∏_{i=0}^{n+1} P(wi|ti)   (Lexical Probability Assumption)
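A minimal sketch of scoring one tag sequence with the two assumptions above (bigram transition probabilities times lexical probabilities); the tables are illustrative numbers, not estimates from a tagged corpus:

```python
# Score P(T) * P(W|T) for one candidate tag sequence under the bigram and lexical assumptions.
trans = {("^", "N"): 0.6, ("N", "V"): 0.4, ("V", "."): 0.3}             # P(t_i | t_{i-1})
lex   = {("people", "N"): 1e-3, ("laugh", "V"): 1e-3, (".", "."): 1.0}  # P(w_i | t_i)

def score(words, tags):
    p = 1.0
    for i in range(1, len(tags)):
        p *= trans.get((tags[i - 1], tags[i]), 0.0)   # bigram tag transition
        p *= lex.get((words[i], tags[i]), 0.0)        # lexical (emission) probability
    return p

print(score(["^", "people", "laugh", "."], ["^", "N", "V", "."]))   # 7.2e-08
```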


Generative Model

^_^ People_N Jump_V High_R ._.

(Trellis: the start state ^, then N/V alternatives for each word, ending in '.'; arcs carry bigram (transition) probabilities and states emit words with lexical probabilities.)

This model is called a Generative model. Here words are observed from tags as states. This is similar to an HMM.


Typical POS tagging steps

Implementation of Viterbi: Unigram, Bigram.

    Five Fold Evaluation.

    Per POS Accuracy.

    Confusion Matrix.


(Figure: Per-POS accuracy for the Bigram Assumption; one bar per tag of the tagset (AJ0, AJ0-NN1, AJ0-VVG, AJC, AT0, …, VVZ-NN2), accuracies plotted on a 0 to 1.2 scale.)


Screen shot of typical Confusion Matrix

Rows: gold tag; columns: assigned tag (AJ0, AJ0-AV0, AJ0-NN1, AJ0-VVD, AJ0-VVG, AJ0-VVN, AJC, AJS, AT0, AV0, AV0-AJ0, AVP)

AJ0:      2899  20  32   1   3   3    0    0    18    35   27    1
AJ0-AV0:    31  18   2   0   0   0    0    0     0     1   15    0
AJ0-NN1:   161   0 116   0   0   0    0    0     0     0    1    0
AJ0-VVD:     7   0   0   0   0   0    0    0     0     0    0    0
AJ0-VVG:     8   0   0   0   2   0    0    0     1     0    0    0
AJ0-VVN:     8   0   0   3   0   2    0    0     1     0    0    0
AJC:         2   0   0   0   0   0   69    0     0    11    0    0
AJS:         6   0   0   0   0   0    0   38     0     2    0    0
AT0:       192   0   0   0   0   0    0    0  7000    13    0    0
AV0:       120   8   2   0   0   0   15    2    24  2444   29   11
AV0-AJ0:    10   7   0   0   0   0    0    0     0    16   33    0
AVP:        24   0   0   0   0   0    0    0     1    11    0  737
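A minimal sketch of computing per-POS accuracy from such a confusion matrix (the small matrix below reuses a couple of rows of the table above, just for illustration):

```python
# Per-POS accuracy: for each gold tag, the fraction of its tokens that received that same tag.
confusion = {
    "AJ0": {"AJ0": 2899, "AV0": 35, "AJ0-NN1": 32},
    "AV0": {"AJ0": 120, "AV0": 2444, "AVP": 11},
}

def per_pos_accuracy(matrix):
    return {gold: row.get(gold, 0) / sum(row.values()) for gold, row in matrix.items()}

print(per_pos_accuracy(confusion))   # e.g. AJ0 -> 2899 / (2899 + 35 + 32)
```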


HMM

(The NLP Trinity diagram again: Problem (Morph Analysis, Part of Speech Tagging, Parsing, Semantics) x Language (Hindi, Marathi, English, French) x Algorithm (HMM, MEMM, CRF); HMM is the algorithm taken up now.)


    A Motivating Example

Colored ball choosing

Urn 1: # Red = 30, # Green = 50, # Blue = 20

Urn 2: # Red = 10, # Green = 40, # Blue = 50

Urn 3: # Red = 60, # Green = 10, # Blue = 30


Example (contd.)

Given the transition probability table (rows: current urn; columns: next urn U1 U2 U3):

U1: 0.1, 0.4, 0.5

U2: 0.6, 0.2, 0.2

U3: 0.3, 0.4, 0.3

and the emission probability table (rows: urn; columns: R G B):

U1: 0.3, 0.5, 0.2

U2: 0.1, 0.4, 0.5

U3: 0.6, 0.1, 0.3

Observation: RRGGBRGR

State Sequence: ?? Not so easily computable.
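A minimal sketch of these HMM parameters as Python dictionaries (values copied from the tables above), reused by the algorithm sketches later in these notes:

```python
# Urn HMM: transition probabilities A and emission probabilities B, plus the observation.
states = ["U1", "U2", "U3"]
A = {  # P(next urn | current urn)
    "U1": {"U1": 0.1, "U2": 0.4, "U3": 0.5},
    "U2": {"U1": 0.6, "U2": 0.2, "U3": 0.2},
    "U3": {"U1": 0.3, "U2": 0.4, "U3": 0.3},
}
B = {  # P(colour | urn)
    "U1": {"R": 0.3, "G": 0.5, "B": 0.2},
    "U2": {"R": 0.1, "G": 0.4, "B": 0.5},
    "U3": {"R": 0.6, "G": 0.1, "B": 0.3},
}
observation = list("RRGGBRGR")
```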


Diagrammatic representation (1/2)

(State-transition diagram over U1, U2, U3 with the transition probabilities of the table above; each state also carries its emission probabilities for R, G, B.)


Diagrammatic representation (2/2)

(The same diagram with each arc labelled by the combined transition x emission probability for R, G and B.)


Classic problems with respect to HMM

1. Given the observation sequence, find the possible state sequences: Viterbi algorithm

2. Given the observation sequence, find its probability: forward/backward algorithm

3. Given the observation sequence, find the HMM parameters: Baum-Welch algorithm


    Illustration of Viterbi

The start and end are important in a sequence.

Subtrees get eliminated due to the Markov Assumption.

POS Tagset: N (noun), V (verb), O (other) [simplified]; ^ (start), . (end) [start & end states]


    Illustration of Viterbi

Lexicon

people: N, V

laugh: N, V

…

Corpora for Training

^ w11_t11 w12_t12 w13_t13 … w1k_1_t1k_1 .

^ w21_t21 w22_t22 w23_t23 … w2k_2_t2k_2 .

…

^ wn1_tn1 wn2_tn2 wn3_tn3 … wnk_n_tnk_n .


Inference

(Partial sequence graph: ^ followed by N/V for each word, ending in '.'.)

Transition probability table (rows: from; columns: to ^ N V O .):

^: 0, 0.6, 0.2, 0.2, 0

N: 0, 0.1, 0.4, 0.3, 0.2

V: 0, 0.3, 0.1, 0.3, 0.3

O: 0, 0.3, 0.2, 0.3, 0.2

.: 1, 0, 0, 0, 0

This transition table will change from language to language due to language divergences.


Lexical Probability Table

Size of this table = (# POS tags in tagset) x (vocabulary size); vocabulary size = # unique words in the corpus

Rows: tag; columns: ε (empty observation for ^ and .), people, laugh, … :

^: 1, 0, 0, …, 0

N: 0, 1x10⁻³, 1x10⁻⁵, …

V: 0, 1x10⁻⁶, 1x10⁻³, …

O: 0, 0, 1x10⁻⁹, …

.: 1, 0, 0, 0, 0


    Inference

New sentence: ^ people laugh .

p(^ N N . | ^ people laugh .) = (0.6 x 0.1) x (0.1 x 1x10⁻³) x (0.2 x 1x10⁻⁵)

(Partial sequence graph: ^ → {N, V} → {N, V} → .)
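A minimal sketch of this kind of sequence scoring in code, multiplying one transition and one lexical factor per step with the table entries of the two previous slides; the grouping of factors here follows the generative model, so treat the printed value as illustrative:

```python
# Score the tag sequence ^ N N . for "^ people laugh ." with transition x lexical factors.
trans = {("^", "N"): 0.6, ("N", "N"): 0.1, ("N", "."): 0.2}
lex = {("^", "^"): 1.0, ("people", "N"): 1e-3, ("laugh", "N"): 1e-5, (".", "."): 1.0}

p = (trans[("^", "N")] * lex[("people", "N")]) \
    * (trans[("N", "N")] * lex[("laugh", "N")]) \
    * (trans[("N", ".")] * lex[(".", ".")])
print(p)   # 1.2e-10
```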


Computational Complexity

If we have to get the probability of each sequence and then find the maximum among them, we would run into an exponential number of computations. If |S| = #states (tags + ^ + .) and |O| = length of sentence (words + ^ + .), then #sequences = |S|^(|O|−2).

But a large number of partial computations can be reused using Dynamic Programming.


Dynamic Programming

(Trellis for "^ people laugh .": from ^, the scores of the first-level nodes are N: 0.6 x 1.0 = 0.6, V: 0.2, O: 0.2.)

Expanding the N node with "people" (lexical probability 10⁻³):

to N: 0.6 x 0.1 x 10⁻³ = 6 x 10⁻⁵

to V: 0.6 x 0.4 x 10⁻³ = 2.4 x 10⁻⁴

to O: 0.6 x 0.3 x 10⁻³ = 1.8 x 10⁻⁴

to .: 0.6 x 0.2 x 10⁻³ = 1.2 x 10⁻⁴

No need to expand N4 and N5 because they will never be part of the winning sequence.


Computational Complexity

Retain only those N / V / O nodes which end in the highest sequence probability.

Now, complexity reduces from |S|^|O| to |S|·|O|.

Here, we followed the Markov assumption of order 1.


Points to ponder wrt HMM and Viterbi


Viterbi Algorithm

Start with the start state.

Keep advancing sequences that are maximum amongst all those ending in the same state.


Viterbi Algorithm

Tree for the sentence: ^ People laugh .

(Tree with per-node scores (0.6), (0.2), (0.2) at the first level; (0.06x10⁻³), (0.24x10⁻³), (0.18x10⁻³) at the second; (0.06x10⁻⁶), (0.02x10⁻⁶), (0.06x10⁻⁶) at the third; zero-probability branches are not expanded.)

Claim: We do not need to draw all the subtrees in the algorithm.


Effect of shifting probability mass

Will a word always be given the same tag? No. Consider the example:

^ people the city with soldiers . (i.e., populate)

^ quickly people the city .

In the first sentence "people" is most likely to be tagged as noun, whereas in the second, probability mass will shift and "people" will be tagged as verb, since it occurs after an adverb.


Tail phenomenon and language phenomenon

Long tail phenomenon: probability is very low but not zero over a large observed sequence.

Language phenomenon: "people", which is predominantly tagged as Noun, displays a long tail phenomenon; "laugh" is predominantly tagged as Verb.


Viterbi phenomenon (Markov process)

(Two competing nodes N1 and N2 for the same state, with accumulated probabilities 6x10⁻⁵ and 6x10⁻⁸, both about to be expanded with "LAUGH".)

In the next step all the probabilities will be multiplied by identical probabilities (lexical and transition). So the children of N2 will have probability less than the children of N1.


What does P(A|B) mean?

P(A|B) = P(B|A) if P(A) = P(B)

P(A|B) means??

Causality?? B causes A??

Sequentiality?? A follows B?


Back to the Urn Example

Here:

S = {U1, U2, U3}; V = {R, G, B}

For observation O = {o1 … on} and state sequence Q = {q1 … qn}

A = the transition probability table (U1: 0.1, 0.4, 0.5; U2: 0.6, 0.2, 0.2; U3: 0.3, 0.4, 0.3)

B = the emission probability table (U1: R 0.3, G 0.5, B 0.2; U2: R 0.1, G 0.4, B 0.5; U3: R 0.6, G 0.1, B 0.3)

π_i = P(q1 = U_i)  (initial state probabilities)


    Observations and states

    O1 O2 O3 O4 O5 O6 O7 O8

    OBS: R R G G B R G R

    State: S1 S2 S3 S4 S5 S6 S7 S8

    Si = U1/U2/U3; A particular state

    S: State sequence

    O: Observation sequence

    S* = best possible state (urn) sequence

    Goal: Maximize P(S*|O) by choosing best S


Goal

Maximize P(S|O), where S is the State Sequence and O is the Observation Sequence:

S* = argmax_S P(S|O)


    False Start

P(S|O) = P(S1…S8 | O1…O8)

= P(S1|O)·P(S2|S1,O)·P(S3|S2,S1,O) … P(S8|S7…S1,O)

By Markov Assumption (a state depends only on the previous state):

P(S|O) = P(S1|O)·P(S2|S1,O)·P(S3|S2,O) … P(S8|S7,O)

Obs:   R  R  G  G  B  R  G  R   (O1 … O8)

State: S1 S2 S3 S4 S5 S6 S7 S8


Bayes' Theorem: P(A|B) = P(A)·P(B|A) / P(B)

P(A): Prior; P(B|A): Likelihood

argmax_S P(S|O) = argmax_S P(S)·P(O|S)

    POS 96

  • 7/27/2019 Cs626 Lect1to4 Intro Pos

    97/131

    State Transitions Probability

    )|()...|().|().|().()(

    )()(

    718314213121

    81

    SSPSSPSSPSSPSPSP

    SPSP

    By Markov Assumption (k=1)

    )|()...|().|().|().()( 783423121 SSPSSPSSPSSPSPSP


Observation Sequence probability

P(O|S) = P(O1|S1…S8)·P(O2|O1,S1…S8)·P(O3|O1,O2,S1…S8) … P(O8|O1…O7,S1…S8)

Assumption: the ball drawn depends only on the urn chosen:

P(O|S) = P(O1|S1)·P(O2|S2)·P(O3|S3) … P(O8|S8)

P(S|O) ∝ P(S)·P(O|S) = [P(S1)·P(S2|S1)·P(S3|S2) … P(S8|S7)] · [P(O1|S1)·P(O2|S2) … P(O8|S8)]


    Grouping terms

    P(S).P(O|S)

    = [P(O0|S0).P(S1|S0)].

    [P(O1|S1). P(S2|S1)].[P(O2|S2). P(S3|S2)].

    [P(O3|S3).P(S4|S3)].

    [P(O4|S4).P(S5|S4)].

    [P(O5|S5).P(S6|S5)].

    [P(O6|S6).P(S7|S6)].

    [P(O7|S7).P(S8|S7)].

    [P(O8|S8).P(S9|S8)].

We introduce the states S0 and S9 as initial and final states respectively.

After S8 the next state is S9 with probability 1, i.e., P(S9|S8) = 1. O0 is the ε-transition.

    O0 O1 O2 O3 O4 O5 O6 O7 O8

    Obs: R R G G B R G R

    State: S0 S1 S2 S3 S4 S5 S6 S7 S8 S9


Introducing useful notation

(The sequence S0 → S1 → … → S8 → S9 with observations ε, R, R, G, G, B, R, G, R on the arcs.)

P(Ok|Sk)·P(Sk+1|Sk) = P(Sk → Sk+1 on symbol Ok), i.e., the probability of the arc from Sk to Sk+1 labelled Ok.


Probabilistic FSM

(Two states S1 and S2; arcs labelled with (symbol : probability) pairs: (a1:0.3), (a2:0.4), (a1:0.2), (a2:0.3), (a1:0.1), (a2:0.2), (a1:0.3), (a2:0.2).)

The question here is: what is the most likely state sequence given the output sequence seen?


Developing the tree

(Tree over the output a1, a2 starting from Start with initial scores S1 = 1.0, S2 = 0.0. After a1 the surviving scores are 1 x 0.1 = 0.1 and 0.3 (the 0.0 branches die). After a2 the candidate scores are 0.1 x 0.2 = 0.02, 0.1 x 0.4 = 0.04, 0.3 x 0.3 = 0.09, 0.3 x 0.2 = 0.06.)

Choose the winning sequence per state per iteration.


Tree structure contd.

(Continuing the tree for the next a1, a2: from the winners 0.09 and 0.06 the candidate scores are 0.09 x 0.1 = 0.009, 0.018, 0.027, 0.012; and at the last step 0.0081, 0.0054, 0.0048, 0.0024.)

The problem being addressed by this tree is S* = argmax_S P(S | a1 a2 a1 a2, μ), where a1-a2-a1-a2 is the output sequence and μ the model or the machine.


Path found (working backward): S1, S2, S1, S2, S1 with outputs a1, a2, a1, a2

Problem statement: Find the best possible sequence S* = argmax_S P(S | O, μ)

where S = state sequence, O = output sequence, μ = model or machine

Model or machine μ = {S0, S, A, T}: start symbol, state collection, alphabet set, transitions

T is defined as P(Si --ak--> Sj) for all i, j, k


Tabular representation of the tree

Columns: latest symbol observed (a1, a2, a1, a2); rows: ending state. Every cell records the winning probability of a sequence ending in that state.

S1: 1.0 | (1.0x0.1, 0.0x0.2) = (0.1, 0.0) | (0.02, 0.09) | (0.009, 0.012) | (0.0024, 0.0081)

S2: 0.0 | (1.0x0.3, 0.0x0.3) = (0.3, 0.0) | (0.04, 0.06) | (0.027, 0.018) | (0.0048, 0.0054)

The bold-faced value in each cell shows the sequence probability ending in that state. Going backward from the final winner sequence, which ends in state S2 (indicated by the 2nd tuple element), we recover the sequence.


Algorithm (following James Allen, Natural Language Understanding (2nd edition), Benjamin Cummings (pub.), 1995)

Given:

1. The HMM, which means:

a. Start State: S1

b. Alphabet: A = {a1, a2, … ap}

c. Set of States: S = {S1, S2, … Sn}

d. Transition probability P(Si --ak--> Sj), which is equal to P(Sj, ak | Si)

2. The output string a1 a2 … aT

To find:

The most likely sequence of states C1 C2 … CT which produces the given output sequence, i.e., C1 C2 … CT = argmax_C [P(C | a1, a2, …, aT)]


    Algorithm contd

    Data Structure:1. A N*T array called SEQSCORE to maintain the

    winner sequence always (N=#states, T=length ofo/p sequence)

    2. Another N*T array called BACKPTR to recover thepath.

    Three distinct steps in the Viterbi implementation1.

    Initialization2. Iteration

    3. Sequence Identification


1. Initialization

SEQSCORE(1,1) = 1.0

BACKPTR(1,1) = 0

For (i = 2 to N) do

    SEQSCORE(i,1) = 0.0

[expressing the fact that the first state is S1]

2. Iteration

For (t = 2 to T) do

    For (i = 1 to N) do

        SEQSCORE(i,t) = Max over j = 1..N of [SEQSCORE(j, t−1) * P(Sj --ak--> Si)]

        BACKPTR(i,t) = the index j that gives the MAX above


3. Sequence Identification

C(T) = i that maximizes SEQSCORE(i,T)

For i from (T−1) to 1 do

    C(i) = BACKPTR[C(i+1), (i+1)]

Optimizations possible:

1. BACKPTR can be 1*T

2. SEQSCORE can be T*2

Homework: compare this with A* and Beam Search. Reason for this comparison: both of them work for finding and recovering a sequence.
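A minimal sketch of the whole procedure in Python for the POS example used earlier; the state set and probability tables are the illustrative ones from those slides, not trained values, and the dictionaries below play the role of SEQSCORE and BACKPTR:

```python
# Viterbi: initialization, iteration (max over predecessors), sequence identification.
def viterbi(words, states, trans, lex, start="^"):
    seqscore = [{start: 1.0}]            # best probability of a sequence ending in each state
    backptr = [{}]                       # back-pointer to the best predecessor state
    for t in range(1, len(words)):
        seqscore.append({})
        backptr.append({})
        for s in states:
            best_prev, best_p = None, 0.0
            for prev, p_prev in seqscore[t - 1].items():
                p = p_prev * trans.get((prev, s), 0.0) * lex.get((words[t], s), 0.0)
                if p > best_p:
                    best_prev, best_p = prev, p
            seqscore[t][s] = best_p
            backptr[t][s] = best_prev
    last = max(seqscore[-1], key=seqscore[-1].get)     # sequence identification
    seq = [last]
    for t in range(len(words) - 1, 0, -1):
        seq.append(backptr[t][seq[-1]])
    return list(reversed(seq)), seqscore[-1][last]

states = ["N", "V", "O", "."]
trans = {("^", "N"): 0.6, ("^", "V"): 0.2, ("^", "O"): 0.2,
         ("N", "N"): 0.1, ("N", "V"): 0.4, ("N", "O"): 0.3, ("N", "."): 0.2,
         ("V", "N"): 0.3, ("V", "V"): 0.1, ("V", "O"): 0.3, ("V", "."): 0.3,
         ("O", "N"): 0.3, ("O", "V"): 0.2, ("O", "O"): 0.3, ("O", "."): 0.2}
lex = {("people", "N"): 1e-3, ("people", "V"): 1e-6,
       ("laugh", "N"): 1e-5, ("laugh", "V"): 1e-3, ("laugh", "O"): 1e-9,
       (".", "."): 1.0}
print(viterbi(["^", "people", "laugh", "."], states, trans, lex))   # (['^','N','V','.'], 7.2e-08)
```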


Viterbi Algorithm for the Urn problem (first two symbols)

(Trellis: from S0 the initial scores of U1, U2, U3 are 0.5, 0.3, 0.2. After reading R: 0.03, 0.08, 0.15. Expanding further on the second R, the candidate scores include 0.06, 0.02, 0.02, 0.18, 0.24, 0.18, and the per-state values are 0.015, 0.04, 0.075*, 0.018, 0.006, 0.006, 0.048*, 0.036; the starred values are the retained winners.)


Markov process of order > 1 (say 2)

The same theory works.

    P(S).P(O|S)

    = P(O0|S0).P(S1|S0).[P(O1|S1). P(S2|S1S0)].

    [P(O2|S2). P(S3|S2S1)].

    [P(O3|S3).P(S4|S3S2)].

    [P(O4|S4).P(S5|S4S3)].

    [P(O5|S5).P(S6|S5S4)].

    [P(O6|S6).P(S7|S6S5)].

    [P(O7|S7).P(S8|S7S6)].

    [P(O8|S8).P(S9|S8S7)].

We introduce the states S0 and S9 as initial and final states respectively.

After S8 the next state is S9 with probability 1, i.e., P(S9|S8,S7) = 1. O0 is the ε-transition.

    O0 O1 O2 O3 O4 O5 O6 O7 O8

    Obs: R R G G B R G R

    State: S0 S1 S2 S3 S4 S5 S6 S7 S8 S9


Probability of observation sequence


Why probability of observation sequence?: the language modeling problem

1. P(The sun rises in the east)

2. P(The sun rise in the east): less probable because of grammatical mistake.

3. P(The svn rises in the east): less probable because of lexical mistake.

4. P(The sun rises in the west): less probable because of semantic mistake.

Probabilities computed in the context of corpora.


Uses of language model

1. Detect well-formedness: lexical, syntactic, semantic, pragmatic, discourse

2. Language identification: given a piece of text, what language does it belong to?

Good morning - English

Guten Morgen - German

Bonjour - French

3. Automatic speech recognition

4. Machine translation


How to compute P(o0 o1 o2 o3 … om)?

P(O) = Σ_S P(O|S)·P(S)   (marginalization over state sequences S)

Consider the observation sequence O0 O1 O2 … Om with an underlying state sequence S0 S1 S2 … Sm+1, where the Si's represent the state sequences.


Computing P(o0 o1 o2 o3 … om)

P(O, S) = P(S)·P(O|S)

= P(S0 S1 … Sm+1)·P(O0 O1 … Om | S0 S1 … Sm+1)

= [P(S0)·P(S1|S0)·P(S2|S1) … P(Sm+1|Sm)] · [P(O0|S0)·P(O1|S1) … P(Om|Sm)]

= P(S0)·[P(O0|S0)·P(S1|S0)]·[P(O1|S1)·P(S2|S1)] … [P(Om|Sm)·P(Sm+1|Sm)]


    Forward and BackwardProbability Calculation


Forward probability F(k,i)

Define F(k,i) = probability of being in state Si having seen o0 o1 o2 … ok

F(k,i) = P(o0 o1 o2 … ok, Si)

With m as the length of the observed sequence and N states,

P(observed sequence) = P(o0 o1 o2 … om) = Σ_{p=0..N} P(o0 o1 o2 … om, Sp) = Σ_{p=0..N} F(m, p)


Forward probability (contd.)

F(k, q) = P(o0 o1 … ok, Sq)

= P(o0 o1 … ok−1, ok, Sq)

= Σ_{p=0..N} P(o0 o1 … ok−1, Sp, ok, Sq)

= Σ_{p=0..N} P(o0 o1 … ok−1, Sp) · P(ok, Sq | o0 o1 … ok−1, Sp)

= Σ_{p=0..N} F(k−1, p) · P(ok, Sq | Sp)

= Σ_{p=0..N} F(k−1, p) · P(Sp → Sq on ok)

(Timeline: observations O0 O1 O2 O3 … Ok Ok+1 … Om over states S0 S1 S2 S3 … Sp Sq … Sfinal.)
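A minimal sketch of this recursion in code, on the urn HMM defined earlier; the slides do not give initial state probabilities, so a uniform start over the urns is assumed here:

```python
# Forward algorithm: F(k, q) = sum_p F(k-1, p) * P(o_k | S_p) * P(S_q | S_p),
# following the arc convention of the slides (the observation is emitted on leaving S_p).
A = {"U1": {"U1": 0.1, "U2": 0.4, "U3": 0.5},
     "U2": {"U1": 0.6, "U2": 0.2, "U3": 0.2},
     "U3": {"U1": 0.3, "U2": 0.4, "U3": 0.3}}
B = {"U1": {"R": 0.3, "G": 0.5, "B": 0.2},
     "U2": {"R": 0.1, "G": 0.4, "B": 0.5},
     "U3": {"R": 0.6, "G": 0.1, "B": 0.3}}
pi = {"U1": 1 / 3, "U2": 1 / 3, "U3": 1 / 3}   # assumed uniform initial distribution

def forward(obs):
    F = dict(pi)                                # distribution over the current state
    for o in obs:
        F = {q: sum(F[p] * B[p][o] * A[p][q] for p in F) for q in A}
    return sum(F.values())                      # P(observation sequence)

print(forward(list("RRGGBRGR")))
```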


Backward probability B(k,i)

Define B(k,i) = probability of seeing ok ok+1 ok+2 … om given that the state was Si

B(k,i) = P(ok ok+1 ok+2 … om | Si)

With m as the length of the whole observed sequence,

P(observed sequence) = P(o0 o1 o2 … om) = P(o0 o1 o2 … om | S0) = B(0,0)


Backward probability (contd.)

B(k, p) = P(ok ok+1 ok+2 … om | Sp)

= P(ok+1 ok+2 … om, ok | Sp)

= Σ_{q=0..N} P(ok+1 ok+2 … om, ok, Sq | Sp)

= Σ_{q=0..N} P(ok, Sq | Sp) · P(ok+1 ok+2 … om | ok, Sq, Sp)

= Σ_{q=0..N} P(ok+1 ok+2 … om | Sq) · P(ok, Sq | Sp)

= Σ_{q=0..N} B(k+1, q) · P(Sp → Sq on ok)

(Timeline: observations O0 O1 O2 O3 … Ok Ok+1 … Om over states S0 S1 S2 S3 … Sp Sq … Sfinal.)


    How Forward Probability

    Works Goal of Forward Probability: To find P(O)

    [the probability of Observation Sequence].

    E.g. ^ People laugh .

    ^ .

    N N

    V V


Transition and Lexical Probability Tables

Transition probabilities (rows: from; columns: ^ N V .):

^: 0, 0.7, 0.3, 0

N: 0, 0.2, 0.6, 0.2

V: 0, 0.6, 0.2, 0.2

.: 1, 0, 0, 0

Lexical probabilities (rows: tag; columns: ε, People, Laugh):

^: 1, 0, 0

N: 0, 0.8, 0.2

V: 0, 0.1, 0.9

.: 1, 0, 0

Inefficient computation: P(O) = Σ over all state sequences S of the path probability ∏ P(Si → Si+1 on oi), i.e., enumerate every path.


    Computation in various paths

    of the Tree PeopleLaugh

    Path 1: ^ N N.

    P(Path1) = (1.0x0.7)x(0.8x0.2)x(0.2x0.2) People

    LaughPath 2: ^ N V

    .

    P(Path2) = (1.0x0.7)x(0.8x0.6)x(0.9x0.2) People

    LaughPath 3: ^ V N

    .P(Path3) = (1.0x0.3)x(0.1x0.6)x(0.2x0.2)

    PeopleLaughPath 4: ^ V V

    .P(Path4) = (1.0x0.3)x(0.1x0.2)x(0.9x0.2)

    ^

    V

    N

    V

    N

    V

    N

    .

    .

    .

    .

    People Laugh
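A minimal sketch that reproduces these four path probabilities and sums them to get P(^ people laugh .), using the transition and lexical tables of the previous slide:

```python
# Brute-force P(O): enumerate the four tag paths and sum their probabilities.
from itertools import product

trans = {("^", "N"): 0.7, ("^", "V"): 0.3,
         ("N", "N"): 0.2, ("N", "V"): 0.6, ("N", "."): 0.2,
         ("V", "N"): 0.6, ("V", "V"): 0.2, ("V", "."): 0.2}
lex = {("^", "^"): 1.0, ("people", "N"): 0.8, ("people", "V"): 0.1,
       ("laugh", "N"): 0.2, ("laugh", "V"): 0.9}
words = ["^", "people", "laugh", "."]

total = 0.0
for tags in product("NV", repeat=2):          # the four paths ^ ? ? .
    seq = ["^", *tags, "."]
    p = 1.0
    for i in range(len(words) - 1):
        # arc from seq[i] to seq[i+1]; words[i] is emitted from the source state
        p *= lex[(words[i], seq[i])] * trans[(seq[i], seq[i + 1])]
    print(seq, p)
    total += p
print("P(O) =", total)
```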


Computations on the Trellis

F = accumulated F x output probability x transition probability

First stage ("People"): the two accumulated values are 0.7 x 1.0 and 0.3 x 1.0.

Later stages combine the accumulated values with the arc factors (0.2x0.3) + (0.6x0.1), (0.6x0.8) + (0.2x0.1), and (0.2x0.2) + (0.2x0.9); the accumulated-F symbols on the original slide did not survive extraction.

(Trellis: ^, then N/V columns for "People" and "Laugh", then '.'.)


Number of Multiplications

Tree: each path has 5 multiplications; there are 4 paths in the tree; therefore a total of 20 multiplications and 3 additions.

Trellis: the two first-stage values take 1 multiplication each; each later value = (accumulated F x arc) + (accumulated F x arc) = 4 multiplications + 1 addition; similarly for the remaining two values, 4 multiplications and 1 addition each. So, a total of 14 multiplications and 3 additions.


Complexity

Let |S| = #states and |O| = observation length − |{^, .}|

Stage 1 of the trellis: |S| multiplications

Stage 2 of the trellis: |S| nodes; each node needs computation over |S| arcs; each arc = 1 multiplication; the accumulated F = 1 more multiplication; total 2|S|² multiplications

Same for each stage before reading '.'

At the final stage ('.'): 2|S| multiplications

Therefore, total multiplications = |S| + 2|S|²(|O| − 1) + 2|S|


Summary: Forward Algorithm

1. Accumulate F over each stage of the trellis.

2. Take the sum of the F values at the last stage (multiplied by the arc probabilities into the final '.' state).

3. Complexity = |S| + 2|S|²(|O| − 1) + 2|S| = 2|S|²|O| − 2|S|² + 3|S| = O(|S|²·|O|), i.e., linear in the length of the input and quadratic in the number of states.


Exercise

1. Backward Probability

a) Derive the Backward Algorithm.

b) Compute its complexity.

2. Express P(O) in terms of both Forward and Backward probability.


Possible project topics (will keep adding)

Scrabble: auto-completion of words (human vs. m/c)

Humour detection using wordnet (incongruity theory)

Multistage POS tagging


    Reading List


TnT (http://www.aclweb.org/anthology-new/A/A00/A00-1031.pdf)

Brill Tagger (http://delivery.acm.org/10.1145/1080000/1075553/p112-brill.pdf?ip=182.19.16.71&acc=OPEN&CFID=129797466&CFTOKEN=72601926&__acm__=1342975719_082233e0ca9b5d1d67a9997c03a649d1)

Hindi POS Tagger built by IIT Bombay (http://www.cse.iitb.ac.in/pb/papers/ACL-2006-Hindi-POS-Tagging.pdf)

Projection (http://www.dipanjandas.com/files/posInduction.pdf)