Upload
ada
View
35
Download
0
Tags:
Embed Size (px)
DESCRIPTION
Applications (2 of 2): Recognition, Transduction, Discrimination, Segmentation, Alignment, etc. Kenneth Church [email protected]. Applications. Recognition: Shannon’s Noisy Channel Model Speech, Optical Character Recognition (OCR), Spelling Transduction Part of Speech (POS) Tagging - PowerPoint PPT Presentation
Citation preview
1
Applications (2 of 2):Recognition, Transduction, Discrimination,
Segmentation, Alignment, etc.
Kenneth [email protected]
Dec 9, 2009
2
Applications• Recognition: Shannon’s Noisy Channel Model
– Speech, Optical Character Recognition (OCR), Spelling• Transduction
– Part of Speech (POS) Tagging– Machine Translation (MT)
• Parsing: ???• Ranking
– Information Retrieval (IR)– Lexicography
• Discrimination:– Sentiment, Text Classification, Author Identification, Word Sense Disambiguation (WSD)
• Segmentation– Asian Morphology (Word Breaking), Text Tiling
• Alignment: Bilingual Corpora, Dotplots• Compression• Language Modeling: good for everything
Dec 9, 2009
3
Speech LanguageShannon’s: Noisy Channel Model
• I Noisy Channel O• I΄ ≈ ARGMAXI Pr(I|O) = ARGMAXI Pr(I) Pr(O|I)Trigram Language Model
Word Rank More likely alternatives
We 9 The This One Two A Three Please In
need 7 are will the would also do
to 1
resolve 85 have know do…
all 9 The This One Two A Three Please In
of 2 The This One Two A Three Please In
the 1
important 657 document question first…
issues 14 thing point to
Channel Model
Application Input Output
Speech Recognition writer rider
OCR (Optical Character Recognition)
all a1l
Spelling Correction government goverment
ChannelModel
LanguageModel
ApplicationIndependent
Dec 9, 2009
4
Speech Language Using (Abusing) Shannon’s Noisy Channel Model: Part of Speech
Tagging and Machine Translation
• Speech– Words Noisy Channel Acoustics
• OCR– Words Noisy Channel Optics
• Spelling Correction– Words Noisy Channel Typos
• Part of Speech Tagging (POS): – POS Noisy Channel Words
• Machine Translation: “Made in America”– English Noisy Channel French
Didn’t have the guts to use this slide at Eurospeech (Geneva)Dec 9, 2009
5Dec 9, 2009
6
Spelling Correction
Dec 9, 2009
7Dec 9, 2009
8Dec 9, 2009
9Dec 9, 2009
10Dec 9, 2009
11
Evaluation
Dec 9, 2009
12
Performance
Dec 9, 2009
13
The Task is Hard without Context
Dec 9, 2009
14
Easier with Context
• actuall, actual, actually– … in determining whether the defendant actually
will die.• constuming, consuming, costuming• conviced, convicted, convinced• confusin, confusing, confusion• workern, worker, workers
Dec 9, 2009
15Dec 9, 2009
Easier with Context
16
Context Model
Dec 9, 2009
17Dec 9, 2009
18Dec 9, 2009
19Dec 9, 2009
20Dec 9, 2009
21
Future Improvements
• Add More Factors– Trigrams– Thesaurus Relations– Morphology– Syntactic Agreement– Parts of Speech
• Improve Combination Rules– Shrink (Meaty Methodology)
Dec 9, 2009
22Dec 9, 2009
23
Conclusion (Spelling Correction)
• There has been a lot of interest in smoothing– Good-Turing estimation– Knesser-Ney
• Is it worth the trouble?• Ans: Yes (at least for recognition applications)
Dec 9, 2009
24Dec 9, 2009
25Dec 9, 2009
26Dec 9, 2009
27Dec 9, 2009
28Dec 9, 2009
29Dec 9, 2009
30Dec 9, 2009
31Dec 9, 2009
32Dec 9, 2009
33Dec 9, 2009
34Dec 9, 2009
35Dec 9, 2009
36Dec 9, 2009
37Dec 9, 2009
38Dec 9, 2009
39Dec 9, 2009
40Dec 9, 2009
41Dec 9, 2009
42Dec 9, 2009
43Dec 9, 2009
44Dec 9, 2009
45Dec 9, 2009
46Dec 9, 2009
47
Aligning Words
Dec 9, 2009
48Dec 9, 2009
49Dec 9, 2009
50Dec 9, 2009
51Dec 9, 2009
52Dec 9, 2009
53Dec 9, 2009
54Dec 9, 2009
55Dec 9, 2009
56Dec 9, 2009
57Dec 9, 2009
58Dec 9, 2009
59Dec 9, 2009
60Dec 9, 2009
61Dec 9, 2009
62Dec 9, 2009
63Dec 9, 2009
64Dec 9, 2009