
Large Vocabulary Unconstrained Handwriting Recognition

J Subrahmonia

Pen Technologies

IBM T J Watson Research Center

Pen Technologies

Pen-based interfaces in mobile computing

Mathematical Formulation

H : Handwriting evidence on the basis of which a recognizer will make its decision: H = {h_1, h_2, h_3, ..., h_m}

W : Word string from a large vocabulary: W = {w_1, w_2, w_3, ..., w_n}

Recognizer:

\hat{W} = \arg\max_W p(W \mid H)

Mathematical Formulation

\hat{W} = \arg\max_W p(W \mid H)
        = \arg\max_W \frac{p(H \mid W)\, p(W)}{p(H)}
        = \arg\max_W p(H \mid W)\, p(W)

p(W) is the SOURCE model and p(H \mid W) is the CHANNEL model; p(H) does not depend on W and drops out of the maximization.

Source Channel Model

W -> [ CHANNEL : WRITER -> DIGITIZER -> FEATURE EXTRACTOR ] -> H -> DECODER -> \hat{W}

Source Channel Model

\hat{W} = \arg\max_W p(W \mid H) = \arg\max_W p(H \mid W)\, p(W)

- p(H \mid W) : Handwriting modeling (HMMs)
- p(W) : Language modeling
- \arg\max_W : Search strategy

Hidden Markov Models

Memoryless Model + Add Memory -> Markov Model
Memoryless Model + Hide Something -> Mixture Model
Markov Model + Hide Something -> Hidden Markov Model
Mixture Model + Add Memory -> Hidden Markov Model

Alan B. Poritz, "Hidden Markov Models: A Guided Tour," ICASSP 1988

Memoryless Model

COIN : Heads (1) with probability p, Tails (0) with probability 1-p

Flip the coin 10 times (IID random sequence)

Sequence : 1 0 1 0 0 0 1 1 1 1

Probability = p*(1-p)*p*(1-p)*(1-p)*(1-p)*p*p*p*p = p^6 (1-p)^4
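A minimal Python sketch of the memoryless computation above (the function name and the value p = 0.5 in the example call are illustrative, not from the slides):

```python
# Minimal sketch: probability of a 0/1 sequence under a memoryless (IID) coin.
def memoryless_prob(sequence, p_head):
    prob = 1.0
    for outcome in sequence:
        # Each flip is independent: multiply in p for a head, 1-p for a tail.
        prob *= p_head if outcome == 1 else (1.0 - p_head)
    return prob

seq = [1, 0, 1, 0, 0, 0, 1, 1, 1, 1]          # the slide's example sequence
print(memoryless_prob(seq, p_head=0.5))       # p^6 (1-p)^4 with p = 0.5
```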

Add Memory : Markov Model

2 Coins : COIN 1 => p(1) = 0.9, p(0) = 0.1 ; COIN 2 => p(1) = 0.1, p(0) = 0.9

Experiment :
  Flip COIN 1, note the outcome
  If (outcome = Head) flip COIN 1
  Else flip COIN 2
  End

Sequence 1100 : Probability = 0.9*0.9*0.1*0.9
Sequence 1010 : Probability = 0.9*0.1*0.1*0.1
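A matching sketch for the two-coin Markov model (the function name is illustrative; the coin probabilities are the slide's):

```python
# Minimal sketch: the state is simply which coin is flipped next,
# and that is determined by the previous outcome.
P_HEAD = {1: 0.9, 2: 0.1}                      # p(1) for COIN 1 and COIN 2

def markov_prob(sequence):
    coin, prob = 1, 1.0                        # the experiment starts with COIN 1
    for outcome in sequence:
        prob *= P_HEAD[coin] if outcome == 1 else 1.0 - P_HEAD[coin]
        coin = 1 if outcome == 1 else 2        # head -> flip COIN 1, tail -> COIN 2
    return prob

print(markov_prob([1, 1, 0, 0]))               # 0.9*0.9*0.1*0.9
print(markov_prob([1, 0, 1, 0]))               # 0.9*0.1*0.1*0.1
```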

State Sequence Representation

Two states, 1 and 2, with each arc labeled output : probability :

  state 1 -> state 1 on output 1 : 0.9     state 1 -> state 2 on output 0 : 0.1
  state 2 -> state 1 on output 1 : 0.1     state 2 -> state 2 on output 0 : 0.9

Observed output sequence <=> unique state sequence

Hide the states => Hidden Markov Model

States s1 and s2; the arcs now carry only transition probabilities and the outputs come from separate output distributions, so the state sequence can no longer be read off the outputs:

  transitions : s1 -> s1 : 0.9, s1 -> s2 : 0.1, s2 -> s1 : 0.1, s2 -> s2 : 0.9
  outputs : from s1, 1 : 0.9 and 0 : 0.1 ; from s2, 1 : 0.1 and 0 : 0.9

The same output sequence can now be produced by many different state sequences.
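Since the states are hidden, p(H) is a sum over all state sequences that could have produced H. A small brute-force sketch of p(H) = \sum_S p(H, S) for a two-state HMM built from the 0.9/0.1 numbers above; attaching the output distributions to the states and starting in s1 are illustrative assumptions, not read from the slide:

```python
# Brute-force p(H): sum the joint probability over every hidden state sequence.
from itertools import product

TRANS = {("s1", "s1"): 0.9, ("s1", "s2"): 0.1,
         ("s2", "s1"): 0.1, ("s2", "s2"): 0.9}
EMIT = {"s1": {1: 0.9, 0: 0.1}, "s2": {1: 0.1, 0: 0.9}}   # per-state output distributions

def brute_force_prob(outputs, start="s1"):
    total = 0.0
    for states in product(["s1", "s2"], repeat=len(outputs)):
        prob, prev = 1.0, start
        for state, out in zip(states, outputs):
            prob *= TRANS[(prev, state)] * EMIT[state][out]
            prev = state
        total += prob                          # p(H) = sum over S of p(H, S)
    return total

print(brute_force_prob([1, 1, 0, 0]))
```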

Why use Hidden Markov Models Instead of Non-hidden?

Hidden Markov Models can be smaller : fewer parameters to estimate

States may be truly hidden:
- position of the hand
- positions of the articulators

Summary of HMM Basics

We are interested in assigning probabilities p(H) to feature sequences.

Memoryless model: the model has no memory of the past:

  p(H) = \prod_{i=1}^{m} p(h_i)

Markov noticed that in some sequences the future depends on the past. He introduced the concept of a STATE, an equivalence class of the past that influences the future:

  p(h_i \mid h_1, ..., h_{i-1}) = p(h_i \mid s_i)

Hide the states : HMM:

  p(H) = \sum_{S} p(H, S)

Hidden Markov Models

Given an observed sequence H:
- Compute p(H) for decoding
- Find the most likely state sequence for a given Markov model (Viterbi algorithm)
- Estimate the parameters of the Markov source (training)

Compute p(H)

Example model with three states s1, s2, s3 (s1 initial, s3 final). Each arc carries a transition probability and, for emitting arcs, output probabilities p(a), p(b):

  s1 -> s1 : 0.5, p(a) = 0.8, p(b) = 0.2
  s1 -> s2 : 0.3, p(a) = 0.7, p(b) = 0.3
  s1 -> s2 : 0.2  (null arc, no output)
  s2 -> s2 : 0.4, p(a) = 0.5, p(b) = 0.5
  s2 -> s3 : 0.5, p(a) = 0.3, p(b) = 0.7
  s2 -> s3 : 0.1  (null arc, no output)

Compute p(H) – contd.

Compute p(H) where H = a a b b. Enumerate all ways of producing h1 = a:

  s1 -> s1 (a) : 0.5 x 0.8 = 0.40
  s1 -> s2 (a) : 0.3 x 0.7 = 0.21
  s1 -> s2 (null), s2 -> s2 (a) : 0.2 x 0.4 x 0.5 = 0.04
  s1 -> s2 (null), s2 -> s3 (a) : 0.2 x 0.5 x 0.3 = 0.03

Compute p(H) – contd.

Enumerate all ways of producing h1 = a, h2 = a: each partial path above is extended with every arc (emitting, or null arc followed by an emitting arc) that can produce the second a, using the same arc probabilities, so the number of paths grows with every symbol.

[Figure: tree of partial paths for h1 = a, h2 = a]

Compute p(H)

Can save computation by combining paths that reach the same state after the same prefix of H.

[Figure: the enumerated partial paths merged state by state]

Compute p(H)

Trellis diagram: states s1, s2, s3 against the prefixes 0, a, aa, aab, aabb. Each arc carries its transition probability times the probability of the symbol consumed in that column:

  s1 -> s1 : .5x.8, .5x.8, .5x.2, .5x.2
  s1 -> s2 : .3x.7, .3x.7, .3x.3, .3x.3
  s2 -> s2 : .4x.5, .4x.5, .4x.5, .4x.5
  s2 -> s3 : .5x.3, .5x.3, .5x.7, .5x.7
  s1 -> s2 (null) : .2 in every column
  s2 -> s3 (null) : .1 in every column

Basic Recursion

Prob(node) = sum over predecessors of Prob(predecessor) x Prob(predecessor -> node)
Boundary condition : Prob(s1, 0) = 1

Forward probabilities for H = a a b b:

        0        a        aa       aab      aabb
  s1    1.0      0.4      .16      .016     .0016
  s2    0.2      0.33     .182     .054     .01256
  s3    0.02     0.063    .0677    .0691    .020156

(e.g. the s2 entry in column a collects s1 -> s2 on a : .21, s2 -> s2 on a : .04, and s1 -> s2 null : .08, which sum to 0.33)

More Formally – Forward Algorithm

  \alpha_{t+1}(s) = \sum_{s'} \alpha_t(s')\, P(s \mid s')\, P(h_t \mid s' \to s)   over emitting arcs s' -> s
                  + \sum_{s'} \alpha_{t+1}(s')\, P(s \mid s')                      over null arcs s' -> s

with \alpha_1(s) = 1 for the initial state and 0 otherwise (extended along null arcs), so that \alpha_t(s) is the probability of producing h_1, ..., h_{t-1} and reaching state s.
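A sketch of this forward pass in Python on the three-state example, with the arc parameters as read off the trellis above (the data structures and function name are illustrative). Null arcs are handled by an extra propagation step within each column:

```python
# Emitting arcs: (source, destination, transition prob, output distribution).
EMIT_ARCS = [
    ("s1", "s1", 0.5, {"a": 0.8, "b": 0.2}),
    ("s1", "s2", 0.3, {"a": 0.7, "b": 0.3}),
    ("s2", "s2", 0.4, {"a": 0.5, "b": 0.5}),
    ("s2", "s3", 0.5, {"a": 0.3, "b": 0.7}),
]
# Null arcs consume no output symbol; they are listed in topological order.
NULL_ARCS = [("s1", "s2", 0.2), ("s2", "s3", 0.1)]
STATES = ["s1", "s2", "s3"]

def forward(outputs, initial="s1"):
    """Return the trellis columns alpha[t][s] = Prob(h_1..h_t, state s)."""
    alpha = {s: 0.0 for s in STATES}
    alpha[initial] = 1.0
    for src, dst, p in NULL_ARCS:              # null arcs within column 0
        alpha[dst] += alpha[src] * p
    columns = [dict(alpha)]
    for h in outputs:
        new = {s: 0.0 for s in STATES}
        for src, dst, p, out in EMIT_ARCS:     # emitting arcs consume symbol h
            new[dst] += columns[-1][src] * p * out.get(h, 0.0)
        for src, dst, p in NULL_ARCS:          # then null arcs within the column
            new[dst] += new[src] * p
        columns.append(new)
    return columns

cols = forward("aabb")
for prefix, col in zip(["0", "a", "aa", "aab", "aabb"], cols):
    print(prefix, {s: round(v, 6) for s, v in col.items()})
print("p(aabb) =", cols[-1]["s3"])             # 0.020156, the trellis value
```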

Find Most Likely Path for aabb : Dynamic Programming or Viterbi

Max Prob(node) = max over predecessors of ( Max Prob(predecessor) x Prob(predecessor -> node) )

Best-path probabilities for H = a a b b:

        0        a        aa       aab      aabb
  s1    1.0      0.4      .16      .016     .0016
  s2    0.2      0.21     .084     .0168    .00336
  s3    0.02     0.03     .0315    .0294    .00588

(e.g. the s2 entry in column a is the maximum of s1 -> s2 on a : .21, s2 -> s2 on a : .04, and s1 -> s2 null : .08)
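A sketch of the corresponding Viterbi pass: the sum is replaced by a max, and back-pointers are kept so the best path can be recovered. The arc parameters are repeated from the forward sketch so this block is self-contained (names are illustrative):

```python
EMIT_ARCS = [
    ("s1", "s1", 0.5, {"a": 0.8, "b": 0.2}),
    ("s1", "s2", 0.3, {"a": 0.7, "b": 0.3}),
    ("s2", "s2", 0.4, {"a": 0.5, "b": 0.5}),
    ("s2", "s3", 0.5, {"a": 0.3, "b": 0.7}),
]
NULL_ARCS = [("s1", "s2", 0.2), ("s2", "s3", 0.1)]
STATES = ["s1", "s2", "s3"]

def viterbi(outputs, initial="s1", final="s3"):
    """Best-path probability and the path itself as (src, dst, symbol) arcs."""
    best = {s: (0.0, None) for s in STATES}    # state -> (score, back-pointer)
    best[initial] = (1.0, None)
    for src, dst, p in NULL_ARCS:              # null arcs in column 0
        cand = best[src][0] * p
        if cand > best[dst][0]:
            best[dst] = (cand, (0, src, None))
    trellis = [best]
    for t, h in enumerate(outputs, start=1):
        new = {s: (0.0, None) for s in STATES}
        for src, dst, p, out in EMIT_ARCS:     # emitting arcs consume symbol h
            cand = trellis[t - 1][src][0] * p * out.get(h, 0.0)
            if cand > new[dst][0]:
                new[dst] = (cand, (t - 1, src, h))
        for src, dst, p in NULL_ARCS:          # null arcs within column t
            cand = new[src][0] * p
            if cand > new[dst][0]:
                new[dst] = (cand, (t, src, None))
        trellis.append(new)
    path, t, s = [], len(outputs), final       # trace the back-pointers
    while trellis[t][s][1] is not None:
        prev_t, prev_s, h = trellis[t][s][1]
        path.append((prev_s, s, h))
        t, s = prev_t, prev_s
    return trellis[len(outputs)][final][0], list(reversed(path))

score, path = viterbi("aabb")
print(score)   # best-path probability (0.00588 for this model)
print(path)    # arcs of the best path, e.g. s1 -a-> s1 -a-> s2 -b-> s2 -b-> s3
```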

Training HMM parameters

Start from a uniform initialization: transition probabilities 1/3, 1/3, 1/3 and 1/2, 1/2, and output probabilities p(a) = p(b) = 1/2 on every emitting transition. For H = a b a a there are 7 possible paths, with probabilities

  .000385   .000578   .000868   .001302   .001157   .002604   .001736

p(H) = .008632  (the sum of the path probabilities)

Training HMM parameters

Label the transitions of the model t1, ..., t5. For each path i, let

  c_i = a posteriori probability of path i = p_i / p(H)

  c1 = .045  c2 = .067  c3 = .134  c4 = .100  c5 = .201  c6 = .150  c7 = .301

Each path contributes its posterior probability once for every time it uses a transition, giving the counts

  c(t1) = 3c1 + 2c2 + 2c3 + c4 + c5 = 0.838
  c(t2) = c3 + c5 + c7 = 0.637
  c(t3) = c1 + c2 + c4 + c6 = 0.363

New p(t1) = 0.46,  p(t2) = 0.34,  p(t3) = 0.20

Training HMM parameters

The output probabilities are re-estimated the same way, e.g. for transition t1:

  c(t1, a) = 2c1 + c2 + c3 + c4 + c5 = 0.592
  c(t1, b) = c1 + c2 + c3 = 0.246

New p(a | t1) = 0.71,  p(b | t1) = 0.29   (and similarly for the other emitting transitions t2, t4, t5)

Training HMM parameters

After one re-estimation the model parameters become (transition probabilities .46, .34, .20 and .60, .40; output probability pairs .71/.29, .68/.32, .64/.36, .60/.40 on the emitting transitions) and the path probabilities become

  p1 = 0.00108  p2 = 0.00129  p3 = 0.00404  p4 = 0.00212  p5 = 0.00537  p6 = 0.00253  p7 = 0.00791

  p(H) = 0.02438  (up from 0.008632)

Keep on repeating : after 600 iterations p(H) = .037037037
Another initial parameter set : p(H) = 0.0625

Training HMM parameters

- Converges to a local maximum
- There are (at least) 7 local maxima
- The final solution depends on the starting point
- The speed of convergence depends on the starting point

Training HMM parameters : Forward Backward algorithm

Improves on the path-enumeration algorithm by using the trellis

Reduces the computation from exponential to linear in the length of the output sequence

Forward Backward Algorithm

[Figure: trellis with the forward probability \alpha_j(s_a) accumulated over the columns up to position j and the backward probability \beta_j(s_b) accumulated over the columns after position j]

Forward Backward Algorithm

p(t_i at position j, H) = probability that h_j is produced by transition t_i and the complete output is H

  = \alpha_j(s_a) \cdot P(t_i) \cdot P(h_j \mid t_i) \cdot \beta_j(s_b)    (for an arc t_i from s_a to s_b)

\alpha_j(s_a) = probability of being in state s_a and producing the output h_1, ..., h_{j-1}

\beta_j(s_b) = probability of being in state s_b and producing the output h_{j+1}, ..., h_m

Forward Backward Algorithm

Transition count:

  C(t_i \mid H) = \sum_j p(t_i at position j, H) \,/\, p(H)

Backward recursion (mirror of the forward pass):

  \beta_t(s) = \sum_{s'} \beta_{t+1}(s')\, P(s' \mid s)\, P(h_{t+1} \mid s \to s')   over emitting arcs s -> s'
             + \sum_{s'} \beta_t(s')\, P(s' \mid s)                                  over null arcs s -> s'

Training HMM parameters

- Guess initial values for all parameters
- Compute forward and backward pass probabilities
- Compute counts
- Re-estimate probabilities

This procedure goes under the names BAUM-WELCH, BAUM-EAGON, FORWARD-BACKWARD, and E-M.
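A compact sketch of one such iteration for a simple state-emitting discrete HMM without null arcs (a simplification of the slides' transition-emitting model; the function names and the initial numbers in the usage example are illustrative). Repeating the step never decreases p(H), which is the behaviour the slides describe:

```python
# Baum-Welch (forward-backward / E-M) for a small state-emitting discrete HMM.
def forward(obs, pi, A, B):
    """alpha[t][i] = p(o_1..o_t, state i at time t)."""
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[j] * A[j][i] for j in range(N)) * B[i][o]
                      for i in range(N)])
    return alpha

def backward(obs, A, B):
    """beta[t][i] = p(o_{t+1}..o_T | state i at time t)."""
    N = len(A)
    beta = [[1.0] * N]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(N))
                        for i in range(N)])
    return beta

def baum_welch_step(obs, pi, A, B):
    """One E-M iteration: collect posterior counts, re-estimate, return p(H)."""
    N, T = len(pi), len(obs)
    alpha, beta = forward(obs, pi, A, B), backward(obs, A, B)
    p_H = sum(alpha[-1])
    # gamma[t][i]: posterior probability of being in state i at time t.
    gamma = [[alpha[t][i] * beta[t][i] / p_H for i in range(N)] for t in range(T)]
    # xi[t][i][j]: posterior probability of using transition i -> j at time t.
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / p_H
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(N)] for i in range(N)]
    symbols = sorted(set(obs))
    new_B = [{k: sum(g[i] for g, o in zip(gamma, obs) if o == k) /
                 sum(g[i] for g in gamma) for k in symbols} for i in range(N)]
    return new_pi, new_A, new_B, p_H

# Illustrative two-state model trained on H = "abaa" (numbers are made up).
pi = [1.0, 0.0]
A = [[0.6, 0.4], [0.3, 0.7]]
B = [{"a": 0.7, "b": 0.3}, {"a": 0.4, "b": 0.6}]
for _ in range(20):
    pi, A, B, p_H = baum_welch_step("abaa", pi, A, B)
    print(round(p_H, 6))                      # p(H) never decreases
```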
