
[Paper Introduction] Efficient Lattice Rescoring Using Recurrent Neural Network Language Models


Page 1

Efficient Lattice Rescoring using Recurrent Neural Network Language Models
X. Liu, Y. Wang, X. Chen, M. J. F. Gales & P. C. Woodland
Proc. of ICASSP 2014

Introduced by Makoto Morishita, 2016/02/25, MT Study Group

Page 2

What is a Language Model

• Language models assign a probability to each sentence.


W1 = speech recognition system

W2 = speech cognition system

W3 = speck podcast histamine

P(W1) = 4.021 × 10^-3

P(W2) = 8.932 × 10^-4

P(W3) = 2.432 × 10^-7

Page 3

What is a Language Model

• Language models assign a probability to each sentence.


W1 = speech recognition system

W2 = speech cognition system

W3 = speck podcast histamine

P(W1) = 4.021 × 10^-3 ← Best!

P(W2) = 8.932 × 10^-4

P(W3) = 2.432 × 10^-7
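To spell out how such sentence probabilities arise (a standard identity, not stated on the slides): the sentence probability is factored with the chain rule, and an n-gram model truncates each conditioning history to the last n−1 words:

$$P(W) = \prod_{i=1}^{|W|} P(w_i \mid w_1, \ldots, w_{i-1}) \approx \prod_{i=1}^{|W|} P(w_i \mid w_{i-n+1}, \ldots, w_{i-1})$$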

Page 4

In this paper…

• The authors propose two new methods for efficiently re-scoring speech recognition lattices.

[Figure: a speech recognition lattice with nodes 0-9 and word arcs such as "hi"/"high"/"hy", "this", "is", "my", "mobile", "phone"/"phones"]

Page 5

Language Models

Page 6

n-gram back-off model

[Figure: example sentence "This is my mobile phone" with word positions 1-5; "hone" and "home" are shown as confusable alternatives for "phone"]

• Use the preceding n−1 words to estimate the probability of the next word.

Page 7

n-gram back-off model

• Use the preceding n−1 words to estimate the probability of the next word (see the bi-gram sketch below).

[Figure: the same example sentence; a bi-gram model uses only the single preceding word to predict the next word]
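To make the back-off idea concrete, here is a minimal Python sketch (my own toy example, not from the paper or slides) of a bi-gram model with a crude "stupid backoff" weight in place of proper discounting; all counts are hypothetical:

```python
from collections import defaultdict

# Hypothetical toy counts, for illustration only.
unigram = defaultdict(int, {"this": 3, "is": 3, "my": 2, "phone": 1})
bigram = defaultdict(int, {("this", "is"): 2, ("is", "my"): 2})
total = sum(unigram.values())

def p_backoff(prev, word, alpha=0.4):
    """P(word | prev) with a simple stupid-backoff weight alpha."""
    if bigram[(prev, word)] > 0:
        # Seen bi-gram: maximum-likelihood estimate.
        return bigram[(prev, word)] / unigram[prev]
    # Unseen bi-gram: back off to the scaled unigram probability.
    return alpha * unigram[word] / total

print(p_backoff("this", "is"))   # seen bi-gram -> 2/3
print(p_backoff("my", "phone"))  # unseen -> backs off to the unigram
```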

Page 8

Feedforward neural network language model

• Use the preceding n−1 words as input to a feedforward neural network.

[Y. Bengio et al. 2002]

Page 9

Feedforward neural network language model

[Figure: feedforward NNLM architecture]

[Y. Bengio et al. 2002]

http://kiyukuta.github.io/2013/12/09/mlac2013_day9_recurrent_neural_network_language_model.html

Page 10

Recurrent neural network language model

• Use the full history context via a recurrent neural network.

[T. Mikolov et al. 2010]

[Figure: RNNLM architecture; the current word w_{i-1} as a one-hot vector (e.g. 0 0 1 ... 0) and the history vector s_{i-2} feed a sigmoid hidden layer that produces s_{i-1}; a softmax output layer then gives P(w_i | w_{i-1}, s_{i-2})]
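As a sanity check on the figure, a minimal numpy sketch (weight shapes and values are my assumptions, not the paper's configuration) of one RNNLM forward step:

```python
import numpy as np

V, H = 10, 4                      # vocabulary size, hidden size (toy values)
rng = np.random.default_rng(0)
U = rng.normal(size=(H, V))       # input (word) weights
W = rng.normal(size=(H, H))       # recurrent (history) weights
O = rng.normal(size=(V, H))       # output weights

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def rnnlm_step(word_id, s_prev):
    """One step: P(w_i | w_{i-1}, s_{i-2}) and the new hidden state s_{i-1}."""
    x = np.zeros(V)
    x[word_id] = 1.0                 # one-hot current word w_{i-1}
    s = sigmoid(U @ x + W @ s_prev)  # new history vector s_{i-1}
    return softmax(O @ s), s

probs, s = rnnlm_step(3, np.zeros(H))
```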

Page 11

Language Model States

Page 12

LM states

• To use an LM for the re-scoring task, we need to store LM states so that sentences can be scored efficiently.

Page 13

bi-gram

[Figure: an SR (speech recognition) lattice with nodes 0-3 and word arcs a-e, and the corresponding bi-gram LM states; each state is the lattice node plus the last word: 0:<s>, 1:a, 1:b, 2:c, 2:d, 3:e]
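To make the state bookkeeping concrete, here is a small sketch using a hypothetical encoding of this lattice (the arc placement is my reading of the figure): states are keyed by (node, last n−1 words), and running it with n=3 reproduces the growth shown on the next slides.

```python
from collections import defaultdict

# Hypothetical encoding of the figure's lattice: (from_node, to_node, word).
arcs = [(0, 1, "a"), (0, 1, "b"), (1, 2, "c"), (1, 2, "d"), (2, 3, "e")]

def lm_states(arcs, n):
    """Enumerate LM states: each state is (node, last n-1 words)."""
    histories = defaultdict(set)           # node -> set of truncated histories
    histories[0].add(("<s>",)[-(n - 1):])  # start state
    for src, dst, word in arcs:            # arcs listed in topological order
        for h in histories[src]:
            histories[dst].add((h + (word,))[-(n - 1):])
    return dict(histories)

print(lm_states(arcs, n=2))  # bi-gram:  node 2 has two states, (c,) and (d,)
print(lm_states(arcs, n=3))  # tri-gram: node 2 now has four states
```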

Page 14

tri-gram

[Figure: the same SR lattice with tri-gram LM states; each state is now the lattice node plus the last two words, e.g. 0:<s>, 1:(<s>,a), 2:(a,c), 2:(a,d), ..., so each node can split into several states]

Page 15

tri-gram

[Figure: same tri-gram lattice as the previous slide]

States become larger!

Page 16

Difference

• n-gram back-off model & feedforward NNLM: use only a fixed n-gram context.

• Recurrent NNLM: uses the whole past word sequence (history), so LM states grow rapidly and rescoring becomes computationally expensive.

We want to reduce the number of recurrent NNLM states.

Page 17

Hypothesis

Page 18

Context information gradually diminishes

• We don't have to distinguish all of the histories.

• e.g. "I am presenting the paper about RNNLM." ≒ "We are presenting the paper about RNNLM."

Page 19

Similar histories make similar vectors

• We don't have to distinguish all of the histories.

• e.g. "I am presenting the paper about RNNLM." ≒ "I am introducing the paper about RNNLM."

Page 20

Proposed Methods

Page 21

n-gram based history clustering

• "I am presenting the paper about RNNLM." ≒ "We are presenting the paper about RNNLM."

• If the most recent n-gram context is the same, we reuse the same history vector (see the sketch below).
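A minimal sketch of n-gram based history clustering, assuming an rnnlm_step recurrence like the one sketched earlier (the stand-in recurrence here is a placeholder, not the paper's model): histories that share the same last n−1 words are mapped to one cached history vector.

```python
import numpy as np

H = 4                                        # hidden size (toy value)

def rnnlm_step(word_id, s_prev):
    """Stand-in for the RNNLM step sketched earlier (placeholder recurrence)."""
    return None, np.tanh(s_prev + word_id)

state_cache = {}                             # truncated context -> history vector

def clustered_state(history, n):
    """One shared RNN history vector per truncated (n-1)-word context."""
    key = tuple(history[-(n - 1):])          # e.g. the last two words for n=3
    if key not in state_cache:               # first time we see this context:
        s = np.zeros(H)
        for w in history:                    # run the RNN over the full history
            _, s = rnnlm_step(w, s)
        state_cache[key] = s                 # cache it; later histories with the
    return state_cache[key]                  # same truncated context reuse it

v1 = clustered_state([1, 2, 3, 4], n=3)      # history "... 3 4"
v2 = clustered_state([9, 2, 3, 4], n=3)      # different past, same last 2 words
assert v1 is v2                              # -> merged into a single LM state
```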

Page 22

History vector based clustering

• "I am presenting the paper about RNNLM." ≒ "I am introducing the paper about RNNLM."

• If the history vector is similar enough to an existing state's vector, we reuse that history vector (see the sketch below).
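A corresponding sketch of history vector based clustering; the Euclidean distance and the threshold value are my assumptions, standing in for the paper's similarity criterion:

```python
import numpy as np

merged_states = []                            # representative history vectors

def merge_or_add(s, threshold=0.1):
    """Map a history vector to an existing state if one is close enough."""
    for rep in merged_states:
        # Euclidean distance as an assumed similarity measure.
        if np.linalg.norm(s - rep) < threshold:
            return rep                        # reuse the existing LM state
    merged_states.append(s)                   # otherwise create a new state
    return s

a = merge_or_add(np.array([0.50, 0.20]))
b = merge_or_add(np.array([0.52, 0.21]))      # close -> merged with a
c = merge_or_add(np.array([0.90, 0.80]))      # far   -> new state
assert b is a and c is not a
```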

Page 23

Experiments

Page 24

Experimental results

[Table: WER results for the 4-gram back-off LM (baseline), feedforward NNLM, RNNLM reranking, RNNLM n-gram based history clustering, and RNNLM history vector based clustering]


Page 26

Experimental results

[Table: same systems as the previous slide]

Comparable WER and 70% reduction in lattice size.

Page 27

Experimental results

RNNLM n-gram based history clustering vs. RNNLM history vector based clustering: same WER and 45% reduction in lattice size.

Page 28

Experimental results

RNNLM n-gram based history clustering vs. RNNLM history vector based clustering: same WER and 7% reduction in lattice size.

Page 29

Experimental results

[Table: same systems as the earlier results slides]

Comparable WER and 72.4% reduction in lattice size.

Page 30

Conclusion

Page 31

Conclusion

• The proposed methods achieve WER comparable to 10k-best re-ranking, with over 70% compression in lattice size.

• Smaller lattices reduce the computational cost!

Page 32

References

• "This too is Deep Learning in a sense: on the Recurrent Neural Network Language Model" [MLAC2013, Day 9] http://kiyukuta.github.io/2013/12/09/mlac2013_day9_recurrent_neural_network_language_model.html

Page 33

Prefix tree structuring
