Vector Quantized Neural Networks for Acoustic Unit Discovery Benjamin van Niekerk, Leanne Nortje, Herman Kamper

Vector Quantized Neural Networks for Acoustic Unit Discoverynortje+kamper... · 2021. 2. 15. · Vector Quantization Layer Encoder Codebook. Our contribution: we propose and compare


Vector Quantized Neural Networks for Acoustic Unit Discovery

Benjamin van Niekerk, Leanne Nortje, Herman Kamper


The Generative Factors of Speech

Content:
● Discrete phonetic units.
● ≅44 phonemes in English.
● HUMOUR: HH / Y / UW / M / ER

Prosody:
● Rhythm
● Intonation
● Stresses

Timbre:
● Quality of a particular voice.
● Characterized by frequency spectrum.



What is Acoustic Unit Discovery?

The goal is to learn discrete representations of speech that separate phonetic content from the other factors, all without any labels or annotations!

[Figure: Encoder]


Applications

Bootstrap training of low-resource speech systems:

Automatic speech recognition

Text-to-speech

Non-parallel voice conversion


But, how do we learn discrete representations using neural networks?

A. van den Oord, O. Vinyals, and K. Kavukcuoglu. “Neural discrete representation learning.” Advances in Neural Information Processing Systems. 2017.


Vector Quantization Layer

Encoder

Codebook
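The lookup performed by a vector quantization layer can be sketched as a nearest-neighbour search: each encoder output is replaced by the closest codebook vector, and the index of that vector is the discrete unit. The snippet below is a minimal illustration in plain Python, assuming squared Euclidean distance as in the van den Oord et al. (2017) paper cited above. The function name `quantize` and the toy values are hypothetical, and training details (such as how gradients pass through the lookup) are left out.

```python
def quantize(z, codebook):
    """Return (index, code) of the codebook vector nearest to z."""
    def sq_dist(a, b):
        # Squared Euclidean distance between two equal-length vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    index = min(range(len(codebook)), key=lambda k: sq_dist(z, codebook[k]))
    return index, codebook[index]


# Toy 2-D codebook with three entries (illustrative values only).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
# Stand-ins for encoder outputs, one vector per frame.
frames = [[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7]]

indices = [quantize(z, codebook)[0] for z in frames]
print(indices)  # [0, 1, 2]
```

The list of indices is the discrete representation of the utterance; the continuous encoder outputs are discarded after the lookup.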


Our contribution: we propose and compare two models for acoustic unit discovery in the ZeroSpeech 2020 Challenge.

1. A Vector-Quantized Variational Autoencoder (VQ-VAE).
   Inspired by: J. Chorowski, et al. “Unsupervised speech representation learning using WaveNet autoencoders.” IEEE/ACM Transactions on Audio, Speech, and Language Processing. 2019.

2. A combination of Vector-Quantization and Contrastive Predictive Coding (VQ-CPC).
   Inspired by: A. van den Oord, et al. “Representation Learning with Contrastive Predictive Coding.” 2018.

[Figure: Encoder → VQ layer → Decoder]


Vector-Quantized Variational Autoencoder

[Figure: Encoder → VQ layer → Decoder; objective: minimize reconstruction error]


Vector-Quantized Variational Autoencoder

[Figure: Encoder → VQ layer → Decoder]

Figure labels:
● Information bottleneck
● Speaker
● Powerful autoregressive model
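A rough sense of how narrow the information bottleneck is can be had from simple arithmetic. The architecture slide at the end of this deck lists a codebook of 512 entries, VQ(512), and a 50 Hz frame rate. Assuming one code index is emitted per frame, the sketch below computes the resulting upper bound on the bitrate. The function name is hypothetical, and the bound is only reached if all codes are used equally often and independently.

```python
import math


def bottleneck_bitrate(codebook_size, frame_rate_hz):
    """Upper bound, in bits per second, for one code index per frame."""
    bits_per_frame = math.log2(codebook_size)
    return bits_per_frame * frame_rate_hz


# 512 codes at 50 frames per second (values taken from the architecture slide).
print(bottleneck_bitrate(512, 50))  # 450.0
```

A few hundred bits per second leaves little room for speaker detail, which fits the slide's picture of supplying the speaker to the decoder separately.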


Vector-Quantized Contrastive Predictive Coding

[Figure: Input → Encoder → VQ layer → Context model → Predictions]


Vector-Quantized Contrastive Predictive Coding

[Figure: Context vector; Positive example; Negative examples]
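The roles of the context vector, positive example and negative examples can be illustrated with the contrastive (InfoNCE) loss from the Contrastive Predictive Coding paper cited earlier: a prediction derived from the context should score higher against the true future code than against the negatives. The sketch below is a simplified, assumption-laden version. It uses a plain dot product as the score and takes `prediction` as already computed from the context vector; all names and values are hypothetical, and it is not the authors' implementation.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(prediction, positive, negatives):
    """Cross-entropy of selecting the positive among positive + negatives."""
    scores = [dot(prediction, positive)] + [dot(prediction, n) for n in negatives]
    # Numerically stable log-sum-exp over all candidate scores.
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum - scores[0]


prediction = [1.0, 0.0]                   # derived from the context vector
positive = [1.0, 0.0]                     # code that actually follows
negatives = [[0.0, 1.0], [-1.0, 0.0]]     # codes drawn from elsewhere

loss_good = info_nce(prediction, positive, negatives)
loss_bad = info_nce(prediction, negatives[1], [positive, negatives[0]])
print(loss_good < loss_bad)  # True
```

The loss is small when the prediction aligns with the positive example and grows when a negative example scores higher.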


Evaluation - Voice Conversion

[Figure: Encoder → VQ layer → Decoder]

Evaluation Metrics:
● Speaker similarity (1-5 scale).
● Intelligibility (character error rate).
● Mean opinion score (1-5 scale).
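Of the three metrics, character error rate is the one that is computed rather than rated by listeners. A common definition is the edit (Levenshtein) distance between a reference transcript and a transcript of the converted speech, divided by the reference length. The sketch below assumes that definition; how the challenge obtains its transcripts is not stated on the slide, and the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


# One deleted character out of six reference characters.
cer = character_error_rate("humour", "humor")
print(round(cer, 3))
```

Lower is better: a value of 0 means the transcript of the converted speech matches the reference exactly.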


Evaluation - Voice Conversion

[Slide labels: Source; Converted; Target; Other Conversion]


Evaluation - ABX Score

Triphone A: beg
Triphone B: bag
Triphone X: beg

[Figure: each triphone is passed through the Encoder]
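The ABX test asks whether the encoding of X is closer to the encoding of A (the same triphone) than to that of B (a different triphone). The sketch below illustrates the decision and the resulting error rate on toy data. It assumes each triphone has been reduced to a single fixed-length vector and uses Euclidean distance; real ABX evaluations compare frame sequences, typically with an alignment step, so this is a simplification. All names and values are hypothetical.

```python
def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def abx_correct(a, b, x):
    """True if X (same category as A) is closer to A than to B."""
    return distance(a, x) < distance(b, x)


def abx_error_rate(triples):
    errors = sum(0 if abx_correct(a, b, x) else 1 for a, b, x in triples)
    return errors / len(triples)


beg_1 = [1.0, 0.0]   # toy encoding of A: "beg"
bag = [0.0, 1.0]     # toy encoding of B: "bag"
beg_2 = [0.9, 0.2]   # X: another "beg", encoded close to A
beg_3 = [0.2, 0.9]   # X: another "beg", wrongly encoded close to B

print(abx_correct(beg_1, bag, beg_2))  # True
print(abx_error_rate([(beg_1, bag, beg_2), (beg_1, bag, beg_3)]))  # 0.5
```

A lower ABX error rate means the representation keeps different triphones apart while keeping repetitions of the same triphone together.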


Evaluation - ABX Score

Triphone A: bug
Triphone B: bag
Triphone X: beg

[Figure: each triphone is passed through the Encoder]


Questions?


Vector Quantized Variational Autoencoder

Input: log-Mel spec

Encoder:
● conv3(768), batchnorm, ReLU
● conv3(768), batchnorm, ReLU
● conv4 stride2(768), batchnorm, ReLU
● conv3(768), batchnorm, ReLU
● conv3(768), batchnorm, ReLU

Bottleneck:
● linear(64)
● VQ(512)

Frame rate: 100 Hz → 50 Hz

Decoder:
● jitter(0.5)
● embedding
● concat (speaker embedding)
● upsample
● biGRU(128), biGRU(128)
● upsample
● GRU(896)
● linear(256), ReLU
● linear(256), ReLU
● softmax, sample, mu-law
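The decoder diagram mentions mu-law alongside a softmax and a sampling step. Mu-law companding maps waveform samples to a small set of integer classes that a softmax can predict. The sketch below shows the standard companding formula, assuming 8-bit mu-law (mu = 255, giving 256 classes). That value is an assumption prompted by the linear(256) layers on the slide, not something the slide states, and the function names are hypothetical.

```python
import math

MU = 255  # assumption: 8-bit mu-law, i.e. 256 classes


def mu_law_encode(x, mu=MU):
    """Map a sample in [-1, 1] to an integer class in [0, mu]."""
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return int(math.floor((compressed + 1) / 2 * mu + 0.5))


def mu_law_decode(index, mu=MU):
    """Map an integer class in [0, mu] back to a sample in [-1, 1]."""
    compressed = 2 * index / mu - 1
    magnitude = math.expm1(abs(compressed) * math.log1p(mu)) / mu
    return math.copysign(magnitude, compressed)


for sample in (-1.0, -0.5, 0.01, 0.3, 1.0):
    index = mu_law_encode(sample)
    print(sample, index, round(mu_law_decode(index), 4))
```

Decoding a sampled class index gives back an approximate waveform value; the quantization is finer near zero, where speech samples are concentrated.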