A Variational Parametric Model for Audio Synthesis
Krishna Subramani, supervised by Prof. Preeti Rao
Electrical Engineering, Indian Institute of Technology Bombay, India
Audio Synthesis

- What comes to your mind when you hear 'Audio Synthesis'?

  Figure: One of the early Moog modular synthesizers.

- More generally, it involves specifying controlling parameters to a synthesizer to obtain an audio output.
- What are the main parameters which govern the audio generation (at a high level)?
  1. Timbre
  2. Pitch
  3. Loudness
- Early analog synthesizers used voltage-controlled oscillators, filters, and amplifiers to generate the waveform, and 'envelope generators' to shape it.
- Data-driven statistical modeling + computing power ⇒ Deep Learning for audio synthesis.
Generative Models for Audio Synthesis

- Rely on the ability of algorithms to extract musically relevant information from vast amounts of data.
- Autoregressive modeling, Generative Adversarial Networks, and Variational Autoencoders are some of the proposed generative modeling methods.
- Most methods in the literature try to model audio signals directly in the time or frequency domain.
Our Nearest Neighbours

- [Sarroff and Casey 2014] were the first to use autoencoders to perform frame-wise reconstruction of short-time magnitude spectra.
  ✓ Learn perceptually relevant lower-dimensional representations ('latent space') of audio to help in synthesis.
  ✗ 'Graininess' in the reconstructed sound.
- [Roche et al 2018] extended this analysis to try out different autoencoder architectures.
  ✓ Helped by the release of NSynth [Engel et al 2017].
  ✓ Usability of the latent space in audio interpolation.
- [Esling et al 2018] regularized the VAE latent space in order to effect control over the perceptual timbre of synthesized instruments.
  ✓ Enabled them to 'generate' and 'interpolate' audio in this space.
Our Nearest Neighbours

- A major issue with the previous methods was frame-wise analysis-synthesis based reconstruction, which fails to model temporal evolution.
- [Engel et al 2017], inspired by WaveNet's [Oord et al 2016] autoregressive modeling capabilities for speech, extended it to musical instrument synthesis.
  ✓ Generation of realistic and creative sounds.
  ✗ Complex network that was difficult to train and sample from.
- [Wyse 2018] also modelled the audio autoregressively, albeit by conditioning the waveform samples on additional parameters like pitch, velocity (loudness), and instrument class.
  ✓ Achieved better control over generation by conditioning.
  ✗ Unable to generalize to untrained pitches.
Why Parametric?

- Rather than generating new timbres ('interpolating' across sounds), we consider the problem of synthesis of a given instrument's sound with flexible control over the pitch and loudness dynamics.
- Pitch shifting without timbre modification uses a source-filter model, with the filter (spectral envelope) being kept constant [Roebel and Rodet 2005].
- A powerful parametric representation, over raw waveform or spectrogram, has the potential to achieve high quality with less training data.
  1. [Blaauw and Bonada 2016] recognized this in the context of speech synthesis and used a vocoder representation to train a generative model, achieving promising results along the way.
Dataset

- Good-sounds dataset [Romani Picas et al 2015], consisting of individual note and scale recordings for 12 different instruments.
- We work with the 'Violin', played in mezzo-forte loudness, and choose the 4th octave (MIDI 60-71).
- Why we chose the violin: popular in Indian music, human voice-like timbre, and the ability to produce continuous pitch.

Figure: Instances per note in the overall dataset (26-34 instances per MIDI note across 60-71).

- The average note duration for the chosen octave is about 4.5 s per note.
Dataset

- We split the data into train (80%) and test (20%) instances across MIDI note labels.
- Our model is trained with frames (duration 21.3 ms) from the train instances, and system performance is evaluated with frames from the test instances.
Non-Parametric Reconstruction

- The frame-wise magnitude spectra reconstruction procedure in [Roche et al 2018] cannot generalize to untrained pitches.
- We consider two cases: training including and excluding MIDI 63, along with its 3 neighbouring pitches on either side.

[Diagram: sustain segment → FFT → Encoder → latent Z → Decoder (AE)]

- Repeat the above across all input spectral frames and invert the obtained spectrogram using Griffin-Lim [Griffin and Lim 1984], as sketched below.
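To make the inversion step concrete, here is a minimal Python sketch using librosa (the file name and STFT parameters are illustrative assumptions, not the exact settings used in this work):

```python
import numpy as np
import librosa

# Load a note recording (hypothetical file) and compute magnitude spectra.
y, fs = librosa.load('violin_midi63.wav', sr=None)
S = np.abs(librosa.stft(y, n_fft=1024, hop_length=256))

# ... each column (frame) of S would be passed through the trained AE ...
S_hat = S  # placeholder for the frame-wise reconstructed spectrogram

# Invert the (phaseless) magnitude spectrogram back to a waveform.
y_hat = librosa.griffinlim(S_hat, n_iter=32, hop_length=256, n_fft=1024)
```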
Non-Parametric Reconstruction

Figure: Input MIDI 63 [audio example 1]
Figure: Including MIDI 63 [audio example 2]
Figure: Excluding MIDI 63 [audio example 3]
Parametric Model

1. Frame-wise magnitude spectrum → harmonic representation using the Harmonic plus Residual (HpR) model [Serra et al 1997] (currently we neglect the residual).

[Diagram: sustain segment → FFT → HpR → f0 + harmonic magnitudes]

- Output of the HpR block ⇒ log (dB) magnitudes + harmonics; a rough sketch of this step follows.
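As a rough illustration of this step, the sketch below simply samples the magnitude spectrum at multiples of f0; the full HpR model of [Serra et al 1997] performs proper sinusoidal peak detection and residual subtraction, and all parameter values here are assumptions:

```python
import numpy as np

def harmonic_log_magnitudes(mag_frame, f0, fs=48000, n_fft=1024, n_harm=40):
    """Sample one frame's magnitude spectrum at harmonic multiples of f0
    and return their log (dB) magnitudes."""
    freqs = np.arange(1, n_harm + 1) * f0
    freqs = freqs[freqs < fs / 2]                    # keep harmonics below Nyquist
    bins = np.round(freqs * n_fft / fs).astype(int)  # nearest FFT bin per harmonic
    return 20 * np.log10(mag_frame[bins] + 1e-12)    # dB magnitudes
```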
Parametric Model

2. log (dB) magnitudes + harmonics → TAE algorithm [Roebel and Rodet 2005, IMAI 1979].
- TAE ⇒ iterative cepstral liftering:

      A_0(k) = log|X(k)|,  V_0 = -∞
      while (A_i - A_0 < Δ):
          A_i(k) = max(A_{i-1}(k), V_{i-1}(k))
          V_i = FFT(lifter(A_i))

[Diagram: f0 + magnitudes → TAE → CCs + f0]
Parametric Model

- No open-source implementation was available for the TAE, so we implemented it following the procedure highlighted in [Roebel and Rodet 2005, Caetano and Rodet 2012]; a minimal sketch follows below.
- The figure shows a TAE snapshot from [Caetano and Rodet 2012], similar to the results we get [audio examples 4, 5].
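A minimal numpy sketch of the iteration above, under our assumptions (the stopping threshold, iteration cap, and symmetric-lifter layout are choices for illustration, not necessarily those of [Roebel and Rodet 2005]):

```python
import numpy as np

def true_amplitude_envelope(A0, k_cc, delta=2.0, max_iter=200):
    """A0: log-magnitude spectrum (dB) on the full, symmetric FFT grid.
    k_cc: lifter cutoff (number of cepstral coefficients kept).
    Returns the smooth spectral envelope V."""
    N = len(A0)
    A = A0.copy()
    V = np.full(N, -np.inf)
    for _ in range(max_iter):
        A = np.maximum(A, V)            # A_i(k) = max(A_{i-1}(k), V_{i-1}(k))
        c = np.fft.ifft(A).real         # cepstrum of the updated curve
        c[k_cc + 1:N - k_cc] = 0.0      # lifter: zero the high quefrencies (symmetric)
        V = np.fft.fft(c).real          # V_i = FFT(lifter(A_i))
        if np.max(A0 - V) < delta:      # stop once the envelope covers the spectrum
            break
    return V
```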
Parametric Model

[Figure: TAE spectral envelopes, magnitude (dB) vs. frequency (kHz), for MIDI 60, 65, and 71]

- The spectral envelope shape varies across pitch:
  1. Dependence of the envelope on pitch [Slawson 1981, Caetano and Rodet 2012].
  2. Variation due to the TAE algorithm.
- TAE → a smooth function to estimate harmonic amplitudes.
Parametric Model

- The number of CCs (cepstral coefficients, K_cc) depends on the sampling rate (F_s) and pitch (f_0):

      K_cc ≤ F_s / (2 f_0)

- For our network, we choose the maximum K_cc (at the lowest pitch) = 91 and zero-pad the higher-pitch CC vectors to 91 dimensions (see the sketch below).
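A small sketch of this bookkeeping (F_s = 48 kHz is an assumption, consistent with K_cc = 91 at MIDI 60):

```python
import numpy as np

FS = 48000  # assumed sampling rate

def midi_to_hz(midi):
    return 440.0 * 2.0 ** ((midi - 69) / 12.0)

def cc_order(f0, fs=FS):
    return int(fs / (2.0 * f0))          # K_cc <= F_s / (2 f_0)

K_MAX = cc_order(midi_to_hz(60))         # lowest pitch in the octave -> 91

def pad_ccs(cc):
    """Zero-pad a (shorter) higher-pitch CC vector to the fixed K_MAX length."""
    cc = np.asarray(cc, dtype=float)
    return np.pad(cc, (0, K_MAX - len(cc)))
```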
Generative Models

- Autoencoders [Hinton and Salakhutdinov 2006]: minimize the MSE (mean squared error) between the input and the network's reconstruction.
  • Simple to train, and perform good reconstruction (since they minimize MSE).
  • Not truly a generative model, as you cannot generate new data.
- Variational Autoencoders [Kingma and Welling 2013]: inspired by variational inference, they enforce a prior on the latent space.
  • Truly generative, as we can 'generate' new data by sampling from the prior.
- Conditional Variational Autoencoders [Doersch 2016, Sohn et al 2015]: same principle as a VAE, but they learn the distribution conditioned on an additional conditioning variable.
Generative Models

Why VAE over AE?
- A continuous latent space from which we can sample points (and synthesize the corresponding audio).

Why CVAE over VAE?
- Conditioning on pitch ⇒ the network captures dependencies between timbre and pitch ⇒ more accurate envelope generation + pitch control.
Network Architecture

[Diagram: CCs + f0 → Encoder → latent Z → Decoder (CVAE)]

- The network input is CCs → the MSE represents a perceptually relevant distance, in terms of the squared error between the input and reconstructed log-magnitude spectral envelopes.
- We train the network on frames from the train instances.
- For evaluation, the MSE values reported ahead are the average reconstruction error across all test-instance frames.
Network Architecture

- Main hyperparameters:
  1. β controls the relative weighting between reconstruction and prior enforcement:

         L ∝ MSE + β · KLD

[Figure: MSE vs. latent space dimensionality for the CVAE, with β = 0.01, 0.1, 1]

- There is a tradeoff between the two terms; we choose β = 0.1. A sketch of this objective follows.
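In PyTorch terms (an assumed framework choice), the weighted objective could be sketched as:

```python
import torch
import torch.nn.functional as F

def cvae_loss(x_hat, x, mu, logvar, beta=0.1):
    """L ∝ MSE + β·KLD: reconstruction error plus weighted prior enforcement."""
    mse = F.mse_loss(x_hat, x)
    # KL divergence between N(mu, sigma^2) and the standard normal prior
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return mse + beta * kld
```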
Network Architecture

- Main hyperparameters:
  2. Dimensionality of the latent space governs the network's reconstruction ability.

[Figure: MSE vs. latent space dimensionality, CVAE (β = 0.1) vs. AE]

- Steep fall initially, flatter later; we choose dimensionality = 32.
Network Architecture

- Network size: [91, 91, 32, 91, 91].
  - 91 is the dimension of the input CCs; 32 is the latent space dimensionality.
- Linear fully connected layers + Leaky ReLU activations.
- ADAM [Kingma and Ba 2014] with initial learning rate 10^-3.
- Trained for 2000 epochs with batch size 512.
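A minimal PyTorch sketch consistent with these choices (layer widths [91, 91, 32, 91, 91]; the scalar pitch encoding and other details are assumptions):

```python
import torch
import torch.nn as nn

class CVAE(nn.Module):
    def __init__(self, x_dim=91, z_dim=32, c_dim=1, h_dim=91):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim + c_dim, h_dim), nn.LeakyReLU())
        self.mu = nn.Linear(h_dim, z_dim)
        self.logvar = nn.Linear(h_dim, z_dim)
        self.dec = nn.Sequential(
            nn.Linear(z_dim + c_dim, h_dim), nn.LeakyReLU(),
            nn.Linear(h_dim, x_dim))

    def forward(self, x, c):
        h = self.enc(torch.cat([x, c], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        return self.dec(torch.cat([z, c], dim=-1)), mu, logvar

model = CVAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # ADAM, lr = 10^-3
```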
Experiments

- Two kinds of experiments to demonstrate the network's capabilities:
  1. Reconstruction: omit pitch instances during training and see how well the model reconstructs notes of the omitted target pitch.
  2. Generation: how well the model 'synthesizes' note instances with new, unseen pitches.
Reconstruction

- Two training contexts:

  1. T (✗) is the target; ✓ marks training instances:

     MIDI   T-3  T-2  T-1  T  T+1  T+2  T+3
     Kept?  ✓    ✓    ✓    ✗  ✓    ✓    ✓

  2. Octave endpoints:

     MIDI   60  61  62  63  64  65
     Kept?  ✓   ✗   ✗   ✗   ✗   ✗
     MIDI   66  67  68  69  70  71
     Kept?  ✗   ✗   ✗   ✗   ✗   ✓

- In each case, we compute the MSE as the frame-wise spectral envelope match across all frames of all the target instances.
Reconstruction

[Figure: MSE vs. pitch (MIDI 60-72) for AE and CVAE]

- Both the AE and the CVAE reasonably reconstruct the target pitch [audio examples 6, 7, 8].
- The CVAE produces better reconstruction, especially when the target pitch is far from the pitches available in the training data (plot above) [audio examples 9, 10, 11].
- Conditioning helps to capture the pitch dependency of the spectral envelope more accurately.
Reconstruction

- To emulate the effect of pitch conditioning with an AE, we train the AE by appending the pitch to the input CCs and reconstructing this appended input, as shown below.

[Diagram: 'Conditional' AE (cAE): f0 appended to the input CCs; the output reconstructs CCs' and f0']

- [Wyse 2018] followed a similar approach of appending the conditional variables to the input of his model.
- The 'cAE' is comparable to our proposed CVAE in that the network might potentially learn something from f0.
Reconstruction

[Figure: MSE vs. pitch for AE, CVAE, and cAE]

- We train the cAE on the octave endpoints and evaluate performance as shown above.
- Appending f0 does not improve performance; it is in fact worse than the AE.
Generation

- We are interested in 'synthesizing' audio.
- We see how well the network can generate an instance of a desired pitch (which the network has not been trained on).

[Figure: sampling from the network - latent Z + f0 → Decoder]

- We follow the procedure in [Blaauw and Bonada 2016]: a random walk to sample points coherently from the latent space, as sketched below.
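Re-using the CVAE decoder sketched earlier, the random walk could look like this (the step size and scalar MIDI conditioning are assumptions):

```python
import torch

num_frames = 200                            # desired note length in frames
c = torch.tensor([[65.0]])                  # target pitch (MIDI 65), assumed encoding
z = torch.randn(1, 32)                      # start from the standard normal prior

envelope_frames = []
with torch.no_grad():
    for _ in range(num_frames):
        z = z + 0.05 * torch.randn_like(z)  # small steps -> smooth latent trajectory
        envelope_frames.append(model.dec(torch.cat([z, c], dim=-1)))
```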
Generation

- We train on instances across the entire octave sans MIDI 65, and then generate MIDI 65.
- On listening, the generated audio still lacks the soft, noisy sound of the violin bowing [audio examples 12, 13, 14].
- For a more realistic synthesis, we generate a violin note with vibrato [audio example 15].
Putting it all together

- We explored autoencoder frameworks in generative models for audio synthesis of instrumental tones.
- Our parametric representation decouples 'timbre' and 'pitch', relying on the network to model their inter-dependencies.
- Pitch conditioning allows us to generate the learnt spectral envelope for that pitch, thus enabling us to vary the pitch contour continuously.
Putting it all together

- Our model is still far from perfect, and we have to improve it further:
  1. An immediate next step is to model the residual of the input audio; this will help make the generated audio sound more natural.
  2. We have not taken dynamics into account; a more complete system will capture spectral envelope (timbral) dependencies on both pitch and loudness dynamics.
  3. Conducting more formal listening tests, involving synthesis of larger pitch movements or melodic elements from Indian Classical music.
- Our contributions:
  1. To the best of our knowledge, we have not come across any work using a parametric model for musical tones in the neural synthesis framework, especially one exploiting the conditioning function of the CVAE.
  2. The TAE method has no known Python implementation, so we plan to make our code open-source to the MIR community for research.
References I

[Blaauw and Bonada 2016] Blaauw, M. and Bonada, J. (2016). Modeling and transforming speech using variational autoencoders. In Interspeech, pages 1770-1774.

[Caetano and Rodet 2012] Caetano, M. and Rodet, X. (2012). A source-filter model for musical instrument sound transformation. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 137-140. IEEE.

[Doersch 2016] Doersch, C. (2016). Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908.

[Engel et al 2017] Engel, J., Resnick, C., Roberts, A., Dieleman, S., Norouzi, M., Eck, D., and Simonyan, K. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1068-1077. JMLR.org.

[Esling et al 2018] Esling, P., Bitton, A., et al. (2018). Generative timbre spaces: regularizing variational auto-encoders with perceptual metrics. arXiv preprint arXiv:1805.08501.
References II

[Griffin and Lim 1984] Griffin, D. and Lim, J. (1984). Signal estimation from modified short-time Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2):236-243.

[Hinton and Salakhutdinov 2006] Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504-507.

[IMAI 1979] Imai, S. (1979). Spectral envelope extraction by improved cepstrum. IEICE, 62:217-228.

[Kingma and Ba 2014] Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

[Kingma and Welling 2013] Kingma, D. P. and Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.

[Oord et al 2016] Oord, A. v. d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., and Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.
References III

[Roche et al 2018] Roche, F., Hueber, T., Limier, S., and Girin, L. (2018). Autoencoders for music sound modeling: a comparison of linear, shallow, deep, recurrent and variational models. arXiv preprint arXiv:1806.04096.

[Roebel and Rodet 2005] Roebel, A. and Rodet, X. (2005). Efficient spectral envelope estimation and its application to pitch shifting and envelope preservation. In International Conference on Digital Audio Effects, pages 30-35, Madrid, Spain.

[Romani Picas et al 2015] Romani Picas, O., Parra Rodriguez, H., Dabiri, D., Tokuda, H., Hariya, W., Oishi, K., and Serra, X. (2015). A real-time system for measuring sound goodness in instrumental sounds. In Audio Engineering Society Convention 138. Audio Engineering Society.

[Sarroff and Casey 2014] Sarroff, A. M. and Casey, M. A. (2014). Musical audio synthesis using autoencoding neural nets. In ICMC.

[Serra et al 1997] Serra, X. et al. (1997). Musical sound modeling with sinusoids plus noise. Musical Signal Processing, pages 91-122.
References IV

[Slawson 1981] Slawson, W. (1981). The color of sound: a theoretical study in musical timbre. Music Theory Spectrum, 3:132-141.

[Sohn et al 2015] Sohn, K., Lee, H., and Yan, X. (2015). Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems, pages 3483-3491.

[Wyse 2018] Wyse, L. (2018). Real-valued parametric conditioning of an RNN for interactive sound synthesis. arXiv preprint arXiv:1805.10808.
Audio examples description

1. Input MIDI 63 to the spectral model
2. Spectral model reconstruction (trained on MIDI 63)
3. Spectral model reconstruction (not trained on MIDI 63)
4. Input MIDI 60 note to the parametric model
5. Parametric reconstruction of the input note
6. Input MIDI 63 note
7. Parametric AE reconstruction of the input
8. Parametric CVAE reconstruction of the input
9. Input MIDI 65 note (endpoint-trained model)
10. Parametric AE reconstruction of the input (endpoint-trained model)
11. Parametric CVAE reconstruction of the input (endpoint-trained model)
12. CVAE-generated MIDI 65 violin note
13. Similar MIDI 65 violin note from the dataset
14. CVAE reconstruction of the MIDI 65 violin note
15. CVAE-generated MIDI 65 violin note with vibrato
236
Audio Synthesis
I What comes to your mind when you hear lsquoAudio Synthesisrsquo
236
Audio Synthesis
I What comes to your mind when you hear lsquoAudio Synthesisrsquo
Figure One of the early Moog Modular Synthesizers
236
Audio Synthesis
I What comes to your mind when you hear lsquoAudio Synthesisrsquo
I More generally it involves us specifying controlling parametersto a synthesizer to obtain an audio output
I What are the main parameters which govern the audiogeneration(at a high level)
236
Audio Synthesis
I What comes to your mind when you hear lsquoAudio Synthesisrsquo
1 Timbre
2 Pitch
3 Loudness
236
Audio Synthesis
I What comes to your mind when you hear lsquoAudio Synthesisrsquo
I Early analog synthesizers used voltage controlled oscillatorsfilters amplifiers to generate the waveform and lsquoenvelopegeneratorsrsquo to shape it
I Data-driven statistical modeling + computing power=rArr Deep Learning for audio synthesis
236
Audio Synthesis
I What comes to your mind when you hear lsquoAudio Synthesisrsquo
I Early analog synthesizers used voltage controlled oscillatorsfilters amplifiers to generate the waveform and lsquoenvelopegeneratorsrsquo to shape it
I Data-driven statistical modeling + computing power=rArr Deep Learning for audio synthesis
336
Generative Models for Audio Synthesis
I Rely on ability of algorithms to extract musically relevantinformation from vast amounts of data
I Autoregressive modeling Generative Adversarial Networks andVariational Autoencoders are some of the proposed generativemodeling methods
I Most methods in literature try to model audio signals directlyin the time or frequency domain
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]
X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
Why Parametric?

▶ Rather than generating new timbres ('interpolating' across sounds), we consider the problem of synthesizing a given instrument's sound with flexible control over the pitch and loudness dynamics.
▶ Pitch shifting without timbre modification uses a source-filter model, with the filter (spectral envelope) kept constant [Roebel and Rodet, 2005], as sketched below.
▶ A powerful parametric representation, over raw waveform or spectrogram, has the potential to achieve high quality with less training data.
1. [Blaauw and Bonada, 2016] recognized this in the context of speech synthesis and used a vocoder representation to train a generative model, achieving promising results along the way.
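To illustrate the source-filter idea, here is a minimal sketch (not the authors' code; the function name and the FFT-grid interpolation are our assumptions) of keeping the spectral envelope fixed while re-sampling it at the harmonics of a new f0:

```python
# Sketch: source-filter pitch shifting with a fixed envelope.
# envelope_db is a smooth log envelope sampled on an FFT grid.
import numpy as np

def harmonic_amplitudes(envelope_db, freqs_hz, f0, fs):
    # Harmonics of the new f0 up to the Nyquist frequency
    harmonics = np.arange(1, int(fs / (2 * f0)) + 1) * f0
    # Read the (unchanged) envelope at the new harmonic positions
    amps_db = np.interp(harmonics, freqs_hz, envelope_db)
    return amps_db, harmonics
```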
Dataset

▶ Good-sounds dataset [Romani Picas et al., 2015], consisting of individual note and scale recordings for 12 different instruments.
▶ We work with the 'Violin', played in mezzo-forte loudness, and choose the 4th octave (MIDI 60-71).

Why we chose the Violin: popular in Indian music, human voice-like timbre, and the ability to produce continuous pitch.

[Bar chart: instances per MIDI note (60-71); counts range from 26 to 34 per note. x-axis: Instances, y-axis: MIDI]
Figure: Instances per note in the overall dataset

▶ The average note duration for the chosen octave is about 4.5 s per note.
▶ We split the data into train (80%) and test (20%) instances across MIDI note labels.
▶ Our model is trained with frames (duration 21.3 ms) from the train instances, and system performance is evaluated with frames from the test instances.
Non-Parametric Reconstruction

▶ The frame-wise magnitude spectra reconstruction procedure in [Roche et al., 2018] cannot generalize to untrained pitches.
▶ We consider two cases: train including and excluding MIDI 63, along with its 3 neighbouring pitches on either side.

[Block diagram: Sustain → FFT → Encoder → Z → Decoder (AE)]

▶ We repeat the above across all input spectral frames, and invert the obtained spectrogram using Griffin-Lim [Griffin and Lim, 1984], as sketched below.
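A minimal sketch of this resynthesis path, assuming librosa's Griffin-Lim implementation; the FFT and hop sizes here are assumptions, not the values used in the experiments:

```python
# Invert a reconstructed magnitude spectrogram back to audio
# via Griffin-Lim phase estimation.
import librosa

def resynthesize(mag_frames, n_fft=1024, hop=256):
    # mag_frames: (n_fft // 2 + 1, n_frames) magnitude spectrogram,
    # assembled from the autoencoder's frame-wise reconstructions
    return librosa.griffinlim(mag_frames, n_iter=60,
                              hop_length=hop, win_length=n_fft)
```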
Non-Parametric Reconstruction

Figure: Input MIDI 63 (audio example 1)
Figure: Reconstruction including MIDI 63 in training (audio example 2)
Figure: Reconstruction excluding MIDI 63 from training (audio example 3)
Parametric Model

1. Frame-wise magnitude spectrum → harmonic representation using the Harmonic plus Residual (HpR) model [Serra et al., 1997] (currently we neglect the residual).

[Block diagram: Sustain → FFT → HpR → f0, harmonic magnitudes]

▶ Output of the HpR block ⇒ log-dB magnitudes + harmonics
Parametric Model

2. log-dB magnitudes + harmonics → TAE algorithm [Roebel and Rodet, 2005; Imai, 1979]

▶ TAE ⇒ iterative cepstral liftering (a Python sketch follows below):

$A_0(k) = \log(|X(k)|), \quad V_0 = -\infty$
while $(A_i - A_0 < \Delta)$:
    $A_i(k) = \max(A_{i-1}(k), V_{i-1}(k))$
    $V_i = \mathrm{FFT}(\mathrm{lifter}(A_i))$

[Block diagram: f0, harmonic magnitudes → TAE → CCs, f0]
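A minimal sketch of the TAE iteration above, assuming numpy; the stopping threshold, iteration cap and argument names are our assumptions, not the authors' settings:

```python
import numpy as np

def true_amplitude_envelope(log_mag, f0, fs, n_fft=1024,
                            delta=0.1, max_iter=200):
    """TAE via iterative cepstral liftering [Imai 1979; Roebel and
    Rodet 2005]. log_mag: one frame's log-magnitude spectrum on the
    rfft grid (length n_fft // 2 + 1)."""
    kcc = int(fs / (2 * f0))      # cepstral order bound: Kcc <= Fs / (2 f0)
    A0 = np.asarray(log_mag, dtype=float)
    A, V = A0.copy(), np.full_like(A0, -np.inf)
    for _ in range(max_iter):
        A = np.maximum(A, V)      # A_i(k) = max(A_{i-1}(k), V_{i-1}(k))
        ceps = np.fft.irfft(A, n=n_fft)
        ceps[kcc + 1 : n_fft - kcc] = 0.0    # lifter: keep first Kcc coefficients
        V = np.fft.rfft(ceps, n=n_fft).real  # V_i = FFT(lifter(A_i))
        if np.max(A - V) < delta:  # stop once the envelope hugs the spectrum
            break
    return V, ceps[: kcc + 1]     # smooth envelope and its CCs
```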
Parametric Model

▶ No open-source implementation was available for the TAE, so we implemented it ourselves, following the procedure highlighted in [Roebel and Rodet, 2005; Caetano and Rodet, 2012].
▶ The figure shows a TAE snapshot from [Caetano and Rodet, 2012], similar to the results we get (audio examples 4, 5).
Parametric Model

[Plot: TAE spectral envelopes for MIDI 60, 65 and 71; x-axis: Frequency (kHz), 0-5; y-axis: Magnitude (dB), -80 to -20]

▶ The spectral envelope shape varies across pitch:
1. Dependence of the envelope on pitch [Slawson, 1981; Caetano and Rodet, 2012]
2. Variation due to the TAE algorithm itself
▶ TAE → a smooth function to estimate harmonic amplitudes
Parametric Model

▶ The number of CCs (cepstral coefficients, $K_{cc}$) depends on the sampling rate ($F_s$) and the pitch ($f_0$):

$K_{cc} \le \dfrac{F_s}{2 f_0}$

▶ For our network, we choose the maximum $K_{cc}$ (lowest pitch) = 91 and zero-pad the higher-pitch CC vectors to 91 dimensions.
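For instance, assuming $F_s = 48$ kHz (the sampling rate is our assumption) and $f_0 \approx 262$ Hz (MIDI 60, the lowest pitch in our octave), $K_{cc} \le 48000 / (2 \times 262) \approx 91$, consistent with the stated maximum of 91 coefficients.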
Generative Models

▶ Autoencoders [Hinton and Salakhutdinov, 2006]: minimize the MSE (mean squared error) between the input and the network's reconstruction.
• Simple to train, and perform good reconstruction (since they directly minimize the MSE)
• Not truly generative models, as you cannot generate new data
▶ Variational Autoencoders [Kingma and Welling, 2013]: inspired by variational inference, they enforce a prior on the latent space.
• Truly generative, as we can 'generate' new data by sampling from the prior
▶ Conditional Variational Autoencoders [Doersch, 2016; Sohn et al., 2015]: same principle as a VAE, but they learn the distribution conditioned on an additional conditioning variable.
Generative Models

Why VAE over AE?
A continuous latent space from which we can sample points (and synthesize the corresponding audio).

Why CVAE over VAE?
Conditioning on pitch ⇒ the network captures dependencies between timbre and pitch ⇒ more accurate envelope generation + pitch control.
Network Architecture

[Block diagram: CCs, f0 → Encoder → Z → Decoder (CVAE)]

▶ The network input is the CCs → the MSE represents a perceptually relevant distance: the squared error between the input and reconstructed log-magnitude spectral envelopes.
▶ We train the network on frames from the train instances.
▶ The MSE values reported ahead are the average reconstruction error across all test-instance frames.
Network Architecture

▶ Main hyperparameters:
1. β: controls the relative weighting between reconstruction and prior enforcement,

$\mathcal{L} \propto \text{MSE} + \beta \cdot \text{KLD}$

[Plot: MSE (scale $10^{-5}$) vs latent space dimensionality (0-70) for β = 0.01, 0.1, 1]
Figure: CVAE, varying β

▶ There is a tradeoff between both terms; we choose β = 0.1.
Network Architecture

▶ Main hyperparameters:
2. Dimensionality of the latent space: governs the network's reconstruction ability.

[Plot: MSE (scale $10^{-5}$) vs latent space dimensionality (0-70) for the CVAE and the AE]
Figure: CVAE (β = 0.1) vs AE

▶ The error falls steeply initially and flattens later; we choose dimensionality = 32.
Network Architecture

▶ Network size: [91, 91, 32, 91, 91]
▶ 91 is the dimension of the input CCs; 32 is the latent space dimensionality
▶ Linear fully connected layers + Leaky ReLU activations
▶ ADAM [Kingma and Ba, 2014] with initial learning rate $10^{-3}$
▶ Training for 2000 epochs with batch size 512

A minimal sketch of such a network follows below.
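A hedged PyTorch sketch of a CVAE with these layer sizes and the loss above; the exact conditioning format for f0 (here a scalar appended to the input) is an assumption:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CVAE(nn.Module):
    def __init__(self, x_dim=91, z_dim=32, c_dim=1):
        super().__init__()
        # Encoder: [x + condition] -> 91 -> (mu, logvar) of size 32
        self.enc = nn.Sequential(nn.Linear(x_dim + c_dim, 91), nn.LeakyReLU())
        self.mu = nn.Linear(91, z_dim)       # posterior mean
        self.logvar = nn.Linear(91, z_dim)   # posterior log-variance
        # Decoder: [z + condition] -> 91 -> 91 CCs
        self.dec = nn.Sequential(nn.Linear(z_dim + c_dim, 91), nn.LeakyReLU(),
                                 nn.Linear(91, x_dim))

    def forward(self, x, c):
        h = self.enc(torch.cat([x, c], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: z = mu + sigma * eps
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.dec(torch.cat([z, c], dim=-1)), mu, logvar

def loss_fn(x_hat, x, mu, logvar, beta=0.1):
    # L ∝ MSE + β·KLD, with β = 0.1 as chosen above
    mse = F.mse_loss(x_hat, x)
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return mse + beta * kld
```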
Experiments

▶ Two kinds of experiments to demonstrate the network's capabilities:
1. Reconstruction: omit pitch instances during training, and see how well the model reconstructs notes of the omitted target pitch.
2. Generation: how well the model 'synthesizes' note instances with new, unseen pitches.
Reconstruction

▶ Two training contexts:

1. T (✗) is the target pitch, ✓ marks training instances:
MIDI:  T-3  T-2  T-1  T   T+1  T+2  T+3
Kept:  ✓    ✓    ✓    ✗   ✓    ✓    ✓

2. Octave endpoints:
MIDI:  60  61  62  63  64  65  66  67  68  69  70  71
Kept:  ✓   ✗   ✗   ✗   ✗   ✗   ✗   ✗   ✗   ✗   ✗   ✓

▶ In each case, we compute the MSE as the frame-wise spectral envelope match across all frames of all the target instances, as sketched below.
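A small sketch of this evaluation metric, assuming the hypothetical CVAE class above: average frame-wise squared error between input and reconstructed CC vectors over the held-out target frames.

```python
import torch

def avg_frame_mse(model, frames, f0s):
    # frames: (n_frames, 91) test CCs; f0s: (n_frames, 1) conditioning
    with torch.no_grad():
        x_hat, _, _ = model(frames, f0s)
        return torch.mean((x_hat - frames) ** 2).item()
```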
Reconstruction

[Plot: MSE (scale $10^{-2}$) vs target pitch (MIDI 60-72) for the AE and the CVAE]

▶ Both the AE and the CVAE reasonably reconstruct the target pitch (audio examples 6, 7, 8).
▶ The CVAE produces better reconstruction, especially when the target pitch is far from the pitches available in the training data (plot above; audio examples 9, 10, 11).
▶ Conditioning helps to capture the pitch dependency of the spectral envelope more accurately.
Reconstruction

▶ To emulate the effect of pitch conditioning with an AE, we train the AE by appending the pitch to the input CCs and reconstructing this appended input, as shown below:

[Block diagram: (CCs, f0) → cAE → (CCs′, f0′)]
Figure: 'Conditional' AE (cAE)

▶ [Wyse, 2018] followed a similar approach of appending the conditional variables to the input of his model.
▶ The cAE is comparable to our proposed CVAE in that the network might potentially learn something from f0.
Reconstruction

[Plot: MSE (scale $10^{-2}$) vs target pitch (MIDI 60-72) for the AE, CVAE and cAE]

▶ We train the cAE on the octave endpoints and evaluate performance as above.
▶ Appending f0 does not improve performance; it is in fact worse than the plain AE.
Generation

▶ We are interested in 'synthesizing' audio.
▶ We see how well the network can generate an instance of a desired pitch (one the network has not been trained on).

[Block diagram: Z, f0 → Decoder]
Figure: Sampling from the network

▶ We follow the procedure in [Blaauw and Bonada, 2016]: a random walk to sample points coherently from the latent space, as sketched below.
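A minimal sketch of coherent latent sampling via a random walk, in the spirit of [Blaauw and Bonada, 2016]; the step size and the decoder interface (from the hypothetical CVAE sketch above) are assumptions:

```python
import torch

def random_walk_latents(n_frames, z_dim=32, step=0.05):
    z = torch.zeros(n_frames, z_dim)
    z[0] = torch.randn(z_dim)            # start from the standard-normal prior
    for t in range(1, n_frames):
        # Small Gaussian steps keep consecutive frames coherent
        z[t] = z[t - 1] + step * torch.randn(z_dim)
    return z

# Usage sketch: decode each latent with the desired pitch condition, e.g.
# envelopes = model.dec(torch.cat([z, f0.expand(n_frames, 1)], dim=-1))
```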
Generation

▶ We train on instances across the entire octave sans MIDI 65, and then generate MIDI 65.
▶ On listening, the generated audio still lacks the soft, noisy sound of the violin bowing (audio examples 12, 13, 14).
▶ For a more realistic synthesis, we generate a violin note with vibrato (audio example 15).
Putting it all together

▶ We explored autoencoder frameworks in generative models for audio synthesis of instrumental tones.
▶ Our parametric representation decouples 'timbre' and 'pitch', relying on the network to model their inter-dependencies.
▶ Pitch conditioning allows us to generate the learnt spectral envelope for any given pitch, enabling us to vary the pitch contour continuously.
Putting it all together

▶ Our model is still far from perfect, however, and we have to improve it further:
1. An immediate next step is to model the residual of the input audio; this will help make the generated audio sound more natural.
2. We have not taken dynamics into account; a more complete system will capture spectral envelope (timbral) dependencies on both pitch and loudness dynamics.
3. Conducting more formal listening tests, involving synthesis of larger pitch movements or melodic elements from Indian Classical music.

▶ Our contributions:
1. To the best of our knowledge, we have not come across any prior work using a parametric model for musical tones in the neural synthesis framework, especially one exploiting the conditioning function of the CVAE.
2. The TAE method has no known Python implementation, so we plan to make our code open-source to the MIR community for research.
References

[Blaauw and Bonada, 2016] Blaauw, M. and Bonada, J. (2016). Modeling and transforming speech using variational autoencoders. In Interspeech, pages 1770-1774.
[Caetano and Rodet, 2012] Caetano, M. and Rodet, X. (2012). A source-filter model for musical instrument sound transformation. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 137-140. IEEE.
[Doersch, 2016] Doersch, C. (2016). Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908.
[Engel et al., 2017] Engel, J., Resnick, C., Roberts, A., Dieleman, S., Norouzi, M., Eck, D. and Simonyan, K. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1068-1077. JMLR.org.
[Esling et al., 2018] Esling, P., Bitton, A. et al. (2018). Generative timbre spaces: regularizing variational auto-encoders with perceptual metrics. arXiv preprint arXiv:1805.08501.
[Griffin and Lim, 1984] Griffin, D. and Lim, J. (1984). Signal estimation from modified short-time Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2):236-243.
[Hinton and Salakhutdinov, 2006] Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504-507.
[Imai, 1979] Imai, S. (1979). Spectral envelope extraction by improved cepstrum. IEICE, 62:217-228.
[Kingma and Ba, 2014] Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[Kingma and Welling, 2013] Kingma, D. P. and Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
[Oord et al., 2016] Oord, A. v. d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A. and Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.
[Roche et al., 2018] Roche, F., Hueber, T., Limier, S. and Girin, L. (2018). Autoencoders for music sound modeling: a comparison of linear, shallow, deep, recurrent and variational models. arXiv preprint arXiv:1806.04096.
[Roebel and Rodet, 2005] Roebel, A. and Rodet, X. (2005). Efficient spectral envelope estimation and its application to pitch shifting and envelope preservation. In International Conference on Digital Audio Effects, pages 30-35, Madrid, Spain.
[Romani Picas et al., 2015] Romani Picas, O., Parra Rodriguez, H., Dabiri, D., Tokuda, H., Hariya, W., Oishi, K. and Serra, X. (2015). A real-time system for measuring sound goodness in instrumental sounds. In Audio Engineering Society Convention 138. Audio Engineering Society.
[Sarroff and Casey, 2014] Sarroff, A. M. and Casey, M. A. (2014). Musical audio synthesis using autoencoding neural nets. In ICMC.
[Serra et al., 1997] Serra, X. et al. (1997). Musical sound modeling with sinusoids plus noise. Musical Signal Processing, pages 91-122.
[Slawson, 1981] Slawson, W. (1981). The color of sound: a theoretical study in musical timbre. Music Theory Spectrum, 3:132-141.
[Sohn et al., 2015] Sohn, K., Lee, H. and Yan, X. (2015). Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems, pages 3483-3491.
[Wyse, 2018] Wyse, L. (2018). Real-valued parametric conditioning of an RNN for interactive sound synthesis. arXiv preprint arXiv:1805.10808.
Audio examples description

1. Input MIDI 63 to the spectral model
2. Spectral model reconstruction (trained on MIDI 63)
3. Spectral model reconstruction (not trained on MIDI 63)
4. Input MIDI 60 note to the parametric model
5. Parametric reconstruction of the input note
6. Input MIDI 63 note
7. Parametric AE reconstruction of the input
8. Parametric CVAE reconstruction of the input
9. Input MIDI 65 note (endpoint-trained model)
10. Parametric AE reconstruction of the input (endpoint-trained model)
11. Parametric CVAE reconstruction of the input (endpoint-trained model)
12. CVAE-generated MIDI 65 violin note
13. Similar MIDI 65 violin note from the dataset
14. CVAE reconstruction of the MIDI 65 violin note
15. CVAE-generated MIDI 65 violin note with vibrato
236
Audio Synthesis
I What comes to your mind when you hear lsquoAudio Synthesisrsquo
Figure One of the early Moog Modular Synthesizers
236
Audio Synthesis
I What comes to your mind when you hear lsquoAudio Synthesisrsquo
I More generally it involves us specifying controlling parametersto a synthesizer to obtain an audio output
I What are the main parameters which govern the audiogeneration(at a high level)
236
Audio Synthesis
I What comes to your mind when you hear lsquoAudio Synthesisrsquo
1 Timbre
2 Pitch
3 Loudness
236
Audio Synthesis
I What comes to your mind when you hear lsquoAudio Synthesisrsquo
I Early analog synthesizers used voltage controlled oscillatorsfilters amplifiers to generate the waveform and lsquoenvelopegeneratorsrsquo to shape it
I Data-driven statistical modeling + computing power=rArr Deep Learning for audio synthesis
236
Audio Synthesis
I What comes to your mind when you hear lsquoAudio Synthesisrsquo
I Early analog synthesizers used voltage controlled oscillatorsfilters amplifiers to generate the waveform and lsquoenvelopegeneratorsrsquo to shape it
I Data-driven statistical modeling + computing power=rArr Deep Learning for audio synthesis
336
Generative Models for Audio Synthesis
I Rely on ability of algorithms to extract musically relevantinformation from vast amounts of data
I Autoregressive modeling Generative Adversarial Networks andVariational Autoencoders are some of the proposed generativemodeling methods
I Most methods in literature try to model audio signals directlyin the time or frequency domain
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]
X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative sounds
times Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
636
Why Parametric
I Rather than generating new timbres(lsquointerpolatingrsquo acrosssounds) we consider the problem of synthesis of a giveninstrument sound with flexible control over the pitch andloudness dynamics
I Pitch shifting without timbre modification uses a source-filtermodel with the filter(spectral envelope) being kept constant[Roebel and Rodet 2005]
I A powerful parametric representation over raw waveform orspectrogram has the potential to achieve high quality with lesstraining data
1 [Blaauw and Bonada 2016] recognized this in context ofspeech synthesis and used a vocoder representation to train agenerative model achieving promising results along the way
636
Why Parametric
I Rather than generating new timbres(lsquointerpolatingrsquo acrosssounds) we consider the problem of synthesis of a giveninstrument sound with flexible control over the pitch andloudness dynamics
I Pitch shifting without timbre modification uses a source-filtermodel with the filter(spectral envelope) being kept constant[Roebel and Rodet 2005]
I A powerful parametric representation over raw waveform orspectrogram has the potential to achieve high quality with lesstraining data
1 [Blaauw and Bonada 2016] recognized this in context ofspeech synthesis and used a vocoder representation to train agenerative model achieving promising results along the way
736
Dataset
I Good-sounds dataset [Romani Picas et al 2015] consisting ofindividual note and scale recordings for 12 different instruments
I We work with the lsquoViolinrsquo played in mezzo-forte loudnessand choose the 4th octave(MIDI 60-71)
0 5 10 15 20 25 30 35
606162636465666768697071
282626262626
343030
262626
Instances
MID
I
Figure Instances per note in the overall dataset
I Average note duration for the chosen octave is about 45s pernote
736
Dataset
I Good-sounds dataset [Romani Picas et al 2015] consisting ofindividual note and scale recordings for 12 different instruments
I We work with the lsquoViolinrsquo played in mezzo-forte loudnessand choose the 4th octave(MIDI 60-71)
Why we chose ViolinPopular in Indian Music Human voice-like timbre
Ability to produce continuous pitch
0 5 10 15 20 25 30 35
606162636465666768697071
282626262626
343030
262626
Instances
MID
I
Figure Instances per note in the overall dataset
I Average note duration for the chosen octave is about 45s pernote
836
Dataset
I We split the data to train(80) and test(20) instances acrossMIDI note labels
I Our model is trained with frames(duration 213ms) from thetrain instances and system performance is evaluated withframes from the test instances
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
1036
Non-Parametric Reconstruction
Figure Input MIDI 63 1 1
Figure Including MIDI 63 2 2 Figure Excluding MIDI 63 3 3
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton1)ocgs[i]state=false
1728
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton3)ocgs[i]state=false
1728
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton5)ocgs[i]state=false
1728
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1336
Parametric Model
I No open source implementation available for the TAE thus weimplemented it following procedure highlighted in[Roebel and Rodet 2005 Caetano and Rodet 2012]
I Figure shows a TAE snap from [Caetano and Rodet 2012]
I Similar to results we get 1 4 2 5
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton6)ocgs[i]state=false
3216
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton7)ocgs[i]state=false
2256
1436
Parametric Model
0 1 2 3 4 5minus80
minus60
minus40
minus20
Frequency(kHz)Magnitude(dB)
606571
I Spectral envelope shape varies across pitch
1 Dependence of envelope on pitch[Slawson 1981 Caetano and Rodet 2012]
2 Variation due the TAE algorithm
I TAE rarr smooth function to estimate harmonic amplitudes
1536
Parametric Model
I No of CCs(Cepstral CoefficientsKcc) depends on Samplingrate(Fs)pitch(f0)
Kcc leFs2f0
I For our network we choose the maximum Kcc(lowest pitch) =91 and zero pad the high pitch Kcc to 91 dimension vectors
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
Network Architecture

Figure: CVAE block diagram (CCs, f0 → Encoder → Z → Decoder → reconstructed CCs)

I Network input is CCs → the MSE represents a perceptually relevant distance: the squared error between the input and reconstructed log-magnitude spectral envelopes (see the note below)
I We train the network on frames from the train instances
I For evaluation, the MSE values reported ahead are the average reconstruction error across all the test instance frames
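One way to see this (a sketch, assuming the cepstrum is an orthogonal transform - a DFT/DCT up to scaling - of the log-magnitude envelope): by Parseval's theorem,

‖CCs − CCs′‖² ∝ ‖log|V(ω)| − log|V′(ω)|‖²

so minimizing the MSE between cepstral coefficient vectors directly minimizes the squared error between the corresponding log-magnitude spectral envelopes.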
Network Architecture
I Main hyperparameters:
1 β - controls the relative weighting between reconstruction and prior enforcement:

L ∝ MSE + β · KLD

Figure: MSE vs latent space dimensionality for the CVAE, varying β (β = 0.01, 0.1, 1)

I Tradeoff between both terms; we choose β = 0.1
Network Architecture
I Main hyperparameters:
2 Dimensionality of latent space - determines the network's reconstruction ability

Figure: MSE vs latent space dimensionality, CVAE (β = 0.1) vs AE

I Steep fall initially, flatter later; we choose dimensionality = 32
Network Architecture
I Network size: [91, 91, 32, 91, 91]
I 91 is the dimension of the input CCs; 32 is the latent space dimensionality
I Linear fully connected layers + Leaky ReLU activations
I ADAM [Kingma and Ba 2014] with initial lr = 10⁻³
I Training for 2000 epochs with batch size 512 (a code sketch follows)
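A minimal sketch of this architecture and loss, assuming PyTorch, a one-hot pitch encoding over the 12 MIDI notes, and the usual Gaussian reparameterization (none of these implementation details are fixed by the slides; all names are ours):

import torch
import torch.nn as nn
import torch.nn.functional as F

class CVAE(nn.Module):
    def __init__(self, in_dim=91, hid_dim=91, z_dim=32, n_pitches=12):
        super().__init__()
        # Encoder: (CCs, pitch) -> parameters of the latent posterior
        self.enc = nn.Linear(in_dim + n_pitches, hid_dim)
        self.mu = nn.Linear(hid_dim, z_dim)
        self.logvar = nn.Linear(hid_dim, z_dim)
        # Decoder: (z, pitch) -> reconstructed CCs
        self.dec1 = nn.Linear(z_dim + n_pitches, hid_dim)
        self.dec2 = nn.Linear(hid_dim, in_dim)

    def forward(self, cc, pitch):
        h = F.leaky_relu(self.enc(torch.cat([cc, pitch], dim=1)))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization
        h = F.leaky_relu(self.dec1(torch.cat([z, pitch], dim=1)))
        return self.dec2(h), mu, logvar

def cvae_loss(cc_hat, cc, mu, logvar, beta=0.1):
    mse = F.mse_loss(cc_hat, cc)                                    # reconstruction
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())  # prior enforcement
    return mse + beta * kld                                         # L ∝ MSE + β·KLD

model = CVAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # 2000 epochs, batch size 512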
Experiments
I Two kinds of experiments to demonstrate the network's capabilities:
1 Reconstruction - omit pitch instances during training and see how well the model reconstructs notes of the omitted target pitch
2 Generation - how well the model 'synthesizes' note instances with new, unseen pitches
Reconstruction
I Two training contexts (sketched in code after this list):
1 Neighbouring pitches - T is the target (✗ = omitted from training, ✓ = kept):

MIDI: T−3  T−2  T−1  T  T+1  T+2  T+3
Kept:  ✓    ✓    ✓   ✗   ✓    ✓    ✓

2 Octave endpoints:

MIDI: 60  61  62  63  64  65  66  67  68  69  70  71
Kept:  ✓   ✗   ✗   ✗   ✗   ✗   ✗   ✗   ✗   ✗   ✗   ✓

I In each of the above cases, we compute the MSE as the frame-wise spectral envelope match across all frames of all the target instances
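As an illustration, the kept pitches in each context can be written as (a hypothetical helper; the function name and encoding are ours):

def training_midis(context, target=63):
    octave = range(60, 72)                      # MIDI 60-71
    if context == "neighbours":                 # context 1: keep T±1, T±2, T±3; omit T
        return [m for m in octave if 0 < abs(m - target) <= 3]
    if context == "endpoints":                  # context 2: keep only the octave endpoints
        return [60, 71]
    raise ValueError(f"unknown context: {context}")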
Reconstruction

Figure: MSE vs target pitch (MIDI 60-72) for the AE and the CVAE

I Both the AE and the CVAE reasonably reconstruct the target pitch (audio examples 6-8)
I The CVAE produces better reconstruction, especially when the target pitch is far from the pitches available in the training data (plot above; audio examples 9-11)
I Conditioning helps to capture the pitch dependency of the spectral envelope more accurately
Reconstruction
I To emulate the effect of pitch conditioning with an AE, we train the AE by appending the pitch to the input CCs and reconstructing this appended input, as shown below

Figure: 'Conditional' AE (cAE) - (CCs, f0) → cAE → (CCs′, f0′)

I [Wyse 2018] followed a similar approach of appending the conditional variables to the input of his model
I The 'cAE' is comparable to our proposed CVAE in that the network might potentially learn something from f0 (a code sketch of this baseline follows)
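A corresponding sketch of this baseline, under the same assumed PyTorch setup as the CVAE sketch above (reading f0 as a single normalized scalar appended to the 91 CCs):

import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalAE(nn.Module):
    def __init__(self, in_dim=91, hid_dim=91, z_dim=32):
        super().__init__()
        self.enc1 = nn.Linear(in_dim + 1, hid_dim)  # input: CCs with f0 appended
        self.enc2 = nn.Linear(hid_dim, z_dim)
        self.dec1 = nn.Linear(z_dim, hid_dim)
        self.dec2 = nn.Linear(hid_dim, in_dim + 1)  # reconstructs (CCs', f0')

    def forward(self, cc, f0):                      # f0: shape (batch, 1)
        x = torch.cat([cc, f0], dim=1)              # append pitch to the input
        z = self.enc2(F.leaky_relu(self.enc1(x)))
        return self.dec2(F.leaky_relu(self.dec1(z)))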
Reconstruction

Figure: MSE vs target pitch for the AE, CVAE and cAE

I We train the cAE on the octave endpoints and evaluate performance as shown above
I Appending f0 does not improve performance; it is in fact worse than the plain AE
Generation
I We are interested in 'synthesizing' audio
I We see how well the network can generate an instance of a desired pitch (one the network has not been trained on)

Figure: Sampling from the network (Z, f0 → Decoder)

I We follow the procedure in [Blaauw and Bonada 2016] - a random walk to sample points coherently from the latent space, as sketched below
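A sketch of this sampling procedure (the step size and the N(0, I) starting point are assumptions; names are ours):

import torch

def latent_random_walk(n_frames, z_dim=32, step=0.05):
    zs = [torch.randn(z_dim)]                        # start from the N(0, I) prior
    for _ in range(n_frames - 1):
        zs.append(zs[-1] + step * torch.randn(z_dim))  # small steps keep frames coherent
    return torch.stack(zs)                           # one latent point per synthesis frame

Each sampled point, together with the desired f0, is passed through the decoder to produce a frame-wise spectral envelope, which is then rendered with the harmonic model.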
Generation
I We train on instances across the entire octave sans MIDI 65, and then generate MIDI 65
I On listening, the generated audio still lacks the soft, noisy sound of the violin bowing (audio examples 12-14)
I For a more realistic synthesis, we generate a violin note with vibrato (audio example 15)
Putting it all together
I We explored autoencoder frameworks as generative models for the audio synthesis of instrumental tones
I Our parametric representation decouples 'timbre' and 'pitch', relying on the network to model their inter-dependencies
I Pitch conditioning allows us to generate the learnt spectral envelope for a given pitch, thus enabling us to vary the pitch contour continuously
Putting it all together
I Our model is still far from perfect, and we have to improve it further:
1 An immediate step is to model the residual of the input audio; this will help make the generated audio sound more natural
2 We have not taken dynamics into account; a more complete system will capture the spectral envelope (timbral) dependencies on both pitch and loudness dynamics
3 We plan more formal listening tests, involving synthesis of larger pitch movements and melodic elements from Indian Classical music
I Our contributions:
1 To the best of our knowledge, no prior work uses a parametric model for musical tones in the neural synthesis framework, especially one exploiting the conditioning function of the CVAE
2 The TAE method has no known Python implementation, so we plan to make our code open-source to the MIR community for research
References

[Blaauw and Bonada 2016] Blaauw, M. and Bonada, J. (2016). Modeling and transforming speech using variational autoencoders. In Interspeech, pages 1770–1774.

[Caetano and Rodet 2012] Caetano, M. and Rodet, X. (2012). A source-filter model for musical instrument sound transformation. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 137–140. IEEE.

[Doersch 2016] Doersch, C. (2016). Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908.

[Engel et al 2017] Engel, J., Resnick, C., Roberts, A., Dieleman, S., Norouzi, M., Eck, D., and Simonyan, K. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1068–1077. JMLR.org.

[Esling et al 2018] Esling, P., Bitton, A., et al. (2018). Generative timbre spaces: regularizing variational auto-encoders with perceptual metrics. arXiv preprint arXiv:1805.08501.

[Griffin and Lim 1984] Griffin, D. and Lim, J. (1984). Signal estimation from modified short-time Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2):236–243.

[Hinton and Salakhutdinov 2006] Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507.

[IMAI 1979] Imai, S. (1979). Spectral envelope extraction by improved cepstrum. IEICE, 62:217–228.

[Kingma and Ba 2014] Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

[Kingma and Welling 2013] Kingma, D. P. and Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.

[Oord et al 2016] Oord, A. v. d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., and Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.

[Roche et al 2018] Roche, F., Hueber, T., Limier, S., and Girin, L. (2018). Autoencoders for music sound modeling: a comparison of linear, shallow, deep, recurrent and variational models. arXiv preprint arXiv:1806.04096.

[Roebel and Rodet 2005] Roebel, A. and Rodet, X. (2005). Efficient spectral envelope estimation and its application to pitch shifting and envelope preservation. In International Conference on Digital Audio Effects, pages 30–35, Madrid, Spain.

[Romani Picas et al 2015] Romani Picas, O., Parra Rodriguez, H., Dabiri, D., Tokuda, H., Hariya, W., Oishi, K., and Serra, X. (2015). A real-time system for measuring sound goodness in instrumental sounds. In Audio Engineering Society Convention 138. Audio Engineering Society.

[Sarroff and Casey 2014] Sarroff, A. M. and Casey, M. A. (2014). Musical audio synthesis using autoencoding neural nets. In ICMC.

[Serra et al 1997] Serra, X. et al. (1997). Musical sound modeling with sinusoids plus noise. Musical Signal Processing, pages 91–122.

[Slawson 1981] Slawson, W. (1981). The color of sound: a theoretical study in musical timbre. Music Theory Spectrum, 3:132–141.

[Sohn et al 2015] Sohn, K., Lee, H., and Yan, X. (2015). Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems, pages 3483–3491.

[Wyse 2018] Wyse, L. (2018). Real-valued parametric conditioning of an RNN for interactive sound synthesis. arXiv preprint arXiv:1805.10808.
Audio examples description
1 Input MIDI 63 to Spectral Model
2 Spectral Model reconstruction (trained on MIDI 63)
3 Spectral Model reconstruction (not trained on MIDI 63)
4 Input MIDI 60 note to Parametric Model
5 Parametric reconstruction of input note
6 Input MIDI 63 note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note (endpoint-trained model)
10 Parametric AE reconstruction of input (endpoint-trained model)
11 Parametric CVAE reconstruction of input (endpoint-trained model)
12 CVAE generated MIDI 65 violin note
13 Similar MIDI 65 violin note from the dataset
14 CVAE reconstruction of the MIDI 65 violin note
15 CVAE generated MIDI 65 violin note with vibrato
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1836
Network Architecture
CCs
f0
En
cod
er
Z
Dec
od
er
CVAE
I Network input is CCs rarr MSE represents perceptually relevantdistance in terms of squared error between the input andreconstructed log magnitude spectral envelopes
I We train the network on frames from the train instances
I For evaluation MSE values calculated ahead is the averagereconstruction error across all the test instance frames
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
L prop MSE + βKLD
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2136
Network Architecture
I Network size [91 91 32 91 91]I 91 is the dimension of input CCs 32 is latent space
dimensionality
I Linear Fully Connected Layers + Leaky ReLU activations
I ADAM [Kingma and Ba 2014] with initial lr = 10minus3
I Training for 2000 epochs with batch size 512
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2436
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAE
I Both AE and CVAE reasonably reconstruct the target pitch1 6 2 7 3 8
I CVAE produces better reconstruction especially when thetarget pitch is far from the pitches available in the trainingdata(plot above) 4 9 5 10 6 11
I Conditioning helps to capture the pitch dependency of thespectral envelope more accurately
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton8)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton9)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton10)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton11)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton12)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton13)ocgs[i]state=false
1536
2536
Reconstruction
I To emulate the effect of pitch conditioning with an AE wetrain the AE by appending the pitch to the input CCs andreconstructing this appended input as shown below
f0
CCscAE
fprime
0
CCsprime
Figure lsquoConditionalrsquo AE(cAE)
I [Wyse 2018] followed a similar approach of appending theconditional variables to the input of his model
I the lsquocAErsquo is comparable to our proposed CVAE in that thenetwork might potentially learn something from f0
2636
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAEcAE
I We train the cAE on the octave endpoints and evaluateperformance as shown above
I Appending f0 does not improve performance It is infact worsethan AE
2736
Generation
I We are interested in lsquosynthesizingrsquo audio
I We see how well the network can generate an instance of adesired pitch(which the network has not been trained on)
Z
Dec
od
er
f0
Figure Sampling from the Network
I We follow the procedure in [Blaauw and Bonada 2016] - arandom walk to sample points coherently from the latent space
2836
Generation
I We train on instances across the entire octave sans MIDI 65and then generate MIDI 65
I On listening we can see that the generated audio still lacks thesoft noisy sound of the violin bowing
1 12 2 13 3 14
I For a more realistic synthesis we generate a violin note withvibrato 4 15
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton14)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton15)ocgs[i]state=false
2256
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton16)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton17)ocgs[i]state=false
2184
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
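For contribution 2, a minimal NumPy sketch of the TAE iteration (iterative cepstral liftering, after [Roebel and Rodet, 2005] and [Imai, 1979]): starting from the log-magnitude spectrum, spectral valleys are repeatedly filled with the current cepstrally smoothed envelope until the envelope passes within Δ dB of every peak. Function and parameter names here are illustrative, not the released API.

import numpy as np

def true_amplitude_envelope(spectrum, f0, fs, delta_db=2.0, max_iter=100):
    # Illustrative sketch. spectrum: one full-length complex FFT frame;
    # f0 and fs set the lifter order Kcc <= fs / (2 * f0), the maximum
    # usable number of cepstral coefficients for that pitch.
    A0 = 20 * np.log10(np.abs(spectrum) + 1e-12)   # log-magnitude in dB
    N = len(A0)
    Kcc = int(fs / (2 * f0))
    A = A0.copy()
    V = np.full(N, -np.inf)
    for _ in range(max_iter):
        A = np.maximum(A, V)            # fill valleys with current envelope
        c = np.fft.ifft(A).real         # real cepstrum of the updated curve
        c[Kcc + 1 : N - Kcc] = 0.0      # low-quefrency lifter: keep Kcc coeffs
        V = np.fft.fft(c).real          # smoothed envelope estimate
        if np.max(A0 - V) < delta_db:   # envelope within delta of all peaks
            break
    return V, c[:Kcc + 1]               # dB envelope + lifted cepstral coeffs

The retained cepstral coefficients, zero-padded to 91 dimensions, form the network input; sampling the envelope V at the harmonic frequencies recovers harmonic amplitudes for synthesis.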
References I
[Blaauw and Bonada, 2016] Blaauw, M. and Bonada, J. (2016). Modeling and transforming speech using variational autoencoders. In Interspeech, pages 1770–1774.

[Caetano and Rodet, 2012] Caetano, M. and Rodet, X. (2012). A source-filter model for musical instrument sound transformation. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 137–140. IEEE.

[Doersch, 2016] Doersch, C. (2016). Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908.

[Engel et al., 2017] Engel, J., Resnick, C., Roberts, A., Dieleman, S., Norouzi, M., Eck, D., and Simonyan, K. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1068–1077. JMLR.org.

[Esling et al., 2018] Esling, P., Bitton, A., et al. (2018). Generative timbre spaces: regularizing variational auto-encoders with perceptual metrics. arXiv preprint arXiv:1805.08501.
References II
[Griffin and Lim, 1984] Griffin, D. and Lim, J. (1984). Signal estimation from modified short-time Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2):236–243.

[Hinton and Salakhutdinov, 2006] Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507.

[Imai, 1979] Imai, S. (1979). Spectral envelope extraction by improved cepstrum. IEICE, 62:217–228.

[Kingma and Ba, 2014] Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

[Kingma and Welling, 2013] Kingma, D. P. and Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.

[Oord et al., 2016] Oord, A. v. d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., and Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.
References III
[Roche et al., 2018] Roche, F., Hueber, T., Limier, S., and Girin, L. (2018). Autoencoders for music sound modeling: a comparison of linear, shallow, deep, recurrent and variational models. arXiv preprint arXiv:1806.04096.

[Roebel and Rodet, 2005] Roebel, A. and Rodet, X. (2005). Efficient spectral envelope estimation and its application to pitch shifting and envelope preservation. In International Conference on Digital Audio Effects, pages 30–35, Madrid, Spain.

[Romani Picas et al., 2015] Romani Picas, O., Parra Rodriguez, H., Dabiri, D., Tokuda, H., Hariya, W., Oishi, K., and Serra, X. (2015). A real-time system for measuring sound goodness in instrumental sounds. In Audio Engineering Society Convention 138. Audio Engineering Society.

[Sarroff and Casey, 2014] Sarroff, A. M. and Casey, M. A. (2014). Musical audio synthesis using autoencoding neural nets. In ICMC.

[Serra et al., 1997] Serra, X. et al. (1997). Musical sound modeling with sinusoids plus noise. Musical Signal Processing, pages 91–122.
References IV
[Slawson, 1981] Slawson, W. (1981). The color of sound: a theoretical study in musical timbre. Music Theory Spectrum, 3:132–141.

[Sohn et al., 2015] Sohn, K., Lee, H., and Yan, X. (2015). Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems, pages 3483–3491.

[Wyse, 2018] Wyse, L. (2018). Real-valued parametric conditioning of an RNN for interactive sound synthesis. arXiv preprint arXiv:1805.10808.
Audio examples description I
1. Input MIDI 63 to Spectral Model
2. Spectral Model reconstruction (trained on MIDI 63)
3. Spectral Model reconstruction (not trained on MIDI 63)
4. Input MIDI 60 note to Parametric Model
5. Parametric reconstruction of input note
6. Input MIDI 63 note
7. Parametric AE reconstruction of input
8. Parametric CVAE reconstruction of input
9. Input MIDI 65 note (endpoint-trained model)
10. Parametric AE reconstruction of input (endpoint-trained model)
11. Parametric CVAE reconstruction of input (endpoint-trained model)
12. CVAE-generated MIDI 65 violin note
13. Similar MIDI 65 violin note from dataset
Audio examples description II
14. CVAE reconstruction of the MIDI 65 violin note
15. CVAE-generated MIDI 65 violin note with vibrato
236
Audio Synthesis
I What comes to your mind when you hear lsquoAudio Synthesisrsquo
I Early analog synthesizers used voltage controlled oscillatorsfilters amplifiers to generate the waveform and lsquoenvelopegeneratorsrsquo to shape it
I Data-driven statistical modeling + computing power=rArr Deep Learning for audio synthesis
236
Audio Synthesis
I What comes to your mind when you hear lsquoAudio Synthesisrsquo
I Early analog synthesizers used voltage controlled oscillatorsfilters amplifiers to generate the waveform and lsquoenvelopegeneratorsrsquo to shape it
I Data-driven statistical modeling + computing power=rArr Deep Learning for audio synthesis
336
Generative Models for Audio Synthesis
I Rely on ability of algorithms to extract musically relevantinformation from vast amounts of data
I Autoregressive modeling Generative Adversarial Networks andVariational Autoencoders are some of the proposed generativemodeling methods
I Most methods in literature try to model audio signals directlyin the time or frequency domain
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
Our Nearest Neighbours
▶ A major issue with the previous methods was their frame-wise analysis-synthesis based reconstruction, which fails to model temporal evolution
▶ [Engel et al 2017], inspired by WaveNet's [Oord et al 2016] autoregressive modeling capabilities for speech, extended it to musical instrument synthesis
  ✓ Generation of realistic and creative sounds
  ✗ Complex network that was difficult to train and sample from
▶ [Wyse 2018] also modelled the audio autoregressively, albeit by conditioning the waveform samples on additional parameters like pitch, velocity (loudness) and instrument class
  ✓ Achieved better control over generation through conditioning
  ✗ Unable to generalize to untrained pitches
Why Parametric?
▶ Rather than generating new timbres ('interpolating' across sounds), we consider the problem of synthesizing a given instrument sound with flexible control over the pitch and loudness dynamics
▶ Pitch shifting without timbre modification uses a source-filter model, with the filter (spectral envelope) kept constant [Roebel and Rodet 2005]
▶ A powerful parametric representation, over raw waveform or spectrogram, has the potential to achieve high quality with less training data
  1. [Blaauw and Bonada 2016] recognized this in the context of speech synthesis and used a vocoder representation to train a generative model, achieving promising results
Dataset
▶ Good-sounds dataset [Romani Picas et al 2015], consisting of individual note and scale recordings for 12 different instruments
▶ We work with the 'Violin', played in mezzo-forte loudness, and choose the 4th octave (MIDI 60-71)
  Why we chose the Violin: popular in Indian music, human voice-like timbre, and the ability to produce continuous pitch
Figure: Instances per note in the overall dataset (roughly 26-34 instances per MIDI note)
▶ Average note duration for the chosen octave is about 4.5 s per note
Dataset
▶ We split the data into train (80%) and test (20%) instances across MIDI note labels
▶ Our model is trained with frames (duration 21.3 ms) from the train instances, and system performance is evaluated with frames from the test instances
Non-Parametric Reconstruction
▶ The framewise magnitude spectra reconstruction procedure in [Roche et al 2018] cannot generalize to untrained pitches
▶ We consider two cases - training including and excluding MIDI 63, along with its 3 neighbouring pitches on either side
Figure: Block diagram - sustain segment → FFT → Encoder → Z → Decoder (AE)
▶ Repeat the above across all input spectral frames, and invert the obtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
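As a rough sketch of this analysis-inversion loop (not the authors' code; the STFT parameters and file name below are assumptions), librosa's Griffin-Lim implementation can stand in for the inversion step:

```python
import numpy as np
import librosa

N_FFT, HOP = 1024, 256  # assumed STFT parameters

y, sr = librosa.load("violin_note.wav", sr=None)          # hypothetical input file
S = np.abs(librosa.stft(y, n_fft=N_FFT, hop_length=HOP))  # magnitude spectrogram

# Frame-wise encode/decode with the autoencoder would go here;
# S_hat stands in for the network's reconstructed magnitude frames.
S_hat = S

# Griffin-Lim iteratively estimates a phase consistent with S_hat
y_hat = librosa.griffinlim(S_hat, n_iter=60, hop_length=HOP, win_length=N_FFT)
```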
Non-Parametric Reconstruction
Figure: Input MIDI 63 (audio example 1)
Figure: Including MIDI 63 (audio example 2)    Figure: Excluding MIDI 63 (audio example 3)
Parametric Model
1. Frame-wise magnitude spectrum → harmonic representation using the Harmonic plus Residual (HpR) model [Serra et al 1997] (currently we neglect the residual)
Figure: Block diagram - sustain segment → FFT → HpR → (f0, harmonic magnitudes)
▶ Output of the HpR block ⇒ log-dB magnitudes + harmonics
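The HpR analysis itself follows [Serra et al 1997]; below is a rough NumPy sketch of the harmonic part only (nearest-bin peak picking at integer multiples of a known f0 — the window, FFT size and sampling rate are assumptions, not the authors' implementation):

```python
import numpy as np
from scipy.signal import get_window

FS, N_FFT = 48_000, 1024  # assumed analysis parameters

def harmonic_magnitudes(frame, f0, n_harm=40):
    """Return log-dB magnitudes at the harmonics of f0 (sketch)."""
    w = get_window("blackmanharris", len(frame))
    spec = np.abs(np.fft.rfft(frame * w, n=N_FFT))
    freqs = np.fft.rfftfreq(N_FFT, 1 / FS)
    mags = []
    for h in range(1, n_harm + 1):
        target = h * f0
        if target > FS / 2:
            break
        idx = np.argmin(np.abs(freqs - target))       # nearest FFT bin to h*f0
        mags.append(20 * np.log10(spec[idx] + 1e-12))  # log-dB magnitude
    return np.array(mags)  # input to the TAE step that follows
```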
Parametric Model
2. log-dB magnitudes + harmonics → TAE algorithm [Roebel and Rodet 2005, IMAI 1979]
▶ TAE ⇒ iterative cepstral liftering:

    A_0(k) = log|X(k)|,  V_0 = −∞
    while (A_i − A_0 < ∆):
        A_i(k) = max(A_{i−1}(k), V_{i−1}(k))
        V_i = FFT(lifter(A_i))

Figure: Block diagram - (f0, magnitudes) → TAE → (CCs, f0)
Parametric Model
▶ No open-source implementation was available for the TAE, so we implemented it following the procedure highlighted in [Roebel and Rodet 2005, Caetano and Rodet 2012]
▶ The figure shows a TAE snapshot from [Caetano and Rodet 2012]
▶ Similar to the results we get (audio examples 4, 5)
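Since the slides only summarize the iteration, here is a minimal NumPy sketch of the cepstral liftering loop as we understand it; the default threshold, iteration cap, and the exact form of the stopping test are assumptions:

```python
import numpy as np

def true_amplitude_envelope(log_mag, k_cc, delta=2.0, max_iter=100):
    """Sketch of TAE iterative cepstral liftering.

    log_mag : log-magnitude half spectrum on a uniform frequency grid, shape (N,)
    k_cc    : number of cepstral coefficients kept by the lifter
    delta   : stopping threshold (assumed value, in the units of log_mag)
    """
    A0 = np.asarray(log_mag, dtype=float)
    A = A0.copy()
    V = np.full_like(A, -np.inf)                 # V_0 = -inf
    for _ in range(max_iter):
        A = np.maximum(A, V)                     # A_i(k) = max(A_{i-1}(k), V_{i-1}(k))
        full = np.concatenate([A, A[-2:0:-1]])   # even symmetry -> real cepstrum
        ceps = np.fft.ifft(full).real
        ceps[k_cc:-k_cc] = 0.0                   # lifter: keep the low-quefrency part
        V = np.fft.fft(ceps).real[: len(A)]      # V_i = FFT(lifter(A_i))
        if np.max(A0 - V) < delta:               # envelope covers the peaks (assumed test)
            break
    return V
```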
Parametric Model
Figure: TAE spectral envelopes for MIDI 60, 65 and 71 (magnitude in dB vs frequency in kHz)
▶ Spectral envelope shape varies across pitch:
  1. Dependence of the envelope on pitch [Slawson 1981, Caetano and Rodet 2012]
  2. Variation due to the TAE algorithm
▶ TAE → a smooth function to estimate harmonic amplitudes
Parametric Model
▶ The number of CCs (cepstral coefficients, Kcc) depends on the sampling rate (Fs) and the pitch (f0):

    Kcc ≤ Fs / (2 f0)

▶ For our network, we choose the maximum Kcc (lowest pitch) = 91 and zero-pad the higher-pitch CCs to 91-dimensional vectors
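As a worked check of this bound (the 48 kHz sampling rate is an assumption, but it is the value consistent with a maximum Kcc of 91 at MIDI 60):

```python
import numpy as np

FS = 48_000  # assumed; gives K_cc = 91 at MIDI 60, matching the slide

def midi_to_hz(m):
    return 440.0 * 2 ** ((m - 69) / 12)

def max_ccs(f0, fs=FS):
    return int(np.floor(fs / (2 * f0)))   # K_cc <= Fs / (2 f0)

for midi in (60, 65, 71):
    f0 = midi_to_hz(midi)
    cc = np.zeros(91)                     # zero-padded 91-dim CC vector
    # cc[:max_ccs(f0)] would hold the actual coefficients for this pitch
    print(midi, round(f0, 2), max_ccs(f0))  # 60 -> 91, 65 -> 68, 71 -> 48
```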
Generative Models
▶ Autoencoders [Hinton and Salakhutdinov 2006] - minimize the MSE (mean squared error) between the input and the network's reconstruction
  • Simple to train, and perform good reconstruction (since they minimize the MSE)
  • Not truly a generative model, as you cannot generate new data
▶ Variational Autoencoders [Kingma and Welling 2013] - inspired by variational inference; enforce a prior on the latent space
  • Truly generative, as we can 'generate' new data by sampling from the prior
▶ Conditional Variational Autoencoders [Doersch 2016, Sohn et al 2015] - same principle as a VAE, but learn the distribution conditioned on an additional conditioning variable
Generative Models
Why VAE over AE?
  A continuous latent space from which we can sample points (and synthesize the corresponding audio)
Why CVAE over VAE?
  Conditioning on pitch ⇒ the network captures dependencies between timbre and pitch ⇒ more accurate envelope generation + pitch control
Network Architecture
Figure: Block diagram - (CCs, f0) → Encoder → Z → Decoder (CVAE)
▶ The network input is the CCs → the MSE represents a perceptually relevant distance, in terms of the squared error between the input and reconstructed log-magnitude spectral envelopes
▶ We train the network on frames from the train instances
▶ For evaluation, the MSE values reported ahead are the average reconstruction error across all the test instance frames
Network Architecture
▶ Main hyperparameters -
  1. β - controls the relative weighting between reconstruction and prior enforcement:

     L ∝ MSE + β · KLD

Figure: CVAE, varying β (MSE vs latent space dimensionality, for β = 0.01, 0.1, 1)
▶ Tradeoff between both terms; we choose β = 0.1
Network Architecture
▶ Main hyperparameters -
  2. Dimensionality of the latent space - governs the network's reconstruction ability
Figure: CVAE (β = 0.1) vs AE (MSE vs latent space dimensionality)
▶ Steep fall initially, flatter later; we choose dimensionality = 32
Network Architecture
▶ Network size: [91, 91, 32, 91, 91]
  ▶ 91 is the dimension of the input CCs; 32 is the latent space dimensionality
▶ Linear fully connected layers + Leaky ReLU activations
▶ ADAM [Kingma and Ba 2014] with initial lr = 10^−3
▶ Training for 2000 epochs with batch size 512
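A minimal PyTorch sketch consistent with these hyperparameters; how f0 is injected (here, concatenated to both encoder and decoder inputs) is an assumption, while the loss follows the β-weighted form above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CVAE(nn.Module):
    """[91, 91, 32, 91, 91] CVAE sketch; f0 conditioning by concatenation is assumed."""
    def __init__(self, in_dim=91, hid=91, z_dim=32, cond_dim=1):
        super().__init__()
        self.enc = nn.Linear(in_dim + cond_dim, hid)
        self.mu = nn.Linear(hid, z_dim)
        self.logvar = nn.Linear(hid, z_dim)
        self.dec1 = nn.Linear(z_dim + cond_dim, hid)
        self.dec2 = nn.Linear(hid, in_dim)

    def decode(self, z, f0):
        h = F.leaky_relu(self.dec1(torch.cat([z, f0], dim=-1)))
        return self.dec2(h)

    def forward(self, x, f0):
        h = F.leaky_relu(self.enc(torch.cat([x, f0], dim=-1)))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.decode(z, f0), mu, logvar

def loss_fn(x_hat, x, mu, logvar, beta=0.1):
    mse = F.mse_loss(x_hat, x)                                       # reconstruction
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())   # prior (KLD)
    return mse + beta * kld                                          # L ∝ MSE + β·KLD

model = CVAE()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)  # ADAM, initial lr = 10^-3
```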
Experiments
▶ Two kinds of experiments to demonstrate the network's capabilities:
  1. Reconstruction - omit pitch instances during training, and see how well the model reconstructs notes of the omitted target pitch
  2. Generation - how well the model 'synthesizes' note instances of new, unseen pitches
Reconstruction
▶ Two training contexts -
  1. T (✗) is the target; ✓ marks training instances:

     MIDI   T−3  T−2  T−1   T   T+1  T+2  T+3
     Kept    ✓    ✓    ✓    ✗    ✓    ✓    ✓

  2. Octave endpoints:

     MIDI   60  61  62  63  64  65  66  67  68  69  70  71
     Kept    ✓   ✗   ✗   ✗   ✗   ✗   ✗   ✗   ✗   ✗   ✗   ✓

▶ In each of the above cases, we compute the MSE as the frame-wise spectral envelope match across all frames of all the target instances
Reconstruction
Figure: MSE vs target pitch, for the AE and the CVAE
▶ Both the AE and the CVAE reasonably reconstruct the target pitch (audio examples 6, 7, 8)
▶ The CVAE produces a better reconstruction, especially when the target pitch is far from the pitches available in the training data (plot above; audio examples 9, 10, 11)
▶ Conditioning helps to capture the pitch dependency of the spectral envelope more accurately
Reconstruction
▶ To emulate the effect of pitch conditioning with an AE, we train the AE by appending the pitch to the input CCs and reconstructing this appended input, as shown below
Figure: 'Conditional' AE (cAE) - (CCs, f0) → cAE → (CCs′, f0′)
▶ [Wyse 2018] followed a similar approach of appending the conditional variables to the input of his model
▶ The cAE is comparable to our proposed CVAE, in that the network might potentially learn something from f0
Reconstruction
Figure: MSE vs target pitch, for the AE, CVAE and cAE
▶ We train the cAE on the octave endpoints and evaluate its performance as shown above
▶ Appending f0 does not improve performance; it is in fact worse than the AE
Generation
▶ We are interested in 'synthesizing' audio
▶ We see how well the network can generate an instance of a desired pitch (which the network has not been trained on)
Figure: Sampling from the network - Z → Decoder, conditioned on f0
▶ We follow the procedure in [Blaauw and Bonada 2016] - a random walk to sample points coherently from the latent space
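A minimal sketch of this sampling step, reusing the decode() method of the CVAE sketch above; the step size of the walk is an assumption:

```python
import torch

@torch.no_grad()
def random_walk_generate(model, f0_frames, z_dim=32, step=0.05):
    """Decode a latent random walk into frame-wise CCs (step size assumed)."""
    z = torch.randn(1, z_dim)                  # start from the prior N(0, I)
    frames = []
    for f0 in f0_frames:                       # f0_frames: (T, 1) per-frame pitch
        z = z + step * torch.randn_like(z)     # small steps -> coherent trajectory
        frames.append(model.decode(z, f0.view(1, -1)))
    return torch.cat(frames)                   # (T, 91) cepstral coefficient frames
```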
Generation
▶ We train on instances across the entire octave sans MIDI 65, and then generate MIDI 65
▶ On listening, we find the generated audio still lacks the soft, noisy sound of the violin bowing (audio examples 12, 13, 14)
▶ For a more realistic synthesis, we generate a violin note with vibrato, as sketched below (audio example 15)
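Generating vibrato amounts to decoding along a modulated pitch contour; a small sketch (the frame rate, vibrato rate and depth below are all assumed values, not from the slides):

```python
import numpy as np

def vibrato_f0(f0_hz, dur_s, frame_rate=187.5, rate_hz=6.0, depth_cents=30.0):
    """Per-frame f0 contour with sinusoidal vibrato (all defaults assumed)."""
    t = np.arange(int(dur_s * frame_rate)) / frame_rate
    cents = depth_cents * np.sin(2 * np.pi * rate_hz * t)
    return f0_hz * 2.0 ** (cents / 1200.0)   # cents -> frequency ratio

# e.g. a 2 s MIDI 65 note; this contour would condition the decoder frame by frame
contour = vibrato_f0(349.23, 2.0)
```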
Putting it all together
▶ We explored autoencoder frameworks in generative models for the audio synthesis of instrumental tones
▶ Our parametric representation decouples 'timbre' and 'pitch', relying on the network to model their inter-dependencies
▶ Pitch conditioning allows us to generate the learnt spectral envelope for a given pitch, thus enabling us to vary the pitch contour continuously
Putting it all together
▶ Our model is still far from perfect, however, and we have to improve it further:
  1. An immediate next step is to model the residual of the input audio; this will help make the generated audio sound more natural
  2. We have not taken dynamics into account; a more complete system will capture the spectral envelope's (timbral) dependence on both pitch and loudness dynamics
  3. Conducting more formal listening tests, involving the synthesis of larger pitch movements or melodic elements from Indian Classical music
▶ Our contributions:
  1. To the best of our knowledge, we have not come across any work using a parametric model for musical tones in the neural synthesis framework, especially one exploiting the conditioning function of the CVAE
  2. The TAE method has no known Python implementation, so we plan to make our code open-source for the MIR research community
References I
[Blaauw and Bonada 2016] Blaauw, M. and Bonada, J. (2016). Modeling and transforming speech using variational autoencoders. In Interspeech, pages 1770–1774.
[Caetano and Rodet 2012] Caetano, M. and Rodet, X. (2012). A source-filter model for musical instrument sound transformation. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 137–140. IEEE.
[Doersch 2016] Doersch, C. (2016). Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908.
[Engel et al 2017] Engel, J., Resnick, C., Roberts, A., Dieleman, S., Norouzi, M., Eck, D., and Simonyan, K. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1068–1077. JMLR.org.
[Esling et al 2018] Esling, P., Bitton, A., et al. (2018). Generative timbre spaces: regularizing variational auto-encoders with perceptual metrics. arXiv preprint arXiv:1805.08501.

References II
[Griffin and Lim 1984] Griffin, D. and Lim, J. (1984). Signal estimation from modified short-time Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2):236–243.
[Hinton and Salakhutdinov 2006] Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507.
[IMAI 1979] Imai, S. (1979). Spectral envelope extraction by improved cepstrum. IEICE, 62:217–228.
[Kingma and Ba 2014] Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[Kingma and Welling 2013] Kingma, D. P. and Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
[Oord et al 2016] Oord, A. v. d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., and Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.

References III
[Roche et al 2018] Roche, F., Hueber, T., Limier, S., and Girin, L. (2018). Autoencoders for music sound modeling: a comparison of linear, shallow, deep, recurrent and variational models. arXiv preprint arXiv:1806.04096.
[Roebel and Rodet 2005] Roebel, A. and Rodet, X. (2005). Efficient spectral envelope estimation and its application to pitch shifting and envelope preservation. In International Conference on Digital Audio Effects, pages 30–35, Madrid, Spain.
[Romani Picas et al 2015] Romani Picas, O., Parra Rodriguez, H., Dabiri, D., Tokuda, H., Hariya, W., Oishi, K., and Serra, X. (2015). A real-time system for measuring sound goodness in instrumental sounds. In Audio Engineering Society Convention 138. Audio Engineering Society.
[Sarroff and Casey 2014] Sarroff, A. M. and Casey, M. A. (2014). Musical audio synthesis using autoencoding neural nets. In ICMC.
[Serra et al 1997] Serra, X. et al. (1997). Musical sound modeling with sinusoids plus noise. Musical Signal Processing, pages 91–122.

References IV
[Slawson 1981] Slawson, W. (1981). The color of sound: a theoretical study in musical timbre. Music Theory Spectrum, 3:132–141.
[Sohn et al 2015] Sohn, K., Lee, H., and Yan, X. (2015). Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems, pages 3483–3491.
[Wyse 2018] Wyse, L. (2018). Real-valued parametric conditioning of an RNN for interactive sound synthesis. arXiv preprint arXiv:1805.10808.
Audio examples description I
1. Input MIDI 63 to the spectral model
2. Spectral model reconstruction (trained on MIDI 63)
3. Spectral model reconstruction (not trained on MIDI 63)
4. Input MIDI 60 note to the parametric model
5. Parametric reconstruction of the input note
6. Input MIDI 63 note
7. Parametric AE reconstruction of the input
8. Parametric CVAE reconstruction of the input
9. Input MIDI 65 note (endpoint-trained model)
10. Parametric AE reconstruction of the input (endpoint-trained model)
11. Parametric CVAE reconstruction of the input (endpoint-trained model)
12. CVAE-generated MIDI 65 violin note
13. Similar MIDI 65 violin note from the dataset

Audio examples description II
14. CVAE reconstruction of the MIDI 65 violin note
15. CVAE-generated MIDI 65 violin note with vibrato
236
Audio Synthesis
I What comes to your mind when you hear lsquoAudio Synthesisrsquo
I Early analog synthesizers used voltage controlled oscillatorsfilters amplifiers to generate the waveform and lsquoenvelopegeneratorsrsquo to shape it
I Data-driven statistical modeling + computing power=rArr Deep Learning for audio synthesis
336
Generative Models for Audio Synthesis
I Rely on ability of algorithms to extract musically relevantinformation from vast amounts of data
I Autoregressive modeling Generative Adversarial Networks andVariational Autoencoders are some of the proposed generativemodeling methods
I Most methods in literature try to model audio signals directlyin the time or frequency domain
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]
X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative sounds
times Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
636
Why Parametric
I Rather than generating new timbres(lsquointerpolatingrsquo acrosssounds) we consider the problem of synthesis of a giveninstrument sound with flexible control over the pitch andloudness dynamics
I Pitch shifting without timbre modification uses a source-filtermodel with the filter(spectral envelope) being kept constant[Roebel and Rodet 2005]
I A powerful parametric representation over raw waveform orspectrogram has the potential to achieve high quality with lesstraining data
1 [Blaauw and Bonada 2016] recognized this in context ofspeech synthesis and used a vocoder representation to train agenerative model achieving promising results along the way
636
Why Parametric
I Rather than generating new timbres(lsquointerpolatingrsquo acrosssounds) we consider the problem of synthesis of a giveninstrument sound with flexible control over the pitch andloudness dynamics
I Pitch shifting without timbre modification uses a source-filtermodel with the filter(spectral envelope) being kept constant[Roebel and Rodet 2005]
I A powerful parametric representation over raw waveform orspectrogram has the potential to achieve high quality with lesstraining data
1 [Blaauw and Bonada 2016] recognized this in context ofspeech synthesis and used a vocoder representation to train agenerative model achieving promising results along the way
736
Dataset
I Good-sounds dataset [Romani Picas et al 2015] consisting ofindividual note and scale recordings for 12 different instruments
I We work with the lsquoViolinrsquo played in mezzo-forte loudnessand choose the 4th octave(MIDI 60-71)
0 5 10 15 20 25 30 35
606162636465666768697071
282626262626
343030
262626
Instances
MID
I
Figure Instances per note in the overall dataset
I Average note duration for the chosen octave is about 45s pernote
736
Dataset
I Good-sounds dataset [Romani Picas et al 2015] consisting ofindividual note and scale recordings for 12 different instruments
I We work with the lsquoViolinrsquo played in mezzo-forte loudnessand choose the 4th octave(MIDI 60-71)
Why we chose ViolinPopular in Indian Music Human voice-like timbre
Ability to produce continuous pitch
0 5 10 15 20 25 30 35
606162636465666768697071
282626262626
343030
262626
Instances
MID
I
Figure Instances per note in the overall dataset
I Average note duration for the chosen octave is about 45s pernote
836
Dataset
I We split the data to train(80) and test(20) instances acrossMIDI note labels
I Our model is trained with frames(duration 213ms) from thetrain instances and system performance is evaluated withframes from the test instances
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
1036
Non-Parametric Reconstruction
Figure Input MIDI 63 1 1
Figure Including MIDI 63 2 2 Figure Excluding MIDI 63 3 3
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton1)ocgs[i]state=false
1728
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton3)ocgs[i]state=false
1728
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton5)ocgs[i]state=false
1728
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1336
Parametric Model
I No open source implementation available for the TAE thus weimplemented it following procedure highlighted in[Roebel and Rodet 2005 Caetano and Rodet 2012]
I Figure shows a TAE snap from [Caetano and Rodet 2012]
I Similar to results we get 1 4 2 5
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton6)ocgs[i]state=false
3216
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton7)ocgs[i]state=false
2256
1436
Parametric Model
0 1 2 3 4 5minus80
minus60
minus40
minus20
Frequency(kHz)Magnitude(dB)
606571
I Spectral envelope shape varies across pitch
1 Dependence of envelope on pitch[Slawson 1981 Caetano and Rodet 2012]
2 Variation due the TAE algorithm
I TAE rarr smooth function to estimate harmonic amplitudes
1536
Parametric Model
I No of CCs(Cepstral CoefficientsKcc) depends on Samplingrate(Fs)pitch(f0)
Kcc leFs2f0
I For our network we choose the maximum Kcc(lowest pitch) =91 and zero pad the high pitch Kcc to 91 dimension vectors
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
17/36
Generative Models

Why VAE over AE?
A continuous latent space from which we can sample points (and synthesize the corresponding audio).

Why CVAE over VAE?
Conditioning on pitch ⇒ the network captures dependencies between timbre and pitch ⇒ more accurate envelope generation + pitch control.
18/36
Network Architecture

[Block diagram of the CVAE: (CCs, f0) → Encoder → Z → Decoder (conditioned on f0) → CCs]

▶ Network input is CCs → the MSE represents a perceptually relevant distance: the squared error between the input and reconstructed log-magnitude spectral envelopes
▶ We train the network on frames from the train instances
▶ For evaluation, the MSE values reported ahead are the average reconstruction error across all test-instance frames
19/36
Network Architecture

▶ Main hyperparameters:
1. β - controls the relative weighting between reconstruction and prior enforcement (written out in code below):

    L ∝ MSE + β · KLD

[Figure: CVAE, varying β: MSE (×10⁻⁵) vs latent space dimensionality for β = 0.01, 0.1, 1]

▶ Tradeoff between both terms; we choose β = 0.1
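As a sketch, the weighted objective above can be written out for a Gaussian encoder posterior q(z|x) = N(mu, sigma²) against a standard-normal prior; this is the usual β-weighted VAE form, shown in PyTorch with names chosen for illustration rather than taken from the authors' code.

```python
import torch
import torch.nn.functional as F

def cvae_loss(x_hat, x, mu, logvar, beta=0.1):
    """L ∝ MSE + β·KLD for a Gaussian posterior and a N(0, I) prior."""
    mse = F.mse_loss(x_hat, x, reduction="mean")
    # KL( N(mu, sigma^2) || N(0, I) ), averaged over the batch
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return mse + beta * kld
```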
20/36
Network Architecture

▶ Main hyperparameters:
2. Dimensionality of the latent space - governs the network's reconstruction ability

[Figure: CVAE (β = 0.1) vs AE: MSE (×10⁻⁵) vs latent space dimensionality]

▶ Steep fall initially, flatter later; we choose dimensionality = 32
21/36
Network Architecture

▶ Network size: [91, 91, 32, 91, 91] (sketched below)
  • 91 is the dimension of the input CCs; 32 is the latent space dimensionality
▶ Linear fully connected layers + Leaky ReLU activations
▶ ADAM [Kingma and Ba 2014] with initial lr = 10⁻³
▶ Training for 2000 epochs with batch size 512
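A hedged PyTorch sketch consistent with the sizes quoted above (layer widths [91, 91, 32, 91, 91], linear layers with Leaky ReLU, ADAM at lr = 10⁻³): the pitch label is concatenated to both the encoder and decoder inputs. How f0 is encoded (here, a single scalar feature) is our assumption; the slide does not specify it.

```python
import torch
import torch.nn as nn

class CVAE(nn.Module):
    def __init__(self, x_dim=91, cond_dim=1, hidden=91, z_dim=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim + cond_dim, hidden), nn.LeakyReLU())
        self.mu = nn.Linear(hidden, z_dim)
        self.logvar = nn.Linear(hidden, z_dim)
        self.dec = nn.Sequential(
            nn.Linear(z_dim + cond_dim, hidden), nn.LeakyReLU(),
            nn.Linear(hidden, x_dim),
        )

    def forward(self, x, f0):
        h = self.enc(torch.cat([x, f0], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        return self.dec(torch.cat([z, f0], dim=-1)), mu, logvar

model = CVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)  # ADAM, initial lr = 1e-3
```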
22/36
Experiments

▶ Two kinds of experiments to demonstrate the network's capabilities:
1. Reconstruction - omit pitch instances during training and see how well the model reconstructs notes of the omitted target pitch
2. Generation - how well the model 'synthesizes' note instances with new, unseen pitches
23/36
Reconstruction

▶ Two training contexts:

1. T (✗) is the target; ✓ marks training instances:

   MIDI | T−3 | T−2 | T−1 |  T  | T+1 | T+2 | T+3
   Kept |  ✓  |  ✓  |  ✓  |  ✗  |  ✓  |  ✓  |  ✓

2. Octave endpoints:

   MIDI | 60 | 61 | 62 | 63 | 64 | 65 | 66 | 67 | 68 | 69 | 70 | 71
   Kept |  ✓ |  ✗ |  ✗ |  ✗ |  ✗ |  ✗ |  ✗ |  ✗ |  ✗ |  ✗ |  ✗ |  ✓

▶ In each of the above cases, we compute the MSE as the frame-wise spectral envelope match across all frames of all the target instances
24/36
Reconstruction

[Figure: reconstruction MSE (×10⁻²) vs target pitch (MIDI 60-72) for AE and CVAE]

▶ Both AE and CVAE reasonably reconstruct the target pitch (audio examples 6, 7, 8)
▶ The CVAE produces better reconstruction, especially when the target pitch is far from the pitches available in the training data (plot above; audio examples 9, 10, 11)
▶ Conditioning helps to capture the pitch dependency of the spectral envelope more accurately
25/36
Reconstruction

▶ To emulate the effect of pitch conditioning with an AE, we train the AE by appending the pitch to the input CCs and reconstructing this appended input, as shown below (a code sketch follows)

[Block diagram: (f0, CCs) → cAE → (f0′, CCs′)]
Figure: 'Conditional' AE (cAE)

▶ [Wyse 2018] followed a similar approach of appending the conditional variables to the input of his model
▶ The 'cAE' is comparable to our proposed CVAE in that the network might potentially learn something from f0
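A sketch of this cAE baseline under the same assumptions as the CVAE sketch earlier (layer widths and the scalar f0 feature are ours): f0 is appended to the CC vector and a plain autoencoder reconstructs the appended 92-dimensional input.

```python
import torch
import torch.nn as nn

class cAE(nn.Module):
    def __init__(self, x_dim=91, hidden=91, z_dim=32):
        super().__init__()
        d = x_dim + 1  # CCs with f0 appended
        self.enc = nn.Sequential(nn.Linear(d, hidden), nn.LeakyReLU(),
                                 nn.Linear(hidden, z_dim))
        self.dec = nn.Sequential(nn.Linear(z_dim, hidden), nn.LeakyReLU(),
                                 nn.Linear(hidden, d))

    def forward(self, ccs, f0):
        x = torch.cat([ccs, f0], dim=-1)  # [CCs, f0] -> reconstruct [CCs', f0']
        return self.dec(self.enc(x)), x
```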
26/36
Reconstruction

[Figure: reconstruction MSE (×10⁻²) vs pitch for AE, CVAE and cAE]

▶ We train the cAE on the octave endpoints and evaluate performance as shown above
▶ Appending f0 does not improve performance; it is in fact worse than the plain AE
27/36
Generation

▶ We are interested in 'synthesizing' audio
▶ We see how well the network can generate an instance of a desired pitch (one the network has not been trained on)

[Block diagram: Z → Decoder (conditioned on f0) → CCs]
Figure: Sampling from the network

▶ We follow the procedure in [Blaauw and Bonada 2016] - a random walk to sample points coherently from the latent space (a sketch follows below)
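A sketch of the latent-space random walk as we understand [Blaauw and Bonada 2016]: consecutive latent points differ by small Gaussian steps, so successive decoded envelopes vary smoothly across frames. The step size and the conditioning scaling are assumptions for illustration.

```python
import torch

def random_walk_latents(n_frames, z_dim=32, step=0.1):
    """Sample a smooth trajectory of latent points for frame-wise decoding."""
    z0 = torch.randn(z_dim)                  # start from the standard-normal prior
    steps = step * torch.randn(n_frames, z_dim)
    return z0 + torch.cumsum(steps, dim=0)   # small correlated steps per frame

# Illustrative use with the CVAE sketch above:
# zs = random_walk_latents(200)
# f0 = torch.full((200, 1), 65.0 / 127)      # conditioning label for MIDI 65 (assumed scaling)
# envelopes = model.dec(torch.cat([zs, f0], dim=-1))
```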
28/36
Generation

▶ We train on instances across the entire octave sans MIDI 65, and then generate MIDI 65
▶ On listening, we find that the generated audio still lacks the soft, noisy sound of the violin bowing (audio examples 12, 13, 14)
▶ For a more realistic synthesis, we generate a violin note with vibrato (audio example 15; a resynthesis sketch follows below)
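For illustration, a numpy sketch of one way to resynthesize a vibrato note from a decoded envelope: sample the envelope at harmonics of a vibrato-modulated f0 (MIDI 65 ≈ 349 Hz) and sum a bank of sinusoids. The vibrato rate and depth, the envelope's linear-frequency grid, and the 48 kHz rate are all assumptions, not the authors' settings.

```python
import numpy as np

def synthesize(envelope_db, f0=349.23, fs=48000, dur=1.0, vib_rate=5.0, vib_cents=20.0):
    """Additive resynthesis of a vibrato note from a log-dB spectral envelope."""
    t = np.arange(int(dur * fs)) / fs
    f_inst = f0 * 2 ** (vib_cents / 1200 * np.sin(2 * np.pi * vib_rate * t))
    phase = 2 * np.pi * np.cumsum(f_inst) / fs           # instantaneous phase
    n_harm = int((fs / 2) // (f0 * 2 ** (vib_cents / 1200)))  # stay below Nyquist
    grid = np.linspace(0, fs / 2, len(envelope_db))      # assumed envelope frequency grid
    out = np.zeros_like(t)
    for h in range(1, n_harm + 1):
        amp_db = np.interp(h * f0, grid, envelope_db)    # envelope value at harmonic h
        out += 10 ** (amp_db / 20) * np.sin(h * phase)
    return out / np.max(np.abs(out))
```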
29/36
Putting it all together

▶ We explored autoencoder frameworks as generative models for audio synthesis of instrumental tones
▶ Our parametric representation decouples 'timbre' and 'pitch', relying on the network to model their inter-dependencies
▶ Pitch conditioning allows us to generate the learnt spectral envelope for a given pitch, enabling us to vary the pitch contour continuously
30/36
Putting it all together

▶ Our model is still far from perfect, however, and we have to improve it further:
1. An immediate step is to model the residual of the input audio; this will help make the generated audio sound more natural
2. We have not taken dynamics into account; a more complete system will capture spectral envelope (timbral) dependencies on both pitch and loudness dynamics
3. Conducting more formal listening tests involving synthesis of larger pitch movements or melodic elements from Indian Classical music

▶ Our contributions:
1. To the best of our knowledge, we have not come across any work using a parametric model for musical tones in the neural synthesis framework, especially one exploiting the conditioning function of the CVAE
2. The TAE method has no known Python implementation, so we plan to make our code open-source to the MIR community for research
31/36
References I

[Blaauw and Bonada 2016] Blaauw, M. and Bonada, J. (2016). Modeling and transforming speech using variational autoencoders. In Interspeech, pages 1770–1774.

[Caetano and Rodet 2012] Caetano, M. and Rodet, X. (2012). A source-filter model for musical instrument sound transformation. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 137–140. IEEE.

[Doersch 2016] Doersch, C. (2016). Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908.

[Engel et al 2017] Engel, J., Resnick, C., Roberts, A., Dieleman, S., Norouzi, M., Eck, D., and Simonyan, K. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. In Proceedings of the 34th International Conference on Machine Learning, volume 70, pages 1068–1077. JMLR.org.

[Esling et al 2018] Esling, P., Bitton, A., et al. (2018). Generative timbre spaces: regularizing variational auto-encoders with perceptual metrics. arXiv preprint arXiv:1805.08501.
32/36
References II

[Griffin and Lim 1984] Griffin, D. and Lim, J. (1984). Signal estimation from modified short-time Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2):236–243.

[Hinton and Salakhutdinov 2006] Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507.

[IMAI 1979] Imai, S. (1979). Spectral envelope extraction by improved cepstrum. IEICE, 62:217–228.

[Kingma and Ba 2014] Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

[Kingma and Welling 2013] Kingma, D. P. and Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.

[Oord et al 2016] Oord, A. v. d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., and Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.
33/36
References III

[Roche et al 2018] Roche, F., Hueber, T., Limier, S., and Girin, L. (2018). Autoencoders for music sound modeling: a comparison of linear, shallow, deep, recurrent and variational models. arXiv preprint arXiv:1806.04096.

[Roebel and Rodet 2005] Roebel, A. and Rodet, X. (2005). Efficient spectral envelope estimation and its application to pitch shifting and envelope preservation. In International Conference on Digital Audio Effects, pages 30–35, Madrid, Spain.

[Romani Picas et al 2015] Romani Picas, O., Parra Rodriguez, H., Dabiri, D., Tokuda, H., Hariya, W., Oishi, K., and Serra, X. (2015). A real-time system for measuring sound goodness in instrumental sounds. In Audio Engineering Society Convention 138. Audio Engineering Society.

[Sarroff and Casey 2014] Sarroff, A. M. and Casey, M. A. (2014). Musical audio synthesis using autoencoding neural nets. In ICMC.

[Serra et al 1997] Serra, X. et al. (1997). Musical sound modeling with sinusoids plus noise. Musical Signal Processing, pages 91–122.
34/36
References IV

[Slawson 1981] Slawson, W. (1981). The color of sound: a theoretical study in musical timbre. Music Theory Spectrum, 3:132–141.

[Sohn et al 2015] Sohn, K., Lee, H., and Yan, X. (2015). Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems, pages 3483–3491.

[Wyse 2018] Wyse, L. (2018). Real-valued parametric conditioning of an RNN for interactive sound synthesis. arXiv preprint arXiv:1805.10808.
35/36
Audio examples description I

1. Input MIDI 63 to Spectral Model
2. Spectral Model reconstruction (trained on MIDI 63)
3. Spectral Model reconstruction (not trained on MIDI 63)
4. Input MIDI 60 note to Parametric Model
5. Parametric reconstruction of input note
6. Input MIDI 63 note
7. Parametric AE reconstruction of input
8. Parametric CVAE reconstruction of input
9. Input MIDI 65 note (endpoint-trained model)
10. Parametric AE reconstruction of input (endpoint-trained model)
11. Parametric CVAE reconstruction of input (endpoint-trained model)
12. CVAE-generated MIDI 65 violin note
13. Similar MIDI 65 violin note from dataset

36/36
Audio examples description II

14. CVAE reconstruction of the MIDI 65 violin note
15. CVAE-generated MIDI 65 violin note with vibrato
336
Generative Models for Audio Synthesis
I Rely on ability of algorithms to extract musically relevantinformation from vast amounts of data
I Autoregressive modeling Generative Adversarial Networks andVariational Autoencoders are some of the proposed generativemodeling methods
I Most methods in literature try to model audio signals directlyin the time or frequency domain
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]
X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative sounds
times Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
636
Why Parametric
I Rather than generating new timbres(lsquointerpolatingrsquo acrosssounds) we consider the problem of synthesis of a giveninstrument sound with flexible control over the pitch andloudness dynamics
I Pitch shifting without timbre modification uses a source-filtermodel with the filter(spectral envelope) being kept constant[Roebel and Rodet 2005]
I A powerful parametric representation over raw waveform orspectrogram has the potential to achieve high quality with lesstraining data
1 [Blaauw and Bonada 2016] recognized this in context ofspeech synthesis and used a vocoder representation to train agenerative model achieving promising results along the way
636
Why Parametric
I Rather than generating new timbres(lsquointerpolatingrsquo acrosssounds) we consider the problem of synthesis of a giveninstrument sound with flexible control over the pitch andloudness dynamics
I Pitch shifting without timbre modification uses a source-filtermodel with the filter(spectral envelope) being kept constant[Roebel and Rodet 2005]
I A powerful parametric representation over raw waveform orspectrogram has the potential to achieve high quality with lesstraining data
1 [Blaauw and Bonada 2016] recognized this in context ofspeech synthesis and used a vocoder representation to train agenerative model achieving promising results along the way
736
Dataset
I Good-sounds dataset [Romani Picas et al 2015] consisting ofindividual note and scale recordings for 12 different instruments
I We work with the lsquoViolinrsquo played in mezzo-forte loudnessand choose the 4th octave(MIDI 60-71)
0 5 10 15 20 25 30 35
606162636465666768697071
282626262626
343030
262626
Instances
MID
I
Figure Instances per note in the overall dataset
I Average note duration for the chosen octave is about 45s pernote
736
Dataset
I Good-sounds dataset [Romani Picas et al 2015] consisting ofindividual note and scale recordings for 12 different instruments
I We work with the lsquoViolinrsquo played in mezzo-forte loudnessand choose the 4th octave(MIDI 60-71)
Why we chose ViolinPopular in Indian Music Human voice-like timbre
Ability to produce continuous pitch
0 5 10 15 20 25 30 35
606162636465666768697071
282626262626
343030
262626
Instances
MID
I
Figure Instances per note in the overall dataset
I Average note duration for the chosen octave is about 45s pernote
836
Dataset
I We split the data to train(80) and test(20) instances acrossMIDI note labels
I Our model is trained with frames(duration 213ms) from thetrain instances and system performance is evaluated withframes from the test instances
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
1036
Non-Parametric Reconstruction
Figure Input MIDI 63 1 1
Figure Including MIDI 63 2 2 Figure Excluding MIDI 63 3 3
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton1)ocgs[i]state=false
1728
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton3)ocgs[i]state=false
1728
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton5)ocgs[i]state=false
1728
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1336
Parametric Model
I No open source implementation available for the TAE thus weimplemented it following procedure highlighted in[Roebel and Rodet 2005 Caetano and Rodet 2012]
I Figure shows a TAE snap from [Caetano and Rodet 2012]
I Similar to results we get 1 4 2 5
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton6)ocgs[i]state=false
3216
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton7)ocgs[i]state=false
2256
1436
Parametric Model
0 1 2 3 4 5minus80
minus60
minus40
minus20
Frequency(kHz)Magnitude(dB)
606571
I Spectral envelope shape varies across pitch
1 Dependence of envelope on pitch[Slawson 1981 Caetano and Rodet 2012]
2 Variation due the TAE algorithm
I TAE rarr smooth function to estimate harmonic amplitudes
1536
Parametric Model
I No of CCs(Cepstral CoefficientsKcc) depends on Samplingrate(Fs)pitch(f0)
Kcc leFs2f0
I For our network we choose the maximum Kcc(lowest pitch) =91 and zero pad the high pitch Kcc to 91 dimension vectors
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1836
Network Architecture
CCs
f0
En
cod
er
Z
Dec
od
er
CVAE
I Network input is CCs rarr MSE represents perceptually relevantdistance in terms of squared error between the input andreconstructed log magnitude spectral envelopes
I We train the network on frames from the train instances
I For evaluation MSE values calculated ahead is the averagereconstruction error across all the test instance frames
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
L prop MSE + βKLD
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2136
Network Architecture
I Network size [91 91 32 91 91]I 91 is the dimension of input CCs 32 is latent space
dimensionality
I Linear Fully Connected Layers + Leaky ReLU activations
I ADAM [Kingma and Ba 2014] with initial lr = 10minus3
I Training for 2000 epochs with batch size 512
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2436
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAE
I Both AE and CVAE reasonably reconstruct the target pitch1 6 2 7 3 8
I CVAE produces better reconstruction especially when thetarget pitch is far from the pitches available in the trainingdata(plot above) 4 9 5 10 6 11
I Conditioning helps to capture the pitch dependency of thespectral envelope more accurately
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton8)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton9)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton10)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton11)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton12)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton13)ocgs[i]state=false
1536
2536
Reconstruction
I To emulate the effect of pitch conditioning with an AE wetrain the AE by appending the pitch to the input CCs andreconstructing this appended input as shown below
f0
CCscAE
fprime
0
CCsprime
Figure lsquoConditionalrsquo AE(cAE)
I [Wyse 2018] followed a similar approach of appending theconditional variables to the input of his model
I the lsquocAErsquo is comparable to our proposed CVAE in that thenetwork might potentially learn something from f0
2636
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAEcAE
I We train the cAE on the octave endpoints and evaluateperformance as shown above
I Appending f0 does not improve performance It is infact worsethan AE
2736
Generation
I We are interested in lsquosynthesizingrsquo audio
I We see how well the network can generate an instance of adesired pitch(which the network has not been trained on)
Z
Dec
od
er
f0
Figure Sampling from the Network
I We follow the procedure in [Blaauw and Bonada 2016] - arandom walk to sample points coherently from the latent space
2836
Generation
I We train on instances across the entire octave sans MIDI 65and then generate MIDI 65
I On listening we can see that the generated audio still lacks thesoft noisy sound of the violin bowing
1 12 2 13 3 14
I For a more realistic synthesis we generate a violin note withvibrato 4 15
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton14)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton15)ocgs[i]state=false
2256
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton16)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton17)ocgs[i]state=false
2184
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate step is to model the residual of the input audio; this will help make the generated audio sound more natural.
2 We have not taken dynamics into account; a more complete system will capture the spectral envelope (timbral) dependencies on both pitch and loudness dynamics.
3 Conducting more formal listening tests, involving synthesis of larger pitch movements or melodic elements from Indian Classical music.
I Our contributions:
1 To the best of our knowledge, we have not come across any work using a parametric model for musical tones in the neural synthesis framework, especially one exploiting the conditioning function of the CVAE.
2 The TAE method has no known Python implementation, so we plan to make our code open-source to the MIR community for research (a sketch of the core iteration is given below).
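For reference, a minimal NumPy sketch of the TAE iteration described earlier in the talk (iterative cepstral liftering, after [Roebel and Rodet 2005, IMAI 1979]). The rfft conventions, stopping tolerance, and iteration cap are our assumptions here, not necessarily the exact code we will release.

import numpy as np

def true_amplitude_envelope(log_mag, kcc, n_iter=100, tol=0.1):
    # log_mag: log-magnitude spectrum of one frame (rfft bins 0..N/2)
    # kcc: cepstral order, i.e. number of coefficients kept by the lifter
    A = log_mag.copy()
    V = np.full_like(A, -np.inf)              # V_0 = -inf
    for _ in range(n_iter):
        A = np.maximum(A, V)                  # A_i(k) = max(A_{i-1}(k), V_{i-1}(k))
        c = np.fft.irfft(A)                   # real, even-symmetric cepstrum
        c[kcc + 1 : len(c) - kcc] = 0.0       # keep quefrencies |q| <= kcc
        V = np.fft.rfft(c).real               # V_i = liftered (smoothed) envelope
        if np.max(log_mag - V) < tol:         # stop once V covers every harmonic peak
            break
    return V, c[: kcc + 1]                    # envelope and the CCs fed to the network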
31/36
References I
[Blaauw and Bonada 2016] Blaauw, M. and Bonada, J. (2016). Modeling and transforming speech using variational autoencoders. In Interspeech, pages 1770–1774.
[Caetano and Rodet 2012] Caetano, M. and Rodet, X. (2012). A source-filter model for musical instrument sound transformation. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 137–140. IEEE.
[Doersch 2016] Doersch, C. (2016). Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908.
[Engel et al 2017] Engel, J., Resnick, C., Roberts, A., Dieleman, S., Norouzi, M., Eck, D., and Simonyan, K. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1068–1077. JMLR.org.
[Esling et al 2018] Esling, P., Bitton, A., et al. (2018). Generative timbre spaces: regularizing variational auto-encoders with perceptual metrics. arXiv preprint arXiv:1805.08501.
32/36
References II
[Griffin and Lim 1984] Griffin, D. and Lim, J. (1984). Signal estimation from modified short-time Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2):236–243.
[Hinton and Salakhutdinov 2006] Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507.
[IMAI 1979] Imai, S. (1979). Spectral envelope extraction by improved cepstrum. IEICE, 62:217–228.
[Kingma and Ba 2014] Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[Kingma and Welling 2013] Kingma, D. P. and Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
[Oord et al 2016] van den Oord, A., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., and Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.
33/36
References III
[Roche et al 2018] Roche, F., Hueber, T., Limier, S., and Girin, L. (2018). Autoencoders for music sound modeling: a comparison of linear, shallow, deep, recurrent and variational models. arXiv preprint arXiv:1806.04096.
[Roebel and Rodet 2005] Roebel, A. and Rodet, X. (2005). Efficient spectral envelope estimation and its application to pitch shifting and envelope preservation. In International Conference on Digital Audio Effects, pages 30–35, Madrid, Spain.
[Romani Picas et al 2015] Romani Picas, O., Parra Rodriguez, H., Dabiri, D., Tokuda, H., Hariya, W., Oishi, K., and Serra, X. (2015). A real-time system for measuring sound goodness in instrumental sounds. In Audio Engineering Society Convention 138. Audio Engineering Society.
[Sarroff and Casey 2014] Sarroff, A. M. and Casey, M. A. (2014). Musical audio synthesis using autoencoding neural nets. In ICMC.
[Serra et al 1997] Serra, X. et al. (1997). Musical sound modeling with sinusoids plus noise. Musical Signal Processing, pages 91–122.
34/36
References IV
[Slawson 1981] Slawson, W. (1981). The color of sound: a theoretical study in musical timbre. Music Theory Spectrum, 3:132–141.
[Sohn et al 2015] Sohn, K., Lee, H., and Yan, X. (2015). Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems, pages 3483–3491.
[Wyse 2018] Wyse, L. (2018). Real-valued parametric conditioning of an RNN for interactive sound synthesis. arXiv preprint arXiv:1805.10808.
35/36
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model reconstruction (trained on MIDI 63)
3 Spectral Model reconstruction (not trained on MIDI 63)
4 Input MIDI 60 note to Parametric Model
5 Parametric reconstruction of input note
6 Input MIDI 63 note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note (endpoint-trained model)
10 Parametric AE reconstruction of input (endpoint-trained model)
11 Parametric CVAE reconstruction of input (endpoint-trained model)
12 CVAE-generated MIDI 65 violin note
13 Similar MIDI 65 violin note from dataset
36/36
Audio examples description II
14 CVAE reconstruction of the MIDI 65 violin note
15 CVAE-generated MIDI 65 violin note with vibrato
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]
X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative sounds
times Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
636
Why Parametric
I Rather than generating new timbres(lsquointerpolatingrsquo acrosssounds) we consider the problem of synthesis of a giveninstrument sound with flexible control over the pitch andloudness dynamics
I Pitch shifting without timbre modification uses a source-filtermodel with the filter(spectral envelope) being kept constant[Roebel and Rodet 2005]
I A powerful parametric representation over raw waveform orspectrogram has the potential to achieve high quality with lesstraining data
1 [Blaauw and Bonada 2016] recognized this in context ofspeech synthesis and used a vocoder representation to train agenerative model achieving promising results along the way
636
Why Parametric
I Rather than generating new timbres(lsquointerpolatingrsquo acrosssounds) we consider the problem of synthesis of a giveninstrument sound with flexible control over the pitch andloudness dynamics
I Pitch shifting without timbre modification uses a source-filtermodel with the filter(spectral envelope) being kept constant[Roebel and Rodet 2005]
I A powerful parametric representation over raw waveform orspectrogram has the potential to achieve high quality with lesstraining data
1 [Blaauw and Bonada 2016] recognized this in context ofspeech synthesis and used a vocoder representation to train agenerative model achieving promising results along the way
736
Dataset
I Good-sounds dataset [Romani Picas et al 2015] consisting ofindividual note and scale recordings for 12 different instruments
I We work with the lsquoViolinrsquo played in mezzo-forte loudnessand choose the 4th octave(MIDI 60-71)
0 5 10 15 20 25 30 35
606162636465666768697071
282626262626
343030
262626
Instances
MID
I
Figure Instances per note in the overall dataset
I Average note duration for the chosen octave is about 45s pernote
736
Dataset
I Good-sounds dataset [Romani Picas et al 2015] consisting ofindividual note and scale recordings for 12 different instruments
I We work with the lsquoViolinrsquo played in mezzo-forte loudnessand choose the 4th octave(MIDI 60-71)
Why we chose ViolinPopular in Indian Music Human voice-like timbre
Ability to produce continuous pitch
0 5 10 15 20 25 30 35
606162636465666768697071
282626262626
343030
262626
Instances
MID
I
Figure Instances per note in the overall dataset
I Average note duration for the chosen octave is about 45s pernote
836
Dataset
I We split the data to train(80) and test(20) instances acrossMIDI note labels
I Our model is trained with frames(duration 213ms) from thetrain instances and system performance is evaluated withframes from the test instances
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
1036
Non-Parametric Reconstruction
Figure Input MIDI 63 1 1
Figure Including MIDI 63 2 2 Figure Excluding MIDI 63 3 3
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton1)ocgs[i]state=false
1728
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton3)ocgs[i]state=false
1728
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton5)ocgs[i]state=false
1728
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1336
Parametric Model
I No open source implementation available for the TAE thus weimplemented it following procedure highlighted in[Roebel and Rodet 2005 Caetano and Rodet 2012]
I Figure shows a TAE snap from [Caetano and Rodet 2012]
I Similar to results we get 1 4 2 5
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton6)ocgs[i]state=false
3216
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton7)ocgs[i]state=false
2256
1436
Parametric Model
0 1 2 3 4 5minus80
minus60
minus40
minus20
Frequency(kHz)Magnitude(dB)
606571
I Spectral envelope shape varies across pitch
1 Dependence of envelope on pitch[Slawson 1981 Caetano and Rodet 2012]
2 Variation due the TAE algorithm
I TAE rarr smooth function to estimate harmonic amplitudes
1536
Parametric Model
I No of CCs(Cepstral CoefficientsKcc) depends on Samplingrate(Fs)pitch(f0)
Kcc leFs2f0
I For our network we choose the maximum Kcc(lowest pitch) =91 and zero pad the high pitch Kcc to 91 dimension vectors
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1836
Network Architecture
CCs
f0
En
cod
er
Z
Dec
od
er
CVAE
I Network input is CCs rarr MSE represents perceptually relevantdistance in terms of squared error between the input andreconstructed log magnitude spectral envelopes
I We train the network on frames from the train instances
I For evaluation MSE values calculated ahead is the averagereconstruction error across all the test instance frames
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
L prop MSE + βKLD
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2136
Network Architecture
I Network size [91 91 32 91 91]I 91 is the dimension of input CCs 32 is latent space
dimensionality
I Linear Fully Connected Layers + Leaky ReLU activations
I ADAM [Kingma and Ba 2014] with initial lr = 10minus3
I Training for 2000 epochs with batch size 512
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2436
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAE
I Both AE and CVAE reasonably reconstruct the target pitch1 6 2 7 3 8
I CVAE produces better reconstruction especially when thetarget pitch is far from the pitches available in the trainingdata(plot above) 4 9 5 10 6 11
I Conditioning helps to capture the pitch dependency of thespectral envelope more accurately
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton8)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton9)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton10)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton11)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton12)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton13)ocgs[i]state=false
1536
2536
Reconstruction
I To emulate the effect of pitch conditioning with an AE wetrain the AE by appending the pitch to the input CCs andreconstructing this appended input as shown below
f0
CCscAE
fprime
0
CCsprime
Figure lsquoConditionalrsquo AE(cAE)
I [Wyse 2018] followed a similar approach of appending theconditional variables to the input of his model
I the lsquocAErsquo is comparable to our proposed CVAE in that thenetwork might potentially learn something from f0
2636
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAEcAE
I We train the cAE on the octave endpoints and evaluateperformance as shown above
I Appending f0 does not improve performance It is infact worsethan AE
2736
Generation
I We are interested in lsquosynthesizingrsquo audio
I We see how well the network can generate an instance of adesired pitch(which the network has not been trained on)
Z
Dec
od
er
f0
Figure Sampling from the Network
I We follow the procedure in [Blaauw and Bonada 2016] - arandom walk to sample points coherently from the latent space
2836
Generation
I We train on instances across the entire octave sans MIDI 65and then generate MIDI 65
I On listening we can see that the generated audio still lacks thesoft noisy sound of the violin bowing
1 12 2 13 3 14
I For a more realistic synthesis we generate a violin note withvibrato 4 15
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton14)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton15)ocgs[i]state=false
2256
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton16)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton17)ocgs[i]state=false
2184
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3136
References I
[Blaauw and Bonada 2016] Blaauw M and Bonada J (2016)Modeling and transforming speech using variational autoencodersIn Interspeech pages 1770ndash1774
[Caetano and Rodet 2012] Caetano M and Rodet X (2012)A source-filter model for musical instrument sound transformationIn 2012 IEEE International Conference on Acoustics Speech and Signal Processing(ICASSP) pages 137ndash140 IEEE
[Doersch 2016] Doersch C (2016)Tutorial on variational autoencodersarXiv preprint arXiv160605908
[Engel et al 2017] Engel J Resnick C Roberts A Dieleman S Norouzi MEck D and Simonyan K (2017)Neural audio synthesis of musical notes with wavenet autoencodersIn Proceedings of the 34th International Conference on Machine Learning-Volume70 pages 1068ndash1077 JMLR org
[Esling et al 2018] Esling P Bitton A et al (2018)Generative timbre spaces regularizing variational auto-encoders with perceptualmetricsarXiv preprint arXiv180508501
3236
References II
[Griffin and Lim 1984] Griffin D and Lim J (1984)Signal estimation from modified short-time fourier transformIEEE Transactions on Acoustics Speech and Signal Processing 32(2)236ndash243
[Hinton and Salakhutdinov 2006] Hinton G E and Salakhutdinov R R (2006)Reducing the dimensionality of data with neural networksscience 313(5786)504ndash507
[IMAI 1979] IMAI S (1979)Spectral envelope extraction by improved cepstrumIEICE 62217ndash228
[Kingma and Ba 2014] Kingma D P and Ba J (2014)Adam A method for stochastic optimizationarXiv preprint arXiv14126980
[Kingma and Welling 2013] Kingma D P and Welling M (2013)Auto-encoding variational bayesarXiv preprint arXiv13126114
[Oord et al 2016] Oord A v d Dieleman S Zen H Simonyan K Vinyals OGraves A Kalchbrenner N Senior A and Kavukcuoglu K (2016)Wavenet A generative model for raw audioarXiv preprint arXiv160903499
3336
References III
[Roche et al 2018] Roche F Hueber T Limier S and Girin L (2018)Autoencoders for music sound modeling a comparison of linear shallow deeprecurrent and variational modelsarXiv preprint arXiv180604096
[Roebel and Rodet 2005] Roebel A and Rodet X (2005)Efficient Spectral Envelope Estimation and its application to pitch shifting andenvelope preservationIn International Conference on Digital Audio Effects pages 30ndash35 Madrid Spaincote interne IRCAM Roebel05b
[Romani Picas et al 2015] Romani Picas O Parra Rodriguez H Dabiri DTokuda H Hariya W Oishi K and Serra X (2015)A real-time system for measuring sound goodness in instrumental soundsIn Audio Engineering Society Convention 138 Audio Engineering Society
[Sarroff and Casey 2014] Sarroff A M and Casey M A (2014)Musical audio synthesis using autoencoding neural netsIn ICMC
[Serra et al 1997] Serra X et al (1997)Musical sound modeling with sinusoids plus noiseMusical signal processing pages 91ndash122
3436
References IV
[Slawson 1981] Slawson W (1981)The color of sound a theoretical study in musical timbreMusic Theory Spectrum 3132ndash141
[Sohn et al 2015] Sohn K Lee H and Yan X (2015)Learning structured output representation using deep conditional generative models
In Advances in neural information processing systems pages 3483ndash3491
[Wyse 2018] Wyse L (2018)Real-valued parametric conditioning of an rnn for interactive sound synthesisarXiv preprint arXiv180510808
3536
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model Reconstruction(trained on MIDI63)
3 Spectral Model Reconstruction(not trained on MIDI63)
4 Input MIDI 60 note to Parametric Model
5 Parametric Reconstruction of input note
6 Input MIDI 63 Note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note(endpoint trained model)
10 Parametric AE reconstruction of input(endpoint trained model)
11 Parametric CVAE reconstruction of input(endpoint trained model)
12 CVAE Generated MIDI 65 Violin note
13 Similar MIDI 65 Violin note from dataset
3636
Audio examples description II
14 CVAE Reconstruction of the MIDI 65 violin note
15 CVAE Generated MIDI 65 Violin note with vibrato
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours

▶ A major issue with the previous methods was frame-wise analysis-synthesis based reconstruction, which fails to model temporal evolution
▶ [Engel et al 2017], inspired by WaveNet's [Oord et al 2016] autoregressive modeling capabilities for speech, extended it to musical instrument synthesis
  ✓ Generation of realistic and creative sounds
  × Complex network that was difficult to train and sample from
▶ [Wyse 2018] also modelled the audio autoregressively, albeit by conditioning the waveform samples on additional parameters like pitch, velocity (loudness) and instrument class
  ✓ Achieved better control over generation through conditioning
  × Unable to generalize to untrained pitches
Why Parametric?

▶ Rather than generating new timbres (‘interpolating’ across sounds), we consider the problem of synthesizing a given instrument sound with flexible control over the pitch and loudness dynamics
▶ Pitch shifting without timbre modification uses a source-filter model, with the filter (spectral envelope) kept constant [Roebel and Rodet 2005]
▶ A parametric representation, rather than a raw waveform or spectrogram, has the potential to achieve high quality with less training data
  1. [Blaauw and Bonada 2016] recognized this in the context of speech synthesis and used a vocoder representation to train a generative model, achieving promising results
Dataset

▶ Good-sounds dataset [Romani Picas et al 2015], consisting of individual note and scale recordings for 12 different instruments
▶ We work with the ‘Violin’, played in mezzo-forte loudness, and choose the 4th octave (MIDI 60-71)
▶ Why we chose the Violin: popular in Indian music, human voice-like timbre, and the ability to produce continuous pitch

Figure: Instances per note in the overall dataset (between 26 and 34 instances per MIDI note)

▶ The average note duration for the chosen octave is about 4.5 s per note
Dataset

▶ We split the data into train (80%) and test (20%) instances across MIDI note labels
▶ Our model is trained with frames (duration 21.3 ms) from the train instances, and system performance is evaluated with frames from the test instances
Non-Parametric Reconstruction

▶ The framewise magnitude-spectra reconstruction procedure in [Roche et al 2018] cannot generalize to untrained pitches
▶ We consider two cases - train including and excluding MIDI 63, along with its 3 neighbouring pitches on either side

Figure: Block diagram - sustain segment → FFT → Encoder → Z → Decoder (AE)

▶ Repeat the above across all input spectral frames and invert the obtained spectrogram using Griffin-Lim [Griffin and Lim 1984] (sketched below)
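To make this pipeline concrete, a minimal sketch of frame-wise reconstruction followed by Griffin-Lim inversion, assuming librosa; `ae_frame_fn` is a hypothetical stand-in for the trained autoencoder, and the FFT sizes are illustrative:

    import numpy as np
    import librosa

    def ae_resynthesize(ae_frame_fn, y, n_fft=1024, hop=256):
        """Frame-wise magnitude reconstruction + Griffin-Lim phase estimation."""
        S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop))   # (bins, frames)
        S_hat = np.stack([ae_frame_fn(f) for f in S.T], axis=1)    # per-frame AE pass
        # Griffin-Lim iteratively estimates a phase consistent with |S_hat|
        return librosa.griffinlim(S_hat, n_iter=60, hop_length=hop,
                                  win_length=n_fft)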
Non-Parametric Reconstruction

Figure: Input MIDI 63 (audio example 1)
Figure: Trained including MIDI 63 (audio example 2)    Figure: Trained excluding MIDI 63 (audio example 3)
Parametric Model

1. Frame-wise magnitude spectrum → harmonic representation using the Harmonic plus Residual (HpR) model [Serra et al 1997] (currently we neglect the residual)

Figure: Block diagram - sustain segment → FFT → HpR → f0 + magnitudes

▶ Output of the HpR block ⇒ log-dB magnitudes + harmonics (a rough sketch of this step follows)
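For intuition, a rough sketch of the harmonic step: given a frame's magnitude spectrum and its f0, read off per-harmonic log-dB magnitudes at the nearest FFT bins. Real HpR analysis uses peak picking and interpolation [Serra et al 1997]; `fs` and `n_fft` here are illustrative values:

    import numpy as np

    def harmonic_log_mags(mag_spectrum, f0, fs=48000.0, n_fft=1024):
        """mag_spectrum: rfft magnitude frame of length n_fft//2 + 1."""
        bin_hz = fs / n_fft
        n_harm = int((fs / 2) // f0)                 # harmonics below Nyquist
        bins = np.round(np.arange(1, n_harm + 1) * f0 / bin_hz).astype(int)
        return 20.0 * np.log10(mag_spectrum[bins] + 1e-12)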
Parametric Model

2. log-dB magnitudes + harmonics → TAE algorithm [Roebel and Rodet 2005, IMAI 1979]
▶ TAE ⇒ iterative cepstral liftering:

    A_0(k) = log(|X(k)|),  V_0 = −∞
    while (A_i − A_0 < ∆):
        A_i(k) = max(A_{i−1}(k), V_{i−1}(k))
        V_i = FFT(lifter(A_i))

Figure: Block diagram - f0 + magnitudes → TAE → CCs + f0
Parametric Model

▶ No open-source implementation was available for the TAE, thus we implemented it following the procedure highlighted in [Roebel and Rodet 2005, Caetano and Rodet 2012]; a sketch is given below
▶ The figure shows a TAE snapshot from [Caetano and Rodet 2012], similar to the results we get (audio examples 4, 5)
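A minimal NumPy/SciPy sketch of the liftering loop above; the DCT-based lifter and the exact stopping rule are illustrative choices here, not necessarily those of our implementation:

    import numpy as np
    from scipy.fft import dct, idct

    def true_amplitude_envelope(log_mag, k_cc, delta=0.1, max_iter=200):
        """Grow a smooth envelope V until it sits within `delta` dB
        above every point of the input log-magnitude spectrum."""
        A0 = np.asarray(log_mag, dtype=float)
        A = A0.copy()
        V = np.full_like(A0, -np.inf)            # V_0 = -infinity
        for _ in range(max_iter):
            A = np.maximum(A, V)                 # A_i(k) = max(A_{i-1}(k), V_{i-1}(k))
            c = dct(A, type=2, norm='ortho')     # cepstrum of the log spectrum
            c[k_cc:] = 0.0                       # lifter: keep first k_cc coefficients
            V = idct(c, type=2, norm='ortho')    # smoothed envelope V_i
            if np.max(A0 - V) < delta:           # envelope now covers the harmonics
                break
        return V, c[:k_cc]                       # envelope and its CCs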
Parametric Model

Figure: TAE spectral envelopes, magnitude (dB, −80 to −20) vs frequency (0-5 kHz), for MIDI 60, 65 and 71

▶ The spectral envelope shape varies across pitch:
  1. Dependence of the envelope on pitch [Slawson 1981, Caetano and Rodet 2012]
  2. Variation due to the TAE algorithm
▶ TAE → a smooth function to estimate harmonic amplitudes
Parametric Model

▶ The number of CCs (cepstral coefficients, Kcc) depends on the sampling rate (Fs) and the pitch (f0):

    Kcc ≤ Fs / (2 f0)

▶ For our network, we choose the maximum Kcc (lowest pitch) = 91 and zero-pad the higher-pitch CC vectors to 91-dimensional vectors (see the sketch below)
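A small sketch of this sizing rule; Fs = 48 kHz and the standard MIDI-to-Hz mapping are assumptions made here for illustration (they make the lowest pitch, MIDI 60 ≈ 261.6 Hz, give Kcc = 91):

    import numpy as np

    def midi_to_hz(m):
        return 440.0 * 2.0 ** ((m - 69) / 12.0)

    def n_ccs(f0, fs=48000.0):
        return int(fs / (2.0 * f0))              # Kcc <= Fs / (2 f0)

    K_MAX = n_ccs(midi_to_hz(60))                # 91 for the lowest pitch

    def pad_ccs(ccs, k_max=K_MAX):
        """Zero-pad a shorter (higher-pitch) CC vector to the fixed size."""
        out = np.zeros(k_max)
        out[:len(ccs)] = ccs
        return out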
Generative Models

▶ Autoencoders [Hinton and Salakhutdinov 2006] - minimize the MSE (mean squared error) between the input and the network reconstruction
  • Simple to train, and perform good reconstruction (since they minimize the MSE)
  • Not truly a generative model, as you cannot generate new data
▶ Variational Autoencoders [Kingma and Welling 2013] - inspired by variational inference; enforce a prior on the latent space
  • Truly generative, as we can ‘generate’ new data by sampling from the prior
▶ Conditional Variational Autoencoders [Doersch 2016, Sohn et al 2015] - same principle as a VAE, but learn the conditional distribution over an additional conditioning variable
Generative Models

Why VAE over AE?
A continuous latent space from which we can sample points (and synthesize the corresponding audio)

Why CVAE over VAE?
Conditioning on pitch ⇒ the network captures dependencies between the timbre and the pitch ⇒ more accurate envelope generation + pitch control
Network Architecture

Figure: Block diagram - CCs + f0 → Encoder → Z → Decoder (CVAE)

▶ The network input is CCs → the MSE represents a perceptually relevant distance, in terms of the squared error between the input and reconstructed log-magnitude spectral envelopes
▶ We train the network on frames from the train instances
▶ For evaluation, the MSE values reported ahead are the average reconstruction error across all the test-instance frames
Network Architecture

▶ Main hyperparameters -
  1. β - controls the relative weighting between reconstruction and prior enforcement:

    L ∝ MSE + β · KLD

Figure: MSE (·10^-5) vs latent space dimensionality for the CVAE, varying β ∈ {0.01, 0.1, 1}

▶ There is a tradeoff between the two terms; we choose β = 0.1 (a sketch of this objective follows)
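A minimal PyTorch sketch of this β-weighted objective, assuming the usual diagonal-Gaussian encoder with the closed-form KL term; the function and argument names are illustrative:

    import torch
    import torch.nn.functional as F

    def cvae_loss(x, x_hat, mu, logvar, beta=0.1):
        """L ∝ MSE + beta * KLD."""
        mse = F.mse_loss(x_hat, x, reduction='mean')
        # KL( N(mu, sigma^2) || N(0, I) ), averaged over the batch
        kld = -0.5 * torch.mean(
            torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1))
        return mse + beta * kld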
Network Architecture

▶ Main hyperparameters -
  2. Dimensionality of the latent space - governs the network's reconstruction ability

Figure: MSE (·10^-5) vs latent space dimensionality, CVAE (β = 0.1) vs AE

▶ Steep fall initially, flatter later; we choose dimensionality = 32
Network Architecture

▶ Network size: [91, 91, 32, 91, 91]
  ▶ 91 is the dimension of the input CCs; 32 is the latent space dimensionality
▶ Linear fully connected layers + Leaky ReLU activations
▶ ADAM [Kingma and Ba 2014] with initial lr = 10^-3
▶ Training for 2000 epochs with batch size 512 (a sketch of such a network follows)
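A sketch of a CVAE matching the stated sizes and optimizer settings; concatenating the (normalized) f0 to both encoder and decoder inputs is the standard CVAE recipe and an assumption here, as is the single hidden layer per side:

    import torch
    import torch.nn as nn

    class CVAE(nn.Module):
        def __init__(self, x_dim=91, h_dim=91, z_dim=32, c_dim=1):
            super().__init__()
            self.enc = nn.Sequential(nn.Linear(x_dim + c_dim, h_dim),
                                     nn.LeakyReLU())
            self.mu = nn.Linear(h_dim, z_dim)
            self.logvar = nn.Linear(h_dim, z_dim)
            self.dec = nn.Sequential(nn.Linear(z_dim + c_dim, h_dim),
                                     nn.LeakyReLU(),
                                     nn.Linear(h_dim, x_dim))

        def forward(self, x, f0):
            h = self.enc(torch.cat([x, f0], dim=1))
            mu, logvar = self.mu(h), self.logvar(h)
            z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
            return self.dec(torch.cat([z, f0], dim=1)), mu, logvar

    model = CVAE()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)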
Experiments

▶ Two kinds of experiments to demonstrate the network's capabilities:
  1. Reconstruction - omit pitch instances during training, and see how well the model reconstructs notes of the omitted target pitch
  2. Generation - how well the model ‘synthesizes’ note instances with new, unseen pitches
Reconstruction

▶ Two training contexts -
  1. Neighbouring pitches (× is the target T, ✓ are training instances):

     MIDI   T−3  T−2  T−1   T   T+1  T+2  T+3
     Kept    ✓    ✓    ✓    ×    ✓    ✓    ✓

  2. Octave endpoints:

     MIDI    60   61   62   63   64   65
     Kept     ✓    ×    ×    ×    ×    ×

     MIDI    66   67   68   69   70   71
     Kept     ×    ×    ×    ×    ×    ✓

▶ In each of the above cases, we compute the MSE as the frame-wise spectral envelope match across all frames of all the target instances (see the sketch below)
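As a concrete reading of this metric, a small sketch; the names are illustrative and frame extraction is assumed to happen elsewhere:

    import numpy as np

    def target_mse(model_fn, target_frames):
        """target_frames: iterable of (cc_vector, f0) pairs drawn from all
        frames of all target-pitch test instances."""
        errs = [np.mean((cc - model_fn(cc, f0)) ** 2) for cc, f0 in target_frames]
        return float(np.mean(errs))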
Reconstruction

Figure: MSE (·10^-2) vs target pitch (MIDI 60-72), AE vs CVAE

▶ Both the AE and the CVAE reasonably reconstruct the target pitch (audio examples 6, 7, 8)
▶ The CVAE produces better reconstruction, especially when the target pitch is far from the pitches available in the training data (plot above; audio examples 9, 10, 11)
▶ Conditioning helps to capture the pitch dependency of the spectral envelope more accurately
Reconstruction

▶ To emulate the effect of pitch conditioning with an AE, we train the AE by appending the pitch to the input CCs and reconstructing this appended input, as shown below

Figure: Block diagram - (f0, CCs) → cAE → (f0′, CCs′), the ‘conditional’ AE (cAE)

▶ [Wyse 2018] followed a similar approach of appending the conditional variables to the input of his model
▶ The cAE is comparable to our proposed CVAE, in that the network might potentially learn something from f0
Reconstruction

Figure: MSE (·10^-2) vs target pitch (MIDI 60-72), AE vs CVAE vs cAE

▶ We train the cAE on the octave endpoints and evaluate performance as shown above
▶ Appending f0 does not improve performance; it is in fact worse than the AE
Generation

▶ We are interested in ‘synthesizing’ audio
▶ We see how well the network can generate an instance of a desired pitch (which the network has not been trained on)

Figure: Sampling from the network - Z → Decoder, conditioned on f0

▶ We follow the procedure in [Blaauw and Bonada 2016] - a random walk to sample points coherently from the latent space (sketched below)
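A sketch of such coherent sampling, reusing the CVAE sketch above; the step size and the walk length are illustrative assumptions, not values from this work:

    import torch

    @torch.no_grad()
    def generate_frames(model, f0_norm, n_frames=200, z_dim=32, sigma=0.05):
        z = torch.randn(1, z_dim)                    # start from the prior
        cond = torch.tensor([[f0_norm]])             # conditioning pitch
        frames = []
        for _ in range(n_frames):
            z = z + sigma * torch.randn_like(z)      # small correlated steps
            frames.append(model.dec(torch.cat([z, cond], dim=1)))
        return torch.cat(frames)                     # (n_frames, 91) CC frames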
Generation

▶ We train on instances across the entire octave sans MIDI 65, and then generate MIDI 65
▶ On listening, the generated audio still lacks the soft noisy sound of the violin bowing (audio examples 12, 13, 14)
▶ For a more realistic synthesis, we generate a violin note with vibrato (audio example 15)
Putting it all together

▶ We explored autoencoder frameworks in generative models for audio synthesis of instrumental tones
▶ Our parametric representation decouples ‘timbre’ and ‘pitch’, thus relying on the network to model the inter-dependencies
▶ Pitch conditioning allows us to generate the learnt spectral envelope for that pitch, thus enabling us to vary the pitch contour continuously
Putting it all together

▶ Our model is still far from perfect, however, and we have to improve it further:
  1. An immediate thing to do will be to model the residual of the input audio; this will help in making the generated audio sound more natural
  2. We have not taken dynamics into account; a more complete system will involve capturing spectral envelope (timbral) dependencies on both pitch and loudness dynamics
  3. Conducting more formal listening tests, involving synthesis of larger pitch movements or melodic elements from Indian classical music
▶ Our contributions:
  1. To the best of our knowledge, we have not come across any work using a parametric model for musical tones in the neural synthesis framework, especially exploiting the conditioning function of the CVAE
  2. The TAE method has no known Python implementation, so we plan to make our code open-source to the MIR community for research
References I

[Blaauw and Bonada 2016] Blaauw, M. and Bonada, J. (2016). Modeling and transforming speech using variational autoencoders. In Interspeech, pages 1770–1774.
[Caetano and Rodet 2012] Caetano, M. and Rodet, X. (2012). A source-filter model for musical instrument sound transformation. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 137–140. IEEE.
[Doersch 2016] Doersch, C. (2016). Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908.
[Engel et al 2017] Engel, J., Resnick, C., Roberts, A., Dieleman, S., Norouzi, M., Eck, D., and Simonyan, K. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1068–1077. JMLR.org.
[Esling et al 2018] Esling, P., Bitton, A., et al. (2018). Generative timbre spaces: regularizing variational auto-encoders with perceptual metrics. arXiv preprint arXiv:1805.08501.

References II

[Griffin and Lim 1984] Griffin, D. and Lim, J. (1984). Signal estimation from modified short-time Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2):236–243.
[Hinton and Salakhutdinov 2006] Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507.
[IMAI 1979] Imai, S. (1979). Spectral envelope extraction by improved cepstrum. IEICE, 62:217–228.
[Kingma and Ba 2014] Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[Kingma and Welling 2013] Kingma, D. P. and Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
[Oord et al 2016] Oord, A. v. d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., and Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.

References III

[Roche et al 2018] Roche, F., Hueber, T., Limier, S., and Girin, L. (2018). Autoencoders for music sound modeling: a comparison of linear, shallow, deep, recurrent and variational models. arXiv preprint arXiv:1806.04096.
[Roebel and Rodet 2005] Roebel, A. and Rodet, X. (2005). Efficient spectral envelope estimation and its application to pitch shifting and envelope preservation. In International Conference on Digital Audio Effects, pages 30–35, Madrid, Spain.
[Romani Picas et al 2015] Romani Picas, O., Parra Rodriguez, H., Dabiri, D., Tokuda, H., Hariya, W., Oishi, K., and Serra, X. (2015). A real-time system for measuring sound goodness in instrumental sounds. In Audio Engineering Society Convention 138. Audio Engineering Society.
[Sarroff and Casey 2014] Sarroff, A. M. and Casey, M. A. (2014). Musical audio synthesis using autoencoding neural nets. In ICMC.
[Serra et al 1997] Serra, X. et al. (1997). Musical sound modeling with sinusoids plus noise. Musical Signal Processing, pages 91–122.

References IV

[Slawson 1981] Slawson, W. (1981). The color of sound: a theoretical study in musical timbre. Music Theory Spectrum, 3:132–141.
[Sohn et al 2015] Sohn, K., Lee, H., and Yan, X. (2015). Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems, pages 3483–3491.
[Wyse 2018] Wyse, L. (2018). Real-valued parametric conditioning of an RNN for interactive sound synthesis. arXiv preprint arXiv:1805.10808.
Audio examples description

1. Input MIDI 63 to the spectral model
2. Spectral model reconstruction (trained on MIDI 63)
3. Spectral model reconstruction (not trained on MIDI 63)
4. Input MIDI 60 note to the parametric model
5. Parametric reconstruction of the input note
6. Input MIDI 63 note
7. Parametric AE reconstruction of the input
8. Parametric CVAE reconstruction of the input
9. Input MIDI 65 note (endpoint-trained model)
10. Parametric AE reconstruction of the input (endpoint-trained model)
11. Parametric CVAE reconstruction of the input (endpoint-trained model)
12. CVAE-generated MIDI 65 violin note
13. Similar MIDI 65 violin note from the dataset
14. CVAE reconstruction of the MIDI 65 violin note
15. CVAE-generated MIDI 65 violin note with vibrato
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]
X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative sounds
times Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
636
Why Parametric
I Rather than generating new timbres(lsquointerpolatingrsquo acrosssounds) we consider the problem of synthesis of a giveninstrument sound with flexible control over the pitch andloudness dynamics
I Pitch shifting without timbre modification uses a source-filtermodel with the filter(spectral envelope) being kept constant[Roebel and Rodet 2005]
I A powerful parametric representation over raw waveform orspectrogram has the potential to achieve high quality with lesstraining data
1 [Blaauw and Bonada 2016] recognized this in context ofspeech synthesis and used a vocoder representation to train agenerative model achieving promising results along the way
636
Why Parametric
I Rather than generating new timbres(lsquointerpolatingrsquo acrosssounds) we consider the problem of synthesis of a giveninstrument sound with flexible control over the pitch andloudness dynamics
I Pitch shifting without timbre modification uses a source-filtermodel with the filter(spectral envelope) being kept constant[Roebel and Rodet 2005]
I A powerful parametric representation over raw waveform orspectrogram has the potential to achieve high quality with lesstraining data
1 [Blaauw and Bonada 2016] recognized this in context ofspeech synthesis and used a vocoder representation to train agenerative model achieving promising results along the way
736
Dataset
I Good-sounds dataset [Romani Picas et al 2015] consisting ofindividual note and scale recordings for 12 different instruments
I We work with the lsquoViolinrsquo played in mezzo-forte loudnessand choose the 4th octave(MIDI 60-71)
0 5 10 15 20 25 30 35
606162636465666768697071
282626262626
343030
262626
Instances
MID
I
Figure Instances per note in the overall dataset
I Average note duration for the chosen octave is about 45s pernote
736
Dataset
I Good-sounds dataset [Romani Picas et al 2015] consisting ofindividual note and scale recordings for 12 different instruments
I We work with the lsquoViolinrsquo played in mezzo-forte loudnessand choose the 4th octave(MIDI 60-71)
Why we chose ViolinPopular in Indian Music Human voice-like timbre
Ability to produce continuous pitch
0 5 10 15 20 25 30 35
606162636465666768697071
282626262626
343030
262626
Instances
MID
I
Figure Instances per note in the overall dataset
I Average note duration for the chosen octave is about 45s pernote
836
Dataset
I We split the data to train(80) and test(20) instances acrossMIDI note labels
I Our model is trained with frames(duration 213ms) from thetrain instances and system performance is evaluated withframes from the test instances
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
1036
Non-Parametric Reconstruction
Figure Input MIDI 63 1 1
Figure Including MIDI 63 2 2 Figure Excluding MIDI 63 3 3
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton1)ocgs[i]state=false
1728
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton3)ocgs[i]state=false
1728
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton5)ocgs[i]state=false
1728
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1336
Parametric Model
I No open source implementation available for the TAE thus weimplemented it following procedure highlighted in[Roebel and Rodet 2005 Caetano and Rodet 2012]
I Figure shows a TAE snap from [Caetano and Rodet 2012]
I Similar to results we get 1 4 2 5
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton6)ocgs[i]state=false
3216
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton7)ocgs[i]state=false
2256
1436
Parametric Model
0 1 2 3 4 5minus80
minus60
minus40
minus20
Frequency(kHz)Magnitude(dB)
606571
I Spectral envelope shape varies across pitch
1 Dependence of envelope on pitch[Slawson 1981 Caetano and Rodet 2012]
2 Variation due the TAE algorithm
I TAE rarr smooth function to estimate harmonic amplitudes
1536
Parametric Model
I No of CCs(Cepstral CoefficientsKcc) depends on Samplingrate(Fs)pitch(f0)
Kcc leFs2f0
I For our network we choose the maximum Kcc(lowest pitch) =91 and zero pad the high pitch Kcc to 91 dimension vectors
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
17/36
Generative Models

Why VAE over AE?
A continuous latent space from which we can sample points (and synthesize the corresponding audio).

Why CVAE over VAE?
Conditioning on pitch ⇒ the network captures dependencies between timbre and pitch ⇒ more accurate envelope generation + pitch control.
18/36
Network Architecture

Figure: CVAE block diagram - the CC vector and f0 enter the encoder, which maps them to the latent variable Z; the decoder reconstructs the CCs from Z and f0.

- The network input is CCs → the MSE is a perceptually relevant distance: the squared error between the input and reconstructed log-magnitude spectral envelopes.
- We train the network on frames from the train instances.
- For evaluation, the MSE values reported in what follows are the average reconstruction error across all test-instance frames.
19/36
Network Architecture

- Main hyperparameters:
  1. β - controls the relative weighting between reconstruction and prior enforcement:

     L ∝ MSE + β · KLD

Figure: CVAE reconstruction MSE (~10^-5 scale) vs. latent space dimensionality (0-70), for β = 0.01, 0.1 and 1.

- Tradeoff between the two terms; we choose β = 0.1 (a minimal loss sketch follows).
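For concreteness, a minimal PyTorch sketch of this weighted objective; the exact reduction over the batch is our assumption, the slide only specifies L ∝ MSE + β·KLD:

    import torch
    import torch.nn.functional as F

    def cvae_loss(x_hat, x, mu, logvar, beta=0.1):
        # Reconstruction term: MSE between input and reconstructed CCs.
        mse = F.mse_loss(x_hat, x)
        # Prior enforcement: KL( N(mu, sigma^2) || N(0, I) ), averaged over the batch.
        kld = -0.5 * torch.mean(torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1))
        return mse + beta * kld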
20/36
Network Architecture

- Main hyperparameters:
  2. Dimensionality of the latent space - governs the network's reconstruction ability.

Figure: Reconstruction MSE (~10^-5 scale) vs. latent space dimensionality (0-70), CVAE (β = 0.1) vs. AE.

- Steep fall initially, flatter later; we choose dimensionality = 32.
21/36
Network Architecture

- Network size: [91, 91, 32, 91, 91]
  • 91 is the dimension of the input CCs; 32 is the latent space dimensionality
- Linear fully connected layers + leaky ReLU activations
- ADAM [Kingma and Ba 2014] with initial lr = 10^-3
- Training for 2000 epochs with batch size 512 (a sketch of such a network follows)
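A hedged PyTorch sketch of a network matching these hyperparameters; concatenating f0 to the encoder and decoder inputs is a standard CVAE conditioning choice assumed here, not a detail given on the slide:

    import torch
    import torch.nn as nn

    class CVAE(nn.Module):
        # FC sizes [91, 91, 32, 91, 91] with leaky-ReLU activations, per the slide.
        def __init__(self, in_dim=91, hid_dim=91, z_dim=32, cond_dim=1):
            super().__init__()
            self.enc = nn.Sequential(nn.Linear(in_dim + cond_dim, hid_dim), nn.LeakyReLU())
            self.mu = nn.Linear(hid_dim, z_dim)
            self.logvar = nn.Linear(hid_dim, z_dim)
            self.dec = nn.Sequential(
                nn.Linear(z_dim + cond_dim, hid_dim), nn.LeakyReLU(),
                nn.Linear(hid_dim, in_dim),
            )

        def forward(self, x, f0):
            h = self.enc(torch.cat([x, f0], dim=1))
            mu, logvar = self.mu(h), self.logvar(h)
            z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
            return self.dec(torch.cat([z, f0], dim=1)), mu, logvar

    model = CVAE()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)     # ADAM, initial lr = 1e-3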
22/36
Experiments

- Two kinds of experiments to demonstrate the network's capabilities:
  1. Reconstruction - omit pitch instances during training and see how well the model reconstructs notes of the omitted target pitch.
  2. Generation - how well the model 'synthesizes' note instances with new, unseen pitches.
23/36
Reconstruction

- Two training contexts:

  1. T (✗) is the target; ✓ marks training instances:

     MIDI   T-3  T-2  T-1   T   T+1  T+2  T+3
     Kept    ✓    ✓    ✓    ✗    ✓    ✓    ✓

  2. Octave endpoints:

     MIDI   60  61  62  63  64  65  66  67  68  69  70  71
     Kept    ✓   ✗   ✗   ✗   ✗   ✗   ✗   ✗   ✗   ✗   ✗   ✓

- In each case we compute the MSE as the frame-wise spectral envelope match across all frames of all the target instances (see the sketch below).
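The metric itself is simple; a sketch, assuming reconstructions and targets are stacked arrays of per-frame CC vectors:

    import numpy as np

    def average_mse(recon_frames, target_frames):
        # Frame-wise squared envelope error, averaged over all frames of
        # all target instances; inputs are (n_frames, 91) arrays.
        return float(np.mean((recon_frames - target_frames) ** 2))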
24/36
Reconstruction

Figure: Reconstruction MSE (~10^-2 scale) vs. target pitch (MIDI 60-72) for the AE and the CVAE.

- Both the AE and the CVAE reasonably reconstruct the target pitch [audio examples 6, 7, 8].
- The CVAE produces better reconstruction, especially when the target pitch is far from the pitches available in the training data (plot above) [audio examples 9, 10, 11].
- Conditioning helps to capture the pitch dependency of the spectral envelope more accurately.
25/36
Reconstruction

- To emulate the effect of pitch conditioning with an AE, we train the AE by appending the pitch to the input CCs and reconstructing this appended input, as shown below:

Figure: 'Conditional' AE (cAE) - [CCs, f0] → cAE → [CCs', f0'].

- [Wyse 2018] followed a similar approach of appending the conditional variables to the input of his model.
- The 'cAE' is comparable to our proposed CVAE in that the network might potentially learn something from f0 (a sketch follows).
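A minimal sketch of this cAE input construction; the layer sizes are our assumption, mirroring the CVAE's [91, 91, 32, 91, 91] with the appended pitch:

    import torch
    import torch.nn as nn

    cae = nn.Sequential(
        nn.Linear(92, 91), nn.LeakyReLU(), nn.Linear(91, 32), nn.LeakyReLU(),
        nn.Linear(32, 91), nn.LeakyReLU(), nn.Linear(91, 92),
    )
    ccs, f0 = torch.randn(8, 91), torch.rand(8, 1)   # dummy batch of CC frames + pitches
    x = torch.cat([ccs, f0], dim=1)                  # append f0: (batch, 92)
    loss = nn.functional.mse_loss(cae(x), x)         # reconstruct [CCs, f0] jointly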
26/36
Reconstruction

Figure: Reconstruction MSE (~10^-2 scale) vs. target pitch (MIDI 60-72) for the AE, CVAE and cAE.

- We train the cAE on the octave endpoints and evaluate performance as shown above.
- Appending f0 does not improve performance; it is in fact worse than the plain AE.
27/36
Generation

- We are interested in 'synthesizing' audio.
- We see how well the network can generate an instance of a desired pitch (one the network has not been trained on).

Figure: Sampling from the network - a sampled Z and the desired f0 are fed to the decoder.

- We follow the procedure in [Blaauw and Bonada 2016] - a random walk to sample points coherently from the latent space (see the sketch below).
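A sketch of such coherent sampling, assuming the decoder takes a concatenated [z, f0] vector as in the earlier architecture sketch; the step size is an illustrative value, not one reported in [Blaauw and Bonada 2016] or on the slides:

    import torch

    def random_walk_decode(decoder, f0, n_frames=200, z_dim=32, step=0.05):
        # Start from the prior and take small Gaussian steps so that
        # consecutive frames' envelopes vary smoothly over time.
        z = torch.randn(z_dim)
        frames = []
        for _ in range(n_frames):
            z = z + step * torch.randn(z_dim)
            zc = torch.cat([z, f0]).unsqueeze(0)   # condition on the desired pitch
            frames.append(decoder(zc))
        return torch.cat(frames, dim=0)            # (n_frames, 91) CC envelopes

    # e.g. random_walk_decode(model.dec, torch.tensor([65.0]))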
28/36
Generation

- We train on instances across the entire octave sans MIDI 65, and then generate MIDI 65.
- On listening, we can hear that the generated audio still lacks the soft, noisy sound of the violin bowing [audio examples 12, 13, 14].
- For a more realistic synthesis, we generate a violin note with vibrato [audio example 15].
29/36
Putting it all together

- We explored autoencoder frameworks in generative models for audio synthesis of instrumental tones.
- Our parametric representation decouples 'timbre' and 'pitch', relying on the network to model their inter-dependencies.
- Pitch conditioning allows us to generate the learnt spectral envelope for that pitch, enabling us to vary the pitch contour continuously.
30/36
Putting it all together

- Our model is still far from perfect, however, and we have to improve it further:
  1. An immediate next step is to model the residual of the input audio; this will help make the generated audio sound more natural.
  2. We have not taken dynamics into account. A more complete system will capture the spectral envelope (timbral) dependencies on both pitch and loudness dynamics.
  3. Conducting more formal listening tests, involving synthesis of larger pitch movements or melodic elements from Indian Classical music.
- Our contributions:
  1. To the best of our knowledge, no prior work uses a parametric model for musical tones in the neural synthesis framework, especially one exploiting the conditioning function of the CVAE.
  2. The TAE method has no known Python implementation, so we plan to make our code open-source to the MIR community for research.
31/36
References I

[Blaauw and Bonada 2016] Blaauw, M. and Bonada, J. (2016). Modeling and transforming speech using variational autoencoders. In Interspeech, pages 1770–1774.

[Caetano and Rodet 2012] Caetano, M. and Rodet, X. (2012). A source-filter model for musical instrument sound transformation. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 137–140. IEEE.

[Doersch 2016] Doersch, C. (2016). Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908.

[Engel et al 2017] Engel, J., Resnick, C., Roberts, A., Dieleman, S., Norouzi, M., Eck, D., and Simonyan, K. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1068–1077. JMLR.org.

[Esling et al 2018] Esling, P., Bitton, A., et al. (2018). Generative timbre spaces: regularizing variational auto-encoders with perceptual metrics. arXiv preprint arXiv:1805.08501.

32/36
References II

[Griffin and Lim 1984] Griffin, D. and Lim, J. (1984). Signal estimation from modified short-time Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2):236–243.

[Hinton and Salakhutdinov 2006] Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507.

[IMAI 1979] Imai, S. (1979). Spectral envelope extraction by improved cepstrum. IEICE, 62:217–228.

[Kingma and Ba 2014] Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

[Kingma and Welling 2013] Kingma, D. P. and Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.

[Oord et al 2016] Oord, A. v. d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., and Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.

33/36
References III

[Roche et al 2018] Roche, F., Hueber, T., Limier, S., and Girin, L. (2018). Autoencoders for music sound modeling: a comparison of linear, shallow, deep, recurrent and variational models. arXiv preprint arXiv:1806.04096.

[Roebel and Rodet 2005] Roebel, A. and Rodet, X. (2005). Efficient spectral envelope estimation and its application to pitch shifting and envelope preservation. In International Conference on Digital Audio Effects, pages 30–35, Madrid, Spain.

[Romani Picas et al 2015] Romani Picas, O., Parra Rodriguez, H., Dabiri, D., Tokuda, H., Hariya, W., Oishi, K., and Serra, X. (2015). A real-time system for measuring sound goodness in instrumental sounds. In Audio Engineering Society Convention 138. Audio Engineering Society.

[Sarroff and Casey 2014] Sarroff, A. M. and Casey, M. A. (2014). Musical audio synthesis using autoencoding neural nets. In ICMC.

[Serra et al 1997] Serra, X. et al. (1997). Musical sound modeling with sinusoids plus noise. Musical Signal Processing, pages 91–122.

34/36
References IV

[Slawson 1981] Slawson, W. (1981). The color of sound: a theoretical study in musical timbre. Music Theory Spectrum, 3:132–141.

[Sohn et al 2015] Sohn, K., Lee, H., and Yan, X. (2015). Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems, pages 3483–3491.

[Wyse 2018] Wyse, L. (2018). Real-valued parametric conditioning of an RNN for interactive sound synthesis. arXiv preprint arXiv:1805.10808.
35/36
Audio examples description I

1. Input MIDI 63 to the Spectral Model
2. Spectral Model reconstruction (trained on MIDI 63)
3. Spectral Model reconstruction (not trained on MIDI 63)
4. Input MIDI 60 note to the Parametric Model
5. Parametric reconstruction of the input note
6. Input MIDI 63 note
7. Parametric AE reconstruction of the input
8. Parametric CVAE reconstruction of the input
9. Input MIDI 65 note (endpoint-trained model)
10. Parametric AE reconstruction of the input (endpoint-trained model)
11. Parametric CVAE reconstruction of the input (endpoint-trained model)
12. CVAE-generated MIDI 65 violin note
13. Similar MIDI 65 violin note from the dataset

36/36
Audio examples description II

14. CVAE reconstruction of the MIDI 65 violin note
15. CVAE-generated MIDI 65 violin note with vibrato
5/36
Our Nearest Neighbours

- A major issue with the previous methods was their frame-wise analysis-synthesis based reconstruction, which fails to model temporal evolution.
- [Engel et al 2017], inspired by WaveNet's [Oord et al 2016] autoregressive modeling capabilities for speech, extended it to musical instrument synthesis.
  ✓ Generation of realistic and creative sounds
  ✗ A complex network, difficult to train and sample from
- [Wyse 2018] also modelled the audio autoregressively, albeit conditioning the waveform samples on additional parameters like pitch, velocity (loudness) and instrument class.
  ✓ Achieved better control over generation through conditioning
  ✗ Unable to generalize to untrained pitches
6/36
Why Parametric?

- Rather than generating new timbres ('interpolating' across sounds), we consider the problem of synthesizing a given instrument's sound with flexible control over pitch and loudness dynamics.
- Pitch shifting without timbre modification uses a source-filter model, with the filter (spectral envelope) kept constant [Roebel and Rodet 2005].
- A parametric representation, over a raw waveform or spectrogram, has the potential to achieve high quality with less training data.
  1. [Blaauw and Bonada 2016] recognized this in the context of speech synthesis and used a vocoder representation to train a generative model, achieving promising results along the way.
7/36
Dataset

- Good-sounds dataset [Romani Picas et al 2015], consisting of individual note and scale recordings for 12 different instruments.
- We work with the 'Violin', played in mezzo-forte loudness, and choose the 4th octave (MIDI 60-71).

Why we chose the Violin: popular in Indian music, a human voice-like timbre, and the ability to produce continuous pitch.

Figure: Instances per note in the overall dataset (26-34 instances per MIDI pitch, MIDI 60-71).

- The average note duration for the chosen octave is about 4.5 s per note.
8/36
Dataset

- We split the data into train (80%) and test (20%) instances across MIDI note labels.
- Our model is trained on frames (duration 21.3 ms) from the train instances, and system performance is evaluated on frames from the test instances.
9/36
Non-Parametric Reconstruction

- The frame-wise magnitude spectra reconstruction procedure in [Roche et al 2018] cannot generalize to untrained pitches.
- We consider two cases: training including and excluding MIDI 63, along with its 3 neighbouring pitches on either side.

Figure: AE pipeline - sustain portion → FFT → Encoder → Z → Decoder.

- Repeat the above across all input spectral frames, and invert the obtained spectrogram using Griffin-Lim [Griffin and Lim 1984] (see the sketch below).
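The inversion step can be done with, e.g., librosa's Griffin-Lim; a minimal sketch with a stand-in magnitude spectrogram (the FFT and hop sizes here are illustrative assumptions, not the values used in the experiments):

    import numpy as np
    import librosa

    y = librosa.tone(440.0, sr=48000, duration=1.0)             # stand-in source signal
    mag = np.abs(librosa.stft(y, n_fft=1024, hop_length=256))   # (reconstructed) magnitudes
    audio = librosa.griffinlim(mag, n_iter=32, hop_length=256)  # phase-less inversion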
10/36
Non-Parametric Reconstruction

Figure: Input MIDI 63 [audio example 1]; reconstruction when MIDI 63 is included in training [audio example 2]; reconstruction when MIDI 63 is excluded [audio example 3].
11/36
Parametric Model

1. Frame-wise magnitude spectrum → harmonic representation, using the Harmonic plus Residual (HpR) model [Serra et al 1997] (currently we neglect the residual).

Figure: Sustain portion → FFT → HpR → f0 + harmonic magnitudes.

- The output of the HpR block ⇒ log-dB magnitudes + harmonics.
12/36
Parametric Model

2. log-dB magnitudes + harmonics → TAE algorithm [Roebel and Rodet 2005, IMAI 1979]
- TAE ⇒ iterative cepstral liftering:

  A_0(k) = log|X(k)|, V_0 = -∞
  while (A_i - A_0 < Δ):
      A_i(k) = max(A_{i-1}(k), V_{i-1}(k))
      V_i = FFT(lifter(A_i))

Figure: f0 + magnitudes → TAE → CCs + f0.
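Since we had to implement the TAE ourselves (next slide), a condensed NumPy sketch of the iteration above; the lifter keeps the first K_cc cepstral coefficients, and the stopping tolerance Δ is an assumed value:

    import numpy as np

    def true_amplitude_envelope(log_mag, k_cc, delta=0.1, max_iter=100):
        # log_mag: full (symmetric) log-magnitude spectrum of one frame.
        n = len(log_mag)
        a0 = log_mag.copy()
        a = a0.copy()
        v = np.full(n, -np.inf)            # V_0 = -infinity
        for _ in range(max_iter):
            a = np.maximum(a, v)           # A_i(k) = max(A_{i-1}(k), V_{i-1}(k))
            c = np.fft.ifft(a).real        # real cepstrum of the log spectrum
            c[k_cc + 1:n - k_cc] = 0.0     # lifter: keep the first k_cc coefficients
            v = np.fft.fft(c).real         # V_i = FFT(lifter(A_i))
            if np.max(a0 - v) < delta:     # envelope now covers every harmonic peak
                break
        return v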
13/36
Parametric Model

- No open-source implementation is available for the TAE, so we implemented it following the procedure described in [Roebel and Rodet 2005, Caetano and Rodet 2012].
- The figure shows a TAE snapshot from [Caetano and Rodet 2012], similar to the results we obtain [audio examples 4, 5].
1436
Parametric Model
0 1 2 3 4 5minus80
minus60
minus40
minus20
Frequency(kHz)Magnitude(dB)
606571
I Spectral envelope shape varies across pitch
1 Dependence of envelope on pitch[Slawson 1981 Caetano and Rodet 2012]
2 Variation due the TAE algorithm
I TAE rarr smooth function to estimate harmonic amplitudes
1536
Parametric Model
I No of CCs(Cepstral CoefficientsKcc) depends on Samplingrate(Fs)pitch(f0)
Kcc leFs2f0
I For our network we choose the maximum Kcc(lowest pitch) =91 and zero pad the high pitch Kcc to 91 dimension vectors
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1836
Network Architecture
CCs
f0
En
cod
er
Z
Dec
od
er
CVAE
I Network input is CCs rarr MSE represents perceptually relevantdistance in terms of squared error between the input andreconstructed log magnitude spectral envelopes
I We train the network on frames from the train instances
I For evaluation MSE values calculated ahead is the averagereconstruction error across all the test instance frames
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
L prop MSE + βKLD
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2136
Network Architecture
I Network size [91 91 32 91 91]I 91 is the dimension of input CCs 32 is latent space
dimensionality
I Linear Fully Connected Layers + Leaky ReLU activations
I ADAM [Kingma and Ba 2014] with initial lr = 10minus3
I Training for 2000 epochs with batch size 512
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2436
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAE
I Both AE and CVAE reasonably reconstruct the target pitch1 6 2 7 3 8
I CVAE produces better reconstruction especially when thetarget pitch is far from the pitches available in the trainingdata(plot above) 4 9 5 10 6 11
I Conditioning helps to capture the pitch dependency of thespectral envelope more accurately
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton8)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton9)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton10)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton11)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton12)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton13)ocgs[i]state=false
1536
2536
Reconstruction
I To emulate the effect of pitch conditioning with an AE wetrain the AE by appending the pitch to the input CCs andreconstructing this appended input as shown below
f0
CCscAE
fprime
0
CCsprime
Figure lsquoConditionalrsquo AE(cAE)
I [Wyse 2018] followed a similar approach of appending theconditional variables to the input of his model
I the lsquocAErsquo is comparable to our proposed CVAE in that thenetwork might potentially learn something from f0
2636
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAEcAE
I We train the cAE on the octave endpoints and evaluateperformance as shown above
I Appending f0 does not improve performance It is infact worsethan AE
2736
Generation
I We are interested in lsquosynthesizingrsquo audio
I We see how well the network can generate an instance of adesired pitch(which the network has not been trained on)
Z
Dec
od
er
f0
Figure Sampling from the Network
I We follow the procedure in [Blaauw and Bonada 2016] - arandom walk to sample points coherently from the latent space
2836
Generation
I We train on instances across the entire octave sans MIDI 65and then generate MIDI 65
I On listening we can see that the generated audio still lacks thesoft noisy sound of the violin bowing
1 12 2 13 3 14
I For a more realistic synthesis we generate a violin note withvibrato 4 15
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton14)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton15)ocgs[i]state=false
2256
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton16)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton17)ocgs[i]state=false
2184
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
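Likewise, since the TAE implementation is what we intend to release, the following is a minimal sketch of the iterative cepstral liftering at its core, following [Imai, 1979] and [Roebel and Rodet, 2005]; the stopping threshold delta and the iteration cap are illustrative assumptions, not tuned values from our experiments.

```python
import numpy as np

def true_amplitude_envelope(log_mag, kcc, delta=0.1, max_iter=200):
    """Sketch of the TAE iteration: log_mag is the log-magnitude spectrum
    of one analysis frame (full N-point FFT, symmetric about N/2); kcc is
    the number of cepstral coefficients kept by the lifter."""
    A = log_mag.copy()                   # A_0(k) = log|X(k)|
    V = np.full_like(A, -np.inf)         # V_0 = -inf
    n = len(A)
    for _ in range(max_iter):
        A = np.maximum(A, V)             # A_i(k) = max(A_{i-1}(k), V_{i-1}(k))
        c = np.fft.ifft(A).real          # real cepstrum of the current curve
        c[kcc + 1: n - kcc] = 0.0        # lifter: keep only low quefrencies
        V = np.fft.fft(c).real           # V_i: cepstrally smoothed envelope
        if np.max(log_mag - V) < delta:  # stop once the envelope covers the peaks
            break
    return V
```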
References I
[Blaauw and Bonada, 2016] Blaauw, M. and Bonada, J. (2016). Modeling and transforming speech using variational autoencoders. In Interspeech, pages 1770–1774.
[Caetano and Rodet, 2012] Caetano, M. and Rodet, X. (2012). A source-filter model for musical instrument sound transformation. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 137–140. IEEE.
[Doersch, 2016] Doersch, C. (2016). Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908.
[Engel et al., 2017] Engel, J., Resnick, C., Roberts, A., Dieleman, S., Norouzi, M., Eck, D., and Simonyan, K. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1068–1077. JMLR.org.
[Esling et al., 2018] Esling, P., Bitton, A., et al. (2018). Generative timbre spaces: regularizing variational auto-encoders with perceptual metrics. arXiv preprint arXiv:1805.08501.
References II
[Griffin and Lim, 1984] Griffin, D. and Lim, J. (1984). Signal estimation from modified short-time Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2):236–243.
[Hinton and Salakhutdinov, 2006] Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507.
[Imai, 1979] Imai, S. (1979). Spectral envelope extraction by improved cepstrum. IEICE, 62:217–228.
[Kingma and Ba, 2014] Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[Kingma and Welling, 2013] Kingma, D. P. and Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
[Oord et al., 2016] Oord, A. v. d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., and Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.
References III
[Roche et al., 2018] Roche, F., Hueber, T., Limier, S., and Girin, L. (2018). Autoencoders for music sound modeling: a comparison of linear, shallow, deep, recurrent and variational models. arXiv preprint arXiv:1806.04096.
[Roebel and Rodet, 2005] Roebel, A. and Rodet, X. (2005). Efficient spectral envelope estimation and its application to pitch shifting and envelope preservation. In International Conference on Digital Audio Effects, pages 30–35, Madrid, Spain.
[Romani Picas et al., 2015] Romani Picas, O., Parra Rodriguez, H., Dabiri, D., Tokuda, H., Hariya, W., Oishi, K., and Serra, X. (2015). A real-time system for measuring sound goodness in instrumental sounds. In Audio Engineering Society Convention 138. Audio Engineering Society.
[Sarroff and Casey, 2014] Sarroff, A. M. and Casey, M. A. (2014). Musical audio synthesis using autoencoding neural nets. In ICMC.
[Serra et al., 1997] Serra, X. et al. (1997). Musical sound modeling with sinusoids plus noise. Musical Signal Processing, pages 91–122.
References IV
[Slawson, 1981] Slawson, W. (1981). The color of sound: a theoretical study in musical timbre. Music Theory Spectrum, 3:132–141.
[Sohn et al., 2015] Sohn, K., Lee, H., and Yan, X. (2015). Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems, pages 3483–3491.
[Wyse, 2018] Wyse, L. (2018). Real-valued parametric conditioning of an RNN for interactive sound synthesis. arXiv preprint arXiv:1805.10808.
Audio examples description I
1. Input MIDI 63 note to the Spectral Model
2. Spectral Model reconstruction (trained on MIDI 63)
3. Spectral Model reconstruction (not trained on MIDI 63)
4. Input MIDI 60 note to the Parametric Model
5. Parametric reconstruction of the input note
6. Input MIDI 63 note
7. Parametric AE reconstruction of the input
8. Parametric CVAE reconstruction of the input
9. Input MIDI 65 note (endpoint-trained model)
10. Parametric AE reconstruction of the input (endpoint-trained model)
11. Parametric CVAE reconstruction of the input (endpoint-trained model)
12. CVAE-generated MIDI 65 violin note
13. Similar MIDI 65 violin note from the dataset
Audio examples description II
14. CVAE reconstruction of the MIDI 65 violin note
15. CVAE-generated MIDI 65 violin note with vibrato
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]
X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative sounds
times Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
636
Why Parametric
I Rather than generating new timbres(lsquointerpolatingrsquo acrosssounds) we consider the problem of synthesis of a giveninstrument sound with flexible control over the pitch andloudness dynamics
I Pitch shifting without timbre modification uses a source-filtermodel with the filter(spectral envelope) being kept constant[Roebel and Rodet 2005]
I A powerful parametric representation over raw waveform orspectrogram has the potential to achieve high quality with lesstraining data
1 [Blaauw and Bonada 2016] recognized this in context ofspeech synthesis and used a vocoder representation to train agenerative model achieving promising results along the way
636
Why Parametric
I Rather than generating new timbres(lsquointerpolatingrsquo acrosssounds) we consider the problem of synthesis of a giveninstrument sound with flexible control over the pitch andloudness dynamics
I Pitch shifting without timbre modification uses a source-filtermodel with the filter(spectral envelope) being kept constant[Roebel and Rodet 2005]
I A powerful parametric representation over raw waveform orspectrogram has the potential to achieve high quality with lesstraining data
1 [Blaauw and Bonada 2016] recognized this in context ofspeech synthesis and used a vocoder representation to train agenerative model achieving promising results along the way
736
Dataset
I Good-sounds dataset [Romani Picas et al 2015] consisting ofindividual note and scale recordings for 12 different instruments
I We work with the lsquoViolinrsquo played in mezzo-forte loudnessand choose the 4th octave(MIDI 60-71)
0 5 10 15 20 25 30 35
606162636465666768697071
282626262626
343030
262626
Instances
MID
I
Figure Instances per note in the overall dataset
I Average note duration for the chosen octave is about 45s pernote
736
Dataset
I Good-sounds dataset [Romani Picas et al 2015] consisting ofindividual note and scale recordings for 12 different instruments
I We work with the lsquoViolinrsquo played in mezzo-forte loudnessand choose the 4th octave(MIDI 60-71)
Why we chose ViolinPopular in Indian Music Human voice-like timbre
Ability to produce continuous pitch
0 5 10 15 20 25 30 35
606162636465666768697071
282626262626
343030
262626
Instances
MID
I
Figure Instances per note in the overall dataset
I Average note duration for the chosen octave is about 45s pernote
836
Dataset
I We split the data to train(80) and test(20) instances acrossMIDI note labels
I Our model is trained with frames(duration 213ms) from thetrain instances and system performance is evaluated withframes from the test instances
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
1036
Non-Parametric Reconstruction
Figure Input MIDI 63 1 1
Figure Including MIDI 63 2 2 Figure Excluding MIDI 63 3 3
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton1)ocgs[i]state=false
1728
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton3)ocgs[i]state=false
1728
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton5)ocgs[i]state=false
1728
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1336
Parametric Model
I No open source implementation available for the TAE thus weimplemented it following procedure highlighted in[Roebel and Rodet 2005 Caetano and Rodet 2012]
I Figure shows a TAE snap from [Caetano and Rodet 2012]
I Similar to results we get 1 4 2 5
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton6)ocgs[i]state=false
3216
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton7)ocgs[i]state=false
2256
1436
Parametric Model
0 1 2 3 4 5minus80
minus60
minus40
minus20
Frequency(kHz)Magnitude(dB)
606571
I Spectral envelope shape varies across pitch
1 Dependence of envelope on pitch[Slawson 1981 Caetano and Rodet 2012]
2 Variation due the TAE algorithm
I TAE rarr smooth function to estimate harmonic amplitudes
1536
Parametric Model
I No of CCs(Cepstral CoefficientsKcc) depends on Samplingrate(Fs)pitch(f0)
Kcc leFs2f0
I For our network we choose the maximum Kcc(lowest pitch) =91 and zero pad the high pitch Kcc to 91 dimension vectors
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1836
Network Architecture
CCs
f0
En
cod
er
Z
Dec
od
er
CVAE
I Network input is CCs rarr MSE represents perceptually relevantdistance in terms of squared error between the input andreconstructed log magnitude spectral envelopes
I We train the network on frames from the train instances
I For evaluation MSE values calculated ahead is the averagereconstruction error across all the test instance frames
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
L prop MSE + βKLD
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2136
Network Architecture
I Network size [91 91 32 91 91]I 91 is the dimension of input CCs 32 is latent space
dimensionality
I Linear Fully Connected Layers + Leaky ReLU activations
I ADAM [Kingma and Ba 2014] with initial lr = 10minus3
I Training for 2000 epochs with batch size 512
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2436
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAE
I Both AE and CVAE reasonably reconstruct the target pitch1 6 2 7 3 8
I CVAE produces better reconstruction especially when thetarget pitch is far from the pitches available in the trainingdata(plot above) 4 9 5 10 6 11
I Conditioning helps to capture the pitch dependency of thespectral envelope more accurately
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton8)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton9)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton10)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton11)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton12)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton13)ocgs[i]state=false
1536
2536
Reconstruction
I To emulate the effect of pitch conditioning with an AE wetrain the AE by appending the pitch to the input CCs andreconstructing this appended input as shown below
f0
CCscAE
fprime
0
CCsprime
Figure lsquoConditionalrsquo AE(cAE)
I [Wyse 2018] followed a similar approach of appending theconditional variables to the input of his model
I the lsquocAErsquo is comparable to our proposed CVAE in that thenetwork might potentially learn something from f0
2636
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAEcAE
I We train the cAE on the octave endpoints and evaluateperformance as shown above
I Appending f0 does not improve performance It is infact worsethan AE
2736
Generation
I We are interested in lsquosynthesizingrsquo audio
I We see how well the network can generate an instance of adesired pitch(which the network has not been trained on)
Z
Dec
od
er
f0
Figure Sampling from the Network
I We follow the procedure in [Blaauw and Bonada 2016] - arandom walk to sample points coherently from the latent space
2836
Generation
I We train on instances across the entire octave sans MIDI 65and then generate MIDI 65
I On listening we can see that the generated audio still lacks thesoft noisy sound of the violin bowing
1 12 2 13 3 14
I For a more realistic synthesis we generate a violin note withvibrato 4 15
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton14)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton15)ocgs[i]state=false
2256
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton16)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton17)ocgs[i]state=false
2184
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3136
References I
[Blaauw and Bonada 2016] Blaauw M and Bonada J (2016)Modeling and transforming speech using variational autoencodersIn Interspeech pages 1770ndash1774
[Caetano and Rodet 2012] Caetano M and Rodet X (2012)A source-filter model for musical instrument sound transformationIn 2012 IEEE International Conference on Acoustics Speech and Signal Processing(ICASSP) pages 137ndash140 IEEE
[Doersch 2016] Doersch C (2016)Tutorial on variational autoencodersarXiv preprint arXiv160605908
[Engel et al 2017] Engel J Resnick C Roberts A Dieleman S Norouzi MEck D and Simonyan K (2017)Neural audio synthesis of musical notes with wavenet autoencodersIn Proceedings of the 34th International Conference on Machine Learning-Volume70 pages 1068ndash1077 JMLR org
[Esling et al 2018] Esling P Bitton A et al (2018)Generative timbre spaces regularizing variational auto-encoders with perceptualmetricsarXiv preprint arXiv180508501
3236
References II
[Griffin and Lim 1984] Griffin D and Lim J (1984)Signal estimation from modified short-time fourier transformIEEE Transactions on Acoustics Speech and Signal Processing 32(2)236ndash243
[Hinton and Salakhutdinov 2006] Hinton G E and Salakhutdinov R R (2006)Reducing the dimensionality of data with neural networksscience 313(5786)504ndash507
[IMAI 1979] IMAI S (1979)Spectral envelope extraction by improved cepstrumIEICE 62217ndash228
[Kingma and Ba 2014] Kingma D P and Ba J (2014)Adam A method for stochastic optimizationarXiv preprint arXiv14126980
[Kingma and Welling 2013] Kingma D P and Welling M (2013)Auto-encoding variational bayesarXiv preprint arXiv13126114
[Oord et al 2016] Oord A v d Dieleman S Zen H Simonyan K Vinyals OGraves A Kalchbrenner N Senior A and Kavukcuoglu K (2016)Wavenet A generative model for raw audioarXiv preprint arXiv160903499
3336
References III
[Roche et al 2018] Roche F Hueber T Limier S and Girin L (2018)Autoencoders for music sound modeling a comparison of linear shallow deeprecurrent and variational modelsarXiv preprint arXiv180604096
[Roebel and Rodet 2005] Roebel A and Rodet X (2005)Efficient Spectral Envelope Estimation and its application to pitch shifting andenvelope preservationIn International Conference on Digital Audio Effects pages 30ndash35 Madrid Spaincote interne IRCAM Roebel05b
[Romani Picas et al 2015] Romani Picas O Parra Rodriguez H Dabiri DTokuda H Hariya W Oishi K and Serra X (2015)A real-time system for measuring sound goodness in instrumental soundsIn Audio Engineering Society Convention 138 Audio Engineering Society
[Sarroff and Casey 2014] Sarroff A M and Casey M A (2014)Musical audio synthesis using autoencoding neural netsIn ICMC
[Serra et al 1997] Serra X et al (1997)Musical sound modeling with sinusoids plus noiseMusical signal processing pages 91ndash122
3436
References IV
[Slawson 1981] Slawson W (1981)The color of sound a theoretical study in musical timbreMusic Theory Spectrum 3132ndash141
[Sohn et al 2015] Sohn K Lee H and Yan X (2015)Learning structured output representation using deep conditional generative models
In Advances in neural information processing systems pages 3483ndash3491
[Wyse 2018] Wyse L (2018)Real-valued parametric conditioning of an rnn for interactive sound synthesisarXiv preprint arXiv180510808
3536
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model Reconstruction(trained on MIDI63)
3 Spectral Model Reconstruction(not trained on MIDI63)
4 Input MIDI 60 note to Parametric Model
5 Parametric Reconstruction of input note
6 Input MIDI 63 Note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note(endpoint trained model)
10 Parametric AE reconstruction of input(endpoint trained model)
11 Parametric CVAE reconstruction of input(endpoint trained model)
12 CVAE Generated MIDI 65 Violin note
13 Similar MIDI 65 Violin note from dataset
3636
Audio examples description II
14 CVAE Reconstruction of the MIDI 65 violin note
15 CVAE Generated MIDI 65 Violin note with vibrato
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative sounds
times Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
636
Why Parametric
I Rather than generating new timbres(lsquointerpolatingrsquo acrosssounds) we consider the problem of synthesis of a giveninstrument sound with flexible control over the pitch andloudness dynamics
I Pitch shifting without timbre modification uses a source-filtermodel with the filter(spectral envelope) being kept constant[Roebel and Rodet 2005]
I A powerful parametric representation over raw waveform orspectrogram has the potential to achieve high quality with lesstraining data
1 [Blaauw and Bonada 2016] recognized this in context ofspeech synthesis and used a vocoder representation to train agenerative model achieving promising results along the way
636
Why Parametric
I Rather than generating new timbres(lsquointerpolatingrsquo acrosssounds) we consider the problem of synthesis of a giveninstrument sound with flexible control over the pitch andloudness dynamics
I Pitch shifting without timbre modification uses a source-filtermodel with the filter(spectral envelope) being kept constant[Roebel and Rodet 2005]
I A powerful parametric representation over raw waveform orspectrogram has the potential to achieve high quality with lesstraining data
1 [Blaauw and Bonada 2016] recognized this in context ofspeech synthesis and used a vocoder representation to train agenerative model achieving promising results along the way
736
Dataset
I Good-sounds dataset [Romani Picas et al 2015] consisting ofindividual note and scale recordings for 12 different instruments
I We work with the lsquoViolinrsquo played in mezzo-forte loudnessand choose the 4th octave(MIDI 60-71)
0 5 10 15 20 25 30 35
606162636465666768697071
282626262626
343030
262626
Instances
MID
I
Figure Instances per note in the overall dataset
I Average note duration for the chosen octave is about 45s pernote
736
Dataset
I Good-sounds dataset [Romani Picas et al 2015] consisting ofindividual note and scale recordings for 12 different instruments
I We work with the lsquoViolinrsquo played in mezzo-forte loudnessand choose the 4th octave(MIDI 60-71)
Why we chose ViolinPopular in Indian Music Human voice-like timbre
Ability to produce continuous pitch
0 5 10 15 20 25 30 35
606162636465666768697071
282626262626
343030
262626
Instances
MID
I
Figure Instances per note in the overall dataset
I Average note duration for the chosen octave is about 45s pernote
836
Dataset
I We split the data to train(80) and test(20) instances acrossMIDI note labels
I Our model is trained with frames(duration 213ms) from thetrain instances and system performance is evaluated withframes from the test instances
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
1036
Non-Parametric Reconstruction
Figure Input MIDI 63 1 1
Figure Including MIDI 63 2 2 Figure Excluding MIDI 63 3 3
Parametric Model
1 Frame-wise magnitude spectrum → harmonic representation using the Harmonic plus Residual (HpR) model [Serra et al 1997] (currently we neglect the residual)
Figure: sustain segment → FFT → HpR → f0 + harmonic magnitudes
I Output of HpR block ⇒ log (dB) magnitudes + harmonics
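Real HpR analysis detects and tracks sinusoidal peaks; as a simplified, hedged stand-in, one can sample dB magnitudes at the harmonic bins of one frame (all names here are ours):

```python
import numpy as np

def harmonic_magnitudes(mag_spectrum, f0, fs, n_fft):
    """Sample log (dB) magnitudes at integer multiples of f0 (simplified HpR)."""
    n_harm = int((fs / 2) // f0)                         # harmonics below Nyquist
    freqs = np.arange(1, n_harm + 1) * f0                # harmonic frequencies
    bins = np.round(freqs * n_fft / fs).astype(int)      # nearest FFT bins
    mags_db = 20 * np.log10(mag_spectrum[bins] + 1e-12)  # dB magnitudes
    return freqs, mags_db
```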
Parametric Model
2 log (dB) magnitudes + harmonics → TAE algorithm [Roebel and Rodet 2005, IMAI 1979]
I TAE ⇒ iterative cepstral liftering:
A_0(k) = log|X(k)|, V_0 = -∞
while (A_i - A_0 < Δ):
    A_i(k) = max(A_{i-1}(k), V_{i-1}(k))
    V_i = FFT(lifter(A_i))
Figure: f0 + harmonic magnitudes → TAE → cepstral coefficients (CCs) + f0
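A minimal numpy sketch of the loop above, assuming the frame's log-magnitude spectrum has already been resampled onto a uniform frequency grid; the convergence test is our interpretation of the stopping criterion, and the names are ours:

```python
import numpy as np

def true_amplitude_envelope(log_mag, k_cc, max_iter=200, delta=0.5):
    """Iterative cepstral liftering (TAE), following the pseudocode above."""
    A = log_mag.copy()                    # A_0(k) = log|X(k)|
    V = np.full_like(A, -np.inf)          # V_0 = -inf
    for _ in range(max_iter):
        A = np.maximum(A, V)              # A_i(k) = max(A_{i-1}(k), V_{i-1}(k))
        ceps = np.fft.rfft(A)
        ceps[k_cc:] = 0                   # lifter: keep the first k_cc coefficients
        V = np.fft.irfft(ceps, n=len(A))  # V_i = FFT(lifter(A_i)), smooth envelope
        if np.max(A - V) < delta:         # stop once the envelope hugs the peaks
            break
    return V
```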
Parametric Model
I No open-source implementation was available for the TAE, so we implemented it following the procedure described in [Roebel and Rodet 2005, Caetano and Rodet 2012]
I The figure shows a TAE snapshot from [Caetano and Rodet 2012]
I It is similar to the results we get (audio examples 4, 5)
Parametric Model
Figure: TAE spectral envelopes for MIDI 60, 65 and 71 (magnitude in dB vs frequency in kHz)
I Spectral envelope shape varies across pitch:
1 Dependence of the envelope on pitch [Slawson 1981, Caetano and Rodet 2012]
2 Variation due to the TAE algorithm
I TAE → a smooth function to estimate harmonic amplitudes
Parametric Model
I The number of CCs (cepstral coefficients, K_cc) depends on the sampling rate (F_s) and the pitch (f_0):

    K_cc ≤ F_s / (2 f_0)

I For our network we choose the maximum K_cc (lowest pitch) = 91, and zero-pad the higher-pitch CC vectors to 91-dimensional vectors
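A small sketch of this bookkeeping; the F_s = 48 kHz value is our assumption, chosen because it is consistent with K_cc = 91 at the octave's lowest pitch (MIDI 60, f_0 ≈ 261.6 Hz):

```python
import numpy as np

def kcc(fs, f0):
    """K_cc <= F_s / (2 * f_0): the lifter order the pitch can support."""
    return int(fs // (2 * f0))

def pad_ccs(ccs, target_dim=91):
    """Zero-pad higher-pitch CC vectors to the fixed network input size."""
    out = np.zeros(target_dim)
    out[:len(ccs)] = ccs
    return out

print(kcc(48000, 261.63))   # -> 91 for MIDI 60 (assumed Fs of 48 kHz)
```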
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - minimize the MSE (mean squared error) between the input and the network's reconstruction
• Simple to train, and give good reconstruction (since they minimize MSE)
• Not truly a generative model, as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] - inspired by variational inference; enforce a prior on the latent space
• Truly generative, as we can 'generate' new data by sampling from the prior
I Conditional Variational Autoencoders [Doersch 2016, Sohn et al 2015] - same principle as a VAE, but learn the distribution conditioned on an additional conditioning variable
Generative Models
Why VAE over AE?
A continuous latent space from which we can sample points (and synthesize the corresponding audio)
Why CVAE over VAE?
Conditioning on pitch ⇒ the network captures dependencies between timbre and pitch ⇒ more accurate envelope generation + pitch control
Network Architecture
Figure: CVAE - input CCs (+ f0 conditioning) → Encoder → Z → Decoder (+ f0 conditioning) → reconstructed CCs
I The network input is CCs → the MSE represents a perceptually relevant distance: the squared error between the input and reconstructed log-magnitude spectral envelopes
I We train the network on frames from the train instances
I For evaluation, the MSE values reported ahead are the average reconstruction error across all test-instance frames
Network Architecture
I Main hyperparameters -
1 β - controls the relative weighting between reconstruction and prior enforcement:

    L ∝ MSE + β · KLD

Figure: reconstruction MSE (scale 10^-5) vs latent space dimensionality, for β = 0.01, 0.1 and 1
I Tradeoff between both terms; we choose β = 0.1 (a sketch of the weighted objective follows)
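A hedged PyTorch sketch of the objective L ∝ MSE + β·KLD (variable names are ours):

```python
import torch
import torch.nn.functional as F

def cvae_loss(x_hat, x, mu, logvar, beta=0.1):
    """Reconstruction MSE plus beta-weighted KL divergence to the N(0, I) prior."""
    mse = F.mse_loss(x_hat, x)
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return mse + beta * kld
```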
Network Architecture
I Main hyperparameters -
2 Dimensionality of the latent space - governs the network's reconstruction ability
Figure: reconstruction MSE (scale 10^-5) vs latent space dimensionality, CVAE (β = 0.1) vs AE
I Steep fall initially, flatter later; we choose dimensionality = 32
Network Architecture
I Network size: [91, 91, 32, 91, 91]
I 91 is the dimension of the input CCs; 32 is the latent space dimensionality
I Linear fully connected layers + Leaky ReLU activations
I ADAM [Kingma and Ba 2014] with initial lr = 10^-3
I Training for 2000 epochs with batch size 512
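Putting the stated sizes together, a minimal PyTorch sketch of such a CVAE; the exact way f0 is injected into the encoder and decoder is our assumption:

```python
import torch
import torch.nn as nn

class CVAE(nn.Module):
    """[91, 91, 32, 91, 91] fully connected CVAE, conditioned on pitch."""
    def __init__(self, cc_dim=91, hidden=91, latent=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(cc_dim + 1, hidden), nn.LeakyReLU())
        self.mu, self.logvar = nn.Linear(hidden, latent), nn.Linear(hidden, latent)
        self.dec = nn.Sequential(nn.Linear(latent + 1, hidden), nn.LeakyReLU(),
                                 nn.Linear(hidden, cc_dim))

    def forward(self, ccs, f0):                        # f0: (batch, 1), normalized
        h = self.enc(torch.cat([ccs, f0], dim=-1))     # condition encoder on pitch
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        x_hat = self.dec(torch.cat([z, f0], dim=-1))   # condition decoder on pitch
        return x_hat, mu, logvar

model = CVAE()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)  # ADAM, initial lr = 10^-3
```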
Experiments
I Two kinds of experiments to demonstrate the network's capabilities:
1 Reconstruction - omit a pitch's instances during training, and see how well the model reconstructs notes of the omitted target pitch
2 Generation - how well the model 'synthesizes' note instances with new, unseen pitches
Reconstruction
I Two training contexts (see the sketch after this list) -
1 T (×) is the target, ✓ are training instances:
    MIDI:  T-3  T-2  T-1   T   T+1  T+2  T+3
    Kept:   ✓    ✓    ✓    ×    ✓    ✓    ✓
2 Octave endpoints:
    MIDI:  60  61  62  63  64  65  66  67  68  69  70  71
    Kept:   ✓   ×   ×   ×   ×   ×   ×   ×   ×   ×   ×   ✓
I In each of the above cases, we compute the MSE as the frame-wise spectral envelope match across all frames of all the target instances
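As promised above, a small sketch of the two training contexts as sets of kept MIDI notes (helper names are ours):

```python
def neighbour_context(target):
    """Context 1: keep the 3 neighbours on either side, omit the target itself."""
    return {m for m in range(target - 3, target + 4) if m != target}

def endpoint_context():
    """Context 2: keep only the octave endpoints, MIDI 60 and 71."""
    return {60, 71}

print(neighbour_context(63))   # {60, 61, 62, 64, 65, 66}
```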
Reconstruction
Figure: reconstruction MSE (scale 10^-2) vs target pitch, AE vs CVAE
I Both AE and CVAE reasonably reconstruct the target pitch (audio examples 6, 7, 8)
I The CVAE produces better reconstruction, especially when the target pitch is far from the pitches available in the training data (plot above; audio examples 9, 10, 11)
I Conditioning helps capture the pitch dependency of the spectral envelope more accurately
Reconstruction
I To emulate the effect of pitch conditioning with an AE, we train the AE by appending the pitch to the input CCs and reconstructing this appended input, as sketched below
Figure: 'Conditional' AE (cAE) - [CCs, f0] → cAE → [CCs', f0']
I [Wyse 2018] followed a similar approach of appending the conditional variables to the input of his model
I The 'cAE' is comparable to our proposed CVAE in that the network might potentially learn something from f0
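A hedged sketch of this cAE baseline; the layer sizes mirror the CVAE above, and the exact arrangement is our assumption:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Plain AE whose input and reconstruction target are the appended vector [CCs, f0].
cae = nn.Sequential(
    nn.Linear(92, 91), nn.LeakyReLU(),
    nn.Linear(91, 32), nn.LeakyReLU(),   # 32-dim bottleneck, as in the CVAE
    nn.Linear(32, 91), nn.LeakyReLU(),
    nn.Linear(91, 92))                   # reconstruct CCs and f0 together

x = torch.cat([torch.randn(512, 91), torch.randn(512, 1)], dim=-1)  # [CCs, f0]
loss = F.mse_loss(cae(x), x)             # MSE over the appended input
```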
Reconstruction
Figure: reconstruction MSE (scale 10^-2) vs pitch - AE, CVAE and cAE
I We train the cAE on the octave endpoints and evaluate performance as above
I Appending f0 does not improve performance; it is in fact worse than the plain AE
Generation
I We are interested in 'synthesizing' audio
I We see how well the network can generate an instance of a desired pitch (one the network has not been trained on)
Figure: Sampling from the network - Z (sampled from the prior) + f0 → Decoder
I We follow the procedure in [Blaauw and Bonada 2016] - a random walk to sample points coherently from the latent space (sketched below)
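A minimal sketch of that sampling scheme; the step size and the frame loop are our assumptions:

```python
import torch

@torch.no_grad()
def random_walk_generate(decoder, f0, n_frames, latent_dim=32, step=0.05):
    """Sample latent points coherently via a random walk, decode frame by frame."""
    z = torch.randn(latent_dim)                 # start at a draw from the prior
    frames = []
    for _ in range(n_frames):
        z = z + step * torch.randn(latent_dim)  # small step -> slowly varying timbre
        frames.append(decoder(torch.cat([z, f0], dim=-1)))
    return torch.stack(frames)                  # frame-wise envelope CCs
```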
Generation
I We train on instances across the entire octave sans MIDI 65, and then generate MIDI 65
I On listening, we find that the generated audio still lacks the soft, noisy sound of the violin bowing (audio examples 12, 13, 14)
I For a more realistic synthesis, we generate a violin note with vibrato (audio example 15)
Putting it all together
I We explored autoencoder frameworks as generative models for audio synthesis of instrumental tones
I Our parametric representation decouples 'timbre' and 'pitch', relying on the network to model their inter-dependencies
I Pitch conditioning lets us generate the learnt spectral envelope for a given pitch, enabling us to vary the pitch contour continuously
Putting it all together
I Our model is still far from perfect, and we have to improve it further:
1 An immediate step is to model the residual of the input audio; this will help make the generated audio sound more natural
2 We have not taken dynamics into account; a more complete system will capture the spectral envelope (timbral) dependencies on both pitch and loudness dynamics
3 Conduct more formal listening tests, involving synthesis of larger pitch movements and melodic elements from Indian Classical music
I Our contributions:
1 To the best of our knowledge, we have not come across any prior work using a parametric model for musical tones in the neural synthesis framework, especially one exploiting the conditioning function of the CVAE
2 The TAE method has no known Python implementation, so we plan to make our code open-source to the MIR community for research
References

[Blaauw and Bonada 2016] Blaauw, M. and Bonada, J. (2016). Modeling and transforming speech using variational autoencoders. In Interspeech, pages 1770-1774.
[Caetano and Rodet 2012] Caetano, M. and Rodet, X. (2012). A source-filter model for musical instrument sound transformation. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 137-140. IEEE.
[Doersch 2016] Doersch, C. (2016). Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908.
[Engel et al 2017] Engel, J., Resnick, C., Roberts, A., Dieleman, S., Norouzi, M., Eck, D., and Simonyan, K. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1068-1077. JMLR.org.
[Esling et al 2018] Esling, P., Bitton, A., et al. (2018). Generative timbre spaces: regularizing variational auto-encoders with perceptual metrics. arXiv preprint arXiv:1805.08501.
[Griffin and Lim 1984] Griffin, D. and Lim, J. (1984). Signal estimation from modified short-time Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2):236-243.
[Hinton and Salakhutdinov 2006] Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504-507.
[IMAI 1979] Imai, S. (1979). Spectral envelope extraction by improved cepstrum. IEICE, 62:217-228.
[Kingma and Ba 2014] Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[Kingma and Welling 2013] Kingma, D. P. and Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
[Oord et al 2016] Oord, A. v. d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., and Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.
[Roche et al 2018] Roche, F., Hueber, T., Limier, S., and Girin, L. (2018). Autoencoders for music sound modeling: a comparison of linear, shallow, deep, recurrent and variational models. arXiv preprint arXiv:1806.04096.
[Roebel and Rodet 2005] Roebel, A. and Rodet, X. (2005). Efficient spectral envelope estimation and its application to pitch shifting and envelope preservation. In International Conference on Digital Audio Effects, pages 30-35, Madrid, Spain.
[Romani Picas et al 2015] Romani Picas, O., Parra Rodriguez, H., Dabiri, D., Tokuda, H., Hariya, W., Oishi, K., and Serra, X. (2015). A real-time system for measuring sound goodness in instrumental sounds. In Audio Engineering Society Convention 138. Audio Engineering Society.
[Sarroff and Casey 2014] Sarroff, A. M. and Casey, M. A. (2014). Musical audio synthesis using autoencoding neural nets. In ICMC.
[Serra et al 1997] Serra, X. et al. (1997). Musical sound modeling with sinusoids plus noise. Musical Signal Processing, pages 91-122.
[Slawson 1981] Slawson, W. (1981). The color of sound: a theoretical study in musical timbre. Music Theory Spectrum, 3:132-141.
[Sohn et al 2015] Sohn, K., Lee, H., and Yan, X. (2015). Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems, pages 3483-3491.
[Wyse 2018] Wyse, L. (2018). Real-valued parametric conditioning of an RNN for interactive sound synthesis. arXiv preprint arXiv:1805.10808.
Audio examples description

1 Input MIDI 63 to the spectral model
2 Spectral model reconstruction (trained on MIDI 63)
3 Spectral model reconstruction (not trained on MIDI 63)
4 Input MIDI 60 note to the parametric model
5 Parametric reconstruction of the input note
6 Input MIDI 63 note
7 Parametric AE reconstruction of the input
8 Parametric CVAE reconstruction of the input
9 Input MIDI 65 note (endpoint-trained model)
10 Parametric AE reconstruction of the input (endpoint-trained model)
11 Parametric CVAE reconstruction of the input (endpoint-trained model)
12 CVAE-generated MIDI 65 violin note
13 Similar MIDI 65 violin note from the dataset
14 CVAE reconstruction of the MIDI 65 violin note
15 CVAE-generated MIDI 65 violin note with vibrato
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative sounds
times Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
636
Why Parametric
I Rather than generating new timbres(lsquointerpolatingrsquo acrosssounds) we consider the problem of synthesis of a giveninstrument sound with flexible control over the pitch andloudness dynamics
I Pitch shifting without timbre modification uses a source-filtermodel with the filter(spectral envelope) being kept constant[Roebel and Rodet 2005]
I A powerful parametric representation over raw waveform orspectrogram has the potential to achieve high quality with lesstraining data
1 [Blaauw and Bonada 2016] recognized this in context ofspeech synthesis and used a vocoder representation to train agenerative model achieving promising results along the way
636
Why Parametric
I Rather than generating new timbres(lsquointerpolatingrsquo acrosssounds) we consider the problem of synthesis of a giveninstrument sound with flexible control over the pitch andloudness dynamics
I Pitch shifting without timbre modification uses a source-filtermodel with the filter(spectral envelope) being kept constant[Roebel and Rodet 2005]
I A powerful parametric representation over raw waveform orspectrogram has the potential to achieve high quality with lesstraining data
1 [Blaauw and Bonada 2016] recognized this in context ofspeech synthesis and used a vocoder representation to train agenerative model achieving promising results along the way
736
Dataset
I Good-sounds dataset [Romani Picas et al 2015] consisting ofindividual note and scale recordings for 12 different instruments
I We work with the lsquoViolinrsquo played in mezzo-forte loudnessand choose the 4th octave(MIDI 60-71)
0 5 10 15 20 25 30 35
606162636465666768697071
282626262626
343030
262626
Instances
MID
I
Figure Instances per note in the overall dataset
I Average note duration for the chosen octave is about 45s pernote
736
Dataset
I Good-sounds dataset [Romani Picas et al 2015] consisting ofindividual note and scale recordings for 12 different instruments
I We work with the lsquoViolinrsquo played in mezzo-forte loudnessand choose the 4th octave(MIDI 60-71)
Why we chose ViolinPopular in Indian Music Human voice-like timbre
Ability to produce continuous pitch
0 5 10 15 20 25 30 35
606162636465666768697071
282626262626
343030
262626
Instances
MID
I
Figure Instances per note in the overall dataset
I Average note duration for the chosen octave is about 45s pernote
836
Dataset
I We split the data to train(80) and test(20) instances acrossMIDI note labels
I Our model is trained with frames(duration 213ms) from thetrain instances and system performance is evaluated withframes from the test instances
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
1036
Non-Parametric Reconstruction
Figure Input MIDI 63 1 1
Figure Including MIDI 63 2 2 Figure Excluding MIDI 63 3 3
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton1)ocgs[i]state=false
1728
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton3)ocgs[i]state=false
1728
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton5)ocgs[i]state=false
1728
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1336
Parametric Model
I No open source implementation available for the TAE thus weimplemented it following procedure highlighted in[Roebel and Rodet 2005 Caetano and Rodet 2012]
I Figure shows a TAE snap from [Caetano and Rodet 2012]
I Similar to results we get 1 4 2 5
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton6)ocgs[i]state=false
3216
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton7)ocgs[i]state=false
2256
1436
Parametric Model
0 1 2 3 4 5minus80
minus60
minus40
minus20
Frequency(kHz)Magnitude(dB)
606571
I Spectral envelope shape varies across pitch
1 Dependence of envelope on pitch[Slawson 1981 Caetano and Rodet 2012]
2 Variation due the TAE algorithm
I TAE rarr smooth function to estimate harmonic amplitudes
1536
Parametric Model
I No of CCs(Cepstral CoefficientsKcc) depends on Samplingrate(Fs)pitch(f0)
Kcc leFs2f0
I For our network we choose the maximum Kcc(lowest pitch) =91 and zero pad the high pitch Kcc to 91 dimension vectors
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1836
Network Architecture
CCs
f0
En
cod
er
Z
Dec
od
er
CVAE
I Network input is CCs rarr MSE represents perceptually relevantdistance in terms of squared error between the input andreconstructed log magnitude spectral envelopes
I We train the network on frames from the train instances
I For evaluation MSE values calculated ahead is the averagereconstruction error across all the test instance frames
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
L prop MSE + βKLD
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2136
Network Architecture
I Network size [91 91 32 91 91]I 91 is the dimension of input CCs 32 is latent space
dimensionality
I Linear Fully Connected Layers + Leaky ReLU activations
I ADAM [Kingma and Ba 2014] with initial lr = 10minus3
I Training for 2000 epochs with batch size 512
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2436
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAE
I Both AE and CVAE reasonably reconstruct the target pitch1 6 2 7 3 8
I CVAE produces better reconstruction especially when thetarget pitch is far from the pitches available in the trainingdata(plot above) 4 9 5 10 6 11
I Conditioning helps to capture the pitch dependency of thespectral envelope more accurately
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton8)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton9)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton10)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton11)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton12)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton13)ocgs[i]state=false
1536
2536
Reconstruction
I To emulate the effect of pitch conditioning with an AE wetrain the AE by appending the pitch to the input CCs andreconstructing this appended input as shown below
f0
CCscAE
fprime
0
CCsprime
Figure lsquoConditionalrsquo AE(cAE)
I [Wyse 2018] followed a similar approach of appending theconditional variables to the input of his model
I the lsquocAErsquo is comparable to our proposed CVAE in that thenetwork might potentially learn something from f0
2636
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAEcAE
I We train the cAE on the octave endpoints and evaluateperformance as shown above
I Appending f0 does not improve performance It is infact worsethan AE
2736
Generation
I We are interested in lsquosynthesizingrsquo audio
I We see how well the network can generate an instance of adesired pitch(which the network has not been trained on)
Z
Dec
od
er
f0
Figure Sampling from the Network
I We follow the procedure in [Blaauw and Bonada 2016] - arandom walk to sample points coherently from the latent space
2836
Generation
I We train on instances across the entire octave sans MIDI 65and then generate MIDI 65
I On listening we can see that the generated audio still lacks thesoft noisy sound of the violin bowing
1 12 2 13 3 14
I For a more realistic synthesis we generate a violin note withvibrato 4 15
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton14)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton15)ocgs[i]state=false
2256
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton16)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton17)ocgs[i]state=false
2184
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
31/36
References I
[Blaauw and Bonada 2016] Blaauw, M. and Bonada, J. (2016). Modeling and transforming speech using variational autoencoders. In Interspeech, pages 1770–1774.
[Caetano and Rodet 2012] Caetano, M. and Rodet, X. (2012). A source-filter model for musical instrument sound transformation. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 137–140. IEEE.
[Doersch 2016] Doersch, C. (2016). Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908.
[Engel et al 2017] Engel, J., Resnick, C., Roberts, A., Dieleman, S., Norouzi, M., Eck, D., and Simonyan, K. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1068–1077. JMLR.org.
[Esling et al 2018] Esling, P., Bitton, A., et al. (2018). Generative timbre spaces: regularizing variational auto-encoders with perceptual metrics. arXiv preprint arXiv:1805.08501.
32/36
References II
[Griffin and Lim 1984] Griffin, D. and Lim, J. (1984). Signal estimation from modified short-time Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2):236–243.
[Hinton and Salakhutdinov 2006] Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507.
[IMAI 1979] Imai, S. (1979). Spectral envelope extraction by improved cepstrum. IEICE, 62:217–228.
[Kingma and Ba 2014] Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[Kingma and Welling 2013] Kingma, D. P. and Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
[Oord et al 2016] Oord, A. v. d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., and Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.
33/36
References III
[Roche et al 2018] Roche, F., Hueber, T., Limier, S., and Girin, L. (2018). Autoencoders for music sound modeling: a comparison of linear, shallow, deep, recurrent and variational models. arXiv preprint arXiv:1806.04096.
[Roebel and Rodet 2005] Roebel, A. and Rodet, X. (2005). Efficient spectral envelope estimation and its application to pitch shifting and envelope preservation. In International Conference on Digital Audio Effects, pages 30–35, Madrid, Spain.
[Romani Picas et al 2015] Romani Picas, O., Parra Rodriguez, H., Dabiri, D., Tokuda, H., Hariya, W., Oishi, K., and Serra, X. (2015). A real-time system for measuring sound goodness in instrumental sounds. In Audio Engineering Society Convention 138. Audio Engineering Society.
[Sarroff and Casey 2014] Sarroff, A. M. and Casey, M. A. (2014). Musical audio synthesis using autoencoding neural nets. In ICMC.
[Serra et al 1997] Serra, X. et al. (1997). Musical sound modeling with sinusoids plus noise. Musical Signal Processing, pages 91–122.
34/36
References IV
[Slawson 1981] Slawson, W. (1981). The color of sound: a theoretical study in musical timbre. Music Theory Spectrum, 3:132–141.
[Sohn et al 2015] Sohn, K., Lee, H., and Yan, X. (2015). Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems, pages 3483–3491.
[Wyse 2018] Wyse, L. (2018). Real-valued parametric conditioning of an RNN for interactive sound synthesis. arXiv preprint arXiv:1805.10808.
35/36
Audio examples description I
1. Input MIDI 63 to Spectral Model
2. Spectral Model reconstruction (trained on MIDI 63)
3. Spectral Model reconstruction (not trained on MIDI 63)
4. Input MIDI 60 note to Parametric Model
5. Parametric reconstruction of input note
6. Input MIDI 63 note
7. Parametric AE reconstruction of input
8. Parametric CVAE reconstruction of input
9. Input MIDI 65 note (endpoint-trained model)
10. Parametric AE reconstruction of input (endpoint-trained model)
11. Parametric CVAE reconstruction of input (endpoint-trained model)
12. CVAE-generated MIDI 65 violin note
13. Similar MIDI 65 violin note from dataset
36/36
Audio examples description II
14. CVAE reconstruction of the MIDI 65 violin note
15. CVAE-generated MIDI 65 violin note with vibrato
436
Our Nearest Neighbours
I [Sarroff and Casey 2014] first to use autoencoders to performframe-wise reconstruction of short-time magnitude spectra
X Learn perceptually relevant lower dimensionalrepresentations(lsquolatent spacersquo) of audio to help in synthesis
times lsquoGraininessrsquo in the reconstructed sound
I [Roche et al 2018] extended this analysis to try out differentautoencoder architectures
X Helped by the release of NSynth [Engel et al 2017]X Usability of the latent space in audio interpolation
I [Esling et al 2018] regularized the VAE latent space in orderto effect control over perceptual timbre of synthesizedinstruments
X Enabled them to lsquogeneratersquo and lsquointerpolatersquo audio in this space
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative sounds
times Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
636
Why Parametric
I Rather than generating new timbres(lsquointerpolatingrsquo acrosssounds) we consider the problem of synthesis of a giveninstrument sound with flexible control over the pitch andloudness dynamics
I Pitch shifting without timbre modification uses a source-filtermodel with the filter(spectral envelope) being kept constant[Roebel and Rodet 2005]
I A powerful parametric representation over raw waveform orspectrogram has the potential to achieve high quality with lesstraining data
1 [Blaauw and Bonada 2016] recognized this in context ofspeech synthesis and used a vocoder representation to train agenerative model achieving promising results along the way
636
Why Parametric
I Rather than generating new timbres(lsquointerpolatingrsquo acrosssounds) we consider the problem of synthesis of a giveninstrument sound with flexible control over the pitch andloudness dynamics
I Pitch shifting without timbre modification uses a source-filtermodel with the filter(spectral envelope) being kept constant[Roebel and Rodet 2005]
I A powerful parametric representation over raw waveform orspectrogram has the potential to achieve high quality with lesstraining data
1 [Blaauw and Bonada 2016] recognized this in context ofspeech synthesis and used a vocoder representation to train agenerative model achieving promising results along the way
736
Dataset
I Good-sounds dataset [Romani Picas et al 2015] consisting ofindividual note and scale recordings for 12 different instruments
I We work with the lsquoViolinrsquo played in mezzo-forte loudnessand choose the 4th octave(MIDI 60-71)
0 5 10 15 20 25 30 35
606162636465666768697071
282626262626
343030
262626
Instances
MID
I
Figure Instances per note in the overall dataset
I Average note duration for the chosen octave is about 45s pernote
736
Dataset
I Good-sounds dataset [Romani Picas et al 2015] consisting ofindividual note and scale recordings for 12 different instruments
I We work with the lsquoViolinrsquo played in mezzo-forte loudnessand choose the 4th octave(MIDI 60-71)
Why we chose ViolinPopular in Indian Music Human voice-like timbre
Ability to produce continuous pitch
0 5 10 15 20 25 30 35
606162636465666768697071
282626262626
343030
262626
Instances
MID
I
Figure Instances per note in the overall dataset
I Average note duration for the chosen octave is about 45s pernote
836
Dataset
I We split the data to train(80) and test(20) instances acrossMIDI note labels
I Our model is trained with frames(duration 213ms) from thetrain instances and system performance is evaluated withframes from the test instances
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
1036
Non-Parametric Reconstruction
Figure Input MIDI 63 1 1
Figure Including MIDI 63 2 2 Figure Excluding MIDI 63 3 3
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton1)ocgs[i]state=false
1728
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton3)ocgs[i]state=false
1728
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton5)ocgs[i]state=false
1728
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1336
Parametric Model
I No open source implementation available for the TAE thus weimplemented it following procedure highlighted in[Roebel and Rodet 2005 Caetano and Rodet 2012]
I Figure shows a TAE snap from [Caetano and Rodet 2012]
I Similar to results we get 1 4 2 5
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton6)ocgs[i]state=false
3216
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton7)ocgs[i]state=false
2256
1436
Parametric Model
0 1 2 3 4 5minus80
minus60
minus40
minus20
Frequency(kHz)Magnitude(dB)
606571
I Spectral envelope shape varies across pitch
1 Dependence of envelope on pitch[Slawson 1981 Caetano and Rodet 2012]
2 Variation due the TAE algorithm
I TAE rarr smooth function to estimate harmonic amplitudes
1536
Parametric Model
I No of CCs(Cepstral CoefficientsKcc) depends on Samplingrate(Fs)pitch(f0)
Kcc leFs2f0
I For our network we choose the maximum Kcc(lowest pitch) =91 and zero pad the high pitch Kcc to 91 dimension vectors
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1836
Network Architecture
CCs
f0
En
cod
er
Z
Dec
od
er
CVAE
I Network input is CCs rarr MSE represents perceptually relevantdistance in terms of squared error between the input andreconstructed log magnitude spectral envelopes
I We train the network on frames from the train instances
I For evaluation MSE values calculated ahead is the averagereconstruction error across all the test instance frames
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
L prop MSE + βKLD
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2136
Network Architecture
I Network size [91 91 32 91 91]I 91 is the dimension of input CCs 32 is latent space
dimensionality
I Linear Fully Connected Layers + Leaky ReLU activations
I ADAM [Kingma and Ba 2014] with initial lr = 10minus3
I Training for 2000 epochs with batch size 512
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2436
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAE
I Both AE and CVAE reasonably reconstruct the target pitch1 6 2 7 3 8
I CVAE produces better reconstruction especially when thetarget pitch is far from the pitches available in the trainingdata(plot above) 4 9 5 10 6 11
I Conditioning helps to capture the pitch dependency of thespectral envelope more accurately
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton8)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton9)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton10)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton11)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton12)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton13)ocgs[i]state=false
1536
2536
Reconstruction
I To emulate the effect of pitch conditioning with an AE wetrain the AE by appending the pitch to the input CCs andreconstructing this appended input as shown below
f0
CCscAE
fprime
0
CCsprime
Figure lsquoConditionalrsquo AE(cAE)
I [Wyse 2018] followed a similar approach of appending theconditional variables to the input of his model
I the lsquocAErsquo is comparable to our proposed CVAE in that thenetwork might potentially learn something from f0
2636
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAEcAE
I We train the cAE on the octave endpoints and evaluateperformance as shown above
I Appending f0 does not improve performance It is infact worsethan AE
2736
Generation
I We are interested in lsquosynthesizingrsquo audio
I We see how well the network can generate an instance of adesired pitch(which the network has not been trained on)
Z
Dec
od
er
f0
Figure Sampling from the Network
I We follow the procedure in [Blaauw and Bonada 2016] - arandom walk to sample points coherently from the latent space
2836
Generation
I We train on instances across the entire octave sans MIDI 65and then generate MIDI 65
I On listening we can see that the generated audio still lacks thesoft noisy sound of the violin bowing
1 12 2 13 3 14
I For a more realistic synthesis we generate a violin note withvibrato 4 15
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton14)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton15)ocgs[i]state=false
2256
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton16)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton17)ocgs[i]state=false
2184
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3136
References I
[Blaauw and Bonada 2016] Blaauw M and Bonada J (2016)Modeling and transforming speech using variational autoencodersIn Interspeech pages 1770ndash1774
[Caetano and Rodet 2012] Caetano M and Rodet X (2012)A source-filter model for musical instrument sound transformationIn 2012 IEEE International Conference on Acoustics Speech and Signal Processing(ICASSP) pages 137ndash140 IEEE
[Doersch 2016] Doersch C (2016)Tutorial on variational autoencodersarXiv preprint arXiv160605908
[Engel et al 2017] Engel J Resnick C Roberts A Dieleman S Norouzi MEck D and Simonyan K (2017)Neural audio synthesis of musical notes with wavenet autoencodersIn Proceedings of the 34th International Conference on Machine Learning-Volume70 pages 1068ndash1077 JMLR org
[Esling et al 2018] Esling P Bitton A et al (2018)Generative timbre spaces regularizing variational auto-encoders with perceptualmetricsarXiv preprint arXiv180508501
3236
References II
[Griffin and Lim 1984] Griffin D and Lim J (1984)Signal estimation from modified short-time fourier transformIEEE Transactions on Acoustics Speech and Signal Processing 32(2)236ndash243
[Hinton and Salakhutdinov 2006] Hinton G E and Salakhutdinov R R (2006)Reducing the dimensionality of data with neural networksscience 313(5786)504ndash507
[IMAI 1979] IMAI S (1979)Spectral envelope extraction by improved cepstrumIEICE 62217ndash228
[Kingma and Ba 2014] Kingma D P and Ba J (2014)Adam A method for stochastic optimizationarXiv preprint arXiv14126980
[Kingma and Welling 2013] Kingma D P and Welling M (2013)Auto-encoding variational bayesarXiv preprint arXiv13126114
[Oord et al 2016] Oord A v d Dieleman S Zen H Simonyan K Vinyals OGraves A Kalchbrenner N Senior A and Kavukcuoglu K (2016)Wavenet A generative model for raw audioarXiv preprint arXiv160903499
3336
References III
[Roche et al 2018] Roche F Hueber T Limier S and Girin L (2018)Autoencoders for music sound modeling a comparison of linear shallow deeprecurrent and variational modelsarXiv preprint arXiv180604096
[Roebel and Rodet 2005] Roebel A and Rodet X (2005)Efficient Spectral Envelope Estimation and its application to pitch shifting andenvelope preservationIn International Conference on Digital Audio Effects pages 30ndash35 Madrid Spaincote interne IRCAM Roebel05b
[Romani Picas et al 2015] Romani Picas O Parra Rodriguez H Dabiri DTokuda H Hariya W Oishi K and Serra X (2015)A real-time system for measuring sound goodness in instrumental soundsIn Audio Engineering Society Convention 138 Audio Engineering Society
[Sarroff and Casey 2014] Sarroff A M and Casey M A (2014)Musical audio synthesis using autoencoding neural netsIn ICMC
[Serra et al 1997] Serra X et al (1997)Musical sound modeling with sinusoids plus noiseMusical signal processing pages 91ndash122
3436
References IV
[Slawson 1981] Slawson W (1981)The color of sound a theoretical study in musical timbreMusic Theory Spectrum 3132ndash141
[Sohn et al 2015] Sohn K Lee H and Yan X (2015)Learning structured output representation using deep conditional generative models
In Advances in neural information processing systems pages 3483ndash3491
[Wyse 2018] Wyse L (2018)Real-valued parametric conditioning of an rnn for interactive sound synthesisarXiv preprint arXiv180510808
3536
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model Reconstruction(trained on MIDI63)
3 Spectral Model Reconstruction(not trained on MIDI63)
4 Input MIDI 60 note to Parametric Model
5 Parametric Reconstruction of input note
6 Input MIDI 63 Note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note(endpoint trained model)
10 Parametric AE reconstruction of input(endpoint trained model)
11 Parametric CVAE reconstruction of input(endpoint trained model)
12 CVAE Generated MIDI 65 Violin note
13 Similar MIDI 65 Violin note from dataset
3636
Audio examples description II
14 CVAE Reconstruction of the MIDI 65 violin note
15 CVAE Generated MIDI 65 Violin note with vibrato
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative sounds
times Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
636
Why Parametric
I Rather than generating new timbres(lsquointerpolatingrsquo acrosssounds) we consider the problem of synthesis of a giveninstrument sound with flexible control over the pitch andloudness dynamics
I Pitch shifting without timbre modification uses a source-filtermodel with the filter(spectral envelope) being kept constant[Roebel and Rodet 2005]
I A powerful parametric representation over raw waveform orspectrogram has the potential to achieve high quality with lesstraining data
1 [Blaauw and Bonada 2016] recognized this in context ofspeech synthesis and used a vocoder representation to train agenerative model achieving promising results along the way
636
Why Parametric
I Rather than generating new timbres(lsquointerpolatingrsquo acrosssounds) we consider the problem of synthesis of a giveninstrument sound with flexible control over the pitch andloudness dynamics
I Pitch shifting without timbre modification uses a source-filtermodel with the filter(spectral envelope) being kept constant[Roebel and Rodet 2005]
I A powerful parametric representation over raw waveform orspectrogram has the potential to achieve high quality with lesstraining data
1 [Blaauw and Bonada 2016] recognized this in context ofspeech synthesis and used a vocoder representation to train agenerative model achieving promising results along the way
736
Dataset
I Good-sounds dataset [Romani Picas et al 2015] consisting ofindividual note and scale recordings for 12 different instruments
I We work with the lsquoViolinrsquo played in mezzo-forte loudnessand choose the 4th octave(MIDI 60-71)
0 5 10 15 20 25 30 35
606162636465666768697071
282626262626
343030
262626
Instances
MID
I
Figure Instances per note in the overall dataset
I Average note duration for the chosen octave is about 45s pernote
736
Dataset
I Good-sounds dataset [Romani Picas et al 2015] consisting ofindividual note and scale recordings for 12 different instruments
I We work with the lsquoViolinrsquo played in mezzo-forte loudnessand choose the 4th octave(MIDI 60-71)
Why we chose ViolinPopular in Indian Music Human voice-like timbre
Ability to produce continuous pitch
0 5 10 15 20 25 30 35
606162636465666768697071
282626262626
343030
262626
Instances
MID
I
Figure Instances per note in the overall dataset
I Average note duration for the chosen octave is about 45s pernote
836
Dataset
I We split the data to train(80) and test(20) instances acrossMIDI note labels
I Our model is trained with frames(duration 213ms) from thetrain instances and system performance is evaluated withframes from the test instances
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
1036
Non-Parametric Reconstruction
Figure Input MIDI 63 1 1
Figure Including MIDI 63 2 2 Figure Excluding MIDI 63 3 3
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton1)ocgs[i]state=false
1728
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton3)ocgs[i]state=false
1728
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton5)ocgs[i]state=false
1728
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
Parametric Model
2. log-dB magnitudes + harmonics → TAE algorithm [Roebel and Rodet, 2005; IMAI, 1979].
- TAE ⇒ iterative cepstral liftering:

  A_0(k) = log(|X(k)|), V_0 = −∞
  while (A_i − A_0 < ∆):
      A_i(k) = max(A_{i−1}(k), V_{i−1}(k))
      V_i = FFT(lifter(A_i))

[Diagram: f0, magnitudes → TAE → CCs, f0]
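A minimal NumPy sketch of the iterative cepstral liftering above. The stopping rule (all spectral peaks covered within ∆) and the iteration cap are our reading of [Roebel and Rodet, 2005], not the talk's exact code.

import numpy as np

def true_amplitude_envelope(log_mag, k_cc, delta=0.1, max_iter=200):
    # log_mag: full-length (N) log-magnitude spectrum of one frame,
    # symmetric as obtained from log(abs(fft(frame))). Requires 2*k_cc < N.
    A = np.asarray(log_mag, dtype=float).copy()   # A_0(k) = log|X(k)|
    N = len(A)
    V = np.full(N, -np.inf)                       # V_0 = -infinity
    for _ in range(max_iter):
        A = np.maximum(A, V)        # A_i(k) = max(A_{i-1}(k), V_{i-1}(k))
        cep = np.fft.ifft(A).real   # real cepstrum of the current spectrum
        cep[k_cc:N - k_cc] = 0.0    # lifter: zero out the high quefrencies
        V = np.fft.fft(cep).real    # V_i = FFT(lifter(A_i))
        if np.max(A - V) < delta:   # envelope covers all peaks within delta
            break
    return V, cep[:k_cc]            # envelope + cepstral coefficients (CCs)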
Parametric Model
- No open-source implementation is available for the TAE, so we implemented it following the procedure highlighted in [Roebel and Rodet, 2005; Caetano and Rodet, 2012].
- The figure shows a TAE snapshot from [Caetano and Rodet, 2012], similar to the results we get (audio examples 4, 5).
Parametric Model

Figure: TAE spectral envelopes, magnitude (dB, −80 to −20) vs. frequency (0-5 kHz), for MIDI 60, 65 and 71.

- Spectral envelope shape varies across pitch:
  1. Dependence of the envelope on pitch [Slawson, 1981; Caetano and Rodet, 2012]
  2. Variation due to the TAE algorithm
- TAE → a smooth function to estimate the harmonic amplitudes.
Parametric Model
- The number of CCs (Cepstral Coefficients, K_cc) depends on the sampling rate (F_s) and the pitch (f_0):

  K_cc ≤ F_s / (2 f_0)

- For our network, we choose the maximum K_cc (lowest pitch) = 91 and zero-pad the higher-pitch CC vectors to 91 dimensions.
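A small sketch of this bound and the zero-padding. F_s = 48 kHz is an assumption, chosen because it is consistent with K_cc = 91 at the lowest pitch (MIDI 60, f_0 ≈ 261.6 Hz).

import numpy as np

def midi_to_hz(m):
    return 440.0 * 2 ** ((m - 69) / 12)

def k_cc(fs, f0):
    return int(fs / (2 * f0))            # K_cc <= Fs / (2 f0)

k_max = k_cc(48000, midi_to_hz(60))      # lowest pitch -> 91 coefficients

def pad_ccs(ccs, k_max=91):
    out = np.zeros(k_max)                # zero-pad higher-pitch CC vectors
    out[:len(ccs)] = ccs
    return out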
Generative Models
- Autoencoders [Hinton and Salakhutdinov, 2006]: minimize the MSE (Mean Squared Error) between the input and the network reconstruction.
  • Simple to train, and perform good reconstruction (since they minimize MSE)
  • Not truly generative, as you cannot generate new data
- Variational Autoencoders [Kingma and Welling, 2013]: inspired by Variational Inference, enforce a prior on the latent space.
  • Truly generative, as we can 'generate' new data by sampling from the prior
- Conditional Variational Autoencoders [Doersch, 2016; Sohn et al., 2015]: same principle as a VAE, but learn the distribution conditioned on an additional conditioning variable.
Generative Models
- Why VAE over AE? A continuous latent space from which we can sample points (and synthesize the corresponding audio).
- Why CVAE over VAE? Conditioning on pitch ⇒ the network captures dependencies between timbre and pitch ⇒ more accurate envelope generation + pitch control.
Network Architecture

[Diagram: CCs + f0 → Encoder → Z → Decoder (CVAE)]

- The network input is CCs → the MSE represents a perceptually relevant distance, in terms of squared error between the input and reconstructed log-magnitude spectral envelopes.
- We train the network on frames from the train instances.
- For evaluation, the MSE values reported ahead are the average reconstruction error across all test-instance frames.
Network Architecture
- Main hyperparameters:
  1. β: controls the relative weighting between reconstruction and prior enforcement,

     L ∝ MSE + β · KLD

Figure: CVAE, varying β — MSE (·10⁻⁵) vs. latent space dimensionality (0-70), for β = 0.01, 0.1, 1.

- Tradeoff between both terms; we choose β = 0.1.
Network Architecture
- Main hyperparameters:
  2. Dimensionality of the latent space: governs the network's reconstruction ability.

Figure: CVAE (β = 0.1) vs. AE — MSE (·10⁻⁵) vs. latent space dimensionality (0-70).

- Steep fall initially, flatter later; we choose dimensionality = 32.
Network Architecture
- Network size: [91, 91, 32, 91, 91]; 91 is the dimension of the input CCs, 32 is the latent space dimensionality.
- Linear fully connected layers + Leaky ReLU activations.
- ADAM [Kingma and Ba, 2014] with initial learning rate 10⁻³.
- Training for 2000 epochs with batch size 512.
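A compact PyTorch sketch consistent with the sizes and loss above ([91, 91, 32, 91, 91], Leaky ReLU, ADAM at 10⁻³, L ∝ MSE + β·KLD with β = 0.1). The concatenation-style pitch conditioning and the scalar f0 input are standard-CVAE assumptions, not confirmed details of the talk's code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CVAE(nn.Module):
    def __init__(self, x_dim=91, cond_dim=1, hidden=91, z_dim=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim + cond_dim, hidden), nn.LeakyReLU())
        self.mu, self.logvar = nn.Linear(hidden, z_dim), nn.Linear(hidden, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim + cond_dim, hidden), nn.LeakyReLU(),
                                 nn.Linear(hidden, x_dim))

    def forward(self, x, f0):
        h = self.enc(torch.cat([x, f0], dim=-1))         # condition the encoder
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterize
        return self.dec(torch.cat([z, f0], dim=-1)), mu, logvar

def cvae_loss(x, x_hat, mu, logvar, beta=0.1):
    mse = F.mse_loss(x_hat, x)                           # reconstruction term
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return mse + beta * kld                              # L = MSE + beta * KLD

model = CVAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)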
Experiments
- Two kinds of experiments to demonstrate the network's capabilities:
  1. Reconstruction: omit instances of a pitch during training and see how well the model reconstructs notes of the omitted target pitch.
  2. Generation: how well the model 'synthesizes' note instances with new, unseen pitches.
Reconstruction
- Two training contexts:
  1. T (×) is the target; ✓ marks kept training instances:

     MIDI: T−3  T−2  T−1   T   T+1  T+2  T+3
     Kept:  ✓    ✓    ✓    ×    ✓    ✓    ✓

  2. Octave endpoints:

     MIDI: 60  61  62  63  64  65  66  67  68  69  70  71
     Kept:  ✓   ×   ×   ×   ×   ×   ×   ×   ×   ×   ×   ✓

- In each of the above cases, we compute the MSE as the frame-wise spectral envelope match across all frames of all the target instances.
Reconstruction

Figure: MSE (·10⁻²) vs. target pitch (MIDI 60-72) for AE and CVAE.

- Both AE and CVAE reasonably reconstruct the target pitch (audio examples 6, 7, 8).
- The CVAE produces better reconstruction, especially when the target pitch is far from the pitches available in the training data (plot above; audio examples 9, 10, 11).
- Conditioning helps to capture the pitch dependency of the spectral envelope more accurately.
Reconstruction
- To emulate the effect of pitch conditioning with an AE, we train the AE by appending the pitch to the input CCs and reconstructing this appended input, as sketched below.

[Diagram: (f0, CCs) → cAE → (f0′, CCs′). Figure: 'Conditional' AE (cAE)]

- [Wyse, 2018] followed a similar approach of appending the conditional variables to the input of his model.
- The cAE is comparable to our proposed CVAE in that the network might potentially learn something from f0.
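A hypothetical sketch of this cAE baseline, assuming plain concatenation and the same layer sizes as the CVAE; the dummy batch is only for illustration.

import torch
import torch.nn as nn

# 'Conditional' AE: f0 is appended to the 91 CCs and the whole 92-dim vector
# is reconstructed; nothing forces the bottleneck to exploit the pitch.
cae = nn.Sequential(
    nn.Linear(92, 91), nn.LeakyReLU(),
    nn.Linear(91, 32), nn.LeakyReLU(),   # latent bottleneck
    nn.Linear(32, 91), nn.LeakyReLU(),
    nn.Linear(91, 92))

ccs, f0 = torch.randn(8, 91), torch.rand(8, 1)  # dummy batch: CCs + normalized f0
x = torch.cat([ccs, f0], dim=-1)
x_hat = cae(x)                                  # reconstructs [CCs', f0']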
Reconstruction

Figure: MSE (·10⁻²) vs. target pitch (MIDI 60-72) for AE, CVAE and cAE.

- We train the cAE on the octave endpoints and evaluate performance as shown above.
- Appending f0 does not improve performance; it is in fact worse than the plain AE.
Generation
- We are interested in 'synthesizing' audio.
- We see how well the network can generate an instance of a desired pitch (one the network has not been trained on).

[Diagram: Z → Decoder, conditioned on f0. Figure: Sampling from the network]

- We follow the procedure in [Blaauw and Bonada, 2016]: a random walk to sample points coherently from the latent space.
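A sketch of such coherent sampling, reusing the CVAE decoder sketched earlier; the step size and the assumed-normalized pitch input are illustrative choices rather than the talk's exact settings.

import torch

@torch.no_grad()
def sample_note(model, f0_value, n_frames=300, step=0.05, z_dim=32):
    z = torch.randn(z_dim)                     # start from the prior
    f0 = torch.tensor([f0_value])              # assumed-normalized pitch input
    frames = []
    for _ in range(n_frames):
        z = z + step * torch.randn(z_dim)      # small Gaussian random-walk steps
        frames.append(model.dec(torch.cat([z, f0])))
    return torch.stack(frames)                 # frame-wise CCs for synthesis

Small steps keep consecutive latent points close, so the decoded envelopes evolve smoothly instead of jumping frame to frame.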
Generation
- We train on instances across the entire octave sans MIDI 65, and then generate MIDI 65.
- On listening, we find that the generated audio still lacks the soft noisy sound of the violin bowing (audio examples 12, 13, 14).
- For a more realistic synthesis, we generate a violin note with vibrato (audio example 15), using a frame-wise pitch contour as sketched below.
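One way to realize the vibrato is a sinusoidally modulated f0 contour, one value per analysis frame; the vibrato rate, depth and frame rate below are illustrative assumptions.

import numpy as np

frame_rate = 48000 / 256                       # frames per second (hop assumed)
t = np.arange(300) / frame_rate
# MIDI 65 (~349.23 Hz) with a ~6 Hz, 1% (~17 cent) vibrato:
f0_hz = 349.23 * (1 + 0.01 * np.sin(2 * np.pi * 6.0 * t))

Each f0_hz value conditions one decoded envelope frame, and the harmonics are then placed at multiples of that frame's f0.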
Putting it all together
- We explored autoencoder frameworks as generative models for audio synthesis of instrumental tones.
- Our parametric representation decouples 'timbre' and 'pitch', relying on the network to model their inter-dependencies.
- Pitch conditioning allows us to generate the learnt spectral envelope for a given pitch, thus enabling us to vary the pitch contour continuously.
Putting it all together
- Our model is still far from perfect, however, and we have to improve it further:
  1. An immediate step is to model the residual of the input audio; this will help make the generated audio sound more natural.
  2. We have not taken dynamics into account. A more complete system will capture spectral envelope (timbral) dependencies on both pitch and loudness dynamics.
  3. Conducting more formal listening tests, involving the synthesis of larger pitch movements or melodic elements from Indian Classical music.
- Our contributions:
  1. To the best of our knowledge, we have not come across any work using a parametric model for musical tones in the neural synthesis framework, especially one exploiting the conditioning function of the CVAE.
  2. The TAE method has no known Python implementation, so we plan to make our code open-source to the MIR community for research.
References I

[Blaauw and Bonada, 2016] Blaauw, M. and Bonada, J. (2016). Modeling and transforming speech using variational autoencoders. In Interspeech, pages 1770–1774.

[Caetano and Rodet, 2012] Caetano, M. and Rodet, X. (2012). A source-filter model for musical instrument sound transformation. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 137–140. IEEE.

[Doersch, 2016] Doersch, C. (2016). Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908.

[Engel et al., 2017] Engel, J., Resnick, C., Roberts, A., Dieleman, S., Norouzi, M., Eck, D., and Simonyan, K. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1068–1077. JMLR.org.

[Esling et al., 2018] Esling, P., Bitton, A., et al. (2018). Generative timbre spaces: regularizing variational auto-encoders with perceptual metrics. arXiv preprint arXiv:1805.08501.
References II

[Griffin and Lim, 1984] Griffin, D. and Lim, J. (1984). Signal estimation from modified short-time Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2):236–243.

[Hinton and Salakhutdinov, 2006] Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507.

[Imai, 1979] Imai, S. (1979). Spectral envelope extraction by improved cepstrum. IEICE, 62:217–228.

[Kingma and Ba, 2014] Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

[Kingma and Welling, 2013] Kingma, D. P. and Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.

[Oord et al., 2016] Oord, A. v. d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., and Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.
References III

[Roche et al., 2018] Roche, F., Hueber, T., Limier, S., and Girin, L. (2018). Autoencoders for music sound modeling: a comparison of linear, shallow, deep, recurrent and variational models. arXiv preprint arXiv:1806.04096.

[Roebel and Rodet, 2005] Roebel, A. and Rodet, X. (2005). Efficient spectral envelope estimation and its application to pitch shifting and envelope preservation. In International Conference on Digital Audio Effects, pages 30–35, Madrid, Spain.

[Romani Picas et al., 2015] Romani Picas, O., Parra Rodriguez, H., Dabiri, D., Tokuda, H., Hariya, W., Oishi, K., and Serra, X. (2015). A real-time system for measuring sound goodness in instrumental sounds. In Audio Engineering Society Convention 138. Audio Engineering Society.

[Sarroff and Casey, 2014] Sarroff, A. M. and Casey, M. A. (2014). Musical audio synthesis using autoencoding neural nets. In ICMC.

[Serra et al., 1997] Serra, X. et al. (1997). Musical sound modeling with sinusoids plus noise. Musical Signal Processing, pages 91–122.
References IV

[Slawson, 1981] Slawson, W. (1981). The color of sound: a theoretical study in musical timbre. Music Theory Spectrum, 3:132–141.

[Sohn et al., 2015] Sohn, K., Lee, H., and Yan, X. (2015). Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems, pages 3483–3491.

[Wyse, 2018] Wyse, L. (2018). Real-valued parametric conditioning of an RNN for interactive sound synthesis. arXiv preprint arXiv:1805.10808.
Audio examples description I

1. Input MIDI 63 to the spectral model
2. Spectral model reconstruction (trained on MIDI 63)
3. Spectral model reconstruction (not trained on MIDI 63)
4. Input MIDI 60 note to the parametric model
5. Parametric reconstruction of the input note
6. Input MIDI 63 note
7. Parametric AE reconstruction of the input
8. Parametric CVAE reconstruction of the input
9. Input MIDI 65 note (endpoint-trained model)
10. Parametric AE reconstruction of the input (endpoint-trained model)
11. Parametric CVAE reconstruction of the input (endpoint-trained model)
12. CVAE-generated MIDI 65 violin note
13. Similar MIDI 65 violin note from the dataset
Audio examples description II

14. CVAE reconstruction of the MIDI 65 violin note
15. CVAE-generated MIDI 65 violin note with vibrato
3336
References III
[Roche et al 2018] Roche F Hueber T Limier S and Girin L (2018)Autoencoders for music sound modeling a comparison of linear shallow deeprecurrent and variational modelsarXiv preprint arXiv180604096
[Roebel and Rodet 2005] Roebel A and Rodet X (2005)Efficient Spectral Envelope Estimation and its application to pitch shifting andenvelope preservationIn International Conference on Digital Audio Effects pages 30ndash35 Madrid Spaincote interne IRCAM Roebel05b
[Romani Picas et al 2015] Romani Picas O Parra Rodriguez H Dabiri DTokuda H Hariya W Oishi K and Serra X (2015)A real-time system for measuring sound goodness in instrumental soundsIn Audio Engineering Society Convention 138 Audio Engineering Society
[Sarroff and Casey 2014] Sarroff A M and Casey M A (2014)Musical audio synthesis using autoencoding neural netsIn ICMC
[Serra et al 1997] Serra X et al (1997)Musical sound modeling with sinusoids plus noiseMusical signal processing pages 91ndash122
3436
References IV
[Slawson 1981] Slawson W (1981)The color of sound a theoretical study in musical timbreMusic Theory Spectrum 3132ndash141
[Sohn et al 2015] Sohn K Lee H and Yan X (2015)Learning structured output representation using deep conditional generative models
In Advances in neural information processing systems pages 3483ndash3491
[Wyse 2018] Wyse L (2018)Real-valued parametric conditioning of an rnn for interactive sound synthesisarXiv preprint arXiv180510808
3536
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model Reconstruction(trained on MIDI63)
3 Spectral Model Reconstruction(not trained on MIDI63)
4 Input MIDI 60 note to Parametric Model
5 Parametric Reconstruction of input note
6 Input MIDI 63 Note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note(endpoint trained model)
10 Parametric AE reconstruction of input(endpoint trained model)
11 Parametric CVAE reconstruction of input(endpoint trained model)
12 CVAE Generated MIDI 65 Violin note
13 Similar MIDI 65 Violin note from dataset
3636
Audio examples description II
14 CVAE Reconstruction of the MIDI 65 violin note
15 CVAE Generated MIDI 65 Violin note with vibrato
Our Nearest Neighbours

▶ A major issue with the previous methods was their frame-wise analysis–synthesis based reconstruction, which fails to model the temporal evolution of the sound.
▶ [Engel et al., 2017], inspired by WaveNet's [Oord et al., 2016] autoregressive modeling capabilities for speech, extended it to musical instrument synthesis.
  ✓ Generation of realistic and creative sounds
  ✗ Complex network that is difficult to train and sample from
▶ [Wyse, 2018] also modelled the audio autoregressively, albeit conditioning the waveform samples on additional parameters like pitch, velocity (loudness) and instrument class.
  ✓ Achieved better control over generation through conditioning
  ✗ Unable to generalize to untrained pitches
Why Parametric?

▶ Rather than generating new timbres ('interpolating' across sounds), we consider the problem of synthesizing a given instrument's sound with flexible control over the pitch and loudness dynamics.
▶ Pitch shifting without timbre modification uses a source–filter model, with the filter (spectral envelope) kept constant [Roebel and Rodet, 2005].
▶ A parametric representation, over a raw waveform or spectrogram, has the potential to achieve high quality with less training data.
  1. [Blaauw and Bonada, 2016] recognized this in the context of speech synthesis and used a vocoder representation to train a generative model, achieving promising results.
Dataset

▶ Good-sounds dataset [Romani Picas et al., 2015], consisting of individual note and scale recordings for 12 different instruments.
▶ We work with the 'Violin', played at mezzo-forte loudness, and choose the 4th octave (MIDI 60–71).

Why we chose the Violin: it is popular in Indian music, has a human voice-like timbre, and can produce continuous pitch.

Figure: Instances per note in the overall dataset (26–34 instances per MIDI note across 60–71).

▶ The average note duration for the chosen octave is about 4.5 s per note.
Dataset

▶ We split the data into train (80%) and test (20%) instances across MIDI note labels; a sketch of such a split follows below.
▶ Our model is trained with frames (duration 21.3 ms) from the train instances, and system performance is evaluated with frames from the test instances.
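A minimal sketch of the 80/20 instance split, assuming a list of (path, MIDI label) pairs built by scanning the dataset; the paths and the use of scikit-learn are illustrative assumptions, not details from the slides.

```python
from sklearn.model_selection import train_test_split

# `instances` is assumed to be a list of (path, midi_label) pairs built by
# scanning the dataset; the entries here are hypothetical placeholders.
instances = [("violin_60_01.wav", 60), ("violin_61_01.wav", 61)] * 10

train_files, test_files = train_test_split(
    instances,
    test_size=0.2,                        # 20% of instances held out
    stratify=[m for _, m in instances],   # keep the per-MIDI-label balance
    random_state=0,
)
```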
Non-Parametric Reconstruction

▶ The frame-wise magnitude spectra reconstruction procedure in [Roche et al., 2018] cannot generalize to untrained pitches.
▶ We consider two cases: training including and excluding MIDI 63, along with its 3 neighbouring pitches on either side.

Sustain → FFT → Encoder → Z → Decoder (AE)

▶ Repeat the above across all input spectral frames, and invert the obtained spectrogram using Griffin–Lim [Griffin and Lim, 1984]; a sketch of this inversion step follows below.
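A rough sketch of the analysis–synthesis loop just described, assuming librosa is available; the file name, STFT parameters, and the `autoencoder` wrapper are illustrative placeholders, not details given in the slides.

```python
import numpy as np
import librosa

y, sr = librosa.load("violin_note.wav", sr=None)          # hypothetical file
S = np.abs(librosa.stft(y, n_fft=1024, hop_length=256))   # magnitude spectra

# Reconstruct each spectral frame with the (hypothetical) trained autoencoder:
S_hat = np.stack([autoencoder.reconstruct(frame) for frame in S.T], axis=1)

# Invert the phaseless magnitude spectrogram with Griffin-Lim:
y_hat = librosa.griffinlim(S_hat, n_iter=32, hop_length=256, n_fft=1024)
```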
Non-Parametric Reconstruction

Figure: Input MIDI 63 (audio example 1). Figure: Reconstruction, trained including MIDI 63 (audio example 2). Figure: Reconstruction, trained excluding MIDI 63 (audio example 3).
Parametric Model

1. Frame-wise magnitude spectrum → harmonic representation using the Harmonic plus Residual (HpR) model [Serra et al., 1997] (currently we neglect the residual).

Sustain → FFT → HpR → f0, Magnitudes

▶ Output of the HpR block ⇒ log-dB magnitudes + harmonics (see the sketch below)
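A minimal sketch of reading off the harmonic representation, assuming the frame's f0 is already known (e.g. from a pitch tracker). This simply samples the magnitude spectrum at harmonic bins and is a simplification of the full HpR analysis of [Serra et al., 1997]; all names are illustrative.

```python
import numpy as np

def harmonic_magnitudes(mag_frame, f0, sr, n_fft, n_harmonics):
    """Return log-dB magnitudes at integer multiples of f0 (one frame)."""
    freqs = f0 * np.arange(1, n_harmonics + 1)
    freqs = freqs[freqs < sr / 2]                  # keep harmonics below Nyquist
    bins = np.round(freqs * n_fft / sr).astype(int)
    return 20 * np.log10(mag_frame[bins] + 1e-12)  # log-dB amplitudes
```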
Parametric Model

2. log-dB magnitudes + harmonics → TAE algorithm [Roebel and Rodet, 2005, Imai, 1979]
▶ TAE ⇒ iterative cepstral liftering:

A₀(k) = log(|X(k)|), V₀ = −∞
while (Aᵢ − A₀ < ∆):
    Aᵢ(k) = max(Aᵢ₋₁(k), Vᵢ₋₁(k))
    Vᵢ = FFT(lifter(Aᵢ))

f0, Magnitudes → TAE → CCs, f0
Parametric Model

▶ No open-source implementation is available for the TAE, so we implemented it following the procedure highlighted in [Roebel and Rodet, 2005, Caetano and Rodet, 2012]; a sketch follows below.
▶ The figure shows a TAE snapshot from [Caetano and Rodet, 2012], similar to the results we get (audio examples 4, 5).
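A minimal sketch of the True Amplitude Envelope iteration shown above, operating on one frame's log-magnitude spectrum sampled on a uniform grid; the stopping threshold and iteration cap are illustrative values, not taken from the slides.

```python
import numpy as np

def true_amplitude_envelope(A0, kcc, max_iter=100, delta=2.0):
    """Iterative cepstral liftering: A0 is the log-magnitude spectrum of one
    frame, kcc the cepstral order (lifter cutoff)."""
    A = A0.copy()
    V = np.full_like(A0, -np.inf)
    for _ in range(max_iter):
        A = np.maximum(A, V)             # A_i(k) = max(A_{i-1}(k), V_{i-1}(k))
        c = np.fft.rfft(A)               # quefrency coefficients of A_i
        c[kcc + 1:] = 0.0                # lifter: keep the first kcc coefficients
        V = np.fft.irfft(c, n=len(A))    # smooth envelope estimate V_i
        if np.max(A0 - V) < delta:       # stop once V lies close above A0
            break
    return V
```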
Parametric Model

Figure: TAE spectral envelopes (magnitude in dB vs frequency in kHz) for MIDI 60, 65 and 71.

▶ The spectral envelope shape varies across pitch:
  1. Dependence of the envelope on pitch [Slawson, 1981, Caetano and Rodet, 2012]
  2. Variation due to the TAE algorithm
▶ TAE → a smooth function to estimate the harmonic amplitudes
Parametric Model

▶ The number of CCs (Cepstral Coefficients, Kcc) depends on the sampling rate (Fs) and the pitch (f0):

Kcc ≤ Fs / (2·f0)

▶ For our network, we choose the maximum Kcc (lowest pitch) = 91, and zero-pad the higher-pitch CC vectors to 91 dimensions, as sketched below.
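A small sketch of this dimensionality handling, assuming a 48 kHz sampling rate (an assumption; the slides do not state Fs) and the 4th octave; the padding length 91 is taken from the slides.

```python
import numpy as np

FS = 48000     # assumed sampling rate
K_MAX = 91     # Kcc for the lowest pitch considered (per the slides)

def midi_to_hz(m):
    return 440.0 * 2 ** ((m - 69) / 12)

def pad_ccs(ccs):
    """Zero-pad a CC vector of length Kcc <= 91 to a fixed 91-dim input."""
    return np.pad(ccs, (0, K_MAX - len(ccs)))

k_cc = int(FS / (2 * midi_to_hz(60)))  # Kcc <= Fs/(2*f0); gives 91 for MIDI 60
```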
Generative Models

▶ Autoencoders [Hinton and Salakhutdinov, 2006]: minimize the MSE (Mean Squared Error) between the input and the network reconstruction.
  • Simple to train, and perform good reconstruction (since they minimize the MSE)
  • Not truly generative models, as you cannot generate new data
▶ Variational Autoencoders [Kingma and Welling, 2013]: inspired by Variational Inference, they enforce a prior on the latent space.
  • Truly generative, as we can 'generate' new data by sampling from the prior
▶ Conditional Variational Autoencoders [Doersch, 2016, Sohn et al., 2015]: same principle as a VAE, but learn the distribution conditioned on an additional conditioning variable.
Generative Models

Why VAE over AE?
A continuous latent space from which we can sample points (and synthesize the corresponding audio).

Why CVAE over VAE?
Conditioning on pitch ⇒ the network captures dependencies between the timbre and the pitch ⇒ more accurate envelope generation + pitch control.
Network Architecture

CCs, f0 → Encoder → Z → Decoder (CVAE)

▶ The network input is CCs → the MSE represents a perceptually relevant distance: the squared error between the input and reconstructed log-magnitude spectral envelopes.
▶ We train the network on frames from the train instances.
▶ For evaluation, the MSE values reported ahead are the average reconstruction error across all the test instance frames.
Network Architecture

▶ Main hyperparameters:
  1. β: controls the relative weighting between reconstruction and prior enforcement,

L ∝ MSE + β·KLD

Figure: CVAE MSE (·10⁻⁵) vs latent space dimensionality, varying β ∈ {0.01, 0.1, 1}.

▶ There is a tradeoff between both terms; we choose β = 0.1. (A sketch of this loss follows below.)
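A minimal sketch of the β-weighted loss above, written in PyTorch (an assumption; the slides do not name the framework), with a standard-normal prior on the latent space.

```python
import torch
import torch.nn.functional as F

def cvae_loss(x_hat, x, mu, logvar, beta=0.1):
    """L ∝ MSE + beta * KLD, with a N(0, I) prior on the latent space."""
    mse = F.mse_loss(x_hat, x, reduction="mean")
    # KL divergence between N(mu, sigma^2) and N(0, I):
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return mse + beta * kld
```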
Network Architecture

▶ Main hyperparameters:
  2. Dimensionality of the latent space: governs the network's reconstruction ability.

Figure: MSE (·10⁻⁵) vs latent space dimensionality, CVAE (β = 0.1) vs AE.

▶ Steep fall initially, flatter later; we choose dimensionality = 32.
Network Architecture

▶ Network size: [91, 91, 32, 91, 91]
  ▶ 91 is the dimension of the input CCs; 32 is the latent space dimensionality.
▶ Linear fully connected layers + Leaky ReLU activations
▶ ADAM [Kingma and Ba, 2014] with initial lr = 10⁻³
▶ Training for 2000 epochs with batch size 512 (see the sketch below)
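A minimal sketch of such a CVAE ([91, 91, 32, 91, 91], linear layers + Leaky ReLU, pitch conditioning), again in PyTorch (an assumption). The one-hot pitch encoding of width 12 is illustrative; the slides do not specify how f0 is encoded.

```python
import torch
import torch.nn as nn

class CVAE(nn.Module):
    def __init__(self, x_dim=91, z_dim=32, c_dim=12):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim + c_dim, 91), nn.LeakyReLU())
        self.mu, self.logvar = nn.Linear(91, z_dim), nn.Linear(91, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim + c_dim, 91), nn.LeakyReLU(),
                                 nn.Linear(91, x_dim))

    def forward(self, x, c):
        h = self.enc(torch.cat([x, c], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return self.dec(torch.cat([z, c], dim=-1)), mu, logvar

model = CVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)  # ADAM, initial lr = 1e-3
```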
Experiments

▶ Two kinds of experiments demonstrate the network's capabilities:
  1. Reconstruction: omit pitch instances during training, and see how well the model reconstructs notes of the omitted target pitch.
  2. Generation: how well the model 'synthesizes' note instances with new, unseen pitches.
Reconstruction

▶ Two training contexts:
  1. T (✗) is the target; ✓ marks training instances:

     MIDI: T−3 T−2 T−1  T  T+1 T+2 T+3
     Kept:  ✓   ✓   ✓   ✗   ✓   ✓   ✓

  2. Octave endpoints:

     MIDI: 60 61 62 63 64 65 66 67 68 69 70 71
     Kept:  ✓  ✗  ✗  ✗  ✗  ✗  ✗  ✗  ✗  ✗  ✗  ✓

▶ In each of the above cases, we compute the MSE as the frame-wise spectral envelope match across all frames of all the target instances.
Reconstruction

Figure: MSE (·10⁻²) vs target pitch (MIDI 60–72), AE vs CVAE.

▶ Both the AE and the CVAE reasonably reconstruct the target pitch (audio examples 6, 7, 8).
▶ The CVAE produces better reconstruction, especially when the target pitch is far from the pitches available in the training data (plot above; audio examples 9, 10, 11).
▶ Conditioning helps to capture the pitch dependency of the spectral envelope more accurately.
Reconstruction

▶ To emulate the effect of pitch conditioning with an AE, we train the AE by appending the pitch to the input CCs and reconstructing this appended input:

f0, CCs → cAE → f0′, CCs′
Figure: 'Conditional' AE (cAE)

▶ [Wyse, 2018] followed a similar approach of appending the conditional variables to the input of his model.
▶ The cAE is comparable to our proposed CVAE, in that the network might potentially learn something from f0.
Reconstruction

Figure: MSE (·10⁻²) vs pitch (MIDI 60–72), AE vs CVAE vs cAE.

▶ We train the cAE on the octave endpoints and evaluate performance as shown above.
▶ Appending f0 does not improve performance; it is in fact worse than the plain AE.
Generation

▶ We are interested in 'synthesizing' audio.
▶ We see how well the network can generate an instance of a desired pitch (one the network has not been trained on).

Z, f0 → Decoder
Figure: Sampling from the network

▶ We follow the procedure in [Blaauw and Bonada, 2016]: a random walk to sample points coherently from the latent space, as sketched below.
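A minimal sketch of this coherent latent sampling, reusing the hypothetical CVAE sketch above; the step size, walk length, and conditioning vector `c` (a one-hot pitch encoding) are illustrative assumptions.

```python
import torch

def sample_random_walk(model, c, n_frames=200, z_dim=32, step=0.05):
    """Random-walk through the latent space, decoding one envelope per frame."""
    z = torch.randn(z_dim)                     # start from the prior
    frames = []
    with torch.no_grad():
        for _ in range(n_frames):
            z = z + step * torch.randn(z_dim)  # small coherent step
            x = model.dec(torch.cat([z, c]))   # decode CCs for this frame
            frames.append(x)
    return torch.stack(frames)
```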
Generation

▶ We train on instances across the entire octave sans MIDI 65, and then generate MIDI 65.
▶ On listening, the generated audio still lacks the soft, noisy sound of the violin bowing (audio examples 12, 13, 14).
▶ For a more realistic synthesis, we generate a violin note with vibrato (audio example 15); a sketch of a vibrato pitch contour follows below.
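A small sketch of a vibrato pitch contour for the generated note; the 6 Hz rate, 0.5% depth, and frame rate are illustrative values, not taken from the slides.

```python
import numpy as np

def vibrato_f0(f0, duration, frame_rate, rate_hz=6.0, depth=0.005):
    """Per-frame f0 values with sinusoidal vibrato around a centre pitch."""
    t = np.arange(int(duration * frame_rate)) / frame_rate
    return f0 * (1.0 + depth * np.sin(2 * np.pi * rate_hz * t))

f0_contour = vibrato_f0(f0=349.23, duration=2.0, frame_rate=172)  # MIDI 65
```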
Putting it all together

▶ We explored autoencoder frameworks in generative models for the audio synthesis of instrumental tones.
▶ Our parametric representation decouples 'timbre' and 'pitch', relying on the network to model their inter-dependencies.
▶ Pitch conditioning allows us to generate the learnt spectral envelope for a given pitch, enabling us to vary the pitch contour continuously.
Putting it all together

▶ Our model is still far from perfect, however, and we have to improve it further:
  1. An immediate step is to model the residual of the input audio; this will help make the generated audio sound more natural.
  2. We have not taken dynamics into account; a more complete system will capture spectral envelope (timbral) dependencies on both pitch and loudness dynamics.
  3. Conducting more formal listening tests, involving the synthesis of larger pitch movements or melodic elements from Indian Classical music.
▶ Our contributions:
  1. To the best of our knowledge, no prior work uses a parametric model for musical tones in the neural synthesis framework, especially one exploiting the conditioning function of the CVAE.
  2. The TAE method has no known Python implementation, so we plan to make our code open-source to the MIR community for research.
References

[Blaauw and Bonada, 2016] Blaauw, M. and Bonada, J. (2016). Modeling and transforming speech using variational autoencoders. In Interspeech, pages 1770–1774.

[Caetano and Rodet, 2012] Caetano, M. and Rodet, X. (2012). A source-filter model for musical instrument sound transformation. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 137–140. IEEE.

[Doersch, 2016] Doersch, C. (2016). Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908.

[Engel et al., 2017] Engel, J., Resnick, C., Roberts, A., Dieleman, S., Norouzi, M., Eck, D., and Simonyan, K. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1068–1077. JMLR.org.

[Esling et al., 2018] Esling, P., Bitton, A., et al. (2018). Generative timbre spaces: regularizing variational auto-encoders with perceptual metrics. arXiv preprint arXiv:1805.08501.

[Griffin and Lim, 1984] Griffin, D. and Lim, J. (1984). Signal estimation from modified short-time Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2):236–243.

[Hinton and Salakhutdinov, 2006] Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507.

[Imai, 1979] Imai, S. (1979). Spectral envelope extraction by improved cepstrum. IEICE, 62:217–228.

[Kingma and Ba, 2014] Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

[Kingma and Welling, 2013] Kingma, D. P. and Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.

[Oord et al., 2016] Oord, A. v. d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., and Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.

[Roche et al., 2018] Roche, F., Hueber, T., Limier, S., and Girin, L. (2018). Autoencoders for music sound modeling: a comparison of linear, shallow, deep, recurrent and variational models. arXiv preprint arXiv:1806.04096.

[Roebel and Rodet, 2005] Roebel, A. and Rodet, X. (2005). Efficient spectral envelope estimation and its application to pitch shifting and envelope preservation. In International Conference on Digital Audio Effects, pages 30–35, Madrid, Spain.

[Romani Picas et al., 2015] Romani Picas, O., Parra Rodriguez, H., Dabiri, D., Tokuda, H., Hariya, W., Oishi, K., and Serra, X. (2015). A real-time system for measuring sound goodness in instrumental sounds. In Audio Engineering Society Convention 138. Audio Engineering Society.

[Sarroff and Casey, 2014] Sarroff, A. M. and Casey, M. A. (2014). Musical audio synthesis using autoencoding neural nets. In ICMC.

[Serra et al., 1997] Serra, X. et al. (1997). Musical sound modeling with sinusoids plus noise. Musical Signal Processing, pages 91–122.

[Slawson, 1981] Slawson, W. (1981). The color of sound: a theoretical study in musical timbre. Music Theory Spectrum, 3:132–141.

[Sohn et al., 2015] Sohn, K., Lee, H., and Yan, X. (2015). Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems, pages 3483–3491.

[Wyse, 2018] Wyse, L. (2018). Real-valued parametric conditioning of an RNN for interactive sound synthesis. arXiv preprint arXiv:1805.10808.
Audio examples description

1. Input MIDI 63 to Spectral Model
2. Spectral Model reconstruction (trained on MIDI 63)
3. Spectral Model reconstruction (not trained on MIDI 63)
4. Input MIDI 60 note to Parametric Model
5. Parametric reconstruction of input note
6. Input MIDI 63 note
7. Parametric AE reconstruction of input
8. Parametric CVAE reconstruction of input
9. Input MIDI 65 note (endpoint-trained model)
10. Parametric AE reconstruction of input (endpoint-trained model)
11. Parametric CVAE reconstruction of input (endpoint-trained model)
12. CVAE-generated MIDI 65 violin note
13. Similar MIDI 65 violin note from dataset
14. CVAE reconstruction of the MIDI 65 violin note
15. CVAE-generated MIDI 65 violin note with vibrato
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
636
Why Parametric
I Rather than generating new timbres(lsquointerpolatingrsquo acrosssounds) we consider the problem of synthesis of a giveninstrument sound with flexible control over the pitch andloudness dynamics
I Pitch shifting without timbre modification uses a source-filtermodel with the filter(spectral envelope) being kept constant[Roebel and Rodet 2005]
I A powerful parametric representation over raw waveform orspectrogram has the potential to achieve high quality with lesstraining data
1 [Blaauw and Bonada 2016] recognized this in context ofspeech synthesis and used a vocoder representation to train agenerative model achieving promising results along the way
636
Why Parametric
I Rather than generating new timbres(lsquointerpolatingrsquo acrosssounds) we consider the problem of synthesis of a giveninstrument sound with flexible control over the pitch andloudness dynamics
I Pitch shifting without timbre modification uses a source-filtermodel with the filter(spectral envelope) being kept constant[Roebel and Rodet 2005]
I A powerful parametric representation over raw waveform orspectrogram has the potential to achieve high quality with lesstraining data
1 [Blaauw and Bonada 2016] recognized this in context ofspeech synthesis and used a vocoder representation to train agenerative model achieving promising results along the way
736
Dataset
I Good-sounds dataset [Romani Picas et al 2015] consisting ofindividual note and scale recordings for 12 different instruments
I We work with the lsquoViolinrsquo played in mezzo-forte loudnessand choose the 4th octave(MIDI 60-71)
0 5 10 15 20 25 30 35
606162636465666768697071
282626262626
343030
262626
Instances
MID
I
Figure Instances per note in the overall dataset
I Average note duration for the chosen octave is about 45s pernote
736
Dataset
I Good-sounds dataset [Romani Picas et al 2015] consisting ofindividual note and scale recordings for 12 different instruments
I We work with the lsquoViolinrsquo played in mezzo-forte loudnessand choose the 4th octave(MIDI 60-71)
Why we chose ViolinPopular in Indian Music Human voice-like timbre
Ability to produce continuous pitch
0 5 10 15 20 25 30 35
606162636465666768697071
282626262626
343030
262626
Instances
MID
I
Figure Instances per note in the overall dataset
I Average note duration for the chosen octave is about 45s pernote
836
Dataset
I We split the data to train(80) and test(20) instances acrossMIDI note labels
I Our model is trained with frames(duration 213ms) from thetrain instances and system performance is evaluated withframes from the test instances
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
1036
Non-Parametric Reconstruction
Figure Input MIDI 63 1 1
Figure Including MIDI 63 2 2 Figure Excluding MIDI 63 3 3
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton1)ocgs[i]state=false
1728
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton3)ocgs[i]state=false
1728
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton5)ocgs[i]state=false
1728
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1336
Parametric Model
I No open source implementation available for the TAE thus weimplemented it following procedure highlighted in[Roebel and Rodet 2005 Caetano and Rodet 2012]
I Figure shows a TAE snap from [Caetano and Rodet 2012]
I Similar to results we get 1 4 2 5
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton6)ocgs[i]state=false
3216
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton7)ocgs[i]state=false
2256
1436
Parametric Model
0 1 2 3 4 5minus80
minus60
minus40
minus20
Frequency(kHz)Magnitude(dB)
606571
I Spectral envelope shape varies across pitch
1 Dependence of envelope on pitch[Slawson 1981 Caetano and Rodet 2012]
2 Variation due the TAE algorithm
I TAE rarr smooth function to estimate harmonic amplitudes
1536
Parametric Model
I No of CCs(Cepstral CoefficientsKcc) depends on Samplingrate(Fs)pitch(f0)
Kcc leFs2f0
I For our network we choose the maximum Kcc(lowest pitch) =91 and zero pad the high pitch Kcc to 91 dimension vectors
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1836
Network Architecture
CCs
f0
En
cod
er
Z
Dec
od
er
CVAE
I Network input is CCs rarr MSE represents perceptually relevantdistance in terms of squared error between the input andreconstructed log magnitude spectral envelopes
I We train the network on frames from the train instances
I For evaluation MSE values calculated ahead is the averagereconstruction error across all the test instance frames
Network Architecture
I Main hyperparameters -
1 β - Controls the relative weighting between reconstruction and prior enforcement:
L ∝ MSE + β · KLD
Figure: CVAE reconstruction MSE vs latent space dimensionality, for β = 0.01, 0.1, 1
I Tradeoff between both terms; we choose β = 0.1 (see the loss sketch below)
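To make the weighting concrete, here is a minimal PyTorch sketch of the β-weighted objective, assuming a Gaussian posterior parameterized by (mu, logvar); the function and variable names are illustrative, not taken from our released code:

    import torch
    import torch.nn.functional as F

    def cvae_loss(x, x_recon, mu, logvar, beta=0.1):
        # Reconstruction term: MSE between input and reconstructed CCs
        mse = F.mse_loss(x_recon, x, reduction="mean")
        # KL divergence of N(mu, sigma^2) from the standard normal prior
        kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return mse + beta * kld  # L ∝ MSE + β·KLD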
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - governs the network's reconstruction ability
Figure: Reconstruction MSE vs latent space dimensionality, CVAE (β = 0.1) vs AE
I Steep fall initially, flatter later; we choose dimensionality = 32
Network Architecture
I Network size: [91, 91, 32, 91, 91]
I 91 is the dimension of the input CCs; 32 is the latent space dimensionality
I Linear fully connected layers + Leaky ReLU activations
I ADAM [Kingma and Ba 2014] with initial lr = 10⁻³
I Training for 2000 epochs with batch size 512 (a model sketch follows below)
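The following PyTorch sketch shows one plausible realization of this architecture, assuming the pitch is supplied as a one-dimensional conditioning input to both encoder and decoder; the exact conditioning format in our model may differ:

    import torch
    import torch.nn as nn

    class CVAE(nn.Module):
        # Layer widths follow the [91, 91, 32, 91, 91] scheme from the slides.
        def __init__(self, cc_dim=91, latent_dim=32, cond_dim=1):
            super().__init__()
            self.enc = nn.Sequential(nn.Linear(cc_dim + cond_dim, 91), nn.LeakyReLU())
            self.mu = nn.Linear(91, latent_dim)
            self.logvar = nn.Linear(91, latent_dim)
            self.dec = nn.Sequential(
                nn.Linear(latent_dim + cond_dim, 91), nn.LeakyReLU(),
                nn.Linear(91, cc_dim),
            )

        def forward(self, x, f0):
            h = self.enc(torch.cat([x, f0], dim=-1))
            mu, logvar = self.mu(h), self.logvar(h)
            z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
            return self.dec(torch.cat([z, f0], dim=-1)), mu, logvar

    model = CVAE()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # as in the slides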
Experiments
I Two kinds of experiments to demonstrate the network's capabilities:
1 Reconstruction - Omit pitch instances during training, and see how well the model reconstructs notes of the omitted target pitch
2 Generation - How well the model 'synthesizes' note instances with new, unseen pitches
Reconstruction
I Two training contexts -
1 T (×) is the target, X marks training instances:
MIDI  T−3  T−2  T−1   T   T+1  T+2  T+3
Kept   X    X    X    ×    X    X    X
2 Octave endpoints:
MIDI  60  61  62  63  64  65  66  67  68  69  70  71
Kept   X   ×   ×   ×   ×   ×   ×   ×   ×   ×   ×   X
I In each of the above cases, we compute the MSE as the frame-wise spectral envelope match across all frames of all the target instances
Reconstruction
Figure: Reconstruction MSE vs target pitch (MIDI 60-72), AE vs CVAE
I Both the AE and the CVAE reasonably reconstruct the target pitch (audio examples 6-8)
I The CVAE produces better reconstruction, especially when the target pitch is far from the pitches available in the training data (plot above; audio examples 9-11)
I Conditioning helps to capture the pitch dependency of the spectral envelope more accurately
Reconstruction
I To emulate the effect of pitch conditioning with an AE, we train the AE by appending the pitch to the input CCs and reconstructing this appended input, as shown below
Figure: 'Conditional' AE (cAE) - input [CCs, f0] → cAE → reconstructed [CCs′, f0′]
I [Wyse 2018] followed a similar approach of appending the conditional variables to the input of his model
I The cAE is comparable to our proposed CVAE in that the network might potentially learn something from f0 (a sketch follows below)
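As an illustration, a minimal cAE could be a plain autoencoder over the 92-dimensional appended vector [CCs, f0]; mirroring the CVAE's layer widths here is our assumption:

    import torch.nn as nn

    # Plain autoencoder over the appended input [CCs, f0] (91 + 1 = 92 dims);
    # the bottleneck of 32 mirrors the CVAE latent dimensionality.
    cae = nn.Sequential(
        nn.Linear(92, 91), nn.LeakyReLU(),
        nn.Linear(91, 32), nn.LeakyReLU(),  # latent bottleneck
        nn.Linear(32, 91), nn.LeakyReLU(),
        nn.Linear(91, 92),                  # reconstructs [CCs', f0']
    )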
Reconstruction
Figure: Reconstruction MSE vs target pitch (MIDI 60-72), AE vs CVAE vs cAE
I We train the cAE on the octave endpoints and evaluate performance as above
I Appending f0 does not improve performance; it is in fact worse than the plain AE
Generation
I We are interested in 'synthesizing' audio
I We see how well the network can generate an instance of a desired pitch (which the network has not been trained on)
Figure: Sampling from the network - latent Z + f0 → Decoder
I We follow the procedure in [Blaauw and Bonada 2016] - a random walk to sample points coherently from the latent space (a sketch follows below)
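A minimal sketch of such coherent sampling, assuming small Gaussian steps around the prior (the step size is our choice, not a value taken from [Blaauw and Bonada 2016]):

    import torch

    def latent_random_walk(n_frames, latent_dim=32, step=0.05):
        # Slowly varying latent trajectory: each frame takes a small
        # Gaussian step, so consecutive spectral envelopes stay coherent.
        z = torch.randn(latent_dim)
        frames = []
        for _ in range(n_frames):
            z = z + step * torch.randn(latent_dim)
            frames.append(z.clone())
        return torch.stack(frames)  # (n_frames, latent_dim)

Each sampled frame is then decoded together with the desired f0 to produce the envelope sequence.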
Generation
I We train on instances across the entire octave sans MIDI 65, and then generate MIDI 65
I On listening, we find that the generated audio still lacks the soft noisy sound of the violin bowing (audio examples 12-14)
I For a more realistic synthesis, we also generate a violin note with vibrato (audio example 15)
Putting it all together
I We explored autoencoder frameworks as generative models for audio synthesis of instrumental tones
I Our parametric representation decouples 'timbre' and 'pitch', relying on the network to model their inter-dependencies
I Pitch conditioning allows us to generate the learnt spectral envelope for a given pitch, enabling us to vary the pitch contour continuously
Putting it all together
I Our model is still far from perfect, however, and we have to improve it further:
1 An immediate step is to model the residual of the input audio; this will help make the generated audio sound more natural
2 We have not taken dynamics into account; a more complete system will capture the spectral envelope (timbral) dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests, involving synthesis of larger pitch movements or melodic elements from Indian Classical music
I Our contributions:
1 To the best of our knowledge, we have not come across any prior work using a parametric model for musical tones in the neural synthesis framework, especially one exploiting the conditioning function of the CVAE
2 The TAE method has no known Python implementation, so we plan to make our code open-source to the MIR community for research
References I
[Blaauw and Bonada 2016] Blaauw, M. and Bonada, J. (2016). Modeling and transforming speech using variational autoencoders. In Interspeech, pages 1770–1774.
[Caetano and Rodet 2012] Caetano, M. and Rodet, X. (2012). A source-filter model for musical instrument sound transformation. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 137–140. IEEE.
[Doersch 2016] Doersch, C. (2016). Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908.
[Engel et al 2017] Engel, J., Resnick, C., Roberts, A., Dieleman, S., Norouzi, M., Eck, D., and Simonyan, K. (2017). Neural audio synthesis of musical notes with wavenet autoencoders. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, pages 1068–1077. JMLR.org.
[Esling et al 2018] Esling, P., Bitton, A., et al. (2018). Generative timbre spaces: regularizing variational auto-encoders with perceptual metrics. arXiv preprint arXiv:1805.08501.
References II
[Griffin and Lim 1984] Griffin, D. and Lim, J. (1984). Signal estimation from modified short-time fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2):236–243.
[Hinton and Salakhutdinov 2006] Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507.
[IMAI 1979] Imai, S. (1979). Spectral envelope extraction by improved cepstrum. IEICE, 62:217–228.
[Kingma and Ba 2014] Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[Kingma and Welling 2013] Kingma, D. P. and Welling, M. (2013). Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114.
[Oord et al 2016] Oord, A. v. d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., and Kavukcuoglu, K. (2016). Wavenet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.
References III
[Roche et al 2018] Roche, F., Hueber, T., Limier, S., and Girin, L. (2018). Autoencoders for music sound modeling: a comparison of linear, shallow, deep, recurrent and variational models. arXiv preprint arXiv:1806.04096.
[Roebel and Rodet 2005] Roebel, A. and Rodet, X. (2005). Efficient spectral envelope estimation and its application to pitch shifting and envelope preservation. In International Conference on Digital Audio Effects, pages 30–35, Madrid, Spain.
[Romani Picas et al 2015] Romani Picas, O., Parra Rodriguez, H., Dabiri, D., Tokuda, H., Hariya, W., Oishi, K., and Serra, X. (2015). A real-time system for measuring sound goodness in instrumental sounds. In Audio Engineering Society Convention 138. Audio Engineering Society.
[Sarroff and Casey 2014] Sarroff, A. M. and Casey, M. A. (2014). Musical audio synthesis using autoencoding neural nets. In ICMC.
[Serra et al 1997] Serra, X. et al. (1997). Musical sound modeling with sinusoids plus noise. Musical Signal Processing, pages 91–122.
References IV
[Slawson 1981] Slawson, W. (1981). The color of sound: a theoretical study in musical timbre. Music Theory Spectrum, 3:132–141.
[Sohn et al 2015] Sohn, K., Lee, H., and Yan, X. (2015). Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems, pages 3483–3491.
[Wyse 2018] Wyse, L. (2018). Real-valued parametric conditioning of an rnn for interactive sound synthesis. arXiv preprint arXiv:1805.10808.
Audio examples description I
1 Input MIDI 63 to the Spectral Model
2 Spectral Model reconstruction (trained on MIDI 63)
3 Spectral Model reconstruction (not trained on MIDI 63)
4 Input MIDI 60 note to the Parametric Model
5 Parametric reconstruction of the input note
6 Input MIDI 63 note
7 Parametric AE reconstruction of the input
8 Parametric CVAE reconstruction of the input
9 Input MIDI 65 note (endpoint-trained model)
10 Parametric AE reconstruction of the input (endpoint-trained model)
11 Parametric CVAE reconstruction of the input (endpoint-trained model)
12 CVAE generated MIDI 65 violin note
13 Similar MIDI 65 violin note from the dataset
Audio examples description II
14 CVAE reconstruction of the MIDI 65 violin note
15 CVAE generated MIDI 65 violin note with vibrato
Our Nearest Neighbours
I A major issue with the previous methods was the frame-wise analysis-synthesis based reconstruction, which fails to model temporal evolution
I [Engel et al 2017], inspired by Wavenet's [Oord et al 2016] autoregressive modeling capabilities for speech, extended it to musical instrument synthesis
✓ Generation of realistic and creative sounds
× Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio, albeit by conditioning the waveform samples on additional parameters like pitch, velocity (loudness) and instrument class
✓ Was able to achieve better control over generation by conditioning
× Unable to generalize to untrained pitches
Why Parametric?
I Rather than generating new timbres ('interpolating' across sounds), we consider the problem of synthesizing a given instrument's sound with flexible control over the pitch and loudness dynamics
I Pitch shifting without timbre modification uses a source-filter model, with the filter (spectral envelope) kept constant [Roebel and Rodet 2005]
I A powerful parametric representation, rather than the raw waveform or spectrogram, has the potential to achieve high quality with less training data
1 [Blaauw and Bonada 2016] recognized this in the context of speech synthesis and used a vocoder representation to train a generative model, achieving promising results
Dataset
I Good-sounds dataset [Romani Picas et al 2015], consisting of individual note and scale recordings for 12 different instruments
I We work with the 'Violin', played at mezzo-forte loudness, and choose the 4th octave (MIDI 60-71)
Why we chose the Violin: popular in Indian music, human voice-like timbre, and the ability to produce continuous pitch
Figure: Instances per note in the overall dataset (between 26 and 34 instances per MIDI note)
I The average note duration for the chosen octave is about 4.5 s per note
Dataset
I We split the data into train (80%) and test (20%) instances across the MIDI note labels
I Our model is trained with frames (duration 21.3 ms) from the train instances, and system performance is evaluated with frames from the test instances
Non-Parametric Reconstruction
I The framewise magnitude spectra reconstruction procedure of [Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - training including and excluding MIDI 63 along with its 3 neighbouring pitches on either side
Figure: AE pipeline - sustain segment → FFT → Encoder → Z → Decoder
I Repeat the above across all input spectral frames, and invert the obtained spectrogram using Griffin-Lim [Griffin and Lim 1984] (a sketch follows below)
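A minimal inversion sketch using librosa's Griffin-Lim implementation; the FFT size, hop length and iteration count here are illustrative assumptions, not our exact settings:

    import librosa

    def invert_magnitude(mag, n_fft=1024, hop_length=256, n_iter=60):
        # mag: reconstructed magnitude spectrogram, shape (1 + n_fft//2, frames).
        # Griffin-Lim iteratively estimates a phase that is consistent with
        # the modified magnitude STFT [Griffin and Lim 1984].
        return librosa.griffinlim(mag, n_iter=n_iter,
                                  hop_length=hop_length, win_length=n_fft)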
Non-Parametric Reconstruction
Figure: Input MIDI 63 (audio example 1); reconstruction when trained including MIDI 63 (audio example 2) and excluding MIDI 63 (audio example 3)
Parametric Model
1 Frame-wise magnitude spectrum → harmonic representation using the Harmonic plus Residual (HpR) model [Serra et al 1997] (currently we neglect the residual)
Figure: Sustain segment → FFT → HpR → f0 + harmonic magnitudes
I Output of the HpR block ⇒ log-dB magnitudes + harmonic frequencies
Parametric Model
2 log-dB magnitudes + harmonics → TAE algorithm [Roebel and Rodet 2005, IMAI 1979]
I TAE ⇒ iterative cepstral liftering:
A_0(k) = log |X(k)|,  V_0 = −∞
while (A_i − A_0 < ∆):
    A_i(k) = max(A_{i−1}(k), V_{i−1}(k))
    V_i = FFT(lifter(A_i))
Figure: f0 + magnitudes → TAE → CCs + f0 (a Python sketch follows below)
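Since there is no public Python implementation of the TAE, here is a self-contained sketch of the iterative cepstral liftering under our own reading of the loop above; the stopping tolerance and the exact liftering indices are assumptions:

    import numpy as np

    def tae(log_mag, kcc=91, delta=0.1, max_iter=200):
        """True Amplitude Envelope sketch via iterative cepstral liftering.
        log_mag: log-magnitude spectrum on the positive-frequency FFT grid
        (length n_fft//2 + 1); kcc: number of cepstral coefficients kept."""
        n = 2 * (len(log_mag) - 1)            # implied FFT size
        A = log_mag.copy()                    # A_0(k) = log|X(k)|
        V = np.full_like(A, -np.inf)          # V_0 = -inf
        for _ in range(max_iter):
            A = np.maximum(A, V)              # A_i(k) = max(A_{i-1}(k), V_{i-1}(k))
            ceps = np.fft.irfft(A, n=n)       # real cepstrum of the current curve
            ceps[kcc:n - kcc + 1] = 0.0       # lifter: zero high-quefrency bins
            V = np.fft.rfft(ceps).real        # V_i = FFT(lifter(A_i))
            if np.max(A - V) < delta:         # envelope now covers all peaks
                break
        return ceps[:kcc], V                  # CCs fed to the network, and the envelope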
Parametric Model
I No open-source implementation is available for the TAE, so we implemented it following the procedure highlighted in [Roebel and Rodet 2005, Caetano and Rodet 2012]
I The figure shows a TAE snapshot from [Caetano and Rodet 2012], similar to the results we obtain (audio examples 4 and 5)
Parametric Model
Figure: TAE spectral envelopes (magnitude in dB vs frequency, 0-5 kHz) for MIDI 60, 65 and 71
I The spectral envelope shape varies across pitch:
1 Dependence of the envelope on pitch [Slawson 1981, Caetano and Rodet 2012]
2 Variation due to the TAE algorithm itself
I The TAE → a smooth function to estimate the harmonic amplitudes
Parametric Model
I The number of CCs (cepstral coefficients, Kcc) depends on the sampling rate (Fs) and the pitch (f0):
Kcc ≤ Fs / (2 f0)
I For our network, we choose the maximum Kcc (lowest pitch) = 91, and zero-pad the higher-pitch CC vectors to 91 dimensions (see the sketch below)
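For instance, at Fs = 48 kHz (an assumption about the recordings' sampling rate) and f0 ≈ 262 Hz (MIDI 60), the bound gives Kcc ≈ 91, matching the input dimension above; a small helper:

    import numpy as np

    def num_ccs(fs, f0):
        # Kcc <= Fs / (2 * f0)
        return int(fs / (2 * f0))

    def pad_ccs(ccs, target=91):
        # Zero-pad shorter (higher-pitch) CC vectors to the fixed network size
        out = np.zeros(target)
        out[:len(ccs)] = ccs[:target]
        return out

    print(num_ccs(48000, 261.63))  # -> 91 for MIDI 60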
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1836
Network Architecture
CCs
f0
En
cod
er
Z
Dec
od
er
CVAE
I Network input is CCs rarr MSE represents perceptually relevantdistance in terms of squared error between the input andreconstructed log magnitude spectral envelopes
I We train the network on frames from the train instances
I For evaluation MSE values calculated ahead is the averagereconstruction error across all the test instance frames
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
L prop MSE + βKLD
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2136
Network Architecture
I Network size [91 91 32 91 91]I 91 is the dimension of input CCs 32 is latent space
dimensionality
I Linear Fully Connected Layers + Leaky ReLU activations
I ADAM [Kingma and Ba 2014] with initial lr = 10minus3
I Training for 2000 epochs with batch size 512
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2436
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAE
I Both AE and CVAE reasonably reconstruct the target pitch1 6 2 7 3 8
I CVAE produces better reconstruction especially when thetarget pitch is far from the pitches available in the trainingdata(plot above) 4 9 5 10 6 11
I Conditioning helps to capture the pitch dependency of thespectral envelope more accurately
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton8)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton9)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton10)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton11)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton12)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton13)ocgs[i]state=false
1536
2536
Reconstruction
I To emulate the effect of pitch conditioning with an AE wetrain the AE by appending the pitch to the input CCs andreconstructing this appended input as shown below
f0
CCscAE
fprime
0
CCsprime
Figure lsquoConditionalrsquo AE(cAE)
I [Wyse 2018] followed a similar approach of appending theconditional variables to the input of his model
I the lsquocAErsquo is comparable to our proposed CVAE in that thenetwork might potentially learn something from f0
2636
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAEcAE
I We train the cAE on the octave endpoints and evaluateperformance as shown above
I Appending f0 does not improve performance It is infact worsethan AE
2736
Generation
I We are interested in lsquosynthesizingrsquo audio
I We see how well the network can generate an instance of adesired pitch(which the network has not been trained on)
Z
Dec
od
er
f0
Figure Sampling from the Network
I We follow the procedure in [Blaauw and Bonada 2016] - arandom walk to sample points coherently from the latent space
2836
Generation
I We train on instances across the entire octave sans MIDI 65and then generate MIDI 65
I On listening we can see that the generated audio still lacks thesoft noisy sound of the violin bowing
1 12 2 13 3 14
I For a more realistic synthesis we generate a violin note withvibrato 4 15
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton14)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton15)ocgs[i]state=false
2256
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton16)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton17)ocgs[i]state=false
2184
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3136
References I
[Blaauw and Bonada 2016] Blaauw M and Bonada J (2016)Modeling and transforming speech using variational autoencodersIn Interspeech pages 1770ndash1774
[Caetano and Rodet 2012] Caetano M and Rodet X (2012)A source-filter model for musical instrument sound transformationIn 2012 IEEE International Conference on Acoustics Speech and Signal Processing(ICASSP) pages 137ndash140 IEEE
[Doersch 2016] Doersch C (2016)Tutorial on variational autoencodersarXiv preprint arXiv160605908
[Engel et al 2017] Engel J Resnick C Roberts A Dieleman S Norouzi MEck D and Simonyan K (2017)Neural audio synthesis of musical notes with wavenet autoencodersIn Proceedings of the 34th International Conference on Machine Learning-Volume70 pages 1068ndash1077 JMLR org
[Esling et al 2018] Esling P Bitton A et al (2018)Generative timbre spaces regularizing variational auto-encoders with perceptualmetricsarXiv preprint arXiv180508501
3236
References II
[Griffin and Lim 1984] Griffin D and Lim J (1984)Signal estimation from modified short-time fourier transformIEEE Transactions on Acoustics Speech and Signal Processing 32(2)236ndash243
[Hinton and Salakhutdinov 2006] Hinton G E and Salakhutdinov R R (2006)Reducing the dimensionality of data with neural networksscience 313(5786)504ndash507
[IMAI 1979] IMAI S (1979)Spectral envelope extraction by improved cepstrumIEICE 62217ndash228
[Kingma and Ba 2014] Kingma D P and Ba J (2014)Adam A method for stochastic optimizationarXiv preprint arXiv14126980
[Kingma and Welling 2013] Kingma D P and Welling M (2013)Auto-encoding variational bayesarXiv preprint arXiv13126114
[Oord et al 2016] Oord A v d Dieleman S Zen H Simonyan K Vinyals OGraves A Kalchbrenner N Senior A and Kavukcuoglu K (2016)Wavenet A generative model for raw audioarXiv preprint arXiv160903499
3336
References III
[Roche et al 2018] Roche F Hueber T Limier S and Girin L (2018)Autoencoders for music sound modeling a comparison of linear shallow deeprecurrent and variational modelsarXiv preprint arXiv180604096
[Roebel and Rodet 2005] Roebel A and Rodet X (2005)Efficient Spectral Envelope Estimation and its application to pitch shifting andenvelope preservationIn International Conference on Digital Audio Effects pages 30ndash35 Madrid Spaincote interne IRCAM Roebel05b
[Romani Picas et al 2015] Romani Picas O Parra Rodriguez H Dabiri DTokuda H Hariya W Oishi K and Serra X (2015)A real-time system for measuring sound goodness in instrumental soundsIn Audio Engineering Society Convention 138 Audio Engineering Society
[Sarroff and Casey 2014] Sarroff A M and Casey M A (2014)Musical audio synthesis using autoencoding neural netsIn ICMC
[Serra et al 1997] Serra X et al (1997)Musical sound modeling with sinusoids plus noiseMusical signal processing pages 91ndash122
3436
References IV
[Slawson 1981] Slawson W (1981)The color of sound a theoretical study in musical timbreMusic Theory Spectrum 3132ndash141
[Sohn et al 2015] Sohn K Lee H and Yan X (2015)Learning structured output representation using deep conditional generative models
In Advances in neural information processing systems pages 3483ndash3491
[Wyse 2018] Wyse L (2018)Real-valued parametric conditioning of an rnn for interactive sound synthesisarXiv preprint arXiv180510808
3536
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model Reconstruction(trained on MIDI63)
3 Spectral Model Reconstruction(not trained on MIDI63)
4 Input MIDI 60 note to Parametric Model
5 Parametric Reconstruction of input note
6 Input MIDI 63 Note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note(endpoint trained model)
10 Parametric AE reconstruction of input(endpoint trained model)
11 Parametric CVAE reconstruction of input(endpoint trained model)
12 CVAE Generated MIDI 65 Violin note
13 Similar MIDI 65 Violin note from dataset
3636
Audio examples description II
14 CVAE Reconstruction of the MIDI 65 violin note
15 CVAE Generated MIDI 65 Violin note with vibrato
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
636
Why Parametric
I Rather than generating new timbres(lsquointerpolatingrsquo acrosssounds) we consider the problem of synthesis of a giveninstrument sound with flexible control over the pitch andloudness dynamics
I Pitch shifting without timbre modification uses a source-filtermodel with the filter(spectral envelope) being kept constant[Roebel and Rodet 2005]
I A powerful parametric representation over raw waveform orspectrogram has the potential to achieve high quality with lesstraining data
1 [Blaauw and Bonada 2016] recognized this in context ofspeech synthesis and used a vocoder representation to train agenerative model achieving promising results along the way
636
Why Parametric
I Rather than generating new timbres(lsquointerpolatingrsquo acrosssounds) we consider the problem of synthesis of a giveninstrument sound with flexible control over the pitch andloudness dynamics
I Pitch shifting without timbre modification uses a source-filtermodel with the filter(spectral envelope) being kept constant[Roebel and Rodet 2005]
I A powerful parametric representation over raw waveform orspectrogram has the potential to achieve high quality with lesstraining data
1 [Blaauw and Bonada 2016] recognized this in context ofspeech synthesis and used a vocoder representation to train agenerative model achieving promising results along the way
736
Dataset
I Good-sounds dataset [Romani Picas et al 2015] consisting ofindividual note and scale recordings for 12 different instruments
I We work with the lsquoViolinrsquo played in mezzo-forte loudnessand choose the 4th octave(MIDI 60-71)
0 5 10 15 20 25 30 35
606162636465666768697071
282626262626
343030
262626
Instances
MID
I
Figure Instances per note in the overall dataset
I Average note duration for the chosen octave is about 45s pernote
736
Dataset
I Good-sounds dataset [Romani Picas et al 2015] consisting ofindividual note and scale recordings for 12 different instruments
I We work with the lsquoViolinrsquo played in mezzo-forte loudnessand choose the 4th octave(MIDI 60-71)
Why we chose ViolinPopular in Indian Music Human voice-like timbre
Ability to produce continuous pitch
[Bar chart: instances per MIDI note 60-71, ranging from 26 to 34 instances each]
Figure: Instances per note in the overall dataset
▶ The average note duration for the chosen octave is about 4.5 s per note
Dataset
▶ We split the data into train (80%) and test (20%) instances across MIDI note labels
▶ Our model is trained with frames (duration 21.3 ms) from the train instances, and system performance is evaluated with frames from the test instances
Non-Parametric Reconstruction
▶ The frame-wise magnitude spectra reconstruction procedure in [Roche et al., 2018] cannot generalize to untrained pitches
▶ We consider two cases: train including and excluding MIDI 63, along with its 3 neighbouring pitches on either side
[Block diagram: Sustain segment → FFT → Encoder → Z → Decoder (AE)]
▶ Repeat the above across all input spectral frames, and invert the obtained spectrogram using Griffin-Lim [Griffin and Lim, 1984]
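As a concrete illustration of this inversion step, here is a minimal sketch using librosa's Griffin-Lim routine; the hop and window sizes below are assumptions for illustration, not settings from this work.

```python
import librosa

def invert_magnitude_spectrogram(mag, n_iter=60, hop_length=256, win_length=1024):
    """Invert a magnitude spectrogram (n_fft/2 + 1, n_frames) back to a
    waveform via Griffin-Lim phase estimation [Griffin and Lim, 1984]."""
    return librosa.griffinlim(mag, n_iter=n_iter,
                              hop_length=hop_length, win_length=win_length)
```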
Non-Parametric Reconstruction
Figure: Input MIDI 63 (audio example 1)
Figure: Including MIDI 63 (audio example 2)
Figure: Excluding MIDI 63 (audio example 3)
Parametric Model
1. Frame-wise magnitude spectrum → harmonic representation using the Harmonic plus Residual (HpR) model [Serra et al., 1997] (currently we neglect the residual)
[Block diagram: Sustain segment → FFT → HpR → f0, harmonic magnitudes]
▶ Output of the HpR block ⇒ log-dB magnitudes + harmonics
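To make the block's output concrete, here is a simplified stand-in for the harmonic analysis, assuming harmonic magnitudes are read off near integer multiples of f0; the actual HpR model [Serra et al., 1997] uses sinusoidal peak picking, so this is only a rough sketch.

```python
import numpy as np

def harmonic_log_magnitudes(mag_spec, f0, fs, n_fft, n_harmonics=40):
    """Rough sketch: sample |X(k)| at the bins nearest to multiples of f0
    and return log-dB magnitudes (real HpR picks spectral peaks instead)."""
    bins = np.round(np.arange(1, n_harmonics + 1) * f0 * n_fft / fs).astype(int)
    bins = bins[bins < len(mag_spec)]              # discard harmonics above Nyquist
    return 20 * np.log10(mag_spec[bins] + 1e-12)   # log-dB harmonic magnitudes
```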
Parametric Model
2. log-dB magnitudes + harmonics → TAE algorithm [Roebel and Rodet, 2005; Imai, 1979]
▶ TAE ⇒ iterative cepstral liftering:

A_0(k) = log|X(k)|, V_0 = −∞
while (A_i − A_0 < ∆):
    A_i(k) = max(A_{i−1}(k), V_{i−1}(k))
    V_i = FFT(lifter(A_i))
[Block diagram: f0, magnitudes → TAE → CCs, f0]
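A minimal NumPy sketch of this liftering loop is given below; the function name, the convergence test, and the iteration cap are our own illustrative choices rather than details from the slide.

```python
import numpy as np

def true_amplitude_envelope(log_mag, kcc, delta=0.1, max_iter=200):
    """Iterative cepstral liftering sketch: log_mag is the log-magnitude
    spectrum at the rFFT bins, kcc the cepstral cutoff order K_cc."""
    n = 2 * (len(log_mag) - 1)           # underlying FFT size
    A = log_mag.copy()                   # A_0(k) = log|X(k)|
    V = np.full_like(A, -np.inf)         # V_0 = -infinity
    for _ in range(max_iter):
        A = np.maximum(A, V)             # A_i(k) = max(A_{i-1}(k), V_{i-1}(k))
        c = np.fft.irfft(A, n=n)         # cepstrum of the current estimate
        c[kcc + 1 : n - kcc] = 0.0       # lifter: keep only the first kcc coefficients
        V = np.fft.rfft(c).real          # V_i = FFT(lifter(A_i))
        if np.max(A - V) < delta:        # envelope now hugs the harmonic peaks
            break
    return V                             # smooth log-magnitude spectral envelope
```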
Parametric Model
▶ No open-source implementation was available for the TAE, so we implemented it following the procedure highlighted in [Roebel and Rodet, 2005; Caetano and Rodet, 2012]
▶ The figure shows a TAE snapshot from [Caetano and Rodet, 2012]
▶ Similar to the results we get (audio examples 4, 5)
Parametric Model
[Plot: TAE spectral envelopes, magnitude (dB, −80 to −20) vs frequency (0-5 kHz), for MIDI 60, 65 and 71]
▶ The spectral envelope shape varies across pitch:
1. Dependence of the envelope on pitch [Slawson, 1981; Caetano and Rodet, 2012]
2. Variation due to the TAE algorithm
▶ TAE → smooth function to estimate harmonic amplitudes
Parametric Model
▶ The number of CCs (cepstral coefficients, K_cc) depends on the sampling rate (F_s) and pitch (f_0):

K_cc ≤ F_s / (2 f_0)

▶ For our network, we choose the maximum K_cc (lowest pitch) = 91 and zero-pad the higher-pitch CC vectors to 91 dimensions
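For instance, the cutoff and padding can be computed as below; the 48 kHz sampling rate is an assumption chosen because it reproduces K_cc = 91 at MIDI 60, and `tae_cepstral_coeffs` is a hypothetical helper standing in for the TAE step above.

```python
import numpy as np

FS = 48000     # assumed sampling rate: 48000/(2 * 261.6) gives K_cc = 91 at MIDI 60
KCC_MAX = 91   # fixed network input dimension (K_cc of the lowest pitch)

def midi_to_hz(midi_note):
    return 440.0 * 2.0 ** ((midi_note - 69) / 12.0)

def padded_ccs(log_mag, midi_note):
    """Compute the pitch-dependent K_cc <= F_s / (2 f_0) and zero-pad
    the resulting cepstral coefficients to the fixed 91-dim input."""
    kcc = int(FS / (2.0 * midi_to_hz(midi_note)))   # cepstral cutoff for this pitch
    ccs = tae_cepstral_coeffs(log_mag, kcc)         # hypothetical helper (see TAE sketch)
    out = np.zeros(KCC_MAX)
    out[:min(len(ccs), KCC_MAX)] = ccs[:KCC_MAX]    # zero-pad higher pitches to 91 dims
    return out
```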
Generative Models
▶ Autoencoders [Hinton and Salakhutdinov, 2006]: minimize the MSE (mean squared error) between the input and the network reconstruction
• Simple to train, and perform good reconstruction (since they minimize MSE)
• Not truly a generative model, as you cannot generate new data
▶ Variational Autoencoders [Kingma and Welling, 2013]: inspired by variational inference, they enforce a prior on the latent space
• Truly generative, as we can 'generate' new data by sampling from the prior
▶ Conditional Variational Autoencoders [Doersch, 2016; Sohn et al., 2015]: same principle as a VAE, however they learn the distribution conditioned on an additional conditioning variable
Generative Models
Why VAE over AE?
A continuous latent space from which we can sample points (and synthesize the corresponding audio)
Why CVAE over VAE?
Conditioning on pitch ⇒ the network captures dependencies between timbre and pitch ⇒ more accurate envelope generation + pitch control
Network Architecture
[Block diagram: CCs + f0 → Encoder → Z → Decoder (CVAE)]
▶ The network input is CCs → the MSE represents a perceptually relevant distance, in terms of the squared error between the input and reconstructed log-magnitude spectral envelopes
▶ We train the network on frames from the train instances
▶ For evaluation, the MSE values reported ahead are the average reconstruction error across all the test instance frames
Network Architecture
▶ Main hyperparameters:
1. β: controls the relative weighting between reconstruction and prior enforcement,

L ∝ MSE + β · KLD
[Plot: MSE (·10⁻⁵) vs latent space dimensionality (0-70), for β = 0.01, 0.1, 1]
Figure: CVAE, varying β
▶ Trade-off between both terms; we choose β = 0.1
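Written out as code, the weighted objective looks roughly as follows; this is a PyTorch-style sketch, and the reduction and exact scaling are our assumptions.

```python
import torch
import torch.nn.functional as F

def cvae_loss(x_hat, x, mu, logvar, beta=0.1):
    """L proportional to MSE + beta * KLD, with the chosen beta = 0.1;
    (mu, logvar) parameterize the approximate posterior over the latent z."""
    mse = F.mse_loss(x_hat, x)
    # KL divergence from N(mu, sigma^2) to the standard normal prior
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return mse + beta * kld
```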
Network Architecture
▶ Main hyperparameters:
2. Dimensionality of the latent space: determines the network's reconstruction ability
[Plot: MSE (·10⁻⁵) vs latent space dimensionality (0-70), CVAE vs AE]
Figure: CVAE (β = 0.1) vs AE
▶ Steep fall initially, flatter later; we choose dimensionality = 32
Network Architecture
▶ Network size: [91, 91, 32, 91, 91]
▶ 91 is the dimension of the input CCs; 32 is the latent space dimensionality
▶ Linear fully connected layers + Leaky ReLU activations
▶ ADAM [Kingma and Ba, 2014] with initial lr = 10⁻³
▶ Training for 2000 epochs with batch size 512
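A sketch of this architecture in PyTorch is given below, assuming the pitch f0 is concatenated to both the encoder and decoder inputs; details beyond the slide (activation slope, exactly where f0 enters) are guesses.

```python
import torch
import torch.nn as nn

class CVAE(nn.Module):
    """[91, 91, 32, 91, 91] network sketch: linear layers + Leaky ReLU."""
    def __init__(self, x_dim=91, hidden=91, z_dim=32, cond_dim=1):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim + cond_dim, hidden), nn.LeakyReLU())
        self.mu = nn.Linear(hidden, z_dim)
        self.logvar = nn.Linear(hidden, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim + cond_dim, hidden), nn.LeakyReLU(),
                                 nn.Linear(hidden, x_dim))

    def forward(self, x, f0):
        h = self.enc(torch.cat([x, f0], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        return self.dec(torch.cat([z, f0], dim=-1)), mu, logvar

model = CVAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)     # ADAM, initial lr = 1e-3
```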
Experiments
▶ Two kinds of experiments to demonstrate the network's capabilities:
1. Reconstruction: omit a pitch's instances during training and see how well the model reconstructs notes of the omitted target pitch
2. Generation: how well the model 'synthesizes' note instances with new, unseen pitches
Reconstruction
▶ Two training contexts:
1. T (✗) is the target, ✓ marks training instances:
MIDI: T−3 T−2 T−1 T T+1 T+2 T+3
Kept: ✓ ✓ ✓ ✗ ✓ ✓ ✓
2. Octave endpoints:
MIDI: 60 61 62 63 64 65 66 67 68 69 70 71
Kept: ✓ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✓
▶ In each of the above cases, we compute the MSE as the frame-wise spectral envelope match across all frames of all the target instances
Reconstruction
[Plot: MSE (·10⁻²) vs pitch (MIDI 60-72), AE vs CVAE]
▶ Both the AE and the CVAE reasonably reconstruct the target pitch (audio examples 6, 7, 8)
▶ The CVAE produces better reconstruction, especially when the target pitch is far from the pitches available in the training data (plot above; audio examples 9, 10, 11)
▶ Conditioning helps to capture the pitch dependency of the spectral envelope more accurately
Reconstruction
▶ To emulate the effect of pitch conditioning with an AE, we train the AE by appending the pitch to the input CCs and reconstructing this appended input, as shown below
[Block diagram: (CCs, f0) → cAE → (CCs′, f0′)]
Figure: 'Conditional' AE (cAE)
▶ [Wyse, 2018] followed a similar approach of appending the conditional variables to the input of his model
▶ The 'cAE' is comparable to our proposed CVAE in that the network might potentially learn something from f0
Reconstruction
[Plot: MSE (·10⁻²) vs pitch (MIDI 60-72), AE vs CVAE vs cAE]
▶ We train the cAE on the octave endpoints and evaluate performance as shown above
▶ Appending f0 does not improve performance; it is in fact worse than the AE
Generation
▶ We are interested in 'synthesizing' audio
▶ We see how well the network can generate an instance of a desired pitch (which the network has not been trained on)
[Block diagram: Z + f0 → Decoder]
Figure: Sampling from the network
▶ We follow the procedure in [Blaauw and Bonada, 2016]: a random walk to sample points coherently from the latent space
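A minimal sketch of such coherent sampling is shown below; the decoder interface and the step size are assumptions.

```python
import torch

@torch.no_grad()
def generate_frames(decoder, f0, n_frames, z_dim=32, step=0.05):
    """Random-walk latent sampling [Blaauw and Bonada, 2016]: successive
    frames use nearby z's, so the decoded envelopes evolve coherently."""
    z = torch.zeros(z_dim)                   # start at the prior mean
    cond = torch.tensor([f0])
    frames = []
    for _ in range(n_frames):
        z = z + step * torch.randn(z_dim)    # small coherent drift in latent space
        frames.append(decoder(torch.cat([z, cond])))
    return torch.stack(frames)               # frame-wise CCs for the desired pitch
```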
Generation
▶ We train on instances across the entire octave sans MIDI 65, and then generate MIDI 65
▶ On listening, we find that the generated audio still lacks the soft noisy sound of the violin bowing (audio examples 12, 13, 14)
▶ For a more realistic synthesis, we generate a violin note with vibrato (audio example 15)
Putting it all together
▶ We explored autoencoder frameworks in generative models for audio synthesis of instrumental tones
▶ Our parametric representation decouples 'timbre' and 'pitch', relying on the network to model the inter-dependencies
▶ Pitch conditioning allows us to generate the learnt spectral envelope for a given pitch, thus enabling us to vary the pitch contour continuously
Putting it all together
▶ Our model is still far from perfect, and we have to improve it further:
1. An immediate task is to model the residual of the input audio; this will help make the generated audio sound more natural
2. We have not taken dynamics into account; a more complete system will involve capturing spectral envelope (timbral) dependencies on both pitch and loudness dynamics
3. Conducting more formal listening tests, involving the synthesis of larger pitch movements or melodic elements from Indian Classical music
▶ Our contributions:
1. To the best of our knowledge, we have not come across any work using a parametric model for musical tones in the neural synthesis framework, especially one exploiting the conditioning function of the CVAE
2. The TAE method has no known Python implementation, so we plan to make our code open-source for the MIR research community
References I
[Blaauw and Bonada, 2016] Blaauw, M. and Bonada, J. (2016). Modeling and transforming speech using variational autoencoders. In Interspeech, pages 1770–1774.
[Caetano and Rodet, 2012] Caetano, M. and Rodet, X. (2012). A source-filter model for musical instrument sound transformation. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 137–140. IEEE.
[Doersch, 2016] Doersch, C. (2016). Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908.
[Engel et al., 2017] Engel, J., Resnick, C., Roberts, A., Dieleman, S., Norouzi, M., Eck, D., and Simonyan, K. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1068–1077. JMLR.org.
[Esling et al., 2018] Esling, P., Bitton, A., et al. (2018). Generative timbre spaces: regularizing variational auto-encoders with perceptual metrics. arXiv preprint arXiv:1805.08501.
References II
[Griffin and Lim, 1984] Griffin, D. and Lim, J. (1984). Signal estimation from modified short-time Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2):236–243.
[Hinton and Salakhutdinov, 2006] Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507.
[Imai, 1979] Imai, S. (1979). Spectral envelope extraction by improved cepstrum. IEICE, 62:217–228.
[Kingma and Ba, 2014] Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[Kingma and Welling, 2013] Kingma, D. P. and Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
[Oord et al., 2016] Oord, A. v. d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., and Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.
References III
[Roche et al., 2018] Roche, F., Hueber, T., Limier, S., and Girin, L. (2018). Autoencoders for music sound modeling: a comparison of linear, shallow, deep, recurrent and variational models. arXiv preprint arXiv:1806.04096.
[Roebel and Rodet, 2005] Roebel, A. and Rodet, X. (2005). Efficient spectral envelope estimation and its application to pitch shifting and envelope preservation. In International Conference on Digital Audio Effects, pages 30–35, Madrid, Spain.
[Romani Picas et al., 2015] Romani Picas, O., Parra Rodriguez, H., Dabiri, D., Tokuda, H., Hariya, W., Oishi, K., and Serra, X. (2015). A real-time system for measuring sound goodness in instrumental sounds. In Audio Engineering Society Convention 138. Audio Engineering Society.
[Sarroff and Casey, 2014] Sarroff, A. M. and Casey, M. A. (2014). Musical audio synthesis using autoencoding neural nets. In ICMC.
[Serra et al., 1997] Serra, X. et al. (1997). Musical sound modeling with sinusoids plus noise. Musical Signal Processing, pages 91–122.
References IV
[Slawson, 1981] Slawson, W. (1981). The color of sound: a theoretical study in musical timbre. Music Theory Spectrum, 3:132–141.
[Sohn et al., 2015] Sohn, K., Lee, H., and Yan, X. (2015). Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems, pages 3483–3491.
[Wyse, 2018] Wyse, L. (2018). Real-valued parametric conditioning of an RNN for interactive sound synthesis. arXiv preprint arXiv:1805.10808.
Audio examples description I
1. Input MIDI 63 to the spectral model
2. Spectral model reconstruction (trained on MIDI 63)
3. Spectral model reconstruction (not trained on MIDI 63)
4. Input MIDI 60 note to the parametric model
5. Parametric reconstruction of the input note
6. Input MIDI 63 note
7. Parametric AE reconstruction of the input
8. Parametric CVAE reconstruction of the input
9. Input MIDI 65 note (endpoint-trained model)
10. Parametric AE reconstruction of the input (endpoint-trained model)
11. Parametric CVAE reconstruction of the input (endpoint-trained model)
12. CVAE-generated MIDI 65 violin note
13. Similar MIDI 65 violin note from the dataset
Audio examples description II
14. CVAE reconstruction of the MIDI 65 violin note
15. CVAE-generated MIDI 65 violin note with vibrato
536
Our Nearest Neighbours
I Major issue with previous methods was frame-wiseanalysis-synthesis based reconstruction which fails to modeltemporal evolution
I [Engel et al 2017] inspired by Wavenets [Oord et al 2016]autoregressive modeling capablities for speech extended it tomusical instrument synthesis
X Generation of realistic and creative soundstimes Complex network which was difficult to train and sample from
I [Wyse 2018] also autoregressively modelled the audio albeitby conditioning the waveform samples on additional parameterslike pitch velocity(loudness) and instrument class
X Was able to achieve better control over generation byconditioning
times Unable to generalize to untrained pitches
636
Why Parametric
I Rather than generating new timbres(lsquointerpolatingrsquo acrosssounds) we consider the problem of synthesis of a giveninstrument sound with flexible control over the pitch andloudness dynamics
I Pitch shifting without timbre modification uses a source-filtermodel with the filter(spectral envelope) being kept constant[Roebel and Rodet 2005]
I A powerful parametric representation over raw waveform orspectrogram has the potential to achieve high quality with lesstraining data
1 [Blaauw and Bonada 2016] recognized this in context ofspeech synthesis and used a vocoder representation to train agenerative model achieving promising results along the way
636
Why Parametric
I Rather than generating new timbres(lsquointerpolatingrsquo acrosssounds) we consider the problem of synthesis of a giveninstrument sound with flexible control over the pitch andloudness dynamics
I Pitch shifting without timbre modification uses a source-filtermodel with the filter(spectral envelope) being kept constant[Roebel and Rodet 2005]
I A powerful parametric representation over raw waveform orspectrogram has the potential to achieve high quality with lesstraining data
1 [Blaauw and Bonada 2016] recognized this in context ofspeech synthesis and used a vocoder representation to train agenerative model achieving promising results along the way
736
Dataset
I Good-sounds dataset [Romani Picas et al 2015] consisting ofindividual note and scale recordings for 12 different instruments
I We work with the lsquoViolinrsquo played in mezzo-forte loudnessand choose the 4th octave(MIDI 60-71)
0 5 10 15 20 25 30 35
606162636465666768697071
282626262626
343030
262626
Instances
MID
I
Figure Instances per note in the overall dataset
I Average note duration for the chosen octave is about 45s pernote
736
Dataset
I Good-sounds dataset [Romani Picas et al 2015] consisting ofindividual note and scale recordings for 12 different instruments
I We work with the lsquoViolinrsquo played in mezzo-forte loudnessand choose the 4th octave(MIDI 60-71)
Why we chose ViolinPopular in Indian Music Human voice-like timbre
Ability to produce continuous pitch
0 5 10 15 20 25 30 35
606162636465666768697071
282626262626
343030
262626
Instances
MID
I
Figure Instances per note in the overall dataset
I Average note duration for the chosen octave is about 45s pernote
836
Dataset
I We split the data to train(80) and test(20) instances acrossMIDI note labels
I Our model is trained with frames(duration 213ms) from thetrain instances and system performance is evaluated withframes from the test instances
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
1036
Non-Parametric Reconstruction
Figure Input MIDI 63 1 1
Figure Including MIDI 63 2 2 Figure Excluding MIDI 63 3 3
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton1)ocgs[i]state=false
1728
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton3)ocgs[i]state=false
1728
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton5)ocgs[i]state=false
1728
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1336
Parametric Model
I No open source implementation available for the TAE thus weimplemented it following procedure highlighted in[Roebel and Rodet 2005 Caetano and Rodet 2012]
I Figure shows a TAE snap from [Caetano and Rodet 2012]
I Similar to results we get 1 4 2 5
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton6)ocgs[i]state=false
3216
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton7)ocgs[i]state=false
2256
1436
Parametric Model
0 1 2 3 4 5minus80
minus60
minus40
minus20
Frequency(kHz)Magnitude(dB)
606571
I Spectral envelope shape varies across pitch
1 Dependence of envelope on pitch[Slawson 1981 Caetano and Rodet 2012]
2 Variation due the TAE algorithm
I TAE rarr smooth function to estimate harmonic amplitudes
1536
Parametric Model
I No of CCs(Cepstral CoefficientsKcc) depends on Samplingrate(Fs)pitch(f0)
Kcc leFs2f0
I For our network we choose the maximum Kcc(lowest pitch) =91 and zero pad the high pitch Kcc to 91 dimension vectors
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1836
Network Architecture
CCs
f0
En
cod
er
Z
Dec
od
er
CVAE
I Network input is CCs rarr MSE represents perceptually relevantdistance in terms of squared error between the input andreconstructed log magnitude spectral envelopes
I We train the network on frames from the train instances
I For evaluation MSE values calculated ahead is the averagereconstruction error across all the test instance frames
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
L prop MSE + βKLD
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2136
Network Architecture
I Network size [91 91 32 91 91]I 91 is the dimension of input CCs 32 is latent space
dimensionality
I Linear Fully Connected Layers + Leaky ReLU activations
I ADAM [Kingma and Ba 2014] with initial lr = 10minus3
I Training for 2000 epochs with batch size 512
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2436
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAE
I Both AE and CVAE reasonably reconstruct the target pitch1 6 2 7 3 8
I CVAE produces better reconstruction especially when thetarget pitch is far from the pitches available in the trainingdata(plot above) 4 9 5 10 6 11
I Conditioning helps to capture the pitch dependency of thespectral envelope more accurately
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton8)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton9)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton10)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton11)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton12)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton13)ocgs[i]state=false
1536
2536
Reconstruction
I To emulate the effect of pitch conditioning with an AE wetrain the AE by appending the pitch to the input CCs andreconstructing this appended input as shown below
f0
CCscAE
fprime
0
CCsprime
Figure lsquoConditionalrsquo AE(cAE)
I [Wyse 2018] followed a similar approach of appending theconditional variables to the input of his model
I the lsquocAErsquo is comparable to our proposed CVAE in that thenetwork might potentially learn something from f0
2636
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAEcAE
I We train the cAE on the octave endpoints and evaluateperformance as shown above
I Appending f0 does not improve performance It is infact worsethan AE
2736
Generation
I We are interested in lsquosynthesizingrsquo audio
I We see how well the network can generate an instance of adesired pitch(which the network has not been trained on)
Z
Dec
od
er
f0
Figure Sampling from the Network
I We follow the procedure in [Blaauw and Bonada 2016] - arandom walk to sample points coherently from the latent space
2836
Generation
I We train on instances across the entire octave sans MIDI 65and then generate MIDI 65
I On listening we can see that the generated audio still lacks thesoft noisy sound of the violin bowing
1 12 2 13 3 14
I For a more realistic synthesis we generate a violin note withvibrato 4 15
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton14)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton15)ocgs[i]state=false
2256
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton16)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton17)ocgs[i]state=false
2184
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
References

[Blaauw and Bonada, 2016] Blaauw, M. and Bonada, J. (2016). Modeling and transforming speech using variational autoencoders. In Interspeech, pages 1770–1774.
[Caetano and Rodet, 2012] Caetano, M. and Rodet, X. (2012). A source-filter model for musical instrument sound transformation. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 137–140. IEEE.
[Doersch, 2016] Doersch, C. (2016). Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908.
[Engel et al., 2017] Engel, J., Resnick, C., Roberts, A., Dieleman, S., Norouzi, M., Eck, D., and Simonyan, K. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1068–1077. JMLR.org.
[Esling et al., 2018] Esling, P., Bitton, A., et al. (2018). Generative timbre spaces: regularizing variational auto-encoders with perceptual metrics. arXiv preprint arXiv:1805.08501.
[Griffin and Lim, 1984] Griffin, D. and Lim, J. (1984). Signal estimation from modified short-time Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2):236–243.
[Hinton and Salakhutdinov, 2006] Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507.
[Imai, 1979] Imai, S. (1979). Spectral envelope extraction by improved cepstrum. IEICE, 62:217–228.
[Kingma and Ba, 2014] Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[Kingma and Welling, 2013] Kingma, D. P. and Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
[Oord et al., 2016] Oord, A. v. d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., and Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.
[Roche et al., 2018] Roche, F., Hueber, T., Limier, S., and Girin, L. (2018). Autoencoders for music sound modeling: a comparison of linear, shallow, deep, recurrent and variational models. arXiv preprint arXiv:1806.04096.
[Roebel and Rodet, 2005] Roebel, A. and Rodet, X. (2005). Efficient spectral envelope estimation and its application to pitch shifting and envelope preservation. In International Conference on Digital Audio Effects, pages 30–35, Madrid, Spain.
[Romani Picas et al., 2015] Romani Picas, O., Parra Rodriguez, H., Dabiri, D., Tokuda, H., Hariya, W., Oishi, K., and Serra, X. (2015). A real-time system for measuring sound goodness in instrumental sounds. In Audio Engineering Society Convention 138. Audio Engineering Society.
[Sarroff and Casey, 2014] Sarroff, A. M. and Casey, M. A. (2014). Musical audio synthesis using autoencoding neural nets. In ICMC.
[Serra et al., 1997] Serra, X. et al. (1997). Musical sound modeling with sinusoids plus noise. Musical Signal Processing, pages 91–122.
[Slawson, 1981] Slawson, W. (1981). The color of sound: a theoretical study in musical timbre. Music Theory Spectrum, 3:132–141.
[Sohn et al., 2015] Sohn, K., Lee, H., and Yan, X. (2015). Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems, pages 3483–3491.
[Wyse, 2018] Wyse, L. (2018). Real-valued parametric conditioning of an RNN for interactive sound synthesis. arXiv preprint arXiv:1805.10808.
Audio examples description

1. Input MIDI 63 to Spectral Model
2. Spectral Model reconstruction (trained on MIDI 63)
3. Spectral Model reconstruction (not trained on MIDI 63)
4. Input MIDI 60 note to Parametric Model
5. Parametric reconstruction of input note
6. Input MIDI 63 note
7. Parametric AE reconstruction of input
8. Parametric CVAE reconstruction of input
9. Input MIDI 65 note (endpoint-trained model)
10. Parametric AE reconstruction of input (endpoint-trained model)
11. Parametric CVAE reconstruction of input (endpoint-trained model)
12. CVAE-generated MIDI 65 violin note
13. Similar MIDI 65 violin note from dataset
14. CVAE reconstruction of the MIDI 65 violin note
15. CVAE-generated MIDI 65 violin note with vibrato
Why Parametric?

- Rather than generating new timbres ('interpolating' across sounds), we consider the problem of synthesizing a given instrument's sound with flexible control over the pitch and loudness dynamics.
- Pitch shifting without timbre modification uses a source-filter model, with the filter (spectral envelope) kept constant [Roebel and Rodet, 2005].
- A powerful parametric representation, over a raw waveform or spectrogram, has the potential to achieve high quality with less training data.
  1. [Blaauw and Bonada, 2016] recognized this in the context of speech synthesis and used a vocoder representation to train a generative model, achieving promising results.
Dataset

- Good-sounds dataset [Romani Picas et al., 2015]: individual note and scale recordings for 12 different instruments.
- We work with the 'Violin', played at mezzo-forte loudness, and choose the 4th octave (MIDI 60–71).
- Why we chose the violin: popular in Indian music, a human voice-like timbre, and the ability to produce continuous pitch.

[Figure: Instances per note in the overall dataset — between 26 and 34 instances per MIDI note]

- The average note duration for the chosen octave is about 4.5 s per note.
Dataset

- We split the data into train (80%) and test (20%) instances across MIDI note labels.
- Our model is trained with frames (duration 21.3 ms) from the train instances, and system performance is evaluated with frames from the test instances.
Non-Parametric Reconstruction

- The frame-wise magnitude spectra reconstruction procedure of [Roche et al., 2018] cannot generalize to untrained pitches.
- We consider two cases: training including and excluding MIDI 63 along with its 3 neighbouring pitches on either side.

[Diagram: sustain segment → FFT → Encoder → Z → Decoder (AE)]

- The above is repeated across all input spectral frames, and the resulting magnitude spectrogram is inverted using Griffin-Lim [Griffin and Lim, 1984], as sketched below.
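As a concrete illustration of this baseline, a minimal sketch of the analysis/inversion round trip, assuming librosa is available (the file name, FFT size and hop length are illustrative, not necessarily the values used in the work):

import numpy as np
import librosa

# Load a note recording (hypothetical file); keep the native sampling rate.
y, fs = librosa.load("violin_midi63.wav", sr=None)
S = np.abs(librosa.stft(y, n_fft=1024, hop_length=256))

# An AE would encode/decode each magnitude frame here; S is passed through
# unchanged as a stand-in for the decoder output.
S_hat = S

# Griffin-Lim iteratively estimates the missing phase and inverts the STFT.
y_hat = librosa.griffinlim(S_hat, n_iter=60, hop_length=256)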
Non-Parametric Reconstruction

[Figure: Input MIDI 63 (audio example 1)]
[Figure: Reconstruction with MIDI 63 included in training (audio example 2)] [Figure: Reconstruction with MIDI 63 excluded from training (audio example 3)]
Parametric Model

1. Frame-wise magnitude spectrum → harmonic representation using the Harmonic plus Residual (HpR) model [Serra et al., 1997] (currently we neglect the residual).

[Diagram: sustain segment → FFT → HpR → f0, harmonic magnitudes]

- Output of the HpR block ⇒ log-dB magnitudes + harmonics; a sketch of this sampling step follows.
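A minimal sketch of the harmonic-sampling step (nearest-bin lookup stands in for the peak interpolation of a full HpR analysis; the function name and harmonic count are illustrative assumptions):

import numpy as np

def harmonic_log_magnitudes(mag_frame, f0, fs, n_fft, n_harm=40):
    # Sample the FFT magnitude frame at integer multiples of f0,
    # keeping only harmonics below the Nyquist frequency.
    freqs = np.arange(1, n_harm + 1) * f0
    freqs = freqs[freqs < fs / 2]
    bins = np.round(freqs * n_fft / fs).astype(int)
    mags_db = 20 * np.log10(np.maximum(mag_frame[bins], 1e-10))
    return freqs, mags_db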
Parametric Model

2. log-dB magnitudes + harmonics → TAE algorithm [Roebel and Rodet, 2005, Imai, 1979].
- TAE ⇒ iterative cepstral liftering:

  A_0(k) = log |X(k)|,  V_0 = −∞
  while (A_i − A_0 < Δ):
      A_i(k) = max(A_{i−1}(k), V_{i−1}(k))
      V_i = FFT(lifter(A_i))

[Diagram: f0, harmonic magnitudes → TAE → CCs, f0]
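A minimal NumPy sketch of the iterative cepstral liftering above (the spectrum mirroring, the stopping rule and the Δ value are assumptions filled in around the slide's pseudo-code):

import numpy as np

def cepstral_smooth(A_half, n_ccs):
    # Mirror the half log-spectrum, low-pass lifter its cepstrum,
    # and return the smoothed half spectrum V.
    A_full = np.concatenate([A_half, A_half[-2:0:-1]])
    c = np.fft.ifft(A_full).real
    c[n_ccs:len(c) - n_ccs + 1] = 0.0   # keep low-quefrency coefficients
    return np.fft.fft(c).real[: len(A_half)]

def true_amplitude_envelope(A0, n_ccs, delta=0.1, max_iter=200):
    # A_i = max(A_{i-1}, V_{i-1}); V_i = cepstral smoothing of A_i.
    # Iterate until the envelope lies within delta of the spectral peaks.
    A, V = A0.copy(), np.full_like(A0, -np.inf)
    for _ in range(max_iter):
        A = np.maximum(A, V)
        V = cepstral_smooth(A, n_ccs)
        if np.all(V >= A0 - delta):
            break
    return V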
Parametric Model

- No open-source implementation is available for the TAE, so we implemented it following the procedure highlighted in [Roebel and Rodet, 2005, Caetano and Rodet, 2012].

[Figure: TAE snapshot from [Caetano and Rodet, 2012]]

- Similar to the results we get (audio examples 4, 5).
Parametric Model

[Figure: TAE spectral envelopes for MIDI 60, 65 and 71 — magnitude (dB) versus frequency (kHz)]

- The spectral envelope shape varies across pitch:
  1. Dependence of the envelope on pitch [Slawson, 1981, Caetano and Rodet, 2012].
  2. Variation due to the TAE algorithm.
- The TAE → a smooth function to estimate the harmonic amplitudes.
Parametric Model

- The number of CCs (cepstral coefficients, K_cc) depends on the sampling rate (F_s) and the pitch (f_0):

  K_cc ≤ F_s / (2 f_0)

- For our network we choose the maximum K_cc (lowest pitch) = 91 and zero-pad the higher-pitch CC vectors to 91-dimensional vectors.
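In code, assuming a 48 kHz sampling rate (an assumption consistent with K_cc = 91 at MIDI 60, though the slides do not state F_s):

import numpy as np

fs = 48000                              # assumed sampling rate
f0_low = 440.0 * 2 ** ((60 - 69) / 12)  # MIDI 60 ≈ 261.63 Hz, lowest pitch used
K_max = int(fs / (2 * f0_low))          # K_cc <= F_s / (2 f_0)  ->  91

def pad_ccs(ccs, k_max=91):
    # Zero-pad the shorter CC vectors of higher pitches to the fixed length.
    out = np.zeros(k_max)
    out[: len(ccs)] = ccs
    return out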
Generative Models

- Autoencoders [Hinton and Salakhutdinov, 2006]: minimize the MSE (mean squared error) between the input and the network's reconstruction.
  - Simple to train, and perform good reconstruction (since they minimize the MSE).
  - Not truly generative, as you cannot generate new data.
- Variational Autoencoders [Kingma and Welling, 2013]: inspired by variational inference; enforce a prior on the latent space.
  - Truly generative, as we can 'generate' new data by sampling from the prior.
- Conditional Variational Autoencoders [Doersch, 2016, Sohn et al., 2015]: same principle as a VAE, but learn the distribution conditioned on an additional conditioning variable.
Generative Models

Why a VAE over an AE?
A continuous latent space from which we can sample points (and synthesize the corresponding audio).

Why a CVAE over a VAE?
Conditioning on pitch ⇒ the network captures the dependencies between timbre and pitch ⇒ more accurate envelope generation + pitch control.
Network Architecture

[Diagram: CCs, f0 → Encoder → Z → Decoder (CVAE)]

- The network input is the CCs → the MSE represents a perceptually relevant distance: the squared error between the input and reconstructed log magnitude spectral envelopes.
- We train the network on frames from the train instances.
- For evaluation, the MSE values reported ahead are the average reconstruction error across all test-instance frames.
Network Architecture

- Main hyperparameters:
  1. β controls the relative weighting between reconstruction and prior enforcement:

     L ∝ MSE + β · KLD

[Figure: MSE (·10^−5) versus latent space dimensionality for β = 0.01, 0.1, 1 (CVAE)]

- There is a tradeoff between the two terms; we choose β = 0.1 (see the sketch below).
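A sketch of this loss in PyTorch, assuming the usual Gaussian-posterior outputs mu and logvar (the mean reduction is an assumption):

import torch
import torch.nn.functional as F

def cvae_loss(x_hat, x, mu, logvar, beta=0.1):
    # L ∝ MSE + β·KLD, with the closed-form KL divergence between
    # N(mu, diag(exp(logvar))) and the standard normal prior.
    mse = F.mse_loss(x_hat, x)
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return mse + beta * kld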
Network Architecture

- Main hyperparameters:
  2. Dimensionality of the latent space governs the network's reconstruction ability.

[Figure: MSE (·10^−5) versus latent space dimensionality, CVAE (β = 0.1) versus AE]

- Steep fall initially, flatter later; we choose dimensionality = 32.
Network Architecture

- Network size: [91, 91, 32, 91, 91].
  - 91 is the dimension of the input CCs; 32 is the latent space dimensionality.
- Linear fully connected layers + Leaky ReLU activations.
- ADAM [Kingma and Ba, 2014] with initial learning rate 10^−3.
- Training for 2000 epochs with batch size 512; a sketch follows.
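Assembling the stated sizes, a hedged PyTorch sketch of the CVAE (how f0 is encoded for conditioning is not specified on the slides; a scalar conditioning input is assumed here):

import torch
import torch.nn as nn

class CVAE(nn.Module):
    # [91 -> 91 -> 32 -> 91 -> 91], linear layers + Leaky ReLU,
    # with the pitch f0 appended to both encoder and decoder inputs.
    def __init__(self, n_in=91, n_hid=91, n_lat=32, n_cond=1):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_in + n_cond, n_hid), nn.LeakyReLU())
        self.mu = nn.Linear(n_hid, n_lat)
        self.logvar = nn.Linear(n_hid, n_lat)
        self.dec = nn.Sequential(
            nn.Linear(n_lat + n_cond, n_hid),
            nn.LeakyReLU(),
            nn.Linear(n_hid, n_in),
        )

    def forward(self, x, f0):
        h = self.enc(torch.cat([x, f0], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return self.dec(torch.cat([z, f0], dim=-1)), mu, logvar

model = CVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)  # ADAM, initial lr 10^-3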
Experiments

- Two kinds of experiments demonstrate the network's capabilities:
  1. Reconstruction: omit pitch instances during training and see how well the model reconstructs notes of the omitted target pitch.
  2. Generation: how well the model 'synthesizes' note instances with new, unseen pitches.
Reconstruction

- Two training contexts:

  1. T (×) is the target; ✓ marks the training instances:

     MIDI:  T−3  T−2  T−1  T  T+1  T+2  T+3
     Kept:   ✓    ✓    ✓   ×   ✓    ✓    ✓

  2. Octave endpoints:

     MIDI:  60  61  62  63  64  65  66  67  68  69  70  71
     Kept:   ✓   ×   ×   ×   ×   ×   ×   ×   ×   ×   ×   ✓

- In each case we compute the MSE as the frame-wise spectral envelope match across all frames of all the target instances.
Reconstruction

[Figure: MSE (·10^−2) versus pitch for the AE and the CVAE, octave-endpoint training]

- Both the AE and the CVAE reasonably reconstruct the target pitch (audio examples 6–8).
- The CVAE produces a better reconstruction, especially when the target pitch is far from the pitches available in the training data (plot above; audio examples 9–11).
- Conditioning helps capture the pitch dependency of the spectral envelope more accurately.
Reconstruction

- To emulate the effect of pitch conditioning with an AE, we train the AE by appending the pitch to the input CCs and reconstructing this appended input, as sketched below.

[Diagram: (CCs, f0) → cAE → (CCs′, f0′) — the 'conditional' AE (cAE)]

- [Wyse, 2018] followed a similar approach of appending the conditional variables to the input of his model.
- The cAE is comparable to our proposed CVAE in that the network might potentially learn something from f0.
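A sketch of this baseline under the same assumptions (a plain autoencoder over the 92-dimensional appended vector; the layer sizes mirror the CVAE sketch and are assumptions):

import torch
import torch.nn as nn

class CAE(nn.Module):
    # Plain AE over [CCs, f0]: the appended input is compressed and
    # reconstructed, with no explicit conditioning mechanism.
    def __init__(self, n_in=92, n_hid=91, n_lat=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_in, n_hid), nn.LeakyReLU(),
                                 nn.Linear(n_hid, n_lat))
        self.dec = nn.Sequential(nn.Linear(n_lat, n_hid), nn.LeakyReLU(),
                                 nn.Linear(n_hid, n_in))

    def forward(self, x_and_f0):
        return self.dec(self.enc(x_and_f0))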
Reconstruction

[Figure: MSE (·10^−2) versus pitch for the AE, the CVAE and the cAE]

- We train the cAE on the octave endpoints and evaluate its performance as above.
- Appending f0 does not improve performance; it is in fact worse than the plain AE.
Generation

- We are interested in 'synthesizing' audio.
- We examine how well the network can generate an instance of a desired pitch (one the network has not been trained on).

[Figure: Sampling from the network — Z → Decoder, conditioned on f0]

- We follow the procedure in [Blaauw and Bonada, 2016]: a random walk to sample points coherently from the latent space (see the sketch below).
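A sketch of that latent random walk (the step size is an assumption; the trained decoder from the CVAE sketch would consume the resulting latents together with the target f0):

import torch

def random_walk_latents(n_frames, n_lat=32, step=0.05):
    # Start from a prior sample and take small Gaussian steps so that
    # consecutive frames decode to coherent spectral envelopes.
    z = torch.zeros(n_frames, n_lat)
    z[0] = torch.randn(n_lat)
    for t in range(1, n_frames):
        z[t] = z[t - 1] + step * torch.randn(n_lat)
    return z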
Generation

- We train on instances across the entire octave except MIDI 65, and then generate MIDI 65.
- On listening, the generated audio still lacks the soft, noisy sound of the violin bowing (audio examples 12–14).
- For a more realistic synthesis, we also generate a violin note with vibrato (audio example 15).
Putting it all together

- We explored autoencoder frameworks in generative models for the audio synthesis of instrumental tones.
- Our parametric representation decouples 'timbre' and 'pitch', relying on the network to model their inter-dependencies.
- Pitch conditioning allows us to generate the learnt spectral envelope for a given pitch, enabling us to vary the pitch contour continuously.
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3136
References I
[Blaauw and Bonada 2016] Blaauw M and Bonada J (2016)Modeling and transforming speech using variational autoencodersIn Interspeech pages 1770ndash1774
[Caetano and Rodet 2012] Caetano M and Rodet X (2012)A source-filter model for musical instrument sound transformationIn 2012 IEEE International Conference on Acoustics Speech and Signal Processing(ICASSP) pages 137ndash140 IEEE
[Doersch 2016] Doersch C (2016)Tutorial on variational autoencodersarXiv preprint arXiv160605908
[Engel et al 2017] Engel J Resnick C Roberts A Dieleman S Norouzi MEck D and Simonyan K (2017)Neural audio synthesis of musical notes with wavenet autoencodersIn Proceedings of the 34th International Conference on Machine Learning-Volume70 pages 1068ndash1077 JMLR org
[Esling et al 2018] Esling P Bitton A et al (2018)Generative timbre spaces regularizing variational auto-encoders with perceptualmetricsarXiv preprint arXiv180508501
3236
References II
[Griffin and Lim 1984] Griffin D and Lim J (1984)Signal estimation from modified short-time fourier transformIEEE Transactions on Acoustics Speech and Signal Processing 32(2)236ndash243
[Hinton and Salakhutdinov 2006] Hinton G E and Salakhutdinov R R (2006)Reducing the dimensionality of data with neural networksscience 313(5786)504ndash507
[IMAI 1979] IMAI S (1979)Spectral envelope extraction by improved cepstrumIEICE 62217ndash228
[Kingma and Ba 2014] Kingma D P and Ba J (2014)Adam A method for stochastic optimizationarXiv preprint arXiv14126980
[Kingma and Welling 2013] Kingma D P and Welling M (2013)Auto-encoding variational bayesarXiv preprint arXiv13126114
[Oord et al 2016] Oord A v d Dieleman S Zen H Simonyan K Vinyals OGraves A Kalchbrenner N Senior A and Kavukcuoglu K (2016)Wavenet A generative model for raw audioarXiv preprint arXiv160903499
3336
References III
[Roche et al 2018] Roche F Hueber T Limier S and Girin L (2018)Autoencoders for music sound modeling a comparison of linear shallow deeprecurrent and variational modelsarXiv preprint arXiv180604096
[Roebel and Rodet 2005] Roebel A and Rodet X (2005)Efficient Spectral Envelope Estimation and its application to pitch shifting andenvelope preservationIn International Conference on Digital Audio Effects pages 30ndash35 Madrid Spaincote interne IRCAM Roebel05b
[Romani Picas et al 2015] Romani Picas O Parra Rodriguez H Dabiri DTokuda H Hariya W Oishi K and Serra X (2015)A real-time system for measuring sound goodness in instrumental soundsIn Audio Engineering Society Convention 138 Audio Engineering Society
[Sarroff and Casey 2014] Sarroff A M and Casey M A (2014)Musical audio synthesis using autoencoding neural netsIn ICMC
[Serra et al 1997] Serra X et al (1997)Musical sound modeling with sinusoids plus noiseMusical signal processing pages 91ndash122
3436
References IV
[Slawson 1981] Slawson W (1981)The color of sound a theoretical study in musical timbreMusic Theory Spectrum 3132ndash141
[Sohn et al 2015] Sohn K Lee H and Yan X (2015)Learning structured output representation using deep conditional generative models
In Advances in neural information processing systems pages 3483ndash3491
[Wyse 2018] Wyse L (2018)Real-valued parametric conditioning of an rnn for interactive sound synthesisarXiv preprint arXiv180510808
3536
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model Reconstruction(trained on MIDI63)
3 Spectral Model Reconstruction(not trained on MIDI63)
4 Input MIDI 60 note to Parametric Model
5 Parametric Reconstruction of input note
6 Input MIDI 63 Note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note(endpoint trained model)
10 Parametric AE reconstruction of input(endpoint trained model)
11 Parametric CVAE reconstruction of input(endpoint trained model)
12 CVAE Generated MIDI 65 Violin note
13 Similar MIDI 65 Violin note from dataset
3636
Audio examples description II
14 CVAE Reconstruction of the MIDI 65 violin note
15 CVAE Generated MIDI 65 Violin note with vibrato
636
Why Parametric
I Rather than generating new timbres(lsquointerpolatingrsquo acrosssounds) we consider the problem of synthesis of a giveninstrument sound with flexible control over the pitch andloudness dynamics
I Pitch shifting without timbre modification uses a source-filtermodel with the filter(spectral envelope) being kept constant[Roebel and Rodet 2005]
I A powerful parametric representation over raw waveform orspectrogram has the potential to achieve high quality with lesstraining data
1 [Blaauw and Bonada 2016] recognized this in context ofspeech synthesis and used a vocoder representation to train agenerative model achieving promising results along the way
736
Dataset
I Good-sounds dataset [Romani Picas et al 2015] consisting ofindividual note and scale recordings for 12 different instruments
I We work with the lsquoViolinrsquo played in mezzo-forte loudnessand choose the 4th octave(MIDI 60-71)
0 5 10 15 20 25 30 35
606162636465666768697071
282626262626
343030
262626
Instances
MID
I
Figure Instances per note in the overall dataset
I Average note duration for the chosen octave is about 45s pernote
736
Dataset
I Good-sounds dataset [Romani Picas et al 2015] consisting ofindividual note and scale recordings for 12 different instruments
I We work with the lsquoViolinrsquo played in mezzo-forte loudnessand choose the 4th octave(MIDI 60-71)
Why we chose ViolinPopular in Indian Music Human voice-like timbre
Ability to produce continuous pitch
0 5 10 15 20 25 30 35
606162636465666768697071
282626262626
343030
262626
Instances
MID
I
Figure Instances per note in the overall dataset
I Average note duration for the chosen octave is about 45s pernote
836
Dataset
I We split the data to train(80) and test(20) instances acrossMIDI note labels
I Our model is trained with frames(duration 213ms) from thetrain instances and system performance is evaluated withframes from the test instances
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
1036
Non-Parametric Reconstruction
Figure Input MIDI 63 1 1
Figure Including MIDI 63 2 2 Figure Excluding MIDI 63 3 3
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton1)ocgs[i]state=false
1728
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton3)ocgs[i]state=false
1728
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton5)ocgs[i]state=false
1728
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1336
Parametric Model
I No open source implementation available for the TAE thus weimplemented it following procedure highlighted in[Roebel and Rodet 2005 Caetano and Rodet 2012]
I Figure shows a TAE snap from [Caetano and Rodet 2012]
I Similar to results we get 1 4 2 5
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton6)ocgs[i]state=false
3216
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton7)ocgs[i]state=false
2256
1436
Parametric Model
0 1 2 3 4 5minus80
minus60
minus40
minus20
Frequency(kHz)Magnitude(dB)
606571
I Spectral envelope shape varies across pitch
1 Dependence of envelope on pitch[Slawson 1981 Caetano and Rodet 2012]
2 Variation due the TAE algorithm
I TAE rarr smooth function to estimate harmonic amplitudes
1536
Parametric Model
I No of CCs(Cepstral CoefficientsKcc) depends on Samplingrate(Fs)pitch(f0)
Kcc leFs2f0
I For our network we choose the maximum Kcc(lowest pitch) =91 and zero pad the high pitch Kcc to 91 dimension vectors
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1836
Network Architecture
CCs
f0
En
cod
er
Z
Dec
od
er
CVAE
I Network input is CCs rarr MSE represents perceptually relevantdistance in terms of squared error between the input andreconstructed log magnitude spectral envelopes
I We train the network on frames from the train instances
I For evaluation MSE values calculated ahead is the averagereconstruction error across all the test instance frames
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
L prop MSE + βKLD
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2136
Network Architecture
I Network size [91 91 32 91 91]I 91 is the dimension of input CCs 32 is latent space
dimensionality
I Linear Fully Connected Layers + Leaky ReLU activations
I ADAM [Kingma and Ba 2014] with initial lr = 10minus3
I Training for 2000 epochs with batch size 512
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2436
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAE
I Both AE and CVAE reasonably reconstruct the target pitch1 6 2 7 3 8
I CVAE produces better reconstruction especially when thetarget pitch is far from the pitches available in the trainingdata(plot above) 4 9 5 10 6 11
I Conditioning helps to capture the pitch dependency of thespectral envelope more accurately
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton8)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton9)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton10)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton11)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton12)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton13)ocgs[i]state=false
1536
2536
Reconstruction
I To emulate the effect of pitch conditioning with an AE wetrain the AE by appending the pitch to the input CCs andreconstructing this appended input as shown below
f0
CCscAE
fprime
0
CCsprime
Figure lsquoConditionalrsquo AE(cAE)
I [Wyse 2018] followed a similar approach of appending theconditional variables to the input of his model
I the lsquocAErsquo is comparable to our proposed CVAE in that thenetwork might potentially learn something from f0
2636
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAEcAE
I We train the cAE on the octave endpoints and evaluateperformance as shown above
I Appending f0 does not improve performance It is infact worsethan AE
2736
Generation
I We are interested in lsquosynthesizingrsquo audio
I We see how well the network can generate an instance of adesired pitch(which the network has not been trained on)
Z
Dec
od
er
f0
Figure Sampling from the Network
I We follow the procedure in [Blaauw and Bonada 2016] - arandom walk to sample points coherently from the latent space
2836
Generation
I We train on instances across the entire octave sans MIDI 65and then generate MIDI 65
I On listening we can see that the generated audio still lacks thesoft noisy sound of the violin bowing
1 12 2 13 3 14
I For a more realistic synthesis we generate a violin note withvibrato 4 15
Putting it all together

• We explored autoencoder frameworks in generative models for audio synthesis of instrumental tones.
• Our parametric representation decouples 'timbre' and 'pitch', relying on the network to model their inter-dependencies.
• Pitch conditioning allows us to generate the learnt spectral envelope for a given pitch, enabling us to vary the pitch contour continuously (the synthesis step is sketched below).
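• As an illustration of that final step, a minimal sketch that samples a decoded envelope at harmonics of f0 and renders one frame additively; the frame length and harmonic count are our choices, and the residual is omitted, as in the current model:

import numpy as np

def synth_frame(envelope_db, f0, fs=44100, frame_len=1024, n_harm=40):
    # Sample the decoded log-magnitude envelope at harmonics of f0,
    # convert to linear amplitudes, and sum sinusoids for one frame.
    freqs = f0 * np.arange(1, n_harm + 1)
    freqs = freqs[freqs < fs / 2.0]                    # keep below Nyquist
    bin_freqs = np.linspace(0.0, fs / 2.0, len(envelope_db))
    amps = 10.0 ** (np.interp(freqs, bin_freqs, envelope_db) / 20.0)
    t = np.arange(frame_len) / fs
    return sum(a * np.cos(2.0 * np.pi * f * t) for a, f in zip(amps, freqs))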
Putting it all together

• Our model is still far from perfect, however, and we have to improve it further:
  1. An immediate step is to model the residual of the input audio; this will make the generated audio sound more natural.
  2. We have not taken dynamics into account. A more complete system will capture spectral envelope (timbral) dependencies on both pitch and loudness dynamics.
  3. Conducting more formal listening tests, involving synthesis of larger pitch movements or melodic elements from Indian Classical music.

• Our contributions:
  1. To the best of our knowledge, we have not come across any work using a parametric model for musical tones in the neural synthesis framework, especially one exploiting the conditioning function of the CVAE.
  2. The TAE method has no known Python implementation, so we plan to make our code open-source to the MIR community for research (a minimal sketch follows below).
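• For reference, a minimal NumPy sketch of the iterative cepstral liftering (TAE) loop described earlier in the deck, with a fixed iteration count standing in for the convergence threshold ∆:

import numpy as np

def true_amplitude_envelope(log_mag, k_cc, n_iter=50):
    # Iterative cepstral liftering: grow a smooth envelope V that lies on
    # or above the harmonic peaks of the frame's log-magnitude spectrum.
    # log_mag: half-spectrum log magnitudes; k_cc: cepstral order (e.g. 91).
    A = log_mag.copy()
    V = np.full_like(A, -np.inf)
    for _ in range(n_iter):
        A = np.maximum(A, V)          # A_i(k) = max(A_{i-1}(k), V_{i-1}(k))
        ceps = np.fft.irfft(A)        # cepstrum of the current curve
        ceps[k_cc:-k_cc] = 0.0        # lifter: zero high-quefrency bins
        V = np.fft.rfft(ceps).real    # V_i = FFT(lifter(A_i))
    return V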
References

[Blaauw and Bonada 2016] Blaauw, M. and Bonada, J. (2016). Modeling and transforming speech using variational autoencoders. In Interspeech, pages 1770-1774.
[Caetano and Rodet 2012] Caetano, M. and Rodet, X. (2012). A source-filter model for musical instrument sound transformation. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 137-140. IEEE.
[Doersch 2016] Doersch, C. (2016). Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908.
[Engel et al 2017] Engel, J., Resnick, C., Roberts, A., Dieleman, S., Norouzi, M., Eck, D., and Simonyan, K. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1068-1077. JMLR.org.
[Esling et al 2018] Esling, P., Bitton, A., et al. (2018). Generative timbre spaces: regularizing variational auto-encoders with perceptual metrics. arXiv preprint arXiv:1805.08501.
[Griffin and Lim 1984] Griffin, D. and Lim, J. (1984). Signal estimation from modified short-time Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2):236-243.
[Hinton and Salakhutdinov 2006] Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504-507.
[Imai 1979] Imai, S. (1979). Spectral envelope extraction by improved cepstrum. IEICE, 62:217-228.
[Kingma and Ba 2014] Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[Kingma and Welling 2013] Kingma, D. P. and Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
[Oord et al 2016] Oord, A. v. d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., and Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.
[Roche et al 2018] Roche, F., Hueber, T., Limier, S., and Girin, L. (2018). Autoencoders for music sound modeling: a comparison of linear, shallow, deep, recurrent and variational models. arXiv preprint arXiv:1806.04096.
[Roebel and Rodet 2005] Roebel, A. and Rodet, X. (2005). Efficient spectral envelope estimation and its application to pitch shifting and envelope preservation. In International Conference on Digital Audio Effects, pages 30-35, Madrid, Spain.
[Romani Picas et al 2015] Romani Picas, O., Parra Rodriguez, H., Dabiri, D., Tokuda, H., Hariya, W., Oishi, K., and Serra, X. (2015). A real-time system for measuring sound goodness in instrumental sounds. In Audio Engineering Society Convention 138. Audio Engineering Society.
[Sarroff and Casey 2014] Sarroff, A. M. and Casey, M. A. (2014). Musical audio synthesis using autoencoding neural nets. In ICMC.
[Serra et al 1997] Serra, X. et al. (1997). Musical sound modeling with sinusoids plus noise. Musical Signal Processing, pages 91-122.
[Slawson 1981] Slawson, W. (1981). The color of sound: a theoretical study in musical timbre. Music Theory Spectrum, 3:132-141.
[Sohn et al 2015] Sohn, K., Lee, H., and Yan, X. (2015). Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems, pages 3483-3491.
[Wyse 2018] Wyse, L. (2018). Real-valued parametric conditioning of an RNN for interactive sound synthesis. arXiv preprint arXiv:1805.10808.
Audio examples description

1. Input MIDI 63 to Spectral Model
2. Spectral Model reconstruction (trained on MIDI 63)
3. Spectral Model reconstruction (not trained on MIDI 63)
4. Input MIDI 60 note to Parametric Model
5. Parametric reconstruction of input note
6. Input MIDI 63 note
7. Parametric AE reconstruction of input
8. Parametric CVAE reconstruction of input
9. Input MIDI 65 note (endpoint-trained model)
10. Parametric AE reconstruction of input (endpoint-trained model)
11. Parametric CVAE reconstruction of input (endpoint-trained model)
12. CVAE-generated MIDI 65 violin note
13. Similar MIDI 65 violin note from dataset
14. CVAE reconstruction of the MIDI 65 violin note
15. CVAE-generated MIDI 65 violin note with vibrato
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2436
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAE
I Both AE and CVAE reasonably reconstruct the target pitch1 6 2 7 3 8
I CVAE produces better reconstruction especially when thetarget pitch is far from the pitches available in the trainingdata(plot above) 4 9 5 10 6 11
I Conditioning helps to capture the pitch dependency of thespectral envelope more accurately
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton8)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton9)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton10)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton11)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton12)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton13)ocgs[i]state=false
1536
2536
Reconstruction
I To emulate the effect of pitch conditioning with an AE wetrain the AE by appending the pitch to the input CCs andreconstructing this appended input as shown below
f0
CCscAE
fprime
0
CCsprime
Figure lsquoConditionalrsquo AE(cAE)
I [Wyse 2018] followed a similar approach of appending theconditional variables to the input of his model
I the lsquocAErsquo is comparable to our proposed CVAE in that thenetwork might potentially learn something from f0
2636
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAEcAE
I We train the cAE on the octave endpoints and evaluateperformance as shown above
I Appending f0 does not improve performance It is infact worsethan AE
2736
Generation
I We are interested in lsquosynthesizingrsquo audio
I We see how well the network can generate an instance of adesired pitch(which the network has not been trained on)
Z
Dec
od
er
f0
Figure Sampling from the Network
I We follow the procedure in [Blaauw and Bonada 2016] - arandom walk to sample points coherently from the latent space
2836
Generation
I We train on instances across the entire octave sans MIDI 65and then generate MIDI 65
I On listening we can see that the generated audio still lacks thesoft noisy sound of the violin bowing
1 12 2 13 3 14
I For a more realistic synthesis we generate a violin note withvibrato 4 15
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton14)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton15)ocgs[i]state=false
2256
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton16)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton17)ocgs[i]state=false
2184
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3136
References I
[Blaauw and Bonada 2016] Blaauw M and Bonada J (2016)Modeling and transforming speech using variational autoencodersIn Interspeech pages 1770ndash1774
[Caetano and Rodet 2012] Caetano M and Rodet X (2012)A source-filter model for musical instrument sound transformationIn 2012 IEEE International Conference on Acoustics Speech and Signal Processing(ICASSP) pages 137ndash140 IEEE
[Doersch 2016] Doersch C (2016)Tutorial on variational autoencodersarXiv preprint arXiv160605908
[Engel et al 2017] Engel J Resnick C Roberts A Dieleman S Norouzi MEck D and Simonyan K (2017)Neural audio synthesis of musical notes with wavenet autoencodersIn Proceedings of the 34th International Conference on Machine Learning-Volume70 pages 1068ndash1077 JMLR org
[Esling et al 2018] Esling P Bitton A et al (2018)Generative timbre spaces regularizing variational auto-encoders with perceptualmetricsarXiv preprint arXiv180508501
3236
References II
[Griffin and Lim 1984] Griffin D and Lim J (1984)Signal estimation from modified short-time fourier transformIEEE Transactions on Acoustics Speech and Signal Processing 32(2)236ndash243
[Hinton and Salakhutdinov 2006] Hinton G E and Salakhutdinov R R (2006)Reducing the dimensionality of data with neural networksscience 313(5786)504ndash507
[IMAI 1979] IMAI S (1979)Spectral envelope extraction by improved cepstrumIEICE 62217ndash228
[Kingma and Ba 2014] Kingma D P and Ba J (2014)Adam A method for stochastic optimizationarXiv preprint arXiv14126980
[Kingma and Welling 2013] Kingma D P and Welling M (2013)Auto-encoding variational bayesarXiv preprint arXiv13126114
[Oord et al 2016] Oord A v d Dieleman S Zen H Simonyan K Vinyals OGraves A Kalchbrenner N Senior A and Kavukcuoglu K (2016)Wavenet A generative model for raw audioarXiv preprint arXiv160903499
3336
References III
[Roche et al 2018] Roche F Hueber T Limier S and Girin L (2018)Autoencoders for music sound modeling a comparison of linear shallow deeprecurrent and variational modelsarXiv preprint arXiv180604096
[Roebel and Rodet 2005] Roebel A and Rodet X (2005)Efficient Spectral Envelope Estimation and its application to pitch shifting andenvelope preservationIn International Conference on Digital Audio Effects pages 30ndash35 Madrid Spaincote interne IRCAM Roebel05b
[Romani Picas et al 2015] Romani Picas O Parra Rodriguez H Dabiri DTokuda H Hariya W Oishi K and Serra X (2015)A real-time system for measuring sound goodness in instrumental soundsIn Audio Engineering Society Convention 138 Audio Engineering Society
[Sarroff and Casey 2014] Sarroff A M and Casey M A (2014)Musical audio synthesis using autoencoding neural netsIn ICMC
[Serra et al 1997] Serra X et al (1997)Musical sound modeling with sinusoids plus noiseMusical signal processing pages 91ndash122
3436
References IV
[Slawson 1981] Slawson W (1981)The color of sound a theoretical study in musical timbreMusic Theory Spectrum 3132ndash141
[Sohn et al 2015] Sohn K Lee H and Yan X (2015)Learning structured output representation using deep conditional generative models
In Advances in neural information processing systems pages 3483ndash3491
[Wyse 2018] Wyse L (2018)Real-valued parametric conditioning of an rnn for interactive sound synthesisarXiv preprint arXiv180510808
3536
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model Reconstruction(trained on MIDI63)
3 Spectral Model Reconstruction(not trained on MIDI63)
4 Input MIDI 60 note to Parametric Model
5 Parametric Reconstruction of input note
6 Input MIDI 63 Note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note(endpoint trained model)
10 Parametric AE reconstruction of input(endpoint trained model)
11 Parametric CVAE reconstruction of input(endpoint trained model)
12 CVAE Generated MIDI 65 Violin note
13 Similar MIDI 65 Violin note from dataset
3636
Audio examples description II
14 CVAE Reconstruction of the MIDI 65 violin note
15 CVAE Generated MIDI 65 Violin note with vibrato
736
Dataset
I Good-sounds dataset [Romani Picas et al 2015] consisting ofindividual note and scale recordings for 12 different instruments
I We work with the lsquoViolinrsquo played in mezzo-forte loudnessand choose the 4th octave(MIDI 60-71)
Why we chose ViolinPopular in Indian Music Human voice-like timbre
Ability to produce continuous pitch
0 5 10 15 20 25 30 35
606162636465666768697071
282626262626
343030
262626
Instances
MID
I
Figure Instances per note in the overall dataset
I Average note duration for the chosen octave is about 45s pernote
836
Dataset
I We split the data to train(80) and test(20) instances acrossMIDI note labels
I Our model is trained with frames(duration 213ms) from thetrain instances and system performance is evaluated withframes from the test instances
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
1036
Non-Parametric Reconstruction
Figure Input MIDI 63 1 1
Figure Including MIDI 63 2 2 Figure Excluding MIDI 63 3 3
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton1)ocgs[i]state=false
1728
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton3)ocgs[i]state=false
1728
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton5)ocgs[i]state=false
1728
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1336
Parametric Model
I No open source implementation available for the TAE thus weimplemented it following procedure highlighted in[Roebel and Rodet 2005 Caetano and Rodet 2012]
I Figure shows a TAE snap from [Caetano and Rodet 2012]
I Similar to results we get 1 4 2 5
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton6)ocgs[i]state=false
3216
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton7)ocgs[i]state=false
2256
1436
Parametric Model
0 1 2 3 4 5minus80
minus60
minus40
minus20
Frequency(kHz)Magnitude(dB)
606571
I Spectral envelope shape varies across pitch
1 Dependence of envelope on pitch[Slawson 1981 Caetano and Rodet 2012]
2 Variation due the TAE algorithm
I TAE rarr smooth function to estimate harmonic amplitudes
1536
Parametric Model
I No of CCs(Cepstral CoefficientsKcc) depends on Samplingrate(Fs)pitch(f0)
Kcc leFs2f0
I For our network we choose the maximum Kcc(lowest pitch) =91 and zero pad the high pitch Kcc to 91 dimension vectors
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1836
Network Architecture
CCs
f0
En
cod
er
Z
Dec
od
er
CVAE
I Network input is CCs rarr MSE represents perceptually relevantdistance in terms of squared error between the input andreconstructed log magnitude spectral envelopes
I We train the network on frames from the train instances
I For evaluation MSE values calculated ahead is the averagereconstruction error across all the test instance frames
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
L prop MSE + βKLD
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2136
Network Architecture
I Network size [91 91 32 91 91]I 91 is the dimension of input CCs 32 is latent space
dimensionality
I Linear Fully Connected Layers + Leaky ReLU activations
I ADAM [Kingma and Ba 2014] with initial lr = 10minus3
I Training for 2000 epochs with batch size 512
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
Reconstruction
I Two training contexts (see the sketch below):
1 Immediate neighbours — T (×) is the target, X marks training instances:
MIDI: T−3 T−2 T−1 T T+1 T+2 T+3
Kept:  X   X   X  ×  X   X   X
2 Octave endpoints:
MIDI: 60 61 62 63 64 65 66 67 68 69 70 71
Kept:  X  ×  ×  ×  ×  ×  ×  ×  ×  ×  ×  X
I In each of the above cases, we compute the MSE as the frame-wise spectral envelope match across all frames of all the target instances.
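A small sketch of how these two train/target splits over the octave might be constructed (the helpers are illustrative, not from the paper's code):

```python
def neighbour_context(target, k=3):
    """Train on the k pitches on either side of the target; hold the target out."""
    train = [p for p in range(target - k, target + k + 1) if p != target]
    return train, [target]

def endpoint_context(lo=60, hi=71):
    """Train only on the octave endpoints; all interior pitches are targets."""
    return [lo, hi], list(range(lo + 1, hi))
```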
Reconstruction

Figure: AE vs. CVAE — reconstruction MSE (×10⁻²) per target pitch (MIDI 60–72).
I Both the AE and the CVAE reasonably reconstruct the target pitch (audio examples 6, 7, 8).
I The CVAE produces better reconstruction, especially when the target pitch is far from the pitches available in the training data, as in the plot above (audio examples 9, 10, 11).
I Conditioning helps to capture the pitch dependency of the spectral envelope more accurately.
Reconstruction
I To emulate the effect of pitch conditioning with an AE, we train the AE by appending the pitch to the input CCs and reconstructing this appended input, as shown below:

[f0, CCs] → cAE → [f0′, CCs′]

Figure: 'Conditional' AE (cAE).
I [Wyse 2018] followed a similar approach of appending the conditional variables to the input of his model.
I The 'cAE' is comparable to our proposed CVAE in that the network might potentially learn something from f0.
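A one-line sketch of this input construction, assuming `ccs` is a batch of 91-dimensional CC vectors and `f0` a batch of scalar pitches (names are illustrative):

```python
import torch

def cae_io(ccs, f0):
    """cAE input and target: the pitch appended to the CC vector (91 + 1 dims)."""
    x = torch.cat([ccs, f0.unsqueeze(-1)], dim=-1)  # shape: (batch, 92)
    return x, x  # the cAE reconstructs the appended input itself
```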
Reconstruction

Figure: AE vs. CVAE vs. cAE — reconstruction MSE (×10⁻²) per target pitch (MIDI 60–72).
I We train the cAE on the octave endpoints and evaluate performance as shown above.
I Appending f0 does not improve performance; it is in fact worse than the plain AE.
Generation
I We are interested in 'synthesizing' audio.
I We examine how well the network can generate an instance of a desired pitch (one the network has not been trained on).

[Z, f0] → Decoder

Figure: Sampling from the network.
I We follow the procedure in [Blaauw and Bonada 2016]: a random walk to sample points coherently from the latent space.
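A minimal sketch of such coherent sampling, assuming a trained `decoder` that takes a latent point concatenated with f0 and returns one CC frame; the step size and names are illustrative assumptions:

```python
import torch

@torch.no_grad()
def random_walk_frames(decoder, f0_frames, n_latent=32, step=0.05):
    """Generate CC frames by walking the latent space in small coherent steps."""
    z = torch.zeros(n_latent)
    out = []
    for f0 in f0_frames:
        z = z + step * torch.randn(n_latent)  # small step from the previous point
        out.append(decoder(torch.cat([z, f0.view(1)]).unsqueeze(0)))
    return torch.cat(out)  # (n_frames, 91) cepstral coefficients
```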
Generation
I We train on instances across the entire octave sans MIDI 65, and then generate MIDI 65.
I On listening, we find that the generated audio still lacks the soft, noisy sound of the violin bowing (audio examples 12, 13, 14).
I For a more realistic synthesis, we generate a violin note with vibrato (audio example 15).
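A sketch of the vibrato pitch contour that could drive the decoder frame by frame; the 6 Hz rate, 30-cent depth, and 10 ms hop are illustrative assumptions, not values from our experiments:

```python
import numpy as np

def vibrato_f0(f0_hz, dur_s, rate_hz=6.0, depth_cents=30.0, hop_s=0.01):
    """Per-frame f0 contour: sinusoidal vibrato around a centre pitch."""
    t = np.arange(0.0, dur_s, hop_s)
    cents = depth_cents * np.sin(2 * np.pi * rate_hz * t)
    return f0_hz * 2.0 ** (cents / 1200.0)  # one f0 value per analysis frame

f0_contour = vibrato_f0(349.23, dur_s=1.0)  # MIDI 65 is approximately 349.23 Hz
```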
Putting it all together
I We explored autoencoder frameworks in generative models for audio synthesis of instrumental tones.
I Our parametric representation decouples 'timbre' and 'pitch', relying on the network to model their inter-dependencies.
I Pitch conditioning allows us to generate the learnt spectral envelope for a given pitch, thus enabling us to vary the pitch contour continuously.
Putting it all together
I Our model is still far from perfect, however, and we have to improve it further:
1 An immediate next step is to model the residual of the input audio; this will help make the generated audio sound more natural.
2 We have not taken dynamics into account; a more complete system would capture spectral envelope (timbral) dependencies on both pitch and loudness dynamics.
3 Conducting more formal listening tests, involving synthesis of larger pitch movements or melodic elements from Indian Classical music.
I Our contributions:
1 To the best of our knowledge, we have not come across any work using a parametric model for musical tones in the neural synthesis framework, especially one exploiting the conditioning function of the CVAE.
2 The TAE method has no known Python implementation, so we plan to make our code open-source to the MIR community for research.
References I

[Blaauw and Bonada 2016] Blaauw, M. and Bonada, J. (2016). Modeling and transforming speech using variational autoencoders. In Interspeech, pages 1770–1774.

[Caetano and Rodet 2012] Caetano, M. and Rodet, X. (2012). A source-filter model for musical instrument sound transformation. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 137–140. IEEE.

[Doersch 2016] Doersch, C. (2016). Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908.

[Engel et al 2017] Engel, J., Resnick, C., Roberts, A., Dieleman, S., Norouzi, M., Eck, D., and Simonyan, K. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1068–1077. JMLR.org.

[Esling et al 2018] Esling, P., Bitton, A., et al. (2018). Generative timbre spaces: regularizing variational auto-encoders with perceptual metrics. arXiv preprint arXiv:1805.08501.

References II

[Griffin and Lim 1984] Griffin, D. and Lim, J. (1984). Signal estimation from modified short-time Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2):236–243.

[Hinton and Salakhutdinov 2006] Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507.

[IMAI 1979] Imai, S. (1979). Spectral envelope extraction by improved cepstrum. IEICE, 62:217–228.

[Kingma and Ba 2014] Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

[Kingma and Welling 2013] Kingma, D. P. and Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.

[Oord et al 2016] Oord, A. v. d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., and Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.

References III

[Roche et al 2018] Roche, F., Hueber, T., Limier, S., and Girin, L. (2018). Autoencoders for music sound modeling: a comparison of linear, shallow, deep, recurrent and variational models. arXiv preprint arXiv:1806.04096.

[Roebel and Rodet 2005] Roebel, A. and Rodet, X. (2005). Efficient spectral envelope estimation and its application to pitch shifting and envelope preservation. In International Conference on Digital Audio Effects, pages 30–35, Madrid, Spain.

[Romani Picas et al 2015] Romani Picas, O., Parra Rodriguez, H., Dabiri, D., Tokuda, H., Hariya, W., Oishi, K., and Serra, X. (2015). A real-time system for measuring sound goodness in instrumental sounds. In Audio Engineering Society Convention 138. Audio Engineering Society.

[Sarroff and Casey 2014] Sarroff, A. M. and Casey, M. A. (2014). Musical audio synthesis using autoencoding neural nets. In ICMC.

[Serra et al 1997] Serra, X. et al. (1997). Musical sound modeling with sinusoids plus noise. Musical Signal Processing, pages 91–122.

References IV

[Slawson 1981] Slawson, W. (1981). The color of sound: a theoretical study in musical timbre. Music Theory Spectrum, 3:132–141.

[Sohn et al 2015] Sohn, K., Lee, H., and Yan, X. (2015). Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems, pages 3483–3491.

[Wyse 2018] Wyse, L. (2018). Real-valued parametric conditioning of an RNN for interactive sound synthesis. arXiv preprint arXiv:1805.10808.
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model reconstruction (trained on MIDI 63)
3 Spectral Model reconstruction (not trained on MIDI 63)
4 Input MIDI 60 note to Parametric Model
5 Parametric reconstruction of input note
6 Input MIDI 63 note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note (endpoint-trained model)
10 Parametric AE reconstruction of input (endpoint-trained model)
11 Parametric CVAE reconstruction of input (endpoint-trained model)
12 CVAE-generated MIDI 65 violin note
13 Similar MIDI 65 violin note from dataset

Audio examples description II
14 CVAE reconstruction of the MIDI 65 violin note
15 CVAE-generated MIDI 65 violin note with vibrato
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
L prop MSE + βKLD
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2136
Network Architecture
I Network size [91 91 32 91 91]I 91 is the dimension of input CCs 32 is latent space
dimensionality
I Linear Fully Connected Layers + Leaky ReLU activations
I ADAM [Kingma and Ba 2014] with initial lr = 10minus3
I Training for 2000 epochs with batch size 512
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2436
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAE
I Both AE and CVAE reasonably reconstruct the target pitch1 6 2 7 3 8
I CVAE produces better reconstruction especially when thetarget pitch is far from the pitches available in the trainingdata(plot above) 4 9 5 10 6 11
I Conditioning helps to capture the pitch dependency of thespectral envelope more accurately
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton8)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton9)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton10)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton11)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton12)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton13)ocgs[i]state=false
1536
2536
Reconstruction
I To emulate the effect of pitch conditioning with an AE wetrain the AE by appending the pitch to the input CCs andreconstructing this appended input as shown below
f0
CCscAE
fprime
0
CCsprime
Figure lsquoConditionalrsquo AE(cAE)
I [Wyse 2018] followed a similar approach of appending theconditional variables to the input of his model
I the lsquocAErsquo is comparable to our proposed CVAE in that thenetwork might potentially learn something from f0
2636
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAEcAE
I We train the cAE on the octave endpoints and evaluateperformance as shown above
I Appending f0 does not improve performance It is infact worsethan AE
2736
Generation
I We are interested in lsquosynthesizingrsquo audio
I We see how well the network can generate an instance of adesired pitch(which the network has not been trained on)
Z
Dec
od
er
f0
Figure Sampling from the Network
I We follow the procedure in [Blaauw and Bonada 2016] - arandom walk to sample points coherently from the latent space
2836
Generation
I We train on instances across the entire octave sans MIDI 65and then generate MIDI 65
I On listening we can see that the generated audio still lacks thesoft noisy sound of the violin bowing
1 12 2 13 3 14
I For a more realistic synthesis we generate a violin note withvibrato 4 15
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton14)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton15)ocgs[i]state=false
2256
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton16)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton17)ocgs[i]state=false
2184
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3136
References I
[Blaauw and Bonada 2016] Blaauw M and Bonada J (2016)Modeling and transforming speech using variational autoencodersIn Interspeech pages 1770ndash1774
[Caetano and Rodet 2012] Caetano M and Rodet X (2012)A source-filter model for musical instrument sound transformationIn 2012 IEEE International Conference on Acoustics Speech and Signal Processing(ICASSP) pages 137ndash140 IEEE
[Doersch 2016] Doersch C (2016)Tutorial on variational autoencodersarXiv preprint arXiv160605908
[Engel et al 2017] Engel J Resnick C Roberts A Dieleman S Norouzi MEck D and Simonyan K (2017)Neural audio synthesis of musical notes with wavenet autoencodersIn Proceedings of the 34th International Conference on Machine Learning-Volume70 pages 1068ndash1077 JMLR org
[Esling et al 2018] Esling P Bitton A et al (2018)Generative timbre spaces regularizing variational auto-encoders with perceptualmetricsarXiv preprint arXiv180508501
3236
References II
[Griffin and Lim 1984] Griffin D and Lim J (1984)Signal estimation from modified short-time fourier transformIEEE Transactions on Acoustics Speech and Signal Processing 32(2)236ndash243
[Hinton and Salakhutdinov 2006] Hinton G E and Salakhutdinov R R (2006)Reducing the dimensionality of data with neural networksscience 313(5786)504ndash507
[IMAI 1979] IMAI S (1979)Spectral envelope extraction by improved cepstrumIEICE 62217ndash228
[Kingma and Ba 2014] Kingma D P and Ba J (2014)Adam A method for stochastic optimizationarXiv preprint arXiv14126980
[Kingma and Welling 2013] Kingma D P and Welling M (2013)Auto-encoding variational bayesarXiv preprint arXiv13126114
[Oord et al 2016] Oord A v d Dieleman S Zen H Simonyan K Vinyals OGraves A Kalchbrenner N Senior A and Kavukcuoglu K (2016)Wavenet A generative model for raw audioarXiv preprint arXiv160903499
3336
References III
[Roche et al 2018] Roche F Hueber T Limier S and Girin L (2018)Autoencoders for music sound modeling a comparison of linear shallow deeprecurrent and variational modelsarXiv preprint arXiv180604096
[Roebel and Rodet 2005] Roebel A and Rodet X (2005)Efficient Spectral Envelope Estimation and its application to pitch shifting andenvelope preservationIn International Conference on Digital Audio Effects pages 30ndash35 Madrid Spaincote interne IRCAM Roebel05b
[Romani Picas et al 2015] Romani Picas O Parra Rodriguez H Dabiri DTokuda H Hariya W Oishi K and Serra X (2015)A real-time system for measuring sound goodness in instrumental soundsIn Audio Engineering Society Convention 138 Audio Engineering Society
[Sarroff and Casey 2014] Sarroff A M and Casey M A (2014)Musical audio synthesis using autoencoding neural netsIn ICMC
[Serra et al 1997] Serra X et al (1997)Musical sound modeling with sinusoids plus noiseMusical signal processing pages 91ndash122
3436
References IV
[Slawson 1981] Slawson W (1981)The color of sound a theoretical study in musical timbreMusic Theory Spectrum 3132ndash141
[Sohn et al 2015] Sohn K Lee H and Yan X (2015)Learning structured output representation using deep conditional generative models
In Advances in neural information processing systems pages 3483ndash3491
[Wyse 2018] Wyse L (2018)Real-valued parametric conditioning of an rnn for interactive sound synthesisarXiv preprint arXiv180510808
3536
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model Reconstruction(trained on MIDI63)
3 Spectral Model Reconstruction(not trained on MIDI63)
4 Input MIDI 60 note to Parametric Model
5 Parametric Reconstruction of input note
6 Input MIDI 63 Note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note(endpoint trained model)
10 Parametric AE reconstruction of input(endpoint trained model)
11 Parametric CVAE reconstruction of input(endpoint trained model)
12 CVAE Generated MIDI 65 Violin note
13 Similar MIDI 65 Violin note from dataset
3636
Audio examples description II
14 CVAE Reconstruction of the MIDI 65 violin note
15 CVAE Generated MIDI 65 Violin note with vibrato
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
1036
Non-Parametric Reconstruction
Figure Input MIDI 63 1 1
Figure Including MIDI 63 2 2 Figure Excluding MIDI 63 3 3
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton1)ocgs[i]state=false
1728
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton3)ocgs[i]state=false
1728
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton5)ocgs[i]state=false
1728
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1336
Parametric Model
I No open source implementation available for the TAE thus weimplemented it following procedure highlighted in[Roebel and Rodet 2005 Caetano and Rodet 2012]
I Figure shows a TAE snap from [Caetano and Rodet 2012]
I Similar to results we get 1 4 2 5
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton6)ocgs[i]state=false
3216
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton7)ocgs[i]state=false
2256
1436
Parametric Model
0 1 2 3 4 5minus80
minus60
minus40
minus20
Frequency(kHz)Magnitude(dB)
606571
I Spectral envelope shape varies across pitch
1 Dependence of envelope on pitch[Slawson 1981 Caetano and Rodet 2012]
2 Variation due the TAE algorithm
I TAE rarr smooth function to estimate harmonic amplitudes
1536
Parametric Model
I No of CCs(Cepstral CoefficientsKcc) depends on Samplingrate(Fs)pitch(f0)
Kcc leFs2f0
I For our network we choose the maximum Kcc(lowest pitch) =91 and zero pad the high pitch Kcc to 91 dimension vectors
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1836
Network Architecture
CCs
f0
En
cod
er
Z
Dec
od
er
CVAE
I Network input is CCs rarr MSE represents perceptually relevantdistance in terms of squared error between the input andreconstructed log magnitude spectral envelopes
I We train the network on frames from the train instances
I For evaluation MSE values calculated ahead is the averagereconstruction error across all the test instance frames
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
L prop MSE + βKLD
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of the latent space - governs the network's reconstruction ability
Figure: CVAE (β = 0.1) vs AE — MSE (·10⁻⁵) vs latent space dimensionality (0–70)
I Steep fall initially, flatter later; choose dimensionality = 32
Network Architecture
I Network size: [91, 91, 32, 91, 91]
I 91 is the dimension of the input CCs; 32 is the latent space dimensionality
I Linear fully connected layers + Leaky ReLU activations
I ADAM [Kingma and Ba 2014] with initial lr = 10⁻³
I Training for 2000 epochs with batch size 512
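A minimal sketch of this architecture follows (assuming PyTorch; concatenating f0 to both the encoder input and the latent code is one common CVAE wiring, and an assumption on our part):

import torch
import torch.nn as nn

class CVAE(nn.Module):
    def __init__(self, cc_dim=91, hid=91, z_dim=32, cond_dim=1):
        super().__init__()
        # Encoder: (CCs, f0) -> 91 -> (mu, logvar) of the 32-dim latent
        self.enc = nn.Sequential(nn.Linear(cc_dim + cond_dim, hid), nn.LeakyReLU())
        self.mu = nn.Linear(hid, z_dim)
        self.logvar = nn.Linear(hid, z_dim)
        # Decoder: (z, f0) -> 91 -> reconstructed CCs
        self.dec = nn.Sequential(nn.Linear(z_dim + cond_dim, hid), nn.LeakyReLU(),
                                 nn.Linear(hid, cc_dim))

    def forward(self, cc, f0):
        h = self.enc(torch.cat([cc, f0], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization
        return self.dec(torch.cat([z, f0], dim=-1)), mu, logvar

model = CVAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # 2000 epochs, batch 512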
Experiments
I Two kinds of experiments to demonstrate the network's capabilities:
1 Reconstruction - Omit pitch instances during training and see how well the model reconstructs notes of the omitted target pitch
2 Generation - How well the model 'synthesizes' note instances with new, unseen pitches
Reconstruction
I Two training contexts -
1 T (×) is the target; ✓ marks the training instances kept:
MIDI: T−3 T−2 T−1  T  T+1 T+2 T+3
Kept:  ✓   ✓   ✓   ×   ✓   ✓   ✓
2 Octave endpoints:
MIDI: 60 61 62 63 64 65 66 67 68 69 70 71
Kept:  ✓  ×  ×  ×  ×  ×  ×  ×  ×  ×  ×  ✓
I In each of the above cases, we compute the MSE as the frame-wise spectral envelope match across all frames of all the target instances
Reconstruction
Figure: MSE (·10⁻²) vs target pitch (MIDI 60–72) for AE and CVAE
I Both AE and CVAE reasonably reconstruct the target pitch (audio examples 6, 7, 8)
I The CVAE produces better reconstruction, especially when the target pitch is far from the pitches available in the training data (plot above) (audio examples 9, 10, 11)
I Conditioning helps to capture the pitch dependency of the spectral envelope more accurately
Reconstruction
I To emulate the effect of pitch conditioning with an AE, we train the AE by appending the pitch to the input CCs and reconstructing this appended input, as shown below
Figure: 'Conditional' AE (cAE) — the input (CCs, f0) is autoencoded to (CCs′, f0′)
I [Wyse 2018] followed a similar approach of appending the conditional variables to the input of his model
I The 'cAE' is comparable to our proposed CVAE in that the network might potentially learn something from f0
Reconstruction
Figure: MSE (·10⁻²) vs target pitch (MIDI 60–72) for AE, CVAE and cAE
I We train the cAE on the octave endpoints and evaluate performance as shown above
I Appending f0 does not improve performance; it is in fact worse than the plain AE
Generation
I We are interested in 'synthesizing' audio
I We see how well the network can generate an instance of a desired pitch (which the network has not been trained on)
Figure: Sampling from the network — a latent point Z is passed through the Decoder, conditioned on f0
I We follow the procedure in [Blaauw and Bonada 2016] - a random walk to sample points coherently from the latent space
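A sketch of that sampling loop, reusing the CVAE sketch from earlier (the step size is illustrative; [Blaauw and Bonada 2016] give the actual procedure):

import torch

@torch.no_grad()
def random_walk_generate(model, f0, n_frames=200, z_dim=32, step=0.05):
    z = torch.randn(z_dim)                     # start from a prior sample
    frames = []
    for _ in range(n_frames):
        z = z + step * torch.randn(z_dim)      # small steps -> coherent trajectory
        frames.append(model.dec(torch.cat([z, f0])))  # decode, conditioned on f0
    return torch.stack(frames)                 # frame-wise CCs for the new note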
Generation
I We train on instances across the entire octave sans MIDI 65, and then generate MIDI 65
I On listening, we find that the generated audio still lacks the soft, noisy sound of the violin bowing (audio examples 12, 13, 14)
I For a more realistic synthesis, we generate a violin note with vibrato (audio example 15)
Putting it all together
I We explored autoencoder frameworks in generative models for audio synthesis of instrumental tones
I Our parametric representation decouples 'timbre' and 'pitch', relying on the network to model their inter-dependencies
I Pitch conditioning allows us to generate the learnt spectral envelope for a given pitch, thus enabling us to vary the pitch contour continuously
Putting it all together
I Our model is still far from perfect, however, and we have to improve it further:
1 An immediate thing to do will be to model the residual of the input audio; this will help in making the generated audio sound more natural
2 We have not taken dynamics into account; a more complete system will involve capturing spectral envelope (timbral) dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests, involving synthesis of larger pitch movements or melodic elements from Indian Classical music
I Our contributions:
1 To the best of our knowledge, we have not come across any work using a parametric model for musical tones in the neural synthesis framework, especially exploiting the conditioning function of the CVAE
2 The TAE method has no known Python implementation, so we plan to make our code open-source to the MIR community for research
References
[Blaauw and Bonada 2016] Blaauw, M. and Bonada, J. (2016). Modeling and transforming speech using variational autoencoders. In Interspeech, pages 1770–1774.
[Caetano and Rodet 2012] Caetano, M. and Rodet, X. (2012). A source-filter model for musical instrument sound transformation. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 137–140. IEEE.
[Doersch 2016] Doersch, C. (2016). Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908.
[Engel et al 2017] Engel, J., Resnick, C., Roberts, A., Dieleman, S., Norouzi, M., Eck, D., and Simonyan, K. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1068–1077. JMLR.org.
[Esling et al 2018] Esling, P., Bitton, A., et al. (2018). Generative timbre spaces: regularizing variational auto-encoders with perceptual metrics. arXiv preprint arXiv:1805.08501.
[Griffin and Lim 1984] Griffin, D. and Lim, J. (1984). Signal estimation from modified short-time Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2):236–243.
[Hinton and Salakhutdinov 2006] Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507.
[IMAI 1979] Imai, S. (1979). Spectral envelope extraction by improved cepstrum. IEICE, 62:217–228.
[Kingma and Ba 2014] Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[Kingma and Welling 2013] Kingma, D. P. and Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
[Oord et al 2016] Oord, A. v. d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., and Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.
[Roche et al 2018] Roche, F., Hueber, T., Limier, S., and Girin, L. (2018). Autoencoders for music sound modeling: a comparison of linear, shallow, deep, recurrent and variational models. arXiv preprint arXiv:1806.04096.
[Roebel and Rodet 2005] Roebel, A. and Rodet, X. (2005). Efficient spectral envelope estimation and its application to pitch shifting and envelope preservation. In International Conference on Digital Audio Effects, pages 30–35, Madrid, Spain.
[Romani Picas et al 2015] Romani Picas, O., Parra Rodriguez, H., Dabiri, D., Tokuda, H., Hariya, W., Oishi, K., and Serra, X. (2015). A real-time system for measuring sound goodness in instrumental sounds. In Audio Engineering Society Convention 138. Audio Engineering Society.
[Sarroff and Casey 2014] Sarroff, A. M. and Casey, M. A. (2014). Musical audio synthesis using autoencoding neural nets. In ICMC.
[Serra et al 1997] Serra, X. et al. (1997). Musical sound modeling with sinusoids plus noise. Musical Signal Processing, pages 91–122.
[Slawson 1981] Slawson, W. (1981). The color of sound: a theoretical study in musical timbre. Music Theory Spectrum, 3:132–141.
[Sohn et al 2015] Sohn, K., Lee, H., and Yan, X. (2015). Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems, pages 3483–3491.
[Wyse 2018] Wyse, L. (2018). Real-valued parametric conditioning of an RNN for interactive sound synthesis. arXiv preprint arXiv:1805.10808.
Audio examples description
1 Input MIDI 63 to Spectral Model
2 Spectral Model reconstruction (trained on MIDI 63)
3 Spectral Model reconstruction (not trained on MIDI 63)
4 Input MIDI 60 note to Parametric Model
5 Parametric reconstruction of input note
6 Input MIDI 63 note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note (endpoint-trained model)
10 Parametric AE reconstruction of input (endpoint-trained model)
11 Parametric CVAE reconstruction of input (endpoint-trained model)
12 CVAE-generated MIDI 65 violin note
13 Similar MIDI 65 violin note from the dataset
14 CVAE reconstruction of the MIDI 65 violin note
15 CVAE-generated MIDI 65 violin note with vibrato
Non-Parametric Reconstruction
I The framewise magnitude-spectra reconstruction procedure in [Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - train including and excluding MIDI 63, along with its 3 neighbouring pitches on either side
Figure: AE over spectral frames — sustain-portion FFT magnitude frames pass through the Encoder to Z, and the Decoder reconstructs them
I Repeat the above across all input spectral frames, and invert the obtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
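The inversion step might look like the following sketch (assuming librosa; the STFT parameters here are placeholders, not the ones used in the experiments):

import numpy as np
import librosa

def invert_magnitude_frames(mag_frames, n_fft=1024, hop_length=256):
    # Stack the reconstructed magnitude frames into a (bins, time) spectrogram
    S = np.abs(np.asarray(mag_frames)).T
    # Griffin-Lim [Griffin and Lim 1984]: iteratively estimate a phase
    # consistent with the magnitudes, then resynthesize the waveform
    return librosa.griffinlim(S, n_iter=60, hop_length=hop_length, win_length=n_fft)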
Non-Parametric Reconstruction
Figure: Input MIDI 63 (audio example 1)
Figure: Reconstruction including MIDI 63 in training (audio example 2); Figure: Reconstruction excluding MIDI 63 (audio example 3)
Parametric Model
1 Frame-wise magnitude spectrum → harmonic representation using the Harmonic plus Residual (HpR) model [Serra et al 1997] (currently we neglect the residual)
Figure: Sustain-portion frames → FFT → HpR → f0 + harmonic magnitudes
I Output of the HpR block ⇒ log-dB magnitudes + harmonics
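The actual HpR analysis comes from the sinusoidal-modeling literature [Serra et al 1997]; purely as a toy sketch, reading log-dB magnitudes near the harmonic locations of one windowed frame could look like this (FFT size and sampling rate are assumptions; real HpR uses peak detection with interpolation, and keeps a residual):

import numpy as np

def harmonic_log_mags(frame, f0, fs=48000, n_fft=2048):
    spec = np.abs(np.fft.rfft(frame, n_fft)) + 1e-12   # magnitude spectrum
    n_harm = int((fs / 2) // f0)                       # harmonics below Nyquist
    bins = np.round(np.arange(1, n_harm + 1) * f0 * n_fft / fs).astype(int)
    return 20 * np.log10(spec[bins]), bins             # log-dB mags + bin indices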
Parametric Model
2 log-dB magnitudes + harmonics → TAE algorithm [Roebel and Rodet 2005, IMAI 1979]
I TAE ⇒ iterative cepstral liftering:
A0(k) = log|X(k)|, V0 = −∞
while (Ai − A0 < ∆):
    Ai(k) = max(Ai−1(k), Vi−1(k))
    Vi = FFT(lifter(Ai))
Figure: f0 + magnitudes → TAE → CCs + f0
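In code, the loop above could be sketched as follows (NumPy; we stop after a fixed number of iterations instead of the ∆ test, and the input is one frame's symmetric log-magnitude spectrum):

import numpy as np

def tae(log_mag, k_cc, n_iter=100):
    A = np.asarray(log_mag, dtype=float).copy()  # A0(k) = log|X(k)|
    V = np.full_like(A, -np.inf)                 # V0 = -infinity
    for _ in range(n_iter):
        A = np.maximum(A, V)                     # Ai(k) = max(Ai-1(k), Vi-1(k))
        c = np.fft.ifft(A).real                  # cepstrum of the updated spectrum
        c[k_cc + 1: len(c) - k_cc] = 0.0         # lifter: keep low quefrencies only
        V = np.fft.fft(c).real                   # Vi = FFT(lifter(Ai))
    return V, c[:k_cc + 1]                       # smooth envelope + the CCs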
Parametric Model
I No open-source implementation is available for the TAE, so we implemented it following the procedure highlighted in [Roebel and Rodet 2005, Caetano and Rodet 2012]
I The figure shows a TAE snapshot from [Caetano and Rodet 2012]
I It is similar to the results we get (audio examples 4, 5)
Parametric Model
Figure: TAE spectral envelopes — Magnitude (dB, −80 to −20) vs Frequency (kHz, 0–5) for MIDI pitches 60, 65 and 71
I The spectral envelope shape varies across pitch:
1 Dependence of the envelope on pitch [Slawson 1981, Caetano and Rodet 2012]
2 Variation due to the TAE algorithm
I TAE → a smooth function to estimate harmonic amplitudes
Parametric Model
I The number of CCs (Cepstral Coefficients, Kcc) depends on the sampling rate (Fs) and the pitch (f0):
Kcc ≤ Fs / (2·f0)
I For our network, we choose the maximum Kcc (lowest pitch) = 91, and zero-pad the higher-pitch CC vectors to 91 dimensions
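As a concrete check (assuming Fs = 48 kHz, under which MIDI 60 at ≈261.6 Hz gives exactly Kcc = 91):

import numpy as np

FS = 48000                                  # assumed sampling rate
f0 = 440.0 * 2 ** ((60 - 69) / 12)          # MIDI 60 -> ~261.6 Hz
k_cc = int(FS // (2 * f0))                  # Kcc <= Fs / (2*f0) -> 91
cc = np.zeros(k_cc)                         # stand-in for the liftered CCs
cc_padded = np.pad(cc, (0, 91 - k_cc))      # higher pitches zero-padded to 91 dims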
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1836
Network Architecture
CCs
f0
En
cod
er
Z
Dec
od
er
CVAE
I Network input is CCs rarr MSE represents perceptually relevantdistance in terms of squared error between the input andreconstructed log magnitude spectral envelopes
I We train the network on frames from the train instances
I For evaluation MSE values calculated ahead is the averagereconstruction error across all the test instance frames
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
L prop MSE + βKLD
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2136
Network Architecture
I Network size [91 91 32 91 91]I 91 is the dimension of input CCs 32 is latent space
dimensionality
I Linear Fully Connected Layers + Leaky ReLU activations
I ADAM [Kingma and Ba 2014] with initial lr = 10minus3
I Training for 2000 epochs with batch size 512
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2436
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAE
I Both AE and CVAE reasonably reconstruct the target pitch1 6 2 7 3 8
I CVAE produces better reconstruction especially when thetarget pitch is far from the pitches available in the trainingdata(plot above) 4 9 5 10 6 11
I Conditioning helps to capture the pitch dependency of thespectral envelope more accurately
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton8)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton9)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton10)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton11)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton12)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton13)ocgs[i]state=false
1536
2536
Reconstruction
I To emulate the effect of pitch conditioning with an AE wetrain the AE by appending the pitch to the input CCs andreconstructing this appended input as shown below
f0
CCscAE
fprime
0
CCsprime
Figure lsquoConditionalrsquo AE(cAE)
I [Wyse 2018] followed a similar approach of appending theconditional variables to the input of his model
I the lsquocAErsquo is comparable to our proposed CVAE in that thenetwork might potentially learn something from f0
2636
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAEcAE
I We train the cAE on the octave endpoints and evaluateperformance as shown above
I Appending f0 does not improve performance It is infact worsethan AE
2736
Generation
I We are interested in lsquosynthesizingrsquo audio
I We see how well the network can generate an instance of adesired pitch(which the network has not been trained on)
Z
Dec
od
er
f0
Figure Sampling from the Network
I We follow the procedure in [Blaauw and Bonada 2016] - arandom walk to sample points coherently from the latent space
2836
Generation
I We train on instances across the entire octave sans MIDI 65and then generate MIDI 65
I On listening we can see that the generated audio still lacks thesoft noisy sound of the violin bowing
1 12 2 13 3 14
I For a more realistic synthesis we generate a violin note withvibrato 4 15
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton14)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton15)ocgs[i]state=false
2256
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton16)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton17)ocgs[i]state=false
2184
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3136
References I
[Blaauw and Bonada 2016] Blaauw M and Bonada J (2016)Modeling and transforming speech using variational autoencodersIn Interspeech pages 1770ndash1774
[Caetano and Rodet 2012] Caetano M and Rodet X (2012)A source-filter model for musical instrument sound transformationIn 2012 IEEE International Conference on Acoustics Speech and Signal Processing(ICASSP) pages 137ndash140 IEEE
[Doersch 2016] Doersch C (2016)Tutorial on variational autoencodersarXiv preprint arXiv160605908
[Engel et al 2017] Engel J Resnick C Roberts A Dieleman S Norouzi MEck D and Simonyan K (2017)Neural audio synthesis of musical notes with wavenet autoencodersIn Proceedings of the 34th International Conference on Machine Learning-Volume70 pages 1068ndash1077 JMLR org
[Esling et al 2018] Esling P Bitton A et al (2018)Generative timbre spaces regularizing variational auto-encoders with perceptualmetricsarXiv preprint arXiv180508501
3236
References II
[Griffin and Lim 1984] Griffin D and Lim J (1984)Signal estimation from modified short-time fourier transformIEEE Transactions on Acoustics Speech and Signal Processing 32(2)236ndash243
[Hinton and Salakhutdinov 2006] Hinton G E and Salakhutdinov R R (2006)Reducing the dimensionality of data with neural networksscience 313(5786)504ndash507
[IMAI 1979] IMAI S (1979)Spectral envelope extraction by improved cepstrumIEICE 62217ndash228
[Kingma and Ba 2014] Kingma D P and Ba J (2014)Adam A method for stochastic optimizationarXiv preprint arXiv14126980
[Kingma and Welling 2013] Kingma D P and Welling M (2013)Auto-encoding variational bayesarXiv preprint arXiv13126114
[Oord et al 2016] Oord A v d Dieleman S Zen H Simonyan K Vinyals OGraves A Kalchbrenner N Senior A and Kavukcuoglu K (2016)Wavenet A generative model for raw audioarXiv preprint arXiv160903499
3336
References III
[Roche et al 2018] Roche F Hueber T Limier S and Girin L (2018)Autoencoders for music sound modeling a comparison of linear shallow deeprecurrent and variational modelsarXiv preprint arXiv180604096
[Roebel and Rodet 2005] Roebel A and Rodet X (2005)Efficient Spectral Envelope Estimation and its application to pitch shifting andenvelope preservationIn International Conference on Digital Audio Effects pages 30ndash35 Madrid Spaincote interne IRCAM Roebel05b
[Romani Picas et al 2015] Romani Picas O Parra Rodriguez H Dabiri DTokuda H Hariya W Oishi K and Serra X (2015)A real-time system for measuring sound goodness in instrumental soundsIn Audio Engineering Society Convention 138 Audio Engineering Society
[Sarroff and Casey 2014] Sarroff A M and Casey M A (2014)Musical audio synthesis using autoencoding neural netsIn ICMC
[Serra et al 1997] Serra X et al (1997)Musical sound modeling with sinusoids plus noiseMusical signal processing pages 91ndash122
3436
References IV
[Slawson 1981] Slawson W (1981)The color of sound a theoretical study in musical timbreMusic Theory Spectrum 3132ndash141
[Sohn et al 2015] Sohn K Lee H and Yan X (2015)Learning structured output representation using deep conditional generative models
In Advances in neural information processing systems pages 3483ndash3491
[Wyse 2018] Wyse L (2018)Real-valued parametric conditioning of an rnn for interactive sound synthesisarXiv preprint arXiv180510808
3536
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model Reconstruction(trained on MIDI63)
3 Spectral Model Reconstruction(not trained on MIDI63)
4 Input MIDI 60 note to Parametric Model
5 Parametric Reconstruction of input note
6 Input MIDI 63 Note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note(endpoint trained model)
10 Parametric AE reconstruction of input(endpoint trained model)
11 Parametric CVAE reconstruction of input(endpoint trained model)
12 CVAE Generated MIDI 65 Violin note
13 Similar MIDI 65 Violin note from dataset
3636
Audio examples description II
14 CVAE Reconstruction of the MIDI 65 violin note
15 CVAE Generated MIDI 65 Violin note with vibrato
936
Non-Parametric Reconstruction
I Framewise magnitude spectra reconstruction procedure in[Roche et al 2018] cannot generalize to untrained pitches
I We consider two cases - Train including and excluding MIDI 63along with its 3 neighbouring pitches on either side
Sustain
FFT
En
cod
er
Z
Dec
od
er
AE
I Repeat above across all input spectral frames and invertobtained spectrogram using Griffin-Lim [Griffin and Lim 1984]
1036
Non-Parametric Reconstruction
Figure Input MIDI 63 1 1
Figure Including MIDI 63 2 2 Figure Excluding MIDI 63 3 3
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton1)ocgs[i]state=false
1728
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton3)ocgs[i]state=false
1728
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton5)ocgs[i]state=false
1728
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1336
Parametric Model
I No open source implementation available for the TAE thus weimplemented it following procedure highlighted in[Roebel and Rodet 2005 Caetano and Rodet 2012]
I Figure shows a TAE snap from [Caetano and Rodet 2012]
I Similar to results we get 1 4 2 5
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton6)ocgs[i]state=false
3216
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton7)ocgs[i]state=false
2256
1436
Parametric Model
0 1 2 3 4 5minus80
minus60
minus40
minus20
Frequency(kHz)Magnitude(dB)
606571
I Spectral envelope shape varies across pitch
1 Dependence of envelope on pitch[Slawson 1981 Caetano and Rodet 2012]
2 Variation due the TAE algorithm
I TAE rarr smooth function to estimate harmonic amplitudes
1536
Parametric Model
I No of CCs(Cepstral CoefficientsKcc) depends on Samplingrate(Fs)pitch(f0)
Kcc leFs2f0
I For our network we choose the maximum Kcc(lowest pitch) =91 and zero pad the high pitch Kcc to 91 dimension vectors
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1836
Network Architecture
CCs
f0
En
cod
er
Z
Dec
od
er
CVAE
I Network input is CCs rarr MSE represents perceptually relevantdistance in terms of squared error between the input andreconstructed log magnitude spectral envelopes
I We train the network on frames from the train instances
I For evaluation MSE values calculated ahead is the averagereconstruction error across all the test instance frames
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
L prop MSE + βKLD
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
Network Architecture
▶ Main hyperparameters:
2. Dimensionality of the latent space: governs the network's reconstruction ability.

Figure: Reconstruction MSE (of order 10⁻⁵) vs latent space dimensionality (0-70), CVAE (β = 0.1) vs AE.
▶ Steep fall initially, flatter later; we choose dimensionality = 32.
Network Architecture
▶ Network size: [91, 91, 32, 91, 91].
▶ 91 is the dimension of the input CCs; 32 is the latent space dimensionality.
▶ Linear fully connected layers + Leaky ReLU activations.
▶ ADAM [Kingma and Ba 2014] with initial learning rate 10⁻³.
▶ Training for 2000 epochs with batch size 512. A sketch of such a network follows.
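A sketch of such a CVAE in PyTorch, using the layer sizes above; conditioning by concatenating f0 to both the encoder and decoder inputs is our reading of the block diagram, and the exact wiring may differ from the authors' code:

```python
import torch
import torch.nn as nn

class CVAE(nn.Module):
    def __init__(self, cc_dim=91, latent_dim=32, cond_dim=1):
        super().__init__()
        # Encoder: [91 CCs (+ f0)] -> 91 -> (mu, logvar) of the 32-d latent
        self.enc = nn.Sequential(nn.Linear(cc_dim + cond_dim, 91), nn.LeakyReLU())
        self.mu = nn.Linear(91, latent_dim)
        self.logvar = nn.Linear(91, latent_dim)
        # Decoder: [32-d latent (+ f0)] -> 91 -> 91 reconstructed CCs
        self.dec = nn.Sequential(nn.Linear(latent_dim + cond_dim, 91),
                                 nn.LeakyReLU(), nn.Linear(91, cc_dim))

    def forward(self, ccs, f0):
        h = self.enc(torch.cat([ccs, f0], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        return self.dec(torch.cat([z, f0], dim=-1)), mu, logvar

model = CVAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # ADAM, initial lr = 1e-3
```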
Experiments
▶ Two kinds of experiments to demonstrate the network's capabilities:
1. Reconstruction: omit pitch instances during training and see how well the model reconstructs notes of the omitted target pitch.
2. Generation: how well the model 'synthesizes' note instances with new, unseen pitches.
Reconstruction
▶ Two training contexts:
1. Neighbouring pitches: T (✗) is the target, ✓ marks training instances:

   MIDI: T−3  T−2  T−1   T   T+1  T+2  T+3
   Kept:  ✓    ✓    ✓    ✗    ✓    ✓    ✓

2. Octave endpoints:

   MIDI: 60  61  62  63  64  65
   Kept:  ✓   ✗   ✗   ✗   ✗   ✗
   MIDI: 66  67  68  69  70  71
   Kept:  ✗   ✗   ✗   ✗   ✗   ✓

▶ In each case we compute the MSE as the frame-wise spectral envelope match across all frames of all the target instances. (A toy split helper is sketched below.)
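As an illustration, a toy helper that picks the training pitches for each context (function and variable names are ours, not from the paper):

```python
def training_pitches(context, target=63, low=60, high=71):
    """Return the MIDI pitches kept for training in the two contexts above."""
    pitches = list(range(low, high + 1))
    if context == "neighbours":   # keep T±1..T±3, omit the target T itself
        return [p for p in pitches if p != target and abs(p - target) <= 3]
    if context == "endpoints":    # keep only the octave endpoints
        return [low, high]
    raise ValueError(context)
```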
Reconstruction
Figure: Reconstruction MSE (of order 10⁻²) vs target pitch (MIDI 60-72) for AE and CVAE.
▶ Both the AE and the CVAE reasonably reconstruct the target pitch (audio examples 6-8).
▶ The CVAE produces better reconstructions, especially when the target pitch is far from the pitches available in the training data (plot above; audio examples 9-11).
▶ Conditioning helps capture the pitch dependency of the spectral envelope more accurately.
Reconstruction
▶ To emulate the effect of pitch conditioning with an AE, we train the AE by appending the pitch to the input CCs and reconstructing this appended input, as shown below (and sketched in code after this slide).
Figure: 'Conditional' AE (cAE). The concatenated input [CCs, f0] is autoencoded to [CCs′, f0′].
▶ [Wyse 2018] followed a similar approach of appending the conditioning variables to the input of his model.
▶ The 'cAE' is comparable to our proposed CVAE in that the network might potentially learn something from f0.
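A minimal sketch of the cAE idea (layer sizes assumed to mirror the AE; not the authors' exact code):

```python
import torch
import torch.nn as nn

class ConditionalAE(nn.Module):
    """Plain autoencoder over the concatenated vector [CCs, f0]."""
    def __init__(self, cc_dim=91, latent_dim=32):
        super().__init__()
        d = cc_dim + 1  # CCs plus the appended pitch value
        self.enc = nn.Sequential(nn.Linear(d, d), nn.LeakyReLU(),
                                 nn.Linear(d, latent_dim))
        self.dec = nn.Sequential(nn.Linear(latent_dim, d), nn.LeakyReLU(),
                                 nn.Linear(d, d))

    def forward(self, ccs, f0):
        x = torch.cat([ccs, f0], dim=-1)  # append f0 to the input CCs
        return self.dec(self.enc(x))      # reconstruct [CCs', f0']
```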
Reconstruction
Figure: Reconstruction MSE (of order 10⁻²) vs target pitch (MIDI 60-72) for AE, CVAE and cAE.
▶ We train the cAE on the octave endpoints and evaluate performance as shown above.
▶ Appending f0 does not improve performance; the cAE is in fact worse than the plain AE.
Generation
▶ We are interested in 'synthesizing' audio.
▶ We test how well the network can generate an instance of a desired pitch (one the network has not been trained on).
Figure: Sampling from the network. A latent point Z, together with the conditioning pitch f0, is fed to the Decoder.
▶ We follow the procedure in [Blaauw and Bonada 2016]: a random walk to sample points coherently from the latent space (sketched below).
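One plausible reading of that sampling procedure as code (the step size, starting point and decoder interface are assumptions layered on the CVAE sketch above):

```python
import torch

def random_walk_generate(model, f0_frames, latent_dim=32, step=0.05):
    """Decode a sequence of latent points that drift coherently over frames."""
    z = torch.zeros(latent_dim)                 # start at the prior mean
    frames = []
    for f0 in f0_frames:                        # one conditioning pitch per frame
        z = z + step * torch.randn(latent_dim)  # small random-walk step
        ccs = model.dec(torch.cat([z, f0], dim=-1))
        frames.append(ccs)
    return torch.stack(frames)                  # frame-wise envelope CCs
```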
Generation
▶ We train on instances across the entire octave sans MIDI 65, and then generate MIDI 65.
▶ On listening, the generated audio still lacks the soft noisy sound of the violin bowing (audio examples 12-14).
▶ For a more realistic synthesis, we generate a violin note with vibrato (audio example 15).
Putting it all together
▶ We explored autoencoder frameworks as generative models for audio synthesis of instrumental tones.
▶ Our parametric representation decouples 'timbre' and 'pitch', relying on the network to model their inter-dependencies.
▶ Pitch conditioning allows us to generate the learnt spectral envelope for a given pitch, enabling us to vary the pitch contour continuously.
Putting it all together
▶ Our model is still far from perfect, however, and we have to improve it further:
1. An immediate next step is to model the residual of the input audio; this will help make the generated audio sound more natural.
2. We have not taken dynamics into account; a more complete system would capture spectral envelope (timbral) dependencies on both pitch and loudness dynamics.
3. We plan to conduct more formal listening tests, involving synthesis of larger pitch movements or melodic elements from Indian Classical music.
▶ Our contributions:
1. To the best of our knowledge, we have not come across any prior work using a parametric model for musical tones in the neural synthesis framework, especially one exploiting the conditioning function of the CVAE.
2. The TAE method has no known Python implementation, so we plan to make our code open-source to the MIR community for research.
References I
[Blaauw and Bonada 2016] Blaauw, M. and Bonada, J. (2016). Modeling and transforming speech using variational autoencoders. In Interspeech, pages 1770–1774.
[Caetano and Rodet 2012] Caetano, M. and Rodet, X. (2012). A source-filter model for musical instrument sound transformation. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 137–140. IEEE.
[Doersch 2016] Doersch, C. (2016). Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908.
[Engel et al 2017] Engel, J., Resnick, C., Roberts, A., Dieleman, S., Norouzi, M., Eck, D., and Simonyan, K. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1068–1077. JMLR.org.
[Esling et al 2018] Esling, P., Bitton, A., et al. (2018). Generative timbre spaces: regularizing variational auto-encoders with perceptual metrics. arXiv preprint arXiv:1805.08501.
References II
[Griffin and Lim 1984] Griffin, D. and Lim, J. (1984). Signal estimation from modified short-time Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2):236–243.
[Hinton and Salakhutdinov 2006] Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507.
[IMAI 1979] Imai, S. (1979). Spectral envelope extraction by improved cepstrum. IEICE, 62:217–228.
[Kingma and Ba 2014] Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[Kingma and Welling 2013] Kingma, D. P. and Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
[Oord et al 2016] Oord, A. v. d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., and Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.
References III
[Roche et al 2018] Roche, F., Hueber, T., Limier, S., and Girin, L. (2018). Autoencoders for music sound modeling: a comparison of linear, shallow, deep, recurrent and variational models. arXiv preprint arXiv:1806.04096.
[Roebel and Rodet 2005] Roebel, A. and Rodet, X. (2005). Efficient spectral envelope estimation and its application to pitch shifting and envelope preservation. In International Conference on Digital Audio Effects, pages 30–35, Madrid, Spain.
[Romani Picas et al 2015] Romani Picas, O., Parra Rodriguez, H., Dabiri, D., Tokuda, H., Hariya, W., Oishi, K., and Serra, X. (2015). A real-time system for measuring sound goodness in instrumental sounds. In Audio Engineering Society Convention 138. Audio Engineering Society.
[Sarroff and Casey 2014] Sarroff, A. M. and Casey, M. A. (2014). Musical audio synthesis using autoencoding neural nets. In ICMC.
[Serra et al 1997] Serra, X. et al. (1997). Musical sound modeling with sinusoids plus noise. Musical Signal Processing, pages 91–122.
References IV
[Slawson 1981] Slawson, W. (1981). The color of sound: a theoretical study in musical timbre. Music Theory Spectrum, 3:132–141.
[Sohn et al 2015] Sohn, K., Lee, H., and Yan, X. (2015). Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems, pages 3483–3491.
[Wyse 2018] Wyse, L. (2018). Real-valued parametric conditioning of an RNN for interactive sound synthesis. arXiv preprint arXiv:1805.10808.
Audio examples description I
1. Input MIDI 63 to the spectral model
2. Spectral model reconstruction (trained on MIDI 63)
3. Spectral model reconstruction (not trained on MIDI 63)
4. Input MIDI 60 note to the parametric model
5. Parametric reconstruction of the input note
6. Input MIDI 63 note
7. Parametric AE reconstruction of the input
8. Parametric CVAE reconstruction of the input
9. Input MIDI 65 note (endpoint-trained model)
10. Parametric AE reconstruction of the input (endpoint-trained model)
11. Parametric CVAE reconstruction of the input (endpoint-trained model)
12. CVAE-generated MIDI 65 violin note
13. Similar MIDI 65 violin note from the dataset

Audio examples description II
14. CVAE reconstruction of the MIDI 65 violin note
15. CVAE-generated MIDI 65 violin note with vibrato
Non-Parametric Reconstruction
Figure: Input MIDI 63 (audio example 1).
Figure: Reconstruction when training includes MIDI 63 (audio example 2). Figure: Reconstruction when training excludes MIDI 63 (audio example 3).
Parametric Model
1. Frame-wise magnitude spectrum → harmonic representation using the Harmonic plus Residual (HpR) model [Serra et al 1997] (currently we neglect the residual).
Figure: Sustain segment → FFT → HpR → f0 + harmonic magnitudes.
▶ Output of the HpR block ⇒ log-dB magnitudes + harmonics.
Parametric Model
2. log-dB magnitudes + harmonics → TAE algorithm [Roebel and Rodet 2005, IMAI 1979].
▶ TAE ⇒ iterative cepstral liftering:

   A₀(k) = log(|X(k)|), V₀ = −∞
   while (Aᵢ − A₀ < Δ):
       Aᵢ(k) = max(Aᵢ₋₁(k), Vᵢ₋₁(k))
       Vᵢ = FFT(lifter(Aᵢ))

Figure: f0 + magnitudes → TAE → CCs and f0. (A Python sketch follows.)
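Since no Python implementation of the TAE was available (we open-source ours), a minimal NumPy sketch of the iteration above might look like this; the lifter cutoff, iteration cap and stopping rule are our assumptions from the pseudocode:

```python
import numpy as np

def true_amplitude_envelope(log_mag, k_cc, max_iter=100, delta=0.1):
    """Iterative cepstral liftering (TAE) on a log-magnitude spectrum A0(k)."""
    A0 = np.asarray(log_mag, dtype=float)
    A = A0.copy()
    V = np.full_like(A0, -np.inf)
    ceps = None
    for _ in range(max_iter):
        A = np.maximum(A, V)               # A_i(k) = max(A_{i-1}(k), V_{i-1}(k))
        ceps = np.fft.irfft(A)             # real cepstrum of the current estimate
        ceps[k_cc:ceps.size - k_cc] = 0.0  # lifter: keep only low quefrencies
        V = np.fft.rfft(ceps).real         # V_i = FFT(lifter(A_i))
        if np.max(A0 - V) < delta:         # stop once the envelope covers the peaks
            break
    return V, ceps[:k_cc]                  # smooth envelope and its CCs
```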
Parametric Model
▶ No open-source implementation was available for the TAE, so we implemented it following the procedure highlighted in [Roebel and Rodet 2005, Caetano and Rodet 2012].
▶ The figure shows a TAE snapshot from [Caetano and Rodet 2012].
▶ It is similar to the results we get (audio examples 4-5).
Parametric Model
Figure: TAE spectral envelopes, magnitude (dB, −80 to −20) vs frequency (0-5 kHz), for MIDI pitches 60, 65 and 71.
▶ The spectral envelope shape varies across pitch:
1. Dependence of the envelope on pitch [Slawson 1981, Caetano and Rodet 2012].
2. Variation due to the TAE algorithm.
▶ TAE → a smooth function to estimate harmonic amplitudes.
Parametric Model
▶ The number of CCs (cepstral coefficients, Kcc) depends on the sampling rate (Fs) and the pitch (f0):

   Kcc ≤ Fs / (2 f0)

▶ For our network we choose the maximum Kcc (lowest pitch) = 91 and zero-pad the higher-pitch CC vectors to 91 dimensions, as sketched below.
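A quick illustration of this choice (the sampling rate and pitch in the comment are example values, not from the deck):

```python
import numpy as np

def padded_ccs(ccs, fs, f0, max_dim=91):
    """Truncate CCs to the pitch-dependent cutoff Kcc = Fs/(2*f0), then zero-pad."""
    k_cc = int(fs // (2 * f0))    # e.g. fs=44100, f0=261.6 Hz -> Kcc = 84
    ccs = np.asarray(ccs)[:k_cc]
    return np.pad(ccs, (0, max_dim - len(ccs)))  # fixed 91-d network input
```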
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1836
Network Architecture
CCs
f0
En
cod
er
Z
Dec
od
er
CVAE
I Network input is CCs rarr MSE represents perceptually relevantdistance in terms of squared error between the input andreconstructed log magnitude spectral envelopes
I We train the network on frames from the train instances
I For evaluation MSE values calculated ahead is the averagereconstruction error across all the test instance frames
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
L prop MSE + βKLD
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2136
Network Architecture
I Network size [91 91 32 91 91]I 91 is the dimension of input CCs 32 is latent space
dimensionality
I Linear Fully Connected Layers + Leaky ReLU activations
I ADAM [Kingma and Ba 2014] with initial lr = 10minus3
I Training for 2000 epochs with batch size 512
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2436
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAE
I Both AE and CVAE reasonably reconstruct the target pitch1 6 2 7 3 8
I CVAE produces better reconstruction especially when thetarget pitch is far from the pitches available in the trainingdata(plot above) 4 9 5 10 6 11
I Conditioning helps to capture the pitch dependency of thespectral envelope more accurately
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton8)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton9)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton10)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton11)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton12)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton13)ocgs[i]state=false
1536
2536
Reconstruction
I To emulate the effect of pitch conditioning with an AE wetrain the AE by appending the pitch to the input CCs andreconstructing this appended input as shown below
f0
CCscAE
fprime
0
CCsprime
Figure lsquoConditionalrsquo AE(cAE)
I [Wyse 2018] followed a similar approach of appending theconditional variables to the input of his model
I the lsquocAErsquo is comparable to our proposed CVAE in that thenetwork might potentially learn something from f0
2636
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAEcAE
I We train the cAE on the octave endpoints and evaluateperformance as shown above
I Appending f0 does not improve performance It is infact worsethan AE
2736
Generation
I We are interested in lsquosynthesizingrsquo audio
I We see how well the network can generate an instance of adesired pitch(which the network has not been trained on)
Z
Dec
od
er
f0
Figure Sampling from the Network
I We follow the procedure in [Blaauw and Bonada 2016] - arandom walk to sample points coherently from the latent space
2836
Generation
I We train on instances across the entire octave sans MIDI 65and then generate MIDI 65
I On listening we can see that the generated audio still lacks thesoft noisy sound of the violin bowing
1 12 2 13 3 14
I For a more realistic synthesis we generate a violin note withvibrato 4 15
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton14)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton15)ocgs[i]state=false
2256
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton16)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton17)ocgs[i]state=false
2184
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3136
References I
[Blaauw and Bonada 2016] Blaauw M and Bonada J (2016)Modeling and transforming speech using variational autoencodersIn Interspeech pages 1770ndash1774
[Caetano and Rodet 2012] Caetano M and Rodet X (2012)A source-filter model for musical instrument sound transformationIn 2012 IEEE International Conference on Acoustics Speech and Signal Processing(ICASSP) pages 137ndash140 IEEE
[Doersch 2016] Doersch C (2016)Tutorial on variational autoencodersarXiv preprint arXiv160605908
[Engel et al 2017] Engel J Resnick C Roberts A Dieleman S Norouzi MEck D and Simonyan K (2017)Neural audio synthesis of musical notes with wavenet autoencodersIn Proceedings of the 34th International Conference on Machine Learning-Volume70 pages 1068ndash1077 JMLR org
[Esling et al 2018] Esling P Bitton A et al (2018)Generative timbre spaces regularizing variational auto-encoders with perceptualmetricsarXiv preprint arXiv180508501
3236
References II
[Griffin and Lim 1984] Griffin D and Lim J (1984)Signal estimation from modified short-time fourier transformIEEE Transactions on Acoustics Speech and Signal Processing 32(2)236ndash243
[Hinton and Salakhutdinov 2006] Hinton G E and Salakhutdinov R R (2006)Reducing the dimensionality of data with neural networksscience 313(5786)504ndash507
[IMAI 1979] IMAI S (1979)Spectral envelope extraction by improved cepstrumIEICE 62217ndash228
[Kingma and Ba 2014] Kingma D P and Ba J (2014)Adam A method for stochastic optimizationarXiv preprint arXiv14126980
[Kingma and Welling 2013] Kingma D P and Welling M (2013)Auto-encoding variational bayesarXiv preprint arXiv13126114
[Oord et al 2016] Oord A v d Dieleman S Zen H Simonyan K Vinyals OGraves A Kalchbrenner N Senior A and Kavukcuoglu K (2016)Wavenet A generative model for raw audioarXiv preprint arXiv160903499
3336
References III
[Roche et al 2018] Roche F Hueber T Limier S and Girin L (2018)Autoencoders for music sound modeling a comparison of linear shallow deeprecurrent and variational modelsarXiv preprint arXiv180604096
[Roebel and Rodet 2005] Roebel A and Rodet X (2005)Efficient Spectral Envelope Estimation and its application to pitch shifting andenvelope preservationIn International Conference on Digital Audio Effects pages 30ndash35 Madrid Spaincote interne IRCAM Roebel05b
[Romani Picas et al 2015] Romani Picas O Parra Rodriguez H Dabiri DTokuda H Hariya W Oishi K and Serra X (2015)A real-time system for measuring sound goodness in instrumental soundsIn Audio Engineering Society Convention 138 Audio Engineering Society
[Sarroff and Casey 2014] Sarroff A M and Casey M A (2014)Musical audio synthesis using autoencoding neural netsIn ICMC
[Serra et al 1997] Serra X et al (1997)Musical sound modeling with sinusoids plus noiseMusical signal processing pages 91ndash122
3436
References IV
[Slawson 1981] Slawson W (1981)The color of sound a theoretical study in musical timbreMusic Theory Spectrum 3132ndash141
[Sohn et al 2015] Sohn K Lee H and Yan X (2015)Learning structured output representation using deep conditional generative models
In Advances in neural information processing systems pages 3483ndash3491
[Wyse 2018] Wyse L (2018)Real-valued parametric conditioning of an rnn for interactive sound synthesisarXiv preprint arXiv180510808
3536
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model Reconstruction(trained on MIDI63)
3 Spectral Model Reconstruction(not trained on MIDI63)
4 Input MIDI 60 note to Parametric Model
5 Parametric Reconstruction of input note
6 Input MIDI 63 Note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note(endpoint trained model)
10 Parametric AE reconstruction of input(endpoint trained model)
11 Parametric CVAE reconstruction of input(endpoint trained model)
12 CVAE Generated MIDI 65 Violin note
13 Similar MIDI 65 Violin note from dataset
3636
Audio examples description II
14 CVAE Reconstruction of the MIDI 65 violin note
15 CVAE Generated MIDI 65 Violin note with vibrato
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1336
Parametric Model
I No open source implementation available for the TAE thus weimplemented it following procedure highlighted in[Roebel and Rodet 2005 Caetano and Rodet 2012]
I Figure shows a TAE snap from [Caetano and Rodet 2012]
I Similar to results we get 1 4 2 5
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton6)ocgs[i]state=false
3216
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton7)ocgs[i]state=false
2256
1436
Parametric Model
0 1 2 3 4 5minus80
minus60
minus40
minus20
Frequency(kHz)Magnitude(dB)
606571
I Spectral envelope shape varies across pitch
1 Dependence of envelope on pitch[Slawson 1981 Caetano and Rodet 2012]
2 Variation due the TAE algorithm
I TAE rarr smooth function to estimate harmonic amplitudes
1536
Parametric Model
I No of CCs(Cepstral CoefficientsKcc) depends on Samplingrate(Fs)pitch(f0)
Kcc leFs2f0
I For our network we choose the maximum Kcc(lowest pitch) =91 and zero pad the high pitch Kcc to 91 dimension vectors
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1836
Network Architecture
CCs
f0
En
cod
er
Z
Dec
od
er
CVAE
I Network input is CCs rarr MSE represents perceptually relevantdistance in terms of squared error between the input andreconstructed log magnitude spectral envelopes
I We train the network on frames from the train instances
I For evaluation MSE values calculated ahead is the averagereconstruction error across all the test instance frames
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
L prop MSE + βKLD
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - governs the network's reconstruction ability
[Figure: reconstruction MSE (·10⁻⁵) vs latent space dimensionality, CVAE (β = 0.1) vs AE]
I Steep fall initially, flatter later; we choose dimensionality = 32
Network Architecture
I Network size: [91, 91, 32, 91, 91]
I 91 is the dimension of the input CCs; 32 is the latent space dimensionality
I Linear fully connected layers + Leaky ReLU activations
I ADAM [Kingma and Ba 2014] with initial lr = 10⁻³
I Training for 2000 epochs with batch size 512; a training-loop sketch follows
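Putting those settings together, a minimal training-loop sketch; `train_ccs` and `train_f0` are hypothetical tensors of frame-wise CCs and pitch labels assumed to be prepared elsewhere:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

model = CVAE()  # from the sketch above
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loader = DataLoader(TensorDataset(train_ccs, train_f0),  # hypothetical prepared tensors
                    batch_size=512, shuffle=True)

for epoch in range(2000):
    for ccs, f0 in loader:
        x_hat, mu, logvar = model(ccs, f0)
        loss = cvae_loss(ccs, x_hat, mu, logvar, beta=0.1)
        opt.zero_grad()
        loss.backward()
        opt.step()
```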
Experiments
I Two kinds of experiments to demonstrate the network's capabilities:
1 Reconstruction - omit pitch instances during training and see how well the model reconstructs notes of the omitted target pitch
2 Generation - how well the model 'synthesizes' note instances with new, unseen pitches
Reconstruction
I Two training contexts (sketched in code below) -
1 Neighbouring pitches: T is the target (✗ = omitted, ✓ = kept as training instances)
MIDI: T−3  T−2  T−1  T   T+1  T+2  T+3
Kept:  ✓    ✓    ✓   ✗    ✓    ✓    ✓
2 Octave endpoints:
MIDI: 60  61  62  63  64  65  66  67  68  69  70  71
Kept:  ✓   ✗   ✗   ✗   ✗   ✗   ✗   ✗   ✗   ✗   ✗   ✓
I In each of the above cases, we compute the MSE as the frame-wise spectral envelope match across all frames of all the target instances
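A small sketch of the two splits; the helper names are hypothetical:

```python
def neighbour_context(target):
    """Train on the six neighbours T±1..3; the target pitch T is held out."""
    return [target + d for d in (-3, -2, -1, 1, 2, 3)]

def endpoint_context(octave=range(60, 72)):
    """Train only on the octave endpoints, MIDI 60 and 71."""
    return [m for m in octave if m in (60, 71)]
```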
Reconstruction
[Figure: reconstruction MSE (·10⁻²) vs target pitch (MIDI 60-72), AE vs CVAE]
I Both AE and CVAE reasonably reconstruct the target pitch (audio examples 6, 7, 8)
I CVAE produces better reconstruction, especially when the target pitch is far from the pitches available in the training data (plot above; audio examples 9, 10, 11)
I Conditioning helps to capture the pitch dependency of the spectral envelope more accurately
Reconstruction
I To emulate the effect of pitch conditioning with an AE, we train the AE by appending the pitch to the input CCs and reconstructing this appended input, as shown below
[Diagram: (CCs, f0) → cAE → (CCs′, f0′)]
Figure: 'Conditional' AE (cAE)
I [Wyse 2018] followed a similar approach of appending the conditional variables to the input of his model
I The 'cAE' is comparable to our proposed CVAE in that the network might potentially learn something from f0; a sketch follows
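A hedged sketch of this baseline, assuming PyTorch; the sizes mirror the CVAE sketch above:

```python
import torch
import torch.nn as nn

class ConditionalAE(nn.Module):
    """Plain AE that reconstructs the CC frame with f0 appended to it."""
    def __init__(self, n_ccs=91, latent=32):
        super().__init__()
        d = n_ccs + 1  # CCs plus the appended pitch
        self.enc = nn.Sequential(nn.Linear(d, 91), nn.LeakyReLU(),
                                 nn.Linear(91, latent))
        self.dec = nn.Sequential(nn.Linear(latent, 91), nn.LeakyReLU(),
                                 nn.Linear(91, d))

    def forward(self, ccs, f0):
        x = torch.cat([ccs, f0], dim=-1)  # append pitch to the input
        return self.dec(self.enc(x)), x   # reconstruct [CCs, f0] jointly
```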
Reconstruction
[Figure: reconstruction MSE (·10⁻²) vs target pitch (MIDI 60-72), AE vs CVAE vs cAE]
I We train the cAE on the octave endpoints and evaluate performance as shown above
I Appending f0 does not improve performance; it is in fact worse than the plain AE
Generation
I We are interested in 'synthesizing' audio
I We see how well the network can generate an instance of a desired pitch (one the network has not been trained on)
[Diagram: (Z, f0) → Decoder]
Figure: Sampling from the network
I We follow the procedure in [Blaauw and Bonada 2016] - a random walk to sample points coherently from the latent space, as sketched below
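A minimal sketch of that sampling procedure, reusing the CVAE sketch above; the step size is an assumed free parameter:

```python
import torch

@torch.no_grad()
def generate_frames(model, f0_frames, step=0.05, latent=32):
    """Decode a coherent frame sequence by random-walking the latent space."""
    z = torch.randn(latent)                  # start from the prior
    frames = []
    for f0 in f0_frames:                     # one latent point per frame
        z = z + step * torch.randn(latent)   # small drift keeps frames coherent
        frames.append(model.dec(torch.cat([z, f0.view(1)], dim=-1)))
    return torch.stack(frames)               # frame-wise CCs for synthesis
```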
Generation
I We train on instances across the entire octave sans MIDI 65, and then generate MIDI 65
I On listening, we find that the generated audio still lacks the soft noisy sound of the violin bowing (audio examples 12, 13, 14)
I For a more realistic synthesis, we generate a violin note with vibrato (audio example 15); a pitch-contour sketch follows
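One way to drive such an example is a sinusoidal f0 contour around the target pitch; the vibrato rate and depth here are illustrative assumptions (MIDI 65 ≈ 349.23 Hz):

```python
import numpy as np

def vibrato_f0(f0_hz=349.23, rate_hz=6.0, depth_cents=30.0,
               n_frames=200, frame_rate=100.0):
    """Frame-wise f0 contour with sinusoidal vibrato around f0_hz."""
    t = np.arange(n_frames) / frame_rate
    cents = depth_cents * np.sin(2 * np.pi * rate_hz * t)
    return f0_hz * 2.0 ** (cents / 1200.0)  # cents deviation to Hz
```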
Putting it all together
I We explored autoencoder frameworks as generative models for audio synthesis of instrumental tones
I Our parametric representation decouples 'timbre' and 'pitch', relying on the network to model their inter-dependencies
I Pitch conditioning allows us to generate the learnt spectral envelope for a given pitch, enabling us to vary the pitch contour continuously
Putting it all together
I Our model is still far from perfect, however, and we have to improve it further:
1 An immediate step is to model the residual of the input audio; this will help make the generated audio sound more natural
2 We have not taken dynamics into account; a more complete system will capture the spectral envelope (timbral) dependencies on both pitch and loudness dynamics
3 We should conduct more formal listening tests, involving synthesis of larger pitch movements or melodic elements from Indian Classical music
I Our contributions:
1 To the best of our knowledge, no prior work has used a parametric model for musical tones in the neural synthesis framework, especially exploiting the conditioning function of the CVAE
2 The TAE method has no known Python implementation, so we plan to make our code open-source for the MIR research community
References

[Blaauw and Bonada 2016] Blaauw, M. and Bonada, J. (2016). Modeling and transforming speech using variational autoencoders. In Interspeech, pages 1770-1774.
[Caetano and Rodet 2012] Caetano, M. and Rodet, X. (2012). A source-filter model for musical instrument sound transformation. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 137-140. IEEE.
[Doersch 2016] Doersch, C. (2016). Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908.
[Engel et al 2017] Engel, J., Resnick, C., Roberts, A., Dieleman, S., Norouzi, M., Eck, D., and Simonyan, K. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1068-1077. JMLR.org.
[Esling et al 2018] Esling, P., Bitton, A., et al. (2018). Generative timbre spaces: regularizing variational auto-encoders with perceptual metrics. arXiv preprint arXiv:1805.08501.
[Griffin and Lim 1984] Griffin, D. and Lim, J. (1984). Signal estimation from modified short-time Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2):236-243.
[Hinton and Salakhutdinov 2006] Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504-507.
[Imai 1979] Imai, S. (1979). Spectral envelope extraction by improved cepstrum. IEICE, 62:217-228.
[Kingma and Ba 2014] Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[Kingma and Welling 2013] Kingma, D. P. and Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
[Oord et al 2016] Oord, A. v. d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., and Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.
[Roche et al 2018] Roche, F., Hueber, T., Limier, S., and Girin, L. (2018). Autoencoders for music sound modeling: a comparison of linear, shallow, deep, recurrent and variational models. arXiv preprint arXiv:1806.04096.
[Roebel and Rodet 2005] Roebel, A. and Rodet, X. (2005). Efficient spectral envelope estimation and its application to pitch shifting and envelope preservation. In International Conference on Digital Audio Effects, pages 30-35, Madrid, Spain.
[Romani Picas et al 2015] Romani Picas, O., Parra Rodriguez, H., Dabiri, D., Tokuda, H., Hariya, W., Oishi, K., and Serra, X. (2015). A real-time system for measuring sound goodness in instrumental sounds. In Audio Engineering Society Convention 138. Audio Engineering Society.
[Sarroff and Casey 2014] Sarroff, A. M. and Casey, M. A. (2014). Musical audio synthesis using autoencoding neural nets. In ICMC.
[Serra et al 1997] Serra, X. et al. (1997). Musical sound modeling with sinusoids plus noise. Musical Signal Processing, pages 91-122.
[Slawson 1981] Slawson, W. (1981). The color of sound: a theoretical study in musical timbre. Music Theory Spectrum, 3:132-141.
[Sohn et al 2015] Sohn, K., Lee, H., and Yan, X. (2015). Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems, pages 3483-3491.
[Wyse 2018] Wyse, L. (2018). Real-valued parametric conditioning of an RNN for interactive sound synthesis. arXiv preprint arXiv:1805.10808.
Audio examples description
1 Input MIDI 63 to spectral model
2 Spectral model reconstruction (trained on MIDI 63)
3 Spectral model reconstruction (not trained on MIDI 63)
4 Input MIDI 60 note to parametric model
5 Parametric reconstruction of input note
6 Input MIDI 63 note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note (endpoint-trained model)
10 Parametric AE reconstruction of input (endpoint-trained model)
11 Parametric CVAE reconstruction of input (endpoint-trained model)
12 CVAE-generated MIDI 65 violin note
13 Similar MIDI 65 violin note from dataset
14 CVAE reconstruction of the MIDI 65 violin note
15 CVAE-generated MIDI 65 violin note with vibrato
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1836
Network Architecture
CCs
f0
En
cod
er
Z
Dec
od
er
CVAE
I Network input is CCs rarr MSE represents perceptually relevantdistance in terms of squared error between the input andreconstructed log magnitude spectral envelopes
I We train the network on frames from the train instances
I For evaluation MSE values calculated ahead is the averagereconstruction error across all the test instance frames
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
L prop MSE + βKLD
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2136
Network Architecture
I Network size [91 91 32 91 91]I 91 is the dimension of input CCs 32 is latent space
dimensionality
I Linear Fully Connected Layers + Leaky ReLU activations
I ADAM [Kingma and Ba 2014] with initial lr = 10minus3
I Training for 2000 epochs with batch size 512
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2436
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAE
I Both AE and CVAE reasonably reconstruct the target pitch1 6 2 7 3 8
I CVAE produces better reconstruction especially when thetarget pitch is far from the pitches available in the trainingdata(plot above) 4 9 5 10 6 11
I Conditioning helps to capture the pitch dependency of thespectral envelope more accurately
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton8)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton9)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton10)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton11)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton12)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton13)ocgs[i]state=false
1536
2536
Reconstruction
I To emulate the effect of pitch conditioning with an AE wetrain the AE by appending the pitch to the input CCs andreconstructing this appended input as shown below
f0
CCscAE
fprime
0
CCsprime
Figure lsquoConditionalrsquo AE(cAE)
I [Wyse 2018] followed a similar approach of appending theconditional variables to the input of his model
I the lsquocAErsquo is comparable to our proposed CVAE in that thenetwork might potentially learn something from f0
2636
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAEcAE
I We train the cAE on the octave endpoints and evaluateperformance as shown above
I Appending f0 does not improve performance It is infact worsethan AE
2736
Generation
I We are interested in lsquosynthesizingrsquo audio
I We see how well the network can generate an instance of adesired pitch(which the network has not been trained on)
Z
Dec
od
er
f0
Figure Sampling from the Network
I We follow the procedure in [Blaauw and Bonada 2016] - arandom walk to sample points coherently from the latent space
2836
Generation
I We train on instances across the entire octave sans MIDI 65and then generate MIDI 65
I On listening we can see that the generated audio still lacks thesoft noisy sound of the violin bowing
1 12 2 13 3 14
I For a more realistic synthesis we generate a violin note withvibrato 4 15
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton14)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton15)ocgs[i]state=false
2256
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton16)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton17)ocgs[i]state=false
2184
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3136
References I
[Blaauw and Bonada 2016] Blaauw M and Bonada J (2016)Modeling and transforming speech using variational autoencodersIn Interspeech pages 1770ndash1774
[Caetano and Rodet 2012] Caetano M and Rodet X (2012)A source-filter model for musical instrument sound transformationIn 2012 IEEE International Conference on Acoustics Speech and Signal Processing(ICASSP) pages 137ndash140 IEEE
[Doersch 2016] Doersch C (2016)Tutorial on variational autoencodersarXiv preprint arXiv160605908
[Engel et al 2017] Engel J Resnick C Roberts A Dieleman S Norouzi MEck D and Simonyan K (2017)Neural audio synthesis of musical notes with wavenet autoencodersIn Proceedings of the 34th International Conference on Machine Learning-Volume70 pages 1068ndash1077 JMLR org
[Esling et al 2018] Esling P Bitton A et al (2018)Generative timbre spaces regularizing variational auto-encoders with perceptualmetricsarXiv preprint arXiv180508501
3236
References II
[Griffin and Lim 1984] Griffin D and Lim J (1984)Signal estimation from modified short-time fourier transformIEEE Transactions on Acoustics Speech and Signal Processing 32(2)236ndash243
[Hinton and Salakhutdinov 2006] Hinton G E and Salakhutdinov R R (2006)Reducing the dimensionality of data with neural networksscience 313(5786)504ndash507
[IMAI 1979] IMAI S (1979)Spectral envelope extraction by improved cepstrumIEICE 62217ndash228
[Kingma and Ba 2014] Kingma D P and Ba J (2014)Adam A method for stochastic optimizationarXiv preprint arXiv14126980
[Kingma and Welling 2013] Kingma D P and Welling M (2013)Auto-encoding variational bayesarXiv preprint arXiv13126114
[Oord et al 2016] Oord A v d Dieleman S Zen H Simonyan K Vinyals OGraves A Kalchbrenner N Senior A and Kavukcuoglu K (2016)Wavenet A generative model for raw audioarXiv preprint arXiv160903499
3336
References III
[Roche et al 2018] Roche F Hueber T Limier S and Girin L (2018)Autoencoders for music sound modeling a comparison of linear shallow deeprecurrent and variational modelsarXiv preprint arXiv180604096
[Roebel and Rodet 2005] Roebel A and Rodet X (2005)Efficient Spectral Envelope Estimation and its application to pitch shifting andenvelope preservationIn International Conference on Digital Audio Effects pages 30ndash35 Madrid Spaincote interne IRCAM Roebel05b
[Romani Picas et al 2015] Romani Picas O Parra Rodriguez H Dabiri DTokuda H Hariya W Oishi K and Serra X (2015)A real-time system for measuring sound goodness in instrumental soundsIn Audio Engineering Society Convention 138 Audio Engineering Society
[Sarroff and Casey 2014] Sarroff A M and Casey M A (2014)Musical audio synthesis using autoencoding neural netsIn ICMC
[Serra et al 1997] Serra X et al (1997)Musical sound modeling with sinusoids plus noiseMusical signal processing pages 91ndash122
3436
References IV
[Slawson 1981] Slawson W (1981)The color of sound a theoretical study in musical timbreMusic Theory Spectrum 3132ndash141
[Sohn et al 2015] Sohn K Lee H and Yan X (2015)Learning structured output representation using deep conditional generative models
In Advances in neural information processing systems pages 3483ndash3491
[Wyse 2018] Wyse L (2018)Real-valued parametric conditioning of an rnn for interactive sound synthesisarXiv preprint arXiv180510808
3536
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model Reconstruction(trained on MIDI63)
3 Spectral Model Reconstruction(not trained on MIDI63)
4 Input MIDI 60 note to Parametric Model
5 Parametric Reconstruction of input note
6 Input MIDI 63 Note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note(endpoint trained model)
10 Parametric AE reconstruction of input(endpoint trained model)
11 Parametric CVAE reconstruction of input(endpoint trained model)
12 CVAE Generated MIDI 65 Violin note
13 Similar MIDI 65 Violin note from dataset
3636
Audio examples description II
14 CVAE Reconstruction of the MIDI 65 violin note
15 CVAE Generated MIDI 65 Violin note with vibrato
1136
Parametric Model
1 Frame-wise magnitude spectrum rarr harmonic representationusing Harmonic plus Residual(HpR) model[Serra et al 1997](currently we neglect the residual)
Sustain
FFTHpR
f0Magnitudes
I Output of HpR block =rArr log-dB magnitudes + harmonics
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1336
Parametric Model
I No open source implementation available for the TAE thus weimplemented it following procedure highlighted in[Roebel and Rodet 2005 Caetano and Rodet 2012]
I Figure shows a TAE snap from [Caetano and Rodet 2012]
I Similar to results we get 1 4 2 5
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton6)ocgs[i]state=false
3216
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton7)ocgs[i]state=false
2256
1436
Parametric Model
0 1 2 3 4 5minus80
minus60
minus40
minus20
Frequency(kHz)Magnitude(dB)
606571
I Spectral envelope shape varies across pitch
1 Dependence of envelope on pitch[Slawson 1981 Caetano and Rodet 2012]
2 Variation due the TAE algorithm
I TAE rarr smooth function to estimate harmonic amplitudes
1536
Parametric Model
I No of CCs(Cepstral CoefficientsKcc) depends on Samplingrate(Fs)pitch(f0)
Kcc leFs2f0
I For our network we choose the maximum Kcc(lowest pitch) =91 and zero pad the high pitch Kcc to 91 dimension vectors
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1836
Network Architecture
CCs
f0
En
cod
er
Z
Dec
od
er
CVAE
I Network input is CCs rarr MSE represents perceptually relevantdistance in terms of squared error between the input andreconstructed log magnitude spectral envelopes
I We train the network on frames from the train instances
I For evaluation MSE values calculated ahead is the averagereconstruction error across all the test instance frames
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
L prop MSE + βKLD
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2136
Network Architecture
I Network size [91 91 32 91 91]I 91 is the dimension of input CCs 32 is latent space
dimensionality
I Linear Fully Connected Layers + Leaky ReLU activations
I ADAM [Kingma and Ba 2014] with initial lr = 10minus3
I Training for 2000 epochs with batch size 512
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
Reconstruction
Figure: MSE (·10⁻²) vs target pitch (MIDI 60–72) for AE and CVAE
I Both the AE and the CVAE reasonably reconstruct the target pitch (audio examples 6, 7, 8)
I The CVAE produces a better reconstruction, especially when the target pitch is far from the pitches available in the training data (plot above; audio examples 9, 10, 11)
I Conditioning helps capture the pitch dependency of the spectral envelope more accurately
Reconstruction
I To emulate the effect of pitch conditioning with an AE, we train the AE by appending the pitch to the input CCs and reconstructing this appended input, as shown below
Figure: 'Conditional' AE (cAE): (CCs, f0) → cAE → (CCs′, f0′)
I [Wyse 2018] followed a similar approach of appending the conditional variables to the input of his model
I The cAE is comparable to our proposed CVAE in that the network might potentially learn something from f0 (a sketch follows below)
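A one-function sketch of how the appended input/target pair could be formed (names are ours):

```python
import torch

def make_cae_pair(ccs, f0):
    # Append f0 to the 91-d CC vector; the cAE autoencodes the whole
    # 92-d vector: [CCs, f0] -> [CCs', f0']
    x = torch.cat([ccs, f0.unsqueeze(-1)], dim=-1)
    return x, x  # input and reconstruction target are identical
```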
Reconstruction
Figure: MSE (·10⁻²) vs target pitch (MIDI 60–72) for AE, CVAE, and cAE
I We train the cAE on the octave endpoints and evaluate performance as shown above
I Appending f0 does not improve performance; in fact, the cAE is worse than the plain AE
Generation
I We are interested in 'synthesizing' audio
I We check how well the network can generate an instance of a desired pitch (one the network has not been trained on)
Figure: Sampling from the network: Z → Decoder (conditioned on f0)
I We follow the procedure in [Blaauw and Bonada 2016]: a random walk to sample points coherently from the latent space, as sketched below
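A sketch of such a latent random walk (the step size is our assumption; [Blaauw and Bonada 2016] describe the general procedure):

```python
import torch

def random_walk_latents(n_frames, z_dim=32, step=0.05):
    # Small Gaussian steps keep consecutive latent points, and hence
    # consecutive synthesized envelopes, coherent in time
    z = torch.empty(n_frames, z_dim)
    z[0] = torch.randn(z_dim)
    for t in range(1, n_frames):
        z[t] = z[t - 1] + step * torch.randn(z_dim)
    return z

# e.g. decode with a fixed pitch condition, using the CVAE sketched earlier:
# f0 = torch.full((n_frames, 1), 65.0)
# envelopes = model.dec(torch.cat([random_walk_latents(n_frames), f0], dim=-1))
```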
Generation
I We train on instances across the entire octave sans MIDI 65, and then generate MIDI 65
I On listening, we find that the generated audio still lacks the soft, noisy sound of the violin bowing (audio examples 12, 13, 14)
I For a more realistic synthesis, we generate a violin note with vibrato (audio example 15)
Putting it all together
I We explored autoencoder frameworks as generative models for audio synthesis of instrumental tones
I Our parametric representation decouples 'timbre' and 'pitch', relying on the network to model their inter-dependencies
I Pitch conditioning allows us to generate the learnt spectral envelope for a given pitch, enabling us to vary the pitch contour continuously
Putting it all together
I Our model is still far from perfect, and we have to improve it further:
1 An immediate next step is to model the residual of the input audio; this will make the generated audio sound more natural
2 We have not taken dynamics into account; a more complete system would capture spectral-envelope (timbral) dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests, involving synthesis of larger pitch movements or melodic elements from Indian Classical music
I Our contributions:
1 To the best of our knowledge, we have not come across any work using a parametric model for musical tones in the neural synthesis framework, especially one exploiting the conditioning function of the CVAE
2 The TAE method has no known Python implementation, so we plan to make our code open-source for the MIR research community
References I
[Blaauw and Bonada 2016] Blaauw, M. and Bonada, J. (2016). Modeling and transforming speech using variational autoencoders. In Interspeech, pages 1770–1774.
[Caetano and Rodet 2012] Caetano, M. and Rodet, X. (2012). A source-filter model for musical instrument sound transformation. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 137–140. IEEE.
[Doersch 2016] Doersch, C. (2016). Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908.
[Engel et al 2017] Engel, J., Resnick, C., Roberts, A., Dieleman, S., Norouzi, M., Eck, D., and Simonyan, K. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1068–1077. JMLR.org.
[Esling et al 2018] Esling, P., Bitton, A., et al. (2018). Generative timbre spaces: regularizing variational auto-encoders with perceptual metrics. arXiv preprint arXiv:1805.08501.
References II
[Griffin and Lim 1984] Griffin, D. and Lim, J. (1984). Signal estimation from modified short-time Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2):236–243.
[Hinton and Salakhutdinov 2006] Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507.
[IMAI 1979] Imai, S. (1979). Spectral envelope extraction by improved cepstrum. IEICE, 62:217–228.
[Kingma and Ba 2014] Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[Kingma and Welling 2013] Kingma, D. P. and Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
[Oord et al 2016] Oord, A. v. d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., and Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.
References III
[Roche et al 2018] Roche, F., Hueber, T., Limier, S., and Girin, L. (2018). Autoencoders for music sound modeling: a comparison of linear, shallow, deep, recurrent and variational models. arXiv preprint arXiv:1806.04096.
[Roebel and Rodet 2005] Roebel, A. and Rodet, X. (2005). Efficient spectral envelope estimation and its application to pitch shifting and envelope preservation. In International Conference on Digital Audio Effects, pages 30–35, Madrid, Spain.
[Romani Picas et al 2015] Romani Picas, O., Parra Rodriguez, H., Dabiri, D., Tokuda, H., Hariya, W., Oishi, K., and Serra, X. (2015). A real-time system for measuring sound goodness in instrumental sounds. In Audio Engineering Society Convention 138. Audio Engineering Society.
[Sarroff and Casey 2014] Sarroff, A. M. and Casey, M. A. (2014). Musical audio synthesis using autoencoding neural nets. In ICMC.
[Serra et al 1997] Serra, X. et al. (1997). Musical sound modeling with sinusoids plus noise. Musical Signal Processing, pages 91–122.
References IV
[Slawson 1981] Slawson, W. (1981). The color of sound: a theoretical study in musical timbre. Music Theory Spectrum, 3:132–141.
[Sohn et al 2015] Sohn, K., Lee, H., and Yan, X. (2015). Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems, pages 3483–3491.
[Wyse 2018] Wyse, L. (2018). Real-valued parametric conditioning of an RNN for interactive sound synthesis. arXiv preprint arXiv:1805.10808.
Audio examples description I
1 Input MIDI 63 to spectral model
2 Spectral-model reconstruction (trained on MIDI 63)
3 Spectral-model reconstruction (not trained on MIDI 63)
4 Input MIDI 60 note to parametric model
5 Parametric reconstruction of input note
6 Input MIDI 63 note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note (endpoint-trained model)
10 Parametric AE reconstruction of input (endpoint-trained model)
11 Parametric CVAE reconstruction of input (endpoint-trained model)
12 CVAE-generated MIDI 65 violin note
13 Similar MIDI 65 violin note from dataset
Audio examples description II
14 CVAE reconstruction of the MIDI 65 violin note
15 CVAE-generated MIDI 65 violin note with vibrato
Parametric Model
2 log-dB magnitudes + harmonics → TAE algorithm [Roebel and Rodet 2005, IMAI 1979]
I TAE ⇒ iterative cepstral liftering (sketched in code below):
A_0(k) = log|X(k)|,  V_0 = −∞
while (A_i − A_0 < ∆):
    A_i(k) = max(A_{i−1}(k), V_{i−1}(k))
    V_i = FFT(lifter(A_i))
Figure: (f0, magnitudes) → TAE → (CCs, f0)
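A minimal NumPy sketch of this loop (the stopping tolerance, lifter layout, and function name are our own choices; the slides give only the update equations):

```python
import numpy as np

def tae_envelope(mag_db, k_cc, tol_db=1.0, max_iter=100):
    """mag_db: one frame of log-magnitude spectrum (length N/2 + 1).
    k_cc: number of cepstral coefficients kept by the lifter (k_cc > 1)."""
    A0 = np.asarray(mag_db, dtype=float)
    A = A0.copy()
    V = np.full_like(A, -np.inf)               # V_0 = -inf
    cc = np.zeros(k_cc)
    for _ in range(max_iter):
        A = np.maximum(A, V)                   # A_i(k) = max(A_{i-1}(k), V_{i-1}(k))
        cep = np.fft.irfft(A)                  # real cepstrum of the current A_i
        lifter = np.zeros_like(cep)
        lifter[:k_cc] = 1.0                    # keep low quefrencies...
        lifter[-(k_cc - 1):] = 1.0             # ...and their symmetric counterpart
        cc = cep[:k_cc]
        V = np.fft.rfft(cep * lifter).real     # V_i = FFT(lifter(A_i))
        if np.max(A0 - V) < tol_db:            # stop once V covers all peaks
            break
    return V, cc                               # smooth envelope and its CCs
```

The per-frame CCs returned by such a loop, zero-padded as described later, would form the network input.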
Parametric Model
I No open-source implementation was available for the TAE, so we implemented it following the procedure described in [Roebel and Rodet 2005, Caetano and Rodet 2012]
I The figure shows a TAE snapshot from [Caetano and Rodet 2012]
I It is similar to the results we obtain (audio examples 4, 5)
Parametric Model
Figure: TAE spectral envelopes, magnitude (dB) vs frequency (kHz), for MIDI pitches 60, 65, and 71
I The spectral-envelope shape varies across pitch:
1 Dependence of the envelope on pitch [Slawson 1981, Caetano and Rodet 2012]
2 Variation due to the TAE algorithm
I TAE → a smooth function for estimating harmonic amplitudes
Parametric Model
I The number of CCs (cepstral coefficients, K_cc) depends on the sampling rate (F_s) and the pitch (f_0):
K_cc ≤ F_s / (2·f_0)
I For our network we choose the maximum K_cc (lowest pitch) = 91 and zero-pad the higher-pitch CC vectors to 91 dimensions (see the sketch below)
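A small sketch of the coefficient count and the zero-padding (function names are ours; the bound is the one above):

```python
import numpy as np

def num_ccs(fs, f0):
    # K_cc <= Fs / (2 * f0): roughly one coefficient per harmonic up to Nyquist
    return int(fs // (2 * f0))

def pad_ccs(cc, target_dim=91):
    # Zero-pad shorter (higher-pitch) CC vectors to the fixed network input size
    out = np.zeros(target_dim)
    n = min(len(cc), target_dim)
    out[:n] = cc[:n]
    return out
```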
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006]: minimize the MSE (mean squared error) between the input and the network's reconstruction
• Simple to train, with good reconstruction (since they minimize the MSE)
• Not truly generative, as they cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013]: inspired by variational inference; enforce a prior on the latent space
• Truly generative, as we can 'generate' new data by sampling from the prior
I Conditional Variational Autoencoders [Doersch 2016, Sohn et al 2015]: same principle as a VAE, but learn distributions conditioned on an additional conditioning variable
Generative Models
Why VAE over AE?
A continuous latent space from which we can sample points (and synthesize the corresponding audio)
Why CVAE over VAE?
Conditioning on pitch ⇒ the network captures dependencies between timbre and pitch ⇒ more accurate envelope generation + pitch control
Network Architecture
Figure: CVAE: (CCs, f0) → Encoder → Z → Decoder (conditioned on f0)
I The network input is CCs → the MSE is a perceptually relevant distance: the squared error between the input and reconstructed log-magnitude spectral envelopes
I We train the network on frames from the training instances
I For evaluation, the MSE values reported with the hyperparameter plots are the average reconstruction error across all test-instance frames
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
L prop MSE + βKLD
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2136
Network Architecture
I Network size [91 91 32 91 91]I 91 is the dimension of input CCs 32 is latent space
dimensionality
I Linear Fully Connected Layers + Leaky ReLU activations
I ADAM [Kingma and Ba 2014] with initial lr = 10minus3
I Training for 2000 epochs with batch size 512
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2436
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAE
I Both AE and CVAE reasonably reconstruct the target pitch1 6 2 7 3 8
I CVAE produces better reconstruction especially when thetarget pitch is far from the pitches available in the trainingdata(plot above) 4 9 5 10 6 11
I Conditioning helps to capture the pitch dependency of thespectral envelope more accurately
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton8)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton9)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton10)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton11)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton12)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton13)ocgs[i]state=false
1536
2536
Reconstruction
I To emulate the effect of pitch conditioning with an AE wetrain the AE by appending the pitch to the input CCs andreconstructing this appended input as shown below
f0
CCscAE
fprime
0
CCsprime
Figure lsquoConditionalrsquo AE(cAE)
I [Wyse 2018] followed a similar approach of appending theconditional variables to the input of his model
I the lsquocAErsquo is comparable to our proposed CVAE in that thenetwork might potentially learn something from f0
2636
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAEcAE
I We train the cAE on the octave endpoints and evaluateperformance as shown above
I Appending f0 does not improve performance It is infact worsethan AE
2736
Generation
I We are interested in lsquosynthesizingrsquo audio
I We see how well the network can generate an instance of adesired pitch(which the network has not been trained on)
Z
Dec
od
er
f0
Figure Sampling from the Network
I We follow the procedure in [Blaauw and Bonada 2016] - arandom walk to sample points coherently from the latent space
2836
Generation
I We train on instances across the entire octave sans MIDI 65and then generate MIDI 65
I On listening we can see that the generated audio still lacks thesoft noisy sound of the violin bowing
1 12 2 13 3 14
I For a more realistic synthesis we generate a violin note withvibrato 4 15
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton14)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton15)ocgs[i]state=false
2256
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton16)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton17)ocgs[i]state=false
2184
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3136
References I
[Blaauw and Bonada 2016] Blaauw M and Bonada J (2016)Modeling and transforming speech using variational autoencodersIn Interspeech pages 1770ndash1774
[Caetano and Rodet 2012] Caetano M and Rodet X (2012)A source-filter model for musical instrument sound transformationIn 2012 IEEE International Conference on Acoustics Speech and Signal Processing(ICASSP) pages 137ndash140 IEEE
[Doersch 2016] Doersch C (2016)Tutorial on variational autoencodersarXiv preprint arXiv160605908
[Engel et al 2017] Engel J Resnick C Roberts A Dieleman S Norouzi MEck D and Simonyan K (2017)Neural audio synthesis of musical notes with wavenet autoencodersIn Proceedings of the 34th International Conference on Machine Learning-Volume70 pages 1068ndash1077 JMLR org
[Esling et al 2018] Esling P Bitton A et al (2018)Generative timbre spaces regularizing variational auto-encoders with perceptualmetricsarXiv preprint arXiv180508501
3236
References II
[Griffin and Lim 1984] Griffin D and Lim J (1984)Signal estimation from modified short-time fourier transformIEEE Transactions on Acoustics Speech and Signal Processing 32(2)236ndash243
[Hinton and Salakhutdinov 2006] Hinton G E and Salakhutdinov R R (2006)Reducing the dimensionality of data with neural networksscience 313(5786)504ndash507
[IMAI 1979] IMAI S (1979)Spectral envelope extraction by improved cepstrumIEICE 62217ndash228
[Kingma and Ba 2014] Kingma D P and Ba J (2014)Adam A method for stochastic optimizationarXiv preprint arXiv14126980
[Kingma and Welling 2013] Kingma D P and Welling M (2013)Auto-encoding variational bayesarXiv preprint arXiv13126114
[Oord et al 2016] Oord A v d Dieleman S Zen H Simonyan K Vinyals OGraves A Kalchbrenner N Senior A and Kavukcuoglu K (2016)Wavenet A generative model for raw audioarXiv preprint arXiv160903499
3336
References III
[Roche et al 2018] Roche F Hueber T Limier S and Girin L (2018)Autoencoders for music sound modeling a comparison of linear shallow deeprecurrent and variational modelsarXiv preprint arXiv180604096
[Roebel and Rodet 2005] Roebel A and Rodet X (2005)Efficient Spectral Envelope Estimation and its application to pitch shifting andenvelope preservationIn International Conference on Digital Audio Effects pages 30ndash35 Madrid Spaincote interne IRCAM Roebel05b
[Romani Picas et al 2015] Romani Picas O Parra Rodriguez H Dabiri DTokuda H Hariya W Oishi K and Serra X (2015)A real-time system for measuring sound goodness in instrumental soundsIn Audio Engineering Society Convention 138 Audio Engineering Society
[Sarroff and Casey 2014] Sarroff A M and Casey M A (2014)Musical audio synthesis using autoencoding neural netsIn ICMC
[Serra et al 1997] Serra X et al (1997)Musical sound modeling with sinusoids plus noiseMusical signal processing pages 91ndash122
3436
References IV
[Slawson 1981] Slawson W (1981)The color of sound a theoretical study in musical timbreMusic Theory Spectrum 3132ndash141
[Sohn et al 2015] Sohn K Lee H and Yan X (2015)Learning structured output representation using deep conditional generative models
In Advances in neural information processing systems pages 3483ndash3491
[Wyse 2018] Wyse L (2018)Real-valued parametric conditioning of an rnn for interactive sound synthesisarXiv preprint arXiv180510808
3536
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model Reconstruction(trained on MIDI63)
3 Spectral Model Reconstruction(not trained on MIDI63)
4 Input MIDI 60 note to Parametric Model
5 Parametric Reconstruction of input note
6 Input MIDI 63 Note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note(endpoint trained model)
10 Parametric AE reconstruction of input(endpoint trained model)
11 Parametric CVAE reconstruction of input(endpoint trained model)
12 CVAE Generated MIDI 65 Violin note
13 Similar MIDI 65 Violin note from dataset
3636
Audio examples description II
14 CVAE Reconstruction of the MIDI 65 violin note
15 CVAE Generated MIDI 65 Violin note with vibrato
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1236
Parametric Model
2 log-dB magnitudes + harmonics rarr TAE algorithm[Roebel and Rodet 2005 IMAI 1979]
I TAE =rArr Iterative Cepstral Liftering
A0(k) = log(|X (k)|)V0 = minusinfinwhile(Ai minus A0 lt ∆)Ai (k) = max(Aiminus1(k)Viminus1(k))
Vi = FFT (lifter(Ai ))
f0Magnitudes
TAE
CCs
f0
1336
Parametric Model
I No open source implementation available for the TAE thus weimplemented it following procedure highlighted in[Roebel and Rodet 2005 Caetano and Rodet 2012]
I Figure shows a TAE snap from [Caetano and Rodet 2012]
I Similar to results we get 1 4 2 5
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton6)ocgs[i]state=false
3216
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton7)ocgs[i]state=false
2256
1436
Parametric Model
0 1 2 3 4 5minus80
minus60
minus40
minus20
Frequency(kHz)Magnitude(dB)
606571
I Spectral envelope shape varies across pitch
1 Dependence of envelope on pitch[Slawson 1981 Caetano and Rodet 2012]
2 Variation due the TAE algorithm
I TAE rarr smooth function to estimate harmonic amplitudes
1536
Parametric Model
I No of CCs(Cepstral CoefficientsKcc) depends on Samplingrate(Fs)pitch(f0)
Kcc leFs2f0
I For our network we choose the maximum Kcc(lowest pitch) =91 and zero pad the high pitch Kcc to 91 dimension vectors
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1836
Network Architecture
CCs
f0
En
cod
er
Z
Dec
od
er
CVAE
I Network input is CCs rarr MSE represents perceptually relevantdistance in terms of squared error between the input andreconstructed log magnitude spectral envelopes
I We train the network on frames from the train instances
I For evaluation MSE values calculated ahead is the averagereconstruction error across all the test instance frames
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
L prop MSE + βKLD
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2136
Network Architecture
I Network size [91 91 32 91 91]I 91 is the dimension of input CCs 32 is latent space
dimensionality
I Linear Fully Connected Layers + Leaky ReLU activations
I ADAM [Kingma and Ba 2014] with initial lr = 10minus3
I Training for 2000 epochs with batch size 512
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2436
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAE
I Both AE and CVAE reasonably reconstruct the target pitch1 6 2 7 3 8
I CVAE produces better reconstruction especially when thetarget pitch is far from the pitches available in the trainingdata(plot above) 4 9 5 10 6 11
I Conditioning helps to capture the pitch dependency of thespectral envelope more accurately
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton8)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton9)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton10)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton11)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton12)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton13)ocgs[i]state=false
1536
2536
Reconstruction
I To emulate the effect of pitch conditioning with an AE wetrain the AE by appending the pitch to the input CCs andreconstructing this appended input as shown below
f0
CCscAE
fprime
0
CCsprime
Figure lsquoConditionalrsquo AE(cAE)
I [Wyse 2018] followed a similar approach of appending theconditional variables to the input of his model
I the lsquocAErsquo is comparable to our proposed CVAE in that thenetwork might potentially learn something from f0
2636
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAEcAE
I We train the cAE on the octave endpoints and evaluateperformance as shown above
I Appending f0 does not improve performance It is infact worsethan AE
2736
Generation
I We are interested in lsquosynthesizingrsquo audio
I We see how well the network can generate an instance of adesired pitch(which the network has not been trained on)
Z
Dec
od
er
f0
Figure Sampling from the Network
I We follow the procedure in [Blaauw and Bonada 2016] - arandom walk to sample points coherently from the latent space
2836
Generation
I We train on instances across the entire octave sans MIDI 65and then generate MIDI 65
I On listening we can see that the generated audio still lacks thesoft noisy sound of the violin bowing
1 12 2 13 3 14
I For a more realistic synthesis we generate a violin note withvibrato 4 15
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton14)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton15)ocgs[i]state=false
2256
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton16)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton17)ocgs[i]state=false
2184
Putting it all together

- We explored autoencoder frameworks in generative models for audio synthesis of instrumental tones.
- Our parametric representation decouples 'timbre' and 'pitch', relying on the network to model their inter-dependencies.
- Pitch conditioning allows us to generate the learnt spectral envelope for that pitch, thus enabling us to vary the pitch contour continuously.
Putting it all together

- Our model is still far from perfect, however, and we have to improve it further:
  1. An immediate step is to model the residual of the input audio; this will help make the generated audio sound more natural.
  2. We have not taken dynamics into account; a more complete system will capture the spectral envelope (timbral) dependencies on both pitch and loudness dynamics.
  3. Conducting more formal listening tests, involving synthesis of larger pitch movements or melodic elements from Indian Classical music.
- Our contributions:
  1. To the best of our knowledge, we have not come across any work using a parametric model for musical tones in the neural synthesis framework, especially one exploiting the conditioning function of the CVAE.
  2. The TAE method has no known Python implementation, so we plan to make our code open-source to the MIR community for research.
References I

[Blaauw and Bonada, 2016] Blaauw, M. and Bonada, J. (2016). Modeling and transforming speech using variational autoencoders. In Interspeech, pages 1770-1774.

[Caetano and Rodet, 2012] Caetano, M. and Rodet, X. (2012). A source-filter model for musical instrument sound transformation. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 137-140. IEEE.

[Doersch, 2016] Doersch, C. (2016). Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908.

[Engel et al., 2017] Engel, J., Resnick, C., Roberts, A., Dieleman, S., Norouzi, M., Eck, D., and Simonyan, K. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1068-1077. JMLR.org.

[Esling et al., 2018] Esling, P., Bitton, A., et al. (2018). Generative timbre spaces: regularizing variational auto-encoders with perceptual metrics. arXiv preprint arXiv:1805.08501.
References II

[Griffin and Lim, 1984] Griffin, D. and Lim, J. (1984). Signal estimation from modified short-time Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2):236-243.

[Hinton and Salakhutdinov, 2006] Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504-507.

[Imai, 1979] Imai, S. (1979). Spectral envelope extraction by improved cepstrum. IEICE, 62:217-228.

[Kingma and Ba, 2014] Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

[Kingma and Welling, 2013] Kingma, D. P. and Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.

[Oord et al., 2016] Oord, A. v. d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., and Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.
References III

[Roche et al., 2018] Roche, F., Hueber, T., Limier, S., and Girin, L. (2018). Autoencoders for music sound modeling: a comparison of linear, shallow, deep, recurrent and variational models. arXiv preprint arXiv:1806.04096.

[Roebel and Rodet, 2005] Roebel, A. and Rodet, X. (2005). Efficient spectral envelope estimation and its application to pitch shifting and envelope preservation. In International Conference on Digital Audio Effects, pages 30-35, Madrid, Spain.

[Romani Picas et al., 2015] Romani Picas, O., Parra Rodriguez, H., Dabiri, D., Tokuda, H., Hariya, W., Oishi, K., and Serra, X. (2015). A real-time system for measuring sound goodness in instrumental sounds. In Audio Engineering Society Convention 138. Audio Engineering Society.

[Sarroff and Casey, 2014] Sarroff, A. M. and Casey, M. A. (2014). Musical audio synthesis using autoencoding neural nets. In ICMC.

[Serra et al., 1997] Serra, X. et al. (1997). Musical sound modeling with sinusoids plus noise. Musical Signal Processing, pages 91-122.
References IV

[Slawson, 1981] Slawson, W. (1981). The color of sound: a theoretical study in musical timbre. Music Theory Spectrum, 3:132-141.

[Sohn et al., 2015] Sohn, K., Lee, H., and Yan, X. (2015). Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems, pages 3483-3491.

[Wyse, 2018] Wyse, L. (2018). Real-valued parametric conditioning of an RNN for interactive sound synthesis. arXiv preprint arXiv:1805.10808.
Audio examples description I

1. Input MIDI 63 to Spectral Model
2. Spectral Model reconstruction (trained on MIDI 63)
3. Spectral Model reconstruction (not trained on MIDI 63)
4. Input MIDI 60 note to Parametric Model
5. Parametric reconstruction of input note
6. Input MIDI 63 note
7. Parametric AE reconstruction of input
8. Parametric CVAE reconstruction of input
9. Input MIDI 65 note (endpoint-trained model)
10. Parametric AE reconstruction of input (endpoint-trained model)
11. Parametric CVAE reconstruction of input (endpoint-trained model)
12. CVAE-generated MIDI 65 violin note
13. Similar MIDI 65 violin note from dataset
Audio examples description II

14. CVAE reconstruction of the MIDI 65 violin note
15. CVAE-generated MIDI 65 violin note with vibrato
Parametric Model

- log-dB magnitudes + harmonics → TAE algorithm [Roebel and Rodet 2005, Imai 1979]
- TAE ⇒ iterative cepstral liftering (see the sketch below):

    A_0(k) = log|X(k)|,  V_0 = −∞
    while (A_i − A_0 < Δ):
        A_i(k) = max(A_{i−1}(k), V_{i−1}(k))
        V_i = FFT(lifter(A_i))

  [Figure: f0 and magnitudes → TAE → CCs and f0]
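A minimal NumPy sketch of this loop, operating on a single magnitude-spectrum frame; the threshold value and function name are illustrative assumptions:

import numpy as np

def true_amplitude_envelope(X_mag, kcc, delta=0.1, max_iter=100):
    """Iterative cepstral liftering (TAE) as on the slide above.

    X_mag : magnitude spectrum |X(k)| of one frame (full FFT length)
    kcc   : number of cepstral coefficients kept by the lifter
    delta : stopping threshold in log magnitude (value assumed)
    """
    A0 = np.log(X_mag + 1e-12)            # A_0(k) = log|X(k)|
    A = A0.copy()
    V = np.full_like(A0, -np.inf)         # V_0 = -infinity
    for _ in range(max_iter):
        A = np.maximum(A, V)              # A_i(k) = max(A_{i-1}(k), V_{i-1}(k))
        c = np.fft.ifft(A).real           # cepstrum of the current envelope
        c[kcc:-kcc] = 0.0                 # lifter: keep only low-quefrency bins
        V = np.fft.fft(c).real            # V_i = FFT(lifter(A_i))
        if np.max(A - V) < delta:         # stop once the envelope covers the peaks
            break
    return V                              # smooth log-magnitude envelope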
Parametric Model

- No open-source implementation was available for the TAE, so we implemented it following the procedure highlighted in [Roebel and Rodet 2005, Caetano and Rodet 2012].
- The figure shows a TAE snapshot from [Caetano and Rodet 2012], similar to the results we get (audio examples 4, 5).
Parametric Model

[Figure: TAE spectral envelopes, magnitude (dB, −80 to −20) vs. frequency (0-5 kHz), for MIDI pitches 60, 65, and 71]

- Spectral envelope shape varies across pitch:
  1. Dependence of the envelope on pitch [Slawson 1981, Caetano and Rodet 2012]
  2. Variation due to the TAE algorithm
- TAE → a smooth function to estimate harmonic amplitudes
Parametric Model

- The number of CCs (cepstral coefficients, K_cc) depends on the sampling rate (F_s) and the pitch (f_0):

    K_cc ≤ F_s / (2 f_0)

- For our network, we choose the maximum K_cc (lowest pitch) = 91 and zero-pad the higher-pitch CC vectors to 91 dimensions, as in the sketch below.
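A small sketch of this bookkeeping, assuming NumPy; the 44.1 kHz sampling rate in the example is an assumption, not a detail from our setup:

import numpy as np

def num_ccs(fs, f0):
    # K_cc <= F_s / (2 * f_0): coefficients up to the limit set by harmonic spacing
    return int(fs // (2 * f0))

def pad_ccs(ccs, target_dim=91):
    # zero-pad a higher-pitch CC vector to the fixed 91-dim network input
    return np.pad(np.asarray(ccs, dtype=float), (0, target_dim - len(ccs)))

# e.g. MIDI 60 (~261.63 Hz) at an assumed Fs of 44.1 kHz:
print(num_ccs(44100, 261.63))   # -> 84, which pad_ccs() extends to 91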
Generative Models

- Autoencoders [Hinton and Salakhutdinov 2006]: minimize the MSE (mean squared error) between the input and the network reconstruction.
  • Simple to train, and performs good reconstruction (since they minimize MSE)
  • Not truly a generative model, as you cannot generate new data
- Variational Autoencoders [Kingma and Welling 2013]: inspired by variational inference; enforce a prior on the latent space.
  • Truly generative, as we can 'generate' new data by sampling from the prior
- Conditional Variational Autoencoders [Doersch 2016, Sohn et al. 2015]: same principle as a VAE, but learns the conditional distribution given an additional conditioning variable.
Generative Models

Why VAE over AE?
A continuous latent space from which we can sample points (and synthesize the corresponding audio).

Why CVAE over VAE?
Conditioning on pitch ⇒ the network captures dependencies between the timbre and the pitch ⇒ more accurate envelope generation + pitch control.
Network Architecture

[Figure: CVAE block diagram: CCs and f0 → Encoder → Z; Z and f0 → Decoder]

- The network input is CCs → the MSE represents a perceptually relevant distance, in terms of squared error between the input and reconstructed log-magnitude spectral envelopes.
- We train the network on frames from the training instances; a model sketch follows below.
- For evaluation, the MSE values reported ahead are the average reconstruction error across all test-instance frames.
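A minimal PyTorch sketch of this architecture, using the 91-dimensional CCs and a scalar f0 condition; the exact hidden layout is an assumption, not our released code:

import torch
import torch.nn as nn

class CVAE(nn.Module):
    """Minimal CVAE sketch; the 91/32 sizes follow the slides, while the
    precise layer arrangement here is illustrative."""
    def __init__(self, x_dim=91, z_dim=32, h_dim=91):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim + 1, h_dim), nn.LeakyReLU())
        self.mu = nn.Linear(h_dim, z_dim)
        self.logvar = nn.Linear(h_dim, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim + 1, h_dim), nn.LeakyReLU(),
                                 nn.Linear(h_dim, x_dim))

    def forward(self, ccs, f0):
        h = self.enc(torch.cat([ccs, f0], dim=-1))        # encoder sees CCs + f0
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterize
        x_hat = self.dec(torch.cat([z, f0], dim=-1))      # decoder sees z + f0
        return x_hat, mu, logvar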
Network Architecture

- Main hyperparameters:
  1. β: controls the relative weighting between reconstruction and prior enforcement:

     L ∝ MSE + β·KLD

[Figure: CVAE MSE (scale ×10⁻⁵) vs. latent space dimensionality (0-70) for β = 0.01, 0.1, 1]

- Tradeoff between both terms; we choose β = 0.1. A loss sketch follows below.
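A sketch of this weighted loss, assuming the mu/logvar outputs of the CVAE sketch above:

import torch
import torch.nn.functional as F

def cvae_loss(x_hat, x, mu, logvar, beta=0.1):
    # L ∝ MSE + β·KLD, with β trading reconstruction against prior enforcement
    mse = F.mse_loss(x_hat, x)
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return mse + beta * kld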
Network Architecture

- Main hyperparameters:
  2. Dimensionality of the latent space: governs the network's reconstruction ability.

[Figure: MSE (scale ×10⁻⁵) vs. latent space dimensionality (0-70), CVAE (β = 0.1) vs. AE]

- Steep fall initially, flatter later; we choose dimensionality = 32.
Network Architecture

- Network size: [91, 91, 32, 91, 91]
  (91 is the dimension of the input CCs; 32 is the latent space dimensionality)
- Linear fully connected layers + Leaky ReLU activations
- Adam [Kingma and Ba 2014] with initial lr = 10⁻³
- Training for 2000 epochs with batch size 512; a training-loop sketch follows below.
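A sketch of this training setup, reusing the CVAE and loss sketches above; `train_frames` is a hypothetical dataset yielding (CCs, f0) frame pairs:

import torch

model = CVAE()                                            # sketch class above
opt = torch.optim.Adam(model.parameters(), lr=1e-3)       # Adam, initial lr = 10^-3
loader = torch.utils.data.DataLoader(train_frames, batch_size=512, shuffle=True)

for epoch in range(2000):                                 # 2000 epochs, batch size 512
    for ccs, f0 in loader:
        x_hat, mu, logvar = model(ccs, f0)
        loss = cvae_loss(x_hat, ccs, mu, logvar, beta=0.1)
        opt.zero_grad()
        loss.backward()
        opt.step()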
Experiments

- Two kinds of experiments to demonstrate the network's capabilities:
  1. Reconstruction: omit pitch instances during training and see how well the model reconstructs notes of the omitted target pitch.
  2. Generation: how well the model 'synthesizes' note instances with new, unseen pitches.
[Hinton and Salakhutdinov 2006] Hinton G E and Salakhutdinov R R (2006)Reducing the dimensionality of data with neural networksscience 313(5786)504ndash507
[IMAI 1979] IMAI S (1979)Spectral envelope extraction by improved cepstrumIEICE 62217ndash228
[Kingma and Ba 2014] Kingma D P and Ba J (2014)Adam A method for stochastic optimizationarXiv preprint arXiv14126980
[Kingma and Welling 2013] Kingma D P and Welling M (2013)Auto-encoding variational bayesarXiv preprint arXiv13126114
[Oord et al 2016] Oord A v d Dieleman S Zen H Simonyan K Vinyals OGraves A Kalchbrenner N Senior A and Kavukcuoglu K (2016)Wavenet A generative model for raw audioarXiv preprint arXiv160903499
3336
References III
[Roche et al 2018] Roche F Hueber T Limier S and Girin L (2018)Autoencoders for music sound modeling a comparison of linear shallow deeprecurrent and variational modelsarXiv preprint arXiv180604096
[Roebel and Rodet 2005] Roebel A and Rodet X (2005)Efficient Spectral Envelope Estimation and its application to pitch shifting andenvelope preservationIn International Conference on Digital Audio Effects pages 30ndash35 Madrid Spaincote interne IRCAM Roebel05b
[Romani Picas et al 2015] Romani Picas O Parra Rodriguez H Dabiri DTokuda H Hariya W Oishi K and Serra X (2015)A real-time system for measuring sound goodness in instrumental soundsIn Audio Engineering Society Convention 138 Audio Engineering Society
[Sarroff and Casey 2014] Sarroff A M and Casey M A (2014)Musical audio synthesis using autoencoding neural netsIn ICMC
[Serra et al 1997] Serra X et al (1997)Musical sound modeling with sinusoids plus noiseMusical signal processing pages 91ndash122
3436
References IV
[Slawson 1981] Slawson W (1981)The color of sound a theoretical study in musical timbreMusic Theory Spectrum 3132ndash141
[Sohn et al 2015] Sohn K Lee H and Yan X (2015)Learning structured output representation using deep conditional generative models
In Advances in neural information processing systems pages 3483ndash3491
[Wyse 2018] Wyse L (2018)Real-valued parametric conditioning of an rnn for interactive sound synthesisarXiv preprint arXiv180510808
3536
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model Reconstruction(trained on MIDI63)
3 Spectral Model Reconstruction(not trained on MIDI63)
4 Input MIDI 60 note to Parametric Model
5 Parametric Reconstruction of input note
6 Input MIDI 63 Note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note(endpoint trained model)
10 Parametric AE reconstruction of input(endpoint trained model)
11 Parametric CVAE reconstruction of input(endpoint trained model)
12 CVAE Generated MIDI 65 Violin note
13 Similar MIDI 65 Violin note from dataset
3636
Audio examples description II
14 CVAE Reconstruction of the MIDI 65 violin note
15 CVAE Generated MIDI 65 Violin note with vibrato
1436
Parametric Model
Figure: Spectral envelopes, magnitude (dB) vs frequency (kHz), for MIDI pitches 60, 65 and 71
I The spectral envelope shape varies across pitch:
1. Dependence of the envelope on pitch [Slawson 1981, Caetano and Rodet 2012]
2. Variation due to the TAE algorithm
I TAE (True Amplitude Envelope) → a smooth function used to estimate the harmonic amplitudes
Parametric Model
I The number of cepstral coefficients (CCs), Kcc, depends on the sampling rate (Fs) and the pitch (f0):

Kcc ≤ Fs / (2 · f0)

I For our network, we choose the maximum Kcc (at the lowest pitch) = 91, and zero-pad the higher-pitch CC vectors to 91 dimensions (a preprocessing sketch follows)
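For illustration, a minimal sketch of this preprocessing (our own code, not from the original implementation; function and variable names are hypothetical):

```python
import numpy as np

def num_ccs(fs, f0):
    # Kcc <= Fs / (2 * f0): lower pitches admit (and need) more coefficients
    return int(fs // (2 * f0))

def pad_ccs(ccs, target_dim=91):
    # Zero-pad a higher-pitch CC vector to the fixed 91-dim network input
    out = np.zeros(target_dim)
    out[:len(ccs)] = ccs
    return out
```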
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - minimize the MSE (mean squared error) between the input and the network's reconstruction
• Simple to train, and perform good reconstruction (since they directly minimize MSE)
• Not truly a generative model, as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] - inspired by variational inference; enforce a prior on the latent space
• Truly generative, as we can 'generate' new data by sampling from the prior
I Conditional Variational Autoencoders [Doersch 2016, Sohn et al 2015] - same principle as a VAE, but learn the conditional distribution given an additional conditioning variable
Generative Models
Why VAE over AE?
Continuous latent space from which we can sample points (and synthesize the corresponding audio).
Why CVAE over VAE?
Conditioning on pitch ⇒ the network captures dependencies between timbre and pitch ⇒ more accurate envelope generation + pitch control.
Network Architecture
Figure: CVAE architecture: the input CCs and the conditioning pitch f0 pass through the Encoder to the latent variable Z, and the Decoder reconstructs the CCs from Z and f0
I The network input is CCs → the MSE represents a perceptually relevant distance, in terms of the squared error between the input and reconstructed log magnitude spectral envelopes
I We train the network on frames from the training instances
I For evaluation, the MSE values reported ahead are the average reconstruction error across all the test-instance frames
Network Architecture
I Main hyperparameters:
1. β - controls the relative weighting between reconstruction and prior enforcement:
L ∝ MSE + β · KLD
Figure: CVAE with varying β: reconstruction MSE (x10^-5) vs latent space dimensionality, for β = 0.01, 0.1, 1
I There is a tradeoff between the two terms; we choose β = 0.1
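As a concrete sketch, the weighted loss could look as follows in PyTorch (assuming the encoder outputs a Gaussian posterior via mu and logvar; this is our illustration, not the original code):

```python
import torch
import torch.nn.functional as F

def cvae_loss(x_hat, x, mu, logvar, beta=0.1):
    # Reconstruction term: MSE between input and reconstructed CCs
    mse = F.mse_loss(x_hat, x)
    # KL divergence between the posterior N(mu, sigma^2) and the prior N(0, I)
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    # beta trades reconstruction quality against prior enforcement
    return mse + beta * kld
```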
Network Architecture
I Main hyperparameters:
2. Dimensionality of the latent space - governs the network's reconstruction ability
Figure: CVAE (β = 0.1) vs AE: reconstruction MSE (x10^-5) vs latent space dimensionality
I Steep fall initially, flatter later; we choose a dimensionality of 32
Network Architecture
I Network size: [91, 91, 32, 91, 91]
I 91 is the dimension of the input CCs; 32 is the latent space dimensionality
I Linear fully connected layers + Leaky ReLU activations
I ADAM [Kingma and Ba 2014] with initial lr = 10^-3
I Training for 2000 epochs with batch size 512
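A minimal sketch of such a network in PyTorch, using the sizes above (the exact way f0 enters the encoder and decoder, and the reparameterization details, are our assumptions, not the original code):

```python
import torch
import torch.nn as nn

class CVAE(nn.Module):
    """Sketch of a [91, 91, 32, 91, 91] CVAE conditioned on pitch f0."""
    def __init__(self, cc_dim=91, latent_dim=32):
        super().__init__()
        # Encoder: CCs concatenated with f0 -> hidden -> (mu, logvar)
        self.enc = nn.Sequential(nn.Linear(cc_dim + 1, 91), nn.LeakyReLU())
        self.mu = nn.Linear(91, latent_dim)
        self.logvar = nn.Linear(91, latent_dim)
        # Decoder: latent z concatenated with f0 -> hidden -> CCs
        self.dec = nn.Sequential(
            nn.Linear(latent_dim + 1, 91), nn.LeakyReLU(),
            nn.Linear(91, cc_dim))

    def forward(self, ccs, f0):
        # ccs: (batch, 91) cepstral coefficients; f0: (batch, 1) pitch
        h = self.enc(torch.cat([ccs, f0], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: z = mu + sigma * eps
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        x_hat = self.dec(torch.cat([z, f0], dim=-1))
        return x_hat, mu, logvar

model = CVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)  # ADAM, initial lr = 10^-3
```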
Experiments
I Two kinds of experiments to demonstrate the network's capabilities:
1. Reconstruction - omit pitch instances during training, and see how well the model reconstructs notes of the omitted target pitch
2. Generation - see how well the model 'synthesizes' note instances with new, unseen pitches
Reconstruction
I Two training contexts (a split-construction sketch follows the list):
1. T (×) is the target; ✓ marks the training instances:

   MIDI | T-3 | T-2 | T-1 |  T  | T+1 | T+2 | T+3
   Kept |  ✓  |  ✓  |  ✓  |  ×  |  ✓  |  ✓  |  ✓

2. Octave endpoints:

   MIDI | 60 | 61 | 62 | 63 | 64 | 65 | 66 | 67 | 68 | 69 | 70 | 71
   Kept |  ✓ |  × |  × |  × |  × |  × |  × |  × |  × |  × |  × |  ✓

I In each of the above cases, we compute the MSE as the frame-wise spectral envelope match across all frames of all the target instances
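A small sketch of how these splits might be constructed (hypothetical helpers, purely illustrative):

```python
# Octave-endpoints context: train on MIDI 60 and 71 only,
# hold out MIDI 61..70 as reconstruction targets
all_pitches = list(range(60, 72))
train_pitches = [60, 71]
target_pitches = [p for p in all_pitches if p not in train_pitches]

# 'T +/- 3' context: train on the three neighbours on either side of target T
def neighbour_split(T):
    train = [T + d for d in (-3, -2, -1, 1, 2, 3)]
    return train, [T]
```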
Reconstruction
Figure: AE vs CVAE: reconstruction MSE (x10^-2) vs target pitch (MIDI 60-72)
I Both the AE and the CVAE reasonably reconstruct the target pitch (audio examples 6, 7, 8)
I The CVAE produces better reconstruction, especially when the target pitch is far from the pitches available in the training data (plot above; audio examples 9, 10, 11)
I Conditioning helps to capture the pitch dependency of the spectral envelope more accurately
Reconstruction
I To emulate the effect of pitch conditioning with an AE, we train the AE by appending the pitch to the input CCs and reconstructing this appended input, as shown below
Figure: 'Conditional' AE (cAE): the appended input (f0, CCs) is reconstructed as (f0', CCs')
I [Wyse 2018] followed a similar approach of appending the conditional variables to the input of his model
I The 'cAE' is comparable to our proposed CVAE in that the network might potentially learn something from f0 (a sketch of the input construction follows)
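A sketch of the cAE input construction under this description (hypothetical names; the cAE itself is a plain autoencoder over the appended vector):

```python
import torch

def cae_input(ccs, f0):
    # Append the pitch to the CCs: the cAE autoencodes the
    # 92-dim vector [f0, CCs] and must reconstruct both parts
    return torch.cat([f0, ccs], dim=-1)

def split_cae_output(y):
    # Recover (f0', CCs') from the reconstructed vector
    return y[..., :1], y[..., 1:]
```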
Reconstruction
Figure: AE vs CVAE vs cAE: reconstruction MSE (x10^-2) vs target pitch (MIDI 60-72)
I We train the cAE on the octave endpoints and evaluate its performance as shown above
I Appending f0 does not improve performance; it is in fact worse than the plain AE
Generation
I We are interested in 'synthesizing' audio
I We examine how well the network can generate an instance of a desired pitch (one the network has not been trained on)
Figure: Sampling from the network: a latent point Z is passed through the Decoder along with the desired f0
I We follow the procedure in [Blaauw and Bonada 2016] - a random walk to sample points coherently from the latent space, as sketched below
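A minimal sketch of this sampling procedure (a Gaussian random walk in the latent space, decoded frame by frame with the desired f0; the step size and names are our assumptions):

```python
import torch

@torch.no_grad()
def generate_envelopes(decoder, f0, n_frames, latent_dim=32, step=0.1):
    # f0: scalar tensor holding the desired pitch.
    # Random walk: consecutive latent points stay close together,
    # so the decoded envelopes vary smoothly from frame to frame.
    z = torch.randn(latent_dim)
    frames = []
    for _ in range(n_frames):
        z = z + step * torch.randn(latent_dim)
        frames.append(decoder(torch.cat([z, f0.view(1)])))
    return torch.stack(frames)
```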
Generation
I We train on instances across the entire octave sans MIDI 65, and then generate MIDI 65
I On listening, we find that the generated audio still lacks the soft noisy sound of the violin bowing (audio examples 12, 13, 14)
I For a more realistic synthesis, we generate a violin note with vibrato (audio example 15)
Putting it all together
I We explored autoencoder frameworks within generative models for the audio synthesis of instrumental tones
I Our parametric representation decouples 'timbre' and 'pitch', relying on the network to model their inter-dependencies
I Pitch conditioning lets us generate the learnt spectral envelope for a given pitch, enabling us to vary the pitch contour continuously
Putting it all together
I Our model is still far from perfect, however, and we have to improve it further:
1. An immediate next step is to model the residual of the input audio; this will make the generated audio sound more natural
2. We have not taken dynamics into account; a more complete system will capture spectral envelope (timbral) dependencies on both pitch and loudness dynamics
3. Conducting more formal listening tests, involving synthesis of larger pitch movements or melodic elements from Indian Classical music
I Our contributions:
1. To the best of our knowledge, we have not come across any work using a parametric model for musical tones in the neural synthesis framework, especially exploiting the conditioning function of the CVAE
2. The TAE method has no known Python implementation, so we plan to make our code open-source to the MIR community for research
References I
[Blaauw and Bonada 2016] Blaauw, M. and Bonada, J. (2016). Modeling and transforming speech using variational autoencoders. In Interspeech, pages 1770–1774.
[Caetano and Rodet 2012] Caetano, M. and Rodet, X. (2012). A source-filter model for musical instrument sound transformation. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 137–140. IEEE.
[Doersch 2016] Doersch, C. (2016). Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908.
[Engel et al 2017] Engel, J., Resnick, C., Roberts, A., Dieleman, S., Norouzi, M., Eck, D., and Simonyan, K. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1068–1077. JMLR.org.
[Esling et al 2018] Esling, P., Bitton, A., et al. (2018). Generative timbre spaces: regularizing variational auto-encoders with perceptual metrics. arXiv preprint arXiv:1805.08501.
References II
[Griffin and Lim 1984] Griffin, D. and Lim, J. (1984). Signal estimation from modified short-time Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2):236–243.
[Hinton and Salakhutdinov 2006] Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507.
[IMAI 1979] Imai, S. (1979). Spectral envelope extraction by improved cepstrum. IEICE, 62:217–228.
[Kingma and Ba 2014] Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[Kingma and Welling 2013] Kingma, D. P. and Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
[Oord et al 2016] Oord, A. v. d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., and Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.
References III
[Roche et al 2018] Roche, F., Hueber, T., Limier, S., and Girin, L. (2018). Autoencoders for music sound modeling: a comparison of linear, shallow, deep, recurrent and variational models. arXiv preprint arXiv:1806.04096.
[Roebel and Rodet 2005] Roebel, A. and Rodet, X. (2005). Efficient spectral envelope estimation and its application to pitch shifting and envelope preservation. In International Conference on Digital Audio Effects, pages 30–35, Madrid, Spain.
[Romani Picas et al 2015] Romani Picas, O., Parra Rodriguez, H., Dabiri, D., Tokuda, H., Hariya, W., Oishi, K., and Serra, X. (2015). A real-time system for measuring sound goodness in instrumental sounds. In Audio Engineering Society Convention 138. Audio Engineering Society.
[Sarroff and Casey 2014] Sarroff, A. M. and Casey, M. A. (2014). Musical audio synthesis using autoencoding neural nets. In ICMC.
[Serra et al 1997] Serra, X. et al. (1997). Musical sound modeling with sinusoids plus noise. Musical Signal Processing, pages 91–122.
References IV
[Slawson 1981] Slawson, W. (1981). The color of sound: a theoretical study in musical timbre. Music Theory Spectrum, 3:132–141.
[Sohn et al 2015] Sohn, K., Lee, H., and Yan, X. (2015). Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems, pages 3483–3491.
[Wyse 2018] Wyse, L. (2018). Real-valued parametric conditioning of an RNN for interactive sound synthesis. arXiv preprint arXiv:1805.10808.
Audio examples description I
1. Input MIDI 63 note to the Spectral Model
2. Spectral Model reconstruction (trained on MIDI 63)
3. Spectral Model reconstruction (not trained on MIDI 63)
4. Input MIDI 60 note to the Parametric Model
5. Parametric reconstruction of the input note
6. Input MIDI 63 note
7. Parametric AE reconstruction of the input
8. Parametric CVAE reconstruction of the input
9. Input MIDI 65 note (endpoint-trained model)
10. Parametric AE reconstruction of the input (endpoint-trained model)
11. Parametric CVAE reconstruction of the input (endpoint-trained model)
12. CVAE-generated MIDI 65 violin note
13. Similar MIDI 65 violin note from the dataset
Audio examples description II
14. CVAE reconstruction of the MIDI 65 violin note
15. CVAE-generated MIDI 65 violin note with vibrato
1536
Parametric Model
I No of CCs(Cepstral CoefficientsKcc) depends on Samplingrate(Fs)pitch(f0)
Kcc leFs2f0
I For our network we choose the maximum Kcc(lowest pitch) =91 and zero pad the high pitch Kcc to 91 dimension vectors
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1836
Network Architecture
CCs
f0
En
cod
er
Z
Dec
od
er
CVAE
I Network input is CCs rarr MSE represents perceptually relevantdistance in terms of squared error between the input andreconstructed log magnitude spectral envelopes
I We train the network on frames from the train instances
I For evaluation MSE values calculated ahead is the averagereconstruction error across all the test instance frames
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
L prop MSE + βKLD
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2136
Network Architecture
I Network size [91 91 32 91 91]I 91 is the dimension of input CCs 32 is latent space
dimensionality
I Linear Fully Connected Layers + Leaky ReLU activations
I ADAM [Kingma and Ba 2014] with initial lr = 10minus3
I Training for 2000 epochs with batch size 512
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2436
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAE
I Both AE and CVAE reasonably reconstruct the target pitch1 6 2 7 3 8
I CVAE produces better reconstruction especially when thetarget pitch is far from the pitches available in the trainingdata(plot above) 4 9 5 10 6 11
I Conditioning helps to capture the pitch dependency of thespectral envelope more accurately
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton8)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton9)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton10)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton11)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton12)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton13)ocgs[i]state=false
1536
2536
Reconstruction
I To emulate the effect of pitch conditioning with an AE wetrain the AE by appending the pitch to the input CCs andreconstructing this appended input as shown below
f0
CCscAE
fprime
0
CCsprime
Figure lsquoConditionalrsquo AE(cAE)
I [Wyse 2018] followed a similar approach of appending theconditional variables to the input of his model
I the lsquocAErsquo is comparable to our proposed CVAE in that thenetwork might potentially learn something from f0
2636
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAEcAE
I We train the cAE on the octave endpoints and evaluateperformance as shown above
I Appending f0 does not improve performance It is infact worsethan AE
2736
Generation
I We are interested in lsquosynthesizingrsquo audio
I We see how well the network can generate an instance of adesired pitch(which the network has not been trained on)
Z
Dec
od
er
f0
Figure Sampling from the Network
I We follow the procedure in [Blaauw and Bonada 2016] - arandom walk to sample points coherently from the latent space
2836
Generation
I We train on instances across the entire octave sans MIDI 65and then generate MIDI 65
I On listening we can see that the generated audio still lacks thesoft noisy sound of the violin bowing
1 12 2 13 3 14
I For a more realistic synthesis we generate a violin note withvibrato 4 15
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton14)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton15)ocgs[i]state=false
2256
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton16)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton17)ocgs[i]state=false
2184
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3136
References I
[Blaauw and Bonada 2016] Blaauw M and Bonada J (2016)Modeling and transforming speech using variational autoencodersIn Interspeech pages 1770ndash1774
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - minimize the MSE (Mean Squared Error) between the input and the network reconstruction
• Simple to train, and give good reconstruction (since they directly minimize the MSE)
• Not truly generative models, as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] - inspired by Variational Inference; enforce a prior on the latent space
• Truly generative, as we can 'generate' new data by sampling from the prior (sketched below)
I Conditional Variational Autoencoders [Doersch 2016, Sohn et al 2015] - same principle as a VAE, but learn the distribution conditioned on an additional conditioning variable
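To make 'sampling from the prior' concrete, here is a minimal sketch (not the thesis code) of the VAE reparameterization used during training and the prior sampling used at generation time; the 32-dimensional latent size anticipates the choice made later in the talk.

```python
import torch

def reparameterize(mu, logvar):
    """z = mu + sigma * eps, with eps ~ N(0, I): differentiable sampling."""
    std = torch.exp(0.5 * logvar)
    eps = torch.randn_like(std)
    return mu + std * eps

# At generation time the encoder is bypassed entirely:
latent_dim = 32                  # the dimensionality chosen later in the talk
z = torch.randn(1, latent_dim)   # draw straight from the prior N(0, I)
# x_hat = decoder(z)             # a trained decoder would map z to new data
```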
Generative Models
Why VAE over AE?
A continuous latent space from which we can sample points (and synthesize the corresponding audio).
Why CVAE over VAE?
Conditioning on pitch ⇒ the network captures dependencies between the timbre and the pitch ⇒ more accurate envelope generation + pitch control.
Network Architecture
[Figure: CVAE block diagram - the input CCs and the pitch condition f0 feed the Encoder, which produces the latent Z; the Decoder reconstructs the CCs from Z and f0]
I Network input is CCs → the MSE represents a perceptually relevant distance, in terms of the squared error between the input and reconstructed log magnitude spectral envelopes
I We train the network on frames from the train instances
I For evaluation, the MSE values reported ahead are the average reconstruction error across all the test instance frames (a sketch of the model follows)
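As one concrete, hedged reading of the diagram, the following PyTorch sketch wires the pieces together with the layer sizes given later in the talk ([91, 91, 32, 91, 91], fully connected layers with Leaky ReLU); representing f0 as a single scalar appended to the input is our illustrative assumption, not a detail confirmed by the slides.

```python
import torch
import torch.nn as nn

class CVAE(nn.Module):
    def __init__(self, n_ccs=91, cond_dim=1, hidden=91, latent=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_ccs + cond_dim, hidden),
                                 nn.LeakyReLU())
        self.to_mu = nn.Linear(hidden, latent)
        self.to_logvar = nn.Linear(hidden, latent)
        self.dec = nn.Sequential(nn.Linear(latent + cond_dim, hidden),
                                 nn.LeakyReLU(),
                                 nn.Linear(hidden, n_ccs))

    def forward(self, ccs, f0):
        h = self.enc(torch.cat([ccs, f0], dim=-1))       # condition the encoder
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        recon = self.dec(torch.cat([z, f0], dim=-1))     # condition the decoder
        return recon, mu, logvar
```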
Network Architecture
I Main hyperparameters:
1 β - controls the relative weighting between reconstruction and prior enforcement:
L ∝ MSE + β · KLD
[Figure: CVAE reconstruction MSE vs. latent space dimensionality, for β = 0.01, 0.1 and 1]
I Tradeoff between both terms; we choose β = 0.1 (the loss is sketched below)
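A minimal sketch of this objective, assuming the usual closed-form KL divergence between the encoder's Gaussian posterior and the standard-normal prior:

```python
import torch
import torch.nn.functional as F

def cvae_loss(recon, target, mu, logvar, beta=0.1):
    mse = F.mse_loss(recon, target)          # envelope reconstruction term
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return mse + beta * kld                  # beta weighs prior enforcement
```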
Network Architecture
I Main hyperparameters:
2 Dimensionality of the latent space - governs the network's reconstruction ability
[Figure: Reconstruction MSE vs. latent space dimensionality, CVAE (β = 0.1) vs. AE]
I Steep fall initially, flatter later; we choose dimensionality = 32
Network Architecture
I Network size [91, 91, 32, 91, 91]
I 91 is the dimension of the input CCs; 32 is the latent space dimensionality
I Linear fully connected layers + Leaky ReLU activations
I ADAM [Kingma and Ba 2014] with initial lr = 10⁻³
I Training for 2000 epochs with batch size 512 (training loop sketched below)
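Put together with the earlier sketches, a hedged version of the training loop might look like the following; the random TensorDataset is a dummy stand-in for the actual (CC-frame, f0) data pipeline, which the slides do not specify.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-in data: real training would use CC frames + per-frame f0.
dataset = TensorDataset(torch.randn(4096, 91), torch.rand(4096, 1))

model = CVAE()                                       # sketch defined earlier
opt = torch.optim.Adam(model.parameters(), lr=1e-3)  # initial lr = 10^-3
loader = DataLoader(dataset, batch_size=512, shuffle=True)

for epoch in range(2000):
    for ccs, f0 in loader:
        recon, mu, logvar = model(ccs, f0)
        loss = cvae_loss(recon, ccs, mu, logvar, beta=0.1)
        opt.zero_grad()
        loss.backward()
        opt.step()
```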
Experiments
I Two kinds of experiments to demonstrate the network's capabilities:
1 Reconstruction - omit a pitch's instances during training and see how well the model reconstructs notes of the omitted target pitch
2 Generation - how well the model 'synthesizes' note instances with new, unseen pitches
Reconstruction
I Two training contexts:
1 T (×) is the target, X marks training instances:
MIDI: T−3 T−2 T−1  T  T+1 T+2 T+3
Kept:  X   X   X   ×   X   X   X
2 Octave endpoints:
MIDI: 60 61 62 63 64 65 66 67 68 69 70 71
Kept:  X  ×  ×  ×  ×  ×  ×  ×  ×  ×  ×  X
I In each of the above cases, we compute the MSE as the frame-wise spectral envelope match across all frames of all the target instances (splits sketched below)
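A hedged sketch of these two splits and the frame-wise error; the helper names are ours, not taken from the thesis code.

```python
import numpy as np

def split_around_target(target):
    """Context 1: train on the six neighbours T±1..3, hold out T."""
    train = [m for m in range(target - 3, target + 4) if m != target]
    return train, [target]

def split_octave_endpoints():
    """Context 2: train only on the endpoints MIDI 60 and 71."""
    return [60, 71], list(range(61, 71))

def framewise_mse(recon_frames, target_frames):
    """Average squared log-envelope error over all target frames."""
    return float(np.mean((recon_frames - target_frames) ** 2))
```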
Reconstruction
[Figure: Reconstruction MSE vs. target pitch (MIDI 60-72) for AE and CVAE]
I Both the AE and the CVAE reasonably reconstruct the target pitch (audio examples 6-8)
I The CVAE produces better reconstruction, especially when the target pitch is far from the pitches available in the training data (plot above; audio examples 9-11)
I Conditioning helps to capture the pitch dependency of the spectral envelope more accurately
Reconstruction
I To emulate the effect of pitch conditioning with an AE, we train the AE by appending the pitch to the input CCs and reconstructing this appended input, as shown below
[Figure: 'Conditional' AE (cAE) - the concatenated input (CCs, f0) is autoencoded to (CCs′, f0′)]
I [Wyse 2018] followed a similar approach of appending the conditional variables to the input of his model
I The 'cAE' is comparable to our proposed CVAE in that the network might potentially learn something from f0 (input construction sketched below)
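For clarity, a minimal sketch of the cAE's input/target construction, under our assumption that f0 is a single scalar per frame:

```python
import torch

def make_cae_io(ccs, f0):
    """ccs: (batch, 91) CC frames; f0: (batch, 1) pitch.
    The cAE both receives and reconstructs this 92-dim vector,
    so the pitch is an ordinary input dimension, not a condition."""
    x = torch.cat([ccs, f0], dim=-1)
    return x, x                      # input and reconstruction target coincide
```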
Reconstruction
[Figure: Reconstruction MSE vs. target pitch for AE, CVAE and cAE]
I We train the cAE on the octave endpoints and evaluate performance as shown above
I Appending f0 does not improve performance; it is in fact worse than the AE
Generation
I We are interested in 'synthesizing' audio
I We see how well the network can generate an instance of a desired pitch (which the network has not been trained on)
[Figure: Sampling from the network - the latent Z and the pitch f0 feed the Decoder]
I We follow the procedure in [Blaauw and Bonada 2016] - a random walk to sample points coherently from the latent space (sketched below)
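A hedged sketch of that sampling scheme, in the spirit of [Blaauw and Bonada 2016]: small Gaussian steps keep consecutive latent points, and hence consecutive envelopes, close to each other; the step size is an illustrative choice.

```python
import torch

def random_walk_latents(n_frames, latent_dim=32, step=0.05):
    z = torch.zeros(n_frames, latent_dim)
    z[0] = torch.randn(latent_dim)                    # start from the prior
    for t in range(1, n_frames):
        z[t] = z[t - 1] + step * torch.randn(latent_dim)
    return z

# z_frames = random_walk_latents(200)
# envelopes = model.dec(torch.cat([z_frames, f0_frames], dim=-1))
```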
Generation
I We train on instances across the entire octave sans MIDI 65, and then generate MIDI 65
I On listening, we find that the generated audio still lacks the soft, noisy sound of the violin bowing (audio examples 12-14)
I For a more realistic synthesis, we generate a violin note with vibrato (audio example 15; pitch contour sketched below)
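Since the model is conditioned on f0 frame by frame, vibrato reduces to feeding it a modulated pitch contour; a minimal sketch follows (the rate and depth are illustrative values, not taken from the thesis).

```python
import numpy as np

def vibrato_f0(midi=65, n_frames=200, hop_s=0.005, rate_hz=6.0, cents=50.0):
    """Per-frame f0 contour: a sinusoidal deviation around the target note."""
    t = np.arange(n_frames) * hop_s
    midi_contour = midi + (cents / 100.0) * np.sin(2 * np.pi * rate_hz * t)
    return 440.0 * 2.0 ** ((midi_contour - 69) / 12.0)   # MIDI -> Hz
```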
Putting it all together
I We explored autoencoder frameworks as generative models for audio synthesis of instrumental tones
I Our parametric representation decouples 'timbre' and 'pitch', relying on the network to model their inter-dependencies
I Pitch conditioning allows us to generate the learnt spectral envelope for that pitch, thus enabling us to vary the pitch contour continuously
Putting it all together
I Our model is still far from perfect, however, and we have to improve it further:
1 An immediate step is to model the residual of the input audio; this will help make the generated audio sound more natural
2 We have not taken dynamics into account; a more complete system will capture the spectral envelope (timbral) dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests, involving synthesis of larger pitch movements or melodic elements from Indian Classical music
I Our contributions:
1 To the best of our knowledge, we have not come across any work using a parametric model for musical tones in the neural synthesis framework, especially one exploiting the conditioning function of the CVAE
2 The TAE method has no known Python implementation, so we plan to make our code open-source to the MIR community for research (core iteration sketched below)
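For reference, a hedged sketch of the true amplitude envelope (TAE) iteration [IMAI 1979, Roebel and Rodet 2005] that such an implementation would centre on: repeatedly cepstrally smooth the log spectrum, replacing it by the pointwise maximum of the original and the current smooth curve, until the envelope lies above the spectral peaks. The cepstral order, iteration cap and tolerance here are illustrative choices, not the thesis's settings.

```python
import numpy as np

def cepstral_smooth(log_mag, order):
    ceps = np.fft.irfft(log_mag)
    ceps[order:len(ceps) - order] = 0.0      # low-quefrency liftering
    return np.fft.rfft(ceps).real

def true_amplitude_envelope(log_mag, order=91, n_iter=100, tol=1e-4):
    env = cepstral_smooth(log_mag, order)
    target = log_mag.copy()
    for _ in range(n_iter):
        target = np.maximum(target, env)     # lift troughs up to the envelope
        env = cepstral_smooth(target, order)
        if np.all(log_mag <= env + tol):     # envelope now covers all peaks
            break
    return env
```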
References I
[Blaauw and Bonada 2016] Blaauw, M. and Bonada, J. (2016). Modeling and transforming speech using variational autoencoders. In Interspeech, pages 1770–1774.
[Caetano and Rodet 2012] Caetano, M. and Rodet, X. (2012). A source-filter model for musical instrument sound transformation. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 137–140. IEEE.
[Doersch 2016] Doersch, C. (2016). Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908.
[Engel et al 2017] Engel, J., Resnick, C., Roberts, A., Dieleman, S., Norouzi, M., Eck, D., and Simonyan, K. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1068–1077. JMLR.org.
[Esling et al 2018] Esling, P., Bitton, A., et al. (2018). Generative timbre spaces: regularizing variational auto-encoders with perceptual metrics. arXiv preprint arXiv:1805.08501.
References II
[Griffin and Lim 1984] Griffin, D. and Lim, J. (1984). Signal estimation from modified short-time Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2):236–243.
[Hinton and Salakhutdinov 2006] Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507.
[IMAI 1979] Imai, S. (1979). Spectral envelope extraction by improved cepstrum. IEICE, 62:217–228.
[Kingma and Ba 2014] Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[Kingma and Welling 2013] Kingma, D. P. and Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
[Oord et al 2016] Oord, A. v. d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., and Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.
References III
[Roche et al 2018] Roche, F., Hueber, T., Limier, S., and Girin, L. (2018). Autoencoders for music sound modeling: a comparison of linear, shallow, deep, recurrent and variational models. arXiv preprint arXiv:1806.04096.
[Roebel and Rodet 2005] Roebel, A. and Rodet, X. (2005). Efficient spectral envelope estimation and its application to pitch shifting and envelope preservation. In International Conference on Digital Audio Effects, pages 30–35, Madrid, Spain.
[Romani Picas et al 2015] Romani Picas, O., Parra Rodriguez, H., Dabiri, D., Tokuda, H., Hariya, W., Oishi, K., and Serra, X. (2015). A real-time system for measuring sound goodness in instrumental sounds. In Audio Engineering Society Convention 138. Audio Engineering Society.
[Sarroff and Casey 2014] Sarroff, A. M. and Casey, M. A. (2014). Musical audio synthesis using autoencoding neural nets. In ICMC.
[Serra et al 1997] Serra, X. et al. (1997). Musical sound modeling with sinusoids plus noise. Musical Signal Processing, pages 91–122.
References IV
[Slawson 1981] Slawson, W. (1981). The color of sound: a theoretical study in musical timbre. Music Theory Spectrum, 3:132–141.
[Sohn et al 2015] Sohn, K., Lee, H., and Yan, X. (2015). Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems, pages 3483–3491.
[Wyse 2018] Wyse, L. (2018). Real-valued parametric conditioning of an RNN for interactive sound synthesis. arXiv preprint arXiv:1805.10808.
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model Reconstruction (trained on MIDI 63)
3 Spectral Model Reconstruction (not trained on MIDI 63)
4 Input MIDI 60 note to Parametric Model
5 Parametric Reconstruction of input note
6 Input MIDI 63 Note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note (endpoint trained model)
10 Parametric AE reconstruction of input (endpoint trained model)
11 Parametric CVAE reconstruction of input (endpoint trained model)
12 CVAE Generated MIDI 65 Violin note
13 Similar MIDI 65 Violin note from dataset
Audio examples description II
14 CVAE Reconstruction of the MIDI 65 violin note
15 CVAE Generated MIDI 65 Violin note with vibrato
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1836
Network Architecture
CCs
f0
En
cod
er
Z
Dec
od
er
CVAE
I Network input is CCs rarr MSE represents perceptually relevantdistance in terms of squared error between the input andreconstructed log magnitude spectral envelopes
I We train the network on frames from the train instances
I For evaluation MSE values calculated ahead is the averagereconstruction error across all the test instance frames
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
L prop MSE + βKLD
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2136
Network Architecture
I Network size [91 91 32 91 91]I 91 is the dimension of input CCs 32 is latent space
dimensionality
I Linear Fully Connected Layers + Leaky ReLU activations
I ADAM [Kingma and Ba 2014] with initial lr = 10minus3
I Training for 2000 epochs with batch size 512
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2436
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAE
I Both AE and CVAE reasonably reconstruct the target pitch1 6 2 7 3 8
I CVAE produces better reconstruction especially when thetarget pitch is far from the pitches available in the trainingdata(plot above) 4 9 5 10 6 11
I Conditioning helps to capture the pitch dependency of thespectral envelope more accurately
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton8)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton9)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton10)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton11)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton12)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton13)ocgs[i]state=false
1536
2536
Reconstruction
I To emulate the effect of pitch conditioning with an AE wetrain the AE by appending the pitch to the input CCs andreconstructing this appended input as shown below
f0
CCscAE
fprime
0
CCsprime
Figure lsquoConditionalrsquo AE(cAE)
I [Wyse 2018] followed a similar approach of appending theconditional variables to the input of his model
I the lsquocAErsquo is comparable to our proposed CVAE in that thenetwork might potentially learn something from f0
2636
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAEcAE
I We train the cAE on the octave endpoints and evaluateperformance as shown above
I Appending f0 does not improve performance It is infact worsethan AE
2736
Generation
I We are interested in lsquosynthesizingrsquo audio
I We see how well the network can generate an instance of adesired pitch(which the network has not been trained on)
Z
Dec
od
er
f0
Figure Sampling from the Network
I We follow the procedure in [Blaauw and Bonada 2016] - arandom walk to sample points coherently from the latent space
2836
Generation
I We train on instances across the entire octave sans MIDI 65and then generate MIDI 65
I On listening we can see that the generated audio still lacks thesoft noisy sound of the violin bowing
1 12 2 13 3 14
I For a more realistic synthesis we generate a violin note withvibrato 4 15
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton14)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton15)ocgs[i]state=false
2256
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton16)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton17)ocgs[i]state=false
2184
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3136
References I
[Blaauw and Bonada 2016] Blaauw M and Bonada J (2016)Modeling and transforming speech using variational autoencodersIn Interspeech pages 1770ndash1774
[Caetano and Rodet 2012] Caetano M and Rodet X (2012)A source-filter model for musical instrument sound transformationIn 2012 IEEE International Conference on Acoustics Speech and Signal Processing(ICASSP) pages 137ndash140 IEEE
[Doersch 2016] Doersch C (2016)Tutorial on variational autoencodersarXiv preprint arXiv160605908
[Engel et al 2017] Engel J Resnick C Roberts A Dieleman S Norouzi MEck D and Simonyan K (2017)Neural audio synthesis of musical notes with wavenet autoencodersIn Proceedings of the 34th International Conference on Machine Learning-Volume70 pages 1068ndash1077 JMLR org
[Esling et al 2018] Esling P Bitton A et al (2018)Generative timbre spaces regularizing variational auto-encoders with perceptualmetricsarXiv preprint arXiv180508501
3236
References II
[Griffin and Lim 1984] Griffin D and Lim J (1984)Signal estimation from modified short-time fourier transformIEEE Transactions on Acoustics Speech and Signal Processing 32(2)236ndash243
[Hinton and Salakhutdinov 2006] Hinton G E and Salakhutdinov R R (2006)Reducing the dimensionality of data with neural networksscience 313(5786)504ndash507
[IMAI 1979] IMAI S (1979)Spectral envelope extraction by improved cepstrumIEICE 62217ndash228
[Kingma and Ba 2014] Kingma D P and Ba J (2014)Adam A method for stochastic optimizationarXiv preprint arXiv14126980
[Kingma and Welling 2013] Kingma D P and Welling M (2013)Auto-encoding variational bayesarXiv preprint arXiv13126114
[Oord et al 2016] Oord A v d Dieleman S Zen H Simonyan K Vinyals OGraves A Kalchbrenner N Senior A and Kavukcuoglu K (2016)Wavenet A generative model for raw audioarXiv preprint arXiv160903499
3336
References III
[Roche et al 2018] Roche F Hueber T Limier S and Girin L (2018)Autoencoders for music sound modeling a comparison of linear shallow deeprecurrent and variational modelsarXiv preprint arXiv180604096
[Roebel and Rodet 2005] Roebel A and Rodet X (2005)Efficient Spectral Envelope Estimation and its application to pitch shifting andenvelope preservationIn International Conference on Digital Audio Effects pages 30ndash35 Madrid Spaincote interne IRCAM Roebel05b
[Romani Picas et al 2015] Romani Picas O Parra Rodriguez H Dabiri DTokuda H Hariya W Oishi K and Serra X (2015)A real-time system for measuring sound goodness in instrumental soundsIn Audio Engineering Society Convention 138 Audio Engineering Society
[Sarroff and Casey 2014] Sarroff A M and Casey M A (2014)Musical audio synthesis using autoencoding neural netsIn ICMC
[Serra et al 1997] Serra X et al (1997)Musical sound modeling with sinusoids plus noiseMusical signal processing pages 91ndash122
3436
References IV
[Slawson 1981] Slawson W (1981)The color of sound a theoretical study in musical timbreMusic Theory Spectrum 3132ndash141
[Sohn et al 2015] Sohn K Lee H and Yan X (2015)Learning structured output representation using deep conditional generative models
In Advances in neural information processing systems pages 3483ndash3491
[Wyse 2018] Wyse L (2018)Real-valued parametric conditioning of an rnn for interactive sound synthesisarXiv preprint arXiv180510808
3536
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model Reconstruction(trained on MIDI63)
3 Spectral Model Reconstruction(not trained on MIDI63)
4 Input MIDI 60 note to Parametric Model
5 Parametric Reconstruction of input note
6 Input MIDI 63 Note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note(endpoint trained model)
10 Parametric AE reconstruction of input(endpoint trained model)
11 Parametric CVAE reconstruction of input(endpoint trained model)
12 CVAE Generated MIDI 65 Violin note
13 Similar MIDI 65 Violin note from dataset
3636
Audio examples description II
14 CVAE Reconstruction of the MIDI 65 violin note
15 CVAE Generated MIDI 65 Violin note with vibrato
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1836
Network Architecture
CCs
f0
En
cod
er
Z
Dec
od
er
CVAE
I Network input is CCs rarr MSE represents perceptually relevantdistance in terms of squared error between the input andreconstructed log magnitude spectral envelopes
I We train the network on frames from the train instances
I For evaluation MSE values calculated ahead is the averagereconstruction error across all the test instance frames
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
L prop MSE + βKLD
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2136
Network Architecture
I Network size [91 91 32 91 91]I 91 is the dimension of input CCs 32 is latent space
dimensionality
I Linear Fully Connected Layers + Leaky ReLU activations
I ADAM [Kingma and Ba 2014] with initial lr = 10minus3
I Training for 2000 epochs with batch size 512
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
23/36
Reconstruction
▶ Two training contexts:
1. T (×) is the target; ✓ marks training instances:
   MIDI   T - 3   T - 2   T - 1    T    T + 1   T + 2   T + 3
   Kept     ✓       ✓       ✓      ×      ✓       ✓       ✓
2. Octave endpoints:
   MIDI   60   61   62   63   64   65
   Kept    ✓    ×    ×    ×    ×    ×
   MIDI   66   67   68   69   70   71
   Kept    ×    ×    ×    ×    ×    ✓
▶ In each case, we compute the MSE as the frame-wise spectral envelope match across all frames of all the target instances
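As a sketch, this per-context evaluation reduces to averaging the squared envelope error over frames (the array shapes are assumptions):

    import numpy as np

    def reconstruction_mse(true_env, recon_env):
        # Both arrays: (num_frames, num_bins) log magnitude spectral envelopes,
        # pooled over all frames of all target-pitch test instances.
        return np.mean((true_env - recon_env) ** 2)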
24/36
Reconstruction
Figure: reconstruction MSE (×10⁻²) vs target pitch (MIDI 60-72) for the AE and the CVAE
▶ Both the AE and the CVAE reasonably reconstruct the target pitch (audio examples 6, 7, 8)
▶ The CVAE produces better reconstructions, especially when the target pitch is far from the pitches available in the training data (plot above; audio examples 9, 10, 11)
▶ Conditioning helps to capture the pitch dependency of the spectral envelope more accurately
25/36
Reconstruction
▶ To emulate the effect of pitch conditioning with an AE, we train the AE by appending the pitch to the input CCs and reconstructing this appended input, as shown below
Figure: 'Conditional' AE (cAE) - the appended input [CCs, f0] is autoencoded to [CCs′, f0′]
▶ [Wyse, 2018] followed a similar approach of appending the conditional variables to the input of his model
▶ The 'cAE' is comparable to our proposed CVAE in that the network might potentially learn something from f0
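A minimal sketch of this baseline, reusing the constants from the CVAE sketch above; the exact layer layout is an assumption for illustration:

    import torch
    import torch.nn as nn

    class ConditionalAE(nn.Module):
        # Plain autoencoder over the pitch-appended input [CCs, f0];
        # both the CCs and f0 are reconstructed at the output.
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Linear(CCS_DIM + 1, 91), nn.LeakyReLU(),
                nn.Linear(91, LATENT_DIM))
            self.decoder = nn.Sequential(
                nn.Linear(LATENT_DIM, 91), nn.LeakyReLU(),
                nn.Linear(91, CCS_DIM + 1))

        def forward(self, ccs, f0):
            x = torch.cat([ccs, f0], dim=-1)
            return self.decoder(self.encoder(x))  # [CCs′, f0′]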
26/36
Reconstruction
Figure: reconstruction MSE (×10⁻²) vs pitch (MIDI 60-72) for the AE, the CVAE and the cAE
▶ We train the cAE on the octave endpoints and evaluate performance as shown above
▶ Appending f0 does not improve performance; it is in fact worse than the plain AE
27/36
Generation
▶ We are interested in 'synthesizing' audio
▶ We check how well the network can generate an instance of a desired pitch on which it has not been trained
Figure: sampling from the network (a latent point Z and the conditioning f0 are fed to the Decoder)
▶ We follow the procedure in [Blaauw and Bonada, 2016]: a random walk to sample points coherently from the latent space
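A sketch of one plausible reading of this sampling procedure, reusing the CVAE sketch above; the starting point (the prior mean) and the step size sigma are assumptions, not values from [Blaauw and Bonada, 2016]:

    import torch

    @torch.no_grad()
    def generate(model, f0_contour, sigma=0.05):
        # Random walk in latent space: each frame's z is a small step from the
        # previous one, so consecutive envelopes vary smoothly (coherently).
        z = torch.zeros(1, LATENT_DIM)  # start at the prior mean (assumption)
        out = []
        for f0 in f0_contour:  # desired per-frame pitch values
            z = z + sigma * torch.randn_like(z)
            cond = torch.tensor([[float(f0)]])
            out.append(model.decoder(torch.cat([z, cond], dim=-1)))
        return torch.cat(out)  # one CC envelope frame per f0 value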
28/36
Generation
▶ We train on instances across the entire octave sans MIDI 65, and then generate MIDI 65
▶ On listening, the generated audio can be heard to still lack the soft, noisy sound of the violin bowing (audio examples 12, 13, 14)
▶ For a more realistic synthesis, we generate a violin note with vibrato (audio example 15)
29/36
Putting it all together
▶ We explored autoencoder frameworks as generative models for audio synthesis of instrumental tones
▶ Our parametric representation decouples 'timbre' and 'pitch', relying on the network to model their inter-dependencies
▶ Pitch conditioning allows us to generate the learnt spectral envelope for a given pitch, thus enabling us to vary the pitch contour continuously
30/36
Putting it all together
▶ Our model is still far from perfect, however, and we have to improve it further:
1. An immediate step is to model the residual of the input audio; this will help make the generated audio sound more natural
2. We have not taken dynamics into account; a more complete system would capture the spectral envelope (timbral) dependencies on both pitch and loudness dynamics
3. Conducting more formal listening tests, involving synthesis of larger pitch movements or melodic elements from Indian Classical music
▶ Our contributions:
1. To the best of our knowledge, no prior work uses a parametric model for musical tones in the neural synthesis framework, especially one exploiting the conditioning function of the CVAE
2. The TAE method has no known Python implementation, so we plan to make our code open-source for the MIR research community
31/36
References I
[Blaauw and Bonada, 2016] Blaauw, M. and Bonada, J. (2016). Modeling and transforming speech using variational autoencoders. In Interspeech, pages 1770–1774.
[Caetano and Rodet, 2012] Caetano, M. and Rodet, X. (2012). A source-filter model for musical instrument sound transformation. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 137–140. IEEE.
[Doersch, 2016] Doersch, C. (2016). Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908.
[Engel et al., 2017] Engel, J., Resnick, C., Roberts, A., Dieleman, S., Norouzi, M., Eck, D., and Simonyan, K. (2017). Neural audio synthesis of musical notes with wavenet autoencoders. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1068–1077. JMLR.org.
[Esling et al., 2018] Esling, P., Bitton, A., et al. (2018). Generative timbre spaces: regularizing variational auto-encoders with perceptual metrics. arXiv preprint arXiv:1805.08501.
32/36
References II
[Griffin and Lim, 1984] Griffin, D. and Lim, J. (1984). Signal estimation from modified short-time fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2):236–243.
[Hinton and Salakhutdinov, 2006] Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507.
[Imai, 1979] Imai, S. (1979). Spectral envelope extraction by improved cepstrum. IEICE, 62:217–228.
[Kingma and Ba, 2014] Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[Kingma and Welling, 2013] Kingma, D. P. and Welling, M. (2013). Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114.
[Oord et al., 2016] Oord, A. v. d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., and Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.
33/36
References III
[Roche et al., 2018] Roche, F., Hueber, T., Limier, S., and Girin, L. (2018). Autoencoders for music sound modeling: a comparison of linear, shallow, deep, recurrent and variational models. arXiv preprint arXiv:1806.04096.
[Roebel and Rodet, 2005] Roebel, A. and Rodet, X. (2005). Efficient spectral envelope estimation and its application to pitch shifting and envelope preservation. In International Conference on Digital Audio Effects, pages 30–35, Madrid, Spain.
[Romani Picas et al., 2015] Romani Picas, O., Parra Rodriguez, H., Dabiri, D., Tokuda, H., Hariya, W., Oishi, K., and Serra, X. (2015). A real-time system for measuring sound goodness in instrumental sounds. In Audio Engineering Society Convention 138. Audio Engineering Society.
[Sarroff and Casey, 2014] Sarroff, A. M. and Casey, M. A. (2014). Musical audio synthesis using autoencoding neural nets. In ICMC.
[Serra et al., 1997] Serra, X. et al. (1997). Musical sound modeling with sinusoids plus noise. Musical Signal Processing, pages 91–122.
34/36
References IV
[Slawson, 1981] Slawson, W. (1981). The color of sound: a theoretical study in musical timbre. Music Theory Spectrum, 3:132–141.
[Sohn et al., 2015] Sohn, K., Lee, H., and Yan, X. (2015). Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems, pages 3483–3491.
[Wyse, 2018] Wyse, L. (2018). Real-valued parametric conditioning of an RNN for interactive sound synthesis. arXiv preprint arXiv:1805.10808.
35/36
Audio examples description I
1. Input MIDI 63 to Spectral Model
2. Spectral Model reconstruction (trained on MIDI 63)
3. Spectral Model reconstruction (not trained on MIDI 63)
4. Input MIDI 60 note to Parametric Model
5. Parametric reconstruction of input note
6. Input MIDI 63 note
7. Parametric AE reconstruction of input
8. Parametric CVAE reconstruction of input
9. Input MIDI 65 note (endpoint-trained model)
10. Parametric AE reconstruction of input (endpoint-trained model)
11. Parametric CVAE reconstruction of input (endpoint-trained model)
12. CVAE-generated MIDI 65 violin note
13. Similar MIDI 65 violin note from dataset
36/36
Audio examples description II
14. CVAE reconstruction of the MIDI 65 violin note
15. CVAE-generated MIDI 65 violin note with vibrato
1836
Network Architecture
CCs
f0
En
cod
er
Z
Dec
od
er
CVAE
I Network input is CCs rarr MSE represents perceptually relevantdistance in terms of squared error between the input andreconstructed log magnitude spectral envelopes
I We train the network on frames from the train instances
I For evaluation MSE values calculated ahead is the averagereconstruction error across all the test instance frames
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
L prop MSE + βKLD
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2136
Network Architecture
I Network size [91 91 32 91 91]I 91 is the dimension of input CCs 32 is latent space
dimensionality
I Linear Fully Connected Layers + Leaky ReLU activations
I ADAM [Kingma and Ba 2014] with initial lr = 10minus3
I Training for 2000 epochs with batch size 512
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2436
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAE
I Both AE and CVAE reasonably reconstruct the target pitch1 6 2 7 3 8
I CVAE produces better reconstruction especially when thetarget pitch is far from the pitches available in the trainingdata(plot above) 4 9 5 10 6 11
I Conditioning helps to capture the pitch dependency of thespectral envelope more accurately
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton8)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton9)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton10)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton11)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton12)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton13)ocgs[i]state=false
1536
2536
Reconstruction
I To emulate the effect of pitch conditioning with an AE wetrain the AE by appending the pitch to the input CCs andreconstructing this appended input as shown below
f0
CCscAE
fprime
0
CCsprime
Figure lsquoConditionalrsquo AE(cAE)
I [Wyse 2018] followed a similar approach of appending theconditional variables to the input of his model
I the lsquocAErsquo is comparable to our proposed CVAE in that thenetwork might potentially learn something from f0
2636
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAEcAE
I We train the cAE on the octave endpoints and evaluateperformance as shown above
I Appending f0 does not improve performance It is infact worsethan AE
2736
Generation
I We are interested in lsquosynthesizingrsquo audio
I We see how well the network can generate an instance of adesired pitch(which the network has not been trained on)
Z
Dec
od
er
f0
Figure Sampling from the Network
I We follow the procedure in [Blaauw and Bonada 2016] - arandom walk to sample points coherently from the latent space
2836
Generation
I We train on instances across the entire octave sans MIDI 65and then generate MIDI 65
I On listening we can see that the generated audio still lacks thesoft noisy sound of the violin bowing
1 12 2 13 3 14
I For a more realistic synthesis we generate a violin note withvibrato 4 15
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton14)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton15)ocgs[i]state=false
2256
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton16)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton17)ocgs[i]state=false
2184
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3136
References I
[Blaauw and Bonada 2016] Blaauw M and Bonada J (2016)Modeling and transforming speech using variational autoencodersIn Interspeech pages 1770ndash1774
[Caetano and Rodet 2012] Caetano M and Rodet X (2012)A source-filter model for musical instrument sound transformationIn 2012 IEEE International Conference on Acoustics Speech and Signal Processing(ICASSP) pages 137ndash140 IEEE
[Doersch 2016] Doersch C (2016)Tutorial on variational autoencodersarXiv preprint arXiv160605908
[Engel et al 2017] Engel J Resnick C Roberts A Dieleman S Norouzi MEck D and Simonyan K (2017)Neural audio synthesis of musical notes with wavenet autoencodersIn Proceedings of the 34th International Conference on Machine Learning-Volume70 pages 1068ndash1077 JMLR org
[Esling et al 2018] Esling P Bitton A et al (2018)Generative timbre spaces regularizing variational auto-encoders with perceptualmetricsarXiv preprint arXiv180508501
3236
References II
[Griffin and Lim 1984] Griffin D and Lim J (1984)Signal estimation from modified short-time fourier transformIEEE Transactions on Acoustics Speech and Signal Processing 32(2)236ndash243
[Hinton and Salakhutdinov 2006] Hinton G E and Salakhutdinov R R (2006)Reducing the dimensionality of data with neural networksscience 313(5786)504ndash507
[IMAI 1979] IMAI S (1979)Spectral envelope extraction by improved cepstrumIEICE 62217ndash228
[Kingma and Ba 2014] Kingma D P and Ba J (2014)Adam A method for stochastic optimizationarXiv preprint arXiv14126980
[Kingma and Welling 2013] Kingma D P and Welling M (2013)Auto-encoding variational bayesarXiv preprint arXiv13126114
[Oord et al 2016] Oord A v d Dieleman S Zen H Simonyan K Vinyals OGraves A Kalchbrenner N Senior A and Kavukcuoglu K (2016)Wavenet A generative model for raw audioarXiv preprint arXiv160903499
3336
References III
[Roche et al 2018] Roche F Hueber T Limier S and Girin L (2018)Autoencoders for music sound modeling a comparison of linear shallow deeprecurrent and variational modelsarXiv preprint arXiv180604096
[Roebel and Rodet 2005] Roebel A and Rodet X (2005)Efficient Spectral Envelope Estimation and its application to pitch shifting andenvelope preservationIn International Conference on Digital Audio Effects pages 30ndash35 Madrid Spaincote interne IRCAM Roebel05b
[Romani Picas et al 2015] Romani Picas O Parra Rodriguez H Dabiri DTokuda H Hariya W Oishi K and Serra X (2015)A real-time system for measuring sound goodness in instrumental soundsIn Audio Engineering Society Convention 138 Audio Engineering Society
[Sarroff and Casey 2014] Sarroff A M and Casey M A (2014)Musical audio synthesis using autoencoding neural netsIn ICMC
[Serra et al 1997] Serra X et al (1997)Musical sound modeling with sinusoids plus noiseMusical signal processing pages 91ndash122
3436
References IV
[Slawson 1981] Slawson W (1981)The color of sound a theoretical study in musical timbreMusic Theory Spectrum 3132ndash141
[Sohn et al 2015] Sohn K Lee H and Yan X (2015)Learning structured output representation using deep conditional generative models
In Advances in neural information processing systems pages 3483ndash3491
[Wyse 2018] Wyse L (2018)Real-valued parametric conditioning of an rnn for interactive sound synthesisarXiv preprint arXiv180510808
3536
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model Reconstruction(trained on MIDI63)
3 Spectral Model Reconstruction(not trained on MIDI63)
4 Input MIDI 60 note to Parametric Model
5 Parametric Reconstruction of input note
6 Input MIDI 63 Note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note(endpoint trained model)
10 Parametric AE reconstruction of input(endpoint trained model)
11 Parametric CVAE reconstruction of input(endpoint trained model)
12 CVAE Generated MIDI 65 Violin note
13 Similar MIDI 65 Violin note from dataset
3636
Audio examples description II
14 CVAE Reconstruction of the MIDI 65 violin note
15 CVAE Generated MIDI 65 Violin note with vibrato
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1636
Generative Models
I Autoencoders [Hinton and Salakhutdinov 2006] - Minimizethe MSE(Mean Squared Error) between input and networkreconstruction
bull Simple to train and performs good reconstruction(since theyminimize MSE)
bull Not truly a generative model as you cannot generate new data
I Variational Autoencoders [Kingma and Welling 2013] -Inspired from Variational Inference enforce a prior on thelatent space
bull Truly generative as we can lsquogeneratersquo new data by samplingfrom the prior
I Conditional Variational Autoencoders[Doersch 2016 Sohn et al 2015] - Same principle as a VAEhowever learns the conditional distribution over an additionalconditioning variable
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1836
Network Architecture
CCs
f0
En
cod
er
Z
Dec
od
er
CVAE
I Network input is CCs rarr MSE represents perceptually relevantdistance in terms of squared error between the input andreconstructed log magnitude spectral envelopes
I We train the network on frames from the train instances
I For evaluation MSE values calculated ahead is the averagereconstruction error across all the test instance frames
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
L prop MSE + βKLD
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2136
Network Architecture
I Network size [91 91 32 91 91]I 91 is the dimension of input CCs 32 is latent space
dimensionality
I Linear Fully Connected Layers + Leaky ReLU activations
I ADAM [Kingma and Ba 2014] with initial lr = 10minus3
I Training for 2000 epochs with batch size 512
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2436
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAE
I Both AE and CVAE reasonably reconstruct the target pitch1 6 2 7 3 8
I CVAE produces better reconstruction especially when thetarget pitch is far from the pitches available in the trainingdata(plot above) 4 9 5 10 6 11
I Conditioning helps to capture the pitch dependency of thespectral envelope more accurately
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton8)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton9)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton10)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton11)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton12)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton13)ocgs[i]state=false
1536
2536
Reconstruction
I To emulate the effect of pitch conditioning with an AE wetrain the AE by appending the pitch to the input CCs andreconstructing this appended input as shown below
f0
CCscAE
fprime
0
CCsprime
Figure lsquoConditionalrsquo AE(cAE)
I [Wyse 2018] followed a similar approach of appending theconditional variables to the input of his model
I the lsquocAErsquo is comparable to our proposed CVAE in that thenetwork might potentially learn something from f0
2636
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAEcAE
I We train the cAE on the octave endpoints and evaluateperformance as shown above
I Appending f0 does not improve performance It is infact worsethan AE
2736
Generation
I We are interested in lsquosynthesizingrsquo audio
I We see how well the network can generate an instance of adesired pitch(which the network has not been trained on)
Z
Dec
od
er
f0
Figure Sampling from the Network
I We follow the procedure in [Blaauw and Bonada 2016] - arandom walk to sample points coherently from the latent space
2836
Generation
I We train on instances across the entire octave sans MIDI 65and then generate MIDI 65
I On listening we can see that the generated audio still lacks thesoft noisy sound of the violin bowing
1 12 2 13 3 14
I For a more realistic synthesis we generate a violin note withvibrato 4 15
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton14)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton15)ocgs[i]state=false
2256
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton16)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton17)ocgs[i]state=false
2184
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3136
References I
[Blaauw and Bonada 2016] Blaauw M and Bonada J (2016)Modeling and transforming speech using variational autoencodersIn Interspeech pages 1770ndash1774
[Caetano and Rodet 2012] Caetano M and Rodet X (2012)A source-filter model for musical instrument sound transformationIn 2012 IEEE International Conference on Acoustics Speech and Signal Processing(ICASSP) pages 137ndash140 IEEE
[Doersch 2016] Doersch C (2016)Tutorial on variational autoencodersarXiv preprint arXiv160605908
[Engel et al 2017] Engel J Resnick C Roberts A Dieleman S Norouzi MEck D and Simonyan K (2017)Neural audio synthesis of musical notes with wavenet autoencodersIn Proceedings of the 34th International Conference on Machine Learning-Volume70 pages 1068ndash1077 JMLR org
[Esling et al 2018] Esling P Bitton A et al (2018)Generative timbre spaces regularizing variational auto-encoders with perceptualmetricsarXiv preprint arXiv180508501
3236
References II
[Griffin and Lim 1984] Griffin D and Lim J (1984)Signal estimation from modified short-time fourier transformIEEE Transactions on Acoustics Speech and Signal Processing 32(2)236ndash243
[Hinton and Salakhutdinov 2006] Hinton G E and Salakhutdinov R R (2006)Reducing the dimensionality of data with neural networksscience 313(5786)504ndash507
[IMAI 1979] IMAI S (1979)Spectral envelope extraction by improved cepstrumIEICE 62217ndash228
[Kingma and Ba 2014] Kingma D P and Ba J (2014)Adam A method for stochastic optimizationarXiv preprint arXiv14126980
[Kingma and Welling 2013] Kingma D P and Welling M (2013)Auto-encoding variational bayesarXiv preprint arXiv13126114
[Oord et al 2016] Oord A v d Dieleman S Zen H Simonyan K Vinyals OGraves A Kalchbrenner N Senior A and Kavukcuoglu K (2016)Wavenet A generative model for raw audioarXiv preprint arXiv160903499
3336
References III
[Roche et al 2018] Roche F Hueber T Limier S and Girin L (2018)Autoencoders for music sound modeling a comparison of linear shallow deeprecurrent and variational modelsarXiv preprint arXiv180604096
[Roebel and Rodet 2005] Roebel A and Rodet X (2005)Efficient Spectral Envelope Estimation and its application to pitch shifting andenvelope preservationIn International Conference on Digital Audio Effects pages 30ndash35 Madrid Spaincote interne IRCAM Roebel05b
[Romani Picas et al 2015] Romani Picas O Parra Rodriguez H Dabiri DTokuda H Hariya W Oishi K and Serra X (2015)A real-time system for measuring sound goodness in instrumental soundsIn Audio Engineering Society Convention 138 Audio Engineering Society
[Sarroff and Casey 2014] Sarroff A M and Casey M A (2014)Musical audio synthesis using autoencoding neural netsIn ICMC
[Serra et al 1997] Serra X et al (1997)Musical sound modeling with sinusoids plus noiseMusical signal processing pages 91ndash122
3436
References IV
[Slawson 1981] Slawson W (1981)The color of sound a theoretical study in musical timbreMusic Theory Spectrum 3132ndash141
[Sohn et al 2015] Sohn K Lee H and Yan X (2015)Learning structured output representation using deep conditional generative models
In Advances in neural information processing systems pages 3483ndash3491
[Wyse 2018] Wyse L (2018)Real-valued parametric conditioning of an rnn for interactive sound synthesisarXiv preprint arXiv180510808
3536
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model Reconstruction(trained on MIDI63)
3 Spectral Model Reconstruction(not trained on MIDI63)
4 Input MIDI 60 note to Parametric Model
5 Parametric Reconstruction of input note
6 Input MIDI 63 Note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note(endpoint trained model)
10 Parametric AE reconstruction of input(endpoint trained model)
11 Parametric CVAE reconstruction of input(endpoint trained model)
12 CVAE Generated MIDI 65 Violin note
13 Similar MIDI 65 Violin note from dataset
3636
Audio examples description II
14 CVAE Reconstruction of the MIDI 65 violin note
15 CVAE Generated MIDI 65 Violin note with vibrato
Generative Models
I Autoencoders [Hinton and Salakhutdinov, 2006] - Minimize the MSE (Mean Squared Error) between the input and the network's reconstruction
• Simple to train, and gives good reconstruction (since it directly minimizes the MSE)
• Not truly a generative model, as we cannot generate new data
I Variational Autoencoders [Kingma and Welling, 2013] - Inspired by Variational Inference; enforce a prior on the latent space
• Truly generative, as we can 'generate' new data by sampling from the prior
I Conditional Variational Autoencoders [Doersch, 2016, Sohn et al., 2015] - Same principle as a VAE, but learns the distribution conditioned on an additional conditioning variable (see the sketch below)
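A minimal PyTorch sketch of a pitch-conditioned VAE in this spirit; the layer sizes anticipate the architecture described later, while the exact layout, names, and shapes are illustrative assumptions rather than the slides' precise implementation.

    import torch
    import torch.nn as nn

    class CVAE(nn.Module):
        """Minimal CVAE over cepstral-coefficient (CC) frames. The conditioning
        variable f0 (one value per frame, shape (batch, 1)) is appended to both
        the encoder and the decoder inputs."""
        def __init__(self, cc_dim=91, hid=91, z_dim=32):
            super().__init__()
            self.enc = nn.Sequential(nn.Linear(cc_dim + 1, hid), nn.LeakyReLU())
            self.mu = nn.Linear(hid, z_dim)        # posterior mean
            self.logvar = nn.Linear(hid, z_dim)    # posterior log-variance
            self.dec = nn.Sequential(nn.Linear(z_dim + 1, hid), nn.LeakyReLU(),
                                     nn.Linear(hid, cc_dim))

        def forward(self, cc, f0):
            h = self.enc(torch.cat([cc, f0], dim=-1))
            mu, logvar = self.mu(h), self.logvar(h)
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
            return self.dec(torch.cat([z, f0], dim=-1)), mu, logvar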
Generative Models
Why VAE over AE?
Continuous latent space from which we can sample points (and synthesize the corresponding audio)
Why CVAE over VAE?
Conditioning on pitch ⇒ the network captures dependencies between timbre and pitch ⇒ more accurate envelope generation + pitch control
Network Architecture
Figure: CVAE block diagram - the Encoder maps the input CCs (with f0 appended) to the latent variable Z, and the Decoder maps Z (with f0 appended) back to CCs
I The network input is CCs → the MSE represents a perceptually relevant distance: the squared error between the input and reconstructed log-magnitude spectral envelopes (see the snippet below)
I We train the network on frames from the training instances
I For evaluation, the MSE values reported ahead are the average reconstruction error across all the test instance frames
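For concreteness, a hedged NumPy sketch of how a log-magnitude envelope follows from real cepstral coefficients (the function name and frequency-grid size are assumptions); because the cosine basis is orthogonal, a squared error between CC vectors corresponds, up to fixed weights, to a squared error between the log-magnitude envelopes.

    import numpy as np

    def ccs_to_log_envelope(ccs, n_freq=513):
        """Log-magnitude envelope from real cepstral coefficients:
        env(w) = c0 + 2 * sum_m c_m cos(m * w), on a uniform grid over [0, pi]."""
        w = np.linspace(0.0, np.pi, n_freq)     # frequency grid, (n_freq,)
        m = np.arange(1, len(ccs))              # quefrency indices 1..M-1
        basis = np.cos(np.outer(w, m))          # (n_freq, M-1) cosine basis
        return ccs[0] + 2.0 * basis @ ccs[1:]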
Network Architecture
I Main hyperparameters -
1 β - Controls the relative weighting between reconstruction and prior enforcement:
L ∝ MSE + β · KLD
Figure: MSE vs. latent space dimensionality for the CVAE, varying β ∈ {0.01, 0.1, 1}
I Tradeoff between both terms; we choose β = 0.1 (a loss sketch follows)
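As a sketch, the per-batch loss with the diagonal-Gaussian KL term written out, matching the slide's choice of β = 0.1 and assuming the CVAE sketch above (the function name is ours):

    import torch
    import torch.nn.functional as F

    def cvae_loss(recon, cc, mu, logvar, beta=0.1):
        """L = MSE + beta * KLD, with the KL divergence of the diagonal
        Gaussian posterior N(mu, exp(logvar)) from the standard normal prior."""
        mse = F.mse_loss(recon, cc)
        kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return mse + beta * kld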
Network Architecture
I Main hyperparameters -
2 Dimensionality of the latent space - governs the network's reconstruction ability
Figure: MSE vs. latent space dimensionality, CVAE (β = 0.1) vs. AE
I Steep fall initially, flatter later; we choose dimensionality = 32
Network Architecture
I Network size: [91, 91, 32, 91, 91]
I 91 is the dimension of the input CCs; 32 is the latent space dimensionality
I Linear fully connected layers + Leaky ReLU activations
I ADAM [Kingma and Ba, 2014] with initial lr = 10⁻³
I Training for 2000 epochs with batch size 512 (a training sketch follows)
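A hedged training-loop sketch tying the above together; the CVAE class and cvae_loss are the earlier sketches, and the tensors here are random stand-ins for the dataset's CC/f0 frames, not real data.

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    # Stand-in data: real training would use CC/f0 frames from the dataset.
    cc_frames = torch.randn(10000, 91)          # hypothetical CC frames
    f0_frames = torch.rand(10000, 1)            # hypothetical (normalized) pitch
    loader = DataLoader(TensorDataset(cc_frames, f0_frames),
                        batch_size=512, shuffle=True)

    model = CVAE(cc_dim=91, hid=91, z_dim=32)   # the sketch shown earlier
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    for epoch in range(2000):
        for cc, f0 in loader:
            recon, mu, logvar = model(cc, f0)
            loss = cvae_loss(recon, cc, mu, logvar, beta=0.1)
            opt.zero_grad()
            loss.backward()
            opt.step()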
Experiments
I Two kinds of experiments to demonstrate the network's capabilities:
1 Reconstruction - Omit pitch instances during training, and see how well the model reconstructs notes of the omitted target pitch
2 Generation - How well the model 'synthesizes' note instances with new, unseen pitches
Reconstruction
I Two training contexts -
1 T (×) is the target; ✓ marks the kept training instances:
MIDI: T−3  T−2  T−1   T   T+1  T+2  T+3
Kept:  ✓    ✓    ✓    ×    ✓    ✓    ✓
2 Octave endpoints:
MIDI: 60  61  62  63  64  65  66  67  68  69  70  71
Kept:  ✓   ×   ×   ×   ×   ×   ×   ×   ×   ×   ×   ✓
I In each of the above cases, we compute the MSE as the frame-wise spectral envelope match across all frames of all the target instances (see the sketch below)
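A sketch of this evaluation, assuming the CVAE sketch from earlier and tensors holding all frames of the held-out target instances (the function name is ours):

    import torch

    def average_test_mse(model, cc_frames, f0_frames):
        """Frame-wise reconstruction MSE, averaged over all frames of all
        held-out target-pitch instances (the quantity plotted next)."""
        model.eval()
        with torch.no_grad():
            recon, _, _ = model(cc_frames, f0_frames)
            return ((recon - cc_frames) ** 2).mean(dim=1).mean().item()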
Reconstruction
Figure: MSE vs. target pitch (MIDI 60-72) for the AE and the CVAE
I Both the AE and the CVAE reasonably reconstruct the target pitch (audio examples 6, 7, 8)
I The CVAE produces better reconstruction, especially when the target pitch is far from the pitches available in the training data (plot above; audio examples 9, 10, 11)
I Conditioning helps to capture the pitch dependency of the spectral envelope more accurately
Reconstruction
I To emulate the effect of pitch conditioning with an AE, we train the AE by appending the pitch to the input CCs and reconstructing this appended input, as shown below
Figure: 'Conditional' AE (cAE) - input [CCs, f0], output [CCs′, f0′]
I [Wyse, 2018] followed a similar approach of appending the conditional variables to the input of his model
I The 'cAE' is comparable to our proposed CVAE in that the network might potentially learn something from f0 (a sketch follows)
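A minimal sketch of such a cAE, under the assumption that it mirrors the CVAE's layer sizes; the exact layout is an assumption of ours, not necessarily the slides'.

    import torch.nn as nn

    # 'Conditional' AE: f0 is simply appended to the input and reconstructed
    # along with the CCs (no prior, no sampling), so any use of f0 must be
    # learnt implicitly by the autoencoder.
    cae = nn.Sequential(
        nn.Linear(92, 91), nn.LeakyReLU(),   # input  = 91 CCs + f0
        nn.Linear(91, 32), nn.LeakyReLU(),   # bottleneck of 32, as in the CVAE
        nn.Linear(32, 91), nn.LeakyReLU(),
        nn.Linear(91, 92),                   # output = 91 CCs' + f0'
    )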
Reconstruction
Figure: MSE vs. target pitch (MIDI 60-72) for the AE, CVAE and cAE
I We train the cAE on the octave endpoints and evaluate its performance as shown above
I Appending f0 does not improve performance; it is in fact worse than the plain AE
Generation
I We are interested in 'synthesizing' audio
I We see how well the network can generate an instance of a desired pitch (which the network has not been trained on)
Figure: Sampling from the network - a sampled Z, with the desired f0 appended, is fed to the Decoder
I We follow the procedure in [Blaauw and Bonada, 2016] - a random walk to sample points coherently from the latent space, sketched below
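A sketch of that sampling procedure, reusing the decoder from the CVAE sketch; the step size and frame count are illustrative assumptions.

    import torch

    def random_walk_generate(model, f0, n_frames=200, z_dim=32, step=0.05):
        """Decode a slowly varying latent trajectory at a fixed, possibly
        unseen pitch f0 (a tensor of shape (1,))."""
        z = torch.zeros(z_dim)
        out = []
        with torch.no_grad():
            for _ in range(n_frames):
                z = z + step * torch.randn(z_dim)    # coherent random walk
                out.append(model.dec(torch.cat([z, f0]).unsqueeze(0)))
        return torch.cat(out)                        # (n_frames, 91) CC frames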
Generation
I We train on instances across the entire octave sans MIDI 65, and then generate MIDI 65
I On listening, we find that the generated audio still lacks the soft, noisy sound of the violin bowing (audio examples 12, 13, 14)
I For a more realistic synthesis, we generate a violin note with vibrato (audio example 15); a pitch-contour sketch follows
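For instance, a hypothetical frame-rate vibrato contour for the conditioning f0 might look like the following; the rate, depth, and hop values are made up for illustration, not the ones used for the audio example.

    import numpy as np

    def vibrato_f0(f0_hz, dur=2.0, rate=6.0, depth_semitones=0.3, hop=0.005):
        """Sinusoidal pitch deviation (in semitones) around the note's f0,
        one value per synthesis frame."""
        t = np.arange(0.0, dur, hop)
        dev = depth_semitones * np.sin(2 * np.pi * rate * t)
        return f0_hz * 2.0 ** (dev / 12.0)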
Putting it all together
I We explored autoencoder frameworks in generative models for audio synthesis of instrumental tones
I Our parametric representation decouples 'timbre' and 'pitch', relying on the network to model their inter-dependencies
I Pitch conditioning allows us to generate the learnt spectral envelope for a given pitch, thus enabling us to vary the pitch contour continuously
References I
[Blaauw and Bonada, 2016] Blaauw, M. and Bonada, J. (2016). Modeling and transforming speech using variational autoencoders. In Interspeech, pages 1770–1774.
[Caetano and Rodet, 2012] Caetano, M. and Rodet, X. (2012). A source-filter model for musical instrument sound transformation. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 137–140. IEEE.
[Doersch, 2016] Doersch, C. (2016). Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908.
[Engel et al., 2017] Engel, J., Resnick, C., Roberts, A., Dieleman, S., Norouzi, M., Eck, D., and Simonyan, K. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1068–1077. JMLR.org.
[Esling et al., 2018] Esling, P., Bitton, A., et al. (2018). Generative timbre spaces: regularizing variational auto-encoders with perceptual metrics. arXiv preprint arXiv:1805.08501.
References II
[Griffin and Lim, 1984] Griffin, D. and Lim, J. (1984). Signal estimation from modified short-time Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2):236–243.
[Hinton and Salakhutdinov, 2006] Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507.
[IMAI, 1979] Imai, S. (1979). Spectral envelope extraction by improved cepstrum. IEICE, 62:217–228.
[Kingma and Ba, 2014] Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[Kingma and Welling, 2013] Kingma, D. P. and Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
[Oord et al., 2016] Oord, A. v. d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., and Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.
References III
[Roche et al., 2018] Roche, F., Hueber, T., Limier, S., and Girin, L. (2018). Autoencoders for music sound modeling: a comparison of linear, shallow, deep, recurrent and variational models. arXiv preprint arXiv:1806.04096.
[Roebel and Rodet, 2005] Roebel, A. and Rodet, X. (2005). Efficient spectral envelope estimation and its application to pitch shifting and envelope preservation. In International Conference on Digital Audio Effects, pages 30–35, Madrid, Spain.
[Romani Picas et al., 2015] Romani Picas, O., Parra Rodriguez, H., Dabiri, D., Tokuda, H., Hariya, W., Oishi, K., and Serra, X. (2015). A real-time system for measuring sound goodness in instrumental sounds. In Audio Engineering Society Convention 138. Audio Engineering Society.
[Sarroff and Casey, 2014] Sarroff, A. M. and Casey, M. A. (2014). Musical audio synthesis using autoencoding neural nets. In ICMC.
[Serra et al., 1997] Serra, X. et al. (1997). Musical sound modeling with sinusoids plus noise. Musical Signal Processing, pages 91–122.
References IV
[Slawson, 1981] Slawson, W. (1981). The color of sound: a theoretical study in musical timbre. Music Theory Spectrum, 3:132–141.
[Sohn et al., 2015] Sohn, K., Lee, H., and Yan, X. (2015). Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems, pages 3483–3491.
[Wyse, 2018] Wyse, L. (2018). Real-valued parametric conditioning of an RNN for interactive sound synthesis. arXiv preprint arXiv:1805.10808.
Audio examples description I
1 Input MIDI 63 to the Spectral Model
2 Spectral Model reconstruction (trained on MIDI 63)
3 Spectral Model reconstruction (not trained on MIDI 63)
4 Input MIDI 60 note to the Parametric Model
5 Parametric reconstruction of the input note
6 Input MIDI 63 note
7 Parametric AE reconstruction of the input
8 Parametric CVAE reconstruction of the input
9 Input MIDI 65 note (endpoint-trained model)
10 Parametric AE reconstruction of the input (endpoint-trained model)
11 Parametric CVAE reconstruction of the input (endpoint-trained model)
12 CVAE-generated MIDI 65 violin note
13 Similar MIDI 65 violin note from the dataset
Audio examples description II
14 CVAE reconstruction of the MIDI 65 violin note
15 CVAE-generated MIDI 65 violin note with vibrato
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1836
Network Architecture
CCs
f0
En
cod
er
Z
Dec
od
er
CVAE
I Network input is CCs rarr MSE represents perceptually relevantdistance in terms of squared error between the input andreconstructed log magnitude spectral envelopes
I We train the network on frames from the train instances
I For evaluation MSE values calculated ahead is the averagereconstruction error across all the test instance frames
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
L prop MSE + βKLD
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2136
Network Architecture
I Network size [91 91 32 91 91]I 91 is the dimension of input CCs 32 is latent space
dimensionality
I Linear Fully Connected Layers + Leaky ReLU activations
I ADAM [Kingma and Ba 2014] with initial lr = 10minus3
I Training for 2000 epochs with batch size 512
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2436
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAE
I Both AE and CVAE reasonably reconstruct the target pitch1 6 2 7 3 8
I CVAE produces better reconstruction especially when thetarget pitch is far from the pitches available in the trainingdata(plot above) 4 9 5 10 6 11
I Conditioning helps to capture the pitch dependency of thespectral envelope more accurately
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton8)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton9)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton10)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton11)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton12)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton13)ocgs[i]state=false
1536
2536
Reconstruction
I To emulate the effect of pitch conditioning with an AE wetrain the AE by appending the pitch to the input CCs andreconstructing this appended input as shown below
f0
CCscAE
fprime
0
CCsprime
Figure lsquoConditionalrsquo AE(cAE)
I [Wyse 2018] followed a similar approach of appending theconditional variables to the input of his model
I the lsquocAErsquo is comparable to our proposed CVAE in that thenetwork might potentially learn something from f0
2636
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAEcAE
I We train the cAE on the octave endpoints and evaluateperformance as shown above
I Appending f0 does not improve performance It is infact worsethan AE
2736
Generation
I We are interested in lsquosynthesizingrsquo audio
I We see how well the network can generate an instance of adesired pitch(which the network has not been trained on)
Z
Dec
od
er
f0
Figure Sampling from the Network
I We follow the procedure in [Blaauw and Bonada 2016] - arandom walk to sample points coherently from the latent space
2836
Generation
I We train on instances across the entire octave sans MIDI 65and then generate MIDI 65
I On listening we can see that the generated audio still lacks thesoft noisy sound of the violin bowing
1 12 2 13 3 14
I For a more realistic synthesis we generate a violin note withvibrato 4 15
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton14)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton15)ocgs[i]state=false
2256
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton16)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton17)ocgs[i]state=false
2184
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3136
References I
[Blaauw and Bonada 2016] Blaauw M and Bonada J (2016)Modeling and transforming speech using variational autoencodersIn Interspeech pages 1770ndash1774
[Caetano and Rodet 2012] Caetano M and Rodet X (2012)A source-filter model for musical instrument sound transformationIn 2012 IEEE International Conference on Acoustics Speech and Signal Processing(ICASSP) pages 137ndash140 IEEE
[Doersch 2016] Doersch C (2016)Tutorial on variational autoencodersarXiv preprint arXiv160605908
[Engel et al 2017] Engel J Resnick C Roberts A Dieleman S Norouzi MEck D and Simonyan K (2017)Neural audio synthesis of musical notes with wavenet autoencodersIn Proceedings of the 34th International Conference on Machine Learning-Volume70 pages 1068ndash1077 JMLR org
[Esling et al 2018] Esling P Bitton A et al (2018)Generative timbre spaces regularizing variational auto-encoders with perceptualmetricsarXiv preprint arXiv180508501
3236
References II
[Griffin and Lim 1984] Griffin D and Lim J (1984)Signal estimation from modified short-time fourier transformIEEE Transactions on Acoustics Speech and Signal Processing 32(2)236ndash243
[Hinton and Salakhutdinov 2006] Hinton G E and Salakhutdinov R R (2006)Reducing the dimensionality of data with neural networksscience 313(5786)504ndash507
[IMAI 1979] IMAI S (1979)Spectral envelope extraction by improved cepstrumIEICE 62217ndash228
[Kingma and Ba 2014] Kingma D P and Ba J (2014)Adam A method for stochastic optimizationarXiv preprint arXiv14126980
[Kingma and Welling 2013] Kingma D P and Welling M (2013)Auto-encoding variational bayesarXiv preprint arXiv13126114
[Oord et al 2016] Oord A v d Dieleman S Zen H Simonyan K Vinyals OGraves A Kalchbrenner N Senior A and Kavukcuoglu K (2016)Wavenet A generative model for raw audioarXiv preprint arXiv160903499
3336
References III
[Roche et al 2018] Roche F Hueber T Limier S and Girin L (2018)Autoencoders for music sound modeling a comparison of linear shallow deeprecurrent and variational modelsarXiv preprint arXiv180604096
[Roebel and Rodet 2005] Roebel A and Rodet X (2005)Efficient Spectral Envelope Estimation and its application to pitch shifting andenvelope preservationIn International Conference on Digital Audio Effects pages 30ndash35 Madrid Spaincote interne IRCAM Roebel05b
[Romani Picas et al 2015] Romani Picas O Parra Rodriguez H Dabiri DTokuda H Hariya W Oishi K and Serra X (2015)A real-time system for measuring sound goodness in instrumental soundsIn Audio Engineering Society Convention 138 Audio Engineering Society
[Sarroff and Casey 2014] Sarroff A M and Casey M A (2014)Musical audio synthesis using autoencoding neural netsIn ICMC
[Serra et al 1997] Serra X et al (1997)Musical sound modeling with sinusoids plus noiseMusical signal processing pages 91ndash122
3436
References IV
[Slawson 1981] Slawson W (1981)The color of sound a theoretical study in musical timbreMusic Theory Spectrum 3132ndash141
[Sohn et al 2015] Sohn K Lee H and Yan X (2015)Learning structured output representation using deep conditional generative models
In Advances in neural information processing systems pages 3483ndash3491
[Wyse 2018] Wyse L (2018)Real-valued parametric conditioning of an rnn for interactive sound synthesisarXiv preprint arXiv180510808
3536
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model Reconstruction(trained on MIDI63)
3 Spectral Model Reconstruction(not trained on MIDI63)
4 Input MIDI 60 note to Parametric Model
5 Parametric Reconstruction of input note
6 Input MIDI 63 Note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note(endpoint trained model)
10 Parametric AE reconstruction of input(endpoint trained model)
11 Parametric CVAE reconstruction of input(endpoint trained model)
12 CVAE Generated MIDI 65 Violin note
13 Similar MIDI 65 Violin note from dataset
3636
Audio examples description II
14 CVAE Reconstruction of the MIDI 65 violin note
15 CVAE Generated MIDI 65 Violin note with vibrato
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1836
Network Architecture
CCs
f0
En
cod
er
Z
Dec
od
er
CVAE
I Network input is CCs rarr MSE represents perceptually relevantdistance in terms of squared error between the input andreconstructed log magnitude spectral envelopes
I We train the network on frames from the train instances
I For evaluation MSE values calculated ahead is the averagereconstruction error across all the test instance frames
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
L prop MSE + βKLD
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2136
Network Architecture
I Network size [91 91 32 91 91]I 91 is the dimension of input CCs 32 is latent space
dimensionality
I Linear Fully Connected Layers + Leaky ReLU activations
I ADAM [Kingma and Ba 2014] with initial lr = 10minus3
I Training for 2000 epochs with batch size 512
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2436
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAE
I Both AE and CVAE reasonably reconstruct the target pitch1 6 2 7 3 8
I CVAE produces better reconstruction especially when thetarget pitch is far from the pitches available in the trainingdata(plot above) 4 9 5 10 6 11
I Conditioning helps to capture the pitch dependency of thespectral envelope more accurately
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton8)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton9)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton10)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton11)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton12)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton13)ocgs[i]state=false
1536
2536
Reconstruction
I To emulate the effect of pitch conditioning with an AE wetrain the AE by appending the pitch to the input CCs andreconstructing this appended input as shown below
f0
CCscAE
fprime
0
CCsprime
Figure lsquoConditionalrsquo AE(cAE)
I [Wyse 2018] followed a similar approach of appending theconditional variables to the input of his model
I the lsquocAErsquo is comparable to our proposed CVAE in that thenetwork might potentially learn something from f0
2636
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAEcAE
I We train the cAE on the octave endpoints and evaluateperformance as shown above
I Appending f0 does not improve performance It is infact worsethan AE
2736
Generation
I We are interested in lsquosynthesizingrsquo audio
I We see how well the network can generate an instance of adesired pitch(which the network has not been trained on)
Z
Dec
od
er
f0
Figure Sampling from the Network
I We follow the procedure in [Blaauw and Bonada 2016] - arandom walk to sample points coherently from the latent space
2836
Generation
I We train on instances across the entire octave sans MIDI 65and then generate MIDI 65
I On listening we can see that the generated audio still lacks thesoft noisy sound of the violin bowing
1 12 2 13 3 14
I For a more realistic synthesis we generate a violin note withvibrato 4 15
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton14)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton15)ocgs[i]state=false
2256
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton16)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton17)ocgs[i]state=false
2184
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
Putting it all together
▶ Our model is still far from perfect, however, and we have to improve it further:
1. An immediate next step is to model the residual of the input audio; this will help make the generated audio sound more natural.
2. We have not taken dynamics into account; a more complete system would capture spectral envelope (timbral) dependencies on both pitch and loudness dynamics.
3. Conducting more formal listening tests, involving synthesis of larger pitch movements or melodic elements from Indian Classical music.
▶ Our contributions:
1. To the best of our knowledge, we have not come across any work using a parametric model for musical tones in the neural synthesis framework, especially one exploiting the conditioning function of the CVAE.
2. The TAE method has no known Python implementation, so we plan to make our code open-source for the MIR research community.
References I
[Blaauw and Bonada, 2016] Blaauw, M. and Bonada, J. (2016). Modeling and transforming speech using variational autoencoders. In Interspeech, pages 1770–1774.
[Caetano and Rodet, 2012] Caetano, M. and Rodet, X. (2012). A source-filter model for musical instrument sound transformation. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 137–140. IEEE.
[Doersch, 2016] Doersch, C. (2016). Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908.
[Engel et al., 2017] Engel, J., Resnick, C., Roberts, A., Dieleman, S., Norouzi, M., Eck, D., and Simonyan, K. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. In Proceedings of the 34th International Conference on Machine Learning, volume 70, pages 1068–1077. JMLR.org.
[Esling et al., 2018] Esling, P., Bitton, A., et al. (2018). Generative timbre spaces: regularizing variational auto-encoders with perceptual metrics. arXiv preprint arXiv:1805.08501.
References II
[Griffin and Lim, 1984] Griffin, D. and Lim, J. (1984). Signal estimation from modified short-time Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2):236–243.
[Hinton and Salakhutdinov, 2006] Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507.
[Imai, 1979] Imai, S. (1979). Spectral envelope extraction by improved cepstrum. IEICE, 62:217–228.
[Kingma and Ba, 2014] Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[Kingma and Welling, 2013] Kingma, D. P. and Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
[Oord et al., 2016] Oord, A. v. d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., and Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.
References III
[Roche et al., 2018] Roche, F., Hueber, T., Limier, S., and Girin, L. (2018). Autoencoders for music sound modeling: a comparison of linear, shallow, deep, recurrent and variational models. arXiv preprint arXiv:1806.04096.
[Roebel and Rodet, 2005] Roebel, A. and Rodet, X. (2005). Efficient spectral envelope estimation and its application to pitch shifting and envelope preservation. In International Conference on Digital Audio Effects, pages 30–35, Madrid, Spain.
[Romani Picas et al., 2015] Romani Picas, O., Parra Rodriguez, H., Dabiri, D., Tokuda, H., Hariya, W., Oishi, K., and Serra, X. (2015). A real-time system for measuring sound goodness in instrumental sounds. In Audio Engineering Society Convention 138. Audio Engineering Society.
[Sarroff and Casey, 2014] Sarroff, A. M. and Casey, M. A. (2014). Musical audio synthesis using autoencoding neural nets. In ICMC.
[Serra et al., 1997] Serra, X. et al. (1997). Musical sound modeling with sinusoids plus noise. Musical Signal Processing, pages 91–122.
References IV
[Slawson, 1981] Slawson, W. (1981). The color of sound: a theoretical study in musical timbre. Music Theory Spectrum, 3:132–141.
[Sohn et al., 2015] Sohn, K., Lee, H., and Yan, X. (2015). Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems, pages 3483–3491.
[Wyse, 2018] Wyse, L. (2018). Real-valued parametric conditioning of an RNN for interactive sound synthesis. arXiv preprint arXiv:1805.10808.
Audio examples description I
1. Input MIDI 63 note to Spectral Model
2. Spectral Model reconstruction (trained on MIDI 63)
3. Spectral Model reconstruction (not trained on MIDI 63)
4. Input MIDI 60 note to Parametric Model
5. Parametric reconstruction of input note
6. Input MIDI 63 note
7. Parametric AE reconstruction of input
8. Parametric CVAE reconstruction of input
9. Input MIDI 65 note (endpoint-trained model)
10. Parametric AE reconstruction of input (endpoint-trained model)
11. Parametric CVAE reconstruction of input (endpoint-trained model)
12. CVAE-generated MIDI 65 violin note
13. Similar MIDI 65 violin note from dataset
Audio examples description II
14. CVAE reconstruction of the MIDI 65 violin note
15. CVAE-generated MIDI 65 violin note with vibrato
2536
Reconstruction
I To emulate the effect of pitch conditioning with an AE wetrain the AE by appending the pitch to the input CCs andreconstructing this appended input as shown below
f0
CCscAE
fprime
0
CCsprime
Figure lsquoConditionalrsquo AE(cAE)
I [Wyse 2018] followed a similar approach of appending theconditional variables to the input of his model
I the lsquocAErsquo is comparable to our proposed CVAE in that thenetwork might potentially learn something from f0
2636
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAEcAE
I We train the cAE on the octave endpoints and evaluateperformance as shown above
I Appending f0 does not improve performance It is infact worsethan AE
2736
Generation
I We are interested in lsquosynthesizingrsquo audio
I We see how well the network can generate an instance of adesired pitch(which the network has not been trained on)
Z
Dec
od
er
f0
Figure Sampling from the Network
I We follow the procedure in [Blaauw and Bonada 2016] - arandom walk to sample points coherently from the latent space
2836
Generation
I We train on instances across the entire octave sans MIDI 65and then generate MIDI 65
I On listening we can see that the generated audio still lacks thesoft noisy sound of the violin bowing
1 12 2 13 3 14
I For a more realistic synthesis we generate a violin note withvibrato 4 15
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton14)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton15)ocgs[i]state=false
2256
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton16)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton17)ocgs[i]state=false
2184
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3136
References I
[Blaauw and Bonada 2016] Blaauw M and Bonada J (2016)Modeling and transforming speech using variational autoencodersIn Interspeech pages 1770ndash1774
[Caetano and Rodet 2012] Caetano M and Rodet X (2012)A source-filter model for musical instrument sound transformationIn 2012 IEEE International Conference on Acoustics Speech and Signal Processing(ICASSP) pages 137ndash140 IEEE
[Doersch 2016] Doersch C (2016)Tutorial on variational autoencodersarXiv preprint arXiv160605908
[Engel et al 2017] Engel J Resnick C Roberts A Dieleman S Norouzi MEck D and Simonyan K (2017)Neural audio synthesis of musical notes with wavenet autoencodersIn Proceedings of the 34th International Conference on Machine Learning-Volume70 pages 1068ndash1077 JMLR org
[Esling et al 2018] Esling P Bitton A et al (2018)Generative timbre spaces regularizing variational auto-encoders with perceptualmetricsarXiv preprint arXiv180508501
3236
References II
[Griffin and Lim 1984] Griffin D and Lim J (1984)Signal estimation from modified short-time fourier transformIEEE Transactions on Acoustics Speech and Signal Processing 32(2)236ndash243
[Hinton and Salakhutdinov 2006] Hinton G E and Salakhutdinov R R (2006)Reducing the dimensionality of data with neural networksscience 313(5786)504ndash507
[IMAI 1979] IMAI S (1979)Spectral envelope extraction by improved cepstrumIEICE 62217ndash228
[Kingma and Ba 2014] Kingma D P and Ba J (2014)Adam A method for stochastic optimizationarXiv preprint arXiv14126980
[Kingma and Welling 2013] Kingma D P and Welling M (2013)Auto-encoding variational bayesarXiv preprint arXiv13126114
[Oord et al 2016] Oord A v d Dieleman S Zen H Simonyan K Vinyals OGraves A Kalchbrenner N Senior A and Kavukcuoglu K (2016)Wavenet A generative model for raw audioarXiv preprint arXiv160903499
3336
References III
[Roche et al 2018] Roche F Hueber T Limier S and Girin L (2018)Autoencoders for music sound modeling a comparison of linear shallow deeprecurrent and variational modelsarXiv preprint arXiv180604096
[Roebel and Rodet 2005] Roebel A and Rodet X (2005)Efficient Spectral Envelope Estimation and its application to pitch shifting andenvelope preservationIn International Conference on Digital Audio Effects pages 30ndash35 Madrid Spaincote interne IRCAM Roebel05b
[Romani Picas et al 2015] Romani Picas O Parra Rodriguez H Dabiri DTokuda H Hariya W Oishi K and Serra X (2015)A real-time system for measuring sound goodness in instrumental soundsIn Audio Engineering Society Convention 138 Audio Engineering Society
[Sarroff and Casey 2014] Sarroff A M and Casey M A (2014)Musical audio synthesis using autoencoding neural netsIn ICMC
[Serra et al 1997] Serra X et al (1997)Musical sound modeling with sinusoids plus noiseMusical signal processing pages 91ndash122
3436
References IV
[Slawson 1981] Slawson W (1981)The color of sound a theoretical study in musical timbreMusic Theory Spectrum 3132ndash141
[Sohn et al 2015] Sohn K Lee H and Yan X (2015)Learning structured output representation using deep conditional generative models
In Advances in neural information processing systems pages 3483ndash3491
[Wyse 2018] Wyse L (2018)Real-valued parametric conditioning of an rnn for interactive sound synthesisarXiv preprint arXiv180510808
3536
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model Reconstruction(trained on MIDI63)
3 Spectral Model Reconstruction(not trained on MIDI63)
4 Input MIDI 60 note to Parametric Model
5 Parametric Reconstruction of input note
6 Input MIDI 63 Note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note(endpoint trained model)
10 Parametric AE reconstruction of input(endpoint trained model)
11 Parametric CVAE reconstruction of input(endpoint trained model)
12 CVAE Generated MIDI 65 Violin note
13 Similar MIDI 65 Violin note from dataset
3636
Audio examples description II
14 CVAE Reconstruction of the MIDI 65 violin note
15 CVAE Generated MIDI 65 Violin note with vibrato
1736
Generative Models
Why VAE over AE
Continuous latent space from which we can sample points(andsynthesize the corresponding audio)
Why CVAE over VAE
Conditioning on pitch =rArr Network captures dependenciesbetween the timbre and the pitch =rArr More accurate envelopegeneration + Pitch control
1836
Network Architecture
CCs
f0
En
cod
er
Z
Dec
od
er
CVAE
I Network input is CCs rarr MSE represents perceptually relevantdistance in terms of squared error between the input andreconstructed log magnitude spectral envelopes
I We train the network on frames from the train instances
I For evaluation MSE values calculated ahead is the averagereconstruction error across all the test instance frames
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
L prop MSE + βKLD
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2136
Network Architecture
I Network size [91 91 32 91 91]I 91 is the dimension of input CCs 32 is latent space
dimensionality
I Linear Fully Connected Layers + Leaky ReLU activations
I ADAM [Kingma and Ba 2014] with initial lr = 10minus3
I Training for 2000 epochs with batch size 512
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2436
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAE
I Both AE and CVAE reasonably reconstruct the target pitch1 6 2 7 3 8
I CVAE produces better reconstruction especially when thetarget pitch is far from the pitches available in the trainingdata(plot above) 4 9 5 10 6 11
I Conditioning helps to capture the pitch dependency of thespectral envelope more accurately
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton8)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton9)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton10)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton11)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton12)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton13)ocgs[i]state=false
1536
2536
Reconstruction
I To emulate the effect of pitch conditioning with an AE wetrain the AE by appending the pitch to the input CCs andreconstructing this appended input as shown below
f0
CCscAE
fprime
0
CCsprime
Figure lsquoConditionalrsquo AE(cAE)
I [Wyse 2018] followed a similar approach of appending theconditional variables to the input of his model
I the lsquocAErsquo is comparable to our proposed CVAE in that thenetwork might potentially learn something from f0
2636
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAEcAE
I We train the cAE on the octave endpoints and evaluateperformance as shown above
I Appending f0 does not improve performance It is infact worsethan AE
2736
Generation
I We are interested in lsquosynthesizingrsquo audio
I We see how well the network can generate an instance of adesired pitch(which the network has not been trained on)
Z
Dec
od
er
f0
Figure Sampling from the Network
I We follow the procedure in [Blaauw and Bonada 2016] - arandom walk to sample points coherently from the latent space
2836
Generation
I We train on instances across the entire octave sans MIDI 65and then generate MIDI 65
I On listening we can see that the generated audio still lacks thesoft noisy sound of the violin bowing
1 12 2 13 3 14
I For a more realistic synthesis we generate a violin note withvibrato 4 15
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton14)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton15)ocgs[i]state=false
2256
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton16)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton17)ocgs[i]state=false
2184
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3136
References I
[Blaauw and Bonada 2016] Blaauw M and Bonada J (2016)Modeling and transforming speech using variational autoencodersIn Interspeech pages 1770ndash1774
[Caetano and Rodet 2012] Caetano M and Rodet X (2012)A source-filter model for musical instrument sound transformationIn 2012 IEEE International Conference on Acoustics Speech and Signal Processing(ICASSP) pages 137ndash140 IEEE
[Doersch 2016] Doersch C (2016)Tutorial on variational autoencodersarXiv preprint arXiv160605908
[Engel et al 2017] Engel J Resnick C Roberts A Dieleman S Norouzi MEck D and Simonyan K (2017)Neural audio synthesis of musical notes with wavenet autoencodersIn Proceedings of the 34th International Conference on Machine Learning-Volume70 pages 1068ndash1077 JMLR org
[Esling et al 2018] Esling P Bitton A et al (2018)Generative timbre spaces regularizing variational auto-encoders with perceptualmetricsarXiv preprint arXiv180508501
3236
References II
[Griffin and Lim 1984] Griffin D and Lim J (1984)Signal estimation from modified short-time fourier transformIEEE Transactions on Acoustics Speech and Signal Processing 32(2)236ndash243
[Hinton and Salakhutdinov 2006] Hinton G E and Salakhutdinov R R (2006)Reducing the dimensionality of data with neural networksscience 313(5786)504ndash507
[IMAI 1979] IMAI S (1979)Spectral envelope extraction by improved cepstrumIEICE 62217ndash228
[Kingma and Ba 2014] Kingma D P and Ba J (2014)Adam A method for stochastic optimizationarXiv preprint arXiv14126980
[Kingma and Welling 2013] Kingma D P and Welling M (2013)Auto-encoding variational bayesarXiv preprint arXiv13126114
[Oord et al 2016] Oord A v d Dieleman S Zen H Simonyan K Vinyals OGraves A Kalchbrenner N Senior A and Kavukcuoglu K (2016)Wavenet A generative model for raw audioarXiv preprint arXiv160903499
3336
References III
[Roche et al 2018] Roche F Hueber T Limier S and Girin L (2018)Autoencoders for music sound modeling a comparison of linear shallow deeprecurrent and variational modelsarXiv preprint arXiv180604096
[Roebel and Rodet 2005] Roebel A and Rodet X (2005)Efficient Spectral Envelope Estimation and its application to pitch shifting andenvelope preservationIn International Conference on Digital Audio Effects pages 30ndash35 Madrid Spaincote interne IRCAM Roebel05b
[Romani Picas et al 2015] Romani Picas O Parra Rodriguez H Dabiri DTokuda H Hariya W Oishi K and Serra X (2015)A real-time system for measuring sound goodness in instrumental soundsIn Audio Engineering Society Convention 138 Audio Engineering Society
[Sarroff and Casey 2014] Sarroff A M and Casey M A (2014)Musical audio synthesis using autoencoding neural netsIn ICMC
[Serra et al 1997] Serra X et al (1997)Musical sound modeling with sinusoids plus noiseMusical signal processing pages 91ndash122
3436
References IV
[Slawson 1981] Slawson W (1981)The color of sound a theoretical study in musical timbreMusic Theory Spectrum 3132ndash141
[Sohn et al 2015] Sohn K Lee H and Yan X (2015)Learning structured output representation using deep conditional generative models
In Advances in neural information processing systems pages 3483ndash3491
[Wyse 2018] Wyse L (2018)Real-valued parametric conditioning of an rnn for interactive sound synthesisarXiv preprint arXiv180510808
3536
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model Reconstruction(trained on MIDI63)
3 Spectral Model Reconstruction(not trained on MIDI63)
4 Input MIDI 60 note to Parametric Model
5 Parametric Reconstruction of input note
6 Input MIDI 63 Note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note(endpoint trained model)
10 Parametric AE reconstruction of input(endpoint trained model)
11 Parametric CVAE reconstruction of input(endpoint trained model)
12 CVAE Generated MIDI 65 Violin note
13 Similar MIDI 65 Violin note from dataset
3636
Audio examples description II
14 CVAE Reconstruction of the MIDI 65 violin note
15 CVAE Generated MIDI 65 Violin note with vibrato
1836
Network Architecture
CCs
f0
En
cod
er
Z
Dec
od
er
CVAE
I Network input is CCs rarr MSE represents perceptually relevantdistance in terms of squared error between the input andreconstructed log magnitude spectral envelopes
I We train the network on frames from the train instances
I For evaluation MSE values calculated ahead is the averagereconstruction error across all the test instance frames
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
L prop MSE + βKLD
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2136
Network Architecture
I Network size [91 91 32 91 91]I 91 is the dimension of input CCs 32 is latent space
dimensionality
I Linear Fully Connected Layers + Leaky ReLU activations
I ADAM [Kingma and Ba 2014] with initial lr = 10minus3
I Training for 2000 epochs with batch size 512
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2436
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAE
I Both AE and CVAE reasonably reconstruct the target pitch1 6 2 7 3 8
I CVAE produces better reconstruction especially when thetarget pitch is far from the pitches available in the trainingdata(plot above) 4 9 5 10 6 11
I Conditioning helps to capture the pitch dependency of thespectral envelope more accurately
Reconstruction

• To emulate the effect of pitch conditioning with an AE, we train the AE by appending the pitch to the input CCs and reconstructing this appended input, as shown below.

Figure: the 'conditional' AE (cAE) - the input [CCs, f0] is encoded and reconstructed as [CCs′, f0′].

• [Wyse, 2018] followed a similar approach of appending the conditional variables to the input of his model.
• The cAE is comparable to our proposed CVAE in that the network might potentially learn something from f0; a minimal sketch of the two input conventions follows.
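To make the distinction concrete, here is a small numpy sketch of how the inputs and targets differ between the two setups; the array shapes and variable names are illustrative assumptions, not our actual code.

import numpy as np

rng = np.random.default_rng(0)
ccs = rng.standard_normal(91)   # one frame of cepstral coefficients (CCs)
f0 = 65.0                       # pitch of the note (MIDI)

# cAE: f0 is appended to the input and is itself reconstructed,
# so the 92-dimensional vector [CCs, f0] is both input and target.
x_cae = np.concatenate([ccs, [f0]])
target_cae = x_cae

# CVAE (for contrast): f0 conditions the encoder and decoder,
# but only the 91 CCs are reconstructed.
x_cvae, cond = ccs, np.array([f0])
target_cvae = ccs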
Reconstruction

Figure: frame-wise MSE (on the order of 10⁻²) vs. pitch, MIDI 60-72, for the AE, CVAE and cAE under octave-endpoint training.

• We train the cAE on the octave endpoints and evaluate its performance as above.
• Appending f0 does not improve performance; the cAE is in fact worse than the plain AE.
Generation

• We are interested in 'synthesizing' audio.
• We test how well the network can generate an instance of a desired pitch on which it has not been trained.

Figure: sampling from the network - a latent vector Z and the desired pitch f0 are fed to the decoder.

• We follow the procedure in [Blaauw and Bonada, 2016]: a random walk to sample points coherently from the latent space, sketched below.
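A minimal sketch of that sampling procedure, assuming a trained pitch-conditioned decoder (the `decoder` below is a hypothetical stand-in): small Gaussian steps keep consecutive latent points, and hence consecutive decoded envelopes, coherent.

import numpy as np

def random_walk_latents(n_frames, dim=32, step=0.05, seed=0):
    # Start from a draw from the standard-normal prior, then take small steps.
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(dim)
    walk = []
    for _ in range(n_frames):
        z = z + step * rng.standard_normal(dim)
        walk.append(z.copy())
    return np.stack(walk)

# Decode each point together with the desired (possibly unseen) pitch:
# envelopes = [decoder(z, f0=65) for z in random_walk_latents(200)]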
Generation

• We train on instances across the entire octave sans MIDI 65, and then generate MIDI 65.
• On listening, the generated audio still lacks the soft, noisy sound of the violin bowing (audio examples 12-14).
• For a more realistic synthesis, we generate a violin note with vibrato (audio example 15), e.g. by modulating the f0 contour as sketched below.
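One simple way to realise the vibrato, sketched here, is to modulate the f0 contour sinusoidally (in cents) around the note before feeding it to the conditioned decoder; the rate and depth values are illustrative assumptions.

import numpy as np

def vibrato_f0(midi=65, rate_hz=6.0, depth_cents=30.0, dur_s=1.0, frame_rate=200):
    # f0 contour in Hz: a sinusoidal deviation (in cents) around the note.
    t = np.arange(int(dur_s * frame_rate)) / frame_rate
    base_hz = 440.0 * 2.0 ** ((midi - 69) / 12)
    cents = depth_cents * np.sin(2 * np.pi * rate_hz * t)
    return base_hz * 2.0 ** (cents / 1200)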
Putting it all together

• We explored autoencoder frameworks as generative models for audio synthesis of instrumental tones.
• Our parametric representation decouples 'timbre' and 'pitch', relying on the network to model their inter-dependencies.
• Pitch conditioning lets us generate the learnt spectral envelope for a given pitch, enabling us to vary the pitch contour continuously.
Putting it all together

• Our model is still far from perfect, however, and we have to improve it further:
  1. An immediate step is to model the residual of the input audio; this will help the generated audio sound more natural.
  2. We have not taken dynamics into account. A more complete system will capture the spectral envelope's (timbral) dependence on both pitch and loudness dynamics.
  3. We should conduct more formal listening tests, involving synthesis of larger pitch movements or melodic elements from Indian Classical music.

• Our contributions:
  1. To the best of our knowledge, no prior work uses a parametric model for musical tones in the neural synthesis framework, especially one exploiting the conditioning function of the CVAE.
  2. The TAE method has no known Python implementation, so we plan to make our code open-source to the MIR community for research.
References

[Blaauw and Bonada, 2016] Blaauw, M. and Bonada, J. (2016). Modeling and transforming speech using variational autoencoders. In Interspeech, pages 1770–1774.

[Caetano and Rodet, 2012] Caetano, M. and Rodet, X. (2012). A source-filter model for musical instrument sound transformation. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 137–140. IEEE.

[Doersch, 2016] Doersch, C. (2016). Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908.

[Engel et al., 2017] Engel, J., Resnick, C., Roberts, A., Dieleman, S., Norouzi, M., Eck, D., and Simonyan, K. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1068–1077. JMLR.org.

[Esling et al., 2018] Esling, P., Bitton, A., et al. (2018). Generative timbre spaces: regularizing variational auto-encoders with perceptual metrics. arXiv preprint arXiv:1805.08501.

[Griffin and Lim, 1984] Griffin, D. and Lim, J. (1984). Signal estimation from modified short-time Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2):236–243.

[Hinton and Salakhutdinov, 2006] Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507.

[Imai, 1979] Imai, S. (1979). Spectral envelope extraction by improved cepstrum. IEICE, 62:217–228.

[Kingma and Ba, 2014] Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

[Kingma and Welling, 2013] Kingma, D. P. and Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.

[Oord et al., 2016] Oord, A. v. d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., and Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.

[Roche et al., 2018] Roche, F., Hueber, T., Limier, S., and Girin, L. (2018). Autoencoders for music sound modeling: a comparison of linear, shallow, deep, recurrent and variational models. arXiv preprint arXiv:1806.04096.

[Roebel and Rodet, 2005] Roebel, A. and Rodet, X. (2005). Efficient spectral envelope estimation and its application to pitch shifting and envelope preservation. In International Conference on Digital Audio Effects, pages 30–35, Madrid, Spain.

[Romani Picas et al., 2015] Romani Picas, O., Parra Rodriguez, H., Dabiri, D., Tokuda, H., Hariya, W., Oishi, K., and Serra, X. (2015). A real-time system for measuring sound goodness in instrumental sounds. In Audio Engineering Society Convention 138. Audio Engineering Society.

[Sarroff and Casey, 2014] Sarroff, A. M. and Casey, M. A. (2014). Musical audio synthesis using autoencoding neural nets. In ICMC.

[Serra et al., 1997] Serra, X. et al. (1997). Musical sound modeling with sinusoids plus noise. Musical Signal Processing, pages 91–122.

[Slawson, 1981] Slawson, W. (1981). The color of sound: a theoretical study in musical timbre. Music Theory Spectrum, 3:132–141.

[Sohn et al., 2015] Sohn, K., Lee, H., and Yan, X. (2015). Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems, pages 3483–3491.

[Wyse, 2018] Wyse, L. (2018). Real-valued parametric conditioning of an RNN for interactive sound synthesis. arXiv preprint arXiv:1805.10808.
Audio examples description

1. Input MIDI 63 to Spectral Model
2. Spectral Model reconstruction (trained on MIDI 63)
3. Spectral Model reconstruction (not trained on MIDI 63)
4. Input MIDI 60 note to Parametric Model
5. Parametric reconstruction of the input note
6. Input MIDI 63 note
7. Parametric AE reconstruction of the input
8. Parametric CVAE reconstruction of the input
9. Input MIDI 65 note (endpoint-trained model)
10. Parametric AE reconstruction of the input (endpoint-trained model)
11. Parametric CVAE reconstruction of the input (endpoint-trained model)
12. CVAE-generated MIDI 65 violin note
13. Similar MIDI 65 violin note from the dataset
14. CVAE reconstruction of the MIDI 65 violin note
15. CVAE-generated MIDI 65 violin note with vibrato
Network Architecture

• Main hyperparameters:

  1. β - controls the relative weighting between reconstruction and prior enforcement:

     L ∝ MSE + β · KLD

Figure: CVAE reconstruction MSE (on the order of 10⁻⁵) vs. latent space dimensionality, for β = 0.01, 0.1 and 1.

• There is a tradeoff between the two terms; we choose β = 0.1. A sketch of this weighted loss follows.
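In code, the weighted objective is a one-liner on top of the closed-form Gaussian KL term of [Kingma and Welling, 2013]. The following PyTorch sketch shows the form we mean; the variable names are assumptions, not our released code.

import torch
import torch.nn.functional as F

def cvae_loss(x_hat, x, mu, logvar, beta=0.1):
    # Reconstruction term: MSE on the cepstral coefficients.
    mse = F.mse_loss(x_hat, x, reduction="mean")
    # Prior-enforcement term: KL( N(mu, sigma^2) || N(0, I) ), closed form.
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return mse + beta * kld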
Network Architecture

• Main hyperparameters:

  2. Dimensionality of the latent space - governs the network's reconstruction ability.

Figure: reconstruction MSE (on the order of 10⁻⁵) vs. latent space dimensionality for the CVAE (β = 0.1) and the AE.

• Steep fall initially, flatter later; we choose a dimensionality of 32.
Network Architecture

• Network size: [91, 91, 32, 91, 91] - 91 is the dimension of the input CCs, 32 the latent space dimensionality.
• Linear fully-connected layers + Leaky ReLU activations.
• ADAM [Kingma and Ba, 2014] with an initial learning rate of 10⁻³.
• Training for 2000 epochs with a batch size of 512.
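Putting the stated sizes together, a minimal PyTorch sketch of the network could look as follows. The exact wiring of the f0 conditioning (concatenation at the encoder and decoder inputs) is our assumption here, not a verbatim copy of the implementation.

import torch
import torch.nn as nn

class CVAE(nn.Module):
    # Layer sizes [91, 91, 32, 91, 91]: 91-dim CCs in/out, 32-dim latent.
    def __init__(self, n_ccs=91, latent=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_ccs + 1, 91), nn.LeakyReLU())
        self.mu = nn.Linear(91, latent)
        self.logvar = nn.Linear(91, latent)
        self.dec = nn.Sequential(nn.Linear(latent + 1, 91), nn.LeakyReLU(),
                                 nn.Linear(91, n_ccs))

    def forward(self, ccs, f0):
        # f0 (assumed shape (batch, 1)) conditions both encoder and decoder.
        h = self.enc(torch.cat([ccs, f0], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return self.dec(torch.cat([z, f0], dim=-1)), mu, logvar

model = CVAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # as in the slides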
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2436
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAE
I Both AE and CVAE reasonably reconstruct the target pitch1 6 2 7 3 8
I CVAE produces better reconstruction especially when thetarget pitch is far from the pitches available in the trainingdata(plot above) 4 9 5 10 6 11
I Conditioning helps to capture the pitch dependency of thespectral envelope more accurately
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton8)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton9)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton10)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton11)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton12)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton13)ocgs[i]state=false
1536
2536
Reconstruction
I To emulate the effect of pitch conditioning with an AE wetrain the AE by appending the pitch to the input CCs andreconstructing this appended input as shown below
f0
CCscAE
fprime
0
CCsprime
Figure lsquoConditionalrsquo AE(cAE)
I [Wyse 2018] followed a similar approach of appending theconditional variables to the input of his model
I the lsquocAErsquo is comparable to our proposed CVAE in that thenetwork might potentially learn something from f0
2636
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAEcAE
I We train the cAE on the octave endpoints and evaluateperformance as shown above
I Appending f0 does not improve performance It is infact worsethan AE
2736
Generation
I We are interested in lsquosynthesizingrsquo audio
I We see how well the network can generate an instance of adesired pitch(which the network has not been trained on)
Z
Dec
od
er
f0
Figure Sampling from the Network
I We follow the procedure in [Blaauw and Bonada 2016] - arandom walk to sample points coherently from the latent space
2836
Generation
I We train on instances across the entire octave sans MIDI 65and then generate MIDI 65
I On listening we can see that the generated audio still lacks thesoft noisy sound of the violin bowing
1 12 2 13 3 14
I For a more realistic synthesis we generate a violin note withvibrato 4 15
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton14)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton15)ocgs[i]state=false
2256
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton16)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton17)ocgs[i]state=false
2184
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3136
References I
[Blaauw and Bonada 2016] Blaauw M and Bonada J (2016)Modeling and transforming speech using variational autoencodersIn Interspeech pages 1770ndash1774
[Caetano and Rodet 2012] Caetano M and Rodet X (2012)A source-filter model for musical instrument sound transformationIn 2012 IEEE International Conference on Acoustics Speech and Signal Processing(ICASSP) pages 137ndash140 IEEE
[Doersch 2016] Doersch C (2016)Tutorial on variational autoencodersarXiv preprint arXiv160605908
[Engel et al 2017] Engel J Resnick C Roberts A Dieleman S Norouzi MEck D and Simonyan K (2017)Neural audio synthesis of musical notes with wavenet autoencodersIn Proceedings of the 34th International Conference on Machine Learning-Volume70 pages 1068ndash1077 JMLR org
[Esling et al 2018] Esling P Bitton A et al (2018)Generative timbre spaces regularizing variational auto-encoders with perceptualmetricsarXiv preprint arXiv180508501
3236
References II
[Griffin and Lim 1984] Griffin D and Lim J (1984)Signal estimation from modified short-time fourier transformIEEE Transactions on Acoustics Speech and Signal Processing 32(2)236ndash243
[Hinton and Salakhutdinov 2006] Hinton G E and Salakhutdinov R R (2006)Reducing the dimensionality of data with neural networksscience 313(5786)504ndash507
[IMAI 1979] IMAI S (1979)Spectral envelope extraction by improved cepstrumIEICE 62217ndash228
[Kingma and Ba 2014] Kingma D P and Ba J (2014)Adam A method for stochastic optimizationarXiv preprint arXiv14126980
[Kingma and Welling 2013] Kingma D P and Welling M (2013)Auto-encoding variational bayesarXiv preprint arXiv13126114
[Oord et al 2016] Oord A v d Dieleman S Zen H Simonyan K Vinyals OGraves A Kalchbrenner N Senior A and Kavukcuoglu K (2016)Wavenet A generative model for raw audioarXiv preprint arXiv160903499
3336
References III
[Roche et al 2018] Roche F Hueber T Limier S and Girin L (2018)Autoencoders for music sound modeling a comparison of linear shallow deeprecurrent and variational modelsarXiv preprint arXiv180604096
[Roebel and Rodet 2005] Roebel A and Rodet X (2005)Efficient Spectral Envelope Estimation and its application to pitch shifting andenvelope preservationIn International Conference on Digital Audio Effects pages 30ndash35 Madrid Spaincote interne IRCAM Roebel05b
[Romani Picas et al 2015] Romani Picas O Parra Rodriguez H Dabiri DTokuda H Hariya W Oishi K and Serra X (2015)A real-time system for measuring sound goodness in instrumental soundsIn Audio Engineering Society Convention 138 Audio Engineering Society
[Sarroff and Casey 2014] Sarroff A M and Casey M A (2014)Musical audio synthesis using autoencoding neural netsIn ICMC
[Serra et al 1997] Serra X et al (1997)Musical sound modeling with sinusoids plus noiseMusical signal processing pages 91ndash122
3436
References IV
[Slawson 1981] Slawson W (1981)The color of sound a theoretical study in musical timbreMusic Theory Spectrum 3132ndash141
[Sohn et al 2015] Sohn K Lee H and Yan X (2015)Learning structured output representation using deep conditional generative models
In Advances in neural information processing systems pages 3483ndash3491
[Wyse 2018] Wyse L (2018)Real-valued parametric conditioning of an rnn for interactive sound synthesisarXiv preprint arXiv180510808
3536
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model Reconstruction(trained on MIDI63)
3 Spectral Model Reconstruction(not trained on MIDI63)
4 Input MIDI 60 note to Parametric Model
5 Parametric Reconstruction of input note
6 Input MIDI 63 Note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note(endpoint trained model)
10 Parametric AE reconstruction of input(endpoint trained model)
11 Parametric CVAE reconstruction of input(endpoint trained model)
12 CVAE Generated MIDI 65 Violin note
13 Similar MIDI 65 Violin note from dataset
3636
Audio examples description II
14 CVAE Reconstruction of the MIDI 65 violin note
15 CVAE Generated MIDI 65 Violin note with vibrato
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
L prop MSE + βKLD
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2136
Network Architecture
I Network size [91 91 32 91 91]I 91 is the dimension of input CCs 32 is latent space
dimensionality
I Linear Fully Connected Layers + Leaky ReLU activations
I ADAM [Kingma and Ba 2014] with initial lr = 10minus3
I Training for 2000 epochs with batch size 512
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2436
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAE
I Both AE and CVAE reasonably reconstruct the target pitch1 6 2 7 3 8
I CVAE produces better reconstruction especially when thetarget pitch is far from the pitches available in the trainingdata(plot above) 4 9 5 10 6 11
I Conditioning helps to capture the pitch dependency of thespectral envelope more accurately
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton8)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton9)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton10)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton11)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton12)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton13)ocgs[i]state=false
1536
2536
Reconstruction
I To emulate the effect of pitch conditioning with an AE wetrain the AE by appending the pitch to the input CCs andreconstructing this appended input as shown below
f0
CCscAE
fprime
0
CCsprime
Figure lsquoConditionalrsquo AE(cAE)
I [Wyse 2018] followed a similar approach of appending theconditional variables to the input of his model
I the lsquocAErsquo is comparable to our proposed CVAE in that thenetwork might potentially learn something from f0
2636
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAEcAE
I We train the cAE on the octave endpoints and evaluateperformance as shown above
I Appending f0 does not improve performance It is infact worsethan AE
2736
Generation
I We are interested in lsquosynthesizingrsquo audio
I We see how well the network can generate an instance of adesired pitch(which the network has not been trained on)
Z
Dec
od
er
f0
Figure Sampling from the Network
I We follow the procedure in [Blaauw and Bonada 2016] - arandom walk to sample points coherently from the latent space
2836
Generation
I We train on instances across the entire octave sans MIDI 65and then generate MIDI 65
I On listening we can see that the generated audio still lacks thesoft noisy sound of the violin bowing
1 12 2 13 3 14
I For a more realistic synthesis we generate a violin note withvibrato 4 15
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton14)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton15)ocgs[i]state=false
2256
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton16)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton17)ocgs[i]state=false
2184
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3136
References I
[Blaauw and Bonada 2016] Blaauw M and Bonada J (2016)Modeling and transforming speech using variational autoencodersIn Interspeech pages 1770ndash1774
[Caetano and Rodet 2012] Caetano M and Rodet X (2012)A source-filter model for musical instrument sound transformationIn 2012 IEEE International Conference on Acoustics Speech and Signal Processing(ICASSP) pages 137ndash140 IEEE
[Doersch 2016] Doersch C (2016)Tutorial on variational autoencodersarXiv preprint arXiv160605908
[Engel et al 2017] Engel J Resnick C Roberts A Dieleman S Norouzi MEck D and Simonyan K (2017)Neural audio synthesis of musical notes with wavenet autoencodersIn Proceedings of the 34th International Conference on Machine Learning-Volume70 pages 1068ndash1077 JMLR org
[Esling et al 2018] Esling P Bitton A et al (2018)Generative timbre spaces regularizing variational auto-encoders with perceptualmetricsarXiv preprint arXiv180508501
3236
References II
[Griffin and Lim 1984] Griffin D and Lim J (1984)Signal estimation from modified short-time fourier transformIEEE Transactions on Acoustics Speech and Signal Processing 32(2)236ndash243
[Hinton and Salakhutdinov 2006] Hinton G E and Salakhutdinov R R (2006)Reducing the dimensionality of data with neural networksscience 313(5786)504ndash507
[IMAI 1979] IMAI S (1979)Spectral envelope extraction by improved cepstrumIEICE 62217ndash228
[Kingma and Ba 2014] Kingma D P and Ba J (2014)Adam A method for stochastic optimizationarXiv preprint arXiv14126980
[Kingma and Welling 2013] Kingma D P and Welling M (2013)Auto-encoding variational bayesarXiv preprint arXiv13126114
[Oord et al 2016] Oord A v d Dieleman S Zen H Simonyan K Vinyals OGraves A Kalchbrenner N Senior A and Kavukcuoglu K (2016)Wavenet A generative model for raw audioarXiv preprint arXiv160903499
3336
References III
[Roche et al 2018] Roche F Hueber T Limier S and Girin L (2018)Autoencoders for music sound modeling a comparison of linear shallow deeprecurrent and variational modelsarXiv preprint arXiv180604096
[Roebel and Rodet 2005] Roebel A and Rodet X (2005)Efficient Spectral Envelope Estimation and its application to pitch shifting andenvelope preservationIn International Conference on Digital Audio Effects pages 30ndash35 Madrid Spaincote interne IRCAM Roebel05b
[Romani Picas et al 2015] Romani Picas O Parra Rodriguez H Dabiri DTokuda H Hariya W Oishi K and Serra X (2015)A real-time system for measuring sound goodness in instrumental soundsIn Audio Engineering Society Convention 138 Audio Engineering Society
[Sarroff and Casey 2014] Sarroff A M and Casey M A (2014)Musical audio synthesis using autoencoding neural netsIn ICMC
[Serra et al 1997] Serra X et al (1997)Musical sound modeling with sinusoids plus noiseMusical signal processing pages 91ndash122
3436
References IV
[Slawson 1981] Slawson W (1981)The color of sound a theoretical study in musical timbreMusic Theory Spectrum 3132ndash141
[Sohn et al 2015] Sohn K Lee H and Yan X (2015)Learning structured output representation using deep conditional generative models
In Advances in neural information processing systems pages 3483ndash3491
[Wyse 2018] Wyse L (2018)Real-valued parametric conditioning of an rnn for interactive sound synthesisarXiv preprint arXiv180510808
3536
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model Reconstruction(trained on MIDI63)
3 Spectral Model Reconstruction(not trained on MIDI63)
4 Input MIDI 60 note to Parametric Model
5 Parametric Reconstruction of input note
6 Input MIDI 63 Note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note(endpoint trained model)
10 Parametric AE reconstruction of input(endpoint trained model)
11 Parametric CVAE reconstruction of input(endpoint trained model)
12 CVAE Generated MIDI 65 Violin note
13 Similar MIDI 65 Violin note from dataset
3636
Audio examples description II
14 CVAE Reconstruction of the MIDI 65 violin note
15 CVAE Generated MIDI 65 Violin note with vibrato
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
1936
Network ArchitectureI Main hyperparameters -
1 β - Controls relative weighting between reconstruction andprior enforcement
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
β = 001β = 01β = 1
Figure CVAE varying β
I Tradeoff between both terms choose β = 01
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2136
Network Architecture
I Network size [91 91 32 91 91]I 91 is the dimension of input CCs 32 is latent space
dimensionality
I Linear Fully Connected Layers + Leaky ReLU activations
I ADAM [Kingma and Ba 2014] with initial lr = 10minus3
I Training for 2000 epochs with batch size 512
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2436
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAE
I Both AE and CVAE reasonably reconstruct the target pitch1 6 2 7 3 8
I CVAE produces better reconstruction especially when thetarget pitch is far from the pitches available in the trainingdata(plot above) 4 9 5 10 6 11
I Conditioning helps to capture the pitch dependency of thespectral envelope more accurately
Reconstruction
I To emulate the effect of pitch conditioning with an AE, we train the AE by appending the pitch to the input CCs and reconstructing this appended input, as shown below.
Figure: 'Conditional' AE (cAE) - the input [f0, CCs] is autoencoded to [f0', CCs'].
I [Wyse 2018] followed a similar approach, appending the conditional variables to the input of his model.
I The 'cAE' is comparable to our proposed CVAE in that the network might potentially learn something from f0.
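As a sketch of the input construction (our naming; with 91 CCs the appended input is 92-dimensional):

import numpy as np

def cae_input(f0, ccs):
    # Prepend the pitch to the frame's 91 cepstral coefficients;
    # the cAE then autoencodes this whole 92-dimensional vector.
    return np.concatenate(([f0], ccs))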
Reconstruction
Figure: MSE (·10⁻²) vs. target pitch (MIDI 60-72) for the AE, CVAE, and cAE.
I We train the cAE on the octave endpoints and evaluate performance as shown above.
I Appending f0 does not improve performance; it is in fact worse than the plain AE.
Generation
I We are interested in 'synthesizing' audio.
I We test how well the network can generate an instance of a desired pitch on which it has not been trained.
Figure: Sampling from the network - a latent sample z and the conditioning pitch f0 are fed to the decoder.
I We follow the procedure in [Blaauw and Bonada 2016]: a random walk to sample points coherently from the latent space.
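A minimal sketch of this sampling scheme, assuming a trained decoder decode(z, f0) (our naming; the step size is illustrative):

import numpy as np

rng = np.random.default_rng(0)
latent_dim, num_frames, step = 32, 200, 0.05

z = rng.standard_normal(latent_dim)                  # initial point from the prior
envelopes = []
for _ in range(num_frames):
    z = z + step * rng.standard_normal(latent_dim)   # small steps: a coherent trajectory
    # envelopes.append(decode(z, f0))                # decoder conditioned on the desired pitch

Because consecutive latent points stay close, the decoded envelopes evolve smoothly from frame to frame.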
Generation
I We train on instances across the entire octave sans MIDI 65, and then generate MIDI 65.
I On listening, the generated audio still lacks the soft, noisy sound of violin bowing (audio examples 12, 13, 14).
I For a more realistic synthesis, we generate a violin note with vibrato (audio example 15).
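Because the decoder is conditioned on f0 per frame, vibrato can be imposed simply by modulating the pitch contour fed to it; a sketch with assumed frame rate, vibrato rate, and depth values:

import numpy as np

frame_rate = 100                      # analysis frames per second (assumed)
dur, rate, depth = 2.0, 5.5, 0.3      # note length (s), vibrato rate (Hz), depth (semitones)
t = np.arange(int(dur * frame_rate)) / frame_rate
midi = 65 + depth * np.sin(2 * np.pi * rate * t)   # MIDI 65 with sinusoidal vibrato
f0_contour = 440.0 * 2 ** ((midi - 69) / 12)       # MIDI note number to Hz
# Each frame's envelope: decode(z_t, f) for the random-walk z_t and contour value f.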
Putting it all together
I We explored autoencoder frameworks as generative models for audio synthesis of instrumental tones.
I Our parametric representation decouples 'timbre' and 'pitch', relying on the network to model their inter-dependencies.
I Pitch conditioning allows us to generate the learnt spectral envelope for a given pitch, enabling us to vary the pitch contour continuously.
Putting it all together
I Our model is still far from perfect, and we plan to improve it further:
1 An immediate next step is to model the residual of the input audio; this will help make the generated audio sound more natural.
2 We have not taken dynamics into account; a more complete system would capture the spectral envelope (timbral) dependencies on both pitch and loudness dynamics.
3 We will conduct more formal listening tests, involving synthesis of larger pitch movements or melodic elements from Indian Classical music.
I Our contributions:
1 To the best of our knowledge, we have not come across any prior work using a parametric model for musical tones in the neural synthesis framework, especially one exploiting the conditioning function of the CVAE.
2 The TAE method has no known Python implementation, so we plan to make our code open-source for the MIR research community.
References I
[Blaauw and Bonada 2016] Blaauw, M. and Bonada, J. (2016). Modeling and transforming speech using variational autoencoders. In Interspeech, pages 1770-1774.
[Caetano and Rodet 2012] Caetano, M. and Rodet, X. (2012). A source-filter model for musical instrument sound transformation. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 137-140. IEEE.
[Doersch 2016] Doersch, C. (2016). Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908.
[Engel et al. 2017] Engel, J., Resnick, C., Roberts, A., Dieleman, S., Norouzi, M., Eck, D., and Simonyan, K. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1068-1077. JMLR.org.
[Esling et al. 2018] Esling, P., Bitton, A., et al. (2018). Generative timbre spaces: regularizing variational auto-encoders with perceptual metrics. arXiv preprint arXiv:1805.08501.
References II
[Griffin and Lim 1984] Griffin, D. and Lim, J. (1984). Signal estimation from modified short-time Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2):236-243.
[Hinton and Salakhutdinov 2006] Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504-507.
[Imai 1979] Imai, S. (1979). Spectral envelope extraction by improved cepstrum. IEICE, 62:217-228.
[Kingma and Ba 2014] Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[Kingma and Welling 2013] Kingma, D. P. and Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
[Oord et al. 2016] Oord, A. v. d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., and Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.
References III
[Roche et al. 2018] Roche, F., Hueber, T., Limier, S., and Girin, L. (2018). Autoencoders for music sound modeling: a comparison of linear, shallow, deep, recurrent and variational models. arXiv preprint arXiv:1806.04096.
[Roebel and Rodet 2005] Roebel, A. and Rodet, X. (2005). Efficient spectral envelope estimation and its application to pitch shifting and envelope preservation. In International Conference on Digital Audio Effects, pages 30-35, Madrid, Spain.
[Romani Picas et al. 2015] Romani Picas, O., Parra Rodriguez, H., Dabiri, D., Tokuda, H., Hariya, W., Oishi, K., and Serra, X. (2015). A real-time system for measuring sound goodness in instrumental sounds. In Audio Engineering Society Convention 138. Audio Engineering Society.
[Sarroff and Casey 2014] Sarroff, A. M. and Casey, M. A. (2014). Musical audio synthesis using autoencoding neural nets. In ICMC.
[Serra et al. 1997] Serra, X. et al. (1997). Musical sound modeling with sinusoids plus noise. Musical Signal Processing, pages 91-122.
References IV
[Slawson 1981] Slawson, W. (1981). The color of sound: a theoretical study in musical timbre. Music Theory Spectrum, 3:132-141.
[Sohn et al. 2015] Sohn, K., Lee, H., and Yan, X. (2015). Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems, pages 3483-3491.
[Wyse 2018] Wyse, L. (2018). Real-valued parametric conditioning of an RNN for interactive sound synthesis. arXiv preprint arXiv:1805.10808.
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model reconstruction (trained on MIDI 63)
3 Spectral Model reconstruction (not trained on MIDI 63)
4 Input MIDI 60 note to Parametric Model
5 Parametric reconstruction of input note
6 Input MIDI 63 note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note (endpoint-trained model)
10 Parametric AE reconstruction of input (endpoint-trained model)
11 Parametric CVAE reconstruction of input (endpoint-trained model)
12 CVAE-generated MIDI 65 violin note
13 Similar MIDI 65 violin note from dataset
Audio examples description II
14 CVAE reconstruction of the MIDI 65 violin note
15 CVAE-generated MIDI 65 violin note with vibrato
Network Architecture
I Main hyperparameters:
1 β - controls the relative weighting between reconstruction and prior enforcement
Figure: CVAE with varying β - MSE (·10⁻⁵) vs. latent space dimensionality, for β = 0.01, 0.1, and 1.
I There is a tradeoff between the two terms; we choose β = 0.1.
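For concreteness, a minimal PyTorch sketch of the β-weighted objective (a standard VAE formulation; the authors' exact training code is not shown here):

import torch

def cvae_loss(x_hat, x, mu, logvar, beta=0.1):
    # Reconstruction error plus beta-weighted KL divergence to the N(0, I) prior.
    recon = torch.mean((x_hat - x) ** 2)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl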
Network Architecture
I Main hyperparameters:
2 Dimensionality of the latent space - governs the network's reconstruction ability
Figure: CVAE (β = 0.1) vs. AE - MSE (·10⁻⁵) vs. latent space dimensionality.
I The error falls steeply at first and flattens out later; we choose dimensionality = 32.
Network Architecture
I Network size: [91, 91, 32, 91, 91]
I 91 is the dimension of the input CCs; 32 is the latent space dimensionality
I Linear fully connected layers + Leaky ReLU activations
I Adam [Kingma and Ba 2014] with initial lr = 10⁻³
I Training for 2000 epochs with batch size 512
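A minimal sketch of an encoder/decoder with these sizes (our PyTorch rendering; the exact pitch-conditioning wiring is an assumption, not the authors' code):

import torch
import torch.nn as nn

class CVAE(nn.Module):
    def __init__(self, in_dim=91, hid=91, z_dim=32):
        super().__init__()
        # Encoder: CCs + pitch condition -> hidden -> (mu, logvar)
        self.enc = nn.Sequential(nn.Linear(in_dim + 1, hid), nn.LeakyReLU())
        self.mu = nn.Linear(hid, z_dim)
        self.logvar = nn.Linear(hid, z_dim)
        # Decoder: latent + pitch condition -> hidden -> CCs
        self.dec = nn.Sequential(nn.Linear(z_dim + 1, hid), nn.LeakyReLU(),
                                 nn.Linear(hid, in_dim))

    def forward(self, x, f0):
        h = self.enc(torch.cat([x, f0], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        return self.dec(torch.cat([z, f0], dim=-1)), mu, logvar

model = CVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)  # initial lr = 10^-3, as above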
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2436
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAE
I Both AE and CVAE reasonably reconstruct the target pitch1 6 2 7 3 8
I CVAE produces better reconstruction especially when thetarget pitch is far from the pitches available in the trainingdata(plot above) 4 9 5 10 6 11
I Conditioning helps to capture the pitch dependency of thespectral envelope more accurately
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton8)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton9)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton10)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton11)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton12)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton13)ocgs[i]state=false
1536
2536
Reconstruction
I To emulate the effect of pitch conditioning with an AE wetrain the AE by appending the pitch to the input CCs andreconstructing this appended input as shown below
f0
CCscAE
fprime
0
CCsprime
Figure lsquoConditionalrsquo AE(cAE)
I [Wyse 2018] followed a similar approach of appending theconditional variables to the input of his model
I the lsquocAErsquo is comparable to our proposed CVAE in that thenetwork might potentially learn something from f0
2636
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAEcAE
I We train the cAE on the octave endpoints and evaluateperformance as shown above
I Appending f0 does not improve performance It is infact worsethan AE
2736
Generation
I We are interested in lsquosynthesizingrsquo audio
I We see how well the network can generate an instance of adesired pitch(which the network has not been trained on)
Z
Dec
od
er
f0
Figure Sampling from the Network
I We follow the procedure in [Blaauw and Bonada 2016] - arandom walk to sample points coherently from the latent space
2836
Generation
I We train on instances across the entire octave sans MIDI 65and then generate MIDI 65
I On listening we can see that the generated audio still lacks thesoft noisy sound of the violin bowing
1 12 2 13 3 14
I For a more realistic synthesis we generate a violin note withvibrato 4 15
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton14)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton15)ocgs[i]state=false
2256
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton16)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton17)ocgs[i]state=false
2184
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3136
References I
[Blaauw and Bonada 2016] Blaauw M and Bonada J (2016)Modeling and transforming speech using variational autoencodersIn Interspeech pages 1770ndash1774
[Caetano and Rodet 2012] Caetano M and Rodet X (2012)A source-filter model for musical instrument sound transformationIn 2012 IEEE International Conference on Acoustics Speech and Signal Processing(ICASSP) pages 137ndash140 IEEE
[Doersch 2016] Doersch C (2016)Tutorial on variational autoencodersarXiv preprint arXiv160605908
[Engel et al 2017] Engel J Resnick C Roberts A Dieleman S Norouzi MEck D and Simonyan K (2017)Neural audio synthesis of musical notes with wavenet autoencodersIn Proceedings of the 34th International Conference on Machine Learning-Volume70 pages 1068ndash1077 JMLR org
[Esling et al 2018] Esling P Bitton A et al (2018)Generative timbre spaces regularizing variational auto-encoders with perceptualmetricsarXiv preprint arXiv180508501
3236
References II
[Griffin and Lim 1984] Griffin D and Lim J (1984)Signal estimation from modified short-time fourier transformIEEE Transactions on Acoustics Speech and Signal Processing 32(2)236ndash243
[Hinton and Salakhutdinov 2006] Hinton G E and Salakhutdinov R R (2006)Reducing the dimensionality of data with neural networksscience 313(5786)504ndash507
[IMAI 1979] IMAI S (1979)Spectral envelope extraction by improved cepstrumIEICE 62217ndash228
[Kingma and Ba 2014] Kingma D P and Ba J (2014)Adam A method for stochastic optimizationarXiv preprint arXiv14126980
[Kingma and Welling 2013] Kingma D P and Welling M (2013)Auto-encoding variational bayesarXiv preprint arXiv13126114
[Oord et al 2016] Oord A v d Dieleman S Zen H Simonyan K Vinyals OGraves A Kalchbrenner N Senior A and Kavukcuoglu K (2016)Wavenet A generative model for raw audioarXiv preprint arXiv160903499
3336
References III
[Roche et al 2018] Roche F Hueber T Limier S and Girin L (2018)Autoencoders for music sound modeling a comparison of linear shallow deeprecurrent and variational modelsarXiv preprint arXiv180604096
[Roebel and Rodet 2005] Roebel A and Rodet X (2005)Efficient Spectral Envelope Estimation and its application to pitch shifting andenvelope preservationIn International Conference on Digital Audio Effects pages 30ndash35 Madrid Spaincote interne IRCAM Roebel05b
[Romani Picas et al 2015] Romani Picas O Parra Rodriguez H Dabiri DTokuda H Hariya W Oishi K and Serra X (2015)A real-time system for measuring sound goodness in instrumental soundsIn Audio Engineering Society Convention 138 Audio Engineering Society
[Sarroff and Casey 2014] Sarroff A M and Casey M A (2014)Musical audio synthesis using autoencoding neural netsIn ICMC
[Serra et al 1997] Serra X et al (1997)Musical sound modeling with sinusoids plus noiseMusical signal processing pages 91ndash122
3436
References IV
[Slawson 1981] Slawson W (1981)The color of sound a theoretical study in musical timbreMusic Theory Spectrum 3132ndash141
[Sohn et al 2015] Sohn K Lee H and Yan X (2015)Learning structured output representation using deep conditional generative models
In Advances in neural information processing systems pages 3483ndash3491
[Wyse 2018] Wyse L (2018)Real-valued parametric conditioning of an rnn for interactive sound synthesisarXiv preprint arXiv180510808
3536
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model Reconstruction(trained on MIDI63)
3 Spectral Model Reconstruction(not trained on MIDI63)
4 Input MIDI 60 note to Parametric Model
5 Parametric Reconstruction of input note
6 Input MIDI 63 Note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note(endpoint trained model)
10 Parametric AE reconstruction of input(endpoint trained model)
11 Parametric CVAE reconstruction of input(endpoint trained model)
12 CVAE Generated MIDI 65 Violin note
13 Similar MIDI 65 Violin note from dataset
3636
Audio examples description II
14 CVAE Reconstruction of the MIDI 65 violin note
15 CVAE Generated MIDI 65 Violin note with vibrato
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2136
Network Architecture
I Network size [91 91 32 91 91]I 91 is the dimension of input CCs 32 is latent space
dimensionality
I Linear Fully Connected Layers + Leaky ReLU activations
I ADAM [Kingma and Ba 2014] with initial lr = 10minus3
I Training for 2000 epochs with batch size 512
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2436
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAE
I Both AE and CVAE reasonably reconstruct the target pitch1 6 2 7 3 8
I CVAE produces better reconstruction especially when thetarget pitch is far from the pitches available in the trainingdata(plot above) 4 9 5 10 6 11
I Conditioning helps to capture the pitch dependency of thespectral envelope more accurately
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton8)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton9)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton10)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton11)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton12)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton13)ocgs[i]state=false
1536
2536
Reconstruction
I To emulate the effect of pitch conditioning with an AE wetrain the AE by appending the pitch to the input CCs andreconstructing this appended input as shown below
f0
CCscAE
fprime
0
CCsprime
Figure lsquoConditionalrsquo AE(cAE)
I [Wyse 2018] followed a similar approach of appending theconditional variables to the input of his model
I the lsquocAErsquo is comparable to our proposed CVAE in that thenetwork might potentially learn something from f0
2636
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAEcAE
I We train the cAE on the octave endpoints and evaluateperformance as shown above
I Appending f0 does not improve performance It is infact worsethan AE
2736
Generation
I We are interested in lsquosynthesizingrsquo audio
I We see how well the network can generate an instance of adesired pitch(which the network has not been trained on)
Z
Dec
od
er
f0
Figure Sampling from the Network
I We follow the procedure in [Blaauw and Bonada 2016] - arandom walk to sample points coherently from the latent space
2836
Generation
I We train on instances across the entire octave sans MIDI 65and then generate MIDI 65
I On listening we can see that the generated audio still lacks thesoft noisy sound of the violin bowing
1 12 2 13 3 14
I For a more realistic synthesis we generate a violin note withvibrato 4 15
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton14)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton15)ocgs[i]state=false
2256
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton16)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton17)ocgs[i]state=false
2184
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3136
References I
[Blaauw and Bonada 2016] Blaauw M and Bonada J (2016)Modeling and transforming speech using variational autoencodersIn Interspeech pages 1770ndash1774
[Caetano and Rodet 2012] Caetano M and Rodet X (2012)A source-filter model for musical instrument sound transformationIn 2012 IEEE International Conference on Acoustics Speech and Signal Processing(ICASSP) pages 137ndash140 IEEE
[Doersch 2016] Doersch C (2016)Tutorial on variational autoencodersarXiv preprint arXiv160605908
[Engel et al 2017] Engel J Resnick C Roberts A Dieleman S Norouzi MEck D and Simonyan K (2017)Neural audio synthesis of musical notes with wavenet autoencodersIn Proceedings of the 34th International Conference on Machine Learning-Volume70 pages 1068ndash1077 JMLR org
[Esling et al 2018] Esling P Bitton A et al (2018)Generative timbre spaces regularizing variational auto-encoders with perceptualmetricsarXiv preprint arXiv180508501
3236
References II
[Griffin and Lim 1984] Griffin D and Lim J (1984)Signal estimation from modified short-time fourier transformIEEE Transactions on Acoustics Speech and Signal Processing 32(2)236ndash243
[Hinton and Salakhutdinov 2006] Hinton G E and Salakhutdinov R R (2006)Reducing the dimensionality of data with neural networksscience 313(5786)504ndash507
[IMAI 1979] IMAI S (1979)Spectral envelope extraction by improved cepstrumIEICE 62217ndash228
[Kingma and Ba 2014] Kingma D P and Ba J (2014)Adam A method for stochastic optimizationarXiv preprint arXiv14126980
[Kingma and Welling 2013] Kingma D P and Welling M (2013)Auto-encoding variational bayesarXiv preprint arXiv13126114
[Oord et al 2016] Oord A v d Dieleman S Zen H Simonyan K Vinyals OGraves A Kalchbrenner N Senior A and Kavukcuoglu K (2016)Wavenet A generative model for raw audioarXiv preprint arXiv160903499
3336
References III
[Roche et al 2018] Roche F Hueber T Limier S and Girin L (2018)Autoencoders for music sound modeling a comparison of linear shallow deeprecurrent and variational modelsarXiv preprint arXiv180604096
[Roebel and Rodet 2005] Roebel A and Rodet X (2005)Efficient Spectral Envelope Estimation and its application to pitch shifting andenvelope preservationIn International Conference on Digital Audio Effects pages 30ndash35 Madrid Spaincote interne IRCAM Roebel05b
[Romani Picas et al 2015] Romani Picas O Parra Rodriguez H Dabiri DTokuda H Hariya W Oishi K and Serra X (2015)A real-time system for measuring sound goodness in instrumental soundsIn Audio Engineering Society Convention 138 Audio Engineering Society
[Sarroff and Casey 2014] Sarroff A M and Casey M A (2014)Musical audio synthesis using autoencoding neural netsIn ICMC
[Serra et al 1997] Serra X et al (1997)Musical sound modeling with sinusoids plus noiseMusical signal processing pages 91ndash122
3436
References IV
[Slawson 1981] Slawson W (1981)The color of sound a theoretical study in musical timbreMusic Theory Spectrum 3132ndash141
[Sohn et al 2015] Sohn K Lee H and Yan X (2015)Learning structured output representation using deep conditional generative models
In Advances in neural information processing systems pages 3483ndash3491
[Wyse 2018] Wyse L (2018)Real-valued parametric conditioning of an rnn for interactive sound synthesisarXiv preprint arXiv180510808
3536
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model Reconstruction(trained on MIDI63)
3 Spectral Model Reconstruction(not trained on MIDI63)
4 Input MIDI 60 note to Parametric Model
5 Parametric Reconstruction of input note
6 Input MIDI 63 Note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note(endpoint trained model)
10 Parametric AE reconstruction of input(endpoint trained model)
11 Parametric CVAE reconstruction of input(endpoint trained model)
12 CVAE Generated MIDI 65 Violin note
13 Similar MIDI 65 Violin note from dataset
3636
Audio examples description II
14 CVAE Reconstruction of the MIDI 65 violin note
15 CVAE Generated MIDI 65 Violin note with vibrato
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2036
Network Architecture
I Main hyperparameters -
2 Dimensionality of latent space - networks reconstruction ability
0 10 20 30 40 50 60 70
2
4
6
8middot10minus5
Latent space dimensionality
MSE
CVAEAE
Figure CVAE(β = 01) vs AE
I Steep fall initially flatter later Choose dimensionality = 32
2136
Network Architecture
I Network size [91 91 32 91 91]I 91 is the dimension of input CCs 32 is latent space
dimensionality
I Linear Fully Connected Layers + Leaky ReLU activations
I ADAM [Kingma and Ba 2014] with initial lr = 10minus3
I Training for 2000 epochs with batch size 512
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2436
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAE
I Both AE and CVAE reasonably reconstruct the target pitch1 6 2 7 3 8
I CVAE produces better reconstruction especially when thetarget pitch is far from the pitches available in the trainingdata(plot above) 4 9 5 10 6 11
I Conditioning helps to capture the pitch dependency of thespectral envelope more accurately
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton8)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton9)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton10)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton11)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton12)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton13)ocgs[i]state=false
1536
2536
Reconstruction
I To emulate the effect of pitch conditioning with an AE wetrain the AE by appending the pitch to the input CCs andreconstructing this appended input as shown below
f0
CCscAE
fprime
0
CCsprime
Figure lsquoConditionalrsquo AE(cAE)
I [Wyse 2018] followed a similar approach of appending theconditional variables to the input of his model
I the lsquocAErsquo is comparable to our proposed CVAE in that thenetwork might potentially learn something from f0
2636
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAEcAE
I We train the cAE on the octave endpoints and evaluateperformance as shown above
I Appending f0 does not improve performance It is infact worsethan AE
2736
Generation
I We are interested in lsquosynthesizingrsquo audio
I We see how well the network can generate an instance of adesired pitch(which the network has not been trained on)
Z
Dec
od
er
f0
Figure Sampling from the Network
I We follow the procedure in [Blaauw and Bonada 2016] - arandom walk to sample points coherently from the latent space
2836
Generation
I We train on instances across the entire octave sans MIDI 65and then generate MIDI 65
I On listening we can see that the generated audio still lacks thesoft noisy sound of the violin bowing
1 12 2 13 3 14
I For a more realistic synthesis we generate a violin note withvibrato 4 15
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton14)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton15)ocgs[i]state=false
2256
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton16)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton17)ocgs[i]state=false
2184
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
30/36
Putting it all together
▶ Our model is still far from perfect, however, and we have to improve it further:
1. An immediate step is to model the residual of the input audio; this will help make the generated audio sound more natural.
2. We have not taken dynamics into account; a more complete system will capture spectral-envelope (timbral) dependencies on both pitch and loudness dynamics.
3. Conducting more formal listening tests, involving synthesis of larger pitch movements or melodic elements from Indian Classical music.
▶ Our contributions:
1. To the best of our knowledge, we have not come across any work using a parametric model for musical tones in the neural synthesis framework, especially one exploiting the conditioning function of the CVAE.
2. The TAE method has no known Python implementation, so we plan to make our code open-source to the MIR community for research.
31/36
References I
[Blaauw and Bonada, 2016] Blaauw, M. and Bonada, J. (2016). Modeling and transforming speech using variational autoencoders. In Interspeech, pages 1770–1774.
[Caetano and Rodet, 2012] Caetano, M. and Rodet, X. (2012). A source-filter model for musical instrument sound transformation. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 137–140. IEEE.
[Doersch, 2016] Doersch, C. (2016). Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908.
[Engel et al., 2017] Engel, J., Resnick, C., Roberts, A., Dieleman, S., Norouzi, M., Eck, D., and Simonyan, K. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1068–1077. JMLR.org.
[Esling et al., 2018] Esling, P., Bitton, A., et al. (2018). Generative timbre spaces: regularizing variational auto-encoders with perceptual metrics. arXiv preprint arXiv:1805.08501.
32/36
References II
[Griffin and Lim, 1984] Griffin, D. and Lim, J. (1984). Signal estimation from modified short-time Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2):236–243.
[Hinton and Salakhutdinov, 2006] Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507.
[Imai, 1979] Imai, S. (1979). Spectral envelope extraction by improved cepstrum. IEICE, 62:217–228.
[Kingma and Ba, 2014] Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[Kingma and Welling, 2013] Kingma, D. P. and Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
[Oord et al., 2016] Oord, A. v. d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., and Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.
33/36
References III
[Roche et al., 2018] Roche, F., Hueber, T., Limier, S., and Girin, L. (2018). Autoencoders for music sound modeling: a comparison of linear, shallow, deep, recurrent and variational models. arXiv preprint arXiv:1806.04096.
[Roebel and Rodet, 2005] Roebel, A. and Rodet, X. (2005). Efficient spectral envelope estimation and its application to pitch shifting and envelope preservation. In International Conference on Digital Audio Effects, pages 30–35, Madrid, Spain.
[Romani Picas et al., 2015] Romani Picas, O., Parra Rodriguez, H., Dabiri, D., Tokuda, H., Hariya, W., Oishi, K., and Serra, X. (2015). A real-time system for measuring sound goodness in instrumental sounds. In Audio Engineering Society Convention 138. Audio Engineering Society.
[Sarroff and Casey, 2014] Sarroff, A. M. and Casey, M. A. (2014). Musical audio synthesis using autoencoding neural nets. In ICMC.
[Serra et al., 1997] Serra, X. et al. (1997). Musical sound modeling with sinusoids plus noise. Musical Signal Processing, pages 91–122.
34/36
References IV
[Slawson, 1981] Slawson, W. (1981). The color of sound: a theoretical study in musical timbre. Music Theory Spectrum, 3:132–141.
[Sohn et al., 2015] Sohn, K., Lee, H., and Yan, X. (2015). Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems, pages 3483–3491.
[Wyse, 2018] Wyse, L. (2018). Real-valued parametric conditioning of an RNN for interactive sound synthesis. arXiv preprint arXiv:1805.10808.
35/36
Audio examples description I
1. Input MIDI 63 to the spectral model
2. Spectral model reconstruction (trained on MIDI 63)
3. Spectral model reconstruction (not trained on MIDI 63)
4. Input MIDI 60 note to the parametric model
5. Parametric reconstruction of the input note
6. Input MIDI 63 note
7. Parametric AE reconstruction of the input
8. Parametric CVAE reconstruction of the input
9. Input MIDI 65 note (endpoint-trained model)
10. Parametric AE reconstruction of the input (endpoint-trained model)
11. Parametric CVAE reconstruction of the input (endpoint-trained model)
12. CVAE-generated MIDI 65 violin note
13. Similar MIDI 65 violin note from the dataset
36/36
Audio examples description II
14. CVAE reconstruction of the MIDI 65 violin note
15. CVAE-generated MIDI 65 violin note with vibrato
20/36
Network Architecture
▶ Main hyperparameters:
2. Dimensionality of the latent space, which governs the network's reconstruction ability.
Figure: MSE (2–8 × 10⁻⁵) vs. latent space dimensionality (0–70), CVAE (β = 0.1) vs. AE.
▶ The error falls steeply at first and flattens later, so we choose dimensionality = 32.
21/36
Network Architecture
▶ Network size: [91, 91, 32, 91, 91]; 91 is the dimension of the input CCs and 32 is the latent space dimensionality.
▶ Linear fully connected layers + Leaky ReLU activations.
▶ Adam [Kingma and Ba, 2014] with initial learning rate 10⁻³.
▶ Training for 2000 epochs with batch size 512 (a code sketch follows).
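A minimal PyTorch sketch consistent with these settings is below. The conditioning scheme (f0 concatenated to both the encoder input and the latent vector) and the β = 0.1 KL weight are our reading of the earlier slides; treat the exact wiring as an assumption, not the definitive implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CVAE(nn.Module):
    """Sketch of the [91, 91, 32, 91, 91] CVAE; pitch f0 is the conditioning variable."""
    def __init__(self, cc_dim=91, latent_dim=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(cc_dim + 1, 91), nn.LeakyReLU())
        self.mu = nn.Linear(91, latent_dim)
        self.logvar = nn.Linear(91, latent_dim)
        self.dec = nn.Sequential(
            nn.Linear(latent_dim + 1, 91), nn.LeakyReLU(),
            nn.Linear(91, cc_dim),
        )

    def forward(self, ccs, f0):
        # ccs: (batch, 91), f0: (batch, 1)
        h = self.enc(torch.cat([ccs, f0], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization trick
        recon = self.dec(torch.cat([z, f0], dim=-1))
        return recon, mu, logvar

def loss_fn(recon, ccs, mu, logvar, beta=0.1):
    # reconstruction MSE + beta-weighted KL divergence (beta = 0.1 from the plot slide)
    mse = F.mse_loss(recon, ccs)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return mse + beta * kl

# training setup matching the slide: Adam with lr = 1e-3
model = CVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
```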
22/36
Experiments
▶ Two kinds of experiments demonstrate the network's capabilities:
1. Reconstruction: omit pitch instances during training and see how well the model reconstructs notes of the omitted target pitch.
2. Generation: how well the model 'synthesizes' note instances with new, unseen pitches.
23/36
Reconstruction
▶ Two training contexts:
1. T (×) is the target; ✓ marks training instances:
MIDI  T-3  T-2  T-1  T  T+1  T+2  T+3
Kept  ✓    ✓    ✓    ×  ✓    ✓    ✓
2. Octave endpoints:
MIDI  60  61  62  63  64  65  66  67  68  69  70  71
Kept  ✓   ×   ×   ×   ×   ×   ×   ×   ×   ×   ×   ✓
▶ In each case we compute the MSE as the frame-wise spectral envelope match across all frames of all the target instances (sketch below).
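A sketch of this metric, assuming the trained model is a callable mapping a CC frame and its pitch to a reconstructed frame; the array names are illustrative.

```python
import numpy as np

def envelope_mse(model, frames, f0s):
    """Frame-wise spectral-envelope MSE, averaged over all frames of all
    target-pitch instances. frames: (n_frames, 91) CCs; f0s: (n_frames,)."""
    errors = []
    for ccs, f0 in zip(frames, f0s):
        recon = model(ccs, f0)              # reconstructed CC frame
        errors.append(np.mean((recon - ccs) ** 2))
    return float(np.mean(errors))
```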
[Sarroff and Casey 2014] Sarroff A M and Casey M A (2014)Musical audio synthesis using autoencoding neural netsIn ICMC
[Serra et al 1997] Serra X et al (1997)Musical sound modeling with sinusoids plus noiseMusical signal processing pages 91ndash122
3436
References IV
[Slawson 1981] Slawson W (1981)The color of sound a theoretical study in musical timbreMusic Theory Spectrum 3132ndash141
[Sohn et al 2015] Sohn K Lee H and Yan X (2015)Learning structured output representation using deep conditional generative models
In Advances in neural information processing systems pages 3483ndash3491
[Wyse 2018] Wyse L (2018)Real-valued parametric conditioning of an rnn for interactive sound synthesisarXiv preprint arXiv180510808
3536
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model Reconstruction(trained on MIDI63)
3 Spectral Model Reconstruction(not trained on MIDI63)
4 Input MIDI 60 note to Parametric Model
5 Parametric Reconstruction of input note
6 Input MIDI 63 Note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note(endpoint trained model)
10 Parametric AE reconstruction of input(endpoint trained model)
11 Parametric CVAE reconstruction of input(endpoint trained model)
12 CVAE Generated MIDI 65 Violin note
13 Similar MIDI 65 Violin note from dataset
3636
Audio examples description II
14 CVAE Reconstruction of the MIDI 65 violin note
15 CVAE Generated MIDI 65 Violin note with vibrato
Experiments
I Two kinds of experiments demonstrate the network's capabilities (a sketch of the data split follows):

1 Reconstruction - omit pitch instances during training, and see how well the model reconstructs notes of the omitted target pitch.
2 Generation - see how well the model 'synthesizes' note instances with new, unseen pitches.
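As an illustration, a minimal Python sketch of such a pitch-omission split; the note-record format and the helper name are hypothetical, not our actual data-loading code:

    def split_by_target(notes, target_midi, context="neighbourhood"):
        # notes: list of (midi_pitch, frames) records (hypothetical format).
        # 'neighbourhood': train on T-3..T+3 except the target T itself;
        # 'endpoints': train only on the octave endpoints (MIDI 60 and 71).
        if context == "neighbourhood":
            keep = {target_midi + d for d in range(-3, 4)} - {target_midi}
        else:
            keep = {60, 71}
        train = [n for n in notes if n[0] in keep]
        test = [n for n in notes if n[0] == target_midi]
        return train, test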
Reconstruction
I Two training contexts:

1 Neighbourhood: T is the target pitch (✗ = omitted), ✓ marks the training instances kept:

    MIDI   T-3  T-2  T-1   T   T+1  T+2  T+3
    Kept    ✓    ✓    ✓    ✗    ✓    ✓    ✓

2 Octave endpoints:

    MIDI   60  61  62  63  64  65  66  67  68  69  70  71
    Kept    ✓   ✗   ✗   ✗   ✗   ✗   ✗   ✗   ✗   ✗   ✗   ✓

I In each case, we compute the MSE as the frame-wise spectral envelope match across all frames of all the target instances, as in the sketch below.
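A minimal NumPy sketch of this evaluation, assuming the envelopes are stored as (n_frames, n_bins) arrays (the array and function names are illustrative):

    import numpy as np

    def envelope_mse(true_envs, recon_envs):
        # Frame-wise MSE between true and reconstructed spectral envelopes
        # of one note instance; both arrays are (n_frames, n_bins).
        return np.mean((true_envs - recon_envs) ** 2)

    def target_pitch_mse(pairs):
        # Average the per-instance MSE over all instances of the target
        # pitch; pairs is a list of (true_envs, recon_envs) tuples.
        return np.mean([envelope_mse(t, r) for t, r in pairs])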
Reconstruction
[Figure: frame-wise spectral envelope MSE (·10^-2, y-axis 1-5) versus target pitch (MIDI 60-72) for the AE and the CVAE.]
I Both the AE and the CVAE reasonably reconstruct the target pitch (audio examples 6-8).
I The CVAE produces better reconstructions, especially when the target pitch is far from the pitches available in the training data (plot above; audio examples 9-11).
I Conditioning helps capture the pitch dependency of the spectral envelope more accurately; a sketch of such a conditioned decoder follows.
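A minimal PyTorch sketch of the kind of pitch-conditioned decoder this implies; the layer sizes, the one-hot pitch encoding, and the CC dimensionality are illustrative assumptions, not our exact architecture:

    import torch
    import torch.nn as nn

    class ConditionedDecoder(nn.Module):
        # CVAE decoder: the latent code z is concatenated with a pitch
        # label, so the reconstructed envelope depends explicitly on f0.
        def __init__(self, latent_dim=32, n_pitches=12, n_ccs=60):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(latent_dim + n_pitches, 256),
                nn.ReLU(),
                nn.Linear(256, n_ccs),  # cepstral coefficients of envelope
            )

        def forward(self, z, pitch_onehot):
            return self.net(torch.cat([z, pitch_onehot], dim=-1))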
Reconstruction
I To emulate the effect of pitch conditioning with an AE, we train the AE by appending the pitch to the input CCs and reconstructing this appended input, as shown below.
Figure: 'Conditional' AE (cAE) - the appended input (f0, CCs) is encoded and reconstructed as (f0', CCs').
I [Wyse 2018] followed a similar approach of appending the conditional variables to the input of his model.
I The 'cAE' is comparable to our proposed CVAE in that the network might potentially learn something from f0; a sketch of the input construction follows.
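A minimal sketch of forming the cAE input, assuming cc is one frame's cepstral-coefficient vector and f0 its normalized pitch (names are hypothetical):

    import numpy as np

    def cae_input(cc, f0):
        # Append the (normalized) pitch as one extra dimension; a plain AE
        # is then trained to reconstruct this appended vector, so pitch is
        # only another input feature rather than a decoder-side condition.
        return np.concatenate([cc, [f0]])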
Reconstruction
[Figure: frame-wise spectral envelope MSE (·10^-2, y-axis 1-5) versus target pitch (MIDI 60-72) for the AE, the CVAE, and the cAE.]
I We train the cAE on the octave endpoints and evaluate performance as shown above.
I Appending f0 does not improve performance; it is in fact worse than the plain AE.
Generation
I We are interested in 'synthesizing' audio.
I We test how well the network can generate an instance of a desired pitch on which it has not been trained.
Figure: sampling from the network - a latent sample Z and the desired pitch f0 are fed to the decoder.
I We follow the procedure in [Blaauw and Bonada 2016]: a random walk to sample points coherently from the latent space, sketched below.
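A minimal sketch of one way such a coherent random walk could be realized (the step size and walk length are illustrative assumptions, not the settings of [Blaauw and Bonada 2016]):

    import numpy as np

    def latent_random_walk(latent_dim=32, n_frames=200, step=0.05, seed=0):
        # Start from the prior N(0, I) and take small Gaussian steps, so
        # that consecutive latent points stay close and the decoded
        # envelopes evolve smoothly from frame to frame.
        rng = np.random.default_rng(seed)
        z = rng.standard_normal(latent_dim)
        walk = []
        for _ in range(n_frames):
            z = z + step * rng.standard_normal(latent_dim)
            walk.append(z.copy())
        return np.stack(walk)  # each row is decoded (with f0) into a frame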
Generation
I We train on instances across the entire octave sans MIDI 65, and then generate MIDI 65.
I On listening, we find that the generated audio still lacks the soft, noisy sound of the violin bowing (audio examples 12-14).
I For a more realistic synthesis, we generate a violin note with vibrato (audio example 15).
Putting it all together
I We explored autoencoder frameworks as generative models for audio synthesis of instrumental tones.
I Our parametric representation decouples 'timbre' and 'pitch', relying on the network to model their inter-dependencies.
I Pitch conditioning allows us to generate the learnt spectral envelope for a given pitch, enabling us to vary the pitch contour continuously.
Putting it all together

I Our model is still far from perfect, however, and we have to improve it further:

1 An immediate next step is to model the residual of the input audio; this will make the generated audio sound more natural.
2 We have not taken dynamics into account. A more complete system will capture the dependence of the spectral envelope (timbre) on both pitch and loudness dynamics.
3 We should conduct more formal listening tests, involving synthesis of larger pitch movements or melodic elements from Indian Classical music.

I Our contributions:

1 To the best of our knowledge, ours is the first work to use a parametric model for musical tones in the neural synthesis framework, especially one exploiting the conditioning function of the CVAE.
2 The TAE method has no known Python implementation, so we plan to make our code open-source for the MIR research community (a sketch of the core iteration follows).
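For reference, a minimal NumPy sketch of the iterative cepstral smoothing behind the true amplitude envelope (TAE) of [Imai 1979]; the lifter order and iteration count are illustrative assumptions, and this is a sketch rather than our released implementation:

    import numpy as np

    def true_amplitude_envelope(log_mag, order=40, n_iter=50):
        # log_mag: log-magnitude spectrum of one frame (full symmetric FFT).
        # Each iteration low-pass lifters the cepstrum to get a smooth
        # envelope, then lifts the working spectrum to the pointwise max of
        # itself and that envelope, so the result rides on the peaks.
        a = log_mag.astype(float).copy()
        n = len(a)
        env = a
        for _ in range(n_iter):
            cep = np.fft.ifft(a).real
            cep[order + 1:n - order] = 0.0  # keep low-quefrency part only
            env = np.fft.fft(cep).real
            a = np.maximum(a, env)
        return env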
References I
[Blaauw and Bonada 2016] Blaauw, M. and Bonada, J. (2016). Modeling and transforming speech using variational autoencoders. In Interspeech, pages 1770-1774.
[Caetano and Rodet 2012] Caetano, M. and Rodet, X. (2012). A source-filter model for musical instrument sound transformation. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 137-140. IEEE.
[Doersch 2016] Doersch, C. (2016). Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908.
[Engel et al. 2017] Engel, J., Resnick, C., Roberts, A., Dieleman, S., Norouzi, M., Eck, D., and Simonyan, K. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1068-1077. JMLR.org.
[Esling et al. 2018] Esling, P., Bitton, A., et al. (2018). Generative timbre spaces: regularizing variational auto-encoders with perceptual metrics. arXiv preprint arXiv:1805.08501.
References II
[Griffin and Lim 1984] Griffin, D. and Lim, J. (1984). Signal estimation from modified short-time Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2):236-243.
[Hinton and Salakhutdinov 2006] Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504-507.
[Imai 1979] Imai, S. (1979). Spectral envelope extraction by improved cepstrum. IEICE, 62:217-228.
[Kingma and Ba 2014] Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[Kingma and Welling 2013] Kingma, D. P. and Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
[Oord et al. 2016] van den Oord, A., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., and Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.
References III
[Roche et al. 2018] Roche, F., Hueber, T., Limier, S., and Girin, L. (2018). Autoencoders for music sound modeling: a comparison of linear, shallow, deep, recurrent and variational models. arXiv preprint arXiv:1806.04096.
[Roebel and Rodet 2005] Roebel, A. and Rodet, X. (2005). Efficient spectral envelope estimation and its application to pitch shifting and envelope preservation. In International Conference on Digital Audio Effects, pages 30-35, Madrid, Spain.
[Romani Picas et al. 2015] Romani Picas, O., Parra Rodriguez, H., Dabiri, D., Tokuda, H., Hariya, W., Oishi, K., and Serra, X. (2015). A real-time system for measuring sound goodness in instrumental sounds. In Audio Engineering Society Convention 138. Audio Engineering Society.
[Sarroff and Casey 2014] Sarroff, A. M. and Casey, M. A. (2014). Musical audio synthesis using autoencoding neural nets. In ICMC.
[Serra et al. 1997] Serra, X. et al. (1997). Musical sound modeling with sinusoids plus noise. Musical Signal Processing, pages 91-122.
References IV
[Slawson 1981] Slawson, W. (1981). The color of sound: a theoretical study in musical timbre. Music Theory Spectrum, 3:132-141.
[Sohn et al. 2015] Sohn, K., Lee, H., and Yan, X. (2015). Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems, pages 3483-3491.
[Wyse 2018] Wyse, L. (2018). Real-valued parametric conditioning of an RNN for interactive sound synthesis. arXiv preprint arXiv:1805.10808.
Audio examples description I
1 Input MIDI 63 to spectral model
2 Spectral model reconstruction (trained on MIDI 63)
3 Spectral model reconstruction (not trained on MIDI 63)
4 Input MIDI 60 note to parametric model
5 Parametric reconstruction of input note
6 Input MIDI 63 note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note (endpoint-trained model)
10 Parametric AE reconstruction of input (endpoint-trained model)
11 Parametric CVAE reconstruction of input (endpoint-trained model)
12 CVAE-generated MIDI 65 violin note
13 Similar MIDI 65 violin note from dataset
Audio examples description II
14 CVAE reconstruction of the MIDI 65 violin note
15 CVAE-generated MIDI 65 violin note with vibrato
2236
Experiments
I Two kinds of experiments to demonstrate networks capabilities
1 Reconstruction - Omit pitch instances during training and seehow well model reconstructs notes of omitted target pitch
2 Generation - How well model lsquosynthesizesrsquo note instances withnew unseen pitches
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2436
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAE
I Both AE and CVAE reasonably reconstruct the target pitch1 6 2 7 3 8
I CVAE produces better reconstruction especially when thetarget pitch is far from the pitches available in the trainingdata(plot above) 4 9 5 10 6 11
I Conditioning helps to capture the pitch dependency of thespectral envelope more accurately
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton8)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton9)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton10)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton11)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton12)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton13)ocgs[i]state=false
1536
2536
Reconstruction
I To emulate the effect of pitch conditioning with an AE wetrain the AE by appending the pitch to the input CCs andreconstructing this appended input as shown below
f0
CCscAE
fprime
0
CCsprime
Figure lsquoConditionalrsquo AE(cAE)
I [Wyse 2018] followed a similar approach of appending theconditional variables to the input of his model
I the lsquocAErsquo is comparable to our proposed CVAE in that thenetwork might potentially learn something from f0
2636
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAEcAE
I We train the cAE on the octave endpoints and evaluateperformance as shown above
I Appending f0 does not improve performance It is infact worsethan AE
2736
Generation
I We are interested in lsquosynthesizingrsquo audio
I We see how well the network can generate an instance of adesired pitch(which the network has not been trained on)
Z
Dec
od
er
f0
Figure Sampling from the Network
I We follow the procedure in [Blaauw and Bonada 2016] - arandom walk to sample points coherently from the latent space
2836
Generation
I We train on instances across the entire octave sans MIDI 65and then generate MIDI 65
I On listening we can see that the generated audio still lacks thesoft noisy sound of the violin bowing
1 12 2 13 3 14
I For a more realistic synthesis we generate a violin note withvibrato 4 15
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton14)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton15)ocgs[i]state=false
2256
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton16)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton17)ocgs[i]state=false
2184
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3136
References I
[Blaauw and Bonada 2016] Blaauw M and Bonada J (2016)Modeling and transforming speech using variational autoencodersIn Interspeech pages 1770ndash1774
[Caetano and Rodet 2012] Caetano M and Rodet X (2012)A source-filter model for musical instrument sound transformationIn 2012 IEEE International Conference on Acoustics Speech and Signal Processing(ICASSP) pages 137ndash140 IEEE
[Doersch 2016] Doersch C (2016)Tutorial on variational autoencodersarXiv preprint arXiv160605908
[Engel et al 2017] Engel J Resnick C Roberts A Dieleman S Norouzi MEck D and Simonyan K (2017)Neural audio synthesis of musical notes with wavenet autoencodersIn Proceedings of the 34th International Conference on Machine Learning-Volume70 pages 1068ndash1077 JMLR org
[Esling et al 2018] Esling P Bitton A et al (2018)Generative timbre spaces regularizing variational auto-encoders with perceptualmetricsarXiv preprint arXiv180508501
3236
References II
[Griffin and Lim 1984] Griffin D and Lim J (1984)Signal estimation from modified short-time fourier transformIEEE Transactions on Acoustics Speech and Signal Processing 32(2)236ndash243
[Hinton and Salakhutdinov 2006] Hinton G E and Salakhutdinov R R (2006)Reducing the dimensionality of data with neural networksscience 313(5786)504ndash507
[IMAI 1979] IMAI S (1979)Spectral envelope extraction by improved cepstrumIEICE 62217ndash228
[Kingma and Ba 2014] Kingma D P and Ba J (2014)Adam A method for stochastic optimizationarXiv preprint arXiv14126980
[Kingma and Welling 2013] Kingma D P and Welling M (2013)Auto-encoding variational bayesarXiv preprint arXiv13126114
[Oord et al 2016] Oord A v d Dieleman S Zen H Simonyan K Vinyals OGraves A Kalchbrenner N Senior A and Kavukcuoglu K (2016)Wavenet A generative model for raw audioarXiv preprint arXiv160903499
3336
References III
[Roche et al 2018] Roche F Hueber T Limier S and Girin L (2018)Autoencoders for music sound modeling a comparison of linear shallow deeprecurrent and variational modelsarXiv preprint arXiv180604096
[Roebel and Rodet 2005] Roebel A and Rodet X (2005)Efficient Spectral Envelope Estimation and its application to pitch shifting andenvelope preservationIn International Conference on Digital Audio Effects pages 30ndash35 Madrid Spaincote interne IRCAM Roebel05b
[Romani Picas et al 2015] Romani Picas O Parra Rodriguez H Dabiri DTokuda H Hariya W Oishi K and Serra X (2015)A real-time system for measuring sound goodness in instrumental soundsIn Audio Engineering Society Convention 138 Audio Engineering Society
[Sarroff and Casey 2014] Sarroff A M and Casey M A (2014)Musical audio synthesis using autoencoding neural netsIn ICMC
[Serra et al 1997] Serra X et al (1997)Musical sound modeling with sinusoids plus noiseMusical signal processing pages 91ndash122
3436
References IV
[Slawson 1981] Slawson W (1981)The color of sound a theoretical study in musical timbreMusic Theory Spectrum 3132ndash141
[Sohn et al 2015] Sohn K Lee H and Yan X (2015)Learning structured output representation using deep conditional generative models
In Advances in neural information processing systems pages 3483ndash3491
[Wyse 2018] Wyse L (2018)Real-valued parametric conditioning of an rnn for interactive sound synthesisarXiv preprint arXiv180510808
3536
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model Reconstruction(trained on MIDI63)
3 Spectral Model Reconstruction(not trained on MIDI63)
4 Input MIDI 60 note to Parametric Model
5 Parametric Reconstruction of input note
6 Input MIDI 63 Note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note(endpoint trained model)
10 Parametric AE reconstruction of input(endpoint trained model)
11 Parametric CVAE reconstruction of input(endpoint trained model)
12 CVAE Generated MIDI 65 Violin note
13 Similar MIDI 65 Violin note from dataset
3636
Audio examples description II
14 CVAE Reconstruction of the MIDI 65 violin note
15 CVAE Generated MIDI 65 Violin note with vibrato
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2436
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAE
I Both AE and CVAE reasonably reconstruct the target pitch1 6 2 7 3 8
I CVAE produces better reconstruction especially when thetarget pitch is far from the pitches available in the trainingdata(plot above) 4 9 5 10 6 11
I Conditioning helps to capture the pitch dependency of thespectral envelope more accurately
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton8)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton9)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton10)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton11)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton12)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton13)ocgs[i]state=false
1536
2536
Reconstruction
I To emulate the effect of pitch conditioning with an AE wetrain the AE by appending the pitch to the input CCs andreconstructing this appended input as shown below
f0
CCscAE
fprime
0
CCsprime
Figure lsquoConditionalrsquo AE(cAE)
I [Wyse 2018] followed a similar approach of appending theconditional variables to the input of his model
I the lsquocAErsquo is comparable to our proposed CVAE in that thenetwork might potentially learn something from f0
2636
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAEcAE
I We train the cAE on the octave endpoints and evaluateperformance as shown above
I Appending f0 does not improve performance It is infact worsethan AE
2736
Generation
I We are interested in lsquosynthesizingrsquo audio
I We see how well the network can generate an instance of adesired pitch(which the network has not been trained on)
Z
Dec
od
er
f0
Figure Sampling from the Network
I We follow the procedure in [Blaauw and Bonada 2016] - arandom walk to sample points coherently from the latent space
2836
Generation
I We train on instances across the entire octave sans MIDI 65and then generate MIDI 65
I On listening we can see that the generated audio still lacks thesoft noisy sound of the violin bowing
1 12 2 13 3 14
I For a more realistic synthesis we generate a violin note withvibrato 4 15
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton14)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton15)ocgs[i]state=false
2256
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton16)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton17)ocgs[i]state=false
2184
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3136
References I
[Blaauw and Bonada 2016] Blaauw M and Bonada J (2016)Modeling and transforming speech using variational autoencodersIn Interspeech pages 1770ndash1774
[Caetano and Rodet 2012] Caetano M and Rodet X (2012)A source-filter model for musical instrument sound transformationIn 2012 IEEE International Conference on Acoustics Speech and Signal Processing(ICASSP) pages 137ndash140 IEEE
[Doersch 2016] Doersch C (2016)Tutorial on variational autoencodersarXiv preprint arXiv160605908
[Engel et al 2017] Engel J Resnick C Roberts A Dieleman S Norouzi MEck D and Simonyan K (2017)Neural audio synthesis of musical notes with wavenet autoencodersIn Proceedings of the 34th International Conference on Machine Learning-Volume70 pages 1068ndash1077 JMLR org
[Esling et al 2018] Esling P Bitton A et al (2018)Generative timbre spaces regularizing variational auto-encoders with perceptualmetricsarXiv preprint arXiv180508501
3236
References II
[Griffin and Lim 1984] Griffin D and Lim J (1984)Signal estimation from modified short-time fourier transformIEEE Transactions on Acoustics Speech and Signal Processing 32(2)236ndash243
[Hinton and Salakhutdinov 2006] Hinton G E and Salakhutdinov R R (2006)Reducing the dimensionality of data with neural networksscience 313(5786)504ndash507
[IMAI 1979] IMAI S (1979)Spectral envelope extraction by improved cepstrumIEICE 62217ndash228
[Kingma and Ba 2014] Kingma D P and Ba J (2014)Adam A method for stochastic optimizationarXiv preprint arXiv14126980
[Kingma and Welling 2013] Kingma D P and Welling M (2013)Auto-encoding variational bayesarXiv preprint arXiv13126114
[Oord et al 2016] Oord A v d Dieleman S Zen H Simonyan K Vinyals OGraves A Kalchbrenner N Senior A and Kavukcuoglu K (2016)Wavenet A generative model for raw audioarXiv preprint arXiv160903499
3336
References III
[Roche et al 2018] Roche F Hueber T Limier S and Girin L (2018)Autoencoders for music sound modeling a comparison of linear shallow deeprecurrent and variational modelsarXiv preprint arXiv180604096
[Roebel and Rodet 2005] Roebel A and Rodet X (2005)Efficient Spectral Envelope Estimation and its application to pitch shifting andenvelope preservationIn International Conference on Digital Audio Effects pages 30ndash35 Madrid Spaincote interne IRCAM Roebel05b
[Romani Picas et al 2015] Romani Picas O Parra Rodriguez H Dabiri DTokuda H Hariya W Oishi K and Serra X (2015)A real-time system for measuring sound goodness in instrumental soundsIn Audio Engineering Society Convention 138 Audio Engineering Society
[Sarroff and Casey 2014] Sarroff A M and Casey M A (2014)Musical audio synthesis using autoencoding neural netsIn ICMC
[Serra et al 1997] Serra X et al (1997)Musical sound modeling with sinusoids plus noiseMusical signal processing pages 91ndash122
3436
References IV
[Slawson 1981] Slawson W (1981)The color of sound a theoretical study in musical timbreMusic Theory Spectrum 3132ndash141
[Sohn et al 2015] Sohn K Lee H and Yan X (2015)Learning structured output representation using deep conditional generative models
In Advances in neural information processing systems pages 3483ndash3491
[Wyse 2018] Wyse L (2018)Real-valued parametric conditioning of an rnn for interactive sound synthesisarXiv preprint arXiv180510808
3536
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model Reconstruction(trained on MIDI63)
3 Spectral Model Reconstruction(not trained on MIDI63)
4 Input MIDI 60 note to Parametric Model
5 Parametric Reconstruction of input note
6 Input MIDI 63 Note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note(endpoint trained model)
10 Parametric AE reconstruction of input(endpoint trained model)
11 Parametric CVAE reconstruction of input(endpoint trained model)
12 CVAE Generated MIDI 65 Violin note
13 Similar MIDI 65 Violin note from dataset
3636
Audio examples description II
14 CVAE Reconstruction of the MIDI 65 violin note
15 CVAE Generated MIDI 65 Violin note with vibrato
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2436
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAE
I Both AE and CVAE reasonably reconstruct the target pitch1 6 2 7 3 8
I CVAE produces better reconstruction especially when thetarget pitch is far from the pitches available in the trainingdata(plot above) 4 9 5 10 6 11
I Conditioning helps to capture the pitch dependency of thespectral envelope more accurately
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton8)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton9)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton10)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton11)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton12)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton13)ocgs[i]state=false
1536
2536
Reconstruction
I To emulate the effect of pitch conditioning with an AE wetrain the AE by appending the pitch to the input CCs andreconstructing this appended input as shown below
f0
CCscAE
fprime
0
CCsprime
Figure lsquoConditionalrsquo AE(cAE)
▶ [Wyse, 2018] followed a similar approach of appending the conditional variables to the input of his model.
▶ The 'cAE' is comparable to our proposed CVAE in that the network might potentially learn something from f0 (a sketch of the idea follows below).
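Here is a minimal sketch of the cAE idea in PyTorch, assuming frame-wise cepstral coefficients (CCs) as input; the layer sizes and dimensionalities are illustrative choices, not the values used in the thesis.

```python
import torch
import torch.nn as nn

class ConditionalAE(nn.Module):
    """'Conditional' AE (cAE): the pitch f0 is appended to the CCs and
    the autoencoder reconstructs the appended vector [CCs, f0]."""

    def __init__(self, num_ccs=60, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(num_ccs + 1, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, num_ccs + 1),  # predicts [CCs', f0']
        )

    def forward(self, ccs, f0):
        x = torch.cat([ccs, f0], dim=-1)  # append pitch to the input
        return self.decoder(self.encoder(x))

# Hypothetical usage: ccs of shape (batch, 60), f0 of shape (batch, 1).
# model = ConditionalAE()
# recon = model(torch.randn(8, 60), torch.rand(8, 1))
```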
Reconstruction

[Figure: frame-wise MSE (1–5 ×10⁻², y-axis) versus MIDI pitch (60–72, x-axis) for the AE, CVAE, and cAE models]
▶ We train the cAE on the octave endpoints and evaluate its performance as shown above.
▶ Appending f0 does not improve performance; it is in fact worse than the plain AE.
Generation

▶ We are interested in 'synthesizing' audio.
▶ We test how well the network can generate an instance of a desired pitch on which it has not been trained.

[Figure: sampling from the network, with a latent vector z and the conditioning pitch f0 fed to the decoder]

▶ We follow the procedure in [Blaauw and Bonada, 2016]: a random walk to sample points coherently from the latent space (sketched below).
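A minimal sketch of such a random walk over the latent space, assuming a trained conditional decoder that maps [z, f0] to a spectral envelope; the step size and shapes are illustrative, and the exact procedure of [Blaauw and Bonada, 2016] may differ in detail.

```python
import torch

def random_walk_latents(num_frames, latent_dim, step_size=0.1):
    """Coherent latent trajectory: start from a standard-normal draw
    and take small Gaussian steps so that consecutive frames remain
    correlated rather than independent."""
    z = torch.randn(latent_dim)
    frames = []
    for _ in range(num_frames):
        z = z + step_size * torch.randn(latent_dim)
        frames.append(z.clone())
    return torch.stack(frames)  # (num_frames, latent_dim)

# Hypothetical usage with a trained conditional decoder `decoder`:
# zs = random_walk_latents(num_frames=200, latent_dim=32)
# f0 = torch.full((200, 1), 65.0)  # condition every frame on MIDI 65
# envelopes = decoder(torch.cat([zs, f0], dim=-1))
```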
Generation

▶ We train on instances across the entire octave sans MIDI 65, and then generate MIDI 65.
▶ On listening, the generated audio still lacks the soft, noisy sound of the violin bowing (audio examples 12, 13, 14).
▶ For a more realistic synthesis, we generate a violin note with vibrato (audio example 15); a sketch of such an f0 contour follows below.
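The vibrato note is driven by a time-varying f0 contour. Here is an illustrative sketch of one, using a sinusoidal modulation whose rate and depth are typical vibrato values assumed for the example, not measured from the dataset.

```python
import numpy as np

def vibrato_f0(midi_pitch, num_frames, hop_seconds=0.005,
               rate_hz=6.0, depth_semitones=0.3):
    """Frame-wise f0 contour (in Hz) with sinusoidal vibrato around
    the target MIDI note."""
    t = np.arange(num_frames) * hop_seconds
    midi = midi_pitch + depth_semitones * np.sin(2 * np.pi * rate_hz * t)
    return 440.0 * 2.0 ** ((midi - 69.0) / 12.0)  # MIDI -> Hz

# f0_contour = vibrato_f0(65, num_frames=400)  # MIDI 65 with vibrato
```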
Putting it all together

▶ We explored autoencoder frameworks as generative models for the audio synthesis of instrumental tones.
▶ Our parametric representation decouples 'timbre' and 'pitch', relying on the network to model their inter-dependencies.
▶ Pitch conditioning lets us generate the learnt spectral envelope for a given pitch, enabling us to vary the pitch contour continuously.
Putting it all together

▶ Our model is still far from perfect, however, and we have to improve it further:
1. An immediate next step is to model the residual of the input audio; this will help make the generated audio sound more natural.
2. We have not taken dynamics into account. A more complete system would capture the dependence of the spectral envelope (timbre) on both pitch and loudness dynamics.
3. We should conduct more formal listening tests, involving the synthesis of larger pitch movements or melodic elements from Indian Classical music.

▶ Our contributions:
1. To the best of our knowledge, no prior work has used a parametric model for musical tones in the neural synthesis framework, especially one exploiting the conditioning function of the CVAE.
2. The TAE method has no known Python implementation, so we plan to make our code open-source for the MIR research community.
References I

[Blaauw and Bonada, 2016] Blaauw, M. and Bonada, J. (2016). Modeling and transforming speech using variational autoencoders. In Interspeech, pages 1770–1774.

[Caetano and Rodet, 2012] Caetano, M. and Rodet, X. (2012). A source-filter model for musical instrument sound transformation. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 137–140. IEEE.

[Doersch, 2016] Doersch, C. (2016). Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908.

[Engel et al., 2017] Engel, J., Resnick, C., Roberts, A., Dieleman, S., Norouzi, M., Eck, D., and Simonyan, K. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1068–1077. JMLR.org.

[Esling et al., 2018] Esling, P., Bitton, A., et al. (2018). Generative timbre spaces: regularizing variational auto-encoders with perceptual metrics. arXiv preprint arXiv:1805.08501.

References II

[Griffin and Lim, 1984] Griffin, D. and Lim, J. (1984). Signal estimation from modified short-time Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2):236–243.

[Hinton and Salakhutdinov, 2006] Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507.

[Imai, 1979] Imai, S. (1979). Spectral envelope extraction by improved cepstrum. IEICE, 62:217–228.

[Kingma and Ba, 2014] Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

[Kingma and Welling, 2013] Kingma, D. P. and Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.

[Oord et al., 2016] Oord, A. v. d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., and Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.

References III

[Roche et al., 2018] Roche, F., Hueber, T., Limier, S., and Girin, L. (2018). Autoencoders for music sound modeling: a comparison of linear, shallow, deep, recurrent and variational models. arXiv preprint arXiv:1806.04096.

[Roebel and Rodet, 2005] Roebel, A. and Rodet, X. (2005). Efficient spectral envelope estimation and its application to pitch shifting and envelope preservation. In International Conference on Digital Audio Effects, pages 30–35, Madrid, Spain.

[Romani Picas et al., 2015] Romani Picas, O., Parra Rodriguez, H., Dabiri, D., Tokuda, H., Hariya, W., Oishi, K., and Serra, X. (2015). A real-time system for measuring sound goodness in instrumental sounds. In Audio Engineering Society Convention 138. Audio Engineering Society.

[Sarroff and Casey, 2014] Sarroff, A. M. and Casey, M. A. (2014). Musical audio synthesis using autoencoding neural nets. In ICMC.

[Serra et al., 1997] Serra, X. et al. (1997). Musical sound modeling with sinusoids plus noise. Musical Signal Processing, pages 91–122.

References IV

[Slawson, 1981] Slawson, W. (1981). The color of sound: a theoretical study in musical timbre. Music Theory Spectrum, 3:132–141.

[Sohn et al., 2015] Sohn, K., Lee, H., and Yan, X. (2015). Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems, pages 3483–3491.

[Wyse, 2018] Wyse, L. (2018). Real-valued parametric conditioning of an RNN for interactive sound synthesis. arXiv preprint arXiv:1805.10808.
Audio examples description I

1. Input MIDI 63 to the spectral model
2. Spectral model reconstruction (trained on MIDI 63)
3. Spectral model reconstruction (not trained on MIDI 63)
4. Input MIDI 60 note to the parametric model
5. Parametric reconstruction of the input note
6. Input MIDI 63 note
7. Parametric AE reconstruction of the input
8. Parametric CVAE reconstruction of the input
9. Input MIDI 65 note (endpoint-trained model)
10. Parametric AE reconstruction of the input (endpoint-trained model)
11. Parametric CVAE reconstruction of the input (endpoint-trained model)
12. CVAE-generated MIDI 65 violin note
13. Similar MIDI 65 violin note from the dataset

Audio examples description II

14. CVAE reconstruction of the MIDI 65 violin note
15. CVAE-generated MIDI 65 violin note with vibrato
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2436
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAE
I Both AE and CVAE reasonably reconstruct the target pitch1 6 2 7 3 8
I CVAE produces better reconstruction especially when thetarget pitch is far from the pitches available in the trainingdata(plot above) 4 9 5 10 6 11
I Conditioning helps to capture the pitch dependency of thespectral envelope more accurately
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton8)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton9)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton10)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton11)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton12)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton13)ocgs[i]state=false
1536
2536
Reconstruction
I To emulate the effect of pitch conditioning with an AE wetrain the AE by appending the pitch to the input CCs andreconstructing this appended input as shown below
f0
CCscAE
fprime
0
CCsprime
Figure lsquoConditionalrsquo AE(cAE)
I [Wyse 2018] followed a similar approach of appending theconditional variables to the input of his model
I the lsquocAErsquo is comparable to our proposed CVAE in that thenetwork might potentially learn something from f0
2636
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAEcAE
I We train the cAE on the octave endpoints and evaluateperformance as shown above
I Appending f0 does not improve performance It is infact worsethan AE
2736
Generation
I We are interested in lsquosynthesizingrsquo audio
I We see how well the network can generate an instance of adesired pitch(which the network has not been trained on)
Z
Dec
od
er
f0
Figure Sampling from the Network
I We follow the procedure in [Blaauw and Bonada 2016] - arandom walk to sample points coherently from the latent space
2836
Generation
I We train on instances across the entire octave sans MIDI 65and then generate MIDI 65
I On listening we can see that the generated audio still lacks thesoft noisy sound of the violin bowing
1 12 2 13 3 14
I For a more realistic synthesis we generate a violin note withvibrato 4 15
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton14)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton15)ocgs[i]state=false
2256
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton16)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton17)ocgs[i]state=false
2184
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3136
References I
[Blaauw and Bonada 2016] Blaauw M and Bonada J (2016)Modeling and transforming speech using variational autoencodersIn Interspeech pages 1770ndash1774
[Caetano and Rodet 2012] Caetano M and Rodet X (2012)A source-filter model for musical instrument sound transformationIn 2012 IEEE International Conference on Acoustics Speech and Signal Processing(ICASSP) pages 137ndash140 IEEE
[Doersch 2016] Doersch C (2016)Tutorial on variational autoencodersarXiv preprint arXiv160605908
[Engel et al 2017] Engel J Resnick C Roberts A Dieleman S Norouzi MEck D and Simonyan K (2017)Neural audio synthesis of musical notes with wavenet autoencodersIn Proceedings of the 34th International Conference on Machine Learning-Volume70 pages 1068ndash1077 JMLR org
[Esling et al 2018] Esling P Bitton A et al (2018)Generative timbre spaces regularizing variational auto-encoders with perceptualmetricsarXiv preprint arXiv180508501
3236
References II
[Griffin and Lim 1984] Griffin D and Lim J (1984)Signal estimation from modified short-time fourier transformIEEE Transactions on Acoustics Speech and Signal Processing 32(2)236ndash243
[Hinton and Salakhutdinov 2006] Hinton G E and Salakhutdinov R R (2006)Reducing the dimensionality of data with neural networksscience 313(5786)504ndash507
[IMAI 1979] IMAI S (1979)Spectral envelope extraction by improved cepstrumIEICE 62217ndash228
[Kingma and Ba 2014] Kingma D P and Ba J (2014)Adam A method for stochastic optimizationarXiv preprint arXiv14126980
[Kingma and Welling 2013] Kingma D P and Welling M (2013)Auto-encoding variational bayesarXiv preprint arXiv13126114
[Oord et al 2016] Oord A v d Dieleman S Zen H Simonyan K Vinyals OGraves A Kalchbrenner N Senior A and Kavukcuoglu K (2016)Wavenet A generative model for raw audioarXiv preprint arXiv160903499
3336
References III
[Roche et al 2018] Roche F Hueber T Limier S and Girin L (2018)Autoencoders for music sound modeling a comparison of linear shallow deeprecurrent and variational modelsarXiv preprint arXiv180604096
[Roebel and Rodet 2005] Roebel A and Rodet X (2005)Efficient Spectral Envelope Estimation and its application to pitch shifting andenvelope preservationIn International Conference on Digital Audio Effects pages 30ndash35 Madrid Spaincote interne IRCAM Roebel05b
[Romani Picas et al 2015] Romani Picas O Parra Rodriguez H Dabiri DTokuda H Hariya W Oishi K and Serra X (2015)A real-time system for measuring sound goodness in instrumental soundsIn Audio Engineering Society Convention 138 Audio Engineering Society
[Sarroff and Casey 2014] Sarroff A M and Casey M A (2014)Musical audio synthesis using autoencoding neural netsIn ICMC
[Serra et al 1997] Serra X et al (1997)Musical sound modeling with sinusoids plus noiseMusical signal processing pages 91ndash122
3436
References IV
[Slawson 1981] Slawson W (1981)The color of sound a theoretical study in musical timbreMusic Theory Spectrum 3132ndash141
[Sohn et al 2015] Sohn K Lee H and Yan X (2015)Learning structured output representation using deep conditional generative models
In Advances in neural information processing systems pages 3483ndash3491
[Wyse 2018] Wyse L (2018)Real-valued parametric conditioning of an rnn for interactive sound synthesisarXiv preprint arXiv180510808
3536
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model Reconstruction(trained on MIDI63)
3 Spectral Model Reconstruction(not trained on MIDI63)
4 Input MIDI 60 note to Parametric Model
5 Parametric Reconstruction of input note
6 Input MIDI 63 Note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note(endpoint trained model)
10 Parametric AE reconstruction of input(endpoint trained model)
11 Parametric CVAE reconstruction of input(endpoint trained model)
12 CVAE Generated MIDI 65 Violin note
13 Similar MIDI 65 Violin note from dataset
3636
Audio examples description II
14 CVAE Reconstruction of the MIDI 65 violin note
15 CVAE Generated MIDI 65 Violin note with vibrato
2336
Reconstruction
I Two training contexts -
1 T(times) is target X is training instancesMIDI T - 3 T - 2 T - 1 T T + 1 T + 2 T + 3Kept X X X times X X X
2 Octave endpointsMIDI 60 61 62 63 64 65Kept X times times times times timesMIDI 66 67 68 69 70 71Kept times times times times times X
I In each of the above cases we compute the MSE as theframe-wise spectral envelope match across all frames of all thetarget instances
2436
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAE
I Both AE and CVAE reasonably reconstruct the target pitch1 6 2 7 3 8
I CVAE produces better reconstruction especially when thetarget pitch is far from the pitches available in the trainingdata(plot above) 4 9 5 10 6 11
I Conditioning helps to capture the pitch dependency of thespectral envelope more accurately
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton8)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton9)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton10)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton11)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton12)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton13)ocgs[i]state=false
1536
2536
Reconstruction
I To emulate the effect of pitch conditioning with an AE wetrain the AE by appending the pitch to the input CCs andreconstructing this appended input as shown below
f0
CCscAE
fprime
0
CCsprime
Figure lsquoConditionalrsquo AE(cAE)
I [Wyse 2018] followed a similar approach of appending theconditional variables to the input of his model
I the lsquocAErsquo is comparable to our proposed CVAE in that thenetwork might potentially learn something from f0
2636
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAEcAE
I We train the cAE on the octave endpoints and evaluateperformance as shown above
I Appending f0 does not improve performance It is infact worsethan AE
2736
Generation
I We are interested in lsquosynthesizingrsquo audio
I We see how well the network can generate an instance of adesired pitch(which the network has not been trained on)
Z
Dec
od
er
f0
Figure Sampling from the Network
I We follow the procedure in [Blaauw and Bonada 2016] - arandom walk to sample points coherently from the latent space
2836
Generation
I We train on instances across the entire octave sans MIDI 65and then generate MIDI 65
I On listening we can see that the generated audio still lacks thesoft noisy sound of the violin bowing
1 12 2 13 3 14
I For a more realistic synthesis we generate a violin note withvibrato 4 15
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton14)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton15)ocgs[i]state=false
2256
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton16)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton17)ocgs[i]state=false
2184
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3136
References I
[Blaauw and Bonada 2016] Blaauw M and Bonada J (2016)Modeling and transforming speech using variational autoencodersIn Interspeech pages 1770ndash1774
[Caetano and Rodet 2012] Caetano M and Rodet X (2012)A source-filter model for musical instrument sound transformationIn 2012 IEEE International Conference on Acoustics Speech and Signal Processing(ICASSP) pages 137ndash140 IEEE
[Doersch 2016] Doersch C (2016)Tutorial on variational autoencodersarXiv preprint arXiv160605908
[Engel et al 2017] Engel J Resnick C Roberts A Dieleman S Norouzi MEck D and Simonyan K (2017)Neural audio synthesis of musical notes with wavenet autoencodersIn Proceedings of the 34th International Conference on Machine Learning-Volume70 pages 1068ndash1077 JMLR org
[Esling et al 2018] Esling P Bitton A et al (2018)Generative timbre spaces regularizing variational auto-encoders with perceptualmetricsarXiv preprint arXiv180508501
3236
References II
[Griffin and Lim 1984] Griffin D and Lim J (1984)Signal estimation from modified short-time fourier transformIEEE Transactions on Acoustics Speech and Signal Processing 32(2)236ndash243
[Hinton and Salakhutdinov 2006] Hinton G E and Salakhutdinov R R (2006)Reducing the dimensionality of data with neural networksscience 313(5786)504ndash507
[IMAI 1979] IMAI S (1979)Spectral envelope extraction by improved cepstrumIEICE 62217ndash228
[Kingma and Ba 2014] Kingma D P and Ba J (2014)Adam A method for stochastic optimizationarXiv preprint arXiv14126980
[Kingma and Welling 2013] Kingma D P and Welling M (2013)Auto-encoding variational bayesarXiv preprint arXiv13126114
[Oord et al 2016] Oord A v d Dieleman S Zen H Simonyan K Vinyals OGraves A Kalchbrenner N Senior A and Kavukcuoglu K (2016)Wavenet A generative model for raw audioarXiv preprint arXiv160903499
3336
References III
[Roche et al 2018] Roche F Hueber T Limier S and Girin L (2018)Autoencoders for music sound modeling a comparison of linear shallow deeprecurrent and variational modelsarXiv preprint arXiv180604096
[Roebel and Rodet 2005] Roebel A and Rodet X (2005)Efficient Spectral Envelope Estimation and its application to pitch shifting andenvelope preservationIn International Conference on Digital Audio Effects pages 30ndash35 Madrid Spaincote interne IRCAM Roebel05b
[Romani Picas et al 2015] Romani Picas O Parra Rodriguez H Dabiri DTokuda H Hariya W Oishi K and Serra X (2015)A real-time system for measuring sound goodness in instrumental soundsIn Audio Engineering Society Convention 138 Audio Engineering Society
[Sarroff and Casey 2014] Sarroff A M and Casey M A (2014)Musical audio synthesis using autoencoding neural netsIn ICMC
[Serra et al 1997] Serra X et al (1997)Musical sound modeling with sinusoids plus noiseMusical signal processing pages 91ndash122
3436
References IV
[Slawson 1981] Slawson W (1981)The color of sound a theoretical study in musical timbreMusic Theory Spectrum 3132ndash141
[Sohn et al 2015] Sohn K Lee H and Yan X (2015)Learning structured output representation using deep conditional generative models
In Advances in neural information processing systems pages 3483ndash3491
[Wyse 2018] Wyse L (2018)Real-valued parametric conditioning of an rnn for interactive sound synthesisarXiv preprint arXiv180510808
3536
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model Reconstruction(trained on MIDI63)
3 Spectral Model Reconstruction(not trained on MIDI63)
4 Input MIDI 60 note to Parametric Model
5 Parametric Reconstruction of input note
6 Input MIDI 63 Note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note(endpoint trained model)
10 Parametric AE reconstruction of input(endpoint trained model)
11 Parametric CVAE reconstruction of input(endpoint trained model)
12 CVAE Generated MIDI 65 Violin note
13 Similar MIDI 65 Violin note from dataset
3636
Audio examples description II
14 CVAE Reconstruction of the MIDI 65 violin note
15 CVAE Generated MIDI 65 Violin note with vibrato
2436
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAE
I Both AE and CVAE reasonably reconstruct the target pitch1 6 2 7 3 8
I CVAE produces better reconstruction especially when thetarget pitch is far from the pitches available in the trainingdata(plot above) 4 9 5 10 6 11
I Conditioning helps to capture the pitch dependency of thespectral envelope more accurately
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton8)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton9)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton10)ocgs[i]state=false
1704
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton11)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton12)ocgs[i]state=false
1536
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton13)ocgs[i]state=false
1536
2536
Reconstruction
I To emulate the effect of pitch conditioning with an AE wetrain the AE by appending the pitch to the input CCs andreconstructing this appended input as shown below
f0
CCscAE
fprime
0
CCsprime
Figure lsquoConditionalrsquo AE(cAE)
I [Wyse 2018] followed a similar approach of appending theconditional variables to the input of his model
I the lsquocAErsquo is comparable to our proposed CVAE in that thenetwork might potentially learn something from f0
2636
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAEcAE
I We train the cAE on the octave endpoints and evaluateperformance as shown above
I Appending f0 does not improve performance It is infact worsethan AE
2736
Generation
I We are interested in lsquosynthesizingrsquo audio
I We see how well the network can generate an instance of adesired pitch(which the network has not been trained on)
Z
Dec
od
er
f0
Figure Sampling from the Network
I We follow the procedure in [Blaauw and Bonada 2016] - arandom walk to sample points coherently from the latent space
2836
Generation
I We train on instances across the entire octave sans MIDI 65and then generate MIDI 65
I On listening we can see that the generated audio still lacks thesoft noisy sound of the violin bowing
1 12 2 13 3 14
I For a more realistic synthesis we generate a violin note withvibrato 4 15
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton14)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton15)ocgs[i]state=false
2256
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton16)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton17)ocgs[i]state=false
2184
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3136
References I
[Blaauw and Bonada 2016] Blaauw M and Bonada J (2016)Modeling and transforming speech using variational autoencodersIn Interspeech pages 1770ndash1774
[Caetano and Rodet 2012] Caetano M and Rodet X (2012)A source-filter model for musical instrument sound transformationIn 2012 IEEE International Conference on Acoustics Speech and Signal Processing(ICASSP) pages 137ndash140 IEEE
[Doersch 2016] Doersch C (2016)Tutorial on variational autoencodersarXiv preprint arXiv160605908
[Engel et al 2017] Engel J Resnick C Roberts A Dieleman S Norouzi MEck D and Simonyan K (2017)Neural audio synthesis of musical notes with wavenet autoencodersIn Proceedings of the 34th International Conference on Machine Learning-Volume70 pages 1068ndash1077 JMLR org
[Esling et al 2018] Esling P Bitton A et al (2018)Generative timbre spaces regularizing variational auto-encoders with perceptualmetricsarXiv preprint arXiv180508501
3236
References II
[Griffin and Lim 1984] Griffin D and Lim J (1984)Signal estimation from modified short-time fourier transformIEEE Transactions on Acoustics Speech and Signal Processing 32(2)236ndash243
[Hinton and Salakhutdinov 2006] Hinton G E and Salakhutdinov R R (2006)Reducing the dimensionality of data with neural networksscience 313(5786)504ndash507
[IMAI 1979] IMAI S (1979)Spectral envelope extraction by improved cepstrumIEICE 62217ndash228
[Kingma and Ba 2014] Kingma D P and Ba J (2014)Adam A method for stochastic optimizationarXiv preprint arXiv14126980
[Kingma and Welling 2013] Kingma D P and Welling M (2013)Auto-encoding variational bayesarXiv preprint arXiv13126114
[Oord et al 2016] Oord A v d Dieleman S Zen H Simonyan K Vinyals OGraves A Kalchbrenner N Senior A and Kavukcuoglu K (2016)Wavenet A generative model for raw audioarXiv preprint arXiv160903499
3336
References III
[Roche et al 2018] Roche F Hueber T Limier S and Girin L (2018)Autoencoders for music sound modeling a comparison of linear shallow deeprecurrent and variational modelsarXiv preprint arXiv180604096
[Roebel and Rodet 2005] Roebel A and Rodet X (2005)Efficient Spectral Envelope Estimation and its application to pitch shifting andenvelope preservationIn International Conference on Digital Audio Effects pages 30ndash35 Madrid Spaincote interne IRCAM Roebel05b
[Romani Picas et al 2015] Romani Picas O Parra Rodriguez H Dabiri DTokuda H Hariya W Oishi K and Serra X (2015)A real-time system for measuring sound goodness in instrumental soundsIn Audio Engineering Society Convention 138 Audio Engineering Society
[Sarroff and Casey 2014] Sarroff A M and Casey M A (2014)Musical audio synthesis using autoencoding neural netsIn ICMC
[Serra et al 1997] Serra X et al (1997)Musical sound modeling with sinusoids plus noiseMusical signal processing pages 91ndash122
3436
References IV
[Slawson 1981] Slawson W (1981)The color of sound a theoretical study in musical timbreMusic Theory Spectrum 3132ndash141
[Sohn et al 2015] Sohn K Lee H and Yan X (2015)Learning structured output representation using deep conditional generative models
In Advances in neural information processing systems pages 3483ndash3491
[Wyse 2018] Wyse L (2018)Real-valued parametric conditioning of an rnn for interactive sound synthesisarXiv preprint arXiv180510808
3536
Audio examples description I
1 Input MIDI 63 note to Spectral Model
2 Spectral Model reconstruction (trained on MIDI 63)
3 Spectral Model reconstruction (not trained on MIDI 63)
4 Input MIDI 60 note to Parametric Model
5 Parametric reconstruction of input note
6 Input MIDI 63 note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note (endpoint-trained model)
10 Parametric AE reconstruction of input (endpoint-trained model)
11 Parametric CVAE reconstruction of input (endpoint-trained model)
12 CVAE-generated MIDI 65 violin note
13 Similar MIDI 65 violin note from dataset
36/36
Audio examples description II
14 CVAE reconstruction of the MIDI 65 violin note
15 CVAE-generated MIDI 65 violin note with vibrato
25/36
Reconstruction
I To emulate the effect of pitch conditioning with an AE, we train the AE by appending the pitch to the input CCs and reconstructing this appended input, as shown below
[Diagram: the pitch f0 and the CCs are concatenated, passed through the cAE, and reconstructed as f0′ and CCs′]
Figure: 'Conditional' AE (cAE)
I [Wyse, 2018] followed a similar approach of appending the conditional variables to the input of his model
I The 'cAE' is comparable to our proposed CVAE in that the network might potentially learn something from f0
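As a concrete illustration, below is a minimal PyTorch sketch of this 'cAE' baseline; the layer sizes and the number of CCs are assumptions for illustration, not the configuration used in our experiments.

```python
import torch
import torch.nn as nn

N_CCS, LATENT_DIM = 91, 32   # illustrative sizes, not the talk's values

class ConditionalAE(nn.Module):
    """Plain AE over the CC frame with f0 appended; both the CCs and
    the pitch are reconstructed, so the network is merely given the
    chance to exploit f0 (unlike the CVAE, which conditions on it)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(N_CCS + 1, 256), nn.ReLU(),
                                     nn.Linear(256, LATENT_DIM))
        self.decoder = nn.Sequential(nn.Linear(LATENT_DIM, 256), nn.ReLU(),
                                     nn.Linear(256, N_CCS + 1))

    def forward(self, ccs, f0):
        x = torch.cat([ccs, f0], dim=-1)          # append pitch to the CCs
        x_hat = self.decoder(self.encoder(x))
        return x_hat[..., :N_CCS], x_hat[..., N_CCS:]   # CCs', f0'

# One training step on a dummy batch: reconstruct the appended input.
model = ConditionalAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)   # [Kingma and Ba, 2014]
ccs, f0 = torch.randn(8, N_CCS), torch.rand(8, 1)
ccs_hat, f0_hat = model(ccs, f0)
loss = nn.functional.mse_loss(ccs_hat, ccs) + nn.functional.mse_loss(f0_hat, f0)
opt.zero_grad(); loss.backward(); opt.step()
```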
26/36
Reconstruction
[Plot: frame-wise reconstruction MSE (1–5 × 10^-2) vs. MIDI pitch (60–72) for the AE, CVAE, and cAE]
I We train the cAE on the octave endpoints and evaluate performance as shown above
I Appending f0 does not improve performance; it is in fact worse than the plain AE
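The evaluation behind the plot can be sketched as follows, assuming held-out frames grouped by MIDI pitch and the cAE interface from the earlier sketch:

```python
import torch

def mse_by_pitch(model, frames_by_pitch):
    """Frame-wise reconstruction MSE for each MIDI pitch (60-72).
    `frames_by_pitch` maps a MIDI number to held-out (ccs, f0) tensors;
    this data layout is an illustrative assumption."""
    scores = {}
    with torch.no_grad():
        for midi, (ccs, f0) in sorted(frames_by_pitch.items()):
            ccs_hat, _ = model(ccs, f0)
            scores[midi] = torch.mean((ccs_hat - ccs) ** 2).item()
    return scores   # plot scores vs. MIDI pitch to reproduce the curve
```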
27/36
Generation
I We are interested in 'synthesizing' audio
I We examine how well the network can generate an instance of a desired pitch (one the network has not been trained on)
[Diagram: a latent point z and the conditioning pitch f0 are fed to the decoder]
Figure: Sampling from the network
I We follow the procedure in [Blaauw and Bonada, 2016]: a random walk to sample points coherently from the latent space
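A sketch of that sampling procedure, under the assumption of a Gaussian random walk with a small per-frame step (the step size here is our guess, not a value from the paper):

```python
import torch

def random_walk_latents(n_frames, latent_dim=32, step=0.05, seed=0):
    """Coherent latent trajectory: successive points differ by a small
    Gaussian step, so the decoded envelopes evolve smoothly in time
    instead of jumping between independent samples."""
    g = torch.Generator().manual_seed(seed)
    z = torch.empty(n_frames, latent_dim)
    z[0] = torch.randn(latent_dim, generator=g)
    for t in range(1, n_frames):
        z[t] = z[t - 1] + step * torch.randn(latent_dim, generator=g)
    return z

# Each z[t], together with the desired (unseen) pitch as the condition,
# is passed to the trained CVAE decoder to produce one frame of CCs.
```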
28/36
Generation
I We train on instances across the entire octave sans MIDI 65, and then generate MIDI 65
I On listening, we find that the generated audio still lacks the soft, noisy sound of the violin bowing (audio examples 12, 13, and 14)
I For a more realistic synthesis, we generate a violin note with vibrato (audio example 15); see the sketch below
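Since the conditioning pitch can be varied frame by frame, vibrato amounts to decoding along a sinusoidally modulated f0 contour. A small numpy sketch (the rate, depth, and frame rate are assumed values):

```python
import numpy as np

def vibrato_f0(midi=65, rate_hz=6.0, depth_cents=30.0, dur_s=1.0, frame_rate=172):
    """Frame-wise f0 contour (Hz) with sinusoidal vibrato."""
    t = np.arange(int(dur_s * frame_rate)) / frame_rate
    f0 = 440.0 * 2.0 ** ((midi - 69) / 12.0)        # MIDI note -> Hz
    cents = depth_cents * np.sin(2 * np.pi * rate_hz * t)
    return f0 * 2.0 ** (cents / 1200.0)             # modulated contour

# Decoding one frame per contour value (with the latent random walk
# above) yields the vibrato note used in audio example 15.
```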
29/36
Putting it all together
I We explored autoencoder frameworks as generative models for audio synthesis of instrumental tones
I Our parametric representation decouples 'timbre' and 'pitch', relying on the network to model their inter-dependencies
I Pitch conditioning allows us to generate the learnt spectral envelope for a given pitch, enabling us to vary the pitch contour continuously
30/36
Putting it all together
I Our model is still far from perfect, however, and we have to improve it further:
1 An immediate step is to model the residual of the input audio; this will help make the generated audio sound more natural
2 We have not taken dynamics into account; a more complete system will capture spectral envelope (timbral) dependencies on both pitch and loudness dynamics
3 We should conduct more formal listening tests, involving synthesis of larger pitch movements or melodic elements from Indian Classical music
I Our contributions:
1 To the best of our knowledge, no prior work uses a parametric model for musical tones in the neural synthesis framework, especially one exploiting the conditioning function of the CVAE
2 The TAE method has no known Python implementation, so we plan to make our code open-source for the MIR community; a sketch of the TAE iteration is given below
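For completeness, here is a minimal numpy sketch of the TAE ('true envelope') iteration of [Imai, 1979] and [Roebel and Rodet, 2005]: the cepstrally smoothed envelope is repeatedly lifted so that it sits above the harmonic peaks. The pitch-dependent cepstral cutoff and the fixed iteration count are common choices, assumed here rather than taken from our implementation.

```python
import numpy as np

def true_amplitude_envelope(mag_spectrum, f0, sr, n_iters=50):
    """TAE sketch: iteratively cepstrally smooth max(spectrum, envelope)
    until the envelope lies above the harmonic peaks.

    mag_spectrum: magnitude of an rfft half-spectrum (length n_fft//2 + 1)
    f0, sr: fundamental and sample rate in Hz
    """
    log_a = np.log(np.maximum(mag_spectrum, 1e-10))
    order = max(1, int(sr / (2 * f0)))    # pitch-dependent cepstral cutoff
    env = np.full_like(log_a, log_a.min())
    for _ in range(n_iters):
        target = np.maximum(log_a, env)   # lift the envelope above the peaks
        ceps = np.fft.irfft(target)       # real cepstrum of the target
        ceps[order:-order] = 0.0          # keep only the low quefrencies
        env = np.fft.rfft(ceps).real      # smoothed log envelope
    return np.exp(env)
```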
3136
References I
[Blaauw and Bonada 2016] Blaauw M and Bonada J (2016)Modeling and transforming speech using variational autoencodersIn Interspeech pages 1770ndash1774
[Caetano and Rodet 2012] Caetano M and Rodet X (2012)A source-filter model for musical instrument sound transformationIn 2012 IEEE International Conference on Acoustics Speech and Signal Processing(ICASSP) pages 137ndash140 IEEE
[Doersch 2016] Doersch C (2016)Tutorial on variational autoencodersarXiv preprint arXiv160605908
[Engel et al 2017] Engel J Resnick C Roberts A Dieleman S Norouzi MEck D and Simonyan K (2017)Neural audio synthesis of musical notes with wavenet autoencodersIn Proceedings of the 34th International Conference on Machine Learning-Volume70 pages 1068ndash1077 JMLR org
[Esling et al 2018] Esling P Bitton A et al (2018)Generative timbre spaces regularizing variational auto-encoders with perceptualmetricsarXiv preprint arXiv180508501
3236
References II
[Griffin and Lim 1984] Griffin D and Lim J (1984)Signal estimation from modified short-time fourier transformIEEE Transactions on Acoustics Speech and Signal Processing 32(2)236ndash243
[Hinton and Salakhutdinov 2006] Hinton G E and Salakhutdinov R R (2006)Reducing the dimensionality of data with neural networksscience 313(5786)504ndash507
[IMAI 1979] IMAI S (1979)Spectral envelope extraction by improved cepstrumIEICE 62217ndash228
[Kingma and Ba 2014] Kingma D P and Ba J (2014)Adam A method for stochastic optimizationarXiv preprint arXiv14126980
[Kingma and Welling 2013] Kingma D P and Welling M (2013)Auto-encoding variational bayesarXiv preprint arXiv13126114
[Oord et al 2016] Oord A v d Dieleman S Zen H Simonyan K Vinyals OGraves A Kalchbrenner N Senior A and Kavukcuoglu K (2016)Wavenet A generative model for raw audioarXiv preprint arXiv160903499
3336
References III
[Roche et al 2018] Roche F Hueber T Limier S and Girin L (2018)Autoencoders for music sound modeling a comparison of linear shallow deeprecurrent and variational modelsarXiv preprint arXiv180604096
[Roebel and Rodet 2005] Roebel A and Rodet X (2005)Efficient Spectral Envelope Estimation and its application to pitch shifting andenvelope preservationIn International Conference on Digital Audio Effects pages 30ndash35 Madrid Spaincote interne IRCAM Roebel05b
[Romani Picas et al 2015] Romani Picas O Parra Rodriguez H Dabiri DTokuda H Hariya W Oishi K and Serra X (2015)A real-time system for measuring sound goodness in instrumental soundsIn Audio Engineering Society Convention 138 Audio Engineering Society
[Sarroff and Casey 2014] Sarroff A M and Casey M A (2014)Musical audio synthesis using autoencoding neural netsIn ICMC
[Serra et al 1997] Serra X et al (1997)Musical sound modeling with sinusoids plus noiseMusical signal processing pages 91ndash122
3436
References IV
[Slawson 1981] Slawson W (1981)The color of sound a theoretical study in musical timbreMusic Theory Spectrum 3132ndash141
[Sohn et al 2015] Sohn K Lee H and Yan X (2015)Learning structured output representation using deep conditional generative models
In Advances in neural information processing systems pages 3483ndash3491
[Wyse 2018] Wyse L (2018)Real-valued parametric conditioning of an rnn for interactive sound synthesisarXiv preprint arXiv180510808
3536
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model Reconstruction(trained on MIDI63)
3 Spectral Model Reconstruction(not trained on MIDI63)
4 Input MIDI 60 note to Parametric Model
5 Parametric Reconstruction of input note
6 Input MIDI 63 Note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note(endpoint trained model)
10 Parametric AE reconstruction of input(endpoint trained model)
11 Parametric CVAE reconstruction of input(endpoint trained model)
12 CVAE Generated MIDI 65 Violin note
13 Similar MIDI 65 Violin note from dataset
3636
Audio examples description II
14 CVAE Reconstruction of the MIDI 65 violin note
15 CVAE Generated MIDI 65 Violin note with vibrato
2636
Reconstruction
60 62 64 66 68 70 72
1
2
3
4
5middot10minus2
Pitch
MSE
AECVAEcAE
I We train the cAE on the octave endpoints and evaluateperformance as shown above
I Appending f0 does not improve performance It is infact worsethan AE
2736
Generation
I We are interested in lsquosynthesizingrsquo audio
I We see how well the network can generate an instance of adesired pitch(which the network has not been trained on)
Z
Dec
od
er
f0
Figure Sampling from the Network
I We follow the procedure in [Blaauw and Bonada 2016] - arandom walk to sample points coherently from the latent space
2836
Generation
I We train on instances across the entire octave sans MIDI 65and then generate MIDI 65
I On listening we can see that the generated audio still lacks thesoft noisy sound of the violin bowing
1 12 2 13 3 14
I For a more realistic synthesis we generate a violin note withvibrato 4 15
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton14)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton15)ocgs[i]state=false
2256
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton16)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton17)ocgs[i]state=false
2184
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3136
References I
[Blaauw and Bonada 2016] Blaauw M and Bonada J (2016)Modeling and transforming speech using variational autoencodersIn Interspeech pages 1770ndash1774
[Caetano and Rodet 2012] Caetano M and Rodet X (2012)A source-filter model for musical instrument sound transformationIn 2012 IEEE International Conference on Acoustics Speech and Signal Processing(ICASSP) pages 137ndash140 IEEE
[Doersch 2016] Doersch C (2016)Tutorial on variational autoencodersarXiv preprint arXiv160605908
[Engel et al 2017] Engel J Resnick C Roberts A Dieleman S Norouzi MEck D and Simonyan K (2017)Neural audio synthesis of musical notes with wavenet autoencodersIn Proceedings of the 34th International Conference on Machine Learning-Volume70 pages 1068ndash1077 JMLR org
[Esling et al 2018] Esling P Bitton A et al (2018)Generative timbre spaces regularizing variational auto-encoders with perceptualmetricsarXiv preprint arXiv180508501
3236
References II
[Griffin and Lim 1984] Griffin D and Lim J (1984)Signal estimation from modified short-time fourier transformIEEE Transactions on Acoustics Speech and Signal Processing 32(2)236ndash243
[Hinton and Salakhutdinov 2006] Hinton G E and Salakhutdinov R R (2006)Reducing the dimensionality of data with neural networksscience 313(5786)504ndash507
[IMAI 1979] IMAI S (1979)Spectral envelope extraction by improved cepstrumIEICE 62217ndash228
[Kingma and Ba 2014] Kingma D P and Ba J (2014)Adam A method for stochastic optimizationarXiv preprint arXiv14126980
[Kingma and Welling 2013] Kingma D P and Welling M (2013)Auto-encoding variational bayesarXiv preprint arXiv13126114
[Oord et al 2016] Oord A v d Dieleman S Zen H Simonyan K Vinyals OGraves A Kalchbrenner N Senior A and Kavukcuoglu K (2016)Wavenet A generative model for raw audioarXiv preprint arXiv160903499
3336
References III
[Roche et al 2018] Roche F Hueber T Limier S and Girin L (2018)Autoencoders for music sound modeling a comparison of linear shallow deeprecurrent and variational modelsarXiv preprint arXiv180604096
[Roebel and Rodet 2005] Roebel A and Rodet X (2005)Efficient Spectral Envelope Estimation and its application to pitch shifting andenvelope preservationIn International Conference on Digital Audio Effects pages 30ndash35 Madrid Spaincote interne IRCAM Roebel05b
[Romani Picas et al 2015] Romani Picas O Parra Rodriguez H Dabiri DTokuda H Hariya W Oishi K and Serra X (2015)A real-time system for measuring sound goodness in instrumental soundsIn Audio Engineering Society Convention 138 Audio Engineering Society
[Sarroff and Casey 2014] Sarroff A M and Casey M A (2014)Musical audio synthesis using autoencoding neural netsIn ICMC
[Serra et al 1997] Serra X et al (1997)Musical sound modeling with sinusoids plus noiseMusical signal processing pages 91ndash122
3436
References IV
[Slawson 1981] Slawson W (1981)The color of sound a theoretical study in musical timbreMusic Theory Spectrum 3132ndash141
[Sohn et al 2015] Sohn K Lee H and Yan X (2015)Learning structured output representation using deep conditional generative models
In Advances in neural information processing systems pages 3483ndash3491
[Wyse 2018] Wyse L (2018)Real-valued parametric conditioning of an rnn for interactive sound synthesisarXiv preprint arXiv180510808
3536
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model Reconstruction(trained on MIDI63)
3 Spectral Model Reconstruction(not trained on MIDI63)
4 Input MIDI 60 note to Parametric Model
5 Parametric Reconstruction of input note
6 Input MIDI 63 Note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note(endpoint trained model)
10 Parametric AE reconstruction of input(endpoint trained model)
11 Parametric CVAE reconstruction of input(endpoint trained model)
12 CVAE Generated MIDI 65 Violin note
13 Similar MIDI 65 Violin note from dataset
3636
Audio examples description II
14 CVAE Reconstruction of the MIDI 65 violin note
15 CVAE Generated MIDI 65 Violin note with vibrato
2736
Generation
I We are interested in lsquosynthesizingrsquo audio
I We see how well the network can generate an instance of adesired pitch(which the network has not been trained on)
Z
Dec
od
er
f0
Figure Sampling from the Network
I We follow the procedure in [Blaauw and Bonada 2016] - arandom walk to sample points coherently from the latent space
2836
Generation
I We train on instances across the entire octave sans MIDI 65and then generate MIDI 65
I On listening we can see that the generated audio still lacks thesoft noisy sound of the violin bowing
1 12 2 13 3 14
I For a more realistic synthesis we generate a violin note withvibrato 4 15
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton14)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton15)ocgs[i]state=false
2256
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton16)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton17)ocgs[i]state=false
2184
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3136
References I
[Blaauw and Bonada 2016] Blaauw M and Bonada J (2016)Modeling and transforming speech using variational autoencodersIn Interspeech pages 1770ndash1774
[Caetano and Rodet 2012] Caetano M and Rodet X (2012)A source-filter model for musical instrument sound transformationIn 2012 IEEE International Conference on Acoustics Speech and Signal Processing(ICASSP) pages 137ndash140 IEEE
[Doersch 2016] Doersch C (2016)Tutorial on variational autoencodersarXiv preprint arXiv160605908
[Engel et al 2017] Engel J Resnick C Roberts A Dieleman S Norouzi MEck D and Simonyan K (2017)Neural audio synthesis of musical notes with wavenet autoencodersIn Proceedings of the 34th International Conference on Machine Learning-Volume70 pages 1068ndash1077 JMLR org
[Esling et al 2018] Esling P Bitton A et al (2018)Generative timbre spaces regularizing variational auto-encoders with perceptualmetricsarXiv preprint arXiv180508501
3236
References II
[Griffin and Lim 1984] Griffin D and Lim J (1984)Signal estimation from modified short-time fourier transformIEEE Transactions on Acoustics Speech and Signal Processing 32(2)236ndash243
[Hinton and Salakhutdinov 2006] Hinton G E and Salakhutdinov R R (2006)Reducing the dimensionality of data with neural networksscience 313(5786)504ndash507
[IMAI 1979] IMAI S (1979)Spectral envelope extraction by improved cepstrumIEICE 62217ndash228
[Kingma and Ba 2014] Kingma D P and Ba J (2014)Adam A method for stochastic optimizationarXiv preprint arXiv14126980
[Kingma and Welling 2013] Kingma D P and Welling M (2013)Auto-encoding variational bayesarXiv preprint arXiv13126114
[Oord et al 2016] Oord A v d Dieleman S Zen H Simonyan K Vinyals OGraves A Kalchbrenner N Senior A and Kavukcuoglu K (2016)Wavenet A generative model for raw audioarXiv preprint arXiv160903499
3336
References III
[Roche et al 2018] Roche F Hueber T Limier S and Girin L (2018)Autoencoders for music sound modeling a comparison of linear shallow deeprecurrent and variational modelsarXiv preprint arXiv180604096
[Roebel and Rodet 2005] Roebel A and Rodet X (2005)Efficient Spectral Envelope Estimation and its application to pitch shifting andenvelope preservationIn International Conference on Digital Audio Effects pages 30ndash35 Madrid Spaincote interne IRCAM Roebel05b
[Romani Picas et al 2015] Romani Picas O Parra Rodriguez H Dabiri DTokuda H Hariya W Oishi K and Serra X (2015)A real-time system for measuring sound goodness in instrumental soundsIn Audio Engineering Society Convention 138 Audio Engineering Society
[Sarroff and Casey 2014] Sarroff A M and Casey M A (2014)Musical audio synthesis using autoencoding neural netsIn ICMC
[Serra et al 1997] Serra X et al (1997)Musical sound modeling with sinusoids plus noiseMusical signal processing pages 91ndash122
3436
References IV
[Slawson 1981] Slawson W (1981)The color of sound a theoretical study in musical timbreMusic Theory Spectrum 3132ndash141
[Sohn et al 2015] Sohn K Lee H and Yan X (2015)Learning structured output representation using deep conditional generative models
In Advances in neural information processing systems pages 3483ndash3491
[Wyse 2018] Wyse L (2018)Real-valued parametric conditioning of an rnn for interactive sound synthesisarXiv preprint arXiv180510808
3536
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model Reconstruction(trained on MIDI63)
3 Spectral Model Reconstruction(not trained on MIDI63)
4 Input MIDI 60 note to Parametric Model
5 Parametric Reconstruction of input note
6 Input MIDI 63 Note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note(endpoint trained model)
10 Parametric AE reconstruction of input(endpoint trained model)
11 Parametric CVAE reconstruction of input(endpoint trained model)
12 CVAE Generated MIDI 65 Violin note
13 Similar MIDI 65 Violin note from dataset
3636
Audio examples description II
14 CVAE Reconstruction of the MIDI 65 violin note
15 CVAE Generated MIDI 65 Violin note with vibrato
2836
Generation
I We train on instances across the entire octave sans MIDI 65and then generate MIDI 65
I On listening we can see that the generated audio still lacks thesoft noisy sound of the violin bowing
1 12 2 13 3 14
I For a more realistic synthesis we generate a violin note withvibrato 4 15
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton14)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton15)ocgs[i]state=false
2256
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton16)ocgs[i]state=false
2184
var ocgs=hostgetOCGs(hostpageNum)for(var i=0iltocgslengthi++)if(ocgs[i]name==MediaPlayButton17)ocgs[i]state=false
2184
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3136
References I
[Blaauw and Bonada 2016] Blaauw M and Bonada J (2016)Modeling and transforming speech using variational autoencodersIn Interspeech pages 1770ndash1774
[Caetano and Rodet 2012] Caetano M and Rodet X (2012)A source-filter model for musical instrument sound transformationIn 2012 IEEE International Conference on Acoustics Speech and Signal Processing(ICASSP) pages 137ndash140 IEEE
[Doersch 2016] Doersch C (2016)Tutorial on variational autoencodersarXiv preprint arXiv160605908
[Engel et al 2017] Engel J Resnick C Roberts A Dieleman S Norouzi MEck D and Simonyan K (2017)Neural audio synthesis of musical notes with wavenet autoencodersIn Proceedings of the 34th International Conference on Machine Learning-Volume70 pages 1068ndash1077 JMLR org
[Esling et al 2018] Esling P Bitton A et al (2018)Generative timbre spaces regularizing variational auto-encoders with perceptualmetricsarXiv preprint arXiv180508501
3236
References II
[Griffin and Lim 1984] Griffin D and Lim J (1984)Signal estimation from modified short-time fourier transformIEEE Transactions on Acoustics Speech and Signal Processing 32(2)236ndash243
[Hinton and Salakhutdinov 2006] Hinton G E and Salakhutdinov R R (2006)Reducing the dimensionality of data with neural networksscience 313(5786)504ndash507
[IMAI 1979] IMAI S (1979)Spectral envelope extraction by improved cepstrumIEICE 62217ndash228
[Kingma and Ba 2014] Kingma D P and Ba J (2014)Adam A method for stochastic optimizationarXiv preprint arXiv14126980
[Kingma and Welling 2013] Kingma D P and Welling M (2013)Auto-encoding variational bayesarXiv preprint arXiv13126114
[Oord et al 2016] Oord A v d Dieleman S Zen H Simonyan K Vinyals OGraves A Kalchbrenner N Senior A and Kavukcuoglu K (2016)Wavenet A generative model for raw audioarXiv preprint arXiv160903499
3336
References III
[Roche et al 2018] Roche F Hueber T Limier S and Girin L (2018)Autoencoders for music sound modeling a comparison of linear shallow deeprecurrent and variational modelsarXiv preprint arXiv180604096
[Roebel and Rodet 2005] Roebel A and Rodet X (2005)Efficient Spectral Envelope Estimation and its application to pitch shifting andenvelope preservationIn International Conference on Digital Audio Effects pages 30ndash35 Madrid Spaincote interne IRCAM Roebel05b
[Romani Picas et al 2015] Romani Picas O Parra Rodriguez H Dabiri DTokuda H Hariya W Oishi K and Serra X (2015)A real-time system for measuring sound goodness in instrumental soundsIn Audio Engineering Society Convention 138 Audio Engineering Society
[Sarroff and Casey 2014] Sarroff A M and Casey M A (2014)Musical audio synthesis using autoencoding neural netsIn ICMC
[Serra et al 1997] Serra X et al (1997)Musical sound modeling with sinusoids plus noiseMusical signal processing pages 91ndash122
3436
References IV
[Slawson 1981] Slawson W (1981)The color of sound a theoretical study in musical timbreMusic Theory Spectrum 3132ndash141
[Sohn et al 2015] Sohn K Lee H and Yan X (2015)Learning structured output representation using deep conditional generative models
In Advances in neural information processing systems pages 3483ndash3491
[Wyse 2018] Wyse L (2018)Real-valued parametric conditioning of an rnn for interactive sound synthesisarXiv preprint arXiv180510808
3536
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model Reconstruction(trained on MIDI63)
3 Spectral Model Reconstruction(not trained on MIDI63)
4 Input MIDI 60 note to Parametric Model
5 Parametric Reconstruction of input note
6 Input MIDI 63 Note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note(endpoint trained model)
10 Parametric AE reconstruction of input(endpoint trained model)
11 Parametric CVAE reconstruction of input(endpoint trained model)
12 CVAE Generated MIDI 65 Violin note
13 Similar MIDI 65 Violin note from dataset
3636
Audio examples description II
14 CVAE Reconstruction of the MIDI 65 violin note
15 CVAE Generated MIDI 65 Violin note with vibrato
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
2936
Putting it all together
I We explored autoencoder frameworks in generative models foraudio synthesis of instrumental tones
I Our parametric representation decouples lsquotimbrersquo and lsquopitchrsquothus relying on the network to model the inter-dependencies
I Pitch conditioning allows to generate the learnt spectralenvelope for that pitch thus enabling us to vary the pitchcontour continuously
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
References I
[Blaauw and Bonada, 2016] Blaauw, M. and Bonada, J. (2016). Modeling and transforming speech using variational autoencoders. In Interspeech, pages 1770–1774.
[Caetano and Rodet, 2012] Caetano, M. and Rodet, X. (2012). A source-filter model for musical instrument sound transformation. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 137–140. IEEE.
[Doersch, 2016] Doersch, C. (2016). Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908.
[Engel et al., 2017] Engel, J., Resnick, C., Roberts, A., Dieleman, S., Norouzi, M., Eck, D., and Simonyan, K. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 1068–1077. JMLR.org.
[Esling et al., 2018] Esling, P., Bitton, A., et al. (2018). Generative timbre spaces: regularizing variational auto-encoders with perceptual metrics. arXiv preprint arXiv:1805.08501.
References II
[Griffin and Lim, 1984] Griffin, D. and Lim, J. (1984). Signal estimation from modified short-time Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2):236–243.
[Hinton and Salakhutdinov, 2006] Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507.
[IMAI, 1979] Imai, S. (1979). Spectral envelope extraction by improved cepstrum. IEICE, 62:217–228.
[Kingma and Ba, 2014] Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[Kingma and Welling, 2013] Kingma, D. P. and Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
[Oord et al., 2016] Oord, A. v. d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., and Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.
References III
[Roche et al., 2018] Roche, F., Hueber, T., Limier, S., and Girin, L. (2018). Autoencoders for music sound modeling: a comparison of linear, shallow, deep, recurrent and variational models. arXiv preprint arXiv:1806.04096.
[Roebel and Rodet, 2005] Roebel, A. and Rodet, X. (2005). Efficient spectral envelope estimation and its application to pitch shifting and envelope preservation. In International Conference on Digital Audio Effects, pages 30–35, Madrid, Spain.
[Romani Picas et al., 2015] Romani Picas, O., Parra Rodriguez, H., Dabiri, D., Tokuda, H., Hariya, W., Oishi, K., and Serra, X. (2015). A real-time system for measuring sound goodness in instrumental sounds. In Audio Engineering Society Convention 138. Audio Engineering Society.
[Sarroff and Casey, 2014] Sarroff, A. M. and Casey, M. A. (2014). Musical audio synthesis using autoencoding neural nets. In ICMC.
[Serra et al., 1997] Serra, X. et al. (1997). Musical sound modeling with sinusoids plus noise. Musical Signal Processing, pages 91–122.
References IV
[Slawson, 1981] Slawson, W. (1981). The color of sound: a theoretical study in musical timbre. Music Theory Spectrum, 3:132–141.
[Sohn et al., 2015] Sohn, K., Lee, H., and Yan, X. (2015). Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems, pages 3483–3491.
[Wyse, 2018] Wyse, L. (2018). Real-valued parametric conditioning of an RNN for interactive sound synthesis. arXiv preprint arXiv:1805.10808.
Audio examples description I
1. Input MIDI 63 note to the Spectral Model
2. Spectral Model reconstruction (trained on MIDI 63)
3. Spectral Model reconstruction (not trained on MIDI 63)
4. Input MIDI 60 note to the Parametric Model
5. Parametric reconstruction of the input note
6. Input MIDI 63 note
7. Parametric AE reconstruction of the input
8. Parametric CVAE reconstruction of the input
9. Input MIDI 65 note (endpoint-trained model)
10. Parametric AE reconstruction of the input (endpoint-trained model)
11. Parametric CVAE reconstruction of the input (endpoint-trained model)
12. CVAE-generated MIDI 65 violin note
13. Similar MIDI 65 violin note from the dataset
Audio examples description II
14. CVAE reconstruction of the MIDI 65 violin note
15. CVAE-generated MIDI 65 violin note with vibrato
3336
References III
[Roche et al 2018] Roche F Hueber T Limier S and Girin L (2018)Autoencoders for music sound modeling a comparison of linear shallow deeprecurrent and variational modelsarXiv preprint arXiv180604096
[Roebel and Rodet 2005] Roebel A and Rodet X (2005)Efficient Spectral Envelope Estimation and its application to pitch shifting andenvelope preservationIn International Conference on Digital Audio Effects pages 30ndash35 Madrid Spaincote interne IRCAM Roebel05b
[Romani Picas et al 2015] Romani Picas O Parra Rodriguez H Dabiri DTokuda H Hariya W Oishi K and Serra X (2015)A real-time system for measuring sound goodness in instrumental soundsIn Audio Engineering Society Convention 138 Audio Engineering Society
[Sarroff and Casey 2014] Sarroff A M and Casey M A (2014)Musical audio synthesis using autoencoding neural netsIn ICMC
[Serra et al 1997] Serra X et al (1997)Musical sound modeling with sinusoids plus noiseMusical signal processing pages 91ndash122
3436
References IV
[Slawson 1981] Slawson W (1981)The color of sound a theoretical study in musical timbreMusic Theory Spectrum 3132ndash141
[Sohn et al 2015] Sohn K Lee H and Yan X (2015)Learning structured output representation using deep conditional generative models
In Advances in neural information processing systems pages 3483ndash3491
[Wyse 2018] Wyse L (2018)Real-valued parametric conditioning of an rnn for interactive sound synthesisarXiv preprint arXiv180510808
3536
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model Reconstruction(trained on MIDI63)
3 Spectral Model Reconstruction(not trained on MIDI63)
4 Input MIDI 60 note to Parametric Model
5 Parametric Reconstruction of input note
6 Input MIDI 63 Note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note(endpoint trained model)
10 Parametric AE reconstruction of input(endpoint trained model)
11 Parametric CVAE reconstruction of input(endpoint trained model)
12 CVAE Generated MIDI 65 Violin note
13 Similar MIDI 65 Violin note from dataset
3636
Audio examples description II
14 CVAE Reconstruction of the MIDI 65 violin note
15 CVAE Generated MIDI 65 Violin note with vibrato
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3136
References I
[Blaauw and Bonada 2016] Blaauw M and Bonada J (2016)Modeling and transforming speech using variational autoencodersIn Interspeech pages 1770ndash1774
[Caetano and Rodet 2012] Caetano M and Rodet X (2012)A source-filter model for musical instrument sound transformationIn 2012 IEEE International Conference on Acoustics Speech and Signal Processing(ICASSP) pages 137ndash140 IEEE
[Doersch 2016] Doersch C (2016)Tutorial on variational autoencodersarXiv preprint arXiv160605908
[Engel et al 2017] Engel J Resnick C Roberts A Dieleman S Norouzi MEck D and Simonyan K (2017)Neural audio synthesis of musical notes with wavenet autoencodersIn Proceedings of the 34th International Conference on Machine Learning-Volume70 pages 1068ndash1077 JMLR org
[Esling et al 2018] Esling P Bitton A et al (2018)Generative timbre spaces regularizing variational auto-encoders with perceptualmetricsarXiv preprint arXiv180508501
3236
References II
[Griffin and Lim 1984] Griffin D and Lim J (1984)Signal estimation from modified short-time fourier transformIEEE Transactions on Acoustics Speech and Signal Processing 32(2)236ndash243
[Hinton and Salakhutdinov 2006] Hinton G E and Salakhutdinov R R (2006)Reducing the dimensionality of data with neural networksscience 313(5786)504ndash507
[IMAI 1979] IMAI S (1979)Spectral envelope extraction by improved cepstrumIEICE 62217ndash228
[Kingma and Ba 2014] Kingma D P and Ba J (2014)Adam A method for stochastic optimizationarXiv preprint arXiv14126980
[Kingma and Welling 2013] Kingma D P and Welling M (2013)Auto-encoding variational bayesarXiv preprint arXiv13126114
[Oord et al 2016] Oord A v d Dieleman S Zen H Simonyan K Vinyals OGraves A Kalchbrenner N Senior A and Kavukcuoglu K (2016)Wavenet A generative model for raw audioarXiv preprint arXiv160903499
3336
References III
[Roche et al 2018] Roche F Hueber T Limier S and Girin L (2018)Autoencoders for music sound modeling a comparison of linear shallow deeprecurrent and variational modelsarXiv preprint arXiv180604096
[Roebel and Rodet 2005] Roebel A and Rodet X (2005)Efficient Spectral Envelope Estimation and its application to pitch shifting andenvelope preservationIn International Conference on Digital Audio Effects pages 30ndash35 Madrid Spaincote interne IRCAM Roebel05b
[Romani Picas et al 2015] Romani Picas O Parra Rodriguez H Dabiri DTokuda H Hariya W Oishi K and Serra X (2015)A real-time system for measuring sound goodness in instrumental soundsIn Audio Engineering Society Convention 138 Audio Engineering Society
[Sarroff and Casey 2014] Sarroff A M and Casey M A (2014)Musical audio synthesis using autoencoding neural netsIn ICMC
[Serra et al 1997] Serra X et al (1997)Musical sound modeling with sinusoids plus noiseMusical signal processing pages 91ndash122
3436
References IV
[Slawson 1981] Slawson W (1981)The color of sound a theoretical study in musical timbreMusic Theory Spectrum 3132ndash141
[Sohn et al 2015] Sohn K Lee H and Yan X (2015)Learning structured output representation using deep conditional generative models
In Advances in neural information processing systems pages 3483ndash3491
[Wyse 2018] Wyse L (2018)Real-valued parametric conditioning of an rnn for interactive sound synthesisarXiv preprint arXiv180510808
3536
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model Reconstruction(trained on MIDI63)
3 Spectral Model Reconstruction(not trained on MIDI63)
4 Input MIDI 60 note to Parametric Model
5 Parametric Reconstruction of input note
6 Input MIDI 63 Note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note(endpoint trained model)
10 Parametric AE reconstruction of input(endpoint trained model)
11 Parametric CVAE reconstruction of input(endpoint trained model)
12 CVAE Generated MIDI 65 Violin note
13 Similar MIDI 65 Violin note from dataset
3636
Audio examples description II
14 CVAE Reconstruction of the MIDI 65 violin note
15 CVAE Generated MIDI 65 Violin note with vibrato
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3136
References I
[Blaauw and Bonada 2016] Blaauw M and Bonada J (2016)Modeling and transforming speech using variational autoencodersIn Interspeech pages 1770ndash1774
[Caetano and Rodet 2012] Caetano M and Rodet X (2012)A source-filter model for musical instrument sound transformationIn 2012 IEEE International Conference on Acoustics Speech and Signal Processing(ICASSP) pages 137ndash140 IEEE
[Doersch 2016] Doersch C (2016)Tutorial on variational autoencodersarXiv preprint arXiv160605908
[Engel et al 2017] Engel J Resnick C Roberts A Dieleman S Norouzi MEck D and Simonyan K (2017)Neural audio synthesis of musical notes with wavenet autoencodersIn Proceedings of the 34th International Conference on Machine Learning-Volume70 pages 1068ndash1077 JMLR org
[Esling et al 2018] Esling P Bitton A et al (2018)Generative timbre spaces regularizing variational auto-encoders with perceptualmetricsarXiv preprint arXiv180508501
3236
References II
[Griffin and Lim 1984] Griffin D and Lim J (1984)Signal estimation from modified short-time fourier transformIEEE Transactions on Acoustics Speech and Signal Processing 32(2)236ndash243
[Hinton and Salakhutdinov 2006] Hinton G E and Salakhutdinov R R (2006)Reducing the dimensionality of data with neural networksscience 313(5786)504ndash507
[IMAI 1979] IMAI S (1979)Spectral envelope extraction by improved cepstrumIEICE 62217ndash228
[Kingma and Ba 2014] Kingma D P and Ba J (2014)Adam A method for stochastic optimizationarXiv preprint arXiv14126980
[Kingma and Welling 2013] Kingma D P and Welling M (2013)Auto-encoding variational bayesarXiv preprint arXiv13126114
[Oord et al 2016] Oord A v d Dieleman S Zen H Simonyan K Vinyals OGraves A Kalchbrenner N Senior A and Kavukcuoglu K (2016)Wavenet A generative model for raw audioarXiv preprint arXiv160903499
3336
References III
[Roche et al 2018] Roche F Hueber T Limier S and Girin L (2018)Autoencoders for music sound modeling a comparison of linear shallow deeprecurrent and variational modelsarXiv preprint arXiv180604096
[Roebel and Rodet 2005] Roebel A and Rodet X (2005)Efficient Spectral Envelope Estimation and its application to pitch shifting andenvelope preservationIn International Conference on Digital Audio Effects pages 30ndash35 Madrid Spaincote interne IRCAM Roebel05b
[Romani Picas et al 2015] Romani Picas O Parra Rodriguez H Dabiri DTokuda H Hariya W Oishi K and Serra X (2015)A real-time system for measuring sound goodness in instrumental soundsIn Audio Engineering Society Convention 138 Audio Engineering Society
[Sarroff and Casey 2014] Sarroff A M and Casey M A (2014)Musical audio synthesis using autoencoding neural netsIn ICMC
[Serra et al 1997] Serra X et al (1997)Musical sound modeling with sinusoids plus noiseMusical signal processing pages 91ndash122
3436
References IV
[Slawson 1981] Slawson W (1981)The color of sound a theoretical study in musical timbreMusic Theory Spectrum 3132ndash141
[Sohn et al 2015] Sohn K Lee H and Yan X (2015)Learning structured output representation using deep conditional generative models
In Advances in neural information processing systems pages 3483ndash3491
[Wyse 2018] Wyse L (2018)Real-valued parametric conditioning of an rnn for interactive sound synthesisarXiv preprint arXiv180510808
3536
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model Reconstruction(trained on MIDI63)
3 Spectral Model Reconstruction(not trained on MIDI63)
4 Input MIDI 60 note to Parametric Model
5 Parametric Reconstruction of input note
6 Input MIDI 63 Note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note(endpoint trained model)
10 Parametric AE reconstruction of input(endpoint trained model)
11 Parametric CVAE reconstruction of input(endpoint trained model)
12 CVAE Generated MIDI 65 Violin note
13 Similar MIDI 65 Violin note from dataset
3636
Audio examples description II
14 CVAE Reconstruction of the MIDI 65 violin note
15 CVAE Generated MIDI 65 Violin note with vibrato
3036
Putting it all togetherI Our model is still far from perfect however and we have to
improve it further
1 An immediate thing to do will be to model the residual of theinput audio This will help in making the generated audio soundmore natural
2 We have not taken into account dynamics A more completesystem will involve capturing spectral envelope(timbral)dependencies on both pitch and loudness dynamics
3 Conducting more formal listening tests involving synthesis oflarger pitch movements or melodic elements from IndianClassical music
I Our contributions
1 To the best of our knowledge we have not come across anywork using a parametric model for musical tones in the neuralsynthesis framework especially exploiting the conditioningfunction of the CVAE
2 The TAE method has no known PYTHON implementation sowe plan to make our code open-source to the MIR communityfor research
3136
References I
[Blaauw and Bonada 2016] Blaauw M and Bonada J (2016)Modeling and transforming speech using variational autoencodersIn Interspeech pages 1770ndash1774
[Caetano and Rodet 2012] Caetano M and Rodet X (2012)A source-filter model for musical instrument sound transformationIn 2012 IEEE International Conference on Acoustics Speech and Signal Processing(ICASSP) pages 137ndash140 IEEE
[Doersch 2016] Doersch C (2016)Tutorial on variational autoencodersarXiv preprint arXiv160605908
[Engel et al 2017] Engel J Resnick C Roberts A Dieleman S Norouzi MEck D and Simonyan K (2017)Neural audio synthesis of musical notes with wavenet autoencodersIn Proceedings of the 34th International Conference on Machine Learning-Volume70 pages 1068ndash1077 JMLR org
[Esling et al 2018] Esling P Bitton A et al (2018)Generative timbre spaces regularizing variational auto-encoders with perceptualmetricsarXiv preprint arXiv180508501
3236
References II
[Griffin and Lim 1984] Griffin D and Lim J (1984)Signal estimation from modified short-time fourier transformIEEE Transactions on Acoustics Speech and Signal Processing 32(2)236ndash243
[Hinton and Salakhutdinov 2006] Hinton G E and Salakhutdinov R R (2006)Reducing the dimensionality of data with neural networksscience 313(5786)504ndash507
[IMAI 1979] IMAI S (1979)Spectral envelope extraction by improved cepstrumIEICE 62217ndash228
[Kingma and Ba 2014] Kingma D P and Ba J (2014)Adam A method for stochastic optimizationarXiv preprint arXiv14126980
[Kingma and Welling 2013] Kingma D P and Welling M (2013)Auto-encoding variational bayesarXiv preprint arXiv13126114
[Oord et al 2016] Oord A v d Dieleman S Zen H Simonyan K Vinyals OGraves A Kalchbrenner N Senior A and Kavukcuoglu K (2016)Wavenet A generative model for raw audioarXiv preprint arXiv160903499
3336
References III
[Roche et al 2018] Roche F Hueber T Limier S and Girin L (2018)Autoencoders for music sound modeling a comparison of linear shallow deeprecurrent and variational modelsarXiv preprint arXiv180604096
[Roebel and Rodet 2005] Roebel A and Rodet X (2005)Efficient Spectral Envelope Estimation and its application to pitch shifting andenvelope preservationIn International Conference on Digital Audio Effects pages 30ndash35 Madrid Spaincote interne IRCAM Roebel05b
[Romani Picas et al 2015] Romani Picas O Parra Rodriguez H Dabiri DTokuda H Hariya W Oishi K and Serra X (2015)A real-time system for measuring sound goodness in instrumental soundsIn Audio Engineering Society Convention 138 Audio Engineering Society
[Sarroff and Casey 2014] Sarroff A M and Casey M A (2014)Musical audio synthesis using autoencoding neural netsIn ICMC
[Serra et al 1997] Serra X et al (1997)Musical sound modeling with sinusoids plus noiseMusical signal processing pages 91ndash122
3436
References IV
[Slawson 1981] Slawson W (1981)The color of sound a theoretical study in musical timbreMusic Theory Spectrum 3132ndash141
[Sohn et al 2015] Sohn K Lee H and Yan X (2015)Learning structured output representation using deep conditional generative models
In Advances in neural information processing systems pages 3483ndash3491
[Wyse 2018] Wyse L (2018)Real-valued parametric conditioning of an rnn for interactive sound synthesisarXiv preprint arXiv180510808
3536
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model Reconstruction(trained on MIDI63)
3 Spectral Model Reconstruction(not trained on MIDI63)
4 Input MIDI 60 note to Parametric Model
5 Parametric Reconstruction of input note
6 Input MIDI 63 Note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note(endpoint trained model)
10 Parametric AE reconstruction of input(endpoint trained model)
11 Parametric CVAE reconstruction of input(endpoint trained model)
12 CVAE Generated MIDI 65 Violin note
13 Similar MIDI 65 Violin note from dataset
3636
Audio examples description II
14 CVAE Reconstruction of the MIDI 65 violin note
15 CVAE Generated MIDI 65 Violin note with vibrato
3136
References I
[Blaauw and Bonada 2016] Blaauw M and Bonada J (2016)Modeling and transforming speech using variational autoencodersIn Interspeech pages 1770ndash1774
[Caetano and Rodet 2012] Caetano M and Rodet X (2012)A source-filter model for musical instrument sound transformationIn 2012 IEEE International Conference on Acoustics Speech and Signal Processing(ICASSP) pages 137ndash140 IEEE
[Doersch 2016] Doersch C (2016)Tutorial on variational autoencodersarXiv preprint arXiv160605908
[Engel et al 2017] Engel J Resnick C Roberts A Dieleman S Norouzi MEck D and Simonyan K (2017)Neural audio synthesis of musical notes with wavenet autoencodersIn Proceedings of the 34th International Conference on Machine Learning-Volume70 pages 1068ndash1077 JMLR org
[Esling et al 2018] Esling P Bitton A et al (2018)Generative timbre spaces regularizing variational auto-encoders with perceptualmetricsarXiv preprint arXiv180508501
3236
References II
[Griffin and Lim 1984] Griffin D and Lim J (1984)Signal estimation from modified short-time fourier transformIEEE Transactions on Acoustics Speech and Signal Processing 32(2)236ndash243
[Hinton and Salakhutdinov 2006] Hinton G E and Salakhutdinov R R (2006)Reducing the dimensionality of data with neural networksscience 313(5786)504ndash507
[IMAI 1979] IMAI S (1979)Spectral envelope extraction by improved cepstrumIEICE 62217ndash228
[Kingma and Ba 2014] Kingma D P and Ba J (2014)Adam A method for stochastic optimizationarXiv preprint arXiv14126980
[Kingma and Welling 2013] Kingma D P and Welling M (2013)Auto-encoding variational bayesarXiv preprint arXiv13126114
[Oord et al 2016] Oord A v d Dieleman S Zen H Simonyan K Vinyals OGraves A Kalchbrenner N Senior A and Kavukcuoglu K (2016)Wavenet A generative model for raw audioarXiv preprint arXiv160903499
3336
References III
[Roche et al 2018] Roche F Hueber T Limier S and Girin L (2018)Autoencoders for music sound modeling a comparison of linear shallow deeprecurrent and variational modelsarXiv preprint arXiv180604096
[Roebel and Rodet 2005] Roebel A and Rodet X (2005)Efficient Spectral Envelope Estimation and its application to pitch shifting andenvelope preservationIn International Conference on Digital Audio Effects pages 30ndash35 Madrid Spaincote interne IRCAM Roebel05b
[Romani Picas et al 2015] Romani Picas O Parra Rodriguez H Dabiri DTokuda H Hariya W Oishi K and Serra X (2015)A real-time system for measuring sound goodness in instrumental soundsIn Audio Engineering Society Convention 138 Audio Engineering Society
[Sarroff and Casey 2014] Sarroff A M and Casey M A (2014)Musical audio synthesis using autoencoding neural netsIn ICMC
[Serra et al 1997] Serra X et al (1997)Musical sound modeling with sinusoids plus noiseMusical signal processing pages 91ndash122
3436
References IV
[Slawson 1981] Slawson W (1981)The color of sound a theoretical study in musical timbreMusic Theory Spectrum 3132ndash141
[Sohn et al 2015] Sohn K Lee H and Yan X (2015)Learning structured output representation using deep conditional generative models
In Advances in neural information processing systems pages 3483ndash3491
[Wyse 2018] Wyse L (2018)Real-valued parametric conditioning of an rnn for interactive sound synthesisarXiv preprint arXiv180510808
3536
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model Reconstruction(trained on MIDI63)
3 Spectral Model Reconstruction(not trained on MIDI63)
4 Input MIDI 60 note to Parametric Model
5 Parametric Reconstruction of input note
6 Input MIDI 63 Note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note(endpoint trained model)
10 Parametric AE reconstruction of input(endpoint trained model)
11 Parametric CVAE reconstruction of input(endpoint trained model)
12 CVAE Generated MIDI 65 Violin note
13 Similar MIDI 65 Violin note from dataset
3636
Audio examples description II
14 CVAE Reconstruction of the MIDI 65 violin note
15 CVAE Generated MIDI 65 Violin note with vibrato
3236
References II
[Griffin and Lim 1984] Griffin D and Lim J (1984)Signal estimation from modified short-time fourier transformIEEE Transactions on Acoustics Speech and Signal Processing 32(2)236ndash243
[Hinton and Salakhutdinov 2006] Hinton G E and Salakhutdinov R R (2006)Reducing the dimensionality of data with neural networksscience 313(5786)504ndash507
[IMAI 1979] IMAI S (1979)Spectral envelope extraction by improved cepstrumIEICE 62217ndash228
[Kingma and Ba 2014] Kingma D P and Ba J (2014)Adam A method for stochastic optimizationarXiv preprint arXiv14126980
[Kingma and Welling 2013] Kingma D P and Welling M (2013)Auto-encoding variational bayesarXiv preprint arXiv13126114
[Oord et al 2016] Oord A v d Dieleman S Zen H Simonyan K Vinyals OGraves A Kalchbrenner N Senior A and Kavukcuoglu K (2016)Wavenet A generative model for raw audioarXiv preprint arXiv160903499
3336
References III
[Roche et al 2018] Roche F Hueber T Limier S and Girin L (2018)Autoencoders for music sound modeling a comparison of linear shallow deeprecurrent and variational modelsarXiv preprint arXiv180604096
[Roebel and Rodet 2005] Roebel A and Rodet X (2005)Efficient Spectral Envelope Estimation and its application to pitch shifting andenvelope preservationIn International Conference on Digital Audio Effects pages 30ndash35 Madrid Spaincote interne IRCAM Roebel05b
[Romani Picas et al 2015] Romani Picas O Parra Rodriguez H Dabiri DTokuda H Hariya W Oishi K and Serra X (2015)A real-time system for measuring sound goodness in instrumental soundsIn Audio Engineering Society Convention 138 Audio Engineering Society
[Sarroff and Casey 2014] Sarroff A M and Casey M A (2014)Musical audio synthesis using autoencoding neural netsIn ICMC
[Serra et al 1997] Serra X et al (1997)Musical sound modeling with sinusoids plus noiseMusical signal processing pages 91ndash122
3436
References IV
[Slawson 1981] Slawson W (1981)The color of sound a theoretical study in musical timbreMusic Theory Spectrum 3132ndash141
[Sohn et al 2015] Sohn K Lee H and Yan X (2015)Learning structured output representation using deep conditional generative models
In Advances in neural information processing systems pages 3483ndash3491
[Wyse 2018] Wyse L (2018)Real-valued parametric conditioning of an rnn for interactive sound synthesisarXiv preprint arXiv180510808
3536
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model Reconstruction(trained on MIDI63)
3 Spectral Model Reconstruction(not trained on MIDI63)
4 Input MIDI 60 note to Parametric Model
5 Parametric Reconstruction of input note
6 Input MIDI 63 Note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note(endpoint trained model)
10 Parametric AE reconstruction of input(endpoint trained model)
11 Parametric CVAE reconstruction of input(endpoint trained model)
12 CVAE Generated MIDI 65 Violin note
13 Similar MIDI 65 Violin note from dataset
3636
Audio examples description II
14 CVAE Reconstruction of the MIDI 65 violin note
15 CVAE Generated MIDI 65 Violin note with vibrato
3336
References III
[Roche et al 2018] Roche F Hueber T Limier S and Girin L (2018)Autoencoders for music sound modeling a comparison of linear shallow deeprecurrent and variational modelsarXiv preprint arXiv180604096
[Roebel and Rodet 2005] Roebel A and Rodet X (2005)Efficient Spectral Envelope Estimation and its application to pitch shifting andenvelope preservationIn International Conference on Digital Audio Effects pages 30ndash35 Madrid Spaincote interne IRCAM Roebel05b
[Romani Picas et al 2015] Romani Picas O Parra Rodriguez H Dabiri DTokuda H Hariya W Oishi K and Serra X (2015)A real-time system for measuring sound goodness in instrumental soundsIn Audio Engineering Society Convention 138 Audio Engineering Society
[Sarroff and Casey 2014] Sarroff A M and Casey M A (2014)Musical audio synthesis using autoencoding neural netsIn ICMC
[Serra et al 1997] Serra X et al (1997)Musical sound modeling with sinusoids plus noiseMusical signal processing pages 91ndash122
3436
References IV
[Slawson 1981] Slawson W (1981)The color of sound a theoretical study in musical timbreMusic Theory Spectrum 3132ndash141
[Sohn et al 2015] Sohn K Lee H and Yan X (2015)Learning structured output representation using deep conditional generative models
In Advances in neural information processing systems pages 3483ndash3491
[Wyse 2018] Wyse L (2018)Real-valued parametric conditioning of an rnn for interactive sound synthesisarXiv preprint arXiv180510808
3536
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model Reconstruction(trained on MIDI63)
3 Spectral Model Reconstruction(not trained on MIDI63)
4 Input MIDI 60 note to Parametric Model
5 Parametric Reconstruction of input note
6 Input MIDI 63 Note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note(endpoint trained model)
10 Parametric AE reconstruction of input(endpoint trained model)
11 Parametric CVAE reconstruction of input(endpoint trained model)
12 CVAE Generated MIDI 65 Violin note
13 Similar MIDI 65 Violin note from dataset
3636
Audio examples description II
14 CVAE Reconstruction of the MIDI 65 violin note
15 CVAE Generated MIDI 65 Violin note with vibrato
3436
References IV
[Slawson 1981] Slawson W (1981)The color of sound a theoretical study in musical timbreMusic Theory Spectrum 3132ndash141
[Sohn et al 2015] Sohn K Lee H and Yan X (2015)Learning structured output representation using deep conditional generative models
In Advances in neural information processing systems pages 3483ndash3491
[Wyse 2018] Wyse L (2018)Real-valued parametric conditioning of an rnn for interactive sound synthesisarXiv preprint arXiv180510808
3536
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model Reconstruction(trained on MIDI63)
3 Spectral Model Reconstruction(not trained on MIDI63)
4 Input MIDI 60 note to Parametric Model
5 Parametric Reconstruction of input note
6 Input MIDI 63 Note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note(endpoint trained model)
10 Parametric AE reconstruction of input(endpoint trained model)
11 Parametric CVAE reconstruction of input(endpoint trained model)
12 CVAE Generated MIDI 65 Violin note
13 Similar MIDI 65 Violin note from dataset
3636
Audio examples description II
14 CVAE Reconstruction of the MIDI 65 violin note
15 CVAE Generated MIDI 65 Violin note with vibrato
3536
Audio examples description I
1 Input MIDI 63 to Spectral Model
2 Spectral Model Reconstruction(trained on MIDI63)
3 Spectral Model Reconstruction(not trained on MIDI63)
4 Input MIDI 60 note to Parametric Model
5 Parametric Reconstruction of input note
6 Input MIDI 63 Note
7 Parametric AE reconstruction of input
8 Parametric CVAE reconstruction of input
9 Input MIDI 65 note(endpoint trained model)
10 Parametric AE reconstruction of input(endpoint trained model)
11 Parametric CVAE reconstruction of input(endpoint trained model)
12 CVAE Generated MIDI 65 Violin note
13 Similar MIDI 65 Violin note from dataset
3636
Audio examples description II
14 CVAE Reconstruction of the MIDI 65 violin note
15 CVAE Generated MIDI 65 Violin note with vibrato
3636
Audio examples description II
14 CVAE Reconstruction of the MIDI 65 violin note
15 CVAE Generated MIDI 65 Violin note with vibrato