Video Prediction via Example Guidance, ICML 2020 Poster. Presented by Yueyu Hu, STRUCT Paper Reading, 2020/12/14

Page 1: Video Prediction via Example Guidance

Video Prediction via Example Guidance

ICML 2020 Poster

Presented by Yueyu Hu

STRUCT Paper Reading

2020/12/14

Video Prediction

• Pioneer work, Moving MNIST
  • Unsupervised Learning of Video Representations using LSTMs, ICML'15

VAE Powered Methods

• Stochastic Video Generation with a Learned Prior, ICML'18

Problems

• Prior Gaussian distribution?
  • Insufficient to cover future possibilities
• Multi-modal motion pattern
  • E.g. Moving MNIST: up or down?
• Sampling efficiency
  • How many samples are required to achieve accurate prediction?

Existing solutions

• External information
  • Bounding boxes / Skeleton (pose)
  • Compositional Video Prediction, ICCV'19 (CMU/FAIR)

Insight & Claims

• “In contrast to these works above, we are motivated by one insight that prediction is based on similarity between the current situation and the past experiences.”

• Optimization, explicit distribution modeling
• GAN, plausible predicted samples
• Human skeleton topology, preserved
• Real-world model

Step 1: Disentangle

• Adopts existing methods:
  • Stochastic Video Generation with a Learned Prior, ICML'18
• Disentangle model used in this work
  • Unsupervised Keypoint Learning for Guiding Class-Conditional Video Prediction, NeurIPS'19
  • Pretrained models used as pose extractor in this work

Step 2: Retrieval

• Sequence X, motion feature F
• Search pool: the whole training set
• Nearest neighbor search, top K features
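
The retrieval step can be pictured with a minimal sketch. It assumes motion features are fixed-length vectors and that "nearest" means smallest Euclidean distance; the slide does not state the metric, and the name `retrieve_top_k` is hypothetical.

```python
import numpy as np

def retrieve_top_k(query, pool, k=5):
    """Return indices and features of the k pool entries closest to query.

    query: (d,) motion feature of the input sequence (assumed vector form).
    pool:  (n, d) motion features of the training sequences.
    """
    # Squared Euclidean distance from the query to every pool feature
    # (the distance metric is an assumption, not taken from the slides).
    dists = np.sum((pool - query) ** 2, axis=1)
    # Indices of the k smallest distances, nearest first.
    idx = np.argsort(dists)[:k]
    return idx, pool[idx]

# Toy usage with made-up 2-D features.
pool = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]])
query = np.array([1.0, 1.0])
idx, neighbours = retrieve_top_k(query, pool, k=2)
```

A brute-force scan like this is linear in the size of the training set; an approximate nearest neighbor index would be the usual substitute for a large pool.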

What does it find?

• They have common pasts
• They are non-Gaussian

Next: Prediction

• Existing approaches
  • z: latent representation
  • How to get z? Usually with a neural network, e.g. $z = f_\phi(z_{t-1})$
  • Prior: $q_z \sim \mathcal{N}(0, 1)$ (the problem)

Approach

• Replace $q_z \sim \mathcal{N}(0, 1)$ with a new one
• Get prior from samples
• Make the predicted distribution close to the prior by minimizing $D_{KL}(p \,\|\, q)$, with parameters $\mu_t, \sigma_t$ and $\hat{\mu}_t, \hat{\sigma}_t$
• Issues: lack of diversity; distribution of z infeasible to represent the samples
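
For reference, the KLD(p || q) term has a closed form when both distributions are Gaussian. The block below gives the standard scalar case; the subscripted names $\mu_p, \sigma_p, \mu_q, \sigma_q$ are assumed notation for this sketch and are not taken from the slides.

```latex
% Assumed notation: p = N(\mu_p, \sigma_p^2), q = N(\mu_q, \sigma_q^2), scalar case.
D_{KL}(p \,\|\, q)
  = \log\frac{\sigma_q}{\sigma_p}
  + \frac{\sigma_p^2 + (\mu_p - \mu_q)^2}{2\sigma_q^2}
  - \frac{1}{2}
```

For a diagonal Gaussian over a latent vector, the total is the sum of this expression over dimensions.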

Methods

• Calculated mean and variance
• Sample z from this dist.
• Predict multiple instances

Best prediction j
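
A minimal sketch of the first two bullets, assuming the retrieved features are summarised by a per-dimension (diagonal) Gaussian. The function names and the `eps` floor are hypothetical, and the decoder that turns each z into a predicted sequence is omitted.

```python
import numpy as np

def example_prior(neighbour_feats, eps=1e-6):
    """Per-dimension mean and variance of the K retrieved features."""
    mu = neighbour_feats.mean(axis=0)
    # eps keeps the variance strictly positive (an assumption of this sketch).
    var = neighbour_feats.var(axis=0) + eps
    return mu, var

def sample_latents(mu, var, n_samples, rng):
    """Draw n_samples latents z ~ N(mu, diag(var))."""
    noise = rng.standard_normal((n_samples, mu.shape[0]))
    return mu + np.sqrt(var) * noise

# Toy usage: three made-up retrieved features of dimension 2.
rng = np.random.default_rng(0)
feats = np.array([[0.0, 2.0], [2.0, 2.0], [4.0, 2.0]])
mu, var = example_prior(feats)
z = sample_latents(mu, var, n_samples=20, rng=rng)
```

Each row of `z` would then be decoded into one candidate future, which gives the multiple predicted instances.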

Experiment

• Datasets:
  • Moving MNIST
  • BAIR Robot Push
  • PennAction (SVG / Keypoint settings)

Moving MNIST

• Deterministic and Stochastic settings
  • D: Feed in motion information
  • S: Select best from 20 samples
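
The stochastic setting's "select best from 20 samples" can be sketched as below. Mean squared error is assumed as the selection criterion; the slides do not name the metric, and `best_of_n` is a hypothetical helper.

```python
import numpy as np

def best_of_n(samples, target):
    """Pick the sampled prediction closest to the ground truth.

    samples: (n, T, H, W) candidate predicted sequences.
    target:  (T, H, W) ground-truth future frames.
    """
    # Mean squared error of each candidate (metric assumed for this sketch).
    errors = np.mean((samples - target) ** 2, axis=(1, 2, 3))
    best = int(np.argmin(errors))
    return best, float(errors[best])

# Toy usage: three constant-valued candidates against an all-zero target.
target = np.zeros((2, 4, 4))
samples = np.stack([np.full((2, 4, 4), v) for v in (0.5, 0.1, 0.3)])
best, err = best_of_n(samples, target)
```

Because the best sample is chosen with access to the ground truth, this protocol measures whether the sampled set covers the true future rather than how good a single draw is.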

Robot Arm

• Better trajectory, saturating at 100 samples
• K saturates at 5

Penn Action

• Class label and first frame are fed as inputs
• Metrics: action recognition and Fréchet Video Distance
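
As a reminder of what the Fréchet Video Distance measures: it is the Fréchet distance between two Gaussians fitted to features of real and generated videos. The block below gives the standard closed form; the symbol names are assumed notation for this note.

```latex
% Assumed notation: (\mu_r, \Sigma_r) and (\mu_g, \Sigma_g) are the mean and
% covariance of features extracted from real and generated videos.
d^2 = \lVert \mu_r - \mu_g \rVert_2^2
    + \operatorname{Tr}\!\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right)
```

Lower values indicate that the generated videos are statistically closer to the real ones.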

Cross Class Action Prediction

Facilitated by the guidance of examples, our model produces a visually natural tennis serve sequence, which clearly demonstrates the generalization capability of proposed model. We argue that the majority of previous works are (implicitly) forced to memorize motion categories in the training set. In contrast to the paradigm, our work is relieved from such burden because the retrieved examples contain the category information in assistance of prediction. We thus focus only on intra-class diversity. If given examples with unseen motion categories, our model is still able to give reasonable predictions, thanks to the example guidance.

Conclusion

• Sampling methods might be a good idea
• Video prediction techniques are still too far away from being utilized in practical video coding

Thanks
