


Self-supervised Learning for Video Correspondence Flow

Zihang Lai, Weidi Xie
VGG, University of Oxford

● 1. Color dropout as augmentation (a sketch follows this list)

● 2. Cycle-consistency + Scheduled sampling

● 3. Restricted attention for higher resolution

Consequences: longer tracks, reduced drift
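For concreteness, here is a minimal sketch of what the color dropout in item 1 could look like, assuming channels are zeroed at random with a fixed probability (the exact scheme is not specified on the poster; the training figure further down shows the R channel only):

```python
import torch

def color_dropout(frames, p=0.5):
    """Randomly keep a single color channel for the whole clip.

    Sketch under assumptions: probability `p` and the keep-one-channel
    rule are illustrative, not the authors' exact recipe. Dropping color
    prevents the model from matching pixels by trivial color cues.
    frames: (B, T, 3, H, W) RGB video clip.
    """
    if torch.rand(()) < p:
        keep = torch.randint(0, 3, ())          # e.g. 0 = R channel only
        mask = torch.zeros(3)
        mask[keep] = 1.0
        # Same channel is kept for every frame in the clip.
        frames = frames * mask.view(1, 1, 3, 1, 1)
    return frames
```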

> Oral session (Video Analysis): 13:00 - 13:15, Thursday 12th
> Paper, code, and pretrained model available for download.


Introduction

The objective of this paper is self-supervised learning of matching correspondences along videos, which we term correspondence flow. Learning only from unlabeled videos, we propose to train a “pointer” that reconstructs a target frame by copying pixels from a reference frame.

What to do with correspondence?

Our correspondences can be used to propagate many entities (e.g. segmentation masks, keypoints) along a video sequence, as in the sketch below.

[Figure: qualitative results on DAVIS and JHMDB]
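As an illustration, propagation can be implemented by soft-copying reference-frame labels through the learned feature matching. The function below is a hypothetical sketch; its name and the softmax temperature are assumptions, not the authors' exact code:

```python
import torch
import torch.nn.functional as F

def propagate_labels(feat_ref, feat_tgt, labels_ref, temperature=0.07):
    """Propagate per-pixel labels from a reference frame to a target
    frame via feature-space matching (sketch; `temperature` assumed).

    feat_ref, feat_tgt: (C, H, W) L2-normalized feature maps.
    labels_ref: (K, H, W) one-hot masks or keypoint heatmaps.
    """
    C, H, W = feat_ref.shape
    f_ref = feat_ref.reshape(C, -1)                 # (C, HW)
    f_tgt = feat_tgt.reshape(C, -1)                 # (C, HW)
    affinity = f_ref.t() @ f_tgt                    # (HW_ref, HW_tgt)
    # Each target pixel gets a distribution over reference pixels.
    weights = F.softmax(affinity / temperature, dim=0)
    labels = labels_ref.reshape(-1, H * W) @ weights  # (K, HW_tgt)
    return labels.reshape(-1, H, W)
```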

[Figure: Training vs. testing setup. Training: frame t (R channel only) → model → frame t+1 (RGB). Testing: frame t (RGB) → model → frame t+1 (RGB).]

What to learn?

A feature extractor that produces embeddings suitable for matching correspondences.

Objective: learning pixel correspondence in videos without annotations!

[Figure: frame t and frame t+1 → feature extractor → feature t and feature t+1 → matching]

How to learn?

By reconstruction: the pointer reconstructs the target frame by copying pixels from the reference frame, and the three key ideas listed above yield longer tracks with reduced drift. A sketch of the cycle-consistency and scheduled-sampling part follows.
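In this sketch, `track(content, src, dst)` is a hypothetical helper that copies `content` from frame `src` to frame `dst` via the learned affinity; the cycle structure and the sampling probability are illustrative assumptions:

```python
import random
import torch.nn.functional as F

def scheduled_cycle_loss(track, frames, p_model=0.5):
    """Cycle-consistency with scheduled sampling (sketch).

    Track forward through the clip, then back to the start; the round
    trip should reproduce the first frame. Scheduled sampling: with
    probability `p_model` (annealed upward during training) keep
    tracking the model's own reconstruction, otherwise restart from
    the ground-truth frame, so the model learns to recover from drift.
    frames: sequence of (3, H, W) frames.
    """
    recon = frames[0]
    for t in range(1, len(frames)):                  # forward in time
        src = recon if random.random() < p_model else frames[t - 1]
        recon = track(src, frames[t - 1], frames[t])
    for t in range(len(frames) - 2, -1, -1):         # backward cycle
        recon = track(recon, frames[t + 1], frames[t])
    return F.l1_loss(recon, frames[0])
```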

Results

We outperform existing self-supervised learning approaches by a significant margin.

Video segmentation (DAVIS-2017):

| Method | Supervised | J&F (Mean) |
|---|---|---|
| Optical Flow | ✗ | 26.0 |
| Vondrick et al. | ✗ | 34.0 |
| CycleTime (Wang et al.) | ✗ | 40.7 |
| Ours | ✗ | 49.5 |
| SiamMask | ✓ | 53.1 |
| OSVOS | ✓ | 60.3 |


Keypoint tracking (JHMDB):

| Method | Supervised | PCK@0.1 |
|---|---|---|
| Optical Flow | ✗ | 49.0 |
| Vondrick et al. | ✗ | 45.2 |
| Wang et al. | ✗ | 57.7 |
| Ours | ✗ | 58.5 |
| ImageNet | ✓ | 58.4 |

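For reference, PCK@0.1 in the keypoint table is the fraction of predicted keypoints that land within 0.1 of a normalizing scale from the ground truth. A sketch; normalization conventions vary (person size vs. image size), so the scale is left to the caller:

```python
import numpy as np

def pck(pred, gt, scale, alpha=0.1):
    """PCK@alpha: fraction of predicted keypoints within
    alpha * scale of the ground truth.

    pred, gt: (N, 2) arrays of (x, y) keypoint coordinates.
    """
    dist = np.linalg.norm(pred - gt, axis=1)
    return float((dist <= alpha * scale).mean())
```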

Find more...

[Figure: restricted attention — each pixel in the target frame is matched only within a search region of the reference frame; the training input uses a single color channel.]
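A sketch of the restricted attention in key idea 3, assuming a square search window of radius `radius` (an illustrative hyper-parameter): instead of a full HW × HW affinity, each target pixel attends only to a local region of the reference frame, so memory grows linearly with resolution:

```python
import torch
import torch.nn.functional as F

def restricted_affinity(feat_ref, feat_tgt, radius=6):
    """Each target pixel attends to a (2*radius+1)^2 search region in
    the reference frame rather than the whole frame (sketch).

    feat_*: (C, H, W); returns weights of shape (H*W, (2r+1)^2).
    """
    C, H, W = feat_ref.shape
    k = 2 * radius + 1
    # Unfold gathers, for every spatial position, the local kxk
    # neighbourhood of reference features: (C*k*k, H*W).
    patches = F.unfold(feat_ref.unsqueeze(0), k, padding=radius)[0]
    patches = patches.reshape(C, k * k, H * W)     # (C, k*k, HW)
    tgt = feat_tgt.reshape(C, 1, H * W)            # (C, 1, HW)
    scores = (patches * tgt).sum(dim=0)            # (k*k, HW)
    return F.softmax(scores, dim=0).t()            # (HW, k*k)
```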