


Feedback Networks (CVPR 2017 Poster)

Top-1 accuracy (%) by virtual depth:

Model                                            VD=12   VD=24   VD=36   VD=48
Feedback                                         67.94   70.57   71.09   71.12
Feedback Disconnected (Recurrent Feedforward)    36.23   62.14   67.99   71.34
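(A schematic reading, inferred from the endpoint table and the depth/iteration study below: virtual depth is the effective unrolled depth of a feedback module, i.e. its physical depth times the number of iterations.)

    D_{\text{virtual}} = D_{\text{physical}} \times n_{\text{iterations}}, \qquad \text{e.g. } 12 \times 4 = 48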

[Figure: the feedback network unrolled over time (T/4, 2T/4, 3T/4, T). Over iterations the prediction moves down the taxonomy (object → vehicle → wheeled vehicle → bike → road bike, vs. siblings such as vessel, car, tandem bike), illustrating early prediction, taxonomic prediction, and curriculum learning with a separate objective function (objective func. 1-4) per iteration.]

Amir R. Zamir*, Te-Lin Wu*, Lin Sun, William B. Shen, Bertram E. Shi, Jitendra Malik, Silvio Savarese

● From Feedforward to Feedback
● A general-purpose feedback based learning model
● Can replace feedforward models as a blackbox
● Can be instantiated using existing recurrent models

● Core Advantages: (see below ↓)

Code, Results: http://feedbacknet.stanford.edu

[Figure: a feedforward model (Input → layer 1 → layer 2 → layer 3 → layer 4 → ... → layer N → output, loss) compared with the feedback model unfolded over iterations; both shown predicting "mountain bike".]

Feedback Networks

● Core Advantages: (see below ↓)
● Early predictions at query time
● Taxonomy compliance in output
● New basis for Curriculum Learning

● Internal Representation: coarse-to-fine. Meaningfully and notably different from feedforward representation.

● Feedback Model Details
● Feedback definition: when a system’s output is routed back into its input as part of an iterative cause-and-effect process [13].
● ⇒ Two requirements: (1) iterativeness, (2) rerouting a notion of the posterior (output) back into the system in each iteration.
● Feedback can be instantiated using existing RNNs (ConvLSTM [66]); a minimal sketch follows below.
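The sketch below is a minimal illustration of this idea, not the authors' exact architecture: a small ConvLSTM cell is applied repeatedly to the same input, and a classification head is read out at every iteration (matching the "backprop for output loss at each time step" connection in the module figure). The names ConvLSTMCell, FeedbackModule, hidden_ch, and n_iterations are hypothetical.

    # Minimal sketch of a ConvLSTM-based feedback module (hypothetical names,
    # not the authors' exact implementation).
    import torch
    import torch.nn as nn

    class ConvLSTMCell(nn.Module):
        def __init__(self, in_ch, hidden_ch, k=3):
            super().__init__()
            # One conv produces all four gates (input, forget, output, candidate).
            self.gates = nn.Conv2d(in_ch + hidden_ch, 4 * hidden_ch, k, padding=k // 2)

        def forward(self, x, state):
            h, c = state
            i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
            c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
            h = torch.sigmoid(o) * torch.tanh(c)
            return h, c

    class FeedbackModule(nn.Module):
        """Applies the same ConvLSTM cell to the same input for several
        iterations, emitting a prediction at every iteration."""
        def __init__(self, in_ch, hidden_ch, n_classes, n_iterations=4):
            super().__init__()
            self.cell = ConvLSTMCell(in_ch, hidden_ch)
            self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                      nn.Linear(hidden_ch, n_classes))
            self.n_iterations = n_iterations
            self.hidden_ch = hidden_ch

        def forward(self, x):
            b, _, hgt, wid = x.shape
            h = x.new_zeros(b, self.hidden_ch, hgt, wid)
            c = x.new_zeros(b, self.hidden_ch, hgt, wid)
            outputs = []
            for _ in range(self.n_iterations):
                h, c = self.cell(x, (h, c))   # hidden state carries the feedback
                outputs.append(self.head(h))  # one prediction per iteration
            return outputs                    # outputs[0] is the earliest prediction

At query time, outputs[0], outputs[1], ... are available before the loop finishes, which is what enables the early (coarser) predictions shown in the timeline figure above.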

● Feedback Modules and Their Lengths

● Episodic Curriculum Learning
● Enabled by feedback (unlike feedforward).
● Any hierarchical output space or taxonomy can be used as a curriculum strategy.
● We use an annealed loss function at each iteration (sketched below).
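As a non-authoritative sketch of what such an annealed, taxonomy-driven objective could look like (the paper's exact loss and schedule may differ), assuming the hypothetical FeedbackModule above and a hypothetical coarse_of lookup (a LongTensor mapping each fine class to its coarse parent in the taxonomy):

    # Sketch of an episodic curriculum loss (hypothetical; schedule is illustrative).
    # Early iterations emphasize coarse (taxonomy-level) targets, later iterations
    # the fine targets, by annealing the mixing weight gamma toward 1.
    import torch
    import torch.nn.functional as F

    def coarse_log_probs(fine_logits, coarse_of, n_coarse):
        # Sum fine-class probabilities into their coarse parents, return log-probs.
        probs = F.softmax(fine_logits, dim=1)
        coarse = probs.new_zeros(probs.size(0), n_coarse)
        coarse.index_add_(1, coarse_of, probs)   # coarse[b, coarse_of[j]] += probs[b, j]
        return coarse.clamp_min(1e-12).log()

    def curriculum_loss(outputs, fine_target, coarse_of, n_coarse):
        # outputs: list of per-iteration logits from the feedback module.
        coarse_target = coarse_of[fine_target]   # coarse parent of each sample
        T = len(outputs)
        total = 0.0
        for t, logits in enumerate(outputs):
            gamma = (t + 1) / T                  # anneal toward the fine objective
            total = total + (1.0 - gamma) * F.nll_loss(
                        coarse_log_probs(logits, coarse_of, n_coarse), coarse_target) \
                          + gamma * F.cross_entropy(logits, fine_target)
        return total / T

Each iteration t thus gets its own objective (objective func. 1-4 in the timeline figure above), moving from coarse, taxonomy-level supervision toward the fine labels.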

● The Internal Representation:

[Figure: feedback module and its unrolling. Left: the feedforward stack used as a reference (Image/Pre-Conv → repeated Conv-BatchNorm-ReLU-Conv-BatchNorm blocks → Pooling → FC → Loss). Right: the feedback instantiation, where a block is wrapped in a ConvLSTM and unrolled over time (L, L1, L2, L3, ..., Lt-2, Lt-1, Lt); the feedback connection backpropagates an output loss at each time step.]

● Feedback’s representation is notably dissimilar to feedforward’s.

[Figure: activation maps for a "Snake" query, layers 1 through 12, comparing the Feedback network with the (Recurrent) Feedforward one.]

● Feedback develops a coarse-to-fine representation, unlike the low-abstraction to high-abstraction progression of feedforward.

[Figure: t-SNE of the representation over time, Feedback vs. Feedforward.]

● Quantitative Results

[Figure: predicted class per query at increasing depth. Columns: Feedback at virtual depths VD=32, 24, 16, 8 and Feedforward (ResNet) at physical depths D=32, 24, 16, 8; queries include Rabbit, Chimp, Mouse, Clock, Snake, Rocket, and Fox.]

Early Prediction (CIFAR100) and Taxonomy Compliance (details in the main paper):

[Charts: left (Early Prediction): Top-1 Accuracy (%) vs. Physical/Virtual Depth (8-32); right (Taxonomy Based Prediction): E(N) (%) vs. Physical/Virtual Depth. Curves: FB Net curriculum trained, FB Net, FF (ResNet w/ and w/o aux loss), FF (VGG w/ and w/o aux loss).]

● Feedback considerably outperforms existing methods in early prediction and taxonomic prediction.

[Table: Endpoint Results (Stanford Cars)]

Endpoint Results (CIFAR100):

Model                      Physical Depth   Virtual Depth   Top1 (%)   Top5 (%)
Feedback Net               12               48              71.12      91.51
                           8                32              69.57      91.01
                           4                16              67.83      90.12
Feedforward (ResNet[19])   48               -               70.04      90.96
                           32               -               69.36      91.07
                           12               -               66.35      90.02
                           8                -               64.23      88.95
                           128*             -               70.92      91.28
                           110*             -               72.06      92.12
                           64*              -               71.01      91.48
                           48*              -               70.56      91.60
                           32*              -               69.58      91.55
Feedforward (VGG[47])      48               -               55.08      82.10
                           32               -               63.56      88.41
                           12               -               64.65      89.26
                           8                -               63.91      88.90

Stack-1: 66.29%   Stack-2: 67.83%   Stack-All: 65.85%

Depth and Iteration Combination Study, Top1 Accuracy (%) by (Iteration, Physical Depth):

(Iteration, Physical Depth)   (2,24)   (3,16)   (4,12)   (6,8)    (8,6)    (16,3)
Top1 Accuracy (%)             69.49    68.24    71.12    68.95    69.69    67.54

Curriculum Learning (details in the main paper):

[Charts: Curriculum Learning results on CIFAR100 and the Stanford Cars Dataset.]

● Feedback best leverages curriculum learning.

Computation Graphs:

[Figure: computation graphs. Feed-forward: a single chain X1 → X2 → X3 → ... → XD. Feedback: a two-dimensional grid of states indexed by layer and iteration (X_1^1, X_2^1, X_1^2, X_1^3, X_2^2, X_3^1, ..., X_n^m).]
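Schematically (a generic recurrence written only for illustration; the paper's actual update is the ConvLSTM one sketched above), with f_i the i-th layer of an m-layer feedback module and x_i^t its state at iteration t:

    \text{Feed-forward:}\quad x_i = f_i(x_{i-1}), \qquad i = 1, \dots, D
    \text{Feedback, unrolled:}\quad x_i^{t} = f_i\!\big(x_{i-1}^{t},\, x_i^{t-1}\big), \qquad \hat{y}^{\,t} = g\big(x_m^{t}\big)

so the effective (virtual) depth grows with the number of iterations t while the physical depth m stays fixed.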
