


The measurement rate of an event-based camera is on the order of a microsecond, its independent pixel architecture provides very high dynamic range, and the bandwidth of an event stream is much lower than that of a standard video stream. These properties give event-based cameras the potential to overcome some limitations of conventional cameras.

Facts: 1673 registered attendees, 415 accepted papers (26.6%), 342 posters (21.9%), 45 spotlights (2.9%), 28 orals (1.8%).

“Top-down Neural Attention by Excitation Backprop”,

by Jianming Zhang, Zhe Lin, Jonathan Brandt, Xiaohui Shen, and Stan Sclaroff

A new backpropagation scheme, Excitation Backprop, based on a probabilistic Winner-Take-All formulation, is proposed to model top-down neural attention for CNN classifiers. The authors also present a contrastive top-down attention, which captures the differential effect between a pair of contrastive top-down signals. This contrastive top-down attention can significantly improve the discriminativeness of the generated attention maps.
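The Winner-Take-All propagation can be illustrated for a single fully-connected layer: each output neuron passes its winning probability down to its inputs in proportion to their positive (excitatory) contributions. The sketch below is ours, not the authors' code; the function name and simplifications are our own.

```python
import numpy as np

def excitation_backprop_layer(p_out, W, a_in):
    """Propagate marginal winning probabilities through one FC layer
    (simplified sketch of the probabilistic Winner-Take-All model).

    p_out : (m,) marginal winning probabilities of the output neurons
    W     : (m, n) weight matrix (outputs x inputs)
    a_in  : (n,) input activations (assumed non-negative, e.g. post-ReLU)
    """
    W_pos = np.maximum(W, 0.0)            # only excitatory connections compete
    scores = W_pos * a_in[None, :]        # contribution of each input to each output
    Z = scores.sum(axis=1, keepdims=True)
    Z[Z == 0.0] = 1.0                     # guard against outputs with no excitation
    cond = scores / Z                     # conditional winning probabilities (rows sum to 1)
    return p_out @ cond                   # marginalize over the output neurons

# Toy usage: p is again a probability distribution over the input neurons.
p = excitation_backprop_layer(np.array([0.6, 0.4]),
                              np.array([[1.0, -2.0, 3.0],
                                        [0.5, 1.5, -1.0]]),
                              np.array([1.0, 2.0, 3.0]))
```

Applied layer by layer from the class output down to an early convolutional layer, this yields the attention map.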

European Conference on Computer Vision

3DVTech ECCV ’16 Trends Report
Amsterdam, October 2016
Short version

• Best papers
• CNN reduction
• Human pose
• SFM-MVS
• Other

“Real-Time 3D Reconstruction and 6-DoF Tracking with an Event Camera”, by Hanme Kim, Stefan Leutenegger and Andrew Davison, from Imperial College London. This paper presents a method that performs real-time 3D reconstruction from a single hand-held event camera with no additional sensing, and works in unstructured scenes of which it has no prior knowledge. It is based on three decoupled probabilistic filters, each estimating 6-DoF camera motion, scene logarithmic (log) intensity gradient, and scene inverse depth relative to a keyframe. A real-time graph of these is built to track and model over an extended local workspace.

Downloadable Codes & links

• ECCV ’16 conference: http://eccv2016.org/

• SSD: Single Shot MultiBox Detector: SSD is a unified framework for object detection with a single network. You can use the code to train and evaluate a network for the object detection task. https://github.com/weiliu89/caffe/tree/ssd
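At inference time, SSD regresses offsets relative to a fixed grid of default (anchor) boxes. A minimal numpy sketch of the standard center-size decoding is shown below; the function name is ours, and the variance values mirror the common SSD configuration rather than being read from the repository.

```python
import numpy as np

def decode_ssd_boxes(defaults, offsets, variances=(0.1, 0.2)):
    """Decode SSD regression offsets against default boxes.

    defaults : (N, 4) default boxes as (cx, cy, w, h), normalized coordinates
    offsets  : (N, 4) predicted offsets (dcx, dcy, dw, dh)
    Returns (N, 4) decoded boxes as (xmin, ymin, xmax, ymax).
    """
    cx = defaults[:, 0] + offsets[:, 0] * variances[0] * defaults[:, 2]
    cy = defaults[:, 1] + offsets[:, 1] * variances[0] * defaults[:, 3]
    w = defaults[:, 2] * np.exp(offsets[:, 2] * variances[1])
    h = defaults[:, 3] * np.exp(offsets[:, 3] * variances[1])
    # Convert from center-size to corner form.
    return np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)
```

Non-maximum suppression over the decoded boxes then produces the final detections.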

Download this report at http://www.3dvtech.com/

A longer version or a specific review of the conference is available on request; please contact 3DVTech.

Best paper awards

Best Student Award: Emma Alexander, for “Focal Flow: Measuring Distance and Velocity with Defocus and Differential Motion”.

She presents a new system for perceiving depth: the focal flow sensor. It is an unactuated, monocular camera that simultaneously exploits defocus and differential motion to measure a depth map and a 3D scene velocity field. It does so using an optical-flow-like, per-pixel linear constraint that relates image derivatives to depth and velocity.
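The "per-pixel linear constraint" can be illustrated with a generic least-squares solve: stack per-pixel derivative terms into a linear system and recover the coefficients that encode depth and velocity. The exact terms are derived in the paper; the two feature terms below (radial derivative and Laplacian) are our simplified stand-ins, and all names are ours.

```python
import numpy as np

def linear_constraint_coefficients(I_prev, I_next, dt=1.0):
    """Fit a per-pixel linear constraint  I_t ~ c0*(x*Ix + y*Iy) + c1*Lap(I)
    over a patch by least squares (simplified stand-in for focal flow).
    """
    I = 0.5 * (I_prev + I_next)
    It = (I_next - I_prev) / dt            # temporal derivative
    Iy, Ix = np.gradient(I)                # spatial derivatives (rows, cols)
    lap = (np.roll(I, 1, 0) + np.roll(I, -1, 0) +
           np.roll(I, 1, 1) + np.roll(I, -1, 1) - 4 * I)
    h, w = I.shape
    y, x = np.mgrid[0:h, 0:w].astype(float)
    x -= w / 2.0                           # coordinates relative to the optical axis
    y -= h / 2.0
    A = np.stack([(x * Ix + y * Iy).ravel(), lap.ravel()], axis=1)
    coeffs, *_ = np.linalg.lstsq(A, It.ravel(), rcond=None)
    return coeffs
```

In the actual sensor the recovered coefficients are then mapped to metric depth and velocity through the lens calibration.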

3DVTech trends report – ECCV ’16 – Short version

3DVTech - CréACannes

11 avenue Maurice Chevalier 06150 Cannes La Bocca

Phone: 06 21 13 81 28

Email : [email protected]

www.3DVTech.com

See: https://youtu.be/yHLyhdMSw7w

• Pixelwise View Selection for Unstructured Multi-View Stereo: COLMAP is a general-purpose Structure-from-Motion (SfM) and Multi-View Stereo (MVS) pipeline for robust and dense modeling from unstructured images. https://colmap.github.io
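COLMAP ships as a command-line pipeline; a typical run looks like the fragment below. The paths are placeholders, and flags may vary between versions, so check the linked documentation.

```shell
# One-shot reconstruction (SfM + dense MVS) from a folder of images.
colmap automatic_reconstructor \
    --workspace_path ./reconstruction \
    --image_path ./images

# Or step by step: features, matching, then sparse mapping.
colmap feature_extractor --database_path ./db.db --image_path ./images
colmap exhaustive_matcher --database_path ./db.db
mkdir -p ./sparse
colmap mapper --database_path ./db.db --image_path ./images --output_path ./sparse
```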


(Figures: Binary-Weight-Networks and XNOR-Networks; human pose and 3D mesh computed by the SMPL method.)

New challenges and datasets

Having datasets for learning, evaluating, or comparing results is an important concern that was discussed throughout the conference. Here is a list of proposed datasets:

• Tracking:
http://www.votchallenge.net/
https://motchallenge.net/

• Learning:
Common Objects in Context, http://mscoco.org/, now has detection and keypoint challenges.
New http://image-net.org/ 2016 challenges. The authors received the PAMI Everingham Prize.

• City/Road:
https://www.cityscapes-dataset.com/
GTA-based game dataset: http://download.visinf.tu-darmstadt.de/data/from_games/
Toronto announced for 2017 a new large dataset covering aerial and street views and maps of the full Toronto area.

• Features:
https://github.com/featw/hpatches for local descriptor matching
https://archive.ics.uci.edu/ml/datasets/SIFT10M

• Other 2016 datasets:
Chalearn 2016: http://gesture.chalearn.org/2016-looking-at-people-eccv-workshop-challenge
Movie description: https://sites.google.com/site/describingmovies/

CNN reduction

Using CNNs currently requires large memory and processing power that are not always available in devices such as smartphones. Several papers proposed solutions to tackle these limitations.

M. Rastegari proposes two approximations of standard CNNs: Binary-Weight-Networks, which approximate filters with binary values, and XNOR-Networks, which approximate convolutions with binary operations. This results in 58× faster convolutional operations and 32× memory savings.
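The Binary-Weight-Networks approximation replaces a real-valued filter W by αB, where B = sign(W) and the scale α = mean(|W|) is the closed-form minimizer of ‖W − αB‖². A minimal numpy sketch (our own naming):

```python
import numpy as np

def binarize_filter(W):
    """Approximate a real-valued filter W by alpha * B with B in {-1, +1}.

    alpha = mean(|W|) minimizes ||W - alpha*B||^2 for B = sign(W),
    the closed-form scaling used by Binary-Weight-Networks.
    """
    B = np.where(W >= 0, 1.0, -1.0)   # binary filter (additions/subtractions only)
    alpha = np.abs(W).mean()          # optimal real-valued scale
    return alpha, B
```

A convolution with W is then approximated by α times a convolution with B, which needs no multiplications; XNOR-Networks additionally binarize the inputs so the convolution reduces to XNOR and bit-count operations.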

In Deep Networks with Stochastic Depth, Gao Huang proposes a training procedure that enables the seemingly contradictory setup of training short networks while using deep networks at test time. This is achieved by randomly skipping layers entirely during training. The authors reduced training time by 25% while keeping the same accuracy.

In Less is More: Towards Compact CNNs, Hao Zhou proposes to remove some neurons during training using sparse constraints. Experimental results on four well-known CNN architectures demonstrate a significant reduction in the number of neurons and in the memory footprint of the testing phase without affecting the classification accuracy (see the table below).

“LIFT: Learned Invariant Feature Transform”, by Kwang Yi, Eduard Trulls, Vincent Lepetit, and Pascal Fua

The paper presents a novel deep network architecture that implements the full feature-point handling pipeline, that is, detection, orientation estimation, and feature description. While previous works have successfully tackled each of these problems individually, they show how to learn all three in a unified manner while preserving end-to-end differentiability. The fully learned descriptor outperforms the state of the art in feature detection and description. Code is available: https://github.com/cvlab-epfl/LIFT
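The layer-skipping idea of stochastic depth described above can be sketched as follows: during training each residual branch survives with some probability, while at test time every branch is kept and scaled by its survival probability to match expectations. Names and the toy structure are ours, not the paper's code.

```python
import numpy as np

def stochastic_depth_forward(x, blocks, p_survive, train=True, rng=None):
    """Forward pass through residual blocks with stochastic depth.

    blocks    : list of callables implementing the residual branches f_l
    p_survive : per-block survival probabilities (often decayed with depth)
    """
    rng = rng or np.random.default_rng()
    for f, p in zip(blocks, p_survive):
        if train:
            if rng.random() < p:
                x = x + f(x)          # block survives this step
            # else: identity shortcut only, the block is skipped entirely
        else:
            x = x + p * f(x)          # expectation-matching test-time rule
    return x
```

Because skipped blocks do no forward or backward work, the expected training depth (and cost) shrinks while the full-depth network is still available at test time.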

(Table: neuron reduction by compacting CNNs.)

Human Poses

A large number of papers focused on recovering the pose, the skeleton, and even the shape of humans from a single image. Here are some examples.

In Keep it SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image, Federica Bogo proposes a two-step method to automatically estimate the 3D pose of the human body as well as its 3D shape. In the first step, 2D joint locations are detected on the body using a CNN-based method called DeepCut. Then a gender-specific top-down model called SMPL is fitted directly to the 2D joints. The resulting model can be immediately posed and animated. Code is available at: http://smplify.is.tue.mpg.de/
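The fitting step amounts to choosing model parameters so that the projected model joints match the detected 2D joints. A toy illustration with a linear pose model and orthographic projection is shown below; the model and all names are ours, purely illustrative stand-ins for the actual SMPL/SMPLify formulation.

```python
import numpy as np

def fit_pose_to_joints(joints2d, base3d, blend_dirs):
    """Fit linear pose coefficients so projected model joints match 2D detections
    (toy stand-in for the SMPL fitting stage).

    joints2d   : (J, 2) detected 2D joint locations (e.g. from DeepCut)
    base3d     : (J, 3) rest-pose model joints
    blend_dirs : (K, J, 3) linear deformation directions of the toy model
    Returns the K coefficients minimizing the 2D reprojection error.
    """
    # Orthographic projection: simply drop the z coordinate.
    residual = (joints2d - base3d[:, :2]).ravel()             # (2J,)
    A = blend_dirs[:, :, :2].reshape(len(blend_dirs), -1).T   # (2J, K)
    theta, *_ = np.linalg.lstsq(A, residual, rcond=None)
    return theta
```

The real objective is nonlinear (joint rotations, perspective projection, pose and shape priors) and is minimized iteratively rather than in closed form.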

In DeeperCut: A Deeper, Stronger, and Faster Multi-person Pose Estimation Model, E. Insafutdinov augments the DeepCut framework by improving the 2D joint location estimation and by adding a step that selects a body model from a list of proposed body-part configurations. They introduce novel image-conditioned pairwise terms between body parts that significantly push the performance in the challenging case of multi-person pose estimation and dramatically reduce the run-time.

SFM and MVS

Pixelwise View Selection for Unstructured Multi-View Stereo, by Johannes Schönberger. This paper presents a multi-view stereo system for robust and efficient dense modeling from unstructured image collections. https://colmap.github.io/

Other papers

The Fast Bilateral Solver, J. Barron. It is a generalization of the bilateral filter (Honorable Mention paper). https://github.com/poolio/bilateral_solver
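The solver minimizes a quadratic objective: a bilateral-affinity smoothness term plus a confidence-weighted data term. A dense 1D toy version of that objective can be solved directly (the paper's contribution is solving it fast in a bilateral grid); the parameterization below is our simplification.

```python
import numpy as np

def bilateral_solve_1d(target, confidence, reference, sigma_r=0.2, lam=4.0):
    """Solve argmin_x  sum_ij w_ij (x_i - x_j)^2 + lam * sum_i c_i (x_i - t_i)^2
    for a 1D signal, with bilateral weights w_ij built from a reference signal.
    """
    n = len(target)
    i = np.arange(n)
    # Bilateral affinity: nearby samples with similar reference values.
    w = np.exp(-((i[:, None] - i[None, :]) ** 2) / 8.0
               - ((reference[:, None] - reference[None, :]) ** 2) / (2 * sigma_r ** 2))
    L = np.diag(w.sum(axis=1)) - w                 # graph Laplacian of the affinities
    # Stationarity condition of the quadratic objective:
    A = 2 * L + lam * np.diag(confidence)
    b = lam * confidence * target
    return np.linalg.solve(A, b)
```

With uniform confidence and a constant target, the smoothness term is already zero, so the solver returns the target unchanged; with noisy, low-confidence targets it produces an edge-aware smoothed result.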

Focal Flow: Measuring Depth and Velocity from Defocus and Differential Motion, E. Alexander. A new depth-sensing technology.

(Figure: poses jointly estimated by DeeperCut.)

Workshop: Geometry meets Deep Learning

Recovering the 3D geometry of the world from 2D and 3D visual data is a central task in computer vision. Traditional approaches to geometric vision problems are mostly based on handcrafted geometric representations and image features. The goal of this workshop was to encourage the interplay between 3D vision and deep learning through invited talks from 11 experts in both domains.

The best paper award went to Learning Covariant Feature Detectors by Karel Lenc. Detection is different from description; Karel presented a way to improve detection by learning to extract viewpoint-invariant features from images. Code is available: https://github.com/lenck/ddet

ImageNet and COCO Visual Recognition Challenges Joint Workshop

This challenge is the main competition for object classification and detection. The workshop presented the latest methods and results on a new dataset.

The best results in object detection from images and video were achieved by the Chinese University of Hong Kong: http://www.ee.cuhk.edu.hk/~wlouyang/projects/GBD/index.html

The Scene Classification Challenge was won by Hikvision.