
Surveillance Video Parsing with Single Frame Supervision


IEEE 2017 Conference on Computer Vision and Pattern Recognition

Si Liu¹, Changhu Wang², Ruihe Qian¹, Han Yu¹, Renda Bao¹, Yao Sun¹

¹ Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China
² Toutiao AI Lab

Introduction: We develop a Single frame Video Parsing (SVP) method that requires only one labeled frame per video in the training stage to parse surveillance videos.

Function

• Segment each video frame into labeled regions, e.g., face, pants, left-leg.

Train & Test

• During training, only a single frame per video is labeled (red box).

• During testing, a parsing window is slid along the video. The parsing result of the test frame (orange box) is determined by the frame itself, the long-range frame (green box), and the short-range frame (blue box).
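The sliding parsing window can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `parsing_triplets` and the gap values `long_gap` and `short_gap` (standing in for the offsets l and s) are assumptions, since the poster does not state them.

```python
from typing import Iterator, Tuple


def parsing_triplets(
    num_frames: int, long_gap: int, short_gap: int
) -> Iterator[Tuple[int, int, int]]:
    """Yield (long_idx, short_idx, test_idx) for each frame with both references.

    long_gap and short_gap are hypothetical offsets l and s (l > s > 0);
    the poster does not give their values.
    """
    if not (long_gap > short_gap > 0):
        raise ValueError("expected long_gap > short_gap > 0")
    # The first test frame is the earliest one that has a long-range reference.
    for t in range(long_gap, num_frames):
        yield t - long_gap, t - short_gap, t


triplets = list(parsing_triplets(num_frames=6, long_gap=3, short_gap=1))
print(triplets)  # [(0, 2, 3), (1, 3, 4), (2, 4, 5)]
```

Frames earlier than `long_gap` have no long-range reference in this sketch; how the original system handles them is not stated on the poster.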

Network

Approach:

Contribution

• Single frame supervision: the first attempt to segment human parts in surveillance video by labeling a single frame per training video.

• Good performance: feature learning, pixelwise classification, correspondence mining, and temporal fusion are updated in a unified optimization process and collaboratively contribute to the parsing results.

• Applicable: the proposed SVP framework is end-to-end and thus well suited to real-world use.

Frame Parsing Sub-network

• Video $V = \{I_1, \dots, I_N\}$. The single labeled frame is $I_N$. The frame parsing sub-network produces the rough label maps for the triplet (long-range, short-range, and test frame), denoted as $\tilde{P}_{t-l}$, $\tilde{P}_{t-s}$, $\tilde{P}_t$.

Optical Flow Estimation Sub-network

• Estimate the dense correspondence between adjacent frames on the fly.

Temporal Fusion Sub-network

• Apply the obtained optical flow $F_{t,t-l}$ and $F_{t,t-s}$ to $\tilde{P}_{t-l}$ and $\tilde{P}_{t-s}$, producing the warped label maps $P_{t-l}$ and $P_{t-s}$. To alleviate the influence of imperfect optical flow, the pixel-wise flow confidences $C_{t-l,t}$ and $C_{t-s,t}$ are estimated.
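The warping and confidence steps can be sketched in NumPy as below. Everything here is an assumption for illustration: the nearest-neighbour backward warp, the Gaussian confidence on the frame reconstruction error (with a hypothetical `sigma`), and the confidence-weighted average in `fuse`, which stands in for the learned fusion in the network and is not taken from the poster.

```python
import numpy as np


def warp(image: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Backward-warp an (H, W, C) array from a reference frame into frame t.

    flow[y, x] = (dx, dy) points from pixel (x, y) in frame t to its match
    in the reference frame. Nearest-neighbour sampling, clipped at borders.
    """
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return image[src_y, src_x]


def flow_confidence(frame_t, frame_ref, flow, sigma=0.1):
    """Assumed confidence: near 1 where the warped reference matches frame t."""
    err = np.abs(frame_t - warp(frame_ref, flow)).mean(axis=-1)
    return np.exp(-(err ** 2) / (2 * sigma ** 2))


def fuse(rough_t, warped_l, conf_l, warped_s, conf_s):
    """Confidence-weighted average of three (H, W, K) label maps (stand-in)."""
    num = rough_t + conf_l[..., None] * warped_l + conf_s[..., None] * warped_s
    den = 1.0 + conf_l[..., None] + conf_s[..., None]
    return num / den


# Toy example: random frames and label maps, zero flow to both references.
rng = np.random.default_rng(0)
h, w, k = 4, 5, 3
frame_t = rng.random((h, w, 3))
rough_t, rough_l, rough_s = (rng.random((h, w, k)) for _ in range(3))
zero_flow = np.zeros((h, w, 2))

warped_l, warped_s = warp(rough_l, zero_flow), warp(rough_s, zero_flow)
conf_l = flow_confidence(frame_t, frame_t, zero_flow)
conf_s = flow_confidence(frame_t, frame_t, zero_flow)
refined = fuse(rough_t, warped_l, conf_l, warped_s, conf_s)
print(refined.shape)
```

Pixels where the flow is unreliable get a low confidence, so the warped label maps contribute little there and the refined result falls back on the rough parsing of the test frame.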

Experiment:

Figure 1. Process diagram of training and testing.

Figure 2. The proposed single frame supervised video parsing (SVP) network. The network is trained end-to-end.

Figure 3. Columns 1-2: the long-range frame and its parsing result. Columns 3-4: the short-range frame and its parsing result. Columns 5-6: the test frame and its parsing result.

Figure 4. Step-by-step illustration of SVP. Columns 1-4: the long-range frame, its parsing result, the warped parsing result, and the confidence map. Columns 5-8: the short-range frame, its parsing result, the warped parsing result, and the confidence map. Columns 9-12: the test image, the rough parsing result, the refined parsing result, and the ground-truth parsing result.

Figure 5. The test image, the ground-truth label, and the results of EM-Adapt and SVP, shown in sequence.

Home page: http://liusi-group.com

Project page: http://liusi-group.com/projects/SVP
