VIDEO-SURVEILLANCE IN CLOUD
Platform and software aaS for people detection and soft-biometry

R. Cucchiara°,*, A. Prati°,+, R. Vezzani°,*, S. Calderara°,*, C. Grana°,* °SOFTECH-ICT, *Università di Modena e Reggio Emilia, +Università IUAV di Venezia

Abstract: This paper describes recent experience at ImageLab in architectural and algorithmic studies, mainly devoted to people surveillance. The research, prototypes and results are part of three different projects: (1) BESAFE, with NATO in the Science for Peace programme with Hebrew University of Jerusalem (Israel); (2) THIS, a CIP EU project with both universities and SMEs, such as Bridge129 from Italy; (3) VISERAS, a regional project funded by LEPIDA spa in collaboration, among others, with IBM Italia.

Key words: Video surveillance as a service; people detection; soft biometry; cloud computing

1. INTRODUCTION

Video-surveillance is an important application field of computer engineering, involving multidisciplinary studies, starting from sensors, acquisition and storage systems, display interfaces and networks to algorithm and software development.

Although this discipline is relatively old in ICT, with the first video-surveillance installations dating back to the mid-1960s, only the hardware architecture has shown a monotonic growth from research to the market. In 1969 the first system was installed at the Municipal Building in New York, while in 1993 the first digital one was installed at the World Trade Center, also in New York, after the first DVR arrived on the market in 1985 and made digital CCTV systems possible. In the new century the panorama shifted from single-camera systems to more or less large networks of cameras, and from simple platforms for building automation and security to very large deployments: the Chicago Virtual Shield of 2006 from IBM (initially with 3000 connected cameras in a single network), the 2-million-camera All-Seeing-Eye system in Shenzhen, China, started in 2009, and the new IoT (Internet of Things) video-surveillance project launched in 2010 for the Chongqing Municipality, which includes millions of cameras together with RFID, infrared and smoke-detector sensors.

Conversely, software components are still not stable. After an initial use of video processing for coding and data transfer only, in the mid-1980s, commercial systems started to include some simple motion-detection modules about ten years ago, and only in the last few years has the term “video analytics” been commercially adopted: generically speaking, it indicates all the software tools which provide automatic video processing with computer vision and pattern recognition to extract knowledge from the observed scene.

In this context, the research successes of the last ten years are now on the market, but the way to well-assessed and precise tools for each scenario and for large installations is still an arduous and winding road. The results in background suppression and appearance-based tracking are still important, but their limits of applicability have been well assessed, for instance with moving or changing backgrounds or moving cameras.

Target classification and behavior analysis, especially for people and vehicles, have still not reached a stable solution. However, an undeniable change is visible in this new decade: the time from basic, theoretical research in computer vision (with machine learning and statistical pattern recognition) to the prototype implementation, the pre-competitive transfer and the creation of final products has become much shorter. This very short time-to-market is made necessary by the very large demand for final applications in all the fields of security, real-time surveillance and forensics. For these reasons, theoretical research is more and more connected with practical implementations.

Accordingly, in this paper we will describe the recent experiences at ImageLab about architectural and algorithmic studies, mainly devoted to people surveillance. The research, prototypes and results are part of three different projects:

1) BESAFE, with NATO in the Science for Peace programme with Hebrew University of Jerusalem (Israel);

2) THIS, a CIP EU project with both universities and SMEs, such as Bridge.129 from Italy;

3) VISERAS, a regional project funded by LEPIDA spa in collaboration, among others, with IBM Italia.


The projects are listed from the most theoretical to the most practical, but their studies and results are closely connected. In this work we present the new trends in basic and industrial research for both the hardware layer (here meaning the architecture rather than the single device) and the software layer (here meaning algorithms for knowledge extraction).

2. VIDEO-SURVEILLANCE IN CLOUD

The project VISERAS (“VIdeo Surveillance in Emilia Romagna As a Service”) is a good example of a timely industrial research project which joins previous experience with surveillance systems and new trends of distributed computation in a cloud. The project aimed at defining an architecture that provides not video-surveillance systems but services applied to data remotely acquired and remotely stored. The availability of a cloud architecture makes new solutions affordable, extending surveillance capabilities to those public bodies (e.g., small villages) which cannot afford the installation of a complete multi-camera system, even a distributed but proprietary one.

The availability in Emilia Romagna of Lepida’s fiber-channel network (one of the most important examples in Italy of solutions to overcome the digital divide) suggested the possibility of exploiting the wideband network to create remote services for security. Consequently, designing a surveillance architecture around the concept of cloud was very natural, by including the three main components of a cloud, namely application as a service (AaaS), platform as a service (PaaS), and infrastructure as a service (IaaS). Fig. 1(a) shows a sketch which represents our view of aaS within the project: the left side of the cloud represents the three components just mentioned, whereas the right side represents the corresponding components of the video surveillance system.

The most innovative part is, probably, the middle one: while exploiting remote server capabilities as a common IaaS is now concretely adopted for many applications (mailing, document storage, etc.), the concept of PaaS extends the horizon to common services where the interaction between different content providers and content users must be very tight.

Figure 1(a). Architecture VISERAS in a cloud

In traditional distributed video surveillance platforms, all camera data are centralized and visible to the central control center, which may set some partial views only for human operators. Camera data can be viewed in streaming and synchronized with time-stamps, depending on the network traffic. Videos are processed to extract knowledge that is specifically annotated (e.g., in the MILS IBM architecture) and standard a-posteriori queries are available (see Fig. 1(b)).

In our proposal, instead, the platform implements the concept of “Federated users”, which are groups of users which can share data produced by themselves or others: for instance, in a smart city system many groups of cameras can be allowed to different public bodies (city municipality, state police, etc.), and the same or other public bodies could be federated to view simultaneously-streamed videos. Thus, a more complex synchronization at platform level is required, made more complex if some events detected by the computer vision modules must also be synchronized. The model of architecture has been implemented in a mixture of systems and tools, by integrating open-source tools such as ZoneMinder, opportunely modified, proprietary libraries (as the MILS tools of IBM) and research libraries (the ImageLab libs). This architecture, which will be described in detail in the full paper version, allowed us to create multiple views of events in a federated mode and gave different privileges and permissions to specific users. Fig. 1 depicts some aspects of the project.
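The “Federated users” concept described above can be illustrated with a minimal sketch of a permission model, in which federations of users are granted privileges on groups of cameras. This is only an illustration under stated assumptions: the class names (`Permission`, `Federation`, `Platform`), the permission set and the example bodies are hypothetical and are not taken from the VISERAS implementation, whose details are not given here.

```python
from dataclasses import dataclass, field
from enum import Flag, auto


class Permission(Flag):
    """Illustrative privileges; the actual VISERAS permission set is not given."""
    NONE = 0
    VIEW_LIVE = auto()
    VIEW_EVENTS = auto()
    QUERY_ARCHIVE = auto()


@dataclass
class Federation:
    """A group of users sharing data produced by themselves or by others."""
    name: str
    members: set = field(default_factory=set)
    grants: dict = field(default_factory=dict)  # camera group -> Permission


class Platform:
    """Hypothetical platform-level registry of camera groups and federations."""

    def __init__(self):
        self.camera_groups = {}  # group name -> set of camera ids
        self.federations = {}    # federation name -> Federation

    def add_camera_group(self, group, cameras):
        self.camera_groups[group] = set(cameras)

    def federate(self, name, members):
        self.federations[name] = Federation(name, set(members))

    def grant(self, federation, group, permissions):
        fed = self.federations[federation]
        fed.grants[group] = fed.grants.get(group, Permission.NONE) | permissions

    def permissions(self, user, camera):
        """Union of what every federation containing `user` may do on `camera`."""
        result = Permission.NONE
        for fed in self.federations.values():
            if user in fed.members:
                for group, perms in fed.grants.items():
                    if camera in self.camera_groups.get(group, ()):
                        result |= perms
        return result

    def can(self, user, camera, permission):
        return permission in self.permissions(user, camera)


if __name__ == "__main__":
    platform = Platform()
    platform.add_camera_group("city_centre", {"cam1", "cam2"})
    platform.federate("municipality", {"alice"})
    platform.grant("municipality", "city_centre", Permission.VIEW_LIVE)
    print(platform.can("alice", "cam1", Permission.VIEW_LIVE))
```

In this sketch a user who belongs to several federations receives the union of their privileges, which is one possible reading of public bodies being “federated to view” the same streams; a real platform would also need the stream and event synchronization mentioned above, which is not modelled here.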

(b) Architecture integrating vendors and research tools

(c) Output of improved ZoneMinder system

(d) Multiple events evaluation

Figure 1. VISERAS: video surveillance as a service

3. SPATIO-TEMPORAL PEOPLE DETECTION

A similar architecture gives space to re-think surveillance modules as services which can be used, combined and improved according to specific needs. A common problem for every people security analysis application is to find people in space and across time. While in the past the term people segmentation was mostly adopted, considering background suppression methods, now the new way of people detection by learning appearance independently from motion is becoming popular [1-5] (see Fig. 2).
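For contrast with the learning-based detectors cited above, the older background-suppression idea can be sketched as a running-average background model followed by thresholding. This is a generic textbook scheme and not the ImageLab implementation; the function names, the update `rate`, the `threshold` and the tiny grayscale frames (lists of rows) are all assumptions made for illustration.

```python
def foreground_mask(background, frame, threshold=30):
    """Mark pixels whose grey level differs from the background model."""
    return [
        [abs(pixel - model) > threshold for model, pixel in zip(brow, frow)]
        for brow, frow in zip(background, frame)
    ]


def update_background(background, frame, rate=0.05):
    """Blend the new frame into the model with an exponential running average."""
    return [
        [(1.0 - rate) * model + rate * pixel for model, pixel in zip(brow, frow)]
        for brow, frow in zip(background, frame)
    ]


if __name__ == "__main__":
    background = [[10.0, 10.0], [10.0, 10.0]]
    frame = [[10, 200], [10, 10]]  # one bright pixel enters the scene
    print(foreground_mask(background, frame))
    background = update_background(background, frame)
```

The sketch also makes the limitation mentioned in the introduction visible: the model assumes a static camera and a slowly changing background, which is why appearance-based detection that does not depend on motion is attractive.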

(a) People segmentation vs. detection

(b) The approach at Imagelab

Figure 2. People detection in time and space

At the same time, the tracking approach, as the ensemble of motion prediction and similarity matching, is not always applicable, and other different solutions are explored. Here, we will discuss some interesting results, carried out within the THIS project, about the use of a new search methodology which exploits particle windows instead of sliding windows. The particle windows-based multilayer approach makes it possible to better localize and detect people in images and videos in a faster mode [5]. Some results are depicted in Fig. 3, where the concept of “basin of attraction” (fundamental to use particle and pdf-based search for people) is shown.

(a) The presence of a basin of attraction in people localization

(b) Sliding-windows vs particle-windows approaches in localizing people

Figure 3. People localization in images.

People must be detected in time too, independently of the possibility to predict more or less accurately the motion and the next people status in the image. For this reason, we have obtained interesting results in people tracing instead of tracking, or more in general in people following based on transductive semi-supervised methods, that become valuable also in crowded or very cluttered environments, as depicted in Fig. 4 and in our published papers [6-8].
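The transductive, semi-supervised idea behind people tracing can be illustrated with a generic graph-based label propagation scheme: a few detections carry a known identity, and the labels spread to the remaining detections through appearance similarity. This is a standard technique used here as a stand-in, not the spectral graph transduction of [6-8]; the feature vectors, the Gaussian kernel width `sigma`, the damping factor `alpha` and the function names are assumptions.

```python
import math


def gaussian_affinity(points, sigma=1.0):
    """Pairwise similarity between feature vectors (zero on the diagonal)."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                w[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return w


def propagate_labels(points, seeds, n_classes, sigma=1.0, alpha=0.9, iters=200):
    """Spread the seed labels over the similarity graph.

    `seeds` maps the index of a labelled point to its class id; every other
    point is unlabelled. Returns one predicted class id per point.
    """
    n = len(points)
    w = gaussian_affinity(points, sigma)
    deg = [sum(row) for row in w]
    # Symmetric normalisation S = D^(-1/2) W D^(-1/2).
    s = [
        [w[i][j] / math.sqrt(deg[i] * deg[j]) if w[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    y = [[1.0 if seeds.get(i) == c else 0.0 for c in range(n_classes)] for i in range(n)]
    f = [row[:] for row in y]
    for _ in range(iters):
        # Each node mixes its neighbours' scores with its own initial label.
        f = [
            [
                alpha * sum(s[i][k] * f[k][c] for k in range(n)) + (1.0 - alpha) * y[i][c]
                for c in range(n_classes)
            ]
            for i in range(n)
        ]
    return [max(range(n_classes), key=lambda c: row[c]) for row in f]


if __name__ == "__main__":
    # Two toy appearance clusters; only one detection per person is labelled.
    detections = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
    print(propagate_labels(detections, seeds={0: 0, 3: 1}, n_classes=2))
```

Because no motion model is involved, such a scheme does not depend on predicting the next position of a person, which is the property the text highlights for crowded or cluttered scenes.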

Figure 4. Transductive people tracing in video sequences.

4. SOFT-BIOMETRY FOR SURVEILLANCE AND FORENSICS

In the last section of the paper we will discuss one of the possible services in a cloud platform, sufficiently general to be adopted as a common service, such as people search by similarity using soft biometric cues (mainly visual aspect, clothing, height if it can be computed, and finally trajectory type).

One very general framework for both real-time and off-line surveillance services focuses on people aspect similarity. This is the basis of soft biometry, meaning the possibility of recognizing people without hard biometric cues, which are effective, precise and possibly unique but, conversely, are often difficult to have accepted and need collaborative interaction. Soft biometry instead accepts being less accurate (but always with high recall), uses very simple cues which can be extracted with computer vision, for instance from video cameras, and often needs user interaction and feedback to improve precision.
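A minimal sketch of such a soft-biometric search is given below, assuming that each person is summarised by a coarse colour histogram and that candidates are ranked by histogram intersection. The descriptor, the similarity measure, the permissive `min_similarity` threshold (chosen to favour recall over precision) and all names are illustrative assumptions, not the features used in the projects described here.

```python
def normalise(hist):
    """Scale a histogram so that its bins sum to one."""
    total = sum(hist)
    return [h / total for h in hist] if total else list(hist)


def histogram_intersection(a, b):
    """Similarity in [0, 1] between two normalised histograms."""
    return sum(min(x, y) for x, y in zip(a, b))


def search(query, gallery, min_similarity=0.5):
    """Rank gallery people by similarity to the query, best match first.

    A low threshold keeps weak matches in the list: the operator is expected
    to discard false positives, so recall is preferred over precision.
    """
    q = normalise(query)
    scored = [
        (histogram_intersection(q, normalise(hist)), person_id)
        for person_id, hist in gallery.items()
    ]
    return [
        (person_id, score)
        for score, person_id in sorted(scored, reverse=True)
        if score >= min_similarity
    ]


if __name__ == "__main__":
    # Toy 3-bin "clothing colour" histograms (e.g. red, green, blue pixel counts).
    gallery = {"p1": [8, 1, 1], "p2": [1, 8, 1], "p3": [5, 4, 1]}
    for person_id, score in search([7, 2, 1], gallery):
        print(person_id, round(score, 2))
```

The returned list is meant to be reviewed by a user, which is where the interaction and feedback mentioned above come in.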

We worked on 3D re-identification solutions matching 2D and 3D models in real-time (Fig. 5(a)) [9-12], and provided new interactive interfaces for search for aspect similarities, comparing and exploiting different visual features and different classification methods with relevance feedback (Fig. 5(b)). Finally, we exploited similar concepts for trajectory similarity with spectral-graph analysis after Voronoi tessellation [13] (Fig. 5(c)). This work is a result of the NATO Project BESAFE in collaboration with the Hebrew University of Jerusalem.
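The role of relevance feedback in such a search interface can be illustrated with the classic Rocchio update from information retrieval, in which the query descriptor is moved towards the results the user marks as relevant and away from those marked as wrong. Rocchio is used here only as a well-known stand-in; the classification methods actually compared in the interface are not specified in this paper, and the weights `alpha`, `beta` and `gamma` are conventional default values assumed for the example.

```python
def mean_vector(vectors, size):
    """Component-wise mean of a list of vectors (zero vector if the list is empty)."""
    if not vectors:
        return [0.0] * size
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def rocchio_update(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Return a refined query descriptor after one round of user feedback."""
    positive = mean_vector(relevant, len(query))
    negative = mean_vector(non_relevant, len(query))
    # Negative components are clipped because histogram-like descriptors
    # cannot have negative bins.
    return [
        max(0.0, alpha * q + beta * p - gamma * n)
        for q, p, n in zip(query, positive, negative)
    ]


if __name__ == "__main__":
    query = [1.0, 0.0]
    confirmed = [[0.0, 1.0], [0.0, 1.0]]  # results the operator accepted
    rejected = [[1.0, 0.0]]               # results the operator discarded
    print(rocchio_update(query, confirmed, rejected))
```

Repeating the search with the refined descriptor is one simple way in which user feedback can raise precision while the initial search stays recall-oriented.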

5. CONCLUDING REMARKS

This paper condensed the research results achieved within ImageLab in three recent projects which share video surveillance as their topic. In particular, all the proposed solutions have tackled both the algorithmic and the architectural perspective, by proposing the use of cloud-based architectures to build “video surveillance as a service” platforms. These preliminary results demonstrate that video surveillance systems can easily be ported to this type of architecture.

(a) 3D re-identification

(b) Search by aspect similarities

(c) Search by trajectory similarity

Figure 5. Soft-biometry services for people analysis


REFERENCES

[1] O. Tuzel, F. Porikli, and P. Meer, “Pedestrian detection via classification on Riemannian manifolds,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 10, pp. 1713–1727, Oct. 2008.

[2] Q. Zhu, M.-C. Yeh, K.-T. Cheng, and S. Avidan, “Fast human detection using a cascade of histograms of oriented gradients,” in Proceedings of Computer Vision and Pattern Recognition (CVPR), 2006, vol. 2, pp. 1491 – 1498.

[3] S. Paisitkriangkrai, C. Shen, and J. Zhang, “Fast pedestrian detection using a cascade of boosted covariance features,” in IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, no. 8, pp. 1140 –1151, Aug. 2008.

[4] P. Dollar, S. Belongie, and P. Perona, “The fastest pedestrian detector in the west,” in Proceedings of British Machine Vision Conference (BMVC), 2010, pp. 1–11

[5] G. Gualdi, A. Prati, R. Cucchiara, “Multi-Stage Particle Windows for Fast and Accurate Object Detection”, in IEEE Transactions on Pattern Analysis and Machine Intelligence to appear, 2012

[6] D. Coppi, S. Calderara, R. Cucchiara, “Appearance tracking by transduction in surveillance scenarios”, in Proceedings of the 8th IEEE International Conference on Advanced Video and Signal-Based Surveillance (AVSS), 2011

[7] D. Coppi, S. Calderara, R. Cucchiara, “Iterative active querying for surveillance data retrieval in crime detection and forensics”, in Proceedings of the 4th IET International Conference on Imaging for Crime Detection and Prevention (ICDP), 2011

[8] D. Coppi, S. Calderara, R. Cucchiara, “People appearance tracing in video by spectral graph transduction”, in Proceedings of the 2nd IEEE Workshop on Analysis and Retrieval of Tracked Events and Motion in Imagery Streams (ARTEMIS), 2011

[9] D. Baltieri, R. Vezzani, R. Cucchiara, "3DPes: 3D People Dataset for Surveillance and Forensics" in Proceedings of the 1st International ACM Workshop on Multimedia access to 3D Human Objects, Scottsdale, Arizona, USA, pp. 59-64, Nov 28 - Dec 1, 2011

[10] D. Baltieri, A. Utasi, R. Vezzani, B. Csaba, T. Sziranyi, R. Cucchiara, "Multi-View People Surveillance using 3D information" in Proceedings of the Eleventh International Workshop on Visual Surveillance 2011, Barcelona, Spain, pp. 1817-1824, Nov. 13, 2011

[11] D. Baltieri, R. Vezzani, R. Cucchiara, "SARC3D: a new 3D body model for People Tracking and Re-identification" in Proceedings of the 16th International Conference on Image Analysis and Processing, LNCS 6978, Ravenna, Italy, pp. 197-206, Sept. 14-16, 2011

[12] D. Baltieri, R. Vezzani, R. Cucchiara, "3D Body Model Construction and Matching for Real Time People Re-Identification" in Proceedings of Eurographics Italian Chapter Conference 2010 (EG-IT 2010), Genova, Italy, Nov. 18-19, 2010

[13] S. Calderara, U. Heinemann, A. Prati, R. Cucchiara, N. Tishby “Detecting Anomalies in People Trajectories using Spectral Graph Analysis”, in Computer Vision and Image Understanding, vol. 115, no. 8, pp. 1099-1111, 2011
