


HAVE 2008 - IEEE International Workshop on Haptic Audio Visual Environments and their Applications
Ottawa, Canada, 18-19 October 2008

    Augmented Reality-Based Audio/Visual Surveillance System

    Mouhcine Guennoun1, Saad Khattak2, Bill Kapralos2,3, and Khalil El-Khatib2

1Département Math-Info, Faculté des Sciences de Rabat, Université Mohammed V Agdal,
4 Avenue Ibn Battouta B.P. 1014 RP, Rabat, Morocco. Email: mguennoun@gmail.com

2Faculty of Business and Information Technology, 3Health Education Technology Research Unit, University of Ontario Institute of Technology,

2000 Simcoe St. North, Oshawa, Ontario, Canada, L1H 7K4. Email: saad.khattak@mycampus.uoit.ca, {bill.kapralos, khalil.el-khatib}@uoit.ca

Abstract: There are immediate needs for audio/visual surveillance systems in a large number of areas including law enforcement, military, commercial, and personal security. A series of cameras connected to a local monitoring center via a wireless mesh network can provide instantaneous ad-hoc monitoring of several environments. However, there are several issues that must be resolved, particularly when considering a large number of cameras monitoring a large area. In particular, it is difficult to monitor and control such a large number of cameras and as a result, important events may be missed altogether. In addition, some of these surveillance systems are set up in an instant ad-hoc manner, which means that any operator monitoring the system can easily lose the sense of direction when switching between different camera views. To overcome these limitations, in this paper we describe an ongoing project that seeks the development of an instantaneous ad-hoc audio/visual-based mesh network surveillance system which incorporates an augmented-reality-based three-dimensional graphical user interface to efficiently control and monitor a large number of video surveillance cameras. The system also employs intelligent vision and intelligent audio techniques to automatically detect and monitor particular events (e.g., intruders entering the scene being monitored) in both the audio and video domains.

Keywords: Augmented reality, audio/visual surveillance, mesh network, security.


I. INTRODUCTION

There are immediate needs for visual-based surveillance applications in a large number of areas including law enforcement, military, commercial, and personal security. In fact, demands for better homeland security have increased immensely worldwide since the 9/11 attacks in New York and Washington. As a result of the attacks, government agencies throughout the world, who are responsible for ensuring the safety of their populations and critical infrastructure, are reviewing their policies and updating/upgrading their surveillance (homeland security), emergency preparedness, and emergency management toolboxes.

A priority amongst many of the emergency services, and in particular security forces, is the ability to establish instant ad-hoc surveillance systems allowing for a particular environment/area to be monitored even in the absence of any existing infrastructure (e.g., communication channels). Building an instant ad-hoc surveillance system presents new challenges that must be overcome. In particular, monitoring a large area typically involves the deployment of a large number of sensors (e.g., video cameras) that must be monitored. Typically, surveillance systems include a control room which contains a large video wall and multiple screens displaying views from all of the surveillance cameras, and a set of buttons and joysticks that allow the operator to select and set up a particular view (see Figure 1 for an example of a typical control room) [14]. The feed from each camera is directed to a separate monitor/window and a human operator is used to monitor and control these cameras. The operator is responsible for developing an understanding of the real world from the sequence of 2D images provided by the cameras. Given the potentially large amount of data/information provided by each camera, this task becomes quite overwhelming and no longer feasible for a human operator as the number of sensors increases to dozens and perhaps even hundreds (for example, in Las Vegas, a typical casino contains approximately 1700 video cameras that are monitored by a number of security personnel [6]). Of course, the problem of overwhelming the control operator with too much information can be eliminated by increasing the number of operators. Although this is a simple solution, it is not always cost effective and requires further training resources. In addition, as the number of cameras increases it is unlikely that the monitors displaying the video feeds will scale accordingly and thus, switching between video feeds on a particular monitor cannot be avoided.
Furthermore, when considering a two-dimensional user interface to monitor and control the cameras, the cameras are spatially disconnected, or in other words, located in different locations within the environment [15]. Abruptly switching between the video feeds of spatially disconnected cameras can cause a human operator to lose their situation awareness, that is, the perception of the elements in the environment within a volume of time and space, the comprehension of their meaning, and the projection of their status in the near future [4], [5] (or in other words, knowing what is going on so you can figure out what to do [1]); following an event of interest then becomes difficult if not impossible, leading to confusion [15].

978-1-4244-2669-0/08/$25.00 2008 IEEE

Fig. 1. An example of a surveillance camera control room.

Ensuring the operator has quality situation awareness, and in particular a good sense of direction of the area being surveyed, is critical, particularly when the operator is required to direct resources to a specific location. Most operators come to acquire this awareness by monitoring the same area over and over again; operators can also walk through the surveyed area to get an actual feel for the area. The case is different when operators are called to monitor and control an instant ad-hoc surveillance system to which they have not been previously exposed (this is common in emergency response and military command and control operations). In such situations, the operator can easily lose his/her situation awareness and this can lead to dire consequences [3], [7].

Rather than increasing the number of human operators, the operator's task can be simplified and made more intuitive by employing three-dimensional user interfaces that include virtual and augmented reality methods and techniques, where the virtual world is combined with real-world data. Augmented reality can be defined as the merging of synthetic sensory information into a user's perception of a real environment, with the ultimate goal of combining the interactive real world with an interactive computer-generated world such that they both appear as one seamless environment [20]. Augmented reality joins together a number of technologies to display information to a user in a manner that instantly applies to a given task or situation, allowing for some uniquely tailored applications in specific circumstances [17]. Given that our world is three-dimensional, it is most intuitive that we interact with remote spaces within a three-dimensional virtual environment where the user is able to explore the spatial configuration of the environment and construct cognitive maps of the space [15]. Three-dimensional user interfaces have been employed in a number of applications. It has also been demonstrated that the user is capable of obtaining a better understanding of terrains by navigating using three-dimensional interfaces [2], [21].

Here we describe an ongoing research project that seeks the development of an instant audio/visual-based surveillance system that is intended to provide surveillance capabilities over a large area using a large number of sensors. Starting with a floor plan of the environment to be monitored, the system constructs an instant three-dimensional virtual model of it. Within this virtual environment, real-world information (e.g., the live video feed from the surveillance cameras) is incorporated in order to provide the operator with an intuitive and meaningful manner of monitoring the environment and controlling the surveillance system. The system itself is comprised of a number of components that operate in conjunction to perform the intended task. The primary components include: i) the audio system (sound localization via beamforming), ii) the video system, iii) the three-dimensional augmented reality-based graphical user interface, iv) networking (mesh network), and v) the quality of service (QoS) management system. In this paper, emphasis is placed on the three-dimensional augmented reality-based user interface, whose purpose is to allow a human operator to monitor and control a large number of cameras in a simple and intuitive manner without overwhelming them with information. This work builds on our previous work that examined the use of a virtual reality interface for the control of surveillance cameras [18].
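The five-component decomposition above can be sketched as a set of cooperating classes. This is purely an illustrative outline under our own naming assumptions (the paper does not publish an implementation); every class and method name here is hypothetical.

```python
# Illustrative sketch of the system's five primary components.
# All names are assumptions for exposition, not the authors' code.

class AudioSystem:
    """Component i: sound localization via beamforming."""
    def localize(self, samples):
        # Placeholder: a real implementation would run a
        # delay-and-sum beamformer over microphone-array samples.
        return (0.0, 0.0)  # (azimuth, elevation) estimate

class VideoSystem:
    """Component ii: video capture and intelligent-vision detection."""
    def detect_events(self, frame):
        # Placeholder for intruder/motion detection on one frame.
        return []

class ARInterface:
    """Component iii: 3D augmented reality-based GUI."""
    def render(self, model, streams):
        pass  # draw the virtual model with live feeds embedded

class MeshNetwork:
    """Component iv: wireless mesh networking between camera nodes."""
    def discover_nodes(self):
        return []

class QoSManager:
    """Component v: quality-of-service management for the streams."""
    def prioritize(self, streams):
        return streams

class SurveillanceSystem:
    """Composes the five components named in the text."""
    def __init__(self):
        self.audio = AudioSystem()
        self.video = VideoSystem()
        self.ui = ARInterface()
        self.network = MeshNetwork()
        self.qos = QoSManager()
```

The composition mirrors the paper's list i)-v); each stub marks where the corresponding subsystem described in Section III would plug in.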

The remainder of the paper is organized as follows. Section II provides background information regarding surveillance systems and the methods and techniques used to control the sensors within such systems. A description of the proposed system is provided in Section III, where details are provided regarding the system as a whole in addition to specifics regarding the three-dimensional user interface, audio, and video sub-systems. Finally, a summary and plans for future work are presented in Section IV.


II. BACKGROUND

Given the importance of surveillance and the problems associated with traditional 2D interfaces to control a large number of cameras, a number of research efforts have investigated the incorporation of a three-dimensional interface into a video surveillance system. Ou et al. [15] describe a video surveillance system that incorporates augmented reality (AR) technology. A virtual environment of the area being monitored is created. The user monitors the scene from a particular camera's viewpoint, as done in traditional surveillance systems. However, when the camera being monitored is switched, the scene seamlessly fades into a virtual world scene that is in sync with the real-life scene, allowing the user to fly from the view of the virtual scene to that of the new camera and the real-world view [15]. The virtual world is also used to provide real-world information to the user. Sebe et al. [19] describe a method of visualization based on augmented reality for a video surveillance system. Their method combines dynamic imagery with three-dimensional models in real-time to help users comprehend multiple time-dependent video streams from arbitrary views of the scene while accounting for situation awareness. Fleck et al. [6] have developed a distributed network of cameras for tracking and handover of multiple people in real-time. The tracking results are embedded as live textures within an integrated three-dimensional model of the world that can be viewed from arbitrary viewpoints irrespective of the user's movements. Ott et al. [14] have developed a security and surveillance system incorporating advanced mixed and virtual reality technologies that is intended to be applied to the surveillance of public areas such as stadiums, universities, etc. A mini-blimp carrying surveillance cameras is tele-operated using a virtual reality interface that employs force feedback. Their control room is comprised of a four-sided cave automatic virtual environment (CAVE) where the video streams from the mini-blimp are displayed simultaneously, thus fully immersing the operator. The operator is able to control the cameras via a joystick, and eye-tracking technology is used to select (focus on) part of the image. Kawasaki and Takai [11] describe a simple and inexpensive augmented reality-based surveillance system that can assist in providing an intuitive understanding of the environment being monitored. A three-dimensional model of the area to be monitored is developed in advance. From the video streams, moving objects are reconstructed and incorporated into the three-dimensional model.


III. PROPOSED SYSTEM

To ensure the operator builds and maintains situation awareness, our work focuses on the development of an instant three-dimensional audio/visual-based surveillance system that is intended to provide surveillance capabilities over a large area using a large number of camera sensors. The system requires and begins with a description of the environment being monitored in the form of a floor plan (currently, we are using an XML descriptive file for the floor plan). Based on this floor plan, the system builds an instant three-dimensional virtual model of the surveyed area. An example is provided in Figure 2. In this example, a single camera has been mounted in a typical university laboratory (a rectangular-shaped room of size 30 m × 40 m). Figure 2(a) illustrates the view from the camera while Figure 2(b) illustrates the resulting 3D model. The model can be made arbitrarily complex (e.g., modeling of the desks, etc.) but is purposely kept simple to provide a quick overview of the physical make-up of the environment being monitored. Furthermore, we assume that GPS location and orientation information of objects (and cameras in particular) in the environment is known. The surveillance cameras are small (lightweight), portable, and battery-powered, ensuring they can be easily deployed by a human or a robot within the environment being monitored. These cameras are deployed within a certain distance from each other. Each camera knows and communicates its location (e.g., position determined via GPS) and orientation to the central management center; cameras also have wireless meshing capability.

Fig. 2. Sample 3D model. (a) Environment being modeled and (b) 3D model representing the modeled room with a single surveillance camera. The scene being monitored is purposely kept simple for illustration purposes.
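The floor-plan-to-model step described above can be sketched as follows. The paper says only that an XML descriptive file is used; the schema below (a `room` element with dimensions and a `camera` element with position and orientation) is our own assumption, as are all element and attribute names.

```python
# Minimal sketch: parse a hypothetical XML floor plan into a simple
# 3D model (rooms as boxes, cameras as positioned/oriented points).
# The XML schema is an assumption; the paper does not specify it.
import xml.etree.ElementTree as ET

FLOOR_PLAN = """
<floorplan>
  <room name="lab" width="30" depth="40" height="3"/>
  <camera id="cam1" x="15.0" y="2.8" z="1.0" yaw="180"/>
</floorplan>
"""

def build_model(xml_text):
    """Return a dict model: rooms as (w, d, h) boxes, cameras as
    positioned points with a yaw orientation in degrees."""
    root = ET.fromstring(xml_text)
    rooms = [
        {"name": r.get("name"),
         "size": (float(r.get("width")),
                  float(r.get("depth")),
                  float(r.get("height")))}
        for r in root.findall("room")
    ]
    cameras = [
        {"id": c.get("id"),
         "position": (float(c.get("x")),
                      float(c.get("y")),
                      float(c.get("z"))),
         "yaw_deg": float(c.get("yaw"))}
        for c in root.findall("camera")
    ]
    return {"rooms": rooms, "cameras": cameras}

model = build_model(FLOOR_PLAN)
```

A renderer would extrude each room box and place a virtual camera glyph at each reported position, matching the deliberately simple model shown in Figure 2(b).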

Upon construction of the model, the system then uses one or more service discovery protocols [12], [13] to find each camera connected to the mesh network. For each real-world camera, a virtual representation will be added to the model and placed in the corresponding position. In other words, there will be a one-to-one mapping between each real-world camera and its corresponding virtual model. An audio/visual stream is sent from each camera node to the central management center. Greater details regarding the three-dimensional user interface and the audio and video components comprising the system are provided in the following sections.
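The one-to-one mapping step can be sketched as below. The record format for a discovered camera is an assumption (in the actual system a service discovery protocol over the mesh network would supply it), and the field names and stream URL are hypothetical.

```python
# Sketch of the one-to-one mapping between discovered real-world
# cameras and their virtual counterparts in the 3D model. The
# discovery record format and field names are assumptions.

def register_cameras(discovered):
    """For each discovered camera, create a virtual camera placed at
    the reported position and orientation, keyed by camera id."""
    virtual = {}
    for cam in discovered:
        virtual[cam["id"]] = {
            "position": cam["gps"],        # reported by the node
            "orientation": cam["yaw_deg"],
            "stream_url": cam["stream"],   # audio/visual feed source
        }
    return virtual

# Example discovery result (hypothetical node record).
discovered = [
    {"id": "cam1", "gps": (43.94, -78.89), "yaw_deg": 90.0,
     "stream": "rtsp://cam1/stream"},
]
virtual = register_cameras(discovered)
```

Keying the virtual cameras by the real camera's id keeps the mapping strictly one-to-one, so a feed selected in the interface always resolves to exactly one physical node.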

    A. Augmented Reality, Three-Dimensional User Interface

As previously described, the user interface consists of a three-dimensional model of the environment being monitored, including a number of virtual cameras (one virtual camera for each real-world camera), positioned in the model such tha...

