
    Intelligent video surveillance system: 3-tier context-aware

    surveillance system with metadata

    Yunyoung Nam & Seungmin Rho & Jong Hyuk Park

    # Springer Science+Business Media, LLC 2010

    Abstract This paper presents an intelligent video surveillance system with the metadata rule

    for the exchange of analyzed information. We define the metadata rule for the exchange of

    analyzed information between intelligent video surveillance systems that automatically

analyze video data acquired from cameras. The metadata rule is to effectively index very

    large video surveillance databases and to unify searches and management between distributed

    or heterogeneous surveillance systems more efficiently. The system consists of low-level

    context-aware, high-level context-aware and intelligent services to generate metadata for the

    surveillance systems. Various contexts are acquired from physical sensors in monitoring areas

    for the low-level context-aware system. The situation is recognized in the high-level context-

    aware system by analyzing the context data collected in the low-level system. The system

    provides intelligent services to track moving objects in Fields Of View (FOVs) and to recognize

    human activities. Furthermore, the system supports real-time moving objects tracking with

    Panning, Tilting and Zooming (PTZ) cameras in overlapping and non-overlapping FOVs.

Keywords Object identification · Object localization · Object tracking · CCTV · Surveillance · PTZ camera · Metadata

    1 Introduction

    The advent of increased network bandwidth and improved image processing technologies

    has led to the rapid emergence of intelligent video surveillance systems. Traditional

    Multimed Tools Appl

    DOI 10.1007/s11042-010-0677-x

    Y. Nam

    Center of Excellence for Ubiquitous System, Ajou University, Suwon, South Korea

    e-mail: [email protected]

    S. Rho (*)School of Electrical Engineering, Korea University, Seoul, South Korea

    e-mail: [email protected]

    J. H. Park

    Department of Computer Science and Engineering, Seoul National University of Science

    and Technology, Seoul, South Korea

    e-mail: [email protected]


closed-circuit television (CCTV) requires relatively few operators to continuously monitor a significant number of cameras in areas that need security, such as military installations, roads, and airports. Intelligent surveillance systems can provide automated services, such as

    abrupt incursion detection, shortest path recommendation using traffic jam analysis,

robbery monitoring, and people counting.

In general, video surveillance systems monitor specific activities by analyzing recorded video and by multi-channel monitoring. It is a considerably tedious task for people to monitor multi-channel division screens for 24 h. In addition, when the target object moves from one area to another, a camera handover is required. Accordingly, intelligent video surveillance systems not only provide real-time abnormal event detection by analyzing images acquired from cameras, but also acquire continuous video sequences from adjacent cameras using panning, tilting and zooming (PTZ), supporting multi-channel, multi-area monitoring and immediate acquisition of video from complex and vast monitoring environments.

More and more cameras are being put in place, forming huge surveillance systems. Within those systems, the required basic functionalities are identical. The video stream needs to be transmitted from the site to an appropriate place where it will be archived. The video might be viewed by a number of persons and, in case of an incident, it could be exported to the appropriate authorities. In order to identify a requested stream, it is necessary to enhance the pure video stream with appropriate metadata. Information about the recording time and place, as well as the camera parameters used for recording, would be sufficient to achieve basic interoperability. For efficient archiving, packaging of the video and metadata into a file format should be supported. That file format should also provide for the inclusion of user data and possibly additional MPEG-7 metadata. This metadata should provide key functionality to support the activities of CCTV manufacturers, installers

    and users. This paper describes the metadata rule for exchanging analyzed information in

    surveillance systems.

    The remaining sections of this paper are organized as follows. Section 2 introduces

    related work. The system architecture and metadata scheme are given in Section 3.

    Section 4 presents video analysis methods for metadata generation. Section 5 shows

    physical prototyping of an intelligent video surveillance system. Finally, we conclude this

    paper and discuss our future work in Section 6.

    2 Background

    2.1 Surveillance systems

    Surveillance systems have played an important role in the management of public places relating

to safety and security. The explosion in the number of cameras to be monitored, the accruing cost of monitoring personnel, and the limited ability of human operators to sustain concentration severely limit the effectiveness of these systems. Advances in information and communication technologies can potentially offer considerable improvements. Technology is widely deployed to maintain surveillance in modern urban environments [17].

    In many surveillance applications, events of interest may occur rarely. For these unusual

    events (or abnormal, rare events), it is difficult to collect sufficient training data for

    supervised learning to develop unusual event models. In this case, many unusual event

    detection algorithms [7, 25, 26, 28] that require a large amount of training data become


    unsuitable. Several algorithms have been proposed to address the difficulty of unusual

    event recognition with sparse training data. Zelnik-Manor et al. [25] and Zhong et al. [28]

    clustered the divided video clips into different groups based on a similarity measure. The

    groups with relatively small numbers of video clips are detected as unusual events.

However, since unusual events have insufficient training data, clusters for these events may not be sufficiently representative to predict future unusual events. Zhang et al. [26]

    proposed a method by developing the unusual event model from that of usual events. This

    method provides a hint on how to deal with the lack-of-training-data issue. However, they

    obtained all unusual event models by adapting from the general usual-event model, while in

    reality, the usual events and unusual events can be vastly different in nature. In this paper,

    we develop abnormal activity recognition based on a motion history image and a moving

    trajectory of objects.

    2.2 Tampering detection

The position, angle, and power of a camera can be changed arbitrarily, either intentionally or

    accidentally. We implemented a tampering module using image difference calculation to

    deal with this situation. Scene change detection is essential for the implementation of the

    tampering module to detect tampering. Numerous scene change detection schemes have

been proposed. Nam [14] detected gradual scene changes, such as fade-in, fade-out, and overlaps, using B-spline interpolation. However, this is inappropriate for the tampering module

    implementation, because it does not cope with abrupt scene changes. In addition, Zhao [27]

and Huang [9] detected moving-picture scene changes using color histogram and pixel-based features. The color or pixel features reflect global properties of the image and are used to detect the change of a scene. Unlike tampering detection, scene change detection in moving images is more difficult than detecting a change in camera orientation. Thus, these methods need to be modified for

    our tampering method. In addition, Ribnick [20] used a short-term and long-term image

    buffer to calculate image similarity using image chromaticity, L1-norm value and histogram

    value. In this paper, we implemented tampering detection using the RGB color feature. The

    trial and error method was used to set the threshold value.

    An object in an image can be represented by the shape of a point, circle, square, contour,

    and silhouette. A point-employed method describes an object as a set of points [22] or the

    centroid [21]. In addition, an object also can be represented by an ellipse [5]. The contour

and silhouette method [24] represents the object by its contour, that is, the silhouette boundary, with the silhouette obtained inside the contour. It is suitable for tracking complex non-rigid shapes. Last, the

    object skeleton method [1] uses the medial axis transformation to the silhouette of an

    image. In this paper, we represent an object as the representative point of an object and use

    it to track the object.

    2.3 Human activity recognition

Human activity recognition is a challenging task due to the non-rigidness of the human body, as human motion lacks a clear categorical structure: a motion can often be classified into several categories simultaneously, because some activities have a natural compositional structure in terms of basic action units, and even the transition between simple activities naturally has temporal segments of ambiguity and overlap. Human motion often displays

    multiple levels of increasing complexity that range from action-units to activities and

    behaviors. The more complex the human behavior, the more difficult it becomes to perform

    recognition in isolation. Motions can occur in various timescales and as they often exhibit


    long-term dependencies, long contexts of observations may need to be considered for correct

    classification at particular time-steps. For instance, the motion class at a current time-step may

    be hard to predict using only the previous state and the current image observation alone, but

may be less ambiguous if several neighboring states or observations, possibly both backward and forward in time, are considered. However, this computation would be hard to perform using a Hidden Markov Model (HMM) [18], where stringent independence assumptions among

    observations are required to ensure computational tractability.

    Many algorithms have been proposed to recognize human activities. Lv et al. [13] and

    Ribeiro [19] focus on the selection of suitable feature sets for different events. Models, such

as HMMs [7, 18], state machines [2], and Adaboost [23], are also widely used for activity recognition. However, most of the methods proposed in these works are inflexible when adding new activities. They are trained or constructed to recognize predefined events. If new

    activities are added, the entire model has to be re-trained or the entire system has to be

    re-constructed. Other methods [25, 28] tried to use a similarity metric, so that different events

    can be clustered into different groups. This approach has more flexibility for newly added

    events. However, due to the uncertain nature of the activity instances, it is difficult to find a

    suitable feature set, such that all samples of an event are clustered closely around a center.

    2.4 Object tracking based on multiple cameras

    A single camera is insufficient to detect and track objects due to its limited field of view

    (FOV) or occlusion. Many approaches address detection and tracking using overlapping or

    non-overlapping multiple views. Tracking algorithms [3, 11] require camera calibration and

a computation of the handoff of tracked objects between overlapping cameras. To accomplish this, a camera must share a considerable common FOV with the first camera. These requirements of overlapping cameras, however, are impractical due to the large number of cameras required and the physical constraints upon their placement. Thus, the system must be able to deal with non-overlapping regions, where an object is invisible to

    any camera. Kettnaker and Zabih [12] presented a Bayesian solution to track objects across

    multiple cameras where the cameras have a non-overlapping field of view. They used

    constraints on the motion of the objects between cameras, which are positions, object

    velocities and transition times. A Bayesian formulation of the problem was used to

reconstruct the paths of objects across multiple cameras. They required manual input of the topology of allowable paths of movement and the transition probabilities. Huang and Russell [10] used a probabilistic approach that combines appearance matching and transition times of cars in non-overlapping cameras with known topology. The appearance of a car is evaluated using its color, and the transition times are modeled as Gaussian distributions.

    3 System architecture and metadata scheme

The surveillance system operates continuously or only as required to monitor a particular event. To develop the intelligent surveillance system, various contexts are acquired from

    physical sensors in monitoring areas. The situation is recognized by analyzing the context

    data collected from the physical sensors. Then, the surveillance system generates metadata.

    The goal of the metadata rule is to effectively index very large video surveillance databases

    and to enable unified searches and management between distributed or heterogeneous

    surveillance systems more efficiently.


    3.1 System architecture

    Figure 1 depicts our intelligent video surveillance system consisting of three different

    layers. First, the low-level context module in the bottom layer collects the measurable data

    from sensing hardware in the monitoring area. In this paper, the system receives audio-visual data and RFID tag data from cameras, microphones, and RFID readers. Data

    acquired from various sensors are transmitted to the high-level context aware module. The

    high-level context module recognizes human actions, such as hugging, snatching,

    trespassing and tampering, by analyzing audio-visual data. The abnormal context aware

    module judges whether the context is normal; if it is abnormal, it constructs the community

    and gives an instruction for the appropriate services, as shown in Fig. 2.

    Figure 3 shows the intelligent surveillance system architecture. The components are

    described as follows.

Sensing Infrastructure: Sensing Infrastructure is used to collect various data from heterogeneous sensing hardware devices in a ubiquitous network environment. In this paper, we used cameras, GPS as a location awareness sensor, and a microphone as a noise sensor to acquire various data in the monitoring area. Data from the Sensing Infrastructure are transmitted to the Context Aggregator and are converted into our predefined format for context awareness.

    Context Database: Context Database refers to the module in which the modified data from

    the Context Broker (which is used for the future awareness of context) are stored. The

    corresponding data are represented as space safety index, personal safety index and so on.

    Context Broker: Context Broker stores the data into the context DB that is transmitted from

    the Context Aggregator. Data are processed for the usage of the corresponding space.

    Community Manager: When an event occurs in a specific location according to our

    predefined criteria, Community Manager gives instruction to its Service Invocator to

    construct relevant services that are defined by Community Editor.

    Community Editor: Community Editor constructs the community that makes a service

    when a pre-defined event occurs in our monitoring area. The Community is

    dynamically constructed and stored in the Community Template Repository.

Fig. 1 3-tier context-aware surveillance system: low-level context-aware, high-level context-aware, and abnormal context recognition layers


    Service Discoverer and Invocator: When an event occurs, Context Manager finds an

    appropriate service through the Service Discoverer; and, if it exists, Service Invocator

    performs the relevant service stored in the Community Template.

    When a tampering action occurs in the monitoring area, sensing data are transmitted to

    Context Broker and Context Broker commands Index Agent to update the latest space

    safety index in the Index Database. A camera application computes its space safety index in

    the index DB. If the computed space safety index value exceeds the threshold, it commands

the camera to monitor the area using the PTZ function. Finally, the user agent sends an alarm message to users based on the computed space safety index.

Fig. 3 System architecture

Fig. 2 Appropriate services for the intelligent surveillance system: topology-based context propagation, camera association, multi-camera tracking, object localization, and tampering detection


    A multi-camera tracking scheme is applied for a continuous video acquisition of the

    object movement. After the physical setting on the camera and system environments,

Fields Of View (FOVs) of the fixed and PTZ surveillance cameras are automatically set by an image similarity comparison. The fixed cameras cover the Region Of Interest (ROI) and analyze the real-time images for object representation and tracking in case of abnormal situations. The system automatically sends an alarm message to the

    surveillance system when tampering or violence occurs. After receiving the message,

    the system shows the object in images and indicates the object location in the safety

    index, Google satellite map, and a 2D map. If the object moves in the FOV of the fixed

    camera, a PTZ camera traces the object using PTZ control. Otherwise, if the object

    disappears from the FOV of the fixed camera, our system attempts to obtain the object

    through our autonomic collaboration method employing adjacent camera topology in

    the non-overlapping zone.

    The main purpose of the intelligent surveillance system is to provide real time event

    detection based upon established rules. Monitoring and surveillance agents then receive

    alerts in real time, allowing them to address threats and other events of importance

proactively within their environment. However, surveillance systems from different vendors have different established event rules and message exchange rules. Thus, metadata

    standardization is required to enable the intelligent surveillance systems to exchange

    analyzed data. The metadata rule is to help exchange analyzed information between

    distributed systems or heterogeneous systems. We define the metadata rule for exchanging

    analyzed information between intelligent video surveillance systems that automatically

analyze video data acquired from cameras.

    3.2 Metadata scheme

    Surveillance metadata should be constructed with a camera unique ID, camera resolution,

    power on/off status, and camera location information. When a moving object appears in the

FOV, the object color feature is constructed as metadata. The object color feature is classified into head, body, upper, and lower parts in the HSI color space. In addition,

    metadata consists of a unique ID, size, object location, camera location, type, action, and

    additional information. Figure 4 shows the schema diagram of metadata. In the next section,

    we will describe audio-visual data analysis methods to generate metadata.
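To make the scheme concrete, the following Python sketch builds one metadata record using only the element names shown in Fig. 4. The exact XML layout (which fields are attributes, the namespace, and the sample values) is an assumption made for illustration, not the normative schema.

# A minimal sketch of a metadata record following the element names in Fig. 4.
# Attribute/element placement and all sample values are assumptions.
import xml.etree.ElementTree as ET

def build_metadata(camera_id, resolution, status, cam_location,
                   object_id, colors, size, obj_location, obj_type, action):
    root = ET.Element("Metadata", {"MetadataID": "M-0001"})

    cam = ET.SubElement(root, "Camera", {"CameraID": camera_id})
    ET.SubElement(cam, "Resolution").text = resolution
    ET.SubElement(cam, "Status").text = status            # e.g. "on" / "off"
    ET.SubElement(cam, "Location").text = cam_location    # e.g. GPS coordinates

    obj = ET.SubElement(root, "Object", {"ObjectID": object_id})
    color = ET.SubElement(obj, "Color")
    for part in ("Head", "Body", "Upper", "Lower"):        # HSI color per body part
        ET.SubElement(color, part).text = colors.get(part, "")
    ET.SubElement(obj, "Size").text = size
    ET.SubElement(obj, "Location").text = obj_location
    ET.SubElement(obj, "Type").text = obj_type             # e.g. "person"
    ET.SubElement(obj, "Action").text = action             # e.g. "walking"

    ET.SubElement(root, "Sync").text = "2010-01-30T16:09:08"
    return ET.tostring(root, encoding="unicode")

print(build_metadata("CAM-03", "704x480", "on", "37.28,127.04",
                     "OBJ-17", {"Upper": "120,80,90"}, "1.7m",
                     "37.28,127.05", "person", "walking"))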

    4 Audio-visual data analysis methods

The framework described in this paper includes real-time video data analysis methods for an automated surveillance system: object identification, tampering detection, object size analysis, object location analysis, and moving object tracking. This section describes audio-visual data analysis methods to develop intelligent video

    surveillance and to generate metadata.

    4.1 Object identification

We used the background subtraction scheme for object classification. This separates moving objects from the background. The background subtraction algorithm captures a

    sequence of images containing moving objects from a static single camera and detects

moving objects from the reference background image.

Fig. 4 Schema diagram of metadata. The Metadata element carries a MetadataID (xs:ID) and contains one or more Object elements (ObjectType), a Camera element (CameraType), a Sync element (SYNCType), a File element, and a Comment (xs:string). ObjectType carries an ObjectID (xs:ID) and contains Color (ColorType), Size, Location (LocationType), Type (restriction), Action (xs:string), and Comment (xs:string). CameraType carries a CameraID (xs:ID) and contains Resolution, Status (restriction), and Location (LocationType). ColorType contains Body, Head, Upper, and Lower elements.

We statistically analyze the reference


background image in HSI colour space over fifty frames with different illuminations, and all pixels of the static background scene image are modeled as Gaussian distributions with respect to the hue and saturation values. After this preprocessing of the background image, a sequence of images containing a moving human captured from a camera is converted into HSI colour images and subtracted from the reference background image. If the subtraction values are greater than the threshold values, which are derived from the variance values of the background image, those pixels are determined to belong to the foreground.
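The per-pixel Gaussian model on hue and saturation described above can be sketched in Python with OpenCV as follows. The HSV conversion stands in for HSI, and the deviation factor k and the morphological clean-up are assumed choices rather than the system's actual thresholds.

# Sketch of a per-pixel Gaussian background model on hue/saturation.
# HSV stands in for HSI; k (deviation factor) is an assumed value.
import cv2
import numpy as np

def build_background_model(frames):
    """Model mean and std of H and S over ~50 background frames."""
    hs = np.stack([cv2.cvtColor(f, cv2.COLOR_BGR2HSV)[:, :, :2].astype(np.float32)
                   for f in frames])
    return hs.mean(axis=0), hs.std(axis=0) + 1e-6

def foreground_mask(frame, mean, std, k=2.5):
    """Pixels whose H/S deviate more than k standard deviations are foreground."""
    hs = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)[:, :, :2].astype(np.float32)
    diff = np.abs(hs - mean)
    mask = (diff > k * std).any(axis=2).astype(np.uint8) * 255
    # Remove isolated noisy pixels before blob grouping.
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))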

    After background subtraction, the object is identified by moving direction and color

    histogram of the object in our system. When objects move in the monitoring area,

    pixel-level subtraction results in the separation of background and object image.

However, these subtracted data contain numerous useless noisy and ungrouped pixels. Thus, the noisy pixels are eliminated and the remaining pixels are grouped into blobs that represent a moving object. Figure 5 shows background extraction and object movement orientation

    analysis.

    4.2 Tampering detection

    Image difference comparison is used for tampering and the camera position setting.

Figure 6 shows tampering detection; Fig. 6 (c) shows the subtraction result of Fig. 6 (a) and (b). In Fig. 6 (c), the unchanged part of Fig. 6 (a) and (b) has close-to-zero RGB

values. When the non-zero pixels of the difference image exceed 80% of the entire image, an alarm message is initiated. When a tampering alarm is received, the system predicts the object's movement and controls the adjacent cameras to acquire continuous views of the object using the PTZ function. In our previous paper [15], an object

    movement routine was graphically presented considering the spatial relation of the camera

    and the time-spatial relation of the object appearance and disappearance.
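The image-difference check can be sketched as follows. The 80% ratio comes from the description above, while the small per-pixel noise tolerance is an assumed value.

# Sketch of the image-difference tampering check: compare the current frame to
# a reference view and raise an alarm when more than 80% of the pixels differ.
# tol (per-pixel noise tolerance) is an assumption.
import cv2
import numpy as np

def is_tampered(reference_bgr, current_bgr, ratio=0.8, tol=10):
    diff = cv2.absdiff(reference_bgr, current_bgr)
    changed = diff.max(axis=2) > tol             # pixel changed in any channel
    return changed.mean() > ratio                # True -> send alarm message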

    4.3 Object size and location analysis

Fig. 5 Background extraction and object movement orientation analysis

Fig. 6 Tampering detection: (a) monitoring area, (b) tampering action, (c) image difference

The size of an object is determined by the distance to the object and the focal length of the camera. The distance to an object of unknown size can be determined using the

    knowledge about the height of the camera and the bearing to the point where the object

    meets the ground. Therefore, object size y is computed by focal length f, camera height yc,

and tilting angle θx as

$$ y = \frac{f\,y_c\,\bigl(f\sin\theta_x - (v_c - v_t)\cos\theta_x\bigr)}{f\sin\theta_x - (v_c - v_b)\cos\theta_x} - \frac{f\,y_c}{(v_c - v_t)\sin\theta_x + f\cos\theta_x}, \qquad (1) $$

where vc is the center coordinate of the object in the image, vt is the top coordinate of the object, and vb is the bottom coordinate of the object.

An object location y′ is computed from the object size y, the camera location h′ and the camera height h:

$$ y' = h' + (h - y)\tan\theta, \qquad (2) $$

where h′ is the GPS location of the camera (Fig. 7).

    4.4 Activity recognition

Human activity is recognized using the method of [4], as shown in Fig. 8. The method in [4] can recognize

    human activities, such as walking, turning, punching and sitting.

The proposed system adopts several action classifiers according to the movement direction of the object in order to recognize human actions view-invariantly. Then, the proposed system selects a classifier based on the moving path of the target object. We train the Multi-Layer Perceptron (MLP) using 320 actions obtained from four subjects. When

    a punching action occurs, the method sends an alarm message to our surveillance

    system.
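The classifier-per-direction idea can be sketched as follows, with scikit-learn's MLPClassifier standing in for the Multi-Layer Perceptron. The feature representation, the direction bins, and the network size are assumptions made for illustration only.

# Sketch of a per-direction action classifier; MLPClassifier stands in for the
# paper's Multi-Layer Perceptron. Features, direction bins, and sizes are assumed.
import numpy as np
from sklearn.neural_network import MLPClassifier

class DirectionalActionRecognizer:
    def __init__(self, directions=("front", "left", "right", "back")):
        self.models = {d: MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)
                       for d in directions}

    def fit(self, direction, features, labels):
        """Train the classifier assigned to one movement direction."""
        self.models[direction].fit(features, labels)

    def predict(self, direction, features):
        """Select the classifier by the object's moving path, then classify."""
        return self.models[direction].predict(features)

# Usage sketch with random placeholder features (e.g. motion descriptors).
rng = np.random.default_rng(0)
rec = DirectionalActionRecognizer()
rec.fit("front", rng.normal(size=(80, 32)), rng.integers(0, 4, size=80))
print(rec.predict("front", rng.normal(size=(1, 32))))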

    4.5 Moving object tracking

Fig. 7 Calculation of object size and location in camera environments: (a) side view, (b) top-down view

In the case of a multi-camera tracking system, while one fixed camera shows the ROI, the PTZ camera is controlled by analyzing the moving image of the fixed camera. At this time, the PTZ camera must be controlled according to its angle of view and zoom level. Thus, additional physical position adjustment between the fixed and PTZ cameras is essential for object tracking. In this paper, the camera topology is adjusted automatically by image similarity comparison. The camera position setting

    algorithm is as follows.

    Algorithm 1: Camera Position Setting

1: SetCameraPosition(&FixedCamera, fZoomLev, fHeight);
2: SetCameraPosition(&PTZCamera, fZoomLev, fHeight);
3: image Rep[], FixedImg;
   FixedImg = SaveImgFromFixedCam();
   do {                                   // pan the PTZ camera
       Rep[i++] = SaveImage();
   } while (panning from leftmost to rightmost);
4: CalculateImgDiff(Rep[], FixedImg);
5: SetPosition(Min(Rep[]));               // pan position with the minimum image difference

We set the fixed camera position to a specific height and zoom level, as shown above. The PTZ camera was calibrated using the fixed camera's height and zoom level. Then, we collected representative images through PTZ camera panning, which covered the entire monitoring area. We set the PTZ camera location to the one that yields the minimum difference by calculating the differences between the fixed camera images and the representative images.
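The procedure of Algorithm 1 can be sketched in Python as follows. pan_to() and grab_frame() are placeholders for the vendor-specific PTZ control and capture calls, and the sum of absolute differences is used here as the image difference measure.

# Sketch of the camera position setting idea from Algorithm 1: pan the PTZ
# camera over candidate positions and keep the one whose image is most similar
# to the fixed camera's view. pan_to() and grab_frame() are placeholders.
import cv2
import numpy as np

def set_ptz_position(fixed_frame, pan_positions, pan_to, grab_frame):
    best_pan, best_diff = None, float("inf")
    for pan in pan_positions:                    # leftmost .. rightmost
        pan_to(pan)
        frame = grab_frame()
        diff = float(np.sum(cv2.absdiff(fixed_frame, frame)))
        if diff < best_diff:
            best_pan, best_diff = pan, diff
    pan_to(best_pan)                             # minimum-difference position
    return best_pan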

    An object is detected by preprocessing, which subtracts the object from a

    background image. The background image without any objects is stored, so that an

    object can be extracted by subtracting object images from the background image when

needed. However, the background subtraction method cannot be started immediately after the physical camera is set up; it must wait until no object appears in the scene, and it also takes time to store the background image. In addition, the background image may have to be stored again when the luminance or the scene changes because of light and wind. In this paper, we use the motion history method of the OpenCV library [16], which requires no background learning process, to trace moving objects by their centric points in real time.

    The tracking process is as follows.

    Fig. 8 Screenshot of activity recognition


Algorithm 2: Moving Object Tracking
Image buffer[];
Point objPoint[];
MotionSegmentation seg[];
int minDistance, nowobjPoint;
int angle[];

buffer[] = SaveImg();                                   // store incoming frames
cvCvtColor(buffer[], CV_BGR2GRAY);                      // convert to gray scale
cvAbsDiff(buffer[]);                                    // difference of consecutive frames
seg[] = cvUpdateMotionHistory(buffer[], DURATION);      // update motion history silhouettes
for (i = 1; i < Num(seg); i++) {
    extractObjFromSegmentation(seg[i]);
}
objPoint[] = GetCenterPointofObj(seg[]);
angle[] = GetObjAngle(seg[]);
for (i = 1; i < Num(seg); i++) {
    int tmpDistance;
    tmpDistance = CalculateEuclideanDistance(prevobjPoint, objPoint[i]);
    if (tmpDistance < ACCEPTABLE_MIN_DISTANCE)          // ambiguous (possibly occluded) candidate
        nowobjPoint = Compare(prevAngle, objAngle[i]);  // disambiguate by movement direction
    else
        nowobjPoint = objPoint[i];                      // nearest center point wins
}

As depicted in Algorithm 2, images are stored in a buffer and then converted into gray-scale images. Our system obtains the motion history of two images using the image difference calculation. We adopted cvUpdateMotionHistory of the OpenCV API to update the motion history. The motion history is updated by the non-zero pixel silhouette image when motion occurs in the image. In this paper, we used a 1-second time-stamp for the image storage time and excluded the non-zero pixel silhouette images whose summed width and height is below 20 pixels.

    As shown in Fig. 9, when objects move in the image for a specific period, a blue-marked

    motion history is updated by the comparison of two image frames. We represent an object

    as the center point in a circle that covers the whole object. In addition, a line from the

    centric point to a circular arc shows the movement direction of the object, as shown in

    Fig. 9. We can predict the centric point of an object in the next frame using the centric point

    and direction of the object. First, we continuously store the center point and direction in the

buffer. The object center point in the next frame is set to the point that has the shortest distance

    between object center points of the previous frame and the next frame. The Euclidean

    method [6] was used to measure distance between center points. In the case of an occlusion

of multiple objects, the movement direction between the previous frame and the next frame is used to identify the objects.
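A simplified version of this tracking step can be sketched as follows using plain frame differencing and image moments. The binarization threshold is an assumption, the direction-based occlusion handling is only noted in a comment, and the OpenCV motion-history bookkeeping of Algorithm 2 is omitted.

# Sketch of the tracking step: motion blobs from frame differencing, centroids
# from image moments, and nearest-centroid matching by Euclidean distance.
# The binarization threshold and min_size are assumed values.
import cv2
import numpy as np

def motion_centroids(prev_gray, gray, min_size=20):
    diff = cv2.absdiff(prev_gray, gray)
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    centroids = []
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        if w + h < min_size:                      # drop tiny silhouettes
            continue
        m = cv2.moments(c)
        if m["m00"] > 0:
            centroids.append((m["m10"] / m["m00"], m["m01"] / m["m00"]))
    return centroids

def match_object(prev_point, centroids):
    """Pick the centroid nearest to the previous center point (Euclidean)."""
    if not centroids:
        return prev_point
    dists = [np.hypot(cx - prev_point[0], cy - prev_point[1]) for cx, cy in centroids]
    # When several candidates are almost equally close (occlusion), the full
    # system additionally compares movement directions; here we take the nearest.
    return centroids[int(np.argmin(dists))]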

    The centric point is used for Panning and Tilting the PTZ camera, as used in the following

    algorithm, considering the zoom level difference between the fixed camera and the PTZ camera.


    Algorithm 3: PTZ Function

    1: int MovFactor, prevX, prevY, newX, newY;

    2: MovFactor = (PTZCamZoomLev/FixedCamZoomLev);

3: newX = FixedCamCenterX + (ObjCenterX - FixedCamCenterX) * MovFactor;
4: newY = FixedCamCenterY + (ObjCenterY - FixedCamCenterY) * MovFactor;

    5: DoPTZ (newX, newY);

    We can calculate the degree of panning and tilting using the given coordinates depicted

    in the function DoPTZ (newX, newY).
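The mapping of Algorithm 3 can be written compactly as follows. The conversion from pixel offsets to pan and tilt degrees depends on the camera's field of view, so the per-pixel degree factors in this sketch are illustrative assumptions rather than the system's calibration.

# Python rendering of the mapping in Algorithm 3: scale the object's offset
# from the fixed camera's image center by the zoom-level ratio, then convert
# pixel offsets to pan/tilt degrees. deg_per_px_* are assumed factors.
def ptz_target(obj_center, fixed_center, fixed_zoom, ptz_zoom,
               deg_per_px_x=0.1, deg_per_px_y=0.1):
    mov_factor = ptz_zoom / fixed_zoom
    new_x = fixed_center[0] + (obj_center[0] - fixed_center[0]) * mov_factor
    new_y = fixed_center[1] + (obj_center[1] - fixed_center[1]) * mov_factor
    pan_deg = (new_x - fixed_center[0]) * deg_per_px_x
    tilt_deg = (new_y - fixed_center[1]) * deg_per_px_y
    return pan_deg, tilt_deg

# Example: object at (400, 300) in a 704x480 fixed view, PTZ zoomed 2x.
print(ptz_target((400, 300), (352, 240), fixed_zoom=1.0, ptz_zoom=2.0))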

    5 Physical prototyping of intelligent video surveillance system

We performed experiments with seven CCD cameras (704×480 resolution) to evaluate the performance of our system. We used PCs with Intel 64-bit Xeon 3.2 GHz processors

    and 2 GB of RAM as the hardware platform, and Microsoft SQL Server 2000 as the

    underlying DBMS. The system automatically recognizes various dangerous situations in

public areas and classifies the safety level by means of the environment's safety index

    models using network camera collaboration. Figure 10 shows color-based object

    identification using two cameras and violence recognition using an acoustic sensor. In

Fig. 10 (a), the face of an entering object is detected by the Adaboost algorithm at the

    entrance. Each object is classified by the HSI color model. Thus, the intelligent

    surveillance system identifies and tracks unauthenticated people by analyzing the color

    and pattern of clothing. In Fig. 10 (b), dangerous situations are recognized by analyzing

    audio and visual data. Our system detects abnormal situations using the decibel (dB)

    levels of scream pitches.
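The acoustic part of this check can be sketched as follows: the RMS level of short audio frames is converted to decibels and frames above a threshold are flagged. The threshold, the frame length, and the absence of any pitch analysis are assumptions of this sketch.

# Sketch of a loudness check on audio frames: RMS level in dB relative to full
# scale, flagged against an assumed threshold. Scream pitch analysis is omitted.
import numpy as np

def loud_frames(samples, rate, frame_ms=50, threshold_db=-10.0):
    """samples: mono float array in [-1, 1]; returns times (s) of loud frames."""
    frame_len = int(rate * frame_ms / 1000)
    times = []
    for start in range(0, len(samples) - frame_len, frame_len):
        frame = samples[start:start + frame_len]
        rms = np.sqrt(np.mean(frame ** 2)) + 1e-12
        db = 20 * np.log10(rms)                  # dBFS, 0 dB = full scale
        if db > threshold_db:
            times.append(start / rate)
    return times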

Fig. 9 Object tracking: (a) real image, (b) silhouette image

Fig. 10 Object identification and violence recognition using multiple sensors: (a) object identification, (b) abnormal situation detection

Fig. 11 Physical prototyping of the intelligent video surveillance system: (a) control center, (b) screenshot of ISS, (c) GIS using Google Earth, (d) USS monitor

Figure 11 shows the physical prototype of the intelligent video surveillance system composed of a screen (4.2 m × 2 m) with six projectors, one ceiling projector, four fixed cameras, four PTZ cameras, one speed dome camera and two acoustic sensors. Figure 11 (c) shows a satellite map that covers the entire boundary of the Earth. When an accident

    occurs, the system zooms into the accident area. The system indicates the accident point

    using a red circle in a 2D map. We use the Google Earth API functions [8] to mark the

    monitoring area for efficiency.

In Fig. 11 (d), the level of spatial importance is computed using space features and facilities. Based on this space safety index, we reconstruct the monitoring area when an

    accident occurs in a specific space. For example, the safety index of an ATM facility is

    higher than that of other areas. At this time, if violence happens in the ATM area, the level

    of spatial importance is recalculated. Then, the system controls the adjacent camera to

    monitor the ATM area using PTZ control. If violence occurs in the ATM area, the area is

    marked with red in the 2D map.

    6 Conclusions

    In this paper, we have developed an intelligent surveillance system that provides

    various services, such as object identification, object size analysis, object localization,

tampering detection, activity recognition, and moving object tracking. Surveillance systems from different vendors have different established event rules and message exchange rules. Thus, we have defined the metadata rules to exchange analyzed information

    between distributed surveillance systems or heterogeneous surveillance systems. A 3-

    tier context-awareness conceptual framework is presented to identify the design

    principles of the intelligent surveillance system. Most importantly, the design

prototypes, as the convergence of computers and buildings, have shown the potential for a profound transformation of design practice in smart space design. The design

    framework and the implementation of the prototypes have served as a logical basis to

    elaborate broad design concepts and intelligent video computing technologies that may

    be performed toward future smart surveillance systems. In future work, we will improve

    robust object identification methods and create an administrative mobile device

    interface.

    Acknowledgment

This research was supported by the MKE (The Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the NIPA

    (National IT Industry Promotion Agency) (NIPA-2010-C1090-1031-0004) and this research is also

    supported by the Ubiquitous Computing and Network (UCN) Project, Knowledge and Economy Frontier

R&D Program of the Ministry of Knowledge Economy (MKE), the Korean government, as a result of UCN's

    subproject 10C2-T3-10M.

    References

1. Ali A, Aggarwal J (2001) Segmentation and recognition of continuous human activity. In: Detection and Recognition of Events in Video, 2001. Proceedings. IEEE Workshop on, pp 28–35
2. Ayers D, Shah M (2001) Monitoring human behavior from video taken in an office environment. Image Vis Comput 19(12):833–846
3. Cai Q, Aggarwal JK (1996) Tracking human motion using multiple cameras. In: ICPR '96: Proceedings of the International Conference on Pattern Recognition (ICPR '96), Volume III. Washington, DC, USA: IEEE Computer Society, pp 68–72
4. Chae YN, Kim Y-H, Choi J, Cho K, Yang HS (2009) An adaptive sensor fusion based objects tracking and human action recognition for interactive virtual environments. In: VRCAI '09: Proceedings of the 8th International Conference on Virtual Reality Continuum and its Applications in Industry. New York, NY, USA: ACM, pp 357–362
5. Comaniciu D, Ramesh V, Meer P (2003) Kernel-based object tracking. IEEE Trans Pattern Anal Mach Intell 25:564–575
6. Danielsson P (1980) Euclidean distance mapping. Comput Graph Image Process 14(3):227–248
7. Duong T, Bui H, Phung D, Venkatesh S (2005) Activity recognition and abnormality detection with the switching hidden semi-Markov model. In: Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol 1, pp 838–845
8. Google Earth API. Online. Available: http://code.google.com/apis/earth/
9. Huang C-L, Liao B-Y (2001) A robust scene-change detection method for video segmentation. IEEE Trans Circuits Syst Video Technol 11(12):1281–1288
10. Huang T, Russell S (1997) Object identification in a Bayesian context. In: IJCAI '97: Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., pp 1276–1282
11. Kelly PH, Katkere A, Kuramura DY, Moezzi S, Chatterjee S (1995) An architecture for multiple perspective interactive video. In: MULTIMEDIA '95: Proceedings of the Third ACM International Conference on Multimedia. New York, NY, USA: ACM, pp 201–212
12. Kettnaker V, Zabih R (1999) Counting people from multiple cameras. In: Multimedia Computing and Systems, 1999. IEEE International Conference on, vol 2, pp 267–271
13. Lv F, Kang J, Nevatia R, Cohen I, Medioni G (2004) Automatic tracking and labeling of human activities in a video sequence. In: PETS '04
14. Nam J, Tewfik A (2005) Detection of gradual transitions in video sequences using B-spline interpolation. IEEE Trans Multimedia 7(4):667–679
15. Nam Y, Ryu J, Joo Choi Y, Duke Cho W (2007) Learning spatio-temporal topology of a multi-camera network by tracking multiple people. World Acad Sci Eng Tech 4(4):254–259
16. OpenCV, Open Computer Vision Library. http://sourceforge.net/projects/opencvlibrary/
17. Petrushin V, Wei G, Ghani R, Gershman A (2005) Multiple sensor indoor surveillance: problems and solutions. In: Machine Learning for Signal Processing, 2005 IEEE Workshop on, pp 349–354
18. Rabiner L (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 77(2):257–286
19. Ribeiro PC, Santos-Victor J (2005) Human activity recognition from video: modeling, feature selection and classification architecture. In: International Workshop on Human Activity Recognition and Modeling, pp 61–70
20. Ribnick E, Atev S, Masoud O, Papanikolopoulos N, Voyles R (2006) Real-time detection of camera tampering. In: Video and Signal Based Surveillance, 2006. AVSS '06. IEEE International Conference on
21. Serby D, Meier E, Van Gool L (2004) Probabilistic object tracking using multiple features. In: Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on, vol 2, pp 184–187
22. Veenman C, Reinders M, Backer E (2001) Resolving motion correspondence for densely moving points. IEEE Trans Pattern Anal Mach Intell 23(1):54–72
23. Viola P, Jones M, Snow D (2005) Detecting pedestrians using patterns of motion and appearance. Int J Comput Vis 63:153–161
24. Yilmaz A, Li X, Shah M (2004) Contour-based object tracking with occlusion handling in video acquired using mobile cameras. IEEE Trans Pattern Anal Mach Intell 26(11):1531–1536
25. Zelnik-Manor L, Irani M (2001) Event-based analysis of video. In: Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, vol 2, pp 123–130
26. Zhang D, Gatica-Perez D, Bengio S, McCowan I (2005) Semi-supervised adapted HMMs for unusual event detection. In: Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol 1, pp 611–618
27. Zhao W, Wang J, Bhat D, Sakiewicz K, Nandhakumar N, Chang W (1999) Improving color based video shot detection. In: Multimedia Computing and Systems, 1999. IEEE International Conference on, vol 2, pp 752–756
28. Zhong H, Shi J, Visontai M (2004) Detecting unusual activity in video. In: Computer Vision and Pattern Recognition, Proceedings of the 2004 IEEE Computer Society Conference on, vol 2, pp 819–826


Yunyoung Nam received B.S., M.S. and Ph.D. degrees in Information and Computer Engineering from Ajou University, Korea, in 2001, 2003, and 2007, respectively. He was a research engineer in the Center of Excellence in Ubiquitous Systems from 2007 to 2009. He was a post-doctoral researcher at Stony Brook University, New York, in 2009. He is currently a research professor at Ajou University in Korea. He also

    spent time as a visiting scholar at Center of Excellence for Wireless & Information Technology (CEWIT),

    Stony Brook University - State University of New York Stony Brook, New York. His research interests

    include multimedia database, ubiquitous computing, image processing, pattern recognition, context-

    awareness, conflict resolution, wearable computing, and intelligent video surveillance.

Seungmin Rho received his M.S. and Ph.D. degrees in Information and Computer Engineering from Ajou University, Korea, in 2003 and 2008, respectively. In 2008–2009, he was a Postdoctoral Research Fellow at the Computer Music Lab of the School of Computer Science at Carnegie Mellon University. He is currently working as a Research Professor at the School of

    Electrical Engineering in Korea University. His research interests include database, music retrieval,

multimedia systems, machine learning, knowledge management and intelligent agent technologies. He has been a reviewer for Multimedia Tools and Applications (MTAP), Journal of Systems and Software,

    Information Science (Elsevier), and Program Committee member in over 10 international conferences. He

    has published 14 papers in journals and book chapters and 21 in international conferences and workshops.

He is listed in Who's Who in the World.


Dr. Jong Hyuk Park received his Ph.D. degree from the Graduate School of Information Security, Korea University, Korea. From December 2002 to July 2007, Dr. Park was a research scientist at the R&D Institute, Hanwha S&C Co., Ltd., Korea. From September 2007 to August 2009, he was a professor at

    the Department of Computer Science and Engineering, Kyungnam University, Korea. He is now a professor

    at the Department of Computer Science and Engineering, Seoul National University of Science and

    Technology, Korea. Dr. Park has published about 100 research papers in international journals and

    conferences. He has been serving as chairs, program committee, or organizing committee chair for many

    international conferences and workshops. He is a president of Korea Information Technology Convergence

    Society (KITCS). He is editor-in-chief (EiC) of International Journal of Information Technology,

    Communications and Convergence (IJITCC), InderScience. He was EiCs of the International Journal of

    Multimedia and Ubiquitous Engineering (IJMUE) and the International Journal of Smart Home (IJSH). He is

    Associate Editor / Editor of 14 international journals including 8 journals indexed by SCI(E). In addition, he

    has been serving as a Guest Editor for international journals by some publishers: Springer, Elsevier, John

Wiley, Oxford Univ. Press, Hindawi, Emerald, Inderscience. His research interests include security and digital forensics, ubiquitous and pervasive computing, context awareness, multimedia services, etc. He got

    the best paper award in ISA-08 conference, April, 2008. And he got the outstanding leadership awards from

    IEEE HPCC-09 and ISA-09, June, 2009.
