Intelligent video surveillance system: 3-tier context-aware
surveillance system with metadata
Yunyoung Nam & Seungmin Rho & Jong Hyuk Park
# Springer Science+Business Media, LLC 2010
Abstract This paper presents an intelligent video surveillance system with a metadata rule for the exchange of analyzed information. The metadata rule governs how analyzed information is exchanged between intelligent video surveillance systems that automatically analyze video data acquired from cameras; its purpose is to effectively index very large video surveillance databases and to unify search and management across distributed or heterogeneous surveillance systems. The system consists of low-level
context-aware, high-level context-aware and intelligent services to generate metadata for the
surveillance systems. Various contexts are acquired from physical sensors in monitoring areas
for the low-level context-aware system. The situation is recognized in the high-level context-
aware system by analyzing the context data collected in the low-level system. The system
provides intelligent services to track moving objects in Fields Of View (FOVs) and to recognize
human activities. Furthermore, the system supports real-time tracking of moving objects with
Panning, Tilting and Zooming (PTZ) cameras in overlapping and non-overlapping FOVs.
Keywords Object identification . Object localization . Object tracking . CCTV . Surveillance . PTZ camera . Metadata
1 Introduction
The advent of increased network bandwidth and improved image processing technologies
has led to the rapid emergence of intelligent video surveillance systems. Traditional
Multimed Tools Appl
DOI 10.1007/s11042-010-0677-x
Y. Nam
Center of Excellence for Ubiquitous System, Ajou University, Suwon, South Korea
e-mail: [email protected]
S. Rho (*)
School of Electrical Engineering, Korea University, Seoul, South Korea
e-mail: [email protected]
J. H. Park
Department of Computer Science and Engineering, Seoul National University of Science
and Technology, Seoul, South Korea
e-mail: [email protected]
closed-circuit television (CCTV) requires a relatively small number of operators to continuously monitor a significant number of cameras in areas that need security, such as military installations, roads, and airports. Intelligent surveillance systems can provide automated services, such as
abrupt incursion detection, shortest path recommendation using traffic jam analysis,
robbery monitoring, and people counting.

In general, video surveillance systems monitor specific activities by analyzing recorded video and multi-channel monitoring. It is a considerably tedious task for people to
monitor multi-channel division screens for 24 h. In addition, when the target object moves
from one area to another, a camera handover is required. Accordingly, intelligent
video surveillance systems not only provide real-time abnormal event detection by
analyzing images acquired from cameras, but also acquire continuous video sequences
from adjacent cameras using panning, tilting and zooming (PTZ) across multiple channels and areas, and immediately acquire video of the monitored area even in complex and vast environments.
More and more cameras are being put in place, forming huge surveillance systems. Within those systems, the required basic functionalities are identical. The video stream needs to be
transmitted from the site to an appropriate place where it will be archived. The video might
be viewed by a number of people and, in case of an incident, exported to the
appropriate authorities. In order to identify a requested stream, it is necessary to enhance the pure video stream with appropriate metadata. Information about the recording time and place, as well as the camera parameters used for recording, is sufficient to achieve basic interoperability. For efficient archiving, packaging of the video and metadata into a file format should be supported. That file format should also provide for the inclusion of user data and possibly additional MPEG-7 metadata. This metadata should provide key functionality to support the activities of CCTV manufacturers, installers and users. This paper describes the metadata rule for exchanging analyzed information in
surveillance systems.
The remaining sections of this paper are organized as follows. Section 2 introduces
related work. The system architecture and metadata scheme are given in Section 3.
Section 4 presents video analysis methods for metadata generation. Section 5 shows
physical prototyping of an intelligent video surveillance system. Finally, we conclude this
paper and discuss our future work in Section 6.
2 Background
2.1 Surveillance systems
Surveillance systems have played an important role in the management of public places relating
to safety and security. The explosion in the number of cameras to be monitored, the accruing cost of monitoring personnel, and the limited ability of human operators to sustain concentration severely limit the effectiveness of these systems. Recent advances in information and communication technologies, however, can potentially offer considerable improvements. Technology-supported surveillance is now widely deployed in modern urban environments [17].
In many surveillance applications, events of interest may occur rarely. For these unusual
events (or abnormal, rare events), it is difficult to collect sufficient training data for
supervised learning to develop unusual event models. In this case, many unusual event
detection algorithms [7, 25, 26, 28] that require a large amount of training data become
unsuitable. Several algorithms have been proposed to address the difficulty of unusual
event recognition with sparse training data. Zelnik-Manor et al. [25] and Zhong et al. [28]
clustered the divided video clips into different groups based on a similarity measure. The
groups with relatively small numbers of video clips are detected as unusual events.
However, since unusual events have insufficient training data, clusters for these events may not be sufficiently representative to predict future unusual events. Zhang et al. [26]
proposed a method by developing the unusual event model from that of usual events. This
method provides a hint on how to deal with the lack-of-training-data issue. However, they
obtained all unusual event models by adapting from the general usual-event model, while in
reality, the usual events and unusual events can be vastly different in nature. In this paper,
we develop abnormal activity recognition based on a motion history image and a moving
trajectory of objects.
2.2 Tampering detection
The position, angle, and power status of a camera can be changed arbitrarily, whether intentionally or accidentally. We implemented a tampering module based on image-difference calculation to
deal with this situation. Scene change detection is essential for the implementation of the
tampering module to detect tampering. Numerous scene change detection schemes have
been proposed. Nam [14] detected gradual scene changes, such as fade-in, fade-out, and overlaps, using B-spline curves. However, this is inappropriate for the tampering module implementation because it does not cope with abrupt scene changes. In addition, Zhao [27]
and Huang [9] detected moving picture scene changes using color histogram and pixel-
based features. Color and pixel features reflect global properties of the image used to detect a change of scene. Unlike tampering detection, scene change detection in moving images is more difficult than detecting a change in camera orientation. Thus, it needs to be modified for
our tampering method. In addition, Ribnick [20] used a short-term and long-term image
buffer to calculate image similarity using image chromaticity, L1-norm value and histogram
value. In this paper, we implemented tampering detection using the RGB color feature. The
trial and error method was used to set the threshold value.
An object in an image can be represented by the shape of a point, circle, square, contour,
and silhouette. A point-employed method describes an object as a set of points [22] or the
centroid [21]. In addition, an object also can be represented by an ellipse [5]. The contour
and silhouette method [24] represents the object by its contour, i.e., the silhouette boundary, together with the region obtained inside the contour. It is suitable for tracking complex non-rigid shapes. Last, the
object skeleton method [1] uses the medial axis transformation to the silhouette of an
image. In this paper, we represent an object as the representative point of an object and use
it to track the object.
2.3 Human activity recognition
Human activity recognition is a challenging task due to the non-rigidness of the human body and the lack of a clear categorical structure in human motion: a motion can often be classified into several categories simultaneously, because some activities have a natural compositional structure in terms of basic action units, and even the transition between simple activities naturally has temporal segments of ambiguity and overlap. Human motion often displays
multiple levels of increasing complexity that range from action-units to activities and
behaviors. The more complex the human behavior, the more difficult it becomes to perform
recognition in isolation. Motions can occur in various timescales and as they often exhibit
long-term dependencies, long contexts of observations may need to be considered for correct
classification at particular time-steps. For instance, the motion class at a current time-step may
be hard to predict using only the previous state and the current image observation, but may be less ambiguous if several neighboring states or observations, possibly both backward and forward in time, are considered. However, this computation would be hard to perform using a Hidden Markov Model (HMM) [18], where stringent independence assumptions among
observations are required to ensure computational tractability.
Many algorithms have been proposed to recognize human activities. Lv et al. [13] and
Ribeiro [19] focus on the selection of suitable feature sets for different events. Models, such
as HMM [7, 18], state machine [2], Adaboost [23], are also widely used for activity
recognition. However, most of the methods proposed in these works are inflexible when new activities must be added: they are trained or constructed to recognize predefined events. If new activities are added, the entire model has to be re-trained or the entire system has to be
re-constructed. Other methods [25, 28] tried to use a similarity metric, so that different events
can be clustered into different groups. This approach has more flexibility for newly added
events. However, due to the uncertain nature of the activity instances, it is difficult to find a
suitable feature set, such that all samples of an event are clustered closely around a center.
2.4 Object tracking based on multiple cameras
A single camera is insufficient to detect and track objects due to its limited field of view
(FOV) or occlusion. Many approaches address detection and tracking using overlapping or
non-overlapping multiple views. Tracking algorithms [3, 11] require camera calibration and a computation of the handoff of tracked objects between overlapped cameras. To accomplish this, a camera must share a considerable common FOV with the first camera. These requirements of overlapped cameras, however, are impractical due to the large number of cameras required and the physical constraints upon their placement. Thus, the system must be able to deal with non-overlapping regions, where an object is invisible to
any camera. Kettnaker and Zabih [12] presented a Bayesian solution to track objects across
multiple cameras where the cameras have a non-overlapping field of view. They used
constraints on the motion of the objects between cameras, namely positions, object
velocities and transition times. A Bayesian formulation of the problem was used to
reconstruct the paths of objects across multiple cameras. They required manual input of the
topology of allowable paths of movement and the transition probabilities. Huang and Russell [10] used a probabilistic approach that combines appearance matching
and transition times of cars in non-overlapping cameras with known topology. The
appearance of a car is evaluated using color, and the transition times are modeled as Gaussian distributions.
3 System architecture and metadata scheme
The surveillance system operates continuously or only as required to monitor a particular event. To develop the intelligent surveillance system, various contexts are acquired from
physical sensors in monitoring areas. The situation is recognized by analyzing the context
data collected from the physical sensors. Then, the surveillance system generates metadata.
The goal of the metadata rule is to effectively index very large video surveillance databases
and to enable unified search and management across distributed or heterogeneous surveillance systems.
3.1 System architecture
Figure 1 depicts our intelligent video surveillance system consisting of three different
layers. First, the low-level context module in the bottom layer collects the measurable data
from sensing hardware in the monitoring area. In this paper, the system receives audio-visual data and RFID tag data from cameras, microphones, and RFID readers. Data
acquired from various sensors are transmitted to the high-level context aware module. The
high-level context module recognizes human actions, such as hugging, snatching,
trespassing and tampering, by analyzing audio-visual data. The abnormal context aware
module judges whether the context is normal; if it is abnormal, it constructs the community
and gives an instruction for the appropriate services, as shown in Fig. 2.
Figure 3 shows the intelligent surveillance system architecture. The components are
described as follows.
Sensing Infrastructure: Sensing Infrastructure is used to collect various data from heterogeneous sensing hardware devices in a ubiquitous network environment. This
paper uses cameras, GPS as a location-awareness sensor, and a microphone as a noise sensor to acquire various data in the monitoring area. Data from the Sensing
Infrastructure are transmitted to the Context Aggregator and converted into our predefined format for context awareness.
Context Database: The Context Database stores the processed data from the Context Broker (used for future context awareness). The corresponding data are represented as a space safety index, a personal safety index, and so on.
Context Broker: The Context Broker stores the data transmitted from the Context Aggregator into the context DB. The data are processed for the usage of the corresponding space.
Community Manager: When an event occurs in a specific location according to our
predefined criteria, the Community Manager instructs its Service Invocator to
construct relevant services that are defined by Community Editor.
Community Editor: Community Editor constructs the community that makes a service
when a pre-defined event occurs in our monitoring area. The Community is
dynamically constructed and stored in the Community Template Repository.
Fig. 1 3-tier context-aware surveillance system (low-level context-aware, high-level context-aware, and abnormal context recognition layers)
Service Discoverer and Invocator: When an event occurs, Context Manager finds an
appropriate service through the Service Discoverer; and, if it exists, Service Invocator
performs the relevant service stored in the Community Template.
When a tampering action occurs in the monitoring area, sensing data are transmitted to
Context Broker and Context Broker commands Index Agent to update the latest space
safety index in the Index Database. A camera application computes its space safety index from the index DB. If the computed space safety index exceeds the threshold, the system commands the camera to monitor the area using the PTZ function. Finally, the user agent sends an alarm message to users based on the computed space safety index.
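The alarm flow above (sensing event, index update, threshold check, then PTZ command and user alert) can be sketched as follows; all names and the threshold value are illustrative assumptions, not interfaces from the paper.

```python
# Sketch of the space-safety-index alarm flow described above.
# Function names and the threshold are hypothetical, not from the paper.

SAFETY_THRESHOLD = 0.7  # assumed normalized alarm threshold

index_db = {}  # space id -> latest space safety index

def update_index(space_id, value):
    """Index Agent: record the latest space safety index."""
    index_db[space_id] = value

def on_sensing_event(space_id, value, notify, point_camera):
    """Context Broker: update the index and trigger services when needed."""
    update_index(space_id, value)
    if value > SAFETY_THRESHOLD:
        point_camera(space_id)   # command the PTZ camera to monitor the area
        notify(space_id, value)  # user agent sends an alarm message
        return True
    return False
```

A safe event leaves the index updated but triggers no services; an unsafe one triggers both.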
Fig. 3 System architecture
Fig. 2 Appropriate services for the intelligent surveillance system (topology-based context propagation, camera association, multi-camera tracking, object localization, and tampering detection)
A multi-camera tracking scheme is applied for a continuous video acquisition of the
object movement. After the physical setting on the camera and system environments,
Field of View (FOVs) of fixed and PTZ surveillance cameras are automatically set by
an image similarity comparison. The fixed cameras cover the Region Of Interest (ROI)
and analyze the real-time images for object representation and tracking in case of abnormal situations. The system automatically sends an alarm message to the
the system shows the object in images and indicates the object location in the safety
index, Google satellite map, and a 2D map. If the object moves in the FOV of the fixed
camera, a PTZ camera traces the object using PTZ control. Otherwise, if the object
disappears from the FOV of the fixed camera, our system attempts to obtain the object
through our autonomic collaboration method employing adjacent camera topology in
the non-overlapping zone.
The main purpose of the intelligent surveillance system is to provide real time event
detection based upon established rules. Monitoring and surveillance agents then receive
alerts in real time, allowing them to address threats and other events of importance
proactively within their environment. However, surveillance systems from different vendors have different established event rules and message-exchange rules. Thus, metadata standardization is required to enable intelligent surveillance systems to exchange analyzed data. The metadata rule helps exchange analyzed information between distributed or heterogeneous systems; we define it for intelligent video surveillance systems that automatically analyze video data acquired from cameras.
3.2 Metadata scheme
Surveillance metadata should be constructed with a camera unique ID, camera resolution,
power on/off status, and camera location information. When a moving object appears in the
FOV, the object's color feature is constructed as metadata. The color feature is classified into head, body, upper, and lower parts in the HSI color space. In addition, the metadata consist of a unique ID, size, object location, camera location, type, action, and
additional information. Figure 4 shows the schema diagram of metadata. In the next section,
we will describe audio-visual data analysis methods to generate metadata.
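A metadata record following the fields above might look like the following; the element names mirror Fig. 4, but every value is invented for illustration.

```python
# Hypothetical metadata record assembled from the fields listed above.
# All values (IDs, coordinates, colors) are invented for illustration.
metadata = {
    "MetadataID": "md-0001",
    "Camera": {
        "CameraID": "cam-07",
        "Resolution": "704x480",
        "Status": "on",
        "Location": {"lat": 37.28, "lon": 127.04},  # camera GPS location
    },
    "Object": [{
        "ObjectID": "obj-42",
        # per-part HSI-classified clothing colors, as described in the text
        "Color": {"Head": "black", "Body": "blue", "Upper": "blue", "Lower": "gray"},
        "Size": {"width": 45, "height": 170},
        "Location": {"x": 143, "y": 139},  # object position in the FOV
        "Type": "person",
        "Action": "walking",
        "Comment": "",
    }],
}
```

Such a record could then be serialized to XML against the schema in Fig. 4 for exchange between systems.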
4 Audio-visual data analysis methods
The framework described in this paper includes real-time video data analysis methods for an automated surveillance system: object identification, tampering detection, object size analysis, object location analysis, and moving object tracking. This section describes the audio-visual data analysis methods used to develop intelligent video surveillance and to generate metadata.
4.1 Object identification
We used a background subtraction scheme for object classification, which separates moving objects from the background. The background subtraction algorithm captures a
sequence of images containing moving objects from a static single camera and detects
moving objects from the reference background image. We statistically analyze the reference
Fig. 4 Schema diagram of metadata (the Metadata element, identified by MetadataID of type xs:ID, contains Object, Camera, Sync, File, and Comment elements; each Object, identified by ObjectID of type xs:ID, carries Color, Size, Location, Type, Action, and Comment fields)
background image in HSI colour space over fifty frames with different illuminations, and all pixels of the static background scene are modeled as Gaussian distributions with respect to their hue and saturation values. After this preprocessing of the background image, a sequence of images containing a moving human captured from a camera is converted into HSI colour images and subtracted from the reference background image. If the subtraction values are greater than the threshold values, which are derived from the variance values of the background image, those pixels are determined to belong to the foreground.
After background subtraction, the object is identified by moving direction and color
histogram of the object in our system. When objects move in the monitoring area,
pixel-level subtraction separates the background from the object image. However, the subtracted data contain numerous noisy, ungrouped pixels. These pixels are eliminated, and the remaining pixels are grouped into blobs that form a moving object. Figure 5 shows background extraction and object movement orientation
analysis.
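The per-pixel Gaussian background model described above can be sketched as follows, using a single channel for brevity (the paper models hue and saturation); the threshold multiplier k is an assumption.

```python
import statistics

# Minimal per-pixel Gaussian background model over one channel.
# Frames are flat lists of pixel values; the paper trains on 50 frames
# and thresholds on the per-pixel variance. k = 2.5 is an assumed value.

def train_background(frames):
    """Fit a (mean, stdev) pair for each pixel over the training frames."""
    model = []
    for pix in zip(*frames):  # gather each pixel's values across frames
        model.append((statistics.mean(pix), statistics.pstdev(pix)))
    return model

def foreground_mask(frame, model, k=2.5):
    """A pixel is foreground if it deviates more than k standard
    deviations from its background mean (floor of 1.0 avoids zero stdev)."""
    return [abs(v - m) > k * (s if s > 0 else 1.0)
            for v, (m, s) in zip(frame, model)]
```

The resulting boolean mask would then be denoised and grouped into blobs, as the paragraph above describes.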
4.2 Tampering detection
Image difference comparison is used for tampering detection and for camera position setting.
Figure 6 shows tampering detection; Fig. 6 (c) shows the subtraction result of Fig. 6 (a)
Fig. 4 (continued) (the Camera element, identified by CameraID of type xs:ID, carries Resolution, Status, and Location fields; the Color element carries Body, Head, Upper, and Lower fields)
and (b). In Fig. 6 (c), the unchanged part of Fig. 6 (a) and 6 (b) has close-to-zero RGB
values. When non-zero RGB pixels in the difference image exceed 80% of the entire image, an alarm message is initiated. When a tampering alarm is received, the system predicts the object's movement and controls the adjacent cameras to continuously acquire the object using the PTZ function. In our previous paper [15], an object movement routine was graphically presented considering the spatial relation of the cameras and the spatio-temporal relation of the object's appearance and disappearance.
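The 80% non-zero-pixel criterion described above can be sketched as follows; the per-channel noise tolerance is an assumption, since the paper sets its threshold by trial and error.

```python
def is_tampered(frame_a, frame_b, ratio=0.8, tol=10):
    """Flag tampering when more than `ratio` of the pixels differ between
    two frames. Frames are lists of (R, G, B) tuples; `tol` is an assumed
    per-channel noise margin (the paper tuned its threshold by trial and
    error, so this value is illustrative)."""
    changed = sum(
        1 for (r1, g1, b1), (r2, g2, b2) in zip(frame_a, frame_b)
        if abs(r1 - r2) > tol or abs(g1 - g2) > tol or abs(b1 - b2) > tol
    )
    return changed / len(frame_a) > ratio
```

Covering, repositioning, or powering off the camera changes nearly every pixel at once, which is exactly what this whole-image criterion detects.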
4.3 Object size and location analysis
The size of an object is determined by the distance to the object and the focal length of the camera. The distance to an object of unknown size can be determined using the
(a) Monitoring area (b) Tampering action (c) Image difference
Fig. 6 Tampering detection
Fig. 5 Background extraction and object movement orientation analysis
Multimed Tools Appl
7/27/2019 9fcfd50d1bd3461721
11/20
knowledge of the height of the camera and the bearing to the point where the object meets the ground. Therefore, the object size y is computed from the focal length f, the camera height y_c, and the tilting angle \theta_x as

y = y_c - \frac{y_c \left( f\sin\theta_x - (v_c - v_t)\cos\theta_x \right)\left( f\cos\theta_x + (v_c - v_b)\sin\theta_x \right)}{\left( f\sin\theta_x - (v_c - v_b)\cos\theta_x \right)\left( f\cos\theta_x + (v_c - v_t)\sin\theta_x \right)}, \qquad (1)

where v_c is the center coordinate of the object in a camera, v_t is the top coordinate of the object in a camera, and v_b is the bottom coordinate of the object in a camera.
The object location y' is computed from the object size y, the camera height h, and the camera location h':

y' = h' + (h - y)\tan\theta, \qquad (2)

where h' is the GPS location of the camera (Fig. 7).
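The geometry behind these formulas can be illustrated with the standard ground-plane relation: a pixel offset below the image center adds arctan(offset/f) to the camera's tilt, and the ray at that angle meets the ground at a computable distance. The following is a hedged sketch of that relation only, with illustrative names and values, not a transcription of eq. (1).

```python
import math

def ground_distance(cam_height, tilt_deg, v_offset, focal_px):
    """Horizontal distance from the camera to the ground point imaged
    `v_offset` pixels below the image center, for a camera at height
    `cam_height` (meters) tilted `tilt_deg` degrees below the horizontal
    with focal length `focal_px` (pixels). Illustrative geometry only;
    the paper's eq. (1) combines two such rays (object top and bottom)
    to recover object height."""
    angle = math.radians(tilt_deg) + math.atan2(v_offset, focal_px)
    return cam_height / math.tan(angle)
```

For a 3 m camera tilted 45 degrees, the image center ray meets the ground 3 m away, and a pixel offset equal to the focal length points straight down, giving a distance near zero.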
4.4 Activity recognition
Human activity is recognized by [4], as shown in Fig. 8. The method in [4] can recognize
human activities, such as walking, turning, punching and sitting.
The proposed system adopts several action classifiers according to the movement direction of the object, so as to recognize human actions view-invariantly. The system then selects a classifier based on the moving path of the target object. We train a Multi-Layer Perceptron (MLP) using 320 actions obtained from four subjects. When
a punching action occurs, the method sends an alarm message to our surveillance
system.
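The view-dependent classifier selection described above can be sketched as follows; the 90-degree direction bins and classifier names are assumptions for illustration, since the paper does not give the bin layout.

```python
# Sketch of view-dependent classifier selection: the object's movement
# direction (degrees) picks one of several direction-specific action
# classifiers. The 90-degree sectors and labels are assumed, not from
# the paper.

def select_classifier(direction_deg, classifiers):
    """Map a movement direction onto the classifier trained for that view."""
    bins = ["east", "north", "west", "south"]  # 90-degree sectors
    idx = int(((direction_deg % 360) + 45) // 90) % 4
    return classifiers[bins[idx]]
```

Each entry of `classifiers` would be an MLP trained on actions seen from roughly that viewing direction.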
4.5 Moving object tracking
In the case of a multi-camera tracking system, while one fixed camera shows the ROI,
the PTZ camera is controlled by analyzing the moving image of the fixed camera. At
(a) Side view (b) Top-down view (θ is the FOV)
Fig. 7 Calculation of object size and location in camera environments
this time, the PTZ camera must be controlled according to its angle of view and zoom level. Thus, physical position adjustment between the fixed and PTZ cameras is essential for object tracking. In this paper, the camera topology is adjusted automatically by image similarity comparison. The camera position setting
algorithm is as follows.
Algorithm 1: Camera Position Setting
1: SetCameraPosition(&FixedCamera, fZoomLev, fHeight);
2: SetCameraPosition(&PTZCamera, fZoomLev, fHeight);
3: image Rep[], FixedImg;
   FixedImg = SaveImgFromFixedCam();
   do {                        // pan the PTZ camera
       Rep[i++] = SaveImage();
   } while (panning from leftmost to rightmost is not finished);
4: CalculateImgDiff(Rep[], FixedImg);
5: SetPosition(Min(Rep[]));
We set the fixed camera position to the specific height and zoom level, as shown
above. The PTZ camera was calibrated using the fixed camera's height and zoom level. Then, we collected the representative images through the PTZ camera panning, which
covered entire monitoring areas. We set the PTZ camera location that satisfies the
minimum difference by calculating the differences between the fixed camera images
and the representative images.
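Steps 4-5 of Algorithm 1 amount to an argmin over image differences, which can be sketched as follows; flat pixel lists stand in for images, and all names are illustrative.

```python
# Sketch of the automatic position setting: compare each image captured
# while panning against the fixed camera's image and keep the pan angle
# with the minimum difference. Images are flat pixel lists here.

def image_diff(img_a, img_b):
    """Sum of absolute per-pixel differences between two images."""
    return sum(abs(a - b) for a, b in zip(img_a, img_b))

def best_pan_angle(fixed_img, captures):
    """captures: {pan_angle: image}. Returns the angle whose capture
    most closely matches the fixed camera's view."""
    return min(captures, key=lambda ang: image_diff(fixed_img, captures[ang]))
```

The chosen angle then becomes the PTZ camera's home position relative to the fixed camera.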
An object is detected by preprocessing, which subtracts the object from a
background image. The background image without any objects is stored, so that an
object can be extracted by subtracting object images from the background image when
needed. However, the background subtraction method cannot be started after the physical camera setup until no object appears in the background image. It also takes time to store the background image. In addition, the background image may need to be stored anew when the illumination changes or the scene is disturbed by light and wind. In this paper, we use the motion history method of the OpenCV library [16], which requires no background learning process, to trace moving objects by their centric points in real time.
The tracking process is as follows.
Fig. 8 Screenshot of activity recognition
Algorithm 2: Moving Object Tracking
Image buffer[];
Point objPoint[];
MotionSegmentation seg[];
int minDistance, nowobjPoint;
int angle[];
buffer[] = SaveImg();
cvCvtColor(buffer[], CV_BGR2GRAY);
cvAbsDiff(buffer[]);
seg[] = cvUpdateMotionHistory(buffer[], DURATION);
for (i = 1; i < Num(seg); i++)
{
    extractObjFromSegmentation(seg[i]);
}
objPoint[] = GetCenterPointofObj(seg[]);
angle[] = GetObjAngle(seg[]);
for (i = 1; i < Num(seg); i++)
{
    int tmpDistance;
    tmpDistance = CalculateEuclideanDistance(prevobjPoint, objPoint[i]);
    if (tmpDistance <= ACCEPTABLE_MIN_DISTANCE)
        nowobjPoint = Compare(prevAngle, objAngle[i]);
    else
        nowobjPoint = objPoint[i];
}
As depicted in algorithm 2, images are stored in a buffer, and then are converted into
gray-scale images. Our system obtains the motion history of two images using the image
difference calculation. We adopted the cvUpdateMotionHistory of the OpenCV API to
update the motion history. Motion history can be updated by the non-zero pixel silhouette
image when motion occurs in the image. In this paper, we used the 1 second time-stamp for
the image storage time and exclude the non-zero pixel silhouette images whose summed width and height are below 20 pixels.
As shown in Fig. 9, when objects move in the image for a specific period, a blue-marked
motion history is updated by the comparison of two image frames. We represent an object
as the center point in a circle that covers the whole object. In addition, a line from the
centric point to a circular arc shows the movement direction of the object, as shown in
Fig. 9. We can predict the centric point of an object in the next frame using the centric point
and direction of the object. First, we continuously store the center point and direction in the
buffer. The object's center point in the next frame is set to the candidate point with the shortest distance from the center point in the previous frame. The Euclidean
method [6] was used to measure distance between center points. In the case of an occlusion
of multiple objects, the movement direction between the previous frame and next frame is
used for the objects' identification.
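The matching step described above (nearest centroid, with movement direction as a tie-breaker under occlusion) can be sketched as follows; the distance threshold is an assumption.

```python
import math

# Sketch of the matching step: each detection in the next frame is matched
# to the previous center point by Euclidean distance; when several
# candidates are within an acceptable minimum distance (e.g. during
# occlusion), the movement direction breaks the tie. The threshold value
# is illustrative, not from the paper.

def match_object(prev_point, prev_angle, candidates, min_dist=15.0):
    """candidates: list of (point, angle_deg). Returns the matched point."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    close = [(p, a) for p, a in candidates if dist(prev_point, p) <= min_dist]
    if len(close) > 1:
        # ambiguous: prefer the candidate whose direction matches the history
        return min(close, key=lambda pa: abs(pa[1] - prev_angle))[0]
    if close:
        return close[0][0]
    # otherwise fall back to the nearest candidate overall
    return min(candidates, key=lambda pa: dist(prev_point, pa[0]))[0]
```

The matched point then becomes `prev_point` for the next frame, and its motion-history angle updates `prev_angle`.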
The centric point is used for Panning and Tilting the PTZ camera, as used in the following
algorithm, considering the zoom level difference between the fixed camera and the PTZ camera.
Algorithm 3: PTZ Function
1: int MovFactor, prevX, prevY, newX, newY;
2: MovFactor = (PTZCamZoomLev / FixedCamZoomLev);
3: newX = FixedCamCenterX + (ObjCenterX - FixedCamCenterX) * MovFactor;
4: newY = FixedCamCenterY + (ObjCenterY - FixedCamCenterY) * MovFactor;
5: DoPTZ(newX, newY);
We can calculate the degree of panning and tilting using the given coordinates depicted
in the function DoPTZ (newX, newY).
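The coordinate mapping of Algorithm 3, together with a hypothetical pixel-to-degrees conversion inside DoPTZ, can be sketched as follows; the degrees-per-pixel factor depends on the camera's field of view and is an assumption here.

```python
# Sketch of Algorithm 3's coordinate mapping plus an assumed
# pixel-to-degrees conversion for the pan/tilt command.

def ptz_target(obj, fixed_center, zoom_ratio):
    """Scale the object's offset from the fixed camera's center by the
    PTZ/fixed zoom-level ratio (Algorithm 3, steps 3-4)."""
    return (fixed_center[0] + (obj[0] - fixed_center[0]) * zoom_ratio,
            fixed_center[1] + (obj[1] - fixed_center[1]) * zoom_ratio)

def pan_tilt_degrees(target, center, deg_per_px=0.05):
    """Convert a pixel offset from the image center into pan/tilt degrees.
    `deg_per_px` depends on the lens field of view and is assumed."""
    return ((target[0] - center[0]) * deg_per_px,
            (target[1] - center[1]) * deg_per_px)
```

A small linear factor like this holds only near the optical axis; a production system would use the camera's calibrated FOV tables.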
5 Physical prototyping of intelligent video surveillance system
We performed experiments with seven CCD cameras (704x480 resolution) to evaluate the performance of our system. We used PCs with Intel 64-bit Xeon 3.2 GHz processors and 2 GB of RAM as the hardware platform, and Microsoft SQL Server 2000 as the
underlying DBMS. The system automatically recognizes various dangerous situations in public areas and classifies the safety level by means of the environment's safety index models using networked camera collaboration. Figure 10 shows color-based object
identification using two cameras and violence recognition using an acoustic sensor. In
Fig. 10 (a), the face of an entering person is detected by the Adaboost algorithm at the entrance. Each object is classified by the HSI color model. Thus, the intelligent
surveillance system identifies and tracks unauthenticated people by analyzing the color
and pattern of clothing. In Fig. 10 (b), dangerous situations are recognized by analyzing
audio and visual data. Our system detects abnormal situations using the decibel (dB)
levels of scream pitches.
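The decibel-based scream detection mentioned above can be sketched as follows; the RMS-level computation is standard, while the threshold value is an assumption rather than a figure from the paper.

```python
import math

# Hedged sketch of sound-level-based scream detection: compute the RMS
# level of an audio frame in decibels relative to full scale and compare
# it against a threshold. The threshold is an assumed value.

def rms_dbfs(samples):
    """RMS level of normalized samples (-1..1) in dBFS."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")

def is_scream(samples, threshold_db=-10.0):
    """Flag frames whose level exceeds the (assumed) alarm threshold."""
    return rms_dbfs(samples) > threshold_db
```

A real detector would also examine pitch, as the text notes, since loudness alone cannot distinguish a scream from, say, a passing truck.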
Figure 11 shows the physical prototype of the intelligent video surveillance system, composed of a screen (4.2 m x 2 m) with six projectors, one ceiling projector, four fixed cameras, four PTZ cameras, one speed dome camera, and two acoustic sensors. Figure 11
(a) Real image (b) Silhouette image
Fig. 9 Object tracking
(a) Object identification
(b) Abnormal situation detection
Fig. 10 Object identification and violence recognition using multiple sensors
(a) Control center
(b) Screenshot of ISS
(c) GIS using Google Earth
(d) USS monitor
Fig. 11 Physical prototyping
of intelligent video surveillance
system
(c) shows a satellite map that covers the entire Earth. When an accident
occurs, the system zooms into the accident area. The system indicates the accident point
using a red circle in a 2D map. We use the Google Earth API functions [8] to mark the
monitoring area for efficiency.
In Fig. 11 (d), the level of spatial importance is computed using space features and facilities. Based on this space safety index, we reconstruct the monitoring area when an
accident occurs in a specific space. For example, the safety index of an ATM facility is
higher than that of other areas. At this time, if violence happens in the ATM area, the level
of spatial importance is recalculated. Then, the system controls the adjacent camera to
monitor the ATM area using PTZ control. If violence occurs in the ATM area, the area is
marked with red in the 2D map.
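The recalculation-and-retasking step above can be sketched as follows. All indices, area names, and camera assignments here are hypothetical placeholders; the paper's actual space safety index model is not given in this section.

```python
# Hypothetical base safety indices per monitored area; an ATM area
# starts higher than ordinary areas, as described in the text.
BASE_SAFETY_INDEX = {"lobby": 1.0, "corridor": 1.0, "atm": 3.0}

# Hypothetical mapping of areas to their adjacent PTZ cameras.
ADJACENT_PTZ = {"lobby": "ptz-1", "corridor": "ptz-1", "atm": "ptz-2"}


def on_violence_event(area, indices=None, boost=2.0):
    """Recalculate the area's importance after a violence event and
    return the adjacent PTZ camera to retask toward that area."""
    indices = dict(indices or BASE_SAFETY_INDEX)
    indices[area] = indices[area] * boost  # importance rises with the event
    return indices, ADJACENT_PTZ[area]


indices, camera = on_violence_event("atm")
# The ATM area now outranks the others, so its adjacent camera is retasked.
```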
6 Conclusions
In this paper, we have developed an intelligent surveillance system that provides
various services, such as object identification, object size analysis, object localization,
tampering detection, activity recognition, and moving-object tracking. Because
surveillance systems from different vendors establish different event rules and
message-exchange rules, we have defined metadata rules for exchanging analyzed
information between distributed or heterogeneous surveillance systems. A 3-tier
context-awareness conceptual framework is presented to identify the design
principles of the intelligent surveillance system. Most importantly, the design
prototypes, as a convergence of computers and buildings, have shown the potential
for a profound transformation of design practice in smart spaces. The design
framework and the prototype implementations have served as a logical basis for
elaborating broad design concepts and intelligent video computing technologies
for future smart surveillance systems. In future work, we will improve the
robustness of our object identification methods and build an administrative mobile
device interface.
Acknowledgment This research was supported by the MKE (The Ministry of Knowledge Economy), Korea,
under the ITRC (Information Technology Research Center) support program supervised by the NIPA
(National IT Industry Promotion Agency) (NIPA-2010-C1090-1031-0004), and by the Ubiquitous
Computing and Network (UCN) Project, Knowledge and Economy Frontier R&D Program of the Ministry
of Knowledge Economy (MKE), the Korean government, as a result of UCN's subproject 10C2-T3-10M.
References
1. Ali A, Aggarwal J (2001) Segmentation and recognition of continuous human activity. In: Proceedings of the IEEE Workshop on Detection and Recognition of Events in Video, pp 28–35
2. Ayers D, Shah M (2001) Monitoring human behavior from video taken in an office environment. Image Vis Comput 19(12):833–846
3. Cai Q, Aggarwal JK (1996) Tracking human motion using multiple cameras. In: ICPR '96: Proceedings of the International Conference on Pattern Recognition, vol III. IEEE Computer Society, Washington, DC, USA, pp 68–72
4. Chae YN, Kim Y-H, Choi J, Cho K, Yang HS (2009) An adaptive sensor fusion based objects tracking and human action recognition for interactive virtual environments. In: VRCAI '09: Proceedings of the 8th International Conference on Virtual Reality Continuum and its Applications in Industry. ACM, New York, NY, USA, pp 357–362
5. Comaniciu D, Ramesh V, Meer P (2003) Kernel-based object tracking. IEEE Trans Pattern Anal Mach Intell 25:564–575
6. Danielsson P (1980) Euclidean distance mapping. Comput Graph Image Process 14(3):227–248
7. Duong T, Bui H, Phung D, Venkatesh S (2005) Activity recognition and abnormality detection with the switching hidden semi-Markov model. In: CVPR 2005: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol 1, pp 838–845
8. Google Earth API. [Online]. Available: http://code.google.com/apis/earth/
9. Huang C-L, Liao B-Y (2001) A robust scene-change detection method for video segmentation. IEEE Trans Circuits Syst Video Technol 11(12):1281–1288
10. Huang T, Russell S (1997) Object identification in a Bayesian context. In: IJCAI '97: Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence. Morgan Kaufmann, San Francisco, CA, USA, pp 1276–1282
11. Kelly PH, Katkere A, Kuramura DY, Moezzi S, Chatterjee S (1995) An architecture for multiple perspective interactive video. In: MULTIMEDIA '95: Proceedings of the Third ACM International Conference on Multimedia. ACM, New York, NY, USA, pp 201–212
12. Kettnaker V, Zabih R (1999) Counting people from multiple cameras. In: IEEE International Conference on Multimedia Computing and Systems, vol 2, pp 267–271
13. Lv F, Kang J, Nevatia R, Cohen I, Medioni G (2004) Automatic tracking and labeling of human activities in a video sequence. In: PETS 2004
14. Nam J, Tewfik A (2005) Detection of gradual transitions in video sequences using B-spline interpolation. IEEE Trans Multimedia 7(4):667–679
15. Nam Y, Ryu J, Joo Choi Y, Duke Cho W (2007) Learning spatio-temporal topology of a multi-camera network by tracking multiple people. World Acad Sci Eng Tech 4(4):254–259
16. OpenCV, Open Computer Vision Library. http://sourceforge.net/projects/opencvlibrary/
17. Petrushin V, Wei G, Ghani R, Gershman A (2005) Multiple sensor indoor surveillance: problems and solutions. In: IEEE Workshop on Machine Learning for Signal Processing, pp 349–354
18. Rabiner L (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 77(2):257–286
19. Ribeiro PC, Santos-Victor J (2005) Human activity recognition from video: modeling, feature selection and classification architecture. In: International Workshop on Human Activity Recognition and Modeling, pp 61–70
20. Ribnick E, Atev S, Masoud O, Papanikolopoulos N, Voyles R (2006) Real-time detection of camera tampering. In: AVSS '06: IEEE International Conference on Video and Signal Based Surveillance
21. Serby D, Meier E, Van Gool L (2004) Probabilistic object tracking using multiple features. In: ICPR 2004: Proceedings of the 17th International Conference on Pattern Recognition, vol 2, pp 184–187
22. Veenman C, Reinders M, Backer E (2001) Resolving motion correspondence for densely moving points. IEEE Trans Pattern Anal Mach Intell 23(1):54–72
23. Viola P, Jones M, Snow D (2005) Detecting pedestrians using patterns of motion and appearance. Int J Comput Vis 63:153–161
24. Yilmaz A, Li X, Shah M (2004) Contour-based object tracking with occlusion handling in video acquired using mobile cameras. IEEE Trans Pattern Anal Mach Intell 26(11):1531–1536
25. Zelnik-Manor L, Irani M (2001) Event-based analysis of video. In: CVPR 2001: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol 2, pp 123–130
26. Zhang D, Gatica-Perez D, Bengio S, McCowan I (2005) Semi-supervised adapted HMMs for unusual event detection. In: CVPR 2005: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol 1, pp 611–618
27. Zhao W, Wang J, Bhat D, Sakiewicz K, Nandhakumar N, Chang W (1999) Improving color based video shot detection. In: IEEE International Conference on Multimedia Computing and Systems, vol 2, pp 752–756
28. Zhong H, Shi J, Visontai M (2004) Detecting unusual activity in video. In: CVPR 2004: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol 2, pp 819–826
Yunyoung Nam received B.S., M.S., and Ph.D. degrees in Information and Computer Engineering from Ajou
University, Korea, in 2001, 2003, and 2007, respectively. He was a research engineer in the Center of
Excellence in Ubiquitous System from 2007 to 2009, and a post-doctoral researcher at Stony Brook
University, New York, in 2009. He is currently a research professor at Ajou University in Korea. He also
spent time as a visiting scholar at the Center of Excellence for Wireless & Information Technology (CEWIT),
Stony Brook University - State University of New York, Stony Brook, New York. His research interests
include multimedia databases, ubiquitous computing, image processing, pattern recognition, context-
awareness, conflict resolution, wearable computing, and intelligent video surveillance.
Seungmin Rho received his MS and PhD degrees in Information and Computer Engineering from Ajou
University, Korea, in 2003 and 2008, respectively. In 2008–2009, he was a Postdoctoral Research Fellow
at the Computer Music Lab of the School of Computer Science at Carnegie Mellon University. He is
currently working as a Research Professor at the School of Electrical Engineering at Korea University.
His research interests include databases, music retrieval, multimedia systems, machine learning,
knowledge management, and intelligent agent technologies. He has been a reviewer for Multimedia Tools
and Applications (MTAP), Journal of Systems and Software, and Information Science (Elsevier), and a
Program Committee member of over 10 international conferences. He has published 14 papers in journals
and book chapters and 21 in international conferences and workshops. He is listed in Who's Who in the World.
Dr. Jong Hyuk Park received his Ph.D. degree from the Graduate School of Information Security, Korea
University, Korea. From December 2002 to July 2007, Dr. Park was a research scientist at the R&D
Institute, Hanwha S&C Co., Ltd., Korea. From September 2007 to August 2009, he was a professor at
the Department of Computer Science and Engineering, Kyungnam University, Korea. He is now a professor
at the Department of Computer Science and Engineering, Seoul National University of Science and
Technology, Korea. Dr. Park has published about 100 research papers in international journals and
conferences. He has served as chair, program committee member, or organizing committee chair for many
international conferences and workshops. He is president of the Korea Information Technology Convergence
Society (KITCS) and editor-in-chief (EiC) of the International Journal of Information Technology,
Communications and Convergence (IJITCC), Inderscience. He was EiC of the International Journal of
Multimedia and Ubiquitous Engineering (IJMUE) and the International Journal of Smart Home (IJSH). He is
Associate Editor/Editor of 14 international journals, including 8 journals indexed by SCI(E). In addition, he
has served as a Guest Editor for international journals by publishers including Springer, Elsevier, John
Wiley, Oxford University Press, Hindawi, Emerald, and Inderscience. His research interests include security
and digital forensics, ubiquitous and pervasive computing, context awareness, and multimedia services. He
received the best paper award at the ISA-08 conference in April 2008, and outstanding leadership awards
from IEEE HPCC-09 and ISA-09 in June 2009.