
    Intelligent video surveillance system: 3-tier context-aware

    surveillance system with metadata

    Yunyoung Nam & Seungmin Rho & Jong Hyuk Park

    # Springer Science+Business Media, LLC 2010

    Abstract This paper presents an intelligent video surveillance system with the metadata rule

    for the exchange of analyzed information. We define the metadata rule for the exchange of

    analyzed information between intelligent video surveillance systems that automatically

analyze video data acquired from cameras. The metadata rule is to effectively index very

    large video surveillance databases and to unify searches and management between distributed

    or heterogeneous surveillance systems more efficiently. The system consists of low-level

    context-aware, high-level context-aware and intelligent services to generate metadata for the

    surveillance systems. Various contexts are acquired from physical sensors in monitoring areas

    for the low-level context-aware system. The situation is recognized in the high-level context-

    aware system by analyzing the context data collected in the low-level system. The system

    provides intelligent services to track moving objects in Fields Of View (FOVs) and to recognize

    human activities. Furthermore, the system supports real-time moving objects tracking with

    Panning, Tilting and Zooming (PTZ) cameras in overlapping and non-overlapping FOVs.

Keywords Object identification · Object localization · Object tracking · CCTV · Surveillance · PTZ camera · Metadata

    1 Introduction

    The advent of increased network bandwidth and improved image processing technologies

    has led to the rapid emergence of intelligent video surveillance systems. Traditional

    Multimed Tools Appl

    DOI 10.1007/s11042-010-0677-x

    Y. Nam

    Center of Excellence for Ubiquitous System, Ajou University, Suwon, South Korea

    e-mail: [email protected]

    S. Rho (*)School of Electrical Engineering, Korea University, Seoul, South Korea

    e-mail: [email protected]

    J. H. Park

    Department of Computer Science and Engineering, Seoul National University of Science

    and Technology, Seoul, South Korea

    e-mail: [email protected]


closed-circuit television (CCTV) requires relatively few operators to continuously monitor a significant number of cameras in areas that need security, such as military installations, roads, and airports. Intelligent surveillance systems can provide automated services, such as

    abrupt incursion detection, shortest path recommendation using traffic jam analysis,

robbery monitoring, and people counting.

In general, video surveillance systems monitor specific activities by analyzing recorded video and by multi-channel monitoring. It is a considerably tedious task for people to monitor multi-channel division screens for 24 h. In addition, when the target object moves from one area to another, a camera handover is required. Accordingly, intelligent video surveillance systems not only provide real-time abnormal event detection by analyzing images acquired from cameras, but also acquire continuous video sequences from adjacent cameras using panning, tilting and zooming (PTZ), supporting multi-channel, multi-area monitoring and immediate acquisition of video from complex and vast monitoring environments.

More and more cameras are being put in place, forming huge surveillance systems. Within those systems, the required basic functionalities are identical. The video stream needs to be transmitted from the site to an appropriate place where it will be archived. The video might be viewed by a number of persons and, in case of an incident, it could be exported to the appropriate authorities. In order to identify a requested stream, it is necessary to enhance the pure video stream with appropriate metadata. Information about the recording time and place, as well as the camera parameters used for recording, would be sufficient to achieve basic interoperability. For efficient archiving, packaging of the video and metadata into a file format should be supported. That file format should also provide for the inclusion of user data and possibly additional MPEG-7 metadata. This metadata should provide key functionality to support the activities of CCTV manufacturers, installers

    and users. This paper describes the metadata rule for exchanging analyzed information in

    surveillance systems.

    The remaining sections of this paper are organized as follows. Section 2 introduces

    related work. The system architecture and metadata scheme are given in Section 3.

    Section 4 presents video analysis methods for metadata generation. Section 5 shows

    physical prototyping of an intelligent video surveillance system. Finally, we conclude this

    paper and discuss our future work in Section 6.

    2 Background

    2.1 Surveillance systems

    Surveillance systems have played an important role in the management of public places relating

to safety and security. The explosion in the number of cameras to be monitored, the accruing cost of monitoring personnel, and the limited ability of human operators to sustain concentration severely limit the effectiveness of these systems. Advances in information and communication technologies can potentially offer considerable improvements. Technology is widely deployed to maintain surveillance in modern urban environments [17].

    In many surveillance applications, events of interest may occur rarely. For these unusual

    events (or abnormal, rare events), it is difficult to collect sufficient training data for

    supervised learning to develop unusual event models. In this case, many unusual event

    detection algorithms [7, 25, 26, 28] that require a large amount of training data become


    unsuitable. Several algorithms have been proposed to address the difficulty of unusual

    event recognition with sparse training data. Zelnik-Manor et al. [25] and Zhong et al. [28]

    clustered the divided video clips into different groups based on a similarity measure. The

    groups with relatively small numbers of video clips are detected as unusual events.

However, since unusual events have insufficient training data, clusters for these events may not be sufficiently representative to predict future unusual events. Zhang et al. [26]

    proposed a method by developing the unusual event model from that of usual events. This

    method provides a hint on how to deal with the lack-of-training-data issue. However, they

    obtained all unusual event models by adapting from the general usual-event model, while in

    reality, the usual events and unusual events can be vastly different in nature. In this paper,

    we develop abnormal activity recognition based on a motion history image and a moving

    trajectory of objects.

    2.2 Tampering detection

The position, angle, and power of a camera can be changed arbitrarily, either intentionally or

    accidentally. We implemented a tampering module using image difference calculation to

    deal with this situation. Scene change detection is essential for the implementation of the

    tampering module to detect tampering. Numerous scene change detection schemes have

been proposed. Nam [14] detected gradual scene changes, such as fade-in, fade-out, and overlaps, using B-spline interpolation. However, this is inappropriate for the tampering module

    implementation, because it does not cope with abrupt scene changes. In addition, Zhao [27]

and Huang [9] detected moving-picture scene changes using color histogram and pixel-based features. The color or pixel features reflect global properties of the image and are used to detect the change of a scene. Unlike tampering detection, scene change detection in moving images is more difficult than detecting a change in camera orientation. Thus, these methods need to be modified for

    our tampering method. In addition, Ribnick [20] used a short-term and long-term image

    buffer to calculate image similarity using image chromaticity, L1-norm value and histogram

    value. In this paper, we implemented tampering detection using the RGB color feature. The

    trial and error method was used to set the threshold value.

    An object in an image can be represented by the shape of a point, circle, square, contour,

    and silhouette. A point-employed method describes an object as a set of points [22] or the

    centroid [21]. In addition, an object also can be represented by an ellipse [5]. The contour

and silhouette method [24] represents the object by its contour, that is, the silhouette boundary, with the silhouette obtained inside the contour. It is suitable for tracking complex non-rigid shapes. Last, the

    object skeleton method [1] uses the medial axis transformation to the silhouette of an

    image. In this paper, we represent an object as the representative point of an object and use

    it to track the object.

    2.3 Human activity recognition

Human activity recognition is a challenging task due to the non-rigidness of the human body, as human motion lacks a clear categorical structure: a motion can often be classified into several categories simultaneously, because some activities have a natural compositional structure in terms of basic action units, and even the transition between simple activities naturally has temporal segments of ambiguity and overlap. Human motion often displays

    multiple levels of increasing complexity that range from action-units to activities and

    behaviors. The more complex the human behavior, the more difficult it becomes to perform

    recognition in isolation. Motions can occur in various timescales and as they often exhibit


    long-term dependencies, long contexts of observations may need to be considered for correct

    classification at particular time-steps. For instance, the motion class at a current time-step may

    be hard to predict using only the previous state and the current image observation alone, but

may be less ambiguous if several neighboring states or observations, possibly both backward and forward in time, are considered. However, this computation would be hard to perform using a Hidden Markov Model (HMM) [18], where stringent independence assumptions among

    observations are required to ensure computational tractability.

    Many algorithms have been proposed to recognize human activities. Lv et al. [13] and

    Ribeiro [19] focus on the selection of suitable feature sets for different events. Models, such

as HMMs [7, 18], state machines [2], and Adaboost [23], are also widely used for activity recognition. However, most of the methods proposed in these works are inflexible when adding new activities. They are trained or constructed to recognize predefined events. If new

    activities are added, the entire model has to be re-trained or the entire system has to be

    re-constructed. Other methods [25, 28] tried to use a similarity metric, so that different events

    can be clustered into different groups. This approach has more flexibility for newly added

    events. However, due to the uncertain nature of the activity instances, it is difficult to find a

    suitable feature set, such that all samples of an event are clustered closely around a center.

    2.4 Object tracking based on multiple cameras

    A single camera is insufficient to detect and track objects due to its limited field of view

    (FOV) or occlusion. Many approaches address detection and tracking using overlapping or

    non-overlapping multiple views. Tracking algorithms [3, 11] require camera calibration and

a computation of the handoff of tracked objects between overlapping cameras. To accomplish this, a camera must share a considerable common FOV with the first camera. These requirements of overlapping cameras, however, are impractical due to the large number of cameras required and the physical constraints upon their placement. Thus, the system must be able to deal with non-overlapping regions, where an object is invisible to

    any camera. Kettnaker and Zabih [12] presented a Bayesian solution to track objects across

    multiple cameras where the cameras have a non-overlapping field of view. They used

    constraints on the motion of the objects between cameras, which are positions, object

    velocities and transition times. A Bayesian formulation of the problem was used to

reconstruct the paths of objects across multiple cameras. They required manual input of the topology of allowable paths of movement and the transition probabilities. Huang and Russell [10] used a probabilistic approach that combines appearance matching and transition times of cars in non-overlapping cameras with known topology. The appearance of a car is evaluated using its color, and the transition times are modeled as Gaussian distributions.

    3 System architecture and metadata scheme

The surveillance system operates continuously or only as required to monitor a particular event. To develop the intelligent surveillance system, various contexts are acquired from

    physical sensors in monitoring areas. The situation is recognized by analyzing the context

    data collected from the physical sensors. Then, the surveillance system generates metadata.

    The goal of the metadata rule is to effectively index very large video surveillance databases

    and to enable unified searches and management between distributed or heterogeneous

    surveillance systems more efficiently.


    3.1 System architecture

    Figure 1 depicts our intelligent video surveillance system consisting of three different

    layers. First, the low-level context module in the bottom layer collects the measurable data

    from sensing hardware in the monitoring area. In this paper, the system receives audio-visual data and RFID tag data from cameras, microphones, and RFID readers. Data

    acquired from various sensors are transmitted to the high-level context aware module. The

    high-level context module recognizes human actions, such as hugging, snatching,

    trespassing and tampering, by analyzing audio-visual data. The abnormal context aware

    module judges whether the context is normal; if it is abnormal, it constructs the community

    and gives an instruction for the appropriate services, as shown in Fig. 2.

    Figure 3 shows the intelligent surveillance system architecture. The components are

    described as follows.

Sensing Infrastructure: Sensing Infrastructure is used to collect various data from heterogeneous sensing hardware devices in a ubiquitous network environment. In this paper, we used cameras, GPS as a location awareness sensor, and a microphone as a noise sensor to acquire various data in the monitoring area. Data from the Sensing Infrastructure are transmitted to the Context Aggregator and are converted into our predefined format for context awareness.

    Context Database: Context Database refers to the module in which the modified data from

    the Context Broker (which is used for the future awareness of context) are stored. The

    corresponding data are represented as space safety index, personal safety index and so on.

    Context Broker: Context Broker stores the data into the context DB that is transmitted from

    the Context Aggregator. Data are processed for the usage of the corresponding space.

    Community Manager: When an event occurs in a specific location according to our

    predefined criteria, Community Manager gives instruction to its Service Invocator to

    construct relevant services that are defined by Community Editor.

    Community Editor: Community Editor constructs the community that makes a service

    when a pre-defined event occurs in our monitoring area. The Community is

    dynamically constructed and stored in the Community Template Repository.

Fig. 1 3-tier context-aware surveillance system: low-level context-aware, high-level context-aware, and abnormal context recognition layers


    Service Discoverer and Invocator: When an event occurs, Context Manager finds an

    appropriate service through the Service Discoverer; and, if it exists, Service Invocator

    performs the relevant service stored in the Community Template.

    When a tampering action occurs in the monitoring area, sensing data are transmitted to

    Context Broker and Context Broker commands Index Agent to update the latest space

    safety index in the Index Database. A camera application computes its space safety index in

    the index DB. If the computed space safety index value exceeds the threshold, it commands

the camera to monitor the area using the PTZ function. Finally, the user agent sends an alarm message to users based on the computed space safety index.

Fig. 3 System architecture

Fig. 2 Appropriate services for the intelligent surveillance system: topology-based context propagation, camera association, multi-camera tracking, object localization, and tampering detection


    A multi-camera tracking scheme is applied for a continuous video acquisition of the

    object movement. After the physical setting on the camera and system environments,

Fields Of View (FOVs) of the fixed and PTZ surveillance cameras are automatically set by an image similarity comparison. The fixed cameras cover the Region Of Interest (ROI) and analyze the real-time images for object representation and tracking in case of abnormal situations. The system automatically sends an alarm message to the

    surveillance system when tampering or violence occurs. After receiving the message,

    the system shows the object in images and indicates the object location in the safety

    index, Google satellite map, and a 2D map. If the object moves in the FOV of the fixed

    camera, a PTZ camera traces the object using PTZ control. Otherwise, if the object

    disappears from the FOV of the fixed camera, our system attempts to obtain the object

    through our autonomic collaboration method employing adjacent camera topology in

    the non-overlapping zone.

    The main purpose of the intelligent surveillance system is to provide real time event

    detection based upon established rules. Monitoring and surveillance agents then receive

    alerts in real time, allowing them to address threats and other events of importance

proactively within their environment. However, surveillance systems from different vendors have different established event rules and message exchange rules. Thus, metadata

    standardization is required to enable the intelligent surveillance systems to exchange

    analyzed data. The metadata rule is to help exchange analyzed information between

    distributed systems or heterogeneous systems. We define the metadata rule for exchanging

    analyzed information between intelligent video surveillance systems that automatically

analyze video data acquired from cameras.

    3.2 Metadata scheme

    Surveillance metadata should be constructed with a camera unique ID, camera resolution,

    power on/off status, and camera location information. When a moving object appears in the

FOV, the object color feature is constructed as metadata. The object color feature is classified into head, body, upper, and lower parts in the HSI color space. In addition,

    metadata consists of a unique ID, size, object location, camera location, type, action, and

    additional information. Figure 4 shows the schema diagram of metadata. In the next section,

    we will describe audio-visual data analysis methods to generate metadata.
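To make the scheme concrete, the following Python sketch builds one metadata record using only the element names shown in Fig. 4. The exact XML layout (which fields are attributes, the namespace, and the sample values) is an assumption made for illustration, not the normative schema.

# A minimal sketch of a metadata record following the element names in Fig. 4.
# Attribute/element placement and all sample values are assumptions.
import xml.etree.ElementTree as ET

def build_metadata(camera_id, resolution, status, cam_location,
                   object_id, colors, size, obj_location, obj_type, action):
    root = ET.Element("Metadata", {"MetadataID": "M-0001"})

    cam = ET.SubElement(root, "Camera", {"CameraID": camera_id})
    ET.SubElement(cam, "Resolution").text = resolution
    ET.SubElement(cam, "Status").text = status            # e.g. "on" / "off"
    ET.SubElement(cam, "Location").text = cam_location    # e.g. GPS coordinates

    obj = ET.SubElement(root, "Object", {"ObjectID": object_id})
    color = ET.SubElement(obj, "Color")
    for part in ("Head", "Body", "Upper", "Lower"):        # HSI color per body part
        ET.SubElement(color, part).text = colors.get(part, "")
    ET.SubElement(obj, "Size").text = size
    ET.SubElement(obj, "Location").text = obj_location
    ET.SubElement(obj, "Type").text = obj_type             # e.g. "person"
    ET.SubElement(obj, "Action").text = action             # e.g. "walking"

    ET.SubElement(root, "Sync").text = "2010-01-30T16:09:08"
    return ET.tostring(root, encoding="unicode")

print(build_metadata("CAM-03", "704x480", "on", "37.28,127.04",
                     "OBJ-17", {"Upper": "120,80,90"}, "1.7m",
                     "37.28,127.05", "person", "walking"))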

    4 Audio-visual data analysis methods

The framework described in this paper includes real-time video data analysis methods for an automated surveillance system: object identification, tampering detection, object size analysis, object location analysis, and moving object tracking. This section describes audio-visual data analysis methods to develop intelligent video

    surveillance and to generate metadata.

    4.1 Object identification

We used the background subtraction scheme for object classification. This separates moving objects from the background. The background subtraction algorithm captures a

    sequence of images containing moving objects from a static single camera and detects

moving objects from the reference background image.

Fig. 4 Schema diagram of metadata. The Metadata element carries a MetadataID (xs:ID) and contains one or more Object elements (ObjectType), a Camera element (CameraType), a Sync element (SYNCType), a File element, and a Comment (xs:string). ObjectType carries an ObjectID (xs:ID) and contains Color (ColorType), Size, Location (LocationType), Type (restriction), Action (xs:string), and Comment (xs:string). CameraType carries a CameraID (xs:ID) and contains Resolution, Status (restriction), and Location (LocationType). ColorType contains Body, Head, Upper, and Lower elements.

We statistically analyze the reference


background image in HSI colour space over fifty frames with different illuminations, and all pixels of the static background scene image are modeled as Gaussian distributions with respect to the hue and saturation values. After this preprocessing of the background image, a sequence of images containing a moving human captured from a camera is converted into HSI colour images and subtracted from the reference background image. If the subtraction values are greater than the threshold values, which are derived from the variance values of the background image, those pixels are determined to belong to the foreground.
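The per-pixel Gaussian model on hue and saturation described above can be sketched in Python with OpenCV as follows. The HSV conversion stands in for HSI, and the deviation factor k and the morphological clean-up are assumed choices rather than the system's actual thresholds.

# Sketch of a per-pixel Gaussian background model on hue/saturation.
# HSV stands in for HSI; k (deviation factor) is an assumed value.
import cv2
import numpy as np

def build_background_model(frames):
    """Model mean and std of H and S over ~50 background frames."""
    hs = np.stack([cv2.cvtColor(f, cv2.COLOR_BGR2HSV)[:, :, :2].astype(np.float32)
                   for f in frames])
    return hs.mean(axis=0), hs.std(axis=0) + 1e-6

def foreground_mask(frame, mean, std, k=2.5):
    """Pixels whose H/S deviate more than k standard deviations are foreground."""
    hs = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)[:, :, :2].astype(np.float32)
    diff = np.abs(hs - mean)
    mask = (diff > k * std).any(axis=2).astype(np.uint8) * 255
    # Remove isolated noisy pixels before blob grouping.
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))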

    After background subtraction, the object is identified by moving direction and color

    histogram of the object in our system. When objects move in the monitoring area,

    pixel-level subtraction results in the separation of background and object image.

However, these subtracted data contain numerous useless noisy and ungrouped pixels. Thus, the noisy pixels are eliminated and the remaining pixels are grouped into blobs that represent a moving object. Figure 5 shows background extraction and object movement orientation

    analysis.

    4.2 Tampering detection

    Image difference comparison is used for tampering and the camera position setting.

Figure 6 shows tampering detection; Fig. 6 (c) shows the subtraction result of Fig. 6 (a) and (b). In Fig. 6 (c), the unchanged part of Fig. 6 (a) and (b) has close-to-zero RGB

values. When the non-zero pixels of the difference image exceed 80% of the entire image, an alarm message is initiated. When a tampering alarm is received, the system predicts the object's movement and controls the adjacent cameras to acquire continuous views of the object using the PTZ function. In our previous paper [15], an object

    movement routine was graphically presented considering the spatial relation of the camera

    and the time-spatial relation of the object appearance and disappearance.
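The image-difference check can be sketched as follows. The 80% ratio comes from the description above, while the small per-pixel noise tolerance is an assumed value.

# Sketch of the image-difference tampering check: compare the current frame to
# a reference view and raise an alarm when more than 80% of the pixels differ.
# tol (per-pixel noise tolerance) is an assumption.
import cv2
import numpy as np

def is_tampered(reference_bgr, current_bgr, ratio=0.8, tol=10):
    diff = cv2.absdiff(reference_bgr, current_bgr)
    changed = diff.max(axis=2) > tol             # pixel changed in any channel
    return changed.mean() > ratio                # True -> send alarm message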

    4.3 Object size and location analysis

Fig. 5 Background extraction and object movement orientation analysis

Fig. 6 Tampering detection: (a) monitoring area, (b) tampering action, (c) image difference

The size of an object is determined by the distance to the object and the focal length of the camera. The distance to an object of unknown size can be determined using the

    knowledge about the height of the camera and the bearing to the point where the object

    meets the ground. Therefore, object size y is computed by focal length f, camera height yc,

and tilting angle θx as

$$ y = \frac{f\,y_c\,\bigl(f\sin\theta_x - (v_c - v_t)\cos\theta_x\bigr)}{f\sin\theta_x - (v_c - v_b)\cos\theta_x} - \frac{f\,y_c}{(v_c - v_t)\sin\theta_x + f\cos\theta_x}, \qquad (1) $$

where vc is the center coordinate of the object in the image, vt is the top coordinate of the object, and vb is the bottom coordinate of the object.

An object location y′ is computed from the object size y, the camera location h′ and the camera height h:

$$ y' = h' + (h - y)\tan\theta, \qquad (2) $$

where h′ is the GPS location of the camera (Fig. 7).

    4.4 Activity recognition

Human activity is recognized using the method of [4], as shown in Fig. 8. The method in [4] can recognize

    human activities, such as walking, turning, punching and sitting.

The proposed system adopts several action classifiers according to the movement direction of the object in order to recognize human actions view-invariantly. Then, the proposed system selects a classifier based on the moving path of the target object. We train the Multi-Layer Perceptron (MLP) using 320 actions obtained from four subjects. When

    a punching action occurs, the method sends an alarm message to our surveillance

    system.
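The classifier-per-direction idea can be sketched as follows, with scikit-learn's MLPClassifier standing in for the Multi-Layer Perceptron. The feature representation, the direction bins, and the network size are assumptions made for illustration only.

# Sketch of a per-direction action classifier; MLPClassifier stands in for the
# paper's Multi-Layer Perceptron. Features, direction bins, and sizes are assumed.
import numpy as np
from sklearn.neural_network import MLPClassifier

class DirectionalActionRecognizer:
    def __init__(self, directions=("front", "left", "right", "back")):
        self.models = {d: MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)
                       for d in directions}

    def fit(self, direction, features, labels):
        """Train the classifier assigned to one movement direction."""
        self.models[direction].fit(features, labels)

    def predict(self, direction, features):
        """Select the classifier by the object's moving path, then classify."""
        return self.models[direction].predict(features)

# Usage sketch with random placeholder features (e.g. motion descriptors).
rng = np.random.default_rng(0)
rec = DirectionalActionRecognizer()
rec.fit("front", rng.normal(size=(80, 32)), rng.integers(0, 4, size=80))
print(rec.predict("front", rng.normal(size=(1, 32))))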

    4.5 Moving object tracking

Fig. 7 Calculation of object size and location in camera environments: (a) side view, (b) top-down view

In the case of a multi-camera tracking system, while one fixed camera shows the ROI, the PTZ camera is controlled by analyzing the moving image of the fixed camera. At this time, the PTZ camera must be controlled according to its angle of view and zoom level. Thus, additional physical position adjustment between the fixed and PTZ cameras is essential for object tracking. In this paper, the camera topology is adjusted automatically by image similarity comparison. The camera position setting

    algorithm is as follows.

    Algorithm 1: Camera Position Setting

1: SetCameraPosition(&FixedCamera, fZoomLev, fHeight);
2: SetCameraPosition(&PTZCamera, fZoomLev, fHeight);
3: image Rep[], FixedImg;
   FixedImg = SaveImgFromFixedCam();
   do {                                   // pan the PTZ camera
       Rep[i++] = SaveImage();
   } while (panning from leftmost to rightmost);
4: CalculateImgDiff(Rep[], FixedImg);
5: SetPosition(Min(Rep[]));               // pan position with the minimum image difference

We set the fixed camera position to a specific height and zoom level, as shown above. The PTZ camera was calibrated using the fixed camera's height and zoom level. Then, we collected representative images through PTZ camera panning, which covered the entire monitoring area. We set the PTZ camera location to the one that yields the minimum difference by calculating the differences between the fixed camera images and the representative images.
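The procedure of Algorithm 1 can be sketched in Python as follows. pan_to() and grab_frame() are placeholders for the vendor-specific PTZ control and capture calls, and the sum of absolute differences is used here as the image difference measure.

# Sketch of the camera position setting idea from Algorithm 1: pan the PTZ
# camera over candidate positions and keep the one whose image is most similar
# to the fixed camera's view. pan_to() and grab_frame() are placeholders.
import cv2
import numpy as np

def set_ptz_position(fixed_frame, pan_positions, pan_to, grab_frame):
    best_pan, best_diff = None, float("inf")
    for pan in pan_positions:                    # leftmost .. rightmost
        pan_to(pan)
        frame = grab_frame()
        diff = float(np.sum(cv2.absdiff(fixed_frame, frame)))
        if diff < best_diff:
            best_pan, best_diff = pan, diff
    pan_to(best_pan)                             # minimum-difference position
    return best_pan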

    An object is detected by preprocessing, which subtracts the object from a

    background image. The background image without any objects is stored, so that an

    object can be extracted by subtracting object images from the background image when

needed. However, the background subtraction method cannot be started immediately after the physical camera is set up; it must wait until no object appears in the scene, and it also takes time to store the background image. In addition, the background image may have to be stored again when the luminance or the scene changes because of light and wind. In this paper, we use the motion history method of the OpenCV library [16], which requires no background learning process, to trace moving objects by their centric points in real time.

    The tracking process is as follows.

    Fig. 8 Screenshot of activity recognition


Algorithm 2: Moving Object Tracking
Image buffer[];
Point objPoint[];
MotionSegmentation seg[];
int minDistance, nowobjPoint;
int angle[];

buffer[] = SaveImg();                                   // store incoming frames
cvCvtColor(buffer[], CV_BGR2GRAY);                      // convert to gray scale
cvAbsDiff(buffer[]);                                    // difference of consecutive frames
seg[] = cvUpdateMotionHistory(buffer[], DURATION);      // update motion history silhouettes
for (i = 1; i < Num(seg); i++) {
    extractObjFromSegmentation(seg[i]);
}
objPoint[] = GetCenterPointofObj(seg[]);
angle[] = GetObjAngle(seg[]);
for (i = 1; i < Num(seg); i++) {
    int tmpDistance;
    tmpDistance = CalculateEuclideanDistance(prevobjPoint, objPoint[i]);
    if (tmpDistance < ACCEPTABLE_MIN_DISTANCE)          // ambiguous (possibly occluded) candidate
        nowobjPoint = Compare(prevAngle, objAngle[i]);  // disambiguate by movement direction
    else
        nowobjPoint = objPoint[i];                      // nearest center point wins
}

As depicted in Algorithm 2, images are stored in a buffer and then converted into gray-scale images. Our system obtains the motion history of two images using the image difference calculation. We adopted cvUpdateMotionHistory of the OpenCV API to update the motion history. The motion history is updated by the non-zero pixel silhouette image when motion occurs in the image. In this paper, we used a 1-second time-stamp for the image storage time and excluded the non-zero pixel silhouette images whose summed width and height is below 20 pixels.

    As shown in Fig. 9, when objects move in the image for a specific period, a blue-marked

    motion history is updated by the comparison of two image frames. We represent an object

    as the center point in a circle that covers the whole object. In addition, a line from the

    centric point to a circular arc shows the movement direction of the object, as shown in

    Fig. 9. We can predict the centric point of an object in the next frame using the centric point

    and direction of the object. First, we continuously store the center point and direction in the

buffer. The object center point in the next frame is set to the point that has the shortest distance

    between object center points of the previous frame and the next frame. The Euclidean

    method [6] was used to measure distance between center points. In the case of an occlusion

of multiple objects, the movement direction between the previous frame and the next frame is used to identify the objects.
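A simplified version of this tracking step can be sketched as follows using plain frame differencing and image moments. The binarization threshold is an assumption, the direction-based occlusion handling is only noted in a comment, and the OpenCV motion-history bookkeeping of Algorithm 2 is omitted.

# Sketch of the tracking step: motion blobs from frame differencing, centroids
# from image moments, and nearest-centroid matching by Euclidean distance.
# The binarization threshold and min_size are assumed values.
import cv2
import numpy as np

def motion_centroids(prev_gray, gray, min_size=20):
    diff = cv2.absdiff(prev_gray, gray)
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    centroids = []
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        if w + h < min_size:                      # drop tiny silhouettes
            continue
        m = cv2.moments(c)
        if m["m00"] > 0:
            centroids.append((m["m10"] / m["m00"], m["m01"] / m["m00"]))
    return centroids

def match_object(prev_point, centroids):
    """Pick the centroid nearest to the previous center point (Euclidean)."""
    if not centroids:
        return prev_point
    dists = [np.hypot(cx - prev_point[0], cy - prev_point[1]) for cx, cy in centroids]
    # When several candidates are almost equally close (occlusion), the full
    # system additionally compares movement directions; here we take the nearest.
    return centroids[int(np.argmin(dists))]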

    The centric point is used for Panning and Tilting the PTZ camera, as used in the following

    algorithm, considering the zoom level difference between the fixed camera and the PTZ camera.


    Algorithm 3: PTZ Function

    1: int MovFactor, prevX, prevY, newX, newY;

    2: MovFactor = (PTZCamZoomLev/FixedCamZoomLev);

3: newX = FixedCamCenterX + (ObjCenterX - FixedCamCenterX) * MovFactor;
4: newY = FixedCamCenterY + (ObjCenterY - FixedCamCenterY) * MovFactor;

    5: DoPTZ (newX, newY);

    We can calculate the degree of panning and tilting using the given coordinates depicted

    in the function DoPTZ (newX, newY).
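The mapping of Algorithm 3 can be written compactly as follows. The conversion from pixel offsets to pan and tilt degrees depends on the camera's field of view, so the per-pixel degree factors in this sketch are illustrative assumptions rather than the system's calibration.

# Python rendering of the mapping in Algorithm 3: scale the object's offset
# from the fixed camera's image center by the zoom-level ratio, then convert
# pixel offsets to pan/tilt degrees. deg_per_px_* are assumed factors.
def ptz_target(obj_center, fixed_center, fixed_zoom, ptz_zoom,
               deg_per_px_x=0.1, deg_per_px_y=0.1):
    mov_factor = ptz_zoom / fixed_zoom
    new_x = fixed_center[0] + (obj_center[0] - fixed_center[0]) * mov_factor
    new_y = fixed_center[1] + (obj_center[1] - fixed_center[1]) * mov_factor
    pan_deg = (new_x - fixed_center[0]) * deg_per_px_x
    tilt_deg = (new_y - fixed_center[1]) * deg_per_px_y
    return pan_deg, tilt_deg

# Example: object at (400, 300) in a 704x480 fixed view, PTZ zoomed 2x.
print(ptz_target((400, 300), (352, 240), fixed_zoom=1.0, ptz_zoom=2.0))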

    5 Physical prototyping of intelligent video surveillance system

We performed experiments with seven CCD cameras (704×480 resolution) to evaluate the performance of our system. We used PCs with Intel 64-bit Xeon 3.2 GHz processors

    and 2 GB of RAM as the hardware platform, and Microsoft SQL Server 2000 as the

    underlying DBMS. The system automatically recognizes various dangerous situations in

public areas and classifies the safety level by means of the environment's safety index

    models using network camera collaboration. Figure 10 shows color-based object

    identification using two cameras and violence recognition using an acoustic sensor. In

Fig. 10 (a), the face of an entering object is detected by the Adaboost algorithm at the

    entrance. Each object is classified by the HSI color model. Thus, the intelligent

    surveillance system identifies and tracks unauthenticated people by analyzing the color

    and pattern of clothing. In Fig. 10 (b), dangerous situations are recognized by analyzing

    audio and visual data. Our system detects abnormal situations using the decibel (dB)

    levels of scream pitches.
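The acoustic part of this check can be sketched as follows: the RMS level of short audio frames is converted to decibels and frames above a threshold are flagged. The threshold, the frame length, and the absence of any pitch analysis are assumptions of this sketch.

# Sketch of a loudness check on audio frames: RMS level in dB relative to full
# scale, flagged against an assumed threshold. Scream pitch analysis is omitted.
import numpy as np

def loud_frames(samples, rate, frame_ms=50, threshold_db=-10.0):
    """samples: mono float array in [-1, 1]; returns times (s) of loud frames."""
    frame_len = int(rate * frame_ms / 1000)
    times = []
    for start in range(0, len(samples) - frame_len, frame_len):
        frame = samples[start:start + frame_len]
        rms = np.sqrt(np.mean(frame ** 2)) + 1e-12
        db = 20 * np.log10(rms)                  # dBFS, 0 dB = full scale
        if db > threshold_db:
            times.append(start / rate)
    return times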

Fig. 9 Object tracking: (a) real image, (b) silhouette image

Fig. 10 Object identification and violence recognition using multiple sensors: (a) object identification, (b) abnormal situation detection

Fig. 11 Physical prototyping of the intelligent video surveillance system: (a) control center, (b) screenshot of ISS, (c) GIS using Google Earth, (d) USS monitor

Figure 11 shows the physical prototype of the intelligent video surveillance system composed of a screen (4.2 m × 2 m) with six projectors, one ceiling projector, four fixed cameras, four PTZ cameras, one speed dome camera and two acoustic sensors. Figure 11 (c) shows a satellite map that covers the entire boundary of the Earth. When an accident

    occurs, the system zooms into the accident area. The system indicates the accident point

    using a red circle in a 2D map. We use the Google Earth API functions [8] to mark the

    monitoring area for efficiency.

In Fig. 11 (d), the level of spatial importance is computed using space features and facilities. Based on this space safety index, we reconstruct the monitoring area when an

    accident occurs in a specific space. For example, the safety index of an ATM facility is

    higher than that of other areas. At this time, if violence happens in the ATM area, the level

    of spatial importance is recalculated. Then, the system controls the adjacent camera to

    monitor the ATM area using PTZ control. If violence occurs in the ATM area, the area is

    marked with red in the 2D map.

    6 Conclusions

    In this paper, we have developed an intelligent surveillance system that provides

    various services, such as object identification, object size analysis, object localization,

tampering detection, activity recognition, and moving object tracking. Surveillance systems from different vendors have different established event rules and message exchange rules. Thus, we have defined the metadata rules to exchange analyzed information

    between distributed surveillance systems or heterogeneous surveillance systems. A 3-

    tier context-awareness conceptual framework is presented to identify the design

    principles of the intelligent surveillance system. Most importantly, the design

prototypes, as the convergence of computers and buildings, have shown the potential for a profound transformation of design practice in smart space design. The design

    framework and the implementation of the prototypes have served as a logical basis to

    elaborate broad design concepts and intelligent video computing technologies that may

    be performed toward future smart surveillance systems. In future work, we will improve

    robust object identification methods and create an administrative mobile device

    interface.

    Acknowledgment

This research was supported by the MKE (The Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the NIPA

    (National IT Industry Promotion Agency) (NIPA-2010-C1090-1031-0004) and this research is also

    supported by the Ubiquitous Computing and Network (UCN) Project, Knowledge and Economy Frontier

R&D Program of the Ministry of Knowledge Economy (MKE), the Korean government, as a result of UCN's

    subproject 10C2-T3-10M.

    References

1. Ali A, Aggarwal J (2001) Segmentation and recognition of continuous human activity. In: Detection and Recognition of Events in Video, 2001. Proceedings. IEEE Workshop on, pp 28–35
2. Ayers D, Shah M (2001) Monitoring human behavior from video taken in an office environment. Image Vis Comput 19(12):833–846
3. Cai Q, Aggarwal JK (1996) Tracking human motion using multiple cameras. In: ICPR '96: Proceedings of the International Conference on Pattern Recognition (ICPR '96), Volume III. Washington, DC, USA: IEEE Computer Society, pp 68–72
4. Chae YN, Kim Y-H, Choi J, Cho K, Yang HS (2009) An adaptive sensor fusion based objects tracking and human action recognition for interactive virtual environments. In: VRCAI '09: Proceedings of the 8th International Conference on Virtual Reality Continuum and its Applications in Industry. New York, NY, USA: ACM, pp 357–362
5. Comaniciu D, Ramesh V, Meer P (2003) Kernel-based object tracking. IEEE Trans Pattern Anal Mach Intell 25:564–575
6. Danielsson P (1980) Euclidean distance mapping. Comput Graph Image Process 14(3):227–248
7. Duong T, Bui H, Phung D, Venkatesh S (2005) Activity recognition and abnormality detection with the switching hidden semi-Markov model. In: Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol 1, pp 838–845
8. Google Earth API. Online. Available: http://code.google.com/apis/earth/
9. Huang C-L, Liao B-Y (2001) A robust scene-change detection method for video segmentation. IEEE Trans Circuits Syst Video Technol 11(12):1281–1288
10. Huang T, Russell S (1997) Object identification in a Bayesian context. In: IJCAI '97: Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., pp 1276–1282
11. Kelly PH, Katkere A, Kuramura DY, Moezzi S, Chatterjee S (1995) An architecture for multiple perspective interactive video. In: MULTIMEDIA '95: Proceedings of the Third ACM International Conference on Multimedia. New York, NY, USA: ACM, pp 201–212
12. Kettnaker V, Zabih R (1999) Counting people from multiple cameras. In: Multimedia Computing and Systems, 1999. IEEE International Conference on, vol 2, pp 267–271
13. Lv F, Kang J, Nevatia R, Cohen I, Medioni G (2004) Automatic tracking and labeling of human activities in a video sequence. In: PETS '04
14. Nam J, Tewfik A (2005) Detection of gradual transitions in video sequences using B-spline interpolation. IEEE Trans Multimedia 7(4):667–679
15. Nam Y, Ryu J, Joo Choi Y, Duke Cho W (2007) Learning spatio-temporal topology of a multi-camera network by tracking multiple people. World Acad Sci Eng Tech 4(4):254–259
16. OpenCV, Open Computer Vision Library. http://sourceforge.net/projects/opencvlibrary/
17. Petrushin V, Wei G, Ghani R, Gershman A (2005) Multiple sensor indoor surveillance: problems and solutions. In: Machine Learning for Signal Processing, 2005 IEEE Workshop on, pp 349–354
18. Rabiner L (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 77(2):257–286
19. Ribeiro PC, Santos-Victor J (2005) Human activity recognition from video: modeling, feature selection and classification architecture. In: International Workshop on Human Activity Recognition and Modeling, pp 61–70
20. Ribnick E, Atev S, Masoud O, Papanikolopoulos N, Voyles R (2006) Real-time detection of camera tampering. In: Video and Signal Based Surveillance, 2006. AVSS '06. IEEE International Conference on
21. Serby D, Meier E, Van Gool L (2004) Probabilistic object tracking using multiple features. In: Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on, vol 2, pp 184–187
22. Veenman C, Reinders M, Backer E (2001) Resolving motion correspondence for densely moving points. IEEE Trans Pattern Anal Mach Intell 23(1):54–72
23. Viola P, Jones M, Snow D (2005) Detecting pedestrians using patterns of motion and appearance. Int J Comput Vis 63:153–161
24. Yilmaz A, Li X, Shah M (2004) Contour-based object tracking with occlusion handling in video acquired using mobile cameras. IEEE Trans Pattern Anal Mach Intell 26(11):1531–1536
25. Zelnik-Manor L, Irani M (2001) Event-based analysis of video. In: Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, vol 2, pp 123–130
26. Zhang D, Gatica-Perez D, Bengio S, McCowan I (2005) Semi-supervised adapted HMMs for unusual event detection. In: Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol 1, pp 611–618
27. Zhao W, Wang J, Bhat D, Sakiewicz K, Nandhakumar N, Chang W (1999) Improving color based video shot detection. In: Multimedia Computing and Systems, 1999. IEEE International Conference on, vol 2, pp 752–756
28. Zhong H, Shi J, Visontai M (2004) Detecting unusual activity in video. In: Computer Vision and Pattern Recognition, Proceedings of the 2004 IEEE Computer Society Conference on, vol 2, pp 819–826


Yunyoung Nam received B.S., M.S. and Ph.D. degrees in Information and Computer Engineering from Ajou University, Korea, in 2001, 2003, and 2007, respectively. He was a research engineer in the Center of Excellence in Ubiquitous Systems from 2007 to 2009. He was a post-doctoral researcher at Stony Brook University, New York, in 2009. He is currently a research professor at Ajou University in Korea. He also

    spent time as a visiting scholar at Center of Excellence for Wireless & Information Technology (CEWIT),

    Stony Brook University - State University of New York Stony Brook, New York. His research interests

    include multimedia database, ubiquitous computing, image processing, pattern recognition, context-

    awareness, conflict resolution, wearable computing, and intelligent video surveillance.

Seungmin Rho received his M.S. and Ph.D. degrees in Information and Computer Engineering from Ajou University, Korea, in 2003 and 2008, respectively. In 2008–2009, he was a Postdoctoral Research Fellow at the Computer Music Lab of the School of Computer Science at Carnegie Mellon University. He is currently working as a Research Professor at the School of

    Electrical Engineering in Korea University. His research interests include database, music retrieval,

multimedia systems, machine learning, knowledge management and intelligent agent technologies. He has been a reviewer for Multimedia Tools and Applications (MTAP), Journal of Systems and Software,

    Information Science (Elsevier), and Program Committee member in over 10 international conferences. He

    has published 14 papers in journals and book chapters and 21 in international conferences and workshops.

He is listed in Who's Who in the World.


Dr. Jong Hyuk Park received his Ph.D. degree from the Graduate School of Information Security, Korea University, Korea. From December 2002 to July 2007, Dr. Park was a research scientist at the R&D Institute, Hanwha S&C Co., Ltd., Korea. From September 2007 to August 2009, he was a professor at

    the Department of Computer Science and Engineering, Kyungnam University, Korea. He is now a professor

    at the Department of Computer Science and Engineering, Seoul National University of Science and

    Technology, Korea. Dr. Park has published about 100 research papers in international journals and

    conferences. He has been serving as chairs, program committee, or organizing committee chair for many

    international conferences and workshops. He is a president of Korea Information Technology Convergence

    Society (KITCS). He is editor-in-chief (EiC) of International Journal of Information Technology,

    Communications and Convergence (IJITCC), InderScience. He was EiCs of the International Journal of

    Multimedia and Ubiquitous Engineering (IJMUE) and the International Journal of Smart Home (IJSH). He is

    Associate Editor / Editor of 14 international journals including 8 journals indexed by SCI(E). In addition, he

    has been serving as a Guest Editor for international journals by some publishers: Springer, Elsevier, John

Wiley, Oxford Univ. Press, Hindawi, Emerald, Inderscience. His research interests include security and digital forensics, ubiquitous and pervasive computing, context awareness, multimedia services, etc. He got

    the best paper award in ISA-08 conference, April, 2008. And he got the outstanding leadership awards from

    IEEE HPCC-09 and ISA-09, June, 2009.
