
    IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 19, NO. 11, NOVEMBER 2010 2901

Multiframe Super-Resolution Reconstruction of Small Moving Objects

    Adam W. M. van Eekeren, Member, IEEE, Klamer Schutte, and Lucas J. van Vliet, Member, IEEE

Abstract: Multiframe super-resolution (SR) reconstruction of small moving objects against a cluttered background is difficult for two reasons: a small object consists completely of mixed boundary pixels, and the background contribution changes from frame to frame. We present a solution to this problem that greatly improves recognition of small moving objects under the assumption of a simple linear motion model in the real world. The presented method not only explicitly models the image acquisition system, but also the space-time variant fore- and background contributions to the mixed pixels. The latter is due to a changing local background as a result of the apparent motion. The method simultaneously estimates a subpixel precise polygon boundary as well as a high-resolution (HR) intensity description of a small moving object subject to a modified total variation constraint. Experiments on simulated and real-world data show excellent performance of the proposed multiframe SR reconstruction method.

Index Terms: Boundary description, moving object, partial area effect, super-resolution (SR) reconstruction.

    I. INTRODUCTION

IN SURVEILLANCE applications, the most interesting events are dynamic events consisting of changes occurring in the scene, such as moving persons or moving objects. In this paper, we focus on multiframe super-resolution (SR) reconstruction of small moving objects in under-sampled image sequences. Small objects are objects that are completely comprised of boundary pixels. Each boundary pixel is a mixed pixel, and its value has contributions of both the moving foreground object and the locally varying background. Hence, not only do the fractions change from frame to frame, but also the local background values change due to the apparent motion. Especially for small moving objects, an improvement in resolution is useful to permit classification or identification.

Manuscript received November 25, 2008; revised April 24, 2010; accepted April 24, 2010. Date of publication August 19, 2010; date of current version October 15, 2010. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Michael Elad.

A. W. M. van Eekeren is with the Electro Optics Group at TNO Defence, Security, and Safety, The Hague, The Netherlands. He is also with the Quantitative Imaging Group, Delft University of Technology, Delft, The Netherlands (e-mail: [email protected]).

K. Schutte is with the Electro Optics Group at TNO Defence, Security, and Safety, The Hague, The Netherlands (e-mail: [email protected]).

L. J. van Vliet is with the Quantitative Imaging Group at Delft University of Technology, Delft, The Netherlands (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

    Digital Object Identifier 10.1109/TIP.2010.2068210

Multiframe SR reconstruction1 improves the spatial resolution by exchanging temporal information of a sequence of subpixel displaced low-resolution (LR) images for spatial information. Although the concept of SR reconstruction has existed for more than 20 years [1], relatively little attention has been given to SR reconstruction of moving objects. In [2]-[8], this subject was addressed for various dedicated tasks.

Although [2] and [5] apply different SR reconstruction methods, i.e., iterative back-projection [9] and projection onto convex sets [10], respectively, both use a validity map in their reconstruction process. This makes these methods robust to motion outliers. Both methods perform well on large moving objects that obey a simple translational motion model. For large objects, only a small fraction of the pixels are boundary pixels. Hardie et al. [7] use optical flow to segment a moving object and subsequently apply SR reconstruction to it. In their work, the background is static and SR reconstruction is only applied to the masked area inside a large moving object. In [6], Kalman filters are used to reduce edge artifacts at the boundary between fore- and background. However, the fore- and background are not explicitly modeled in this method.

In previous work [3], we presented a system that applies SR reconstruction after a segmentation step simultaneously to a large moving object and the background using Hardie's method [7]. Again, no SR reconstruction is applied to the boundary of mixed pixels separating the moving object from a cluttered background. In [4], we presented the first attempt at SR reconstruction of small moving objects with simulated data. At that time, no experiments were done on real-world data, which lifted the need for a very precise estimate of the object's trajectory. In [8], SR reconstruction is performed on moving vehicles of approximately 10 by 20 pixels. For object registration, a trajectory model is used in combination with a consistency measure of the local background and vehicle. However, in the SR reconstruction approach no attention is given to mixed pixels.

An interesting subset of moving objects consists of faces. Efforts in that area using SR reconstruction include [11] and [12], in which the modeling of complex motion is a key element. However, the faces in the LR input images used are far larger than the small objects that we focus on in this paper. SR reconstruction on moving objects is also applied in astronomy. An overview can be found in [13], where it is explained that SR reconstruction is only possible under the condition that the solution is very sparse, i.e., very few samples have a value larger than zero. In contrast, our SR reconstruction method is designed to handle nonzero cluttered backgrounds.

1In the remainder of this paper, SR reconstruction refers to multiframe SR reconstruction.

    1057-7149/$26.00 2010 IEEE


Fig. 1. Flow diagram illustrating the construction of a 2-D HR image representing the camera's field-of-view and the degradation thereof into a LR frame via a camera model.

For small moving objects that consist completely of mixed pixels against a cluttered background, the state-of-the-art pixel-based SR reconstruction methods mentioned previously will fail. Pixel-based SR reconstruction methods make an error at the object boundary, because they cannot disentangle the contributions from the space-time variant background and foreground information within a mixed pixel. To tackle the aforementioned problem, we incorporate a subpixel precise object boundary model with a high-resolution (HR) pixel grid. We simultaneously estimate this polygonal object boundary as well as a HR intensity description of a small moving object subject to a modified total variation constraint. Assuming rigid objects that move with constant speed through the real world, object registration is achieved by fitting a trajectory through the object's center-of-mass in each frame. The approach assumes that a HR background image is estimated first. Robust SR reconstruction methods can accomplish this: they treat the intensity fluctuations after global registration caused by the small moving object as outliers. Especially for small moving objects, our approach significantly improves object recognition. Note that the use of the proposed SR reconstruction method is not limited to small moving objects. It can also be used to improve the resolution of boundary regions of larger moving objects, as long as the size of the object does not prohibit proper SR reconstruction of the background.

The paper is organized as follows. First, in Section II we present the forward model for a simulated HR scene and the observed LR image data by an electro-optical sensor system. In Section III, the three steps of the proposed SR reconstruction method for small moving objects are presented. Section IV presents experiments on simulated data, followed by a real-world experiment in Section V. Finally, in Section VI the main conclusions are presented.

    II. FORWARD MODEL: REAL-WORLD DATA DESCRIPTION

This section describes the two steps of our forward model to construct a LR camera frame from HR representations of the fore- and background in combination with a subpixel precise polygon model of our object. The first step models the construction of a 2-D HR image including the moving object, whereas the second step models the image degradation as a result of the physical properties of our camera system.

    A. 2-D HR Scene

We model a camera's field-of-view (the scene) at frame as a properly sampled 2-D HR image . Each frame consists of pixels without significant degradation due to motion, blur, or noise. Let us express this image in lexicographical notation as the vector . The image is constructed from a translated HR background intensity description, consisting of pixels, and a translated HR foreground intensity description, consisting of pixels. This is depicted in the left part of Fig. 1. Note that the foreground has a different apparent motion with respect to the camera than the background .

The small moving object in the foreground is represented not only by its HR intensity description , but also by a subpixel precise polygon boundary , with the number of vertices. We impose the following assumptions on the motion of the object: 1) the aspect angle (the angle between the direction of motion and the optical axis of the camera) stays the same, and 2) the object is moving with a constant velocity, i.e., the acceleration is zero. These are realistic assumptions if the object is far away from the camera and for a short duration of up to a few seconds. The latter does not limit the acquisition of a large number of LR frames, due to the high frame rate of today's image sensors.

At frame , the HR background and the HR foreground are translated and merged into the 2-D HR image , in which the th pixel is defined by

(1)

with and . Here, is the number of frames. The summation over represents the translation of foreground pixel to by bilinear interpolation and, similarly, the summation over translates background pixel to . The weight represents the foreground contribution at pixel in frame , depending upon the polygon boundary . The foreground contribution varies between 0 and 1, so the corresponding background contribution equals by definition.

Fig. 2 depicts the construction of the th HR image by masking both the translated background and the translated foreground, after which the constituents are merged into . The polygon boundary defines the foreground contributions and the background contributions in HR frame .

Fig. 2. Flow diagram illustrating the masking of foreground and background constituents and the merging thereof into the HR image . The polygon boundary is superimposed on the background contributions for visualization purposes only. Note that in the weight images, black indicates no contribution, white indicates full contribution, and greys indicate a partial contribution.
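The merging of fore- and background can be illustrated with a small sketch: each HR pixel is a convex combination of the translated foreground and background values, weighted by the foreground fraction that the polygon boundary assigns to that pixel. The function name and array-based form are illustrative, not the paper's implementation.

```python
import numpy as np

def composite(background, foreground, weights):
    """Per-pixel mix of fore- and background. The weight w in [0, 1] is
    the foreground fraction of a (possibly mixed) pixel, as determined
    by the polygon boundary, so the background contributes 1 - w by
    definition."""
    w = np.clip(weights, 0.0, 1.0)
    return w * foreground + (1.0 - w) * background
```

A pixel with a 25% foreground fraction thus receives 25% of the foreground intensity and 75% of the local background intensity, which is exactly what makes mixed pixels ambiguous for pixel-based SR methods.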

    B. Camera Model

A LR camera frame is obtained by applying the camera model to the 2-D HR image representing the camera's field-of-view. The camera model comprises two types of image blur, sampling, and degradation by noise.

Blur: The optical point-spread-function (PSF), together with the sensor PSF, will cause a blurring in the image plane. In this paper, the optical blur is modeled by a Gaussian function with standard deviation . The sensor blur is modeled by a uniform rectangular function representing the fill-factor of each sensor element. A convolution of both functions represents the total blurring function.

Sampling: The sampling as depicted in Fig. 1 reflects the pixel pitch only. The integration of photons over the photosensitive area of a pixel is accounted for by the aforementioned sensor blur.

Noise: The temporal noise in the recorded data is modeled by additive, independent and identically distributed Gaussian noise samples with standard deviation . For the recorded data used, independent additive Gaussian distributed noise is a sufficiently accurate noise model. Other types of noise, like fixed pattern noise (FPN) and bad pixels, are not explicitly modeled. For applications where FPN becomes a hindrance, it is advised to correct the captured data prior to SR reconstruction using a scene-based nonuniformity correction algorithm, such as the one proposed in [14].

All in all, the observed th LR pixel from frame is modeled as follows:

(2)

for and . Here, denotes the number of LR pixels in . The weight represents the contribution of HR pixel to estimated LR pixel . Each contribution is determined by the blurring and sampling of the camera. represents an additive, independent and identically distributed Gaussian noise sample with standard deviation .
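A minimal sketch of this camera model, under simplifying assumptions: the optical blur is a separable Gaussian, the sensor blur and the sampling are combined into block averaging over the zoom factor (a 100% fill-factor box PSF followed by decimation), and i.i.d. Gaussian noise is added. Function names and parameters are illustrative.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()

def degrade(hr, zoom=4, sigma_psf=2.0, noise_std=0.0, rng=None):
    """Degrade a HR scene into one LR frame: separable Gaussian optical
    blur, then block averaging over zoom x zoom HR pixels (sensor blur
    with 100% fill-factor plus sampling), then additive Gaussian noise."""
    k = gaussian_kernel(sigma_psf, radius=int(3 * sigma_psf))
    blurred = np.apply_along_axis(lambda r: np.convolve(r, k, mode='same'), 1, hr)
    blurred = np.apply_along_axis(lambda c: np.convolve(c, k, mode='same'), 0, blurred)
    h, w = blurred.shape
    lr = blurred[:h - h % zoom, :w - w % zoom]
    lr = lr.reshape(h // zoom, zoom, w // zoom, zoom).mean(axis=(1, 3))
    if noise_std > 0:
        rng = rng or np.random.default_rng(0)
        lr = lr + rng.normal(0.0, noise_std, lr.shape)
    return lr
```

Because the kernel is normalized and the block mean preserves intensity, a flat scene stays flat away from the image border, which is a quick sanity check for any implementation of the forward model.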

    III. DESCRIPTION OF PROPOSED METHOD

The proposed SR reconstruction method can be divided into three parts: 1) applying SR reconstruction to the background for subsequent detection of moving objects from the residue between the observed LR frame and a simulated LR frame based upon the estimated HR background at that instance; 2) fitting a trajectory model to the detected instances of the moving object through the image sequence to obtain subpixel precise object registration; and 3) obtaining a HR object representation (comprised of a subpixel precise boundary and a HR intensity description) by solving an inverse problem based upon the model of Section II. We start with the third step, because it is the key innovative part of the proposed method.

    A. SR Reconstruction of a Small Moving Object

To find the optimal HR description of the object (consisting of a polygon boundary and a HR intensity description ), we solve an inverse problem based upon the camera observation model described in (1) and (2). To favor sparse solutions of this ill-posed problem, we added two regularization terms: one to penalize intensity transitions in the HR intensity description and one to avoid unrealistically wild object shapes. These observations give rise to the following cost function:

    (3)

where the first summation term represents the normalized data misfit contributions for all pixels . Normalization is performed with respect to the total number of LR pixels and the noise variance . Here, denotes the measured intensities of the observed LR pixels and the corresponding estimated intensities obtained using the forward model of Section II. Although the estimated intensities are also dependent upon the background , only and are varied to minimize (3). The HR background is estimated in advance, as described in Section III-B.

The second term of the cost function is a regularization term which favors sparse solutions by penalizing the amount of intensity variation within the object according to a criterion similar to the bilateral total variation (BTV) criterion [15]. Here, is the shift operator that shifts by pixels in the horizontal direction, whereas shifts by pixels in the vertical direction.

The actual minimization of the cost function is done in an iterative way by the Levenberg-Marquardt (LM) algorithm [16]. This optimization algorithm assumes that the cost function has a first derivative that exists everywhere. However, the L1-norm used in the TV criterion does not satisfy this assumption. Therefore, we introduce the hyperbolic norm

    (4)

This norm has the same properties as the L1-norm for large values, and it has a first (and second) derivative that exists everywhere. For all experiments the value is used.
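A common smooth surrogate for the L1-norm with these properties is sqrt(x^2 + eps^2); a sketch is below. The smoothing constant `eps` here is a hypothetical choice for illustration, since the paper's value is not reproduced in this text.

```python
import numpy as np

def hyperbolic_norm(x, eps=0.1):
    """Smooth stand-in for |x|: behaves like the L1-norm when
    |x| >> eps, but is differentiable everywhere, in particular at 0,
    where |x| is not."""
    return np.sqrt(x**2 + eps**2)

def hyperbolic_grad(x, eps=0.1):
    # derivative x / sqrt(x^2 + eps^2): bounded, zero at x = 0
    return x / np.sqrt(x**2 + eps**2)
```

The bounded, everywhere-defined gradient is what makes this norm compatible with gradient-based optimizers such as Levenberg-Marquardt.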

The third term of (3) constrains the shape of the polygon by penalizing the variation of the polygon boundary . Regularization is needed to penalize unwanted protrusions, such as spikes, which cover a very small area compared to the total object area. This constraint is embodied by the measure , which is small when the polygon boundary is smooth

with (5)

is the inverse of , which is the area spanned by the edges ( and ) at vertex and half the angle between those edges, as indicated by the right part of (5).

From example (a) in Fig. 3 it is clear why the area is calculated with half the angle : if we would take the full angle , would be zero, which would result in . Example (b) shows that the measure will be very large for small angles, i.e., sharp protrusions. Note that this measure also becomes very large for (an inward pointing spike).

Fig. 3. Two examples to illustrate the expression for polygon regularization at vertex of polygon . (a) is minimal for , (b) is maximal for .
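A geometric sketch of this per-vertex measure, under the assumption that the two edges span a triangle whose area uses half the oriented vertex angle (the exact symbols of (5) are not recoverable from this text, so names and conventions below are illustrative):

```python
import numpy as np

def vertex_penalty(prev_v, v, next_v):
    """Inverse of the area spanned by the two edges at a vertex,
    computed with HALF the oriented angle between them. The penalty is
    smallest for a flat vertex (theta = pi) and blows up for both
    outward (theta -> 0) and inward (theta -> 2*pi) spikes, matching
    the behavior described for Fig. 3."""
    e1 = np.asarray(prev_v, float) - np.asarray(v, float)
    e2 = np.asarray(next_v, float) - np.asarray(v, float)
    dot = e1 @ e2
    cross = e1[0] * e2[1] - e1[1] * e2[0]
    theta = np.arctan2(cross, dot) % (2 * np.pi)  # oriented angle in [0, 2*pi)
    area = 0.5 * np.linalg.norm(e1) * np.linalg.norm(e2) * np.sin(theta / 2)
    return 1.0 / area
```

Using the full angle instead of the half angle would give zero area (and an infinite penalty) for a flat vertex, which is exactly the degeneracy the half angle avoids.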

Note that in (3), normalization is performed on by a multiplication with the square of the mean edge length , with the number of vertices and the total edge length of . This normalization prevents extensive growth of edges.

As mentioned previously, the actual minimization of the cost function is performed in an iterative way by the Levenberg-Marquardt algorithm [16]. To allow this, we put the cost function of (3) in the LM framework, which expects a format like , where is the measurement and is the estimate depending upon parameter . In general, it is straightforward to store all residues, for example , in a vector which forms the input of the LM algorithm. In our case, we have to be aware of the different norms in each of the terms of (3). The residue vector looks like

    (6)

where the letters on top indicate the number of elements used in each part of the cost function. The length of the vector in (6) is .

The cost function in (3) is iteratively minimized to simultaneously find the optimal and . A flow diagram of this iterative minimization procedure in steady state is depicted in Fig. 4. Here the cost function refers to (3) and the camera model to formulas (1) and (2). Note that the measured data used for the minimization procedure contains only a small region-of-interest (ROI) around the moving object in each frame.

The optimization scheme depicted in Fig. 4 has to be initialized with an object boundary and an object intensity description . These can be obtained in several ways; we have chosen to use a simple and robust initialization that proved to initialize the method close enough to the global minimum to permit convergence to the global minimum in most practical cases.

Fig. 4. Flow diagram illustrating the steady state of estimating a HR description of a moving object ( and ). denotes the measured intensities in a region of interest containing the moving object in all frames after registration, and denotes the corresponding estimated intensities at iteration . Note that the initial HR object description ( and ) is derived from the measured LR sequence and the object mask sequence.

The initial object boundary is obtained by first calculating the frame-wise median width and the frame-wise median height of the mask in the object mask sequence (defined in the next section). Subsequently, we construct an elliptical object boundary from the previously calculated width and height. Upon initialization, the vertices are evenly distributed over the ellipse. The number of vertices is fixed during minimization. The object intensity distribution is initialized by a constant intensity equal to the median value over all masked pixel intensities in the measured LR sequence .
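The elliptical initialization can be sketched as follows; the function name, vertex count, and centering are illustrative choices, not the paper's exact values.

```python
import numpy as np

def init_ellipse(median_w, median_h, n_vertices=16, center=(0.0, 0.0)):
    """Initial object boundary: n_vertices evenly distributed over an
    axis-aligned ellipse whose axes are the frame-wise median width and
    height of the detection mask. The vertex count stays fixed during
    the subsequent minimization."""
    t = 2 * np.pi * np.arange(n_vertices) / n_vertices
    cx, cy = center
    return np.stack([cx + 0.5 * median_w * np.cos(t),
                     cy + 0.5 * median_h * np.sin(t)], axis=1)
```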

Furthermore, the optimization procedure is performed in two steps. The first step consists of the initialization described previously, followed by a few iterations of the LM algorithm. We found during experimentation that using more than five iterations has no effect on the final result.

After this step, the intensity description often contains large gradients perpendicular to the estimated object boundary, where pixels outside the contour still contain the initialization values. As this can cause the optimization to get stuck in local minima, a partial reinitialization step is proposed. In this step, all intensities of HR foreground pixels adjacent to a mixed boundary pixel but located completely inside the object boundary are propagated outwards. After this partial reinitialization, we continue the iterative procedure until convergence, or for a fixed number of iterations to be determined in a simulation experiment.

B. SR Reconstruction of Background and Moving Object Detection

A small moving object causes a temporary change of a small, localized set of pixel intensities. In previous work [17], we presented a framework for the detection of moving point targets against a static cluttered background. A robust pixel-based SR reconstruction method computes a HR background image by treating the local intensity variations caused by the small object as outliers. After registration of the HR background to a recorded LR frame, we apply the camera model to simulate the LR frame with identical aliasing artifacts as in the recorded LR frame, but without the small object. Thresholding the absolute value of the residue image yields a powerful tool for object detection, provided that the apparent motion is sufficient given the number of frames to be used in background reconstruction. Assuming LR frames containing a moving object of width (expressed in LR pixels), the apparent lateral motion must exceed LR pixels/frame for a proper background reconstruction.

Several robust SR reconstruction methods have been reported [15], [18], [19]. We chose the method developed by Zomet et al. [19], which is robust to intensity outliers, such as those caused by small moving objects. This method employs the same camera model as presented in (2). Its robustness is introduced by a robust back-projection

    (7)

where median denotes a scaled pixel-wise median over the frames and is the projection operator from HR image to LR frame .
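The essence of this robust step can be sketched in isolation: where a non-robust SR method would average the per-frame back-projected gradient images, the Zomet-style update takes a scaled pixel-wise median, so a disturbance present in only a few frames (such as a small moving object) drops out. The function name and the input layout are illustrative.

```python
import numpy as np

def robust_backprojection_step(per_frame_gradients):
    """Robustness sketch: replace the sum of per-frame back-projected
    gradient images with a scaled pixel-wise median over the frames.
    Intensities disturbed in only a minority of frames behave as
    outliers and do not influence the median."""
    g = np.asarray(per_frame_gradients, float)   # shape (n_frames, H, W)
    return g.shape[0] * np.median(g, axis=0)
```

With one disturbed frame out of five, the mean-based update would leak a fifth of the disturbance into the background estimate, while the median-based update ignores it entirely.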

A LR representation of the background, obtained by applying the camera model to the shifted HR background image , is compared to the corresponding LR frame of the recorded image sequence

    (8)

where represents the blur and down-sample operation, is the th pixel of the shifted HR background in frame , and is the recorded intensity of the th pixel in frame . All difference pixels constitute a residual image sequence in which a moving object can be detected.

Thresholding this residual image sequence followed by tracking improves the detectability for low residue-to-noise ratios. Threshold selection is done with the chord method from Zack et al. [20], which is illustrated in Fig. 5. With this histogram-based method, an object mask sequence results for and , with the number of observed LR frames and the number of pixels in each LR frame.

After thresholding, multiple events may have been detected in each frame of . We apply tracking to link the most similar events in each frame to a so-called reference event. This reference event is defined by the median width , the median height , and the median residual energy of the largest event in each frame (the median is computed frame-wise). Next, we search in each frame for the event with the smallest normalized Euclidean distance w.r.t. the reference event, shown in (9) at the bottom of the next page, with the index of the event in frame with the smallest normalized Euclidean distance to the reference event. After this tracking step, an object mask sequence is generated with at most one event in each frame, the one corresponding to the object giving rise to the reference event. Note that a frame can be empty if no event was detected.
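Zack's chord (also called triangle) method can be sketched as follows, assuming a unimodal residual histogram whose peak lies left of its tail; variable names are illustrative.

```python
import numpy as np

def chord_threshold(hist):
    """Zack's chord method: draw a chord from the histogram peak to the
    last nonzero bin and return the bin with the largest distance
    between the histogram and the chord."""
    hist = np.asarray(hist, float)
    peak = int(np.argmax(hist))
    end = int(np.nonzero(hist)[0][-1])
    idx = np.arange(peak, end + 1)
    x1, y1, x2, y2 = peak, hist[peak], end, hist[end]
    # unnormalized point-to-line distance; the normalizing constant of
    # the chord's line equation does not change the argmax
    d = np.abs((y2 - y1) * idx - (x2 - x1) * hist[idx] + x2 * y1 - y2 * x1)
    return int(idx[np.argmax(d)])
```

This data-driven choice of threshold is what allows the detector to adapt to the actual residue-to-noise ratio of the sequence instead of using a fixed value.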


Fig. 5. Threshold selection by the chord method is based upon finding the value of that maximizes the distance between the histogram and the chord. The value is used as the threshold value.

C. Moving Object Registration

The object mask sequence , obtained after thresholding and tracking, gives a rough, quantized indication of the position of the object in each frame. For performing SR reconstruction, a more precise, subpixel registration is needed. For large moving objects, which contain a sufficient number of internal pixels with sufficient structure, gradient-based registration [21] can be performed. In the setting of small moving objects, this is usually not the case and another approach is needed.

Assuming a linear motion model for a moving object in the real world, the projected model can be fitted to the sequence of detected object positions. We assume a constant velocity without acceleration in the real world, which seems realistic given the nature of small moving objects: the objects are far away from the observer and will have a small acceleration within the frames due to the high frame rate of today's image sensors.

First, the position of the object in each frame is determined by computing the weighted center-of-mass (COM) of the masked pixels as follows:

(10)

with the number of LR pixels in frame , the location of pixel , the corresponding mask value (0 or 1), and the measured intensity.
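An intensity-weighted center-of-mass over the masked pixels of one frame can be sketched as:

```python
import numpy as np

def weighted_com(frame, mask):
    """Intensity-weighted center-of-mass over the masked pixels of one
    LR frame; returns (x, y) in LR pixel coordinates."""
    ys, xs = np.nonzero(mask)
    w = frame[ys, xs].astype(float)
    return np.array([(xs * w).sum() / w.sum(), (ys * w).sum() / w.sum()])
```

Even though the mask is quantized to whole pixels, weighting by the measured intensities already yields a sub-pixel position estimate per frame, which the trajectory fit then refines further.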

To fit a trajectory, all object positions in time must be known w.r.t. a reference point in the background of the scene. This is done by adding the previously obtained apparent background translation to the calculated object position for each frame: .

To obtain all object positions with subpixel precision, a robust fit to the measured object positions is performed. Assuming constant motion, all object positions can be described by a reference object position and a translation . Both the reference object position and the translation of the object are estimated by minimizing the following cost function:

    (11)

where denotes the Euclidean distance in LR pixels between the measured object position and the estimated object position at frame

    (12)

The cost function in (11) is known as the Gaussian norm [22]. This norm is robust to outliers (e.g., false detections in our case). The smoothing parameter is set to 0.5 LR pixel. Minimizing the cost function in (11) with the Levenberg-Marquardt algorithm results in an accurate subpixel precise registration of the moving object. If, e.g., 50 frames are used, the registration precision is improved by a factor of 7.
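The robust linear fit can be sketched with iteratively reweighted least squares as a stand-in for the paper's Levenberg-Marquardt minimization: points far from the current trajectory receive Gaussian-norm weights exp(-d^2 / 2 sigma^2) close to zero, so false detections barely influence the fit. The IRLS scheme and function names are illustrative assumptions.

```python
import numpy as np

def robust_trajectory_fit(times, positions, sigma=0.5, iters=20):
    """Fit p(t) = p0 + v * t to 2-D positions, downweighting outliers
    with Gaussian-norm weights w = exp(-d^2 / (2 sigma^2)). IRLS is
    used here as a simple stand-in for Levenberg-Marquardt."""
    t = np.asarray(times, float)
    p = np.asarray(positions, float)              # shape (n, 2)
    A = np.stack([np.ones_like(t), t], axis=1)    # design matrix [1, t]
    w = np.ones(len(t))
    for _ in range(iters):
        # weighted least squares for [p0; v] via the normal equations
        Aw = A * w[:, None]
        coef = np.linalg.lstsq(Aw.T @ A, Aw.T @ p, rcond=None)[0]
        d = np.linalg.norm(p - A @ coef, axis=1)
        w = np.exp(-d**2 / (2.0 * sigma**2))
    return coef[0], coef[1]                       # p0, v
```

With one grossly wrong detection among many frames on an otherwise exact line, the outlier's weight collapses after the first iteration and the fit converges to the inlier trajectory.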

    D. Computational Complexity

The computational complexity is dominated by calculating (3), i.e., computing the SR reconstruction of the HR foreground. At every iteration of the LM optimization procedure, the cost function has to be calculated for variations in the estimated parameters to estimate the gradient w.r.t. the parameters to be solved. The cost function has to be evaluated times, with the number of HR foreground intensities, the number of vertices, and # the number of LM iterations. A reconstruction as described in Section IV-B ( , , ) using Matlab code took 37 min on a Pentium-4, 3.2 GHz processor under Windows.

The processing time can be drastically reduced if a precomputation of the partial derivatives of the cost function w.r.t. the HR foreground intensities is performed off-line and stored. In this case, the computational complexity reduces to . Note that typically , forecasting a reduction in the computation time by one order of magnitude.

    (9)


    IV. EXPERIMENTS ON SIMULATED DATA

The proposed SR reconstruction method for small moving objects is first applied to simulated data to study the behavior under controlled conditions. In a series of experiments, we tune the regularization parameters and the number of iterations. Then we study the convergence, the robustness in the presence of clutter and noise, and the robustness against violations of the underlying linear motion model.

    A. Generating the Simulated Car Sequence

The simulated car sequence was generated to resemble the real-world sequence of the next section as closely as possible. We simulated an under-sampled image sequence containing a small moving car using the camera model as depicted in Fig. 1. The parameters of the camera model were chosen to match the sensor properties of the real-world system, i.e., optical blurring (Gaussian kernel with standard deviation LR pixel), sensor blurring (rectangular uniform filter with a 100% fill-factor), and Gaussian distributed noise to resemble the actual noise conditions (see below). The car follows a linear motion trajectory with zero acceleration. It consists of two internal intensities, which are both above the median background intensity. The low object intensity is exactly in between the median background intensity and the high object intensity. The boundary of the car is modeled by a polygon with seven vertices. Fig. 7(a) shows a HR image of the simulated car, which serves as a ground-truth for all SR reconstruction results. Fig. 7(b) and (c) show two LR image frames in which the car covers approximately 6 pixels. All 6 pixels are so-called mixed pixels and contain contributions of the fore- and background.

The image quality is further quantified by the signal-to-noise ratio (SNR) and the signal-to-clutter ratio (SCR). The SNR is a measure for the contrast between the object and the time-averaged local background compared to stochastic variations called noise. The SNR is defined as

    (13)

    with the number of frames, the mean foreground in-

    tensity in frame and the mean local background inten-

    sity inframe . iscalculated by takingthe mean intensity

    of LR pixels that contain at least 50% foreground and is

    defined by the mean intensity of all 100% background pixels ina small neighborhood around the object.

    The SCR is a measure for the contrast between the object and

    the time-averaged local background compared to the variation

in the local background. The SCR is defined as

SCR = 20 log10( Σ_{t=1}^{N} ( μ_f(t) - μ_b(t) ) / Σ_{t=1}^{N} σ_b(t) )    (14)

with σ_b(t) the standard deviation of the local background in frame t. In the LR domain, the SNR is 29 dB and the SCR is 14

dB. These are realistic values, derived from the real-world

    image sequence of the next section.
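The two measures can be transcribed directly into code. The sketch below assumes the per-frame statistics μ_f(t), μ_b(t), σ_b(t) and the noise standard deviation σ_n have already been measured as described above.

```python
import numpy as np

def snr_scr(mu_f, mu_b, sigma_b, sigma_n):
    """SNR and SCR in dB: the time-averaged object/background contrast
    compared with the noise level (SNR) or with the time-averaged
    background clutter (SCR)."""
    contrast = np.mean(np.asarray(mu_f) - np.asarray(mu_b))
    snr = 20.0 * np.log10(contrast / sigma_n)
    scr = 20.0 * np.log10(contrast / np.mean(sigma_b))
    return snr, scr
```

For a bright object the contrast term is positive; for dark objects the absolute contrast should be used.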

In the next subsections, different experiments on the simulated data are performed. For all experiments, 50 LR frames are used to estimate the HR foreground and 85 LR frames are used to estimate the HR background. In all reconstruction methods, the zoom factor is set to 4 and the camera parameters are the same as those used to generate the simulated data.

Fig. 6. NMSE between the SR result and the ground truth as a function of the two regularization parameters. Here both parameters are kept constant throughout all iterations in step 1 and step 2.

    B. Test 1: Tuning the Algorithm

    Our algorithm contains several parameters such as the camera

    parameters, the regularization parameters, and a stopping cri-

    terion. Although the camera parameters such as the PSF and

fill-factor can be estimated rather well, the regularization parameters are far more difficult to tune. To study the

    influence of the regularization parameters on the final result and

    select the parameters for later use, a few experiments are per-

    formed on 50 LR frames of the simulated car sequence.

In this experiment, we study the influence of the regularization parameters on the SR result for the simulated car sequence with an SNR of 29 dB and an SCR of 14

    dB. Note that both regularization parameters are kept constant

during both steps of the optimization procedure. We use the normalized mean squared error (NMSE) between the SR result of the car ẑ and its ground truth z as a figure-of-merit. Note that this measure considers only the foreground intensities; the background intensities are set to zero:

NMSE = ( Σ_{i=1}^{M} ( ẑ_i - z_i )^2 ) / ( M max(z)^2 )    (15)

with M the number of HR pixels, ẑ the estimated foreground contributions using SR and z the ground truth. Normalization is done with the squared maximum value of z.
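This figure-of-merit is straightforward to compute; a minimal sketch, assuming the background pixels have already been zeroed in both images:

```python
import numpy as np

def nmse(z_hat, z):
    """Normalized mean squared error of Eq. (15): the mean squared
    difference between the SR estimate and the ground truth, normalized
    by the squared maximum ground-truth value."""
    z_hat = np.asarray(z_hat, dtype=float)
    z = np.asarray(z, dtype=float)
    return np.mean((z_hat - z) ** 2) / z.max() ** 2
```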

From the result in Fig. 6 it can be seen that one of the two regularization parameters has by far the largest influence on the NMSE and is set accordingly; the value of the other parameter is not critical. In a broad range around these values, more than three to five iterations in step 1 did not change the final result. After 10 to 15 iterations in step 2, the solution converged. Hence, we set the

maximum number of iterations in step 1 to five and in step 2 to 15.

Fig. 7. Four times SR reconstruction of a simulated under-sampled image sequence containing a small moving car. (a) HR image representing the scene serving as ground truth; (b), (c) two typical LR frames (5 × 4 pixels) of the moving car; (d) 4× SR by a robust state-of-the-art method [18]; and (e) 4× SR by the proposed method.

    C. Test 2: Comparison With a State-of-the-Art Pixel-Based

    Technique

    To assess the value of the proposed algorithm we compare it

    with the visually best result obtained by a robust state-of-the-art

    pixel-based SR technique [18]. Note that the registration is per-

    formed by the trajectory fitting technique of this paper (to 85

    LR frames) to put both methods on equal footing. The state-of-the-art pixel-based SR result is shown in Fig. 7(d) and bears very

    little resemblance to the ground truth. This is no surprise since

the partial area effect at the boundary of the object, which affects all object pixels, is not accounted for.

Using the optimal regularization parameters in both steps, we performed an SR reconstruction with

the proposed method on exactly the same LR image sequence.

    The result is depicted in Fig. 7(e) and shows a very good resem-

    blance to the ground truth. Subtle changes along the boundary

    and along the intensity transition are caused by partial area ef-

    fects due to the random placement of the reconstructed object

    w.r.t. the HR grid. The object boundary is approximated with 8

    vertices, which is one more than used for constructing the data,

    so the boundary is slightly over-fitted. Comparing the results in

    Fig. 7(d) and (e) shows that the result of our proposed method

    is clearly superior to the pixel-based method of Pham [18].

    D. Test 3: Robustness in the Presence of Clutter and Noise

    To investigate the robustness of our method under different

    conditions, we varied 1) the clutter amplitude of the local back-

    ground and 2) the noise level of the simulated car sequence de-

    scribed in Section IV-A. The clutter of the background is varied

    by multiplying the background with a certain factor after sub-

    tracting the median intensity. Afterwards the median intensity

is added again to return to the original median intensity. The object intensities as well as the size and shape of the car remain

    the same. All parameters that are used for the reconstruction are

    set to the same values as in test 2 in Section IV-C.
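The clutter-scaling procedure above preserves the median intensity by construction; a minimal sketch:

```python
import numpy as np

def scale_clutter(background, factor):
    """Scale the clutter amplitude of a background image around its median:
    subtract the median, multiply by the factor, then add the median back
    so the original median intensity is restored."""
    med = np.median(background)
    return (background - med) * factor + med
```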

    The quality of the different SR results is expressed by the

    NMSE w.r.t. the ground truth as before. Fig. 8 depicts the

NMSE as a function of SNR and SCR. We divided the results into three categories based on the NMSE: good, medium, and bad. For

    each region a typical SR result is displayed to give a visual

    impression of the performance. It is clear that the SR result in

the good region, obtained for values of the SNR and SCR that occur in practice, bears a good resemblance to the ground

    truth. Note that the visible background in these pictures is not

    used to calculate the NMSE. Fig. 8 shows that the performance

    decreases for a decreasing SNR. Furthermore, the boundary

    between the good and medium region indicates a decrease

in performance under high-clutter conditions.

    E. Test 4: Robustness Against Variations in Motion

    The proposed method assumes that the object moves with a

    constant speed and appears in all frames to be used for recon-

    struction with the same aspect angle. To demonstrate the robust-

ness of our method to violations of these assumptions, two experiments are performed. The first experiment determines

    the robustness w.r.t. an acceleration of the object. The second

experiment establishes the robustness w.r.t. scaling of the object. We modified the simulated car sequence of Section IV-A.

In the first experiment an acceleration a, expressed in LR pixels/frame^2, is added and contributes to the object position by (1/2) a t^2, with t the frame number. In the second experiment,

a scale factor, defined as the vehicle size in the last frame divided by that in the first frame, is applied. A scale factor of 0.8 indicates that the ob-

    served length of the car varies from 3 LR pixel in the first frame

    to 2.4 LR pixel in the last frame.
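The position error that such a scale change induces at the object edges can be worked out directly. The sketch below assumes a linear size change over the sequence, so the maximum deviation from the mean scale is half the total change, and each edge sits half the object size from the center of mass.

```python
def edge_shift(scale_factor, object_size_px):
    """Maximum front/back edge displacement (in LR pixels) relative to the
    object's center of mass for a linear size change over the sequence."""
    max_scale_dev = abs(1.0 - scale_factor) / 2.0   # deviation from mean scale
    return max_scale_dev * object_size_px / 2.0     # edge is half a size away
```

For a 15% total scale change and a 3-LR-pixel object, this evaluates to roughly a tenth of an LR pixel.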

    The NMSE as a function of acceleration and scaling is de-

picted in Fig. 9. Fig. 9(a) shows that a larger acceleration causes a larger error. An acceptable decrease of the NMSE is

obtained for accelerations smaller than 0.001 LR pixels/frame^2.

Fig. 8. NMSE for the SR results of the simulated car sequence as a function of the SNR and SCR. We have roughly divided the space into three categories: good, medium, and bad, and provided a typical SR result for each category.

Fig. 9. NMSE for the SR results of the simulated car sequence as a function of (a) acceleration and (b) object scaling.

Fig. 10. Top view of the acquisition geometry to capture the real-world data.

The error of a constant-velocity model fitted to a constant-acceleration motion will follow a parabolic model. This parabola will be symmetric and has a top-to-end-point difference of a N^2 / 8. From the mid-point between its top and an end point we get a maximum error of a N^2 / 16; with a = 0.001 LR pixels/frame^2 and N = 50 frames this gives a maximum translational error of 0.16 LR pixel.
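This bound can be checked numerically; a sketch that fits the endpoint-matching (secant) line to the accelerated trajectory 0.5 a t^2 and then shifts it to the mid-point of the residual parabola, halving the maximum error:

```python
import numpy as np

def max_registration_error(a, n_frames):
    """Maximum position error of the best constant-velocity fit to a
    constant-acceleration trajectory over n_frames frames: the secant line
    leaves a symmetric residual parabola with peak a*n^2/8; shifting the
    line to the mid-point halves it, giving a*n^2/16."""
    t = np.linspace(0.0, n_frames, 2001)
    pos = 0.5 * a * t ** 2
    secant = pos[0] + (pos[-1] - pos[0]) * t / n_frames  # endpoint-matching line
    return 0.5 * np.max(secant - pos)                     # after mid-point shift
```

With a = 0.001 LR pixels/frame^2 and 50 frames, this evaluates to about 0.16 LR pixel.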

    For the second experiment Fig. 9(b) shows that a maximum

    scaling of 15% is allowed with an acceptable performance loss.

This is a 7.5% maximum scale change from the mean scale. For a 3-pixel-size object this translates to a maximum pixel shift error of 0.11 LR pixel for both the front and back object edges compared to its center-of-mass position.

    Note that both experiments have well-comparable maximum

    position errors of 0.16 and 0.11 LR pixel, rather consistent with

    the requirement that the registration error for SR should at least

    be smaller than half the HR pixel pitch. This can easily be de-

    duced from the argument below. Critical sampling of bandlim-

    ited signals can be modeled by a Gaussian low-pass filter fol-

    lowed by sampling with a sampling pitch of 1.1 times the stan-

    dard deviation of the Gaussian PSF [23]. In [21], we showed thatGaussian noise in the LR image sequence leads to Gaussian dis-

    tributed registration estimates. These registration errors act as an

    additional blur, even for sequences of infinite length [24]. If the

    standard deviation of this registration-error induced image blur

    is substantially (say two times) smaller than the optical image

    blur it will not affect the image quality after SR.
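The closing argument relies on independent Gaussian blurs combining in quadrature, a standard property of Gaussian convolution stated here as an assumption rather than taken from the paper:

```python
import math

def total_blur(sigma_optical, sigma_registration):
    """Standard deviation of the combined blur when two independent
    Gaussian blurs are applied in sequence: they add in quadrature."""
    return math.hypot(sigma_optical, sigma_registration)
```

A registration blur half the size of the optical blur broadens the effective PSF by only about 12%, which supports the "two times smaller" rule of thumb.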

    V. EXPERIMENT ON REAL-WORLD DATA

    To demonstrate the potential of the proposed method under

    realistic conditions we applied it to a real-world image se-

    quence. Real-world data permits us to study the impact of

changes in object intensities caused by variations in reflection, lens aberrations, small changes in the aspect angle of the object

along the trajectory, and practical violations of the linear motion assumption.

Fig. 11. Four times SR reconstruction of a vehicle captured by an infrared camera (50 frames) at a large distance: (a) and (c) show the LR captured data; (b) and (d) show the SR reconstruction result obtained by the proposed method. (a) LR reference frame (64 × 64 pixels); (b) SR with zoom factor 4; (c) close-up of the moving object in (a); and (d) close-up of the moving object in (b).

    The data for this experiment is captured with an infrared

camera (an Amber Radiance 1T). The sensor is composed of an indium antimonide (InSb) detector with 256 × 256 pixels in the 3-5 μm wavelength band. Furthermore, we use optics with a focal length of 50 mm and a viewing angle of 11.2° (also from Amber Radiance). We captured a vehicle (Jeep Wrangler) at 15 frames/second, driving with a constant velocity (approximately 1 pixel/frame apparent velocity) approximately perpendicular to

    the optical axis of the camera. A top view of the acquisition

    geometry is depicted in Fig. 10. During image capture, the

    platform of the camera was gently shaken to provide subpixel

    motion of the camera. Panning was used to keep the moving

    vehicle within the field of view of the camera.

    We selected the distance such that the vehicle appeared small

(covering approximately 5 × 2 LR pixels in area) in the image plane.

Fig. 11(a) shows a typical LR frame (64 × 64 pixels). A close-up

    of the vehicle is depicted in Fig. 11(c). The vehicle is driving

    from left to right at a distance of approximately 1150 meters.

The SNR of the vehicle with respect to the background is 30 dB and the SCR is 13 dB. In the simulation experiments, we have shown

    that for these values our method is capable of delivering good

    reconstruction performance. Fig. 11(b) shows the result after

    applying our SR reconstruction method with a close-up of the

    car in Fig. 11(d).

    The HR background is reconstructed from 85 frames with

    zoom factor 4. The camera blur is modeled by Gaussian optical

blurring, followed by uniform rectangular sensor

    blurring (100% fill-factor). The HR foreground is reconstructed

from 50 frames with zoom factor 4 with the same camera parameters. The object boundary is approximated with 12 vertices, and during the reconstruction the same regularization settings are used in both step 1 and step 2.

    Note that much more detail is visible in the SR result than in

    the LR image. The shape of the vehicle is very well pronounced

and the hot engine of the vehicle is clearly visible. For comparison,

    we display in Fig. 12 the SR result next to a captured image of

the vehicle at a 4× shorter distance. Please be aware that the

    intensity mapping is not the same for both images. So a grey

    level in Fig. 12(a) may not be compared with the same grey

    level in Fig. 12(b). Notice that Fig. 12(b) was captured at a later

time. Differences in environmental conditions (position of the sun, clouds, etc.), heating of the engine and vehicle, as well as

the pose of the vehicle contribute to the observed differences between the two images. The shape of the vehicle is reconstructed very well and the hot engine is located at a similar place.

Fig. 12. SR result with zoom factor 4 of a jeep in (a) compared with the same jeep captured at a 4× shorter distance (b). (a) 4× SR result. (b) Object 4× closer to camera.

    VI. CONCLUSION

    This paper presents a method for SR reconstruction of

    small moving objects. The method explicitly models the fore-

    and background contribution to the partial area effect of the

    boundary pixels. The main novelty of the proposed SR recon-

    struction method is the use of a combined object boundary

    and intensity description of the target object. This enables us

    to simultaneously estimate the object boundary with subpixel

    precision and the foreground intensities from the boundary

    pixels subject to a modified total variation constraint. This

    modification permits the use of the LevenbergMarquardt

    algorithm for optimizing the cost function. This method is

known to converge to the global optimum for a well-behaved cost function and an initial estimate not too far away.

    The proposed multiframe SR reconstruction method clearly

    improves the visual recognition of small moving objects under

    realistic imaging conditions in terms of SNR and SCR. We

    showed that our method performs well in reconstructing a

    small moving object where a state-of-the-art pixel-based SR

    reconstruction method [18] fails. The robustness against de-

    teriorations such as clutter and noise as well as violations of

    the linear motion model was established. Our method not only

    performs well on simulated data, but also provides an excellent

    result on a real-world image sequence captured with an infrared

    camera.

    REFERENCES

[1] R. Y. Tsai and T. S. Huang, "Multiframe image restoration and registration," in Advances in Computer Vision and Image Processing. Greenwich, CT: JAI Press, 1984, vol. 1, pp. 317-339.
[2] M. Ben-Ezra, A. Zomet, and S. K. Nayar, "Video super-resolution using controlled subpixel detector shifts," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 6, pp. 977-987, Jun. 2005.
[3] A. W. M. van Eekeren, K. Schutte, J. Dijk, D. J. J. de Lange, and L. J. van Vliet, "Super-resolution on moving objects and background," in Proc. IEEE 13th Int. Conf. Image Process., 2006, vol. 1, pp. 2709-2712.
[4] A. W. M. van Eekeren, K. Schutte, and L. J. van Vliet, "Super-resolution on small moving objects," in Proc. IEEE 15th Int. Conf. Image Process., 2008, vol. 1, pp. 1248-1251.
[5] P. E. Eren, M. I. Sezan, and A. M. Tekalp, "Robust, object-based high resolution image reconstruction from low-resolution video," IEEE Trans. Image Process., vol. 6, no. 10, pp. 1446-1451, Oct. 1997.
[6] S. Farsiu, M. Elad, and P. Milanfar, "Video-to-video dynamic super-resolution for grayscale and color sequences," J. Appl. Signal Process., pp. 1-15, 2006.
[7] R. C. Hardie, T. R. Tuinstra, J. Bognart, K. J. Barnard, and E. E. Armstrong, "High resolution image reconstruction from digital video with global and non-global scene motion," in Proc. IEEE 4th Int. Conf. Image Process., 1997, vol. 1, pp. 153-156.
[8] F. W. Wheeler and A. J. Hoogs, "Moving vehicle registration and super-resolution," in Proc. IEEE Appl. Imagery Pattern Recognit. Workshop, 2007, pp. 101-107.
[9] M. Irani and S. Peleg, "Improving resolution by image registration," Graph. Models Image Process., vol. 53, pp. 231-239, 1991.
[10] A. J. Patti, M. I. Sezan, and A. M. Tekalp, "Superresolution video reconstruction with arbitrary sampling lattices and nonzero aperture time," IEEE Trans. Image Process., vol. 6, no. 8, pp. 1064-1076, Aug. 1997.
[11] R. J. M. den Hollander, D. J. J. de Lange, and K. Schutte, "Super-resolution of faces using the epipolar constraint," in Proc. British Mach. Vis. Conf., 2007, pp. 1-10.
[12] J. Wu, M. Trivedi, and B. Rao, "High frequency component compensation based super-resolution algorithm for face video enhancement," in Proc. IEEE 17th Int. Conf. Pattern Recognit., 2004, vol. 3, pp. 598-601.
[13] J. Starck, E. Pantin, and F. Murtagh, "Deconvolution in astronomy: A review," Pub. Astron. Soc. Pacific, no. 114, pp. 1051-1069, 2002.
[14] K. Schutte, D. J. J. de Lange, and S. P. van den Broek, "Signal conditioning algorithms for enhanced tactical sensor imagery," in Proc. SPIE: Infrared Imag. Syst.: Design, Anal., Model., and Testing XIV, 2003, vol. 5076, pp. 92-100.
[15] S. Farsiu, M. D. Robinson, M. Elad, and P. Milanfar, "Fast and robust multi-frame super resolution," IEEE Trans. Image Process., vol. 13, no. 10, pp. 1327-1344, Oct. 2004.
[16] J. J. Moré, The Levenberg-Marquardt Algorithm: Implementation and Theory. New York: Springer-Verlag, 1978, vol. 630, pp. 105-116.
[17] J. Dijk, A. W. M. van Eekeren, K. Schutte, D. J. J. de Lange, and L. J. van Vliet, "Super-resolution reconstruction for moving point target detection," Opt. Eng., vol. 47, no. 8, 2008.
[18] T. Q. Pham, L. J. van Vliet, and K. Schutte, "Robust fusion of irregularly sampled data using adaptive normalized convolution," J. Appl. Signal Process., vol. 2006, pp. 1-12, 2006.
[19] A. Zomet, A. Rav-Acha, and S. Peleg, "Robust super-resolution," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2001, vol. 1, pp. 645-650.
[20] G. W. Zack, W. E. Rogers, and S. A. Latt, "Automatic measurement of sister chromatid exchange frequency," J. Histochem. Cytochem., vol. 25, no. 7, pp. 741-753, 1977.
[21] T. Q. Pham, M. Bezuijen, L. J. van Vliet, K. Schutte, and C. L. L. Hendriks, "Performance of optimal registration estimators," in Proc. Vis. Inf. Process. XIV, 2005, vol. 5817, pp. 133-144.
[22] J. van de Weijer and R. van den Boomgaard, "Least squares and robust estimation of local image structure," Int. J. Comput. Vis., vol. 64, no. 2-3, pp. 143-155, 2005.
[23] P. Verbeek and L. van Vliet, "On the location error of curved edges in low-pass filtered 2-D and 3-D images," IEEE Trans. Pattern Anal. Mach. Intell., vol. 16, no. 7, pp. 726-733, Jul. 1994.
[24] T. Q. Pham, L. J. van Vliet, and K. Schutte, "Influence of signal-to-noise ratio and point spread function on limits of super-resolution," in Proc. Image Process.: Algorithms Syst. IV, SPIE, 2005, vol. 5672, pp. 169-180.


Adam W. M. van Eekeren (S'00-M'02) received the M.Sc. degree from the Department of Electrical Engineering, Eindhoven University of Technology, The Netherlands, in 2002, and the Ph.D. degree from the Electro-Optics Group within TNO Defence, Security, and Safety, The Hague, in collaboration with the Quantitative Imaging Group at the Delft University of Technology, The Netherlands, in 2009.

He did his graduation project within Philips Medical Systems on the topic of image enhancement using morphological operators. Subsequently, he

worked for one year at the Philips Research Laboratory on image segmentation using level sets. He then joined the Electro-Optics Group, TNO Defence, Security, and Safety, as a Research Scientist, where he works on image improvement, change detection, and 3-D reconstruction. His research interests include image restoration, super-resolution, image quality assessment, and object detection.

Klamer Schutte received the M.Sc. degree in physics from the University of Amsterdam in 1989 and the Ph.D. degree from the University of Twente, Enschede, The Netherlands, in 1994.

He held a Post-Doctoral position with the Delft University of Technology's Pattern Recognition (now Quantitative Imaging) group. Since 1996, he has been employed by TNO, currently as Senior Research Scientist Electro-Optics within the Business Unit Observation Systems. Within TNO he has actively led multiple projects in the areas of Signal and Image Processing. Recently, he has led many projects, including super-resolution reconstruction for both international industries and governments, resulting in super-resolution-reconstruction-based products in active service. His research interests include pattern recognition, sensor fusion, image analysis, and image restoration. He is Secretary of the NVPHBV, The Netherlands branch of the IAPR.

Lucas J. van Vliet (M'02) studied applied physics and received the Ph.D. degree (cum laude) from the Delft University of Technology, Delft, The Netherlands, in 1993.

He was appointed Full Professor in multidimensional image analysis in 1999. Since 2009, he has been Director of the Delft Health Initiative, head of the Quantitative Imaging Group, and chairman of

the Department of Imaging Science & Technology. He was president (2003-2009) of the Dutch Society for Pattern Recognition and Image Analysis (NVPHBV) and sits on the board of the International Association for Pattern Recognition (IAPR) and the Dutch graduate school on Computing and Imaging (ASCI). He supervised 25 Ph.D. theses and is currently supervising 10 Ph.D. students. He was a visiting scientist at Lawrence Livermore National Laboratories (1987), University of California San Francisco (1988), Monash University Melbourne (1996), and Lawrence Berkeley National Laboratories (1996). He has a track record of fundamental as well as applied research in the field of multidimensional image processing, image analysis, and image recognition; (co)author of 200 papers and four patents.

Prof. van Vliet was awarded the prestigious talent research fellowship of the Royal Netherlands Academy of Arts and Sciences (KNAW) in 1996.