IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 19, NO. 11, NOVEMBER 2010 2901
Multiframe Super-Resolution Reconstruction of Small Moving Objects
Adam W. M. van Eekeren, Member, IEEE, Klamer Schutte, and Lucas J. van Vliet, Member, IEEE
Abstract: Multiframe super-resolution (SR) reconstruction of small moving objects against a cluttered background is difficult for two reasons: a small object consists completely of mixed boundary pixels, and the background contribution changes from frame to frame. We present a solution to this problem that greatly improves recognition of small moving objects under the assumption of a simple linear motion model in the real world. The presented method not only explicitly models the image acquisition system, but also the space-time variant fore- and background contributions to the mixed pixels. The latter is due to a changing local background as a result of the apparent motion. The method simultaneously estimates a subpixel precise polygon boundary as well as a high-resolution (HR) intensity description of a small moving object subject to a modified total variation constraint. Experiments on simulated and real-world data show excellent performance of the proposed multiframe SR reconstruction method.
Index Terms: Boundary description, moving object, partial area effect, super-resolution (SR) reconstruction.
I. INTRODUCTION
IN SURVEILLANCE applications, the most interesting
events are dynamic events consisting of changes occurring in the scene, such as moving persons or moving objects. In
this paper, we focus on multiframe super-resolution (SR) re-
construction of small moving objects in under-sampled image
sequences. Small objects are objects that are completely com-
prised of boundary pixels. Each boundary pixel is a mixed
pixel, and its value has both contributions of the moving
foreground object and the locally varying background. Hence,
not only do the fractions change from frame-to-frame, but
also the local background values change due to the apparent
motion. Especially for small moving objects, an improvement
in resolution is useful to permit classification or identification.
Manuscript received November 25, 2008; revised April 24, 2010; accepted April 24, 2010. Date of publication August 19, 2010; date of current version October 15, 2010. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Michael Elad.
A. W. M. van Eekeren is with the Electro Optics Group at TNO Defence, Security, and Safety, The Hague, The Netherlands. He is also with the Quantitative Imaging Group, Delft University of Technology, Delft, The Netherlands (e-mail: [email protected]).
K. Schutte is with the Electro Optics Group at TNO Defence, Security and Safety, The Hague, The Netherlands (e-mail: [email protected]).
L. J. van Vliet is with the Quantitative Imaging Group at Delft University of Technology, Delft, The Netherlands (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available onlineat http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TIP.2010.2068210
Multiframe SR reconstruction1 improves the spatial resolu-
tion by exchanging temporal information of a sequence of sub-
pixel displaced low-resolution (LR) images for spatial informa-
tion. Although the concept of SR reconstruction has existed for more than 20 years [1], relatively little attention has been given to SR reconstruction of moving objects. In [2]-[8], this subject was addressed for various dedicated tasks.
Although [2] and [5] apply different SR reconstruction
methods, i.e., iterative-back-projection [9] and projection onto
convex sets [10], respectively, both use a validity map in their
reconstruction process. This makes these methods robust to motion outliers. Both methods perform well on large moving objects that obey a simple translational motion model. For
large objects, only a small fraction of the pixels are boundary
pixels. Hardie et al. [7] use optical flow to segment a moving
object and subsequently apply SR reconstruction to it. In their
work, the background is static and SR reconstruction is only
applied to the masked area inside a large moving object. In
[6], Kalman filters are used to reduce edge artifacts at the
boundary between fore- and background. However, the fore-
and background are not explicitly modeled in this method.
In previous work [3], we presented a system that applies SR
reconstruction after a segmentation step simultaneously to a
large moving object and the background using Hardie's method
[7]. Again, no SR reconstruction is applied to the boundary
of mixed pixels separating the moving object from a cluttered
background. In [4], we presented the first attempt of SR recon-
struction on small moving objects with simulated data. At that time, no experiments were done on real-world data, which lifted the need for a very precise estimate of the object's trajectory.
In [8], SR reconstruction is performed on moving vehicles of
approximately 10 by 20 pixels. For object registration a trajec-
tory model is used in combination with a consistency measure
of the local background and vehicle. However, in the SR recon-
struction approach no attention is given to mixed pixels.
An interesting subset of moving objects consists of faces. Efforts in that area using SR reconstruction include [11] and [12],
in which the modeling of complex motion is a key element.
However, the faces in the LR input images used are far larger
than the small objects that we focus on in this paper. SR recon-
struction on moving objects is also applied in astronomy. An
overview can be found in [13], where it is explained that SR
reconstruction is only possible under the condition that the so-
lution is very sparse, i.e., very few samples having a value larger
than zero. In contrast, our SR reconstruction method is designed
to handle nonzero cluttered backgrounds.
1In the remainder of this paper SR reconstruction refers to multi-frame SR
reconstruction.
1057-7149/$26.00 © 2010 IEEE
Fig. 1. Flow diagram illustrating the construction of a 2-D HR image representing the camera's field-of-view and the degradation thereof into a LR frame via a camera model.
For small moving objects that consist completely of mixed
pixels against a cluttered background, the state-of-the-art
pixel-based SR reconstruction methods mentioned previously
will fail. Pixel-based SR reconstruction methods make an error
at the object boundary, because they cannot disentangle the
contributions from the space-time variant background and
foreground information within a mixed pixel. To tackle the aforementioned problem, we incorporate a subpixel precise
object boundary model with a high-resolution (HR) pixel grid.
We simultaneously estimate this polygonal object boundary as
well as a HR intensity description of a small moving object
subject to a modified total variation constraint. Assuming rigid
objects that move with constant speed through the real world,
object registration is achieved by fitting a trajectory through the
object's center-of-mass in each frame. The approach assumes
that a HR background image is estimated first. Robust SR
reconstruction methods can accomplish this. They treat the
intensity fluctuations after global registration caused by the
small moving object as outliers. Especially for small moving objects, our approach significantly improves object recognition. Note that the use of the proposed SR reconstruction method
is not limited to small moving objects. It can also be used to
improve the resolution of boundary regions of larger moving
objects as long as the size of the object does not prohibit proper
SR reconstruction of the background.
The paper is organized as follows. First, in Section II we
present the forward model for a simulated HR scene and
the observed LR image data by an electro-optical sensor
system. In Section III, the three steps of the proposed SR
reconstruction method for small moving objects are presented.
Section IV presents experiments on simulated data, followed
by a real-world experiment in Section V. Finally, in Section VI the main conclusions are presented.
II. FORWARD MODEL: REAL-WORLD DATA DESCRIPTION
This section describes the two steps of our forward model to construct a LR camera frame from HR representations of the
fore- and background in combination with a subpixel precise
polygon model of our object. The first step models the construc-
tion of a 2-D HR image including the moving object whereas
the second step models the image degradation as a result of the physical properties of our camera system.
A. 2-D HR Scene
We model a camera's field-of-view (the scene) at frame
as a properly sampled 2-D HR image . Each frame consists
of pixels without significant degradation due to motion, blur
or noise. Let us express this image in lexicographical notation
as the vector . The image is constructed from a translated HR background intensity description, consisting of pixels, and a translated HR
foreground intensity description , consisting
of pixels. This is depicted in the left part of Fig. 1. Note that
the foreground has a different apparent motion with respect to
the camera than the background .
The small moving object in the foreground is not only rep-
resented by its HR intensity description , but also by a sub-
pixel precise polygon boundary ,
with the number of vertices. We impose the following as-
sumptions on the motion of the object: 1) the aspect angle (the angle between the direction of motion and the optical axis of the camera) stays the same and 2) the object is moving with a constant velocity, i.e., the acceleration is zero. These are realistic assumptions if the object is far away from the camera and for a short duration up to a few seconds. The latter does not limit the acquisition of a large number of LR frames due to the high frame rate of today's image sensors.
At frame the HR background and the HR foreground are translated and merged into the 2-D HR image in which the
th pixel is defined by
(1)
with and .
Here, is the number of frames. The summation over
represent the translation of foreground pixel to
by bilinear interpolation and similarly, the summation over
translates background pixel to . The weight
represents the foreground contribution at pixel in frame
depending upon the polygon boundary . The foreground
contribution varies between 0 and 1, so the corresponding
background contribution equals by definition .
Fig. 2 depicts the construction of the th HR image by masking both the translated background, , and the translated foreground, , after which the constituents are merged into . The polygon boundary defines the foreground
VAN EEKEREN et al.: MULTIFRAME SUPER-RESOLUTION RECONSTRUCTION OF SMALL MOVING OBJECTS 2903
Fig. 2. Flow diagram illustrating the masking of foreground and background constituents and the merging thereof into the HR image . The polygon boundary is superimposed on the background contributions for visualization purposes only. Note that in the weight images and black indicates no contribution, white indicates full contribution and greys indicate a partial contribution.
contributions and the background contributions in
HR frame .
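The fore-/background mixing described above can be sketched in code. The following is a minimal Python sketch, not the authors' implementation: `bilinear_shift` stands in for the bilinear-interpolation summation terms, `mix_frame` for the weighted merge of eq. (1); the function names and the zero padding at the borders are assumptions.

```python
import numpy as np

def bilinear_shift(img, dy, dx):
    """Translate an image by a subpixel offset using bilinear interpolation.
    Pixels sampled outside the image contribute zero (an assumption here)."""
    H, W = img.shape
    y, x = np.mgrid[0:H, 0:W]
    ys, xs = y - dy, x - dx
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    wy, wx = ys - y0, xs - x0
    out = np.zeros_like(img, dtype=float)
    for oy, ox, w in [(0, 0, (1 - wy) * (1 - wx)), (0, 1, (1 - wy) * wx),
                      (1, 0, wy * (1 - wx)), (1, 1, wy * wx)]:
        yy, xx = y0 + oy, x0 + ox
        valid = (yy >= 0) & (yy < H) & (xx >= 0) & (xx < W)
        out[valid] += w[valid] * img[yy[valid], xx[valid]]
    return out

def mix_frame(fg, bg, weight, fg_shift, bg_shift):
    """Eq. (1)-style mixing: weight * T(foreground) + (1 - weight) * T(background),
    where weight is the per-pixel foreground fraction from the polygon boundary."""
    f = bilinear_shift(fg, *fg_shift)
    b = bilinear_shift(bg, *bg_shift)
    return weight * f + (1.0 - weight) * b
```

Note how the two translations are independent: the foreground moves with the object's apparent motion while the background moves with the camera, which is exactly what makes the mixed pixels vary from frame to frame.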
B. Camera Model
A LR camera frame is obtained by applying the camera model to the 2-D HR image representing the camera's
field-of-view. The camera model comprises two types of image
blur, sampling, and degradation by noise.
Blur: The optical point-spread-function (PSF), together
with the sensor PSF, will cause a blurring in the image
plane. In this paper, the optical blur is modeled by a
Gaussian function with standard deviation . The
sensor blur is modeled by a uniform rectangular function
representing the fill-factor of each sensor element. A
convolution of both functions represents the total blurring
function.
Sampling: The sampling as depicted in Fig. 1 reflects the pixel pitch only. The integration of photons over the photosensitive area of a pixel is accounted for by the aforementioned sensor blur.
Noise: The temporal noise in the recorded data is mod-
eled by additive, independent and identically distributed
Gaussian noise samples with standard deviation .
For the recorded data used, independent additive Gaussian
distributed noise is a sufficiently accurate noise model.
Other types of noise, like fixed pattern noise (FPN) and bad
pixels, are not explicitly modeled. For applications where
FPN becomes a hindrance, it is advised to correct the captured data prior to SR reconstruction using a scene-based non-uniformity correction algorithm, such as the one proposed in [14].
All in all, the observed th LR pixel from frame is modeled
as follows:
(2)
for and .
Here, denotes the number of LR pixels in . The weight
represents the contribution of HR pixel to estimated
LR pixel . Each contribution is determined by the blurring
and sampling of the camera. represents an additive, inde-
pendent and identically distributed Gaussian noise sample with
standard deviation .
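The three camera-model steps can be sketched as a short pipeline. This is an illustrative Python sketch under stated assumptions, not the authors' code: the sampling phase (centre of each LR pixel) and the filters' reflect boundary handling are choices made here, and `sigma_psf` is expressed in LR pixels as in the paper's experiments.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, uniform_filter

def camera_model(hr, sigma_psf, zoom, noise_std, rng=None):
    """Degrade a HR image into one LR frame, eq. (2) style:
    optical (Gaussian) blur, sensor (box) blur for a 100% fill-factor,
    decimation at the pixel pitch, and additive Gaussian noise."""
    blurred = gaussian_filter(hr, sigma_psf * zoom)   # optical PSF on the HR grid
    blurred = uniform_filter(blurred, size=zoom)      # sensor integration area
    lr = blurred[zoom // 2::zoom, zoom // 2::zoom]    # sample at the pixel pitch
    if rng is not None and noise_std > 0:
        lr = lr + rng.normal(0.0, noise_std, lr.shape)
    return lr
```

Convolving the Gaussian with the box filter before decimation mirrors the paper's statement that the total blur is the convolution of the optical and sensor PSFs.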
III. DESCRIPTION OF PROPOSED METHOD
The proposed SR reconstruction method can be divided into
three parts: 1) applying SR reconstruction to the background
for subsequent detection of moving objects from the residue between the observed LR frame and a simulated LR frame based
upon the estimated HR background at that instance; 2) fitting a
trajectory model to the detected instances of the moving object
through the image sequence to obtain subpixel precise object
registration; and 3) obtaining a HR object representation, comprised of a subpixel precise boundary and a HR intensity description, by solving an inverse problem based upon the model
of Section II. We start with the third step, because it is the key
innovative part of the proposed method.
A. SR Reconstruction of a Small Moving Object
To find the optimal HR description of the object (consisting
of a polygon boundary and a HR intensity description ), we solve an inverse problem based upon the camera observation
model described in (1) and (2). To favor sparse solutions of this
ill-posed problem we added two regularization terms: one to pe-
nalize intensity transitions in the HR intensity description and
one to avoid unrealistically wild object shapes. These observa-
tions give rise to the following cost function:
(3)
where the first summation term represents the normalized data
misfit contributions for all pixels . Normalization is per-
formed with respect to the total number of LR pixels and the
noise variance . Here, denotes the measured intensities
of the observed LR pixels and the corresponding estimatedintensities obtained using the forward model of Section II. Al-
though the estimated intensities are also dependent upon
the background , only and are varied to minimize (3).
The HR background is estimated in advance as described in
Section III-B.
The second term of the cost function is a regularization
term which favors sparse solutions by penalizing the amount
of intensity variation within the object according to a criterion
similar to the bilateral total variation (BTV) criterion [15]. Here,
is the shift operator that shifts by pixels in horizontal
direction whereas shifts by pixels in vertical direction.
The actual minimization of the cost function is done in an it-
erative way by the Levenberg-Marquardt (LM) algorithm [16].
This optimization algorithm assumes that the cost function has
a first derivative that exists everywhere. However, the L1-norm
used in the TV criterion does not satisfy this assumption. There-
fore, we introduce the hyperbolic norm
(4)
This norm has the same properties as the L1-norm for large
values and it has a first (and second) derivative that
exists everywhere. For all experiments the value is used.
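The smooth surrogate can be written in one line. A minimal sketch follows; the exact functional form and the smoothing value (which is elided in this copy of the text) are assumptions, chosen so that the norm behaves like |x| for large arguments yet is differentiable at zero.

```python
import numpy as np

def hyperbolic_norm(x, beta=0.01):
    """Smooth surrogate for |x|: differentiable everywhere,
    approximately |x| when |x| >> beta. Both the form sqrt(x^2 + beta^2)
    and the default beta are assumptions for illustration."""
    return np.sqrt(np.asarray(x, dtype=float) ** 2 + beta ** 2)
```

For large |x| the relative error vanishes, while near zero the function is a smooth parabola-like cap, which is what the LM algorithm's derivative requirement demands.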
The third term of (3) constrains the shape of the polygon by penalizing the variation of the polygon boundary . Regularization is needed to penalize unwanted protrusions, such as
spikes, which cover a very small area compared to the total ob-
ject area. This constraint is embodied by the measure , which
is small when the polygon boundary is smooth
with (5)
is the inverse of , which is the area spanned by the edges
( and ) at vertex and half the angle between those edges
as indicated by the right part of (5).
Fig. 3. Two examples to illustrate the expression for polygon regularization at vertex of polygon . (a) is minimal for , (b) is maximal for .

From example (a) in Fig. 3 it is clear why the area is calculated with half the angle : if we took the full angle , the spanned area would be zero, which would result in an infinite measure. Example (b) shows that the measure will be very large for small angles, i.e., sharp protrusions. Note that this measure also becomes very large for (inward pointing spike).
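The per-vertex penalty can be sketched as follows. This is an illustrative reconstruction under assumptions: the area spanned by the two incident edges with half the interior angle is taken as |e1||e2| sin(theta/2)/2, and the constant factor is a guess since the paper's eq. (5) is not legible in this copy.

```python
import numpy as np

def spike_penalty(poly):
    """Per-vertex shape penalty: the inverse of the area spanned by the two
    edges incident at each vertex, using HALF the interior angle so that a
    straight boundary (angle pi) is cheap and spikes (angle near 0 or 2*pi)
    are expensive. 'poly' is an (n, 2) array of CCW vertices."""
    n = len(poly)
    pen = []
    for i in range(n):
        p_prev, p, p_next = poly[i - 1], poly[i], poly[(i + 1) % n]
        e1, e2 = p_prev - p, p_next - p
        l1, l2 = np.linalg.norm(e1), np.linalg.norm(e2)
        # interior angle in [0, 2*pi), measured from e1 to e2
        theta = (np.arctan2(e2[1], e2[0]) - np.arctan2(e1[1], e1[0])) % (2 * np.pi)
        area = 0.5 * l1 * l2 * np.sin(theta / 2.0)
        pen.append(1.0 / max(area, 1e-12))
    return np.array(pen)
```

A square yields identical moderate penalties at all four corners, while a needle-sharp vertex drives the spanned area toward zero and the penalty toward infinity, matching examples (a) and (b) in Fig. 3.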
Note that in (3) normalization is performed on by a multiplication with the square of the mean edge length ,
with the number of vertices and the total edge length of
. This normalization prevents extensive growth of edges.
As mentioned previously, the actual minimization of the cost function is performed in an iterative way by the Levenberg-Marquardt algorithm [16]. To allow this, we put the cost
function of (3) in the LM framework, which expects a format
like where is the measurement
and is the estimate depending upon parameter . In
general, it is straightforward to store all residues, for example
, in a vector which forms the input of the
LM algorithm. In our case, we have to be aware of the different
norms in each of the terms of (3). The residue vector looks
like
(6)
where the letters on top indicate the number of elements used in
each part of the cost function. The length of the vector in (6) is
.
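The residue-stacking idea of eq. (6) can be illustrated on a toy problem. The sketch below is an assumption-laden miniature, not the paper's solver: a 1-D signal replaces the HR object, the data and weights (`y_obs`, `lam_tv`, `beta`) are invented, and the square root of each regularization term is stacked so that the solver's internal squaring restores the intended norm.

```python
import numpy as np
from scipy.optimize import least_squares

y_obs = np.array([1.0, 1.05, 0.98, 1.02])   # hypothetical LR measurements
lam_tv = 0.1                                 # assumed regularization weight
beta = 0.01                                  # hyperbolic-norm smoothing

def residuals(f):
    """Stack heterogeneous residual terms for a Levenberg-Marquardt solver:
    normalized data misfit plus sqrt() of the hyperbolic-TV penalty on
    neighbouring differences, so squaring inside LM recovers each term."""
    data = (f - y_obs) / np.sqrt(len(y_obs))
    tv = np.sqrt(lam_tv) * np.sqrt(np.sqrt(np.diff(f) ** 2 + beta ** 2))
    return np.concatenate([data, tv])

fit = least_squares(residuals, x0=np.zeros(4), method='lm')
```

The residual vector has one entry per LR pixel plus one per regularization term, in analogy with the letters written over eq. (6).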
The cost function in (3) is iteratively minimized to simulta-
neously find the optimal and . A flow diagram of this itera-
tive minimization procedure in steady state is depicted in Fig. 4.
Here the Cost function refers to (3) and the Camera model to
formulas (1) and (2). Note that the measured data used for
the minimization procedure contains a small region-of-interest
(ROI) around the moving object in each frame only.
The optimization scheme depicted in Fig. 4 has to be initial-
ized with an object boundary and an object intensity description . These can be obtained in several ways; we have chosen to use a simple and robust initialization that proved to initialize
Fig. 4. Flow diagram illustrating the steady state of estimating a HR description of a moving object ( and ). denotes the measured intensities in a region of interest containing the moving object in all frames after registration and denotes the corresponding estimated intensities at iteration . Note that the initial HR object description ( and ) is derived from the measured LR sequence and the object mask sequence.
the method close enough to the global minimum to permit con-
vergence to the global minimum in most practical cases.
The initial object boundary is obtained by first calculating
the frame-wise median width and the frame-wise median height
of the mask in the object mask sequence (defined in the
next section). Subsequently, we construct an elliptical object
boundary from the previously calculated width and height.
Upon initialization the vertices are evenly distributed over the
ellipse. The number of vertices is fixed during minimization.
The object intensity distribution is initialized by a constant intensity equal to the median value over all masked pixel intensities in the measured LR sequence .
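The ellipse-plus-median initialization can be sketched compactly. A minimal Python sketch under assumptions: the width/height are taken from mask bounding boxes, the vertex count is a free choice, and the function name is illustrative.

```python
import numpy as np

def init_object(masks, frames, n_vertices=8):
    """Initialize the polygon as an ellipse from the frame-wise median
    width/height of the object masks, and the intensity as the median of
    all masked LR pixel intensities. n_vertices stays fixed afterwards."""
    widths, heights = [], []
    for m in masks:
        ys, xs = np.nonzero(m)
        if ys.size:
            widths.append(xs.max() - xs.min() + 1)
            heights.append(ys.max() - ys.min() + 1)
    a, b = np.median(widths) / 2.0, np.median(heights) / 2.0   # semi-axes
    t = 2 * np.pi * np.arange(n_vertices) / n_vertices
    boundary = np.stack([a * np.cos(t), b * np.sin(t)], axis=1)
    intensity = np.median(np.concatenate(
        [f[m.astype(bool)] for f, m in zip(frames, masks)]))
    return boundary, intensity
```

Using frame-wise medians makes the initialization insensitive to the occasional frame where detection partially failed.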
Furthermore, the optimization procedure is performed in two
steps. The first step consists of the initialization described pre-
viously followed by a few iterations of the LM algorithm. We found during experimentation that using more than five iterations has no effect on the final result.
After this step the intensity description often contains large
gradients perpendicular to the estimated object boundary, where
pixels outside the contour still contain the initialization values. As this can cause the optimization to get stuck in local minima, a par-
tial reinitialization step is proposed. In this step, all intensities
of HR foreground pixels adjacent to a mixed boundary pixel but
located completely inside the object boundary are propagated outwards. After this partial reinitialization, we continue the iterative procedure until convergence or for a fixed number of iterations to be determined in a simulation experiment.
B. SR Reconstruction of Background and Moving Object
Detection
A small moving object causes a temporary change of a small
localized set of pixel intensities. In previous work [17], we presented a framework for the detection of moving point targets
against a static cluttered background. A robust pixel-based SR
reconstruction method computes a HR background image by
treating the local intensity variations caused by the small object as outliers. After registration of the HR background to a recorded LR frame, we apply the camera model to simulate the
LR frame with identical aliasing artifacts as in the recorded LR
frame, but without the small object. Thresholding the absolute
value of the residue image yields a powerful tool for object de-
tection, provided that the apparent motion is sufficient given the
number of frames to be used in background reconstruction. Assuming LR frames containing a moving object of width (expressed in LR pixels), the apparent lateral motion must exceed LR pixels/frame for a proper background reconstruction.
Several robust SR reconstruction methods have been reported
[15], [18], [19]. We choose the method developed by Zomet
et al. [19], which is robust to intensity outliers, such as those
caused by small moving objects. This method employs the same
camera model as presented in (2). Its robustness is introduced
by a robust back-projection
(7)
where median denotes a scaled pixel-wise median over the
frames and is the projection operator from HR image to
LR frame .
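The robust back-projection of eq. (7) can be sketched as one update step. The sketch below is a simplified illustration in the spirit of Zomet et al., not their exact algorithm: `project`/`backproject` stand in for the camera model and its adjoint, and the step size is a free parameter here.

```python
import numpy as np

def robust_backprojection_step(hr, lr_frames, project, backproject, step=1.0):
    """One gradient-style SR update: back-project each frame's residual to
    the HR grid and combine them with a scaled pixel-wise MEDIAN over frames
    (instead of a mean), which rejects outliers such as the intensity
    fluctuations caused by a small moving object."""
    grads = [backproject(project(hr) - lr) for lr in lr_frames]
    return hr - step * len(lr_frames) * np.median(grads, axis=0)
```

With an identity camera model, a single frame carrying outlier intensities is simply voted out by the median, which is the mechanism that keeps the moving object from contaminating the HR background.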
A LR representation of the background, obtained by applying
the camera model to the shifted HR background image , is
compared to the corresponding LR frame of the recorded image
sequence
(8)
where represents the blur and down-sample operation, is the th pixel of the shifted HR background in frame
and is the recorded intensity of the th pixel in frame .
All difference pixels constitute a residual image sequence
in which a moving object can be detected.
Thresholding this residual image sequence followed by
tracking improves the detectability for low residue-to-noise
ratios. Threshold selection is done with the chord method
from Zack et al. [20], which is illustrated in Fig. 5. With
this histogram based method an object mask sequence
results for and
, with the number of observed LR
frames and the number of pixels in each LR frame. After thresholding, multiple events may have been detected in each frame of . We apply tracking to link the most similar events in each frame to a so-called reference event. This ref-
erence event is defined by the median width , the median
height and the median residual energy of the largest
event in each frame (median is computed frame-wise). Next, we
search in each frame for the event with the smallest normalized Euclidean distance w.r.t. the reference event shown in
(9) at the bottom of the next page, with the index of the event
in frame with the smallest normalized Euclidian distance to
the reference event. After this tracking step an object mask sequence is generated with at most one event in each frame, the one corresponding to the object giving rise to the reference event. Note that a frame can be empty if no event was detected.
Fig. 5. Threshold selection by the chord method is based upon finding the value of that maximizes the distance between the histogram and the chord. The value is used as threshold value.
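The chord (triangle) method of Zack et al. can be sketched in a few lines. This is an illustrative implementation under assumptions: the chord is drawn from the histogram peak to the far end of the longer tail, and the distance is the standard point-to-line distance (unnormalized, since only the argmax matters).

```python
import numpy as np

def triangle_threshold(hist):
    """Zack's chord method: draw a chord from the histogram peak to the end
    of its longer tail and return the bin index maximizing the distance
    between the histogram and the chord."""
    hist = np.asarray(hist, dtype=float)
    peak = int(np.argmax(hist))
    # choose the side with the longer tail
    end = len(hist) - 1 if (len(hist) - 1 - peak) >= peak else 0
    x0, y0, x1, y1 = peak, hist[peak], end, hist[end]
    xs = np.arange(min(x0, x1), max(x0, x1) + 1)
    # distance from each histogram point to the chord (up to a common factor)
    d = np.abs((y1 - y0) * xs - (x1 - x0) * hist[xs] + x1 * y0 - y1 * x0)
    return int(xs[np.argmax(d)])
```

On a residue histogram with a dominant near-zero peak and a thin tail of object pixels, the maximum-distance bin lands at the knee of the distribution, which is exactly the behaviour sketched in Fig. 5.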
C. Moving Object Registration

The object mask sequence , obtained after thresholding
and tracking, gives a rough quantized indication of the position
of the object in each frame. For performing SR reconstruction, a
more precise, subpixel registration is needed. For large moving
objects which contain a sufficient number of internal pixels with
sufficient structure, gradient-based registration [21] can be per-
formed. In the setting of small moving objects, this is usually
not the case and another approach is needed.
Assuming a linear motion model for a moving object in the
real-world, the projected model can be fitted to the sequence
of detected object positions. We assume a constant velocity
without acceleration in the real world, which seems realistic
given the nature of small moving objects: the objects are far
away from the observer and will have a small acceleration
within the frames due to the high frame rate of today's image sensors.
First, the position of the object in each frame is determined by computing the weighted center-of-mass (COM) of the
masked pixels as follows:
(10)
with the number of LR pixels in frame , the location
of pixel , the corresponding mask value (0 or 1) and
is the measured intensity.
To fit a trajectory, all object positions in time must be known
w.r.t. a reference point in the background of the scene. This is
done by adding the previously obtained apparent background
translation to the calculated object position for each frame:
.
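The weighted COM of eq. (10), referred to a fixed background reference point, can be sketched as follows. The function name and the (x, y) ordering of the returned position are assumptions made for illustration.

```python
import numpy as np

def object_position(frame, mask, bg_translation):
    """Weighted center-of-mass of the masked pixels (eq. (10) style),
    expressed relative to a fixed background reference point by adding the
    frame's apparent background translation."""
    ys, xs = np.nonzero(mask)
    w = frame[ys, xs]                         # intensities act as weights
    com = np.array([np.sum(w * xs), np.sum(w * ys)]) / np.sum(w)
    return com + np.asarray(bg_translation, dtype=float)
```

Weighting by intensity biases the position toward the brightest part of the detected blob, which is more stable than the unweighted mask centroid when the thresholded mask is noisy.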
To obtain all object positions with subpixel precision, a robust fit to the measured object positions is performed. Assuming constant motion, all object positions can be described by a reference object position and a translation . Both the reference
object position and the translation of the object are estimated by
minimizing the following cost function:
(11)
where denotes the Euclidean distance in LR pixels between
the measured object position and the estimated object position
at frame
(12)
The cost function in (11) is known as the Gaussian norm [22].
This norm is robust to outliers (e.g., false detections in our case).
The smoothing parameter is set to 0.5 LR pixel. Minimizing
the cost function in (11) with the Levenberg-Marquardt algorithm results in an accurate subpixel precise registration of the moving object. If, e.g., 50 frames are used, the registration precision is improved by a factor of 7.
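The robust trajectory fit can be sketched on synthetic detections. This is a minimal sketch, not the paper's code: the Gaussian-norm term 1 - exp(-d^2 / (2 sigma^2)) is assumed from the text, a small epsilon keeps the square root differentiable, and a general least-squares solver stands in for the paper's LM setup.

```python
import numpy as np
from scipy.optimize import least_squares

def fit_trajectory(positions, sigma=0.5):
    """Fit positions[k] ~ p0 + k * v by minimizing the Gaussian-norm cost
    sum_k (1 - exp(-d_k^2 / (2 sigma^2))), which saturates for outliers
    such as false detections. The solver squares its residuals, so each
    term's square root is returned."""
    positions = np.asarray(positions, dtype=float)
    k = np.arange(len(positions))

    def res(params):
        p0, v = params[:2], params[2:]
        d2 = np.sum((positions - (p0 + np.outer(k, v))) ** 2, axis=1)
        return np.sqrt(1.0 - np.exp(-d2 / (2.0 * sigma ** 2)) + 1e-12)

    fit = least_squares(res, x0=np.r_[positions[0], positions[1] - positions[0]])
    return fit.x[:2], fit.x[2:]
```

Because the per-frame cost saturates at 1, a grossly wrong detection contributes an almost constant term and barely pulls on the fitted line, unlike an ordinary least-squares fit.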
D. Computational Complexity

The computational complexity is dominated by calculating (3), i.e., computing the SR reconstruction of the HR foreground.
At every iteration of the LM optimization procedure, the cost
function has to be calculated for variations in the estimated pa-
rameters to estimate the gradient w.r.t. the parameters to be
solved. The cost function has to be evaluated
times, with the number of HR foreground intensities, the
number of vertices and # the number of LM iterations. A recon-
struction as described in Section IV-B ( , ,
) using Matlab code took 37 min on a Pentium-4, 3.2
GHz processor under Windows.
The processing time can be drastically reduced if a precomputation of the partial derivatives of the cost function w.r.t. the HR foreground intensities is performed off-line and stored. In this case, the computational complexity reduces to .
Note that typically thereby forecasting a reduction in
the computation time by one order of magnitude.
(9)
IV. EXPERIMENTS ON SIMULATED DATA
The proposed SR reconstruction method for small moving
objects is first applied to simulated data to study the behavior
under controlled conditions. In a series of experiments, we tune
the regularization parameters and the number of iterations. Then
we study the convergence, the robustness in the presence of clutter and noise, and the robustness against violations of the underlying linear motion model.
A. Generating the Simulated Car Sequence
The simulated car sequence was generated to resemble the real-world sequence of the next section as closely as possible.
We simulated an under-sampled image sequence containing a
small moving car using the camera model as depicted in Fig. 1.
The parameters of the camera model were chosen to match the
sensor properties of the real-world system, i.e., optical blurring
(Gaussian kernel with standard deviation LR pixel) and sensor blurring (rectangular uniform filter with a 100% fill-factor) and Gaussian distributed noise to resemble the actual noise conditions (see below). The car follows a linear motion
trajectory with zero acceleration. It consists of two internal intensities, which are both above the median background intensity. The low object intensity is exactly in between the median background intensity and the high object intensity. The
boundary of the car is modeled by a polygon with seven vertices.
Fig. 7(a) shows a HR image of the simulated car, which serves as
a ground-truth for all SR reconstruction results. Fig. 7(b) and (c)
show two LR image frames in which the car covers approximately 6 pixels. All 6 pixels are so-called mixed pixels and contain contributions of the fore- and background.
The image quality is further quantified by the signal-to-noise ratio (SNR) and the signal-to-clutter ratio (SCR). The SNR is a measure for the contrast between the object and the time-averaged local background compared to stochastic variations called
noise. The SNR is defined as
(13)
with the number of frames, the mean foreground intensity in frame and the mean local background intensity in frame . is calculated by taking the mean intensity of LR pixels that contain at least 50% foreground and is defined by the mean intensity of all 100% background pixels in a small neighborhood around the object.
The SCR is a measure for the contrast between the object and
the time-averaged local background compared to the variation
in the local background. The SCR is defined as
(14)
with the standard deviation of the local background in
frame . In the LR domain, the SNR is 29 dB and the SCR is 14 dB. These are realistic values, derived from the real-world image sequence of the next section.
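The two figures-of-merit can be computed as follows. This is a sketch under assumptions: since eqs. (13) and (14) are not legible in this copy, the frame-wise averaging of the absolute contrast and the 20 log10 dB convention are guesses consistent with the surrounding text.

```python
import numpy as np

def snr_scr_db(fg_means, bg_means, noise_std, bg_stds):
    """SNR and SCR in dB from frame-wise mean foreground/background
    intensities: object contrast vs. noise (SNR) and object contrast vs.
    background clutter (SCR)."""
    contrast = np.mean(np.abs(np.asarray(fg_means) - np.asarray(bg_means)))
    snr = 20.0 * np.log10(contrast / noise_std)
    scr = 20.0 * np.log10(contrast / np.mean(bg_stds))
    return snr, scr
```

The same contrast appears in both numerators; only the denominator changes, which is why a scene can have a high SNR yet a low SCR when the background is strongly cluttered.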
Fig. 6. NMSE between the SR result and the ground truth as a function of the regularization parameters and . Here both parameters are kept constant throughout all iterations in step 1 and step 2.

In the next subsections, different experiments on the simulated data are performed. For all experiments, 50 LR frames are used to estimate the HR foreground and 85 LR frames are used to estimate the HR background. In all reconstruction methods used, the zoom factor is set to 4 and the camera parameters are the same as in generating the simulated data.
B. Test 1: Tuning the Algorithm
Our algorithm contains several parameters such as the camera parameters, the regularization parameters, and a stopping criterion. Although the camera parameters such as the PSF and fill-factor can be estimated rather well, the regularization parameters and are far more difficult to tune. To study the influence of the regularization parameters on the final result and select the parameters for later use, a few experiments are performed on 50 LR frames of the simulated car sequence.
In this experiment, we study the influence of the regularization parameters and on the SR result for the simulated car sequence with a SNR of 29 dB and a SCR of 14 dB. Note that both regularization parameters are kept constant during both steps of the optimization procedure. We use the normalized mean squared error (NMSE) between the SR result of the car and its ground truth as a figure-of-merit. Note that this measure considers only the foreground intensities; the background intensities are set to zero
(15)
with the number of HR pixels, the estimated foreground
contributions using SR and the ground truth. Normalization
is done with the squared maximum value of .
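This figure-of-merit translates directly into code; a minimal NumPy sketch (the function name and toy arrays are illustrative assumptions):

```python
import numpy as np

def nmse(z_hat, z):
    """Normalized mean squared error over foreground intensities only."""
    n = z.size                                    # number of HR pixels
    return np.sum((z_hat - z) ** 2) / (n * z.max() ** 2)

z = np.array([1.0, 2.0, 4.0])      # toy ground-truth foreground intensities
z_hat = np.array([1.0, 2.0, 8.0])  # toy SR estimate
print(round(nmse(z_hat, z), 4))    # 0.3333
```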
From the result in Fig. 6 it can be seen that one of the two
regularization parameters has by far the largest influence on the
NMSE; it is therefore recommended to set this parameter with
care. The value of the other parameter is not critical.
In a broad range around these values, more than three to five
iterations in step 1 did not change the final result. After 10 to 15 iterations in step 2 the solution converged. Hence, we set the
2908 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 19, NO. 11, NOVEMBER 2010
Fig. 7. Four times SR reconstruction of a simulated under-sampled image sequence containing a small moving car. (a) HR image representing the scene serving
as ground truth; (b), (c) two typical LR frames (5 × 4 pixels) of the moving car; (d) 4× SR by a robust state-of-the-art method [18]; and (e) 4× SR by the proposed method.
maximum number of iterations in step 1 to five and in step 2 to
15.
C. Test 2: Comparison With a State-of-the-Art Pixel-Based
Technique
To assess the value of the proposed algorithm we compare it
with the visually best result obtained by a robust state-of-the-art
pixel-based SR technique [18]. Note that the registration is performed by the trajectory fitting technique of this paper (applied to 85
LR frames) to put both methods on equal footing. The state-of-the-art pixel-based SR result is shown in Fig. 7(d) and bears very
little resemblance to the ground truth. This is no surprise, since
the partial area effect at the boundary of the object, which affects all object pixels, is not accounted for.
Using the optimal regularization parameters in both steps, we performed an SR reconstruction with
the proposed method to exactly the same LR image sequence.
The result is depicted in Fig. 7(e) and shows a very good resem-
blance to the ground truth. Subtle changes along the boundary
and along the intensity transition are caused by partial area ef-
fects due to the random placement of the reconstructed object
w.r.t. the HR grid. The object boundary is approximated with 8
vertices, which is one more than used for constructing the data,
so the boundary is slightly over-fitted. Comparing the results in
Fig. 7(d) and (e) shows that the result of our proposed method
is clearly superior to the pixel-based method of Pham [18].
D. Test 3: Robustness in the Presence of Clutter and Noise
To investigate the robustness of our method under different
conditions, we varied 1) the clutter amplitude of the local back-
ground and 2) the noise level of the simulated car sequence de-
scribed in Section IV-A. The clutter of the background is varied
by multiplying the background with a certain factor after sub-
tracting the median intensity. Afterwards the median intensity
is added again to return to the original median intensity. The object intensities as well as the size and shape of the car remain
the same. All parameters that are used for the reconstruction are
set to the same values as in test 2 in Section IV-C.
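The clutter manipulation described above (scale the background about its median so the median intensity is preserved) can be sketched as follows; the function name and toy array are illustrative:

```python
import numpy as np

def scale_clutter(background, factor):
    """Scale background variation about the median, preserving the median intensity."""
    med = np.median(background)
    return (background - med) * factor + med

bg = np.array([1.0, 2.0, 3.0])
doubled = scale_clutter(bg, 2.0)  # clutter amplitude doubled
print(doubled.tolist())           # [0.0, 2.0, 4.0]
print(np.median(doubled))         # 2.0 (median unchanged)
```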
The quality of the different SR results is expressed by the
NMSE w.r.t. the ground truth as before. Fig. 8 depicts the
NMSE as a function of SNR and SCR. We divided the results into
three different categories: good, medium, and bad. For
each region a typical SR result is displayed to give a visual
impression of the performance. It is clear that the SR result in
the good region, obtained for values of the SNR and SCR that occur in practice, bears a good resemblance to the ground
truth. Note that the visible background in these pictures is not
used to calculate the NMSE. Fig. 8 shows that the performance
decreases for a decreasing SNR. Furthermore, the boundary
between the good and medium region indicates a decrease
in performance under high clutter conditions.
E. Test 4: Robustness Against Variations in Motion
The proposed method assumes that the object moves with a
constant speed and appears in all frames to be used for recon-
struction with the same aspect angle. To demonstrate the robustness of our method to violations of these assumptions, two experiments are performed. The first experiment determines
the robustness w.r.t. an acceleration of the object; the second
establishes the robustness w.r.t. scaling of the object.
We modified the simulated car sequence of Section IV-A.
In the first experiment an acceleration a, expressed in LR
pixel/frame², is added and contributes to the object position by
½at², with t the frame number. In the second experiment,
a scale factor, defined as the vehicle size in the last frame divided
by the vehicle size in the first frame, is added. A scale factor of 0.8 indicates that the observed length of the car varies from 3 LR pixels in the first frame
to 2.4 LR pixels in the last frame.
The NMSE as a function of acceleration and scaling is depicted in Fig. 9. Fig. 9(a) shows that a larger acceleration causes a larger error. An acceptable decrease of the performance is
Fig. 8. NMSE for the SR results of the simulated car sequence as a function of the SNR and SCR. We have roughly divided the space in three categories: good, medium, bad, and provided a typical SR result for each category.
Fig. 9. NMSE for the SR results of the simulated car sequence as a function of (a) acceleration and (b) object scaling.
Fig. 10. Top view of the acquisition geometry to capture the real-world data.
obtained for accelerations smaller than 0.001 LR pixel/frame².
The error of a constant velocity model fitted to a constant acceleration motion will follow a parabolic model. This parabola
will be symmetric and has a top to end point difference of
½a(T/2)² = aT²/8. From the mid-point between its top and an
end point we get a maximum error of aT²/16; with
a = 0.001 LR pixel/frame² and T = 50 frames this gives a maximum translational
error of 0.16 LR pixel.
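The 0.16 LR pixel figure can be checked numerically. The sketch below fits a constant-velocity line minimax-style (splitting the endpoint-chord residual evenly) to a constant-acceleration trajectory, assuming a = 0.001 LR pixel/frame² and T = 50 frames as above:

```python
import numpy as np

a, T = 0.001, 50.0            # acceleration (LR pixel/frame^2) and sequence length (frames)
t = np.linspace(0.0, T, 501)  # dense time axis including the midpoint
pos = 0.5 * a * t ** 2        # constant-acceleration trajectory

chord = pos[0] + (pos[-1] - pos[0]) * t / T  # line through the endpoints
resid = chord - pos                          # symmetric parabolic residual, max a*T^2/8
best_line = chord - resid.max() / 2.0        # shift to split the error evenly (minimax fit)

max_err = np.abs(best_line - pos).max()      # = a*T^2/16
print(round(max_err, 3))                     # 0.156
```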
For the second experiment Fig. 9(b) shows that a maximum
scaling of 15% is allowed with an acceptable performance loss.
This is a 7.5% maximum scale change from a mean scale. For a
3-pixel size object this translates to a maximum pixel shift error
of 1.5 × 7.5% ≈ 0.11 LR pixel for both the front and back object edges compared to its center of mass position.
Note that both experiments have well-comparable maximum
position errors of 0.16 and 0.11 LR pixel, rather consistent with
the requirement that the registration error for SR should at least
be smaller than half the HR pixel pitch. This can easily be deduced from the argument below. Critical sampling of bandlimited signals can be modeled by a Gaussian low-pass filter followed by sampling with a sampling pitch of 1.1 times the standard deviation of the Gaussian PSF [23]. In [21], we showed that
Gaussian noise in the LR image sequence leads to Gaussian distributed registration estimates. These registration errors act as an
additional blur, even for sequences of infinite length [24]. If the
standard deviation of this registration-error induced image blur
is substantially (say two times) smaller than the optical image
blur it will not affect the image quality after SR.
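The "two times smaller" rule of thumb follows because independent Gaussian blurs combine in quadrature; a quick numeric check with an illustrative optical blur of σ = 1.0 HR pixel:

```python
import numpy as np

sigma_psf = 1.0              # optical image blur (illustrative value, HR pixels)
sigma_reg = sigma_psf / 2.0  # registration-error blur at the two-times-smaller limit
sigma_total = np.hypot(sigma_psf, sigma_reg)  # Gaussian stds add in quadrature

print(round(sigma_total / sigma_psf, 3))      # 1.118 -> effective blur widens only ~12%
```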
V. EXPERIMENT ON REAL-WORLD DATA
To demonstrate the potential of the proposed method under
realistic conditions we applied it to a real-world image se-
quence. Real-world data permits us to study the impact of
changes in object intensities caused by variations in reflection, lens aberrations, small changes in aspect angle of the object
Fig. 11. Four times SR reconstruction of a vehicle captured by an infrared camera (50 frames) at a large distance: (a) and (c) show the LR captured data; (b) and (d) show the SR reconstruction result obtained by the proposed method. (a) LR reference frame (64 × 64 pixels); (b) SR with zoom factor 4; (c) close-up of moving object in (a); and (d) close-up of moving object in (b).
along the trajectory, and practical violations of the linear motion
assumption.
The data for this experiment is captured with an infrared
camera (a Radiance 1T from Amber). The sensor is composed
of an indium antimonide (InSb) detector with 256 × 256 pixels
sensitive in the 3–5 μm wavelength band. Furthermore, we use optics
with a focal length of 50 mm and a viewing angle of 11.2° (also
from Amber). We captured a vehicle (Jeep Wrangler)
at 15 frames/second, driving with a constant velocity (≈1 pixel/frame apparent velocity) approximately perpendicular to
the optical axis of the camera. A top view of the acquisition
geometry is depicted in Fig. 10. During image capture, the
platform of the camera was gently shaken to provide subpixel
motion of the camera. Panning was used to keep the moving
vehicle within the field of view of the camera.
We selected the distance such that the vehicle appeared small
(covering approximately 5 × 2 LR pixels in area) in the image plane.
Fig. 11(a) shows a typical LR frame (64 × 64 pixels). A close-up
of the vehicle is depicted in Fig. 11(c). The vehicle is driving
from left to right at a distance of approximately 1150 meters.
The SNR of the vehicle with respect to the background is 30 dB and the
SCR is 13 dB. In the simulation experiments, we have shown
that for these values our method is capable of delivering good
reconstruction performance. Fig. 11(b) shows the result after
applying our SR reconstruction method with a close-up of the
car in Fig. 11(d).
The HR background is reconstructed from 85 frames with
zoom factor 4. The camera blur is modeled by Gaussian optical
blurring, followed by uniform rectangular sensor
blurring (100% fill-factor). The HR foreground is reconstructed
from 50 frames with zoom factor 4 with the same camera parameters. The object boundary is approximated with 12 vertices,
and during the reconstruction the same regularization settings
are used in both step 1 and step 2.
Note that much more detail is visible in the SR result than in
the LR image. The shape of the vehicle is very well pronounced
and the hot engine of the vehicle is well visible. For comparison
we display in Fig. 12 the SR result next to a captured image of
the vehicle at a 4× shorter distance. Please be aware that the
intensity mapping is not the same for both images. So a grey
level in Fig. 12(a) may not be compared with the same grey
level in Fig. 12(b). Notice that Fig. 12(b) was captured at a later
time. Differences in environmental conditions (position of the sun, clouds, etc.), heating of the engine and vehicle as well as
Fig. 12. SR result with zoom factor 4 of a jeep in (a) compared with the same jeep captured at a 4× shorter distance (b). (a) 4× SR result. (b) Object 4× closer to camera.
the pose of the vehicle contribute to the observed differences be-
tween the two images. The shape of the vehicle is reconstructed
very well and the hot engine is located at a similar place.
VI. CONCLUSION
This paper presents a method for SR reconstruction of
small moving objects. The method explicitly models the fore-
and background contribution to the partial area effect of the
boundary pixels. The main novelty of the proposed SR recon-
struction method is the use of a combined object boundary
and intensity description of the target object. This enables us
to simultaneously estimate the object boundary with subpixel
precision and the foreground intensities from the boundary
pixels subject to a modified total variation constraint. This
modification permits the use of the Levenberg–Marquardt
algorithm for optimizing the cost function. This method is
known to converge to the global optimum for a well-behaved cost function and an initial estimate that is not too far away.
The proposed multiframe SR reconstruction method clearly
improves the visual recognition of small moving objects under
realistic imaging conditions in terms of SNR and SCR. We
showed that our method performs well in reconstructing a
small moving object where a state-of-the-art pixel-based SR
reconstruction method [18] fails. The robustness against de-
teriorations such as clutter and noise as well as violations of
the linear motion model was established. Our method not only
performs well on simulated data, but also provides an excellent
result on a real-world image sequence captured with an infrared
camera.
REFERENCES
[1] R. Y. Tsai and T. S. Huang, "Multiframe image restoration and registration," in Advances in Computer Vision and Image Processing. Greenwich, CT: JAI Press, 1984, vol. 1, pp. 317–339.
[2] M. Ben-Ezra, A. Zomet, and S. K. Nayar, "Video super-resolution using controlled subpixel detector shifts," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 6, pp. 977–987, Jun. 2005.
[3] A. W. M. van Eekeren, K. Schutte, J. Dijk, D. J. J. de Lange, and L. J. van Vliet, "Super-resolution on moving objects and background," in Proc. IEEE 13th Int. Conf. Image Process., 2006, vol. 1, pp. 2709–2712.
[4] A. W. M. van Eekeren, K. Schutte, and L. J. van Vliet, "Super-resolution on small moving objects," in Proc. IEEE 15th Int. Conf. Image Process., 2008, vol. 1, pp. 1248–1251.
[5] P. E. Eren, M. I. Sezan, and A. M. Tekalp, "Robust, object-based high resolution image reconstruction from low-resolution video," IEEE Trans. Image Process., vol. 6, no. 10, pp. 1446–1451, Oct. 1997.
[6] S. Farsiu, M. Elad, and P. Milanfar, "Video-to-video dynamic super-resolution for grayscale and color sequences," J. Appl. Signal Process., pp. 1–15, 2006.
[7] R. C. Hardie, T. R. Tuinstra, J. Bognart, K. J. Barnard, and E. E. Armstrong, "High resolution image reconstruction from digital video with global and non-global scene motion," in Proc. IEEE 4th Int. Conf. Image Process., 1997, vol. 1, pp. 153–156.
[8] F. W. Wheeler and A. J. Hoogs, "Moving vehicle registration and super-resolution," in Proc. IEEE Appl. Imagery Pattern Recognit. Workshop, 2007, pp. 101–107.
[9] M. Irani and S. Peleg, "Improving resolution by image registration," Graph. Models Image Process., vol. 53, pp. 231–239, 1991.
[10] A. J. Patti, M. I. Sezan, and A. M. Tekalp, "Superresolution video reconstruction with arbitrary sampling lattices and nonzero aperture time," IEEE Trans. Image Process., vol. 6, no. 8, pp. 1064–1076, Aug. 1997.
[11] R. J. M. den Hollander, D. J. J. de Lange, and K. Schutte, "Super-resolution of faces using the epipolar constraint," in Proc. British Mach. Vis. Conf., 2007, pp. 1–10.
[12] J. Wu, M. Trivedi, and B. Rao, "High frequency component compensation based super-resolution algorithm for face video enhancement," in Proc. IEEE 17th Int. Conf. Pattern Recognit., 2004, vol. 3, pp. 598–601.
[13] J. Starck, E. Pantin, and F. Murtagh, "Deconvolution in astronomy: A review," Pub. Astron. Soc. Pacific, no. 114, pp. 1051–1069, 2002.
[14] K. Schutte, D. J. J. de Lange, and S. P. van den Broek, "Signal conditioning algorithms for enhanced tactical sensor imagery," in Proc. SPIE: Infrared Imag. Syst.: Design, Anal., Model., and Testing XIV, 2003, vol. 5076, pp. 92–100.
[15] S. Farsiu, M. D. Robinson, M. Elad, and P. Milanfar, "Fast and robust multi-frame super resolution," IEEE Trans. Image Process., vol. 13, no. 10, pp. 1327–1344, Oct. 2004.
[16] J. J. Moré, "The Levenberg–Marquardt algorithm: Implementation and theory." New York: Springer-Verlag, 1978, vol. 630, pp. 105–116.
[17] J. Dijk, A. W. M. van Eekeren, K. Schutte, D. J. J. de Lange, and L. J. van Vliet, "Super-resolution reconstruction for moving point target detection," Opt. Eng., vol. 47, no. 8, 2008.
[18] T. Q. Pham, L. J. van Vliet, and K. Schutte, "Robust fusion of irregularly sampled data using adaptive normalized convolution," J. Appl. Signal Process., vol. 2006, pp. 1–12, 2006.
[19] A. Zomet, A. Rav-Acha, and S. Peleg, "Robust super-resolution," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2001, vol. 1, pp. 645–650.
[20] G. W. Zack, W. E. Rogers, and S. A. Latt, "Automatic measurement of sister chromatid exchange frequency," J. Histochem. Cytochem., vol. 25, no. 7, pp. 741–753, 1977.
[21] T. Q. Pham, M. Bezuijen, L. J. van Vliet, K. Schutte, and C. L. L. Hendriks, "Performance of optimal registration estimators," in Proc. Vis. Inf. Process. XIV, 2005, vol. 5817, pp. 133–144.
[22] J. van de Weijer and R. van den Boomgaard, "Least squares and robust estimation of local image structure," Int. J. Comput. Vis., vol. 64, no. 2–3, pp. 143–155, 2005.
[23] P. Verbeek and L. van Vliet, "On the location error of curved edges in low-pass filtered 2-D and 3-D images," IEEE Trans. Pattern Anal. Mach. Intell., vol. 16, no. 7, pp. 726–733, Jul. 1994.
[24] T. Q. Pham, L. J. van Vliet, and K. Schutte, "Influence of signal-to-noise ratio and point spread function on limits of super-resolution," in Proc. SPIE Image Process.: Algorithms Syst. IV, 2005, vol. 5672, pp. 169–180.
Adam W. M. van Eekeren (S'00–M'02) received the M.Sc. degree from the Department of Electrical Engineering, Eindhoven University of Technology, The Netherlands, in 2002, and the Ph.D. degree from the Electro-Optics Group within TNO Defence, Security, and Safety, The Hague, in collaboration with the Quantitative Imaging Group at the Delft University of Technology, The Netherlands, in 2009.
He did his graduation project within Philips Medical Systems on the topic of image enhancement using morphological operators. Subsequently, he worked for one year at the Philips Research Laboratory on image segmentation using level sets. He currently works as a Research Scientist at the Electro-Optics Group, TNO Defence, Security, and Safety, on image improvement, change detection, and 3-D reconstruction. His research interests include image restoration, super-resolution, image quality assessment, and object detection.
Klamer Schutte received the M.Sc. degree in physics from the University of Amsterdam in 1989 and the Ph.D. degree from the University of Twente, Enschede, The Netherlands, in 1994.
He held a Post-Doctoral position with the Delft University of Technology's Pattern Recognition (now Quantitative Imaging) group. Since 1996, he has been employed by TNO, currently as Senior Research Scientist Electro-Optics within the Business Unit Observation Systems. Within TNO he has actively led multiple projects in the areas of signal and image processing. Recently, he has led many projects, including super-resolution reconstruction for both international industries and governments, resulting in super-resolution reconstruction based products in active service. His research interests include pattern recognition, sensor fusion, image analysis, and image restoration. He is Secretary of the NVBHPV, The Netherlands branch of the IAPR.
Lucas J. van Vliet (M'02) studied applied physics and received the Ph.D. degree (cum laude) from the Delft University of Technology, Delft, The Netherlands, in 1993.
He was appointed Full Professor in multidimensional image analysis in 1999. Since 2009, he has been Director of the Delft Health Initiative, head of the Quantitative Imaging Group, and chairman of the Department of Imaging Science & Technology. He was president (2003–2009) of the Dutch Society for Pattern Recognition and Image Analysis (NVPHBV) and sits on the board of the International Association for Pattern Recognition (IAPR) and the Dutch graduate school on Computing and Imaging (ASCI). He supervised 25 Ph.D. theses and is currently supervising 10 Ph.D. students. He was a visiting scientist at Lawrence Livermore National Laboratories (1987), University of California San Francisco (1988), Monash University Melbourne (1996), and Lawrence Berkeley National Laboratories (1996). He has a track record in fundamental as well as applied research in the field of multidimensional image processing, image analysis, and image recognition; (co)author of 200 papers and four patents.
Prof. van Vliet was awarded the prestigious talent research fellowship of theRoyal Netherlands Academy of Arts and Sciences (KNAW) in 1996.