
    IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 19, NO. 11, NOVEMBER 2010 2901

Multiframe Super-Resolution Reconstruction of Small Moving Objects

    Adam W. M. van Eekeren, Member, IEEE, Klamer Schutte, and Lucas J. van Vliet, Member, IEEE

Abstract: Multiframe super-resolution (SR) reconstruction of small moving objects against a cluttered background is difficult for two reasons: a small object consists completely of mixed boundary pixels, and the background contribution changes from frame to frame. We present a solution to this problem that greatly improves recognition of small moving objects under the assumption of a simple linear motion model in the real world. The presented method not only explicitly models the image acquisition system, but also the space-time variant fore- and background contributions to the mixed pixels. The latter is due to a changing local background as a result of the apparent motion. The method simultaneously estimates a subpixel precise polygon boundary as well as a high-resolution (HR) intensity description of a small moving object subject to a modified total variation constraint. Experiments on simulated and real-world data show excellent performance of the proposed multiframe SR reconstruction method.

Index Terms: Boundary description, moving object, partial area effect, super-resolution (SR) reconstruction.

    I. INTRODUCTION

IN SURVEILLANCE applications, the most interesting events are dynamic events consisting of changes occurring in the scene, such as moving persons or moving objects. In this paper, we focus on multiframe super-resolution (SR) reconstruction of small moving objects in under-sampled image sequences. Small objects are objects that are completely comprised of boundary pixels. Each boundary pixel is a mixed pixel, and its value has contributions of both the moving foreground object and the locally varying background. Hence, not only do the fractions change from frame to frame, but also the local background values change due to the apparent motion. Especially for small moving objects, an improvement in resolution is useful to permit classification or identification.

Manuscript received November 25, 2008; revised April 24, 2010; accepted April 24, 2010. Date of publication August 19, 2010; date of current version October 15, 2010. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Michael Elad.

A. W. M. van Eekeren is with the Electro Optics Group at TNO Defence, Security, and Safety, The Hague, The Netherlands. He is also with the Quantitative Imaging Group, Delft University of Technology, Delft, The Netherlands (e-mail: [email protected]).

K. Schutte is with the Electro Optics Group at TNO Defence, Security, and Safety, The Hague, The Netherlands (e-mail: [email protected]).

L. J. van Vliet is with the Quantitative Imaging Group at Delft University of Technology, Delft, The Netherlands (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

    Digital Object Identifier 10.1109/TIP.2010.2068210

Multiframe SR reconstruction1 improves the spatial resolution by exchanging temporal information of a sequence of subpixel displaced low-resolution (LR) images for spatial information. Although the concept of SR reconstruction has existed for more than 20 years [1], relatively little attention has been given to SR reconstruction of moving objects. In [2]-[8], this subject was addressed for various dedicated tasks.

Although [2] and [5] apply different SR reconstruction methods, i.e., iterative back-projection [9] and projection onto convex sets [10], respectively, both use a validity map in their reconstruction process. This makes these methods robust to motion outliers. Both methods perform well on large moving objects that obey a simple translational motion model. For large objects, only a small fraction of the pixels are boundary pixels. Hardie et al. [7] use optical flow to segment a moving object and subsequently apply SR reconstruction to it. In their work, the background is static and SR reconstruction is only applied to the masked area inside a large moving object. In [6], Kalman filters are used to reduce edge artifacts at the boundary between fore- and background. However, the fore- and background are not explicitly modeled in this method.

In previous work [3], we presented a system that applies SR reconstruction after a segmentation step simultaneously to a large moving object and the background using Hardie's method [7]. Again, no SR reconstruction is applied to the boundary of mixed pixels separating the moving object from a cluttered background. In [4], we presented the first attempt at SR reconstruction of small moving objects with simulated data. At that time, no experiments were done on real-world data, which lifted the need for a very precise estimate of the object's trajectory. In [8], SR reconstruction is performed on moving vehicles of approximately 10 by 20 pixels. For object registration, a trajectory model is used in combination with a consistency measure of the local background and vehicle. However, in the SR reconstruction approach no attention is given to mixed pixels.

An interesting subset of moving objects consists of faces. Efforts in that area using SR reconstruction include [11] and [12], in which the modeling of complex motion is a key element. However, the faces in the LR input images used are far larger than the small objects that we focus on in this paper. SR reconstruction on moving objects is also applied in astronomy. An overview can be found in [13], where it is explained that SR reconstruction is only possible under the condition that the solution is very sparse, i.e., very few samples have a value larger than zero. In contrast, our SR reconstruction method is designed to handle nonzero cluttered backgrounds.

1In the remainder of this paper, SR reconstruction refers to multiframe SR reconstruction.

    1057-7149/$26.00 2010 IEEE


Fig. 1. Flow diagram illustrating the construction of a 2-D HR image representing the camera's field-of-view and the degradation thereof into a LR frame via a camera model.

For small moving objects that consist completely of mixed pixels against a cluttered background, the state-of-the-art pixel-based SR reconstruction methods mentioned previously will fail. Pixel-based SR reconstruction methods make an error at the object boundary, because they cannot disentangle the contributions from the space-time variant background and foreground information within a mixed pixel. To tackle the aforementioned problem, we incorporate a subpixel precise object boundary model with a high-resolution (HR) pixel grid. We simultaneously estimate this polygonal object boundary as well as a HR intensity description of a small moving object subject to a modified total variation constraint. Assuming rigid objects that move with constant speed through the real world, object registration is achieved by fitting a trajectory through the object's center-of-mass in each frame. The approach assumes that a HR background image is estimated first. Robust SR reconstruction methods can accomplish this: they treat the intensity fluctuations after global registration caused by the small moving object as outliers. Especially for small moving objects, our approach significantly improves object recognition. Note that the use of the proposed SR reconstruction method is not limited to small moving objects. It can also be used to improve the resolution of boundary regions of larger moving objects, as long as the size of the object does not prohibit proper SR reconstruction of the background.

The paper is organized as follows. First, in Section II we present the forward model for a simulated HR scene and the observed LR image data by an electro-optical sensor system. In Section III, the three steps of the proposed SR reconstruction method for small moving objects are presented. Section IV presents experiments on simulated data, followed by a real-world experiment in Section V. Finally, in Section VI the main conclusions are presented.

    II. FORWARD MODEL: REAL-WORLD DATA DESCRIPTION

This section describes the two steps of our forward model to construct a LR camera frame from HR representations of the fore- and background in combination with a subpixel precise polygon model of our object. The first step models the construction of a 2-D HR image including the moving object, whereas the second step models the image degradation as a result of the physical properties of our camera system.

    A. 2-D HR Scene

We model a camera's field-of-view (the scene) at frame as a properly sampled 2-D HR image . Each frame consists of pixels without significant degradation due to motion, blur, or noise. Let us express this image in lexicographical notation as the vector . The image is constructed from a translated HR background intensity description, consisting of pixels, and a translated HR foreground intensity description, consisting of pixels. This is depicted in the left part of Fig. 1. Note that the foreground has a different apparent motion with respect to the camera than the background .

The small moving object in the foreground is represented not only by its HR intensity description , but also by a subpixel precise polygon boundary , with the number of vertices. We impose the following assumptions on the motion of the object: 1) the aspect angle (the angle between the direction of motion and the optical axis of the camera) stays the same, and 2) the object is moving with a constant velocity, i.e., the acceleration is zero. These are realistic assumptions if the object is far away from the camera and for a short duration of up to a few seconds. The latter does not limit the acquisition of a large number of LR frames, due to the high frame rate of today's image sensors.

At frame , the HR background and the HR foreground are translated and merged into the 2-D HR image , in which the th pixel is defined by

(1)

with and . Here, is the number of frames. The summation over represents the translation of foreground pixel to by bilinear interpolation and, similarly, the summation over translates background pixel to . The weight represents the foreground contribution at pixel in frame , depending upon the polygon boundary . The foreground contribution varies between 0 and 1, so the corresponding background contribution equals by definition.

Fig. 2 depicts the construction of the th HR image by masking both the translated background and the translated foreground, after which the constituents are merged into . The polygon boundary defines the foreground contributions and the background contributions in HR frame .

Fig. 2. Flow diagram illustrating the masking of foreground and background constituents and the merging thereof into the HR image . The polygon boundary is superimposed on the background contributions for visualization purposes only. Note that in the weight images, black indicates no contribution, white indicates full contribution, and greys indicate a partial contribution.
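The merging of fore- and background can be illustrated with a small sketch: each HR pixel is a convex combination of the translated foreground and background values, weighted by the foreground fraction that the polygon boundary assigns to that pixel. The function name and array-based form are illustrative, not the paper's implementation.

```python
import numpy as np

def composite(background, foreground, weights):
    """Per-pixel mix of fore- and background. The weight w in [0, 1] is
    the foreground fraction of a (possibly mixed) pixel, as determined
    by the polygon boundary, so the background contributes 1 - w by
    definition."""
    w = np.clip(weights, 0.0, 1.0)
    return w * foreground + (1.0 - w) * background
```

A pixel with a 25% foreground fraction thus receives 25% of the foreground intensity and 75% of the local background intensity, which is exactly what makes mixed pixels ambiguous for pixel-based SR methods.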

    B. Camera Model

A LR camera frame is obtained by applying the camera model to the 2-D HR image representing the camera's field-of-view. The camera model comprises two types of image blur, sampling, and degradation by noise.

Blur: The optical point-spread-function (PSF), together with the sensor PSF, will cause a blurring in the image plane. In this paper, the optical blur is modeled by a Gaussian function with standard deviation . The sensor blur is modeled by a uniform rectangular function representing the fill-factor of each sensor element. A convolution of both functions represents the total blurring function.

Sampling: The sampling as depicted in Fig. 1 reflects the pixel pitch only. The integration of photons over the photosensitive area of a pixel is accounted for by the aforementioned sensor blur.

Noise: The temporal noise in the recorded data is modeled by additive, independent and identically distributed Gaussian noise samples with standard deviation . For the recorded data used, independent additive Gaussian distributed noise is a sufficiently accurate noise model. Other types of noise, like fixed pattern noise (FPN) and bad pixels, are not explicitly modeled. For applications where FPN becomes a hindrance, it is advised to correct the captured data prior to SR reconstruction using a scene-based nonuniformity correction algorithm, such as the one proposed in [14].

All in all, the observed th LR pixel from frame is modeled as follows:

(2)

for and . Here, denotes the number of LR pixels in . The weight represents the contribution of HR pixel to estimated LR pixel . Each contribution is determined by the blurring and sampling of the camera. represents an additive, independent and identically distributed Gaussian noise sample with standard deviation .
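A minimal sketch of this camera model, under simplifying assumptions: the optical blur is a separable Gaussian, the sensor blur and the sampling are combined into block averaging over the zoom factor (a 100% fill-factor box PSF followed by decimation), and i.i.d. Gaussian noise is added. Function names and parameters are illustrative.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()

def degrade(hr, zoom=4, sigma_psf=2.0, noise_std=0.0, rng=None):
    """Degrade a HR scene into one LR frame: separable Gaussian optical
    blur, then block averaging over zoom x zoom HR pixels (sensor blur
    with 100% fill-factor plus sampling), then additive Gaussian noise."""
    k = gaussian_kernel(sigma_psf, radius=int(3 * sigma_psf))
    blurred = np.apply_along_axis(lambda r: np.convolve(r, k, mode='same'), 1, hr)
    blurred = np.apply_along_axis(lambda c: np.convolve(c, k, mode='same'), 0, blurred)
    h, w = blurred.shape
    lr = blurred[:h - h % zoom, :w - w % zoom]
    lr = lr.reshape(h // zoom, zoom, w // zoom, zoom).mean(axis=(1, 3))
    if noise_std > 0:
        rng = rng or np.random.default_rng(0)
        lr = lr + rng.normal(0.0, noise_std, lr.shape)
    return lr
```

Because the kernel is normalized and the block mean preserves intensity, a flat scene stays flat away from the image border, which is a quick sanity check for any implementation of the forward model.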

    III. DESCRIPTION OF PROPOSED METHOD

The proposed SR reconstruction method can be divided into three parts: 1) applying SR reconstruction to the background for subsequent detection of moving objects from the residue between the observed LR frame and a simulated LR frame based upon the estimated HR background at that instance; 2) fitting a trajectory model to the detected instances of the moving object through the image sequence to obtain subpixel precise object registration; and 3) obtaining a HR object representation (comprised of a subpixel precise boundary and a HR intensity description) by solving an inverse problem based upon the model of Section II. We start with the third step, because it is the key innovative part of the proposed method.

    A. SR Reconstruction of a Small Moving Object

To find the optimal HR description of the object (consisting of a polygon boundary and a HR intensity description ), we solve an inverse problem based upon the camera observation model described in (1) and (2). To favor sparse solutions of this ill-posed problem, we added two regularization terms: one to penalize intensity transitions in the HR intensity description and one to avoid unrealistically wild object shapes. These observations give rise to the following cost function:

    (3)

where the first summation term represents the normalized data misfit contributions for all pixels . Normalization is performed with respect to the total number of LR pixels and the noise variance . Here, denotes the measured intensities of the observed LR pixels and the corresponding estimated intensities obtained using the forward model of Section II. Although the estimated intensities are also dependent upon the background , only and are varied to minimize (3). The HR background is estimated in advance, as described in Section III-B.

The second term of the cost function is a regularization term which favors sparse solutions by penalizing the amount of intensity variation within the object according to a criterion similar to the bilateral total variation (BTV) criterion [15]. Here, is the shift operator that shifts by pixels in the horizontal direction, whereas shifts by pixels in the vertical direction.

The actual minimization of the cost function is done in an iterative way by the Levenberg-Marquardt (LM) algorithm [16]. This optimization algorithm assumes that the cost function has a first derivative that exists everywhere. However, the L1-norm used in the TV criterion does not satisfy this assumption. Therefore, we introduce the hyperbolic norm

    (4)

This norm has the same properties as the L1-norm for large values, and it has a first (and second) derivative that exists everywhere. For all experiments the value is used.
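A common smooth surrogate for the L1-norm with these properties is sqrt(x^2 + eps^2); a sketch is below. The smoothing constant `eps` here is a hypothetical choice for illustration, since the paper's value is not reproduced in this text.

```python
import numpy as np

def hyperbolic_norm(x, eps=0.1):
    """Smooth stand-in for |x|: behaves like the L1-norm when
    |x| >> eps, but is differentiable everywhere, in particular at 0,
    where |x| is not."""
    return np.sqrt(x**2 + eps**2)

def hyperbolic_grad(x, eps=0.1):
    # derivative x / sqrt(x^2 + eps^2): bounded, zero at x = 0
    return x / np.sqrt(x**2 + eps**2)
```

The bounded, everywhere-defined gradient is what makes this norm compatible with gradient-based optimizers such as Levenberg-Marquardt.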

The third term of (3) constrains the shape of the polygon by penalizing the variation of the polygon boundary . Regularization is needed to penalize unwanted protrusions, such as spikes, which cover a very small area compared to the total object area. This constraint is embodied by the measure , which is small when the polygon boundary is smooth

with (5)

is the inverse of , which is the area spanned by the edges ( and ) at vertex and half the angle between those edges, as indicated by the right part of (5).

From example (a) in Fig. 3 it is clear why the area is calculated with half the angle : if we would take the full angle , would be zero, which would result in . Example (b) shows that the measure will be very large for small angles, i.e., sharp protrusions. Note that this measure also becomes very large for (an inward pointing spike).

Fig. 3. Two examples to illustrate the expression for polygon regularization at vertex of polygon . (a) is minimal for , (b) is maximal for .
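A geometric sketch of this per-vertex measure, under the assumption that the two edges span a triangle whose area uses half the oriented vertex angle (the exact symbols of (5) are not recoverable from this text, so names and conventions below are illustrative):

```python
import numpy as np

def vertex_penalty(prev_v, v, next_v):
    """Inverse of the area spanned by the two edges at a vertex,
    computed with HALF the oriented angle between them. The penalty is
    smallest for a flat vertex (theta = pi) and blows up for both
    outward (theta -> 0) and inward (theta -> 2*pi) spikes, matching
    the behavior described for Fig. 3."""
    e1 = np.asarray(prev_v, float) - np.asarray(v, float)
    e2 = np.asarray(next_v, float) - np.asarray(v, float)
    dot = e1 @ e2
    cross = e1[0] * e2[1] - e1[1] * e2[0]
    theta = np.arctan2(cross, dot) % (2 * np.pi)  # oriented angle in [0, 2*pi)
    area = 0.5 * np.linalg.norm(e1) * np.linalg.norm(e2) * np.sin(theta / 2)
    return 1.0 / area
```

Using the full angle instead of the half angle would give zero area (and an infinite penalty) for a flat vertex, which is exactly the degeneracy the half angle avoids.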

Note that in (3), normalization is performed on by a multiplication with the square of the mean edge length , with the number of vertices and the total edge length of . This normalization prevents extensive growth of edges.

As mentioned previously, the actual minimization of the cost function is performed in an iterative way by the Levenberg-Marquardt algorithm [16]. To allow this, we put the cost function of (3) in the LM framework, which expects a format like , where is the measurement and is the estimate depending upon parameter . In general, it is straightforward to store all residues, for example , in a vector which forms the input of the LM algorithm. In our case, we have to be aware of the different norms in each of the terms of (3). The residue vector looks like

    (6)

where the letters on top indicate the number of elements used in each part of the cost function. The length of the vector in (6) is .

The cost function in (3) is iteratively minimized to simultaneously find the optimal and . A flow diagram of this iterative minimization procedure in steady state is depicted in Fig. 4. Here the cost function refers to (3) and the camera model to formulas (1) and (2). Note that the measured data used for the minimization procedure contains only a small region-of-interest (ROI) around the moving object in each frame.

The optimization scheme depicted in Fig. 4 has to be initialized with an object boundary and an object intensity description . These can be obtained in several ways; we have chosen to use a simple and robust initialization that proved to initialize the method close enough to the global minimum to permit convergence to the global minimum in most practical cases.

Fig. 4. Flow diagram illustrating the steady state of estimating a HR description of a moving object ( and ). denotes the measured intensities in a region of interest containing the moving object in all frames after registration, and denotes the corresponding estimated intensities at iteration . Note that the initial HR object description ( and ) is derived from the measured LR sequence and the object mask sequence.

The initial object boundary is obtained by first calculating the frame-wise median width and the frame-wise median height of the mask in the object mask sequence (defined in the next section). Subsequently, we construct an elliptical object boundary from the previously calculated width and height. Upon initialization, the vertices are evenly distributed over the ellipse. The number of vertices is fixed during minimization. The object intensity distribution is initialized by a constant intensity equal to the median value over all masked pixel intensities in the measured LR sequence .
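The elliptical initialization can be sketched as follows; the function name, vertex count, and centering are illustrative choices, not the paper's exact values.

```python
import numpy as np

def init_ellipse(median_w, median_h, n_vertices=16, center=(0.0, 0.0)):
    """Initial object boundary: n_vertices evenly distributed over an
    axis-aligned ellipse whose axes are the frame-wise median width and
    height of the detection mask. The vertex count stays fixed during
    the subsequent minimization."""
    t = 2 * np.pi * np.arange(n_vertices) / n_vertices
    cx, cy = center
    return np.stack([cx + 0.5 * median_w * np.cos(t),
                     cy + 0.5 * median_h * np.sin(t)], axis=1)
```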

Furthermore, the optimization procedure is performed in two steps. The first step consists of the initialization described previously, followed by a few iterations of the LM algorithm. We found during experimentation that using more than five iterations has no effect on the final result.

After this step, the intensity description often contains large gradients perpendicular to the estimated object boundary, where pixels outside the contour still contain the initialization values. As this can cause the optimization to get stuck in local minima, a partial reinitialization step is proposed. In this step, all intensities of HR foreground pixels adjacent to a mixed boundary pixel but located completely inside the object boundary are propagated outwards. After this partial reinitialization, we continue the iterative procedure until convergence, or for a fixed number of iterations to be determined in a simulation experiment.

B. SR Reconstruction of Background and Moving Object Detection

A small moving object causes a temporary change of a small, localized set of pixel intensities. In previous work [17], we presented a framework for the detection of moving point targets against a static cluttered background. A robust pixel-based SR reconstruction method computes a HR background image by treating the local intensity variations caused by the small object as outliers. After registration of the HR background to a recorded LR frame, we apply the camera model to simulate the LR frame with identical aliasing artifacts as in the recorded LR frame, but without the small object. Thresholding the absolute value of the residue image yields a powerful tool for object detection, provided that the apparent motion is sufficient given the number of frames to be used in background reconstruction. Assuming LR frames containing a moving object of width (expressed in LR pixels), the apparent lateral motion must exceed LR pixels/frame for a proper background reconstruction.

Several robust SR reconstruction methods have been reported [15], [18], [19]. We chose the method developed by Zomet et al. [19], which is robust to intensity outliers, such as those caused by small moving objects. This method employs the same camera model as presented in (2). Its robustness is introduced by a robust back-projection

    (7)

where median denotes a scaled pixel-wise median over the frames and is the projection operator from HR image to LR frame .
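The essence of this robust step can be sketched in isolation: where a non-robust SR method would average the per-frame back-projected gradient images, the Zomet-style update takes a scaled pixel-wise median, so a disturbance present in only a few frames (such as a small moving object) drops out. The function name and the input layout are illustrative.

```python
import numpy as np

def robust_backprojection_step(per_frame_gradients):
    """Robustness sketch: replace the sum of per-frame back-projected
    gradient images with a scaled pixel-wise median over the frames.
    Intensities disturbed in only a minority of frames behave as
    outliers and do not influence the median."""
    g = np.asarray(per_frame_gradients, float)   # shape (n_frames, H, W)
    return g.shape[0] * np.median(g, axis=0)
```

With one disturbed frame out of five, the mean-based update would leak a fifth of the disturbance into the background estimate, while the median-based update ignores it entirely.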

A LR representation of the background, obtained by applying the camera model to the shifted HR background image , is compared to the corresponding LR frame of the recorded image sequence

    (8)

where represents the blur and down-sample operation, is the th pixel of the shifted HR background in frame , and is the recorded intensity of the th pixel in frame . All difference pixels constitute a residual image sequence in which a moving object can be detected.

Thresholding this residual image sequence followed by tracking improves the detectability for low residue-to-noise ratios. Threshold selection is done with the chord method from Zack et al. [20], which is illustrated in Fig. 5. With this histogram-based method, an object mask sequence results for and , with the number of observed LR frames and the number of pixels in each LR frame.

After thresholding, multiple events may have been detected in each frame of . We apply tracking to link the most similar events in each frame to a so-called reference event. This reference event is defined by the median width , the median height , and the median residual energy of the largest event in each frame (the median is computed frame-wise). Next, we search in each frame for the event with the smallest normalized Euclidean distance w.r.t. the reference event, shown in (9) at the bottom of the next page, with the index of the event in frame with the smallest normalized Euclidean distance to the reference event. After this tracking step, an object mask sequence is generated with at most one event in each frame, the one corresponding to the object giving rise to the reference event. Note that a frame can be empty if no event was detected.
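Zack's chord (also called triangle) method can be sketched as follows, assuming a unimodal residual histogram whose peak lies left of its tail; variable names are illustrative.

```python
import numpy as np

def chord_threshold(hist):
    """Zack's chord method: draw a chord from the histogram peak to the
    last nonzero bin and return the bin with the largest distance
    between the histogram and the chord."""
    hist = np.asarray(hist, float)
    peak = int(np.argmax(hist))
    end = int(np.nonzero(hist)[0][-1])
    idx = np.arange(peak, end + 1)
    x1, y1, x2, y2 = peak, hist[peak], end, hist[end]
    # unnormalized point-to-line distance; the normalizing constant of
    # the chord's line equation does not change the argmax
    d = np.abs((y2 - y1) * idx - (x2 - x1) * hist[idx] + x2 * y1 - y2 * x1)
    return int(idx[np.argmax(d)])
```

This data-driven choice of threshold is what allows the detector to adapt to the actual residue-to-noise ratio of the sequence instead of using a fixed value.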


Fig. 5. Threshold selection by the chord method is based upon finding the value of that maximizes the distance between the histogram and the chord. The value is used as the threshold value.

C. Moving Object Registration

The object mask sequence , obtained after thresholding and tracking, gives a rough, quantized indication of the position of the object in each frame. For performing SR reconstruction, a more precise, subpixel registration is needed. For large moving objects, which contain a sufficient number of internal pixels with sufficient structure, gradient-based registration [21] can be performed. In the setting of small moving objects, this is usually not the case and another approach is needed.

Assuming a linear motion model for a moving object in the real world, the projected model can be fitted to the sequence of detected object positions. We assume a constant velocity without acceleration in the real world, which seems realistic given the nature of small moving objects: the objects are far away from the observer and will have a small acceleration within the frames due to the high frame rate of today's image sensors.

First, the position of the object in each frame is determined by computing the weighted center-of-mass (COM) of the masked pixels as follows:

(10)

with the number of LR pixels in frame , the location of pixel , the corresponding mask value (0 or 1), and the measured intensity.
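An intensity-weighted center-of-mass over the masked pixels of one frame can be sketched as:

```python
import numpy as np

def weighted_com(frame, mask):
    """Intensity-weighted center-of-mass over the masked pixels of one
    LR frame; returns (x, y) in LR pixel coordinates."""
    ys, xs = np.nonzero(mask)
    w = frame[ys, xs].astype(float)
    return np.array([(xs * w).sum() / w.sum(), (ys * w).sum() / w.sum()])
```

Even though the mask is quantized to whole pixels, weighting by the measured intensities already yields a sub-pixel position estimate per frame, which the trajectory fit then refines further.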

To fit a trajectory, all object positions in time must be known w.r.t. a reference point in the background of the scene. This is done by adding the previously obtained apparent background translation to the calculated object position for each frame: .

To obtain all object positions with subpixel precision, a robust fit to the measured object positions is performed. Assuming constant motion, all object positions can be described by a reference object position and a translation . Both the reference object position and the translation of the object are estimated by minimizing the following cost function:

    (11)

where denotes the Euclidean distance in LR pixels between the measured object position and the estimated object position at frame

    (12)

The cost function in (11) is known as the Gaussian norm [22]. This norm is robust to outliers (e.g., false detections in our case). The smoothing parameter is set to 0.5 LR pixel. Minimizing the cost function in (11) with the Levenberg-Marquardt algorithm results in an accurate subpixel precise registration of the moving object. If, e.g., 50 frames are used, the registration precision is improved by a factor of 7.
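The robust linear fit can be sketched with iteratively reweighted least squares as a stand-in for the paper's Levenberg-Marquardt minimization: points far from the current trajectory receive Gaussian-norm weights exp(-d^2 / 2 sigma^2) close to zero, so false detections barely influence the fit. The IRLS scheme and function names are illustrative assumptions.

```python
import numpy as np

def robust_trajectory_fit(times, positions, sigma=0.5, iters=20):
    """Fit p(t) = p0 + v * t to 2-D positions, downweighting outliers
    with Gaussian-norm weights w = exp(-d^2 / (2 sigma^2)). IRLS is
    used here as a simple stand-in for Levenberg-Marquardt."""
    t = np.asarray(times, float)
    p = np.asarray(positions, float)              # shape (n, 2)
    A = np.stack([np.ones_like(t), t], axis=1)    # design matrix [1, t]
    w = np.ones(len(t))
    for _ in range(iters):
        # weighted least squares for [p0; v] via the normal equations
        Aw = A * w[:, None]
        coef = np.linalg.lstsq(Aw.T @ A, Aw.T @ p, rcond=None)[0]
        d = np.linalg.norm(p - A @ coef, axis=1)
        w = np.exp(-d**2 / (2.0 * sigma**2))
    return coef[0], coef[1]                       # p0, v
```

With one grossly wrong detection among many frames on an otherwise exact line, the outlier's weight collapses after the first iteration and the fit converges to the inlier trajectory.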

    D. Computational Complexity

The computational complexity is dominated by calculating (3), i.e., computing the SR reconstruction of the HR foreground. At every iteration of the LM optimization procedure, the cost function has to be calculated for variations in the estimated parameters to estimate the gradient w.r.t. the parameters to be solved. The cost function has to be evaluated times, with the number of HR foreground intensities, the number of vertices, and # the number of LM iterations. A reconstruction as described in Section IV-B ( , , ) using Matlab code took 37 min on a Pentium-4, 3.2 GHz processor under Windows.

The processing time can be drastically reduced if a precomputation of the partial derivatives of the cost function w.r.t. the HR foreground intensities is performed off-line and stored. In this case, the computational complexity reduces to . Note that typically , forecasting a reduction in the computation time by one order of magnitude.

    (9)


    IV. EXPERIMENTS ON SIMULATED DATA

The proposed SR reconstruction method for small moving objects is first applied to simulated data to study the behavior under controlled conditions. In a series of experiments, we tune the regularization parameters and the number of iterations. Then we study the convergence, the robustness in the presence of clutter and noise, and the robustness against violations of the underlying linear motion model.

    A. Generating the Simulated Car Sequence

The simulated car sequence was generated to resemble the real-world sequence of the next section as closely as possible. We simulated an under-sampled image sequence containing a small moving car using the camera model as depicted in Fig. 1. The parameters of the camera model were chosen to match the sensor properties of the real-world system, i.e., optical blurring (Gaussian kernel with standard deviation LR pixel), sensor blurring (rectangular uniform filter with a 100% fill-factor), and Gaussian distributed noise to resemble the actual noise conditions (see below). The car follows a linear motion trajectory with zero acceleration. It consists of two internal intensities, which are both above the median background intensity. The low object intensity is exactly in between the median background intensity and the high object intensity. The boundary of the car is modeled by a polygon with seven vertices. Fig. 7(a) shows a HR image of the simulated car, which serves as a ground-truth for all SR reconstruction results. Fig. 7(b) and (c) show two LR image frames in which the car covers approximately 6 pixels. All 6 pixels are so-called mixed pixels and contain contributions of the fore- and background.

The image quality is further quantified by the signal-to-noise ratio (SNR) and the signal-to-clutter ratio (SCR). The SNR is a measure for the contrast between the object and the time-averaged local background compared to stochastic variations called noise. The SNR is defined as

    (13)

    with the number of frames, the mean foreground in-

    tensity in frame and the mean local background inten-

    sity inframe . iscalculated by takingthe mean intensity

    of LR pixels that contain at least 50% foreground and is

    defined by the mean intensity of all 100% background pixels ina small neighborhood around the object.

    The SCR is a measure for the contrast between the object and

    the time-averaged local background compared to the variation

in the local background. The SCR is defined as

SCR = 20 log10( Σ_{t=1}^{N} ( μ_f(t) - μ_b(t) ) / Σ_{t=1}^{N} σ_b(t) )    (14)

with σ_b(t) the standard deviation of the local background in frame t. In the LR domain, the SNR is 29 dB and the SCR is 14

dB. These are realistic values, derived from the real-world

    image sequence of the next section.
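The two measures can be transcribed directly into code. The sketch below assumes the per-frame statistics μ_f(t), μ_b(t), σ_b(t) and the noise standard deviation σ_n have already been measured as described above.

```python
import numpy as np

def snr_scr(mu_f, mu_b, sigma_b, sigma_n):
    """SNR and SCR in dB: the time-averaged object/background contrast
    compared with the noise level (SNR) or with the time-averaged
    background clutter (SCR)."""
    contrast = np.mean(np.asarray(mu_f) - np.asarray(mu_b))
    snr = 20.0 * np.log10(contrast / sigma_n)
    scr = 20.0 * np.log10(contrast / np.mean(sigma_b))
    return snr, scr
```

For a bright object the contrast term is positive; for dark objects the absolute contrast should be used.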

In the next subsections, different experiments on the simulated data are performed. For all experiments, 50 LR frames are used to estimate the HR foreground and 85 LR frames are used to estimate the HR background. In all reconstruction methods, the zoom factor is set to 4 and the camera parameters are the same as those used to generate the simulated data.

Fig. 6. NMSE between the SR result and the ground truth as a function of the two regularization parameters. Here both parameters are kept constant throughout all iterations in step 1 and step 2.

    B. Test 1: Tuning the Algorithm

    Our algorithm contains several parameters such as the camera

    parameters, the regularization parameters, and a stopping cri-

    terion. Although the camera parameters such as the PSF and

fill-factor can be estimated rather well, the regularization parameters are far more difficult to tune. To study the

    influence of the regularization parameters on the final result and

    select the parameters for later use, a few experiments are per-

    formed on 50 LR frames of the simulated car sequence.

In this experiment, we study the influence of the regularization parameters on the SR result for the simulated car sequence with an SNR of 29 dB and an SCR of 14

    dB. Note that both regularization parameters are kept constant

during both steps of the optimization procedure. We use the normalized mean squared error (NMSE) between the SR result of the car ẑ and its ground truth z as a figure-of-merit. Note that this measure considers only the foreground intensities; the background intensities are set to zero:

NMSE = ( Σ_{i=1}^{M} ( ẑ_i - z_i )^2 ) / ( M max(z)^2 )    (15)

with M the number of HR pixels, ẑ the estimated foreground contributions using SR and z the ground truth. Normalization is done with the squared maximum value of z.
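This figure-of-merit is straightforward to compute; a minimal sketch, assuming the background pixels have already been zeroed in both images:

```python
import numpy as np

def nmse(z_hat, z):
    """Normalized mean squared error of Eq. (15): the mean squared
    difference between the SR estimate and the ground truth, normalized
    by the squared maximum ground-truth value."""
    z_hat = np.asarray(z_hat, dtype=float)
    z = np.asarray(z, dtype=float)
    return np.mean((z_hat - z) ** 2) / z.max() ** 2
```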

From the result in Fig. 6 it can be seen that one of the two regularization parameters has by far the largest influence on the NMSE and is set accordingly; the value of the other parameter is not critical. In a broad range around these values, more than three to five iterations in step 1 did not change the final result. After 10 to 15 iterations in step 2, the solution converged. Hence, we set the

maximum number of iterations in step 1 to five and in step 2 to 15.

Fig. 7. Four times SR reconstruction of a simulated under-sampled image sequence containing a small moving car. (a) HR image representing the scene serving as ground truth; (b), (c) two typical LR frames (5 × 4 pixels) of the moving car; (d) 4× SR by a robust state-of-the-art method [18]; and (e) 4× SR by the proposed method.

    C. Test 2: Comparison With a State-of-the-Art Pixel-Based

    Technique

    To assess the value of the proposed algorithm we compare it

    with the visually best result obtained by a robust state-of-the-art

    pixel-based SR technique [18]. Note that the registration is per-

    formed by the trajectory fitting technique of this paper (to 85

    LR frames) to put both methods on equal footing. The state-of-the-art pixel-based SR result is shown in Fig. 7(d) and bears very

    little resemblance to the ground truth. This is no surprise since

the partial area effect at the boundary of the object, which affects all object pixels, is not accounted for.

Using the optimal regularization parameters in both steps, we performed an SR reconstruction with

the proposed method on exactly the same LR image sequence.

    The result is depicted in Fig. 7(e) and shows a very good resem-

    blance to the ground truth. Subtle changes along the boundary

    and along the intensity transition are caused by partial area ef-

    fects due to the random placement of the reconstructed object

    w.r.t. the HR grid. The object boundary is approximated with 8

    vertices, which is one more than used for constructing the data,

    so the boundary is slightly over-fitted. Comparing the results in

    Fig. 7(d) and (e) shows that the result of our proposed method

    is clearly superior to the pixel-based method of Pham [18].

    D. Test 3: Robustness in the Presence of Clutter and Noise

    To investigate the robustness of our method under different

    conditions, we varied 1) the clutter amplitude of the local back-

    ground and 2) the noise level of the simulated car sequence de-

    scribed in Section IV-A. The clutter of the background is varied

    by multiplying the background with a certain factor after sub-

    tracting the median intensity. Afterwards the median intensity

is added again to return to the original median intensity. The object intensities as well as the size and shape of the car remain

    the same. All parameters that are used for the reconstruction are

    set to the same values as in test 2 in Section IV-C.
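The clutter-scaling procedure above preserves the median intensity by construction; a minimal sketch:

```python
import numpy as np

def scale_clutter(background, factor):
    """Scale the clutter amplitude of a background image around its median:
    subtract the median, multiply by the factor, then add the median back
    so the original median intensity is restored."""
    med = np.median(background)
    return (background - med) * factor + med
```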

    The quality of the different SR results is expressed by the

    NMSE w.r.t. the ground truth as before. Fig. 8 depicts the

NMSE as a function of SNR and SCR. We divided the results into three categories based on the NMSE: good, medium, and bad. For

    each region a typical SR result is displayed to give a visual

    impression of the performance. It is clear that the SR result in

the good region, obtained for values of the SNR and SCR that occur in practice, bears a good resemblance to the ground

    truth. Note that the visible background in these pictures is not

    used to calculate the NMSE. Fig. 8 shows that the performance

    decreases for a decreasing SNR. Furthermore, the boundary

    between the good and medium region indicates a decrease

in performance under high-clutter conditions.

    E. Test 4: Robustness Against Variations in Motion

    The proposed method assumes that the object moves with a

    constant speed and appears in all frames to be used for recon-

    struction with the same aspect angle. To demonstrate the robust-

ness of our method to violations of these assumptions, two experiments are performed. The first experiment determines

    the robustness w.r.t. an acceleration of the object. The second

experiment establishes the robustness w.r.t. scaling of the object. We modified the simulated car sequence of Section IV-A.

In the first experiment an acceleration a, expressed in LR pixels/frame^2, is added and contributes to the object position by (1/2) a t^2, with t the frame number. In the second experiment,

a scale factor, defined as the vehicle size in the last frame divided by that in the first frame, is applied. A scale factor of 0.8 indicates that the ob-

    served length of the car varies from 3 LR pixel in the first frame

    to 2.4 LR pixel in the last frame.
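The position error that such a scale change induces at the object edges can be worked out directly. The sketch below assumes a linear size change over the sequence, so the maximum deviation from the mean scale is half the total change, and each edge sits half the object size from the center of mass.

```python
def edge_shift(scale_factor, object_size_px):
    """Maximum front/back edge displacement (in LR pixels) relative to the
    object's center of mass for a linear size change over the sequence."""
    max_scale_dev = abs(1.0 - scale_factor) / 2.0   # deviation from mean scale
    return max_scale_dev * object_size_px / 2.0     # edge is half a size away
```

For a 15% total scale change and a 3-LR-pixel object, this evaluates to roughly a tenth of an LR pixel.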

    The NMSE as a function of acceleration and scaling is de-

picted in Fig. 9. Fig. 9(a) shows that a larger acceleration causes a larger error. An acceptable decrease of the NMSE is

obtained for accelerations smaller than 0.001 LR pixels/frame^2.

Fig. 8. NMSE for the SR results of the simulated car sequence as a function of the SNR and SCR. We have roughly divided the space into three categories: good, medium, and bad, and provided a typical SR result for each category.

Fig. 9. NMSE for the SR results of the simulated car sequence as a function of (a) acceleration and (b) object scaling.

Fig. 10. Top view of the acquisition geometry to capture the real-world data.

The error of a constant-velocity model fitted to a constant-acceleration motion will follow a parabolic model. This parabola will be symmetric and has a top-to-end-point difference of a N^2 / 8. From the mid-point between its top and an end point we get a maximum error of a N^2 / 16; with a = 0.001 LR pixels/frame^2 and N = 50 frames this gives a maximum translational error of 0.16 LR pixel.
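This bound can be checked numerically; a sketch that fits the endpoint-matching (secant) line to the accelerated trajectory 0.5 a t^2 and then shifts it to the mid-point of the residual parabola, halving the maximum error:

```python
import numpy as np

def max_registration_error(a, n_frames):
    """Maximum position error of the best constant-velocity fit to a
    constant-acceleration trajectory over n_frames frames: the secant line
    leaves a symmetric residual parabola with peak a*n^2/8; shifting the
    line to the mid-point halves it, giving a*n^2/16."""
    t = np.linspace(0.0, n_frames, 2001)
    pos = 0.5 * a * t ** 2
    secant = pos[0] + (pos[-1] - pos[0]) * t / n_frames  # endpoint-matching line
    return 0.5 * np.max(secant - pos)                     # after mid-point shift
```

With a = 0.001 LR pixels/frame^2 and 50 frames, this evaluates to about 0.16 LR pixel.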

    For the second experiment Fig. 9(b) shows that a maximum

    scaling of 15% is allowed with an acceptable performance loss.

This is a 7.5% maximum scale change from the mean scale. For a 3-pixel-size object this translates to a maximum pixel shift error of 0.11 LR pixel for both the front and back object edges compared to its center-of-mass position.

    Note that both experiments have well-comparable maximum

    position errors of 0.16 and 0.11 LR pixel, rather consistent with

    the requirement that the registration error for SR should at least

    be smaller than half the HR pixel pitch. This can easily be de-

    duced from the argument below. Critical sampling of bandlim-

    ited signals can be modeled by a Gaussian low-pass filter fol-

    lowed by sampling with a sampling pitch of 1.1 times the stan-

    dard deviation of the Gaussian PSF [23]. In [21], we showed thatGaussian noise in the LR image sequence leads to Gaussian dis-

    tributed registration estimates. These registration errors act as an

    additional blur, even for sequences of infinite length [24]. If the

    standard deviation of this registration-error induced image blur

    is substantially (say two times) smaller than the optical image

    blur it will not affect the image quality after SR.
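The closing argument relies on independent Gaussian blurs combining in quadrature, a standard property of Gaussian convolution stated here as an assumption rather than taken from the paper:

```python
import math

def total_blur(sigma_optical, sigma_registration):
    """Standard deviation of the combined blur when two independent
    Gaussian blurs are applied in sequence: they add in quadrature."""
    return math.hypot(sigma_optical, sigma_registration)
```

A registration blur half the size of the optical blur broadens the effective PSF by only about 12%, which supports the "two times smaller" rule of thumb.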

    V. EXPERIMENT ON REAL-WORLD DATA

    To demonstrate the potential of the proposed method under

    realistic conditions we applied it to a real-world image se-

    quence. Real-world data permits us to study the impact of

changes in object intensities caused by variations in reflection, lens aberrations, small changes in the aspect angle of the object

along the trajectory, and practical violations of the linear motion assumption.

Fig. 11. Four times SR reconstruction of a vehicle captured by an infrared camera (50 frames) at a large distance: (a) and (c) show the LR captured data; (b) and (d) show the SR reconstruction result obtained by the proposed method. (a) LR reference frame (64 × 64 pixels); (b) SR with zoom factor 4; (c) close-up of the moving object in (a); and (d) close-up of the moving object in (b).

    The data for this experiment is captured with an infrared

camera (an Amber Radiance 1T). The sensor is composed of an indium antimonide (InSb) detector with 256 × 256 pixels in the 3-5 μm wavelength band. Furthermore, we use optics with a focal length of 50 mm and a viewing angle of 11.2° (also from Amber Radiance). We captured a vehicle (Jeep Wrangler) at 15 frames/second, driving with a constant velocity (approximately 1 pixel/frame apparent velocity) approximately perpendicular to

    the optical axis of the camera. A top view of the acquisition

    geometry is depicted in Fig. 10. During image capture, the

    platform of the camera was gently shaken to provide subpixel

    motion of the camera. Panning was used to keep the moving

    vehicle within the field of view of the camera.

    We selected the distance such that the vehicle appeared small

(covering approximately 5 × 2 LR pixels in area) in the image plane.

Fig. 11(a) shows a typical LR frame (64 × 64 pixels). A close-up

    of the vehicle is depicted in Fig. 11(c). The vehicle is driving

    from left to right at a distance of approximately 1150 meters.

The SNR of the vehicle with respect to the background is 30 dB and the SCR is 13 dB. In the simulation experiments, we have shown

    that for these values our method is capable of delivering good

    reconstruction performance. Fig. 11(b) shows the result after

    applying our SR reconstruction method with a close-up of the

    car in Fig. 11(d).

    The HR background is reconstructed from 85 frames with

    zoom factor 4. The camera blur is modeled by Gaussian optical

blurring, followed by uniform rectangular sensor

    blurring (100% fill-factor). The HR foreground is reconstructed

from 50 frames with zoom factor 4 with the same camera parameters. The object boundary is approximated with 12 vertices, and during the reconstruction the same regularization settings are used in both step 1 and step 2.

    Note that much more detail is visible in the SR result than in

    the LR image. The shape of the vehicle is very well pronounced

and the hot engine of the vehicle is clearly visible. For comparison,

    we display in Fig. 12 the SR result next to a captured image of

the vehicle at a 4× shorter distance. Please be aware that the

    intensity mapping is not the same for both images. So a grey

    level in Fig. 12(a) may not be compared with the same grey

    level in Fig. 12(b). Notice that Fig. 12(b) was captured at a later

time. Differences in environmental conditions (position of the sun, clouds, etc.), heating of the engine and vehicle, as well as

the pose of the vehicle contribute to the observed differences between the two images. The shape of the vehicle is reconstructed very well and the hot engine is located at a similar place.

Fig. 12. SR result with zoom factor 4 of a jeep in (a) compared with the same jeep captured at a 4× shorter distance (b). (a) 4× SR result. (b) Object 4× closer to camera.

    VI. CONCLUSION

    This paper presents a method for SR reconstruction of

    small moving objects. The method explicitly models the fore-

    and background contribution to the partial area effect of the

    boundary pixels. The main novelty of the proposed SR recon-

    struction method is the use of a combined object boundary

    and intensity description of the target object. This enables us

    to simultaneously estimate the object boundary with subpixel

    precision and the foreground intensities from the boundary

    pixels subject to a modified total variation constraint. This

    modification permits the use of the LevenbergMarquardt

    algorithm for optimizing the cost function. This method is

known to converge to the global optimum for a well-behaved cost function and an initial estimate not too far away.

    The proposed multiframe SR reconstruction method clearly

    improves the visual recognition of small moving objects under

    realistic imaging conditions in terms of SNR and SCR. We

    showed that our method performs well in reconstructing a

    small moving object where a state-of-the-art pixel-based SR

    reconstruction method [18] fails. The robustness against de-

    teriorations such as clutter and noise as well as violations of

    the linear motion model was established. Our method not only

    performs well on simulated data, but also provides an excellent

    result on a real-world image sequence captured with an infrared

    camera.

    REFERENCES

[1] R. Y. Tsai and T. S. Huang, "Multiframe image restoration and registration," in Advances in Computer Vision and Image Processing. Greenwich, CT: JAI Press, 1984, vol. 1, pp. 317-339.
[2] M. Ben-Ezra, A. Zomet, and S. K. Nayar, "Video super-resolution using controlled subpixel detector shifts," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 6, pp. 977-987, Jun. 2005.
[3] A. W. M. van Eekeren, K. Schutte, J. Dijk, D. J. J. de Lange, and L. J. van Vliet, "Super-resolution on moving objects and background," in Proc. IEEE 13th Int. Conf. Image Process., 2006, vol. 1, pp. 2709-2712.
[4] A. W. M. van Eekeren, K. Schutte, and L. J. van Vliet, "Super-resolution on small moving objects," in Proc. IEEE 15th Int. Conf. Image Process., 2008, vol. 1, pp. 1248-1251.
[5] P. E. Eren, M. I. Sezan, and A. M. Tekalp, "Robust, object-based high resolution image reconstruction from low-resolution video," IEEE Trans. Image Process., vol. 6, no. 10, pp. 1446-1451, Oct. 1997.
[6] S. Farsiu, M. Elad, and P. Milanfar, "Video-to-video dynamic super-resolution for grayscale and color sequences," J. Appl. Signal Process., pp. 1-15, 2006.
[7] R. C. Hardie, T. R. Tuinstra, J. Bognart, K. J. Barnard, and E. E. Armstrong, "High resolution image reconstruction from digital video with global and non-global scene motion," in Proc. IEEE 4th Int. Conf. Image Process., 1997, vol. 1, pp. 153-156.
[8] F. W. Wheeler and A. J. Hoogs, "Moving vehicle registration and super-resolution," in Proc. IEEE Appl. Imagery Pattern Recognit. Workshop, 2007, pp. 101-107.
[9] M. Irani and S. Peleg, "Improving resolution by image registration," Graph. Models Image Process., vol. 53, pp. 231-239, 1991.
[10] A. J. Patti, M. I. Sezan, and A. M. Tekalp, "Superresolution video reconstruction with arbitrary sampling lattices and nonzero aperture time," IEEE Trans. Image Process., vol. 6, no. 8, pp. 1064-1076, Aug. 1997.
[11] R. J. M. den Hollander, D. J. J. de Lange, and K. Schutte, "Super-resolution of faces using the epipolar constraint," in Proc. British Mach. Vis. Conf., 2007, pp. 1-10.
[12] J. Wu, M. Trivedi, and B. Rao, "High frequency component compensation based super-resolution algorithm for face video enhancement," in Proc. IEEE 17th Int. Conf. Pattern Recognit., 2004, vol. 3, pp. 598-601.
[13] J. Starck, E. Pantin, and F. Murtagh, "Deconvolution in astronomy: A review," Pub. Astron. Soc. Pacific, no. 114, pp. 1051-1069, 2002.
[14] K. Schutte, D. J. J. de Lange, and S. P. van den Broek, "Signal conditioning algorithms for enhanced tactical sensor imagery," in Proc. SPIE: Infrared Imag. Syst.: Design, Anal., Model., and Testing XIV, 2003, vol. 5076, pp. 92-100.
[15] S. Farsiu, M. D. Robinson, M. Elad, and P. Milanfar, "Fast and robust multi-frame super resolution," IEEE Trans. Image Process., vol. 13, no. 10, pp. 1327-1344, Oct. 2004.
[16] J. J. Moré, The Levenberg-Marquardt Algorithm: Implementation and Theory. New York: Springer-Verlag, 1978, vol. 630, pp. 105-116.
[17] J. Dijk, A. W. M. van Eekeren, K. Schutte, D. J. J. de Lange, and L. J. van Vliet, "Super-resolution reconstruction for moving point target detection," Opt. Eng., vol. 47, no. 8, 2008.
[18] T. Q. Pham, L. J. van Vliet, and K. Schutte, "Robust fusion of irregularly sampled data using adaptive normalized convolution," J. Appl. Signal Process., vol. 2006, pp. 1-12, 2006.
[19] A. Zomet, A. Rav-Acha, and S. Peleg, "Robust super-resolution," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2001, vol. 1, pp. 645-650.
[20] G. W. Zack, W. E. Rogers, and S. A. Latt, "Automatic measurement of sister chromatid exchange frequency," J. Histochem. Cytochem., vol. 25, no. 7, pp. 741-753, 1977.
[21] T. Q. Pham, M. Bezuijen, L. J. van Vliet, K. Schutte, and C. L. L. Hendriks, "Performance of optimal registration estimators," in Proc. Vis. Inf. Process. XIV, 2005, vol. 5817, pp. 133-144.
[22] J. van de Weijer and R. van den Boomgaard, "Least squares and robust estimation of local image structure," Int. J. Comput. Vis., vol. 64, no. 2-3, pp. 143-155, 2005.
[23] P. Verbeek and L. van Vliet, "On the location error of curved edges in low-pass filtered 2-D and 3-D images," IEEE Trans. Pattern Anal. Mach. Intell., vol. 16, no. 7, pp. 726-733, Jul. 1994.
[24] T. Q. Pham, L. J. van Vliet, and K. Schutte, "Influence of signal-to-noise ratio and point spread function on limits of super-resolution," in Proc. Image Process.: Algorithms Syst. IV, SPIE, 2005, vol. 5672, pp. 169-180.


Adam W. M. van Eekeren (S'00-M'02) received the M.Sc. degree from the Department of Electrical Engineering, Eindhoven University of Technology, The Netherlands, in 2002, and the Ph.D. degree from the Electro-Optics Group within TNO Defence, Security, and Safety, The Hague, in collaboration with the Quantitative Imaging Group at the Delft University of Technology, The Netherlands, in 2009.

He did his graduation project within Philips Medical Systems on the topic of image enhancement using morphological operators. Subsequently, he

worked for one year at the Philips Research Laboratory on image segmentation using level sets. He then joined the Electro-Optics Group, TNO Defence, Security, and Safety, as a Research Scientist, where he works on image improvement, change detection, and 3-D reconstruction. His research interests include image restoration, super-resolution, image quality assessment, and object detection.

Klamer Schutte received the M.Sc. degree in physics from the University of Amsterdam in 1989 and the Ph.D. degree from the University of Twente, Enschede, The Netherlands, in 1994.

He held a Post-Doctoral position with the Delft University of Technology's Pattern Recognition (now Quantitative Imaging) group. Since 1996, he has been employed by TNO, currently as Senior Research Scientist Electro-Optics within the Business Unit Observation Systems. Within TNO he has actively led multiple projects in the areas of Signal and Image Processing. Recently, he has led many projects, including super-resolution reconstruction for both international industries and governments, resulting in super-resolution-reconstruction-based products in active service. His research interests include pattern recognition, sensor fusion, image analysis, and image restoration. He is Secretary of the NVPHBV, The Netherlands branch of the IAPR.

Lucas J. van Vliet (M'02) studied applied physics and received the Ph.D. degree (cum laude) from the Delft University of Technology, Delft, The Netherlands, in 1993.

He was appointed Full Professor in multidimensional image analysis in 1999. Since 2009, he has been Director of the Delft Health Initiative, head of the Quantitative Imaging Group, and chairman of

the Department of Imaging Science & Technology. He was president (2003-2009) of the Dutch Society for Pattern Recognition and Image Analysis (NVPHBV) and sits on the board of the International Association for Pattern Recognition (IAPR) and the Dutch graduate school on Computing and Imaging (ASCI). He supervised 25 Ph.D. theses and is currently supervising 10 Ph.D. students. He was a visiting scientist at Lawrence Livermore National Laboratories (1987), University of California San Francisco (1988), Monash University Melbourne (1996), and Lawrence Berkeley National Laboratories (1996). He has a track record of fundamental as well as applied research in the field of multidimensional image processing, image analysis, and image recognition; (co)author of 200 papers and four patents.

Prof. van Vliet was awarded the prestigious talent research fellowship of the Royal Netherlands Academy of Arts and Sciences (KNAW) in 1996.