VIDEO COLOUR VARIATION DETECTION

AND MOTION MAGNIFICATION TO

OBSERVE SUBTLE CHANGES

Ayyaz Rasool, Husnain Javaid

Taimour Khan Babar, Raja Umer Saghir

This thesis is presented as part of the Degree of Master of Science in Electrical Engineering

Blekinge Institute of Technology
October, 2013

Blekinge Institute of Technology

School of Engineering

Department of Electrical Engineering

Supervisor: Muhammad Shahid

Examiner: Dr. Benny Lovstrom

Abstract

This thesis is concerned with revealing small but informative variations in a video that are hard to perceive, and with exaggerating them to expose hidden changes in color and motion. We apply different video decomposition techniques, namely Laplacian, Steerable and Gaussian pyramids, and observe their effect on the quality of the processed videos. A standard input video is first decomposed into different spatial frequency bands, and temporal filtering is then applied to the frames to extract the hidden signals. The signals resulting from the temporal processing are amplified by a given factor to reveal hidden information in the video. The amplified signals are added back to the original signals, and the pyramid is collapsed to generate the output video. The performance of Gaussian and Steerable pyramids for video decomposition is evaluated against Eulerian motion magnification. The output videos from all pyramid decompositions are analyzed computationally and compared with each other through SSIM and PSNR graphs. The video processing time is used to compare the decomposition methods. It is observed that Eulerian motion magnification with Steerable pyramid decomposition has more potential for revealing hidden motions than with Laplacian and Gaussian pyramids, particularly in monitoring and diagnostic applications. The Steerable pyramid decomposition performs better than the other methods when the input video is noisy.

Keywords: Gaussian Pyramid, Laplacian Pyramid, Motion Magnification, Steerable Pyramid, Video Magnification

Acknowledgement

First of all we would like to thank the Almighty Allah, who gave us the strength to complete this work. It is with immense gratitude that we acknowledge the support and help of our advisor Mr. Muhammad Shahid, and his patience, motivation, enthusiasm and immense knowledge. He continually conveyed a spirit of adventure and excitement in regard to research and to writing this thesis. We could not have imagined having a better advisor and mentor for our master's thesis. Without his guidance and persistent help we could not have accomplished our research goals. We are indebted to our examiner Dr. Benny Lovstrom for guiding us in working out issues related to our thesis. Last but not least, we would like to thank our parents and our friends for their inspiration and motivation.

Taimour, Ayyaz, Husnain, and Raja Umer
Karlskrona, Sweden

October 2013

Contents

1 Introduction
    1.1 Background
    1.2 Motivation and Objectives
    1.3 Research Questions
    1.4 Research Methodology
    1.5 Thesis Structure

2 Video Motion Magnification
    2.1 Video Magnification Techniques
    2.2 Motion Detection
    2.3 Motion Estimation
        2.3.1 Edge Detection (ED)
        2.3.2 Bit Plane Matching (BPM)
        2.3.3 Absolute Difference Summation (ADS)
        2.3.4 Limitations
    2.4 Optical Flow (OF)
        2.4.1 Differential Based Methods (DBM)
        2.4.2 Frequency Based Methods (FBM)
        2.4.3 Correlation Based Methods (CBM)
        2.4.4 Multiple Motion Methods (MMM)
        2.4.5 Temporal Refinement Methods (TRM)
        2.4.6 Limitations
    2.5 Spatio-Temporal Processing
    2.6 Eulerian Motion Magnification
        2.6.1 Spatial Processing
        2.6.2 Relation between Temporal Filtering and Magnification
        2.6.3 Amplification Factor
        2.6.4 Bounds For Amplification
        2.6.5 Temporal filters
        2.6.6 Butterworth Filter
        2.6.7 IIR Filter
        2.6.8 Pass band Frequencies
        2.6.9 Video Reconstruction

3 Image Decomposition
    3.1 Image Pyramids
    3.2 Low pass Pyramid
    3.3 Band Pass Pyramid
    3.4 Gaussian Pyramid
        3.4.1 Description
    3.5 Laplacian Pyramid
        3.5.1 Structure
    3.6 Steerable Pyramid
        3.6.1 System diagram

4 System Frame Work
    4.1 Input videos
    4.2 Decomposition
        4.2.1 Gaussian Pyramid
        4.2.2 Steerable pyramid
    4.3 Temporal Processing
    4.4 Filters
    4.5 Amplification
    4.6 Reconstruction

5 Experimental Results
    5.1 Computational Analysis
    5.2 Computational Complexity with Laplacian Pyramid based Decomposition
    5.3 Computational Complexity with Gaussian Pyramid Based Decomposition
    5.4 Computational Complexity with Steerable Pyramid Based Decomposition
    5.5 Image Quality
        5.5.1 PSNR
        5.5.2 Structural Similarity Index

6 Conclusion and Future Work Suggestions
    6.1 Conclusion


List of Figures

1 Video Magnification
2 Eulerian Motion
3 Convolved target pattern
4 Reduced scale copies of the Target pattern
5 Gaussian Pyramid [26]
6 Level of the Gaussian pyramid expanded to size of original image [26]
7 Equivalent weighting function [26]
8 Laplacian Pyramid
9 Pyramid levels expanded to the size of the original image [24]
10 Frequency domain structure of Steerable Pyramid [7]
11 Decomposition based on Steerable pyramid [7]
12 Block diagram of thesis work
13 Gaussian Block Diagram
14 Steerable Block Diagram
15 Filters used for extracting signal
16 Videos used for experiment
17 Block diagram for Reconstruction
18 PSNR of Steerable, Gaussian and Laplacian Decompositions
19 PSNR of Steerable, Gaussian and Laplacian Decompositions
20 PSNR of Steerable, Gaussian and Laplacian Decompositions
21 PSNR of Steerable, Gaussian and Laplacian Decompositions
22 PSNR of Steerable, Gaussian and Laplacian Decompositions
23 PSNR of Steerable, Gaussian and Laplacian Decompositions
24 PSNR of Steerable, Gaussian and Laplacian Decompositions
25 PSNR of Steerable, Gaussian and Laplacian Decompositions
26 PSNR of Steerable, Gaussian and Laplacian Decompositions
27 SSIM of Steerable, Gaussian and Laplacian Decompositions
28 SSIM of Steerable, Gaussian and Laplacian Decompositions
29 SSIM of Steerable, Gaussian and Laplacian Decompositions
30 SSIM of Steerable, Gaussian and Laplacian Decompositions
31 SSIM of Steerable, Gaussian and Laplacian Decompositions
32 SSIM of Steerable, Gaussian and Laplacian Decompositions
33 SSIM of Steerable, Gaussian and Laplacian Decompositions
34 SSIM of Steerable, Gaussian and Laplacian Decompositions
35 SSIM of Steerable, Gaussian and Laplacian Decompositions


List of Tables

1 Parameters used for videos
2 Laplacian Computational Results
3 Gaussian Computational Results
4 Steerable Computational Results
5 SSIM Values For Complete Video Sequence


Chapter 1

1 Introduction

1.1 Background

Over the past few decades the field of image processing has gained a lot of importance. Today it has applications in various fields such as satellite communication, experimental physics, chemistry, agricultural and medical sciences. Over the past few years the use of image processing has increased in the field of medical sciences. It has helped scientists to develop equipment that can extract minute details of substances and assists them in identifying problems precisely. A few examples of such equipment are X-ray, MRI, electrocardiography, CT scanners and ultrasound machines.

As stated above, there are various image processing applications in medical sciences that have helped to treat diseases that were previously difficult to cure. Patients with low vision typically have reduced visual acuity and a significant loss of contrast sensitivity, often in combination with visual field loss. Due to these impairments the patients face a number of disabilities, such as difficulty in reading, writing, recognizing faces, watching television, moving around and completing other activities in their daily lives. To deal with such problems Low Vision Aid (LVA) devices are used, which rely on image/video magnification to enhance contrast and vision.

The human eye has limited spatio-temporal sensitivity, so it is unable to detect signals that fall below this threshold even though they may carry highly useful information. Examples are motions with low spatial amplitude, such as an infant breathing, or skin color variations due to blood circulation in the body. These minute variations can help us to study important aspects of the world around us.

1.2 Motivation and Objectives

A standard video can contain hidden, small-amplitude signals which cannot be observed with the naked eye because of its limited sensitivity. The variation of these hidden signals can be very informative [1] [2], and it can be exaggerated to extract hidden information [4].

Previously, motion detection techniques have been developed to amplify low-amplitude color changes, such as the variation of human skin color due to blood circulation, which is invisible to the naked eye. By successful implementation of these tools we can reveal small or invisible signals in a video [1].

Some efforts have previously been made to amplify and unveil subtle motions that are otherwise invisible, using a cartoon animation filter to exaggerate motion. However, they use the Lagrangian approach, which is computationally expensive and difficult to make artifact free at boundaries for complicated motions [2].

Motion segmentation can be used to produce good quality synthesis, but this increases the complexity of the algorithm [1]. The Eulerian perspective, in contrast, considers the properties of a voxel of fluid. The same differential approximation forms the basis of optical flow algorithms. Temporal processing has previously been used to extract invisible signals. Eulerian motion magnification has the potential to be used in monitoring and diagnostic applications in medicine. It reveals hidden motions which are otherwise invisible to the naked eye [9].

Objectives of this thesis work are:

• To produce better results of motion magnification.

• To reduce noise and produce better quality output video.

• To magnify video using Gaussian and Steerable pyramid decompositions.

• To preserve video smoothness.

• To analyze and compare output videos.

1.3 Research Questions

The following research questions are considered in this thesis work:

• How can video magnification be implemented using the Gaussian and the Steerable pyramid decompositions of the videos, so that the results can be compared with Eulerian motion magnification?

• What is the impact on performance if the Gaussian and the Steerable pyramids are used for video decomposition?

• What is the impact on a video if different video formats and noise environments are subjected to the Eulerian process using the Gaussian and the Steerable pyramids for video decomposition?

1.4 Research Methodology

Our research in this thesis is done by studying different video decomposition and reconstruction methods. Eulerian video color amplification and motion magnification use Laplacian pyramids to decompose a video. We implement Gaussian and Steerable pyramids in Matlab, in the same environment, and observe the improvements in video color variation and motion magnification. Finally, we compare them with the Laplacian pyramid by means of a computational analysis.

1.5 Thesis Structure

The present thesis report is divided into six chapters:

• Chapter 1: This chapter contains an introduction and a background. It also contains the research questions and the research methodology devised for our thesis.

• Chapter 2: Basic concepts of video magnification are discussed with examples.

• Chapter 3: Different methods of video decomposition and related examples are discussed.

• Chapter 4: This chapter deals with the system model and its working.

• Chapter 5: Experimental results of the proposed model and of previous works are presented in this chapter.

• Chapter 6: This chapter concludes the thesis and gives an insight into future work.

Chapter 2

2 Video Motion Magnification

Some video processing applications deal with different task-oriented video manipulations. Motion estimation is one of them; it is used to detect and track the motion of a moving object in a video. Motion detection is useful for video coding. A dynamic motion detector attached to an encoder can help the encoder to analyze whether the current block has motion or not [10]. Sometimes this motion is so minute that it is not observable by the naked human eye. This subtle motion can carry useful information. To make use of the information contained in this minute and seemingly negligible motion, it is necessary to magnify it. Video motion magnification is a useful tool to determine changes in a video which may not be noticeable under normal circumstances.

Figure 1: Video Magnification

2.1 Video Magnification Techniques

Video magnification techniques extract the motion information that needs to be magnified from the video and, after extraction, amplify or exaggerate this motion. Some techniques track motion in a video and some do not need to track it.

2.2 Motion Detection

The two most common techniques used for motion detection are:

• Optical Flow.

• Spatio-Temporal Processing.

2.3 Motion Estimation

Motion estimation determines the transformation between two different images or frames, mostly adjacent frames of a video. Motion vectors are determined to describe this transformation. Motion vectors can be global if they relate to the whole image, or local if they relate to a macro block (MB), a sub macro block (SMB) or a pixel. Local motion can be estimated by calculating the motion vectors of MBs. To reduce complexity, only specific blocks are identified by using an efficient search centre adaptive motion estimation [11]. A multiplication-free 1-bit transform can also be used in low complexity MB motion estimation, which provides better performance in terms of computational load. In this case, the motion accuracy is slightly lower in terms of peak signal to noise ratio (PSNR), but it is regarded as a good option for devices where low complexity translation and low power are required [12]. Edge detection in spatio-temporal images [13] and Variable Block Size Motion Estimation (VBSME) based on reconfigurable bit matching [14] can be used to accurately estimate motion. An absolute difference summation [15] or a median based motion correction can be used to calculate a global vector.

2.3.1 Edge Detection (ED)

Edge detection of an object in an image is done on the basis of neighboring pixels. Pixel intensity values are compared in a decision block, which then decides whether the pixel belongs to an edge or not. After edge detection, motion in a video can be easily tracked.

2.3.2 Bit Plane Matching (BPM)

BPM decomposes an image into a series of binary images, one for each bit used in the representation of the pixel intensities. Correlation and minimum correlation measures are used to calculate the motion vector in a specific area.
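The decomposition step can be illustrated with a minimal sketch. It is not the implementation used in the cited work: the helper names `bit_planes`, `mismatch` and `best_shift` are hypothetical, the data is a single 1D row of 8-bit pixels, and the matching measure is assumed to be a simple count of differing bits (XOR) on the most significant bit plane.

```python
def bit_planes(pixels, bits=8):
    """Split integer pixel values into `bits` binary images, least significant bit first."""
    return [[(p >> b) & 1 for p in pixels] for b in range(bits)]

def mismatch(plane_a, plane_b):
    """Number of positions where two binary planes differ (sum of XORs)."""
    return sum(a ^ b for a, b in zip(plane_a, plane_b))

def best_shift(prev_plane, curr_plane, start, size, max_shift):
    """Search the shift of a block that minimises the bit mismatch (assumed measure)."""
    block = prev_plane[start:start + size]
    best = None
    for s in range(-max_shift, max_shift + 1):
        lo = start + s
        if lo < 0 or lo + size > len(curr_plane):
            continue  # candidate block would fall outside the row
        cost = mismatch(block, curr_plane[lo:lo + size])
        if best is None or cost < best[1]:
            best = (s, cost)
    return best

# The planes can be recombined into the original intensities.
row = [12, 200, 37, 255]
planes = bit_planes(row)
rebuilt = [sum(planes[b][i] << b for b in range(8)) for i in range(len(row))]

# A bright object moves two pixels to the right between two rows.
prev_msb = bit_planes([10, 10, 200, 220, 10, 10, 10, 10])[7]
curr_msb = bit_planes([10, 10, 10, 10, 200, 220, 10, 10])[7]
print(best_shift(prev_msb, curr_msb, start=2, size=2, max_shift=2))
```

Working on binary planes replaces multiplications by bit operations, which is why such schemes are attractive for low complexity hardware.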

2.3.3 Absolute Difference Summation (ADS)

ADS is used to calculate a global vector by computing the sum of absolute differences, which indicates the probability of each local vector. The smallest value indicates the highest probability for a local vector, which makes it the vector of choice.
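As an illustration of the local step, the following sketch performs an exhaustive block search with the sum of absolute differences as cost. It is a minimal example under stated assumptions rather than the method of [15]: the frames are small lists of lists, and the names `sad`, `best_vector` and `make_frame` are hypothetical.

```python
def sad(prev, curr, top, left, dy, dx, size):
    """Sum of absolute differences between a block in `prev` and its shifted copy in `curr`."""
    total = 0
    for r in range(size):
        for c in range(size):
            total += abs(prev[top + r][left + c] - curr[top + dy + r][left + dx + c])
    return total

def best_vector(prev, curr, top, left, size, radius):
    """Return the (dy, dx) with the smallest SAD inside the search radius, and its cost."""
    height, width = len(curr), len(curr[0])
    best_cost, best_vec = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            inside = (0 <= top + dy and top + dy + size <= height and
                      0 <= left + dx and left + dx + size <= width)
            if not inside:
                continue  # skip candidates that leave the frame
            cost = sad(prev, curr, top, left, dy, dx, size)
            if best_cost is None or cost < best_cost:
                best_cost, best_vec = cost, (dy, dx)
    return best_vec, best_cost

def make_frame(top, left, size=6):
    """Synthetic test frame: dark background with a bright 2x2 patch."""
    frame = [[0] * size for _ in range(size)]
    for r in range(2):
        for c in range(2):
            frame[top + r][left + c] = 9
    return frame

prev = make_frame(1, 1)
curr = make_frame(2, 3)          # patch moved one row down, two columns right
vector, cost = best_vector(prev, curr, top=1, left=1, size=2, radius=2)
print(vector, cost)
```

The smallest SAD marks the most probable local vector; a global vector could then be derived from many such local vectors, for example by taking their median.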

2.3.4 Limitations

A few challenges concerning motion estimation are geometry, lighting and reflectance. Shapeless objects, fast moving light objects, shadows and noise cause difficulties for motion estimation. To resolve these problems, assumptions such as rigid objects, opaque surfaces and low noise are made.

2.4 Optical Flow (OF)

Optical flow is the apparent motion of objects, or the movement of brightness patterns, in an image. This movement is caused by the relative motion between an eye and the objects in the scene. The concept of optic flow was first introduced by an American psychologist [16] in order to understand the perception of an object's shape, its distance and the movement of objects in the world. This concept was adopted by image processing for motion detection, luminance and motion compensation. OF is widely deployed for robotic navigation [18,19]. OF computation methods can be categorized as:

• Intensity based differential methods.

• Frequency based methods.

• Correlation based methods.

• Multiple motion methods.

• Temporal refinement methods.

2.4.1 Differential Based Methods (DBM)

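Differential methods are among the most interesting OF computation methods. They are called differential methods because they rely on Taylor series approximations of the image signal. OF calculates the relative motion between consecutive frames at instants t and t + Δt. Let I(x, y, t) be the intensity of an image point that moves by (Δx, Δy) between the two frames. Assuming that its brightness does not change, we have:

$$I(x, y, t) = I(x + \Delta x,\; y + \Delta y,\; t + \Delta t) \tag{1}$$

Assuming the movement to be small, a Taylor series expansion gives:

$$I(x + \Delta x,\; y + \Delta y,\; t + \Delta t) = I(x, y, t) + \frac{\partial I}{\partial x}\Delta x + \frac{\partial I}{\partial y}\Delta y + \frac{\partial I}{\partial t}\Delta t + \dots \tag{2}$$

From these equations, neglecting the higher order terms, it follows that:

$$\frac{\partial I}{\partial x}\Delta x + \frac{\partial I}{\partial y}\Delta y + \frac{\partial I}{\partial t}\Delta t = 0 \tag{3}$$

or, dividing by Δt,

$$\frac{\partial I}{\partial x}\frac{\Delta x}{\Delta t} + \frac{\partial I}{\partial y}\frac{\Delta y}{\Delta t} + \frac{\partial I}{\partial t}\frac{\Delta t}{\Delta t} = 0 \tag{4}$$

from which we derive

$$\frac{\partial I}{\partial x}V_x + \frac{\partial I}{\partial y}V_y + \frac{\partial I}{\partial t} = 0 \tag{5}$$

where V_x and V_y are the x and y components of the OF. With I_t denoting the temporal derivative of I, this can be written more compactly as

$$\nabla I^{T} \cdot \vec{v} = -I_t \tag{6}$$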

This single equation cannot be solved on its own because it contains two unknowns, V_x and V_y. This is known as the aperture problem of OF algorithms. All OF methods therefore need additional conditions to estimate the actual flow. The motion can only be estimated by OF at image locations where there is sufficient intensity structure.
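One common additional condition, used here purely as an illustration and not stated in the thesis, is to assume that all pixels in a small window share the same velocity and to solve equation (5) for that window in the least-squares sense. The sketch below assumes the gradients are already available as plain lists; `solve_flow` is a hypothetical helper name.

```python
def solve_flow(ix, iy, it, eps=1e-12):
    """Least-squares solution of Ix*Vx + Iy*Vy + It = 0 over a window of pixels.

    ix, iy, it: spatial and temporal derivatives, one entry per pixel.
    Returns (Vx, Vy), or None when the window lacks intensity structure.
    """
    a = sum(gx * gx for gx in ix)                    # sum of Ix^2
    b = sum(gx * gy for gx, gy in zip(ix, iy))       # sum of Ix*Iy
    c = sum(gy * gy for gy in iy)                    # sum of Iy^2
    p = -sum(gx * gt for gx, gt in zip(ix, it))      # -sum of Ix*It
    q = -sum(gy * gt for gy, gt in zip(iy, it))      # -sum of Iy*It
    det = a * c - b * b
    if abs(det) < eps:
        return None  # gradients are all parallel: aperture problem
    return ((c * p - b * q) / det, (a * q - b * p) / det)

# Gradients in several directions: the velocity (0.5, -0.25) is recovered.
print(solve_flow([1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [-0.5, 0.25, -0.25]))

# Gradients only along x: the component along y cannot be determined.
print(solve_flow([1.0, 2.0, 3.0], [0.0, 0.0, 0.0], [-1.0, -2.0, -3.0]))
```

The second call shows the aperture problem directly: when every gradient in the window points in the same direction, the 2x2 system is singular and no unique flow exists.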

2.4.2 Frequency Based Methods (FBM)

FBM uses velocity-tuned filters, which are orientation-sensitive filters in the Fourier domain of time-varying images. Motion estimation on image signals can be based on the spatio-temporal energy in Fourier space by using motion-sensitive techniques. Other approaches include phase-based filtering, in which the motion of phase contours tunes the filtering.

2.4.3 Correlation Based Methods (CBM)

The CBM approach is less sensitive to the requirement of significant image features. These approaches define displacement as the shift yielding the best fit between contiguous time-varying image regions. They are also used for computational stereopsis, where there is a minimal shift in the viewing angle of the image.

2.4.4 Multiple Motion Methods (MMM)

Multiple image motions are a commonly occurring phenomenon. Occlusion and transparency are important because they occur frequently in real life. One of the many methods developed to handle occlusion is the binary line process, which explicitly handles intensity discontinuities.

2.4.5 Temporal Refinement Methods (TRM)

Most methods for computing OF do not incorporate previous calculations. Given an image sequence, OF is computed for one of the images. To address this problem TRMs have been suggested. The advantages of a TRM approach are improved accuracy and instant access to velocity estimates. This approach is parallel and local, detects occlusion boundaries in an incremental fashion, and can only handle integer motions.

2.4.6 Limitations

Some assumptions are made to compute OF, for example that the brightness of an observed object is constant over time and that the velocity field is smooth [17]. OF corresponds to the projection of the actual motion onto the image plane only under certain assumptions.

2.5 Spatio-Temporal Processing

One of the most promising approaches for motion estimation is spatio-temporal processing. Spatio-temporal filtering based on the continuous wavelet transform (CWT) can be used for motion selectivity [20]. Temporal processing has previously been used for the extraction of invisible signals. Temporal aliasing of motions in videos has been dampened by temporal filters [4]. The Eulerian method, using spatio-temporal filtering, can extract small and subtle motions in a video [9]. In this method motion is not tracked, as it is in the Lagrangian method.

2.6 Eulerian Motion Magnification

This method is based on the idea that temporal processing can produce magnification. It relies on the Taylor series expansions common in differential optical flow analysis, and it uses a linear approximation [9]. It is described below.

2.6.1 Spatial Processing

Spatial processing is the first step in Eulerian motion magnification, as shown in Figure 2. The video sequence is decomposed into different spatial frequency bands using a pyramid structure. These frequency bands may consist of different spatial frequencies depending on the signal to noise ratio (SNR). Later on, artifacts in these bands are suppressed by low pass filtering. The frames of the video are then downsampled to increase computational efficiency. These selected pixel bands are used in temporal filtering to detect motion.

Figure 2: Eulerian Motion

2.6.2 Relation between Temporal Filtering and Magnification

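After a video has been spatially processed, it is subjected to temporal processing, as shown in Figure 2. Based on the Taylor series expansion in equations 1 and 2, the relation between temporal filtering and motion magnification can be shown. The derivation is given for a 1D image and carries over directly to 2D. Let the observed intensity be I(x, t) = f(x + δ(t)), where f(x) = I(x, 0) and δ(t) is the displacement function. A first order Taylor expansion gives

$$I(x, t) \approx f(x) + \delta(t)\frac{\partial f(x)}{\partial x} \tag{7}$$

Consider a temporal band pass filter, with the assumption that the motion signal δ(t) lies within its pass band. The filtered signal is then

$$B(x, t) = \delta(t)\frac{\partial f(x)}{\partial x} \tag{8}$$

Amplifying the band pass signal by the exaggeration or amplification factor α and adding it back to the image gives

$$\tilde{I}(x, t) = I(x, t) + \alpha B(x, t) \tag{9}$$

By combining equations 7, 8 and 9 we get

$$\tilde{I}(x, t) \approx f(x) + (1 + \alpha)\,\delta(t)\frac{\partial f(x)}{\partial x} \approx f\big(x + (1 + \alpha)\,\delta(t)\big) \tag{10}$$

This shows that the spatial displacement δ(t) has been amplified by a factor of (1 + α).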
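The result can be checked numerically with a small 1D sketch. The assumptions are ours: the image profile is a sine wave, the displacement and amplification values are arbitrary, and the band pass signal B(x, t) is taken to be exactly the difference between the displaced and the resting frame, as an ideal filter would deliver. The function name `magnify` is hypothetical.

```python
import math

def magnify(f, xs, delta, alpha):
    """Return the magnified frame I + alpha * B sampled at the positions xs."""
    reference = [f(x) for x in xs]                        # f(x): frame without motion
    observed = [f(x + delta) for x in xs]                 # I(x, t) = f(x + delta)
    band = [o - r for o, r in zip(observed, reference)]   # B(x, t), assumed ideal
    return [o + alpha * b for o, b in zip(observed, band)]

xs = [2 * math.pi * k / 64 for k in range(64)]
delta, alpha = 0.01, 10.0

magnified = magnify(math.sin, xs, delta, alpha)
target = [math.sin(x + (1 + alpha) * delta) for x in xs]   # truly displaced profile

magnified_error = max(abs(a - b) for a, b in zip(magnified, target))
unmagnified_error = max(abs(math.sin(x + delta) - t) for x, t in zip(xs, target))
print(magnified_error, unmagnified_error)
```

The magnified frame stays close to the profile displaced by (1 + α)δ, while the unprocessed frame does not. The remaining error comes from the neglected higher order Taylor terms and grows with α and δ.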

2.6.3 Amplification Factor

After extracting or selecting the required band of frequencies, which is to be amplified in the case of color or magnified in the case of motion, these bands are exaggerated by a factor α. This value can be controlled by the user depending upon the application requirements. The best or an optimized value can be obtained by experimentation and by observing the results.

2.6.4 Bounds For Amplification

An upper bound for the amplification factor [9] can be given as

$$(1 + \alpha)\,\delta(t) < \frac{\lambda}{8} \tag{11}$$

where λ is the spatial wavelength of the image structure. This bound holds for small motions, and the amplification factor is fixed according to the derived bound. Exceeding the bound may produce severe artifacts in the magnified video [9].
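Rearranging the bound gives α < λ / (8 δ) − 1, which can be evaluated directly. The following lines are a minimal sketch; the function names and the numbers in the example are hypothetical and only illustrate the arithmetic.

```python
def max_amplification(wavelength, displacement):
    """Largest alpha allowed by (1 + alpha) * delta < lambda / 8 (exclusive limit)."""
    if displacement <= 0:
        raise ValueError("displacement must be positive")
    return max(0.0, wavelength / (8.0 * displacement) - 1.0)

def within_bound(alpha, displacement, wavelength):
    """Check whether a chosen alpha respects the bound."""
    return (1.0 + alpha) * displacement < wavelength / 8.0

# Structure with a spatial wavelength of 80 pixels, motion of half a pixel.
print(max_amplification(80.0, 0.5))     # limit for alpha
print(within_bound(10.0, 0.5, 80.0))    # alpha = 10 stays inside the bound
print(within_bound(25.0, 0.5, 80.0))    # alpha = 25 exceeds it
```

Fine image structures (small λ) therefore tolerate much less amplification than coarse ones for the same motion.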

2.6.5 Temporal filters

One of the important steps in video magnification is temporal filtering. This is done in the temporal processing block, as shown in Figure 2. The choice of an appropriate temporal filter depends on the video type and on the information to be extracted from it. The filter varies with the frequencies we are dealing with: for color magnification the pass band specifications are different from those for small motion magnification, because the frequency bands of interest differ. When determining the heart beat of a person by observing the blood flow through the face or wrist, it is preferable to use a narrow pass band filter to achieve better and less noisy results. Ideal band pass filters with sharp cut-off frequencies can serve this task. For motion magnification, on the other hand, it is recommended to use broad pass band filters. Some of the filters that are used are described below.

2.6.6 Butterworth Filter

Butterworth filters are known for their flat frequency response in the pass band. They have a comparatively slow roll-off. They are used to convert a specified frequency band to a second order IIR structure.

2.6.7 IIR Filter

In real-time applications it is convenient to use IIR filters, both for color and for motion exaggeration. Two low order IIR filters can be combined to implement a band pass filter.
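One way to combine two low order IIR filters, sketched below under our own assumptions, is to run two first-order low pass filters with different smoothing coefficients over the same pixel time series and to subtract their outputs. The coefficient values and the name `iir_bandpass` are hypothetical and are not taken from the thesis.

```python
def iir_bandpass(signal, r_fast, r_slow):
    """Band pass as the difference of two first-order IIR low pass filters.

    Each low pass follows y[n] = r * x[n] + (1 - r) * y[n - 1].
    r_fast > r_slow, both in (0, 1); larger r means a higher cut-off frequency.
    """
    fast = slow = signal[0]          # start both filters at the first sample
    output = []
    for x in signal:
        fast = r_fast * x + (1.0 - r_fast) * fast
        slow = r_slow * x + (1.0 - r_slow) * slow
        output.append(fast - slow)   # keeps what lies between the two cut-offs
    return output

# A constant signal contains no temporal variation and is rejected.
flat = iir_bandpass([5.0] * 50, r_fast=0.4, r_slow=0.05)

# A step produces a transient response that decays back towards zero.
step = iir_bandpass([0.0] * 5 + [1.0] * 200, r_fast=0.4, r_slow=0.05)
print(max(abs(v) for v in flat), step[5], step[-1])
```

Because each filter only needs its previous output, the scheme processes one frame at a time with constant memory, which is what makes it convenient for real-time use.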

2.6.8 Pass band Frequencies

Frequencies for a band pass filter are selected according to the application requirements, and these values can be optimized by experimentation.

2.6.9 Video Reconstruction

The last step, as shown in Figure 2, is reconstruction. After the different frequency bands have been pooled and temporally processed, they are recombined to form a video in which the motion is magnified. Video reconstruction is done by collapsing the pyramid that was used to spatially decompose the image.

Chapter 3

3 Image Decomposition

Digital image processing is of interest in many fields. These days a number of methods are used for enhancing image quality and removing unwanted pixels or effects. Image decomposition plays a vital role in making features of images and videos more prominent, in extracting useful information and in removing noise from corrupted images. Image pyramids offer a flexible, convenient multiresolution format for processing images.

3.1 Image Pyramids

Extracting the required information from an image is a complicated process. The image needs to be passed through a combination of different filters, in different sequences depending upon the requirements.

The human visual sense has limited spatio-temporal sensitivity. Detecting variations of color and motion at low spatial amplitudes is difficult or even impossible for the human sense of vision. An image pyramid can be used to reveal minor and subtle changes in videos and to gather important data from an image. The application of image pyramids is thus helpful for capturing important aspects and variations of the world around us. Mathematical and numerical representation, amplification and decomposition are used to form the image pyramid.

A pyramid is a type of representation developed in the computer vision, image processing and signal processing fields. It is used for representing a signal at multiple scales. There are two types of pyramids: the low pass pyramid and the band pass pyramid.

3.2 Low pass Pyramid

In this case, the first step is to smooth the image with a smoothing filter. The smoothed image is then sub-sampled, usually by a factor of two along each coordinate direction. This process is repeated on the result, so that each level is smaller and smoother than the previous one, and the sampling density decreases from level to level. The mathematical and graphical representation of this process resembles a pyramid, which is why it is named a pyramid.

3.3 Band Pass Pyramid

The other type of pyramid is called a band pass pyramid. It is obtained by forming the difference between adjacent levels of a pyramid. Because adjacent levels have different resolutions, interpolation between them is needed before the difference can be computed.

3.4 Gaussian Pyramid

3.4.1 Description

The Gaussian pyramid is a technique used in the field of image processing [13]. It is mostly used to form large scale digital images, mathematically and graphically, from small digital samples of an image using different algorithms. The structural content of the small scale digital images is an advantage when forming the larger scale image. This technique is very popular in the research field of computer graphics and vision.

There are many ways to search for a specific target pattern which could appear at any scale [24]. Two of the approaches are illustrated in Figures 3 and 4.

Figure 3: Convolved target pattern

In Figure 3, copies of the target pattern are constructed at increasing scales, and each copy is then convolved with the original image. Alternatively, a pattern of fixed size may be convolved with multiple copies of the original image at reduced resolutions, as in Figure 4. Both approaches give equivalent results, but the second approach is much more promising. The reason is that a convolution with the target pattern at an expanded scale requires more arithmetic operations than the corresponding convolution with the image at a reduced scale [24]. The reduced scale representation forms a sequence of copies of the original image in which both the resolution and the sampling density decrease from step to step. Figure 5 shows such an example.

Figure 4: Reduced scale copies of the Target pattern

A very highly efficient algorithm is required for processing of image for-

Figure 5: Gaussian Pyramid [26]

mation. Consider that bottom level of the Pyramid G1 is equivalent to theoriginal image. This is subsampled by a factor of 2 and also low pass filtered

22

to from the next level of the pyramid G2. This process is repeated in orderto get to the next level in a pyramid. This can be represented in terms ofequation from 0 < 1 < N :

G_l(i, j) = \sum_{m} \sum_{n} w(m, n) \, G_{l-1}(2i + m, 2j + n)    (12)

or

G_l = REDUCE[G_{l-1}]    (13)

Figure 6: Level of the Gaussian pyramid expanded to size of original image [26]

The weighting function w(m,n), called the generating kernel, is chosen to be small and separable for efficiency. A five-tap filter is used to generate the pyramid. Pyramid construction is equivalent to convolving the original image with a set of Gaussian-like weighting functions, as shown in Figure 7. The width of the equivalent function doubles at each level of the pyramid. The convolution acts as a low-pass filter, and the band limit of this filter is reduced by one octave with each level.
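The REDUCE step of Equations (12) and (13) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the Matlab code used in the thesis: the kernel weights (0.05, 0.25, 0.4, 0.25, 0.05) and the edge handling (replicating border pixels) are assumptions, and the function names `reduce_level` and `gaussian_pyramid` are hypothetical.

```python
# Five-tap generating kernel. The weights are an assumption chosen for
# illustration; the thesis does not state the values it used.
KERNEL = [0.05, 0.25, 0.4, 0.25, 0.05]


def clamp(index, size):
    """Replicate border pixels for out-of-range indices (an assumption)."""
    return min(max(index, 0), size - 1)


def reduce_level(image):
    """One REDUCE step: separable 5x5 low-pass filter, then subsample by 2."""
    rows, cols = len(image), len(image[0])
    out_rows, out_cols = (rows + 1) // 2, (cols + 1) // 2
    reduced = [[0.0] * out_cols for _ in range(out_rows)]
    for i in range(out_rows):
        for j in range(out_cols):
            total = 0.0
            for m in range(-2, 3):
                for n in range(-2, 3):
                    # w(m, n) is separable: a product of two 1-D weights.
                    weight = KERNEL[m + 2] * KERNEL[n + 2]
                    r = clamp(2 * i + m, rows)
                    c = clamp(2 * j + n, cols)
                    total += weight * image[r][c]
            reduced[i][j] = total
    return reduced


def gaussian_pyramid(image, levels):
    """Return [G_0, G_1, ...] where G_0 is the input image."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(reduce_level(pyramid[-1]))
    return pyramid
```

Because the kernel weights sum to one, a constant image stays constant at every level, which is a quick sanity check on the filter.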

Figure 7: Equivalent weighting function [26]

In many cases a band-pass rather than a low-pass filtered image is required. It can be obtained by subtracting each Gaussian (low-pass) level from the next lower level in the pyramid. Because these levels differ in sample density, new sample values must be interpolated between those of a given level before that level is subtracted from the next lower level. This interpolation is achieved by reversing the REDUCE process, an operation called EXPAND. Let G_{l,k} be the image obtained by expanding G_l k times. Then G_{l,k} = EXPAND[G_{l,k-1}] or, to be precise, G_{l,0} = G_l and, for k > 0,

G_{l,k}(i, j) = 4 \sum_{m} \sum_{n} w(m, n) \, G_{l,k-1}\left(\frac{i - m}{2}, \frac{j - n}{2}\right)    (14)

Here only terms for which (i - m)/2 and (j - n)/2 are integers contribute to the sum. The EXPAND operation doubles the size of the image with each iteration, so that G_{l,1} is the size of G_{l-1}, and G_{l,l} is the same size as the original image. Examples are shown in Figure 5.
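A minimal Python sketch of one EXPAND step is given below. It assumes the Burt and Adelson form of the operation, in which output pixel (i, j) gathers weighted source pixels at ((i - m)/2, (j - n)/2) whenever those indices are integers. The kernel weights, the border replication and the name `expand_level` are assumptions made for illustration only.

```python
# Same illustrative five-tap kernel as an assumed generating kernel.
KERNEL = [0.05, 0.25, 0.4, 0.25, 0.05]


def clamp(index, size):
    """Replicate border pixels for out-of-range indices (an assumption)."""
    return min(max(index, 0), size - 1)


def expand_level(image):
    """One EXPAND step: double the image size by weighted interpolation."""
    rows, cols = len(image), len(image[0])
    expanded = [[0.0] * (2 * cols) for _ in range(2 * rows)]
    for i in range(2 * rows):
        for j in range(2 * cols):
            total = 0.0
            for m in range(-2, 3):
                if (i - m) % 2:  # skip terms where (i - m)/2 is not an integer
                    continue
                for n in range(-2, 3):
                    if (j - n) % 2:  # same test for the column index
                        continue
                    r = clamp((i - m) // 2, rows)
                    c = clamp((j - n) // 2, cols)
                    total += KERNEL[m + 2] * KERNEL[n + 2] * image[r][c]
            # Only a quarter of the kernel weight is used per output pixel,
            # so the factor 4 restores the original brightness.
            expanded[i][j] = 4.0 * total
    return expanded
```

With these weights the even and odd taps each sum to 0.5, so a constant image is reproduced exactly at twice the size.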

The levels of the band-pass pyramid, L_0, L_1, ..., L_N, may now be specified in terms of the low-pass pyramid levels as follows:

L_l = G_l - EXPAND[G_{l+1}]    (15)

L_l = G_l - G_{l+1,1}    (16)

All these levels are shown in Figure 6. The value at each node of the Gaussian pyramid could be obtained directly by convolving a Gaussian-like equivalent weighting function with the original image. Likewise, each value of the band-pass pyramid could be obtained by convolving the difference of two such functions with the original image. This band-pass pyramid is known as the Laplacian pyramid, described next.


3.5 Laplacian Pyramid

The Laplacian pyramid was developed by Burt and Adelson in 1983 [24]. The purpose of the research was image compression. To attain high compression, the correlation between image pixels needs to be removed, which is done by combining transform coding techniques and prediction. Good compression results depend strongly on this de-correlation of the image.

3.5.1 Structure

Figure 8: Laplacian Pyramid

An important property of the Laplacian pyramid is that it is a complete image representation: the original image can be recovered by reversing the steps used to construct the pyramid. The top level, L_N, is first expanded and added to L_{N-1} to give G_{N-1}; this is then expanded and added to L_{N-2} to give G_{N-2}, and so on. This can be written as

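G_0 = \sum_{l=0}^{N} L_{l,l}    (17)

where L_{l,l} denotes level L_l expanded l times to the size of the original image.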

Here the pyramid is used to support scaled image analysis. The levels of the pyramid are formed in steps that consist of EXPAND and REDUCE operations. The same levels can also be computed with the help of the Fast Fourier Transform (FFT). Direct convolution with the equivalent weighting functions requires 20 to 30 bit arithmetic to keep the same accuracy as the cascade of convolutions with a small generating kernel using 8 bit arithmetic.
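The construction and collapse of a Laplacian pyramid can be illustrated on a one-dimensional signal, which keeps the code short while following the same REDUCE, EXPAND and subtraction steps. This is a sketch under stated assumptions: the kernel weights and border handling are illustrative choices, the EXPAND gain is 2 rather than 4 because the signal is 1-D, and all function names are hypothetical.

```python
KERNEL = [0.05, 0.25, 0.4, 0.25, 0.05]  # assumed five-tap generating kernel


def clamp(index, size):
    return min(max(index, 0), size - 1)


def reduce_1d(signal):
    """Low-pass filter and subsample by 2."""
    n = len(signal)
    return [
        sum(KERNEL[m + 2] * signal[clamp(2 * i + m, n)] for m in range(-2, 3))
        for i in range((n + 1) // 2)
    ]


def expand_1d(signal, size):
    """Interpolate back up to `size` samples (gain 2 in one dimension)."""
    n = len(signal)
    out = []
    for i in range(size):
        total = 0.0
        for m in range(-2, 3):
            if (i - m) % 2 == 0:
                total += KERNEL[m + 2] * signal[clamp((i - m) // 2, n)]
        out.append(2.0 * total)
    return out


def build_laplacian(signal, levels):
    """Return [L_0, ..., L_N]; the top level L_N equals G_N."""
    gaussian = [list(signal)]
    for _ in range(levels - 1):
        gaussian.append(reduce_1d(gaussian[-1]))
    laplacian = []
    for lower, upper in zip(gaussian, gaussian[1:]):
        predicted = expand_1d(upper, len(lower))
        laplacian.append([a - b for a, b in zip(lower, predicted)])
    laplacian.append(gaussian[-1])
    return laplacian


def collapse(laplacian):
    """Reverse the construction: expand the top level and add each band."""
    current = laplacian[-1]
    for band in reversed(laplacian[:-1]):
        expanded = expand_1d(current, len(band))
        current = [a + b for a, b in zip(band, expanded)]
    return current
```

Collapsing an unmodified pyramid returns the input signal up to floating-point rounding, whatever kernel is used, because each band stores exactly what the expanded coarser level fails to predict.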


Figure 9: Pyramid levels expanded to the size of the original image [24]

3.6 Steerable Pyramid

The Steerable pyramid is a linear multi-scale, multi-orientation image decomposition [21]. It provides a useful front-end for image-processing and computer vision applications. It was developed to overcome the limitations of the orthogonal separable wavelet decompositions that were then becoming popular for image processing (specifically, those representations are heavily aliased and do not represent oblique orientations well). Once the orthogonality constraint is dropped, it makes sense to completely reconsider the filter design problem (as opposed to just re-using orthogonal wavelet filters in a redundant representation).

The Steerable pyramid is also used in this work as a method of decomposition. Its basis functions are dilated and rotated versions of a single directional wavelet [23].

Calculating the derivative in each direction is a promising way of capturing multi-oriented information in a face image. However, this process is very costly. With steerable filters [22], the derivative of an image in any direction is formed by interpolating several basis derivative functions. Any nth order directional derivative function is a linear combination of (n + 1) basis derivatives of order n, and the orientations of adjacent basis filters differ by π/(n + 1) in the Fourier domain. For example, in a Gaussian filtered image a first order derivative in any direction can be interpolated from two first order basis derivatives. Similarly, three second order basis derivatives form the second order derivative in any direction.
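The first order case can be checked numerically. The sketch below steers the derivative of a unit-variance 2-D Gaussian to an angle θ by combining its x and y derivatives with weights cos θ and sin θ, and compares the result with a finite-difference derivative along that direction. The Gaussian, the sample point and the function names are illustrative assumptions, not part of the thesis implementation.

```python
import math


def gaussian(x, y):
    """Unnormalised 2-D Gaussian with unit variance."""
    return math.exp(-(x * x + y * y) / 2.0)


def basis_x(x, y):
    """First basis filter: partial derivative of the Gaussian along x."""
    return -x * gaussian(x, y)


def basis_y(x, y):
    """Second basis filter: partial derivative of the Gaussian along y."""
    return -y * gaussian(x, y)


def steered_derivative(x, y, theta):
    """Derivative along direction theta, interpolated from the two bases."""
    return math.cos(theta) * basis_x(x, y) + math.sin(theta) * basis_y(x, y)


def numeric_derivative(x, y, theta, h=1e-5):
    """Central-difference derivative along theta, used only as a check."""
    dx, dy = math.cos(theta) * h, math.sin(theta) * h
    return (gaussian(x + dx, y + dy) - gaussian(x - dx, y - dy)) / (2.0 * h)
```

The two basis filters are evaluated once, after which the response at any orientation costs only two multiplications and one addition per pixel, which is the saving the paragraph above refers to.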

3.6.1 System diagram

Figure 10: Frequency domain structure of Steerable Pyramid [7]

Figure 10 shows the system structure of the Steerable pyramid, and Figure 11 shows one level of image decomposition using the Steerable pyramid. There are four band-pass filters. The properties of these filters, in terms of their radial and angular frequency portions, are described below [7]. In Figure 10, H_0(ω) is a non-oriented high-pass filter, L_1(ω) is a narrow-band low-pass filter, and B_k(ω) are the band-pass filters (k = 0, 1, ..., K), giving K + 1 oriented filters. The band-pass components are not downsampled, in order to avoid aliasing. If X(ω) is the original image, the reconstructed image in the Fourier domain is:

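\hat{X}(\omega) = \left[ |H_0(\omega)|^2 + |L_0(\omega)|^2 \left( |L_1(\omega)|^2 + \sum_{k=0}^{n} |B_k(\omega)|^2 \right) \right] X(\omega) + a.t.    (18)

where a.t. denotes the aliasing terms. To eliminate them, the low-pass filter L_1 is band-limited:

L_1(\omega) = 0 \quad \text{for } |\omega| > \frac{\pi}{2}    (19)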


Figure 11: Decomposition based on Steerable pyramid [7]

To avoid amplitude distortion, the system transfer function is constrained to unity:

|H_0(\omega)|^2 + |L_0(\omega)|^2 \left[ |L_1(\omega)|^2 + \sum_{k=0}^{n} |B_k(\omega)|^2 \right] = 1    (20)

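And for the system to be cascaded recursively:

\left| L_1\left(\frac{\omega}{2}\right) \right|^2 = \left| L_1\left(\frac{\omega}{2}\right) \right|^2 \left[ |L_1(\omega)|^2 + \sum_{k=0}^{n} |B_k(\omega)|^2 \right]    (21)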

The angular constraint on the band-pass filters B_k(ω) is determined by the condition of steerability:

B_k(\omega) = B(\omega) \left[ -j \cos(\theta - \theta_k) \right]^n    (22)

where \theta = \arg(\omega), \theta_k = \frac{k\pi}{n+1}, and B(\omega) = \sqrt{\sum_{k=0}^{n} |B_k(\omega)|^2}. In our experiments four band-pass filters are used in the pyramid, so n = 3.
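One consequence of the angular form in Equation (22) can be verified with a short script: the squared angular parts of the n + 1 oriented filters, cos^{2n}(θ - θ_k) with θ_k = kπ/(n + 1), add up to the same value at every angle θ, so the oriented bands together cover all orientations evenly. The sketch below checks this for n = 3; the function name `angular_energy` is hypothetical.

```python
import math


def angular_energy(theta, order):
    """Sum over k of |(-j cos(theta - theta_k))^n|^2 for n = order."""
    total = 0.0
    for k in range(order + 1):
        theta_k = k * math.pi / (order + 1)
        total += math.cos(theta - theta_k) ** (2 * order)
    return total


# Sample the sum at many angles for the four-filter case (n = 3).
samples = [angular_energy(0.01 * step, 3) for step in range(629)]
spread = max(samples) - min(samples)  # stays at rounding-error level
```

Any constant factor of this kind can be absorbed into the radial part B(ω), which is why the angular and radial designs can be treated separately.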


Chapter 4

4 System Framework

This chapter explains the system diagram in detail and its implementation using different video decomposition methods and filters.

Our system combines spatial and temporal processing for color amplification and motion magnification in a video. The system first decomposes the input video into different spatial frequency bands. The video is decomposed using three different pyramid decomposition methods to observe the differences in video quality and performance at the output.

• Laplacian pyramid

• Steerable pyramid

• Gaussian pyramid

The same temporal filters are applied to all of these frequency bands. The filtered bands are then amplified by a given factor, and the amplified signal is added back to the original video to obtain the output video. The choice of temporal filter and amplification factor can be tuned to suit different applications. All steps of the video processing can be seen in Figure 12.

Each block will be explained in detail to see how it works.

4.1 Input videos

All videos used as input in this thesis are in MPEG-4 format. We used 9 videos in our experiment, with different frame rates and bit rates, to enhance hidden motion and colour variation.

4.2 Decomposition

The input video is first decomposed to obtain pools of different spatial frequencies to work with. This technique combines spatial and temporal processing to observe video color variation and to magnify subtle changes in a video. Two types of video decomposition methods have been used to extract these frequency bands of interest.

1. Gaussian pyramid

2. Steerable pyramid


Figure 12: Block diagram of thesis work

These bands, or pools of frequencies, are magnified differently because:

• They exhibit different signal-to-noise ratios.

• They might contain spatial frequencies for which the linear approximation used in our motion magnification does not hold [9].

The goal of spatial processing is simply to increase the temporal SNR by pooling multiple pixels [2]. Video frames are passed through a spatial low-pass filter and then downsampled for computational efficiency. However, full Steerable and Gaussian pyramids are computed [3]. The working of these pyramids is discussed next.

4.2.1 Gaussian Pyramid

The system diagram is shown in Figure 13. In Gaussian pyramids, an image is decomposed into blur and fine-scale details to construct the pyramid. Fine scale is further decomposed into blur1 and fine1, and so on. The image can be reconstructed by adding all blur and fine-scale levels of the pyramid.


Figure 13: Gaussian Block Diagram

4.2.2 Steerable pyramid

Figure 14: Steerable Block Diagram

The system diagram is shown in Figure 14. The filters fhi0 and flo0 are used to initially split the image into a high-pass residual band H0 and a low-pass sub-band. This low-pass band is then split into a low-pass band L1 and K + 1 oriented sub-bands B0, B1, ..., BK. The representation is substantially overcomplete. The pyramid is built by recursively splitting the low-pass band (L1) using the inner portion of the diagram (i.e. using the filters fl1, fb0, fb1, ..., fbK).

4.3 Temporal Processing

Temporal processing is then performed on the spatial bands created by the Steerable and Gaussian pyramids. The time series corresponding to the value of a pixel in a frequency band is taken [9], and a temporal filter is applied to it to extract the frequency bands of interest.


For example, to magnify a human pulse, a band-pass filter can be used whose pass band lies in the range 0.4-4.1 Hz, which corresponds to 24-246 beats per minute. To extract a particular pulse rate, a narrow band around that value can be used.

Different filters can be used according to the application and the requirements of the video. The filters must be selected carefully to obtain motion and color magnification.

The temporal processing is uniform for all spatial levels, and for all pixels within each level [9].
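The temporal step can be sketched for a single pixel as follows. The time series of that pixel is transformed with a discrete Fourier transform, every frequency bin outside the pass band is set to zero, and the result is transformed back. This is a minimal illustration using a direct O(N²) DFT from the Python standard library, not the Matlab implementation used in the thesis; the function name `ideal_bandpass` is hypothetical.

```python
import cmath
import math


def ideal_bandpass(series, fs, low_hz, high_hz):
    """Keep only the temporal frequencies of `series` in [low_hz, high_hz].

    series: values of one pixel over time, sampled at fs frames per second.
    """
    n = len(series)
    # Forward DFT of the pixel time series.
    spectrum = [
        sum(series[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        # Bins above n/2 hold negative frequencies, so fold them back.
        freq = min(k, n - k) * fs / n
        if not (low_hz <= freq <= high_hz):
            spectrum[k] = 0
    # Inverse DFT; the imaginary part is only rounding error.
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]
```

For a 30 frames-per-second video, calling `ideal_bandpass(series, 30.0, 0.4, 4.0)` removes the constant brightness of the pixel and any fast flicker while keeping a 1 Hz component, which corresponds to a pulse of 60 beats per minute.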

4.4 Filters

The choice of filter depends on the video being used. For motion magnification a filter with a wide pass band is used, and for color amplification a narrow pass band is used, as it produces less noisy results. For color amplification, ideal band-pass filters are used since they have sharp cut-off frequencies. Low-order IIR filters are used for both color and motion magnification. Figure 15 shows the frequency responses of some of the temporal filters used in this work [20].
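One simple way to build a low-order IIR band-pass filter is to subtract the outputs of two first-order low-pass filters with different cut-offs. The sketch below shows that construction for a single pixel time series. It is an assumed example of a low-order IIR filter, not necessarily the filter design used in the thesis, and the smoothing coefficients `r_fast` and `r_slow` are hypothetical parameters.

```python
def iir_bandpass(series, r_fast, r_slow):
    """Difference of two first-order low-pass (exponential smoothing) filters.

    r_fast and r_slow are smoothing coefficients in (0, 1) with
    r_fast > r_slow; a larger coefficient means a higher cut-off frequency.
    """
    fast = slow = series[0]  # start both filters at the first sample
    output = []
    for sample in series:
        fast = r_fast * sample + (1.0 - r_fast) * fast
        slow = r_slow * sample + (1.0 - r_slow) * slow
        # Slow variations appear in both filters and cancel; very fast
        # variations are smoothed away by both. What remains is the band
        # between the two cut-offs.
        output.append(fast - slow)
    return output
```

Unlike an ideal band-pass filter, this filter needs only the current frame and two stored values per pixel, so it can run frame by frame, at the cost of a much less sharp frequency cut-off.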

Figure 15: Filters used for extracting signal

To demonstrate motion and color magnification, some of the videos used are shown in Figure 16. For color amplification, two videos of adults with different skin colors are used, one with a dark complexion and the other with a light skin color, in order to observe changes in their skin color as blood flows through their faces.

Figure 16: Videos used for experiment

For color amplification, Laplacian, Gaussian and Steerable pyramids are used for spatial decomposition, and the bands are then passed through ideal band-pass filters with the low and high cut-off frequencies shown in Table 1. These values remain the same for all pyramid structures.

For the baby video, IIR and ideal band-pass filters are applied with Laplacian, Gaussian and Steerable pyramids to observe the color variation; the filter values are given in Table 1.

For motion magnification, the videos used are face, wrist, guitar, subway, baby2, shadow, and camera. All videos are subjected to Laplacian, Gaussian, and Steerable pyramid decomposition and filtered to pass the frequencies of interest for motion. The band-pass filters are tuned to the frequencies of interest, and their values are shown in Table 1. These values remain the same for all pyramids used, in order to observe how they respond at the output.

Table 1: Parameters used for videos

Video           α      λc     ωl (Hz)   ωh (Hz)   fs (Hz)
Baby            10     16     0.4       3         30
Baby2           150    600    2.33      2.67      30
Camera          120    20     45        100       300
Face            100    1000   0.83      1         30
Face2 Motion    20     80     0.83      1         30
Face2 Pulse     120    960    0.83      1         30
Guitar Low E    50     40     72        92        600
Guitar A        100    40     100       120       600
Shadow          5      48     0.5       10        30
Subway          60     90     3.6       6.2       30
Wrist           10     80     0.4       3         30

For guitar, an ideal band-pass filter tuned to the frequencies of interest listed in Table 1 is used, and the result is observed in the output video.

For motion magnification of face2, a second-order IIR Butterworth filter with the frequencies shown in Table 1 is used. It is observed that changing the temporal filter values can exaggerate motion rather than amplify color. For camera and wrist, both IIR Butterworth and IIR low-pass filters are used to amplify motion; the corresponding values can be seen in Table 1.

For the shadow video, a Butterworth low-pass filter is used to amplify motion.

4.5 Amplification

The amplification factor depends on user requirements, and there is a trade-off between using a higher or a lower value. A high value can violate the bound on amplification, exaggerating specific movements and color changes in the video and increasing noise and artifacts. To amplify the color of human skin, emphasis should be placed on the lower spatial frequencies. This way of color magnification can be evaluated with the use of videos.

The extracted band is amplified by multiplying it by the exaggeration factor specified in Table 1. A user can also change the exaggeration factor to observe its effect on the videos, but the exaggeration factor has limitations: larger values produce more noise and artifacts. We deliberately amplified motion beyond the derived bound within which the first-order approximation holds, in order to increase the effect and to demonstrate the performance of the algorithm.
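The amplification itself is a single multiply-and-add per sample, as the following sketch shows for one pixel time series. The helper names `magnify` and `clip_to_8bit` and the sample values are hypothetical; the clipping step is an assumption added to show how an overly large factor saturates 8-bit pixel values and produces visible artifacts.

```python
def magnify(original, band, alpha):
    """Add the temporally filtered band, scaled by alpha, to the original."""
    return [pixel + alpha * delta for pixel, delta in zip(original, band)]


def clip_to_8bit(values):
    """Limit values to the displayable range of an 8-bit image."""
    return [min(255.0, max(0.0, v)) for v in values]


# A pixel of constant brightness 120 with a hidden variation of +/-0.2.
original = [120.0, 120.0, 120.0, 120.0]
band = [0.0, 0.2, 0.0, -0.2]

moderate = clip_to_8bit(magnify(original, band, 10.0))    # variation becomes +/-2
extreme = clip_to_8bit(magnify(original, band, 1000.0))   # variation saturates
```

With a factor of 10 the variation grows from 0.2 to 2 grey levels. With a factor of 1000 the same pixel would swing by 200 levels and is clipped at the limits of the range, and any noise that passed the temporal filter is amplified by the same factor.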

As can be observed, most of the videos contain oscillatory movements, because such motion generally has a longer duration and smaller amplitude. However, this method can also be used to amplify non-periodic motions, as long as they are within the pass band of the temporal band-pass filter. For example, in shadow we process a video of the shadow of the sun moving linearly.

4.6 Reconstruction

In the reconstruction phase, the amplified signals are added back to the original input video signals. The pyramids are then collapsed to produce the output videos, in which the motion and color magnification can be observed.

Figure 17: Block diagram for Reconstruction


Chapter 5

5 Experimental Results

In this chapter experimental results are presented. Results are discussed in terms of computational complexity and quality of the output videos. The comparison is implemented in Matlab, and the results can be observed from tables and graphs. A Dell laptop with a Core i3 1.4 GHz processor running Windows 7 was used.

5.1 Computational Analysis

In the computational analysis, systems that use different video decomposition methods are compared. The results are given in tabular form. The following parameters are used to observe performance improvements in the output videos.

1. Number of frames

2. Frame rate

3. Height of pyramids

4. Execution time of video processing

All the parameters were calculated on the same videos, using different pyramid structures, as shown in the tables. This experiment used 12 different videos in order to see the effect of changing the pyramid structure in the system. Results can be observed in Tables 2, 3 and 4.

As seen from Tables 2 to 4, the total execution time of a video increases with the number of pyramid levels. For the baby video, the execution time of the Laplacian pyramid is significantly less than that of the Gaussian and Steerable pyramids: the Laplacian pyramid takes 364.39 s with a pyramid height of 8, the Gaussian pyramid takes 435.69 s with a height of 10, and the Steerable pyramid takes 1764.29 s with a height of 14. So the execution time increases with the pyramid height.

However, more decomposition levels mean less noise in the output video. The output videos show that the Steerable pyramid performs better with more pyramid levels.

After processing all videos with the different pyramid structures, it can be observed that the output results are almost comparable. If a quick result is required and slightly higher noise is acceptable, Laplacian or Gaussian decomposition should be applied.


5.2 Computational Complexity with Laplacian Pyramid Based Decomposition

Table 2: Laplacian Computational Results

Video No.   Frame Rate   No. of Frames   Height of Laplacian Pyramid   Execution Time (s)
1           30           301             8                             364.389340
2           30           900             8                             327.198549
3           30           301             7                             85.517894
4           30           300             8                             204.323243
5           29.97        300             8                             149.157708
6           29.97        243             8                             149.988408
7           30           1001            8                             265.842864
8           29.97        25              8                             556.589987
9           30           25              8                             523.358121


5.3 Computational Complexity with Gaussian Pyramid Based Decomposition

Table 3: Gaussian Computational Results

Video No.   Frame Rate   No. of Frames   Height of Gaussian Pyramid   Execution Time (s)
1           30           301             10                           435.69342
2           30           900             12                           387.37619
3           30           301             10                           321.49535
4           30           300             10                           543.715887
5           29.97        300             10                           472.93758
6           29.97        243             12                           232.074225
7           30           1001            10                           313.640342
8           29.97        25              10                           694.651638
9           30           25              12                           529.776292


5.4 Computational Complexity with Steerable Pyramid Based Decomposition

Table 4: Steerable Computational Results

Video No.   Frame Rate   No. of Frames   Height of Steerable Pyramid   Execution Time (s)
1           30           301             14                            1764.289051
2           30           900             14                            1632.1394
3           30           301             14                            734.9232
4           30           300             14                            843.715887
5           29.97        300             14                            935.2593
6           29.97        243             12                            432.078825
7           30           1001            14                            743.640272
8           29.97        25              12                            2194.651160
9           30           25              12                            1529.758392


5.5 Image Quality

Image quality assessment can be divided into two main classes, objective and subjective. Subjective quality methods are based on human perception of quality, whereas objective methods are based on mathematical algorithms. Objective quality methods are designed with the human visual system in mind, and they must correlate highly with subjective quality to be useful. Two of the most widely used objective methods are the Structural Similarity Index (SSIM) and the Peak Signal to Noise Ratio (PSNR).

5.5.1 PSNR

One major factor that deteriorates image quality is noise. A higher PSNR value indicates a better quality image and a lower noise level. To compute PSNR, the input video is taken as the reference and compared with the output video to measure the level of noise in the output relative to the input. PSNR is defined as:

PSNR = 10 \log_{10} \frac{255^2}{MSE(I, I_o)}    (23)

where I and I_o are the input and output videos respectively. The mean square error (MSE) is calculated as:

MSE = \frac{1}{MN} \sum_{y=1}^{M} \sum_{x=1}^{N} \left[ I(x, y) - I_o(x, y) \right]^2    (24)

PSNR and MSE are inversely related: a lower MSE means lower error and thus a higher PSNR.
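Equations (23) and (24) translate directly into code. The sketch below computes the MSE and PSNR of one frame against a reference frame, with frames represented as nested lists of 8-bit grey values. The function names and the handling of identical frames (returning infinity) are assumptions made for this illustration.

```python
import math


def mse(reference, processed):
    """Mean square error between two frames of equal size (Equation 24)."""
    rows, cols = len(reference), len(reference[0])
    total = sum(
        (reference[y][x] - processed[y][x]) ** 2
        for y in range(rows)
        for x in range(cols)
    )
    return total / (rows * cols)


def psnr(reference, processed, peak=255.0):
    """Peak signal to noise ratio in decibels (Equation 23)."""
    error = mse(reference, processed)
    if error == 0:
        return math.inf  # identical frames: no noise at all
    return 10.0 * math.log10(peak ** 2 / error)
```

For example, a 2 x 2 frame in which two of the four pixels differ from the reference by one grey level has an MSE of 0.5. A per-frame PSNR curve is obtained by applying `psnr` to each pair of input and output frames in turn.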

Figures 18 to 26 show the PSNR of different videos using the Laplacian, Gaussian and Steerable pyramid decomposition methods.

A video of a baby sleeping in a cart is captured. As the baby breathes, small movements of his chest are produced. These movements are not evident in the source video. Our objective is to enhance these movements and make them evident to the naked eye. Such movement enhancement can be used to monitor a patient's heartbeat remotely. Laplacian, Steerable and Gaussian pyramids are used for image decomposition, and PSNR plots are used to analyze which decomposition method leads to better results.


Figure 18: PSNR of Steerable, Gaussian and Laplacian Decompositions

Figure 18 shows the PSNR plot for the baby video when Laplacian, Steerable and Gaussian pyramids were used. With the Laplacian pyramid, the highest peak value of 91 occurs around the initial frame and the lowest value lies around frame number 270; the average value lies between 84 and 86. With the Steerable pyramid, the highest value, at the initial frame, is 79 and the lowest value is about 69, around frame number 270; most PSNR values lie between 72 and 76. With Gaussian decomposition, the highest PSNR value is about 82 and the lowest reaches 71.5; most values lie between 74 and 78.

A comparative analysis of the PSNR results for the baby video shows that the Laplacian pyramid performs better than the Steerable and Gaussian pyramids, since it has higher PSNR values. Thus it reduces noise in the baby source video more effectively than Steerable and Gaussian decomposition.

The source video titled Camera shows an SLR camera being used to take a picture. This video comprises 1000 frames. The shutter movement causes a small vibration in the camera while the picture is taken. This vibration is hidden to the human eye and cannot be observed in the source video, but it becomes evident when the video is processed using image decomposition pyramids. This kind of motion magnification can be used to perform vibration analysis of machinery, where the motion may not be evident otherwise. As observed from Figure 19, the highest PSNR peak value is 80 when Steerable decomposition is used for motion magnification in the camera video. The lowest value is 72, observed around frame number 500. Most frames have PSNR values ranging from 76 to 78.


With Laplacian decomposition, the highest PSNR value is 75 and the lowest is 67, both lower than for Steerable decomposition. The average value range is similar to that of Steerable decomposition, as shown in Figure 19. The Gaussian pyramid performed worse than the Steerable and Laplacian pyramids in this case: its PSNR values are lower than both, with a highest peak of 66, a lowest value of 58, and most values ranging from 62 to 64. Comparing all three pyramids for this video, it can be concluded that the Steerable pyramid reduced noise the most, followed by the Laplacian pyramid and then the Gaussian pyramid.

One way to count the human pulse rate without any equipment is to place a finger on the wrist and count the small motions of the veins due to blood flow. The human heart pumps blood through the veins at short intervals, and the skin color also changes as the blood is pumped. Capturing a video of a human wrist and processing it to extract these small changes in skin color, as well as the motion of the veins, can help doctors check a heartbeat remotely. The source video of the wrist contains around 900 frames. The skin color changes and motions are not observable in this source video.

Figure 20 shows the PSNR values of all frames in the source video titled wrist. PSNR values for Steerable decomposition vary between 76.7 and 81.5, and most of the frames have values between 79.5 and 81. These values are greater than the PSNR values obtained using the Laplacian pyramid, so Steerable decomposition achieved higher PSNR values and thus a better video quality. The PSNR plot in Figure 20 also shows that the Gaussian graph has a highest value of 77.3 and a lowest value of 72.7. It has a lower average value than the Steerable pyramid graph, as most of its frames have PSNR values in the range from 75 to 77. In this case Steerable decomposition produced better results.


One of the great challenges faced by humanity is the conservation of the environment. Environmental changes are difficult to analyze as they are slow and span long time periods. By the time these changes are observed by humans, irrecoverable damage has already been done to the environment. In this video we captured a shadow as the sun sets. There is a considerable change in the shadow length over a short time period, which is not otherwise evident to the human eye. This idea can be used to observe changes in the environment. Melting glaciers can be observed in the same way, so that even a small change is noticed in time and considerable damage can be avoided. Steerable decomposition was used, and the resulting PSNR plot for the shadow video output is shown in Figure 21. The highest and lowest PSNR values are 87.5 and 75 respectively. Gaussian decomposition resulted in better video quality, as it has higher PSNR values than the Steerable pyramid. So it can be concluded that Gaussian decomposition produces better results in terms of quality.

Laplacian image decomposition has almost the same PSNR values as Steerable decomposition in Figure 21.


Small changes in skin color can be used to estimate the human pulse rate. This can be achieved by processing the face2 video using decomposition pyramids. The methods can then be evaluated based on their PSNR values, which indicate which method performs better.

Figure 22 shows the PSNR values of the frames in the face2 video when using the Laplacian pyramid. The highest value is 82.2 and the lowest value is 78.3. The PSNR plot for the same source video using Steerable decomposition is also shown in Figure 22. Steerable decomposition gives slightly higher PSNR values than Laplacian decomposition.

Compared with the Steerable and Laplacian PSNR plots, Gaussian decomposition gives higher PSNR values than Laplacian but lower values than Steerable. The PSNR plot in Figure 22 also shows the values when Gaussian decomposition is used.

The PSNR plot in Figure 23 is for the source video titled Subway. When the Laplacian pyramid was used for video decomposition, small motions were made visible after processing the video. The highest value is 78 and the lowest value is 66. The PSNR values using Laplacian decomposition are lower than those obtained when Steerable decomposition is used instead.

Figure 23 also shows PSNR results for Gaussian pyramid decomposition. The highest and lowest peaks are 80.5 and 67 respectively. These values are lower than for Steerable pyramid decomposition but greater than for Laplacian decomposition.


For the baby2 video, Steerable has the highest PSNR values, followed by Laplacian and then Gaussian. This video is of a newborn baby, and his heartbeat is detected when the skin color is amplified using these decomposition algorithms. The highest PSNR value in Figure 24 is 96.5.



One of the test videos used to detect a pulse rate from skin color variations is titled face.

The face video was subjected to the three decomposition algorithms, and the resulting PSNR plots are shown in Figure 25. The highest PSNR values are obtained when the Steerable pyramid is used for image decomposition.


Guitar is the title of the source video used for motion magnification of a string. The respective PSNR plots for Gaussian, Laplacian and Steerable decompositions are shown in Figure 26. Comparing the PSNR plots of Steerable and Gaussian with the Laplacian decomposition plot, it can be deduced that Laplacian performed better for this test video.


Steerable decomposition produced better results than Laplacian and Gaussian decompositions for most of the test videos.

5.5.2 Structural Similarity Index

The Structural Similarity Index (SSIM) is a quality metric that measures the structural similarity between two frames. SSIM is still used as an alternative for the evaluation of perceptual video quality [25]. It treats quality degradations in the frames as perceived changes in the structural information between two frames.

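SSIM(n) = \frac{\left[ 2\mu_{I_R}(n)\,\mu_{I_D}(n) + C_1 \right] \left[ 2\sigma_{I_R I_D}(n) + C_2 \right]}{\left[ \mu_{I_R}^2(n) + \mu_{I_D}^2(n) + C_1 \right] \left[ \sigma_{I_R}^2(n) + \sigma_{I_D}^2(n) + C_2 \right]}    (25)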

Here μ_{I_R}(n) and μ_{I_D}(n) are the mean intensities of the nth frame of the reference (I_R) and distorted (I_D) video sequences, σ_{I_R}(n) and σ_{I_D}(n) are the contrasts of the nth frame of the reference and distorted sequences, and σ_{I_R I_D}(n) is the covariance between the two frames. C_1 and C_2 are constants used to avoid instabilities in the structural similarity comparison.

SSIM is calculated for the entire video sequence of length N:

SSIM = \frac{1}{N} \sum_{n=1}^{N} SSIM(n)    (26)
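Equations (25) and (26) can be sketched as follows, with each frame given as a flat list of pixel intensities. The statistics are computed over the whole frame, exactly as the equations are written; many SSIM implementations instead average the index over small local windows, so values from this sketch need not match such tools. The constants C1 = (0.01 × 255)² and C2 = (0.03 × 255)² are commonly used defaults and are an assumption here, as are the function names.

```python
def frame_ssim(reference, distorted, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """SSIM of one frame from whole-frame statistics (Equation 25)."""
    n = len(reference)
    mu_r = sum(reference) / n
    mu_d = sum(distorted) / n
    var_r = sum((r - mu_r) ** 2 for r in reference) / n
    var_d = sum((d - mu_d) ** 2 for d in distorted) / n
    cov = sum((r - mu_r) * (d - mu_d) for r, d in zip(reference, distorted)) / n
    numerator = (2 * mu_r * mu_d + c1) * (2 * cov + c2)
    denominator = (mu_r ** 2 + mu_d ** 2 + c1) * (var_r + var_d + c2)
    return numerator / denominator


def video_ssim(reference_frames, distorted_frames):
    """Mean SSIM over all frames of a sequence (Equation 26)."""
    scores = [frame_ssim(r, d) for r, d in zip(reference_frames, distorted_frames)]
    return sum(scores) / len(scores)
```

Identical frames give an index of 1, and any difference in mean, contrast or structure lowers it, which is why values very close to 1 indicate that the processed video preserves the structure of the input.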

The figures below compare the SSIM graphs when Laplacian, Gaussian and Steerable pyramids are used as the video decomposition and reconstruction methods.

SSIM has been calculated for all input videos. First, SSIM is calculated for each frame in a video sequence, as shown in the graphs below. Then, in Table 5, SSIM is calculated for the complete video sequence.


Figure 27: SSIM of Steerable, Gaussian and Laplacian Decompositions

The SSIM graphs for the baby video in Figure 27 show that the SSIM for Laplacian decomposition peaks in the initial frames at around 1, while Gaussian peaks at around 0.99998 and Steerable at 0.99997. So the Laplacian pyramid performs better in the case of the baby video.



The SSIM values calculated for each frame of the baby2 video sequence are shown in Figure 28. Steerable pyramid decomposition has a highest value of around 1.0000 and a lowest value of 0.99998. Compared with the other pyramid structures, the Steerable pyramid performs better in terms of structural similarity.


Figure 29 shows the SSIM graph for the camera video. The SSIM values for the Steerable and Laplacian pyramids almost overlap, and their mean SSIM values are almost 0.9998, which is better than Gaussian pyramid decomposition for the same video.



In the face2 video, Steerable has high values around 1 and low values around 0.99999, which are higher than the Gaussian and Laplacian values for the same video, as shown in Figure 30. So Steerable performs better than the other pyramid structures.


The face video is similar to the face2 video. The Steerable pyramid has higher SSIM values than the other pyramid structures, as shown in Figure 31.



Similar graphs have been constructed for the guitar video. It can be seen in Figure 32 that the Laplacian pyramid has the highest mean value, around 1, followed by the Gaussian and then the Steerable pyramid.


In the SSIM graph of the shadow video in Figure 33, the Laplacian and Gaussian pyramids almost overlap and have a high average value of 0.99999. Both perform better than Steerable in this case.


In Figure 34, the Laplacian, Steerable and Gaussian pyramids all overlap each other for the subway video. So all perform equally in terms of SSIM.



In the wrist video (Figure 35), the Steerable pyramid has higher average and peak values than the other pyramid structures and gives better SSIM values.


Table 5 shows the mean structural similarity index (SSIM) values calculated for each complete video sequence, that is, the mean value over all frames in the sequence, using the different pyramid structures for video decomposition and reconstruction.

Table 5: SSIM Values For Complete Video Sequence

Video No.   Laplacian Pyramid   Gaussian Pyramid   Steerable Pyramid
1           1.0000              0.9999             0.9999
2           1.0000              0.9999             1.0000
3           0.9998              0.9997             0.9999
4           0.9999              0.9998             0.9999
5           1.0000              0.9999             0.9999
6           0.9994              0.9994             0.9994
7           1.0000              1.0000             0.9999
8           0.9996              0.99996            0.9995
9           0.9999              0.9999             0.9999


Chapter 6

6 Conclusion and Future Work Suggestions

6.1 Conclusion

A standard video has been taken as input and processed to amplify small motions in it that were invisible to the human eye. This method processes pixels at specific positions in the video where low-frequency variations occur and amplifies them to reveal small changes in the video.

Furthermore, improvements to Eulerian motion magnification and color amplification obtained by using Gaussian and Steerable pyramids for video decomposition were investigated. Steerable and Gaussian pyramids were used to observe and extract subtle changes in a video. The effects on videos with different formats and noise environments when subjected to this system were observed. These techniques were implemented in Matlab to assess the improvement in system performance. The proposed model was found to perform better than contemporary Eulerian magnification using the Laplacian pyramid in some cases. Magnification using the Steerable pyramid showed better results than the Laplacian pyramid for some of the test videos, as seen from the PSNR and SSIM graphs and the comparison tables in Chapter 5.

This model will be more effective for magnification in specific applications, as concluded from our experiments. In future work, it is recommended to investigate the system using wavelet and other video decompositions, as well as different filter structures, to look for further improvement.

The method of extracting information from these small changes in videos can be applied to different scenarios, such as security, medical, sports, lie detection or weather applications.


