
Introduction to

Image Processing

DESS + Maîtrise d’Informatique, Université de Caen

Jerzy Karczmarczuk

Caen 1997/1998


Contents

1 What do we need it for?

2 2D Geometric Transformations
   2.1 Typical Linear Transformations
       2.1.1 Scaling, Rotation, Shearing
   2.2 Other Affine Transformations and Perspective
       2.2.1 Linear Transformation of Triangles
       2.2.2 Nonlinear Deformations (Overview)
   2.3 How to do Geometry in Discrete Space

3 Filtering
   3.1 Introduction: Filters and Convolutions
   3.2 Typical Filters and their Applications
       3.2.1 Smoothing filters
       3.2.2 De-noising by median filtering
       3.2.3 Gradients
       3.2.4 Laplace and Sobel operators
       3.2.5 Sharpening, and Edge Enhancing
   3.3 Fourier Transform and Spatial Frequencies
       3.3.1 Discrete Cosine Transform

4 Colour Space Manipulations
   4.1 Colour Spaces and Channels
   4.2 Transfer Curves, and Histogram Manipulations
       4.2.1 Transfer Curves
       4.2.2 Practical gamma correction
       4.2.3 Histogram Equalization
   4.3 Transparence Channel

5 Image Algebra
   5.1 Addition, Subtraction and Multiplication
       5.1.1 Some Simple Examples
   5.2 Working with Layers
       5.2.1 Construction and Usage of the Transparence Channel

6 Some Special 3D Effects
   6.1 2D Bump mapping
   6.2 Displacement Maps
       6.2.1 Another example: turbulence

7 Other Deformation Techniques
   7.1 Warping
   7.2 Morphing


Chapter 1

What do we need it for?

Our course deals with the creation of images, concretely with the synthesis and rendering of 3-dimensional scenes. What is the role of image processing here? Do we really need to add new topics to a domain already sufficiently rich? The answer is: yes, we do. Of course, in order to construct a sound 3D model and to launch a ray tracer, one does not need to master all the sometimes very specific 2D techniques, although at least the 3D scene should be constructed in a way adapted to its 2D presentation. (If you choose badly the position, the direction, or the focal attributes of your camera, even a wonderful composition of your scene won’t help you...) If the rendering program/device has been constructed and piloted correctly, perhaps no post-processing is ever needed. Much more often, however, one has to make some minor corrections, or more substantial alterations, to the produced images.

You might wish to add through post-processing some special effects which would be extremely costly if integrated into the 3D rendering, or just add some 2D external elements to the image, such as text or frames. Also, when composing a synthetic scene with a “natural” texture, parts of a photograph, etc., more often than not the contrast, brightness or colour distribution must be adjusted.

The image processing domain has some independent, creative aspects as well. We will not speak here about artistic creation and painting techniques, although the author confesses that this is for him a fascinating subject. We might have to think seriously about:

A Creation of various coloured textures: regular and stochastic; based on real photographs or totally synthesized; simple or showing a replicated geometric pattern; texts, etc. In general – everything we may need to enrich the surfaces of 3D objects.

B Compositions and collages; clips from one image added to another; elimination or replication of some fragments; retouching.

C Colour space transformations:

• Adjustments of the luminosity, contrast, hue (colour bias) and/or gamma correction.


• Histogram equilibration (equalization) or stretching.

• Dithering and halftoning. Colour quantization and creation of micro-patterns: modifying the pixel structure of the image in order to simulate non-existing colours.

• Thresholding; all kinds of transfer curve manipulations and creation of artificial colour variations. Also: ex post colouring of gray photographs.

• “Image algebra” (or arithmetic if you prefer). Addition, multiplication, and other variants of image “blitting” or “rasterOps”. These techniques permit changing colour distributions, but also adding/subtracting image fragments, administering the “sprites” in animated pictures, etc.

D Geometric transformations: rotations, scaling, simulated perspective; non-linear transformations adapted to texture-mapping manipulations or to the deformation introduced by some non-standard cameras: panoramic, fish-eye, etc. Arbitrary deformations: warping.

E Composite deformations and blending: morphing.

F Special effects:

• Bump mapping and other pseudo-extrusion techniques which give the image a 3D appearance, notably the “embossing” technique.

• Lighting effects: halos and glows, lens reflections, distance (“atmospheric”) attenuation introduced ex post.

• Particular artistic effects: transforming realistic images into pointillist (van Gogh like) or other impressionist tableaux; “hot wax”, aquarelle, or carbon/chalk pictures, etc. One may wish to transform a photograph into an ancient copperplate engraving, or a comic-strip style drawing. The possibilities are infinite. The main problem is to transform human imagination into an algorithm...

G “Classical” filtering manipulations: edge enhancing, noise removal, anti-aliasing by blurring, sharpening, etc.

H More analytic operations, which recognize or segment the image fragments: contour tracing, or zone (colour interval) selection, essential for cutting and/or replacing picture fragments. Contour finding and representation is a very large domain per se; we will mention briefly some standard filtering techniques, but we cannot discuss other, more modern and fascinating subjects such as active contours (“snakes”) or the watershed algorithm. These, anyway, serve principally for image analysis and interpretation rather than as a creation aid.


I Other manipulations which belong to image analysis, and which will be omitted from these notes, or just briefly mentioned:

• Vectorization: transformation of a bitmap into a set of geometric objects – lines, circles, etc.

• Segmentation through the Hough transform: representation of geometrical entities of the image in the space of the parameters which define these objects.

• The Karhunen-Loève (Hotelling) transform, which is a powerful tool of the statistical analysis of images, signals, etc.

• Reconstruction of images from various linear signals, for example from their Radon transforms generated by X-ray scanners.

(The Radon or Karhunen-Loève transforms may serve also for more prosaic manipulations. They might help to localise and to parameterise the lines which should be by definition horizontal or vertical, but are not, because the photo we put in the scanner was slightly slanted.)

J Neither shall we discuss image compression (which is the object of another course for the 5-th year students: DESS “Images”). The list of omissions is anyway almost infinite: wavelet representation, procedural creation of pictures, for example through IFS (Iterated Function Systems) or the L-systems (the Lindenmayer “grammatical” approach to the generation of iterative and recursive pictures, very good for the simulation of plants), etc.

These items are not independent, but strongly related. For instance, the simulated 3D effects are often not geometric manipulations, but just some specific modifications of the colour distribution (bump mapping). (But displacement maps are of geometric nature.)

During the preparation of these notes we have used very different software, commercial and free. The image processing packages are very abundant, and it is easy to find several free or shareware programs, very powerful and user friendly. The commercial package heavily used was the well known Photoshop, but Unix users may obtain almost all its special effects (and many more!) from a free package, GIMP, a wonderful interactive and programmable tool, still evolving.

As our ambition is to explain the essentials of image processing, it was necessary to do some low-level programming, to treat images as matrices of numerical values (gray levels, colour triplets or palette indices), and to process them as such. Of course it is possible to use any programming language to do so, and we have tried several of them. The necessity to perform many experiments, to change interactively this or that element of the processed picture, precludes the usage of classical compile-and-run languages such as C++ or Java; the interactivity is much more important than brute efficiency, so we used the scientific programming system Matlab. It is a commercial package, but there are other, free systems well adapted to matrix processing, such as SciLab or Tela. (But Matlab is a scientific, vectorized computation package, and has excellent interfacing tools which permit both high-level and low-level visual data processing. This is slightly less developed in the above-mentioned free systems (to which we may add Rlab and Octave), which have other objectives than image processing.)

There are also some powerful programming/integrating environments specifically adapted to the treatment of images, such as Khoros. The low-level programming is left to the user, but it is seldom needed. All the typical image algebra and geometry are already there, ready to use, in the Khoros standard libraries, and the combination of modules and their interactive construction and execution using the visual, dataflow style interface is really very comfortable and elegant.

Khoros has a mixed status: one can get the sources freely and compile them (which sometimes is not trivial...), or buy a commercially packed full system, with plenty of documentation and examples.

We acknowledge the existence of some other similar packages, commercial, but whose distributors sometimes offer working demonstration versions, for example the IDL system, which offers a good graphical programming language and very decent interfacing.

Of course we will also use some drawing packages, for example the MetaPost system, which permits including PostScript drawings in the document without having to program them using the horrible PostScript language (very clumsy, but sometimes very useful!). MetaPost has the advantage of generating a particularly simple, developed PostScript, easily convertible into PDF by the ConTeXt package of Hans Hagen, without the necessity of using the commercial Adobe Distiller product.

These notes are constructed independently of the other parts of the author’s course on image synthesis, and they may be read separately, but – of course – they should be treated as a part of a bigger whole. This is the first version, very incomplete, and probably very buggy. Please contact the author if a particularly shameful error is discovered.


Chapter 2

2D Geometric Transformations

2.1 Typical Linear Transformations

We begin thus to discuss some geometric manipulations of 2D images. They will be presented, if necessary, in a coordinate-independent fashion, using abstract vector algebra, but the difference between 3D scenes, where we are interested in “real” objects and their mutual relations, and 2D images is essential. There is no need to introduce abstractions where the only thing to do is just to transform the pixel coordinates in a loop. There is no particular need for homogeneous coordinates either, although they might simplify the presentation of the simulated perspective transformation.

Only continuous geometry is discussed in this section. The real problem, the manipulation of discrete pixel matrices, is postponed to section (2.3).

2.1.1 Scaling, Rotation, Shearing

Scaling is a very simple operation:
\[ x \to s_x x, \qquad y \to s_y y, \tag{2.1} \]
or, if you wish,
\[ \begin{pmatrix} x \\ y \end{pmatrix} \to \begin{pmatrix} s_x & 0 \\ 0 & s_y \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}, \tag{2.2} \]
where the pair $s_x$, $s_y$ may contain negative components, but then one has to reinterpret the negative coordinates; usually both $x$ and $y$ are positive, starting at $(0, 0)$, and rarely do we think of the image as of something centered around the origin of the coordinate system (although it might help while considering an image to be a filter, see section (3)). We have then to add the compensating translation. If in the original picture $x$ varies from 0 to $A$, and $s_x$ is negative, the real final transformation is
\[ x \to s_x (x - A). \tag{2.3} \]


The rotation matrix is well known:
\[ \begin{pmatrix} x \\ y \end{pmatrix} \to \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}, \tag{2.4} \]
but what is sometimes less known is the fact that this rotation can be composed out of three shearing (slanting) transformations parallel to the coordinate axes. The x-shearing transformation, which gives the effect shown on Fig. (2.1) (any Brazilians among the readers?...), has the representation (2.5):
\[ \begin{pmatrix} x \\ y \end{pmatrix} \to \begin{pmatrix} 1 & \kappa \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}. \tag{2.5} \]

Fig. 2.1: The x-shearing transformation

Defining $\kappa = \tan(\theta/2)$ we prove easily that the rotation matrix fulfils the following identity:
\[ \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} = \begin{pmatrix} 1 & -\kappa \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ \sin\theta & 1 \end{pmatrix} \begin{pmatrix} 1 & -\kappa \\ 0 & 1 \end{pmatrix}, \tag{2.6} \]
which together give the chain of transformations shown on Fig. (2.2).

Fig. 2.2: Rotation from shearing

It seems a little useless, since it is more complicated than a simple rotation matrix, but it might be faster, if there is no pixel interpolation involved. Horizontal or vertical slanting displaces entire rows or columns, and if we have a fast pixel block transfer routine, some time may be economised. Beware however of aliasing!
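The decomposition is easy to check numerically. Below is a minimal sketch in C (the Point type and the function name are ours, not the notation of any particular library); each of the three passes is a pure row or column displacement, which is exactly what makes the fast block-transfer trick possible:

#include <math.h>

typedef struct { double x, y; } Point;

Point rotate_by_shears(Point p, double theta)
{
    double k = tan(theta / 2.0);   /* the kappa of eq. (2.6)    */
    double s = sin(theta);
    p.x -= k * p.y;                /* first x-shear             */
    p.y += s * p.x;                /* y-shear                   */
    p.x -= k * p.y;                /* second x-shear            */
    return p;                      /* same as the matrix (2.4)  */
}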

The slanting deformation can easily be generalized, producing the trapezium shown on Fig. (2.3).


Fig. 2.3: Generalized shearing

The mathematics of this transformation is quite simple: we see that the slanting coefficient in (2.5) is now $x$-dependent, and that this dependence is linear. So the whole transformation is not linear any more: $y \to y$, $x \to x + (\alpha + \beta x) y$.

This might be considered as a poor man's (horizontal) perspective transformation: the figure represents a rectangle disposed horizontally, perpendicularly to the vertical plane defined by the sheet of paper, or the screen of this document. We look at this stripe from above and from the right, and the shape is intuitively correct.

However, simulating the perspective in such a way is a rather bad idea. The problem is that – as easily seen from the figure – the displacement and the compression are strictly horizontal. But the proportions along the other direction are modified as well. We know that the perspective is affine only in the homogeneous coordinates.

2.2 Other Affine Transformations and Perspective

The real perspective looks like Fig. (2.4). The details of the transformation depend on the relation between the simulated orientation of the original image and the screen. Our example figure is placed vertically and perpendicularly to the screen, but, for example, the “Star Wars” text is horizontal. We might also obtain an oblique orientation of the original, as on Fig. (2.5).

Fig. 2.4: Once upon a time... there were some perspectives

Now, how to obtain these transformations? The technique used is the following. The image is enclosed in a rectangular box. We can move arbitrarily the corners of this box, producing an arbitrary quadrilateral. The parameters of the perspective transformation which combines

• the simulated position of the original image in the 3D space, and

• the parameters of its projection on the screen


Fig. 2.5: Perspective; oblique orientation

is retrieved from these corners, and all the rest is straightforward. For the sake of completeness we re-derive the perspective projection formulæ. The following entities are known:

• The position of the projection point (the eye): $\vec{x}_P$. Usually in 3D computations this point is fixed, for example at the origin, or $\vec{x}_P = (0, 0, 1)$, etc. Here it is not an independent geometric object; we will find it from the resulting projection quadrilateral.

• The projection plane (screen): $\vec{x}' \cdot \vec{n} = d$.

The homothetic projection is shown on Fig. (2.6).


Fig. 2.6: Perspective Projection

The vector $\vec{x}'$ is the homothetic map of $\vec{x}$, so we can write that
\[ \vec{x}' - \vec{x}_P = \kappa (\vec{x} - \vec{x}_P). \tag{2.7} \]

But $\vec{x}'$ lies on the projection plane. Thus, multiplying the equation (2.7) by $\vec{n}$ we get
\[ \kappa = \frac{d - \vec{x}_P \cdot \vec{n}}{(\vec{x} - \vec{x}_P) \cdot \vec{n}}, \tag{2.8} \]
from which $\vec{x}'$ can be easily computed. But here we are interested in solving a totally different problem! First, we simplify the equation (2.8), identifying the projection screen with the plane $xy$ (so $\vec{n}$ is the unit vector along the $z$ axis, and $d = 0$), and placing the focal point $\vec{x}_P$ at $(0, 0, r)$. We obtain the following transformation:
\[ \begin{pmatrix} x' \\ y' \end{pmatrix} = \frac{r}{r - z} \begin{pmatrix} x \\ y \end{pmatrix}. \tag{2.9} \]

Here we don’t know the vector $(x, y, z)$ yet. It belongs to the original image, considered always rectangular, with its canonical Cartesian system, say $\{\vec{x}_0, \vec{u}, \vec{v}\}$, where $\vec{x}_0$ is a distinguished point, for example one corner, or the center of the image, and $\vec{u}, \vec{v}$ define the axes. Of course, if $\vec{x}_0$ is the left lower corner, the natural choice will be $\vec{u} = \vec{x}_1 - \vec{x}_0$ and $\vec{v} = \vec{x}_2 - \vec{x}_0$, etc. Some other convention might be easier if the origin is chosen at the image center.

So, we need to find this coordinate system, i.e. 3 vectors, with two of them perpendicular. There are thus 8 unknowns, and we have 8 equations for the 4 distorted corners. We leave this exercise to the reader.
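For concreteness, the forward formula (2.9) may be coded directly; a hedged sketch, with our own names (the eye at $(0, 0, r)$, the screen being the plane $z = 0$):

#include <stdio.h>

static void project(double r, double x, double y, double z,
                    double *xp, double *yp)
{
    double k = r / (r - z);        /* homothety factor of eq. (2.9) */
    *xp = k * x;
    *yp = k * y;
}

int main(void)
{
    double xp, yp;
    project(4.0, 1.0, 1.0, 2.0, &xp, &yp);  /* a point halfway to the eye */
    printf("%g %g\n", xp, yp);              /* prints: 2 2 */
    return 0;
}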

2.2.1 Linear Transformation of Triangles

When a fragment to fragment mapping is needed, as in morphing (see section (7.1)), usually both the source and the target areas are split into simple polygons, for example into triangles. Triangles are always convex, and their simplicity ensures that the mapping is unique, and without pathologies. The task consists in mapping the triangle spanned by the three points $\vec{p}_0, \vec{p}_1, \vec{p}_2$ into the triangle defined by $\vec{q}_0, \vec{q}_1, \vec{q}_2$. The mapping should be linear. Fig. (2.7) shows the result.


Fig. 2.7: Linear Triangle Mapping

The solution goes as follows. We establish within the first triangle a local coordinate system spanned by $\vec{u} = \vec{p}_1 - \vec{p}_0$ and $\vec{v} = \vec{p}_2 - \vec{p}_0$. The axes are not normalized. Every internal point $\vec{x}$ of the triangle admits the representation $\vec{x} = \vec{p}_0 + \alpha \vec{u} + \beta \vec{v}$. Knowing that the problem is planar, and that we can treat the vector product as scalar (pseudo-scalar, but this is unimportant; it has one component only), we get
\[ \vec{x} = \frac{1}{\vec{u} \wedge \vec{v}} \left( (\vec{x} \wedge \vec{v})\, \vec{u} - (\vec{x} \wedge \vec{u})\, \vec{v} \right), \tag{2.10} \]


i.e., $\alpha = (\vec{x} \wedge \vec{v})/(\vec{u} \wedge \vec{v})$, etc. In the second triangle we introduce the corresponding base $\vec{g} = \vec{q}_1 - \vec{q}_0$ and $\vec{h} = \vec{q}_2 - \vec{q}_0$, and we restore $\vec{x}' = \vec{q}_0 + \alpha \vec{g} + \beta \vec{h}$. The only detail which remains is the correct implementation in the discrete case, as always.
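A minimal sketch of this construction in C (the names are ours; wedge() is the one-component “vector product” used above):

typedef struct { double x, y; } Vec;

static Vec sub(Vec a, Vec b) { Vec r = { a.x - b.x, a.y - b.y }; return r; }
static double wedge(Vec a, Vec b) { return a.x * b.y - a.y * b.x; }

/* Map the point x from the triangle (p0, p1, p2) into (q0, q1, q2). */
Vec map_triangle(Vec x, Vec p0, Vec p1, Vec p2, Vec q0, Vec q1, Vec q2)
{
    Vec u = sub(p1, p0), v = sub(p2, p0), d = sub(x, p0);
    double den   = wedge(u, v);         /* non-zero for a real triangle */
    double alpha = wedge(d, v) / den;
    double beta  = wedge(u, d) / den;
    Vec g = sub(q1, q0), h = sub(q2, q0);
    Vec r = { q0.x + alpha * g.x + beta * h.x,
              q0.y + alpha * g.y + beta * h.y };
    return r;
}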

Obviously, if the problem of triangle mapping is solved, any polygons may be treated after their triangulation. A natural caveat seems appropriate here: if the mapping is linear, and the triangular components of a polygon are treated separately, the resulting global mapping is continuous, but straight lines might break at the triangle boundaries, as shown on Fig. (2.8).

Fig. 2.8: Linear Transformation of Polygons is Dangerous

Such a technique might not be acceptable. Moreover, even in the case of quadrilaterals, if the target figure is convex, there is a choice of two diagonals, which adds some ambiguity to the mapping strategy. The lines will break in different places. Section (7.1) discusses some other methods of deformation, essentially non-linear.

2.2.2 Nonlinear Deformations (Overview)

The perspective is already nonlinear, but we want to treat here some more general cases, especially useful in texture mapping. If a flat picture is projected on the surface of a 3D object, and then this object is projected on the screen, the two transformations have to be composed. If it is possible to derive the composite transformation before the rendering, this process can be accelerated. If ray tracing is used for the rendering, this is usually useless: we have to find the intersection of the ray with the object in 3D space, and from its coordinates we may deduce the colour of the pixel. But if a polygon scan is used, or if the radiosity machinery prepared all the 3D surfaces in a more clever way, and we have only to project everything on the screen, some simplifications can be obtained. Some faster rendering engines like 3DS or the dynamic games: Doom, Quake, etc., pre-prepare the mapped textures.

Another application of nonlinear deformations is the general warping, which will be discussed in section (7.1). The warping is usually an “artistic”, manual manipulation of the image, but there is at least one nonlinear flat transformation which may be considered algorithmic, and it is strongly related to the perspective transformation: the normalisation of images (photographs) obtained with a very wide lens camera, for example with “fish-eye” lenses. We may wish to “flatten”, to restore the straight lines of such a picture as Fig. (2.9), or inversely, to compose one panoramic strip out of flat image fragments, as on Fig. (2.10). The first picture has been taken from the NASA collection, essentially unprotected, and the other from the BYTE Web page, with a possible copyright infringement. (If somebody knows more about their restrictions, please let me know.)

Fig. 2.9: Fish view of a cosmic enterprise

Fig. 2.10: Brooklyn bridge composed out of flat fragments

It is possible of course to combine small picture fragments in an affine way, without introducing curves, nor the “Cinerama” style discontinuities of tangents. Fig. (2.11) shows it very properly done.

Fig. 2.11: Reconstruction of a wide-angle shot

All these operations need a very strong analytic apparatus, or an interactive adjustment by a human. For example, in order to restore the straight lines on the Cape Canaveral photo, either one has to deduce the true geometric proportions of the 3D scene, which requires a rather intelligent geometric segmentation, or one tries through dynamic warping to transform all the circle segments into straight lines, which may not be unique.

Of course, if the focal properties of the fish-eye camera are known, the restoration of the picture (or its simulated generation from a flat image) is essentially straightforward, although not entirely trivial.

We describe now partially, and with some simplifications, the fish-eye camera. This is not – strictly speaking – an image processing problem, and it will migrate to the 3D-geometry section in future releases of these notes, but for the moment the reader might treat this subject as an exercise in image deformation.

The most classical fish-eye model (not necessarily fully practical) is based on the stereographic projection shown on Fig. (2.12). The idea is thus very simple. Instead of projecting the 3D scene on a flat screen, we project it on a spherical surface. In such a way we can cover 180 or more degrees without introducing geometric pathologies. (Of course, we cannot cover 360°, but there are already some standardised techniques of producing and displaying images covering the full solid angle, for example IPIX, which is even accessible as a Web browser plug-in.)


Fig. 2.12: “Fish-eye”-like deformation

We have the following geometric entities:

• The projection sphere with radius $R$, usually considered to be very small as compared with the true scene dimensions; but it is not small when a simulated projection is used to deform an already existing 2D image.

• The main focus, which is not necessarily the center of the sphere.

• The stereographic projection point, and the projection plane: we need to map the sphere sector to a flat disk. We may choose for example the pole opposite to the main vision direction (the center of the image disk), and for the plane – the “equator”. Other variants are also possible, but more difficult to treat mathematically.

• Finally, we have to define somewhere the limiting angle. This is essential for the fish-eye rendering, but also for the flattening: we have to know how far we shall go; the size of the 180° image is essentially infinite...

If we want just to simulate the fish-eye camera and to construct a distorted image from something perfectly standard, we may use the center of the sphere as the focal point. This is neither the standard stereographic projection, nor a general fish-eye, but the IPIX normalized camera. The original image may be placed on a plane tangent to the sphere (this is just the choice of the scaling factor).

We have the following relation between the “real” radius $r$ on a flat image tangent to the projection sphere, and the distorted radius $z$ on the equator plane:
\[ \frac{r}{R} = \frac{z/R}{\sqrt{1 - z^2/R^2}}, \tag{2.11} \]


which can be easily inverted to generate the fish-eye distortion
\[ \frac{z}{R} = \frac{r/R}{\sqrt{1 + r^2/R^2}}. \tag{2.12} \]

The task of transforming the Cartesian coordinates $(x, y)$ to $r$ and some angular variable is left to the reader. But we show the result of this transformation on Fig. (2.13). Of course you know the original. (The result could be of better quality if we had followed the rules suggested in the next section. We have not interpolated the pixel colours.)

Fig. 2.13: Babel as seen by a fish
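For the curious reader, one possible sketch of the distortion in C; an illustration under our own assumptions (gray-level image in a row-major array, optical axis through the image center), not the program used for Fig. (2.13), and taking the nearest pixel, without the interpolation recommended in the next section:

#include <math.h>

void fisheye(const unsigned char *src, unsigned char *dst,
             int w, int h, double R)
{
    double cx = 0.5 * w, cy = 0.5 * h;
    for (int yd = 0; yd < h; yd++)
        for (int xd = 0; xd < w; xd++) {
            double dx = xd - cx, dy = yd - cy;
            double z = sqrt(dx * dx + dy * dy);  /* distorted radius */
            dst[yd * w + xd] = 0;                /* beyond the limit angle */
            if (z < R) {
                /* invert (2.12): r = z / sqrt(1 - z^2/R^2), same direction */
                double f = 1.0 / sqrt(1.0 - z * z / (R * R));
                int xs = (int)floor(cx + dx * f + 0.5);
                int ys = (int)floor(cy + dy * f + 0.5);
                if (xs >= 0 && xs < w && ys >= 0 && ys < h)
                    dst[yd * w + xd] = src[ys * w + xs];
            }
        }
}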

There is one important point in this exercise which shows how the conceptual and geometric difference between the 3D scene creation and the 2D picture manipulation reflects on some mathematical details. Here we don’t know anything about the sphere radius $R$, but we know the size of the original, and the dimensions of the new image we are manufacturing. If the vertical half-height of the tangent image (centered) is equal to $A$, and that of the equatorial projection – $B$, we have
\[ \frac{1}{R^2} = \frac{1}{B^2} - \frac{1}{A^2}. \tag{2.13} \]

We suggest very strongly that the local readers repeat these calculations for any position of the focal point, not necessarily in the center of the sphere. This is a very good examination subject.


2.3 How to do Geometry in Discrete Space

This is a very important section. If a discrete set of points (or intervals, but localized as pixels) is stretched or compressed, one has to define the interpolation procedure which is being used. There is no unique algorithm to do this. Moreover, the pixels are usually square (or rectangular), oriented canonically wrt. the $x$ and $y$ axes. If we rotate the image, the pixels change their positions, but their shape does not rotate. They must occupy canonical positions also; their coordinates must be integer again. So, all kinds of very nasty effects are expected: holes, pixel fusion (after rounding), aliasing, etc. The loops

for (y = 0; y < ymax; y++)
    for (x = 0; x < xmax; x++) {
        newpt = transf(x, y);               /* forward transform     */
        NA[newpt.x][newpt.y] = A[x][y];     /* may leave holes in NA */
    }

in general cannot be applied directly. The basic idea which eliminates the topological pathologies (holes), although it does not prevent the aliasing and other troubles resulting from the discretisation, is the application of the inverse transform. The program first calculates the image of the transformed contour: if the original image is a rectangle, and the transformation is simple, for example affine, it is only necessary to find the new coordinates of the 4 corners, which may then be connected by straight lines. In the general case we have to find the boundaries of the result space, and then we will fill this space regularly with pixels. The program goes now as follows: for all $(x', y')$ in the transformed zone the program calculates $(x, y)$ – the inverse mapping. The result will usually be a pair of reals, and not integers.

In the most primitive case it suffices to round the coordinates and to assign the pixel value corresponding to the coordinates found. But caveat programmator! Rounding or truncating is a harsh operation, and if the transformation is repeated several times, the distortions will accumulate. Fig. (2.14) shows the result of the rotation of a picture by 360 degrees in 10 stages, without any interpolation.

Fig. 2.14: Discrete rotation iterated
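For concreteness, a sketch in C of the inverse-transform scheme for a rotation about the image center, with precisely the primitive rounding criticized above (iterating it ten times reproduces the degradation of Fig. (2.14)); the array layout and the names are ours:

#include <math.h>

void rotate_nearest(const unsigned char *A, unsigned char *NA,
                    int w, int h, double theta)
{
    double c = cos(theta), s = sin(theta);
    double cx = 0.5 * w, cy = 0.5 * h;
    for (int yp = 0; yp < h; yp++)
        for (int xp = 0; xp < w; xp++) {
            /* inverse mapping: rotate the target pixel back by -theta */
            double dx = xp - cx, dy = yp - cy;
            int x = (int)floor(cx + c * dx + s * dy + 0.5);
            int y = (int)floor(cy - s * dx + c * dy + 0.5);
            NA[yp * w + xp] = (x >= 0 && x < w && y >= 0 && y < h)
                              ? A[y * w + x] : 0;
        }
}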

Of course this result is not very acceptable, and this manipulation was done on purpose. A serious (and in particular: multi-stage) transformation must interpolate the pixel values. The result might then be a little smeared or fuzzy, but in general it is quite good. Fig. (2.15) shows the result of the same manipulation, a full rotation in 10 slices of 36 degrees, but with interpolation.

Fig. 2.15: Discrete rotation with interpolation

The interpolation might be performed in many ways. The simplest may even be linear. The source pixels are treated as points occupying the vertices of a rectangular grid. If the reverse transformation constructs a point between the vertices, for example if we obtain a pair $(x, y)$ reduced to the unit square whose corners correspond to original pixels denoted by $I_{00}$, $I_{10}$, $I_{01}$, and $I_{11}$, we obtain the resulting value by the bilinear interpolation
\[ I_{xy} = (1 - x)(1 - y) I_{00} + x (1 - y) I_{10} + (1 - x) y\, I_{01} + x y\, I_{11}. \tag{2.14} \]

In practice a bicubic interpolation is much better, and it is not so complicated. (This is a good examination subject...)
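A sketch of the bilinear sampling (2.14) in C, with our own array layout; for simplicity the coordinates are clamped to the grid instead of being treated by the boundary scheme described below:

#include <math.h>

double bilinear(const unsigned char *img, int w, int h,
                double xf, double yf)
{
    int i = (int)floor(xf), j = (int)floor(yf);
    if (i < 0) i = 0; else if (i > w - 2) i = w - 2;   /* clamp */
    if (j < 0) j = 0; else if (j > h - 2) j = h - 2;
    double x = xf - i, y = yf - j;                     /* fractions in [0,1) */
    double I00 = img[j * w + i],       I10 = img[j * w + i + 1];
    double I01 = img[(j + 1) * w + i], I11 = img[(j + 1) * w + i + 1];
    return (1 - x) * (1 - y) * I00 + x * (1 - y) * I10
         + (1 - x) * y * I01 + x * y * I11;            /* eq. (2.14) */
}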

There is a small problem near the image edges: what shall we do if the reverse map gives coordinates outside the pixel grid? Or even so near the edges or the corners that a bicubic interpolation is impossible? A slightly different interpolation scheme might then be used, simpler than cubic. First we “collapse” all the pixels to their corners (point-like vertices) using the nearest-neighbour approximation. The vertices far from the image edges are just averages of the 4 neighbouring pixels, and the boundaries are calculated as follows from Fig. (2.16).

e   f   g
  A   B
h   i   j
  C   D
k   l

Fig. 2.16: Nearest-neighbour pixel interpolation

We may use the following equations, which are quite intuitive:
\[ i = \frac{1}{4}(A + B + C + D), \tag{2.15} \]
\[ f + i = A + B \quad \text{etc.}, \tag{2.16} \]
\[ (e + i)/2 = A. \tag{2.17} \]

The other vertices are computed by symmetry and iteration. Note that the exterior vertices extrapolate rather than interpolate the pixel colours.

\[ e = \frac{1}{4}(7A - B - C - D), \tag{2.18} \]
\[ f = \frac{1}{4}(3A + 3B - C - D), \tag{2.19} \]
\[ h = \frac{1}{4}(3A + 3C - B - D), \tag{2.20} \]
\[ i = \frac{1}{4}(A + B + C + D). \tag{2.21} \]

It may be troublesome, and may throw us outside the allowed colour space (negative or too big intensity). Now this vertex matrix is converted using the inverse transform technique, and the pixels are reconstructed from the vertices using equations similar to those above. The reader is kindly asked to solve these equations explicitly in the reverse direction. What happens if the transformed image contains L-like concave boundaries?


Chapter 3

Filtering

3.1 Introduction: Filters and Convolutions

Mathematically the convolution of two functions $f(x)$ and $g(x)$ is given by the integral
\[ (f \star g)(x) = \int f(z)\, g(x - z)\, dz, \tag{3.1} \]
which has to be generalized into two dimensions and discretized. We thus get the convolution formula for two matrices $A$ and $B$:
\[ (A \star B)_{ij} = \sum_k \sum_l A_{kl} B_{i-k,\,j-l}, \tag{3.2} \]

where the indices run through all the intervals where they are defined. Usually $k$, etc. is greater than (or equal to, depending on the convention used) zero, and goes up to the upper limit of the matrix dimension. When an index becomes negative, the element is neglected. This seems trivial, but is not necessarily so: sometimes it is preferable to apply the cyclic convention, where the negative indices “wrap around” the matrix – the image is not considered to be a finite patch of a plane, but a torus. In this way the problems with boundary conditions may be avoided.

Usually one of the concerned matrices is the image (or three images – one for each colour plane), and the other is the filter, which is usually much smaller than the image. The sum (3.2) should thus be reformulated in a way which minimizes the number of operations.

A useful convention used in this domain is the cyclicity of the filter matrix. We don’t need to apply the toroidal boundary condition to the image, but very often the filter intuitively should be “symmetric about zero”, in order not to generate spurious anisotropies.

Convolutions, as shown by the equations above, are linear operations, and they are not sufficient to obtain all kinds of possible special effects, but they are universal enough. (We will discuss here essentially one kind of non-linear filter – the median – and its variants, but its presentation will be rather superficial.) In section (4) we will discuss some details of the colour representation of images, but the theory of colours will be treated in a separate set of notes. For us a pixel is a numeric value which can be normalized to the interval [0, 1), or [0, 255) if you wish. If the image is coloured, we can treat the three components separately. Here we will not discuss the colour mixing or conversions at all. Henceforth we shall either consider gray-level images, or treat the three channels identically.

The interpretation of the filtering process can be given on many different conceptual and mathematical levels. This course cannot treat all the mathematical details of the signal processing theory, so for the moment the reader may think that the filtering produces the value of a given result pixel by a linear combination of its neighbours. This may smooth the image if we calculate the averages (i.e. if all the weight factors are positive), or it may enhance local differences if some weight factors are negative.

A general warning is necessary. A convolution of the image with any matrix, and in particular with a filter possessing negative elements, may throw the image out of the legal colour space. The filtering application should warn the user, and permit correcting the result either by cropping the illegal values (they will remain either minimal or maximal), or by renormalising the whole image, shifting and/or multiplying the pixel values by an appropriate constant. There is no unique solution in such a case. Image processing packages usually offer the user a possibility of adding an offset immediately during the filtering. We shall discuss this problem from a different angle in section (5).
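To fix the ideas, here is a direct, unoptimized sketch of the sum (3.2) in C, for a small odd-sized filter centered on each pixel, with the out-of-range elements neglected and the result cropped to the legal interval; an optional offset is provided, as discussed above. The names and the array layout are ours:

void convolve(const unsigned char *in, unsigned char *out, int w, int h,
              const double *filt, int n, double offset)  /* n odd: 3, 5... */
{
    int r = n / 2;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            double s = offset;
            for (int j = -r; j <= r; j++)
                for (int i = -r; i <= r; i++) {
                    int xx = x - i, yy = y - j;        /* eq. (3.2) */
                    if (xx >= 0 && xx < w && yy >= 0 && yy < h)
                        s += filt[(j + r) * n + (i + r)] * in[yy * w + xx];
                }
            out[y * w + x] =                            /* crop to [0,255] */
                (unsigned char)(s < 0 ? 0 : s > 255 ? 255 : s + 0.5);
        }
}

With n = 5 and all the filter entries equal to 1/25 this reproduces the smoothing matrix (3.3) below; with the embossing matrix of section 3.2.3 and offset = 128 it produces the bas-relief effect.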

3.2 Typical Filters and their Applications

3.2.1 Smoothing filters

Look at the horizontal fragment of the da Vinci “Last Supper” fresco on Fig. (3.1). This picture is extremely noisy. Convolving it with the matrix
\[ \frac{1}{25} \begin{pmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \end{pmatrix}, \tag{3.3} \]
where the factor 1/25 is the obvious normalisation factor, produces the transformation shown on Fig. (3.2). The noise has been considerably attenuated. Of course the image is blurred now, but if we don’t need the full resolution, it is better to smooth it before resampling, as the plain resampling usually does not get rid of the noise, as shown on the left of Fig. (3.3).


Fig. 3.1: Last Supper – fragment

Fig. 3.2: Image blurred with a 5 × 5 homogeneous filter

In order to show the relation between the de-noising through smoothing and the deterioration of contours we have exaggerated the size of the filter mask: 3 × 3 would be sufficient. The uniform, square mask is simple and fast, but more powerful methods can be applied. In particular, an often used smoothing filter is the Gaussian function
\[ g(x) = \frac{1}{N}\, e^{-x^2 / 2\sigma^2}. \tag{3.4} \]

Fig. 3.3: Reduced images

In two dimensions $x$ is replaced by $r = \sqrt{x^2 + y^2}$. The Gaussian filter is sometimes used for special effects, and its range is far from being local – sometimes several dozen pixels along both directions are concerned, and usually it is parametrable. It may be computed directly by a floating-point routine, or as an iterated convolution of the matrix
\[ \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \]
with itself. For example, the second and the fourth iterations give

\[ g_2 = \frac{1}{16} \begin{pmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{pmatrix}, \qquad g_4 = \frac{1}{256} \begin{pmatrix} 1 & 4 & 6 & 4 & 1 \\ 4 & 16 & 24 & 16 & 4 \\ 6 & 24 & 36 & 24 & 6 \\ 4 & 16 & 24 & 16 & 4 \\ 1 & 4 & 6 & 4 & 1 \end{pmatrix}. \tag{3.5} \]

Here the variance is proportional to the size of the matrix, but this can be changed; we wanted just to show that a Gaussian-like filter can be obtained without floating-point calculations, but by iterations. Of course, instead of generating a big filter matrix, a small one is iteratively applied.

As mentioned above, in several interesting cases the size of the Gaussian matrix is big. The complexity of the convolution algorithm is proportional to the surface of the filter, and filters too large are inefficient. In the case of Gaussians we have another possibility of simplification. The two-dimensional Gaussian exponential factorizes: $\exp(-r^2/2\sigma^2) = \exp(-x^2/2\sigma^2)\exp(-y^2/2\sigma^2)$. We may thus apply the two factors separately, first convolving the columns in each row separately, and then repeating the same operation on all rows. The complexity is reduced from $N^2$, where $N$ is the filter size, to linear in $N$. The one-line or one-column filtering $1 \times m$ matrix again does not need floating point computations, but may be obtained by convolving $[1, 1]$ with itself, and normalizing. By using asymmetric Gaussian filters, with different horizontal and vertical variances, it is possible to obtain many interesting effects, which will be discussed later.
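A sketch of this separable scheme in C (our names and layout; the borders are clamped): the same one-dimensional mask is applied first along the rows, then along the columns.

#include <stdlib.h>

static void pass1d(const double *in, double *out, int len, int stride,
                   const double *mask, int m)           /* m odd */
{
    int r = m / 2;
    for (int p = 0; p < len; p++) {
        double s = 0;
        for (int k = -r; k <= r; k++) {
            int q = p + k;
            if (q < 0) q = 0; else if (q >= len) q = len - 1;  /* clamp */
            s += mask[k + r] * in[q * stride];
        }
        out[p * stride] = s;
    }
}

void separable_blur(double *img, int w, int h, const double *mask, int m)
{
    double *tmp = malloc((size_t)w * h * sizeof *tmp);
    for (int y = 0; y < h; y++)                  /* rows first...   */
        pass1d(img + y * w, tmp + y * w, w, 1, mask, m);
    for (int x = 0; x < w; x++)                  /* ...then columns */
        pass1d(tmp + x, img + x, h, w, mask, m);
    free(tmp);
}

Called with the mask {1/4, 1/2, 1/4} – the normalized convolution of [1, 1] with itself – this reproduces the effect of $g_2$ of (3.5).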

3.2.2 De-noising by median filtering

Fig. (3.4) shows two more smoothing experiments. The image on the left was obtained by Gaussian blurring followed by a sharpening filter, which will be discussed in the next section. The image on the right is the application of the median process. Instead of averaging the pixel values around the center, the filtering chooses one representative value.

Fig. 3.4: Gaussian filtering (sharpened), and median denoising

Concretely: first a mask – 3 × 3, 5 × 5 or other – is defined. Within this mask all 9, or 25, pixels are identified and sorted with respect to their values. The middle value replaces the central pixel of the block. An even size of the block may also be used, although it is less popular. But the existence of one central pixel is not needed. In the even case the resulting picture will be shifted “colourfully” by 0.5 pixels.

One should not exaggerate with the size of the median zone, as it introduces homogeneous colour patches. But it can then be used as an artistic tool. There are some recent variants of this scheme: the representative block is not fixed, but of varying size: 2, 3, 4, 5, etc., not necessarily centered. For each block the (normalised) statistical variance of the picture is calculated, and the block with minimal variance is chosen for the median computations. The author of this technique claims that a good denoising effect is obtained, and the edges become less fuzzy than in the standard model. Of course, the variance-sensitive techniques have been known for years, for example the linear (but adaptive) Wiener filter.
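A basic 3 × 3 median in C, as described above (a sketch; an insertion sort is quite sufficient for 9 values, and the border pixels are simply clamped):

void median3x3(const unsigned char *in, unsigned char *out, int w, int h)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            unsigned char v[9];
            int n = 0;
            for (int j = -1; j <= 1; j++)
                for (int i = -1; i <= 1; i++) {
                    int xx = x + i, yy = y + j;
                    if (xx < 0) xx = 0; else if (xx >= w) xx = w - 1;
                    if (yy < 0) yy = 0; else if (yy >= h) yy = h - 1;
                    v[n++] = in[yy * w + xx];
                }
            for (int a = 1; a < 9; a++) {        /* insertion sort */
                unsigned char t = v[a];
                int b = a - 1;
                while (b >= 0 && v[b] > t) { v[b + 1] = v[b]; b--; }
                v[b + 1] = t;
            }
            out[y * w + x] = v[4];               /* the middle value */
        }
}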

3.2.3 Gradients

Taking the averages, homogeneous or Gaussian, is equivalent to the integration of the picture with a positive weight function. This operation eliminates the fluctuations, the high frequencies present in the image. The measure of these fluctuations which can help us to localize the edges is the inverse operation – the differentiation. In a discrete structure the gradient can of course be approximated by the difference of two neighbouring pixels. The gradient is a vector, and we can choose to take its absolute value, or a directional quantity. In any case, if the gradient filter is applied to a homogeneous colour zone, the result is close to zero.

Fig. (3.5) shows three different variants of gradient-like filtering. The left picture is the result of the filter $[-1, 1]$, where the 1 is at the central position, and all the remaining values of the filter matrix are zero. This is thus a horizontal gradient which strengthens the vertical edges. The result is so weakly visible that we had to augment the luminosity and the contrast of the image. (Note that this filter produces plenty of negative values, which are cropped to zero.) The side-effect of this image enhancement was the introduction of some noise.

The central variant is a diagonal filter
\[ \begin{pmatrix} 0 & 0 & -1 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \]
but with an additive constant (equal to 128 – the mid-value of the full intensity interval). This is the classical embossing filter – the simulation of a bas-relief. If this filter is applied to a full colour image, the result is not very different (why?). The image at the right is another directional, gradient-like filter, but more complex:
\[ \begin{pmatrix} 0 & -1 & -1 \\ 1 & 1 & -1 \\ 1 & 1 & 0 \end{pmatrix}. \]
Note that the overall weight here is 1 and not zero. No constant is needed to produce the embossing effect. Previously the filtering modulated a flat, gray surface; here the image itself defines the artificially extruded plane.

Fig. 3.5: Lena make-up

The reader should understand this effect. (Actually, extruding the original picture seems artistically worse than embossing the flat surface.)

3.2.4 Laplace and Sobel operators

The discrete version of the second derivative has the following form:
\[ f''(x) \to \frac{f(x+h) - 2f(x) + f(x-h)}{h^2}. \tag{3.6} \]

The basic discrete version of the Laplace filter, which computes $\partial^2 f/\partial x^2 + \partial^2 f/\partial y^2$, is just
\[ L_1 = \begin{pmatrix} 0 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 0 \end{pmatrix}. \]
This filter is used to detect the edges, and also for the sharpening. It is not unique; we may define other Laplacian-like filter matrices, for example
\[ L_2 = \begin{pmatrix} -1 & -1 & -1 \\ -1 & 8 & -1 \\ -1 & -1 & -1 \end{pmatrix}, \quad L_3 = \begin{pmatrix} 1 & -2 & 1 \\ -2 & 4 & -2 \\ 1 & -2 & 1 \end{pmatrix}, \quad L_4 = \begin{pmatrix} -1 & 0 & -1 \\ 0 & 4 & 0 \\ -1 & 0 & -1 \end{pmatrix}, \tag{3.7} \]
or asymmetric variants, vertical or horizontal only. The left variant on Fig. (3.6) has been obtained with $L_2$, which gives contours slightly better than the other versions (at least in the case of Lena).


Fig. 3.6: Laplacian and Sobel edge filtering

The right contour is obtained with the combined action of two Sobel filters:
\[ S_h = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{pmatrix}, \qquad S_v = \begin{pmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{pmatrix}. \tag{3.8} \]
Both act independently on the image, producing the horizontal and vertical partial contours $I_1$ and $I_2$. Now we combine these images with $I = \sqrt{I_1^2 + I_2^2}$. The right picture on Fig. (3.6) was produced by a simplified operation, $I = I_1 + I_2$, which is very slightly less regular, but much less expensive. More about the algebraic combination of images is presented in section (5).
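The combination step is a pixel-wise operation on the two filtered images; a sketch in C, assuming $I_1$ and $I_2$ were kept as signed floating-point values after the convolutions with $S_h$ and $S_v$ (the names are ours):

#include <math.h>

void sobel_combine(const double *I1, const double *I2,
                   unsigned char *out, int npixels)
{
    for (int k = 0; k < npixels; k++) {
        double m = sqrt(I1[k] * I1[k] + I2[k] * I2[k]);
        out[k] = (unsigned char)(m > 255 ? 255 : m + 0.5);   /* crop */
    }
}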

Of course, if we want to segment the picture and to find the geometric description of the contours, all this work is still ahead; the filtering prepares only the bitmap. Anyway, we have barely scratched the surface of this domain; there are more powerful methods, for example the application of adaptive gradient filters followed by the search for directional maxima.

3.2.5 Sharpening, and Edge Enhancing

Laplacian, Sobel or Prewitt filters, XY-oriented or diagonal, may serve to identify the edges. How can we just enhance them, in order to sharpen the picture? If we add 1 to the central element of the Laplacian matrices $L_1$ or $L_2$, etc., we obtain sharpening filters. Such a filter, for example
\[ \begin{pmatrix} -1 & -1 & -1 \\ -1 & 9 & -1 \\ -1 & -1 & -1 \end{pmatrix}, \tag{3.9} \]
may also be understood a little differently. Imagine that we have smoothed the image $L_0$ with the filter
\[ \frac{1}{9} \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}. \]
We call the result $L_1$. A weighted average interpolates between the original and the blurred one: $L = \alpha L_0 + (1 - \alpha) L_1$. When $\alpha$ decreases from 1 to 0 we move from the original to the blurred version. But what happens if $\alpha > 1$? This is an extrapolation which increases the distance between the smoothed image and the result. It gets sharpened; at least it should be sharper than the original. In order to get the matrix (3.9), the value $\alpha = 10$ is needed.
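The extrapolation reading of the formula can be coded directly; a sketch in C with our own names, where L0 is the original, L1 its blurred version, and $\alpha > 1$ sharpens:

void extrapolate(const unsigned char *L0, const unsigned char *L1,
                 unsigned char *L, int npixels, double alpha)
{
    for (int k = 0; k < npixels; k++) {
        double v = alpha * L0[k] + (1.0 - alpha) * L1[k];    /* blend */
        L[k] = (unsigned char)(v < 0 ? 0 : v > 255 ? 255 : v + 0.5);
    }
}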

3.3 Fourier Transform and Spatial Frequencies

Of course we won’t treat here the rich and enormous domain of spatial frequency analysis. Our aim is to process the images in order to render them better, and not to analyse them. The review of the discrete Fourier transform will thus be very superficial. We shall discuss here essentially two applications:

• Rescaling, and

• Fast convolution (and correlation).

The basic truth to remember is that the Fourier transform converts distances into frequencies, which are, in a sense, dual to distances. When we look at the formula
\[ f(x) \to g(\omega) = F[f](\omega) = \int_{-\infty}^{+\infty} f(x)\, e^{i\omega x}\, dx, \tag{3.10} \]

we see immediately that if $g(\omega) = F[f(x)]$, then $F[f(ax)] = \frac{1}{a}\, g(\omega/a)$. Shrinking the transform (horizontally) dilates the function. The discrete case is slightly more difficult to visualise. Here “shrinking” or “dilating” is less meaningful; we have just a number of points. Moreover, if we look at the discrete FT formula:

\[ g_k = \sum_{j=0}^{N-1} f_j\, e^{2\pi i j k / N}, \tag{3.11} \]

we see that the periodicity ($\exp(2\pi i) = 1$) implies that for large $j$, approaching $N$, we don’t have “very high frequencies”, but again low, and negative ones. The highest frequencies correspond to the index $j = N/2$.

In the discrete case we don’t have “distances”; the linear length measure is conventional. So, if we want to scale a function, to dilate it by a factor $p$, we need to pass from $N$ to $p \times N$ points. (Obviously, enlarging a picture means augmenting the number of pixels.) If our aim is to enlarge the picture without introducing new, spurious frequencies corresponding to the sharp edges between the “macro-pixels”, we just add some zeros to the Fourier transform, we correct the scale (a multiplicative factor), and we invert it. Where shall we add those zeros? Of course in the middle; for example the vector $g_0, g_1, g_2, g_3, g_4, g_5$ changes into $g_0, g_1, g_2, g_3, 0, 0, 0, 0, 0, g_3, g_4, g_5$. We have to take into account the following properties of the FT:

• $g_0$ corresponds to the frequency zero (the global average), is real, and usually has the biggest numerical value among all the Fourier coefficients.

• If $g$ corresponds to the FT of a real function (and the images are usually real), we have the following symmetry: $g_{N-k} = \bar{g}_k$. This symmetry must be preserved when we add some zeros. In our example the length of the vector was 6, with a solitary $g_3$ in the middle. It had to be duplicated.

Fig. (3.7) shows the spectral stretching of a vector. The original had 32 elements (dashed line), the dilated one – 128 (solid blue). Note that this dilation, which did not introduce new frequencies, smoothed the function (and introduced some small oscillations, which unfortunately may be visible on images). You may call them (cum grano salis) diffraction patterns.


Fig. 3.7: Spectral stretching of a curve
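The zero-insertion step alone may be sketched as follows (in C, with our own complex type; the forward and inverse FFTs are assumed to come from elsewhere). Note how the upper half of the spectrum is moved up, how the solitary Nyquist coefficient g[N/2] is duplicated, and how everything is rescaled by p – the multiplicative correction mentioned above:

#include <stdlib.h>

typedef struct { double re, im; } cplx;

/* Stretch a length-N spectrum (N even) to length p*N by padding with
   zeros in the middle, as in the g0,...,g5 example above. */
cplx *stretch_spectrum(const cplx *g, int N, int p)
{
    int M = p * N;
    cplx *G = calloc((size_t)M, sizeof *G);     /* the zeros in the middle */
    for (int k = 0; k <= N / 2; k++) G[k] = g[k];             /* low half  */
    for (int k = N / 2; k < N; k++)  G[M - N + k] = g[k];     /* high half */
    for (int k = 0; k < M; k++) { G[k].re *= p; G[k].im *= p; }
    return G;
}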

Fig. (3.8) shows the splitting of the transformed image before the inversion of the transform.

Fig. 3.8: The dilation splits the Fourier transform

Fig. (3.9) shows the result of such a dilation on the eye of Lena. We discarded the colours in order to avoid the delicate discussion of the chrominance transformations. The colour diffraction patterns are even more disturbing than the gray ones... The left picture presents once more the classical cubic interpolation, and the right one – the diffractive dilation of the original. Of course, using the median filter or others it is possible to smooth out the square diffraction patterns, but usually, with the aid of well chosen wavelets, it is possible to improve the result substantially. The Fourier transform is a venerable, but not the only spectral tool on the market!

Fig. 3.9: Cubic interpolation vs. spectral stretching

3.3.1 Discrete Cosine Transform

The complex Fourier transform has some advantages over the slightly more complicated cosine or sine transforms, but these, and especially the Discrete Cosine Transform (DCT), are used very frequently. The DCT might be used for spectral filtering, but its main application is image compression, for example in JPEG documents. We give here the formulæ used, for reference. If $F_{kl}$ is the matrix representing the image (or its fragment: JPEG uses 8 × 8 blocks), its transform is defined by

\[ G_{pq} = C_p C_q \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} F_{mn} \cos\frac{\pi(2m+1)p}{2M} \cos\frac{\pi(2n+1)q}{2N}, \qquad \begin{array}{l} 0 \le p < M, \\ 0 \le q < N, \end{array} \tag{3.12} \]

where
\[ C_p = \begin{cases} 1/\sqrt{M}, & p = 0, \\ \sqrt{2/M}, & 1 \le p < M, \end{cases} \qquad C_q = \begin{cases} 1/\sqrt{N}, & q = 0, \\ \sqrt{2/N}, & 1 \le q < N. \end{cases} \tag{3.13} \]

The formula (3.12) is invertible, and its inverse is given by:
\[ F_{mn} = \sum_{p=0}^{M-1} \sum_{q=0}^{N-1} C_p C_q G_{pq} \cos\frac{\pi(2m+1)p}{2M} \cos\frac{\pi(2n+1)q}{2N}, \qquad \begin{array}{l} 0 \le m < M, \\ 0 \le n < N. \end{array} \tag{3.14} \]

The usage of this formula for image compression is discussed in another set of notes. We want only to signal here that usually the DCT is dominated by the low frequencies, and the others can be eliminated without introducing visible distortions. The DCT of the “eye” (gray) picture gives $G_{11} = 1589$ (this is the frequency 0); 12 values near the origin are bigger than 100, and the remaining ones (out of 240) are much, much smaller!
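A direct, O(M²N²) sketch of (3.12) in C – perfectly adequate for the 8 × 8 blocks of JPEG, although real codecs use fast factorizations (the names are ours):

#include <math.h>

#define PI 3.14159265358979323846

void dct2(const double *F, double *G, int M, int N)  /* row-major M x N */
{
    for (int p = 0; p < M; p++)
        for (int q = 0; q < N; q++) {
            double s = 0;
            for (int m = 0; m < M; m++)
                for (int n = 0; n < N; n++)
                    s += F[m * N + n]
                       * cos(PI * (2 * m + 1) * p / (2.0 * M))
                       * cos(PI * (2 * n + 1) * q / (2.0 * N));
            double Cp = p ? sqrt(2.0 / M) : 1.0 / sqrt((double)M);
            double Cq = q ? sqrt(2.0 / N) : 1.0 / sqrt((double)N);
            G[p * N + q] = Cp * Cq * s;          /* eq. (3.12) */
        }
}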


Chapter 4

Colour Space Manipulations

4.1 Colour Spaces and Channels

By the name “Colour Space” one usually denotes various linear or non-linear combinations of the three “basic” colours, in conformity with the generally accepted trichromatic theory. An introduction to the theory of colours is presented elsewhere. If the manipulations of colours are linear and “synchronous” (the same operation applied separately to each colour plane of the image), we can use the space we want, and specifically the RGB representation, because often the images are stored in the computer memory as 3 matrices, one for each plane R, G or B.

These planes will be called channels. A colour image needs at least 3 channels in order to represent (more or less) faithfully all hues, but we might need more than that. In particular:

• We shall use the subtractive CMYK (Cyan – Magenta – Yellow – blacK) space,as this particular set of channels is well adapted to the printing process: themore ink you put on paper, the darker is the result. When preparing a pic-ture to be printed and adjusting some colours, this space is most often used.But also for some interactive colour balance adjustments on the screen theCMYK space is useful, as the factorization of the global luminance (or rather:darkness) makes it easier to tune the picture.

• When some irreversible manipulations take place, for example when the histogram equalization eliminates some colours in favour of other, more frequent ones, such an operation cannot be done separately for each plane, otherwise the colours might get severely distorted. Usually one then separates the luminance and the chrominance, transforming everything for example into the CIE L*a*b, or even XYZ space. The luminance channel undergoes the histogram massacre, but the chroma channels are left intact, and then the RGB representation is reconstructed.

• If for some reason the colours must be quantized, and an optimized palette chosen among all the 2^24 TrueColours, a judicious choice of channels may be very helpful.

• When combining image fragments, superposing or masking them, the transparency, or α, channel is very important. This is not a “visible” colour, but it affects the display of the other channels.

A multi-channel image may have several such artificial channels, which during the final rendering will be flattened out and disappear; but without them the image processing would be a horribly cumbersome task, very clumsy and difficult to learn.

Some examples might be useful here. Looking at the original Lena portrait we should remark something strange. Not only the plume of her hat is violet, but her eyes as well. . . In fact, colour shifting is a popular technique in advertising and in the press, used to render food pictures or the skin of beautiful girls more appetising. Fig. (4.1) shows conspicuously that the picture of Lena is stained. For – probably – some deep philosophical reasons the Playboy editors found that the overall tone of the image should be rosy. Using the CMYK separation, and eliminating some Magenta, we “normalize” the result.

Fig. 4.1: “Playboy” models’ life is not always rose. . .

4.2 Transfer Curves, and Histogram Manipulations

The left picture on Fig. (4.2) is not very interesting. One can hardly distinguish the details (in fact the standard brightness and contrast of the author’s screen are such that the picture is almost completely black). It was produced by Povray from the file piece3.pov written by Truman Brown, without the application of the standard gamma correction. Of course, we could enhance the brightness and the contrast manually, or modify the transfer curve, but sometimes a more automatic approach would be useful. We shall come to that.


4.2.1 Transfer Curves

What are transfer curves? They are just transformations in the colour space. A diagonal line I_out = I_in is the identity. In order to enhance the contrast the curve should make clear areas clearer, and dark ones darker. The overall brightness is enhanced by the vertical lifting of the curve. The two remaining fragments on Fig. (4.2) show a manually tuned curve and the result of the operation.

Fig. 4.2: Adjusting transfer curves
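Internally a transfer curve is just a lookup table with one entry per intensity level. A minimal sketch for a gray image (the S-shaped curve used here is our own arbitrary example, not the curve of Fig. (4.2)):

a=double(imread('picture.bmp'));  % gray image, levels 0..255
x=(0:255)/255;                    % normalized input intensities
curve=3*x.^2-2*x.^3;              % an arbitrary contrast-enhancing S-curve
lut=round(255*curve);             % the lookup table
b=lut(a+1);                       % each pixel indexes the table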

More or less the same result may be obtained automatically through the analysis of the histogram of the image. The full TrueColour histogram is a vector with 256^3 entries, and is usually not constructed, for obvious reasons: the number of “bins” is enormous, and most of them will be empty anyway. One usually constructs three histograms, one for each plane, or even just one for the luminance. The result in the case of the “piece3” example is shown on Fig. (4.3).

Fig. 4.3: Histogram of the Piece3 image

We see that the histogram is null above the index 80. More than two-thirds of the colour space is empty for each channel. In such a case it would be sufficient to multiply every colour by 3 (more or less), and that would render the image brighter. But not enough! The histogram is still biased towards small luminance. Moreover, in cases where dark and bright areas coexist, but they are far from equilibrium, as shown at the left of Fig. (4.6), no histogram stretching is possible.


4.2.2 Practical gamma correction

A very important and typical transformation in the colour (mainly: intensity) space is the power law:

I_o = I_i^{\gamma}.    (4.1)

In fact such a correction is usually introduced already during the acquisition of a real scene by a camera. But if the image is synthesised, created without a camera. . .

The well-known ray-tracing package Povray until recently produced “raw” images, but version 3 permits adding a power-like correction directly during the rendering. A relatively easy way to brighten the piece3 image has been shown. We know already that the bright segment of the transfer curve (intensity near one) is irrelevant, as the histogram is concentrated around zero. This dark part can be easily lifted by the formula (4.1) with γ about 0.33. But the contrast will be too low, the image will be grayish, as shown on Fig. (4.4), where γ was chosen equal to 4. (In fact this is the inverse of what is normally called γ in the image transfer jargon.)

Fig. 4.4: “Piece3” with gamma-correction
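Applying (4.1) is a one-liner once the intensities are normalized to [0, 1]; a sketch with the exponent 0.33 used above (the file names are assumptions):

a=double(imread('piece3.bmp'))/255;  % intensities normalized to [0,1]
g=0.33;                              % an exponent < 1 lifts the dark part
b=a.^g;                              % formula (4.1), per channel
imwrite(uint8(255*b),'piece3g.bmp');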

Before continuing the subject we have to point out that the manipulation of the transfer curves might be used to obtain completely delirious artistic effects. We give just one complex example, which begins here, but which will be modified later.

We begin with a gray, Gaussian random noise image: some percentage of the background’s white pixels turned into gray. Then the pixel diffusion was iterated a few times. This operation displaces a pixel, exchanging it with one of its neighbours, randomly chosen. Despite the naïve hope that this should produce a uniform chaos, what we really see is an agglomeration of colours: when two identical pixels approach each other, the diffusion tends to keep them near. Chaotic diffusion increases fluctuations. Such are the laws of physics, but we cannot comment on that here.

Then the resulting image is blurred with a Gaussian filter, and an arbitrary, manual manipulation of the transfer curves for each component separately finishes the task.


Fig. 4.5: Random colour distribution

4.2.3 Histogram Equalization

However, we can then perform an irreversible (in general) operation called histogram equalization: the pixels will change colours in such a way that the colours almost non-existent on the image (with very low histogram values) might disappear, but the colours strongly populated will be allotted more “Lebensraum”. Fig. (4.6) shows one possible result of the equalisation operation.

Fig. 4.6: Euro-Disneyland, the truth, and the treachery. . .

This manipulation is not unique; at least three popular variants exist, and they might be accompanied by some contrast and brightness, or gamma correction. The idea goes as follows: first the histogram is constructed, and the average population of a histogram bin computed. Then, for each colour within the histogram, add its population to an accumulator. If the result is bigger than the average, subtract this average and allot one new colour bin to this entry. If after the subtraction the result is still big, because this colour was very popular, subtract the average again, and add a new bin, and repeat this operation until exhaustion. The accumulator may have some residual value different from zero. To this value the next histogram column is added, and the iteration repeated. If some colour is so rare that adding its histogram entry to the accumulator does not make it pass the threshold, this histogram entry is eliminated. At the end of each iteration we have a range of new colours which can be attributed to one colour of the original image.

A “one-to-many” transformation cannot be reversible in general. There are, as mentioned above, three popular strategies of choice:

1. Choose the center of the allotted range of colours. In the new histogram the popular colours will be more widely spaced. This is the simplest possibility.

2. Choose randomly, independently for each pixel, a new colour among all eligible ones (within the calculated range). This introduces a noise into the image, but it might be less disturbing than a severely quantized colour space.

3. Choose – if possible – the colour of the pixel’s neighbours (e.g. the median, or the average). This avoids the noise without impoverishing the colour space, but it blurs the image. This variant is used rarely, as it is quite time-consuming.

We remind once more that the equalization might be done for each channel separately, or globally for the luminance without touching the chroma channels, from which the RGB image is then reconstructed. The first variant usually decreases the colour saturation. (Apparently this is the method chosen by PaintShop; Photoshop produces much more colourful equalized images.)

We present now a complete Matlab program which does the histogram equalization (one channel), and we show some results. On Fig. (4.6) the equalization had been done with Photoshop. PaintShop produces something similar, but almost completely gray; the colours have been too well equilibrated. . .

a=double(imread('picture.bmp')); % a: n x m matrix of gray levels 0..255
[n,m]=size(a);                   % dimensions
nw=zeros(n,m);                   % the new, equalized image
his=zeros(1,256);                % the histogram vector, initialized
for iy=1:n,
  for ix=1:m, fc=a(iy,ix)+1; his(fc)=his(fc)+1; end;
end;
avg=sum(his)/256;                % the average population of a bin
lft=zeros(1,256); rgt=zeros(1,256);
nh=zeros(1,256);                 % new histogram values
rule=input('Which rule? (1 or 2) ');
rr=1; accum=0;
for z=1:256,
  lft(z)=rr; accum=accum+his(z);
  while accum>=avg,
    accum=accum-avg; rr=rr+1;
  end;
  rgt(z)=rr;                     % the allotted range of columns is ready
  if rule==1, nh(z)=floor((rr+lft(z))/2)-1;  % rule 1: the new value
  else nh(z)=rgt(z)-lft(z); end;             % rule 2: the interval width
end;
% New image reconstruction
for iy=1:n,
  for ix=1:m,
    z=a(iy,ix)+1;                % sorry, no zero index in Matlab
    if lft(z)==rgt(z), nw(iy,ix)=lft(z)-1;
    else
      if rule==1, nw(iy,ix)=nh(z);
      else nw(iy,ix)=lft(z)-1+floor(rand*nh(z)); % random choice in range
      end;
    end;
  end;
end;

The second rule may be modified: instead of using a plain random value within the histogram bin where each pixel is tossed, the random generator adapts itself, giving more chances to the poorer colours. We don’t give the solution for a possible third rule, which assigns the new colour depending on the pixel environment. This is slow, and delicate: when a new colour is assigned inside an xy loop, we don’t know yet the colours of all the neighbours of the modified pixel. So, anyway, a mixed strategy seems easier to adopt: rule 2 is used, and then the image is despeckled by averaging or median smoothing.

4.3 Transparence Channel

The administration of the transparent areas of an image will be discussed in the next section. We note only here that there are several possibilities to deal with “invisible” zones of a picture.

• For indexed images one specific palette index is treated as the “transparent” colour. This is often used with GIFs.

• A full α channel is an integral part of the image. In such a way it is possible to specify the degree of opacity.

• If more than one transparency channel is used, for example if a specific opacity channel is attached to every “visible” colour plane, the transparence becomes colourful, which can be used for many special effects, artistic or technical, such as the selective visualisation of very complex volumetric data.


The full power of the transparence channels shows itself when images are composed and superposed. In the simplest case the displaying engine should only look up the transparency value of a pixel and display it or not. More precisely: it should display either the pixel or the background, explicit or by default. If a full α channel is present, it is necessary to perform an interpolation; the displayed colour is equal to I′ = αI + (1 − α)B, where I is the image, and B – the background. We see that α here is the opacity rather than the transparence.

Some delicate questions may be posed concerning the influence of filtering on the transparence channels. The subtraction of two transparence values is rather difficult to understand, and unless you know what you are doing, it is better to stay far away from that. On the other hand, blending between a “normal” colour and the transparence channel is extremely important – this is a standard way to introduce soft shadows into images.


Chapter 5

Image Algebra

5.1 Addition, Subtraction and Multiplication

Apparently there is nothing really fascinating here. If we manipulate images as numeric matrices, we can add them, multiply them by constants or element-wise, bias the pixels by some additive constants, etc. There are just a few intuitive rules to master. The reader knows already almost everything if he has learned well the filtering operations:

1. Multiplication by a constant c < 1 darkens the image, and c > 1 lightens it. Such multiplication is the simplest case of histogram stretching.

2. Averaging two images interpolates between them. Usually the addition is followed by the division of the resulting values by 2; otherwise the result, which always enhances the intensity of the concerned channel, may be illegal. By varying in a loop the parameter α ∈ [0, 1] used to combine additively two images: I = (1 − α)I1 + αI2, we obtain a blending (fading-off) between the two source images, often used in animation, or real movies.

If we don’t need just one interpolation, but a whole sequence (for example in morphing, discussed in section (7.2)), or another kind of animation, the linear (in time) blending might be too sharp, and often a different interpolator, a sigmoidal function, is used: I = (1 − s(α))I1 + s(α)I2, where s(α) = 3α² − 2α³. (Here α is the interpolation time, between 0 and 1.) This function (which is one of the Hermite basis splines) maps the unit interval into itself, and the “speed” of the mapping varies smoothly at the beginning and the end of the process. A sketch of such a blending loop follows this list.

Don’t forget that geometric manipulations of images need a more elaborate interpolation between pixels. Several image processing packages, such as Photoshop, offer a smoother interpolator: the bicubic, two-dimensional Catmull-Rom spline. This is not discussed here.

3. The subtraction in general may produce negative numbers, and the arithmetic package should crop them to zero. Such effects as embossing, etc. need the addition of an additive constant to the result of a subtraction. Of course this addition is performed before the eventual cropping.

4. The possibility to multiply the images means that the colour space is normalised to [0, 1]³; all three planes contain the percentage of the “full” colour. Some packages, e.g. PaintShop, apparently do it wrongly, and the multiplication of two dark images might produce something bright. Of course, if 0 is black and 1 is white, the multiplication can only darken the image, and this is the way of adding ex post shadows to an image.

How to lighten an image fragment? Simple: take the negative of the image and the negative of the “anti-shadow”. Multiply them, which will darken the result, and invert it back again.
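Here is a minimal sketch of the blending loop announced in point 2, with the sigmoidal schedule; the file names and the frame count are arbitrary assumptions:

I1=double(imread('from.bmp'))/255;
I2=double(imread('to.bmp'))/255;       % same size as I1
N=10;                                  % number of steps, arbitrary
for i=0:N
  alpha=i/N;                           % the interpolation time in [0,1]
  s=3*alpha^2-2*alpha^3;               % the Hermite sigmoid
  I=(1-s)*I1+s*I2;                     % the blended frame
  imwrite(uint8(255*I),sprintf('frame%02d.bmp',i));
end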

Of course the division of images is an ill-defined operation, and should be avoided, unless you have some private crazy ideas about its meaning.

(The reader who follows a more detailed course on image analysis knows nevertheless that a division is used to sharpen images. If an image is smoothed by its convolution with a Gaussian-like filter, in the Fourier space the transformed image is the product of the transforms of the image and the filter. So, if we have a blurred image we can – looking at the supposed edge smearing – extract the width of the blurring filter which would produce the same effect, prepare artificially the transform of this filter, and divide the image transform by it. The inverse transform of the result should sharpen the image. This technique is often used in astronomy. Of course it might introduce, and usually does, a substantial local noise, due to the fact that high frequencies have been enhanced. The Fourier transform of a Gaussian is a Gaussian, and dividing by it raises considerably the values of the “frequency pixels” far from the origin.)
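A minimal sketch of this inverse filtering, assuming we have guessed the blur width sigma, and assuming even image sizes; the small constant added to the denominator is our own regularisation, precisely to tame the noise amplification mentioned above:

b=double(imread('blurred.bmp'));       % gray, n x m, even sizes
[n,m]=size(b);
[x,y]=meshgrid(-m/2:m/2-1,-n/2:n/2-1);
sigma=2.0;                             % guessed blur width, in pixels
h=exp(-(x.^2+y.^2)/(2*sigma^2));
h=h/sum(h(:));                         % the normalized Gaussian filter
H=fft2(ifftshift(h));                  % its transform, origin at (1,1)
B=fft2(b);
c=0.01;                                % regularisation constant
a=real(ifft2(B.*conj(H)./(abs(H).^2+c)));  % "division" by the filter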

5.1.1 Some Simple Examples

Let us construct an artificial shadow, shown on Fig. (5.1). There is absolutely nothing fascinating here; this picture has been made in one minute. The shadow is constructed from the distorted letter, multiplied by the background image.

A very simple image package will require from the user the correct choice of the filling gray before applying the multiplication, otherwise the shadow may become too dense or too clear. But a more intelligent and more interactive approach is also possible – don’t forget the α channel. Even if it is not available explicitly (as in some contexts in Photoshop, where apparently the transparence manipulations have been designed by a committee of 1439 ambitious experts. . . ), it is usually possible to declare globally the transparence of the blending colours.

Fig. 5.1: Simple shadow

We have mentioned that in order to “anti-shadow” a fragment of the image we shadow its negative and we invert the result. But this is not always what we want. If a bump-mapping texture produces a strong light reflex on the surface, this effect should be added to the normal, “diffuse” colour. More about that will be found in section (6.1), devoted to bump mapping.

Another example is a variant of the solarization effect. A solarized picture is in fact a badly exposed and/or badly processed photograph. The dark and middle zones remain unchanged, but parts which on the original are too light are “burned out” and become dark. Fig. (5.2) shows two variants of this effect. At the left we see the original Photoshop solarization, and at the center – our modified variant, which additionally enhances the contrast.

Fig. 5.2: Solarization

The middle picture is particularly easy to obtain: it suffices to compute the (absolute) difference between the image and its negative, and to invert the result. In fact, for light areas we get 1 − (I − (1 − I)) = 2(1 − I), an enhanced negative. For the dark parts we have 1 − ((1 − I) − I) = 2I. The “original” solarization may be obtained with a trivial filter – divide the image by two.
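In Matlab this middle variant is essentially one line; a sketch (the file name is an assumption):

I=double(imread('picture.bmp'))/255;
sol=1-abs(I-(1-I));              % inverted difference with the negative:
                                 % 2(1-I) in light areas, 2I in dark ones
imwrite(uint8(255*sol),'solar.bmp');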

Obviously, the solarization applied to a colour picture usually gives useless and disgusting effects (who wants a deadly green Lena?). But this particular intensity shift may introduce some almost specular, silky reflexes, whose colour may then be corrected by the “hard light” composition rule discussed in the next section. This gives us the right picture on Fig. (5.2).

5.2 Working with Layers

Layers are conceptually simpler than channels, but technically they may be quite complex. They can be thought of as superposed transparencies. When we have to compile a rather complicated algebraic superposition of various images, it is preferable to put all the parts on separate layers, possibly duplicating and masking some of them; when the image is ready, we can collapse all the layers into one. Conceptually a multilayer image is just an M × N × D “tensor”, where D is the number of layers. Of course, it is possible to do everything using separate images.

In popular packages like Photoshop layers are integrated into the interface. The advantage of such a protocol is that the layers are ordered, and we see immediately which one may cover the others. The layer interface automatically provides some number of operators; for example, a two-layer image may be composed declaratively as a “normal” superposition, where the upper image dominates everywhere where it is not transparent, or as a multiplication, difference, etc.

Perhaps it might be useful to comment on the various layer combination modes chosen by the Photoshop creators according to user wishes. We know that the normal mode covers the underlying layers. Here are some of the remaining modes. We do not plan to teach the reader how to use Photoshop, but to suggest which facilities should be present if one day he tries to construct his own image processing superpackage, which will replace Photoshop, Gimp, and all the others. You will see that many of these modes are simple combinations of more primitive operations.

There is a difference between the global layer composition modes and the drawing tool modes.

Suppose that the upper layer pixel has colour c0, and the layer beneath – c1.

• Dissolve. This mode constructs the pixel colour c by a random choice between c0 and c1, depending on the opacity of c0. If the upper layer is opaque, c0 is always chosen. This might be used to simulate a spray (airbrush) painting.

• Behind. Used for painting. If a layer is thought of as an acetate sheet, the normal painting (brush, etc.) replaces the existing colour. The “behind” mode is equivalent to painting at the back of this sheet. Of course it makes sense only if the sheet contains some transparent or semi-transparent areas. In such a way one layer can be used twice.

• Clear. It is just the eraser, but attached to the line (stroking) tools, or the filling commands (path fill and paintbucket). It renders the touched areas transparent, and of course is used when the manual erasing would be too cumbersome.


• Multiply. Multiplies the pixel contents, considered as fractions between 0 and 1, channel by channel. The result is always darker, unless one of the concerned pixels is white. This may be a global option.

• Screen mode. This is the “anti-shadowing” effect. For each channel separately the inverses (1 − c1) and (1 − c0) are multiplied, and the inverse of the result is taken. The final effect is always lighter. Screening with white gives white; screening with black leaves the colour unchanged. (A sketch of the simplest modes follows this list.)

• Overlay mode. This is a little complicated, and may either multiply or screen the pixels depending on the base colour. If the base (c1) and the blending (upper, c0) colours are random, the result is difficult to understand. The idea is to preserve the shadows and the lights of the base colour: where it is strong, it remains, otherwise the blending colour “wins”. The result is as if we looked at the base image through a coloured glass, but an “active” one, i.e. a white blending colour may lighten the image, giving some milky appearance to it, while black does not destroy the image, but darkens it, and the darkening is more pronounced where the areas are already dark.

Yes, obviously an example is needed. . . Fig. (5.3) shows the effect of overlaying.

Fig. 5.3: Overlay mode

Exercise. Try to deduce the underlying colour algebra for this operation.

• Soft Light mode. Now the blending colour may darken or lighten the base image. Imagine that c0 represents a diffused spotlight. If it is light, lighter than 0.5, then the image is lightened, otherwise it is darkened (an “anti-light” effect). If the blending colour is white or black, the result is pronounced, but never turns into pure white or black.

• Hard Light mode. Here the effect is stronger: the image is lightened or darkened, depending on the blending colour. If c0 is light, the result is screened; if it is dark, the result is multiplied, so it is possible to get very light (white) highlights, or very deep shadows.


• Darken. This is simple: the darker colour is chosen. There is also the “Lighten” option.

• Difference. The result is the absolute value of the subtraction of the two colours, so it is never negative.

• Hue mode. The hue (spectral location) of the blending colour replaces the base hue, but the saturation and the luminance remain. Imagine that the base image has been converted into gray, and then artificially coloured with the blending pixels. The Color mode replaces not only the hue, but also the saturation; the effect of artificial tinting is more pronounced than in the previous mode. The Luminosity mode is the inverse of Colour – the hue and saturation remain, the luminance is taken from the blending image.

There is finally the Saturation mode, which takes only the saturation from the blend layer. This mode may be used for the selective (through painting) elimination of some too saturated colours, for example for simulating the atmospheric colour attenuation with distance.
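The simplest global modes are one-liners on images normalized to [0, 1]. A sketch; the overlay formula given here is the commonly quoted one, our assumption rather than an official Photoshop specification:

c0=double(imread('upper.bmp'))/255;   % blending (upper) layer
c1=double(imread('lower.bmp'))/255;   % base layer, same size
mult=c0.*c1;                          % Multiply: always darker
scr=1-(1-c0).*(1-c1);                 % Screen: always lighter
dif=abs(c0-c1);                       % Difference: never negative
drk=min(c0,c1);                       % Darken (Lighten would be max)
ovl=(c1<=0.5).*(2*c0.*c1) ...         % Overlay: multiply in the shadows,
   +(c1>0.5).*(1-2*(1-c0).*(1-c1));   % screen in the lights of the base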

5.2.1 Construction and Usage of the Transparence Channel

All these algebraic and layer operations are mathematically trivial, and belong to the practical folklore of the image processing art. We have mentioned that the work with the transparence channel may be delicate. If it is not accessible directly, several possibilities to simulate it exist, provided the selection tools and the image arithmetic are sufficiently complete. In particular, knowing that the transparence per se cannot produce visible results, superposing a partially transparent image I1 with the opacity α < 1 over the base image I0 means that the result is computed as I = αI1 + (1 − α)I0 (unless some non-standard arithmetic: “screen”, “soft light”, etc. mode is used; we shall not discuss this here).

Some image synthesis packages, such as 3DMax, generate the alpha channel automatically: everything which belongs to the image is white, and the background is black (if we choose to “see” the opacity in such a way). This may be very important when the image is subsequently used for animation – the system automatically filters out the transparent zones when composing the final picture.

If you have to do it by hand, and if you wish to obtain the effect shown on Fig. (5.4), you must (a sketch in Matlab follows the list):

• Pick out the fragment of the original image (at the left) which will be erased, and construct a mask M.

• M – say, black on white – is multiplied by the image. The face is erased.

• The same mask, but inverted, multiplies the replacement (unfortunately its original disappeared somewhere. . . ).


Fig. 5.4: Composition of two images

• The two components are added.
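These four steps collapse into one line of image algebra; a sketch assuming gray images and a mask of the same size (for RGB images the mask would have to be replicated with repmat over the three planes):

I0=double(imread('portrait.bmp'))/255;  % the original image
I1=double(imread('filler.bmp'))/255;    % the replacement fragment
M=double(imread('mask.bmp'))/255;       % mask: 0 on the face, 1 outside
comp=M.*I0+(1-M).*I1;                   % erase, fill, and add at once
imwrite(uint8(255*comp),'composed.bmp');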


Chapter 6

Some Special 3D Effects

6.1 2D Bump mapping

The technique of bump mapping is often used in the synthesis of 3D scenes, where it simulates spatial textures: by deforming the normals to the surface of the rendered object, it modifies the illumination conditions, and simulates bumps, rugosity, holes, etc.

The aim of bump mapping in the domain of image processing might be different. Of course, we can produce some 3D effects which simulate the extrusion of a flat contour, for example a text, but more often this technique is used to add some texture to the surface of the image: to simulate a canvas, a piece of hairy tapestry, etc. The 2D image is already rendered, and adding an additional lighting effect should not deteriorate the colour contents of the picture, so the bump mapping should not be overdone. The shadows should not be too profound, and the specular spots should be rather narrow. And, in general, the size of the bumps should not be too big either. Fig. (6.1) shows the result of the application of a bump map.

Fig. 6.1: Bump mapping

This effect is produced by the simulation of a directed lighting on a two-dimensional image, and in general may be quite involved, with many parameters. The basic idea goes as follows. Imagine that you have a (fake) extrusion given by a gray (intensity) image, whose one-dimensional section is shown on Fig. (6.2). The light beam is represented by the red dashed lines. The bump map has the same size as the work image. Those regions which correspond to the angle 90° between the normal to the bump profile and the light direction will be enhanced. Any specular model may be used, for example the Phong formula I_s = k cos^n(θ), where n may vary between, say, 3 and 200. But beware: in modeling glossy effects on a 3D scene the specular contribution is always positive. Here the shadow is active and may darken some parts of the image. We have to choose a priori the “neutral” direction (usually “horizontal”) where the lighting effect vanishes.

Fig. 6.2: Bump mapping geometry

Our mathematics is conditioned by the fact that the profile geometry is not given explicitly. The bump map is just an image, where the black areas represent the extrusion, and white is “flat” (or vice-versa). These are the main computations involved:

• Deduce the fake normal vector to the bump image.

• Compute the scalar product of this normal and the direction of light.

• Enhance the result using the Phong (or another) formula, and split the dark and light areas in order to superpose them on the work image.

• Complete these last algebraic manipulations.

The blue (mostly negative, slightly truncated) profile on Fig. (6.2) is the cos(θ), where θ is the angle between the normal and the light direction vector, normalized so as to “neutralize” the horizontal parts of the image.

The normal is a vector orthogonal to the gradient, and the gradient is just an exercise in filtering. It should be done carefully in order not to truncate the negative values too early. We can obtain separately the x and y components of the gradient, or directly, by a judiciously chosen offset, the gradient whose xy projection is collinear with the light beam direction. The normal to the image has the same property. We obtain a standard directional “embossing” effect, which is then separated into light and dark contributions by the (signed, and truncated) subtraction of the neutral gray. The contrasts of the shadows and reflexes should be enhanced, and the rest is almost trivial. A sketch follows.
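A minimal sketch of the whole pipeline, assuming a gray bump map and a gray work image of the same size; the light direction, the strength 0.4 and the power 0.7 are arbitrary tuning parameters, and the simple signed power replaces the full Phong treatment:

h=double(imread('bump.bmp'))/255;     % bump map: 1 = high, 0 = flat
[hx,hy]=gradient(h);                  % the fake surface slopes
L=[-1 -1 1.5]; L=L/norm(L);           % light direction, arbitrary
nn=1./sqrt(hx.^2+hy.^2+1);            % normalizes the normal (-hx,-hy,1)
c=(-hx*L(1)-hy*L(2)+L(3)).*nn;        % cos(theta) with the light
d=c-L(3);                             % zero on flat ("neutral") areas
d=sign(d).*abs(d).^0.7;               % contrast enhancement, both signs
I=double(imread('work.bmp'))/255;
out=max(0,min(1,I+0.4*d));            % signed superposition, then clip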

6.2 Displacement Maps

This is an extremely popular and powerful deformation technique. In general, geometric (“horizontal”, not just in the colour space) deformations, such as twirling, bumping of a part of the image, etc., are not presented as mathematical, analytically given transformations; their sources are shapes themselves.

The displacement map is an image. It may be of the same size as the deformed subject, or of any other size, in which case we will use scaling. Call I_{xy} a point of the original image, and D_{x̃ỹ} the colour value at the corresponding reduced point of the map. The reduction takes place if the sizes of I and D are different. Then, the point (x, y) of the image corresponds to the point (x̃, ỹ), where

x = \frac{W_I}{W_D}\,\tilde{x},    (6.1)

y = \frac{H_I}{H_D}\,\tilde{y},    (6.2)

on the map. The inverse transformation, which is trivial, will also be needed. Notethat in general these transformations are real, not integer. The value ofD determinesthe displacement vector, which in general has two components, so the displacementmap should be at least a two-channel picture. (Photoshop uses the Red and Greenchannels, Blue is neutral). Since the value of the pixel is somewhat conventional(the interval [0, 255] is meaningless), one usually adopts an additional convention:the maximum displacement length (in pixels) s is established independently ofthe map D. The minimum colour (0) of D corresponds to the maximum shift in onedirection, say, to the left, and the maximum value – to the right. The “gray” value(say, g) is neutral. Of course a more general algorithm can introduce a special offset,but it is a complication which can be resolved separately. Now, the deformation goesas follows.

For all the pixels (x, y) of the new, deformed image the point (x̃, ỹ) on D is computed. Suppose that the map is by convention normalised so that the maximum colour value is equal to 1 (then g = 1/2). The vector generated by the two used channels of the map image is (D_x, D_y). From this value the displacement vector is calculated:

(dx, dy) = (s · (Dx − g), s · (Dy − g)) . (6.3)

(Of course, it is possible to use different horizontal and vertical scales and offsets.) Now, the new pixel at (x, y) is not I_{xy}, but another value I_{x′y′}, where

(x′, y′) = (x+ dx, y + dy), (6.4)


interpolated appropriately, since (x′, y′) need not be integer, and the neighbouring pixels are looked up.
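A minimal sketch of the whole loop, assuming a map of the same size as the image, and replacing the proper interpolation by the nearest neighbour; s and the file names are arbitrary:

I=double(imread('picture.bmp'))/255;  % the image to deform
D=double(imread('dmap.bmp'))/255;     % the map: channels 1 and 2 used
s=15; g=0.5;                          % max shift (pixels), neutral gray
[n,m,k]=size(I); out=zeros(n,m,k);
for y=1:n
  for x=1:m
    dx=s*(D(y,x,1)-g);                % formula (6.3)
    dy=s*(D(y,x,2)-g);
    xs=min(max(round(x+dx),1),m);     % formula (6.4), rounded to the
    ys=min(max(round(y+dy),1),n);     % nearest pixel and clipped
    out(y,x,:)=I(ys,xs,:);
  end
end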

As an example we show how to produce an enlarging lens: a circular displacement map which moves to the left all pixels which are to the left of the center, and to the right all those on the right. The same holds with respect to the vertical coordinate. Fig. (6.3) shows the two displacement channels and the result.

Fig. 6.3: Simulating lenses by displacement map

Here the gradients were linear and the displacement zones were homogeneous either vertically or horizontally. Much more elaborate effects can be obtained by using blurred d-maps, and by masking the deformed zones, confining them to some areas. Fig. (6.4) shows a slightly different distortion of the original, which simulates a hemispherical transparent “water drop” on the image. Fig. (6.5) suggests how to derive the displacement map image from the desired geometry, but we suggest that the reader performs all the computations himself.

Fig. 6.4: Displacement map simulating a transparent hemisphere

Fig. 6.5: The “water drop”

6.2.1 Another example: turbulence

Many packages offer the “twirl” effect, which simulates a whirlpool. Suppose that we wish to simulate more or less realistically a physical whirl vortex, for example a big cyclone needed for an atmospheric texture. We suppose that it is a 2-dimensional (cylindrically symmetric) problem. Fig. (6.6) shows some geometric relations which must be obeyed according to some conservation laws. Suppose that the “matter” moves counterclockwise, and is sucked into a central region, where the laws are different.

Fig. 6.6: The geometry of a tornado

The thin ring of radius r and width dr occupies the area 2πr dr. If we assume that the “matter” inside is not compressible (or that its pressure does not change significantly), the surface conservation law determines its radial speed. The constancy of the area means that dr = const/r, which, when integrated over time, gives the functional dependence r(t) = r_0 \sqrt{1 − t/t_0}. The constant t_0 is chosen so that at that time the vortex element falls into the center. This time depends on the initial conditions and on the “force” of the vortex. In order to construct an incremental displacement map we will not need this integrated formula, but it might be useful to know it.

This result is independent of the angular motion. Here the angular momentum conservation determines the rotational speed. For a thin ring the angular momentum is given by M = 2πr dr · ωr², which means that r²ω is constant. We may thus postulate that in a short (and constant) interval of time we have the displacements ∆r = c/r and ∆φ = h/r², with some appropriate constants. These equations need to be translated into the Cartesian system, an exercise which we leave for the reader. Fig. (6.7) shows the map and the result of its iterated (about 15 times) application to some ordinary fractal clouds.

Fig. 6.7: Tornado

We have eliminated the singularity at the center by introducing a small neutral zone, but many other remedies are possible, for example replacing 1/r by r/(r² + ε). It should be obvious to everybody that this technique, or a similar one, is still not a good way to produce realistic cyclones. The clouds here are smeared, while any satellite photo shows that at the scale of the turbulence the cloud edges are quite sharp. Actually, the clouds are created within the turbulence, and their local properties are not determined (or only partly) by the spiraling motion.

A good cloud model, which takes into account the pressure of the air and the condensation of water vapour due to the adiabatic decompression, and which somehow incorporates the third dimension into play, simply does not exist. Any takers?


Chapter 7

Other Deformation Techniques

7.1 Warping

Warping became an over-used term. It may mean a general deformation, but for us it will mean an interactively produced image distortion which respects some continuity constraints – as if we put the image on a rubber sheet, and then the rubber was deformed with all the pixels following (and, of course, with some interpolation). Warping is the subject of a thick book, and we cannot discuss here all the possible techniques, nor the applications.

It is not possible either to teach how to practically obtain particularly disgusting deformations of our favourite political personages.

Thus, we sketch only the usage of the elastic deformation theory, and we suggest how a sufficiently general warping package might be structured. We stress that elasticity is just one among many possible models, and its only purpose is to introduce some regularity into the deformation process, to restrain the madness of the creator. Imagine – for simplicity – a one-dimensional elastic line, shown on Fig. (7.1). Now, we should not confound the coordinate position x on this axis, the dynamic position which we call p(x), and the dynamic displacement of a point, which we call d(x). This is a trivial observation for a physicist, but for computer science students the fact that the displacements belong to “the same space” as x is confusing. In the initially static configuration p(x) = x, or d(x) = 0.


Fig. 7.1: One-dimensional elasticity

The original configuration is the lower line. The rubber element at the point x_0 has been displaced, and finds itself at x′_0. Thus, p(x_0) = x′_0. Its neighbours follow, but their displacement is not parallel to the central point, because the element at the left is attracted by its left neighbours, and the element at the right is less attracted (or more repelled) by its right neighbours. The neighbourhoods are infinitesimal; we may consider x_1 = x_0 − dx, x_2 = x_0 + dx, but again – do not confound this with the – also infinitesimal – displacement d(x_0) = x′_0 − x_0. We remind that the original configuration is in equilibrium: the elastic forces cancel.

Now we displace the element at x = x_0. What force acts upon it now? We build up the following, easy to grasp equation:

F(p(x)) = k\,(d(x+dx) − d(x)) − k\,(d(x) − d(x−dx)) = k\,(p(x+dx) − 2p(x) + p(x−dx)),    (7.1)

which takes into account that the net force is the result of an incomplete cancellation, and that the difference between the displacements of two neighbours is equal to the difference between the shifted positions; the absolute contributions cancel.

We see that the development of p(x + dx) into the Taylor series about x must go up to the second derivative, as the first derivatives cancel. Knowing that the force is proportional to the acceleration ∂²p/∂t², we have

\frac{\partial^2}{\partial t^2} p(x, t) = C\, \frac{\partial^2}{\partial x^2} p(x, t),    (7.2)

i.e. the one-dimensional wave equation, as expected. In the multidimensional case the spatial derivative should be replaced by the Laplacian. In equilibrium ∆p = 0. When we pass from the (2-dimensional) continuum to a discrete grid indexed by pairs ij: x_{ij} + dx = x_{i,j+1}, etc., the Laplace equation takes the form

4p_{i,j} − p_{i−1,j} − p_{i+1,j} − p_{i,j−1} − p_{i,j+1} = 0.    (7.3)

This means that the equilibrium position of each node is the symmetric barycenter of its four neighbours. We might – perhaps this digression will be useful for some readers – derive this formula directly within the discrete formulation. Imagine a discrete grid whose vertices are connected by elastic springs. The potential energy of the system is equal to the sum of the energies of the corresponding oscillators:

U = \frac{k}{2} \sum_{\langle jl \rangle} (p_j − p_l)^2,    (7.4)

where j and l are two-dimensional, double indices locating the vertex in the grid. The sum goes over all pairs of neighbouring indices, i.e. over all springs. The equilibrium is obtained when the energy reaches its minimum, as a function of the positions {p}. The derivative over p_k gives

\frac{\partial U}{\partial p_k} = k \sum_i (p_k − p_i) = 0 \quad \text{for all } k,    (7.5)

which again shows that p_k is the arithmetic average of the positions of its neighbours. Now, in order to solve such a set of equations numerically, we must fix some of the vertices, i.e. introduce some boundary conditions. Fig. (7.2) shows what happens when we displace one point and fix its position. The left drawing shows the initial equilibrium, which is trivial, but there is no cheating: we have fixed the border of the grid, and a MetaPost program found the internal vertices.

Fig. 7.2: Elastic displacement

We see that the elastic adjustment of the neighbourhood does not necessarily prevent mesh cross-over. In this example the mesh boundaries are too close. Fig. (7.3) shows a less drastic mesh folding.

Fig. 7.3: Grid size importance

If a huge, but very localized warping is needed, it is better to do it in several stages, or to use a different elastic model; otherwise a very dense grid would be needed. The solution of the discretized Laplace equation is easy: the assignment (7.3) is iterated until convergence (a sketch follows). If the grid is large, this process might be slow.
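A minimal relaxation sketch, assuming the node positions are stored as complex numbers x + iy in a matrix p, and that a logical matrix fxd marks the pinned nodes (the border and the dragged points); both names are our own:

for iter=1:200                          % or: until the change is small
  q=(p([2:end end],:)+p([1 1:end-1],:) ...
   +p(:,[2:end end])+p(:,[1 1:end-1]))/4;  % barycenter of the neighbours
  p(~fxd)=q(~fxd);                      % relax only the free nodes
end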

In general, which facilities should a warping package provide?

1. Of course it should ensure a correct interfacing for the image import, and the image and animation export. More about that in the morphing section below. The grid should be adaptive.

2. The grid does not need to be visible. It is perfectly possible to give the user just the possibility to fix some points or lines (segments, polygons, splines), and the system may choose the grid as a function of the geometric details.


3. The warping may be defined by dragging points, lines, or whole areas. Internal boundary conditions may be established, and the system should understand that if the warping zone is constrained within a closed area, its exterior does not participate in the iterative solution of the Laplace equation.

7.2 Morphing

Morphing is a combination of (generalized) warping and colour interpolation, which deforms one image in such a way that at the end it becomes another one. The Internet is full of clichés with the deformation of one human face into another (e.g. Michael Jackson’s “Black and White” clip), or the transformation of a human face into a beast (wolf, tiger, etc.). Thus, we shall not show a morphing example here.

Usually the warping phase is liberal, and there is no “elasticity” involved, especially if the source and target images are so different that it is difficult to find some common topological patterns. The user splits the source image into simple polygons, usually triangles, although any contours can be used, provided that the morphing software can perform well the nonlinear transformations between arbitrary closed contours.

On the target image the same set of polygons is automatically re-created, and the user deforms them manually, moving the vertices. Most of the popular morphing programs are too tolerant: the user may produce topological singularities, forcing some triangles to overlap, or leaving holes. Even the construction of the covering triangles may be clumsy, and some packages help the user with an automatic triangulation, for example using the Delaunay algorithm, which produces “equilibrated” triangles, not too elongated, if possible. The user chooses only some number of control points. For example, when morphing faces it is natural to localise the eyes, the mouth corners, and the face contour.

The affine transformation between triangles has already been discussed; the only modification here is that this transformation, and the corresponding texture transmutation, are multi-stage processes. If a value – the point position or the pixel colour – passes from v_0 into v_N in N stages, with v(0) = v_0 and v(N) = v_N, then v(i) = ((N − i)v_0 + i·v_N)/N if we choose to apply the linear interpolation. Often a more general approach is better for artistic reasons. Instead of the linear interpolation (1 − x, x) (where x symbolically denotes i/N), the Hermite cubic function 3x² − 2x³ is used. In Michael Jackson’s clip a more sophisticated approach has been adopted: some parts of the image converge faster to the final form than others. When changing a human face into a beast (or into a well-known politician), the artist might ask himself several questions, for example:

• Perhaps it would be better to go with the warping all the way through before interpolating the colours: first the distortion of the shape, and then put some hairy texture upon it. Or, inversely, first put the hair, scales, or spots on the skin of the original human, and only then transform it into a lion, a fish, or somebody whose name we all know, but won’t mention here.

• The speed of the transmutation need not only be non-linear, it might also be asymmetric. Shall it start slowly and speed up, or decelerate?

The abundant amateurish morphing packages on the Internet continue to develop. If you want one day to make your own, please consider the following advice:

1. The interface should be professional, and this does not mean that the number of menus and buttons should be greater than 10. . . The package should be able to import at least 3–5 standard image formats, and to generate at least two different ones. It should also be able to save the working context on disk, i.e. the point meshes and/or the triangles.

2. Don’t forget that a graphical interface without the “undo” command will sooner or later, but rather soon, end in the waste basket.

3. Use a professional colour interpolation scheme. Aliasing in morphs is rarely acceptable.

4. Generate separate intermediate images, but learn to generate also some compound formats, for example animated GIFs. This is not difficult. You might also generate some scripts which will drive an MPEG encoder launched automatically by your package.

5. It is very frustrating when one cannot morph two images which differ (even slightly) in size. Offer the user the possibility to crop one of the images, or to resize it, or the other one. Preferably both, with the resize parameters constrained to preserve the proportions.

6. A dense mesh of control points and lines is a mess, badly legible on colour images. Use colour for the control entities, but permit the user to mask (or to attenuate) the colour of the images on the display. Gray images are significantly clearer, and do not risk rendering red control dots invisible.

7. Plug in the Delaunay or another automatic triangulation algorithm.

8. Cry loud when a topological incongruity is generated: overlapping triangles, or holes. Think about offering – as an option – a limited warping schema, for example the elastic model.

9. It would be nice to parametrise the transition speed, to introduce some nonlinearities, or even to choose different warping/interpolation speeds in different parts of the image, as suggested above.

10. Add some really original examples. This remark is very silly, of course. But apparently the morphing package creators pretend to ignore it. . .