

Institutionen för systemteknik
Department of Electrical Engineering

Examensarbete

Image interpolation in firmware for 3D display

Examensarbete utfört i Elektroniksystem vid Tekniska högskolan i Linköping

av

Martin Wahlstedt

LiTH-ISY-EX--07/4032--SE

Linköping 2007

Department of Electrical Engineering
Linköpings universitet
SE-581 83 Linköping, Sweden


Image interpolation in firmware for 3D display

Examensarbete utfört i Elektroniksystem vid Tekniska högskolan i Linköping

av

Martin Wahlstedt

LiTH-ISY-EX--07/4032--SE

Handledare: Thomas Ericson, Setred AB

Examinator: Kent Palmkvist, ISY, Linköpings universitet

Linköping, 9 November, 2007


Avdelning, Institution / Division, Department

Division of Electronics Systems
Department of Electrical Engineering
Linköpings universitet
SE-581 83 Linköping, Sweden

Datum / Date

2007-11-09

Språk / Language

Engelska/English

Rapporttyp / Report category

Examensarbete

URL för elektronisk version: http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-10326

ISBN: —

ISRN: LiTH-ISY-EX--07/4032--SE

Serietitel och serienummer / Title of series, numbering: —

ISSN: —

Titel / Title

Bildinterpolation i programmerbar hårdvara för 3D-visning
Image interpolation in firmware for 3D display

Författare / Author

Martin Wahlstedt

Sammanfattning / Abstract

This thesis investigates possibilities to perform image interpolation on an FPGA instead of on a graphics card. The images will be used for 3D display on Setred AB's screen, and an implementation in firmware will hopefully give two major advantages over the existing rendering methods. First, an FPGA can handle large amounts of data and perform many calculations in parallel. Secondly, the amount of data to transfer increases drastically after the interpolation, and with this a higher bandwidth is required to transfer the data at high speed. By moving the interpolation as close to the projector as possible, the bandwidth requirements can be lowered. Both these points will hopefully be improved, giving a higher frame rate on the screen.

The thesis consists of three major parts, where the first treats methods to increase the resolution of images. In particular, nearest neighbour, bilinear and bicubic interpolation are investigated. Bilinear interpolation was considered to give a good trade-off between image quality and calculation cost and was therefore implemented. The second part discusses how a number of perspectives can be interpolated from one or a few captured images and the corresponding depth or disparity maps. Two methods were tested and one was chosen for a final implementation. The last part of the thesis treats Multi Video, a method that can be used to slice the perspectives into the form that the Scanning Slit display needs in order to show them correctly.

The quality of the images scaled with bilinear interpolation is satisfactory if the scale factor is kept reasonably low. The perspectives interpolated in the second part show good quality with plenty of detail but suffer from some empty areas. Further improvement of this function is not strictly necessary but would increase the image quality. An acceptable frame rate has been achieved, but the speed can still be improved. The most important continuation of this thesis is to integrate the implemented parts with the existing firmware and thereby enable a real test of the performance.

Nyckelord / Keywords: interpolation, view interpolation, image processing, scanning slit, 3D, FPGA


Abstract

This thesis investigates possibilities to perform image interpolation on an FPGA instead of on a graphics card. The images will be used for 3D display on Setred AB's screen, and an implementation in firmware will hopefully give two major advantages over the existing rendering methods. First, an FPGA can handle large amounts of data and perform many calculations in parallel. Secondly, the amount of data to transfer increases drastically after the interpolation, and with this a higher bandwidth is required to transfer the data at high speed. By moving the interpolation as close to the projector as possible, the bandwidth requirements can be lowered. Both these points will hopefully be improved, giving a higher frame rate on the screen.

The thesis consists of three major parts, where the first treats methods to increase the resolution of images. In particular, nearest neighbour, bilinear and bicubic interpolation are investigated. Bilinear interpolation was considered to give a good trade-off between image quality and calculation cost and was therefore implemented. The second part discusses how a number of perspectives can be interpolated from one or a few captured images and the corresponding depth or disparity maps. Two methods were tested and one was chosen for a final implementation. The last part of the thesis treats Multi Video, a method that can be used to slice the perspectives into the form that the Scanning Slit display needs in order to show them correctly.

The quality of the images scaled with bilinear interpolation is satisfactory if the scale factor is kept reasonably low. The perspectives interpolated in the second part show good quality with plenty of detail but suffer from some empty areas. Further improvement of this function is not strictly necessary but would increase the image quality. An acceptable frame rate has been achieved, but the speed can still be improved. The most important continuation of this thesis is to integrate the implemented parts with the existing firmware and thereby enable a real test of the performance.



Acknowledgments

I would like to thank Setred AB for the opportunity to write this thesis, especially Joel de Vahl for endless questions, discussions and help with Ruby. Thomas Ericson has also been involved in a lot of discussions and has been of great help with proofreading. The project would not have been viable without their help. Doug Patterson has been of great help with discussions of the existing firmware and ideas for integration.

Finally, I would like to thank my nearest and dearest for their constant support.



Contents

1 Introduction
  1.1 Setred
  1.2 Background
  1.3 Hardware target
  1.4 Method
  1.5 Limitations

2 Scanning Slit 3D displays
  2.1 Seeing in 3D
    2.1.1 Monocular depth cues
    2.1.2 Binocular depth cues
  2.2 Understanding the Scanning Slit display
  2.3 Rendering for the Scanning Slit display
    2.3.1 Multi Video
    2.3.2 The Generalised Rendering Method

3 Up sampling basics
  3.1 Known techniques
    3.1.1 Nearest neighbour interpolation
    3.1.2 Bilinear interpolation
    3.1.3 Bicubic interpolation
    3.1.4 Other interpolation techniques
    3.1.5 Comparisons between nearest neighbour, bilinear and bicubic interpolation

4 Implementation of bilinear interpolation
  4.1 Discussion
  4.2 Future work

5 View interpolation
  5.1 Image sequence creation
    5.1.1 Interpolation method 1
    5.1.2 Interpolation method 2
  5.2 Image quality improvement
    5.2.1 Quality improvements for method 1
    5.2.2 Quality improvements for method 2

6 Implementation of view interpolation
  6.1 Discussion
  6.2 Future work

7 Implementation of Multi Video
  7.1 Discussion
  7.2 Future work

8 Conclusions and final discussion
  8.1 Performance
    8.1.1 Performance of the bilinear up scaler
    8.1.2 Performance of the view interpolation-Multi Video chain
  8.2 DVI load reduction
    8.2.1 DVI load reduction with the up scaler integrated
    8.2.2 DVI load reduction with view interpolation integrated
  8.3 Conclusions

Bibliography

A VHDL code example
  A.1 Block RAM instantiation
  A.2 View interpolation block


Chapter 1

Introduction

One of the first three-dimensional projections seen by a big audience was R2D2's projection of Princess Leia in the first Star Wars™ movie in 1977. Visualizations in three dimensions have always been fascinating, and comics requiring red and green glasses were popular among young people for a long time. Even though a lot of research has been done in the field of 3D visualisation, no perfectly working 3D display or projector exists today. This thesis is a part of Setred AB's development of a 3D display.

Even though the idea of 3D visualization is old, no commercial 3D display or projector with good image quality, frame rate and depth exists on the market. Research in the area has been performed since the beginning of the 20th century, and there exist different, more or less working, displays that do not require glasses or other head tracking gear. These displays do, however, all have weaknesses, since the problems that arise when trying to build one are many and the shortcuts few. All known displays today are based on stereopsis, the fact that the eyes are separated on the head and therefore capture different information. The major challenge is how to show a different image to the two eyes.

One of the big problems when it comes to image rendering is the huge amount of data that is needed and the short time available for rendering it if real-time behaviour is to be achieved. The rendering stage contains steps like capturing images, depth calculations, intermediate view interpolation and other operations. On Setred's 3D display, most of the rendering is done on the graphics card, but even though modern GPUs are powerful, the complete rendering process consists of massive calculations and data handling that take a long time to perform. If all rendering is done on the graphics card, a massive amount of data has to be transferred to the screen, which puts extremely high demands on the transfer channels. The purpose of this thesis is to move some of the operations to an FPGA located on a PCB near the projector. The implementation will be done using VHDL, the VHSIC (Very High Speed Integrated Circuit) Hardware Description Language.

The thesis consists of three major parts, which will be treated separately. The first part investigates different methods of changing the resolution of images, and one method is finally implemented. This is done in chapters 3 and 4. The second part investigates intermediate view interpolation of stereoscopic images; chapter 5 consists of theory on the subject and chapter 6 shows the solution that was implemented and how that was done. The last part treats a method that is used to modify the output images into a form that is suitable to show on the display. An introduction to this method is given in section 2.3 and the implementation is shown in chapter 7. Apart from this, chapter 2 gives a short introduction to 3D visualization and the Scanning Slit display. Image scaling and view interpolation can be used separately or together, depending on the application and/or other requirements.

Web sources that have been of great help during the thesis, and are not referred to in the text, are Ashenden [17] for VHDL, Oetiker et al. [18] for LaTeX and the Ruby Documentation [16] for Ruby.

1.1 Setred

Setred aims to become the leading provider of high-end 3D display solutions. The company's display technology is a result of joint research between the Massachusetts Institute of Technology and Cambridge University.

Setred aims to be the leader in enabling more intuitive and realistic interpretation of three-dimensional information. The company's first product is a 3D display that acts as a digital hologram. It works with any software application that uses the computer's 3D graphics card, for example CAD applications and games.

The display has a combination of properties that breaks the current compromises:

• True 3D, allowing the observer to look around objects by moving the head sideways.

• No restriction in head movement or number of viewers.

• Full colour and full resolution.

• Possible to make flat panel.

• No headgear or head tracking equipment.

There is currently a 20-inch colour prototype of the display, shown in Figure 1.1.

1.2 Background

The reason why Setred is interested in this project is that they think these components can increase the performance and flexibility of the display. If real-time behaviour can be achieved, together with independence of the input resolution, a new group of customers can be addressed. One example is ROVs, Remotely Operated Vehicles, where two cameras placed on a vehicle could display a three-dimensional projection of the environment.

Figure 1.1. Setred's 3D display.

One big advantage of performing operations in programmable hardware is that many calculations can be performed in parallel. Modern FPGAs have become very powerful and fast and can therefore handle large amounts of data and many calculations. One of the desired consequences of this implementation is that the bandwidth explosion that arises after the views have been interpolated is moved closer to the projector. This decreases the traffic on the connection channels and therefore increases the possibility of achieving a higher frame rate.

The PCB containing the FPGA is connected to a PC through DVI cables, at most eight in number. Each DVI cable has three channels, each with a maximum transfer speed of about 150 · 10^6 pixels per second [21]. Modern graphics cards often have one or two DVI outputs, which means that many graphics cards or many synchronized PCs have to be used to make use of all eight inputs. Decreasing the amount of data to be sent over the DVIs also decreases the requirements on the PC(s) running the screen.
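As a rough worked figure (an illustration only, assuming that all eight DVI inputs are used at the quoted per-channel rate), the aggregate peak input rate to the PCB is

8 cables · 3 channels · 150 · 10^6 pixels/s ≈ 3.6 · 10^9 pixels per second,

compared with roughly 0.9 · 10^9 pixels per second from a single graphics card with two DVI outputs.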

The projector, which together with a diffuser creates the screen component of Setred's 3D display, runs at a fixed resolution. This means that to show an image on the display, the image must be of this specific resolution. This is not very flexible, since different applications often create outputs with different resolutions. The use of smaller images also decreases the amount of data to be rendered by the GPU and sent to the display. To overcome this limitation, a resolution changer is implemented as near the projector as possible.

The major benefits of interpolating intermediate views on the FPGA instead of on the GPU are the same as mentioned above. It is quite obvious that the load on the connection channels between the display and the computer is reduced drastically if the main data to be sent consists of a few images and depth maps instead of a complete set of output frames. The number of output frames is usually 16 or more. It is also advantageous to perform the heavy arithmetic view interpolations on the FPGA due to the possibility of parallel calculations.

The final step, both in the signal chain and in this thesis, modifies the images into the form that the display needs to show correct 3D illusions. This operation has to be performed after the views are interpolated, and hence the function must be implemented on the FPGA as well. The ease of reading and writing the internal Block RAMs, which can be used as internal buffers, is a great benefit for this operation, and a firmware implementation is therefore not only necessary but also suitable.

1.3 Hardware target

The target FPGA is a Virtex-4, XC4VLX25, from Xilinx. This FPGA already exists on the current PCB and can therefore not be changed in this project. The XC4VLX25 is equipped with 72 Block RAMs, each with a maximum storage capacity of 18 kilobits, configurable in a number of ways as described in the Virtex-4 User Guide [20]. It is also equipped with 48 XtremeDSP slices that can be used as 18 × 18-bit multiplier, multiply-accumulator, or multiply-adder blocks.

Two external RAMs, which can be read from and written to through in and out ports on the FPGA, exist on the PCB. The connection between the PC and the PCB is the DVI(s) mentioned above.

1.4 Method

Before the implementation was started, a study of existing methods and previous work in the area was done. Some time was also spent on trying to understand the Scanning Slit display. The implementation was done using Xilinx WebPack and the simulations were performed in Mentor Graphics' ModelSim. To make the simulation results easy to interpret, small scripts were written in Ruby that parse images to text files and vice versa. These text files were read and written by the testbench.

1.5 Limitations

The functions implemented in this thesis are meant to be integrated with the existing firmware on Setred's 3D display. This integration will not be performed and explained here, since the time needed does not exist within the limits of this project. A short discussion of the integration of view interpolation and Multi Video can be found in section 7.2.

All operations will be performed on greyscale images. Colour images are inter-polated in the same way for each of the RGB channels.


Chapter 2

Scanning Slit 3D displays

Three-dimensional projections in mid-air, as in Star Wars™, have long been a fascinating dream. Although it is possible in theory to create such an effect, the practical challenges are currently too big to make an attractive commercial product, and unless major breakthroughs in the area are made, other ways to display three-dimensional objects have to be used [1]. One of the systems whose basic principles have been known for decades, and which is currently being developed by Setred, is the Scanning Slit 3D display, presented and discussed in [1]. This chapter serves as background, with the purpose of making the reader understand how Setred's 3D display works and of motivating some of the work done in this thesis, especially chapter 7. Most information is taken from Christian Møller's thesis [1], which is the reason Setred was founded. No references will be given unless the information is from another source.

First, the reader needs a basic knowledge of how the human eyes and brain work to create three-dimensional images. A short explanation of this is given in section 2.1. A short explanation of the Scanning Slit 3D display is given in section 2.2.

2.1 Seeing in 3D

The human eye is a complex creation, and details of how the eye is built and captures information will not be given here. The interested reader can find this in any book on anatomy or human vision, for example [8] or [12]. One detail that is of importance for this thesis is that the sample time for an eye capturing an image is finite. This means that the images we see actually are an integral over time, like an image captured by a camera. The sample time is a function of photon density, or put more simply the strength of the luminance, and is therefore not constant.

Another interesting field for this thesis is how humans perceive depth, which is an advanced feedback system between the eyes and the brain. The huge amount of information that is processed by the brain for this purpose is said to comprise depth cues [12]. There are many depth cues, some more important than others, and they are often divided into three categories: extraretinal cues, monocular cues and binocular cues. The two latter are described below.

2.1.1 Monocular depth cues

The monocular cues treat information received from one single eye and processed by the brain to give a perception of depth. The most important of these cues are:

• Linear perspective. Objects that extend over a large depth appear smaller the further away a point is located. A good example is a straight railroad that converges to a single point as the road reaches the horizon.

• Interposition. Interposition, or occlusion, is intuitively one of the most important depth cues. When an object is located in front of another object, parts of the object behind are hidden from the observer.

• Retinal image size. By comparing the relative size of objects of known size, their position in depth can be estimated. For example, if a man and a house have the same height in an image, the man is probably placed in front of the house.

There are several other monocular depth cues that are of importance, and the interested reader can read more in [12]. Examples of the depth cues listed above can be seen in Figure 2.1.

(a) Linear perspective.

(b) Interposition/occlusion.

(c) Retinal image size.

Figure 2.1. Examples of three monocular depth cues. Image courtesy of Christian Møller [1].

2.1.2 Binocular depth cues

The depth cues that are of great interest for this thesis are the binocular depth cues, which exist because the human eyes are separated on the head. Because of this, the two eyes capture two images that differ slightly from each other, which can be seen as a binocular disparity [12]. This disparity creates an effect known as stereopsis and forms the fundamental principle of all 3D displays that exist today. The idea of showing different perspectives to the two eyes and thereby inducing the stereoscopic effect is old; the first stereoscope was built by Wheatstone in 1838 [10]. The stereoscopic effect can be achieved in many ways, for example by using the well-known red and green glasses or by simply separating the views of the two eyes so they see different images. Figure 2.2 shows an old stereoscope.

Figure 2.2. Stereoscope.

With this as a background, an explanation of the Scanning Slit 3D display canbe given.

2.2 Understanding the Scanning Slit display

The goal of the Scanning Slit 3D display is to create a stereoscopic effect, as described in section 2.1.2. This is achieved by placing a scanning slit device, a shutter, with thin vertical slits in front of a display. The shutter acts like a filter, letting only a limited region of the screen reach the observer's eyes. As long as the slit is sufficiently narrow, the areas on the screen that can be seen by the two eyes are separated, and therefore the fundamental requirements for stereoscopy are fulfilled. Figure 2.3 shows the principle of the display viewed from above.

Figure 2.3. Principle of the Scanning Slit display.


Setred's 3D screen has a number of slits open at a time, creating repeated zones on the display. This is done to increase the effective bandwidth. Details of this are not important for the thesis and a motivation will therefore not be given; the interested reader can find a detailed description in Møller [1]. The open slits are evenly spaced with a distance, d_open slits, given by equation 2.1.

d_open slits = (screen width / total number of slits) · (total number of slits / 5) = screen width / 5    (2.1)

With screen width = 40 cm and 5 open slits, which is a possible setup on the current Setred display, d_open slits = 40/5 = 8 cm, which means that the risk of an observer seeing through two open slits is small.

One important behaviour of the shutter is that it should be invisible to the observer. This is achieved by switching the open slit(s) at a very high rate, synchronously with changing the image on the screen. Further discussion of this can be found in section 2.3.

So far, a short but straightforward explanation of the Scanning Slit display has been given, but nothing has been said about the images to show on the display and how to render them. This will be explained in the following section.

2.3 Rendering for the Scanning Slit display

3D computer graphics rendering is a huge field, and this chapter will only give a brief explanation of the theories and concepts needed to understand the work done in this thesis. First, a couple of definitions are needed.

The cone through which a camera captures light is referred to as the viewing frustum. The frustum together with the front and back clipping planes defines the viewing volume, which is the volume seen by the camera. See Figure 2.4 for an illustration.

Figure 2.4. Illustration of viewing volume with frustum, back and front clipping planes.


Stereoscopic systems like the green and red glasses have a great advantage when it comes to displaying images, since they can display complete views or perspectives to each eye all the time. For the Scanning Slit display this is not possible, and more advanced rendering methods have to be used, since a viewer looks through an ensemble of slices of several 3D perspectives. When rendering images for the Scanning Slit display, one of the challenges is to ensure that the slices visible through each slit add up to form a decent 3D perspective. This can be done by slicing the perspectives with a method known as Multi Video, described in the following section.

2.3.1 Multi Video

Multi Video was first developed to enable multiple images to be shown on a display at the same time. One possible use would be that several persons can watch different TV programs on the same display at the same time. Multi Video can also be used for 3D purposes by using several cameras and assigning each of them a Multi Video channel. In this case, a Multi Video channel is simply an image captured from one camera. It is quite clear that this method is useful for slicing the perspectives in order to synchronize the slices with the slits. With a good slicing method, the stereoscopic effect can be accomplished as described in section 2.2. A US patent exists regarding the use of Multi Video for 3D display, in which the numbers of Multi Video channels and output frames are the same [11].

(a) Four Multi Video channels and one slit.

(b) One Multi Video channel and five open slits.

Figure 2.5. Illustration of a Multi Video setup.

Figure 2.5(a) shows the frustums for the centre open slit of a four-channel Multi Video setup. Four channels is a very low number for 3D purposes but is used here for illustration. Tests have shown that increasing the number of Multi Video channels enhances the output quality [9].

Each Multi Video channel is sliced, and slices from different channels are merged to form an output frame. Figure 2.5(b) shows one single Multi Video channel. The grey lines show the frustum and the black lines the parts of the channel that will be used for the current output frame. Notice that not all channels have to be seen through each slit, as in the leftmost or bottom slit in Figure 2.5(b). Following the same idea for the other Multi Video channels, one complete output frame can be built. The next output frame is merged in the same way but with the slits shifted one step.

Two issues with Multi Video are blind zones and tearing. Blind zones can occur if the Multi Video frames are not correctly rendered, for example if the perspectives are separated too much, as in Figure 2.6(a). This is especially a problem when the number of Multi Video channels is low. If this problem is taken into consideration during the perspective rendering process, it is easy to overcome.

The other issue is, as mentioned above, tearing. This effect occurs when the viewer is positioned between two Multi Video channels and thus sees parts of two perspectives, as in Figure 2.6(b). This problem is not as easy to solve, but one solution is quite intuitive: increasing the number of Multi Video channels leads to a shorter distance between the cameras and thus an increased probability that the observer is positioned on, or near, an optimal position. An optimal position is where a virtual, or physical, camera is placed.

(a) Blind zones due to incorrect perspective rendering.

(b) Tearing due to the observer's position relative to the original perspectives.

Figure 2.6. Two possible problems with Multi Video; blind zones and tearing.

Calculating Multi Video frames

To calculate the Multi Video frames, a number of parameters have to be taken into account. A couple of these are the distance between the cameras and the shutter, the distance between the cameras, the number of cameras and the width of the shutter slits. This can be understood by looking at Figure 2.5(a). Details of the calculations will not be given here since they will not be performed in this thesis; more details can be found in chapter 7. The interested reader can find more information about Multi Video in Haavik [9].


2.3.2 The Generalised Rendering Method

A different approach to the rendering problem has been developed by Møller during his PhD, and is patented. This approach is called the Generalised Rendering Method. The major difference from the methods discussed in detail in this thesis is that instead of translating the camera along an axis and calculating perspectives, a virtual camera is placed in the open slit and the image on the film plane is warped. The Generalised Rendering Method is rather demanding to perform and will therefore not be considered further in this thesis. An implementation in firmware would be interesting to investigate, but no time for this exists within this project. The interested reader can find a detailed explanation of the Generalised Rendering Method in Møller [1].


Chapter 3

Up sampling basics

The purpose of all techniques used for image up sampling is the same: to create information that does not exist. The methods for doing this do however vary, in computation time, resource demand and output quality. Unfortunately, there are no magic techniques, and a trade-off between quality and computation time and/or resource demand basically always has to be considered. It is quite clear that the quality of an up sampled image can never be better than that of the original, but with the best techniques and small scale factors the difference in quality between the original and the up sampled image can be reasonably small. Unfortunately, the result always suffers from quality losses when the scale factor increases.

3.1 Known techniques

Various techniques for up and down sampling of images and sequences of images are used today, and they are all based on mathematical theory for numerical analysis. The most common ones are listed below. All methods have advantages and disadvantages, and to decide which to use a number of parameters have to be taken into account, for example the maximum scale factor, the available computation resources and the allowed computation time.

3.1.1 Nearest neighbour interpolation

Nearest neighbour interpolation is the most basic of the techniques discussed here. As the name hints, the pixel nearest the sample position is simply copied [4]. This creates an image that looks jagged, but at low computational cost, since no arithmetic operations need to be performed. See Figure 3.1 for an example. The crosses in Figure 3.1(a) indicate the new sample positions relative to the original image; Figure 3.1(b) shows the up sampled image.


(a) Original image with sample positions.

(b) Up sampled image.

Figure 3.1. Example of nearest neighbour interpolation.

3.1.2 Bilinear interpolation

Bilinear interpolation considers the four pixels surrounding the interpolation point. The interpolated pixel is then calculated as a weighted mean value of these four pixels [4]. The result is a much smoother looking image than after nearest neighbour interpolation. The disadvantages are the higher computation cost and that the image tends to be a little blurry, especially for big scale factors. See Figure 3.2 for the calculation principle.

Figure 3.2. Basic principle of bilinear interpolation.

Assume that the distance between two adjacent pixels is one (arbitrary length unit), both vertically and horizontally. The interpolated pixel, p, is then calculated as

p = (1 − x_v)(A(1 − x_h) + Bx_h) + x_v(C(1 − x_h) + Dx_h)
  = A + (B − A)x_h + (C − A)x_v + (D − C + A − B)x_h x_v    (3.1)
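As a small worked example (pixel values chosen arbitrarily for illustration), A = 100, B = 50, C = 80, D = 60, x_h = 0.5 and x_v = 0.25 give

p = 100 + (50 − 100) · 0.5 + (80 − 100) · 0.25 + (60 − 80 + 100 − 50) · 0.5 · 0.25 = 73.75,

a value lying closest to the A-B pair, which together carries three quarters of the total weight.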

3.1.3 Bicubic interpolation

Bicubic interpolation is based on the same idea as the bilinear but considers the 16 pixels surrounding the interpolation point. This gives a sharper result compared to bilinear interpolation, at the cost of longer computation time and/or higher resource demand. Bicubic interpolation is standard in many image editing programs, printer drivers and in-camera interpolations [15]. The calculation of the interpolated pixel is done in the same way as for bilinear interpolation but is more extensive and will therefore not be shown here. The reader can get an idea of how the calculations are performed by looking at Figure 3.3.

Figure 3.3. Basic principle of bicubic interpolation.

3.1.4 Other interpolation techniques

There are many other ways to interpolate values. It is possible to consider more surrounding pixels or to use spline, sinc or polynomial functions. These techniques do however require long computation times or a lot of resources, and an implementation would suffer from non-real-time behaviour or too much hardware allocation. These higher order interpolation techniques will therefore not be considered further in this thesis.

3.1.5 Comparisons between nearest neighbour, bilinear and bicubic interpolation

In order to decide which method to use, a number of considerations have to be made, as mentioned above. For a software implementation, the computation time is critical. For a hardware implementation, computation time and resource allocation often go hand in hand due to the possibility of parallelization. Another parameter to consider is the image characteristics; straight lines may for example often look better if they are scaled using nearest neighbour instead of a higher order interpolation function. Finally, the maximum scale factor is of great interest. For small scale factors, all of the methods may look quite good. To get an idea of the differences in image quality between nearest neighbour, bilinear and bicubic interpolation, two tests were made using Adobe Photoshop. The inputs were uncompressed images which were scaled up 800 percent using these three methods. The results can be seen in Figures 3.4 and 3.5.

Figure 3.4. Comparison between nearest neighbour, bilinear and bicubic interpolation, photographic image.

Figure 3.5. Comparison between nearest neighbour, bilinear and bicubic interpolation, synthetic image.

As expected, nearest neighbour interpolation gives a sharp image with "visible" pixels, which makes the image look a bit jagged. More interesting is the small difference between bicubic and bilinear interpolation in the photographic image; the one scaled with bicubic interpolation shows somewhat more detail, but the difference is not significant for this scale factor. The synthetic images do however show a bigger difference, as the edges are sharper in the image modified with bicubic interpolation.

One thing to remember is that multiple interpolations of the same object create a final result that might differ from one where the interpolation is done in one step. This is quite easy to understand when considering the mathematics behind the interpolation. Figure 3.6 shows the same content as Figure 3.5 but with the scaling done in five equal steps instead of one. The difference is clearly visible, and most obvious with nearest neighbour and bilinear interpolation. This is not very surprising, since these functions only consider pixels in the near surroundings of the interpolation point.

Figure 3.6. Comparison between nearest neighbour, bilinear and bicubic interpolation, synthetic image.


Chapter 4

Implementation of bilinear interpolation

After the study of the known techniques used for image scaling today, bilinear interpolation was chosen for implementation. Even though bicubic interpolation has been implemented on FPGAs before [2], the small improvement in quality is not sufficient to motivate the massive increase in computation cost.

The implementation of bilinear interpolation is quite straightforward; four values have to be multiplied by a weight and then added. The natural way of doing this is to keep the weights, c_i, in the range [0,1] such that Σc_i = 1, so that adding the products gives the interpolated value. To accomplish this, a fixed point representation of the fractions has to be used. One solution is to use constants in the range [0,8] such that Σc_i = 16, which gives a sum of products from which the interpolated value is obtained by a division by 64. This division is easily done by six right shifts, or in this case by reading the eight most significant bits of the 14-bit sum. This three-bit fixed point representation limits the accuracy to 1/8, which can be considered enough to obtain a good result.

The distance between the sample points in the original image is calculated as

dist_hor = (input image width · fixed point scale factor) / output image width    (4.1)

dist_vert = (input image height · fixed point scale factor) / output image height    (4.2)

respectively, where disthor is the horizontal sampling distance, distvert the verticaland fixed point scale factor = 8 as motivated above.

If a 640 × 480 pixel image is to be scaled to 1024 × 768 pixels, the horizontal distance is dist_hor = 640 · 8 / 1024 = 5 and the vertical is dist_vert = 480 · 8 / 768 = 5. Figure 4.1 shows an example with these values, where P_i,j are the pixels in the original image and X_i,j the interpolation points. Note that if the proportions of the image are to be preserved, dist_hor = dist_vert and only one calculation has to be performed.


With this fixed point representation of the coefficients, equation 3.1 needs to be modified. This is done in equation 4.3.

p = (8 − x_v)(A(8 − x_h) + Bx_h) + x_v(C(8 − x_h) + Dx_h)
  = 64A + 8(B − A)x_h + 8(C − A)x_v + (D − C + A − B)x_h x_v    (4.3)
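As an illustration of how equation 4.3 maps to hardware, the sketch below shows a purely combinational VHDL description of one blend of four pixels with the 3-bit fixed point weights. The entity and port names are assumptions made for illustration and are not taken from Setred's existing firmware; a real implementation would map the multiplications onto the XtremeDSP slices and pipeline the additions.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical sketch: one bilinear blend according to equation 4.3.
-- a, b, c, d are the four surrounding pixels and xh, xv the fractional
-- offsets in eighths (0 to 8). The 14-bit sum is divided by 64 by reading
-- the eight most significant bits, as described in the text.
entity bilinear_blend is
  port (
    a, b, c, d : in  unsigned(7 downto 0);
    xh, xv     : in  unsigned(3 downto 0);  -- 0..8
    p          : out unsigned(7 downto 0)
  );
end entity bilinear_blend;

architecture rtl of bilinear_blend is
begin
  process (a, b, c, d, xh, xv)
    variable top, bot : unsigned(11 downto 0);  -- horizontal blends, max 255*8
    variable acc      : unsigned(15 downto 0);  -- vertical blend, max 255*64
  begin
    top := a * (8 - xh) + b * xh;
    bot := c * (8 - xh) + d * xh;
    acc := top * (8 - xv) + bot * xv;
    p   <= acc(13 downto 6);                    -- divide the sum by 64
  end process;
end architecture rtl;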

Figure 4.1. Example of bilinear interpolation.

Figure 4.1 together with equation 4.3 gives the interpolated output, X, partly shown as a matrix in equation 4.4.

X =
\begin{pmatrix}
P_{1,1} & \frac{3P_{1,1}+5P_{1,2}}{8} & \frac{6P_{1,2}+2P_{1,3}}{8} & \cdots \\
\frac{3P_{1,1}+5P_{2,1}}{8} & \frac{3\,\frac{3P_{1,1}+5P_{1,2}}{8}+5\,\frac{3P_{2,1}+5P_{2,2}}{8}}{8} & \frac{3\,\frac{6P_{1,2}+2P_{1,3}}{8}+5\,\frac{6P_{2,2}+2P_{2,3}}{8}}{8} & \cdots \\
\frac{6P_{2,1}+2P_{3,1}}{8} & \frac{6\,\frac{3P_{2,1}+5P_{2,2}}{8}+2\,\frac{3P_{3,1}+5P_{3,2}}{8}}{8} & \frac{6\,\frac{6P_{2,2}+2P_{2,3}}{8}+2\,\frac{6P_{3,2}+2P_{3,3}}{8}}{8} & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{pmatrix}
=
\begin{pmatrix}
P_{1,1} & \frac{3P_{1,1}+5P_{1,2}}{8} & \frac{6P_{1,2}+2P_{1,3}}{8} & \cdots \\
\frac{3P_{1,1}+5P_{2,1}}{8} & \frac{(9P_{1,1}+15P_{1,2})+(15P_{2,1}+25P_{2,2})}{64} & \frac{(18P_{1,2}+6P_{1,3})+(30P_{2,2}+10P_{2,3})}{64} & \cdots \\
\frac{6P_{2,1}+2P_{3,1}}{8} & \frac{(18P_{2,1}+30P_{2,2})+(6P_{3,1}+10P_{3,2})}{64} & \frac{(36P_{2,2}+12P_{2,3})+(12P_{3,2}+4P_{3,3})}{64} & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{pmatrix}    (4.4)

One issue for all up scaling functions is that the amount of data is increased after the scaling has been performed. This means that the scaling function must be able either to control the input data flow or to have a good output controller. The latter presupposes that the operations after the up scaling can be performed at the higher data rate.

In this implementation, it is assumed that the input data flow can be controlled, for example by reading the input images from an external RAM. See Figure 4.2 for a block diagram of the implementation setup.


Figure 4.2. Block diagram of the bilinear interpolation function.

The interpolation is done as follows.

1. Read the first two rows from the RAM to the input buffers.

2. Calculate the first sample positions according to equations 4.1 and 4.2 and read the needed pixels from the buffers.

3. Interpolate the entire output rows from that set of input rows. Store the result in the output buffers.

4. Empty the output buffers and read the next row from the RAM. Jump to step 2.

A couple of things need to be said here. Since the rows are interpolated synchronously, there is no need to wait for an entire row to finish before jumping to the next stage. Only a couple of pixels are needed in the input buffers before the interpolation can start, and only a couple of pixels in the output rows need to be stored before the output buffers can be read. The explanation was done in this way for simplicity.

In stage 4, only one row needs to be read, since the latter of the two already buffered will be used for the next interpolation as well. It is up to the interpolation block to keep the rows in order. Step 1 will therefore only be performed at the beginning of a new image.

The number of output buffers needed for the interpolation depends directly on the scale factor and the timing demands. To double the size of an image, four buffers are needed for every pair of input rows, and so on. It is also possible to use only two output rows, at the cost of double computation time.

4.1 Discussion

One drawback with the constant representation used in the interpolation is that the sample positions on the rightmost side of each row can be at a distance unequal to the others. This is due to the truncation error that is a natural consequence of the fixed point representation. Figure 4.3 illustrates this problem. Only one row is shown for simplicity; the vertical interpolation is analogous to the horizontal and is omitted from the example. A small row, five pixels wide, is to be scaled to eight pixels. The sample distance is 5 · 8 / 8 = 5 according to equation 4.1, and the sample positions are marked with x in the figure.

Figure 4.3. Example of discontinuity in sample distance due to truncation.

The images that are meant to be modified with this scaling function are assumed to be of quite big sizes, at least 100 pixels, and a small discontinuity in the sample positions will therefore hardly be noticeable. Hence, this can barely be seen as a problem.

The speed of this interpolation function is strongly dependent on the hardware resources available. Each interpolation requires four multiplications according to equation 3.1, and hence the number of available multipliers determines how many interpolations can be performed at the same time and with that the minimum time required to scale up an image.
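As a rough upper bound (a back-of-the-envelope figure, assuming that all 48 XtremeDSP slices of the XC4VLX25 mentioned in section 1.3 were devoted to this block), 48 / 4 = 12 bilinear interpolations could be started every clock cycle.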

4.2 Future work

The interpolation function is not useful unless it is integrated with the existing firmware. This should not be that difficult to do if the current system is fully understood. The integration can be done with the use of an external RAM, as assumed above, or by direct communication with the preceding blocks, or with the PC if the up scaling is wanted at the beginning of the signal chain. Since the new sample positions are calculated inside the block, the interpolation function itself is quite dynamic. Two things that could be made more dynamic are:

1. The determination of how many interpolations should be performed at the same time, depending on the scale factor and perhaps on which other processes are currently running on the FPGA. To get as good performance as possible, as many interpolations as possible should be performed in parallel.

2. The allocation of the output buffers depending on the scale factor and/or the existing timing demands. If the allowed computation time makes it possible to perform more interpolations sequentially instead of in parallel, hardware resources can be saved.

For the moment, all of the factors above are fixed and can therefore not be changed after the firmware has been synthesized and programmed onto the FPGA.

Page 35: DepartmentofElectricalEngineering - DiVA portal

Chapter 5

View interpolation

Based on information from one image and the corresponding depth map, it is possible to interpolate what a camera placed a small distance to the side would capture. If information from two cameras, separated from each other and capturing the same object, is known, the number of views that can be interpolated and the quality of the views increase drastically. From stereoscopic images it is also possible to calculate a disparity map, which shows how two corresponding pixels differ in the two images. In this thesis only horizontal disparity is considered, which means that the two cameras both have to share the film plane and be vertically aligned to each other. There are several ways of calculating disparity maps, but none of the methods will be explained here; Thulin's thesis [3] contains an investigation of the pros and cons of different methods and will be used as background. In this thesis, both disparity and depth maps will be considered, in order to investigate the differences in speed and complexity between the two implementations. Interpolation from a single input, stereoscopic inputs and more than two input images will be discussed. The creation of disparity and depth maps will not be considered; it is proposed that they are created on the graphics card and fed to the FPGA in a known format.

This chapter considers methods for visualizing images on the 3D screen from the disparity or depth maps and the original single or stereoscopic images. It partly reuses work done by Thulin [3], and major parts are based on three-dimensional transformations. Since the images are to be shown on the 3D display, which is fed with a sequence of images, the focus will be on the creation of this sequence.

5.1 Image sequence creation

A number of methods for view interpolation are known, but no deep investigation of the pros and cons of all of them will be presented here. Two methods that have proven to give good results with relatively small means are chosen for further analysis. The two methods are quite similar but approach the problem in different ways. The following sections explain how the methods work and discuss their weaknesses and strengths. A number of papers on view interpolation exist; two that are of interest are Zitnick et al. [6] and Chen et al. [7].


5.1.1 Interpolation method 1

In this interpolation method the input image is read synchronously and the output position for each pixel is calculated. In other words, the disparity, dr, or depth, zr, at position xr is read and the output position xi is calculated as

xi = xr + warping coefficient · zr ·∆x+ translation coefficient ·∆x (5.1)

or

xi = xr + dr ·∆x (5.2)

where ∆x ∈ [0, 1] is the distance between the original image and the wanted view.

Figure 5.1 gives an intuitive explanation of the interpolation method for five pixels. Since the interpolation is performed row by row, only a part of one row is shown.

The depth, zr, or disparity, dr, and pixel value, pr, are read at position xr in the input image, as shown in Figure 5.1(a). The output position, xi, is calculated as shown in equation 5.1 or 5.2 and the pixel value from xr in the input image is written to this position, as shown in Figure 5.1(b). xi might be smaller than, equal to or greater than xr depending on the sign of the disparity or depth. This means that some pixels in the output row might be overwritten and some might not be written at all. The numbers in Figure 5.1(b) indicate the order in which the pixels are treated.

(a) Part of depth row. (b) Part of output image row.

Figure 5.1. Graphical explanation of interpolation method 1.
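The row-wise operation of method 1 can be summarised with the small simulation-level VHDL sketch below. The package, type and function names are assumptions made for illustration; the real firmware streams pixels through Block RAM buffers rather than operating on whole arrays, and the disparity is assumed here to be pre-multiplied by ∆x and rounded to whole pixels.

-- Hypothetical simulation sketch of interpolation method 1 (equation 5.2).
package warp_pkg is
  type pixel_row_t is array (natural range <>) of natural range 0 to 255;
  type shift_row_t is array (natural range <>) of integer;  -- d_r * dx, rounded

  -- Forward warp of one row, read from left to right (suitable for views to
  -- the right of the input; reverse the loop direction for views to the left).
  function warp_row (src : pixel_row_t; shift : shift_row_t) return pixel_row_t;
end package warp_pkg;

package body warp_pkg is
  function warp_row (src : pixel_row_t; shift : shift_row_t) return pixel_row_t is
    variable dst : pixel_row_t(src'range) := (others => 0);  -- 0 marks unwritten pixels
    variable x_i : integer;
  begin
    for x_r in src'range loop
      x_i := x_r + shift(x_r);                  -- output position, equation 5.2
      if x_i >= dst'low and x_i <= dst'high then
        dst(x_i) := src(x_r);                   -- copy pixel, possibly overwriting
      end if;
    end loop;
    return dst;
  end function warp_row;
end package body warp_pkg;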

Equation 5.2 is not very difficult to understand: dr gives the horizontal disparity between two corresponding pixels, and for ∆x = 1 this translation will, ideally for stereoscopic inputs, give the other input. Equation 5.1 is based on three-dimensional transformations and is not very intuitive. A proof is therefore shown below.

Proof

Consider Figure 5.2, where the original image is captured from camera r and the wanted perspective is given by camera i. The window coordinates {x_w, y_w, z_w} are transformed into normalized device coordinates {x_nd, y_nd, z_nd} by the inverted viewport matrix V^{-1}. The image in r is reprojected into camera space by the inverted projection matrix P_r^{-1}, giving the operator P_r^{-1}V^{-1}. Then, the camera is translated from r to i by the matrix X_{r,i}, giving the total operator X_{r,i}P_r^{-1}V^{-1}. The object is then projected to camera i by the projection matrix P_i and the coordinates are transformed back to window coordinates by V. This gives the final transform T = V(P_i(X_{r,i}(P_r^{-1} \cdot V^{-1}))), where

Figure 5.2. Illustration of the transformation.

V = \begin{pmatrix} \frac{1}{2}W & 0 & 0 & \frac{1}{2}W \\ 0 & \frac{1}{2}H & 0 & \frac{1}{2}H \\ 0 & 0 & \frac{1}{2} & \frac{1}{2} \\ 0 & 0 & 0 & 1 \end{pmatrix} = viewport matrix, concatenated so that the z-values are in [-1, 1].

P_i = \begin{pmatrix} \frac{2\,near}{right_i - left_i} & 0 & \frac{right_i + left_i}{right_i - left_i} & 0 \\ 0 & \frac{2\,near}{top - bottom} & \frac{top + bottom}{top - bottom} & 0 \\ 0 & 0 & -\frac{far + near}{far - near} & \frac{2\,far \cdot near}{far - near} \\ 0 & 0 & -1 & 0 \end{pmatrix} = reprojection matrix.

P_r = the render projection matrix, of the same form as P_i but with right_i and left_i replaced by R - \frac{x_r\,near}{D} and L - \frac{x_r\,near}{D}; it is inverted to get the unprojection matrix.

X_{r,i} = \begin{pmatrix} 1 & 0 & 0 & x_i - x_r \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} = x-translation matrix transforming from r to i.

W/H = width/height of the frustum.
near/far = distances to the near and far depth clipping planes, ≥ 0.
top/bottom = coordinates of the top and bottom horizontal clipping planes.
L/R = coordinates of the left and right vertical clipping planes.
D = distance from the camera to the projection plane.
x_i = x position of the interpolated view.
x_r = x position of the original view.
right_i = R - \frac{x_i\,near}{D}.
left_i = L - \frac{x_i\,near}{D}.

T = V(P_i(X_{r,i}(P_r^{-1} \cdot V^{-1}))) = \begin{pmatrix} 1 & 0 & \frac{W(near - far)}{(R-L)far}(x_r - x_i) & \frac{W((L-R)near + 2D(top - bottom))}{2D(R-L)(top - bottom)}(x_r - x_i) \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},

which gives

x_i = x_r + \frac{W(near - far)}{(R-L)far}(x_r - x_i) \cdot z + \frac{W((L-R)near + 2D(top - bottom))}{2D(R-L)(top - bottom)}(x_r - x_i) = x_r + (warping coefficient \cdot depth + translation coefficient)∆x,

where

warping coefficient = \frac{W(near - far)}{(R-L)far},
translation coefficient = \frac{W((L-R)near + 2D(top - bottom))}{2D(R-L)(top - bottom)},
∆x = x_r - x_i.  ∎

The quality of the interpolated image decreases as |∆x| increases, due to the fact that views far away from the original camera position contain information that does not exist in the original image. The largest problem with this interpolation method is that empty surfaces can occur, especially at positions where the depth varies a lot between adjacent pixels, for example at sharp edges. To create many perspectives over wide angles, more than one input image can be used. The information from these images can be combined in different ways to create a better interpolation. See the following section for further discussion of quality improvements.

One detail that is of great importance in the interpolation is the direction in which the rows are read. Consider Figure 5.3, where a view to the left of an input image is wanted. Assume that the input image is read from the left. A pixel A behind the image plane will be shifted to the left and written in the output frame. If, later in the interpolation stage, a pixel B is located behind the point A, it will also be shifted to the left and may overwrite pixel A. This is not desired, since points further away should be hidden by nearer points. If the input image instead is read from the right, this will not be a problem since, to use the same example, pixel A will overwrite the previously written pixel B. It can be shown analogously that the input should be read from the left if the wanted view is to the right of the input image.

Figure 5.3. Illustration to show the importance of correct read direction.

5.1.2 Interpolation method 2

In the method used by Thulin [3], the output row is built synchronously by reading the depth or disparity of a specific pixel and then calculating which pixel to copy from the input row. To do this, unique depth or disparity maps are needed for each perspective. The interpolation is illustrated in Figure 5.4. Since the interpolation is performed row by row, only a part of one row is shown.

Figure 5.4(b) shows the image warping. The disparity at position xi in the wanted view's disparity map is read. Then, the read address in the original image is calculated as shown in equation 5.3, where the ± depends on how the disparity is defined. The pixel value from position xr in the original image is written to position xi in the output image. The numbers 1-5 in Figure 5.4(b) indicate the order of the pixels being warped. Note that some pixels in the original image might not be read and some might be read more than once.

(a) Part of input and warped disparityrow.

(b) Part of input and output image row.

Figure 5.4. Graphical explanation of interpolation method 2.

xr = xi ± di (5.3)

The unique disparity maps are created as shown in equation 5.4.

di = dr ·∆x (5.4)

Figure 5.4(a) shows how the disparity map is warped according to itself. This warping is a rough approximation and assumes that the disparity is continuous and strictly limited, so that surrounding pixels have approximately the same disparity. If this is not the case, strange artifacts will occur.

This explanation was done for disparity maps, but the same discussion holds for depth maps.
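For comparison, a corresponding simulation-level sketch of method 2, reusing the types from the hypothetical warp_pkg in section 5.1.1, could look as follows. The per-view disparity row disp_i is assumed to have already been warped and scaled according to equation 5.4 and Figure 5.4(a), and the sign convention of equation 5.3 is fixed to minus here purely for illustration.

use work.warp_pkg.all;

-- Hypothetical sketch of interpolation method 2 (equations 5.3 and 5.4):
-- the output row is built pixel by pixel by reading back into the source.
package backward_warp_pkg is
  function warp_row_backward (src : pixel_row_t; disp_i : shift_row_t) return pixel_row_t;
end package backward_warp_pkg;

package body backward_warp_pkg is
  function warp_row_backward (src : pixel_row_t; disp_i : shift_row_t) return pixel_row_t is
    variable dst : pixel_row_t(src'range) := (others => 0);
    variable x_r : integer;
  begin
    for x_i in src'range loop
      x_r := x_i - disp_i(x_i);               -- read address, equation 5.3
      if x_r >= src'low and x_r <= src'high then
        dst(x_i) := src(x_r);                 -- some source pixels are read twice,
      end if;                                 -- others not at all
    end loop;
    return dst;
  end function warp_row_backward;
end package body backward_warp_pkg;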


Figure 5.5. Camera setup for view interpolation with two input images.

5.2 Image quality improvement

This section presents some ideas that can be used to improve the quality of the interpolated views. Both interpolation methods are considered.

5.2.1 Quality improvements for method 1

One way to improve the image quality of the interpolated views is to use more than one input image. Figure 5.5 shows a camera setup with two input images. If a third view at position x_i < 0.5 is wanted, one way to combine the information from the two input images is to first calculate the view from image R and store the result in a buffer, and then calculate the view from image L and store the result in the same buffer. Since L should be the dominating image, |x_i − x_L| < |x_i − x_R|, the interpolated image will basically consist of information from image L, but with gaps filled with information from image R. This increases the image quality drastically but still does not give a perfect result. One major advantage of this padding approach is that no arithmetic operations are needed to build the output images, since no pixel interpolation is performed. Another benefit of not performing pixel interpolation is that the sharpness is maintained. One disadvantage is that more input images implies longer computation time.

One way to further improve the interpolation quality is to consider a number of surrounding pixels in the output image. When an output position is calculated, check whether the pixels surrounding this position have been written earlier. If they have not, these positions can be padded with the current pixel, but without marking them as written. This means that if a "correct" pixel is to be written to the position later in the interpolation process, the padded pixel will be overwritten, and if not, there will not be an empty position. There are two major disadvantages with this surrounding pixel padding. First, knowledge of which pixels have been written and which have not is required. Even though only a single bit per pixel is needed for this, the list grows quite big, but if the space is available it should not be a problem. Another possibility is to mark the pixels in the output buffers in some way, but since the BRAMs need to be addressed and are not directly accessible like a register, such an operation would be too time consuming. The other drawback exists because the BRAMs have only one write port and hence only one


position can be written to at a time. This means that the interpolation time would increase even more.

The major quality losses occur at depth discontinuities, as mentioned above. If such discontinuities were treated with extra care, big quality improvements could probably be achieved. One way to do this is through segmentation, described in Zitnick et al. [6]. Here, the image is divided into segments, where pixels with similar depth or disparity are placed in the same segment. Different segments could then be treated differently, and boundaries can be given special treatment. One big problem with this operation is that the images have to be sorted by depth or disparity, which causes extra computation time. Another way is to smooth and use colour values for segmentation, also discussed in Zitnick et al. [6]. This will however decrease the sharpness of the image and is also time consuming.

5.2.2 Quality improvements for method 2

With this method, as with method 1, the image quality increases with the number of input views. Consider Figure 5.5 again. If a third view at position x_i is wanted, one way to combine the information from the two input images is to calculate the view from both image L and image R and make the output a mean of these two. If the two interpolated views are in addition weighted, depending on the x position, the result might be better. An even better result can be obtained by using more than two input images.
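
A minimal sketch of such a position-dependent blend is shown below, assuming 8-bit pixels and an 8-bit weight for image L that has been pre-calculated from the x position; all names and widths are illustrative and not part of the implemented design.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity weighted_blend is
  port (clk       : in  std_logic;
        pixel_l   : in  std_logic_vector(7 downto 0);  -- pixel interpolated from image L
        pixel_r   : in  std_logic_vector(7 downto 0);  -- pixel interpolated from image R
        weight_l  : in  std_logic_vector(7 downto 0);  -- weight for L, 0..255; R gets 255 - weight_l
        pixel_out : out std_logic_vector(7 downto 0));
end weighted_blend;

architecture rtl of weighted_blend is
begin
  process(clk)
    variable acc : unsigned(15 downto 0);
  begin
    if rising_edge(clk) then
      -- Weighted mean: (w*L + (255 - w)*R) / 256, using one multiplier per term.
      acc := unsigned(pixel_l) * unsigned(weight_l)
           + unsigned(pixel_r) * (255 - unsigned(weight_l));
      pixel_out <= std_logic_vector(acc(15 downto 8));
    end if;
  end process;
end rtl;

Dividing by 256 instead of 255 introduces a small bias but replaces the division with a simple bit selection.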

This method does not suffer from empty spaces as method 1 does. Instead of empty spaces, fields with the same pixel value may occur, making the image look smeared out. Which of the artifacts damages the image most depends on the characteristics of the image.

The stage where the disparity map is warped according to itself is, as said above, a rough approximation. One way to enhance this approximation is to make the transformation in many small steps. This could increase the accuracy of the disparity at the cost of longer computation time.


Chapter 6

Implementation of view interpolation

Initially, both method 1 and 2 were implemented and tested, but since method 1 has some major benefits the work was at a later stage focused on this method. The decisive factors for this choice were:

• The same depth maps can be used for all outputs; there is no need to calculate a depth map for each perspective.

• No arithmetic operations need to be performed to combine information from more than one input image.

• The sharpness of the images is maintained.

The crucial limitations to consider are the available number of multipliers and the amount of internal Block RAM (BRAM), which is suitable to use as buffers. The factors mentioned above all demand one or both of these resources.

The implementation uses depth maps to warp the images. This choice was made because the depth maps are already created in the rendering performed on the graphics card, using OpenGL, and the first use of this application will be together with computer-synthesized images. If disparity maps were to be used, the depth maps would have to be transformed into disparity maps, which is an unnecessary operation right now.

Since one entire sequence is created from the same set of depth maps and images, the different views can be calculated in parallel on the FPGA. The implementation is done for two input images, since this is a probable setup and gives good results over a relatively wide viewing angle. The input images and depth maps are assumed to be stored in external RAM on the PCB, as described in chapter 4, so that full control of the input data flow is obtained. Reading and writing to this RAM will not be considered here. The internal BRAM will be used as buffers on the input, after the view interpolation and on the output after the Multi Video slicing. See Figure 6.1 for a block diagram of the entire chain of view interpolation and Multi Video slicing. The interpolation is, once the theory discussed


in chapter 5 is understood, quite straightforward and can be explained as follows:

1. When the input buffers are not empty, the read address creator creates a read address for them. This address is also sent to the input selector, together with a signal that tells whether the current transformation is the first or second of that particular row.

2. The input selector receives the data from the input buffers, the read address and the control signal, and calculates warping and translation factors for all perspectives.

3. The data, address, warping and translation factors are sent to the transformation unit, which calculates the output positions (a fixed-point summary is given after this list).

4. The data is written to the calculated positions in the view buffers. If the row is finished and two interpolations have been performed, as discussed in section 5.2.1, Multi Video slicing can be performed. Otherwise, return to step 1.
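
Judging from the transformation unit in appendix A.2, the output position calculation in step 3 can be summarized by the fixed-point expression

x_out = x_in + (d · warp_factor + transl_factor) / 2^8,

where d is the depth value of the current pixel, warp_factor and transl_factor are pre-calculated for each perspective, and the division by 2^8 = 256 is realized by discarding the eight least significant bits of the intermediate sum.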

Figure 6.1. Simplified block diagram for view interpolation and Multi Video. Not all connections are shown.

Figures 6.2 and 6.3 show two input images, the left- and rightmost, and two intermediate perspectives that have been interpolated with the implementation described above. Even though small artifacts can be seen, the quality is astonishingly good. On the right side of the interpolated images of the ant, vertical lines are visible. These lines are positions that have not been written; since the buffers are not emptied between rows, some pixels keep the value from the previous row.


Figure 6.2. Input images and interpolated intermediate perspectives, ant.

Figure 6.3. Input images and interpolated intermediate perspectives, space ship.

6.1 Discussion

The result obtained by this view interpolation function is better than the author's original expectation. Further improvements can be made, but the result is already satisfactory.

The first part of the interpolation, where the image furthest away from the wanted perspective is used, does a lot of work that is later overwritten. This is quite unnecessary, but the author could not come up with a good way to avoid this redundancy. One method that would decrease the computation time, at the cost of more buffering, is to interpolate the same perspective from both input images into two separate buffers at the same time and also keep track of which pixels interpolated from the dominating image have been written. The dominating image is the left one for a perspective to the left and vice versa, as discussed in section 5.2.1. The pixels that have not been written from the dominating image would then be filled with the corresponding pixels from the non-dominating image. This method would use one more buffer and one more multiplier per view, but the average computation time would decrease quite drastically.

6.2 Future work

As discussed in section 5.2, quality improvements lead to longer computation time. Despite this, some extra filling should be done to eliminate the vertical lines that are obvious in Figure 6.2. The surrounding pixel padding is very easy to implement and should be further investigated.

The function should, together with the Multi Video slicer, be integrated with the existing firmware. This is briefly discussed in section 7.2.


Chapter 7

Implementation of Multi Video

The implementation of Multi Video is quite simple; see Figures 7.1 and 7.2 for illustrations of the basic principles of the operation. For a shutter with slits of equal size, the width of each slice should be the same. This means that basically all information that is needed before starting to slice the perspectives is which perspective to start with and the width of each slice. Which perspective to start with is determined by the shutter setup, and the slice width can be calculated as

slice width = image width / (number of zones · number of Multi Video channels). (7.1)
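
As a purely illustrative example, assuming a 1024 pixel wide image, 4 shutter zones and 16 Multi Video channels (the actual number of zones is not restated here), equation 7.1 gives

slice width = 1024 / (4 · 16) = 16 pixels,

so each perspective would contribute 16 pixel wide slices to the output frame.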

This gives a fairly complete description of the method, but some problems and limitations still exist. One problem is the beginning and end of each row, where the slice is seldom as wide as the others, as illustrated in Figure 7.3. Notice that only a few cameras and frustums are shown and that the scale might not be correct, all for illustrative purposes. To determine the width of the outer slices, more advanced calculations have to be done.

One limitation of the approach mentioned above is that all shutter slits have to be of the same width. If different slit widths are wanted, or varying slice widths are wanted for any other reason, the width of each slice has to be calculated separately.

As mentioned in section 2.3.1, the function used to calculate the Multi Video frustums is a function of many variables. Setred has earlier developed a good and dynamic software application that calculates the frustums for an arbitrary number of Multi Video channels and output frames, and this application will be used for this thesis. The reason for not using the method mentioned above, or calculating the frustums on the FPGA, is that maximal flexibility is wanted. Even though the parameters to the function can be sent to the FPGA, changes in the function are hard to make if it is implemented in hardware. The calculation of the frustums is not that demanding, only has to be performed once for each camera setup, and does therefore not burden the GPU very much. The amount


Figure 7.1. Example of Multi Video output frame.

Figure 7.2. Another example of Multi Video output frame.

of data that needs to be sent to the FPGA is relatively small as well, since the frustum values are only sent to the FPGA at start-up or when the camera setup is changed. During operation, the frustum values are stored in BRAM.

The slicing process is very simple and is done as follows (a hardware sketch is given after the list):

1. Compare the frustum with the current address. If the address is a threshold value, where the next perspective slice begins, jump to point 2. Otherwise, jump to point 3.

2. At a threshold, the Multi Video channel should be changed, as illustrated in Figure 7.1. Read the pixel from the updated channel and write it to the output frame. Read the new frustum threshold from memory and increase the write address. Jump to point 1.

3. Inside a slice, read the pixel from the current channel and write it to the output buffer. Increase the address and jump to point 1.
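
A minimal sketch of the control logic for this slicing is given below; it assumes 16 channels, 1024 pixel rows and that the next threshold is available one clock cycle after it is requested, and the entity and signal names are illustrative rather than taken from the implemented slicer.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mv_slicer is
  port (clk        : in  std_logic;
        start_row  : in  std_logic;             -- pulse at the start of each row
        start_chan : in  unsigned(3 downto 0);  -- first perspective, given by the shutter setup
        threshold  : in  unsigned(9 downto 0);  -- next frustum threshold, read from BRAM
        get_thr    : out std_logic;             -- request the next threshold
        channel    : out unsigned(3 downto 0);  -- selects which view buffer to read from
        address    : out unsigned(9 downto 0)); -- common read/write address, no warping
end mv_slicer;

architecture rtl of mv_slicer is
  signal addr : unsigned(9 downto 0) := (others => '0');
  signal chan : unsigned(3 downto 0) := (others => '0');
begin
  process(clk)
  begin
    if rising_edge(clk) then
      get_thr <= '0';
      if start_row = '1' then
        addr <= (others => '0');
        chan <= start_chan;
      else
        if addr = threshold then
          -- Threshold reached: switch channel and request the next threshold (step 2).
          chan    <= chan + 1;  -- wraps modulo 16
          get_thr <= '1';
        end if;
        -- Inside a slice the address is simply incremented (step 3).
        addr <= addr + 1;
      end if;
    end if;
  end process;
  channel <= chan;
  address <= addr;
end rtl;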

Notice that no warping or shifting is done in this process, which means that the read and write addresses should always be the same for all pixels. As the view


Figure 7.3. Illustration of different slice widths at the screen edge.

(a) Multi Video sliced ant model. (b) Multi Video sliced ship model.

Figure 7.4. Outputs from the Multi Video slicer.

buffers are read synchronously from top to bottom, it is suitable to read the pixels at a certain position in all views at the same time and swap output buffers as each threshold is reached. This results in a big multiplexer/decoder structure.

Figure 7.4 shows two output frames from the Multi Video unit with 16 Multi Video channels. Figure 7.4(a) shows the ant and Figure 7.4(b) the space ship from chapter 6.

7.1 Discussion

The number of frustum sets is always the same as the number of output frames, and since the frustums are only written in the setup phase, it is suitable to use the same buffers for the frustum values as for the output frames. The output row is not more than 1024 pixels wide, which means that positions 1024 and up are available for frustum values, if the addressing starts at zero. The BRAMs can be implemented with dual read ports, as described in [19], which means that this RAM sharing does not result in read collisions. The Block RAMs are 18 Kb each and can be


configured in different ways [20]. To fit the frustums, the memory would need to be 10 bits wide, since the row is 1024 pixels wide and 2^10 = 1024. This is however not suitable, since the BRAM can be configured as 16 bit × 1024 or 8 bit × 2048 according to [20]. The latter configuration can be used if extra care is taken when the frustums are read and written. Eight bits means that the maximum value that can be stored is 255. If the frustum values are stored as the real value modulo 256 and this is compensated for in the reading process, an eight bit representation of the frustum thresholds can be used. Since the frustum values are increasing, such a compensation is easy to implement. If, for example, 250 is the current threshold and the next is read as five, the correct next threshold is 256 + 5 = 261.
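
A minimal sketch of such a compensation is shown below, assuming 8-bit thresholds stored modulo 256 and a monotonically increasing threshold sequence; the entity and signal names are illustrative only.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity frustum_unwrap is
  port (clk        : in  std_logic;
        reset      : in  std_logic;
        load       : in  std_logic;                      -- pulse when a new 8-bit threshold is read
        value_mod  : in  std_logic_vector(7 downto 0);   -- threshold stored modulo 256
        value_full : out std_logic_vector(10 downto 0)); -- reconstructed threshold
end frustum_unwrap;

architecture rtl of frustum_unwrap is
  signal offset : unsigned(10 downto 0) := (others => '0'); -- current multiple of 256
  signal last   : unsigned(7 downto 0)  := (others => '0'); -- previous modulo value
begin
  process(clk, reset)
  begin
    if reset = '1' then
      offset     <= (others => '0');
      last       <= (others => '0');
      value_full <= (others => '0');
    elsif rising_edge(clk) then
      if load = '1' then
        -- The thresholds increase monotonically, so a wrap-around is detected
        -- whenever the new modulo value is smaller than the previous one.
        if unsigned(value_mod) < last then
          offset     <= offset + 256;
          value_full <= std_logic_vector(offset + 256 + unsigned(value_mod));
        else
          value_full <= std_logic_vector(offset + unsigned(value_mod));
        end if;
        last <= unsigned(value_mod);
      end if;
    end if;
  end process;
end rtl;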

It could be interesting to perform the frustum calculations on the fly on the FPGA instead of pre-calculating them all and storing the data. This would decrease the traffic on the connection channel and the workload for the PC, at the cost of lower flexibility.

7.2 Future work

The most important next step in this work is to integrate the view interpolation function and the Multi Video slicer with the existing firmware. To do this, a good understanding of the existing system is required. Some intermediate controllers to create control signals might also be needed. As mentioned in chapter 1, the images are assumed to be placed in external RAM, and a controller for this RAM should be created or, if one exists, modified.

Tests have shown that increasing the number of Multi Video frames increases the image quality on the display [9]. The 16 channels used in this thesis is a relatively small number, as the best quality is achieved with about 200 channels, where for example the tearing is almost gone. When trying to render this number of perspectives on the FPGA, the resources will be insufficient: 200 rows cannot be buffered at the same time, and 200 multiplications cannot be performed simultaneously. On the other hand, the number of output frames will be low in comparison, and a lot of redundant information will therefore be created. If only the needed pieces of each perspective could be interpolated, this quality improvement would not increase the computation cost very much. With method 2 this is no problem, since the gather approach makes it possible to pick a pixel in the output frame and interpolate its value. The problem with this method is that the depth map has to be modified for each perspective. If a limit on the maximum allowed disparity is set, an area covering the wanted slice and its near surroundings can be interpolated using method 1. This would decrease the workload but still lead to redundant calculations. A further investigation of this problem would be interesting and would probably give quite good quality improvements.


Chapter 8

Conclusions and final discussion

When the functions have been implemented, some measurements and discussions of performance are of interest. These, together with some conclusions and overall discussion, are given in this chapter.

8.1 Performance

This section contains some performance measurements of the bilinear interpolation block and the view interpolation-Multi Video block, as well as some further discussion of possible performance improvements. Remember that these measurements and discussions only treat the implemented blocks; the existing firmware has not been taken into account. The latency of the complete signal chain from the PC to the screen might be significantly higher.

8.1.1 Performance of the bilinear up scaler

The performance of the bilinear interpolation block is strongly dependent on a number of factors, as discussed in section 4.1. The decisive factor is the available hardware resources, which determine how much of the interpolation can be done in parallel. With unlimited hardware resources one entire image can basically be interpolated in a couple of clock cycles, but with limited resources the computation time increases. Two examples are shown in Table 8.1, the first producing one output pixel per clock and hence using four multipliers, and the second using all 48 multipliers that exist in the XC4VLX25. The values are theoretical and might differ slightly from the real values. A clock frequency of 200 MHz is assumed.
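
As a rough sanity check of the second row of Table 8.1, a 1024 × 768 image processed at 12 pixels per clock needs 1024 · 768 / 12 = 65536 clock cycles, and 65536 / (200 · 10^6) ≈ 0.33 ms.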


Table 8.1. Calculation times for two different implementations.

Pixels per clock    Number of multipliers required    Calculation time per image @ 200 MHz
1                   4                                 3.90 ms
12                  48                                0.33 ms

8.1.2 Performance of the view interpolation-Multi Video chain

The latency induced for each 1024 pixel row in the view interpolation-Multi Video chain is shown in Table 8.2. Both the number of clock cycles and the time in microseconds for a clock frequency of 200 MHz are given. The rightmost column gives the latency for an entire 1024 × 768 set in milliseconds. All numbers are measured in ModelSim.

Table 8.2. Latencies measured in ModelSim.

Process                 Row latency        Row latency        Latency, entire set
                        [clock cycles]     [µs @ 200 MHz]     [ms @ 200 MHz]
Filling in buffers      1024               5.12               3.93
View interpolation      2060               10.3               7.91
Multi Video slicing     1041               5.21               4.00
Read out buffers        1026               5.13               3.94
Total                   5151               25.76              19.78

A total latency of 19.78 ms gives a theoretical maximal frame rate of 1/(19.78 · 10^−3) ≈ 50 Hz, which should be enough to avoid flicker. This can however be optimized a bit, since the values above assume that all the operations are performed sequentially, meaning that only one operation is performed at a time. Looking at the signal chain it is quite clear that many of the operations can be performed simultaneously. If the left input row is stored from the left and the right input row from the right, according to the discussion of read directions in section 5.1.1, the input buffers can be read as soon as they contain useful data. Besides this, an improvement that can be made directly, without any assumptions or limitations, is that the input buffers can be filled while the output buffers are read. This change is very easy to make and would decrease the latency to about 16 ms, and hence enable a frame rate of about 63 Hz.
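
A rough check of this figure, using the row latencies in Table 8.2 and assuming that the buffer filling is completely overlapped with the buffer readout, gives

19.78 − 3.94 ≈ 15.8 ms and 1/(15.8 · 10^−3) ≈ 63 Hz,

which agrees with the estimate above.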

The FPGAs are currently intended to run at 200 MHz, limited by the RAM. If a faster RAM is used, the clock frequency can probably be increased, enabling a higher frame rate.


8.2 DVI load reduction

This section discusses the reduction of the traffic on the DVI cables and the consequences of this.

8.2.1 DVI load reduction with the up scaler integrated

Tests have shown that small up scalings can be performed with bilinear interpolation without major quality deterioration. The maximal scale factor that gives almost invisible distortion is about two. Sending an image of half the width reduces the amount of data by a factor of four, if the proportions are to be preserved. Consider a single DVI connection, where a maximum of 3 · 150 · 10^6 = 450 · 10^6 pixels per second can be sent according to section 1.2. Input greyscale images of 1024 × 768 pixels can, using all three channels, be sent at a maximal frame rate of (3 · 150 · 10^6)/(1024 · 768) ≈ 572 Hz. If the views are interpolated on the graphics card, a minimum of 16 frames will be sent, limiting the frame rate to 572/16 ≈ 35 Hz. If the input frames instead are 640 × 480 pixels and scaled up on the FPGA, the maximal frame rate over one DVI is (3 · 150 · 10^6)/(640 · 480 · 16) ≈ 91 Hz.

8.2.2 DVI load reduction with view interpolation integrated

The maximal frame rate achieved with 16 frames and one single DVI is, as calculated in the previous section, approximately 35 Hz. If the view interpolation is instead performed on the FPGA from stereoscopic inputs, the amount of data that needs to be sent over the DVI is reduced by a factor of four. This means that the maximal possible frame rate is increased by a factor of four, to 140 Hz.

Note that the calculations made above are for greyscale images; the difference for colour images is larger. Since the colour images are coded in RGB, 16 · 3 = 48 frames need to be sent, limiting the frame rate to approximately 12 Hz for a single DVI. For interpolation from stereoscopic inputs on the FPGA, 2 · 3 = 6 colour images and two depth maps have to be sent, limiting the frame rate to approximately 70 Hz. This means that flicker-free sequences can be shown from one single DVI without massive calculations on the graphics card. With this behaviour, a relatively simple PC can be used to feed images to the screen.
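
As a check, 450 · 10^6 / (1024 · 768 · 48) ≈ 12 Hz for the 48 colour frames, and 450 · 10^6 / (1024 · 768 · (6 + 2)) ≈ 72 Hz when only the six colour images and two depth maps are sent, consistent with the figures above.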

8.3 Conclusions

The measurements discussed in this chapter show that performance enhancements can be achieved if parts of the rendering process are performed on the FPGA. The major advantage is however the reduced requirements on the PC and DVI connections, making the screen more flexible since no special components have to be used. Some work is still needed before these improvements are of use. First, the blocks need to be integrated with the existing firmware. To do this, a good understanding of the firmware is required and some extra control blocks might be needed. This integration should not be a problem, but requires time-consuming work, time that was not available within the frame of this thesis. When the functions


have been integrated and tested, further quality and speed improvements can be made. Some ideas for improvements are discussed in earlier chapters.


Bibliography

[1] Christian Møller. Scanning Slit 3D displays. PhD thesis. Cambridge University. 2005.

[2] Real-time FPGA-based architecture for bicubic interpolation: an application for image scaling. IEEE Computer Society. 2005.

[3] Oscar Thulin. Intermediate View Interpolation of Stereoscopic Images for 3D-Display. Linköpings Universitet. 2006.

[4] Robert J. Schalkoff. Digital image processing and computer vision. 1989. ISBN 0-471-50536-6.

[5] GL Functions. Microsoft Developer Network. <http://msdn2.microsoft.com/en-us/library/ms537040.aspx>. 17 August 2007.

[6] C. Lawrence Zitnick, Sing Bing Kang, Matthew Uyttendaele, Simon Winder and Richard Szeliski. High-quality video view interpolation using a layered representation. 2004.

[7] Shenchang Eric Chen and Lance Williams. View Interpolation for Image Synthesis. ACM Press. 1993. ISBN 0-89791-601-8.

[8] Martin J. Tovée. An introduction to the visual system. Cambridge University Press. 1996. ISBN 0-521-48290-9.

[9] Ola Haavik. Improved rendering on scanning slit 3D displays. Norwegian University of Science and Technology. 2007.

[10] Charles Wheatstone. Contributions to the physiology of vision. Part the first. On some remarkable, and hitherto unobserved, phenomena of binocular vision. Philosophical Transactions.

[11] Deiter Just and Hartmunt Runge. Technique for autostereoscopic image, film and television acquisition and display by multiaperture multiplexing. United States Patent 6674463. 2004.

[12] Michael W. Levine. Fundamentals of Sensation and Perception. Oxford University Press. 2000. ISBN 0-1985-2467-6.

[13] Tomaso Poggio. Vision by man and machine. Part of Readings from Scientific Journal - The Perceptual World. 1990. ISBN 0-7167-2068-X.

[14] Xilinx Virtex-4 Family Overview. <http://direct.xilinx.com/bvdocs/publications/ds112.pdf>. 17 August 2007.

[15] Digital Image Interpolation. <http://www.cambridgeincolour.com/tutorials/image-interpolation.htm>. 13 August 2007.

[16] RDoc Documentation. <http://www.ruby-doc.org/core/>. 13 August 2007.

[17] Peter J. Ashenden. The VHDL Cookbook. First Edition. <http://tams-www.informatik.uni-hamburg.de/vhdl/doc/cookbook/VHDL-Cookbook.pdf>. 20 March 2007.

[18] Tobias Oetiker, Hubert Partl, Irene Hyna and Elisabeth Schlegl. The Not So Short Introduction to LaTeX2ε. Version 4.20. <http://www.ctan.org/tex-archive/info/lshort/english/lshort.pdf>. 7 May 2007.

[19] Xilinx Synthesis Technology User Guide. Xilinx, Inc. <http://toolbox.xilinx.com/docsan/xilinx5/pdf/docs/xst/xst.pdf>. 2 April 2007.

[20] Virtex-4 User Guide. UG070 v2.3. Xilinx, Inc. <http://www.xilinx.com/bvdocs/userguides/ug070.pdf>. 21 August 2007.

[21] Digital Visual Interface. Digital Display Working Group. Revision 1.0. <http://www.ddwg.org/lib/dvi_10.pdf>. 3 September 2007.


Appendix A

VHDL code example

A.1 Block RAM instantiation

This code instantiates the internal Block RAMs, as shown in the Xilinx Synthesis Technology User Guide [19].

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- The constants address_width, pixel_size and memory_depth are assumed to be
-- declared in a shared package, which is not shown in this appendix.

entity row_memory is
  port (clk       : in  std_logic;
        we        : in  std_logic;
        w_address : in  std_logic_vector(address_width downto 0);
        r_address : in  std_logic_vector(address_width downto 0);
        di        : in  std_logic_vector(pixel_size downto 0);
        do_prim   : out std_logic_vector(pixel_size downto 0);
        do_sec    : out std_logic_vector(pixel_size downto 0));
end row_memory;

architecture row_memory_arch of row_memory is

  type ram_type is array (0 to memory_depth)
    of std_logic_vector(pixel_size downto 0);

  signal RAM               : ram_type;
  signal read_address_prim : std_logic_vector(address_width downto 0);
  signal read_address_sec  : std_logic_vector(address_width downto 0);

begin

  process (clk)
  begin
    if (clk'event and clk = '1') then
      -- Synchronous write through the single write port
      if (we = '1') then
        RAM(to_integer(unsigned(w_address))) <= di;
      end if;
      -- Register the read addresses for the two read ports
      read_address_prim <= r_address;
      read_address_sec  <= w_address;
    end if;
  end process;

  -- Primary read port (r_address) and secondary read port (w_address)
  do_prim <= RAM(to_integer(unsigned(read_address_prim)));
  do_sec  <= RAM(to_integer(unsigned(read_address_sec)));

end row_memory_arch;

A.2 View interpolation block

This code performs the interpolation of the wanted perspectives. The warp_factor and transl_factor are pre-calculated as explained in chapter 6.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- The constants pixel_size, depth_width, address_width, coeff_size and
-- input_width are assumed to be declared in a shared package, which is not shown.

entity transformation_unit is
  port (clk           : in  std_logic;
        reset         : in  std_logic;
        enable        : in  std_logic;
        pixel_in      : in  std_logic_vector(pixel_size downto 0);
        depth_in      : in  std_logic_vector(depth_width downto 0);
        check_address : in  std_logic_vector(address_width downto 0);
        warp_factor   : in  std_logic_vector(coeff_size downto 0);
        transl_factor : in  std_logic_vector(coeff_size downto 0);
        pixel_out     : out std_logic_vector(pixel_size downto 0);
        write_address : out std_logic_vector(address_width downto 0);
        we            : out std_logic);
end transformation_unit;

architecture transformation_unit_arch of transformation_unit is

  signal pixel_tmp         : std_logic_vector(pixel_size downto 0);
  signal pixel_tmp_2       : std_logic_vector(pixel_size downto 0);
  signal write_address_tmp : signed(33 downto 0);
  signal depth_tmp         : signed(25 downto 0);
  signal tmp               : signed(10 downto 0);

begin

  process(clk, reset)
  begin
    if reset = '1' then
      we                <= '0';
      pixel_tmp         <= (others => '0');
      pixel_tmp_2       <= (others => '0');
      pixel_out         <= (others => '0');
      depth_tmp         <= (others => '0');
      write_address_tmp <= (others => '0');
      write_address     <= (others => '0');
      tmp               <= (others => '0');
    elsif clk'event and (clk = '1') then
      if enable = '1' then

        -- Delay the pixel to match the three-stage address pipeline
        pixel_out   <= pixel_tmp_2;
        pixel_tmp_2 <= pixel_tmp;
        pixel_tmp   <= pixel_in;

        -- Pipelined pixel position calculation, 8 fractional bits
        tmp               <= signed('0' & check_address);
        depth_tmp         <= signed('0' & depth_in) * signed(warp_factor);
        write_address_tmp <= (signed(tmp) & "00000000") +
                             (depth_tmp + signed(transl_factor));

        -- Output write address to the view buffers
        write_address <= std_logic_vector(write_address_tmp(17 downto 8));

        -- If the pixel is warped outside the image, do not write
        if unsigned(std_logic_vector(write_address_tmp(18 downto 8)))
             < '0' & unsigned(input_width) then
          we <= '1';
        else
          we <= '0';
        end if;

      end if;
    end if;
  end process;

end transformation_unit_arch;



Copyright

The publishers will keep this document online on the Internet — or its possible replacement — for a period of 25 years from the date of publication barring exceptional circumstances.

The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for his/her own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/

© Martin Wahlstedt