
van Zon and Ali: Automated Video Chain Optimization 593

AUTOMATED VIDEO CHAIN OPTIMIZATION Kees van Zon and Walid Ali

Philips Research, Briarcliff, NY, USA

ABSTRACT

Video processing algorithms found in complex video appliances such as television sets and set-top boxes exhibit an interdependency that makes it difficult to predict the picture quality of an end product before it is actually built. This quality is likely to improve when algorithm interaction is explicitly considered. Moreover, video algorithms tend to have many programmable parameters, which are traditionally tuned in manual fashion. Tuning these parameters automatically rather than manually is likely to speed up product development. We present a methodology that addresses these issues by means of a genetic algorithm that, driven by a novel objective image quality metric, finds high-quality configurations of the video processing chain of complex video products.

1 INTRODUCTION

Ever since the invention of television, the communication of information in the form of video has played an important role in the global human society. With TVs still going strong, the advent of the PC, the internet, and cellular telephony have brought recent boosts to video communication. In the midst of ongoing innovations, one may observe the video delivery chain - all processing steps applied to the video signal on its path from source (camera) to destination (screen) via transmission in time and space - to become increasingly complex. Restricting ourselves for practical purposes to the area of consumer television, Moore’s Law has allowed the introduction of low-cost high-speed digital signal processing that led to a proliferation of new features at the receiver side, to new compression-based transmission standards that allow the communication of higher resolution video in the available bandwidth (but also introduce new types of artifacts), and to new flat display technologies such as plasma and LCD whose characteristics differ widely from the traditional cathode ray tube (CRT). Meanwhile, competition in the global economy continues to drive the TV market to shorter product cycles and to higher performances at lower cost. In the IC industry, which delivers many of the ever more complex TV components, these often contradicting requirements are addressed by increasingly sophisticated IC design tools; and, with a TV’s software content growing rapidly due to the advent of programmable components for both control and signal processing

Manuscript received June 25, 2001

purposes, more and more emphasis is also put on system design tools that support hardware/software co-design.

As image quality remains an important distinguishing feature in a mature television market, many of the digital processing functions introduced in (especially high-end) TV sets focus on image enhancement. Some typical functions are contrast enhancement, sharpness enhancement, noise reduction, blocking artifact reduction, scan rate conversion, and deinterlacing; many others exist [1]. The growing complexity caused by the introduction of new and more complicated video enhancement functions has consequences, however, for the overall system behavior. Complexity and video quality can be inversely related: if not properly managed, increased complexity may, ironically, degrade the image quality even when introduced for the opposite purpose. In this paper, we will investigate that premise and propose a functional design methodology which may help TV manufacturers as well as TV component manufacturers to further improve video quality while speeding up product development.

2 DEFINING VIDEO PROCESSING CHAINS

We focus on digital video processing functions that operate in the YUV domain or in the RGB domain, which covers virtually all image enhancement functions. Each function is regarded as a black box whose behavior can only be controlled through interfaces that are provided for that purpose. Of a set of N such functions, each function f_n has the following specifiers:

- an identifier n ∈ {1, ..., N}
- an input bit precision B_in,n ∈ {B_min ... B_max}
- an output bit precision B_out,n ∈ {B_min ... B_max}
- a set of P_n programmable parameters {P_n,1 ... P_n,P_n}, where P_n,i ∈ {P_n,i,min ... P_n,i,max} ∀ i = 1...P_n

Here, B_min and B_max represent the lowest and highest interface precision (with corresponding internal precision) to be considered, and P_n,i,min and P_n,i,max are the minimum and maximum value of parameter P_n,i, respectively.

For the sake of simplicity, we furthermore assume that no two functions have the property of commutativity, i.e.,

∀ a, b ∈ {1 ... N}, a ≠ b: f_a(f_b(·)) ≠ f_b(f_a(·))   (1)

0098-3063/00 $10.00 © 2001 IEEE


This often holds true for video processing functions, many of which are non-linear in nature, e.g., due to adaptive (data dependent) processing. Commutativity can also be lost in other ways, e.g., when a change in sampling rate occurs. Consider, for example, a sharpness enhancement function which enhances the perceived sharpness of an image by boosting high-frequency components, and a downscaling function which reduces the size of an image. When the sharpness is enhanced first and the result scaled down in size, the high frequencies that were originally added will tend to be eliminated, thereby reducing the effect of the sharpness enhancement function (Figure 1a). When the order of these functions is reversed, however, the sharpness of the smaller image will be enhanced after scaling and a sharper image will result (Figure 1b).

a) sharpness enhancement followed by downscaling

IEEE Transactions on Consumer Electronics, Vol. 47, No. 3, AUGUST 2001

b) downscaling followed by sharpness enhancement

Figure 1 - impact of function order
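The order dependence of Figure 1 can be reproduced with a toy one-dimensional sketch (our own illustrative code, not from the paper; `sharpen` and `downscale` are simplistic stand-ins for real video functions):

```python
# Toy 1-D illustration of why sharpness enhancement and downscaling
# do not commute (cf. equation (1) and Figure 1).

def sharpen(x, gain=1.0):
    """Unsharp masking: boost the difference between a sample and
    the mean of its three-sample neighborhood (endpoints untouched)."""
    out = list(x)
    for i in range(1, len(x) - 1):
        local_mean = (x[i - 1] + x[i] + x[i + 1]) / 3.0
        out[i] = x[i] + gain * (x[i] - local_mean)
    return out

def downscale(x):
    """2:1 downscaling with a simple two-tap averaging (anti-alias) filter."""
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x) - 1, 2)]

signal = [0, 0, 0, 10, 0, 0, 0, 10, 0, 0]   # two isolated "edges"

a = downscale(sharpen(signal))   # Figure 1a: sharpen, then downscale
b = sharpen(downscale(signal))   # Figure 1b: downscale, then sharpen
print(a != b)                    # the two orders give different results
```

The anti-alias filter removes part of the high-frequency boost added by `sharpen`, so applying `sharpen` after `downscale` retains more of the enhancement, mirroring the argument in the text.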

We need two more assumptions in order to define a video processing chain for our purposes: i) all N functions operate on the same video component (e.g., luminance (Y), red (R), blue (B)); ii) the N functions are applied in strictly cascaded fashion. A generic video processing chain C_N can now be defined as any cascade of the N video processing functions that we started out with. Many different configurations of this chain are possible, defined by the specifiers of the individual functions as well as by the order in which these functions appear in the chain. The latter plays a role because of the non-commutativity property defined by equation (1). Each configuration of the chain can now be identified uniquely as

C_N = C(k, B_in,n(k), B_out,n(k), P_n(k),i)   (2)

where k represents the k-th permutation of the N functions, B_in,n(k) and B_out,n(k) are the input and output bit precision of the n-th function in the chain, and P_n(k),i is the value of the i-th parameter of the n-th function in the chain, respectively. Note that the identifier n of the n-th function in the chain depends on the permutation of the functions, hence the notation n(k).
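A chain configuration in this sense might be represented as a small data structure (a hypothetical sketch; the class and field names are ours, not the paper's):

```python
# Hypothetical representation of a chain configuration C(k, B_in, B_out, P);
# names are illustrative, not taken from the paper.
from dataclasses import dataclass
from typing import List

@dataclass
class FunctionConfig:
    identifier: int          # n, the function's identifier in {1..N}
    bits_in: int             # B_in,n
    bits_out: int            # B_out,n
    params: List[float]      # P_n,1 ... P_n,Pn

@dataclass
class ChainConfig:
    order: List[int]                  # permutation k of the N identifiers
    functions: List[FunctionConfig]   # one entry per position in the chain

    def __post_init__(self):
        # strictly cascaded chain: each output precision feeds the next input
        for a, b in zip(self.functions, self.functions[1:]):
            assert a.bits_out == b.bits_in, "interface precisions must match"

cfg = ChainConfig(order=[2, 1],
                  functions=[FunctionConfig(2, 8, 10, [0.5]),
                             FunctionConfig(1, 10, 8, [1.0, 2.0])])
print(cfg.order)   # [2, 1]
```

The `__post_init__` check encodes the cascading assumption that B_out,n = B_in,n+1 at every intermediate interface.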

3 DEFINING VIDEO CHAIN OPTIMIZATION

Since the behavior of a given video chain depends on the order of its functions and on all specifiers of all N functions, the perceptual quality of the video produced by this chain will also depend on these parameters. Denoting this quality as QN, we may therefore write

Q_N = Q(C_N)   (3)

The problem of Video Chain Optimization can now be defined as finding that configuration of our video chain C_N that produces the best possible video quality. Denoting this golden configuration as Ĉ_N, we get:

Ĉ_N = C_N | Q(C_N) = Q̂_N   (4)

with Q̂_N the highest quality achievable by chain C_N:

Q̂_N = Q(C(k̂, B̂_in,n(k̂), B̂_out,n(k̂), P̂_n(k̂),i))
    ≥ Q(C(k, B_in,n(k), B_out,n(k), P_n(k),i))   (5)

∀ k ≠ k̂, B_in,n(k) ≠ B̂_in,n(k̂), B_out,n(k) ≠ B̂_out,n(k̂), P_n(k),i ≠ P̂_n(k̂),i

The fact that many video processing functions have non-linear and/or data dependent behavior not only causes loss of commutativity, but also makes the problem of video chain configuration NP-hard; cf. equation (7). The exact impact of a random change to the chain configuration is therefore hard to establish.

Note that a video chain can be - and in practice will have to be - optimized for aspects other than image quality alone. Particularly, it is generally desired to reduce the chain’s complexity in order to minimize the cost of its implementation. This can for instance be supported by minimizing the bit precision and by keeping certain functions together so they can share implementation resources. Although such additional requirements can readily be combined with video quality into a single cost function, we will not consider such possibilities in this paper and restrict ourselves instead to optimizing for video quality alone.

4 AUTOMATE VIDEO CHAIN OPTIMIZATION?

Current practice in the design of video chains, i.e., the application of a given set of video processing functions for implementing consumer video equipment and key components thereof, involves a great deal of experimental optimization using subjective evaluation of intermediate results. The time that needs to be spent on optimizing such a chain generally increases with the number of


configurations that this chain can have; simultaneously, the likelihood that the global optimum will be found decreases. While it is hard to collect exact data on this issue, experts in the field acknowledge that this is a correct observation and that experimentally optimizing a chain can take months of tweaking [2]. To get an impression of the magnitude of the problem that we are trying to solve, we calculate the size of what we call the design space of a video chain CN. The dimensionality D(CN) of this space equals the number of parameters that can be varied. From equation (2), we can derive that this number amounts to

D(C_N) = 2 + N + Σ_{n=1}^{N} P_n   (6)

Here, one dimension accounts for the order of functions; N+1 dimensions account for the bit precision at the interfaces (since B_out,n = B_in,n+1 for the N−1 intermediate bit precisions, plus the input and output precision of the overall chain); and the summation accounts for the programmable control parameters. Since only discrete points can be reached in this space, the number Z(C_N) of possible configurations of C_N can be calculated exactly:

Z(C_N) = N! · (1 + B_max − B_min)^(N+1) · Π_{n=1}^{N} Π_{i=1}^{P_n} (1 + (P_n,i,max − P_n,i,min) / ΔP_n,i)   (7)

where ΔP_n,i is the step size used for incrementing parameter P_n,i. When we simplify things a bit by defining

B = 1 + B_max − B_min   (8)

P = 1 + (P_n,i,max − P_n,i,min) / ΔP_n,i   ∀ n ∈ {1 ... N}, i ∈ {1 ... P_n}   (9)

P_N = Σ_{n=1}^{N} P_n   (10)

we get

Z(C_N) = N! · B^(N+1) · P^(P_N)   (11)

Here, N represents, as before, the number of functions in chain C_N; B is the number of possible values of the bit precision at each interface; P is the number of values that each programmable parameter can have (which is made constant by simply dividing the range of each parameter into P−1 intervals), and P_N represents the total number of programmable parameters of the chain.

To get an impression of the size of the configuration space, let's assume a modest chain with four functions (N=4), an interface bit precision of 7..11 bits (B=5), three programmable parameters per function (P_n=3, P_N=12), and ten possible settings per parameter (P=10). For this chain, we get

Z(C_4) = 4! · 5^5 · 10^12 = 7.5·10^16   (12)

As it turns out, even this little chain has a very large number of possible configurations - far too many for exhaustive exploration. And because complex functions may actually have far more programmable parameters than three - a high-quality scan rate converter may have over one hundred [3] - the design space of a video chain can be of truly cosmic proportions.
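These configuration counts are easy to verify; the following sketch (our own code, not the paper's) reproduces the numbers for the four-function example chain and its two-function subchains:

```python
# Design-space size per Z(C_N) = N! * B^(N+1) * P^P_N for the example
# chain: N=4 functions, B=5 precision values per interface, P=10 settings
# per parameter, P_N=12 parameters in total.
import math

def design_space_size(n_funcs, n_bit_values, n_param_values, n_params_total):
    """Number of possible chain configurations."""
    return (math.factorial(n_funcs)
            * n_bit_values ** (n_funcs + 1)
            * n_param_values ** n_params_total)

full = design_space_size(4, 5, 10, 12)
print(full)                 # 75000000000000000, i.e. 7.5e16

# Splitting into two independent two-function subchains
# (each with 3 interfaces and 6 parameters):
sub = design_space_size(2, 5, 10, 6)
print(sub)                  # 250000000, i.e. 2.5e8
print(full // (2 * sub))    # reduction factor, about 1.5e8
```

The last line confirms the text's claim that splitting the chain shrinks the search effort by roughly eight orders of magnitude.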

The fact that high-quality video products are being produced and sold today proves that such large design spaces can be handled with today's design methods. Two factors that can significantly reduce the size of a video chain's design space are cost and expertise. Whereas the former poses restrictions on the order of functions and the possibilities for the bit precision, the latter helps eliminate unfavorable parameter settings. The design space can also be shrunk by the engineering technique of divide and conquer - which in this case means splitting a chain into subchains. If the four-function chain mentioned above can a priori be split into two subchains which can be optimized independently, the size of each subchain's design space equals

Z(C_2) = 2! · 5^3 · 10^6 = 2.5·10^8   (13)

The size of the original design space is in that case reduced by a factor of about 10^8.

As mentioned at the beginning of this section, however, the fact remains that designing a video chain involves a great deal of experimental optimization and is a time-consuming process that does not guarantee the best possible result. Assisting designers by automating the optimization process can reduce manual efforts and thereby shorten the product design cycle, and can also lead to better end results: searching the design space more thoroughly increases the likelihood of finding favorable configurations that give a video quality close to Q̂_N.

5 STATIC VS. DYNAMIC OPTIMIZATION

Besides the configuration of the video chain, the perceived video quality Q_N also depends on some other important factors:

- the content of the video stream
- the characteristics of the display
- the viewing conditions
- the taste of the viewer

We should therefore rewrite equation (3) as:

Q_N = Q(C_N, content, display, conditions, taste)   (14)


As discussed in a later section, the display - with its drive electronics - and the viewing conditions may be regarded as fixed, leaving content and taste to be dealt with in a formal manner. Following common practice in video processing, variations in personal taste are addressed by averaging Q_N over the subjective scores obtained from a given panel of viewers:

Q̄_N = Q(C_N, content, display, conditions, taste‾)   (15)

Variations in content are first of all dealt with by using a set of training sequences that cover all image features that are affected by the video chain in a statistically relevant manner. The quality produced by the video chain is then averaged over this training set, which we denote as

Q̿_N = Q(C_N, content‾, display, conditions, taste‾)   (16)

There is, however, another, more complicated aspect of varying video content, which has to do with the manner in which the video chain's programmable parameters are used. These parameters can be split into two categories: those that are fixed during product development and don't change when the product is in operation, and those that are meant to be continually adjusted during operation in order to adapt the chain's processing to the instantaneous contents of the video stream. We denote these parameters as being static and dynamic, respectively.

(Note that individual functions may also autonomously adapt their internal processing to the instantaneous video contents. Since we regard functions as black boxes, however, such internal adaptation is invisible to us; it cannot be influenced, and is therefore not relevant to our optimization problem. We are only concerned with those parameters that can be controlled externally.)

We deal with these two types of parameters by splitting the optimization problem itself into a static and a dynamic part. Static optimization now deals with determination of those chain parameters that are fixed during product development; typically, those are the function order, the interface bit precision, and the static programmable parameters. Dynamic optimization, on the other hand, deals with those chain parameters that need to be run-time adjusted when the product is in operation; these are typically the dynamic programmable parameters.

The resulting static and dynamic optimization problems can be approached with different techniques. The parameter set {P}_n of each function f_n must therefore be split into a static and a dynamic subset:

{P}_n = {P_stat}_n ∪ {P_dyn}_n   (17)

where

{P_stat}_n ∩ {P_dyn}_n = ∅   (18)

P_n = P_stat,n + P_dyn,n   (19)

with P_stat,n the number of static parameters of function f_n and P_dyn,n the number of dynamic parameters of this function. Several strategies can be used for co-optimizing static and dynamic parameters. With a suitable strategy in place, the two problems can be treated independently. In this paper, we will focus on static optimization only, and disregard dynamic parameters hereafter.

6 AUTOMATING STATIC VIDEO CHAIN OPTIMIZATION

The basic mechanism that we applied for achieving automated static video chain optimization is shown in Figure 2 below. Reference video sequences are processed by some initial configuration of the video chain, and the quality of the resulting video is measured by an objective cost function. The measured quality level is fed to an optimization algorithm that is capable of reconfiguring the chain, creating a closed loop that can autonomously iterate towards configurations that produce high video quality. We have taken a full-software approach, meaning that all blocks in Figure 2 are implemented in software.

Figure 2 - automated video chain optimization concept

To obtain a practical and useful toolset, the cost function and the optimization algorithm should be chosen such that the following criteria are fulfilled:

1. high likelihood of finding the global optimum Ĉ_N
2. high reliability of the result
3. quick availability of the result

This translates into a need for the following items:

1. an optimization algorithm that can avoid local optima
2. a) accurate software models of all functions
   b) a reliable objective video quality metric
3. a) software models that execute efficiently
   b) an optimization algorithm with fast convergence

Whereas items 2a) and 3a) relate to software engineering issues that we will not consider in detail, the remaining items form the heart of the automated video chain optimization concept and will be discussed in the next sections. We start with the measurement of video quality.

7 OBJECTIVE VIDEO QUALITY

The average video quality Q̿_N defined by equation (16) is a subjective measurement of the quality of the video produced by chain C_N. Such measurement is done by having a panel of viewers evaluate the reference sequences processed by C_N, and averaging their scores after removal of outliers. This procedure is highly time consuming, and should as such be avoided in an automated approach. We therefore need a way to measure the quality of the video produced by chain C_N objectively, allowing it to be done by a computer without human intervention. Whereas exact prediction of the average viewer taste may be regarded as impossible, we want this objective measurement Q_obj,N to be as close to the subjective measurement as possible:

Q_obj,N ≈ Q̿_N   (20)

Besides needing to be reliable, there are some additional requirements that the objective video quality metric (OVQM) must fulfill:

- Since the processing chain may alter the reference sequences to be in a different representation format (NTSC, PAL, SECAM, MPEGx, YUV, RGB), to be of a different size, and to have a different line/field/frame rate, a processed sequence cannot readily be compared to the original; the metric should therefore be single-ended, i.e., it should not be based on comparing the processed sequence to a reference.

- Since an image contains desired features as well as undesired features, which may be affected in different degrees by the processing, the metric must be able to distinguish desired and undesired features. The former features include for instance sharpness, contrast, brightness, smoothness of motion, and legibility of text, whereas the latter include noise, blockiness, line / large-area flicker, and color errors.

- Since the processing chain may improve certain image features yet simultaneously degrade other features (for instance, enhancing the sharpness tends to increase the noise level), the metric must be able to distinguish improvement and degradation.

- Since the perceived video quality depends on the display characteristics, on the viewing conditions, and on the human visual system, the metric must take all of these into account.

While the Video Quality Experts Group (VQEG) is working towards an ITU recommendation for objective image quality [5], we applied a pragmatic approach in which we measure a number of key features that viewers tend to evaluate in an image, and combine those into a single quality indicator. In terms of video quality, the taste of an individual can be considered as the balance between his or her preference for desired image features on one hand, and tolerance for undesired features on the other hand. This suggests the following simple way of objectively measuring the quality of a video sequence k:

Q_obj,k = Σ_i w_i · F_des,i,k − Σ_j w_j · F_undes,j,k   (21)

where w_i and w_j represent positive weight factors, and F_des,i,k and F_undes,j,k are single-ended measurements of selected desired and undesired image features in sequence k, respectively. Realizing that the results may be improved by combining the individual measurements - which we will denote as submetrics - in non-linear fashion, we based our initial experiments on this linear combination because of its simplicity.
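As a sketch (our own illustrative code; the feature values and weights below are made-up placeholders, not trained values), the linear combination of submetrics can be computed as:

```python
# Minimal sketch of the linear objective quality metric: a weighted sum
# of desired-feature measurements minus a weighted sum of undesired ones.

def objective_quality(desired, undesired, w_des, w_undes):
    """Q_obj = sum_i w_i*F_des,i - sum_j w_j*F_undes,j, weights positive."""
    assert all(w > 0 for w in w_des) and all(w > 0 for w in w_undes)
    return (sum(w * f for w, f in zip(w_des, desired))
            - sum(w * f for w, f in zip(w_undes, undesired)))

# e.g. desired = [sharpness, contrast], undesired = [noise, blockiness]
q = objective_quality([0.8, 0.6], [0.3, 0.1], [1.0, 0.5], [2.0, 1.0])
print(round(q, 6))   # 0.4
```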

The above requirements for the OVQM can be met through suitable choices of the weight factors. For this purpose, we first select a set of training sequences that cover the features measured by the submetrics in statistically relevant fashion. The submetric values are established for each of these training sequences. In parallel, the training sequences are also rated by a panel of viewers on the display of choice and under the viewing conditions of choice. This process, which is shown in Figure 3, yields the sets of objective scores {F_des} and {F_undes}, as well as a set of subjective scores {Q_subj}.

Figure 3 - OVQM training


The weight factors can now be obtained by maximizing the correlation factor R between Q_subj and Q_obj, giving

{w_i, w_j} = arg max R   (22)

For a given set of weight factors, R is calculated using the Spearman rank order correlation analysis [6]:

R = 1 − 6 · Σ_{k=1}^{N_train} (Q′_subj,k − Q′_obj,k)² / (N_train · (N_train² − 1))   (23)

where N_train is the number of training sequences, and Q′_subj,k and Q′_obj,k are the rank orders of Q_subj,k and Q_obj,k, respectively, with Q_obj,k the objective video quality of sequence k estimated according to equation (21).

It is of course essential that the submetrics cover all relevant image features that are impacted by the chain - what is not measured, cannot be optimized for. Once that has been established, and once the metric has been trained using the above procedure, it can reliably be used for optimizing video chains. Importantly, the results obtained with the metric are only valid

- for a specific display
- for the focus group used for the subjective training
- for the preferred viewing conditions

When any of these targets changes, the metric must be re-trained; this is for instance the case when a video chain is to be used in conjunction with a different display, or when the end product is to be sold in another part of the world where different regional preferences apply. The training procedure takes only one or two days, which is negligible compared to the amount of perception testing involved in iterative subjective optimization.
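The Spearman rank-order correlation used in this training procedure can be computed with nothing but the standard library (an illustrative sketch; tied scores are not handled, and the sample scores are made up):

```python
# Stdlib sketch of the Spearman rank-order correlation between subjective
# and objective quality scores over a set of training sequences.

def ranks(values):
    """Rank order of each value, 1 = smallest (ties not handled here)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(q_subj, q_obj):
    n = len(q_subj)
    d2 = sum((rs - ro) ** 2
             for rs, ro in zip(ranks(q_subj), ranks(q_obj)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Perfectly monotonic agreement gives R = 1, a reversed ranking R = -1:
print(spearman([1.0, 2.0, 3.0, 4.0], [10, 20, 30, 40]))   # 1.0
print(spearman([1.0, 2.0, 3.0, 4.0], [40, 30, 20, 10]))   # -1.0
```

Weight training then amounts to searching the weight space for the combination that maximizes this R over the training set.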

8 OPTIMIZATION STRATEGY

We now turn our attention to finding an optimization strategy that meets the criteria of being accurate and fast, with an obvious emphasis on the former. In section 3, we established that our problem is NP-hard, i.e., a problem for which no method is known that finds the exact optimum in polynomial time. Such problems need to be addressed by approximation methods, which search a subspace of the total space in order to find "good" solutions rather than the very best one. In our case, that means finding a set of chain configurations {C_opt,N} whose video quality is close but not necessarily identical to that of Ĉ_N.

If the shape of the search space is known prior to optimizing, a good solution can be constructed in stages. Starting from a seed input, partial solutions are successively selected until a complete solution is obtained. Selection is based on an optimization criterion, which is strongly correlated to the cost function. If no such a priori information is available, however, a local search must be used. Optimization then starts at some initial feasible solution and searches for a better solution in the neighborhood of that solution; searching stops on hitting a local optimum. Initial solution, neighborhood definition and local vs. global optima are critical aspects of the local search method.

Since discontinuities, noise, and local optima are likely features in any search space, the technique that emerges as most suited to our problem is Genetic Algorithms [7]. A genetic algorithm (GA) is a stochastic, iterative, non-deterministic search algorithm based on the Darwinian theory of evolution, which assumes no prior information about the search space yet evolves toward the global optimum. Unlike hill climbing, which is local in scope, GAs are independent of the initial configuration; and unlike Tabu search or simulated annealing, which move between single states, GAs move between sets of points in the search space with probabilistic transitions, minimizing the probability of being trapped in local optima. Like biological evolution, GAs are blind to reaching an optimum, meaning that a termination condition has to be defined. Common termination criteria are reaching an acceptable approximate solution (which relates to the value of the cost function), reaching a stable approximate solution (which relates to the inability to improve any further), reaching a specific number of generations, and reaching a maximum compute time.

Having established GAs as a suitable optimization strategy for our purpose in terms of accuracy, the next section gives more details on utilizing GAs for video chain optimization.

9 GENETIC ALGORITHM FOR AVCO

Genetic Algorithms are based on the Darwinian concept that diversity helps to ensure a population's survival under changing environmental conditions. They are simple and robust methods for optimization and search and have intrinsic parallelism. GAS are iterative procedures that maintain a population of candidate solutions encoded in the form of chromosome strings. The initial population can be selected heuristically or randomly. For each generation, all candidates are evaluated and assigned aptness value, which is the cost function as defined in section 7. Based on their fitness values, candidates are selected for

* .

van Zon and All Automated Video Chain Optimlzatlon 599

reproduction in the next generation. Selected candidates are combined using the genetic recombination operation called crossover. This operator exchanges portions of bit strings in an attempt to produce better candidates with higher fitness for the next generation. Mutation is then applied to perturb the bits of the chromosomes so as to guarantee that the probability of searching a particular subspace of the problem space is never zero [8]. It also prevents the algorithm from becoming trapped at local optima [9] [ 101. The entire population is evaluated again in the next generation and the process continues until it reaches the termination criterion.


When applied to optimizing a video processing chain, a given GA chromosome uniquely and completely defines a particular configuration of that chain. The chromosome consists of a number of genes, each of which is a bit string representing a certain characteristic of the chain. Using equations (2), (8), (9) and (10), the chromosome can in our case contain a gene G_k for the function order k, B genes G_B,i, i ∈ {1..B} for the interface bit precisions B_out(n) = B_in(n+1), and P_N genes G_P,j, j ∈ {1..P_N} for the programmable parameters P_n(k),j, giving a total of D(C_N) genes. The lengths of the various genes are (in bits):

L(G_k) = ⌈log2(N!)⌉ (24)

L(G_B,i) = L(G_B) = ⌈log2(B)⌉ ∀ i ∈ {1..B} (25)

L(G_P,j) = L(G_P) = ⌈log2(P)⌉ ∀ j ∈ {1..P_N} (26)

giving a chromosome length L(c) equal to

L(c) = L(G_k) + B·L(G_B) + P_N·L(G_P) (27)

As an example, each configuration of the four-function chain discussed in section 4, which has N=4, B=5, P=10, and P_N=12 and therefore L(G_k)=5, L(G_B)=3, L(G_P)=4, can be represented by a chromosome of 5 + 5·3 + 12·4 = 68 bits. It should immediately be noted that the GA's speed of convergence is proportional to the chromosome length. From that perspective, it is preferred to keep the genes as short as possible, which can be achieved by not applying the simplification of equation (9) and by eliminating the ⌈·⌉ operator through proper coding of the individual genes.
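The gene-length bookkeeping of equations (24)-(27) can be computed mechanically; the helper below is a sketch (the function and argument names are ours, not the paper's), generalized to allow a different value count per parameter:

```python
from math import ceil, log2, factorial

def chromosome_length(n_functions, n_precisions, n_free_interfaces,
                      param_cardinalities):
    """Total chromosome length: one order gene, one gene per free interface,
    and one gene per programmable parameter."""
    L_order = ceil(log2(factorial(n_functions)))                # eq. (24)
    L_iface = ceil(log2(n_precisions))                          # eq. (25)
    L_params = sum(ceil(log2(c)) for c in param_cardinalities)  # eq. (26)
    return L_order + n_free_interfaces * L_iface + L_params     # eq. (27)

# case-study chain of section 10.3: 4 functions, 3 free interfaces with
# 5 precision options each, parameters with 5 and 4 possible values
L = chromosome_length(4, 5, 3, [5, 4])
```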

Our application of GAs is a variant of the standard genetic search [19]. The initial population of n chromosomes is generated randomly and each of the chromosomes is evaluated. An intermediate population is then generated in the following fashion:

1. The current population is copied to the intermediate population.

2. Each chromosome in the current population is randomly paired with another chromosome, and user-defined crossover is performed if the difference criterion is satisfied. The resulting "children" are evaluated and added to the intermediate population.

The resulting intermediate population has more than n chromosomes - up to 2n if all the chromosome pairs are sufficiently different. The best n chromosomes from the intermediate population are selected and passed to the next generation. Note that no mutation is performed during this stage. Two chromosomes are crossed over only if the (modified Hamming) difference between them is above a threshold. This threshold is lowered when no chromosome pairs can be found with a difference above the threshold. When the threshold reaches zero, a re-initialization (divergence) of the population is done. The best chromosome available is then selected as a representative and copied over to the next generation. Mutating a percentage (35%) of the bits of this template chromosome generates the rest of the chromosomes.
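A minimal sketch of one such intermediate-population step follows. It uses a plain Hamming difference and one-point crossover for brevity, whereas CHC as described in [19] uses a modified Hamming distance and a uniform-style crossover; the threshold-lowering and divergence/restart logic is omitted:

```python
import random

def hamming(a, b):
    """Number of differing bits between two equal-length bit lists."""
    return sum(x != y for x, y in zip(a, b))

def chc_generation(pop, fitness, threshold, rng):
    """One generation of the scheme above: copy the parents, cross
    sufficiently different random pairs (no mutation), keep the best n."""
    n = len(pop)
    intermediate = [c[:] for c in pop]          # step 1: copy parents
    partners = pop[:]
    rng.shuffle(partners)
    for a, b in zip(pop, partners):             # step 2: paired crossover
        if hamming(a, b) > threshold:           # difference criterion
            cut = rng.randrange(1, len(a))
            intermediate.append(a[:cut] + b[cut:])
            intermediate.append(b[:cut] + a[cut:])
    intermediate.sort(key=fitness, reverse=True)
    return intermediate[:n]                     # elitist truncation to n

rng = random.Random(1)
pop = [[rng.randint(0, 1) for _ in range(12)] for _ in range(8)]
new_pop = chc_generation(pop, sum, threshold=3, rng=rng)
```

Because the parents are always copied into the intermediate population, the best candidate can never be lost between generations.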

The algorithm terminates when the number of successful or failed divergences (that did or did not improve the result, respectively) reaches a specified number. The user can also specify the maximum number of trials (evaluations) allowed.

10 CASE STUDY

In this section, we present a detailed case study in which the above AVCO techniques are exercised.

10.1 Video processing functions

The video chain under consideration is fairly modest. As depicted in Figure 4, it consists of four functions: spatial scaling [11], histogram modification (a form of contrast enhancement) [1], adaptive peaking (a form of sharpness enhancement) [12], and spatial noise reduction [1].

Figure 4 - video chain used for case study

These four functions were selected because they play a vital role in high-end TV sets [1]. A specific challenge in optimizing this chain lies in the fact that sharpness enhancement and noise reduction are competing functions, in the sense that the former strives to increase the higher frequencies in the luminance signal, whereas the latter simultaneously tends to decrease them [13]. We will be considering luminance processing only.


10.2 Objective video quality metric

The image characteristics that are expressly impacted by our chain are contrast, sharpness, and noise. If we are to do a reliable optimization, we must be able to accurately measure at least these features. Before describing the individual submetrics in the remainder of this section, we note that although only clean sequences and sequences contaminated by Gaussian noise were used for this particular experiment, other experiments included sequences contaminated by digital coding artifacts, of which blocking artifacts are the most irritating to the human eye [14]. Our OVQM was therefore equipped to measure that characteristic as well, which allows us to check that the measurement of artifacts that are not present does not disturb the optimization process. We now turn to the submetrics used in our case study.

We define the contrast level F_contrast as the normalized gap between those parts of the luminance histogram that contain the lower and upper 5% of the luminance energy. The luminance signal Y is lowpass filtered to Y_lpf prior to contrast measurement in order to reduce the impact of noise. Mathematically:

IEEE Transactions on Consumer Electronics, Vol. 47, No. 3, AUGUST 2001

max_Y = 2^B_N − 1 (28)

hist[i] = Σ (Y_lpf = i) ∀ i ∈ {0..max_Y} (29)

E_tot = Σ_{i=0}^{max_Y} hist[i] (30)

lo = j | Σ_{i=0}^{j} hist[i] ≤ 0.05·E_tot ∧ Σ_{i=0}^{j+1} hist[i] > 0.05·E_tot (31)

hi = j | Σ_{i=0}^{j} hist[i] ≥ 0.95·E_tot ∧ Σ_{i=0}^{j−1} hist[i] < 0.95·E_tot (32)

F_contrast = (hi − lo) / max_Y (33)

with B_N the output bit precision of the video chain, max_Y the maximum value of the original as well as of the lowpass-filtered luminance signal, hist[i] the number of times that the luminance value of the picture under test equals i, and E_tot an estimate of the total luminance energy in the picture¹.

Unfortunately, no reliable objective sharpness measurement was available for our case study, although one is under construction. The implication is that the optimization should prove blind to sharpness level, making it a random factor. We did, however, include a submetric that picks up artifacts introduced by an "overdose" of sharpness enhancement. This metric simply measures the clipping level in a picture, defined as the number of times the luminance signal hits its minimum or maximum value in a given picture. The reason that this correlates with sharpness is that the form of sharpness enhancement that we apply (adaptive peaking) is based on adding over- and undershoots to luminance transitions. When too much overshoot or undershoot is added, the enhanced edge will clip at max_Y or at zero, respectively, which may cause visible artifacts that degrade the perceived video quality. The clipping submetric F_clip is defined in equation (34); a gain of 100 is applied there to give the submetric a weight comparable to the other ones. Like the contrast submetric, the range of the clipping measurement is defined to be between 0 and 1.
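As a hedged illustration, the contrast and clipping submetrics might be implemented along the following lines in NumPy. The tie-handling at the 5%/95% crossings and the exact form of the clipping formula (equation (34)) are our assumptions:

```python
import numpy as np

def contrast_submetric(y, bits=8):
    """F_contrast in the spirit of equations (28)-(32): normalized gap
    between the luminance levels bounding the lower and upper 5% of the
    (count-approximated) luminance energy."""
    max_y = 2 ** bits - 1
    hist = np.bincount(np.asarray(y).ravel(), minlength=max_y + 1)
    cum = np.cumsum(hist)
    e_tot = cum[-1]
    lo = int(np.searchsorted(cum, 0.05 * e_tot))   # end of the lower 5%
    hi = int(np.searchsorted(cum, 0.95 * e_tot))   # start of the upper 5%
    return (hi - lo) / max_y

def clipping_submetric(y, bits=8, gain=100):
    """F_clip sketch: fraction of pixels clipped at either luminance
    extreme, amplified by a gain of 100 and limited to [0, 1]."""
    y = np.asarray(y)
    max_y = 2 ** bits - 1
    clipped = np.count_nonzero((y == 0) | (y == max_y))
    return min(1.0, gain * clipped / y.size)
```

A flat picture yields a contrast of zero, while a full-range ramp approaches one; the clipping measure counts only exact hits of the luminance extremes.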

The noise level Fnoise is measured using the enhanced algorithm outlined by Hentschel and He in [15]. This algorithm is a modification of the noise measurement algorithm described in [16][17][1], which is based on the premise that every picture is highly likely to contain at least one little area of constant luminance, i.e., with low or zero texture. Assuming the absence of other artifacts, any variance in this flat area is a direct consequence of noise. To identify suitably flat areas, the image is divided into a number of small blocks whose intensity variation is individually computed. Noise measurement is then based on the area with lowest activity. The modified algorithm applies a four-step spectral analysis to take into account that the subjective perception of noise is frequency dependent. For the details of this algorithm, the reader is referred to the above sources.
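Assuming 8x8 measurement blocks, the block-variance idea underlying [16][1] can be sketched as follows; the four-step spectral analysis of [15] is omitted, so this shows only the unweighted core of the method:

```python
import numpy as np

def estimate_noise_sigma(y, block=8):
    """Noise estimate: the flattest block of the picture is assumed to be
    texture-free, so its standard deviation is attributed to noise."""
    y = np.asarray(y, dtype=np.float64)
    H = y.shape[0] - y.shape[0] % block     # crop to a multiple of the
    W = y.shape[1] - y.shape[1] % block     # block size in each dimension
    tiles = (y[:H, :W]
             .reshape(H // block, block, W // block, block)
             .swapaxes(1, 2))               # (rows, cols, block, block)
    return tiles.std(axis=(2, 3)).min()     # lowest-activity block
```

On a noise-free flat picture the estimate is zero; on a flat picture with additive Gaussian noise it approaches (and slightly underestimates, being a minimum over blocks) the true standard deviation.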

The blocking level F_block, finally, is measured with the Blocking Impairment Metric (BIM) by Wu and Yuen [18]. This algorithm is primarily based on the measurement of intensity differences across the edges of blocks in the decoded image, and was proposed by Yang et al. [14] for smoothness measurement. The reader is again referred to the above sources for details.
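As a rough illustration of the block-edge idea (not the exact BIM of [18]), one can compare luminance differences across assumed 8-pixel block boundaries with those inside blocks:

```python
import numpy as np

def blockiness(y, block=8):
    """Toy block-edge measure: mean absolute luminance difference across
    block boundaries relative to the mean difference inside blocks."""
    y = np.asarray(y, dtype=np.float64)
    dh = np.abs(np.diff(y, axis=1))           # horizontal pixel differences
    cols = np.arange(dh.shape[1])
    on_edge = (cols % block) == block - 1     # differences crossing a boundary
    return dh[:, on_edge].mean() / (dh[:, ~on_edge].mean() + 1e-9)
```

A smooth ramp gives a ratio near one (no preferred discontinuity at block boundaries), whereas an image quantized into flat 8-pixel blocks gives a very large ratio.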

With the above four submetrics, we composed an OVQM using the procedure outlined in section 7. Following equation (21), we define

Q_obj = w_contrast·F_contrast − w_clip·F_clip − w_noise·F_noise − w_block·F_block (35)

¹ While Σ hist[i]·i should be used in the calculation of E_tot, this is computationally more intensive and does not improve the results.


which observes the fact that contrast is a desired image feature whereas the others are undesired. The weight factors were calibrated using a (relatively small) set of 24 reference sequences that represented a variety of contrast, sharpness, noise, and blocking levels. After evaluating these sequences both objectively and subjectively, the correlation between the two sets of measurements was maximized by searching for the maximum value of R according to equation (23). The result of this search is shown in Figure 5 below; R reaches a maximum value of 0.87.
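The composition of equation (35) and the weight calibration against subjective scores can be sketched as below. The paper does not detail its search procedure, so a simple random search over hypothetical weights stands in, checked here on synthetic data:

```python
import numpy as np

def ovqm(F, w):
    """Q_obj per equation (35): reward contrast, penalize clipping, noise,
    and blocking (submetric columns assumed in that order)."""
    return w[0]*F[:, 0] - w[1]*F[:, 1] - w[2]*F[:, 2] - w[3]*F[:, 3]

def calibrate_weights(F, mos, trials=1000, seed=0):
    """Pick the weight vector whose Q_obj correlates best (Pearson R,
    cf. equation (23)) with the subjective scores."""
    rng = np.random.default_rng(seed)
    best_w, best_r = None, -1.0
    for _ in range(trials):
        w = rng.uniform(0.0, 1.0, 4)
        r = np.corrcoef(ovqm(F, w), mos)[0, 1]
        if r > best_r:
            best_w, best_r = w, r
    return best_w, best_r

# synthetic check: 24 "sequences" scored with hidden weights
rng = np.random.default_rng(1)
F = rng.uniform(0.0, 1.0, (24, 4))
mos = ovqm(F, np.array([0.8, 0.5, 0.6, 0.2]))   # stand-in subjective scores
w_hat, R = calibrate_weights(F, mos)
```

Because correlation is invariant to the scale of the weight vector, only the direction of the weights matters, which is why even a crude random search recovers a high R on this synthetic set.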

Figure 5 - progression of correlation factor R (y-axis: correlation R, 0.0 to 1.0; x-axis: trial, 0 to 1000)

10.3 Chromosome construction

The next step in setting up the optimization process is the definition of a chromosome as outlined in section 9. The length of the chromosome is preferably as short as possible, since we wish to minimize the optimization time. Regarding the order of the N = 4 functions, we have

k ∈ {1, ..., 24} (36)

L(G_k) = 5 bits (37)

For the interface bit precision, we want to investigate whether it is beneficial to go beyond the traditional eight bits. Because the precision of the video material and that of the display unit are both eight bits, we define

B_1 = B_5 = 8 (38)

For the other interfaces, we define

B_i ∈ {8, 9, 10, 11, 12} ∀ i ∈ {2, 3, 4} (39)

which means

B = 5 (40)

L(G_B,j) = 3 bits ∀ j ∈ {1, 2, 3} (41)

Regarding the programmable parameters, we are bound by what is offered by the functional models at hand. The scaler requires only a scale factor. Since we do not vary this factor during the experiment, it does not require representation in the chromosome, i.e., L(P_scaler) = 0. The histogram modification unit is fully autonomous and has no external control parameters, so L(P_histmod) = 0. The peaking unit and the noise reduction unit each have one programmable parameter, both of which are actually dynamic in nature (cf. section 5). To experiment with optimizing parameter settings, we will treat these parameters as static, which requires that we conduct the optimization with a constant S/N ratio; we chose 30 dB. The ranges of P_pk and P_noise and their corresponding genes are:

P_pk ∈ {0.0, 0.25, 0.50, 0.75, 1.0} (42)

With the above choices, our chain has a dimensionality of

D(C_4) = 4! · 5³ · 5 · 4 = 60,000 (46)

and is represented by a chromosome of length

L(c) = 5 + 3·3 + 3 + 2 = 19 bits (47)
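One possible (hypothetical) bit-level layout of this 19-bit chromosome is sketched below: a 5-bit order gene, three 3-bit interface genes, a 3-bit peaking gene, and a 2-bit noise gene, with modulo wrapping for gene values that exceed their range. The P_NOISE value list is a placeholder, since the paper does not give the noise-reduction parameter range:

```python
from itertools import permutations

ORDERS = list(permutations(["scaler", "histmod", "peaking", "noise_red"]))
PRECISIONS = [8, 9, 10, 11, 12]
P_PEAK = [0.0, 0.25, 0.50, 0.75, 1.0]
P_NOISE = [0, 1, 2, 3]          # placeholder values, not from the paper

def bits_to_int(bits):
    """Interpret a list of 0/1 values as a big-endian integer."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value

def decode(chrom):
    """Map a 19-bit chromosome to a chain configuration; out-of-range gene
    values wrap via modulo (one possible coding choice, cf. section 9)."""
    assert len(chrom) == 19
    order = ORDERS[bits_to_int(chrom[0:5]) % len(ORDERS)]
    precisions = [PRECISIONS[bits_to_int(chrom[5 + 3*i: 8 + 3*i]) % 5]
                  for i in range(3)]
    p_peak = P_PEAK[bits_to_int(chrom[14:17]) % 5]
    p_noise = P_NOISE[bits_to_int(chrom[17:19]) % 4]
    return order, precisions, p_peak, p_noise

config = decode([0] * 19)
```

Such a decoder is what connects the GA's bit strings to concrete chain configurations during fitness evaluation.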

10.4 Optimization results

Before running the actual optimization, we considered that the computational bottleneck of our software setup lies in the complexity of the video processing functions. We therefore parallelized our problem by having the GA suggest sets of new candidate configurations that can be evaluated simultaneously on a parallel computer. On the machine that we used, the evaluation of a single configuration took 5 minutes, and 10 configurations were evaluated in parallel. The input consisted of 40-frame natural sequences contaminated with 30 dB Gaussian noise.
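The parallel evaluation scheme can be sketched with a thread pool. Here `evaluate()` is a placeholder for configuring the chain, processing the test sequence, and returning its OVQM score; the actual setup farmed the evaluations out to a parallel machine:

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate(config):
    """Stand-in for one chain evaluation (~5 minutes in the real system)."""
    return sum(config)   # placeholder cost

def evaluate_generation(configs, workers=10):
    """Score a set of GA candidates concurrently, mirroring the 10-way
    parallel evaluation used in the experiment."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(evaluate, configs))   # preserves input order

scores = evaluate_generation([[1, 0, 1], [1, 1, 1], [0, 0, 0]])
```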

Figure 6 shows the maximum normalized objective video quality encountered during the evolution. The stopping criterion was the inability to improve any further. While the optimum was found after 996 trials, subsequent trials resulted in the same optimum until the optimization terminated after 3800 generations. The video sequences


corresponding to the best configurations were subjectively examined along with a random sampling from the 3800 configurations. The optimization results were found to match well with human opinion, bearing in mind that sharpness was not optimized for. Because the clipping submetric measures artifacts introduced by peaking, the system actually tends to apply the least amount of peaking possible; as a result, the configurations indicated as best hardly apply any sharpness enhancement.

Figure 6 - evolution of maximum quality (y-axis: normalized objective video quality, 0.0 to 1.0; x-axis: generation, 1 to 1000)

Looking at Figure 6 in some more detail reveals that the main image features affected by the configuration are the clipping and noise levels, and that the system correctly strives for low values of the corresponding submetrics. The contrast level furthermore proves to be largely configuration independent. The blocking level, finally, is correctly measured to be close to zero, causing it to play no role in the optimization.

The 25 best configurations are characterized as follows:

f1 = histogram modification (96%)
f2 = scaling (52%) or peaking (44%)
f3 = peaking (40%) or scaling (48%)
f4 = noise reduction (88%)
B2 = 9.2 bits
B3 = 8.0 bits
B4 = 8.8 bits
P_peak = 0.07
P_noise = 4

where the bit precision and parameter values are averages. While noisy natural sequences were used to perform the optimization, we opted to show a pictorial result by means of a clean zoneplate because of its ability to reveal artifacts. Figure 7 shows the leftmost (V) and rotated uppermost (H) fragments of a quarter-zoneplate, for both the best and the worst chain configuration. Whereas the best fragments appear virtually artifact-free, obvious artifacts appear in the worst fragments.

Figure 7 - zoneplate fragments for best and worst configurations

11 CONCLUSION

We have presented a methodology for automated video chain optimization which applies a subjectively trained composite objective video quality metric as a cost function for a stochastic, iterative, non-deterministic search algorithm. The methodology has proved capable of finding chain configurations that correspond to high image quality. Current research focuses on extending the objective video quality metric, on enlarging training sets, and on speeding up the optimization process. The resulting tools can assist TV and TV-IC manufacturers to improve video quality while speeding up product development.


REFERENCES

[1] G. de Haan, "Video Processing for Multimedia Systems," CIP-Data Koninklijke Bibliotheek, The Hague, ISBN 90-9014015-8, 2000.

[2] Philips Consumer Electronics, private communication.

[3] G. de Haan, J. Kettenis, and B. Deloore, "IC for motion compensated 100 Hz TV with a smooth-motion movie-mode," IEEE Transactions on Consumer Electronics, vol. 42, pp. 165-174, May 1996.

[4] ITU-R Recommendation 500-7, "Methodology for the subjective assessment of the quality of television pictures," ITU, Geneva, Switzerland, 1995.

[5] VQEG, "Final Report From the Video Quality Experts Group on the Validation of Objective Models of Video Quality Assessment," www.crc.ca/VQEG, March 2000.

[6] L. L. Scharf, "Statistical Signal Processing: Detection, Estimation, and Time Series Analysis," Addison Wesley Longman, ISBN 0201190389, 1991.

[7] J. H. Holland, "Adaptation in Natural and Artificial Systems," University of Michigan Press, Ann Arbor, 1975.

[8] A. Chipperfield and P. Fleming, "Parallel Genetic Algorithms," in Parallel and Distributed Computing Handbook by A. Y. H. Zomaya, McGraw-Hill, 1996, pp. 1118-1143.

[9] P. Husbands, "Genetic Algorithms in Optimization and Adaptation," in Advances in Parallel Algorithms, Kronsjo and Shumsheruddin, eds., pp. 227-276, 1990.

[10] D. E. Goldberg, "Genetic Algorithms in Search, Optimization and Machine Learning," Addison-Wesley, Reading, MA, 1989.

[11] J.G.W.M. Janssen, J.H. Stessen and P.H.N. de With, "An Advanced Sampling Rate Conversion Technique for Video and Graphics Signals," Proceedings of the 6th International Conference on Image Processing and its Applications, IPA97, Dublin, Ireland, July 1997, pp. 771-775.

[12] E.G.T. Jaspers and P.H.N. de With, "A Generic 2D Sharpness Enhancement Algorithm for Luminance Signals," Proceedings of the 6th International Conference on Image Processing and its Applications, IPA97, Dublin, Ireland, July 1997, pp. 269-273.

[13] A. Ojo, "An Algorithm for Integrated Noise Reduction and Sharpness Enhancement," Proceedings of the International Conference on Consumer Electronics, pp. 58-59, Los Angeles, August 2000.

[14] Yang, Galatsanos and Katsaggelos, "Projection-Based Spatially-Adaptive Reconstruction of Block-Transform Compressed Images," IEEE Transactions on Image Processing, Vol. 4, pp. 896-908, 1995.

[15] Ch. Hentschel and H. He, "Noise Measurement in Video Images," ICCE Digest of Technical Papers, Los Angeles, USA, June 2000, pp. 56-57.

[16] G. de Haan, T.G. Kwaaitaal-Spassova, and O.A. Ojo, "Automatic 2-D and 3-D noise filtering for high-quality television," Proceedings of the 7th International Workshop on HDTV, Turin, Italy, Oct. 1994.

[17] Ch. Hentschel, "Video-Signalverarbeitung (Video Signal Processing)," Chapter 5.4.4, B.G. Teubner Verlag, ISBN 3-519-06250-X, Stuttgart, 1998 (in German).

[18] H.R. Wu and M. Yuen, "A survey of hybrid MC/DPCM/DCT video coding distortions," Signal Processing, Vol. 70, pp. 247-278, 1998.

[19] L. Eshelman, "The CHC Adaptive Search Algorithm," in G. Rawlins, editor, Foundations of Genetic Algorithms, pp. 265-283, Morgan Kaufmann, 1991.

BIOGRAPHY

Kees van Zon was born in Eindhoven, The Netherlands, on December 27, 1960. He received his M.Sc. degree in electrical engineering from Eindhoven University of Technology, where he graduated cum laude in 1986. He joined the Consumer Electronics division of Royal Philips Electronics in Eindhoven in 1986, where he worked on television applications of field memories. In 1989 he joined Philips Research in Briarcliff

Manor, NY, USA, where he headed a project on NTSC ghost cancellation until 1992. The Ghost Cancellation Reference signal developed in this project was accepted by the FCC for nationwide transmission in the USA; the team's contributions to the field were acknowledged with a Primetime Emmy Award by the Academy of Television Arts and Sciences in 1995. From 1992 to 1997, he worked at the Philips Research Laboratories in Eindhoven on the application of programmable devices for television signal processing purposes. Since 1998, he has been heading a project on automated video chain optimization at Philips Research Briarcliff.

Walid S. I. Ali received his B.Sc. degree in electronics and telecommunications from Cairo University, Cairo, Egypt with first honor in 1992. He received his M.Sc. in computer science from Imperial College, London University, London, UK in 1994, where he graduated with distinction. From 1995 to 1999, he attended Drexel University, Philadelphia, PA, USA, where he received his Ph.D. degree in electrical and computer

engineering. During his Ph.D. studies, he received the outstanding graduate student and best teaching awards from Drexel University. In 1992, he joined IBM Egypt as a system engineer. Since 1999, he has been with Royal Philips Electronics Research Labs in Briarcliff Manor, NY, USA. His research has focused on video and image quality and processing, system optimization, and evolutionary models. He has also published work in medical imaging, image registration, and differential geometry.