
Food Quality and Preference 3 (1991/2) 109-114

STATISTICAL APPROACHES TO MINIMISING EXPERIMENTATION

E. A. Hunter, Scottish Agricultural Statistics Service, The University of Edinburgh, UK

&

D. D. Muir Hannah Research Institute, Ayr, UK

(Received 6 February 1992; accepted May 1992)

ABSTRACT

Statistical design and analysis of experiments is not widely used in food science. One reason for its neglect is that it is associated with ‘field plot’ experiments and is not regarded as relevant by food scientists. There are special features of food science experiments which distinguish them from agricultural experiments. These special features include the sequential nature of experimentation and the ability to complete and evaluate an experiment in a short period of time. The same features are also seen in research work in the chemical industry, where an appropriate statistical methodology has been developed by Professor G. E. P. Box. This can be applied to research work in food science and technology. Our experience of applying these methods at the Hannah Research Institute is described.

Keywords: Experimental design; fractional factorial; response surface.

INTRODUCTION

The statistical design and analysis of experiments is associated with field plot experiments. Consequently, many, perhaps most, courses on statistics for food scientists either teach no experimental design at all or teach it in such a way that students do not appreciate its relevance.

This is not surprising, since development work carried out on a pilot plant or in a laboratory does not easily fit the paradigm of field plot experiments. Consequently, food experiments are often designed using a one-at-a-time approach, which is known to be wasteful of resources and, in certain circumstances, not to give the best answer.

© 1992 Elsevier Science Publishers Ltd 0950-3293/92/$05.00

There are comparatively few books on statistics for food research, and even these do not give an extensive treatment of appropriate techniques of design and analysis (see Piggott, 1986). However, an appropriate methodology for the research and development problems of the chemical industry has been developed by Professor Box of Madison, Wisconsin. A full account of these methods is given by Box et al. (1978), Box and Draper (1987) and research papers in the journal Technometrics. Box’s methods have been used to improve the effectiveness of food research at the Hannah Research Institute (see Muir et al., 1991a,b).

Parallel to Box’s work, Taguchi developed a system of design and analysis for experiments within the wider framework of a methodology for quality and productivity improvement in manufacturing. His methods have been widely applied in the electronics and engineering industries in Japan and, more recently, in North America and Europe. For an up-to-date exposition of these methods of experimental design and analysis see Logothetis and Wynn (1989). Charteris (1992) reviews the literature with special reference to use in the food industry.

Although the Box and Taguchi methodologies were developed independently for different industries, the methods are similar.

THE STATISTICAL PROBLEM

In this paper it is assumed that experiments are carried out in a laboratory environment and that runs are done sequentially using the same pilot plant but with the controls altered to create the conditions specified for each run. Furthermore, it is assumed that the results of each run are available almost immediately after it has been completed.


The statistical problem considered in this paper is ‘How should a scientist organise a programme of experimentation where many potentially important factors are identified at the start?’

Box et al. (1978) have recommended that a programme of work on a particular problem should be divided into a number of experiments, with the first using not more than 25% of the available resources. In early experiments the aim is to identify important factors and in later experiments to optimise the values of the ‘active’ factors.

TABLE 1. Description of Factors and Specification of the Levels

Factor                                  Low level        High level
Powder
  Protein content (%)                   22.5             40.5
  Heat classification                   low heat         high heat
Recipe
  Solids content of soup (%)            11.0             15.0
  Powder content of soup (%)(a)         80.0             120.0
Processing
  Homogenisation pressure (psi)         1750             3500
  Sterilisation conditions              107°C/75 min     121°C/3 min

(a) Percentage based on original recipe.

IDENTIFICATION OF 'ACTIVE' FACTORS

In our experience ‘real’ problems have many factors. Typically, we have five or more. The one-at-a-time approach requires as many experiments as factors. In order to get estimates of error at least some of the runs must be repeated. After each experiment the ‘optimal’ level of the factor is set for all subsequent experiments. This strategy assumes that there are no interactions between factors. If there are important interactions this approach will fail to find the best combination of factor levels.
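To see why an interaction defeats the one-at-a-time strategy, consider a deliberately simple two-factor response that is hypothetical and not taken from this paper. Starting with both factors at their low level, changing either factor on its own lowers the response, so the one-at-a-time search stays where it started, while a 2 × 2 factorial finds the best combination directly. The function `response` and its coefficients are invented for illustration, and the response is noise-free.

```python
from itertools import product

LEVELS = (-1, 1)  # coded low and high levels


def response(a, b):
    """Hypothetical noise-free response with a strong a*b interaction."""
    return 5 + a + b + 2 * a * b


def one_at_a_time(start):
    """Vary one factor at a time, fixing each at its best level before moving on."""
    levels = list(start)
    for i in range(len(levels)):
        def trial(level, i=i):
            candidate = levels.copy()
            candidate[i] = level
            return response(*candidate)
        levels[i] = max(LEVELS, key=trial)
    return tuple(levels), response(*levels)


def full_factorial_best():
    """Evaluate every combination of levels and return the best one."""
    best = max(product(LEVELS, repeat=2), key=lambda p: response(*p))
    return best, response(*best)


print("one-at-a-time:", one_at_a_time((-1, -1)))
print("full factorial:", full_factorial_best())
```

Both searches use four runs here, but only the factorial looks at the combination in which both factors are changed together.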

An alternative and better approach is to vary the levels of all factors simultaneously. We recommend the use of a sequence of fractional replicate designs with few runs and many factors. Fractional factorial designs use a specially selected fraction of all possible treatment combinations. By assuming that some higher-order interactions are negligible, main effects and occasionally two-factor interactions can be estimated. Further details are given in Box et al. (1978).
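Which effects can be estimated from a fraction is governed by its defining contrasts: multiplying an effect by each defining contrast (a letter appearing twice cancels, since A × A = I) gives the effects it cannot be distinguished from. A minimal sketch of that algebra follows; the generators ABCD and ABEF for a 16-run fraction of six factors A to F are an illustrative assumption.

```python
from itertools import combinations


def word(letters):
    """Represent an effect such as 'ABCD' as a set of factor letters."""
    return frozenset(letters)


def multiply(*words):
    """Multiply effects: letters that appear an even number of times cancel."""
    result = frozenset()
    for w in words:
        result = result ^ w
    return result


def label(w):
    return "".join(sorted(w)) or "I"


def defining_relation(generators):
    """Every product of the generator words (the defining contrasts, excluding I)."""
    gens = [word(g) for g in generators]
    return {multiply(*combo)
            for r in range(1, len(gens) + 1)
            for combo in combinations(gens, r)}


def aliases(effect, generators):
    """Effects whose contrasts coincide with `effect` in the fraction."""
    return sorted(label(multiply(word(effect), w))
                  for w in defining_relation(generators))


gens = ["ABCD", "ABEF"]
print(sorted(label(w) for w in defining_relation(gens)))
print("A is aliased with", aliases("A", gens))
print("AB is aliased with", aliases("AB", gens))
```

With these generators every main effect is aliased only with three-factor and higher interactions, while two-factor interactions are aliased with each other (AB with CD and EF).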

It is very important for the scientist to select the factor levels carefully. For qualitative factors, such as catalyst or additive, the number of levels will be determined from the analysis of the problem. For quantitative factors, such as temperature and pressure, the number and choice of levels is arbitrary. It is recommended that for the initial experiment the quantitative factors should have two levels which are set as far apart as is sensible. If no response is detected then it can be assumed that the factor is not ‘active’.

In general, it is easier to find statistically adequate designs with all factors at two levels than with a mixture of two and three levels. A factor with four levels can be accommodated by using two two-level factors in its place. Catalogues of designs are given by Box et al. (1978) and Haaland (1989). Other designs with many factors and few runs are available (see, for example, Taguchi, 1986).
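As a sketch of how two two-level factors can stand in for one four-level factor: a full 2 × 2 in the pseudo-factors p and q visits four distinct combinations, each mapped to one level. The three degrees of freedom of the four-level factor are carried by p, q and their interaction pq, so pq must not be assigned to any other factor. The particular mapping of combinations to levels 1 to 4 is arbitrary and assumed here.

```python
from itertools import product

# Arbitrary (assumed) mapping of pseudo-factor combinations to four levels.
FOUR_LEVELS = {(-1, -1): 1, (1, -1): 2, (-1, 1): 3, (1, 1): 4}


def four_level_column(p, q):
    """Combine two coded (-1/+1) pseudo-factor columns into one four-level column."""
    return [FOUR_LEVELS[(a, b)] for a, b in zip(p, q)]


runs = list(product((-1, 1), repeat=2))  # full 2x2 in the pseudo-factors
p = [a for a, _ in runs]
q = [b for _, b in runs]
pq = [a * b for a, b in zip(p, q)]       # third degree of freedom of the factor

levels = four_level_column(p, q)
print("four-level factor by run:", levels)

# p, q and pq are mutually orthogonal contrasts among the four levels.
print(sum(a * b for a, b in zip(p, q)),
      sum(a * b for a, b in zip(p, pq)),
      sum(a * b for a, b in zip(q, pq)))
```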

Example

The effect of skim milk powder properties on the viscosity of tomato soup was investigated by Muir et al. (1991b). Six factors were considered at levels representing the extremes of processing conditions, as shown in Table 1.

There are 64 (2^6) treatment combinations of these six factors, and hence a replicated design would require at least 128 runs. The resources required for such a design were beyond those available to the project. Also, such a design commits the experimenter to factor levels at an early stage and does not allow the results of the early runs to be used to plan later runs. A much better strategy is to do a fraction of the full set of treatment combinations. A decision can then be made on whether to do a further complementary fraction, to alter the factor levels and start again, or even to stop the programme of experimentation.

It was decided that a design with 16 runs would provide adequate information on treatment effects. A suitable fractional factorial design was derived by using two four-factor interactions as defining contrasts (see chapter 12 of Box et al., 1978). The 16 treatment combinations together with the results are given in Table 2.
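The paper does not say which two four-factor interactions were used. On our reading of Table 2 (high = +1, low = -1, with A to F standing for protein, heat, solids, powder, homogenisation pressure and temperature), every run satisfies ABCD = +1 and ABEF = +1. The sketch below regenerates the fraction from the inferred generators D = ABC and F = ABE and checks that it contains exactly the 16 published runs. It also shows that the ACE contrast is -1 for runs 1 to 8 and +1 for runs 9 to 16, consistent with the two blocks described in the notes that follow, so blocks are confounded only with three-factor interactions (ACE = BDE = BCF = ADF).

```python
from itertools import product

# Table 2 recoded: + = high, - = low. Columns A-F: protein, heat, solids,
# powder, homogenisation pressure, temperature (sterilisation conditions).
TABLE2 = [
    "++++--", "--++++", "+--++-", "-++-+-",
    "------", "+-+--+", "-+-+-+", "++--++",
    "-++--+", "----++", "+-+-+-", "++++++",
    "+--+-+", "-+-++-", "++----", "--++--",
]
published = [tuple(1 if s == "+" else -1 for s in row) for row in TABLE2]

# Candidate generators (our inference, not stated in the paper):
# D = ABC and F = ABE, i.e. I = ABCD = ABEF (and so also CDEF).
generated = [(a, b, c, a * b * c, e, a * b * e)
             for a, b, c, e in product((-1, 1), repeat=4)]

same_runs = sorted(published) == sorted(generated)
abcd = {a * b * c * d for a, b, c, d, e, f in published}
abef = {a * b * e * f for a, b, c, d, e, f in published}
# The ACE contrast separates runs 1-8 from runs 9-16 (the two blocks).
block_contrast = [a * c * e for a, b, c, d, e, f in published]

print("same 16 runs:", same_runs)
print("ABCD values:", abcd, "ABEF values:", abef)
print("ACE by run:", block_contrast)
```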

Analysis. A fractional replicate design does not allow an independent estimate of error to be made, so statistical analysis can only proceed by making some assumptions. The initial analysis of this experiment was by analysis of variance, with the main effects of the six factors separated from error. The error was thus estimated from the remaining nine degrees of freedom, which are associated with interactions. This form of analysis is available in most statistical packages, including Minitab (Minitab Inc., 3081 Enterprise Drive, State College, PA 16801, USA) and Genstat 5 (copyright 1990, Lawes Agricultural Trust, Rothamsted Experimental Station, Harpenden, Herts, UK). However, it assumes that the interactions are small.
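A minimal sketch of this main-effects analysis of variance, using the coded levels and log viscosities read from Table 2 (high = +1, low = -1). Because the fraction is orthogonal, each main-effect sum of squares is (contrast)²/16, and whatever remains of the total sum of squares is pooled into the error term with nine degrees of freedom. Only the standard library is used; the layout of the printed table is our own.

```python
# Table 2 recoded: + = high, - = low. Columns: protein, heat, solids, powder,
# homogenisation pressure, temperature (sterilisation conditions).
TABLE2 = [
    "++++--", "--++++", "+--++-", "-++-+-",
    "------", "+-+--+", "-+-+-+", "++--++",
    "-++--+", "----++", "+-+-+-", "++++++",
    "+--+-+", "-+-++-", "++----", "--++--",
]
Y = [1.044, 0.940, 0.675, 1.068, 0.758, 1.075, 0.614, 0.820,
     1.212, 0.716, 1.059, 1.000, 0.669, 0.656, 0.732, 1.036]
NAMES = ["Protein", "Heat", "Solids", "Powder", "Homogenisation", "Temperature"]

X = [[1 if s == "+" else -1 for s in row] for row in TABLE2]
n = len(Y)
grand_mean = sum(Y) / n
ss_total = sum((y - grand_mean) ** 2 for y in Y)

# One degree of freedom per two-level main effect; orthogonality makes
# each sum of squares (contrast ** 2) / n.
ss = {}
for j, name in enumerate(NAMES):
    contrast = sum(x[j] * y for x, y in zip(X, Y))
    ss[name] = contrast ** 2 / n

ss_error = ss_total - sum(ss.values())
df_error = (n - 1) - len(NAMES)  # 15 - 6 = 9 interaction df pooled as error
ms_error = ss_error / df_error

print(f"{'Source':<16}{'df':>3}{'SS':>10}{'F':>8}")
for name, s in ss.items():
    print(f"{name:<16}{1:>3}{s:>10.4f}{s / ms_error:>8.1f}")
print(f"{'Error':<16}{df_error:>3}{ss_error:>10.4f}")
```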

TABLE 2. Factor Levels for Each Experimental Run Together with the Resulting Viscosity

Run  Protein  Heat  Solids  Powder  Homogenisation pressure  Temperature  Log viscosity
 1   high     high  high    high    low                      low          1.044
 2   low      low   high    high    high                     high         0.940
 3   high     low   low     high    high                     low          0.675
 4   low      high  high    low     high                     low          1.068
 5   low      low   low     low     low                      low          0.758
 6   high     low   high    low     low                      high         1.075
 7   low      high  low     high    low                      high         0.614
 8   high     high  low     low     high                     high         0.820
 9   low      high  high    low     low                      high         1.212
10   low      low   low     low     high                     high         0.716
11   high     low   high    low     high                     low          1.059
12   high     high  high    high    high                     high         1.000
13   high     low   low     high    low                      high         0.669
14   low      high  low     high    high                     low          0.656
15   high     high  low     low     low                      low          0.732
16   low      low   high    high    low                      low          1.036

A more rigorous approach to the analysis of fractional replicate designs was first advocated by Daniel (1959) and is described in chapter 12 of Box et al. (1978). Using Yates’s algorithm, the 15 degrees of freedom between the 16 runs were associated with 15 treatment effects. These treatment effects were ordered and a normal plot made. Normal probability paper can be used to facilitate this process. Alternatively, normal deviates corresponding to the points (i - 0.5)/15, i = 1, ..., 15, on the cumulative normal distribution can be formed and the graphs prepared by computer. If the treatments had no effect, we would expect the points to lie on a straight line passing through the origin; departures are evidence of treatment effects. This technique takes account of the fact that the largest effects are being singled out for attention, and also allows for the possibility of interaction effects. When there are few runs, there are advantages in making use of the symmetry of the normal distribution and using a half-normal plot. A plot of the ordered absolute treatment effects on log viscosity is given in Fig. 1. Inspection of the plot shows that solids and powder are the only two effects which deviate from the trend line. There is no statistical evidence of an interaction between these factors.

A reduced model, with the effects of solids and powder separated from error, was fitted to the data and the residuals obtained. The validity of the model was tested by the following plots:

(a) residuals against run order: a check for drift over the experiment;
(b) residuals against fitted values: a check for variance-mean relationships; and
(c) a normal plot of residuals: a check for goodness of fit of the model.
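The quantities behind checks (a) to (c) can be sketched as follows for the reduced model, again using the Table 2 values coded -1/+1. Because the design is orthogonal, the solids and powder coefficients are simply their contrasts divided by 16. For each run, in run order, the code prints the fitted value, the residual and the residual's normal score, which are the coordinates needed for the three plots. The normal scores use positions (rank - 0.5)/n, matching the form of the plotting positions given earlier in this section. The plotting itself is left out.

```python
from statistics import NormalDist

# Table 2 recoded: + = high, - = low (protein, heat, solids, powder,
# homogenisation pressure, temperature).
TABLE2 = [
    "++++--", "--++++", "+--++-", "-++-+-",
    "------", "+-+--+", "-+-+-+", "++--++",
    "-++--+", "----++", "+-+-+-", "++++++",
    "+--+-+", "-+-++-", "++----", "--++--",
]
Y = [1.044, 0.940, 0.675, 1.068, 0.758, 1.075, 0.614, 0.820,
     1.212, 0.716, 1.059, 1.000, 0.669, 0.656, 0.732, 1.036]
SOLIDS, POWDER = 2, 3  # positions of these factors in each coded row

rows = [[1 if s == "+" else -1 for s in row] for row in TABLE2]
n = len(Y)
grand_mean = sum(Y) / n
# Orthogonal -1/+1 columns: each coefficient is its contrast divided by n.
b_s = sum(r[SOLIDS] * y for r, y in zip(rows, Y)) / n
b_p = sum(r[POWDER] * y for r, y in zip(rows, Y)) / n

fitted = [grand_mean + b_s * r[SOLIDS] + b_p * r[POWDER] for r in rows]
residuals = [y - f for y, f in zip(Y, fitted)]

# Normal scores for plot (c): rank the residuals, positions (rank - 0.5) / n.
scores = [0.0] * n
for rank, i in enumerate(sorted(range(n), key=residuals.__getitem__), start=1):
    scores[i] = NormalDist().inv_cdf((rank - 0.5) / n)

# Columns for plots (a) run order, (b) fitted value and (c) normal score.
for run in range(n):
    print(f"run {run + 1:2d}  fitted {fitted[run]:.3f}  "
          f"residual {residuals[run]:+.3f}  normal score {scores[run]:+.2f}")
```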

These plots were judged to be satisfactory and the reduced model was accepted as being a good summary of the results of this experiment.
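Returning to the half-normal analysis, the sketch below applies Yates’s algorithm to the Table 2 data. It relies on our reading of Table 2, in which protein, heat, solids and homogenisation pressure (A, B, C, E) form a full 2⁴ while powder = ABC and temperature = ABE. The runs are put into standard order, and each of the 15 contrasts is labelled with its shortest aliases. For the half-normal plotting positions it uses Φ⁻¹(0.5 + 0.5(i - 0.5)/15), a common half-normal counterpart of the normal positions (i - 0.5)/15; conventions vary, so treat this as an assumption. The printed values are half-effects (contrast/16). On this scale solids and powder come out near 0.17 and 0.05, which appears consistent with the scale of Fig. 1.

```python
from statistics import NormalDist

# Table 2 recoded: + = high, - = low. Columns A-F: protein, heat, solids,
# powder, homogenisation pressure, temperature (sterilisation conditions).
TABLE2 = [
    "++++--", "--++++", "+--++-", "-++-+-",
    "------", "+-+--+", "-+-+-+", "++--++",
    "-++--+", "----++", "+-+-+-", "++++++",
    "+--+-+", "-+-++-", "++----", "--++--",
]
Y = [1.044, 0.940, 0.675, 1.068, 0.758, 1.075, 0.614, 0.820,
     1.212, 0.716, 1.059, 1.000, 0.669, 0.656, 0.732, 1.036]
COLUMN = {f: i for i, f in enumerate("ABCDEF")}
BASE = "ABCE"  # a full 2^4 in these; D = ABC and F = ABE (read from Table 2)
DEFINING = [set("ABCD"), set("ABEF"), set("CDEF")]
NAMES = {"A": "Protein", "B": "Heat", "C": "Solids", "D": "Powder",
         "E": "Homogenisation", "F": "Temperature"}


def yates(y):
    """Yates's algorithm: k passes of pairwise sums followed by differences."""
    col = list(y)
    for _ in range(len(col).bit_length() - 1):
        pairs = list(zip(col[0::2], col[1::2]))
        col = [a + b for a, b in pairs] + [b - a for a, b in pairs]
    return col


def label(index):
    """Name effect `index` (bit k set = BASE[k] present) by its shortest aliases."""
    word = {BASE[k] for k in range(len(BASE)) if index >> k & 1}
    chain = [word] + [word ^ d for d in DEFINING]
    shortest = min(len(w) for w in chain)
    names = sorted("".join(sorted(w)) for w in chain if len(w) == shortest)
    return NAMES[names[0]] if shortest == 1 else "=".join(names)


def std_index(row):
    """Position of a run in standard order: A changes fastest, then B, C, E."""
    return sum(1 << k for k, f in enumerate(BASE) if row[COLUMN[f]] == "+")


ordered = [y for _, y in sorted(zip(map(std_index, TABLE2), Y))]
contrasts = yates(ordered)
# Half-effects (contrast / 16): regression coefficients on the -1/+1 scale.
effects = {label(j): contrasts[j] / 16 for j in range(1, 16)}

# Half-normal plotting positions for the ordered absolute effects.
m = len(effects)
ranked = sorted(effects.items(), key=lambda kv: abs(kv[1]))
for i, (name, value) in enumerate(ranked, start=1):
    z = NormalDist().inv_cdf(0.5 + 0.5 * (i - 0.5) / m)
    print(f"{z:5.2f}  {abs(value):.4f}  {name}")
```

Doubling a half-effect gives the difference between the high-level and low-level means.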

The treatment means are given in Table 3. The small standard errors of differences, and hence the high sensitivity of the experiment, result from ‘hidden replication’ (each level mean is an average over eight of the 16 runs) and from the good fit of the model to the data.

Since no evidence was found that the powder properties (protein content and heat classification) affected the viscosity of the soup, the programme of experiments was stopped.

An alternative method of analysis has been proposed by Box and Meyer (1986) using a Bayesian argument to give a probability of each effect being ‘active’.

TABLE 3. Summary of the Statistically Significant Results

Factor    Low level    High level    Standard error of difference
Solids    1.37         2.50          0.211
Powder    2.25         1.62          0.211

FIG. 1. Half-normal plot of treatment effects on log viscosity.

Notes (1) This particular experiment was designed in two blocks. One block consisted of the first eight factor combinations and the other of the remaining eight combinations. It was envisaged that each block would take a day. The division into two blocks was done in such a way as to minimise the effects of differences between days. It was later found possible to do all 16 runs on the same day.

(2) If a further experiment had been deemed appropriate, a complementary fractional design would have been used. This would allow the main effects to be more precisely estimated and all two-factor interactions to be separated. The initial design would not have been repeated because it is more important to evaluate all two-factor interactions than to obtain independent estimates of error.

(3) If each run had required a large amount of resources, it would have been possible to use one block of the design on its own to estimate the main effects of the treatments.

OPTIMISATION OF LEVELS

After the ‘active’ factors have been identified, the usual aim is to find the values of the continuous factors which give the optimum results. The simplest design is a grid of points, but this is known to be wasteful. A much better method is that described by Box and Draper (1987), who recommend a regression approach which they call ‘response surface methodology’. The potential of this approach is illustrated by the following example.

Example

A factorial experiment, which examined seven factors in eight runs, identified two components of the ‘recipe’ as influencing the viscosity of the final product. The purpose of this stage of experimentation was to find values of the two components which gave viscosity values of 30-35, and then to map this region of the component space. It was important to locate a region where small changes in the components did not lead to large changes in the viscosity.
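The paper does not say which eight-run design was used for the seven factors. A common textbook choice takes a full 2³ in A, B and C and assigns the four remaining factors to the interaction columns; the generators D = AB, E = AC, F = BC and G = ABC below are that assumption, not the authors' design. In such a saturated design every main effect is aliased with two-factor interactions, which is acceptable only for screening.

```python
from itertools import product


def seven_factors_in_eight_runs():
    """Assumed 2^(7-4) design: full 2^3 in A, B, C; D=AB, E=AC, F=BC, G=ABC."""
    runs = []
    for c, b, a in product((-1, 1), repeat=3):  # a changes fastest
        runs.append((a, b, c, a * b, a * c, b * c, a * b * c))
    return runs


design = seven_factors_in_eight_runs()
for run in design:
    print(" ".join(f"{x:+d}" for x in run))

# Every column is balanced and every pair of columns is orthogonal.
cols = list(zip(*design))
balanced = all(sum(col) == 0 for col in cols)
orthogonal = all(sum(x * y for x, y in zip(cols[i], cols[j])) == 0
                 for i in range(7) for j in range(i + 1, 7))
print("balanced:", balanced, "orthogonal:", orthogonal)
```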

Step 1. The initial experiment used five different combinations of the two components. Component 1 was assigned a low level of 30 and a high level of 50, whilst component 2 was assigned a low level of 0.5 and a high level of 1.5. The design consisted of all four combinations of these levels together with two centre points (component 1 = 40, component 2 = 1.0). The experiment was repeated, in order to increase confidence, before the results were analysed. The values of the two components and the viscosity response for each run are given in Table 4.

TABLE 4. Levels of Components 1 and 2 for Each Run of the Initial Experiment together with the Resulting Viscosity

Run  Block  Component 1  Component 2  Viscosity
 1   1      40           1.0           9.70
 2   1      30           0.5           9.95
 3   1      50           0.5          22.45
 4   1      30           1.5           6.65
 5   1      50           1.5          12.55
 6   1      40           1.0           9.45
 7   2      40           1.0           9.80
 8   2      30           0.5          11.55
 9   2      50           0.5          22.15
10   2      30           1.5           7.70
11   2      50           1.5          17.60
12   2      40           1.0          11.20

The results show clearly that the desired part of the response surface had not yet been located. The following equation was fitted to the data by least squares:

$$\text{Viscosity} = -1.49 + 0.486\,\text{comp}_1 - 5.40\,\text{comp}_2$$

Using this equation, a contour map of the results was prepared, with the design points marked ‘X’ and the mean viscosity of these points shown alongside (Fig. 2). For the second experiment, the centre of the design should be moved in a direction at right angles to the contours, that is, along the path of steepest ascent.
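A sketch, assuming ordinary least squares on the 12 runs of Table 4 with the block ignored. It reproduces the first-order equation above and then computes points on the path of steepest ascent. The path is computed in coded units (each component scaled by its half-range of 10 or 0.5), which is an assumption: the direction ‘at right angles to the contours’ depends on how the axes are scaled. The sketch does not try to reproduce the centre actually chosen in Step 2, and predictions outside the region studied are extrapolations.

```python
# Table 4: (component 1, component 2, viscosity) for runs 1-12.
TABLE4 = [
    (40, 1.0, 9.70), (30, 0.5, 9.95), (50, 0.5, 22.45), (30, 1.5, 6.65),
    (50, 1.5, 12.55), (40, 1.0, 9.45), (40, 1.0, 9.80), (30, 0.5, 11.55),
    (50, 0.5, 22.15), (30, 1.5, 7.70), (50, 1.5, 17.60), (40, 1.0, 11.20),
]


def least_squares(X, y):
    """Ordinary least squares via the normal equations and Gauss-Jordan elimination."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * v for r, v in zip(X, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


X = [[1.0, c1, c2] for c1, c2, _ in TABLE4]
y = [v for _, _, v in TABLE4]
b = least_squares(X, y)
print(f"Viscosity = {b[0]:.2f} {b[1]:+.3f}*comp_1 {b[2]:+.2f}*comp_2")

# Steepest ascent in coded units (centre 40, 1.0; half-ranges 10, 0.5).
centre, half = (40.0, 1.0), (10.0, 0.5)
g = (b[1] * half[0], b[2] * half[1])  # gradient of the coded first-order model
path = []
for t in (1, 2, 3):                   # one coded unit of component 1 per step
    c1 = centre[0] + half[0] * t
    c2 = centre[1] + half[1] * t * g[1] / g[0]
    path.append((c1, c2, b[0] + b[1] * c1 + b[2] * c2))
for c1, c2, pred in path:
    print(f"component 1 = {c1:.1f}, component 2 = {c2:.2f}, "
          f"predicted viscosity = {pred:.1f}")
```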

Step 2. A new centre with component 1 of 65 and component 2 of 1.2 was used for the second experiment. The viscosity values for this combination of components were 95 and 75. For component 1 of 75 and component 2 of 0.8 and 1.6, the viscosity was greater than 1000. This experiment was quickly abandoned and a new centre point of 55 and 1.5 was selected.

[Figure not reproduced: component 1 (horizontal axis, 30 to 55) against component 2 (vertical axis).]

FIG. 2. Contour plot of first-order response surface with data points superimposed.


TABLE 5. Levels of Components 1 and 2 for Each Run of the Third Experiment together with the Resulting Viscosity

Run  Block  Component 1  Component 2  Viscosity
 1   1      55           1.5          25.0
 2   1      50           1.0          19.0
 3   1      60           1.0          44.0
 4   1      50           2.0          18.1
 5   1      60           2.0          31.0
 6   1      55           1.5          24.9
 7   2      55           1.5          23.5
 8   2      50           1.0          16.6
 9   2      60           1.0          39.1
10   2      50           2.0          18.6
11   2      60           2.0          29.6
12   2      55           1.5          24.6

Step 3. The value of each component and the resulting viscosity values for the runs of this experiment are given in Table 5. The results show that a suitable part of the response surface has been located. The following equation was fitted to the data:

$$\text{Viscosity} = -161.3 + 3.56\,\text{comp}_1 + 59.5\,\text{comp}_2 - 1.180\,\text{comp}_1\,\text{comp}_2$$

Using this equation a contour map of the results was produced (Fig. 3).
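A least-squares sketch for this step, assuming the model has the four terms shown (intercept, comp_1, comp_2 and their product) and ignoring blocks, as the published equation does. Applied to the 12 runs of Table 5, it reproduces the Step 3 coefficients to the precision quoted. Only the standard library is used.

```python
# Table 5: (component 1, component 2, viscosity) for runs 1-12.
TABLE5 = [
    (55, 1.5, 25.0), (50, 1.0, 19.0), (60, 1.0, 44.0), (50, 2.0, 18.1),
    (60, 2.0, 31.0), (55, 1.5, 24.9), (55, 1.5, 23.5), (50, 1.0, 16.6),
    (60, 1.0, 39.1), (50, 2.0, 18.6), (60, 2.0, 29.6), (55, 1.5, 24.6),
]


def least_squares(X, y):
    """Ordinary least squares via the normal equations and Gauss-Jordan elimination."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * v for r, v in zip(X, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Columns: intercept, component 1, component 2, their product (interaction).
X = [[1.0, c1, c2, c1 * c2] for c1, c2, _ in TABLE5]
y = [v for _, _, v in TABLE5]
b = least_squares(X, y)
for name, coef in zip(["intercept", "comp_1", "comp_2", "comp_1*comp_2"], b):
    print(f"{name:>14}: {coef:9.3f}")
```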

Notes (1) At this point it was decided that the diagram given in Fig. 3 provided sufficient evidence to choose values of component 1 and component 2 giving viscosity values of 30-35.
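Settings that give viscosity values of 30-35 can also be found algebraically. For fixed component 1, the Step 3 equation is linear in component 2, so the component 2 value on any contour can be solved for directly. The sketch uses the rounded coefficients as printed. The component 1 values chosen and the region bounds 1.0 to 2.0 (taken from Table 5) are illustrative.

```python
# Step 3 equation with the rounded coefficients as printed.
B0, B1, B2, B12 = -161.3, 3.56, 59.5, -1.180


def predict(c1, c2):
    return B0 + B1 * c1 + B2 * c2 + B12 * c1 * c2


def component2_on_contour(level, c1):
    """Component 2 at which predicted viscosity equals `level` for fixed component 1."""
    slope = B2 + B12 * c1  # change in viscosity per unit of component 2
    if abs(slope) < 1e-12:
        return None  # contour does not cross this component 1 value
    return (level - B0 - B1 * c1) / slope


for c1 in (56, 58, 60):
    c2_30 = component2_on_contour(30, c1)
    c2_35 = component2_on_contour(35, c1)
    inside = all(1.0 <= c2 <= 2.0 for c2 in (c2_30, c2_35))
    print(f"component 1 = {c1}: 30-35 band spans component 2 "
          f"{min(c2_30, c2_35):.2f} to {max(c2_30, c2_35):.2f}"
          + ("" if inside else " (partly outside the region studied)"))
```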

(2) Three extra points, two with component 1 of 60 and one with 65, all with component 2 = 1.5, when added to the existing centre point and the two points with component 1 of 60, would form a new design centred on component 1 = 60, component 2 = 1.5. This would allow an equation to be fitted in the region of interest and hence a more precise set of contours to be drawn.

FIG. 3. Contour plot of second-order response surface with data points superimposed.

(3) An extra block of runs in the same region of the parameter space, with the points carefully chosen to complement the existing points, would allow a more complicated equation to be fitted to the data, and this in turn would allow further improvements in the precision of the contours.

The points for this block would be:

                   Component 1   Component 2
Centre points      60            1.5
Factorial points   56.5          1.15
                   56.5          1.85
                   63.5          1.15
                   63.5          1.85

The two blocks of runs would form a ‘star’ arrangement. The ‘star’ design is particularly effective for fitting second-order response surfaces (see Box et al., 1978, chapter 15).
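One way to see the geometry is to code the points on a common scale. Centre on component 1 = 60, component 2 = 1.5, and divide by the half-widths of the factorial block in note (3) (3.5 and 0.35). The factorial points then sit at ±1, and the points from note (2) sit on the axes at about ±1.43. That is close to α = √2 ≈ 1.414, the axial distance that makes a central composite design with a 2² factorial core rotatable. The sketch below only performs the coding; reading the two blocks as a central composite design is our interpretation.

```python
from math import sqrt

CENTRE = (60.0, 1.5)
STEP = (3.5, 0.35)  # half-widths of the factorial block in note (3)

axial_block = [(55, 1.5), (65, 1.5), (60, 1.0), (60, 2.0)]  # points from note (2)
factorial_block = [(56.5, 1.15), (56.5, 1.85), (63.5, 1.15), (63.5, 1.85)]


def coded(point):
    """Express a (component 1, component 2) point in coded units."""
    return tuple(round((x - c) / s, 3) for x, c, s in zip(point, CENTRE, STEP))


print("factorial:", [coded(p) for p in factorial_block])
print("axial:", [coded(p) for p in axial_block])
alpha = max(abs(v) for p in axial_block for v in coded(p))
print("axial distance", alpha, "vs rotatable value", round(sqrt(2), 3))
```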

The work described in this paper was supported by funds from the Scottish Office Agriculture and Fisheries Department.

CONCLUSIONS

A statistical methodology exists to allow food scientists to design and analyse experiments (see Box et al., 1978; Box & Draper, 1987). These methods are also applicable to the choice of treatments in experiments where the products are evaluated by sensory means (Drew & Whitear, 1980).

It is recommended that a programme of work is broken down into a number of experiments each with a modest number of runs. The results of the first experiment are used to design the second experiment. Later experiments make use of the results of all previous experiments.

These methods are routinely used in the chemical industry, and their adoption at the Hannah Research Institute has improved the effectiveness of its research.

REFERENCES

Box, G. E. P. & Draper, N. R. (1987). Empirical Model-Building and Response Surfaces. John Wiley & Sons, New York.

Box, G. E. P. & Meyer, R. D. (1986). An analysis of unreplicated fractional factorials. Technometrics, 28, 11-18.

Box, G. E. P., Hunter, W. G. & Hunter, J. S. (1978). Statistics for Experimenters. John Wiley & Sons, New York.


Charteris, W. (1992). Taguchi’s system of experimental design and data analysis: A quality engineering technology for the food industry. Dairy Technology, 45, 33-49.

Daniel, C. (1959). Use of half-normal plot in interpreting factorial two-level experiments. Technometrics, 1, 311-41.

Drew, I. & Whitear, A. L. (1980). Flavour stability studies: The use of fractional factorial designs. Journal of the Institute of Brewing, 86, 269-73.

Haaland, P. D. (1989). Experimental Design in Biotechnology. Marcel Dekker, New York.

Logothetis, N. & Wynn, H. P. (1989). Quality Through Design. Oxford Science Publications, Clarendon Press, Oxford.

Muir, D. D., Hunter, E. A. & West, I. G. (1991a). Optimisation of the properties of dried skim-milk for use in white sauce suitable for use with frozen products. Dairy Technology, 44, 2CL3.

Muir, D. D., Hunter, E. A. & West, I. G. (1991b). Skim milk powder for use in cream of tomato soup. Dairy Technology, 44, 524.

Piggott, J. R. (1986). Statistical Procedures in Food Research. Elsevier Applied Science Publishers, Barking, UK.

Taguchi, G. (1986). Introduction to Quality Engineering (Designing Quality into Products and Processes). American Supplier Institute, Dearborn, Michigan.