Efficient handling of raw material variation in industry

Tormod Naes

Ingunn Berget (post-doc)

Kjetil Joergensen (Ph.D. student)

Overview

• IBION-project

• The problem. Handling raw material variation

– Why?

– Different aspects

• Building relevant models

– Data/design + modelling

• Different types of use of the equation

• Robustness

• Sorting

• Continuous updating

• Combinations

• Robustness, validation

Industrial Biostatistics Network (IBION) www.ibion.no

Efficient use of raw materials in industry

The research project IBION (Industrial Biostatistics Network) is a consortium with partners from

five Norwegian bio-processing companies (Tine, Ewos, Stabburet, Mills, Borregaard)

the software and consulting company Prediktor and

three research institutes:

– Department of Mathematics, Agricultural University of Norway (AUN), Ås

– Matforsk, Norwegian Food Research Institute, Ås

– CPAC, University of Washington, Seattle.

The project is financed by the industrial partners and by The Research Council of Norway (NFR).

It was started July 1, 2001 and will continue until July 1, 2005. The total budget for the project is about 40 million NOK.

Handling raw material variation

Why important?

• Raw materials vary in quality

• Raw material costs represent a large portion of the total costs

• Customers require good and stable quality

Handling raw material variation

• Adjust processes to account for unwanted raw material variation

– Stable, good quality

– Avoid waste

• Utilise potential in raw material

– Best raw materials for best product

– Good combinations

– Quality and cost

In all these cases

We would like to have a model or a strategy that can be used to tell us what to do when a batch of raw material is received and characterised

“process=f(raw materials, target values of output)”

Important

• All serious industries have strategies and techniques for handling raw material variation (including the partners)

– Expert knowledge, practical experience

• We help them improve their strategies

– Using statistical/chemometric methods

Important criteria

• Methods should be easy to use and understand

• Results should be easy to interpret

• Methods should stimulate user interaction

• Methods should be flexible/versatile/robust

• Realistic validation of results

Important steps

• Problem formulation

• Measurements: where? what type?

• Data collection

– Design or historical data?

• Modelling

– Type of model, variable selection

• Use of the models

– Interpretation, optimisation

• Properties, robustness, validation

• New round?

Collaboration

• Typical area for collaboration

– raw material knowledge

– process knowledge

– spectroscopy

– statistics

• Without close collaboration, no results!

Knowledge available

• A large number of useful components are available

• Experimental design

• Empirical modelling (polynomials)

• Variable selection

• Optimisation

• Validation

• etc.

• Little focus on this particular problem area

Raw material handling

• The best that can be done before processing starts

• Should be followed up by process monitoring and/or control strategies when appropriate.

Problem Description

[Diagram: process parameters and raw materials are the inputs to the PROCESS, whose output is the final product quality, shown along a time axis.]

Different approaches

• Robustness with respect to raw material variation

• Continuous updating of process settings

• Sorting of raw materials (define classes with corresponding optimal processing and with good properties in each)

• Combinations

• Useful for different situations, depending on local conditions

Two pieces of work done by students: one related to modelling, one related to use of models

• Design and analysis strategy for situations with uncontrolled raw material variation

– Jorgensen and Næs, (2004). J. Chem. (in press)

• Optimal sorting of raw materials based on the predicted end product quality.

– Berget and Næs (2002). Qual. Eng.

• Batch oriented

A possible strategy for model building

K. Jorgensen and T. Næs. (2004). J. Chem. (in press)

Proposal for a strategy and a case study: cheese, dry matter

• End product quality = F(raw materials, process settings)

– In practice often used inverted

• Problems

– Raw materials are natural products, can not be controlled/designed

• How can we set up an experimental design?

– Raw material characterisation

• Time consuming

• Sometimes one does not know what to measure
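The "inverted" use mentioned above, where process settings are chosen from a measured raw material and a target, can be sketched numerically. This is a minimal sketch only: `predict_quality`, its coefficients and the candidate range are hypothetical and are not taken from the cheese study.

```python
import numpy as np

def predict_quality(x, z):
    """Hypothetical forward model: quality = F(raw material z, process setting x)."""
    return 0.40 + 0.02 * z + 0.03 * x - 0.002 * x ** 2

def invert_model(z, target, x_candidates):
    """Return the candidate setting whose predicted quality is closest to the target."""
    predicted = predict_quality(x_candidates, z)
    return x_candidates[np.argmin((predicted - target) ** 2)]

# Assumed feasible range of the process setting (illustrative only).
x_candidates = np.linspace(0.0, 7.0, 701)
x_best = invert_model(z=3.3, target=0.55, x_candidates=x_candidates)
```

A grid search is used here only because it is easy to read; any one-dimensional root finder or optimiser could take its place.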

Possible solution

• Design

– Block design with raw materials as blocks

• Measurements

– Spectroscopy; use principal components of spectra directly in the model

end product quality = F(PCs, process)
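The idea of using principal component scores of the spectra as covariates next to the process factors can be illustrated as follows. This is a sketch under stated assumptions: the spectra, design matrix and response are simulated stand-ins, the number of components (6) is chosen only for illustration, and interaction terms are left out for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
n_runs, n_wavenumbers = 32, 200

# Simulated stand-ins: one spectrum per run, a coded 2-level design, a response.
spectra = rng.normal(size=(n_runs, n_wavenumbers))
design = rng.choice([-1.0, 1.0], size=(n_runs, 4))
quality = rng.normal(size=n_runs)

# PCA via SVD of the column-centred spectra; scores are U * S.
centred = spectra - spectra.mean(axis=0)
u, s, vt = np.linalg.svd(centred, full_matrices=False)
n_components = 6
scores = u[:, :n_components] * s[:n_components]
loadings = vt[:n_components].T          # one column per component

# Least-squares fit of quality on intercept + PC scores + process factors.
X = np.column_stack([np.ones(n_runs), scores, design])
coef, *_ = np.linalg.lstsq(X, quality, rcond=None)
residual_df = n_runs - X.shape[1]
```

The loadings are kept because interpreting which spectral regions drive each component is part of the strategy.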

Experiment

• 4 two-level factors + 3 two-level block factors

– 7 factors in total

– 2^4 factorial design in 8 blocks

• Design factors

– protein content

– renneting time

– amount of starter culture added

– coagulum cutting

Design variable              Low (-) value    High (+) value
Protein content (A)          3.15%            3.50%
Renneting time (B)           “Standard”       “Standard” + 7 minutes
Starter culture added (C)    1.7%             2.2%
Coagulum cutting (D)         “Reduced”        “Standard”

Replicate #1                      Replicate #2
Run #  Block  A  B  C  D          Run #  Block  A  B  C  D
 1     1      -  -  -  -          17     5      +  +  +  -
 2     1      -  +  +  +          18     5      +  -  -  +
 3     1      +  -  +  +          19     5      -  -  +  -
 4     1      +  +  -  -          20     5      -  +  -  +
 5     2      +  +  +  +          21     6      -  -  -  +
 6     2      +  -  -  -          22     6      +  -  +  -
 7     2      -  -  +  +          23     6      +  +  -  +
 8     2      -  +  -  -          24     6      -  +  +  -
 9     3      -  +  -  +          25     7      -  +  +  +
10     3      -  -  +  -          26     7      +  +  -  -
11     3      +  +  +  -          27     7      -  -  -  -
12     3      +  -  -  +          28     7      +  -  +  +
13     4      -  -  -  +          29     8      +  +  +  +
14     4      -  +  +  -          30     8      -  +  -  -
15     4      +  -  +  -          31     8      -  -  +  +
16     4      +  +  -  +          32     8      +  -  -  -
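The block assignment in replicate 1 of the table appears consistent with confounding the interactions ABC and CD (and hence their product ABD) with blocks. The sketch below generates a 2^4 factorial split that way; the function name and the block numbering are illustrative choices, not part of the original study.

```python
from itertools import product

def blocked_factorial():
    """Full 2^4 factorial in A, B, C, D split into 4 blocks of 4 runs.

    Blocks are defined by the signs of ABC and CD, so ABC, CD and their
    product ABD are confounded with blocks.
    """
    block_of = {(-1, 1): 1, (1, 1): 2, (1, -1): 3, (-1, -1): 4}
    runs = []
    for a, b, c, d in product([-1, 1], repeat=4):
        block = block_of[(a * b * c, c * d)]
        runs.append({"block": block, "A": a, "B": b, "C": c, "D": d})
    return sorted(runs, key=lambda run: run["block"])

runs = blocked_factorial()
```

Because CD is confounded with blocks, only five of the six two-factor interactions remain estimable, which matches the 5 degrees of freedom for 2-way interactions in the ANOVA tables.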

ANOVA table
Source                DF   Adjusted SS   Adjusted MS   F       P-value
Blocks                 7   0.0334        0.00478        4.38   0.008
Main Effects           4   0.0523        0.01308       12.00   0.000
2-Way Interactions     5   0.0010        0.00020        0.19   0.963
Residual Error        15   0.0163        0.00109
Total                 31
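As a check on how the table entries relate, each mean square is SS/DF and each F ratio is the mean square divided by the residual mean square. The short sketch below redoes that arithmetic from the rounded values in the table, so the results agree with the printed F values only up to rounding.

```python
# Sums of squares and degrees of freedom copied from the ANOVA table above.
anova = {
    "Blocks": (0.0334, 7),
    "Main effects": (0.0523, 4),
    "2-way interactions": (0.0010, 5),
}
ss_resid, df_resid = 0.0163, 15
ms_resid = ss_resid / df_resid

f_ratios = {}
for source, (ss, df) in anova.items():
    mean_square = ss / df                        # MS = SS / DF
    f_ratios[source] = mean_square / ms_resid    # F = MS / MS_residual
```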

Estimated regression coefficients and p-values for individual main effects
Term              Coefficient   SE of coeff.   P-value
Protein            0.003        0.0058         0.565
Starter           -0.007        0.0058         0.237
Renneting time     0.037        0.0058         0.000
Cutting            0.015        0.0058         0.019

ANOVA table
Source                DF   Adjusted SS   Adjusted MS   F      P-value
Covariates             6   0.0328        0.00547       5.16   0.004
Main Effects           4   0.0352        0.00879       8.29   0.001
2-Way Interactions     5   0.0014        0.00029       0.27   0.922
Residual Error        16   0.0170        0.00106
Total                 31

Estimated regression coefficients and p-values for individual main effects and principal components
Term              Coefficient   SE of coeff.   P-value
PC 1              -0.093        0.1202         0.449
PC 2               0.026        0.0061         0.001
PC 3               0.024        0.0285         0.420
PC 4               0.023        0.0087         0.017
PC 5               0.009        0.0076         0.239
PC 6              -0.007        0.0081         0.436
Protein            0.099        0.1216         0.426
Starter           -0.007        0.0060         0.291
Renneting time     0.030        0.0063         0.000
Cutting            0.017        0.0064         0.018

Same level of residual standard deviation

ANOVA table
Source                DF   Adjusted SS   Adjusted MS   F       P-value
Covariates             2   0.0304        0.01519       15.65   0.000
Main Effects           4   0.0410        0.01024       10.56   0.000
2-Way Interactions     5   0.0013        0.00026        0.26   0.928
Residual Error        20   0.0194        0.00097
Total                 31

Estimated regression coefficients and p-values for individual main effects and principal components
Term              Coefficient   SE of coeff.   P-value
PC 2               0.026        0.0058         0.000
PC 4               0.018        0.0062         0.007
Protein            0.004        0.0055         0.455
Starter           -0.007        0.0057         0.241
Renneting time     0.033        0.0057         0.000
Cutting            0.016        0.0057         0.010
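The remark that the residual standard deviation stays at the same level can be checked directly from the residual mean squares of the three ANOVA tables, since the residual standard deviation is the square root of the residual mean square. The dictionary keys below are only descriptive labels chosen for this sketch.

```python
import math

# Residual mean squares from the three ANOVA tables:
# blocks as factors, six PCs as covariates, and PC 2 + PC 4 only.
residual_ms = {"blocks": 0.00109, "6 PCs": 0.00106, "PC 2 and PC 4": 0.00097}
residual_sd = {name: math.sqrt(ms) for name, ms in residual_ms.items()}
```

All three values come out at roughly 0.03, so replacing the block factors by a few spectral components does not cost precision in this example.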

[Figure: residuals (about -0.05 to 0.05) plotted against fitted values (about 0.45 to 0.65).]

FT-IR spectra

[Figure: absorbance against wavelength (cm-1), 1000 to 3000.]

Loadings for PC2 and PC4

[Figure: two panels showing the loadings for PC2 and for PC4 against wavelength (cm-1), 1000 to 3000.]

PLS instead of PCA

• Preliminary investigations based on simulations and real data indicate

– PLS on the residuals (iteratively) gives better interpretability of spectral information (fewer components)

Conclusions

• Block designs, with raw materials as blocks, combined with rapid characterisation of the raw materials: a useful tool for model building.

– (verified also in other experiments)

• Using PCs of spectra is flexible and does not require decisions about which properties to measure. Interpretation is important.

• First step towards a good model

– Possible to interpret

– Model is verified

• May need to be extended or fine-tuned by extra experiments to incorporate non-linearities etc.

• Best possible combination of process variables and blocks? Research!

• Combinations of collinear spectral data and factors. Research going on!

Utilising equations for process improvements

• Some ideas, possibilities, feasibility

– not finished industrial implementations

• General goal: adjust process after measurement of raw material quality

– Optimal for each batch

– Optimal, but robust with respect to certain types of noise

– A simpler strategy: Identify a small number of homogeneous raw material classes and their corresponding optimal process settings

• robust with respect to measurement error of raw material measurements

• simple in use (if difficult to change process)

• well suited for situations where it is possible to sort.

– Receive raw materials, sort, store in bins and process from the same bin

Goal: Reduce the effect of variable raw material quality on end product quality

Procedure:

Identify optimal classes with corresponding processing conditions, using cluster analysis.

After identification: measure the raw material and put it in the best class (with known processing conditions).

Industrial process with sorting

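[Diagram: RAW MATERIALS are sorted into “POOR QUALITY” and “GOOD QUALITY” classes; each class is run through its own process (PROCESS 1 or PROCESS 2), and both give the same END PRODUCT.]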

Model: Predicted quality = Raw material + Process

Sorting: Predicted quality depends on category

Objective: Minimise (Predicted quality - Target)^2 for all objects in all categories

Model: $\hat{y} = f(x, z)$

Sorting: $\hat{y}_{ij} = f(x_j, z_i)$, where i = object index (i = 1,…,n) and j = group index (j = 1,…,C)

y = end product quality
x = process variables
z = raw material variables
n = number of objects
C = number of categories
T = target

Distance between objects and groups = the loss from object i when it is allocated to category j:

$d_{ij}^2 = (\hat{y}_{ij} - T)^2$
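The loss matrix defined above can be computed in a few lines. This is a minimal sketch: the model `predict`, its coefficients and the example numbers are hypothetical and serve only to show how object i, category j and target T enter the calculation.

```python
import numpy as np

def predict(x, z):
    """Hypothetical model y_hat = f(x, z); coefficients are illustrative only."""
    return 500.0 + 3.0 * x + 10.0 * (z - 12.0)

def loss_matrix(z, x_settings, target):
    """d2[i, j] = (f(x_j, z_i) - T)^2 for objects i and categories j."""
    y_hat = predict(x_settings[None, :], z[:, None])
    return (y_hat - target) ** 2

z = np.array([11.0, 12.0, 13.0])      # raw material measurements, n = 3
x_settings = np.array([12.0, 7.0])    # one process setting per category, C = 2
d2 = loss_matrix(z, x_settings, target=530.0)
best_category = d2.argmin(axis=1)     # crisp allocation: smallest loss per object
```

The "distance" here is measured in the response, not in the raw material space, which is what separates this kind of sorting from ordinary clustering of the raw material measurements.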

Optimal sorting of raw materials, based on the predicted end product quality

Paper I

Fuzzy clustering

• Fuzzy clustering as strategy for finding groups

– Flexible with respect to distance

– Easy to implement

– Good convergence properties

• Gives a quantitative description of how well each object fits in each cluster

• Membership values

– Numbers between 0 and 1

– Sum up to 1 for each object

– Relative numbers
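Membership values with the properties listed above can be computed from squared distances with the standard fuzzy c-means update. The sketch assumes that update and a fuzzifier m = 2; the slides do not state which variant or fuzzifier was used, and the distance values are made up.

```python
import numpy as np

def memberships(d2, m=2.0):
    """Fuzzy c-means memberships from squared distances d2[i, j].

    u[i, j] = d2[i, j]^(-1/(m-1)) / sum_k d2[i, k]^(-1/(m-1)), with m > 1.
    """
    inv = (d2 + 1e-12) ** (-1.0 / (m - 1.0))   # small offset avoids division by zero
    return inv / inv.sum(axis=1, keepdims=True)

# Illustrative squared distances for 3 objects and 2 categories.
d2 = np.array([[1.0, 9.0], [4.0, 4.0], [25.0, 1.0]])
u = memberships(d2)
```

An object that is equally far from both categories receives membership 0.5 in each, which shows that the values are relative rather than absolute measures of fit.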

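[Flow chart: EXPERIMENTAL DATA (X, Y, Z_exp) give the MODEL ŷ = f(x, z). The model, the raw material DATA Z ~ p(z), the TARGET T and the number of categories C are inputs to the FUZZY CLUSTER ANALYSIS, which returns the OPTIMAL PROCESS SETTINGS x^o_1, x^o_2, …, x^o_C and the MEMBERSHIP VALUES U = {u_ij}.]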

Example: Baking of hearth bread

• Data taken from a study of baking process and flour quality*

– 10 flour blends

– 3 levels of mixing and proofing time ( = resting time after dough has been shaped)

– 90 combinations of flours and baking process

– Response: bread loaf volume

*Færgestad, E. M. et al. Influence of flour quality and baking process on hearth bread characteristics using gentle mixing. Journal of Cereal Science 1999, 30 61-70.

Input to cluster analysis

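[Equation: fitted polynomial model for loaf volume y as a function of mixing time x1, proofing time x2 and protein content z1; the coefficients are not legible in this copy.]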

• Model

– z1 protein

– x1 mixing time

– x2 proofing time

– y volume

• Target

– T = 530 mL

• Raw material data

– 100 equally spaced points within the experimental region (10.2-14.3% protein)

• Number of groups

– C = 2
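With these inputs, the sorting procedure alternates between updating the membership values and re-optimising one process setting per category. The sketch below uses the same target, protein grid and number of groups, but the response surface `predict`, the single process variable, the fuzzifier m = 2, the starting values and the grid search are all assumptions made for illustration; it is not the fitted bread model or the authors' implementation.

```python
import numpy as np

def predict(x, z):
    """Hypothetical response surface, NOT the fitted bread model from the slides."""
    return 530.0 + 20.0 * (z - 12.0) - 4.0 * (x - 10.0)

def memberships(d2, m):
    inv = (d2 + 1e-12) ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def prediction_sorting(z, target, n_cat=2, m=2.0, n_iter=50):
    x_grid = np.linspace(0.0, 25.0, 501)            # candidate process settings
    # grid_loss[k, i]: loss of object i if processed with candidate setting k
    grid_loss = (predict(x_grid[:, None], z[None, :]) - target) ** 2
    x_opt = np.linspace(5.0, 15.0, n_cat)           # deterministic starting settings
    for _ in range(n_iter):
        d2 = (predict(x_opt[None, :], z[:, None]) - target) ** 2
        u = memberships(d2, m)
        # per category: setting with the smallest membership-weighted loss
        x_opt = x_grid[(grid_loss @ u ** m).argmin(axis=0)]
    d2 = (predict(x_opt[None, :], z[:, None]) - target) ** 2
    return x_opt, memberships(d2, m), d2

z = np.linspace(10.2, 14.3, 100)    # 100 equally spaced protein values
target = 530.0
x_opt, u, d2 = prediction_sorting(z, target)
category = u.argmax(axis=1)         # crisp allocation by largest membership
loss_sorted = d2[np.arange(len(z)), category].mean()

# Reference: best single setting for all raw materials (no sorting).
x_grid = np.linspace(0.0, 25.0, 501)
loss_unsorted = ((predict(x_grid[:, None], z[None, :]) - target) ** 2).mean(axis=1).min()
```

Comparing `loss_sorted` with `loss_unsorted` gives the same kind of comparison as the average-loss figures reported for the bread example, although the numbers produced here have no connection to that data.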

Membership values and loss

[Figure, first panel: membership values (0 to 1) against protein content (%) for Category 1 and Category 2. Second panel: loss (0 to 3000) against protein content (%), with two groups and without sorting.]

                   Mixing   Proofing   Average loss
p < 12%            11.7     61.3
p > 12%             7.6     51.9       199.1
Without sorting     8.7     54.4       705.9

The average loss of 199.1 applies to the two sorted groups together.
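The size of the improvement can be read off the two average-loss values. The short calculation below uses only the numbers from the table; the variable names are chosen for this sketch.

```python
loss_sorted = 199.1      # average loss with two groups (from the table)
loss_unsorted = 705.9    # average loss without sorting (from the table)

# Relative reduction in average loss obtained by sorting into two groups.
reduction = 1.0 - loss_sorted / loss_unsorted
print(f"Relative reduction in average loss: {reduction:.1%}")
```

Sorting into two protein classes thus removes roughly 72% of the average squared deviation from the target in this example.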

Results

• Optimal process settings

• Convergence properties

– With two groups the clustering algorithm converges in less than 20 iterations.

– More iterations needed with more groups.

– No sensitivity to different initialisations.

… but what about bread shape?

Volume = 354 ml

Form ratio = 0.64 (height/width)

Volume = 352 ml

Form ratio = 0.52 (height/width)

Sorting of raw materials with focus on multiple end-product properties

• Product quality is often defined by several product characteristics

• Different responses have different optima

– Example: volume and form ratio

– longer proofing times give larger, but flatter breads

Paper II

Suggested approaches

• Optimise one response under constraints on the others

– Convergence problems in investigated example

• Weighted squared loss

– Use weights to prioritise responses

• Desirability functions

– Functions of predicted product quality
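Two of these options can be written down compactly. The sketch below shows a weighted squared loss and a two-sided desirability function of the Derringer-Suich type, combined by a geometric mean; whether the papers use exactly this form is an assumption, and all limits, targets and weights in the example are made up.

```python
import numpy as np

def weighted_squared_loss(y_hat, targets, weights):
    """Sum over responses of w_k * (y_hat_k - T_k)^2."""
    y_hat, targets, weights = map(np.asarray, (y_hat, targets, weights))
    return float(np.sum(weights * (y_hat - targets) ** 2))

def desirability(y, low, target, high, s=1.0, t=1.0):
    """Two-sided (target-is-best) desirability in [0, 1]."""
    if y <= low or y >= high:
        return 0.0
    if y <= target:
        return ((y - low) / (target - low)) ** s
    return ((high - y) / (high - target)) ** t

def overall_desirability(values):
    """Geometric mean of the individual desirabilities."""
    values = np.asarray(values, dtype=float)
    return float(np.prod(values) ** (1.0 / len(values)))

# Hypothetical predictions for volume (mL) and form ratio, with made-up limits.
d_volume = desirability(510.0, low=450.0, target=530.0, high=610.0)
d_form = desirability(0.60, low=0.50, target=0.64, high=0.78)
overall = overall_desirability([d_volume, d_form])
loss = weighted_squared_loss([510.0, 0.60], [530.0, 0.64], [1.0, 10000.0])
```

The weights in the squared loss have to absorb the different scales of the responses, whereas desirabilities are already on a common 0 to 1 scale.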

Alternatively

[Diagram: RAW MATERIALS are sorted into POOR QUALITY and GOOD QUALITY classes; one class goes through PROCESS 1 to PRODUCT 1 and the other through PROCESS 2 to PRODUCT 2.]

Assessing robustness

• Various approaches exist

– Box et al., optimisation of polynomial models

– Bootstrapping (parametric) for assessing robustness

• Estimate model.

• Simulate data from the model

• Repeat optimisations.

• Visualise the optimal points
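The four bootstrap steps above can be sketched for a toy problem. Everything in this example is an assumption made for illustration: the data are simulated, the model is a quadratic in a single process setting, and the "optimisation" is simply the stationary point of the fitted curve rather than the target-based optimisation used for sorting.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated experiment (illustrative only): quadratic response in one setting x.
x = np.linspace(5.0, 25.0, 30)
y = 500.0 + 8.0 * x - 0.4 * x ** 2 + rng.normal(scale=2.0, size=x.size)
X = np.column_stack([np.ones_like(x), x, x ** 2])

def fit(X, y):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def optimum(coef):
    """Stationary point of b0 + b1*x + b2*x^2 (a maximum when b2 < 0)."""
    return -coef[1] / (2.0 * coef[2])

# Step 1: estimate the model and its residual standard deviation.
coef_hat = fit(X, y)
resid = y - X @ coef_hat
sigma_hat = np.sqrt(resid @ resid / (len(y) - X.shape[1]))

# Steps 2 and 3: simulate responses from the fitted model, refit, re-optimise.
boot_optima = np.array([
    optimum(fit(X, X @ coef_hat + rng.normal(scale=sigma_hat, size=len(y))))
    for _ in range(200)
])

# Step 4: the spread of boot_optima indicates how stable the optimum is.
spread = boot_optima.std()
```

Plotting `boot_optima` (or, with two process variables, a scatter of the optimal points on top of the contour plot) gives the kind of visualisation referred to in the last step.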

[Figure: two contour plots over mixing time (5 to 25) and proofing time (35 to 60), with contour levels from 20 upwards; the marked points are [13.5, 60.0] in the first plot and [7.7, 52.3] in the second.]

Can also be used in other situations

• Robustness of robust process optimization

– Mevik (2003). Qual. Eng.

– Some variables controlled, robustness to others

– robustness to model and target uncertainty

• Product and process improvement using mixture-process variable design and robust optimization techniques.

– Sahni, Piepel and Næs (2004), in prep.

– Some variables controlled, robustness to others (mixture-process)

– robustness to coefficient and model selection uncertainty

Robustness of prediction sorting

Paper IV

• Results

– Memberships and optimal process settings are variable when regression coefficients are uncertain.

– Misclassification rate due to variable membership values is small.

– Average error in predicted response due to variable optimal process settings is small compared to the prediction error.

• The results indicate that prediction sorting is rather robust to random error in the regression coefficients.

• Paper submitted

List of papers

Paper I: I. Berget and T. Næs. “Optimal sorting of raw materials, based on the predicted end-product quality”. Quality Engineering (2002) 14 (3) 459-478.

Paper II: I. Berget and T. Næs. “Sorting of raw materials with focus on multiple end-product properties”. Journal of Chemometrics (2002) 16 263-273.

Paper III: I. Berget, A. Aamodt, E. M. Færgestad and T. Næs. “Optimal sorting of raw materials for use in different products”. Chemometrics and Intelligent Laboratory Systems (in press).

Paper IV: I. Berget and T. Næs. “Robustness of prediction sorting”. (submitted)

Conclusions

• Fuzzy clustering combined with a suitable distance measure can be used for sorting

• Robust splitting and reasonably robust process settings

• Clear improvements over non-sorting

• Method can be extended to multivariate data and can be penalised

• Bootstrap can be used for evaluating robustness
