

Computers and Chemical Engineering 27 (2003) 1741–1754

www.elsevier.com/locate/compchemeng

Nonlinear system identification and model reduction using artificial neural networks

Vinay Prasad *, B. Wayne Bequette

Howard P. Isermann Department of Chemical Engineering, Rensselaer Polytechnic Institute, CII 6015, 110 8th Street, Troy, NY 12180-3590, USA

Received 9 April 2002; received in revised form 29 May 2003; accepted 29 May 2003

Abstract

We present a technique for nonlinear system identification and model reduction using artificial neural networks (ANNs). The ANN is used to model plant input–output data, with the states of the model being represented by the outputs of an intermediate hidden layer of the ANN. Model reduction is achieved by applying a singular value decomposition (SVD)-based technique to the weight matrices of the ANN. The sequence of state values is used to convert the model to a form that is useful for state and parameter estimation. Examples of chemical systems (batch and continuous reactors and distillation columns) are presented to demonstrate the performance of the ANN-based system identification and model reduction technique.

© 2003 Elsevier Ltd. All rights reserved.

Keywords: Nonlinear system identification; Model reduction; Artificial neural networks

1. Motivation

Strategies for state, parameter and disturbance estimation have been extensively researched in chemical process control. Every estimator design requires a nominal state-space model around which the estimator is constructed. In many cases, a first-principles mechanistic model constructed from material and energy balances is used as the nominal model. However, it may not always be possible to construct a satisfactory nominal model for a given process. In other cases, the mechanistic state-space model developed may be of very high order. In the case of distributed parameter systems, the conversion of the original system model (described by partial differential equations) into a system of ordinary differential equations (ODEs) invariably leads to a very high order model. Such models are often unsuitable for direct application in an estimation framework. Constructing appropriate augmented parameter and disturbance structures is non-trivial for these high order models.

* Corresponding author. Present address: Controls Engineering Group, H2Gen Innovations, Inc., 4335 Taney Avenue, 402, Alexandria, VA 22304, USA. Tel.: +1-703-823-3305; fax: +1-703-212-4898. E-mail address: prasav@alum.rpi.edu (V. Prasad).

0098-1354/03/$ - see front matter © 2003 Elsevier Ltd. All rights reserved.
doi:10.1016/S0098-1354(03)00137-6

This indicates the need for a method for system identification and model reduction to create models that are convenient for estimator and controller design. For linear systems, subspace identification methods are used for constructing state-space models from input–output data, and model reduction is achieved by using balanced truncation. These techniques, however, are not directly applicable to nonlinear systems. In this paper, we explore techniques for nonlinear system identification and model reduction based on artificial neural networks (ANNs). We propose a new method to construct reduced order nonlinear state-space models from input–output data using ANNs. The model identified using ANNs and input–output data is then transformed into a form that is well-suited for estimator and controller design.

2. Background

An important step in the design of estimators and controllers for chemical processes is the development of an accurate model of the system. This nominal model is then used as the basis for state estimator design. First-principles mechanistic models are not always appropriate for this purpose. In some instances, the fundamental physical model may be prohibitively difficult to develop, or may be of very high order and unsuitable for estimator and control design. In such cases, empirical models based on input–output data offer an alternative approach to system identification.

System identification using input–output data is a well-developed topic for linear systems. There are two basic approaches to this problem: identifying a model that is in input–output form (autoregressive models: ARX, ARMAX, etc.), or identifying a state-space model from the input–output data. Autoregressive models suffer from the disadvantage that the fit with output data obtained by optimization is highly dependent on the structure and parameterization. Extensions to multivariable (MIMO) problems also lead to a large number of parameters and a difficult optimization problem. Also, since the basic purpose of the model is for use in state estimation and control, state-space modeling is the favored approach.

Subspace identification techniques are commonly used for obtaining empirical state-space models for linear systems. The state-space model is identified in a frequency-weighted balanced form using input–output data. A comparison of subspace identification techniques may be found in Overschee and Moor (1995). The most popular subspace identification method is N4SID (Overschee & Moor, 1994), which identifies a linear stochastic state-space model. As described by Overschee and Moor (1995), all subspace identification methods consist of two steps. The first step involves the computation of a characteristic subspace (giving the order of the system) and the extended observability matrix of the system. In the second step, the extended observability matrix is used to define two state sequences, which can be used with the original input–output data to obtain the system matrices by employing a least-squares fit; a sketch of this second step is given below.
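To make the least-squares step concrete, the following minimal NumPy sketch (our own illustration, not from the paper; the function name and array layout are assumptions) recovers the system matrices from an estimated state sequence:

```python
import numpy as np

def fit_state_space_lstsq(X, U, Y):
    """Least-squares step of subspace identification (sketch).

    Given an estimated state sequence X = [x(1) ... x(N)] (n x N),
    inputs U (m x N) and outputs Y (p x N), recover A, B, C, D from
        x(k+1) ~ A x(k) + B u(k),   y(k) ~ C x(k) + D u(k)
    in one linear least-squares fit.
    """
    Z = np.vstack([X[:, :-1], U[:, :-1]])      # regressors [x(k); u(k)]
    target = np.vstack([X[:, 1:], Y[:, :-1]])  # targets [x(k+1); y(k)]
    theta, *_ = np.linalg.lstsq(Z.T, target.T, rcond=None)
    theta = theta.T
    n = X.shape[0]
    A, B = theta[:n, :n], theta[:n, n:]
    C, D = theta[n:, :n], theta[n:, n:]
    return A, B, C, D
```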

Subspace identification methods are restricted to the modeling of linear systems. For nonlinear system identification, nonlinear autoregressive models have been used, as well as models that use a set of polynomial basis functions (e.g. Volterra functions). Another approach is to use ANNs to model nonlinear systems. An ANN consists of a series of layers starting with an input layer, ending with an output layer and having a number of 'hidden' layers in between. Each layer consists of a series of linear or nonlinear nodes. The output of each node in a nonlinear layer is a nonlinear transformation of the weighted sum of the outputs of the nodes of the previous layer. The weights that connect outputs of one layer to the next are the parameters that characterize the system model. System identification using ANNs involves 'training' the neural network using a set of historical data. Training is an off-line optimization procedure that modifies the weights of the neural network in order to minimize the error (in some norm) between the network output and the desired output value over the entire data set. A single data set is usually processed by the optimization procedure many times before an acceptable fit is obtained between the ANN output prediction and the actual output measurement.

Each cycle over the data set is termed an epoch. The trained ANN is then used as a model for prediction and estimation. Examples of neural network-based approaches to the modeling of chemical processes are presented in Schittenkopf, Deco and Brauer (1997), Kramer (1991), Sentoni, Biegler and Guiver (1999), Psichogios and Ungar (1994), Lee, Pan and Sung (1999), Pan, Sung and Lee (2001), McLain and Henson (2000) and Tholodur, Ramirez and McMillan (2000). Schittenkopf et al. (1997) consider the use of principal component analysis (PCA) for model reduction in neural networks. Kramer (1991) uses a network with an internal "bottleneck" layer of arbitrarily small dimension to develop a compact representation of input data. Sentoni et al. (1999) use balanced truncation on a system with decoupled state transition and input matrices and a static nonlinearity at the output. Psichogios and Ungar (1994) use SVD-based methods for ANN structure regulation and apply these methods to the modeling of a nonisothermal CSTR. McLain and Henson (2000) use the method of Kramer in the design of their neural network, and use the model thus developed in the design of a nonlinear model reference adaptive controller for a polymerization CSTR. Tholodur et al. (2000) use interpolated parameter functions to reduce the number of experiments used to develop ANN-based models. None of these studies attempts to create a state-space model of the nonlinear system.

There are two basic types of supervised ANNs: feedforward networks and recurrent (or feedback) networks. In a feedforward network, the inputs to any layer consist only of outputs of the preceding layers. In a recurrent or feedback network, in addition to these connections, outputs of following layers are fed back (with a delay) as inputs to the preceding layers. Fig. 1 shows a typical feedforward network and Fig. 2 depicts a typical recurrent network.

Fig. 1. A typical feedforward network.

Fig. 2. A typical recurrent or feedback network.

Most ANN applications in dynamic modeling have involved the use of feedforward networks to identify a model in autoregressive form. In this case, the inputs to the network are past values of the system inputs and outputs, and the network output corresponds to the current system output (a sketch of this input windowing follows this paragraph). While recurrent networks provide models in a recursive form and are suited to provide a state-space description of the system, their use is restricted due to the additional degree of difficulty involved in their training. Training methods for recurrent networks are modified versions of their counterparts for feedforward networks, since a dynamic element has been introduced into the network structure by the feeding back of delayed node outputs. Some examples of recurrent network applications for state-space modeling may be found in Eaton, Rawlings and Ungar (1994), Pan et al. (2001) and Zamarreno and Vega (1998). Eaton et al. (1994) study a recurrent neural network with a single feedback connection from the output to the input layer. Through simulations, they show that such a model can be used in a model predictive controller whose stability is assured by having a minimum prediction horizon. Zamarreno and Vega (1998) provide a local stability analysis for a recurrent neural network by performing eigenvalue analysis on the linearization of the network around its current state. A particularly relevant reference is Pan et al. (2001), in which a recurrent neural network is designed and trained as a state estimator for real time application. In this case, error feedback is used with the network's predictions to provide state estimation. The resulting ANN is extremely difficult to train, however, since it is a recurrent network with multiple hidden layers.
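As an illustration of the autoregressive input construction described above, the following NumPy sketch (our own; the window length n_past is an assumption) builds training pairs from recorded plant data:

```python
import numpy as np

def build_autoregressive_data(u, y, n_past=2):
    """Build (regressor, target) pairs for a feedforward ANN.

    u, y: arrays of shape (N, m) and (N, p) holding recorded plant
    inputs and outputs. Each regressor stacks the past n_past inputs
    and outputs; the target is the current output y(k).
    """
    X, T = [], []
    for k in range(n_past, len(y)):
        past_u = u[k - n_past:k].ravel()
        past_y = y[k - n_past:k].ravel()
        X.append(np.concatenate([past_u, past_y]))
        T.append(y[k])
    return np.array(X), np.array(T)
```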

Associated with the problem of system identification is the issue of model reduction, since high order models are not suited for estimation and control purposes. For linear systems, balanced realization and truncation (Zhou, Doyle & Glover, 1996) is a widely used method for model reduction. Balanced realizations employ the concept of the controllability and observability gramians and match them to provide an asymptotically stable and balanced model (a sketch of the linear procedure is given below). This concept does not translate easily to nonlinear systems, however, since it is not possible to calculate the observability and controllability gramians explicitly and accurately in the presence of nonlinearity. In ANN-based modeling, higher order models result from an overparameterization of the ANN, which implies a larger number of hidden layer nodes than is required. Most common techniques for ANN structure regulation involve sensitivity analysis of the network output error to the removal or addition of a single node (or weight) in the hidden layers. The node or weight with the least sensitivity is then removed. This procedure is time-consuming and inefficient, especially for networks with multiple hidden layers. Another option is to add a cost term (penalty) in the error function for training of the ANN. The cost term links a learning rate with the vector of weights in the network. Learning by backpropagation or other training algorithms drives the weights towards zero because of the cost term, and thereby reduces the number of free parameters. Deciding on the actual form of the cost function and the learning rate is a non-trivial task, however, and is system-dependent.
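For reference, a minimal sketch of linear balanced truncation using the standard square-root algorithm (our own illustration for contrast with the nonlinear case, not part of the paper's method; it assumes a stable, minimal system and SciPy's solve_continuous_lyapunov):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def balanced_truncation(A, B, C, r):
    """Reduce dx/dt = Ax + Bu, y = Cx to order r (A must be Hurwitz)."""
    Wc = solve_continuous_lyapunov(A, -B @ B.T)    # controllability gramian
    Wo = solve_continuous_lyapunov(A.T, -C.T @ C)  # observability gramian
    Lc, Lo = np.linalg.cholesky(Wc), np.linalg.cholesky(Wo)
    U, s, Vt = np.linalg.svd(Lo.T @ Lc)            # s: Hankel singular values
    T = Lc @ Vt.T @ np.diag(s ** -0.5)             # balancing transformation
    Tinv = np.diag(s ** -0.5) @ U.T @ Lo.T
    Ab, Bb, Cb = Tinv @ A @ T, Tinv @ B, C @ T     # balanced realization
    return Ab[:r, :r], Bb[:r, :], Cb[:, :r], s     # truncate to r states
```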

A more promising approach for ANN structure regulation and reduction of overfitting is the use of singular value decomposition (SVD) or PCA-based methods. SVD-based methods have been investigated by Psichogios and Ungar (1994), Schittenkopf et al. (1997), Emmerson and Dampier (1993) and Hu, Xue and Tompkins (1991). All of these studies involve using SVD as an explicit model selection method to remove redundant hidden nodes during network training. The SVD is applied on linear layers, or on nonlinear layers before the nonlinear transformation is applied. Correlations between the outputs of nodes lead to small singular values corresponding to those outputs. The nodes corresponding to the small singular values are redundant and are removed from the network, thus reducing the degree of overparameterization. Two different approaches to model reduction are presented in Sentoni et al. (1999) and Kramer (1991). Sentoni et al. use balanced truncation on a neural network system consisting of a linear input-state relationship. Their formulation is only applicable to systems with static nonlinearities. Kramer's technique is concerned solely with nonlinear PCA, and assumes that full state information is available, i.e. all states are measured outputs. A network containing an internal "bottleneck" layer of small size is constructed that maps the state information back onto itself. The outputs of the nodes of the bottleneck layer represent the reduced state vector. The choice of the size of the bottleneck layer is arbitrary, with the only guideline for model reduction being that the number of bottleneck nodes is smaller than the number of original states.

3. System identification and model reduction using artificial neural networks

A new formulation for system identification using neural networks is presented in this section. The basic principle is to train a neural network based on input–output data, and to use the dynamic information stored in the outputs of an intermediate hidden layer to represent the state vector of the model. SVD is then performed on each of the layers in the network to remove redundant nodes and thus provide a compact model. The difference from earlier formulations is in the use of SVD as an explicit criterion for model reduction and node truncation, and in the fact that the network trained on the basis of plant output data is used to provide an empirical state-space model of the system. Previous approaches based on developing nonlinear state-space models from input–output data using neural networks have not addressed the issue of overparameterization resulting from using an arbitrary number of nodes in the hidden layers of the ANN. Another feature of the proposed approach is the use of the state information from the ANN model to construct a reduced order model by constructing a state transformation matrix. This requires information about the states of the original higher order model.

The state space modeling may, in principle, be performed using recurrent or feedforward networks. In the description that follows, the method is described for the case of a feedforward network. In this situation, the inputs to the ANN are past plant input and output values, and the outputs of the ANN are current and future plant outputs. Future plant outputs are considered because models used in predictive control need to be accurate multi-step predictors and not just provide good one-step ahead prediction. The past window of system inputs and outputs chosen as ANN inputs must be of sufficient length in time to capture the essentials of the plant dynamics.

Fig. 3 presents the framework of the proposed ANN for nonlinear system identification.

Fig. 3. ANN structure for nonlinear state-space modeling. Note that each input u(k-i) and output y(k-i) represents a multivariable vector.

The network is composed of two parts, one which maps the input-state relation (x_{k+1} = f(x_k, u_k)) and another which maps the state-output relation (y_k = g(x_k, u_k)). Note that nonlinear nodes in Fig. 3 are represented with an enclosed sigmoid, while linear nodes are left blank. The network has four hidden layers, with three of them using nonlinear nodes. The output layer is nonlinear. The states are represented by the outputs of the linear hidden layer, and are thus a linear combination of the outputs of the second hidden layer. From the network structure, it is seen that two nonlinear layers are used in the mapping of each function, the input-state relation and the state-output relation. It is known that a network with two nonlinear layers is capable of approximating a nonlinear function of arbitrary order (Weigend, Huberman & Rumelhart, 1990), which is what motivates the choice of the network structure. The outputs of the ANN can include future plant outputs (up to y_{k+p}) and the ANN can be trained as a multi-step predictor. However, this will require the inclusion of future plant inputs as ANN inputs in the hidden layer (along with the states), as well as keeping a separate state-output network for each future output (to maintain the condition of causality). In the examples in this paper, we restrict ourselves to a one-step ahead predictor, meaning that a single ANN maps the state-output relation. A sketch of the forward pass through this structure is given below.
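The following NumPy sketch of the forward pass illustrates the layer arrangement of Fig. 3 (our own illustration; the weight and bias names are placeholders, and biases are folded in as in Eq. (1) below):

```python
import numpy as np

def ann_forward(window, p):
    """Forward pass through the Fig. 3 structure (sketch).

    window: vector of stacked past plant inputs and outputs.
    p: dict of weight matrices and bias vectors (placeholder names).
    Returns the one-step output prediction y(k) and the state x(k).
    """
    h1 = np.tanh(p["W1"] @ window + p["b1"])  # nonlinear hidden layer 1
    h2 = np.tanh(p["W2"] @ h1 + p["b2"])      # nonlinear hidden layer 2
    x = p["Ws"] @ h2 + p["bs"]                # linear hidden layer: states x(k)
    h3 = np.tanh(p["W3"] @ x + p["b3"])       # nonlinear hidden layer 3
    y = np.tanh(p["Wout"] @ h3 + p["bout"])   # nonlinear output layer: y(k)
    return y, x
```

Two nonlinear layers (h1, h2) map the input window to the states, and two (h3 and the output layer) map the states to the outputs, matching the structure described above.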

The procedure for removal of overfitting in the ANN model is described below. Consider a layer with nonlinear nodes, where the output of the i-th node is given by:

z_i = \sigma\left( \sum_j W_{ij} u_j + W_i^{bias} \right)     (1)

where W is the matrix of weights that links the previous layer to the current layer and u is the vector of inputs to the current layer (outputs of the previous layer). \sigma is the nonlinear transformation, which is taken to be the hyperbolic tangent function in this case. W^{bias} represents the weight associated with the bias input to each node. The output of the node before the nonlinear transformation is applied is given by:

z_i^{lin} = \sum_j W_{ij} u_j + W_i^{bias}     (2)

Overfitting in terms of the presence of redundant nodes in a layer may be diagnosed by searching for correlations between the outputs of the nodes. SVD is a method for detecting linear correlations and hence an overdetermined problem. SVD is applied to the outputs of each layer (before the nonlinear transformation is applied). This is justified in the sense of removing linear correlations in the input data that is presented to each nonlinear node. The procedure is illustrated for an arbitrary choice of two consecutive layers in the network, which are referred to as layer 1 and layer 2 for convenience. The output of layer 1 acts as an input to the nodes of layer 2. Consider the relation:

S W = Z^{lin}     (3)

where S is a matrix containing as columns the outputs of the nodes of layer 1 for each input data sample, i.e. S contains values of u for each data sample as its columns. W is the weight matrix connecting the layers and the matrix Z^{lin} is the set of outputs of the nodes of layer 2 over all data samples. SVD is then performed on the matrix S to decompose it into the form:

S = U \Sigma V^T     (4)

\Sigma is a diagonal matrix containing the singular values of S on its diagonal. SVD may be used as a least squares tool for removal of redundant nodes by removing nodes that correspond to small or zero singular values. This implies reducing the size of \Sigma, and a corresponding reduction in the size of U and V by removal of columns and rows to create a lower dimensional matrix S. Since the dimension of \Sigma reflects the number of nodes in layer 1, this reduces the number of nodes in layer 1 and acts as a mechanism for model reduction. The number of weight connections in W is reduced correspondingly. This procedure is applied for all possible layers in the ANN to reduce overfitting in all the layers. The network is retrained after the SVD procedure has been applied. This procedure is carried out in a couple of iterations, where the ANN is trained and then pruned, and the process is repeated until further pruning and training do not significantly change the weights of the network. In a strict sense, SVD should be applied only to spatially correlated data, which have no temporal correlation. The data used in our procedure contain temporal correlations, since they describe the dynamic behavior of a system. However, since the SVD procedure is applied to the entire data set, which includes the entire range of behavior of the system, it is justified in a practical sense to use SVD as a model reduction method. This is borne out in the results described in the next section. A sketch of the pruning step is given below.
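A minimal sketch of the pruning diagnosis of Eqs. (3) and (4) (our own; the tolerance tol and the rank-style cutoff are assumptions about how "small" singular values are judged):

```python
import numpy as np

def count_effective_nodes(S, tol=1e-3):
    """Diagnose redundant nodes in a layer via SVD (Eqs. (3)-(4)).

    S: matrix whose columns are the layer's node outputs for each
    training sample. Singular values below tol * s_max indicate
    linearly correlated (redundant) node outputs.
    """
    s = np.linalg.svd(S, compute_uv=False)
    return int(np.sum(s > tol * s[0]))  # number of nodes worth keeping

# After counting, the layer is rebuilt with the reduced number of
# nodes and the network is retrained; the train/prune cycle repeats
# until the weights no longer change significantly.
```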

For the case of model reduction, we consider the layer defining the states of the ANN model. Referring again to Fig. 3, since the states represented by the output of the intermediate linear layer are just a linear combination of the dynamic responses of the previous nonlinear layer, the outputs of the second nonlinear layer may be used equally well to describe the states of the system. SVD applied on the relation between the second and third hidden layers of the ANN then removes redundant nodes from the second nonlinear layer and hence reduces the number of states in the ANN model. Note that SVD is applied to the relation between the input layer and the first hidden layer, which ensures that input noise and redundancy in input data is filtered out.

We make a distinction between the proposed method and two similar approaches. Kramer (1991) uses a three-hidden layer network to map the system states on to themselves, with the intermediate hidden layer containing an arbitrary (small) number of nodes and representing the reduced order state space. The number of reduced order states chosen in this case is arbitrary, and the method applies only when full state information is available for the original system. Lee et al. (1999) also use a similar network with three hidden layers to obtain state information from input–output data. No attempt is made to select the optimum network structure in terms of the number of hidden layer nodes.

For the case where a high-order state space model is available for a system and needs to be reduced to a lower order model, an additional procedure is proposed for construction of a model that is better suited for estimator design and control. The proposed procedure is a modification of the approach of Lee, Eom, Chung, Choi and Yang (2000), and involves the creation of a state transformation matrix that projects the states of the original higher-order model to a lower dimensional space. Using a state transformation matrix is better suited in terms of retaining and translating information on model uncertainty and possible disturbance effects from the original model to the lower order model. The basic principle involves using a system identification method to create a reduced order state sequence from input–output data generated by the higher order model. The higher order model also generates a state sequence of its own. Let

X_N = [x(1), x(2), \ldots, x(N)]
\bar{X}_N = [\bar{x}(1), \bar{x}(2), \ldots, \bar{x}(N)]     (5)

where X_N represents the state sequence for the original higher order model and \bar{X}_N the lower order state sequence. Lee et al. (2000) suggest the use of linear subspace identification methods along with a balanced truncation procedure to create the reduced order state sequence. We use the ANN based nonlinear modeling and reduction algorithm described earlier in this section to create the reduced order state sequence. It is known that linear subspace methods of identification are not very accurate in the presence of significant nonlinearity, so a nonlinear identification procedure will provide better results. Once the state sequences have been generated, a transformation matrix V is generated using:

V = X_N \bar{X}_N^{\dagger}     (6)

where \bar{X}_N^{\dagger} denotes the Moore–Penrose pseudo-inverse of \bar{X}_N. Once the transformation matrix is constructed, the original discrete-time nonlinear system:

x(t+1) = f(x(t), u(t))
y(t) = h(x(t), u(t))     (7)

is transformed to:

z(t+1) = V^{\dagger} f(V z(t), u(t))
y(t) = h(V z(t), u(t))     (8)

Here, V^{\dagger} refers to the Moore–Penrose inverse of V, and the residual of the reduced state model is orthogonal to the span of V. A sketch of this transformation is given below.
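A minimal NumPy sketch of Eqs. (5)-(8) (our own; f and h stand for the original model's state and output maps, and Xn, Xn_bar are the two recorded state sequences, stored with state samples as columns):

```python
import numpy as np

def build_transformation(Xn, Xn_bar):
    """Eq. (6): V = X_N pinv(Xbar_N)."""
    return Xn @ np.linalg.pinv(Xn_bar)

def reduced_step(z, u, V, f, h):
    """One step of the transformed reduced system, Eq. (8)."""
    V_pinv = np.linalg.pinv(V)
    z_next = V_pinv @ f(V @ z, u)  # project the full-model dynamics
    y = h(V @ z, u)                # outputs evaluated at the lifted state
    return z_next, y
```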

4. Results

In this section, simulation results on a number of nonlinear systems are presented using the techniques described in the previous section. The conditions under which the simulations are performed are described below.

For convenience, feedforward networks are used in the ANN models for the simulations. This is not a restriction imposed by the identification and model reduction method, but feedforward networks are used for ease of training the network. The Levenberg–Marquardt algorithm is used as the nonlinear optimization technique in network training. The training criterion chosen is the mean squared error (MSE) of the output predictions over the entire data set. Training of the ANN is stopped when the MSE falls below a specified tolerance or a maximum number of training epochs is exceeded (whichever applies first). In all the cases, a nonlinear ODE system is used to simulate the plant. Data generated from the plant simulations is divided into two sets, one for training and the other for prediction. The data itself is generated using an input sequence that is generated in a manner similar to that of Sentoni et al. (1999). The probability that an input changes is sampled from a normal distribution. The input value is changed only if the sampled value is greater than a threshold value. The magnitude of the input change is sampled from a uniform random distribution that is scaled to lie between a specified maximum and minimum. A representative input plot is shown in Fig. 4, and a sketch of the generation procedure is given below.

Fig. 4. A representative plot of the type of inputs used for generation of training and prediction data. The abscissa represents the index of data points and the ordinate the values of the input sequence.

shown in Fig. 4. In all simulations, Gaussian noise was

applied to the simulated plant outputs. The magnitude

of the noise represents a standard deviation of about 2%

of the characteristic steady state value of the output. In

the plots, however, only the ‘true’ noiseless plant out-

puts are shown. Also, in all cases, conditioning of the

input and output data is performed by scaling each

input and output by its mean value. The scaled data is

presented in all the figures. Since the data is scaled, the

MSE quoted for all the simulations is dimensionless.

The performance criterion for the identification

and model reduction procedure is the accuracy

of prediction of the outputs for new data that

the ANN was not trained on. This is the most

relevant test of the modeling procedure, since the model

is to be used for estimation and control, and accurate

prediction of the controlled/measured outputs is re-

quired.


5. Batch bioreactor

The first set of simulations involves the modeling of a nonlinear batch bioreactor system presented in Tholodur et al. (2000). The model equations are given below and describe foreign protein production in an inducible bacterial system.

\dot{X} = \mu X     (9)
\dot{G} = -\mu X / Y     (10)
\dot{I} = -r_I X     (11)
\dot{P} = r_P X     (12)

Here, X, G, I and P are the cell mass, glucose, inducer and protein concentrations, respectively. The system parameters are given by:

\mu = \frac{\mu_{max} G}{K_{CG} + G} \left( \frac{s_I + p_1 I}{s_I + p_2 I} \right)     (13)
r_I = \frac{r_{I,max} I}{K_{CI} + I}     (14)
r_P = \left( \frac{r_{P,max} G}{K_{CP} + G} \right) \left( \frac{f_{I0} + I}{K_I + I} \right)     (15)

The parameter values used are the same as those in Tholodur et al. and are given in Table 1; a code sketch of the model follows the table.

Table 1
Parameter values for bioreactor

\mu_{max}    0.55 h^{-1}
K_{CG}       1.20 g l^{-1}
Y            0.45 g g^{-1}
s_I          -0.02
p_1          2.8 l g^{-1}
p_2          4.0 l g^{-1}
r_{I,max}    1.0 h^{-1}
K_{CI}       9.0 g l^{-1}
r_{P,max}    1.0 h^{-1}
K_{CP}       6.2 g l^{-1}
f_{I0}       0.0008 g l^{-1}
K_I          0.03 g l^{-1}
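A sketch of the right-hand side of Eqs. (9)-(15) with the Table 1 values (our own transcription; the reconstructed signs in Eqs. (10), (11) and (13) follow the garbled original and should be checked against Tholodur et al. (2000)):

```python
P = dict(mu_max=0.55, K_CG=1.20, Y=0.45, s_I=-0.02, p1=2.8, p2=4.0,
         rI_max=1.0, K_CI=9.0, rP_max=1.0, K_CP=6.2,
         f_I0=0.0008, K_I=0.03)  # Table 1

def bioreactor_rhs(t, state, p=P):
    """Batch bioreactor ODEs, Eqs. (9)-(15)."""
    X, G, I, _ = state  # protein P does not feed back into the rates
    mu = (p["mu_max"] * G / (p["K_CG"] + G)
          * (p["s_I"] + p["p1"] * I) / (p["s_I"] + p["p2"] * I))  # Eq. (13)
    r_I = p["rI_max"] * I / (p["K_CI"] + I)                       # Eq. (14)
    r_P = (p["rP_max"] * G / (p["K_CP"] + G)
           * (p["f_I0"] + I) / (p["K_I"] + I))                    # Eq. (15)
    return [mu * X, -mu * X / p["Y"], -r_I * X, r_P * X]          # Eqs. (9)-(12)
```

The function is written in the (t, state) signature expected by scipy.integrate.solve_ivp, which can be used to generate the batch trajectories.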

In these simulations, input excitation is not an issue due to the batchwise nature of the process. Training data is generated by using data from the start and end of the batch run, and prediction is attempted using the data in the intermediate period for testing. The sample time for the system is 0.1 h. Fig. 5 shows the training performance using a neural network that has the cell mass and protein concentration as outputs. Only one-step ahead prediction is shown, and inputs and states over two past sampling instants were used to construct the inputs to the ANN. The network was trained in 15 epochs using an error stopping criterion of 10^{-8}. The number of nodes in each layer of the ANN is given by [4 8 8 8 8 2], with the hidden layers each having eight nodes, and the dimension of the inputs and outputs being 4 and 2, respectively. Using the model structure of Fig. 3, this implies an eight state system model. Note that the step-like change in the outputs is not a real system response; it is indicative of the change in the nature of the data set from the start to the end of the run.

Fig. 5. ANN training performance for induced bacterial protein bioreactor. Output y1 represents the cell mass and y2 represents the protein concentration. The abscissa is the index of data points, and the sample time is 0.1 h.

Fig. 6 shows the prediction performance of the network. It is seen that the ANN predictions diverge significantly from the actual outputs for a large portion of the prediction data. Model reduction using SVD on the ANN is now performed. The reduced order model had the following numbers of nodes in its layers: [4 4 3 3 4 2], which implies a three state model. Fig. 7 shows the prediction performance of the reduced order model. An excellent fit to the output data is achieved. This indicates that the system may be accurately modeled using a three-state nonlinear ANN model. As a final exercise, the state sequence generated by the reduced order ANN for the training data is used along with the corresponding state sequence for the original system model to generate the transformation matrix V. The transformed system is simulated and compared against the original system, and the results are shown in Fig. 8. The original and reduced system outputs match almost perfectly, and so the transformed three state system may be used as a basis for estimator and controller design instead of the original higher order system.

Fig. 6. ANN prediction performance for induced bacterial protein bioreactor. Output y1 represents the cell mass and y2 represents the protein concentration. The dotted line represents the ANN predictions and the solid line the true plant output. The abscissa is the index of data points, and the sample time is 0.1 h.

Fig. 7. ANN prediction performance with reduced order modeling for induced bacterial protein bioreactor. Output y1 represents the cell mass and y2 represents the protein concentration. The abscissa is the index of data points, and the sample time is 0.1 h.

Fig. 8. Prediction performance of transformed reduced state model for induced bacterial protein bioreactor. Output y1 represents the cell mass and y2 represents the protein concentration. The abscissa is the time in hours.

6. Distillation column

The second set of simulations involves a distillation column described by Rasmussen and Jorgensen (1999). The tower consists of five trays including a total condenser and reboiler. The system equations are given by:

H_r \frac{dx_1}{dt} = (L + L_f)(x_2 - x_1) + V(x_1 - y_1)     (16)
H_t \frac{dx_2}{dt} = (L + L_f)(x_3 - x_2) + V(y_1 - y_2)     (17)
H_t \frac{dx_3}{dt} = L_f x_f + L x_4 - (L + L_f) x_3 + V(y_2 - y_3)     (18)
H_t \frac{dx_4}{dt} = L(x_5 - x_4) + V(y_3 - y_4)     (19)
H_c \frac{dx_5}{dt} = V(y_4 - x_5)     (20)
y_i = \frac{\alpha x_i}{1 + (\alpha - 1) x_i}     (21)

The parameter values are given in Table 2; a code sketch of the column model follows the table.

Table 2
Parameter values for distillation column

H_c       30 mol
H_r       30 mol
H_t       20 mol
\alpha    5
L_f       10.0 mol min^{-1}
x_f       0.5
L         27.3755 mol min^{-1}
V         32.3755 mol min^{-1}
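A sketch of Eqs. (16)-(21) with the Table 2 values (our own transcription of the reconstructed equations; the sign conventions were inferred from the garbled original via component balances and should be checked against Rasmussen and Jorgensen (1999)):

```python
P = dict(Hc=30.0, Hr=30.0, Ht=20.0, alpha=5.0,
         Lf=10.0, xf=0.5, L=27.3755, V=32.3755)  # Table 2

def column_rhs(t, x, p=P):
    """Five-stage binary distillation column, Eqs. (16)-(21)."""
    L, Lf, V, a = p["L"], p["Lf"], p["V"], p["alpha"]
    y = [a * xi / (1.0 + (a - 1.0) * xi) for xi in x]  # Eq. (21)
    dx1 = ((L + Lf) * (x[1] - x[0]) + V * (x[0] - y[0])) / p["Hr"]
    dx2 = ((L + Lf) * (x[2] - x[1]) + V * (y[0] - y[1])) / p["Ht"]
    dx3 = (Lf * p["xf"] + L * x[3] - (L + Lf) * x[2]
           + V * (y[1] - y[2])) / p["Ht"]
    dx4 = (L * (x[4] - x[3]) + V * (y[2] - y[3])) / p["Ht"]
    dx5 = V * (y[3] - x[4]) / p["Hc"]
    return [dx1, dx2, dx3, dx4, dx5]
```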

The measured outputs are the liquid composition in the reboiler (x_1) and that in tray 4 (x_4). The network once again uses the inputs and outputs at the past two sampling instants as inputs, and the next-step output as its output. This means that there are four input and two output nodes. Each of the hidden layers has ten nodes to begin with. The training performance of this ANN is shown in Fig. 9. The network was trained over 200 epochs using 200 data points, resulting in a MSE of 0.006. The sample time used in the generation of the data points was 0.1 min. The prediction performance of the ANN for a different set of 200 data points is shown in Fig. 10. The MSE over all outputs for the prediction data is 0.0445.

Fig. 9. ANN training performance for distillation column. Output y1 represents the liquid composition in the reboiler (x1) and y2 represents x4, the liquid composition in tray 4. The solid line represents the true plant output and the dotted line the ANN prediction. The abscissa is the index of data points, and the sample time is 0.1 min.

Fig. 10. ANN prediction performance for distillation column. Output y1 represents the liquid composition in the reboiler (x1) and y2 represents x4, the liquid composition in tray 4. The abscissa is the index of data points, and the sample time is 0.1 min.

Fig. 11 shows the prediction performance of a reduced order ANN obtained by performing SVD on the layers of the original ANN and then retraining the network. It is seen that the reduced order model provides much more accurate predictions for the system outputs compared with the original ANN. The reduced order ANN has nodes given by [4 6 4 4 4 4], which implies that a four state model is used in the prediction. The MSE of the reduced order ANN for the prediction data is 0.0079, almost an order of magnitude lower than that of the original model. Fig. 12 compares the response of a transformed reduced state system (obtained using the state sequences of the original system and the reduced order ANN) with the actual plant response. The reduced order model again matches the system outputs extremely accurately.

Fig. 11. ANN prediction performance with reduced order modeling for distillation column. Output y1 represents the liquid composition in the reboiler (x1) and y2 represents x4, the liquid composition in tray 4. The abscissa is the index of data points, and the sample time is 0.1 min.

Fig. 12. Prediction performance of transformed reduced state model for distillation column. Output y1 represents the liquid composition in the reboiler (x1) and y2 represents x4, the liquid composition in tray 4. The abscissa is the time in minutes.

7. Higher-order distillation column

The plant used in this set of simulations is a higher order distillation column obtained by adding two stages each in the stripping and rectifying sections. This results in a nine state system and purer components at the top and bottom of the column. The other system parameters remain the same as in the lower-order distillation column. Once again, the past two sampling instants are considered for network inputs, and one-step-ahead prediction is considered. Each of the hidden layers is initially considered to have 12 nodes. The measured outputs of the system are the liquid composition in the reboiler and the liquid compositions in trays 4 and 6. The feed is now introduced in tray 5. The ANN has six inputs and three outputs. The network was trained over 200 epochs over a data set of 200 samples, with a prediction data set of similar size also being created. The sampling time is kept the same as for the lower-order column. Fig. 13 shows the trained outputs of the ANN. The MSE on the training data is 0.5266. The prediction performance is shown in Fig. 14. In this case, the original ANN provides reasonably accurate predictions, with the MSE for prediction data being 0.8643. Upon model reduction using SVD, the number of nodes in each layer is given by [5 5 4 4 3 3], meaning that one of the input directions was correlated with the others and thus was redundant. Also, a four state model is sufficient for output prediction for the full nine state system. The prediction performance of the reduced state model is shown in Fig. 15. The MSE in this case is 0.7933, again proving to be more accurate than the original overparameterized ANN model.

Fig. 13. ANN training performance for higher order distillation column. Output y1 represents the liquid composition in the reboiler (x1), y2 the liquid composition in tray 4 and y3 that in tray 6. The solid line represents the true plant output and the dotted line the ANN prediction. The abscissa is the index of data points, and the sample time is 0.1 min.

Fig. 14. ANN prediction performance for higher order distillation column. Output y1 represents the liquid composition in the reboiler (x1), y2 the liquid composition in tray 4 and y3 that in tray 6. The abscissa is the index of data points, and the sample time is 0.1 min.

Fig. 15. ANN prediction performance with reduced order modeling for high order distillation column. Output y1 represents the liquid composition in the reboiler (x1), y2 the liquid composition in tray 4 and y3 that in tray 6. The abscissa is the index of data points, and the sample time is 0.1 min.

8. Polymerization reactor

The final set of simulations involves a CSTR used for the free-radical polymerization of methyl methacrylate. The system equations are given below, and the parameter values given in Table 3 are the same as those used by McLain and Henson (2000).

\frac{dx_1}{dt} = x_{1f} - x_1 - Da_p W(x) x_1 E_x(x_2)     (22)
\frac{dx_2}{dt} = x_{2f} - x_2 + B Da_p \gamma_p W(x) x_1 E_x(x_2) + \beta(x_{2c} - x_2)     (23)
\frac{dx_3}{dt} = x_{3f} - x_3 - Da_d x_3 E_{xd}(x_2)     (24)
\frac{dx_4}{dt} = x_{4f} - x_4     (25)
E_x(x_2) = \exp\left( \frac{x_2}{1 + x_2 / \gamma_p} \right)     (26)
E_{xd}(x_2) = \exp\left( \frac{\gamma_d x_2}{1 + x_2 / \gamma_p} \right)     (27)

x_1 represents the monomer concentration, x_2 the reactor temperature, x_3 the initiator concentration and x_4 the solvent concentration, all in dimensionless variables. x_2 and x_3 are the measured system outputs. A code sketch of Eqs. (22)-(27) follows Table 3.

Table 3
Parameter values for polymerization reactor

Da_p       5.871 \times 10^6
Da_d       3.6447 \times 10^{11}
\beta      1.3
W          1.013 \times 10^{-7}
x_1        1.206
x_3        0.01424
x_{1f}     1.286
x_{3f}     0.01429
\gamma_p   6.846
\gamma_d   6.892
B          0.3635
x_{2c}     0
x_2        0.08653
x_4        1.865
x_{2f}     0
x_{4f}     1.865

Once again, system inputs and outputs at the past two sampled instants are used to construct the inputs and outputs of the ANN. This means that there are four input nodes and two output nodes. The first hidden layer has 15 nodes, and all the others have eight nodes each. 400 data points are used in the training data set, and a similar number are used in the prediction data set. The sampling time is chosen to be 0.5 dimensionless units. The network was trained over 200 epochs. Fig. 16 shows the training performance of the ANN. The MSE for the training data is 0.0050. Fig. 17 shows the prediction performance of the ANN. It can be seen that the ANN predictions are not very accurate, with the MSE of the prediction being 0.3415. Upon application of SVD, the reduced order ANN contains hidden layers with 14, 3, 5 and 4 nodes, respectively, implying a three state reduced order model. The prediction performance of the reduced order model is presented in Fig. 18. The predictions are much more accurate, especially for the reactor temperature. Underpredictions of the deviations in the initiator concentration are observed. The MSE for the predictions in this case is 0.0220, which is again a sizable improvement over the original ANN model, even though greater accuracy is desirable, perhaps by further training of the network.

Fig. 16. ANN training performance for polymerization reactor. Output y1 represents the reactor temperature and y2 the initiator concentration. The solid line represents the true plant output and the dotted line the ANN prediction. The abscissa is the index of data points, and the sample time is 0.5 dimensionless units.

Fig. 17. ANN prediction performance for polymerization reactor. Output y1 represents the reactor temperature and y2 the initiator concentration. The abscissa is the index of data points, and the sample time is 0.5 dimensionless units.

Fig. 18. ANN prediction performance with reduced order modeling for polymerization reactor. Output y1 represents the reactor temperature and y2 the initiator concentration. The abscissa is the index of data points, and the sample time is 0.5 dimensionless units.

Table 4
System identification results

System                   System states   ANN states   Training MSE   Prediction MSE   Reduced states   Prediction MSE (reduced)
Batch bioreactor         4               8            1.0e-8         0.1504           3                0.0130
Distillation column      5               9            0.006          0.0445           4                0.0079
Distillation column      9               12           0.5266         0.8643           4                0.7933
Polymerization reactor   4               8            0.0050         0.3415           3                0.0200

A summary of the results for the examples presented above is given in Table 4. In all cases, the prediction MSE for the reduced order model is smaller than the prediction MSE for the original ANN model, demonstrating the efficacy of the model reduction technique.

9. Summary

A technique for nonlinear system identification and model reduction using ANNs has been proposed and demonstrated in this paper. The method uses an ANN to model input–output data, with the system states being given by the outputs of a hidden layer. Model reduction is achieved by reducing overparameterization of the ANN, removing redundant nodes using an SVD-based technique. In all the examples considered, the reduced model ANN provides equivalent or improved output prediction over the original overparameterized network. Significant reduction in the number of states in the model is also achieved. The state sequence of the ANN model is used to construct a state transformation matrix which maps the original higher order system into a lower order state space and is a useful tool for model reduction. Extension of the method for use in recurrent networks is relatively simple, and is desirable because nonlinear discrete state-space models are represented compactly in such a form. This comes at the expense of posing a more difficult optimization problem in the training of the ANN.

References

Eaton, J. W., Rawlings, J. B., & Ungar, L. H. (1994). Stability of neural net based model predictive control. In Proceedings of the American Control Conference (pp. 2481–2485).

Emmerson, M. D., & Dampier, R. I. (1993). Determining and improving the fault tolerance of multilayer perceptrons in a pattern-recognition application. IEEE Transactions on Neural Networks, 4, 788–793.

Hu, Y. H., Xue, Q., & Tompkins, W. J. (1991). Structural simplification of a feed-forward, multi-layer perceptron artificial neural network. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing (pp. 1061–1064).

Kramer, M. A. (1991). Nonlinear principal component analysis using autoassociative neural networks. American Institute of Chemical Engineering Journal, 37, 233–243.

Lee, K. S., Eom, Y., Chung, J. W., Choi, J., & Yang, D. (2000). A control-relevant model reduction technique for nonlinear systems. Computers and Chemical Engineering, 24, 309–315.

Lee, J. H., Pan, Y., & Sung, S. W. (1999). A numerical projection-based approach to nonlinear model reduction and identification. In Proceedings of the American Control Conference (pp. 1568–1572).

McLain, R. B., & Henson, M. A. (2000). Principal component analysis for nonlinear model reference adaptive control. Computers and Chemical Engineering, 24, 99–110.

Overschee, P. V., & Moor, B. D. (1994). N4SID: subspace algorithms for the identification of combined deterministic–stochastic systems. Automatica, 30, 75–93.

Overschee, P. V., & Moor, B. D. (1995). A unifying theorem for three subspace system identification algorithms. Automatica, 31, 1853–1864.

Pan, Y., Sung, S. W., & Lee, J. H. (2001). Data-based construction of feedback-corrected nonlinear prediction model using feedback neural networks. Control Engineering Practice, 9, 859–867.

Psichogios, D. C., & Ungar, L. H. (1994). SVD-NET: an algorithm that automatically selects network structure. IEEE Transactions on Neural Networks, 5, 513–515.

Rasmussen, K. H., & Jorgensen, S. B. (1999). Parametric uncertainty modeling for robust control: a link to identification. Computers and Chemical Engineering, 23, 987–1003.

Schittenkopf, C., Deco, G., & Brauer, W. (1997). Two strategies to avoid overfitting in feedforward networks. Neural Networks, 10, 505–516.

Sentoni, G. B., Biegler, L. T., & Guiver, J. P. (1999). Model reduction for nonlinear DABNet models. In Proceedings of the American Control Conference (pp. 2052–2056).

Tholodur, A., Ramirez, W. F., & McMillan, J. D. (2000). Interpolated parameter functions for neural network models. Computers and Chemical Engineering, 24, 2545–2553.

Weigend, A. S., Huberman, B. A., & Rumelhart, D. E. (1990). Predicting the future: a connectionist approach. International Journal of Neural Systems, 1, 193–209.

Zamarreno, J. M., & Vega, P. (1998). State space neural network. Properties and application. Neural Networks, 11, 1099–1112.

Zhou, K., Doyle, J. C., & Glover, K. (1996). Robust and optimal control. Englewood Cliffs, NJ: Prentice Hall.