A real time hydrological forecasting system using a fuzzy clustering approach

Computers & Geosciences 29 (2003) 1111–1117

A. Luchetta, S. Manetti*

Department of Electronics and Telecommunications, University of Florence, Via S. Marta, 3, Florence 50139, Italy

Received 13 February 2002; received in revised form 19 March 2003; accepted 7 May 2003

Abstract

A new technique to predict extreme and rare situations of hydrometric levels in hydrological basins is presented in

this paper. A fuzzy logic approach has been exploited for the adaptive clustering of input data and for the forecasting

model. The methodology has been developed, in collaboration with an Italian manufacturer of meteorological and

environmental sensing equipment, for the design of a system prototype to be installed in the ‘‘Padule di Fucecchio’’

basin in central-northern Italy. All the presented data come from monitoring equipment installed in this basin. The

effectiveness of the method has been evaluated by comparing the performance to that obtained with a neural network

forecasting approach.

Keywords: Fuzzy logic; Time series; Flood forecasting; Clustering

1. Introduction

A fundamental aspect of many hydrological studies is

the problem of forecasting the rate of water flow (or

analogously the level) of a river in a given point of its

course. Much work has been devoted to this topic

over many years, often by developing complex models

of the river basin (Bras and Kitanidis, 1980a, b). During

the last decade, artificial neural networks and fuzzy

logic techniques have become popular for forecasting

time series, particularly in applications in which the

deterministic approach presents serious drawbacks, due

to the noisy or random nature of the data. On the other

hand, both fuzzy logic and neural network approaches

require the support of large historical archives of data to

be exploited.

These learning-based approaches, which can be

considered an alternative to classical methods for flood

forecasting, exploit the statistical relationships between

the hydrologic inputs and outputs without explicitly

considering the physical process relationships that exist

between them. Examples of stochastic models used in

hydrology are the autoregressive moving average models

(ARMA) of Box and Jenkins (1976) and the Markov

method (Yakowitz, 1985; Yapo et al., 1993). ARMA

models work on the assumption that an observation at a

given time is predictable from its immediate past, i.e., it

is a weighted sum of a series of previous observations.

Markov methods also rely on past observations but the

forecasts consist of the probabilities that the predicted

flow will be within specified flow intervals, where the

probabilities are conditioned on the present state of the

river. Other works exist in very close fields (Baglio et al.,

1996) or similar ones (Binaghi et al., 1997; Zardecki,

1997; Hadjimichael et al., 1996). Although neural

networks were historically inspired by the biological

functioning of the human brain and fuzzy logic by the

attempt to simulate human ‘‘vagueness’’ of reasoning, in

practice many characteristics of these approaches, such

as the ability to learn and generalise, the ability to cope

with noise, and distributed processing, which maintains

robustness, can be of great help in many engineering

*Corresponding author. Tel.: +39-055-4796282; fax: +39-055-4796442.

E-mail address: manetti@ing.unifi.it (S. Manetti).

doi:10.1016/S0098-3004(03)00137-7

tasks (see Openshaw and Openshaw, 1997, for an
overview of the applications of artificial intelligence
in the geophysical data field and human geographical
problems). Following the same philosophy of data
treatment, important work has been done using
data fusion, i.e. the operation of combining information
from multiple sensors and data sources, possibly
exploiting the potential of several alternative
models such as neural networks, fuzzy logic and genetic
algorithms (See and Abrahart, 2001). Moreover, in

general, these techniques can be included in the overall

concept of soft computing approaches (Openshaw and

See, 1999).

This paper presents a fuzzy logic approach to the

forecasting of hydrological levels, particularly suitable

to cope with extreme situations, by setting different rules

for trivial and rare situations. The mechanism of
partitioning the input space into fuzzy subsets is not
new, having been developed in fuzzy adaptive resonance
theory (Fuzzy ART) by Carpenter et al. (1991), but our
approach is quite different.

A 4-year time series of historical rainfall and river-level data from several meteorological stations of the ‘‘Padule di Fucecchio’’ basin was used. Several trials have been made with the aim of optimizing the use of these data to forecast the future trend of the basin. A simple data extractor (a conventional ARMA model) was used in the previously installed version of the forecaster in the same basin, but it had great difficulty in coping with extreme situations. A classic backpropagation neural network approach, possibly adjusted online, as in a previous work on the prediction of ice formation on road paving (Luchetta et al., 1998), has shown, in this specific case, the drawback of tending toward a zero-order predictor (a low value of the coefficient of persistence, as will be shown later). Efforts were therefore directed toward a more efficient fuzzy clustering corrected system, which is described in Sections 2 and 3. The original system was finally modified (Section 4) to take better account of situations that occur rarely but are of primary importance. The results obtained are presented in Section 5.

2. The fuzzy logic basic system

As shown in Wang and Mendel (1992), given N input–output pairs $(\bar{x}_i, y_i)$, a fuzzy basis function system, constructed using a centroid defuzzifier, a singleton fuzzifier, product inference and Gaussian membership functions, can be represented as

$$f(\bar{x}) = \frac{\sum_{i=1}^{N} y_i \, e^{-|\bar{x}-\bar{x}_i|^2/\sigma^2}}{\sum_{i=1}^{N} e^{-|\bar{x}-\bar{x}_i|^2/\sigma^2}}. \tag{1}$$

An adequate choice of the parameter $\sigma$ will match any N input–output pairs to a given accuracy. Moreover, $\sigma$ is a smoothing parameter: as $\sigma$ decreases the matching error decreases, but $f(\bar{x})$ becomes less smooth and the generalization capabilities may deteriorate. A good $\sigma$ can be determined by trial and error.
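As an illustration of Eq. (1) and of the smoothing role of the width parameter, the following is a minimal Python sketch, not the authors' implementation. The function name `fuzzy_basis_predict` and the toy data are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def fuzzy_basis_predict(x, x_train, y_train, sigma):
    """Evaluate the fuzzy basis function system of Eq. (1) at one query point x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    y_train = np.asarray(y_train, dtype=float)
    x_train = np.asarray(x_train, dtype=float).reshape(len(y_train), -1)
    # Squared Euclidean distance from x to every training input.
    d2 = np.sum((x_train - x) ** 2, axis=1)
    # Gaussian membership weights exp(-|x - x_i|^2 / sigma^2).
    w = np.exp(-d2 / sigma ** 2)
    # Weighted average of the training outputs.
    return float(np.dot(w, y_train) / np.sum(w))

# Hypothetical toy data: three 1-D samples.
x_train = [[0.0], [1.0], [2.0]]
y_train = [0.0, 1.0, 4.0]
small = fuzzy_basis_predict([1.0], x_train, y_train, sigma=0.1)
large = fuzzy_basis_predict([1.0], x_train, y_train, sigma=10.0)
```

With the small width the sketch reproduces the training output at the queried sample almost exactly, while with the large width the result drifts toward the mean of all outputs, which is the trade-off between matching error and smoothness described above.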

In our application, the number of available input–output pairs is very high: at least 1 year of data, with hourly samples. These historical data, which are used as training data to build the prediction system, do not all have the same importance. In fact, the peculiarities of this specific application must be considered: the river level remains nearly constant (with a slow and regular decrease during no-rain periods), and shows abrupt and fast growth in response to upstream rain. On the other hand, it is essential that a forecast system yield correct answers precisely during these anomalous circumstances, because of the fundamental safety-driven scope of this kind of application. The behavior of the system can be seen to be seasonal, but not really periodic. Starting from these considerations, a new clustering approach has been introduced.

3. The fuzzy clustering corrected system

Given a set of N training pairs $(\bar{x}_i, y_i)$, a modified version of the nearest neighborhood clustering scheme is developed in accordance with the following steps:

1. The first i/o pair $(\bar{x}_1, y_1)$ is used to locate the first cluster center $\bar{x}^0_1 = \bar{x}_1$. Moreover, let $A_1(1) = y_1$ and $B_1(1) = 1$ be two parameters of the system, used to tune its behavior; following this rule for any i/o pair (i.e. $A_i = y_i$ and $B_i = 1 \;\forall i$) the fuzzy system in Eq. (1) could be simply re-written as

$$f(\bar{x}) = \frac{\sum_{i=1}^{N} A_i \, e^{-|\bar{x}-\bar{x}_i|^2/\sigma^2}}{\sum_{i=1}^{N} B_i \, e^{-|\bar{x}-\bar{x}_i|^2/\sigma^2}}.$$

Finally a radius $r$ must be chosen. $r$ is a real number, and is a measure of distance in the space of $\bar{x}_i$; it is chosen in a heuristic way, after several trials with the available dataset. It can be noted that the radius $r$ determines the complexity of the fuzzy system; that is, for a smaller radius $r$ we have more clusters and a more accurate nonlinear regression, at the cost of a higher computational effort.

2. At step $h$, suppose there are $Z$ clusters, with centers at $\bar{x}^0_1, \bar{x}^0_2, \ldots, \bar{x}^0_Z$. When the next pair $(\bar{x}_h, y_h)$ is considered, the distances between $\bar{x}_h$ and the $Z$ cluster centers are computed and the smallest, $|\bar{x}_h - \bar{x}^0_{s_h}|$, is stored in memory. At this point, there are two possible conditions:

2.1. $|\bar{x}_h - \bar{x}^0_{s_h}| > r$. In this case a new cluster is introduced and $\bar{x}_h$ is chosen as the new cluster center, $\bar{x}^0_{Z+1} = \bar{x}_h$; besides, $A_{Z+1}(h) = y_h$ and $B_{Z+1}(h) = 1$. All the other parameters are maintained.

2.2. $|\bar{x}_h - \bar{x}^0_{s_h}| < r$. In this case $\bar{x}_h$ belongs to the $s$th cluster, whose center is adjusted according to the new element value, in this way: $\bar{x}^0_{s_h} = (\bar{x}_h + \bar{x}^0_{s_h})/2$. This adjustment could result in the exclusion of some cluster elements, so this possibility is evaluated for all elements of cluster $s$; if it happens, the cut-off element is selected as a new cluster center (cluster $Z+1$) and step 2.1 is applied.

2.3. For the $s$th cluster the parameters are adjusted in the following way:

$$A_s(h) = A_s(h-1) + y_h, \tag{2}$$

$$B_s(h) = B_s(h-1) + 1. \tag{3}$$

All the other parameters are maintained unchanged.

3. When all input–output pairs have been processed, a global reordering of all clusters is performed, in order to avoid cluster superimposition. In fact, due to center adjustment, some pairs of centers could approach each other to less than $r$. Then, the following steps are followed:

3.1. Starting from the first cluster, the distance between each pair of centers is evaluated. Suppose that a distance $|\bar{x}^0_n - \bar{x}^0_m| < r$ is found, that is, the centers of the $n$th and $m$th clusters are closer than $r$;

3.2. A new center $\bar{x}^0_n = (\bar{x}^0_n + \bar{x}^0_m)/2$ is established for cluster $n$; all the elements of cluster $m$ are added to cluster $n$; the parameters of cluster $n$ are adjusted in the following way:

$$A_n = A_n + A_m, \qquad B_n = B_n + B_m$$

and cluster $m$ is deleted;

3.3. At this point, analogously to step 2.2, cluster $n$ is recalculated to take into account the elements that fall outside the cluster boundaries.

4. The output of the fuzzy system is computed as

$$f(\bar{x}) = \frac{\sum_{i=1}^{Z} (A_i/B_i) \, e^{-|\bar{x}-\bar{x}_i|^2/\sigma^2}}{\sum_{i=1}^{Z} e^{-|\bar{x}-\bar{x}_i|^2/\sigma^2}}. \tag{4}$$

Note that Eq. (4) does not reduce exactly to Eq. (1); it has been chosen following the considerations and the demonstration given in the next section.
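The core of the clustering procedure can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' C++ implementation: it covers steps 1, 2.1, the center update of step 2.2, step 2.3 and Eq. (4), while the re-check for excluded elements in step 2.2 and the global merge of step 3 are omitted. A distance exactly equal to the radius is treated as belonging to the existing cluster, and the $\bar{x}_i$ in Eq. (4) are read as the cluster centers. The names `build_clusters` and `clustered_predict` and the toy data are hypothetical.

```python
import numpy as np

def build_clusters(xs, ys, r):
    """Single pass over the training pairs: steps 1, 2.1, 2.2 (center update) and 2.3."""
    centers, A, B = [], [], []
    for x, y in zip(xs, ys):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        if centers:
            # Distance from the new sample to every existing center.
            dists = [np.linalg.norm(x - c) for c in centers]
            s = int(np.argmin(dists))
        if not centers or dists[s] > r:
            # Steps 1 and 2.1: open a new cluster centered on this sample.
            centers.append(x.copy())
            A.append(float(y))
            B.append(1.0)
        else:
            # Step 2.2: move the nearest center halfway toward the sample.
            centers[s] = (x + centers[s]) / 2.0
            # Step 2.3, Eqs. (2)-(3): accumulate the output sum and the count.
            A[s] += float(y)
            B[s] += 1.0
    return np.array(centers), np.array(A), np.array(B)

def clustered_predict(x, centers, A, B, sigma):
    """Evaluate Eq. (4), with one Gaussian term per cluster center."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    w = np.exp(-np.sum((centers - x) ** 2, axis=1) / sigma ** 2)
    # Each cluster contributes its average output A_i / B_i.
    return float(np.dot(w, A / B) / np.sum(w))

# Hypothetical toy series: four nearly identical "flat" samples and one peak.
xs = [[0.0], [0.1], [0.05], [0.0], [5.0]]
ys = [1.0, 1.0, 1.0, 1.0, 9.0]
centers, A, B = build_clusters(xs, ys, r=1.0)
peak = clustered_predict([5.0], centers, A, B, sigma=1.0)
```

In this toy run the four flat samples collapse into a single cluster, so the isolated peak keeps a cluster of its own and carries the same weight in Eq. (4) as the whole flat regime.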

4. The rare event adjustment

The main purpose of forecasting future data of a time

series whose elements are the levels of a river in a given

point is to predict, as early as possible, ‘‘rare events’’ or

catastrophic events.

Let us introduce the ‘‘rare event’’ definition. Suppose that the given N pairs of input–output samples $(\bar{x}_i, y_i)$ are subdivided into two classes: the former includes the $N_f$ frequent events $(\bar{x}_{if}, y_{if})$, the latter the $N_r$ rare events $(\bar{x}_{ir}, y_{ir})$. A frequent event is an event for which $\bar{x}_{if}$ belongs to a cluster whose center is no farther than a given $R_{xf}$ from the others of the same family and $y_{if}$ is no farther than a given $R_{yf}$ from the others of the same family. All the other events are rare. Finally, it is obviously assumed that $N_f \gg N_r$.

Following the previous definition, the two expressions (1) and (4) can be re-written in this way:

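$$f_1(\bar{x}_{hr}) = \frac{y_{hr} + S_1 + S_2}{1 + S_3 + S_4}, \tag{5}$$

$$f_2(\bar{x}_{hr}) = \frac{y_{hr} + (N_r/N_f) S_1 + S_2}{1 + (N_r/N_f) S_3 + S_4} \tag{6}$$

using the following hypotheses and simplifications:

1. The summations are indicated as

$$S_1 = \sum_{i=1}^{N_f} y_{fi} \, e^{-|\bar{x}_{hr}-\bar{x}_{fi}|^2/\sigma^2}, \qquad S_2 = \sum_{i=1,\, i \neq h}^{N_r} y_{ri} \, e^{-|\bar{x}_{hr}-\bar{x}_{ri}|^2/\sigma^2},$$

$$S_3 = \sum_{i=1}^{N_f} e^{-|\bar{x}_{hr}-\bar{x}_{fi}|^2/\sigma^2}, \qquad S_4 = \sum_{i=1,\, i \neq h}^{N_r} e^{-|\bar{x}_{hr}-\bar{x}_{ri}|^2/\sigma^2};$$

2. The summations are subdivided into rare and frequent components;

3. In Eq. (5) the clustering operation is omitted;

4. The two expressions, Eq. (5) and Eq. (6), are evaluated at the rare event $(\bar{x}_{hr}, y_{hr})$.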

At this point it is easy to demonstrate that the form $f_2$ of the fuzzy logic system (Eq. (6) and, analogously, the clustered Eq. (4)) is a better approximation of a rare event than the starting system (Eqs. (1) and (5)), which, on the other hand, yields better performance in the approximation of frequent events. In fact, re-writing the two previous expressions in the form:

$$f_1(\bar{x}_{hr}) = \frac{y_{hr}}{1 + S_3 + S_4} + \frac{S_1 + S_2}{1 + S_3 + S_4}$$

and

$$f_2(\bar{x}_{hr}) = \frac{y_{hr}}{1 + (N_r/N_f) S_3 + S_4} + \frac{(N_r/N_f) S_1 + S_2}{1 + (N_r/N_f) S_3 + S_4}$$

recalling that $N_r/N_f \ll 1$ and taking into account that $S_3$ and $S_4$ cannot be negative, we have that

$$\left| \frac{y_{hr}}{1 + (N_r/N_f) S_3 + S_4} - y_{hr} \right| < \left| \frac{y_{hr}}{1 + S_3 + S_4} - y_{hr} \right|$$

and that:

$$\frac{(N_r/N_f) S_1 + S_2}{1 + (N_r/N_f) S_3 + S_4} < \frac{S_1 + S_2}{1 + S_3 + S_4}.$$

It would be furthermore possible to demonstrate that

both systems can approximate all the N input–output

pairs to any given accuracy.
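To make the comparison between Eq. (5) and Eq. (6) concrete, the following Python sketch evaluates both forms at a single rare event. It is a one-dimensional illustration only, under stated assumptions: the function name `f1_f2`, the toy data and the chosen width are hypothetical, and the ratio $N_r/N_f$ is computed by counting the evaluated event among the rare ones.

```python
import numpy as np

def f1_f2(x_h, y_h, x_freq, y_freq, x_rare, y_rare, sigma):
    """Evaluate Eqs. (5) and (6) at the rare event (x_h, y_h).

    x_rare and y_rare hold the other rare events (index i != h).
    """
    def gauss(x_set):
        # Gaussian weights exp(-|x_h - x_i|^2 / sigma^2) for 1-D inputs.
        return np.exp(-(np.asarray(x_set, dtype=float) - x_h) ** 2 / sigma ** 2)

    wf, wr = gauss(x_freq), gauss(x_rare)
    S1, S3 = float(np.dot(wf, y_freq)), float(np.sum(wf))
    S2, S4 = float(np.dot(wr, y_rare)), float(np.sum(wr))
    ratio = (len(x_rare) + 1) / len(x_freq)  # Nr / Nf, counting event h itself
    f1 = (y_h + S1 + S2) / (1.0 + S3 + S4)                  # Eq. (5)
    f2 = (y_h + ratio * S1 + S2) / (1.0 + ratio * S3 + S4)  # Eq. (6)
    return f1, f2

# Hypothetical data: 100 frequent events at level 1 and two rare peaks.
x_freq, y_freq = np.zeros(100), np.ones(100)
y_h = 5.0
f1, f2 = f1_f2(x_h=3.0, y_h=y_h, x_freq=x_freq, y_freq=y_freq,
               x_rare=[3.5], y_rare=[6.0], sigma=2.0)
```

For this toy case the unclustered form is pulled toward the frequent level (about 1.7), while the weighted form stays close to the rare output (about 5.0). This is only one numerical instance and does not replace the general argument given above.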

5. A case study

For the ‘‘Padule di Fucecchio’’ basin (Fig. 1), 4 years of data were available. These data consist of archives of 1-h step samples, containing rainfall for the precipitation stations and levels for the river stations indicated on the map in Fig. 1.

Fig. 1. Location map of Padule di Fucecchio Basin.

Half of the archive, i.e., 2 years of data, has been used in the construction of the fuzzy system, while the remaining 2 years have been used to test it. The parameter values heuristically chosen are a radius $r = 1$ and $\sigma = 7.5$. The complete schematic of the fuzzy system is shown in Fig. 2.

In order to evaluate the efficiency of the forecast, six performance criteria have been introduced and investigated. They are the following:

1. Mean squared error:

$$\mathrm{MSE} = \frac{1}{N_p} \sum_{i=1}^{N_p} (y_{oi} - y_{pi})^2$$

2. Coefficient of variation of the error residuals:

$$\mathrm{CVRE} = \frac{1}{\bar{y}_o} \sqrt{\frac{\sum_{i=1}^{N_p} (y_{oi} - y_{pi})^2}{N_p}}, \qquad \bar{y}_o = \frac{1}{N_p} \sum_{i=1}^{N_p} y_{oi}$$

3. Ratio of relative error:

$$\mathrm{RREM} = \frac{\sum_{i=1}^{N_p} (y_{oi} - y_{pi})}{n \, \bar{y}_o}$$

4. Ratio of absolute error:

$$\mathrm{RAEM} = \frac{\sum_{i=1}^{N_p} |y_{oi} - y_{pi}|}{n \, \bar{y}_o}$$

5. Phasing coefficient of timing error or coefficient of persistence:

$$\mathrm{PE}(h) = 1 - \frac{\sum_{i=1}^{N_p} (y_{oi} - y_{pi})^2}{\sum_{i=1}^{N_p} (y_{oi} - y_{p(i-h)})^2},$$

where $h$ is the prediction depth in the future.

6. Coefficient of efficiency:

$$\mathrm{CE} = \frac{S_o - S}{S_o}, \qquad S_o = \sum_{i=1}^{N_p} (y_{oi} - y_{oiN})^2, \qquad S = \sum_{i=1}^{N_p} (y_{oi} - y_{pi})^2, \qquad y_{oiN} = \frac{1}{i} \sum_{k=1}^{i} y_{ok}$$

In each definition the subscript ‘‘o’’ denotes observed data and the subscript ‘‘p’’ denotes predicted data. This wide set of error parameters has been introduced in order to obtain the best evaluation of the system and to provide an exhaustive comparison with other possible forecasting systems. In order to avoid an excessive and deceptive reduction in the error values, due to the large amount of data from frequent events, only the differences over a given threshold have been taken into account in the error calculations. The threshold has been heuristically set to 0.1 m for this specific case, but it can be adjusted for different requirements.

The first four error criteria are well known. The first

one is the classical mean squared error; the second one is

the coefficient of variation of the error residuals, which

gives information about the variability of the errors on a

relative, unitless basis. The third and fourth are simply

the ratios of relative and absolute errors.
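The first four criteria can be sketched in Python as below. This is an illustration under assumptions rather than the evaluation code used for Table 1: the normalisations by the number of samples follow the usual definitions of these criteria, the symbol n in the two ratios is taken to be the number of samples $N_p$, and the 0.1 m threshold described above is not applied. The function name `error_criteria` and the sample values are hypothetical.

```python
import numpy as np

def error_criteria(y_obs, y_pred):
    """Return MSE, CVRE, RREM and RAEM for observed and predicted levels."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = y_obs.size                  # assumed: n equals N_p
    resid = y_obs - y_pred          # error residuals y_oi - y_pi
    mean_obs = y_obs.mean()         # mean of the observed data
    mse = np.sum(resid ** 2) / n
    return {
        "MSE": float(mse),
        "CVRE": float(np.sqrt(mse) / mean_obs),
        "RREM": float(np.sum(resid) / (n * mean_obs)),
        "RAEM": float(np.sum(np.abs(resid)) / (n * mean_obs)),
    }

# Hypothetical levels in metres.
crit = error_criteria([1.0, 2.0, 3.0], [1.0, 2.5, 2.5])
```

In this toy case the positive and negative residuals cancel, so RREM is zero while RAEM is not, which shows why both ratios are reported.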

The last two criteria, the coefficient of persistence and the coefficient of efficiency, are less common. The coefficient of persistence is useful for comparing the prediction of the fuzzy engine with one obtained by assuming a Wiener process. In this latter case the variance increases linearly with time and the best estimate is that given by the latest measurement. The coefficient of efficiency estimates the efficiency of the forecaster as the proportion of the variance of the observed data $S_o$ accounted for by the system, by means of the measure of association between the predicted and observed data $S$.

Fig. 2. Forecast fuzzy system.

Table 1 reports the given error criteria values for two

different prediction steps: 3 and 6 h (testing is performed on a data set different from the training set). Results are given for the proposed fuzzy system and for a backpropagation neural network with two hidden layers, with 16 neurons in the first hidden layer and eight in the second. Figs. 3 and 4 show the resulting hydrographs, compared with the actual values, for 3 and 6 h forecast lead times, respectively, in an abrupt variation zone. Note that the first one (Fig. 3, 3 h lead time) does not show big differences between the two methods, whereas with a 6 h lead time (Fig. 4) only the fuzzy system follows the rising edge of the hydrograph very well. The software package that queries the database and implements the proposed algorithm has been developed entirely in C++ for PCs running Windows and installed in the basin station.
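The coefficient of persistence reported in Table 1 can be illustrated with the short Python sketch below. It assumes that the reference forecast is the latest measurement available when the forecast is issued, i.e. the observation h steps earlier, as in the description of the Wiener-process case; samples without such an earlier observation are skipped. The function name `persistence_coefficient` and the sample series are hypothetical.

```python
import numpy as np

def persistence_coefficient(y_obs, y_pred, h):
    """PE(h): 1 minus the ratio of model error to naive (last-value) error.

    h must be a positive integer smaller than the series length.
    """
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Squared error of the model on samples that have a value h steps earlier.
    model_err = np.sum((y_obs[h:] - y_pred[h:]) ** 2)
    # Squared error of the naive forecast "level stays as it was h steps ago".
    naive_err = np.sum((y_obs[h:] - y_obs[:-h]) ** 2)
    return float(1.0 - model_err / naive_err)

# Hypothetical hourly levels with one rising edge.
y_obs = np.array([1.0, 1.0, 2.0, 4.0, 4.0])
perfect = persistence_coefficient(y_obs, y_obs, h=1)
# A predictor that simply repeats the previous observation (zero-order predictor).
lagged = np.concatenate(([1.0], y_obs[:-1]))
naive = persistence_coefficient(y_obs, lagged, h=1)
```

Under this assumption a perfect forecast gives a coefficient of 1 and a zero-order predictor gives 0, so values closer to 1 indicate that the forecaster adds information beyond the last measurement.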

6. Conclusions

A new fuzzy-logic-based algorithm has been developed for forecasting in hydrological basins. A prototype of the described system has been installed for the forecasting of a river level in the ‘‘Padule di Fucecchio’’ basin, in central-northern Italy, where it is working in an experimental stage. The proposed approach does not claim to represent an exhaustive methodology for the treatment of hydrological datasets, but can be a useful tool for estimating a time series of basin levels by means of a fuzzy interpretation. The choice of a new simple fuzzy system with adjusted clustering of input data has been suggested by the particularly good behavior of the approach when the time series under analysis is flat in almost all cases and presents a limited number of out-of-the-ordinary values. The use of an application-specific ‘‘black box’’ model, based on a fuzzy logic approach, avoids the very expensive work related to the development and use of a complete hydrological model of the basin, by exploiting the available large historical dataset.

A comparison with another classical black box

approach, based on neural networks, has shown a better

performance of the proposed technique in this specific

application.

The enhancement of the method, by using other measurements in a more complete approach, will be the subject of future work.

Table 1
Error values of neural and fuzzy predictors (validation period)

        Neural (3 h)   Neural (6 h)   Fuzzy (3 h)   Fuzzy (6 h)
MSE        0.06028        0.09642       0.03285       0.0833
CVRE       0.02375        0.02119       0.02294       0.02179
RREM       0.06824       −0.02218       0.01413       0.01665
RAEM       0.12961        0.16329       0.0939        0.16978
CE        −0.97236       −0.94356      −0.98487      −0.95091
PE         0.37621        0.41118       0.65864       0.48878

Fig. 3. Three hours step forecasting.

Acknowledgements

The authors would like to thank ETG s.r.l. of Firenze, Italy, the manufacturer of the monitoring equipment, for its collaboration.

References

Baglio, S., De Pietro, R., Fortuna, L., Graziani, S., 1996.

Neural networks to estimate hydrographic basins evolution.

In: Proceedings of the ICNN’96, IEEE International

Conference on Neural Networks, Washington, D.C.,

USA, pp. 1818–1823.

Binaghi, E., Madella, P., Montesano, M.G., Rampini, A., 1997.

Fuzzy contextual classification of multisource remote

sensing images. IEEE Transactions on Geoscience and

Remote Sensing 35 (2), 326–340.

Box, G.E.P., Jenkins, G.M., 1976. Time Series Analysis:

Forecasting and Control. Holden-Day, Oakland, CA.

Bras, R.L., Kitanidis, P.K., 1980a. Real-time forecasting with a

conceptual hydrologic model—applications and results.

Water Resources Research 16 (6), 1034–1044.

Bras, R.L., Kitanidis, P.K., 1980b. Real-time forecasting with a

conceptual hydrologic model—analysis of uncertainty.

Water Resources Research 16 (6), 1025–1033.

Carpenter, G.A., Grossberg, S., Rosen, D.B., 1991. Fuzzy

ART: fast stable learning and categorization of analog

patterns by an adaptive resonance system. Neural Networks

4 (6), 759–771.

Hadjimichael, M., Kuciauskas, A.P., Brody, L.R., Bankert,

R.L., Tag, P.M., 1996. MEDEX: a fuzzy system for

forecasting Mediterranean gale force winds. In: Proceedings

of the Fifth IEEE International Conference on Fuzzy

Systems, New Orleans, USA, pp. 529–534.

Luchetta, A., Manetti, S., Francini, F., 1998. Forecast: a neural

system for diagnosis and control of highway surfaces. IEEE

Intelligent Systems 13 (3), 20–26.

Openshaw, S., Openshaw, C., 1997. Artificial Intelligence in

Geography. Wiley, London.

Openshaw, S., See, L., 1999. Applying soft computing

approaches to river level forecasting. Hydrological Sciences

Journal 44 (5), 763–778.

See, L., Abrahart, R.J., 2001. Multi-model data fusion for

hydrological forecasting. Computers & Geosciences 27 (8),

987–994.

Wang, L.X., Mendel, J.M., 1992. Fuzzy basis functions,

universal approximation, and orthogonal least squares

learning. IEEE Transactions on Neural Networks 3 (5),

807–814.

Yakowitz, S.J., 1985. Markov flow models and the flood

warning problem. Water Resources Research 21, 81–88.

Yapo, P., Sorooshian, S., Gupta, V., 1993. A Markov chain

flow model for flood forecasting. Water Resources Research

29, 2427–2436.

Zardecki, A., 1997. Fuzzy control for forecasting and pattern

recognition in a time series. In: Proceedings of the

Third IEEE Conference on Computational Intelligence,

New York, USA, pp. 1815–1819.

Fig. 4. Six hours step forecasting.
