Download ppt - Reti Neurali nella previsione di variabili ambientali

Reti Neuralinella previsione di

variabili ambientali

Organizzazione dell’esposizione

• Il problema previsionale

• Soluzione lineare (ARX)

• Intelligenza artificiale e idrologia: approccio con reti neurali

• Pruning

• Risultati (Olona, Tagliamento)

• Previsione del PM10 a Milano

Requisiti di un sistema previsionale

• accuratezza previsionale

• .. anche nel caso in cui non siano disponibili i dati rilevati da tutte le stazioni (robustezza)

• velocità computazionale

• minimo orizzonte temporale utile

Problematiche Idrologiche• Variabilità spaziale:

piogge/ permeabilità

• Non linearità: imbibimento del terreno.

0.00

0.60

0 90Pioggia cumulata 5gg

Rai

nfal

l/R

unof

f• ingressi pluviometrici

concentrati / distribuiti

• modelli lineari/ non lineari

Schema di previsione

• y(t, t-1,..): termini autoregressivi (portate)• u1,u2(t-,t --1,..): termini esogeni (piogge) : tempo di corrivazione piogge portate (ritardo)

PREVISORE y(t+1)

y(t)

y(t-1)

y(t-m)

...

u2(t-)

u2(t- -1)

u2(t- -m)

...

u1(t-)

u1(t- -1)

u1(t- -m)

...

Previsione ricorsiva a K passiyt u1t-

yt-1 u1t- -1

….

yt-L+1 …

PREVISORE1ty

La del passo precedente diventa il primo

termine autoregressivo nella nuova previsione

y

K max = min(1, 2.. M)+1

PREVISORE 2ˆ ty1ˆ ty

u1t-+1

yt u1t-

….

yt-L …

Approccio black-box lineare (ARX)

• pioggia media: un unico ingresso X (perdita di informazione)

• n ingressi esogeni : pluviometri disponibili (es: 2)• stima parametrica MQ

linearitàordini delle parti = ???

21

)2(22)1(111 ***ˆnbnbna

iktiiktiitit ububyay

AR X1 X2

Misure in tempo reale

• Se Xi non è disponibile si ha il blocco del predittore.• Soluzione: previsori d’emergenza

1

3 2

)2(22 **ˆnbna

itiitit ubyay X2

2X1

1)1(11 **ˆ

nbnaitiitit ubyay X1

nbnnb

itnniiti ubub )3()3(13 *.....*3

21

)2(22)1(11 ***ˆnbnbna

iktiiktiitit ububyay AR X1 X2

X3 Xn

Un diverso modello ARX per ogni classe idrologica

ARX con soglie

dominio di portata (mc/sec)

In corrispondenza delle soglie si ha un brutale cambio di modello

Soglia S1S1=????

Soglia S2S2=????

Predittore 1 Predittore 2 Predittore 3

Dagli ARX alle ANN

Richiesta di modellizzazione non lineare

• ARX vs reti neurali

Quale complessità per il predittore neurale?

• Ottimalità parametri• Ottimalità ingressi (robustezza)• pruning

Reti Neurali Artificiali (ANN)

x0

x1

x2

y

xr

...

f

w1,1

wn,r

input strato nascosto(n neuroni)

output

neurone d’uscita

Modelli di neuroni artificiali

xt

xt-1

xt-2

...

w1,1

w1,r

b

1

input

neurone

= f(Wx+b)

xt-

xt --1

...

jkkjj bxwz

somma pesata degli ingressi (cfr. dendriti)

funzione logistica (cfr. assone)

Reti Neurali Artificiali (ANN)

input strato nascosto(n neuroni)

outputneurone d’uscita

x0

x1

x2

y

xr

...

w1,1

wn,r

Apprendimento

• Il numero di parametri può arrivare a diverse centinaia

• Algoritmi di ottimizzazione non lineare per la stima

dei parametri,ad es minimizzando

• overfitting

2)ˆ( ii yy

Ann: problematiche

Diversi training di una medesima struttura

neurale

• Minimi locali

“Early stopping”Regolarizzazione • Sovrataratura

Tentativi ed errori • Architettura

ottimale

Pruning: rimozione di parametri da una rete di partenza completamente connessa

• ottimale ed automatica selezione della struttura, non completamente connessa

Analisi di Salienza

• Sj è misura dell’incertezza di stima

• la rimozione del peso con minima Sj genera il minimo incremento di Etrain

jj

jjs

ˆ

L’idea del pruning• Calibrazione = training + testing• Training (LM) della rete iniziale

sovraparametrizzata• Valutazione errore di test• Stima salienza dei pesi• Eliminazione peso a minore salienza• Retrain della nuova rete

0 2000

0.7

• Test error

• Training error• Num

parametri

E test

E train

Caso Di Studio: Previsione in Tempo Reale Delle Portate Del Fiume Olona

Inquadramento Idrologico

• Area bacino (Castellanza): 190 kmq

• Portata media: 2.5 mc/sec• Portata attesa per T=10 anni: 108

mc/sec • Minimo orizzonte previsionale

utile: 3ore• basse correlazioni piogge/portate • Stazioni di misura: un idrometro,

tre pluviometri• 15 eventi considerati (circa 1100

passi orari)

Struttura dei previsori

• ARX: ordini 2,111 5 par.

• ANN1: 19 par. (6 neur, 3 pluv)

• ANN2: 5 pars. (2 neur, 2 pluv)

Risultati

Taratura Validazione

RMSEARX.016

3 h avanti

ANN2.014

ARX.017

ANN1.013

ANN2.013

Varianza non spiegata

.377 .313 .345 .452 .327 .350

Corr V/P .926 .951 .940 .892 .949 .937

ANN vs ARX:• miglioramento del 10% in training• miglioramento del 20% in validazione• utilizzando una stazione pluviometrica in

meno!!

Caso di studio: Tagliamento

• Area bacino: 2480 kmq• Q media: 90 mc/sec• Picco piena (1966) :

4000 mc/sec• 5 stazioni

pluviometriche• 2000 dati di piena a

passo orario

Letteratura

• Campolo et. Al. Water Resources Research, 1999• Rete neurale completamente connessa (5 pluviometri)• Efficienza 5h avanti: 85%

Risultati di pruning

• Rete non completamente connessa• Utilizzo di 3 soli pluviometri • Efficienza 5h avanti: 84,5 %

Conclusioni (1)

• ANN permettono migliore qualità previsione rispetto ai lineari ARX (efficienza)

• pruning trova in modo analitico una struttura ottimale

• riduzione degli ingressi pluviometrici senza penalizzazione delle prestazioni previsionali

Basin saturation issues The catchment response to rainfall impulses depends strongly on

the saturation state of the basin

An indirect measure at time (t) may be obtained by using the information R(,t), i.e. cumulated rainfall on the time window [t-,t]

The proxy can be noisy (spatial interpolation from local rain measures, differences between saturation and precipitation)

The basin state at time (t) is classified in a fuzzy way. For instance: 1(t) : membership related to saturation class 1 (“dry” class) 2(t) : membership related to class 2 (“wet” class) 1(t) + 2(t) =1 (constraint)

Fuzzyfication of cumulated rainfall R(,t)

Suitable values of are found via hydrological analyses

A set of centroids is identified on R(,t) (C-means fuzzy clustering algorithm)

We fuzzify the basin state at each time step of the dataset

Coupling fuzzy logic and neural networks The rationale: each saturation class results in a different non-linear

rainfall-runoff relationship

The idea: to train a different, specialized neural network on each saturation

class

to issue the forecast by linearly combining the prediction of the different models

the higher the membership related to a given saturation class, the higher the weight of the corresponding predictor on the forecast

Specialized predictors training

We implemented a weighted least squares variant of the LM training algorithm:

To prevent overfitting, we jointly use regularization and early stopping during the training

The optimal architectures are selected via trial and error (20 estimates of each model)

The model showing the lowest wls on the validation set is finally chosen

Dyyt jjj 2ˆ)()(

Issuing the forecast

As in Takagi Sugeno systems, we linearly combine the output of the specialized models:

is the prediction of the j-th specialized model Switching between models is smooth and ruled by the

state of the basin at time (t)

j

jj ktytkty )(ˆ)()(ˆ

jy

Olona: 3-hours ahead prediction performances (testing set)

ModelEfficiency

(R2)

Correlation

RMSEHigh flows error

FFNN .85 .93 .30 .294

Fuzzy

(=2 days).86 .94 .29 .319

Fuzzy

(=5 days).88 .95 .27 .284

The fuzzy framework with =5 days appears the most effective forecasting approach

Simulation sample

CASO DI STUDIO: Tagliamento

Suddivisione dati:

Set di addestramento 1273 istanti

Set di prova 483 istanti

Set di validazione 599 istanti

Calcolo centroidi e membership:

Addestramento delle reti:Funzione wls_trial

Stima dell’altezza idrometrica:Funzione fuzzy_report

Funzione C-Means Clustering

SOFTWARE: nnsyssid20

RISULTATI 1/2Valutazione statistica

Errore quadratico medio:

Indice di correlazione:

Efficienza modello:

N

yy

RMSE

N

iii

1

2)ˆ(

N

ii

N

ii

N

iii

yyyy

yyyy

1

2

1

2

1

)ˆˆ()(

)ˆˆ)((

0

0

F

FFeff

Fo varianza dati reali

F errore quadratico medio

RISULTATI 2/2Previsione con Pcum 5gg (t+5)

Confronto tra le altezze idrometriche previste e quelle registrate sull’intero set di validazione per i dati orari tra il 1978 e il 1996.

CONFRONTO CON LETTERATURAPrevisione con passo di 5h

eff 90,5 %

Corani e Guariso

rmse —

ρ 0,9480

eff 0,8790

Campolo et al.

rmse —

ρ —

eff 0,8500

Conclusions (2) The proposed approach uses specialized models

and couples their output via fuzzy logic, in order to account for the basin saturation state

The framework outperforms the classical FFNN rainfall-runoff approach

The framework complexity does not involve significant computational overload nor additional measurement costs to issue the prediction

Un’altra applicazione:reti neurali per la

previsione del PM10 a Milano

Milan case study Significant reduction of the yearly average of pollutants

such as SO2, NOx, CO, TSP (-90%, -50%, -65%, -60% in the period 1989-2001).

A major concern is constituted by PM10. Its yearly average is stable (about 45g/m3) since the beginning of the monitoring (1998).

The limit value on the daily average (50g/m3) is exceeded about 100 days every year.

The application: prediction at 9 a.m. of the PM10 daily average concentration of the current (and the following) day.

Air pollutants trends on Milan

SO2, NOxand CO: decreasing trends (catalytic converters, improved heating oils)

1988 1990 1992 1994 1996 1998 20000

200

400

Years

Yearly NOx aver. concentrations [g/m3]

1988 1990 1992 1994 1996 1998 20000

50

100

Yearly SOx concentrations [g/m3]

NOx

SO2

1990 1992 1994 1996 1998 20000

20

40

60

Yearly O3 aver. concentrations [g/m3]

1990 1992 1994 1996 1998 20000

2

4

6

Years

Yearly CO mean concentrations [g/m3]

O3

CO

PM10 and O3: increasing from the early 90’s

Prediction methodology: FFNN The input set contains both

pollutants (PM10, NOx, SO2) and meteorological data (pressure, temperature, etc).

The input hourly time series are grouped to daily ones as averages over of given hourly time windows (chosen by means of cross-correlation analysis).

The architecture is selected via trial and error and trained using the LM algorithm and early stopping.

PM10 expectedconcentration

Hidden layer:

logistic

Input set

X1

X2

X3

Xn

PM10 time series analysis

Available dataset: 1999-2002

Winter concentrations are about twice as summer ones, both because of unfavorable dispersion conditions and higher emissions

On average, concentrations are about 25% lower on Sundays than in other days

Deseasonalization approach Yearly and weekly PM10

periodicities are clearly detected also in the frequency domain

The same periodicities are detected also on NOx and SO2

On each pollutant, we fit a periodic regressor R(,t) before training the predictors.

PM10pred (t)= R(,t) + y(t) [y(t) is the actual output of the ANN]

R(,t)= =c+f(1,t)+ f(2,t) where:

f(,t)=k [aksin(k t)+bkcos(k t)] k=1,2

1 =2/365 day-1; 2 =2/7 day-1

Meteorological data are standardized as usual.

Prediction at 9 a.m. for the current day t

Deseasonalized Normalized

average behavior

0.94 0.91

MAE 7.70

(17% )

8.71

(19% )

50 g/m3 threshold detection

CPO 0.81 0.87

CPP 0.91 0.83

FA 0.09 0.17

CPP:

bserved

redictedorrectly

N

N

O

PC

redictedP

PC

N

N redictedorrectly

redicted

larmalse

N

N

P

AF

CPO:

FA:

Satisfactory performances. Deseasonalization allows to increase the average goodness of fit indicators

As a term of comparison, a linear ARX predictor results in = 0.89 and MAE=11g/m3

Prediction for the following day (t+1)

To meet such an ambitious target, we added further meteorological improper (i.e., unknown at 9 a.m. of day t) input variables, such as rainfall, temperature, pressure etc. measured over both day t and t+1

The performances obtained in this way can be considered as an upper bound of what can be achieved by inserting actual meteorological forecasts in the predictor

Pollutant time series have been again deseasonalized via periodic regressor

We tried - besides trial and error - also a different identification approach for neural networks, namely pruning

OBS pruning algorithm training of the initial fully connected architecture;

ranking of parameters on the base of their relevance (saliency);

elimination of the parameter with the lowest saliency;

generation of a new architecture (one parameter less);

re-training of the obtained network;

evaluation of the mean square error over the validation set;

back to step 2, until there are parameters left

Pruned ANNs The network showing the lowest validation error is finally chosen as optimal Pruned ANNs are parsimonious: they contain one order of magnitude less

parameters than fully-connected ones

X1

y

X2

X3

Xn

Pruned network sample0 100 200 300 400 500

0

0.01

0.02

0.03

0.04

0.05

0.06

Network parameters

Validation error

Training error

Selection

ResultsConnected NN Pruned NN

average behavior

0.76 0.76

MAE 13.08

(29% )

12.89

(29% )

50 g/m3 threshold detection

CPO 0.73 0.72

CPP 0.75 0.76

FA 0.25 0.24

Performances of the two models are very close to each other, decreasing strongly with respect to the 1-day case

As a term of comparison, the network trained without improper meteorological information looses just a few percent over the different indicators, showing an almost irrelevant gap

Conclusions (3) Performances on the 1-day prediction appears to be

satisfactory: in this case, the system can be really operated as a support to daily decisions (traffic blocks, alarm diffusion,…).

Deseasonalization of data before training the predictors seems to be helpful in improving the performances.

2-days forecast are disappointing, even if improper meteo data are introduced. Performance differences between pruned and fully connected neural networks are neglegible.

More comprehensive meteorological data (vertical profiles, mixing layer) may be more substantial than training methods in improving the quality of longer term forecasts.

Altre architetture di reti

PROBLEMA: come effettuare l’addestramento?

Possibile soluzione: prevedere un numero massimo di cicli

RETI RICORRENTI: alcune uscite ritornano in ingresso all’iterazione successiva. Utilizzate in diversi campi (http://www.idsia.ch/~juergen/rnn.html).

Utili, ad esempio, per la diagnostica di reti di sensori (un sensore guasto dà risultati diversi da quelli in uscita dalla rete).

RETI AUTOASSOCIATIVE: addestrate a riprodurre gli ingressi, passando per uno strato con pochi neuroni:

• servono per estrarre caratteristiche (come Componenti Principali)

• utili per scoprire i legami (non lineari) tra gli ingressi