TKK Dissertations 67
Espoo 2007

WATER QUALITY PREDICTION FOR RIVER BASIN MANAGEMENT

Doctoral Dissertation

Olli Malve

Dissertation for the degree of Doctor of Science in Technology to be presented with due permission of the Department of Civil and Environmental Engineering for public examination and debate in Auditorium R1 at Helsinki University of Technology (Espoo, Finland) on the 18th of May, 2007, at 12 noon.

Helsinki University of Technology
Department of Civil and Environmental Engineering
Water Resources Laboratory

Teknillinen korkeakoulu
Rakennus- ja ympäristötekniikan osasto
Vesitalouden ja vesirakennuksen laboratorio



Distribution:

Helsinki University of Technology

Department of Civil and Environmental Engineering

Water Resources Laboratory

P.O. Box 5200

FI - 02015 TKK

FINLAND

URL: http://www.water.tkk.fi/wr/index.html

Tel. +358-9-451 3821

Fax +358-9-451 3856

E-mail: [email protected]

© 2007 Olli Malve

ISBN 978-951-22-8749-9

ISBN 978-951-22-8750-5 (PDF)

ISSN 1795-2239

ISSN 1795-4584 (PDF)

URL: http://lib.tkk.fi/Diss/2007/isbn9789512287505/

TKK-DISS-2292

Vammalan kirjapaino Oy

Lempäälä 2007


ABSTRACT OF DOCTORAL DISSERTATION
HELSINKI UNIVERSITY OF TECHNOLOGY
P.O. Box 1000, FI-02015 TKK
http://www.tkk.fi

Author: Olli Malve
Name of the dissertation: Water quality prediction for river basin management
Manuscript submitted: January 17, 2007
Manuscript revised: March 23, 2007
Date of the defence: May 18, 2007
Form of the dissertation: Article dissertation (summary + original articles)
Department: Department of Civil and Environmental Engineering
Laboratory: Water Resources Laboratory
Field of research: Water Resources Engineering
Opponent(s): Prof. Kenneth Reckhow, Duke University, NC, USA
Supervisor: Prof. Pertti Vakkilainen

Abstract

Water quality prediction methods are developed which provide realistic estimates of prediction errors and accordingly increase the efficiency of river basin management and the implementation of EU’s Water Framework Directive. The resulting river basin management decisions are based on realistic safety margins for restoration measures and accompanying targeted pollutant load limits.

The realistic error estimates attached to the predictions are based on Bayesian statistical inference and MCMC methods which are able to synthesize two distinct water quality prediction approaches, i.e. mechanistic and statistical. What is more, a hierarchical modeling strategy is employed in order to pool information from extensive cross-sectional lake monitoring data and consequently to improve the accuracy and precision of lake specific water quality predictions.

Testing of the methods using extensive hydrological and water quality data from five real-world river basin management cases suggests that Bayesian inference and MCMC methods are no more difficult to implement than classical statistical methods. Even models with large numbers of correlated parameters can be fitted using modern computational methods. Moreover, the hierarchical modeling strategy proves to be efficient for river basin management. Guidelines for adaptive river basin management are also set up based on the experience gained. It is proposed that monitoring, prediction and decision making should be integrated into an efficient management procedure.

Keywords: river basin management, target pollutant load, Bayesian inference, MCMC, hierarchical model
ISBN (printed): 978-951-22-8749-9
ISBN (pdf): 978-951-22-8750-5
ISSN (printed): 1795-2239
ISSN (pdf): 1795-4584
Language: English
Number of pages: p. + app. p.
Publisher: Water Resources Laboratory
Print distribution: Water Resources Laboratory
The dissertation can be read at http://lib.tkk.fi/Diss/2007/isbn9789512287505


ABSTRACT OF DOCTORAL DISSERTATION (VÄITÖSKIRJAN TIIVISTELMÄ)
HELSINKI UNIVERSITY OF TECHNOLOGY (TEKNILLINEN KORKEAKOULU)
P.O. Box 1000, FI-02015 TKK
http://www.tkk.fi

Author: Olli Malve
Name of the dissertation: Vedenlaadun ennustaminen vesistöaluiden hoidon suunnittelussa (Water quality prediction in river basin management planning)
Manuscript submitted: January 17, 2007
Manuscript revised: March 23, 2007
Date of the defence: May 18, 2007
Form of the dissertation: Article dissertation (summary + original articles)
Department: Department of Civil and Environmental Engineering
Laboratory: Water Resources Laboratory
Field of research: Water Resources Engineering
Opponent(s): Prof. Kenneth Reckhow, Duke University, NC, USA
Supervisor: Prof. Pertti Vakkilainen

Abstract

This work develops and tests water quality prediction methods that give a realistic picture of prediction errors, help to avoid the mis-dimensioning of river basin management measures and thereby make the implementation of the EU Water Framework Directive more efficient.

The computation is based on Bayesian inference and the MCMC method, which allow realistic estimation of the prediction error of a mechanistic water quality model. In addition, a hierarchical modelling strategy is applied to lake-specific water quality prediction on the basis of extensive Finnish lake monitoring data. The chosen strategy reduces prediction errors and improves the accuracy of the predictions.

The methods are tested in setting targets for management measures for Lake Lappajärvi, the River Kymi, Lake Tuusulanjärvi, Lake Pyhäjärvi in Säkylä and more than two thousand lakes in the monitoring network of the Finnish Environment Institute. The tests show that the computational implementation of Bayesian inference and the MCMC method is no more difficult than that of classical statistical methods. Even a water quality model containing a large number of correlated parameters can be fitted to the observational data. The hierarchical modelling strategy is likewise an effective tool for making water-body-specific water quality predictions and for planning river basin management. Finally, it is proposed that water monitoring, water quality prediction and water management be combined, by means of the computational methods developed in this work, into a continuously refined, adaptive management process.

Keywords: river basin management, target load, Bayesian inference, MCMC, hierarchical model
ISBN (printed): 978-951-22-8749-9
ISBN (pdf): 978-951-22-8750-5
ISSN (printed): 1795-2239
ISSN (pdf): 1795-4584
Language: English
Number of pages: p. + app. p.
Publisher: Water Resources Laboratory (Vesitalouden ja vesirakennuksen laboratorio)
Print distribution: Water Resources Laboratory (Vesitalouden ja vesirakennuksen laboratorio)
The dissertation can be read at http://lib.tkk.fi/Diss/2007/isbn9789512287505


Preface

The statistical inference methods adopted during the preparation of this thesis opened up new vistas of endeavour in scientific discovery and learning and revealed their usefulness for the identification and interpretation of constraints on river basin management and for the specification of limits for sustainable water and land use.

Professor Pertti Vakkilainen, the supervisor of this work and head of the Water Resources Laboratory at Helsinki University of Technology, and Olli Varis are warmly acknowledged for their guidance and encouragement.

The Finnish Environment Institute (SYKE) is thanked for the opportunity to carry out this study as a part of my work there and to test the methods in real-world river basin management projects and for access to the institute’s valuable databases. The supportive and stimulating research environment provided by Juha Kämäri, Seppo Rekolainen and Matti Verta is particularly acknowledged.

The writer’s co-authors and colleagues at SYKE and elsewhere are warmly thanked for their efforts, knowledge, skills and sharing of data during the progress of the work: Teija Kirkkala, Mauri Pekkarinen, Olli-Pekka Pietiläinen, Simo Salo, Jouko Sarvala, Matti Verta, Kristiina Vuorio and Jarmo Vääriskoski, who were particularly involved in the acquisition of the chemical and biological water quality data, and John Forsius, Heikki Haario, Timo Huttula, Marko Laine, Kari Lehtinen, Simo Salo and Song Qian, who were concerned with the computational implementation of water quality predictions.

The author was privileged to visit the Water Quality Laboratory of the Nicholas School of the Environment and Earth Sciences at Duke University in 2004 – 2005. It was a once in a lifetime opportunity to acquire new viewpoints on statistical decision making in river basin management. Professor Kenneth Reckhow, George Arhonditsis, Chi-Feng, Jacqui Franklin, Andrew Gronewold, Melissa Kenney, Conrad Lamon and Song Qian are warmly thanked for their friendship and new ideas. Conrad Lamon is particularly acknowledged for reviewing this summary.

Juhani Kettunen and Elja Arjas, the preliminary examiners of this thesis, are warmly thanked for their precise and constructive comments, which improved the summary to a decisive extent, as is Malcolm Hicks, who reviewed its English language.

The work was supported in part by the Centre for International Mobility, the EU FP5 research project BMW (Benchmark Models for the Water Framework Directive, Contract EVK1-CT-2001-00093), Maa- ja vesitekniikan tuki ry, the Academy of Finland’s research project MaDaMe (BIAS sub project; Development of Bayesian methods with applications in geophysical and environmental research), the EU FP6 research project REBECCA (Relationships between the ecological and chemical status of surface waters, Contract SSPI-CT-2003-502158) and the Sven Hallinin Foundation.

Finally, I’m overwhelmed with gratitude to my beloved wife, Anne, our dear daughters, Sara, Kaisla and Kirsi, and my parents, Heimo and Raili Malve, for their constant love, and to God who paved the way for this thesis.

Helsinki, April 2007

Olli Malve


Contents

Preface
List of Publications
Author’s contribution
List of Figures
List of Tables
1 Introduction
  1.1 Background
  1.2 Research problem
  1.3 Objectives
  1.4 Scope of the research
  1.5 Research methods
  1.6 Contribution
2 Observational data
  2.1 Lake Lappajärvi
  2.2 River Kymi
  2.3 Lake Tuusulanjärvi
  2.4 Lake Pyhäjärvi in Säkylä
  2.5 Finnish lakes
  2.6 Analysis of the case data
3 Objectives of river basin management
  3.1 General objectives
  3.2 Objectives in case studies
  3.3 Lake Lappajärvi
  3.4 River Kymi
  3.5 Lake Tuusulanjärvi
  3.6 Lake Pyhäjärvi
  3.7 Finnish lakes
4 Evaluation of prediction methods
  4.1 General objectives
  4.2 Classification of prediction methods
  4.3 Bayesian inference using MCMC methods
  4.4 Model validation
  4.5 Analysis of case predictions
    4.5.1 Lake Lappajärvi
    4.5.2 River Kymi
    4.5.3 Lake Tuusulanjärvi
    4.5.4 Lake Pyhäjärvi in Säkylä
    4.5.5 Finnish lakes
5 Attainment of prediction objectives
  5.1 Case-specific objectives
  5.2 Efficiency in river basin management
6 Discussion
  6.1 Significance of the developed prediction methods
  6.2 Benefits and limitations
7 Conclusions
  7.1 Main findings
  7.2 Water quality prediction, monitoring and river basin management
  7.3 Continuation of research
Bibliography
Glossary
Summary
Yhteenveto (summary in Finnish)


List of Publications

This thesis consists of an overview and of the following publications, which are referred to in the text by their Roman numerals.

I Malve, O., Huttula, T. and Lehtinen, K. 1991. Modelling of Eutrophication and Oxygen Depletion in the Lake Lappajärvi. In: Wrobel, L., Brebbia, C. (Eds.), Water Pollution: Modelling, Measuring and Prediction. Computational Mechanics Publications, pp. 111–124.

II Malve, O., Salo, S., Verta, M. and Forsius, J. 2003. Modelling the transport of PCDD/F compounds in a contaminated river and possible influence of restoration dredging on calculated fluxes. Environmental Science and Technology, Vol. 37(15), pp. 3413–3421. DOI: 10.1021/es0260723

III Malve, O., Laine, M. and Haario, H. 2005. Estimation of winter respiration rates and prediction of oxygen regime in a lake using Bayesian inference. Ecological Modelling, Vol. 182:2, pp. 183–197. DOI: 10.1016/j.ecolmodel.2004.07.020

IV Malve, O., Laine, M., Haario, H., Kirkkala, T. and Sarvala, J. 2006. Bayesian modelling of algae mass occurrences – using adaptive MCMC methods with a lake water quality model. In press, Environmental Modelling and Software. DOI: 10.1016/j.envsoft.2006.06.016

V Malve, O. and Qian, S. 2006. Estimating nutrients and chlorophyll a relationships in Finnish Lakes. Environmental Science & Technology 40 (24), pp. 7848–7853. DOI: 10.1021/es061359b


Author’s contribution

The individual articles are reprinted with the permission of the respective copyright holders as follows: paper I with the kind permission of Computational Mechanics Publications, papers II and V with the kind permission of ACS Publications, papers III and IV with the kind permission of Elsevier Science.

Table 1: Account of author contribution. Names are in alphabetic order. Abbreviations of authors: FJ (Forsius, J.), HH (Haario, H.), HT (Huttula, T.), KT (Kirkkala, T.), LM (Laine, M.), LK (Lehtinen, K.), MO (Malve, O.), SS (Salo, S.), SJ (Sarvala, J.), VM (Verta, M.) and QS (Qian, S.)

                                   Paper
Contribution                       I         II        III       IV           V
1. Data and mass balances          HT LK MO  MO SS VM  MO        KT LM MO SJ  MO
2. Currents and sediments          HT LK MO  MO SS VM  MO        -            -
3. Computational implementation    HT LK MO  FJ MO SS  HH LM MO  HH LM MO SJ  MO QS
4. Fitting and validation          HT LK MO  MO SS     HH LM MO  LM MO        MO QS
5. Water quality prediction        HT LK MO  MO SS     HH LM MO  LM MO        MO QS
6. Interpretation of results       HT LK MO  MO SS VM  HH LM MO  HH LM MO SJ  MO QS


List of Figures

1.1 Research problems
1.2 General objectives of river basin management and the methods developed in this study
2.1 Map of Lake Lappajärvi
2.2 Map of the River Kymi
2.3 Map of Lake Tuusulanjärvi
2.4 Map of Lake Pyhäjärvi
3.1 General objectives of water quality management in river basin planning
3.2 Inference of target pollutant loading using predicted water quality response and selected water quality standard
3.3 Management objectives, actions and water quality standard for Lake Lappajärvi
3.4 Management objectives, actions and water quality standards for the River Kymi
3.5 Management objectives, actions and water quality standard for Lake Tuusulanjärvi
3.6 Management objectives, actions and water quality standards for Lake Pyhäjärvi in Säkylä
3.7 Management objectives, actions and water quality standards in the Finnish lakes
4.1 Classifiers of prediction methods
4.2 Elements of Bayesian posterior predictive inference in target pollutant load estimation
4.3 Decision variables, prediction methods and predictions for water quality management in Lake Lappajärvi
4.4 Observed and simulated chlorophyll a in Lake Lappajärvi
4.5 Calculated chlorophyll a concentration in Lake Lappajärvi with four loading scenarios
4.6 Decision variables, prediction methods and predictions for water quality management in the River Kymi
4.7 Verification of 1-D transport model for the River Kymi
4.8 Calculated PCDD/F concentration in the suspended solids in the River Kymi
4.9 Decision variables, prediction methods and predictions for water quality management in Lake Tuusulanjärvi
4.10 Observed and simulated oxygen concentrations in Lake Tuusulanjärvi
4.11 Estimated respiration in Lake Tuusulanjärvi
4.12 Predicting dissolved oxygen during new winter in Lake Tuusulanjärvi
4.13 Predicting dissolved oxygen concentration in Lake Tuusulanjärvi
4.14 Decision variables, prediction methods and predictions for water quality management in Lake Pyhäjärvi in Säkylä
4.15 Observed and modeled algae concentrations in Lake Pyhäjärvi
4.16 Validation of the phytoplankton model
4.17 Observed control variable values in Lake Pyhäjärvi for calibration (1992-1999) and validation (2000-2004) of the phytoplankton model
4.18 Calibration (1992-1999) and validation (2000-2004) of two optional Cyanobacteria models
4.19 Probability response surface of the summer mean Cyanobacteria in Lake Pyhäjärvi
4.20 Control variable limits on exceeding a summer mean Cyanobacteria concentration of 0.86 mg l⁻¹ in Lake Pyhäjärvi with 0.05 probability
4.21 Observed and fitted nutrient concentrations and loads in Lake Pyhäjärvi in 1980-2001
4.22 Estimated total phosphorus and summer maximum Cyanobacteria biomass percentiles (10% – 90%) as a function of total phosphorus load and summer maximum grazing zooplankton biomass in Lake Pyhäjärvi
4.23 Impact diagram for management decisions in Lake Pyhäjärvi
4.24 Decision variables, prediction methods and predictions for water quality management in Finnish lakes
4.25 Fit plot of hierarchical and non-hierarchical models of Finnish lakes
4.26 Probabilistic chlorophyll a response surface
4.27 Predicted chlorophyll a in Lake Päijänne as a function of total phosphorus and total nitrogen


List of Tables

1 Account of author contribution
1.1 Contribution of the papers to water quality prediction and river basin management
2.1 Hydrology and morphology of Lake Tuusulanjärvi
2.2 Characteristics of the catchment of Lake Pyhäjärvi
2.3 Characteristics of Lake Pyhäjärvi
2.4 Preliminary geomorphological typology of Finnish lakes
2.5 Number of observations within the lake types of Finnish lakes
2.6 Analysis of case study data
4.1 Criteria for predictions in river basin planning
5.1 Attainment of case-specific prediction objectives
5.2 General efficiency of predictions


1 Introduction

1.1 Background

The availability of clean water and the good ecological status of surface waters have been endangered by increasing loads of nutrients and chemicals. In 2000–2003 water quality was satisfactory or worse in 20–27 per cent of the Finnish lake, coastal and sea area and in about twice that proportion (50 %) of the river area. To improve and protect water quality in watercourses, the Finnish government passed the Water Act on 19 May 1961 and legislation for the assessment of environmental impacts on 10 June 1994. The new set of national Water Protection Policy Outlines extending to 2015 was approved on 23 November 2006. Under these outlines, diffuse nutrient loading from agriculture should be reduced by one third by 2015, and loading from fish farming and waste water treatment plants must be further reduced. In particular, nitrogen removal from municipal waste water must be improved to 70 % in densely populated areas with more than 10,000 inhabitants.

The general goals of the Water Framework Directive, introduced by the European Union on 22 December 2000, are to achieve a "good status" in all water bodies by 2015 and to protect the aquatic ecology, unique and valuable habitats, drinking water resources and bathing water at reasonable cost. Planning and implementation of water management will be organized on a river basin basis in order to ensure that local factors and the need for water protection measures are taken into account efficiently. Ecological and chemical protection is required everywhere, but other forms of protection will apply only within specific zones.

An act on the organization of river basin management planning was adopted in Finland in 2004, and the drafting of plans was started by the regional working groups. The plans will be complete by 2009 and will be updated every six years. For these purposes, Finland has been divided into five river basin districts, two international river basin districts (the Tornio River and the Teno – Näätämö – Paatsjoki district) and a separate district covering the autonomous province of the Åland Islands.

The targeting of the required pollutant load reductions and the finding of technical solutions for their implementation are the challenging key ingredients of river basin planning, and all our existing science, technology, mathematics and practical experience in this field will be needed to achieve compliance with the water quality standards with regard to chemical substances and ecological status. Hydrological and biogeochemical cycling, in particular, and the resource conditions for the assembly of the plankton community must be considered comprehensively. Until quite recently the foundation of ecology was empirical rather than theoretical, with approaches ranging from deterministic to stochastic, and hence there is no comprehensive biological foundation, analogous to Newtonian mechanics or hydrodynamics, that can be employed for the control of eutrophication and pollution in lakes and rivers. In addition, the determination, calibration and validation of prediction models are hampered by the overwhelming number of factors affecting the composition and activity of plankton assemblages and by the limited experimental and observational resources available. Hence the translation of scientific theories, specific observations on river basins and mathematical approaches into forms which are useful for river basin planning is difficult.

Prediction models are nevertheless considered useful for river basin management and are used to predict the behaviour of water quality with respect to changes in pollutant loads and hydrological conditions. They are therefore used to evaluate target pollutant loads and management actions which will achieve compliance with water quality standards. The target pollutant loads are then used to set up regulatory rules and to plan waste water treatment plants, agricultural practices and general land use.


Simple empirical water quality models are based on statistical methods, which makes

quantitative learning and prediction efficient (Manly, 2001; Berthouex and Brown,

2002). Early attempts were made in the 1970’sto estimate statistical relationships in

data from a large sample of lakes (Vollenweider, 1976; Vollenweider and Kerkes, 1980;

Reckhow and Chapra, 1983), and more complex mechanistic models (Jørgensen,

1980; Chapra and Reckhow, 1983; Orlob, 1983; Chapra, 1997) were structured in

the 1970’s according to the causal understanding and mathematical descriptions

of processes prevailing at that time, sometimes accompanied by least-squares pa-

rameter estimates, approximate first order error analysis, Monte Carlo analysis or

Kalman filtering (Scavia, 1980). The error term in a model was usually neglected

in the context of prediction (NRC, 2001). The lack of proper error estimates was

compensated for by a comprehensive mathematical description of the process. Thus,

the development of mechanistic models for water quality and hydrodynamics was

seen to be interrelated (Streeter, 1958; Chow, 1959; Graf, 1971; Cunge et al., 1980;

Dyer, 1986; van Rijn, 1989). Water quality management in Finland has often been

supported by a combination of empirical and mechanistic models (Kinnunen et al.,

1982; Frisk, 1989; Sarkkula, 1991; Varis, 1991; Kettunen, 1993; Nyroos, 1994; Hut-

tula, 1994; Kokkonen, 1997; Rankinen, 2006).

1.2 Research problem

The EU Water Framework Directive urges member states to quantify numerically the

present and near-future maximum loads (i.e. target pollutant loads) to be permitted

for pollutants, from point and non-point sources and from background sources, so

that they will meet water quality standards with an adequate margin of safety

(MOS). Due to the probabilistic and random nature of water quality parameters,

a small MOS might result in non-attainment of the water quality goal, while a

large MOS can be inefficient and costly (NRC, 2001). Therefore the MOS should

account for the errors in the data and the model. Ideally, MOS represents the

joint probability of possible errors in the estimated target load, load estimates and

exceedance of the water quality standards (NRC, 2001). The problem is how

to estimate model parameters and error variances of predictions realistically and

determine how errors in these and in the inputs propagate through the model and

result in error in the estimated target pollutant load.
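The propagation problem described above can be illustrated with a minimal Monte Carlo sketch. Everything in it is hypothetical: the linear response c = k * load, the lognormal spread of k and the standard of 10 units are assumptions chosen only to show how parameter error turns into a distribution of target loads, from which a margin of safety can be read off.

```python
import numpy as np

def target_load_distribution(standard, k_median, k_log_sd, n_draws=20_000, seed=0):
    """Propagate parameter error through a toy model c = k * load.

    All inputs are illustrative assumptions, not values from the thesis.
    Returns draws of the load at which the toy model just meets `standard`.
    """
    rng = np.random.default_rng(seed)
    # Uncertain response coefficient k (concentration per unit load).
    k = k_median * np.exp(rng.normal(0.0, k_log_sd, size=n_draws))
    # Invert the model for each draw: load* = standard / k.
    return standard / k

loads = target_load_distribution(standard=10.0, k_median=40.0, k_log_sd=0.3)
best_guess = np.median(loads)         # load meeting the standard in half of the draws
cautious = np.quantile(loads, 0.10)   # load meeting it in about 90 % of the draws
margin_of_safety = best_guess - cautious
```

In this sketch the margin of safety is simply the gap between the median target load and the load that still complies in roughly nine draws out of ten; a wider parameter distribution enlarges that gap.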

Another problem is that approximate error estimation methods involving complex

mechanistic water quality models and small-sized water quality samples are likely to

result in unrealistic (Ascher and Overholt, 1983; NRC, 2001) and overly optimistic

error estimates (Omlin and Reichert, 1999). This in turn will bias the MOS of

the target pollutant loads and reduce the efficiency of river basin management (Figure 1.1).

Figure 1.1: Outline of the problems involved in water quality prediction and their implications for river basin planning and management.

The coding, debugging, fitting and validation of complex mechanistic water quality models are difficult due to the long simulation time and the large number of unknown

parameters, making water quality prediction and river basin management less effi-

cient and more costly than is necessary for effective decision making (NRC, 2001).

Moreover, twenty per cent of the Finnish lake area is in a satisfactory or worse

condition, which means that the number of lakes requiring pollutant load control

amounts to hundreds and the updating of river basin plans every six years using

complex water quality models will not be efficient or even feasible.

1.3 Objectives

The general objective of this thesis was to make the six-yearly updating of water quality predictions, the accompanying error estimates and river basin plans as efficient and realistic as possible (Figure 1.2). The means chosen for this were Bayesian inference, Markov chain Monte Carlo (MCMC) methods, hierarchical models (HM) and model simplification. Bayesian inference and MCMC methods were to be used

for synthesizing mechanistic modelling and statistical inference and facilitating re-

alistic error estimation and the efficient updating of predictions in the light of the

continuously accumulating monitoring data. A hierarchical modelling strategy (Gelman and Hill, 2006) was used to improve the accuracy and precision of the lake-specific predictions.

The practical objectives were computational implementation of the methods, the

derivation of relevant water quality data, application of these methods to real-world

river basin management cases and the setting up of guidelines for applications to

river basin management.

Figure 1.2: General objectives of river basin management and the methods developed in this study.

1.4 Scope of the research

Complex mechanistic lake and river models were first developed to facilitate the prediction of water quality in connection with river basin management. A full statistical error analysis of water quality sub models was then accomplished using Bayesian inference and MCMC methods. The error analysis of some extremely complicated models was postponed because the convergence of MCMC sampling algorithms is slow if a model includes a large number of correlated parameters. Instead,

the moderately simple lake respiration and phytoplankton sub models were analysed

initially. This clearly revealed the advantages and limitations of Bayesian inference

and MCMC methods and motivated the use of adaptive sampling algorithms and

simple linear models. Prior distributions, if informative, were obtained from the

scientific literature or from experimental and observational data. Expert elicitation

techniques were not used here.

To make the water quality prediction and management in large river basins with

small observational sample sizes more tractable, hierarchical models were also ap-

plied. These were based on causal relationships among a small number of descriptors.

The mechanistic water quality models constructed for Lake Lappajarvi and the River

Kymi linked river basin management measures directly to water quality responses,

whereas the water quality sub models which were fitted using Bayesian inference

and MCMC methods were limited in this sense. Later on, a model of an entire water body was fitted using MCMC methods.

1.5 Research methods

The mechanistic lake models used here described vertical mixing, temperature strati-

fication, respiration, sedimentation, leaching of nutrients and phytoplankton growth.

The river model calculated the longitudinal dispersion of suspended solids and con-

taminated sediments. The models were formulated with partial and ordinary differ-

ential equations and integrated by numerical methods.

A hierarchical linear regression model (HLRM) (Gelman and Hill, 2006) was used

to predict chlorophyll a in Finnish lakes. Hierarchical linear modelling (HLM), also

known as multi-level analysis, is a more advanced form of multiple linear regression.

Multilevel analysis allows the variance in outcome variables to be analysed at multi-

ple hierarchical levels, whereas in multiple linear regression all effects are modelled

as occurring at a single level. Thus HLM is appropriate for use with nested data.

In river basin management, data from lakes can be nested within lake types and

ecoregions.
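The pooling idea behind a hierarchical model can be sketched for the simplest case, a two-level normal model for lake means within one lake type. The within-lake variance `sigma2`, the between-lake variance `tau2` and the three lake means below are invented for illustration, and both variances are treated as known, which a full Bayesian fit would not do.

```python
import numpy as np

def partial_pool(lake_means, n_obs, sigma2, tau2):
    """Shrink each lake's sample mean towards the lake-type mean.

    Assumes a two-level normal model with known within-lake variance
    `sigma2` and between-lake variance `tau2` (both hypothetical here).
    """
    lake_means = np.asarray(lake_means, dtype=float)
    n_obs = np.asarray(n_obs, dtype=float)
    # Precision-weighted estimate of the lake-type mean.
    w = 1.0 / (sigma2 / n_obs + tau2)
    type_mean = np.sum(w * lake_means) / np.sum(w)
    # The weight on a lake's own data grows with its sample size.
    shrink = (n_obs / sigma2) / (n_obs / sigma2 + 1.0 / tau2)
    return shrink * lake_means + (1.0 - shrink) * type_mean, type_mean

# Invented log-chlorophyll means for three lakes of one type.
pooled, type_mean = partial_pool([2.0, 3.0, 4.0], n_obs=[1, 8, 100],
                                 sigma2=0.5, tau2=0.25)
```

A lake with a single observation is pulled strongly towards the type mean, whereas a lake with a hundred observations keeps an estimate close to its own sample mean. This is the sense in which nested data allow sparsely sampled lakes to borrow information from similar lakes.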

The errors in the predictions of the mechanistic and hierarchical models ($f$ in equation 1.1) are related to errors in the measurement of the $x$ variables ($\phi^2$), in the model ($\sigma^2$) and in the model parameters ($\sigma^2_\theta$) (Box and Tiao, 1973; Clark, 2006). These were estimated using Bayesian inference and MCMC methods, which facilitate statistical learning and the updating of water quality predictions and river basin plans.

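\begin{equation}
\begin{aligned}
y_j &= f(x_j; \theta) + \varepsilon_j \\
\theta &\sim p(\mu_\theta, \sigma^2_\theta) && \text{(error in parameters)} \\
\varepsilon_j &\sim N(0, \sigma^2) && \text{(model error)} \\
x_j^{(\mathrm{obs})} &\sim p(x_j, \phi^2) && \text{(error in } x\text{)}
\end{aligned}
\tag{1.1}
\end{equation}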
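The three error sources in equation (1.1) can be made tangible by simulating from them. The sketch below assumes a hypothetical linear model f(x; θ) = θ0 + θ1 x with normally distributed parameters and inputs; none of the numbers come from the thesis, and the real models are nonlinear.

```python
import numpy as np

def simulate(n=20_000, sd_theta=(0.1, 0.2), sigma=0.5, phi=0.3, seed=1):
    """Draw predictions y = f(x; theta) + eps under equation (1.1).

    f is a hypothetical linear model theta0 + theta1 * x, used only
    to show how the three error sources add up in a prediction.
    """
    rng = np.random.default_rng(seed)
    theta0 = rng.normal(1.0, sd_theta[0], n)   # error in parameters
    theta1 = rng.normal(2.0, sd_theta[1], n)
    x = rng.normal(3.0, phi, n)                # error in x around an observed 3.0
    eps = rng.normal(0.0, sigma, n)            # model error
    return theta0 + theta1 * x + eps

total_var = simulate().var()
var_without_x_error = simulate(phi=0.0).var()
```

Comparing the two variances shows how much of the predictive spread is due to uncertainty in the input alone; switching off `sigma` or `sd_theta` in the same way isolates the other two contributions.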

1.6 Contribution

Bayesian inference methods and Markov chain Monte Carlo (MCMC) methods were

used here to change the paradigm of water quality prediction and river basin man-

agement decision making from deterministic to statistical. Mechanistic river and

lake models for the evaluation of target phosphorus loading and restoration dredg-

ing were developed and applied in papers I and II (Table 1.1), and the best features

of the mechanistic and statistical prediction methods, i.e. the deterministic simu-

lation and the full statistical error analysis, were synthesized in papers III and IV.

This enabled the mechanistic water quality predictions to be better accommodated

into river basin management. The convergence of the MCMC chains, which is slow when the parameters are markedly correlated, was accelerated by means of adaptive Metropolis-Hastings methods. The accuracy and precision of the lake-specific chlorophyll a predictions based on extensive cross-sectional monitoring data from Finnish lakes were

enhanced using a hierarchical linear regression model.
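To indicate why adaptation helps with correlated parameters, the following is a minimal sketch of one common adaptive Metropolis variant, in which the proposal covariance is periodically re-estimated from the chain history. It is not claimed to be the exact algorithm used in papers III to V; the two-parameter Gaussian target, the adaptation schedule and the scaling constant are assumptions made for the example.

```python
import numpy as np

def adaptive_metropolis(log_post, start, n_iter=6000, adapt_start=500, seed=0):
    """Random-walk Metropolis whose proposal covariance follows the chain."""
    rng = np.random.default_rng(seed)
    d = len(start)
    chain = np.empty((n_iter, d))
    current = np.asarray(start, dtype=float)
    current_lp = log_post(current)
    cov = 0.1 * np.eye(d)                  # initial, uncorrelated proposal
    accepted = 0
    for i in range(n_iter):
        if i >= adapt_start and i % 100 == 0:
            # Re-tune the proposal to the covariance of the chain so far.
            cov = (2.38**2 / d) * np.cov(chain[:i].T) + 1e-8 * np.eye(d)
        proposal = rng.multivariate_normal(current, cov)
        proposal_lp = log_post(proposal)
        # Metropolis acceptance step (the proposal is symmetric).
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
            accepted += 1
        chain[i] = current
    return chain, accepted / n_iter

# Toy target: two strongly correlated parameters (correlation 0.95).
target_prec = np.linalg.inv(np.array([[1.0, 0.95], [0.95, 1.0]]))

def log_post(theta):
    return -0.5 * theta @ target_prec @ theta

chain, acc_rate = adaptive_metropolis(log_post, start=[0.0, 0.0])
sample_corr = np.corrcoef(chain[2000:].T)[0, 1]
```

Once the proposal has learned the orientation of the posterior, steps are taken along the ridge of high probability rather than across it, which is what shortens the burn-in when parameters are strongly correlated.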

The main results of the five original papers listed at the beginning of this publication

will be summarized below. The papers are referred to in the text by their Roman

numerals. First, the case data and the objectives of river basin management cases

will be analysed, and then the selected water quality prediction methods and their

capabilities for meeting the objectives of river basin management will be evaluated.

Finally, guidelines for water quality prediction in adaptive river basin management

will be proposed.

Table 1.1: Contribution of the papers to water quality prediction and river basin management.

Mechanistic modelling --> Bayesian inference

Paper I:
- Probe lake model: prediction of vertical convection and diffusion of heat, and of the dynamics of dissolved oxygen, total phosphorus and chlorophyll a
- Assessment of target phosphorus load

Paper II:
- One-dimensional sediment model: prediction of longitudinal transport of contaminated sediments
- Setting up of a criterion for restoration dredging

Paper III:
- Respiration model: prediction of the dissolved oxygen regime in a lake
- MCMC method: unbiased error estimates, pooling of cross-sectional information
- Design and real-time control of oxygenation devices

Paper IV:
- Lake phytoplankton model: prediction of algal blooms
- MCMC method: same as in Paper III
- Target nutrient and zooplankton biomass concentrations

Paper V:
- MCMC method: same as in Paper III
- Hierarchical linear chlorophyll a model: nutrient to chlorophyll a relationship, enhanced pooling of cross-sectional information
- Target nutrient concentrations

2 Observational data

The observational data used here originated from three intensively studied Finnish

lakes (I, III and IV), a river (II) and 2,289 sparsely monitored Finnish lakes (V).

Lake Lappajarvi (I), Lake Tuusulanjarvi (III) and Lake Pyhajarvi in Sakyla (IV)

are locally important for fishing and recreation, but their use is hindered by eutrophication, which also impairs their ecological status. This led the Finnish Environment Institute (SYKE), the regional environment centres, the universities, local

authorities, private enterprises and water protection associations to contribute to

the sampling and management of these lakes. The Southeast Finland Environment

Centre and SYKE had sampled the sediments and water of the River Kymi and

planned restoration dredging of contaminated sediments. The water and sediment

samples representing the lakes and the river had not been randomized, except for

the zooplankton sample from Lake Pyhajarvi, which was randomized according to a

stratified design. In general, the samples were concentrated spatially in the middle

of the lake or of the river cross-section, and the water samples from the River Kymi

were from points both upstream and downstream of the area of main interest. The

sampling time was confined to the open water period, except for the sampling of

dissolved oxygen in Lake Tuusulanjarvi, which took place when the lake was covered

by ice.

2.1 Lake Lappajarvi

Lake Lappajarvi is a shallow lake in the western part of Finland (Figure 2.1) that

is agriculturally loaded, mesotrophic and has occasional algal blooms. The bottom

sediment at the two main deeps (1 km²) becomes anoxic during the summer and winter stratification periods, and phosphorus is then released at a rate of 5 mg m⁻² d⁻¹ into the water

body. The theoretical retention time is 2.8 years. Phosphorus is the main limiting

nutrient for phytoplankton growth. The phosphorus loading to the lake is 0.38 g P m⁻² a⁻¹ and the sedimentation coefficient R is about 0.8. The mean phosphorus concentration in the lake is 23.8 µg l⁻¹ and the mean fresh biomass concentration of planktonic algae is 2.7 mg l⁻¹.

The lake water level, outflow, vertical temperature profile, currents, ice cover and

snow cover were observed in the years 1987–1989. Daily meteorological data were

collected at Kauhava Airport 30 km west of the lake.

Water quality in five inflows and in the flow between the two sub-basins of the lake was investigated intensively (2–12 times a month) from 1 May 1988 to 30 April 1989, together with sedimentation rate experiments and flow measurements (Figure 2.1).

2.2 River Kymi

The River Kymi is the fourth largest river in Finland. It has been polluted by

effluents from pulp mills and the chemicals industry and through some tributaries

and diffuse non-point sources. Loading has been reduced considerably, but the

remains of past emissions still exist in the river sediments. The area studied here

is a 130-km stretch of the river with branches between Lake Pyhajarvi (in Jaala)

and the Gulf of Finland (Figure 2.2). There are 11 power plants and 6 stretches of rapids on this reach of the river. The upper end of the river stretch is 50 m above sea level and the mean slope of the river bed is small (0.0006). The drainage area of the River Kymi is 37 200 km² (lake percentage 18%), with only 3% (1 100 km²) draining directly into the stretch studied. Thus 97% of the water in this stretch of the river comes from upstream sources. The mean discharge at the downstream end of the river is 330 m³ s⁻¹.
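The catchment figures quoted above can be cross-checked with a few lines of arithmetic. The sketch uses only the drainage areas and mean discharge given in the text; the derived runoff depth and specific discharge are not stated in the source and assume that the mean discharge applies to the whole 37 200 km² drainage area.

```python
total_area_km2 = 37_200.0      # drainage area of the River Kymi
local_area_km2 = 1_100.0       # area draining directly into the stretch
mean_discharge_m3s = 330.0     # mean discharge at the downstream end

# Share of the drainage area inside and upstream of the stretch.
local_share = local_area_km2 / total_area_km2
upstream_share = 1.0 - local_share

# Derived quantities (assumption: discharge is generated by the whole area).
seconds_per_year = 365.25 * 24 * 3600
runoff_depth_mm = (mean_discharge_m3s * seconds_per_year
                   / (total_area_km2 * 1e6) * 1000.0)
specific_discharge = mean_discharge_m3s * 1000.0 / total_area_km2  # l s^-1 km^-2
```

The shares reproduce the 3% and 97% quoted in the text.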

Figure 2.1: Map of Lake Lappajarvi, showing the observation points (water samples, current meter and thermistor chain).

The river bed in this area consists mainly of transport and erosion sites, which contain non-cohesive soil or solid clay and silt. At wider points in the river there are

sedimentation pools, which are the main traps for PCDD/F compounds. Contam-

inated organic particulate materials accumulated earlier in the main sedimentation

pool at Kuusankoski, and this sediment is nowadays decomposing slowly, eroding

and migrating downstream. The transported sediment with the highest settling velocity has accumulated in the downstream sedimentation pools, whereas the smaller particles have migrated to the estuarine and marine areas. Due to hydrological regulation at the power plants, the sediments have not been exposed to high

floods and discharges have not increased. Construction projects and changes in river

regulation imply a risk of the mobilization of PCDD/F compounds in the future.

Figure 2.2: Map of the River Kymi, showing the sediment samples, water samples, sediment traps, industrial plants and contaminated mud sediment between Kuusankoski and the Gulf of Finland. The inset of the Kuusankoski area shows the vertically averaged flow direction and velocity and the 0.2 m s⁻¹ isopleth of flow velocity.

The hydrological and hydraulic data (water level and discharge) needed to perform

the sediment transport calculations were observed at least daily or even more fre-

quently at the power plants and stretches of rapids. Suspended solids concentrations

upstream and along the relevant stretch of the river were observed frequently enough

for calibration. The observations of direct runoff and corresponding concentration

of suspended solids do not cover the whole catchment, however. Monthly values for

suspended solids from industrial effluent point loading were collected from sewage

treatment plants, and non-point loading was estimated from the continuous runoff

data and weekly water quality samples from two small representative catchments (30 and 178 km²) in the drainage area (1 100 km²). The transported sediment

was sampled with sediment traps at six locations and PCDD/F concentrations in

the sediment were analysed. No direct measurements of historical PCDD/F loading

from Kuusankoski are available, but the amount of historical PCDD/F loading from

the Ky-5 plant and from eroding sediments in Kuusankoski and its variation were

estimated from a bottom sediment sample originating from the bay of Ahvenkoskenlahti at the mouth of the river (Figure 2.2).

2.3 Lake Tuusulanjarvi

Lake Tuusulanjarvi is a shallow, hypereutrophic lake located just north of Helsinki in southern Finland (lat. 60°26′ N, long. 25°03′ E; Fig. 2.3). Having previously been

mesotrophic, it became hypereutrophic in the 1960s due to sewage discharge. The

winter dissolved oxygen regime was in a critical condition in the early 1970s, but

improved slightly in 1973, when winter aeration was introduced. The situation was

further improved by reductions in nutrient loading. Sewage discharge was diverted in

1979 and summer aeration started in 1980. The hypereutrophic condition remained,

however, and blooms of blue-green algae have occurred every summer since the

loading reduction (50% in phosphorus loading) in 1979. The phosphorus load from agriculture (4500 kg a⁻¹ = 0.75 g m⁻² a⁻¹) still exceeds the lake’s tolerance level,

which is why a reduction in the phosphorus content of the water body by intervening

in both external and internal phosphorus loading has been required.
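The conversion from the annual agricultural load to an areal load can be reproduced from the lake characteristics in Table 2.1. The sketch below uses only values given in this chapter; the mean through-flow is a derived figure that the source does not state.

```python
load_kg_per_year = 4500.0      # agricultural phosphorus load
surface_area_km2 = 6.0         # Table 2.1
mean_depth_m = 3.2             # Table 2.1
volume_m3 = 19e6               # Table 2.1
residence_time_d = 250.0       # Table 2.1

# Areal phosphorus load in g m^-2 a^-1.
areal_load = load_kg_per_year * 1000.0 / (surface_area_km2 * 1e6)

# Consistency check: surface area times mean depth should be close to the volume.
volume_from_depth_m3 = surface_area_km2 * 1e6 * mean_depth_m

# Mean through-flow implied by volume and residence time, in m^3 s^-1.
mean_throughflow_m3s = volume_m3 / (residence_time_d * 86_400.0)
```

The areal load comes out at the 0.75 g m⁻² a⁻¹ quoted in the text, and the product of area and mean depth agrees with the tabulated volume to within rounding.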

The lake water was sampled at two-metre vertical intervals at the deepest point in the lake (max. depth 10 m) by the Uusimaa Regional Environment Centre and

the local water protection board (Keski-Uudenmaan vesiensuojelun kuntayhtyma)

during the period 1968–2003. Samples were collected 2–7 times each winter for the

analysis of dissolved oxygen concentration and temperature by standard methods.

Vertical averages and standard deviations were calculated. Dissolved oxygen concentrations were also measured in situ at nine stations in March 2001 to determine the area of aerator impact.

Figure 2.3: Map of Lake Tuusulanjarvi.

Winter net oxygen consumption in the lake in the early 1970s was estimated to be

200 000 kg on average. The flux of pumped dissolved oxygen as estimated by the aerator consultants (100 t, i.e. 100 000 kg, on average) shows a high yearly variation due to

technical problems and fluctuations in the duration of the ice-cover. This leaves

a significant uncertainty concerning the estimated dissolved oxygen fluxes, which

affects the lake respiration estimates. The value describing the prior distribution

was calculated from information available in technical reports.

Table 2.1: Hydrology and morphology of Lake Tuusulanjarvi (Anonymous, 1984).

Surface area                                6.0 km²
Volume                                      19 × 10⁶ m³
Maximum depth                               10 m
Average depth                               3.2 m
Length (max)                                7.5 km
Theoretical water residence time            250 d
Area of drainage basin                      92 km²
Percentage of lakes in the drainage basin   8.4%

Table 2.2: Characteristics of the catchment of Lake Pyhajarvi.

Total area (inclusive of lake’s surface)   615 km²
River Ylaneenjoki                          234 km²
River Pyhajoki                             77.5 km²
Remaining area (small sub-basins)          149.5 km²

2.4 Lake Pyhajarvi in Sakyla

Lake Pyhajarvi is a shallow, mesotrophic, agriculturally loaded lake (Fig. 2.4) in

which algal blooms increased in the early 1990s. All the major cyanobacterial

blooms in 1992–1999 were dominated by Anabaena flos-aquae (Lyngb.) Breb.,

while Anabaena planctonica Brunnt., Anabaena curva Hill, Cyanodictyon reticula-

tum (Lemm.) Geitl., and Aphanothece clathrata W. & G.S. West became dominant

in 1999.

Figure 2.4: Map of Lake Pyhajarvi.

Monitoring of the water chemistry and hydrology of Lake Pyhajarvi started in the 1960s, and intensified monitoring of nutrient concentrations was started by the Water Protection Association of SW Finland in 1980 and continued from 1993 onwards

by the Southwest Finland Regional Environment Centre. Vertical profiles were

taken at the deepest point in the lake 6–8 times during the open water period in

1980–1991 and at two-week intervals in recent years. Due to the openness and shal-

lowness of the lake, there is no extended stratification during the summer. Nutrient

and plankton concentrations are vertically and horizontally homogeneous most of

the time (Sarvala and Jumppanen, 1988). Phytoplankton was sampled together

with nutrients and counted at the Department of Biology, University of Turku (Sar-

vala et al., 2000). Zooplankton was sampled at approximately weekly intervals from

the surface to the bottom at ten locations selected with a stratified random design (Sarvala et al., 2000). Crustacean zooplankton was enumerated at the Department of Biology, University of Turku. Eight years of observations collected between 1992 and 2000 were used for this study. The data set contains the biomass concentrations of Diatomophyceae, Chrysophyceae, nitrogen-fixing Cyanobacteria and minor groups of phytoplankton summed together, total phosphorus concentration (TP), total nitrogen concentration (TN), water temperature (T), global irradiance (I), the biomass concentration of grazing zooplankton (Z) and outflow rate (Q).

Table 2.3: Characteristics of Lake Pyhajarvi.

Surface area            155 km²
Volume                  849 million m³
Mean depth              5.4 m
Maximum depth           26 m
Coastline               110 km
Water residence time    3–5 years

Sarvala et al. (1998) have shown that year-to-year variations in chlorophyll a and

phosphorus concentrations in Lake Pyhajarvi are associated with changes in the

total biomass of planktivorous fish, good fish stocks being accompanied by depressed

zooplankton biomass and high chlorophyll a levels. One-third of the total variation

in chlorophyll a is attributed to changes in zooplankton biomass and another third

to the changes in phosphorus concentrations.

2.5 Finnish lakes

National water quality monitoring in Finnish lakes started in 1965, after the passing

of the Water Act in 1962, when information was required on the status, quality and

quantity of Finnish water resources, and how their status relates to and responds to

pressures on the environment. The sampling strategy and analytical methods have

been described by Niemi et al. (2001). A geomorphological typology of Finnish lakes

is under construction to aid in the classification of their ecological status. According to a preliminary typology, they may be divided into nine types according to their surface area, depth and water colour (Table 2.4).

Table 2.4: Preliminary geomorphological typology of Finnish lakes as specified by the Finnish Environment Institute (SA = surface area, D = depth).

Lake type  Name                                Characteristics
I          Large, non-humic lakes              SA > 4,000 ha, colour < 30
II         Large, humic lakes                  SA > 4,000 ha, colour > 30
III        Medium and small, non-humic lakes   SA 50–4,000 ha, colour < 30
IV         Medium, humic, deep lakes           SA 500–4,000 ha, colour 30–90, D > 3 m
V          Small, humic, deep lakes            SA 50–500 ha, colour 30–90, D > 3 m
VI         Deep, highly humic lakes            Colour > 90, D > 3 m
VII        Shallow, non-humic lakes            Colour < 30, D < 3 m
VIII       Shallow, humic lakes                Colour 30–90, D < 3 m
IX         Shallow, highly humic lakes         Colour > 90, D < 3 m

A total of 19,248 observations of total phosphorus, total nitrogen and chlorophyll a (Chla) from 2,289 Finnish lakes in July and August from 1988 to 2004 were used in this study. About 42% of the observations were from July and 58% from August. The observations were unevenly distributed between the years, lake types (Table 2.5) and individual lakes. Of the 2,289 lakes, 900 were represented by only one observation, but the average number of observations was eight (s.d. 26) per lake. One lake had 441 observations, and there were 12 lakes that had more than 150.

Table 2.5: Number of observations (N) within the lake types.

Type   N        Type   N        Type   N
1      485      4      3,949    7      391
2      6,536    5      1,080    8      2,729
3      388      6      1,326    9      2,544

2.6 Analysis of the case data

The observational data were used to establish a basis for river basin water quality

prediction and management. Ideally, a water body should be sampled according to

statistical design methods in order to minimize the error variances in the model,

but in the present case the data were collected according to intuitively selected rules

and the model for prediction and decision making was selected later.

Certain important features of the data were analysed retrospectively (Table 2.6) to reveal the adequacy of the data set for water quality prediction and

river basin management. One of the most important features in this respect was

the sample size because a small sample size may reduce the precision of a predic-

tion, and thus the overall efficiency of river basin management (Figure 1.1). The

case studies involved extensive sample sizes. The predictions were also affected,

however, by the orientation of the sampling design. The data for the case stud-

ies were mainly longitudinal, except for the monitoring data on the Finnish lakes,

which were abundant in a cross-sectional direction, i.e. covering numerous lakes.

On the other hand, the majority of the lakes were observed only a few times, so

that the lake-specific samples were small and unbalanced, reducing the precision of

the lake-specific predictions. A small sediment respiration experiment conducted in

Lake Tuusulanjarvi and the parameter ranges obtained from the scientific literature

for the River Kymi, Tuusulanjarvi and Pyhajarvi models can be regarded as small

extensions in a cross-sectional direction.
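The link between sample size and precision noted above can be made concrete with the standard error of a mean. As a simplifying assumption that is not made explicitly in the text, suppose the $n$ observations from a lake are independent with a common standard deviation $\sigma$. Taking $n = 1$, $8$ and $441$, i.e. the minimum, average and maximum numbers of observations per lake reported for the Finnish lake data, gives:

```latex
\mathrm{SE}(\bar{y}) = \frac{\sigma}{\sqrt{n}}, \qquad
n = 1:\ \sigma, \qquad
n = 8:\ \frac{\sigma}{\sqrt{8}} \approx 0.35\,\sigma, \qquad
n = 441:\ \frac{\sigma}{21} \approx 0.048\,\sigma
```

Under this assumption a lake observed once has a lake-specific mean roughly twenty times less precise than the most intensively sampled lake, which is why predictions for sparsely sampled lakes benefit from information pooled across lakes.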

Table 2.6: Analysis of case study data.

Classifier                      Lappajarvi     Kymijoki       Tuusulanjarvi  Pyhajarvi      Finnish lakes
Sample size                     Extensive      Extensive      Extensive      Extensive      Extensive
Orientation of sampling design  Longitudinal   Longitudinal   Longitudinal   Longitudinal   Cross-sectional
Hierarchical structure          Single level   Single level   Single level   Single level   Hierarchical
Scientific discipline           Hydrology,     Hydrology,     Hydrology,     Hydrology,     Chemistry
                                chemistry,     chemistry,     chemistry,     chemistry,
                                biology        biology        biology        biology
Sampling design                 Intuitive      Intuitive      Intuitive      Intuitive      Intuitive
Treatment method                Observational  Observational  Observational  Observational  Observational

If data are hierarchically structured, i.e. they include multilevel or nested clusters

within which correlations occur, cross-sectional information can be pooled to make

longitudinal predictions more precise by means of hierarchical or multilevel models.

The Finnish lake data were hierarchically structured (with the levels: all lakes, type

of lake, lake), and this feature was utilized in the modelling phase. The rest of the data

did not have a hierarchical structure.

Sample sizes may vary notably between scientific disciplines, in that there is a common tendency for the numbers of biological observations to be smaller than those of chemical or hydrological observations. This was also the case in the present case studies.

The careful selection of sampling design and treatment method, which are of great

importance for water quality prediction, was left out of the data acquisition process

in the majority of the cases studied here. Sampling was not randomized, and exper-

imental design methods were not employed other than in the case of Lake Pyhajarvi

where zooplankton was sampled using a stratified random design.

3 Objectives of river basin management

3.1 General objectives

The general objectives of water quality management in river basin planning and

decision making are sustainable use and management of the waters and their good

ecological status (Figure 3.1). The first things to be decided are the water quality

standards and the acceptable probability of these being exceeded. Water quality standards include numerical threshold values separating attainment from

non-attainment of the management objectives with respect to the given variables.

The water quality predictions can then be used to infer target pollutant loads which

will achieve compliance with the water quality standards (Figure 3.2) and to generate

a set of feasible management actions in the planning phase with a view to the cost

and benefit analysis in the decision phase.

Figure 3.1: General objectives of water quality management in river basin planning.

Figure 3.2: Inference of target pollutant loading using predicted water quality response, i.e. pollutant or algal biomass concentration in a lake or a river. Water quality standards include numerical threshold values separating attainment from non-attainment of the management objectives with respect to pollutant concentration.

3.2 Objectives in case studies

3.3 Lake Lappajarvi

The objective of the management of Lake Lappajarvi was to limit chlorophyll a to below 10 µg l−1 (Figure 3.3). The target phosphorus load that complied with this criterion was selected from a number of phosphorus load scenarios and simulated chlorophyll a concentrations. The simulations were based on the hydrological, meteorological and phosphorus loading data for the one-year period April 1, 1988 - March 31, 1989. Responses were calculated at four loading levels:

1. Present level, 0.35 gP m−2 a−1
2. Fast obtainable reduction (14.8 % reduction), 0.30 gP m−2 a−1
3. Desirable level (32.7 % reduction), 0.23 gP m−2 a−1
4. Best available protection measures (44.9 % reduction), 0.19 gP m−2 a−1

Figure 3.3: Management objectives, actions and water quality standard for Lake Lappajarvi.

3.4 River Kymi

The objective of management in the case of the River Kymi was to dredge or permanently immobilize sediments which were contaminated by dioxin compounds at Kuusankoski (Figure 3.4). Dredging, if implemented, would have to be performed in such a way that the migration of dioxin was minimized. Canalization of the river, dredging of the most seriously contaminated sediments and a number of smaller construction projects on the river constituted notable risks of further pollutant migration. To assess these risks, the migration of contaminated sediments and the exposure of the river and its adjacent marine and human populations to PCDD/F compounds were predicted.

3.5 Lake Tuusulanjarvi

The objective of the management of Lake Tuusulanjarvi was to lower the trophic status of the lake, which is the primary reason for oxygen depletion in its water, fish deaths and the excessive internal phosphorus loading (Figure 3.5). The means chosen for this have been artificial oxygenation, reduction of external nutrient loads, dilution of the lake water with nutrient-poor water from a neighboring water body, and control over fishing. The effect of artificial oxygenation on the dissolved oxygen regime and the real-time control of oxygenation devices were studied here.

Figure 3.4: Management objectives, actions and water quality standards for the River Kymi.

3.6 Lake Pyhajarvi

The objectives of management in Lake Pyhajarvi were to improve its ecological status, recreational value and fish catches (Figure 3.6). This was to be done by reducing the external nutrient load and controlling fishing. Farmers have been participating in water protection projects initiated by the Southwest Finland Regional Environment Centre (SFREC) in 1991 and coordinated by the Pyhajarvi Protection Fund since 1995 (Ventela et al., 2001). The necessary reductions in nutrients and the optimal fishing management strategy will remain under continuous scrutiny for as long as the decreases in the occurrence of algal blooms and in nutrient concentrations take place only slowly.

Figure 3.5: Management objectives, actions and water quality standard for Lake Tuusulanjarvi.

Figure 3.6: Management objectives, actions and water quality standards for Lake Pyhajarvi in Sakyla.

3.7 Finnish lakes

The objective of the management of the Finnish lakes was to restore them to a good

ecological status (Figure 3.7). This involved reducing their nutrient load to a level

that meets the chlorophyll a standard, a proxy for a phytoplankton standard.

Figure 3.7: Management objectives, actions and water quality standards in the Finnish lakes.

4 Evaluation of prediction methods

4.1 General objectives

The main objectives of prediction in river basin planning are to acquire and anal-

yse all the information necessary and to provide accurate and precise predictions of

the expected water quality outcomes of planned management actions (Table 4.1).

Since management decisions are usually made under conditions entailing consid-

erable predictive uncertainties, realistic estimates of the possible error contained

in predictions are needed. In addition, the adjustment of river basin plans every

six years calls for continuous monitoring of water quality, analysis of management

success and correction of failures. Ideally, this should be achieved by continuous

updating of the parameters and predictive distributions. On the other hand, the

precision of lake-specific (longitudinal) predictions will be low if the sample size is

small. Higher precision can be achieved most efficiently using estimation methods

which are able to pool cross-sectional information in order to make longitudinal

inferences. The accomplishment of the above objectives is expected to promote

efficient river basin planning and management.

Table 4.1: Criteria for predictions in river basin planning.

Accuracy and precision

Realistic error estimates

Ease of updating predictions

Coverage of large geographical areas

Pooling of cross-sectional data to longitudinal inference

Efficiency

4.2 Classification of prediction methods

Water quality prediction methods can be classified by reference to several attributes, the most important among which are the modelling approach, the structure of the model and the scientific discipline concerned (Figure 4.1). According to this classification, the prediction approach can be mechanistic, statistical or Bayesian. A mechanistic approach relies on comprehensive process description using numerical integration of partial or ordinary differential equations (Jørgensen, 1980; Chapra and Reckhow, 1983; Orlob, 1983; Chapra, 1997), while statistical prediction is based on classical statistical point estimation, which is somewhat approximate if applied to mechanistic models (Omlin and Reichert, 1999). In contrast, a Bayesian approach can combine mechanistic process description and observational data, resulting in a posterior predictive distribution, which is useful in river basin management (Box and Tiao, 1973; Clark, 2006).

Models are classified here in terms of their structure and that of the data used as either hierarchical or composed of a "single" level (Gelman and Hill, 2006). Classical "single" level estimates may be useless if fitted to a lake with a small sample size, and misleading if fitted to composite data representing different lake types, in that they ignore variation between lakes and lake types. A hierarchical model allows the estimation of lake and lake type-level effects and can achieve a compromise between noisy and oversimplified classical estimates. For example, a linear or generalized linear model in which probability models are assigned to the regression coefficients can be considered as a hierarchical model. This second level has parameters of its own, which are also estimated from the data. A hierarchical linear regression model was used here together with a geomorphological typology of Finnish lakes to estimate the nutrient effect on chlorophyll a in lakes of varying sample sizes.
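The compromise between noisy lake-specific estimates and an oversimplified pooled estimate can be illustrated with a minimal Python sketch of partial pooling under a normal-normal model. This is not the hierarchical regression used in this work: the function name `partial_pool` is hypothetical, and the within-lake variance `sigma2_within` and the between-lake variance `tau2_between` are assumed to be known here, whereas in a full hierarchical model they would be estimated from the data.

```python
def partial_pool(lake_means, lake_sizes, sigma2_within, tau2_between):
    """Shrink lake-specific means toward a common mean (normal-normal model).

    Illustrative assumption: both variance components are known constants.
    """
    # Precision-weighted estimate of the common (all-lakes) mean
    weights = [1.0 / (sigma2_within / n + tau2_between) for n in lake_sizes]
    grand = sum(w * m for w, m in zip(weights, lake_means)) / sum(weights)

    pooled = []
    for m, n in zip(lake_means, lake_sizes):
        # Weight on the lake's own mean grows with its sample size
        w_own = tau2_between / (tau2_between + sigma2_within / n)
        pooled.append(w_own * m + (1.0 - w_own) * grand)
    return grand, pooled
```

A lake with few observations is pulled strongly toward the common mean, while a lake with many observations keeps an estimate close to its own sample mean.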

A water quality prediction method can be classified in terms of scientific discipline

as either hydrological, chemical or biological. Hydrological predictions are often

based on numerical integration of complicated continuity equations of mass, energy

and momentum, while chemical and biological reaction kinetics are normally ap-

proximated using ordinary differential equations or steady state linear regression

models.

Figure 4.1: Classifiers of prediction methods.

4.3 Bayesian inference using MCMC methods

Bayesian inference

A Bayesian approach facilitates continuous updating of parameters, error variances

and predictions as new information accumulates (Figure 4.2). Bayesian methods do

this formally as

$$\text{posterior} \propto \text{likelihood} \times \text{prior} \tag{4.1}$$

Figure 4.2: Elements of Bayesian posterior predictive inference in target pollutant load estimation.

The likelihood p(y|θ) and the prior density p(θ) of an unknown parameter θ enable calculation of the posterior density p(θ|y), which is formulated as:

$$p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)} = \frac{p(y \mid \theta)\, p(\theta)}{\int_{-\infty}^{+\infty} p(y \mid \theta)\, p(\theta)\, d\theta} \tag{4.2}$$

The posterior density consists of the product of the likelihood and the prior distribution divided by the normalization constant. The integral in the normalization constant, $\int_{-\infty}^{+\infty} p(y \mid \theta)\, p(\theta)\, d\theta$, is hard to calculate analytically for a complex model, but fortunately integration is not needed if Monte Carlo methods are used for posterior simulation. This involves drawing repeated random samples of the parameter or parameter vector. Several methods exist for posterior simulation and prediction, among which the Markov chain Monte Carlo (MCMC) method allows simulation of multivariate distributions and is usually implemented as a random walk through the parameter space. During the 'burn-in' period the Monte Carlo averages converge to the target distribution, after which samples of parameters are used to estimate the posterior distribution. Non-standard distributions are sampled here using Metropolis-Hastings (Hastings, 1970; Haario et al., 1999, 2001, 2003, 2004; Clark, 2006) and Gibbs sampling (Gelman et al., 2005; Spiegelhalter et al., 1996, 2002; Clark, 2006).
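The way in which a random walk avoids the normalization constant can be sketched with a minimal one-parameter random-walk Metropolis sampler in Python. It is an illustration only, not the sampler used in this work: the function name `metropolis`, the Gaussian proposal and the fixed step size `step` are assumptions, and `log_post` stands for any user-supplied function returning the logarithm of likelihood times prior.

```python
import math
import random


def metropolis(log_post, theta0, step, n_iter, burn_in, seed=0):
    """Random-walk Metropolis sampling of a single parameter.

    log_post returns log(likelihood * prior); the normalization constant
    is never needed because only posterior ratios are evaluated.
    """
    rng = random.Random(seed)
    theta, lp = theta0, log_post(theta0)
    samples = []
    for i in range(n_iter):
        # Symmetric Gaussian proposal centred on the current value
        proposal = theta + rng.gauss(0.0, step)
        lp_prop = log_post(proposal)
        # Accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            theta, lp = proposal, lp_prop
        # Keep only the samples obtained after the burn-in period
        if i >= burn_in:
            samples.append(theta)
    return samples
```

A positivity constraint can be expressed by letting `log_post` return minus infinity for non-positive values, in which case such proposals are always rejected.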

Prior distributions

To fit the models using Bayesian methods, a prior distribution for the parameters needs to be specified. Since prior independence of the parameters was assumed, only a marginal density for each component of the parameter vector was assigned here. The strongest form of prior assumption is that a parameter is a fixed constant, e.g. as obtained from the literature. Alternatively, a 'fixed' constant may be treated as a parameter with a narrow prior distribution. If no prior value is known or if we want the posterior value to depend solely on the observed data, a flat "non-informative" prior assumption is preferred, perhaps with a positivity constraint, as with many of the parameters in the present cases. Nevertheless, every new parameter increases the dimension of the vector to be sampled and increases the computational burden. Mainly Gaussian prior distributions with possible upper and lower limits for the values (e.g. positivity constraints) are used in the present work.
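Under prior independence, the joint log prior is simply the sum of the marginal log priors. A minimal Python sketch of bounded Gaussian priors is given below; the function names and the tuple layout `(mean, sd, lower, upper)` are hypothetical, and the densities are left unnormalized, which is sufficient for MCMC sampling because constants cancel in the acceptance ratio.

```python
import math


def log_gaussian_prior(value, mean, sd, lower=-math.inf, upper=math.inf):
    """Unnormalized log density of a Gaussian prior restricted to [lower, upper]."""
    if value < lower or value > upper:
        return -math.inf  # zero prior density outside the allowed range
    return -0.5 * ((value - mean) / sd) ** 2


def log_joint_prior(values, priors):
    """Sum of marginal log priors, valid when the priors are independent.

    Each entry of priors is (mean, sd) or (mean, sd, lower, upper).
    """
    return sum(log_gaussian_prior(v, *p) for v, p in zip(values, priors))
```

A positivity constraint corresponds to `lower=0.0`, and a nearly "fixed" constant to a small value of `sd`.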

Model error

Since the models described the system on a non-transformed scale, they had to

be transformed accordingly for the fitting procedure. The observational error was

modelled as a Gaussian random variable. The errors in the lake respiration and

chlorophyll a models were additive with respect to the modelled concentration, and

the error term εi in the phytoplankton model was additive with respect to the square

root of the modelled concentrations:

$$\sqrt{y_i(t)} = \sqrt{\mu(x_i; \theta)} + \varepsilon_i \tag{4.3}$$

where εi is the error term. The error term of the model contains all the unexplained

factors, and may include several sources of error other than pure observational

error. The error term is assumed to follow a Gaussian distribution with unknown

variance. We used a standard non-informative conjugate prior for the variance, defined by

an inverse gamma distribution.
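With a Gaussian error and an inverse gamma prior, the conditional posterior of the error variance is again an inverse gamma distribution, so it can be drawn directly in a Gibbs step. The Python sketch below assumes residuals on the square root scale as in Equation (4.3); the function name and the prior parameters `a0` (shape) and `b0` (scale) are hypothetical, and small values of both would correspond to a non-informative prior.

```python
import math
import random


def sample_error_variance(observed, modelled, a0, b0, rng):
    """Draw the error variance from its conjugate inverse gamma posterior.

    Residuals are formed on the square-root scale; a0 and b0 are the
    (assumed) shape and scale of the inverse gamma prior.
    """
    resid = [math.sqrt(y) - math.sqrt(m) for y, m in zip(observed, modelled)]
    ss = sum(r * r for r in resid)
    shape = a0 + 0.5 * len(resid)
    scale = b0 + 0.5 * ss
    # If X ~ Gamma(shape, rate=scale) then 1/X ~ InvGamma(shape, scale);
    # random.gammavariate takes a scale argument, hence 1.0 / scale.
    return 1.0 / rng.gammavariate(shape, 1.0 / scale)
```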

MCMC sampling

Bayesian inference regarding the parameters of the respiration, phytoplankton and chlorophyll a models was implemented using MCMC sampling methods. Instead of a single fit to the data, statistical distributions were determined for the model parameters. In practice, the process involved four steps:

1. Formulation of prior probability distributions for unknown model parameters.
2. Statistical analysis of measurement errors.
3. Specification of the likelihood function.
4. MCMC (Markov chain Monte Carlo) sampling of the posterior probability distributions of the parameters and predictions.

The Bayesian approach has been shown to be a powerful way of quantifying the

uncertainties in the whole modelling procedure (Adams, 1998; Annan, 2001; Borsuk,

2001; Borsuk et al., 2001; Harmon and Challenor, 1996; Omlin and Reichert, 1999;

Reckhow, 2002; Qian et al., 2002). The MCMC computations and adaptive MCMC

strategies used here are demonstrated and described in Haario et al. (2003). MCMC

is popular in computational statistics at the moment (Gelman et al., 2005) and can

be applied to a wide variety of modelling problems (Gamerman, 1997).

Although recent advances in MCMC computing and increasing CPU resources have made larger problems tractable (Haario et al., 2001), computational problems still arise on account of correlations between parameters. The limited availability of observational data and the structure of non-linear modelling equations may cause correlation between parameters, which can be reduced through better design of the experiments and reparametrization of the model. In situ monitoring, however, does not favour orthogonal observational designs that would generate completely uncorrelated observations of independent variables, and adaptive MCMC methods have been developed as a remedy (Haario et al., 2001). Adaptive methods make the procedure statistically efficient and reduce the need for laborious hand tuning of the algorithm. Instead of using a fixed proposal distribution, they adapt the proposal distribution for the generation of new samples according to the Adaptive Metropolis (AM) algorithm. In addition, a number of different scales for the proposal distribution were used, employing the Delayed Rejection (DR) method (Haario et al., 2001, 2003, 2004).
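The idea of adapting the proposal can be sketched in one dimension as follows. The sketch follows the general form of the Adaptive Metropolis rule of Haario et al. (2001), in which the proposal variance is the empirical variance of the chain so far multiplied by a scaling factor (2.4 squared for one dimension) plus a small regularization term. It is a simplified illustration, not the implementation used in this work: the function name, the length of the non-adaptive start-up period `t0` and the initial step `step0` are assumptions, and the Delayed Rejection step is omitted.

```python
import math
import random


def adaptive_metropolis(log_post, theta0, n_iter, t0=200, step0=1.0,
                        eps=1e-8, seed=0):
    """One-dimensional Adaptive Metropolis sketch (no delayed rejection)."""
    rng = random.Random(seed)
    s_d = 2.4 ** 2                      # scaling factor for dimension d = 1
    theta, lp = theta0, log_post(theta0)
    chain = [theta0]
    mean, m2 = theta0, 0.0              # running mean and sum of squared deviations
    for t in range(1, n_iter + 1):
        if t <= t0:
            sd = step0                  # fixed proposal during start-up
        else:
            var = m2 / (len(chain) - 1) # empirical variance of the chain so far
            sd = math.sqrt(s_d * (var + eps))
        proposal = theta + rng.gauss(0.0, sd)
        lp_prop = log_post(proposal)
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            theta, lp = proposal, lp_prop
        chain.append(theta)
        # Welford update of the running mean and squared deviations
        delta = theta - mean
        mean += delta / len(chain)
        m2 += delta * (theta - mean)
    return chain
```

In several dimensions the empirical variance is replaced by the empirical covariance matrix of the chain, which is what allows the proposal to follow correlations between parameters.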

The number of iterations that the Monte Carlo averages need to converge to the true posterior distribution is called the burn-in period. Samples obtained after the burn-in were saved for statistical inference of the posterior distribution. To ensure convergence and to estimate the lake respiration and phytoplankton dynamics, several runs were carried out sequentially, each sequence starting from the values of the previous chain, and convergence was diagnosed visually from 1d and 2d plots of the chains. In contrast, to determine the length of the burn-in period for the hierarchical regression model, multiple MCMC chains of different length were run and R statistics (Gelman and Rubin, 1992) were calculated for each chain. If R ≈ 1 the burn-in period was deemed adequate.
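The R statistic compares the variance between chains with the variance within chains. A minimal Python sketch of the basic form of the diagnostic for a single parameter is given below; the function name is hypothetical, and the chains are assumed to be of equal length after the burn-in samples have been discarded.

```python
import math


def gelman_rubin(chains):
    """Potential scale reduction factor R for one parameter.

    chains: list of m equally long lists of post-burn-in samples.
    """
    m = len(chains)
    n = len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    # Between-chain variance B and mean within-chain variance W
    b = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    w = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m
    # Pooled estimate of the posterior variance
    var_plus = (n - 1) / n * w + b / n
    return math.sqrt(var_plus / w)
```

Values close to 1 indicate that the chains are sampling the same distribution, whereas values clearly above 1 suggest that a longer burn-in period is needed.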

Posterior simulation

Where Bayesian inference and MCMC methods were used for model fitting, the wa-

ter quality responses to planned pollutant load reductions and management actions

were predicted using posterior simulation methods. Predictions were calculated re-

peatedly with sampled parameter values and error variances from their posterior

distribution and with relevant environmental control variables derived from their

observed distributions. The simulated predictive distributions gave realistic estimates

of the prediction errors and thereby supported more rational river basin management.
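How a simulated predictive distribution can be linked to a water quality standard and an acceptable probability of exceedance is sketched below in Python. The linear response model, the function names and the form of the posterior draws `(a, b, sigma)` are purely illustrative assumptions and do not correspond to any of the case models.

```python
import random


def exceedance_probability(posterior_draws, load, standard, rng):
    """Fraction of posterior predictive draws that exceed the standard."""
    exceed = 0
    for a, b, sigma in posterior_draws:
        # One predictive draw per parameter draw: mean response plus model error
        response = a + b * load + rng.gauss(0.0, sigma)
        if response > standard:
            exceed += 1
    return exceed / len(posterior_draws)


def target_load(posterior_draws, loads, standard, max_prob, seed=0):
    """Largest candidate load whose exceedance probability is acceptable."""
    rng = random.Random(seed)
    acceptable = [
        load for load in loads
        if exceedance_probability(posterior_draws, load, standard, rng) <= max_prob
    ]
    return max(acceptable) if acceptable else None
```

The function returns `None` when none of the candidate loads meets the standard with the required probability.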

4.4 Model validation

Prediction with mechanistic models was mainly based on a theoretical understanding

of the underlying mechanism and the consequent causal relationships. Runs with

data located outside the range of variation of the calibration data were used to

confirm the model and to reveal structural errors and limitations in it.

Validation in an empirical modelling approach is clearly related to the scientific

learning process (Kettunen, 1993; NRC, 2001; Brun et al., 2001; Omlin and Reichert,

1999; Reichert and Vanrolleghem, 2001; Clark, 2006), where a tentative model sug-

gests an experiment or observational data gathering process and an appropriate

analysis of the data can lead to a new experimental or observational design (Box

and Tiao, 1973). The alternation between the model and experiment is carried out

by means of experimental design and data analysis. The efficiency of the underlying

statistical learning process depends on the appropriateness and power of the design

and analysis methods employed. In Bayesian analysis, a prior distribution is com-

bined with the data to calculate the posterior distribution, from which inferences

regarding the parameters are to be made. The postulated probability model is never

expected to be entirely true, but is chosen in the light of the available knowledge

and constructed with the simplest possible structure. It must therefore be tested at

each step in the investigation. Residual quantities are calculated and sensitivity to

prior distributions and model structure is tested to criticize the probability model

and to suggest modifications.

Comprehensive validation of mechanistic models is a luxury that is seldom achieved

in water quality management, due to limitations caused by sparse data and the

complicated model structure. Scientific learning using statistical analysis methods

involves a continuous iterative approach in which management decisions are condi-

tional on the validity of the tentative model and the available information. River

basin management decisions thus have to be modified alongside this iterative learn-

ing process and model criticism.

The validation of the mechanistic water quality model for Lake Lappajarvi was ham-

pered by the small observational water quality sample size, which meant that the

data could not be separated out into calibration and validation sets. By contrast,

one-year data on dioxin concentrations in settling suspended solids were available

for validation of the transport model for the River Kymi. The respiration model

for Lake Tuusulanjarvi and the chlorophyll a model for the Finnish lakes were not

validated, either, but the phytoplankton model for Lake Pyhajarvi was validated

with data from 5 additional years. The residual normality of the respiration, phy-

toplankton and chlorophyll a models was investigated through a graphical display

of the predictive distributions and observations. Sensitivities to prior distributions

and model structure were not studied.

To facilitate comparison of the hierarchical linear model with non-hierarchical dummy variable models, we calculated the deviance information criterion (DIC), a Bayesian measure of model complexity and fit (Spiegelhalter et al., 2002). DIC is the sum of the posterior mean deviance $\overline{D(\theta)}$, a Bayesian measure of fit or "adequacy", and a complexity measure pD (effective number of parameters), which corresponds to the trace of the product of Fisher's information and the posterior covariance.
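In the notation of Spiegelhalter et al. (2002), these quantities can be written out as follows, where $\bar{\theta}$ denotes the posterior mean of the parameters and the deviance is defined up to an additive constant that does not depend on θ. The block restates the standard definition and is not specific to the present models.

```latex
\begin{align}
D(\theta) &= -2 \log p(y \mid \theta) + \text{const}, \\
\bar{D} &= \mathrm{E}_{\theta \mid y}\left[ D(\theta) \right], \\
p_D &= \bar{D} - D(\bar{\theta}), \\
\mathrm{DIC} &= \bar{D} + p_D = D(\bar{\theta}) + 2\, p_D .
\end{align}
```

Of two models fitted to the same data, the one with the smaller DIC is preferred.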

4.5 Analysis of case predictions

4.5.1 Lake Lappajarvi

Prediction method

A water quality model was constructed to link phosphorus loading and hydrological

conditions to phytoplankton growth and oxygen deficit in Lake Lappajarvi (Fig-

ure 4.3). The driving variables included wind, cloudiness, air temperature, humidity,

water outflow and the phosphorus loads from point and non-point sources.

Figure 4.3: Decision variables, prediction methods and predictions for water quality management in Lake Lappajarvi.

Vertical mixing and temperature distribution were simulated by means of a one-dimensional, horizontally integrated, k-ε turbulence model, PROBE (Svensson, 1986; Svensson et al., 2002). It was assumed that the lake was horizontally homogeneous and that gravitational effects obeyed the Boussinesq approximation. A complete description of the model and the numerical scheme is given by Svensson (1977, 1978) and Svensson et al. (2002).

The ice increment was calculated using a degree-day method, while the melting formulation took the decreasing ice thickness to be a linear function of air temperature. The model distinguished between ice increment and melting on the basis of the direction of the net surface heat flux.

The water quality model coupled with the PROBE model simulates vertical mixing

and chemical and biological transformations of total phosphorus, dissolved oxy-

gen and chlorophyll a (a proxy for phytoplankton biomass). The transformations

were biological oxygen demand, phytoplankton growth and respiration, respiration

in the bottom sediment, growth, respiration and settling of chlorophyll a, exter-

nal phosphorus load, sedimentation and internal phosphorus load under anaerobic

conditions.

Model calibration and prediction

The simulated temperatures and ice thicknesses agreed well with the values observed

in 1987 - 1988, but the modelled temperature stratification in late August was ten

days longer than observed and a mean error of 1.5 days arose in the ice duration.

Of the water quality model parameters, the BOD decay rate, sediment oxygen demand, net sedimentation of phosphorus, phosphorus release from the sediment under anaerobic conditions, algal growth, algal respiration and rate of chlorophyll a sedimentation were calibrated with one year of observed data (May 15, 1988 - April 30, 1989). Parameters were fitted to predict average chlorophyll a, dissolved oxygen and total phosphorus concentrations. Calibration was carried out graphically without mathematical parameter optimization or error analysis methods. The model overpredicted chlorophyll a to a moderate extent (Figure 4.4). The mean squared error (MSE) was 29.6, the sum of squares (SS) was 355 and the root mean square error (RMSE) was 5.4.
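The reported error measures can be checked against each other. The first line below follows directly from the definition of the RMSE. The second line rests on the assumption, not stated in the text, that the MSE was obtained by dividing the sum of squares by the number of observations n, in which case the figures would imply about 12 pairs of observed and simulated values.

```latex
\begin{align}
\mathrm{RMSE} &= \sqrt{\mathrm{MSE}} = \sqrt{29.6} \approx 5.4, \\
n &= \frac{\mathrm{SS}}{\mathrm{MSE}} = \frac{355}{29.6} \approx 12 .
\end{align}
```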

Figure 4.4: Observed and simulated chlorophyll a concentrations [µg l−1] in Lake Lappajarvi, May 15, 1988 - April 30, 1989.

The study of the effects of the reduction in non-point source phosphorus loading on chlorophyll a was based on the hydrological, meteorological and phosphorus loading data collected from April 1, 1988 to March 31, 1989. Since chlorophyll a as predicted with scenario number 4 (the best available protection measures, 44.9 % reduction) (Figure 4.5) was below the standard (10 µg l−1), the respective load (0.19 gP m−2 a−1) was selected as a target nutrient load for lake management. As the model error and parameter and predictive distributions were not estimated, and the long-term variation in control variables was not measured or used in the simulation, the computed effects ignored natural variation and predictive uncertainty. Hence, it is not possible to calculate an explicit margin of safety for the target phosphorus load estimates.

Figure 4.5: Calculated chlorophyll a concentration [µg l−1] with loading levels: 1. Present (April 1, 1988 - March 31, 1989) 0.35 gP m−2 a−1 2. Fast obtainable load reduction (14.8 % reduction) 0.30 gP m−2 a−1 3. Desirable loading level (32.7 % reduction) 0.23 gP m−2 a−1 4. Best available protection measures (44.9 % reduction) 0.19 gP m−2 a−1.

4.5.2 River Kymi

Prediction method

Flow velocity, water level and the transport of contaminated sediments and PCDD/F

compounds along the 130 km stretch of the River Kymi were calculated using a one-

dimensional (1-D) river model (Figure 4.6). The model was also used to calculate

time series and longitudinal profiles for suspended solids and PCDD/F concentra-

tions in the river water and bottom sediment. The resulting model was then applied

for the evaluation of the impact of dredging on the transport of PCDD/F compounds

downstream in the river and into the Gulf of Finland.

Figure 4.6: Decision variables, prediction methods and predictions for water quality management in the River Kymi.

In the 1-D unsteady river flow model, the full de Saint Venant equations were solved numerically with a double-sweep finite difference method in which Verwey's variant of the Preissmann implicit discretization scheme was used (Cunge et al., 1980). The resistance term was calculated using the Manning approach, with the Manning number taken as an empirical constant (Cunge et al., 1980). The 1-D sediment and contaminant transport model was used to calculate the convection, dispersion, sedimentation and erosion of suspended solids and PCDD/F with unsteady flow (Cunge et al., 1980), and the model was linked to the flow model. The sedimentation rate of suspended solids in the river water and rate of erosion of the bottom sediments were calculated as functions of shear stress. The bottom sediment was divided into 4 layers with differing consolidation times, and the values for these constants were selected according to the sediment properties analysed. PCDD/F compounds were assumed to migrate adsorbed to particulate matter.
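For orientation, a common textbook form of the one-dimensional de Saint Venant equations with a Manning-type resistance term is given below. The notation is an assumption and may differ from that of Cunge et al. (1980): A is the cross-sectional flow area, Q the discharge, q the lateral inflow per unit length, h the water surface elevation, g the acceleration due to gravity, R the hydraulic radius, S_f the friction slope and M the Manning number, taken here as the reciprocal of Manning's n.

```latex
\begin{align}
\frac{\partial A}{\partial t} + \frac{\partial Q}{\partial x} &= q, \\
\frac{\partial Q}{\partial t}
  + \frac{\partial}{\partial x}\left( \frac{Q^{2}}{A} \right)
  + g A \frac{\partial h}{\partial x} + g A S_f &= 0, \\
S_f &= \frac{Q \lvert Q \rvert}{M^{2} A^{2} R^{4/3}} .
\end{align}
```

The first equation expresses conservation of mass and the second conservation of momentum, with the friction slope supplying the resistance term.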

Model calibration and prediction

A large amount of information was collected and assimilated into the 1-D hydraulic river model. The settling velocity of suspended solids w_s was calibrated with observations from 1980 to 1996, and the calculated PCDD/F concentrations in the river and the PCDD/F concentrations analyzed in the sediment trap samples in 1997 were compared (Figure 4.7) in order to validate the model. The model approximated the main features of PCDD/F transport successfully but somewhat overpredicted PCDD/F concentrations in the sediments at the downstream end of the river.

Figure 4.7: Model verification. Calculated PCDD/F concentrations in suspended solids in the river water, and concentrations observed in sediment traps in 1997. [Panels: Keltti, Koskenalusjärvi and Tammijärvi; x-axis: time (month, 1-12); y-axis: PCDD/F concentration (ng g−1); series: calculated and sediment trap.]

The effects of the dredging and removal of contaminated sediments at Kuusankoski

over the period 2000-2020 were examined based on two responses: the immediate

increase in suspended solids and PCDD/F concentrations in the water caused by

dredging in 2005 and the subsequent decrease. It was assumed that the most con-

taminated sediments (140 000 m3) between Kuusankoski and Keltti (Figure 2.2)

would be removed by dredging during a half-year period. Based on earlier experi-

ence, from 1% to 10% of the sediment removed was expected to be resuspended in

the river water. The PCDD/F concentration in the dredged and resuspended bot-

tom sediment was 40 400 ng g−1 (140 ng I-TEQ g−1). In this case PCDD/F loading

would be about 300 kg. The model predicted that the simulated restoration dredg-

ing would cause a sudden increase in PCDD/F concentrations in the river unless

implemented carefully (Figure 4.8), but that concentrations would soon decrease to

a significantly lower level than before dredging. The estimated sensitivity of the

model to sediment parameters within the specified ranges did not indicate high risk

of the spreading of PCDD/F compounds.

Figure 4.8: Calculated PCDD/F concentrations [ng g−1] in suspended solids in the upper (Keltti) and lower (Tammijärvi) river stretches of the River Kymi before, during (days 534–713) and after dredging, on the assumption that 10 % (a) or 1 % (b) of the dredged sediment would be resuspended in the river water. [Panels: a) and b); x-axis: time (d, 0-730); y-axis: PCDD/F concentration (ng g−1); series: Keltti and Tammijärvi.]

4.5.3 Lake Tuusulanjärvi

Prediction method

In standard lake aeration planning techniques, the average winter respiration rate

in the lake is typically estimated with linear regression, where the y variable is

the dissolved oxygen content of the water body [mg m−2] and the x variable the

time after the beginning of the ice-cover period [d]. The slope of the regression

line represents the respiration rate [mg m−2 d−1] (Lorenzen and Fast, 1977). In

this study, a dynamic ordinary differential equation was formulated (Figure 4.9)

that consisted of respiration and the oxygen flux of the aerator. The temperature

dependence of the respiration was calculated according to the Arrhenius formulation

(Bowie et al., 1985). The dissolved oxygen concentration was the average vertical

concentration in the area of aerator impact (1 km2, Fig. 2.3). Because the

biological oxygen demand (BOD) was below the detection limit in winter periods, it

was not included in the model. A similar formulation has been used for modelling

estuarine and coastal oxygen dynamics (Borsuk, 2001; Borsuk et al., 2001).
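A formulation of this kind can be illustrated with a minimal sketch. It assumes a first-order respiration term with an Arrhenius-type temperature correction and a constant aerator feed per unit volume; the exact equation used in the study is not reproduced here, and the function name, the reference temperature and all numerical values are hypothetical.

```python
def simulate_do(c0, k, theta, temps, feed_per_vol=0.0, t_ref=4.0, dt=1.0):
    """Explicit Euler integration of dC/dt = -k * theta**(T - t_ref) * C + feed_per_vol.

    c0: initial dissolved oxygen [mg/l]; k: respiration rate at t_ref [1/d];
    temps: water temperature at each time step [deg C];
    feed_per_vol: aerator oxygen feed per unit volume [mg/l/d] (assumed constant).
    """
    c = c0
    series = [c]
    for temp in temps:
        rate = k * theta ** (temp - t_ref)            # temperature-corrected rate
        c = max(c + dt * (-rate * c + feed_per_vol), 0.0)
        series.append(c)
    return series

# Made-up values: a 100-day winter at 2 deg C, with and without aeration.
no_aeration = simulate_do(12.0, 0.02, 1.07, [2.0] * 100)
with_aeration = simulate_do(12.0, 0.02, 1.07, [2.0] * 100, feed_per_vol=0.05)
```

Comparing the two series shows how the feed term offsets respiration; a production model would use an adaptive ODE solver rather than a fixed daily Euler step.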

The respiration model for Lake Tuusulanjarvi included 31 respiration rate param-

eters, one for each winter period, and 31 initial dissolved oxygen concentrations.

The temperature dependence constant θ was assumed to be independent of time,

and thus added only one parameter. The errors-in-variables approach used for the

dissolved oxygen feed term introduced five more parameters. Together with the

unknown observation error variance σ2, the number of parameters totalled 69 (31 + 31 + 1 + 5 + 1).

Prior distributions

To fit the respiration models using Bayesian inference and MCMC methods, a prior

distribution was specified for the parameters.


Figure 4.9: Decision variables, prediction methods and predictions for water quality management in Lake Tuusulanjärvi.

Non-informative prior distributions were used to explore the posterior distributions

for the total respiration rate constants (one for each winter period) without any

prior constraints (other than positivity). The proper prior distribution for the

temperature dependence parameter θ was acquired from a laboratory experiment

(Lehtoranta and Malve, 2001). The distribution suggested by the experiment was

Gaussian N(1.45, 0.4). A non-informative conjugate prior distribution was used for

the unknown variance σ2 in the observation error ε.

The term feed/vol in the model corresponds to the amount of fresh oxygen feed dissolved

in the lake water. The feed estimated by the manufacturer and the volume of

aerator impact are also subject to some uncertainty. Gaussian prior distributions

were assigned for the oxygen feed in the five periods.
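The prior structure described above could be coded along the following lines. This is only a sketch: the text does not state whether 0.4 in N(1.45, 0.4) is a standard deviation or a variance (it is treated as a standard deviation here), and the feed means and standard deviations are hypothetical inputs rather than the values used in the study.

```python
import math

def log_prior(k_values, theta, feeds, feed_means, feed_sds,
              theta_mean=1.45, theta_sd=0.4):
    """Unnormalised log prior; returns -inf outside the support."""
    if any(k <= 0.0 for k in k_values):
        return -math.inf                       # flat prior on each k, restricted to k > 0
    lp = -0.5 * ((theta - theta_mean) / theta_sd) ** 2     # Gaussian prior on theta
    for f, m, s in zip(feeds, feed_means, feed_sds):       # Gaussian priors on feed terms
        lp += -0.5 * ((f - m) / s) ** 2
    return lp
```

The prior for the observation error variance is left out because, with a conjugate prior, that parameter is usually sampled directly rather than through the log-density.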


Model fitting and posterior predictive inference

The estimation of the long-term evolution of lake winter respiration and the predic-

tion of the lake oxygen regime in future winters were used as examples of how uncer-

tainties can be taken into account and predictions can be updated using Bayesian

inference and MCMC sampling.

The benefits of Bayesian estimation were that it was possible to pool information

from different sources (laboratory experiments and lake data) and to quantify the

uncertainties with a full statistical approach using prior and posterior distributions.

The future winters can be predicted with posterior information derived from past

observations and the prior distribution. This allowed the oxygenation efficiency, for

example, to be designed and controlled in order to ensure a target dissolved oxygen

concentration with a given margin of safety.

The unidentifiability of the model parameters could prevent separation of their ef-

fects, but it will not hinder prediction. This is due to the Bayesian computations,

which take the full multidimensional distributions of the parameters into account

without resorting to linearizations or other approximations.

This simple model with a separate rate parameter k estimated for each year gave

very good agreement with the winter observations (Figure 4.10) and allowed changes

in respiration (Figure 4.11) and the effect of the external oxygen feed to be studied

over the years.

Prior distributions for the rate parameter k, the initial O2 concentration and the

feed term can be created by pooling the posterior distributions from the past years,

whereupon these prior distributions can be used to compute the predictive O2 con-

centration and its prediction interval. As soon as the first observation for a winter

was received, a new model with the information from the previous years as its prior


Figure 4.10: Observed oxygen concentrations (circles) [g m−3] and temperatures [°C] (lower solid line) during the ice-covered period in 1970–2000. The x-axis is time from the start of the ice-cover period. The dots represent the observed vertically averaged O2 concentrations. The smaller dots and the dashed line show the observed temperatures [°C]. The solid line with a grey area around it shows the median and the 95% region of the posterior predictive distribution. One panel per winter (panel labels run from 1971 to 2002); axes: days of ice cover [d] (0–125) against DO [mg l−1] and temperature [°C] (0–15).


Figure 4.11: The smoothed rate constant k with two levels of smoothing. The upper plot with the lowest parameter h = 0.2 corresponds to about a 6-year trend, and the lower one with h = 0.4 to about a 12-year trend. The grey levels give 50%, 90% and 95% limits for the posterior distribution. Axes in both panels: year (1970–2000) against k [d−1].

data was fitted and new posterior predictions were computed. The model and pre-

dictions were updated recursively with new observations (Figure 4.12).
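The recursive updating can be illustrated with a deliberately simplified sketch: a grid approximation of the posterior of a single rate constant k, assuming pure first-order decay C(t) = C0 exp(−kt), a known initial concentration and a known observation error. The study itself used MCMC with priors pooled from earlier winters; the flat starting prior and the observation values below are made up.

```python
import math

def update_grid(prior, k_grid, t, obs, c0, sigma):
    """One Bayesian update of a discretised posterior for k, given one observation."""
    post = [p * math.exp(-0.5 * ((obs - c0 * math.exp(-k * t)) / sigma) ** 2)
            for p, k in zip(prior, k_grid)]
    total = sum(post)
    return [p / total for p in post]

def grid_sd(weights, k_grid):
    """Standard deviation of k under the discretised distribution."""
    mean = sum(w * k for w, k in zip(weights, k_grid))
    return math.sqrt(sum(w * (k - mean) ** 2 for w, k in zip(weights, k_grid)))

k_grid = [i * 0.001 for i in range(1, 201)]        # k from 0.001 to 0.2 per day
posterior = [1.0 / len(k_grid)] * len(k_grid)      # flat starting prior (assumption)
observations = [(20, 8.2), (40, 5.3), (60, 3.7), (80, 2.4)]   # made-up (day, mg/l) pairs
spread = [grid_sd(posterior, k_grid)]
for t, obs in observations:
    posterior = update_grid(posterior, k_grid, t, obs, c0=12.0, sigma=0.5)
    spread.append(grid_sd(posterior, k_grid))
```

The list `spread` records how the posterior standard deviation of k changes as each observation arrives, which mirrors the narrowing shown in the lower plot of Figure 4.12.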

The probability of the dissolved oxygen concentration falling below 4 mg l−1 was

computed by predicting the concentration at the end of the ice-covered period (Fig-

ure 4.13). The empirical distribution of the length of the winter was derived from

the observed lengths of the past winters.
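A minimal Monte Carlo sketch of this calculation is given below. It assumes first-order decay without aerator feed and treats the posterior draws and the winter length as independent; the function name and all sample values are hypothetical.

```python
import math
import random

def prob_below(threshold, k_samples, c0_samples, winter_lengths, n_draws=10000, seed=1):
    """Monte Carlo estimate of P(C(end of winter) < threshold),
    assuming C(t) = c0 * exp(-k * t)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        i = rng.randrange(len(k_samples))        # one joint posterior draw (k, c0)
        length = rng.choice(winter_lengths)      # draw from the empirical winter lengths
        if c0_samples[i] * math.exp(-k_samples[i] * length) < threshold:
            hits += 1
    return hits / n_draws

k_samples = [0.008, 0.010, 0.012, 0.015]     # made-up posterior draws of k [1/d]
c0_samples = [11.5, 12.0, 12.5, 12.0]        # made-up initial concentrations [mg/l]
winter_lengths = [90, 105, 120, 135]         # made-up lengths of past winters [d]
p_low = prob_below(4.0, k_samples, c0_samples, winter_lengths)
```

Keeping k and c0 paired by index preserves their posterior correlation, which matters when the parameters are poorly identified.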

Predictive distributions for the fresh oxygen feed needed as a function of the length

of the winter (Figure 4.13) could be simulated with the Monte Carlo method. This

enabled the predictions to be used for optimization and real-time process control of

the aerators.


Figure 4.12: Predicting dissolved oxygen during a new winter in Lake Tuusulanjärvi. The four plots in the upper part show how prediction limits for the concentration decrease as more data become available. The lower plot shows how the posterior distribution of parameter k becomes more accurate as more data become available. Upper panels (after observations 1 to 4): days of ice cover [d] (0–100) against DO [mg l−1] (0–14); lower panel: posterior of k [d−1] (0–0.2).


Figure 4.13: Predicting dissolved oxygen concentration in Lake Tuusulanjärvi during a new winter. The first plot on the left shows predictive posterior distributions for the amount of oxygen in the water at the end of winter. The four distributions correspond to the four observations in Fig. 4.12. The middle plot shows the probability of the concentration falling below 4 mg l−1 after the second concentration has been observed. The plot on the right shows how the estimated fresh oxygen feed that would be needed to keep the amount of oxygen above 4 mg l−1 depends on the length of the winter after the second observation. Panel titles: "Oxygen at the end of the winter" (density against DO), "Probability of O2 < 4 g m−3" (probability against days) and "Oxygen feed needed for O2 > 4" (feed [kg/d] against length of winter [d], with median and 95% limits).


4.5.4 Lake Pyhäjärvi in Säkylä

Prediction method

The model used for the phytoplankton dynamics in Lake Pyhajarvi was relatively

standard in specification. According to earlier trophic correlation analyses (Sarvala

et al., 1998; Helminen and Sarvala, 1997), the variation in summer phytoplank-

ton biomass in Lake Pyhajarvi is regulated both by bottom-up (total phosphorus)

and top-down (planktivorous fish and zooplankton) forces. A strong year-class of

age-0+ vendace will depress the total zooplankton biomass, which in turn will re-

duce the grazing pressure from zooplankton, allowing an increase in phytoplankton

biomass (Helminen and Sarvala, 1997). Based on these assumptions, phytoplank-

ton was modelled with first-order reaction terms for growth, respiration, settling

and death by predation (Figure 4.14). The growth rate coefficient varied in re-

sponse to temperature, nutrients and light, and the non-predatory loss rate was also

temperature-dependent. Temperature dependence was expressed in an exponential

form, as commonly used in surface water quality modelling (Bowie et al., 1985).

The Michaelis–Menten equation was used to calculate growth limitation by total

phosphorus and total nitrogen. Grazing by crustaceans was taken to be a product of

the zooplankton filtration rate, crustacean zooplankton and phytoplankton biomass

concentrations (Bowie et al., 1985). Temperature and half-saturation effects were

omitted.
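The structure of the rate terms described above can be sketched as follows. The multiplicative combination of the limitation factors, the 20 °C reference temperature, the lumping of respiration, settling and other non-predatory losses into one term and all parameter names are assumptions made for illustration; the study's own equations may differ in detail.

```python
def algal_rate(a, temp, tp, tn, light_factor, zoo, mu_max, theta_g, k_p, k_n,
               loss, theta_l, filt, t_ref=20.0):
    """Net rate of change dA/dt for one algal group (illustrative form).

    Growth: mu_max * theta_g**(temp - t_ref) * MM(TP) * MM(TN) * light_factor,
            where MM(x) = x / (K + x) is the Michaelis-Menten limitation.
    Losses: temperature-dependent non-predatory loss plus grazing filt * zoo * a.
    """
    nutrient_limit = (tp / (k_p + tp)) * (tn / (k_n + tn))
    growth = mu_max * theta_g ** (temp - t_ref) * nutrient_limit * light_factor
    nonpred_loss = loss * theta_l ** (temp - t_ref)
    grazing = filt * zoo          # first order in a; no temperature or saturation effect
    return (growth - nonpred_loss - grazing) * a
```

Because every term is first order in the biomass a, the bracketed sum acts as a time-varying net growth rate driven by the control variables.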

The growth and decay mechanisms were integrated into a minimal mass-balance

equation for the wet weight concentration of algae Ai. Spatial variations were aver-

aged out, and the lake was modelled as a continuously stirred tank reactor (CSTR).

The use of this kind of model was supported by the earlier analyses of trophic inter-

actions in this lake carried out by Sarvala et al. (1998). Phytoplankton was divided

into three dominant groups, Diatomophyceae, Chrysophyceae and nitrogen-fixing


Figure 4.14: Decision variables, prediction methods and predictions for water quality management in Lake Pyhäjärvi in Säkylä.

Cyanobacteria, and an inhomogeneous group consisting of minor species.

Because mechanistic water quality models tend to be overparametrized with respect

to the available data, the number of parameters in our water quality model was reduced.

Still, there were 10 parameters to be estimated for each of the groups, in addition

to which the noisy measurements of the initial spring values for each algal group in

each of the eight periods were treated as unknowns. Thus, a total of 72 unknowns

(4 × 10 parameters and 4 × 8 initial values)

had to be estimated. Many of the parameters were clearly correlated, and both the

control variables and the response data had high noise levels. It was obviously not

possible to estimate the parameter values accurately in such a situation.


Prior distributions

Non-informative prior distributions with positivity constraints only were used for

the maximum growth rates µi, the non-predatory loss rates σi and the zooplankton

filtration rates pi. In addition, Gaussian prior distributions with additional positivity

constraints were used for the half-saturation parameters and for the temperature

coefficients θi based on the rather wide ranges presented in the literature (Bowie

et al., 1985).

The model error εi was assumed to follow a Gaussian distribution with unknown

variance, for which a standard non-informative conjugate prior distribution defined

by an inverse gamma distribution was used. Separate error variances were estimated

for each of the four algal groups.
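With a conjugate inverse gamma prior, the error variance can be drawn directly from its conditional posterior inside the sampler. The sketch below shows this standard update for one algal group; the hyperparameters a0 and b0 are hypothetical small values standing in for a vague prior, not the settings used in the study.

```python
import random

def sample_error_variance(residuals, a0=0.001, b0=0.001, rng=random):
    """Draw sigma^2 from InvGamma(a0 + n/2, b0 + SS/2), the conditional posterior
    for Gaussian residuals under an InvGamma(a0, b0) prior."""
    n = len(residuals)
    ss = sum(r * r for r in residuals)       # residual sum of squares
    shape = a0 + 0.5 * n
    rate = b0 + 0.5 * ss
    # If X ~ Gamma(shape, scale = 1/rate), then 1/X ~ InvGamma(shape, rate).
    return 1.0 / rng.gammavariate(shape, 1.0 / rate)
```

Calling the function once per algal group with that group's residuals gives the separate error variances mentioned in the text.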

Model fitting and posterior predictive inference

The parameters were estimated using eight years of water quality and hydrology

observations (Figure 4.15). The parameters corresponding to Cyanobacteria (group

3) differed most clearly from the prior distributions, as these were better identified

and had smaller standard deviations.

The model fits the rather noisy data relatively well, although not perfectly (Fig-

ure 4.15), and the predictive intervals for the observations cover the data reason-

ably well. The same set of parameters was used to model each of the eight years.

The cyanobacterial blooms were predicted by the model in every year in which they

were actually observed. It should be noted that the predictive intervals of the fitted

model were far narrower than those of the observations.

Validation with data from five later years (Figures 4.16 and 4.17) revealed error in


Figure 4.15: Plots of fitted models and 95% credible intervals during the growing season. The rows represent years and the columns phytoplankton groups. Circles (o) denote observed algae wet biomass concentrations [mg l−1], and solid lines show the median fits obtained by the MCMC method. The darker areas correspond to 95% posterior predictive intervals, and the lighter areas show the predictive interval for new observations. The horizontal axis shows months of the year. Columns: diatoms, Chrysophyceae, nitrogen-fixing Cyanobacteria and minor species; rows: 1992–1999; horizontal axis: months 5–11.


the mechanistic model, since the predicted Cyanobacteria biomass in 2000 was very

low compared with the observed value. Interestingly, a linear regression model for

Cyanobacteria fitted the observation for that year quite well (Figure 4.18). For the

regression, the data were averaged yearly and centred.

Figure 4.16: Validation of the phytoplankton model. Observed and calculated Cyanobacteria biomass in 1992–2004. Validation period 2000–2004. Plot title (translated from Finnish): "Cyanobacteria means, model prediction vs. observations"; legend: distribution of the mean predicted by the model (Predicted) and observed yearly Cyanobacteria means (Observed); axes: year against Cyanobacteria biomass [mg l−1] (0–2).

The effects of zooplankton (Z), total phosphorus (Ptot) and water temperature on the

mean nitrogen-fixing Cyanobacteria wet biomass concentration (A3) during the late

summer period (July 26 – September 15) were simulated using the phytoplankton

model, the estimated parameters and varying control variable profiles. The simula-

tions were performed on a grid of varying Ptot, Z and temp profiles and repeated

with model parameters sampled from their posterior distributions and the obser-

vations sampled from their estimated distributions. The effects of biomanipulation

and nutrient reduction were visualized on separate 3-dimensional probability sur-

faces for the different temperature profiles with averages of the Ptot and Z profiles

on the x and y axes. The probability of exceeding the predefined water quality

criterion for the mean late summer Cyanobacteria concentration (0.86 mg l−1) was

plotted as a response surface (Figure 4.19).
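The grid simulation behind such a probability surface can be sketched as below. The `simulate` argument is a hypothetical stand-in for the fitted phytoplankton model (including any sampled observation error), and `param_draws` stands for draws from the posterior distribution; neither is the code used in the study.

```python
def exceedance_surface(simulate, ptot_grid, z_grid, param_draws, criterion=0.86):
    """P(mean late-summer biomass > criterion) for every (Ptot, Z) pair.

    simulate(ptot, z, params) is a user-supplied function returning one
    simulated mean biomass for the given control values and parameter draw.
    """
    surface = []
    for ptot in ptot_grid:
        row = []
        for z in z_grid:
            exceed = sum(simulate(ptot, z, p) > criterion for p in param_draws)
            row.append(exceed / len(param_draws))
        surface.append(row)
    return surface

# Toy stand-in model and made-up draws, for illustration only.
toy_surface = exceedance_surface(lambda ptot, z, p: p * ptot / z,
                                 ptot_grid=[10.0, 20.0], z_grid=[20.0],
                                 param_draws=[1.0, 2.0])
```

Repeating the calculation for each temperature profile would give one surface per panel, as in Figure 4.19.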

By combining the information contained in the surfaces of Figure 4.19, a more

compact representation was plotted (Figure 4.20) that can be used to evaluate the

limits on total phosphorus (upper limit) and zooplankton conditions (lower limit) in

different mean temperatures for the attainment of the chosen water quality criteria


Figure 4.17: Observed control variable values in Lake Pyhäjärvi for calibration (1992–1999) and validation (2000–2004) of the phytoplankton model.

with a 95% probability. The calculated limits indicated that more zooplankton is

needed to compensate for the effects of increasing temperature and total phosphorus

and to fulfil the Cyanobacteria criteria laid down here. Within the observed range,

total phosphorus had a marginal effect on Cyanobacteria compared with grazing

by zooplankton, although the phosphorus effect increased slightly with temperature

(Figure 4.20). These results agreed with the more qualitative results of Sarvala et al.

(1998), where increased Z (due to the removal of planktivorous fish) was also seen

to be more effective than a reduction in total phosphorus.


Figure 4.18: Calibration (1992–1999) and validation (2000–2004) of two optional Cyanobacteria models. Model 1 fitted best with the yearly averaged and centred data. Model 2 was designed for lake management. Variables: Cyanob - biomass of nitrogen-fixing Cyanobacteria [mg l−1], Plankt - planktivorous fish [kg ha−1], Crust - herbivore zooplankton [mg l−1], Pload - total phosphorus load [kg d−1], and TN - total nitrogen concentration [µg l−1]. Panel formulas: Model 1: Cyanob ~ −0.19 * Plankt − 0.33 * Crust + 0.65 * Pload + 0.22 * TN − 0.67; Model 2: Cyanob ~ 0.73 * Pload − 0.03 * Plankt + 0.87. Axes: year (1992–2004) against Cyanobacteria biomass [mg l−1] (0–3).


Figure 4.19: Probability of the summer mean Cyanobacteria level being greater than 0.86 mg l−1. Ptot - total phosphorus concentration [µg l−1], temp - water temperature [°C], and Z - grazing zooplankton biomass concentration [µgC l−1]. Panels: one probability surface (prob, 0–1, against Z and Ptot) for each mean temperature of 16.5, 17, 17.5, 18, 18.5 and 19 °C.

Figure 4.20: Control variable limits on exceeding a summer mean Cyanobacteria concentration of 0.86 mg l−1 with 0.05 probability. Each line denotes a different mean temperature profile. Ptot - total phosphorus concentration [µg l−1] and Z - grazing zooplankton biomass concentration [µgC l−1]. Axes: Z (10–80) against Ptot (15–25); one contour P(A3 mean > 0.86) = 0.05 for each temperature from 16.5 to 19 °C.


Integration of lake and catchment models

Later on, Bayesian inference and MCMC methods were applied to the total phos-

phorus and nitrogen models for Lake Pyhajarvi, and the fitted model was combined

with the phytoplankton model and a non-point load model that simulated the influ-

ences of buffer strip width, wetland percentage and forestation on total phosphorus

leaching from the catchment area into the lake (Saloranta et al., 2004).

The estimated posterior distributions of the parameters in the nutrient models were

strongly correlated, and the credible intervals of the predictions were quite wide (Figure 4.21).

The phosphorus model fitted the data better than the nitrogen model did.

Monte Carlo simulation was performed on the estimated parameter distributions

and observed distributions of the control variables (wind velocity, discharge, total

phosphorus and total nitrogen loading, water temperature and global irradiance)

in order to estimate the impacts of nutrient loads and fisheries management on the

probability of a mass occurrence of Cyanobacteria, the random variability caused by

parameter uncertainty and the natural variability in the controlling variables. The

resulting model was used to predict the consequences of fisheries management and

a reduction in loading and to find an optimal combination of these measures with

respect to the given target summer maximum Cyanobacteria biomass.

To incorporate natural variability into the predictions, samples of control variables

were taken from observed 30-year time series using the bootstrap method, adding

some artificial variability to the observed fluctuation in nutrient loadings and graz-

ing zooplankton biomass in order to extrapolate their impact on the probability of

a mass occurrence of Cyanobacteria. This extra variability was obtained by mul-

tiplying the loadings and zooplankton biomass by random variables sampled from

the uniform distributions [0.5 1.5] and [0.1 2.0] respectively. In each simulation the

model was first run to cover 20 years, in order to reach an equilibrium between


Figure 4.21: Observed and fitted nutrient concentrations [µg l−1] and loads [kg d−1] in Lake Pyhäjärvi in 1980–2001. Total phosphorus is in the upper plot and total nitrogen in the lower one. The nutrient models were fitted using Bayesian inference and MCMC methods. The darker grey area corresponds to the 95% predictive limits of the fitted model, the solid line denotes the median algae concentration, and the lighter grey area gives the 95% prediction limits for the observations.

nutrient load and lake concentrations, and was then continued for 10 more years to

give a sample of predictive variables (nutrient concentrations and algal biomass).
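The resampling step can be sketched as follows. It assumes that each bootstrap draw picks one observed year and keeps that year's loading and zooplankton values together, and that one multiplier per variable is drawn for each simulated year; these details, and the function name, are assumptions rather than a description of the original code.

```python
import random

def sample_controls(load_series, zoo_series, n_years, rng):
    """Bootstrap yearly (load, zooplankton) pairs and add extra variability
    via uniform multipliers on [0.5, 1.5] and [0.1, 2.0]."""
    sample = []
    for _ in range(n_years):
        i = rng.randrange(len(load_series))           # resample a year with replacement
        load = load_series[i] * rng.uniform(0.5, 1.5)
        zoo = zoo_series[i] * rng.uniform(0.1, 2.0)
        sample.append((load, zoo))
    return sample

# Made-up series; 20 spin-up years followed by 10 prediction years.
controls = sample_controls([10.0] * 5, [1.0] * 5, 30, random.Random(42))
spin_up, prediction = controls[:20], controls[20:]
```

Only the last ten years of each simulated run would then contribute to the predictive sample.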

The MC sample was used to calculate a density estimate for mean total phosphorus

and maximum summer Cyanobacteria biomass conditioned on a set of total phos-

phorus loading and zooplankton biomass (summer maximum) ranges. The levels of

external phosphorus loading and zooplankton biomass that could attain the target

summer maximum Cyanobacteria biomass with the given margin of safety (the 90th

percentile in this example) were then estimated on the basis of these calculations.


It is also easy to calculate all the necessary percentiles for average total phospho-

rus concentrations and summer maximum Cyanobacteria biomasses as a function of

the MC-sampled combinations of these parameters (Figure 4.22). Such results can

be used to find the optimal combination of TotP load reduction and zooplankton

biomass with the given range of certainty.

Figure 4.22: Estimated total phosphorus and summer maximum Cyanobacteria biomass percentiles (10%–90%) as a function of total phosphorus load and summer maximum grazing zooplankton biomass. (a) Mean total phosphorus percentiles as a function of total phosphorus load; (b) maximum summer Cyanobacteria biomass as a function of total phosphorus load (zooplankton biomass summer maximum fixed to a level of [30 50] mgC l−1); (c) maximum summer Cyanobacteria biomass as a function of zooplankton biomass (total phosphorus load fixed to the level [30 40] kg d−1); (d) summer maximum Cyanobacteria biomass 80% percentile as a function of total phosphorus load and summer maximum grazing zooplankton biomass. This response surface can be used to optimize nutrient load reduction and fisheries management.

In addition, the Bayesian network software HUGIN (www.hugin.com) was used (Saloranta

et al., 2004) to learn causal relationships and conditional probability tables based

on the Monte Carlo simulations of the lake model and a non-point load model

(Figure 4.23) and to estimate attainment of the designated water quality criterion

(Cyanobacteria summer maximum biomass < 0.86 mg l−1) with a set of designed


Figure 4.23: Impact diagram for management decisions in Lake Pyhäjärvi. This diagram combines the lake and catchment nutrient transport models and can be used to estimate statistical relationships with respect to the most important decision variables (rectangles) and their expected utilities (parallelogram) in terms of attainment or non-attainment of the water quality criterion (Cyanobacteria < 0.86 mg l−1).

management options: buffer strip width, wetland percentage, forestation percentage

and planktivorous fish management. The management options were implemented

by means of decision nodes and attainment of the water quality goal with a discrete

chance node and a utility node that relates a certain value to each of the states

of the parent nodes, in this case 1 for attainment and 0 for non-attainment of the

water quality goal.

The Bayes network, decision nodes and utility nodes together formed an impact


diagram which could be used to study management decisions and their expected

utilities in terms of Cyanobacteria summer maximum biomass and attainment of the

water quality criterion (Cyanobacteria < 0.86 mg l−1, Figure 4.23). The postulated

fisheries management scenario (catch of fish 6-12 kg ha−1) combined with moderate

catchment measures yielded a high probability (0.779) of attaining the water quality

criterion.
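With a utility of 1 for attainment and 0 for non-attainment, the expected utility of a management option equals its probability of attaining the criterion. The small sketch below makes this explicit in plain Python; it does not use the HUGIN software, and the second option and its probability are made up for comparison.

```python
def expected_utility(p_attain, utility_attain=1.0, utility_fail=0.0):
    """Expected utility of an option given its probability of attaining the goal."""
    return p_attain * utility_attain + (1.0 - p_attain) * utility_fail

def best_option(p_attain_by_option):
    """Return the name of the option with the highest expected utility."""
    return max(p_attain_by_option,
               key=lambda name: expected_utility(p_attain_by_option[name]))

options = {
    "fisheries management with moderate catchment measures": 0.779,  # from the text
    "hypothetical alternative": 0.5,                                  # made-up value
}
choice = best_option(options)
```

Other utility values, for example ones reflecting the costs of the measures, could be substituted without changing the structure.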


4.5.5 Finnish lakes

Prediction method

The predictive model for lake chlorophyll a (chla) concentrations was constructed

on the assumption that the parameters for all lakes of the same type are likely to

be similar. Therefore the estimates for these parameters can be expressed in terms

of a common prior distribution. In other words, it was assumed that lake-specific

model parameters are random variables representing a common distribution for the

lake type. Computationally, it is natural to model the data hierarchically. That

is, individual observations of chlorophyll a concentration are made conditional on

lake-specific parameter values, which are in turn conditional on lake-type-specific

parameters, which again are conditional on a parameter distribution for all lakes in

Finland (Figure 4.24). Details of the Bayesian hierarchical modelling approach can

be found in Gelman and Hill (2006). Qian et al. (2004) indicated that the use of a

hierarchical modelling approach to pool data from different sources often results in

reduced model uncertainty and improved accuracy in the parameters estimated.

The hierarchical linear model for chlorophyll a may be summarized as follows:

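\[
\begin{aligned}
\log(y_{ijk}) &\sim N(X\beta_{ij}, \tau^2) \\
X\beta_{ij} &= \beta_{0,ij} + \beta_{1,ij}\log(TP_{ijk}) + \beta_{2,ij}\log(TN_{ijk}) + \beta_{3,ij}\log(TP_{ijk})\log(TN_{ijk}) \\
\beta_{ij} &\sim N(\beta_i, \sigma_i^2) \\
\beta_i &\sim N(\beta, \sigma^2)
\end{aligned}
\tag{4.4}
\]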

where log(yijk) is the kth observed log(Chla) value from lake j of type i, X is the

matrix containing the observed total phosphorus (TP) and total nitrogen (TN)


Figure 4.24: Decision variables, prediction methods and predictions for water quality management in Finnish lakes.

values from lake j of type i, βij = [β0,ij, β1,ij, β2,ij, β3,ij] is the lake-specific model
parameter vector which consists of the intercept (β0,ij) and slopes for log(TP) (β1,ij),
log(TN) (β2,ij) and for the combined effect of log(TP) and log(TN) (β3,ij), τ² is
the model error variance, βi = [β0,i, β1,i, β2,i, β3,i] is a vector of the model parameter
means for lake type i, σ²i = [σ²0,i, σ²1,i, σ²2,i, σ²3,i] is a vector of variances in model
parameters between lakes of type i, and β = [β0, β1, β2, β3] and σ² = [σ²0, σ²1, σ²2, σ²3]
are the means and variances for lake types. Note that the hierarchical notation in
Equation 4.4 indicates conditional distributions, i.e. log(yijk) is normally distributed
conditionally on Xβij and τ², βij is normally distributed conditionally on βi and σ²i, and
βi is normally distributed conditionally on β and σ². The interaction term was added
to the model to account for the non-additive effects of total phosphorus and total
nitrogen.
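The conditional structure can be made concrete by sampling down the hierarchy, from the all-lakes level to a single observation. The sketch below does this for one value of log(Chla); it takes standard deviations rather than variances as inputs, and all argument names and numerical values are hypothetical.

```python
import math
import random

def simulate_log_chla(beta, sigma, sigma_i, tau, tp, tn, rng):
    """Draw one log(Chla) value by sampling down the hierarchy:
    all-lakes mean -> lake-type mean -> lake-specific coefficients -> observation.
    sigma, sigma_i and tau are standard deviations."""
    beta_type = [rng.gauss(b, s) for b, s in zip(beta, sigma)]          # lake type level
    beta_lake = [rng.gauss(b, s) for b, s in zip(beta_type, sigma_i)]   # lake level
    x = [1.0, math.log(tp), math.log(tn), math.log(tp) * math.log(tn)]  # design row
    mean = sum(b * xi for b, xi in zip(beta_lake, x))
    return rng.gauss(mean, tau)                                         # observation level

draw = simulate_log_chla(beta=[1.0, 0.5, 0.2, 0.0], sigma=[0.3] * 4,
                         sigma_i=[0.1] * 4, tau=0.2, tp=20.0, tn=500.0,
                         rng=random.Random(0))
```

Fitting the model reverses this direction: MCMC infers the coefficients at all three levels from the observed values.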

The Markov chain Monte Carlo (MCMC) simulation method (Gilks et al., 2001) was

used for estimating the distribution parameters simultaneously by sampling them

from their joint posterior distribution.

Prior distributions

The non-informative prior distributions of β, τ, σi and σ were:

\[
\begin{aligned}
\beta &\sim N(0, 10000) \\
\sigma_i, \sigma, \tau &\sim U(0, 100)
\end{aligned}
\tag{4.5}
\]

where N(0, 10000) is the normal distribution of β with mean 0 and variance 10,000

and U(0, 100) is the uniform distribution of σi, σ and τ with lower (0) and upper (100)

limits. The prior distributions for σi, σ, τ and β are considered non-informative

or vague. The width of the 95% credible interval for the prior distribution of β is

approximately ±200, i.e. it is practically flat in the region of interest. The stan-

dard non-informative prior distribution for a variance parameter is p(σ2) ∝ 1/σ2,

which arises from assuming that the log of the variance parameter has a uniform

distribution on (−∞, +∞). This prior distribution is improper, which could lead to

an improper posterior distribution. Instead, we used a uniform distribution for the

standard deviation, as suggested by Gelman et al. (2005).
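The statement that the prior for β is practically flat can be checked numerically. The sketch below, which uses only the Python standard library, computes the central 95% interval of N(0, 10000) and compares the prior density at a coefficient value of 10 with the density at 0. The choice of ±10 as a "region of interest" is an assumption made here for illustration, not a figure given in the text.

```python
from statistics import NormalDist

# A variance of 10,000 corresponds to a standard deviation of 100.
prior_beta = NormalDist(mu=0.0, sigma=100.0)

# Central 95% interval of the prior.
lower = prior_beta.inv_cdf(0.025)
upper = prior_beta.inv_cdf(0.975)

# Ratio of the prior density at beta = 10 to the density at beta = 0;
# a value close to 1 means the prior is nearly flat over that range.
flatness = prior_beta.pdf(10.0) / prior_beta.pdf(0.0)

print(f"95% prior interval: [{lower:.1f}, {upper:.1f}]")
print(f"density ratio at 10 versus 0: {flatness:.4f}")
```

The interval is close to the ±200 quoted above, and the density ratio stays near one, so the data rather than the prior should dominate the posterior for coefficients of moderate size.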

Model fitting and posterior predictive inference

The hierarchical chlorophyll a model was compared with the non-hierarchical, type-

specific dummy variable model and with the linear lake-specific model. Fits for

four selected lakes were computed to illustrate the differences between the models

and to show the effect of the sample size on fit and on the credible interval of the prediction. The selected lakes were Lake Onkilampi (shallow humic lake, type 8), Lake Nurmijarvi (large non-humic lake, type 1), Lake Kuhajarvi (shallow non-humic lake, type 7) and Lake Paijanne (large humic lake, type 2). The numbers of observations for these lakes were 3, 7, 22 and 265, respectively. The comparison was in general overwhelmingly in favour of the hierarchical model rather than the non-hierarchical, type-specific model. The median Chlorophyll a concentrations predicted using the hierarchical model were usually closer to the observed Chlorophyll a values than were the means predicted using the non-hierarchical dummy variable model (Figure 4.25), suggesting that the hierarchical model fits the data far better. This was also indicated by R², which was greater for the hierarchical model, while the deviance and DIC of the hierarchical model were smaller than those of the non-hierarchical dummy variable model, indicating that the increased number of parameters in the former was more than compensated for by the improved fit.

When using the non-hierarchical, lake type-specific dummy variable model, all the lakes within one type were treated as identical and their individual observations were pooled. This model represented a weighted average with weights proportional to the sample size of each lake, i.e. it was weighted heavily in favour of lakes with larger sample sizes. Consequently, the resulting model may be grossly biased as far as lakes with small sample sizes are concerned. This feature was clearly illustrated in the four selected lakes (Figure 4.25). The hierarchical model, by contrast, treated lakes of the same type as exchangeable and fitted lake-specific parameters for them, but these parameters were assumed to come from the same prior distribution, thereby pooling the information from similar lakes. This pooling of information reduced the bias at the lake level and reduced the error variance as well.
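The difference between complete pooling and hierarchical (partial) pooling can be illustrated with the textbook normal model with known variances, in which a lake's estimate is a precision-weighted compromise between its own mean and the type mean. This is a simplified stand-in for the full regression model, not the computation used in the study. The sample sizes are those of the four selected lakes, treated here as if they belonged to one type purely for illustration, and the two variances are hypothetical.

```python
def partial_pooling_weight(n_obs, tau2, sigma2_between):
    """Weight given to a lake's own mean in a normal-normal hierarchical model."""
    precision_data = n_obs / tau2            # precision of the lake's sample mean
    precision_prior = 1.0 / sigma2_between   # precision of the type-level distribution
    return precision_data / (precision_data + precision_prior)


def partially_pooled_mean(lake_mean, n_obs, type_mean, tau2, sigma2_between):
    """Shrink the lake mean towards the type mean according to its weight."""
    w = partial_pooling_weight(n_obs, tau2, sigma2_between)
    return w * lake_mean + (1.0 - w) * type_mean


sample_sizes = [3, 7, 22, 265]   # observations per lake, as reported in the text
tau2 = 0.25                      # hypothetical within-lake error variance
sigma2_between = 0.05            # hypothetical between-lake variance

# Complete pooling: each lake's share of the pooled average.
shares = [n / sum(sample_sizes) for n in sample_sizes]

# Partial pooling: how much each lake's estimate relies on its own data.
weights = [partial_pooling_weight(n, tau2, sigma2_between) for n in sample_sizes]

for n, s, w in zip(sample_sizes, shares, weights):
    print(f"n={n:3d}  share in pooled mean={s:.3f}  weight on own data={w:.3f}")
```

Under complete pooling the lake with 265 observations dominates the common estimate, whereas under partial pooling a lake with three observations still retains its own estimate, borrowing most of its information from the type-level distribution.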

The lake-specific non-hierarchical linear models were fitted using only data for a specific lake. Despite the better fit of the non-hierarchical lake-specific model relative to its counterparts, its error variance tended to be large when the sample size was small but decreased as the sample size increased (Figure 4.25).

Figure 4.25: Fit plot. 10 %, 50 % (circle) and 90 % percentiles of predicted Chlorophyll a concentration [µg l−1] as a function of the observed value for four selected lakes: a. Lake Onkilampi (shallow humic lake, type 8), b. Lake Nurmijarvi (large non-humic lake, type 1), c. Lake Kuhajarvi (shallow non-humic lake, type 7) and d. Lake Paijanne (large humic lake, type 2). The line at a 45° angle is the 1–1 line (perfect fit). Percentiles were calculated with the lake type-specific non-hierarchical model (type), the hierarchical linear model (hier) and the lake-specific non-hierarchical model (lake). 10 % and 90 % percentiles are connected with vertical lines (linear – grey, solid line). Axes: observed log(Chla) (x-axis) against predicted log(Chla) (y-axis).

Lake-specific 80% percentile contour lines for Lake Paijanne (large humic lake, type 2) simulated with the hierarchical model (Figure 4.26) revealed the usefulness of posterior simulations for water quality management. The simulations were confined to the observational ranges of total phosphorus and total nitrogen in large humic lakes (type 2, TP: 2–160, TN: 31–4400), whose upper limits lie above the lake-specific maximum values (TP: 150, TN: 2000). The simulation in Figure 4.26 included total nitrogen values outside the lake-specific observational ranges (TP: 6–150, TN: 300–2000), but extrapolation was reasonable in this hierarchical setting due to the pooling of information within and among the lake types. This was a distinct advantage compared with the non-hierarchical lake model, which can predict only within lake-specific observational ranges. This range can be limited for lakes with only a few observations. The contour lines for Lake Paijanne were parallel to the y-axis in the observational range, showing clear total phosphorus limitation of Chlorophyll a within this range. On the other hand, total nitrogen limitation seemed to prevail near the low total nitrogen boundary and in the high total phosphorus range. A lake manager would be able to read off from figures similar to Figure 4.26 the nutrient concentrations that comply with Chlorophyll a standards at a given probability level.
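Reading a target nutrient concentration off a percentile contour amounts to a simple computation on posterior draws. The sketch below simulates the posterior predictive distribution of Chlorophyll a at a given TP and TN, takes its 80 % percentile and searches a TP grid for the largest value that still meets a standard. The posterior draws are synthetic placeholders generated on the spot, and the function names, the standard of 15 µg l−1 and the fixed TN of 1000 µg l−1 are assumptions for illustration; none of these are results from the study.

```python
import math
import random


def predictive_percentile(draws, tp, tn, prob, rng):
    """Percentile of the posterior predictive Chla distribution at (tp, tn)."""
    log_tp, log_tn = math.log(tp), math.log(tn)
    sims = []
    for b0, b1, b2, b3, tau in draws:
        mean = b0 + b1 * log_tp + b2 * log_tn + b3 * log_tp * log_tn
        # One predictive draw per posterior draw, back-transformed from the log scale.
        sims.append(math.exp(rng.gauss(mean, tau)))
    sims.sort()
    return sims[min(len(sims) - 1, int(prob * len(sims)))]


def max_tp_meeting_standard(draws, tn, standard, tp_grid, prob, seed=0):
    """Largest TP on the grid whose predictive percentile does not exceed the standard."""
    best = None
    for tp in tp_grid:
        rng = random.Random(seed)   # same random numbers at every grid point
        if predictive_percentile(draws, tp, tn, prob, rng) <= standard:
            best = tp
    return best


# Synthetic stand-in for MCMC output: (beta0, beta1, beta2, beta3, tau) per draw.
gen = random.Random(42)
draws = [(gen.gauss(-1.5, 0.1), gen.gauss(1.0, 0.05), gen.gauss(0.05, 0.02),
          gen.gauss(0.0, 0.002), abs(gen.gauss(0.4, 0.02))) for _ in range(2000)]

target_tp = max_tp_meeting_standard(draws, tn=1000.0, standard=15.0,
                                    tp_grid=range(5, 155, 5), prob=0.8)
print("largest TP meeting the standard:", target_tp)
```

Repeating the search over a grid of TN values would trace out one contour line of the kind shown in Figure 4.26.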

The effects of total phosphorus and total nitrogen were also illustrated in the predictive plots (Figure 4.27). The simulated Chlorophyll a increased with total phosphorus, but not very much with total nitrogen. The 10%–90% percentile predictive intervals seemed rather wide at first glance, but they are credible intervals for individual observations, and such intervals are always wider than the commonly presented confidence interval for the fitted mean. The predictive distribution is directly related to the process of lake eutrophication assessment, while the fitted mean is not.

As the co-linearity of total phosphorus and total nitrogen makes it difficult to determine their effects on Chlorophyll a from the estimated slopes alone, posterior simulations for Lake Paijanne (large humic lake, type 2) (Figures 4.26 and 4.27) were calculated. These showed very clear total phosphorus limitation within the observational range, indicating accurate separation of the effects despite the high correlation (0.7) between the coefficients β1 and β2. The co-linearity was not transferred to the predictions.

Figure 4.26: 80 % percentile contour lines for predicted Chlorophyll a concentrations in Lake Paijanne (large humic lake, type 2) at 15, 30, 60, 120 µg l−1 as a function of observed total phosphorus and total nitrogen concentrations [µg l−1]. The predictions were simulated with the hierarchical linear model. Numbers are observed Chlorophyll a concentrations [µg l−1]. Axes: TP (x-axis, 0–150 µg l−1) and TN (y-axis, 0–2000 µg l−1).

Figure 4.27: Predicted chlorophyll a concentration [µg l−1] as a function of total phosphorus and total nitrogen concentration [µg l−1] for Lake Paijanne (large humic lake, type 2), predicted with the hierarchical linear model (50 % percentile: dotted line; 10 %–90 % percentile credible interval: solid lines). Total nitrogen is kept constant (50 % percentile) while total phosphorus is varied within the observed range, and vice versa. Axes: TP or TN (x-axis) against predicted Chla (y-axis), all in µg l−1.

5 Attainment of prediction objectives

5.1 Case-specific objectives

The objectives of prediction differed greatly between the cases studied here (Table 5.1). The aim at first was to predict water quality responses to reductions in pollutant loading and to plan management actions as efficiently and realistically as possible. Later on, Bayesian inference and MCMC methods were adopted for estimating prediction errors without the linearization (Figure 1.2) typical of classical least-squares methods. The small size of the water quality samples in the available lake monitoring data suggested that complex mechanistic models would prove inefficient for practical river basin management purposes despite the ease of applying MCMC methods, and therefore Bayesian inference and MCMC methods were applied only to the water quality submodels at first. Later on, the nutrient model was fitted to data from Lake Pyhajarvi by Bayesian methods and was combined with the phytoplankton model to predict Cyanobacteria biomass as a function of nutrient load. A hierarchical chlorophyll a model for the Finnish lakes was developed in order to meet the need for a simple and efficient prediction method for use in river basin management.

Table 5.1: Attainment of case-specific prediction objectives

Case: Objective

Lappajarvi
– Calculation of target phosphorus load given a chlorophyll a standard
– Setting up criteria for restoration dredging

Kymijoki
– Prediction of dioxin migration during and after the planned restoration dredging

Tuusulanjarvi
– Adaptive, real-time control of artificial oxygenation efficiency given a dissolved oxygen standard and an acceptable probability of exceeding it
– Pooling of cross-sectional information
– Realistic error estimation

Pyhajarvi
– Posterior predictive inference of target nutrient concentration and zooplankton biomass given an algae biomass standard and an acceptable probability of exceeding it
– Pooling of cross-sectional information
– Realistic error estimation

Finnish lakes
– Posterior predictive inference of target phosphorus loading given a chlorophyll a standard and an acceptable probability of exceeding it
– Pooling of cross-sectional information
– Realistic error estimation

5.2 Efficiency in river basin management

The applications of the prediction methods developed here will be evaluated in the following paragraphs on the basis of the case studies and the criteria for predictive scientific theories as summed up by Peters (1991). The selected criteria highlight the efficiency and predictive power of a scientific theory and reject exhaustive causal explanation of the natural processes involved as an objective for useful scientific theories. It was assumed that water quality prediction is ideally based on such theories. The criteria of Peters (1991) were:

• Relevance - focus on the management question in hand.

• Practicability - direct applicability to decision making.

• Generality - a small number of loose preconditions and applicability across a

greater range of predictor variables.

• Efficiency of effort - amount of information obtained with the least effort.

Effort is the cost required to perform the measurements and make and apply

the predictions.

• Heuristic power - capability for inspiring debate on management options.

• Quantification - ease of deciding the accuracy and precision of the predictions.

• Accuracy - similarity between predicted and measured mean values.

• Precision - narrowness of the confidence interval (credible interval in Bayesian

terms).

• Immediacy - a small number of intermediates necessary for relevant predic-

tion.

• Simplicity - minimization of mathematical treatment and structure.

The successes and failures of the predictions in the five management cases studied

here were evaluated according to the preceding criteria (Table 5.2). The majority

of the predictions did indeed focus on management questions (relevance), but the

chlorophyll a prediction failed in this respect because the link between nutrient

concentrations and nutrient loads that would have been needed for targeting nutrient

load reduction in Finnish lakes was lacking.

The direct applicability of the mechanistic prediction models for Lake Lappajarvi

and the River Kymi to decision making (practicability) was limited by the lack of

proper error estimates. Sensitivity analysis was used as a surrogate to rule out the

risk of the migration of dioxin in the case of restoration dredging. The other predic-

tions entailed comprehensive and realistic error estimates, but the applicability of

the predictions for Finnish lakes was limited for the same reasons as their relevance.

A small number of loose preconditions and applicability over greater ranges of the predictor variables (generality) were distinctive in the case of oxygen prediction in Lake Tuusulanjarvi and chlorophyll a prediction in the Finnish lakes. The other predictions were derived from a number of initial and boundary conditions and parameter values.

The highest amount of information for the least observational and computational

cost (efficiency) was obtained using the simple prediction models for Lake Tuusu-

lanjarvi, Lake Pyhajarvi and the Finnish lakes, whereas the complex mechanistic

models yielded information but at a substantial computational cost.

The capability for inspiring debate on management options (heuristic) was accept-

able in the case of all the predictions, though the complexity of the mechanistic

models for Lake Lappajarvi and the River Kymi made it hard for non-specialists to

track the entire causal chain.

The accuracy (similarity between predicted and observed values) of all the predic-

tions was reasonable, but precision (narrowness of the credible interval) was not

estimated for the mechanistic models for Lake Lappajarvi and the River Kymi, so

that the quantification of predictions could be said to have been insufficient. In

contrast, the precision of the phytoplankton predictions for Lake Pyhajarvi and the

chlorophyll a predictions for the Finnish lakes was low, but the most important

thing was that it was estimated realistically.

The criteria of immediacy (a small number of intermediates necessary for relevant

prediction) and simplicity (minimization of mathematical treatment and causal de-

duction) were well implemented in the simple predictions for Lake Tuusulanjarvi,

Lake Pyhajarvi and the Finnish lakes, whereas the complex causal models for Lake

Lappajarvi and the River Kymi included many intermediary variables and processes.

The predictions for Lake Lappajarvi and the River Kymi had the lowest sum of scores

(Table 5.2), but the simulation of the mass and energy balances and the large amount

of information included in them were of relevance for river basin management. By

contrast, the simple models for Lake Tuusulanjarvi, Lake Pyhajarvi and the Finnish

lakes together with the Bayesian inference and MCMC sampling methods resulted in

better scores. In the end, the criteria for predictive scientific theories proved to be a

useful guide for the development of prediction methods for river basin management.

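Table 5.2: General efficiency of predictions. Scores (1 = success, 0 = fail) are assigned for the selected criteria of predictive scientific theories (Peters, 1991).

| Characteristic | Lappajarvi | River Kymi | Tuusulanjarvi | Pyhajarvi | Finnish lakes |
|---|---|---|---|---|---|
| Pred. variable | Chla | Dioxin | O2 | Algae | Chla |
| Stat. inference | - | - | MCMC | MCMC | MCMC |
| Number of param. | 53 | 33 | 4/year (69) | 11/species (72) | 4/lake (9206) |
| Number of variables | 11 | 4 | 1 | 4 | 1 |
| Number of factors | 8 | 7 | 2 | 6 | 2 |

| Criterion | Lappajarvi | River Kymi | Tuusulanjarvi | Pyhajarvi | Finnish lakes |
|---|---|---|---|---|---|
| Relevance | 1 | 1 | 1 | 1 | 0 |
| Practicability | 0 | 0 | 1 | 1 | 0 |
| Generality | 0 | 0 | 1 | 0 | 1 |
| Efficiency | 0 | 0 | 1 | 1 | 1 |
| Heuristic | 1 | 1 | 1 | 1 | 1 |
| Quantification | 0 | 0 | 1 | 1 | 1 |
| Accuracy | 1 | 1 | 1 | 1 | 1 |
| Precision | - | - | 1 | 0 | 0 |
| Immediacy | 0 | 0 | 1 | 1 | 1 |
| Simplicity | 0 | 0 | 1 | 1 | 1 |
| Sum | 3 | 3 | 10 | 8 | 7 |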
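The sums in Table 5.2 can be re-derived directly from the individual scores. The short Python sketch below stores the criterion scores as given in the table and adds them up per case. Treating the two "-" entries for precision as missing, so that they contribute nothing to the sum, is an assumption made here; the variable and function names are illustrative.

```python
CASES = ["Lappajarvi", "River Kymi", "Tuusulanjarvi", "Pyhajarvi", "Finnish lakes"]

# Scores from Table 5.2; None marks a "-" entry (precision was not estimated).
SCORES = {
    "Relevance":      [1, 1, 1, 1, 0],
    "Practicability": [0, 0, 1, 1, 0],
    "Generality":     [0, 0, 1, 0, 1],
    "Efficiency":     [0, 0, 1, 1, 1],
    "Heuristic":      [1, 1, 1, 1, 1],
    "Quantification": [0, 0, 1, 1, 1],
    "Accuracy":       [1, 1, 1, 1, 1],
    "Precision":      [None, None, 1, 0, 0],
    "Immediacy":      [0, 0, 1, 1, 1],
    "Simplicity":     [0, 0, 1, 1, 1],
}


def total_scores(scores, n_cases):
    """Sum the criterion scores per case, skipping missing entries."""
    totals = [0] * n_cases
    for row in scores.values():
        for k, value in enumerate(row):
            if value is not None:
                totals[k] += value
    return totals


totals = total_scores(SCORES, len(CASES))
for case, total in zip(CASES, totals):
    print(f"{case}: {total}")
```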

6 Discussion

6.1 Significance of the developed prediction methods

Implementation of the EU Water Framework Directive has initiated unparalleled administrative preparations for restoring surface waters to a good ecological status. Legislative demands for the sustainable use of surface waters have increased, and river basin planning as adopted in Finland under the EU Water Framework Directive has altered the objectives and implementation of water quality management. New pollutant load controls that achieve the enhanced standards have had to be planned, approved, executed and updated every six years in the catchment areas of hundreds of lakes and rivers.

On the other hand, the efficiency of water quality predictions representing different

levels of mechanistic and statistical sophistication has not been examined system-

atically before, and water quality predictions using Bayesian posterior predictive

inference and MCMC methods have rarely been implemented in river basin plan-

ning or decision making (Adams, 1998). From now on, these prediction methods can

be applied to river basin planning with a better knowledge of their capacity and lim-

its. The MCMC methods that were used for the posterior simulation of predictive

distributions will be particularly useful in the adaptive management suggested for

the implementation of the EU Water Framework Directive (Saloranta et al., 2003),

providing updated predictions that are directly applicable to the adaptation of river

basin management plans and decisions.

6.2 Benefits and limitations

Large predictive errors make river basin planning and decision making difficult, because without efficient prediction methods the risks remain beyond control and wrong decisions may be made (NRC, 2001). The Bayesian posterior predictive inference methods tested here enabled the prediction errors to be estimated more realistically than with classical least-squares methods and first-order error analysis.

Longitudinal (lake-specific) water quality predictions and river basin management

decisions are often based on either cross-sectional (observations from many lakes)

or longitudinal (observations from one lake) monitoring data, with the result that

they tend to be inaccurate or imprecise (wide credible intervals) (Qian et al., 2004).

Bayesian posterior predictive inference and hierarchical linear regression models were

used here as a remedy, to facilitate the pooling of cross-sectional information and

to make the lake-specific predictions more accurate and precise.

Simple statistical prediction is often fast, easy, inexpensive and the most effective

way of determining a predictive relationship (NRC, 2001; Brun et al., 2001; Reichert

and Vanrolleghem, 2001). Predictions made using overparametrized mechanistic

models without realistic error estimates did not score well in the present assessment

of efficiency (Table 5.2), showing their inability to ensure success in management.

The simple statistical predictions scored much better, indicating that complicated

mechanistic prediction methods are unreasonably difficult and expensive to apply.

Model confirmation is a very complicated issue and requires a considerable number

of observations, as it is believed that check runs with data not used in fitting the

model will offer the best means of revealing structural errors and limitations in a

mechanistic model. Besides the huge data requirements, the difficulty of coding,

fitting and validating mechanistic models reduces the efficiency of the mechanistic

modelling approach in a river basin management context. Moreover, measuring

campaigns need to be comprehensively and well designed in order to provide water

quality predictions that are relevant in river basin management (Kettunen, 1993;

Reichert and Vanrolleghem, 2001; Brun et al., 2001). Unfortunately, a comprehen-

sive validation of mechanistic models is a luxury that is seldom achieved, due to the

overparametrization of models with respect to given data and sample size (NRC,

2001; Reichert and Vanrolleghem, 2001; Brun et al., 2001). In contrast, the computational cost and data need per lake of analyzing the two thousand Finnish lakes with a hierarchical regression model were very low. Thus, simple models that are easily substantiated are preferable, as demonstrated here.

The implementation of MCMC runs may be more difficult, as there is always the question of convergence of the MCMC chain to consider. This question is particularly important when dealing with a model having a large number of parameters. Parameter correlation and overparametrization increase these problems still further (Haario et al., 2001, 2003, 2004). Nevertheless, adaptive MCMC methods speeded up convergence considerably.
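One common way of checking convergence is the potential scale reduction factor of Gelman and Rubin (1992), which compares the variance between several parallel chains with the variance within them; values close to 1 suggest that the chains have mixed. The sketch below is a minimal standard-library version for a single scalar parameter, applied to synthetic chains. It is offered as an illustration of the diagnostic, under the assumption of equal-length chains; this section does not state which convergence checks were actually applied in the case studies.

```python
import random
from statistics import mean, variance


def gelman_rubin(chains):
    """Potential scale reduction factor for equal-length chains of one scalar."""
    n = len(chains[0])
    chain_means = [mean(chain) for chain in chains]
    within = mean([variance(chain) for chain in chains])   # W: mean within-chain variance
    between = n * variance(chain_means)                    # B: between-chain variance
    var_hat = (n - 1) / n * within + between / n           # pooled variance estimate
    return (var_hat / within) ** 0.5


rng = random.Random(0)

# Synthetic chains: four chains sampling the same distribution...
well_mixed = [[rng.gauss(0.0, 1.0) for _ in range(500)] for _ in range(4)]
# ...and two chains stuck in different regions of the parameter space.
stuck = [[rng.gauss(0.0, 1.0) for _ in range(500)],
         [rng.gauss(3.0, 1.0) for _ in range(500)]]

r_hat_mixed = gelman_rubin(well_mixed)
r_hat_stuck = gelman_rubin(stuck)
print(f"well-mixed chains: {r_hat_mixed:.3f}")
print(f"stuck chains:      {r_hat_stuck:.3f}")
```

In a model with thousands of parameters the factor would be computed for each parameter separately, and the largest values would indicate where longer runs or a better proposal distribution are needed.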

7 Conclusions

This study provided a new, complete statistical error analysis method for water

quality prediction which facilitates realistic error estimation, the pooling of cross-

sectional information for the purposes of lake or river-specific prediction and the

updating of predictions. As a result, river basin planning can be based on efficient,

flexible and realistic prediction methods.

7.1 Main findings

The main findings were:

• The realistic estimation of error in predictions is a prerequisite for effective

river basin management.

• Realistic error estimates for mechanistic water quality predictions can be

obtained using Bayesian posterior predictive inference and MCMC sampling

methods.

• The accuracy and precision of water quality prediction can be improved us-

ing Bayesian inference and a hierarchical model which pools cross-sectional

information for lake-specific predictions.

• Bayesian inference and MCMC methods are no more difficult to implement

than classical statistical methods. Even models with large numbers of corre-

lated parameters can be fitted using modern computational methods.

• Simple empirical models are efficient for river basin management, indicating

that complex mechanistic models are unreasonably difficult and expensive to

apply.

7.2 Water quality prediction, monitoring and river basin management

Guidelines were set up for water quality monitoring, prediction and river basin

management in order to cope with a large number of lakes and rivers using relatively

small sample sizes.

River basin monitoring should be designed statistically using the prediction error

of the water quality model as an objective function. This will maximize the in-

formation value of water quality observations and minimize the prediction error.

In addition, national networks for monitoring diffuse pollutant loads should be es-

tablished instantly in order to meet the pressing needs of river basin management.

Without determined monitoring efforts, water quality predictions will be biased and

river basin management may fail to maintain the sustainable use of water resources.

Prediction is ideally implemented using Bayesian inference, MCMC methods and a simple hierarchical model. Existing mechanistic water quality models should be simplified in order to reduce their computational costs and large data requirements.

A river basin management decision should be based on a method of statistical in-

ference that takes account of all the prediction errors realistically. This will allow

progress towards the sustainable use and management of river basins to be efficiently

maintained.

7.3 Continuation of research

The potential of prediction has not yet been fully realized in river basin management.

For example, a prediction model can be used for the statistical design of observations

in order to maximize their information value (Kettunen, 1993). Disregard of this

possibility has resulted in inefficient observations and imprecise predictions. This

feature will be even more important for the updating of river basin plans every six

years. The continuous updating of predictions and river basin plans along with con-

tinuous monitoring constitutes an adaptive management procedure that facilitates

continuous learning and correction of the courses of action adopted on the way to

achieving agreed water quality goals. The statistical design of measurement proto-

cols should be integrated into water quality prediction and management in order to

galvanize the development of adaptive management strategies. The Bayesian poste-

rior predictive inference methods introduced in this study provide new possibilities

for this kind of development – not least for the statistical updating procedure, which

is an intrinsic part of Bayesian inference. The use of simulated response surfaces

should be developed further to allow response surface methods to be applied to river

basin planning and to make good use of past and future monitoring data. The use

of MCMC methods in the present instance was limited to water quality models of a

single water body, but in order to enhance the utility of predictions for river basin

management, their application should now be extended to cover entire river basins

and a wider range of restoration techniques.

The inferential statistics developed in this study help in drawing conclusions and making predictions on the basis of limited information, but statistical decision making can go further. In addition to prediction, it can help in revising management actions and monitoring programmes and in choosing among a number of alternative forms of river basin management. Its full potential (Raiffa and Schlaifer, 2000; Winkler, 2003) for river basin management remains to be explored.

References

Adams, B., 1998. Parameter distributions for uncertainty propagation in water qual-

ity modeling. Ph.D. thesis, Department of Environment, Duke University.

Annan, J. D., 2001. Modelling under uncertainty: Monte Carlo methods for tempo-

rally varying parameters. Ecological Modelling 136, 297–302.

Anonymous, 1984. Tuusulanjarven kunnostussuunnitelma. Keski-Uudenmaan

vesiensuojelun kuntainliitto., Vantaa, Finland, in Finnish: Restoration plans for

Lake Tuusulanjarvi.

Ascher, W., Overholt, W., 1983. Strategic Planning and Forecasting: Political Risk and Economic Opportunity. John Wiley & Sons.

Berthouex, P., Brown, L., 2002. Statistics for environmental Engineers. Lewis Pub-

lishers.

Borsuk, M. E., June 2001. A graphical probability network model to support water quality decision making for the Neuse River estuary, North Carolina. Ph.D. thesis, Duke University, Nicholas School of the Environment and Earth Sciences.

Borsuk, M., Higdon, D., Stow, C., Reckhow, K., 2001. A Bayesian hierarchical model to predict benthic oxygen demand from organic matter loading in estuaries and coastal zones. Ecological Modelling 143 (3), 165–181.

Bowie, G., Mills, W., Porcella, D., Campbell, C., Pagenkopf, J., Rupp, G., Johnson, K., Chan, P., Gherini, S., Chamberlin, C., 1985. Rates, constants, and kinetic formulations in surface water modeling. Tech. Rep. EPA/600/3-85/040, U.S. Environmental Protection Agency, ORD, Athens, GA, ERL.

Box, G. E. P., Tiao, G. C., 1973. Bayesian Inference in Statistical Analysis. Addison-Wesley.

Brun, R., Reichert, P., Künsch, H., 2001. Practical identifiability analysis of large

environmental simulation models. Water Resources Research 37 (4), 1015–1030.

Chapra, S., Reckhow, K., 1983. Engineering Approaches for Lake Management, Volume 2: Mechanistic Modeling. Butterworth Publishers.

Chapra, S. C., 1997. Surface water-quality modelling. WCB/McGraw-Hill, New

York.

Chow, V., 1959. Open-channel hydraulics. McGraw-Hill Book Company.

Clark, J., January 2006. Models for ecological data, In press., Princeton University

Press, Princeton, New Jersey, USA.

Cunge, J., Holly, F., Verwey, A., 1980. Practical aspects of computational river

hydraulics. Pitman Advanced Publishing Program.

Dyer, K., 1986. Coastal and estuarine sediment dynamics. John Wiley & Sons, New

York.

Frisk, T., 1989. Development of mass balance models for lakes. Ph.D. thesis, Helsinki

University, Helsinki, Finland.

Gamerman, D., 1997. Markov Chain Monte Carlo – Stochastic simulation for

Bayesian inference. Chapman & Hall.

Gelman, A., Carlin, J., Stern, H., Rubin, D., 2005. Bayesian Data Analysis. Chap-

man & Hall.

Gelman, A., Hill, J., 2006. Data analysis using regression and multilevel/hierarchical

models. Cambridge University Press.

Gelman, A., Rubin, D., 1992. Inference from iterative simulation using multiple

sequences. Statistical Science 7, 457–511.

Gilks, W., Richardson, S., Spiegelhalter, D., 2001. Bayesian Statistical Modelling.

Wiley, West Sussex, England.


Graf, W., 1971. Hydraulics of sediment transport. McGraw-Hill Book Company.

Haario, H., Laine, M., Lehtinen, M., Saksman, E., Tamminen, J., 2004. MCMC

methods for high dimensional inversion in remote sensing. Journal of the Royal

Statistical Society, Series B 66, 591–607.

Haario, H., Laine, M., Mira, A., Saksman, E., October 2003. DRAM: Efficient adap-

tive MCMC. preprint 374, University of Helsinki, Department of Mathematics.

Haario, H., Saksman, E., Tamminen, J., 1999. Adaptive proposal distribution for

random walk Metropolis algorithm. Computational Statistics 14, 357–395.

Haario, H., Saksman, E., Tamminen, J., 2001. An adaptive Metropolis algorithm.

Bernoulli 7 (2), 223–242.

Harmon, R., Challenor, P., 1996. A Markov chain Monte Carlo method for estima-

tion and assimilation into models. Ecological Modelling 101, 41–59.

Hastings, W., 1970. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57 (1), 97–109.

Helminen, H., Sarvala, J., 1997. Responses of Lake Pyhäjärvi (southwestern Finland)

to variable recruitment of the major planktivorous fish, vendace. Can. J. Fish. Aq.

Sci. 54, 32–40.

Huttula, T., 1994. Modelling the transport of suspended sediment in shallow lakes.

Ph.D. thesis, Department of Geophysics, University of Helsinki.

Jørgensen, S., 1980. Lake Management. Pergamon Press, Oxford.

Kettunen, J., 1993. Model-oriented data analysis with applications to lake and

soil water simulation. Ph.D. thesis, Helsinki University of Technology, Water Re-

sources Engineering, Espoo, Finland.

Kinnunen, K., Nyholm, B., Niemi, J., Frisk, T., Kyla-Harkka, T., Kauranne, T.,

1982. Water quality modelling of Finnish water bodies. Publications of the Water Research Institute 46, National Board of Waters.


Kokkonen, T., 1997. Parameter identification in groundwater models — Part I:

Bayesian approach to inverse groundwater problem. Licentiate thesis, Helsinki

University of Technology, Faculty of Civil and Environment Engineering, Helsinki,

Finland, 106 p.

Lehtoranta, J., Malve, O., 2001. Tuusulanjärven pohjasedimentin hapenkulutuskoe 26.4.–4.5.2001. Tech. rep., SYKE, Helsinki, in Finnish: Oxygen consumption rate of Lake Tuusulanjärvi sediments.

Lorenzen, M., Fast, A., 1977. A guide to aeration/circulation techniques for lake

management. Tech. Rep. EPA-600/3-77-004, Corvallis environmental research

laboratory, Office of research and development, U.S. Environmental protection

agency, Corvallis, Oregon 97330.

Manly, B., 2001. Statistics for environmental science and management. Chapman &

Hall/CRC.

Niemi, J., Heinonen, P., Mitikka, S., Vuoristo, H., Pietilainen, O.-P., Puupponen,

M., E., R., 2001. The Finnish Eurowaternet with information about Finnish water

resources and monitoring strategies. No. 445 in Finnish Environment Institute,

Environmental Protection, The Finnish Environment. Edita Ltd., Helsinki, Fin-

land.

NRC, 2001. Assessing the TMDL approach to water quality management. Tech. rep.,

National Academy Press, Washington D.C., National Research Council, Water

science and technology board, Division of earth and life studies.

Nyroos, H., 1994. Water quality assessment in water protection planning. Publi-

cations of the water and environment research institute 14, National Board of

Waters and the environment.

Omlin, M., Reichert, P., 1999. A comparison of techniques for the estimation of

model prediction uncertainty. Ecological Modelling 115, 45–59.


Orlob, G., 1983. Mathematical modeling of water quality: streams, lakes and reservoirs. John Wiley & Sons.

Peters, R., 1991. A critique for ecology. Cambridge University Press.

Qian, S., Donnelly, M., Schmelling, D., Messner, M., Linden, K., Cotton, C., 2004.

Ultraviolet light inactivation of protozoa in drinking water: a Bayesian meta-analysis. Water Research 38, 317–326.

Qian, S. S., Stow, C. A., Borsuk, M. E., 2002. On Monte Carlo methods for Bayesian

inference, to appear in Ecological Modelling.

Raiffa, Schlaifer, 2000. Applied statistical decision theory. Wiley Classics Library.

Rankinen, K., 2006. Analysis of inorganic nitrogen leaching in a boreal river basin

in northern Finland. Ph.D. thesis, Helsinki University of Technology.

Reckhow, K., Chapra, S., 1983. Engineering Approaches for Lake Management, Volume 1: Data Analysis and Empirical Modeling. Butterworth Publishers.

Reckhow, K. H., 2002. Bayesian approaches in ecological analysis and modeling. In:

Canham, C. D., Cole, J. J., Lauenroth, W. K. (Eds.), The Role of Models in

Ecosystem Science. Princeton University Press.

Reichert, P., Vanrolleghem, P., 2001. Identifiability and uncertainty analysis of the

River Water Quality Model No. 1. Water Science and Technology 43 (7), 329–338.

Saloranta, T., Kämäri, J., Rekolainen, S., Malve, O., 2003. Benchmark criteria: A

tool for selecting appropriate models in the field of water management. Environ-

mental Management 32 (3), 322–333.

Saloranta, T., Malve, O., Bakken, T., Ibrekk, A., Moe, J., December 2004. Lake

water quality models and benchmark criteria; delivery report from the lake model

work package (WP6) of the BMW-project. Tech. rep., Norwegian Institute for Water

Research, P.O.Box 173 Kjelsas, N-0411 Oslo, Norway.


Sarkkula, J., 1991. Measuring and modelling water currents and quality as a part

of decision making process for water pollution control. Ph.D. thesis, Tartu University, Tartu, Estonia.

Sarvala, J., Helminen, H., Karjalainen, J., 2000. Restoration of Finnish lakes us-

ing fish removal: changes in the chlorophyll - phosphorus relationship indicate

multiple controlling mechanisms. Verh. Internat. Verein. Limnol. 27, 1473–1479.

Sarvala, J., Helminen, H., Saarikari, V., Salonen, S., Vuorio, K., 1998. Relations

between planktivorous fish abundance, zooplankton and phytoplankton in three

lakes of differing productivity. Hydrobiologia 363, 81–95.

Sarvala, J., Jumppanen, K., 1988. Nutrients and planktivorous fish as regulators of

productivity in Lake Pyhäjärvi, SW Finland. Aqua Fennica 18, 137–155.

Scavia, D., 1980. Uncertainty analysis of lake eutrophication model. Ph.D. thesis,

Environmental and Water Resources Engineering in the University of Michigan.

Spiegelhalter, D., Best, N., Carlin, B., van der Linde, A., 2002. Bayesian measures

of model complexity and fit. Journal of the Royal Statistical Society, Series B 64,

583–639.

Spiegelhalter, D., Thomas, A., Best, N., Gilks, W., 1996. BUGS 0.5: Bayesian

Inference Using Gibbs Sampling Manual. Medical Research Council Biostatistics

Unit, Institute of Public Health, Cambridge, UK.

Streeter, V., 1958. Fluid mechanics. McGraw-Hill Book Company.

Svensson, U., 1977. A complete derivation of a turbulent model for environmental

fluid flows. Tech. Rep. 3007, Dept. of Water resources engineering, Lund Institute

of Technology, University of Lund, Lund.

Svensson, U., 1978. A mathematical model of the seasonal thermocline. Tech. Rep.

1002, Dept. of Water resources engineering, Lund Institute of Technology, Uni-

versity of Lund, Lund.


Svensson, U., 1986. Probe: an instruction manual. Oceanogr. 10, Swedish Meteorological and Hydrological Institute (SMHI), S-60176, Sweden.

Svensson, U., Axell, L., Sahlberg, J., Omstedt, A., December 2002. PROBE, Program for Boundary Layers in the Environment, System description and Manual,

Updated version. Computer-aided Fluid Engineering AB.

van Rijn, L., 1989. Handbook; sediment transport by currents and waves. H 461,

Delft Hydraulics.

Varis, O., 1991. Computational modeling of the environment with applications to

lake eutrophication. Ph.D. thesis, Helsinki University of Technology, Water Re-

sources Engineering, Espoo, Finland.

Ventelä, A.-M., Kirkkala, T., Sarvala, J., Mattila, H., 2001. Stopping the eutrophication process of Lake Pyhäjärvi. In: 9th International Conference on the Conservation and Management of Lakes, 11-16 November 2001, Otsu, Japan, Conference Proceedings, Session 3-1. pp. 485–488.

Vollenweider, R., 1976. Advances in defining critical loading levels for phosphorus

in lake eutrophication. Mem. Ist. Ital. Idrobiol. 33, 53–83.

Vollenweider, R., Kerekes, J., 1980. The loading concept as basis for controlling eutrophication: philosophy and preliminary results of the OECD programme on eutrophication. Prog. Wat. Tech. 12, 5–38.

Winkler, R., 2003. An introduction to Bayesian inference and decision, 2nd Edition.

Probabilistic Publishing.


Glossary

Bayesian (posterior predictive) inference is a branch of statistical inference

that permits the use of prior knowledge for assessing the probability of model

parameters in the presence of new data. Bayesian inference has been termed

’subjective’ inference, because it allows a certain subjectivity in the selection

of the prior distribution and the prior distribution can greatly affect the

posterior distribution (the results). Bayesian inference is also regarded as a

useful tool for the exploratory analysis of data and as a way of rigorously

comparing sets of assumptions. The use of prior distributions nevertheless

necessarily implies a greater responsibility on the part of the researcher to

ensure that no unintentional biases are introduced into the results through

such prior distributions.

Burn-in is needed in Markov chain Monte Carlo sampling, where the sampled values are not independent and the early part of the chain still depends on its starting point. During a ’burn-in’ period, the chain converges towards the target distribution. Samples of parameters taken after the ’burn-in’ period are used to estimate the posterior distribution.
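The effect of discarding a burn-in period can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the target is a standard normal density, the sampler is a plain random-walk Metropolis algorithm, and the starting value, step size and burn-in length are hypothetical choices rather than values used in this thesis.

```python
import math
import random


def log_target(x):
    """Unnormalized log density of a standard normal (illustrative target)."""
    return -0.5 * x * x


def metropolis(n_steps, start, step_size=1.0, seed=0):
    """Random-walk Metropolis chain; returns the list of sampled values."""
    rng = random.Random(seed)
    x = start
    chain = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step_size)
        # Accept with probability min(1, target(proposal) / target(x)).
        log_ratio = log_target(proposal) - log_target(x)
        if math.log(1.0 - rng.random()) < log_ratio:
            x = proposal
        chain.append(x)
    return chain


def mean(values):
    return sum(values) / len(values)


# Start far from the bulk of the target so that the transient is visible.
chain = metropolis(n_steps=5000, start=50.0)
burn_in = 1000  # hypothetical length; in practice judged from diagnostics

mean_all = mean(chain)
mean_kept = mean(chain[burn_in:])
print(f"mean with burn-in included:    {mean_all:.3f}")
print(f"mean after discarding burn-in: {mean_kept:.3f}")
```

The mean computed from the whole chain is pulled towards the arbitrary starting value, whereas the mean of the retained samples should lie close to the true value of zero.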

Ecological status Good ecological status is defined in Annex V of the Water

Framework Directive in terms of the quality of the biological community

and the hydrological and chemical characteristics. As no absolute standards

for biological quality can be set which apply across the Community, due to

ecological variability, only a slight departure from the biological community

which would be expected under conditions of minimal anthropogenic impact

is allowed.

Hierarchical model Hierarchical linear modeling (HLM), also known as multi-level analysis, is a more advanced form of multiple linear regression. ANOVA with random effects is a simple example of a hierarchical linear model. Multilevel analysis allows variance in outcome variables to be analysed at multiple hierarchical levels, whereas in multiple linear regression all effects are modelled as occurring at a single level. Thus, HLM is appropriate for use with lake water quality data which are nested within lake types or ecoregions.
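How nesting lets sparsely sampled lakes borrow strength from the others can be sketched with the simplest two-level normal model. This is an illustration under stated assumptions, not the model fitted in this thesis: the within-lake and between-lake standard deviations (`sigma`, `tau`) are treated as known, whereas a full Bayesian analysis would estimate them, and the lake names and observations are hypothetical.

```python
def partial_pooling(lake_obs, sigma, tau):
    """Shrink each lake mean towards the overall mean.

    lake_obs: dict mapping lake name -> list of observations
    sigma:    within-lake standard deviation (assumed known)
    tau:      between-lake standard deviation (assumed known)
    """
    lake_means = {k: sum(v) / len(v) for k, v in lake_obs.items()}
    # Each lake mean varies around the population mean mu with variance
    # tau^2 + sigma^2 / n; the inverse is used as a weight to estimate mu.
    prec = {k: 1.0 / (tau**2 + sigma**2 / len(v)) for k, v in lake_obs.items()}
    mu = sum(prec[k] * lake_means[k] for k in lake_obs) / sum(prec.values())

    estimates = {}
    for k, obs in lake_obs.items():
        n = len(obs)
        # The weight on the lake's own data grows with its sample size.
        w = (n / sigma**2) / (n / sigma**2 + 1.0 / tau**2)
        estimates[k] = w * lake_means[k] + (1.0 - w) * mu
    return mu, lake_means, estimates


# Hypothetical log chlorophyll a observations for three lakes.
data = {
    "lake_A": [2.1, 2.3, 2.0, 2.2, 2.4, 2.1, 2.2, 2.3],  # well sampled
    "lake_B": [3.4],                                      # single sample
    "lake_C": [1.2, 1.6],
}
mu, raw, pooled = partial_pooling(data, sigma=0.5, tau=0.6)
for lake in data:
    print(f"{lake}: raw mean {raw[lake]:.2f} -> pooled {pooled[lake]:.2f}")
```

The estimate for the lake with a single observation moves substantially towards the overall mean, while the well-sampled lake is left almost unchanged.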

MCMC sampling Markov chain Monte Carlo sampling is a stochastic algorithm for drawing samples from a posterior distribution so as to obtain an estimate of the distribution. MCMC generates samples from a probability distribution that is known only up to a normalizing constant. A typical example is the posterior distribution of model parameters. As the value of the unknown constant can be given as a multidimensional integral, an MCMC algorithm can also be seen as a way to evaluate high-dimensional integrals, a task which is computationally very demanding by other means.
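Why the normalizing constant is not needed can be seen from the acceptance rule of the Metropolis algorithm with a symmetric proposal. The symbols below are notation assumed here for illustration: \theta for the parameters, y for the data, \theta^{*} for a proposed value and Z for the normalizing constant.

```latex
p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{Z},
\qquad
Z = \int p(y \mid \theta)\, p(\theta)\, \mathrm{d}\theta ,

\alpha(\theta \to \theta^{*})
  = \min\!\left(1,\ \frac{p(\theta^{*} \mid y)}{p(\theta \mid y)}\right)
  = \min\!\left(1,\ \frac{p(y \mid \theta^{*})\, p(\theta^{*})}{p(y \mid \theta)\, p(\theta)}\right).
```

Since Z appears in both the numerator and the denominator of the ratio, it cancels, and only the product of the likelihood and the prior has to be evaluated at each step.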

Mechanistic model is a tool for water quality prediction. Mechanistic models

were first constructed in the 1970’s according to a causal understanding of the

phenomenon concerned and a mathematical process description. They were

sometimes accompanied by least-squares parameter estimates, approximate

first-order error analysis, Monte Carlo analysis or Kalman filtering. The

error term attached to the model was usually neglected in prediction, and

the lack of proper error estimates was compensated for by a comprehensive

mathematical process description.

Model calibration or fitting includes the selection of the model (its functional

form), the estimation of the model parameters as well as the errors, and their

validation. It is a part of the inferential statistics used to model patterns in

data, account for randomness and draw inferences regarding larger popula-

tions. In classical inferential statistics, point estimation involves the use of

sample data to calculate a single value which is to serve as a best guess for

an unknown population parameter. Point estimation should be contrasted

with Bayesian methods of estimation, where the goal is usually to compute

posterior distributions for the parameters and other quantities of interest.

The contrast here is between estimating a single point (point estimation),


versus estimating a probability density function.

Posterior distribution The posterior probability distribution (or posterior prob-

ability density) is the entity for which an MCMC analysis attempts to obtain

an estimate. The posterior distribution is the probability distribution over

the parameter state space, given the data and the chosen model.

Posterior predictive distribution is a posterior distribution on model predic-

tions given previous observations. It reveals all sources of uncertainty in

water quality prediction and can be simulated using Monte Carlo methods

based on the MCMC chain of model parameters and on the statistical

distribution of observed control variables.
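The simulation described above can be sketched as follows. The sketch rests on several assumptions that are not taken from this thesis: the model is a hypothetical linear relation between a log-transformed control variable and log chlorophyll a, the "chain" is synthetic and merely stands in for real MCMC output, and all numerical values are invented.

```python
import random

rng = random.Random(1)

# Stand-in for an MCMC chain of (intercept, slope, residual sd).
# In a real analysis these rows would come from the sampler.
chain = [
    (rng.gauss(0.5, 0.1), rng.gauss(0.9, 0.05), abs(rng.gauss(0.3, 0.02)))
    for _ in range(4000)
]


def predictive_draws(chain, x_mean, x_sd, rng):
    """One predictive draw of the response per posterior sample."""
    draws = []
    for a, b, sigma in chain:
        x = rng.gauss(x_mean, x_sd)            # uncertain control variable
        y = a + b * x + rng.gauss(0.0, sigma)  # model plus residual error
        draws.append(y)
    return draws


def quantile(sorted_values, q):
    """Simple empirical quantile by index (no interpolation)."""
    idx = min(int(q * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[idx]


draws = sorted(predictive_draws(chain, x_mean=3.0, x_sd=0.2, rng=rng))
lo, med, hi = (quantile(draws, q) for q in (0.05, 0.5, 0.95))
print(f"90% predictive interval: [{lo:.2f}, {hi:.2f}], median {med:.2f}")
```

Each draw combines three sources of uncertainty: the parameters, the control variable and the residual error. Dropping any one of them would give an interval that is too narrow.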

Posterior simulation entails performing repeated predictions with sampled pa-

rameter values from the posterior distribution and the distributions of mea-

sured environmental conditions. Posterior simulations of the effects of vari-

ous environmental conditions, i.e. the control variables of the lake model, are

valuable in river basin management.

Prediction Predicting a dependent variable using other explanatory descriptors

which can be manipulated experimentally, or which naturally exhibit envi-

ronmental variation. A predictive model is structured according to causal

relationships and process descriptions based on ecological theory and exper-

imental or observational data. In contrast, forecasting a dependent variable

using other explanatory descriptors is solely based on the extrapolation of

ecological structures in space and time and does not have to be based on

any law of nature and may be ecologically meaningless. Forecasting may still

be useful, although prediction is ideally based on causal relationships among

a small number of descriptors.

Prior distribution The prior probability distribution is the probability distribu-

tion over the parameter space prior to seeing the data. This represents the

prior assumptions made about the probabilities of different parameter values


before the data have been analysed. The prior distribution is combined with

the likelihood to yield the posterior distribution.
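The way in which the prior and the likelihood combine can be illustrated with the conjugate normal case, where the posterior is available in closed form. This is a minimal sketch under assumptions made here: the parameter is the mean of normally distributed observations with a known standard deviation, and the observations and prior settings are hypothetical.

```python
import math


def normal_posterior(prior_mean, prior_sd, obs, obs_sd):
    """Posterior of a normal mean with known observation sd (conjugate case)."""
    n = len(obs)
    prior_prec = 1.0 / prior_sd**2
    data_prec = n / obs_sd**2
    post_prec = prior_prec + data_prec
    ybar = sum(obs) / n
    # Posterior mean is a precision-weighted average of prior mean and data mean.
    post_mean = (prior_prec * prior_mean + data_prec * ybar) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)


# Hypothetical observations of a rate parameter.
obs = [0.42, 0.55, 0.47, 0.51]

vague = normal_posterior(prior_mean=0.0, prior_sd=10.0, obs=obs, obs_sd=0.1)
informative = normal_posterior(prior_mean=0.3, prior_sd=0.05, obs=obs, obs_sd=0.1)
print("vague prior       -> mean %.3f, sd %.3f" % vague)
print("informative prior -> mean %.3f, sd %.3f" % informative)
```

With the vague prior the posterior mean stays essentially at the sample mean, whereas the informative prior pulls it towards the prior mean and narrows the posterior, which is why the sensitivity of the results to the prior should be checked.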

River basin management A river basin is managed as a natural geographi-

cal and hydrological unit instead of according to administrative or political

boundaries. Under the EU Water Framework Directive a management plan

needs to be established for every river basin and updated every six years.

River basin management plan This is a detailed account of how the objec-

tives set for a river basin (ecological status, quantitative status, chemical

status and protected area objectives) are to be reached within the time scale

required. The plan should include the characteristics of the river basin, a

review of the impact of human activity on the status of the water in the

basin, estimates of the effects of existing legislation, the remaining “gap” to be closed in order to meet these objectives, and a set of measures designed

to fill that gap. Public participation is essential, i.e. all interested parties

should be fully involved in the discussion of the cost-effectiveness of the vari-

ous possible measures and in the preparation of the river basin management

plan as a whole.

Sampling is the main function of an MCMC run. An MCMC analysis generates

a series of samples from the posterior distribution. Selection of a suitable

sample for study or the act of measuring are also called sampling.

Statistical model is a parametrized set of probability distributions which can be

used for statistical inference in river basin management.

Target pollutant load is the flux of a polluting substance into a lake or a river

that has a given probability of protecting a selected water quality standard.

It should ideally be estimated using statistically designed observational data,

a water quality model and inferential statistics.
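A target load defined in this probabilistic way can be found by simulation. The sketch below is an illustration only and makes several assumptions of its own: concentration is taken to respond linearly to the load, the posterior samples of the response coefficient are synthetic, and the standard, the required probability and the candidate loads are hypothetical numbers.

```python
import random


def compliance_probability(load, theta_samples, sigma_samples, standard, rng):
    """Fraction of predictive draws that meet the standard at a given load."""
    ok = 0
    for theta, sigma in zip(theta_samples, sigma_samples):
        conc = theta * load + rng.gauss(0.0, sigma)
        if conc <= standard:
            ok += 1
    return ok / len(theta_samples)


def target_load(theta_samples, sigma_samples, standard, prob, loads, seed=0):
    """Largest candidate load whose compliance probability is at least prob."""
    best = None
    for load in sorted(loads):
        # Re-seeding gives every load the same random errors, so the
        # estimated probability decreases monotonically with the load
        # as long as all theta values are positive.
        rng = random.Random(seed)
        p = compliance_probability(load, theta_samples, sigma_samples, standard, rng)
        if p >= prob:
            best = load
    return best


# Synthetic stand-in for posterior samples of the response coefficient
# (concentration per unit load) and of the residual standard deviation.
sample_rng = random.Random(2)
theta_samples = [sample_rng.gauss(2.0, 0.2) for _ in range(4000)]
sigma_samples = [1.0] * 4000

candidate_loads = [0.5 * i for i in range(31)]  # 0.0, 0.5, ..., 15.0
target = target_load(theta_samples, sigma_samples, standard=20.0,
                     prob=0.9, loads=candidate_loads)
print(f"largest load meeting the standard with 90% probability: {target}")
```

A deterministic calculation with the mean coefficient would allow a load of 10 units here; requiring a 90% probability of compliance gives a lower target because the prediction error is taken into account.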

Validation Runs with data that are located outside the range of variation of the

calibration data are used to confirm a model and to reveal structural errors


and limitations in it. In Bayesian analysis, the prior distribution is combined

with data to calculate the posterior distribution, from which inferences about

the parameters are made. The postulated probability model is never expected

to be entirely true but is chosen in the light of the available knowledge and

constructed with the simplest possible structure. It must therefore be tested

at each step in the investigation. Residual quantities are calculated and

sensitivity to prior distributions is tested in order to evaluate the probability

model critically and to suggest modifications.

Water Framework Directive “Directive 2000/60/EC of the European Parliament and of the Council establishing a framework for Community action in

the field of water policy” or in short, the EU Water Framework Directive

(WFD) was adopted on 23 October 2000, with the following key aims:

• to expand the scope of water protection to all waters, surface waters

and groundwater

• to achieve ”a good status” for all waters by a set deadline

• to implement water management based on river basins

• to introduce a ”combined approach” laying down emission limit values

and quality standards

• to involve citizens more closely

• to streamline the legislation

• to implement river basin management with reasonable costs.

Water quality criteria can be used to define a water quality standard, e.g. for

protection against pollutants with potential ecological effects. Biological cri-

teria, for example, describe the desired aquatic community for a water body

based on the numbers and kinds of organisms expected to be present. Nutri-

ent criteria are used to protect against nutrient over-enrichment and cultural

eutrophication. Sediment criteria describe the conditions necessary in order

to avoid the adverse effects of contaminated sediments.


Water quality standards form the foundation of water quality-based pollution

control. They define the goals for a water body by designating its uses, setting

criteria for protecting those uses and establishing provisions for protecting it

from pollutants.


Summary

River basin plans in the member states of the European Union are to be updated

every six years. To complete this enormous task efficiently, accurate and precise

water quality predictions and realistic error estimates have to be employed. These

will provide a better insight into the fate and influence of the pollutants for the

design, operation and optimization of river basin management.

Water quality prediction has traditionally been based on either mechanistic or statistical prediction models. Mechanistic models have mainly been used for hydraulics, while statistical models have mainly been used for biological and chemical processes. The statistical error analysis applied to the mechanistic models, using least-squares parameter estimation and first-order error analysis, was only approximate, however, and unrealistic. This meant that reconciliation of the methodologies was inefficient.

This thesis attempts to estimate the error in water quality predictions realistically

and to unify the mechanistic and statistical prediction methods using Bayesian pos-

terior predictive inference and MCMC sampling methods. By the same token, it

alters the paradigm of prediction and decision making from deterministic to sta-

tistical. These methods proved to be useful in the real time control of artificial

oxygenation devices and thus anticipated the efficiency of such an approach for the

adaptation of river basin plans every six years.

Water quality predictions are usually based either on a longitudinal lake-specific

sample or a cross-sectional sample from many lakes. Existing Finnish lake mon-

itoring data are a mixture of longitudinal and cross-sectional data. Lake-specific

predictions based on such data tend to be imprecise or inaccurate. This deficiency

was compensated for by using Bayesian inference methods and hierarchical models,

which enabled cross-sectional water quality data to be pooled efficiently in order to


ensure more accurate and precise lake-specific chlorophyll a prediction.

The evaluation of mechanistic, statistical and Bayesian prediction methods was

based on extensive data from five water quality management cases. First, the chemical and biological responses to pollutant loads and hydrological conditions were

modelled and predicted with a mechanistic lake and river model. Second, predictive

uncertainties in lake respiration and phytoplankton submodels were estimated using

Bayesian inference and MCMC sampling methods. This enabled deterministic wa-

ter quality predictions to be transformed into predictive distributions, which were

more useful for statistical decision making in the context of river basin management.

Third, on the basis of the predictions, targets were set for pollutant load reductions in the lakes studied here, together with a criterion for the restoration dredging of contaminated river sediments.

The main findings were:

• Realistic error estimation is a prerequisite for realistic decision making and

effective river basin management.

• Realistic estimates of the error entailed in mechanistic water quality pre-

dictions can be obtained using Bayesian posterior predictive inference and

MCMC sampling methods.

• The accuracy and precision of lake-specific chlorophyll a predictions based

on data from the Finnish lake monitoring network can be improved using a

hierarchical model structure.

• Bayesian inference and MCMC methods are no more difficult to implement

than classical statistical methods. Even models with a large number of cor-

related parameters can be fitted using modern computational methods.

• Simple empirical models are efficient for river basin management, indicating


that complex mechanistic models are unreasonably difficult and expensive to

apply.

Guidelines for water quality monitoring, prediction and river basin management

were set up to cope with a large number of lakes and rivers using relatively small

sample sizes. River basin monitoring should be designed statistically using the pre-

diction error of the water quality model as an objective function. This will maximize

the information value of water quality observations and minimize the prediction er-

ror. In addition, national networks for monitoring diffuse pollutant loads should be

established immediately in order to meet the pressing needs of river basin management.

Without determined monitoring efforts, water quality predictions will be biased and

river basin management may fail to maintain the sustainable use of water resources.

Prediction should ideally be implemented using Bayesian inference, MCMC methods and a simple hierarchical model. Existing mechanistic water quality models should be simplified to reduce their computational costs and large

data requirements. River basin management decisions should be based on a method

of statistical inference that takes account of all the prediction errors. This will en-

able the progress being made towards the sustainable use of water resources to be

efficiently maintained.


Yhteenveto

Huonokuntoisten vesistoalueiden hoitosuunnitelmat tullaan Euroopan unionin jasen-

valtioissa tarkistamaan kuuden vuoden valein. Tahan niita velvoittaa vesipuitedi-

rektiivi, joka hyvaksyttiin europarlamentissa 22. joulukuuta vuonna 2000. Sen seu-

rauksena Suomessakin vahvistettiin vuonna 2004 laki vesistoalueiden hoidon organ-

isoimisesta. Hoidettavia, huonokuntoisia vesistokohteita on satoja, ja niiden hoito-

suunnitelmat pitaa olla valmiina vuonna 2009. Hyva vedenlaatu naissa kohteissa

saavutetaan nopeimmin ja pienimmin kustannuksin, jos suunnittelussa kaytetaan

vedenlaadun ennusteita, joidenka ennustevirheet on arviointu realistisesti.

Perinteisesti vedenlaatuennusteet ovat perustuneet joko mekanistiseen tai tilastol-

liseen mallintamiseen. Laskentaintensiivisia mekanistisia malleja on kaytetty paaasi-

assa vesistojen virtaus- ja kulkeutmisongelmien ratkaisemiseen, kun tilastollisia menetelmien

kaytto on painottunut kemiallisten ja biologisten ilmioiden analysoimiseen. Mekanis-

tisten ja tilastollisten menetelmien yhdistaminen on ennusteiden realistisen vir-

hearvioinnin ja vesistojen hoidon tehokkuuden kannalta ensiarvoisen tarkeaa.

Tassa tyossa mekanistinen ja tilastollinen lahestymistapa yhdistettiin kayttaen Bayeslaista

posterior paattelya ja MCMC menetelmia. Saman aikaisesti ennustamisen ja vesisto-

jen hoitoon liittyvan paatoksenteon periaateet muuttuivat deterministisesta tilastol-

liseksi.

Vedenlaatuennusteet perustuvat yleensa joko pitkittaiseen tai poikittaiseen havainto-

otokseen ts. havaintoihin yhdesta tai monesta vesistosta. Sen sijaan jarvien seuranta-

aineistolle on tyypillista pieni jarvikohtainen havaintomaara suuresta joukosta jarvia.

Talloin jarvikohtaiset ennusteet ovat epatarkoja tai virheellisia. Ennusteiden laatua

saatiin parannettua hierarkisen mallirakenteen ja Bayeslaisin paattelymenetelmien

avulla.

Page 125: WATER QUALITY PREDICTION FOR RIVER BASIN …users.jyu.fi/~thuttula/JSS17_BIO2/thesis_malve_BIO01_jss17.pdfV Malve, O. and Qian, S. 2006. Estimating nutrients and chlorophyll a rela-tionships

125

Kehitetyt mekanistiset, tilastolliset ja Bayeslaiset vedenlaadun ennustemenetelmat

testattiin aineistolla viidesta vedenlaadun hoitotapauksesta. Ensiksi mekanistisilla

jarvi- ja jokimalleilla ennustettiin ravinnekuormitusten, kunnostusruoppauksen ja

hydrologisten osoluhteiden vaikutus vedenlaatuun. Seuraavassa vaiheessa jarven

happi- ja kasviplankton mallien ennustevirheet estimoitiin Bayes-paattelyn ja MCMC-

menetelman avulla. Nain mekanististen mallien pistemaiset ennusteet muutettiin

tilastollisiksi jakaumiksi, jotka ovat hyodyllisia vesistonhoidon tilastollisessa paatok-

senteossa. Lopuksi ennusteiden perusteella laskettiin kuormitusten ja pitoisuuksien

vahennystavoitteita ja asetettiin rajoitus kunnostusruoppauksen yhteydessa liikkelle

lahtevan ja dioksiinin likaaman sedimentin maaralle.

The main findings were:

• A realistic error estimate for a water quality prediction is a prerequisite for effective water management.

• A realistic error estimate for a mechanistic model can be calculated with Bayesian inference and the MCMC method.

• The bias and imprecision of lake-specific chlorophyll a predictions based on Finnish lake monitoring data can be reduced further with a hierarchical model structure.

• The computational implementation of Bayesian inference and the MCMC method was no more difficult than that of classical statistical methods. Even a water quality model with a large number of correlated parameters could be fitted to the observational data.

• The effectiveness of a simple empirical model in water quality prediction and water management planning shows that complex mechanistic models can sometimes be more expensive and laborious than necessary.


The work gave general guidelines for water quality monitoring and prediction and for river basin management. The guidelines paid particular attention to how to proceed when a large number of lakes or rivers must be managed on the basis of the information in a small data set. At its most effective, prediction can be carried out with Bayesian inference, the MCMC method and a hierarchical model structure. The design of water quality monitoring should be based on statistical inference and on minimizing the errors of the prediction model. In addition, a national monitoring network for diffuse loading should be established without delay and linked operationally with water quality prediction and with the planning and implementation of water management measures. This ensures that the information content of the monitoring data is sufficient for water quality prediction and for planning management measures. Decisions in water management should be made on statistical grounds that take all observation and prediction errors into account. In this way water management moves towards the water quality targets as efficiently as possible and the sustainable use of water resources is maintained.


Publication I

Malve, O., Huttula, T. and Lehtinen, K. 1991. Modelling of Eutrophication and Oxygen Depletion in the Lake Lappajarvi. In: Wrobel, L., Brebbia, C. (Eds.), Water Pollution: Modelling, Measuring and Prediction. Computational Mechanics Publications, pp. 111–124.

© 1991 Computational Mechanics Publications


Publication II

Malve, O., Salo, S., Verta, M. and Forsius, J. 2003. Modelling the transport of PCDD/F compounds in a contaminated river and possible influence of restoration dredging on calculated fluxes. Environmental Science and Technology, Vol. 37(15), pp. 3413–3421. DOI: 10.1021/es0260723

© 2003 ACS Publications


Modeling the Transport of PCDD/F Compounds in a Contaminated River and the Possible Influence of Restoration Dredging on Calculated Fluxes

OLLI MALVE,* SIMO SALO, AND MATTI VERTA

Finnish Environment Institute, P.O. Box 140, FIN-00251, Helsinki, Finland

JOHN FORSIUS

Fortum Engineering Ltd, P.O. Box 10, 00048 Fortum, Finland

River Kymijoki, the fourth largest river in Finland, has been heavily polluted by pulp mill effluents as well as by the chemical industry. Loading has been reduced considerably, although remains of past emissions still exist in river sediments. The sediments are highly contaminated with polychlorinated dibenzo-p-dioxins (PCDDs), polychlorinated dibenzofurans (PCDFs), polychlorinated diphenyl ethers (PCDEs), and mercury originating from production of the chlorophenolic wood preservative (Ky-5) and other sources. The objective of this study was to simulate the transport of these PCDD/F compounds with a one-dimensional flow and transport model and to assess the impact of restoration dredging. Using the estimated trend in PCDD/F loading, downstream concentrations were calculated until 2020. If contaminated sediments are removed by dredging, the temporary increase of PCDD/F concentrations in downstream water and surface sediments will be within acceptable limits. Long-term predictions indicated only a minor decrease in surface sediment concentrations but a major decrease if the most contaminated sediments close to the emission source were removed. A more detailed assessment of the effects is suggested.

Introduction

River Kymijoki, the fourth largest river in Finland, has been heavily polluted by pulp mill effluents as well as by the chemical industry. Loading has been reduced considerably, although remains of past emissions still exist in river sediments. The objective here was to model the transport of sediments and dioxins and to assess the impact of restoration dredging on sediment and contaminant transport.

During the 1990s, the sediments were recognized to be highly contaminated with polychlorinated phenols (PCPs), polychlorinated dibenzo-p-dioxins (PCDDs), polychlorinated dibenzofurans (PCDFs), polychlorinated diphenyl ethers (PCDEs), and mercury (Hg) originating from production of the wood preservative Ky-5, chloralkali processes, and other sources (1, 2). High toxicity of sediment to exposed microorganisms and high frequencies of mentum deformities in midge (Chironomus spp.) larvae populations were measured in areas with high pollutant concentrations in sediment (2). Certain PCDE congeners as well as hepta- and octachlorinated dibenzofurans, all typical for Ky-5, predominated in river sediments, indicating that the production of this fungicide was the main source of contaminants in the river system (1, 3).

Production of Ky-5 in Kuusankoski (Figure 1) was begun in 1939. In all, 24 000 t of Ky-5 was manufactured from 1940 to 1984, from which an unknown amount of the product and impurities entered the river and finally the Gulf of Finland. The composition of the product and its impurities has been analyzed (4-7). The product consisted mainly of PCPs, PCDDs, and PCDFs; heptachlorinated dibenzofurans especially occurred as impurities. Toxic substances were released into the river in connection with washing of production instruments and with an explosion accident and the resulting fire fighting at the plant in 1960. Estimates of the total amounts of combined PCDDs and PCDFs in river and marine sediments, based on sampling and echo sounding of loose contaminated sediments, range from 4 000 to 5 000 kg of PCDD/Fs [16-21 kg as international toxicity equivalents (I-TEQ); the toxicity of a mixture of various PCDD/F congeners is expressed as the toxicity equivalent of 2,3,7,8-tetrachlorodibenzo-p-dioxin] in the contaminated area (3). This corresponds to the amount of 2,3,7,8-tetrachlorodibenzo-p-dioxin emitted into the atmosphere over Seveso in 1976 (8). Recently, it was estimated that the clearly river-impacted sedimentation area in the Gulf of Finland stretches for a distance of 75 km from the estuary (9). The total load to the Gulf of Finland attributed to the Ky-5 source was 1770 kg of PCDD/Fs or 12.4 kg WHO-TEQ. The surface sediments in the impacted area still contained 24-66% of the maximum concentrations present in the 1960s-1970s, depending on the site and sediment profile, showing that the river remains a significant PCDD/F source (9).

Materials

Study Area. The study area was a 130-km-long river stretch with branches between Lake Pyhajarvi and the Gulf of Finland, with the lake occupying only 2.1% of the area (Figure 1). There are 11 power plants and 6 rapids in the river stretch. The total drop is 50 m, and the mean bottom slope is small (0.0006). The drainage area of the Kymijoki River is 37 200 km² (lake percentage 18%) with only 3% (1 100 km²) running directly into the studied river stretch. Accordingly, 97% of the water running in the river stretch comes from upstream sources. The mean discharge at the downstream end of the river was 330 m³ s⁻¹. Loading of plant nutrients and suspended solids (SS) originates from 8 industrial wastewater treatment plants, some tributaries, and diffuse nonpoint sources.

Physical Properties of Bottom Sediments. The river bottom in the area consisted mainly of transport or erosion sites, which contain noncohesive soil or solid clay and silt. In the expansions of the river there were sedimentation pools, which were the main traps of PCDD/F compounds. Their combined area is small as compared with that of the transport or erosion site area. The bottom material in sedimentation areas was usually gyttja, clay, and silt with varying composition. Abundant wood debris and fibers originating from the pulp and paper industry have been found in the most contaminated areas in the upstream and central parts of the river stretch. Large amounts of timber were earlier transported in the river and kept for landing, causing unknown amounts of wood debris to enter the system. The loading of organic particulate material has decreased notably from the level that prevailed from the 1950s to the early 1980s. It appears that contaminated organic particulate materials accumulated earlier in the main sedimentation pool in Kuusankoski. At present, sediment is slowly decomposing, eroding, and migrating downstream. Transported sediment with the highest settling velocity has accumulated in downstream sedimentation pools, whereas smaller particles have migrated to the estuarine and sea area. Because of implemented hydrological regulation at power plants, sediments have not been exposed to high floods, and discharges have not increased. In the future, water construction projects and changes in river regulation can cause a risk of mobilization of PCDD/F compounds.

* Corresponding author e-mail: [email protected]; phone: +358-9-40300359; fax: +358-9-40300-391.

Environ. Sci. Technol. 2003, 37, 3413-3421

10.1021/es0260723 CCC: $25.00 © 2003 American Chemical Society. Environmental Science & Technology, Vol. 37, No. 15, 2003. Published on Web 07/01/2003

Historical Loading Records of PCDD/F Compounds. Variation in PCDD/F concentration in an age-determined (²¹⁰Pb dating) sediment profile from the river estuary (Ahvenkoskenlahti) and in historical production records of Ky-5 followed similar trends (unpublished data). PCDD/F concentrations in the sediment layers corresponding to the years from 1959 to 1969 increased 4-fold while Ky-5 production increased only 3-fold during the same period. Maximum PCDD/F concentration in the estuary occurred in a sediment layer corresponding to the years 1966-1972, whereas maximum Ky-5 production occurred during the 1970s. The explosion accident and fire in the Ky-5 plant occurred in 1960, which probably caused exceptionally high spillage of impurities at that time and subsequently contributed to the contaminant profile in river and estuarine sediments. After the closing down of the plant in 1984, the PCDD/F concentration in the sediment has remained almost unchanged with only a slight decrease. At present, the PCDD/F concentration in the surface sediment at this site is only about 25% lower than the maximum concentration in the sediment layer corresponding to the average for 1969. Evidently PCDD/F compounds are continuously transported downstream from the most contaminated area in Kuusankoski. This is also indicated by the PCDD/F concentrations in sediment traps collected in 1997-1998 from four sites along the river, which showed almost the same contaminant concentrations as the surface sediment at each site (3).

Location of the Most Contaminated Bottom Sediments. The most contaminated sediments, with the maximum surface (0-3 cm) concentration of 193 000 ng g⁻¹ (350 ng g⁻¹ I-TEQ) for PCDD/Fs, 1017 ng g⁻¹ for PCDEs, and 13.8 ng g⁻¹ for Hg in the dry sediment, are located between Kuusankoski and Keltti (Figures 1 and 2). From Kuusankoski to Anjalankoski (33 km), the maximum concentration lies between 3400 and 190 000 ng g⁻¹ (9.6-350 ng I-TEQ g⁻¹). Further downstream the concentration ranges from 120 to 1200 ng g⁻¹ (0.5-4.3 ng I-TEQ g⁻¹). In the estuarine and coastal areas, the range is 1-53 ng g⁻¹ (0.01-0.2 ng I-TEQ g⁻¹). The estimated total amount of contaminated sediments and PCDD/F compounds between Kuusankoski and Anjalankoski is 1 052 170 m³ and 2377 kg, respectively (i.e., roughly half of all PCDD/Fs in the river). The measured PCDD/F concentration in SS accumulated in sediment traps decreases exponentially from 21 900 ng g⁻¹ (dry weight) to 228 ng g⁻¹ in a longitudinal direction from Kuusankoski to the Gulf of Finland (3).

Analysis of Water and Sediments and Processing of Hydrological and Chemical Background Data Sets. Daily discharge, water level, runoff, and concentrations of suspended solids (SS) from several points on the river were available in a hydrological and water quality database of the Finnish Environment Institute (SYKE). Monthly values for SS from the industrial effluent point loading were collected from sewage treatment plants. Nonpoint loading was estimated from continuous runoff data and weekly sampled water quality data from two small representative catchments (30 and 178 km²) in the drainage area (1100 km²). The mean discharge at the upstream end of the modeled river stretch was 336 m³ s⁻¹. Runoff from the drainage basin was 9 l s⁻¹ km⁻². The mean concentrations of SS at the upstream and downstream ends of the modeled river stretch were 2.4 and 5.0 mg L⁻¹, respectively. Average SS fluxes from upstream, from industrial plants, from the drainage basin, and to the Gulf of Finland were 24 000, 7700, 29 400, and 61 100 t a⁻¹, respectively. Variation in fluxes was considerable in some years due to rainfall (Figure 3).
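The reported means can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the helper names are not from the paper, and it multiplies mean values, so it can only approximate fluxes that the study derived from time series.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def annual_flux_t(conc_mg_per_l: float, discharge_m3_per_s: float) -> float:
    """Annual mass flux (t a-1) from a mean concentration and a mean discharge."""
    # mg/L is numerically equal to g/m3, so the product is a flux in g/s.
    grams_per_second = conc_mg_per_l * discharge_m3_per_s
    return grams_per_second * SECONDS_PER_YEAR / 1e6  # g -> t


def direct_runoff_m3_per_s(specific_runoff_l_s_km2: float, area_km2: float) -> float:
    """Direct runoff (m3 s-1) from specific runoff (l s-1 km-2) and area (km2)."""
    return specific_runoff_l_s_km2 * area_km2 / 1000.0


# Values quoted in the text: 2.4 mg/L and 336 m3/s upstream,
# 9 l/s/km2 over the 1100 km2 direct drainage area.
upstream_flux = annual_flux_t(2.4, 336.0)
runoff = direct_runoff_m3_per_s(9.0, 1100.0)
runoff_share = runoff / 336.0

print(f"upstream SS flux ~ {upstream_flux:.0f} t/a")
print(f"direct runoff ~ {runoff:.1f} m3/s ({100 * runoff_share:.1f}% of upstream discharge)")
```

With these inputs the upstream flux comes out near 25 000 t a⁻¹, the same order as the reported 24 000 t a⁻¹, and direct runoff is about 3% of the upstream discharge, in line with the catchment share given for the study area.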

Methods

Modeling of Sediment and PCDD/F Transport in the Kymijoki River. River hydraulics and sediment transport of the 130 km long river stretch with 21 branches between Lake Pyhajarvi and the Gulf of Finland (Figure 4) were modeled with a one-dimensional (1-D) river model. The model was used to calculate time-series and longitudinal profiles of SS and PCDD/F concentrations in river water and bottom sediment. The results were used for evaluating the impact of dredging on transport of PCDD/F compounds.

In the 1-D unsteady river flow model, the full de Saint Venant equations (eq 1) were solved numerically with a finite difference method (10) in which Verwey's variant of the Preissmann implicit discretization scheme was used. The equations were solved with the double-sweep method, and the resistance term was calculated using the Manning approach, with the Manning number as an empirical constant. External boundary conditions that could be applied included discharge and water level as tabulated functions of time and discharge as a tabulated function of water level (a Q-y relationship). The model can be applied to a river with several dams or power plants in a row and to a tree-like or looped branching system. Inflow can consist of main river, tributaries and lateral inflow:

$$\frac{\partial y}{\partial t} + \frac{1}{b}\frac{\partial Q}{\partial x} = q$$

$$\frac{\partial Q}{\partial t} + \frac{\partial}{\partial x}\left(\frac{\beta Q^{2}}{A}\right) + gA\frac{\partial y}{\partial x} + \frac{gAQ|Q|}{K^{2}} = 0 \tag{1}$$

where y = water level (m) measured from a reference height, Q = discharge (m³ s⁻¹), b = width (m) of the river channel, q = direct inflow from the catchment area to the river channel (m³ s⁻¹), A = cross section area (m²) of the river channel, g = the acceleration of gravity (m s⁻²), K = water-carrying capacity of the river channel = n⁻¹ A R^(2/3), n = Manning coefficient, R = hydraulic depth (≈A/b), and x = longitudinal coordinate.

FIGURE 1. Map of the study area.

FIGURE 2. Sediment quality between Kuusankoski and Keltti power plants and horizontal distribution of contaminated mud sediments and measured flow fields in the upstream heavily contaminated area.

FIGURE 3. Fluxes of suspended solids in the modeled river stretch in 1980-1995.

FIGURE 4. Schematic map of modeled river stretch.
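The resistance term of eq 1 follows directly from the definitions above: K = n⁻¹ A R^(2/3) with R ≈ A/b, and the term itself is gAQ|Q|/K². A minimal sketch, assuming a hypothetical rectangular cross section (100 m wide, 5 m deep) that is not taken from the paper; the function names are illustrative, and n = 0.040 is one of the calibrated values in Table 2.

```python
def conveyance(n: float, area_m2: float, width_m: float) -> float:
    """Water-carrying capacity K = A * R**(2/3) / n, with R approximated by A / b."""
    hydraulic_depth = area_m2 / width_m
    return area_m2 * hydraulic_depth ** (2.0 / 3.0) / n


def friction_term(q_m3_s: float, n: float, area_m2: float,
                  width_m: float, g: float = 9.81) -> float:
    """Resistance term g*A*Q*|Q| / K**2 of the momentum equation (m3 s-2)."""
    k = conveyance(n, area_m2, width_m)
    # Q*|Q| instead of Q**2 keeps the sign, so friction opposes the flow
    # whichever way the water moves.
    return g * area_m2 * q_m3_s * abs(q_m3_s) / k ** 2


# Hypothetical cross section; discharge of the order reported for the river.
k_example = conveyance(n=0.040, area_m2=500.0, width_m=100.0)
f_down = friction_term(330.0, 0.040, 500.0, 100.0)
f_up = friction_term(-330.0, 0.040, 500.0, 100.0)
print(k_example, f_down, f_up)
```

The two calls with opposite discharge return terms of equal size and opposite sign, which is the reason the equation is written with Q|Q|.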

The 1-D sediment and contaminant transport model was used to calculate the convection, dispersion, sedimentation, and erosion of SS and PCDD/F with unsteady flow (10); the model was linked with the flow model. Mass flow of SS and pollutants into the river could be added as point or diffusive nonpoint loading. Convection, dispersion, sedimentation, and erosion of sediment and contaminant were solved numerically from the convection and dispersion equation (eq 2) with a double-sweep method. The boundary condition required was the known concentration value at the upstream end of each river stretch. The concentration in tributaries and in point and nonpoint loading was given as a tabulated function of time:

where C = concentration of suspended solids (SS) (mg L⁻¹) in the river water, Dx = dispersion coefficient (m² s⁻¹), CL = concentration of SS (mg L⁻¹) in the direct inflow from the catchment area, D/h = sedimentation term (mg L⁻¹ s⁻¹), E/h = erosion term (mg L⁻¹ s⁻¹), D = sedimentation rate (mg m⁻² s⁻¹), h = water depth (m), and E = erosion rate (mg m⁻² s⁻¹).

Sedimentation rate of SS in river water was calculated as a function of shear stress and settling velocity (eq 3):

where τ is shear stress between water and bottom (N m⁻²), τd is critical shear stress (there is no sedimentation if τ > τd), ws is settling velocity (m s⁻¹), H = h/2 is the mean height of falling (m), h is the water depth (m), D is the sedimentation rate (g m⁻² s⁻¹), and C is the concentration of SS (mg L⁻¹) or other concentration.

Erosion of bottom sediments was calculated as a function of shear stress (eq 4):

where τe is the critical shear stress (there is no erosion if τ < τe), K2 is the erosion coefficient, and E is the erosion rate (g m⁻² s⁻¹). Sedimentation and erosion cannot occur simultaneously (τd < τe).

Shear stress was calculated using eq 5:

where ρ is the density of water (1000 kg m⁻³), g is the acceleration of gravity (9.81 m s⁻²), v is the flow velocity (m s⁻¹), h is the water depth (m), and M is the Manning number.
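The three relations can be put side by side in code: eq 5 gives the shear stress as ρgv² divided by M²h^(1/3), eq 3 scales sedimentation with (1 - τ/τd)·ws·C/H, and eq 4 scales erosion with K2(1 - τe/τ). The following is a sketch under stated assumptions rather than the authors' implementation: the sedimentation term is returned as a non-negative magnitude (on the assumption that it acts as a sink in eq 2), the depth of 5 m and the Manning number M = 25 are hypothetical, and the SS concentration of 3 mg L⁻¹ is only an illustrative value.

```python
RHO_WATER = 1000.0  # density of water, kg m-3
G = 9.81            # acceleration of gravity, m s-2


def shear_stress(v: float, h: float, manning_number: float) -> float:
    """Eq 5: bottom shear stress (N m-2) from velocity, depth and Manning number."""
    return RHO_WATER * G * v ** 2 / (manning_number ** 2 * h ** (1.0 / 3.0))


def sedimentation_term(tau: float, tau_d: float, w_s: float,
                       conc: float, h: float) -> float:
    """Magnitude of D/h after eq 3; zero once tau reaches tau_d."""
    if tau >= tau_d:
        return 0.0
    fall_height = h / 2.0  # mean height of falling, H = h/2
    return (1.0 - tau / tau_d) * w_s * conc / fall_height


def erosion_term(tau: float, tau_e: float, k2: float) -> float:
    """E/h as written in eq 4; zero unless tau exceeds tau_e."""
    if tau <= tau_e:
        return 0.0
    return k2 * (1.0 - tau_e / tau)


# Hypothetical depth and Manning number; tau_d, tau_e (layer 1), ws, K2 from Table 1.
H_M, M = 5.0, 25.0
tau_slow = shear_stress(0.05, H_M, M)  # 5 cm/s
tau_fast = shear_stress(0.50, H_M, M)  # 50 cm/s
sed_slow = sedimentation_term(tau_slow, 0.084, 5e-5, 3.0, H_M)
ero_slow = erosion_term(tau_slow, 0.5, 0.1)
sed_fast = sedimentation_term(tau_fast, 0.084, 5e-5, 3.0, H_M)
ero_fast = erosion_term(tau_fast, 0.5, 0.1)
print(tau_slow, sed_slow, ero_slow)
print(tau_fast, sed_fast, ero_fast)
```

Because τd < τe, at most one of the two terms is non-zero for any given shear stress, which is how the model keeps sedimentation and erosion from occurring at the same time.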

The bottom sediment was divided into four layers with differing values of consolidation time TC and τe. The values for these constants were selected according to analyzed sediment properties. The calculated value of shear stress τ and selected values for critical shear stress τe and τd were used to determine the presence of erosion or sedimentation. The mass of sediment in layers i = 1, ..., 4 and in cross section j was integrated from the mass balance equation (eq 6). If the mass in the topmost layer decreased, the model determined the level of activity from the next lower layer. It was assumed that the material in a layer was transferred to a lower layer with a rate corresponding to the consolidation time:

where mi = mass of sediment (g m⁻²) in sediment layer i, TC,i = consolidation time of sediment layer i (s), i = sediment layer number, and j = cross section number.
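The layer bookkeeping of eq 6 can be illustrated with a single explicit Euler step per time increment. This is one possible reading of the scheme, not the authors' code: deposition is added to layer 1, erosion is taken from the topmost layer that still holds mass, the ">365 d" consolidation time of layer 4 is treated as infinite, and the explicit Euler integrator is a stand-in for whatever integration the model used. The 12 h step and the consolidation times are the Table 1 values.

```python
DAY = 86400.0  # seconds per day


def step_layers(masses, deposition, erosion, consolidation_s, dt_s):
    """Advance the layer masses (g m-2, top layer first) by one Euler step.

    deposition and erosion are rates in g m-2 s-1; consolidation_s holds
    the consolidation time of each layer in seconds.
    """
    # Topmost layer that still holds sediment (layer 1 if all are empty).
    top = next((i for i, m in enumerate(masses) if m > 0.0), 0)
    updated = []
    for i, mass in enumerate(masses):
        rate = -mass / consolidation_s[i]  # transfer to the layer below
        if i > 0:
            rate += masses[i - 1] / consolidation_s[i - 1]  # gain from above
        if i == 0:
            rate += deposition
        if i == top:
            rate -= erosion
        updated.append(max(mass + rate * dt_s, 0.0))  # masses stay non-negative
    return updated


# Table 1 consolidation times; layer 4 assumed not to consolidate further.
tc = [1 * DAY, 30 * DAY, 365 * DAY, float("inf")]
layers = [100.0, 0.0, 0.0, 0.0]
for _ in range(10):  # ten 12 h steps without deposition or erosion
    layers = step_layers(layers, 0.0, 0.0, tc, 0.5 * DAY)
print(layers, sum(layers))
```

With deposition and erosion switched off, the total mass stays constant while material moves from the top layer into the layers below at the rate set by the consolidation times.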

Govers and Krop (11) have used a thermodynamic lattice model to determine subcooled liquid vapor pressure and aqueous solubility, Henry's law constant, n-octanol-water and sediment-water partition coefficients, and lipid weight bioconcentration factors for 210 PCDD/Fs. The results confirmed that PCDD/Fs are poorly water soluble and that the solubility decreases with increasing chlorination level. The substances are increasingly sediment bound with increasing chlorination level. On this basis, PCDD/F compounds were assumed to migrate adsorbed on particulate matter, and the same convection and dispersion equation (eq 2) and sediment mass balance equation were used to calculate the transport of PCDD/F compounds.

TABLE 1. Scientifically Reported Sediment Parameter Values and Ranges Used in the Model (a)

parameter        unit         avg            min        max                       ref
ws               m s⁻¹        5 × 10⁻⁵ (b)   1 × 10⁻⁴   1 × 10⁻⁵                  12, 13, 18
                              1 × 10⁻⁵ (c)
τd               N m⁻²        0.084          0.1        9 × 10⁻⁴ (7 × 10⁻³) (d)   14-16, 18
K2               g m⁻² s⁻¹    0.1            0.01       0.5                       14, 18
τe, layer 1      N m⁻²        0.5            3          1 × 10⁻³ (8 × 10⁻³) (d)   14-16, 18
τe, layer 2      N m⁻²        1.0            3          1 × 10⁻³ (8 × 10⁻³) (d)   14-16, 18
τe, layer 3      N m⁻²        2.0            3          1 × 10⁻³ (8 × 10⁻³) (d)   14-16, 18
τe, layer 4      N m⁻²        10             10         10
TC,i, layer 1    d            1
TC,i, layer 2    d            30
TC,i, layer 3    d            365
TC,i, layer 4    d            >365

(a) Dispersion coefficient (Dx) was 20 m² s⁻¹, and calculation time step was 12 h. Parameter ranges were arranged to produce minimum and maximum SS concentration in the river water. (b) Calibrated value. (c) Calibrated settling velocity in Lake Tammijarvi. (d) The values in brackets were used to evaluate the sensitivity of calculated PCDD/F concentration in bottom sediments (Figure 12c).

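$$\frac{\partial C}{\partial t} + \frac{Q}{A}\frac{\partial C}{\partial x} - \frac{\partial}{\partial x}\left(D_x\frac{\partial C}{\partial x}\right) = \frac{q}{A}\,(C_L - C) - \frac{D}{h} + \frac{E}{h} \tag{2}$$

$$\frac{D}{h} = -\left(1 - \frac{\tau}{\tau_d}\right)\frac{w_s C}{H} \quad \text{if } \tau < \tau_d \tag{3}$$

$$\frac{E}{h} = K_2\left(1 - \frac{\tau_e}{\tau}\right) \quad \text{if } \tau > \tau_e \tag{4}$$

$$\tau = \frac{\rho g v^{2}}{M^{2} h^{1/3}} \tag{5}$$

$$\frac{dm_{i,j}}{dt} = D_j - E_j - \frac{1}{T_{C,i}}\,m_{i,j}, \qquad i = 1 \tag{6a}$$

$$\frac{dm_{i,j}}{dt} = D_j - E_j + \frac{1}{T_{C,i-1}}\,m_{i-1,j} - \frac{1}{T_{C,i}}\,m_{i,j}, \qquad i = 2, 3, 4 \tag{6b}$$

$$m_i \ge 0 \text{ always}, \qquad E_{i+1} = D_{i+1} = 0 \text{ if } m_i > 0$$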


Parameter Ranges. Transported sediment consists mainly of clay, silt, wooden debris, cellulose fibers, and humus. Average value and range for settling velocity (Table 1) were obtained from laboratory experiments conducted with samples taken from Myllykoski (Figure 1), and they were further specified with the literature (12, 13, 18) and calibration. Average values of critical shear stress (τe and τd) and their possible ranges were also obtained from the literature (14-16, 18). The τd value selected was 0.084 N m⁻². Erosion coefficient K2 was fixed to 0.1 g m⁻² s⁻¹ (14, 17, 18). The consolidation times TC,i in the first to fourth layers were 3, 30, 365, and >365 d, respectively. The selected τe values of consolidating sediment were 0.5, 1.0, 2.0, and 10.0 N m⁻², respectively. Flow velocities corresponding to selected τe and τd values were found to be normal. When considering the morphological dimensions and sediment characteristics of River Kymijoki and the given values of critical τ, calculated sedimentation was above zero if flow velocity was <10 cm s⁻¹. Calculated erosion of the first, second, and third sediment layers was initiated if flow velocity was over 20, 30, and 40 cm s⁻¹, respectively (Figure 5). Manning coefficients (Table 2) originated from previous efforts to study and model river ice formation and nutrient transport in the Kymijoki River.
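The velocity thresholds quoted above can be reproduced approximately by solving the shear stress relation (eq 5, τ equal to ρgv² divided by M²h^(1/3)) for v. A minimal sketch, assuming a water depth of 5 m and a Manning number M = 25; both are hypothetical stand-ins for the mean hydraulic dimensions of the river, which are not given in the text.

```python
import math


def threshold_velocity(tau_crit: float, h: float, manning_number: float,
                       rho: float = 1000.0, g: float = 9.81) -> float:
    """Flow velocity (m s-1) at which eq 5 yields a shear stress of tau_crit."""
    return math.sqrt(tau_crit * manning_number ** 2 * h ** (1.0 / 3.0) / (rho * g))


H_EXAMPLE = 5.0   # hypothetical water depth, m
M_EXAMPLE = 25.0  # hypothetical Manning number

# tau_d and the tau_e values of layers 1-3 as selected in the text.
v_sed = threshold_velocity(0.084, H_EXAMPLE, M_EXAMPLE)
v_ero = [threshold_velocity(t, H_EXAMPLE, M_EXAMPLE) for t in (0.5, 1.0, 2.0)]

print(f"sedimentation stops above ~{100 * v_sed:.0f} cm/s")
for layer, v in enumerate(v_ero, start=1):
    print(f"layer {layer} erosion starts above ~{100 * v:.0f} cm/s")
```

With these assumed dimensions the thresholds come out near 10, 23, 33, and 47 cm s⁻¹, the same order as the 10, 20, 30, and 40 cm s⁻¹ stated above; the exact figures depend on the depth and Manning number chosen.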

Data Availability. The hydrologic and hydraulic data (water level and discharge) needed to perform model calculations were observed at least daily or even more frequently in power plants and rapids. SS concentrations upstream and along the studied river stretch were observed frequently enough for calibration. Observations of direct runoff and corresponding SS concentration did not cover the whole catchment. Transported sediment was sampled with sediment traps in six locations, and PCDD/F concentrations in sediment were analyzed to be used in model verification. Direct measurements of historical PCDD/F loading from Kuusankoski were not available. The amount and variation of historical PCDD/F loading from the Ky-5 plant and from eroding sediments in Kuusankoski were estimated from a bottom sediment sample originating from the downstream end of the river in Ahvenkoskenlahti (Figure 1) under the following assumptions: (i) Ahvenkoskenlahti and Tammijarvi are so near each other that there is a time-invariant relation R′ between PCDD/F concentration in age-determined sediment slices in Ahvenkoskenlahti and in temporally corresponding Tammijarvi water samples. (ii) With the river model, PCDD/F concentration in Tammijarvi can be calculated with given boundary conditions and with PCDD/F loading from Kuusankoski. The estimation proceeded in the following steps: (i) The PCDD/F loading from Kuusankoski in 1997 (0.174 kg d⁻¹) was calculated as the product of the observed PCDD/F concentration of the sediment sample trapped in Keltti, the observed SS concentration in the river, and the discharge. (ii) The PCDD/F concentration in Tammijarvi was calculated with the given loading and the model. (iii) The ratio R′ was calculated using the PCDD/F concentration in the surface sediment slice in Ahvenkoskenlahti (52.89 ng g⁻¹) and the temporally corresponding average concentration in Tammijarvi calculated with the river model (1848 ng g⁻¹). (iv) The temporal variation of PCDD/F concentration in Tammijarvi was calculated as a product of R′ and the PCDD/F concentration of age-determined sediment slices. (v) The temporal variation of PCDD/F loading from Kuusankoski was estimated by optimizing the loading with a trial-and-error method so that the difference between PCDD/F concentrations in Tammijarvi, as calculated in step iv, and with the model was minimized. During the early calibration period (1970), the PCDD/F loading was 237 kg a⁻¹. The exponential regression line was fitted to the estimated loading data (Figure 6). Future trends describing the most probable routes of recovery were obtained from the extrapolated line.
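The arithmetic behind steps (i), (iii), and the extrapolation can be sketched as follows. The function names are illustrative. The exponential curve here is simply drawn through the two loadings quoted in the text (237 kg a⁻¹ in 1970 and 0.174 kg d⁻¹ in 1997); it is a stand-in for, not a reproduction of, the regression in Figure 6, which was fitted to the full series of estimated loadings.

```python
import math


def pcddf_loading_kg_per_d(conc_ng_per_g: float, ss_mg_per_l: float,
                           discharge_m3_per_s: float) -> float:
    """Step (i): load = particle-bound concentration x SS concentration x discharge."""
    ss_flux_g_per_s = ss_mg_per_l * discharge_m3_per_s  # mg/L equals g/m3
    ng_per_day = conc_ng_per_g * ss_flux_g_per_s * 86400.0
    return ng_per_day * 1e-12  # ng -> kg


def exponential_through(t0: float, y0: float, t1: float, y1: float):
    """Return y(t) = y0 * exp(-k * (t - t0)) passing through both points."""
    k = math.log(y0 / y1) / (t1 - t0)
    return lambda t: y0 * math.exp(-k * (t - t0))


# Step (iii): modeled Tammijarvi concentration over the surface-slice value,
# so that Tammijarvi concentration = R' x slice concentration (step iv).
r_prime = 1848.0 / 52.89

# Two-point exponential trend (illustrative assumption, see lead-in).
loading_1970 = 237.0 / 365.0  # kg a-1 -> kg d-1
loading = exponential_through(1970, loading_1970, 1997, 0.174)
loading_2005 = loading(2005)

# Unit check with round numbers: 1000 ng/g, 1 mg/L, 1 m3/s.
unit_check = pcddf_loading_kg_per_d(1000.0, 1.0, 1.0)

print(f"R' ~ {r_prime:.1f}")
print(f"two-point trend for 2005 ~ {loading_2005:.3f} kg/d")
```

The two-point curve gives roughly 0.12 kg d⁻¹ for 2005, which is close to the 0.11 kg d⁻¹ projected for that year in the dredging scenario.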

Model Confirmation. Parameter values and ranges obtained from the literature and from the laboratory sedimentation experiment gave feasible model results that were correct within an order of magnitude. Settling velocity (ws) of SS was calibrated with observations from 1980 to 1996. Calculated and observed SS concentrations were compared graphically. Observations from 1997 were used for model verification. Calculated and observed SS concentrations in the river and PCDD/F concentrations in the river and sediment trap samples were compared. Finally, the observed and calculated fluxes of PCDD/F were compared. Model sensitivity to sediment parameters was studied. Parameter values were varied within scientifically reported ranges (Table 1). Simulations were done with parameter values that would result in average, minimum, and maximum SS concentration in the river. Simulations that were relevant for assessment of dredging projects were compared graphically. Sensitivity analyses performed with parameter ranges give an outlook over model uncertainty.

FIGURE 5. Rate of sedimentation = S/ρ1 (cm a⁻¹) and erosion = E/ρi (cm d⁻¹) calculated with mean hydraulic dimensions of the Kymijoki River as a function of flow velocity. Density of the sediment layer ρ1 = ρ2 = 1330 kg m⁻³, ρ3 = ρ4 = 1990 kg m⁻³.

TABLE 2. Calibrated Manning Coefficients (n) in River Branches (no.)

no.  n        no.  n        no.  n        no.  n        no.  n
1    0.020    6    0.053    11   0.050    16   0.100    21   0.040
2    0.050    7    0.043    12   0.036    17   0.143
3    0.040    8    0.037    13   0.033    18   0.040
4    0.040    9    0.036    14   0.033    19   0.040
5    0.050    10   0.050    15   0.100    20   0.040

FIGURE 6. Estimated loading of PCDD/F compounds (kg d⁻¹, total mass) from Kuusankoski-Keltti area in 1969-1994 and exponential regression curve with 95% confidence limits as an estimate for extrapolated evolution of loading.


Model Application. The effects of dredging and removal of contaminated sediments at Kuusankoski were examined based on two responses during 2000-2020: the immediate increase in SS and PCDD/F concentrations in water caused by dredging in 2005 and the subsequent decrease. We assumed that the most contaminated sediments (140 000 m³) between Kuusankoski and Keltti (Figure 2) would be removed by dredging during a half-year period. From 1% to 10% of the sediment removed was expected to be resuspended in river water. The PCDD/F concentration of dredged and resuspended bottom sediment was 40 400 ng g⁻¹ (140 ng I-TEQ g⁻¹); in this case PCDD/F loading would be about 300 kg. Since the area of contaminated bottom decreases during dredging, the amount of PCDD/F compounds resuspended from the river bottom by normal currents in addition to the dredging was set to decrease linearly from the projected level for 2005 (0.11 kg d⁻¹) to one-tenth (0.01 kg d⁻¹). After dredging, the resuspension rate of contaminated sediments left on the river bottom was set to decrease further at the same rate as would apply if no dredging had been done. Loading for 2020 was set to 0.005 kg d⁻¹. Assessment of the impacts of dredging on sediment and contaminant transport was based on the dredging experiment performed in Myllykoski in 2001 and on earlier experience with dredging projects in Finland. The experiment suggested that the portion of resuspended sediment varies from 1% to 10% of dredged material depending on the discharge and dredging methods used.
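Two pieces of this scenario are simple enough to write down. The sketch below is a back-of-envelope illustration, not the authors' calculation: it works out the dry sediment mass that would carry about 300 kg of PCDD/F at the stated concentration, and the linear decrease of the background resuspension load over the dredging period. The 182.5-day duration is an assumed value for "a half-year period", and the function names are hypothetical.

```python
def implied_dry_mass_t(pcddf_kg: float, conc_ng_per_g: float) -> float:
    """Dry sediment mass (t) that carries pcddf_kg at the given concentration."""
    conc_kg_per_kg = conc_ng_per_g * 1e-9  # ng/g equals 1e-9 kg/kg
    return pcddf_kg / conc_kg_per_kg / 1000.0  # kg -> t


def resuspension_load_kg_per_d(day: float, duration_d: float = 182.5,
                               start: float = 0.11, end: float = 0.01) -> float:
    """Background resuspension load, decreasing linearly while dredging lasts."""
    fraction_done = min(max(day / duration_d, 0.0), 1.0)
    return start + (end - start) * fraction_done


dry_mass = implied_dry_mass_t(300.0, 40400.0)
load_start = resuspension_load_kg_per_d(0.0)
load_mid = resuspension_load_kg_per_d(91.25)
load_end = resuspension_load_kg_per_d(182.5)

print(f"~{dry_mass:.0f} t of dry sediment carries 300 kg of PCDD/F")
print(load_start, load_mid, load_end)
```

About 7 400 t of dry sediment at 40 400 ng g⁻¹ corresponds to the 300 kg quoted above; how that mass relates to the dredged volume depends on the resuspended fraction and the dry bulk density, neither of which is fixed here.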

Results

Calibration. Fitted settling velocity (ws) was 5 × 10⁻¹ m s⁻¹. In Lake Tammijärvi near the downstream end of the river, ws was fixed at a somewhat lower level (1 × 10⁻¹ m s⁻¹) due to wind-induced resuspension of bottom sediments. With the calibrated settling velocity, calculated sedimentation and erosion appear to occur at natural levels. The calculated value for SS in the river water corresponds well with the observed concentration in the calibration (Figure 7), except during summer in Ahvenkoski below Lake Tammijärvi, where flow conditions do not favor sedimentation as much as assumed in the model. During spring and autumn floods, the calculated concentration at all observation points was sometimes higher than observed, which might stem from rapid variation in concentration and from sparse sampling.

Verification. The calculated concentration of SS in river water in 1997 was compared with the observed value (Figure 7). As in the calibration phase, the calculated concentration was occasionally higher than observed during the spring flood, and in Ahvenkoski it was lower during summer. The calculated PCDD/F concentration in SS of river water and the concentration observed in sediment trap samples in 1997 were also compared (Figure 8). The calculated total PCDD/F concentration was usually 1000 ng g⁻¹ (∼10 ng I-TEQ g⁻¹) less than the observed concentration. This may be a consequence of underestimation of SS loading. As the particle-size distribution in sediment traps might differ somewhat from the distribution of SS in river water, the difference might also stem from this. However, the order of magnitude at each site was within the observed range.

FIGURE 7. Calculated and observed concentrations of suspended solids (SS) at various points of the river during the calibration (1980-1996) and verification (1997) phases.

As calculated with the river model, the mean flux of PCDD/F in 1997 from the Kuusankoski-Keltti area was 57 kg a⁻¹, sedimentation at downstream accumulation bottoms was 28 kg a⁻¹, and outflow to the Gulf of Finland was 29 kg a⁻¹. The observed values for PCDD/F concentration of SS in the sediment trap in Keltti, for the SS concentration in river water, and for discharge in the river suggest that PCDD/F fluxes were somewhat higher (73, 32, and 41 kg a⁻¹, respectively). Observations did not cover the whole of 1997.

Impact of Dredging on Sediment and PCDD/F Transport. As shown previously (Figure 6), the extrapolated PCDD/F loadings from contaminated sediment are decreasing, while the calculated PCDD/F concentrations in sedimentation areas show similar trends (Figure 9). This suggested that the PCDD/F concentrations will decrease by 2020 to one-tenth of the level in 1969 and to one-third of the present level at all sites computed.

Modeling results also suggested that an immediate increase in PCDD/F concentration in the surface sediments (layer 3 in the model) on accumulation bottoms would be considerable if 10% of dredged sediment were resuspended in the river water (Figure 10a). In contrast, the increase would be insignificant if only 1% were resuspended (Figure 10b). After the dredging, the PCDD/F concentration of surface sediment would decrease clearly as compared with the concentration before dredging and also with the situation without dredging (Figure 9). During the dredging, PCDD/F concentration in the SS of the river water would increase considerably in the worst-case scenario (10%; Figure 11a) but would remain within normal annual variation in the 1% scenario (Figure 11b). After the dredging, the concentration would decrease to a level lower than that before dredging, as would be expected.

Sensitivity Analysis. The sensitivity of the model to sediment parameters was analyzed using scientifically reported parameter ranges (Table 1). First, the difference between calculated extreme SS concentrations in Huruksela river water was studied (Figure 12a) with the verification data from 1997 (Figure 7). Extreme values were of the same order of magnitude, suggesting that uncertainty in parameters is within an acceptable level or that the model is not very sensitive to parameters. The calculated maximum is nearer to observations in summer than the fitted average, stemming most probably from the underestimation of diffuse SS loading noted in the verification phase. Second, the difference between calculated extreme PCDD/F concentrations in river water (Figure 12b) in the dredging simulation (Figure 11a) was studied. The difference was somewhat higher than that between extreme SS concentrations. The maximum concentration was two times higher than the average but still of the same order of magnitude. Third, the difference between calculated extreme PCDD/F concentrations in bottom sediment after dredging (Figure 10a) in 2005 was studied with reported parameter ranges (Figure 12c). The difference was on the same level as in the second case. It was noteworthy that, in the maximum case, PCDD/F was deposited only in Lake Tammijärvi although τd and τe were elevated somewhat (7 × 10⁻³ and 8 × 10⁻³, respectively).

FIGURE 8. Calculated PCDD/F concentrations in SS of river water and observed concentration in sediment traps in 1997.

FIGURE 9. Calculated relative PCDD/F concentration in surface sediments of accumulation bottoms (1-year-old sublayer 3 in the model) at given sites as a function of time (1969-2020). The reference concentration year was set to 1969.


Sensitivity analysis revealed substantial uncertainty in model results. However, the calculated uncertainty ranges of PCDD/F concentrations in river water and in surface sediments did not indicate a risk of unexpectedly wide spreading of PCDD/F compounds during dredging. Thus, the uncertainty ranges did not call the conclusions into question.

Discussion and Conclusions

A large part of the past dioxin emission was deposited into a single bottom sediment area in Kuusankoski. At present, sediments from the area are migrating slowly, partially into a few sedimentation pools downstream and partially spreading over the estuary and over the nearby sea area. To prevent migration, environmental authorities have plans to dredge the most contaminated sediments. The main features of PCDD/F transport were successfully described, and the risk of excessive contaminant spreading was assessed with the 1-D hydraulic river model. A lot of information was collected and assimilated into the model. Model sensitivity to scientifically reported sediment parameter ranges was studied graphically by varying parameters within minimum and maximum limits. This indicated that the model was not overly sensitive within the specified parameter ranges. The model predicted that simulated restoration dredging can cause a sudden increase in PCDD/F concentrations in the river if not implemented carefully. Concentrations will, however, soon decrease to a significantly lower level than before dredging. The simulated results were in accordance with the findings of elevated PCDD/F concentrations and burden in the Gulf of Finland (9).

FIGURE 10. Calculated relative PCDD/F concentrations in surface sediments of accumulation bottoms (1-year-old sublayer 3 in the model) at given sites as a function of time (1969-2020). The reference concentration year was set to 1969. It was assumed that 10% (a) or 1% (b) of dredged sediment would be resuspended in the river water.

FIGURE 11. Calculated PCDD/F concentrations in the suspended solids of the upper (Keltti) and lower (Tammijärvi) river stretches before, during (days 534-713), and after dredging if 10% (a) or 1% (b) of dredged sediment were resuspended in the river water.

FIGURE 12. Sensitivity calculations of (a) suspended solid concentration in Huruksela in 1997 (Figure 7), (b) PCDD/F concentration before and after dredging of Kuusankoski in Lake Tammijärvi (Figure 11a), and (c) PCDD/F concentration after dredging of Kuusankoski in bottom sediment layer 3 (Figure 10a) in sedimentation pools. Sediment parameters (Table 1) were varied within scientifically reported ranges to produce average, minimum, and maximum SS concentrations in the river water.

The total PCDD/F concentrations used in calculations included both toxic and nontoxic congeners and may be converted to I-TEQ (NATO) with a mean TEF value of 0.0034 (as calculated from sediment data). Consequently, a total PCDD/F concentration of 10 000 ng g⁻¹ is roughly equivalent to 34 000 pg I-TEQ g⁻¹. Both at present and during dredging, predicted PCDD/F levels in surface sediments and in suspended solids (Figures 11 and 12) far exceeded the provisional Finnish guideline trigger value for contaminated soils, which is now under revision (500 pg I-TEQ g⁻¹). Moreover, the predicted concentration trend was descending, stemming from mixing of contaminated solids with uncontaminated solids derived from upstream and from tributaries. The trend did not suggest a decrease of concentration below the guideline for many decades (Figure 9). The predicted long-term (A.D. 2015-2020) surface concentrations after remedial dredging varied between 300 and 1 000 pg I-TEQ g⁻¹ depending on site.
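The unit conversion in the paragraph above can be checked with a short calculation. The sketch below is illustrative only; the function names are hypothetical, while the mean TEF value and the guideline value are those quoted in the text.

```python
# Illustrative sketch: convert total PCDD/F concentration to I-TEQ and compare
# with the guideline value. Function names are hypothetical.
MEAN_TEF = 0.0034                 # mean TEF value calculated from sediment data
GUIDELINE_PG_TEQ_PER_G = 500.0    # Finnish guideline trigger value (pg I-TEQ/g)


def total_to_teq_pg_per_g(total_ng_per_g: float) -> float:
    """Convert total PCDD/F (ng/g) to toxic equivalents (pg I-TEQ/g)."""
    teq_ng_per_g = total_ng_per_g * MEAN_TEF
    return teq_ng_per_g * 1000.0  # 1 ng = 1000 pg


def exceeds_guideline(total_ng_per_g: float) -> bool:
    """True if the converted concentration is above the guideline value."""
    return total_to_teq_pg_per_g(total_ng_per_g) > GUIDELINE_PG_TEQ_PER_G


if __name__ == "__main__":
    for total in (10_000.0, 40_400.0):
        teq = total_to_teq_pg_per_g(total)
        print(f"{total:.0f} ng/g total = {teq:.0f} pg I-TEQ/g, "
              f"above guideline: {exceeds_guideline(total)}")
```

With the same factor, the 40 400 ng g⁻¹ assumed for dredged sediment corresponds to roughly 140 ng I-TEQ g⁻¹, in line with the value given in the model application.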

The outcome of this investigation was used in preliminary modeling of PCDD/F bioaccumulation in zoobenthos and fish and will be used in the planning of sediment restoration, in the assessment of future remedial projects, and as a basis for a higher-resolution 2-D model or other modeling approaches and their extensions.

Although based on several assumptions, the results were reasonable and support a more specific, detailed, and cautious assessment of restoration dredging in the most contaminated area in Kuusankoski. For example, the spatial and temporal variation of resuspension from Kuusankoski and its physical dependence on river discharge should be studied in the future. This could be done by measuring critical shear stress and erosion rate in sediment flume experiments. With measured parameter distributions, sediment transport should be calculated with 2-D or 3-D hydrodynamic models. A huge amount of information is on file in complicated models such as the one applied here, so the model may aid in specifying future research needs (19, 20). The use of complicated models without well-planned experiments has been questioned, and it has been stated that model identifiability analysis can be used to discriminate between competing experiments (19, 20). If parameter distributions and variability of input data can be statistically specified, model uncertainty could be evaluated with the Monte Carlo method (21, 22). Parameter distributions could be estimated best with Markov chain Monte Carlo computational methods (21, 23, 24) if parameters are identifiable (19, 20) with available data.

Acknowledgments

The Finnish Ministry of the Environment, UPM-Kymmene Corporation, and Academy of Finland (MaDaMe Project) are acknowledged for their financial support, and the Southeastern Regional Environment Centre is acknowledged for their support in field sampling. Two anonymous reviewers and Dr. Timo Assmuth are acknowledged for the comprehensive review.

Literature Cited

(1) Koistinen, J.; Paasivirta, J.; Suonpera, M.; Hyvarinen, H. Contamination of pike and sediment from the Kymijoki River by PCDEs, PCDDs, and PCDFs: Contents and patterns compared to pike and sediment from the Bothnian Bay and seals from Lake Saimaa. Environ. Sci. Technol. 1995, 29, 2541-2547.

(2) Verta, M.; Korhonen, M.; Lehtoranta, J.; Salo, S.; Vartiainen, T.; Kiviranta, H.; Kukkonen, J.; Hamalainen, H.; Mikkelson, P.; Palm, H. Ecotoxicological and health effects caused by PCPs, PCDEs, PCDDs and PCDFs in river Kymijoki sediments, south-eastern Finland. Organohalogen Compd. 1999, 43, 239-242.

(3) Verta, M.; Lehtoranta, J.; Salo, S.; Korhonen, M.; Kiviranta, H. High concentrations of PCDDs and PCDFs in river Kymijoki sediments, south-eastern Finland, caused by wood preservative Ky-5. Organohalogen Compd. 1999, 43, 261-264.

(4) Schlor, H. Chemie der Fungizide. In Chemie der Pflanzenschutz- und Schadlingsbekampfungsmittel; Wegler, R., Ed.; Springer: Berlin, 1970; Band 2, pp 44-161.

(5) Jensen, S.; Renberg, L. Contaminants in pentachlorophenol: chlorinated dioxins and predioxins. Ambio 1972, 1, 1-4.

(6) Paasivirta, J.; Lahtipera, M.; Leskijarvi, T. Experiences of structure analyses of chlorophenol dimers and trimers found in different samples. In Chlorinated Dioxins and Related Compounds; Hutzinger, O., Frei, R. W., Merian, E., Pocchiari, F., Eds.; Pergamon: Oxford, UK, 1982; pp 191-200.

(7) Vartiainen, T.; Lampi, P.; Tolonen, K.; Tuomisto, J. Polychlorodibenzo-p-dioxin and polychlorodibenzofuran concentration in lake sediments and fish after a ground water pollution with chlorophenols. Chemosphere 1995, 30, 1439-1451.

(8) Ramondetta, M., Repossi, A., Eds. Seveso 20 years after. From dioxin to Oak Wood; Fondazione Lombardia per l’Ambiente: Milano, Italy, 1998.

(9) Isosaari, P.; Kankaanpaa, H.; Mattila, J.; Kiviranta, H.; Verta, M.; Salo, S.; Vartiainen, T. Spatial and temporal accumulation of polychlorinated dibenzo-p-dioxins, dibenzofurans, and diphenyls in the Gulf of Finland. Environ. Sci. Technol. 2002, 36, 2560-2565.

(10) Cunge, J. A.; Holly, F. M., Jr.; Verwey, A. Practical Aspects of Computational River Hydraulics; Pitman Publishing Limited: London, 1980; p 420.

(11) Govers, H. A. J.; Krop, H. B. Partition constants of chlorinated dibenzofurans and dibenzo-p-dioxins. Chemosphere 1998, 25, 53-56.

(12) Dyer, K. R. Coastal and estuarine sediment dynamics; John Wiley & Sons: London, 1986; p 339.

(13) Cheremisinoff, N. P. Encyclopedia of fluid mechanics, Vol. 10: Surface and ground water flow phenomena; Gulf Publishing: Houston, 1990; pp 211-265.

(14) van Rijn, L. Principles of sediment transport in rivers, estuaries and coastal areas; Aqua Publications: Amsterdam, 1993.

(15) Huttula, T.; Krogerus, K. Water currents and erosion of cellulose fibres in a short-term regulated water course. Aqua Fenn. 1986, 16 (2), 167-180.

(16) Huttula, T. Modelling the transport of suspended sediment in shallow lakes. Academic Dissertation, Department of Geophysics, University of Helsinki, 1994.

(17) Sheng, Y. P.; Chen, X.-J. Modelling three-dimensional circulation and sediment transport in lakes and estuaries. Estuarine Coastal Model. 1991, 105-115.

(18) van Rijn, L. C. Handbook. Sediment transport by currents and waves; Delft Hydraulics Report H 461: 1989.

(19) Reichert, P.; Vanrolleghem, P. Identifiability and Uncertainty Analysis of the River Water Quality Model No. 1. Water Sci. Technol. 2001, 43 (7), 329-338.

(20) Brun, R.; Reichert, P.; Kunsch, H. Practical identifiability analysis of large environmental models. Water Resour. Res. 2001, 37 (4), 1015-1030.

(21) Adams VanHarn, B. A. Parameter distributions for uncertainty propagation in water quality modeling. Ph.D. Thesis, Department of Environment, Graduate School of Duke University, 1998.

(22) Bansidhar, S. Giri; Karimi, I. A.; Ray, M. B. Modeling and Monte Carlo simulation of TCDD transport in a river. Water Res. 2001, 35 (5), 1263-1279.

(23) Harmon, R.; Challenor, P. A Markov chain Monte Carlo Method for estimation and assimilation into models. Ecol. Model. 1996, 101, 41-59.

(24) Gelman, A.; Carlin, J. B.; Stern, H. S.; Rubin, D. B. Bayesian data analysis; Chapman & Hall: London, 1995.

Received for review August 21, 2002. Revised manuscript received May 13, 2003. Accepted May 20, 2003.

ES0260723


III

Publication III

Malve, O., Laine, M. and Haario, H. 2005. Estimation of winter respiration rates and prediction of oxygen regime in a lake using Bayesian inference. Ecological Modelling, Vol. 182:2, pp. 183–197. DOI:10.1016/j.ecolmodel.2004.07.020

© 2005 Elsevier


Ecological Modelling 182 (2005) 183–197

Estimation of winter respiration rates and prediction of oxygen regime in a lake using Bayesian inference☆

Olli Malve a,∗, Marko Laine b, Heikki Haario b,c

a Finnish Environment Institute, P.O. Box 140, FI-00251 Helsinki, Finland

b Department of Mathematics and Statistics, University of Helsinki, P.O. Box 68, FI-00014 Helsinki, Finland

c Lappeenranta University of Technology, P.O. Box 20, FI-53851, Lappeenranta, Finland

Received 27 March 2003; received in revised form 29 April 2004; accepted 12 July 2004

Abstract

In this paper, we estimate the winter respiration (oxygen depletion per unit area of hypolimnetic surface) in a hyper-eutrophic shallow lake (Tuusulanjärvi) in the northern hemisphere (Finland, northern Europe, latitude 60°26′, longitude 25°03′) under ice-cover periods in the years 1970–2003. We present a dynamic nonlinear model that can be used for predicting the oxygen regime in following years and for dimensioning the artificial oxygenation efficiency needed to prevent fish kills in the lake. We use Bayesian estimation of respiration with the Markov chain Monte Carlo (MCMC) method (Adaptive Metropolis–Hastings algorithm). This allows for analysis and predictions that take into account all the uncertainties in the model and the data, pool information from different sources (laboratory experiments and lake data), and quantify the uncertainties using a full statistical approach. The mean estimated respiration in the study period was 301 ± 105 mg m⁻² d⁻¹, which is on the upper limit of winter respiration of eutrophic Canadian lakes on the same latitude. The reference rate of the respiration k (d⁻¹) at 4 °C indicated cyclic behavior with a period of about 9 years and had a statistically significant negative trend throughout the study period. The temperature coefficient and respiration rate of the model prove to be highly correlated and unidentifiable with the given data. The future winters can be predicted using the posterior information coming from the past observations. As new observations arrive, they are added to the analysis. Methods are shown to be applicable to the dimensioning of artificial oxygenation devices and to the anticipation of the need for oxygenation during the winter.

© 2004 Elsevier B.V. All rights reserved.

Keywords: Lake winter respiration; Model uncertainty; Dynamical model; Bayesian modelling; MCMC

☆ This research is a part of the Academy of Finland’s MaDaMe project Development of Bayesian methods with applications in geophysical and environmental research.

∗ Corresponding author. Tel.: +358 9 4330 0359; fax: +358 9 4030 0390.

E-mail addresses:[email protected] (O. Malve),[email protected] (M. Laine), [email protected] (H. Haario).

1. Introduction

In this research, we model winter respiration in Lake Tuusulanjärvi as total consumption of the oxygen in the lake (mg m⁻² d⁻¹), which includes both the consumption in the lake water and on the bottom sediment. Our principal aim is not to develop a complicated oxygen model, but to study the changes in the condition of the lake as manifested by yearly average respiration, to predict the future oxygen regime in the lake, to dimension artificial oxygenation efficiency probabilistically, and to illustrate the multiplicity of model uncertainties and the strengths of the MCMC method in tracking these. The identifiability and uncertainty analyses of large environmental simulation models (Scavia, 1980; Adams VanHarn, 1998; Brun et al., 2001; Omlin et al., 2001a,b; Reichert and Vanrolleghem, 2001) have by now revealed some shortcomings in the scientific basis of complicated ecological model equations and of their parameter ranges. This is why the use of complicated water quality models as quantitative forecasting tools without comprehensive and well-designed measuring campaigns has been challenged and why new interest has developed in the application of methods that can better trace unidentifiabilities and uncertainties in the models (e.g. MCMC) (Harmon and Challenor, 1996; Adams VanHarn, 1998; Omlin and Reichert, 1999; Annan, 2001; Borsuk et al., 2001a).

As a subject of research and management, water resources are complicated, multiform and sometimes respond to external disturbances in an unpredictable way. Moreover, our measurements are spatially and temporally limited. Data analysis using traditional statistics relies on simplifications, typically in the form of linearization and large-sample arguments, both of which can lead to unrealistic estimates of the parameters and especially of their accuracy. On the other hand, traditional physics has encouraged the use of complicated theoretical models which may contain parameters that cannot be estimated accurately with the available data or due to nonlinearities of the model. In the environmental sciences, the results of the analyses are frequently used in decision-making, and the more accurately the uncertainties can be evaluated, the more accurately we can then evaluate the risks of the decisions. Bayesian statistical inference with modern computational methods has provided very useful tools for assessing the uncertainties in environmental studies (Harmon and Challenor, 1996; Kokkonen, 1997; Adams VanHarn, 1998; Omlin and Reichert, 1999; Annan, 2001; Borsuk et al., 2001a).

In this research, we have applied Bayesian data analysis with a computational tool called the Markov chain Monte Carlo (MCMC) method. The usefulness of the method is demonstrated with an example of lake modelling. The time evolution of winter respiration in eutrophic Lake Tuusulanjärvi is estimated, the long-term impact of loading reduction and artificial aeration is assessed and the oxygen regime in the lake in a future winter is predicted. Modelling of the interaction between a lake’s oxygen regime and all relevant ecological variables may be based on a complicated theoretical framework. The difficulty lies in the design and implementation of experiments that minimize parameter uncertainty, in the quantification of parameter uncertainty, and in the propagation of uncertainty to the predictions. In our particular example, a simplified model was selected to clearly illustrate the multiplicity of uncertainties and the strengths of the MCMC method in tracking them. Limited observational resources, typical in lake management practice, supported the use of a simple model structure. The efficiency and operational status of the artificial oxygenation devices over 30 years could not be traced very accurately and the temperature dependence of respiration was inaccurately known, which caused uncertainties in the inference. Uncertainties in the results were quantified and studied thoroughly and MCMC methods were evaluated with the given example.

We present the basic principles of Bayesian methodology needed for environmental modelling. In addition to the standard MCMC methodology used in some recent papers on environmental problems (Borsuk, 2001; Borsuk et al., 2001a; Qian et al., 2003) we also show useful ideas not commonly implemented. The modelling of the error in the control variables is used to account for all the relevant uncertainties of the phenomenon. We also show how predictive probabilities can easily be calculated for trends in the time evolution of the model. Use of the adaptive MCMC algorithm (Haario et al., 2001) makes it feasible to do calculations with a high dimensional parameter vector, that is, when we have a large (from 50 up) number of unknown parameters.

2. Materials

Lake Tuusulanjärvi is located in the northern hemisphere in southern Finland, latitude 60°26′, longitude 25°03′. The lake is hyper-eutrophic and shallow; its hydrologic and morphometric characteristics are shown in Table 1. The previously mesotrophic state of the lake shifted to hyper-eutrophy in the 1960s due to sewage discharge. In the beginning of the 1970s, the winter oxygen regime was in critical condition. The oxygen regime was slightly improved in 1973 when winter aeration began. The situation was further improved with reductions in nutrient loading. Sewage discharge was diverted in 1979. Summer aeration started in 1980. The high trophic condition in the lake remained, and blooms of blue–green algae occurred every summer after loading reduction (50% in phosphorus loading) in 1979. The phosphorus load from agriculture (4500 kg year⁻¹ = 0.75 g m⁻² year⁻¹) still surpasses the lake’s tolerance level, which is why the reduction of phosphorus content of the water body by decreasing both external and internal phosphorus loading has been required.

Table 1
Hydrology and morphology of Lake Tuusulanjärvi (Anonymous, 1984)

Surface area (km²)                              6.0
Volume (m³)                                     19 × 10⁶
Maximum depth (m)                               10
Average depth (m)                               3.2
Length (maximum) (km)                           7.5
Theoretical water residence time (days)         250
Area of drainage basin (km²)                    92
Lake percentage of drainage basin (%)           8.4

In the beginning of the 1970s, the lake’s winter net oxygen consumption was estimated to be 200 000 kg on average. The flux of pumped dissolved oxygen has been approximated by aerator consultants to be about 100 t on average. It has a large yearly variation due to technical problems and the duration of ice cover. This leaves significant uncertainty concerning the estimated dissolved oxygen fluxes, and affects the lake respiration estimates. The prior distribution for the flux was calculated from the information available in technical reports. In 1971, before aeration was started, no dissolved oxygen was actually pumped (average 0 kg d⁻¹, S.D. 20 kg d⁻¹). In 1972–1980, the amount of oxygen dissolved into the lake’s hypolimnion by the Nokia aerator was 1350 kg d⁻¹ (S.D. 675 kg d⁻¹). In 1982–1990, water from the hypolimnion was pumped by the Hydixor aerator to the lake surface, where it was mechanically aerated and pumped back to the hypolimnion. Reported dissolved oxygen flux was 240 kg d⁻¹ (S.D. 24 kg d⁻¹). In 1990–1997, the average oxygen flux pumped by the Planox aerator was 450 kg d⁻¹ (S.D. 45 kg d⁻¹). From 1997 onwards (Lappalainen, 1994), oxygen was not dissolved at all (average 0 kg d⁻¹, S.D. 20 kg d⁻¹), but oxygen-rich water from the epilimnion was pumped by the Mixox aerator into the hypolimnion. In 2003, the aerator was not used at all.

Other restoration measures such as dilution of the lake water with nutrient-poor water from a neighboring water body as well as more recent biomanipulation have been performed to decrease the trophic state and internal loading of the lake. Those measures also have an implicit impact on the lake respiration and on the oxygen regime.

The lake water was sampled at two-meter intervals at the deepest point of the lake (point P10 in Fig. 1) (maximum depth 10 m) by the Uusimaa regional environment center and Central Uusimaa Federation of Municipalities for the Water Protection during the period 1968–2003. Samples were collected two to seven times each winter. Oxygen concentration and temperature were analysed with standard analysis methods. Vertical average and standard deviation were calculated. In March 2001, oxygen concentrations were also measured in situ at nine points (P1–P9 in Fig. 1) to determine the area of aerator impact.

Fig. 1. Bathymetric map of Lake Tuusulanjärvi. Location of water and bottom samples, aerator, area of aerator impact, and depth-volume curve.

3. Methods

3.1. Modelling of winter oxygen regime

In standard lake aeration planning techniques, the average winter respiration rate in the lake is typically estimated with linear regression where the y variable is the oxygen content of the water body (mg m⁻²) and the x variable is the time after the beginning of the ice-cover period (d). The slope of the regression line represents the respiration rate (mg m⁻² d⁻¹) (Lorenzen and Fast, 1977). By coupling the temperature dependency of the respiration according to the Arrhenius formulation (Bowie et al., 1985) and the time-varying oxygen flux from the aerator, we introduced nonlinearity and dynamics to the model. The oxygen concentration is the average vertical concentration in the area of aerator impact. The model used in this study (Table 2) describes the average oxygen regime in the area of aerator impact (1 km², Fig. 1). Due to the fact that biological oxygen demand (BOD) is under the detection limit in winter periods, it was not included in the model. Similar formulation has been used in the modelling of estuarine and coastal oxygen dynamics (Borsuk, 2001; Borsuk et al., 2001a,b).

Table 2
The lake oxygen consumption model equation

$$\frac{dC_{O_2}}{dt} = -k\,C_{O_2}\,\theta^{\,T_{obs}-T_{ref}} + \frac{feed}{vol}$$

C_O2    oxygen concentration in the lake (g m⁻³)
k       total respiration rate constant (d⁻¹)
θ       temperature coefficient of respiration rate
T_obs   observed temperature in lake water (°C)
T_ref   reference temperature (4 °C)
feed    10³ × dissolved oxygen flux (kg d⁻¹)
vol     volume of aerator impact (m³)
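The model equation of Table 2 can be integrated numerically as follows. This is a minimal sketch, not the authors' implementation: the function names are hypothetical, temperature and oxygen feed are assumed constant within each time step (so each step can be solved exactly), and all numerical values in the usage example (k, θ, temperature, volume, initial concentration) are illustrative assumptions rather than estimates from the paper.

```python
import math

T_REF = 4.0  # reference temperature (deg C), as in Table 2


def step_oxygen(c, k, theta, t_obs, feed_kg_d, vol_m3, dt):
    """Advance oxygen concentration c (g/m3) by dt days.

    Solves dC/dt = -a*C + b exactly for constant a and b within the step,
    where a = k * theta**(t_obs - T_REF) and b = 1e3 * feed / vol.
    Assumes k > 0 and theta > 0, so that a > 0.
    """
    a = k * theta ** (t_obs - T_REF)      # temperature-corrected rate (1/d)
    b = 1e3 * feed_kg_d / vol_m3          # aeration source term (g/m3/d)
    c_eq = b / a                          # steady-state concentration
    return c_eq + (c - c_eq) * math.exp(-a * dt)


def simulate(c0, k, theta, temps, feeds, vol_m3, dt=1.0):
    """Return the concentration series for per-step temperatures and feeds."""
    series = [c0]
    for t_obs, feed in zip(temps, feeds):
        series.append(step_oxygen(series[-1], k, theta, t_obs, feed, vol_m3, dt))
    return series


if __name__ == "__main__":
    days = 120
    # Hypothetical inputs: k, theta, temperature and volume are assumptions.
    no_aeration = simulate(12.0, 0.02, 1.07, [2.0] * days, [0.0] * days, 3.0e6)
    aeration = simulate(12.0, 0.02, 1.07, [2.0] * days, [450.0] * days, 3.0e6)
    print(f"day {days}, no aeration: {no_aeration[-1]:.2f} g/m3")
    print(f"day {days}, 450 kg/d feed: {aeration[-1]:.2f} g/m3")
```

Using the exact solution per step avoids the step-size sensitivity of an explicit Euler scheme; with time-varying observed temperatures, the inputs would simply be supplied as one value per step.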

As we were interested in the time evolution of the respiration in the lake, the rate parameter k was estimated separately for each winter period. The temperature dependency was, however, assumed to have a common value for all years. Fitting separate rate parameters for each year is also reasonable, as such time-varying factors as weather conditions, nutrient loading, inflow–outflow rates and plankton population composition, which are not included in the model, must still contribute to the total oxygen consumption.

Uncertainties in the model are connected to the total respiration rate parameter k, to the temperature coefficient θ, to the pumped flux of dissolved oxygen and to the variance of the vertically averaged oxygen concentration. The procedure for defining the prior values for these parameters is presented in Section 3.3. By their nature, the respiration rate parameter k and the temperature coefficient θ are correlated due to the model formulation. Correlation may constitute a severe identifiability problem in parameter estimation but does not necessarily have a severe impact on the distributions of model predictions.

3.2. Bayesian modelling

Unidentifiability of the parameters can be caused by limited availability of observational data or by the structure of the nonlinear model equations. This can sometimes be solved with better design of the experiments and by reparametrizing the model, but many times we are stuck with the available inaccurate data and the modelling has to be done in conventional units.

The Bayesian approach has been shown to be a powerful way to quantify the uncertainties in the whole modelling procedure. This is shown in many examples in ecological modelling (Adams VanHarn, 1998; Annan, 2001; Borsuk, 2001; Borsuk et al., 2001a; Harmon and Challenor, 1996; Omlin and Reichert, 1999; Reckhow, 2002; Qian et al., 2003). Unidentifiability is easily detected by inspecting the posterior distributions, and many times it does not even matter if the parameters themselves are correlated, because the model predictions are what we are interested in. These predictions are not affected by the unidentifiability if it is taken into account accordingly.

Nonlinear models with a large number of correlated parameters cause additional challenges to the computations. However, MCMC simulation of values from the posterior distribution of the parameters is a computational method that can be applied to a wide variety of modelling problems (Gamerman, 1997). Recent advances in MCMC computing, together with constantly increasing CPU resources, have made even larger problems tractable; see, for example, Haario et al. (2001).

In the classical least squares estimation, we usually assume that the control variables (the x variables) are measured without error or at least with error that is negligible compared with the observational error in the dependent variables. The Bayesian approach offers an intuitive way to model these uncertainties. An observation with error is considered as an extra parameter and a prior distribution describing the error structure is attached to it. A drawback with this approach, however, is that the number of the parameters of the model can grow quite large, which in turn imposes an extra burden on the calculations.

3.3. Applying Bayesian modelling to the lake model

To use the model (1) in Bayesian MCMC settings we need to specify a prior distribution for the parameters. If we assume prior independence of the parameter values it is sufficient to specify a marginal density for each component of the parameter vector. Although in principle any kind of prior distribution can be used in MCMC calculations, we have used only Gaussian priors with possible upper and lower limits for the values (e.g. positivity constraints).

For the total respiration rate constants k1, . . . , k31 (one for each winter period) it was reasonable to use non-informative priors. These parameters were originally of primary interest and we wanted to explore their posteriors without any prior constraints (other than positivity). The temperature dependency of the rate parameter θ in the form used here is not very well identified by the model. Preliminary modelling suggested a heavy nonlinear correlation between k and θ. A proper prior distribution was acquired from a laboratory experiment (sediment sampling points are shown in Fig. 1) (Lehtoranta and Malve, 2001). The distribution suggested by the experiment was Gaussian N(1.45, 0.4), and it was used as a prior for θ. For the unknown variance σ2 of the observation error ε, a non-informative conjugate prior was used.

Calculating the model (1) given the data and parameters requires solving a differential equation using numerical methods. To solve the equation (1) numerically, we need to specify an initial condition for the oxygen concentration, i.e. the value of C_O2 at the initial time point 0 for each year. However, this value is itself an observation subject to error, and solving the equation with fixed initial values causes bias. A better approach is to let the initial concentrations vary also, meaning that we have an extra parameter for each winter period. As with the other parameters, the MCMC algorithm will produce posterior distributions for these values as well.
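The idea of sampling the initial concentration together with the rate parameter can be sketched with a plain random-walk Metropolis sampler. This is a stand-in for illustration only: the sampler actually used in the study is not specified in this section, and the toy decay model C(t) = c0 exp(−k t), the known error standard deviation, the flat positive priors, the step sizes and the synthetic data are all assumptions.

```python
import math
import random


def log_posterior(params, times, obs, sigma=0.3):
    """Log posterior (up to a constant) for a toy model C(t) = c0 * exp(-k t).

    Flat priors with positivity constraints on k and c0; Gaussian errors
    with known standard deviation sigma. All of this is illustrative.
    """
    k, c0 = params
    if k <= 0.0 or c0 <= 0.0:
        return -math.inf
    sse = sum((y - c0 * math.exp(-k * t)) ** 2 for t, y in zip(times, obs))
    return -0.5 * sse / sigma ** 2


def metropolis(log_post, start, step_sizes, n_iter, rng):
    """Random-walk Metropolis; returns the chain as a list of parameter lists."""
    current = list(start)
    current_lp = log_post(current)
    chain = []
    for _ in range(n_iter):
        proposal = [x + rng.gauss(0.0, s) for x, s in zip(current, step_sizes)]
        proposal_lp = log_post(proposal)
        # Accept with probability min(1, posterior ratio); u lies in (0, 1].
        u = 1.0 - rng.random()
        if math.log(u) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        chain.append(list(current))
    return chain


rng = random.Random(1)
times = [0, 20, 40, 60, 80, 100]
true_k, true_c0 = 0.01, 12.0
# Synthetic observations, including a noisy observation at time 0.
obs = [true_c0 * math.exp(-true_k * t) + rng.gauss(0.0, 0.3) for t in times]

chain = metropolis(lambda p: log_posterior(p, times, obs),
                   start=[0.02, 10.0], step_sizes=[0.001, 0.2],
                   n_iter=20000, rng=rng)
kept = chain[5000:]                       # discard burn-in
k_mean = sum(row[0] for row in kept) / len(kept)
c0_mean = sum(row[1] for row in kept) / len(kept)
print(k_mean, c0_mean)
```

Because c0 is a sampled parameter rather than a fixed input, the noise in the first observation propagates into the posterior of k instead of biasing it.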

The term feed/vol in the model (1) corresponds to the amount of fresh oxygen feed dissolved in the lake water. For the value of feed we have the manufacturer's information, but the value is subject to some doubt. The imprecise knowledge of the volume of aerator impact also causes uncertainties. Gaussian priors for the oxygen feed in the five periods were (mean ± S.D. (kg d−1)): (1) 1970–1972: 0 ± 20, (2) 1973–1982: 1350 ± 675, (3) 1983–1990: 240 ± 24, (4) 1991–1997: 450 ± 45, (5) 1998–2002: 0 ± 20. Note that in period (1) no pump was installed yet, while in the latest period (5) the aerator only mixed the water, keeping the ice cover open near the pump.
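The feed priors listed above can be written down as a small log-prior function. The means and standard deviations are taken from the text; the function names, the dictionary layout and the lower bound of zero on the feed are assumptions made for this sketch, and the value 124 used in the example is the posterior value reported later for the 1973–1982 period.

```python
import math

# Gaussian priors for the oxygen feed (kg d-1): (mean, S.D.) per period, from the text.
FEED_PRIORS = {
    "1970-1972": (0.0, 20.0),
    "1973-1982": (1350.0, 675.0),
    "1983-1990": (240.0, 24.0),
    "1991-1997": (450.0, 45.0),
    "1998-2002": (0.0, 20.0),
}


def log_gaussian_prior(x, mean, sd, lower=-math.inf, upper=math.inf):
    """Log density (up to a constant) of a Gaussian prior restricted to [lower, upper]."""
    if x < lower or x > upper:
        return -math.inf
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd)


def log_feed_prior(feeds):
    """Sum of independent log priors; feeds maps period label -> feed value (kg d-1)."""
    return sum(
        log_gaussian_prior(feeds[period], mean, sd, lower=0.0)  # assumed positivity
        for period, (mean, sd) in FEED_PRIORS.items()
    )


at_means = {period: mean for period, (mean, sd) in FEED_PRIORS.items()}
shifted = {**at_means, "1973-1982": 124.0}
penalty = log_feed_prior(at_means) - log_feed_prior(shifted)
print(penalty)
```

Under prior independence the joint log prior is simply the sum of the marginal terms, which is what makes it sufficient to specify one density per component.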

To sum up the model and the parameters so far: we have 31 respiration rate parameters k1, . . . , k31, one for each winter period, together with 31 initial values of the concentrations. The temperature dependency constant θ is assumed to be independent of time and thus adds only one parameter. The error-in-the-x-variables approach for the oxygen feed term of the model brings five more parameters. Together with the unknown observation error variance σ2, the total number of parameters comes to 69.

3.4. Using the MCMC chain

The MCMC chain produced by the algorithm is a (number of simulations) × (number of parameters) matrix of samples from the posterior distribution of the multi-dimensional parameter vector. All statistical reasoning on the model and the parameters can be based on this matrix. Common usage is the calculation of posterior means, standard deviations and correlations from the chain, together with some illustrative plots such as histograms and density estimates. In addition, any model derived values (calculations based on the model) are given predictive posterior distributions when calculated for each (or a randomly sampled subset of the) parameter vectors in the chain.

For example, this way we can calculate trends in the yearly respiration of the lake oxygen regime during the winter and have a full posterior distribution describing the uncertainties in the trend estimates. In this way, any predictions based on the model include all the relevant background information and uncertainties.

Unidentifiability of model parameters, which results from the parameter correlation, the non-linearity of the model and the characteristics of the data, is also revealed by the MCMC chain. The posterior correlations and two-dimensional plots of the parameter combinations can be used as diagnostics of parameter identifiability. High posterior correlations between parameters make estimates of single parameters themselves useless if the dependency on the other parameters is not clearly stated, but usually have little impact on prediction distributions, as the correlations are taken into account in the predictions in a consistent way.

The oxygen time profile can be predicted for a new year using the posterior information of the fitted model. This can even be done without any new observations. The posterior from the previous years is treated as a prior for the new year. Previous years can be weighted in a suitable manner. If we believe that winters closer to the current situation are better for the prediction (we believe that there is a trend in the parameter values) we can give decreasing weight to past years' posteriors. This has been done in Section 4.3.

Monte Carlo simulations with different aerator efficiencies together with the model predictions can be used to assess the risks of oxygen depletion or fish kills (Section 4.4).

4. Results

4.1. Posterior distributions for the lake model parameters

This simple model with a separate rate parameter k estimated for each year gives very good agreement with the winter time observations and allows us to study the changes in the respiration over the years and the effect of the external oxygen feed. Fig. 2 shows the data used together with the fit obtained by Bayesian methods. The 95% posterior limits of the model predicted oxygen concentration are shown by the gray area around the median estimate. The rate parameter k for each year is plotted in Fig. 3.

Fig. 2. Observed oxygen concentrations (circles) (g m−3) and temperatures (°C) (lower solid line) at ice-covered period in years 1970–2000. The x-axis is time from the start of the ice-cover period. The dots are the observed vertically averaged O2 concentrations. Smaller dots and the dashed line show observed temperatures (°C). The solid line with gray area around shows the median and the 95% region of the posterior predictive distribution.

The prior and posterior distributions of the oxygen feed for the three middle periods of the different aerators are plotted in Fig. 4. These were the periods when the pump was operated in a way that increased the available oxygen in the water (compared to just mixing). Priors were based on the manufacturers' announcements. The agreement between prior and posterior is good except for the first period of 1973–1982, where the prior mean 1350 (±675) is changed to the posterior value 124 (±63). It seems that the ability of the aerator to dissolve oxygen into the water was overestimated at that time. In the later aeration periods this was not the case.


Fig. 3. Box plots of the posterior distribution of the fitted oxygen respiration rate constant k (d−1) of the lake model (1) for each year.

The average respiration (mg m−2 d−1) in each year can be calculated from the fitted model as the average oxygen consumption (mg m−3 d−1) multiplied by the average depth. It was calculated for each parameter realization of the MCMC chain, thus giving the posterior predictive distribution of the respiration, as explained in Section 3.4. Fig. 5 shows the posteriors for estimated yearly values of the respiration.
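The step of evaluating a derived quantity for every row of the chain can be sketched as follows. The chain here is a synthetic stand-in with independent draws (a real chain would carry the posterior correlation between k and θ), and the first-order consumption term, the reference temperature, the mean concentration, the mean temperature and the depth are all hypothetical values chosen for illustration.

```python
import random
import statistics


def respiration(k, theta, mean_conc, mean_temp, depth, t_ref=4.0):
    """Areal respiration (mg m-2 d-1) from an assumed first-order consumption term.

    k * theta**(T - t_ref) * C is volumetric consumption (g m-3 d-1);
    multiplying by depth (m) and by 1000 gives mg m-2 d-1.
    """
    return 1000.0 * k * theta ** (mean_temp - t_ref) * mean_conc * depth


rng = random.Random(0)
# Stand-in for an MCMC chain: each row is one posterior sample (k, theta).
chain = [(abs(rng.gauss(0.02, 0.004)), abs(rng.gauss(1.45, 0.2)))
         for _ in range(5000)]

# Evaluate the derived quantity for every row of the chain.
resp = sorted(respiration(k, th, mean_conc=8.0, mean_temp=1.5, depth=3.0)
              for k, th in chain)

median = statistics.median(resp)
lower = resp[int(0.025 * len(resp))]        # approximate 2.5% quantile
upper = resp[int(0.975 * len(resp)) - 1]    # approximate 97.5% quantile
print(f"median {median:.0f}, 95% interval {lower:.0f} to {upper:.0f} mg m-2 d-1")
```

The sorted list of derived values plays the role of the posterior predictive distribution, so box plots or intervals can be read directly from it.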

The mean estimated average respiration for the entire period 1970–2000 was 301 mg m−2 d−1. This value is in the upper range of 11 eutrophic Canadian lakes at similar latitudes (latitude 64°), where the winter respiration was reported to be 131–306 mg m−2 d−1 by Welch and Bergmann (1985). Annual variation in respiration was also quite high (S.D. 105 mg m−2 d−1). Average respirations in the 1970s, 1980s, 1990s and 2000–2002 were 327, 311, 281, and 270 mg m−2 d−1, showing a slight (but not statistically significant) downward trend.

Laboratory experiments conducted at Lake Tuusulanjarvi in April–May 2001 showed the value for the bottom sediment respiration to be 130 mg m−2 d−1 at +3 °C and 230 at +4 °C (Lehtoranta and Malve, 2001). Estimated respiration from the lake data in winter 2000–2001 was 215 (S.D. 97), which is on the same level as respiration in the laboratory experiments.

Fig. 4. A prior distribution of the oxygen feed (kg d−1) (dotted line) compared with the posterior distribution (solid line).

Fig. 5. Average respiration (mg m−2 d−1) calculated from the MCMC chain. Box plots correspond to the predictive posterior distribution of estimated respiration.

The estimated oxygen respiration rate constants and the respiration values represent the area of aerator impact. In spite of that, those values are quite near the lake averages because most of the respiration in Lake Tuusulanjarvi consists of the bottom sediment respiration, which is horizontally quite homogeneously distributed (Lehtoranta and Malve, 2001).

Fig. 6. Two-dimensional MCMC chain from a preliminary fit where the respiration constant k was held fixed over all the years (with Tref = 20). The picture illustrates a heavy nonlinear dependency of the two parameters k and θ (the temperature dependency parameter), and also the evolution of the MCMC algorithm when the chain is started far from the core of the posterior distribution.

Fig. 7. Prior and posterior distributions of the temperature dependency parameter θ.

4.2. Parameter identifiability

The model includes a temperature dependency term θ. However, the water temperature varies very little in the winter time and during the aeration period. The temperature is typically steadily rising from about 1 °C at the beginning of the ice cover to some 2 °C when the ice melts. At the same time the temperature dependency confounds itself with the k parameter.

As the temperature rises the concentration decreases at the same time. This makes it practically impossible to distinguish the temperature effect from the effect of the decrease, because the respiration rate is proportional to the actual concentration. In Fig. 6, a model with common k and θ for each year is fitted without any prior constraints. It clearly shows the nonlinear relationship of these parameters and the unidentifiability arising from that.

A laboratory experiment to determine the temperature dependency parameter θ (Lehtoranta and Malve, 2001) suggested a value of 1.45 with a standard deviation of 0.2, so no temperature effect (θ = 1) still remained a valid alternative. In the literature several values for θ are given, but mostly for temperatures well above 4 °C. Predictions given by the model with θ fixed at 1 were practically almost as good as with θ > 1. However, to allow predictions to depend on the temperature conditions and thus to study their effects, parameter θ was kept in the final model. Fig. 7 shows the prior used together with the posterior from the final model with a separate respiration rate k for each year, while Fig. 8 shows the posterior correlations of the parameters in the final model for one particular winter period.

Fig. 8. Two-dimensional marginal posterior distributions of k, θ, and oxygen feed at winter 1993–1994. The two contour lines correspond to 50 and 90% regions.
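The confounding of k and θ discussed in Section 4.2 can be illustrated numerically. The sketch assumes the temperature correction has the form k θ^(T − Tref) with Tref = 20 as in the preliminary fit of Fig. 6; the target rate and the mid-winter temperature are hypothetical. For each θ it finds the k that gives the same effective rate at 1.5 °C and then compares the effective rates at the ends of the observed 1–2 °C range.

```python
T_REF = 20.0          # reference temperature of the preliminary fit (Fig. 6)
TARGET_RATE = 0.008   # hypothetical effective rate (d-1) at mid-winter temperature
T_MID = 1.5           # hypothetical typical under-ice temperature (deg C)


def k_for_theta(theta):
    """Rate constant k that reproduces TARGET_RATE at T_MID for a given theta."""
    return TARGET_RATE / theta ** (T_MID - T_REF)


def effective_rate(k, theta, temp):
    """Temperature-corrected rate under the assumed Arrhenius-type form."""
    return k * theta ** (temp - T_REF)


for theta in (1.0, 1.2, 1.45):
    k = k_for_theta(theta)
    at_1c = effective_rate(k, theta, 1.0)
    at_2c = effective_rate(k, theta, 2.0)
    print(f"theta={theta:4.2f}  k={k:10.4f}  rate at 1 C={at_1c:.4f}  rate at 2 C={at_2c:.4f}")
```

Across these θ values the matching k changes by several orders of magnitude while the effective rate inside the observed temperature range changes only modestly, which is consistent with the long curved ridge seen in the preliminary chain.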

Fig. 9. Smoothed rate constant k with two levels of smoothing. The upper plot with lowess parameter h = 0.2 corresponds to about a 6-year trend, and the lower with h = 0.4 to about a 12-year trend. Gray levels give 50, 90, and 95% limits of the posterior.

Fig. 10. Smoothed respiration with two levels of smoothing, as in Fig. 9 for the k parameter. Gray levels give 50, 90, and 95% limits of the posterior.


4.3. Time evolution of lake winter respiration

The trend of the respiration rate on different time scales can be calculated as the derivative of a smoothed rate versus year curve. Using different smoothing parameters we can detect trends over different time periods. Smoothing could be done, for instance, with a moving average. We have used "lowess", a locally weighted regression smoother (Cleveland, 1985). In Figs. 9 and 10, posterior limits for smoothed values of the rate parameter k and for the average respiration are calculated together with the value of the trend (one-year change). The smoothing used here corresponds to an approximate 10-year trend. Note that here again we get posterior confidence limits of the trend, as we calculate the trend curves over the parameters sampled in the MCMC chain.
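The procedure of computing a trend curve for every chain sample and then reading off posterior limits can be sketched as below. For simplicity the sketch uses the moving average mentioned in the text rather than lowess, and the chain is synthetic; the window length, the artificial decline and the noise level are assumptions.

```python
import random


def moving_average(values, window=5):
    """Centered moving average; the window shrinks near the ends of the series."""
    half = window // 2
    out = []
    for i in range(len(values)):
        segment = values[max(0, i - half): i + half + 1]
        out.append(sum(segment) / len(segment))
    return out


def trend(values, window=5):
    """One-year change of the smoothed series."""
    smooth = moving_average(values, window)
    return [b - a for a, b in zip(smooth, smooth[1:])]


def pointwise_limits(rows, prob=0.95):
    """Lower and upper pointwise posterior limits over the rows of a chain."""
    n = len(rows)
    lo_idx = int((1 - prob) / 2 * n)
    hi_idx = n - 1 - lo_idx
    columns = [sorted(col) for col in zip(*rows)]
    return [c[lo_idx] for c in columns], [c[hi_idx] for c in columns]


rng = random.Random(2)
n_years, n_samples = 31, 2000
# Stand-in chain: each row holds one posterior sample of the 31 yearly rates,
# generated here with a slow artificial decline plus noise.
chain = [[0.03 - 0.0003 * y + rng.gauss(0.0, 0.003) for y in range(n_years)]
         for _ in range(n_samples)]

trends = [trend(row) for row in chain]      # one trend curve per chain sample
lower, upper = pointwise_limits(trends)
print(len(lower), lower[15], upper[15])
```

A year-to-year change would be called credibly non-zero in this sketch when the interval between `lower` and `upper` for that year excludes zero.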

Trend plots in Figs. 9 and 10 show statistically significant fluctuations and cyclic behavior in yearly values of the parameter k and the respiration in 5–10-year timescales. There seems to be a delay in the expected cyclic rise of the rate parameter k in the most recent years, and the overall long-time trend shows decreasing values for the rate. In the respiration, no overall trend over the whole study period can be detected.

4.4. Predicted oxygen regime in future winters

Combining the posteriors from the past years, we can have data-based priors for the rate parameter k, for the initial concentration at the beginning of the winter and for the feed term. With these distributions we can simulate possible O2 time series and calculate prediction intervals. As soon as we receive data, for example, a first observation for that winter, we can fit a new model with the information from the previous years as prior and obtain new posterior predictions. As new observations arrive we again update the model accordingly. Fig. 11 shows an example of how the posterior distribution of the parameter k evolves as new observations become available.

Fig. 11. Predicting a new winter. Four plots in the upper part show how prediction limits for the concentration decrease as more data becomes available. The lower plot displays how the posterior distribution of parameter k becomes more accurate with new observations.

To predict possible oxygen depletion and fish kills we want to know whether there is a risk of the oxygen concentration going below 4 g m−3. This is done by predicting the concentration at the end of the ice-covered period. As the length of the winter is unknown beforehand, we must allow uncertainty for it too. The distribution of the length was taken to be the empirical distribution of the observed past winters. Fig. 12 shows one prediction from the data in Fig. 11.
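A minimal Monte Carlo sketch of this risk calculation is given below. It assumes a constant effective rate over the winter so that the oxygen balance dC/dt = −rate·C + feed/vol has a closed-form solution; the parameter samples, the list of past winter lengths and the feed value are made up, and only the 4 g m−3 threshold comes from the text.

```python
import math
import random


def end_of_winter_conc(c0, rate, feed_per_vol, days):
    """Closed-form solution of dC/dt = -rate*C + feed_per_vol at t = days."""
    equilibrium = feed_per_vol / rate
    return equilibrium + (c0 - equilibrium) * math.exp(-rate * days)


def depletion_risk(param_samples, winter_lengths, threshold=4.0, rng=None):
    """Fraction of simulated winters ending below the threshold (g m-3)."""
    rng = rng or random.Random()
    below = 0
    for c0, rate, feed in param_samples:
        days = rng.choice(winter_lengths)    # empirical winter-length distribution
        if end_of_winter_conc(c0, rate, feed, days) < threshold:
            below += 1
    return below / len(param_samples)


rng = random.Random(3)
# Hypothetical posterior samples of (initial concentration, effective rate, feed/vol).
samples = [(rng.gauss(12.0, 0.5), abs(rng.gauss(0.008, 0.002)) + 1e-6, 0.0)
           for _ in range(10000)]
past_winters = [95, 110, 120, 125, 130, 140, 150]   # made-up lengths in days

risk_no_feed = depletion_risk(samples, past_winters, rng=rng)
risk_with_feed = depletion_risk([(c0, r, 0.03) for c0, r, _ in samples],
                                past_winters, rng=rng)
print(risk_no_feed, risk_with_feed)
```

Repeating the calculation over a grid of feed values would give the kind of curve needed to choose an aeration level for a target risk.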

The prediction procedure can be used to plan the aeration measures, to optimize aerator efficiency and to help in the real-time process control of aerators. Monte Carlo simulations for new years using the model and the posterior distributions, together with simulated values for aerator efficiency and temperature profiles, give us predictive distributions of possible oxygen depletion scenarios. This can be done "on-line" with new observations.

Fig. 12. Predicting a new winter. The first plot shows predictive posterior distributions for the amount of oxygen in the water at the end of winter. The four distributions correspond to the four observations in Fig. 11. The middle plot shows the probability that the concentration will go under 4 g m−3 after we have observed the second concentration. The plot on the right shows how the estimated fresh oxygen feed that would be needed to keep the oxygen amount above 4 g m−3 depends on the length of the winter after the second observation.

5. Discussion and summary

In this study, the estimation of the long-term evolution of lake winter respiration and the prediction of the lake oxygen regime in future winters were used as examples of how uncertainties can be taken into account with the Bayesian approach and MCMC methods.

The respiration, the rate of oxygen depletion per unit of hypolimnetic surface, has been used as a trophic state index (Hutchinson, 1938; Mortimer, 1941). The mean estimated respiration in the whole study period was 301 ± 105 mg m−2 d−1, which is on the upper limit of winter respiration of eutrophic Canadian lakes on the same latitude reported by Welch and Bergmann (1985).

The rate parameter k, which is the reference rate of the respiration (d−1) at 4 °C, indicated cyclic behavior with a period of about 9 years and some signs of a global long-term decrease. It had a statistically significant negative trend throughout the study period, indicating a decrease in respiration rate independent of temperature. Coupling this with the fact that the temperature increased in the 1990s, it can be hypothesized that respiration would have decreased earlier and more steeply if the temperature had remained stable.

The temperature dependency θ was also estimated from the data. Due to very little variability in the observed temperatures during the winters and to the high nonlinear correlation between it and the rate parameter k, the temperature effect could not be identified by itself. Prior information for θ obtained from a separate laboratory experiment solved the unidentifiability and allowed a proper posterior distribution for it.

The poor agreement between the prior and the posterior of the oxygen feed term of the model (Fig. 4) in the first aeration period 1973–1982 suggested that the ability of the aerator to dissolve oxygen into the water was overestimated at that time. In the later aeration periods, this was not the case.

The reduction of the waste water loading in 1979 did not have an immediate impact on respiration rates. The overall decrease of respiration rate and of its confidence limits could have been a response to restoration measures and stabilization of lake trophic state.

The Bayesian approach depends on the ability to define prior distributions on all the unknowns of the model. In ecological modelling the priors can be based on previous experiments and observations and also on the physical constraints of the phenomena. The demand for quantifying the uncertainties forces a fruitful dialog between modellers and specialists.

The Markov chain Monte Carlo (MCMC) method is a computational tool for Bayesian modelling. It can handle complicated and over-parametrized environmental models with confounding, badly identifiable effects. Prior information is combined with the model and the data to produce posterior information and can be used in analysis and predictions that take into account all the uncertainties in the model and the data.

The benefits coming from using Bayesian estimation are that we are able to combine and pool information from different sources (laboratory experiments and lake data) and to quantify the uncertainties with a full statistical approach using prior and posterior distributions.

We can predict the future winters using the posterior information coming from past observations and combine it with new observations to produce predictive posterior distributions. This allows us, for example, to dimension the aerator to secure a target level of oxygen concentration with a given margin of safety.

The unidentifiability of model parameters might prevent us from separating the effects of parameters but does not affect the model predictions, because Bayesian computations take the full multidimensional distributions of the parameters into account without resorting to linearizations or other approximations.

Acknowledgments

This research has been funded by the Academy of Finland. We are grateful to Mauri Pekkarinen from Central Uusimaa Federation of Municipalities for the Water Protection and Jarmo Vaariskoski from the Uusimaa Regional Environment Centre for transmission of data. The head of the Water Resources Laboratory, Professor Pertti Vakkilainen, and Mr. Hannu Sirviö from the Finnish Environment Institute are warmly thanked for their help in the preparation phase of this research.

References

Adams VanHarn, B.A., 1998. Parameter distributions for uncertainty propagation in water quality modeling. Ph.D. thesis, Department of Environment, Graduate School of Duke University.

Annan, J.D., 2001. Modelling under uncertainty: Monte Carlo methods for temporally varying parameters. Ecol. Model. 136, 297–302.

Anonymous, S., 1984. Tuusulanjarven kunnostussuunnitelma. Keski-Uudenmaan vesiensuojelun kuntainliitto (Restoration plans for Lake Tuusulanjarvi), Vantaa, Finland (in Finnish).

Borsuk, M.E., 2001. A graphical probability network model to support water quality decision making for the Neuse River estuary, North Carolina. Ph.D. thesis, Department of the Environment in the Graduate School of Duke University, in press.

Borsuk, M.E., Higdon, D., Stow, C.A., Reckhow, K.H., 2001a. A Bayesian hierarchical model to predict benthic oxygen demand from organic matter loading in estuaries and coastal zones. Ecol. Model. 143, 165–181.

Borsuk, M.E., Stow, C.A., Luettich Jr., R.A., Paerl, H.W., Pinckney, J.L., 2001b. Modelling oxygen dynamics in an intermittently stratified estuary: estimation of process rates using field data. Estuarine Coastal Shelf Sci. 52, 33–49.

Bowie, G., Mills, W., Porcella, D., Campbell, C., Pagenkopf, J., Rupp, G., Johnson, K., Chan, P., Gherini, S., Chamberlin, C., 1985. Rates, constants, and kinetic formulations in surface water modeling. Tech. Rep. EPA/600/3-85/040, US Environmental Agency, ORD, Athens, GA, ERL.

Brun, R., Reichert, P., Künsch, H., 2001. Practical identifiability analysis of large environmental simulation models. Water Resources Res. 37 (4), 1015–1030.

Cleveland, W.S., 1985. The Elements of Graphing Data. Wadsworth, Monterrey, CA.

Gamerman, D., 1997. Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference. Chapman & Hall.

Haario, H., Saksman, E., Tamminen, J., 2001. An adaptive Metropolis algorithm. Bernoulli 7 (2), 223–242.

Harmon, R., Challenor, P., 1996. A Markov chain Monte Carlo method for estimation and assimilation into models. Ecol. Model. 101, 41–59.

Hutchinson, G., 1938. On the relation between oxygen deficit and productivity and topology of lakes. Int. Rev. Gesamten Hydrobiol. 36.

Kokkonen, T., 1997. Parameter identification in groundwater models. Part I. Bayesian approach to inverse groundwater problem. Licentiate thesis, Faculty of Civil and Environment Engineering, Helsinki University of Technology, Helsinki, Finland, 106 pp.

Lappalainen, K., 1994. Positive changes in oxygen and nutrient contents in two Finnish lakes induced by Mixox hypolimnetic oxygenation method. Verh. Internat. Verein. Limnol. 25, 2510–2513.

Lehtoranta, J., Malve, O., 2001. Tuusulanjarven pohjasedimentin hapenkulutuskoe 26.4.–4.5.2001. Tech. rep., SYKE, Helsinki, Oxygen consumption rate of Lake Tuusulanjarvi sediments (in Finnish).

Lorenzen, M., Fast, A., 1977. A guide to aeration/circulation techniques for lake management. Tech. Rep. EPA-600/3-77-004, Corvallis Environmental Research Laboratory, Office of Research and Development, US Environmental Protection Agency, Corvallis, OR.

Mortimer, C.H., 1941. The exchange of dissolved substances between mud and water, 1 and 2. J. Ecol. 29.

Omlin, M., Brun, R., Reichert, P., 2001a. Biogeochemical model of lake Zürich: sensitivity, identifiability and uncertainty analysis. Ecol. Model. 141 (1–3), 105–123.

Omlin, M., Reichert, P., Forster, R., 2001b. Biogeochemical model of lake Zürich: model equations and results. Ecol. Model. 141 (1–3), 77–103.

Omlin, M., Reichert, P., 1999. A comparison of techniques for the estimation of model prediction uncertainty. Ecol. Model. 115, 45–59.

Qian, S.S., Stow, C.A., Borsuk, M.E., 2003. On Monte Carlo methods for Bayesian inference. Ecol. Model. 159 (2), 269–277.

Reckhow, K.H., 2002. Bayesian approaches in ecological analysis and modeling. In: Canham, C.D., Cole, J.J., Lauenroth, W.K. (Eds.), The Role of Models in Ecosystem Science. Princeton University Press.

Reichert, P., Vanrolleghem, P., 2001. Identifiability and uncertainty analysis of the river water quality model no. 1. Water Sci. Technol. 43 (7), 329–338.

Scavia, D., 1980. Uncertainty analysis of lake eutrophication model. Ph.D. thesis, Environmental and Water Resources Engineering in the University of Michigan.

Welch, H., Bergmann, M.A., 1985. Winter respiration of lakes at Saqvaqjuac, N.W.T. Can. J. Fish. Sci. 25, 521–527.


Publication IV

Malve, O., Laine, M., Haario, H., Kirkkala, T. and Sarvala, J. 2006. Bayesian modelling of algae mass occurrences – using adaptive MCMC methods with a lake water quality model. In press, Environmental Modelling and Software, DOI:10.1016/j.envsoft.2006.06.016.

© 2006 Elsevier


Environmental Modelling & Software 22 (2007) 966–977
www.elsevier.com/locate/envsoft

Bayesian modelling of algal mass occurrences – using adaptive MCMC methods with a lake water quality model

Olli Malve a,*, Marko Laine b, Heikki Haario c, Teija Kirkkala d, Jouko Sarvala e

a Finnish Environment Institute, P.O. Box 140, FI-00251 Helsinki, Finland
b Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
c Technical University of Lappeenranta, Lappeenranta, Finland
d Water Protection Association of SW Finland, Finland
e Department of Biology, University of Turku, Turku, Finland

Received 21 September 2005; received in revised form 16 June 2006; accepted 24 June 2006

Available online 24 August 2006

Abstract

Our study aims to estimate confounded effects of nutrients and grazing zooplankton (Crustacea) on phytoplankton groups – specifically on nitrogen-fixing Cyanobacteria – in the shallow, mesotrophic Lake Pyhajarvi in the northern hemisphere (Finland, northern Europe, lat. 60°54′–61°06′, long. 22°09′–22°22′). Phytoplankton is modelled with a non-linear dynamic model which describes the succession of three dominant algae groups (Diatomophyceae, Chrysophyceae, nitrogen-fixing Cyanobacteria) and minor groups summed together as a function of total phosphorus, total nitrogen, temperature, global irradiance and crustacean zooplankton grazing. The model is fitted using 8 years of in situ observations and adaptive Markov chain Monte Carlo (MCMC) methods for estimation of model parameters. The approach offers a way to deal with noisy data and a large number of weakly identifiable parameters in a model. From our posterior simulations we calculate the lower limit for zooplankton carbon mass concentration (45 μg C L−1) and the upper limit for total phosphorus concentration (16 μg L−1) that satisfy with 0.95 probability our predefined water quality criteria (Cyanobacteria concentration during late summer period does not exceed the value 0.86 mg L−1). Within the observational range total phosphorus has marginal effect on Cyanobacteria compared to the zooplankton grazing effect, which is temperature-dependent. Extensive fishing efforts are needed to attain the criteria.

© 2006 Elsevier Ltd. All rights reserved.

Keywords: Lake eutrophication modelling; Bayesian modelling; Parameter identifiability; Adaptive Markov chain Monte Carlo

1. Introduction

The analysis of trophic interactions is central to nutrient loading capacity estimation. The main goal of this study is to capture the response of phytoplankton in a lake to variation in total phosphorus, total nitrogen, water temperature, global irradiance, and in total crustacean zooplankton biomass.

A dynamical model is constructed to describe the phytoplankton kinetics. The modelling approach is standard, but the key point here is to show how a proper statistical treatment enables us to draw reliable quantitative conclusions even in the presence of a high noise level in the data, a situation that hampers the modelling of biological systems. The estimation of the model parameters and predictions are performed according to the Bayesian paradigm. A multidimensional posterior distribution of the unknown parameters is constructed using the available prior information and by carefully stating the statistical assumptions concerning the observations. The practical problem of computational complexity of the calculations is solved using Markov chain Monte Carlo (MCMC) simulation together with up-to-date adaptive computational schemes to make the simulations as effective as possible.

Our observational set-up contains in situ lake observations only. Lake Pyhajarvi is a shallow, mesotrophic lake (Tables 1 and 2). Increased eutrophication of Lake Pyhajarvi has been the major concern since the late 1980s when Cyanobacteria blooms became more frequent. Sediment studies reveal that lake productivity started to increase in the 1950s, in response to the intensified cultivation and the use of industrial fertilizers.

* Corresponding author.

E-mail addresses: [email protected] (O. Malve), marko.laine@helsinki.fi (M. Laine), [email protected] (H. Haario).

O. Malve et al. / Environmental Modelling & Software 22 (2007) 966–977

1364-8152/$ - see front matter © 2006 Elsevier Ltd. All rights reserved.

doi:10.1016/j.envsoft.2006.06.016

Sarvala et al. (1998) have shown that the between-year variations of the chlorophyll a and phosphorus concentrations in Lake Pyhajarvi are associated with the changes in the total biomass of planktivorous fish. Strong stocks of planktivorous fish are accompanied by depressed zooplankton biomass, the practical disappearance of larger cladocerans, and high chlorophyll a levels. One-third of the total variation in chlorophyll a is attributed to changes in zooplankton biomass, and another third to the changes in phosphorus concentrations.

The commercial fishery keeps the standing stock of vendace (Coregonus albula), the dominant planktivore in Pyhajarvi, small and its water quality effects moderate (Sarvala et al., 2000). After the reduction of the vendace stock at the beginning of the 1990s, other fish stocks increased, deteriorating water quality. During the 1990s the fishing of smelt (Osmerus eperlanus), roach (Rutilus rutilus), ruffe (Gymnocephalus cernuus) and small perch (Perca fluviatilis) was subsidized and this fishing has successfully reduced the amount of smelt and roach (Sarvala et al., 2000). Several management efforts have been applied to improve the status of Lake Pyhajarvi and farmers have participated in the water protection projects started by the Southwest Finland Regional Environment Centre (SFREC) in 1991 and, since 1995, coordinated by the Pyhajarvi Protection Fund (Ventela et al., 2001).

2. Methods

2.1. Observational set-up

The water chemistry and hydrology of Lake Pyhajarvi have been monitored since the 1960s. Intensified monitoring of nutrient concentrations was started in 1980 by the Water Protection Association of SW Finland and continued from 1993 by the Southwest Finland Regional Environment Centre. Vertical profiles ranging 0–8 m were taken at the deepest point of the lake 6–8 times during the open water period in 1980–1991 and at 2-week intervals in recent years. Due to the openness and shallowness of the lake, there is no extended stratification during the summer. Nutrient and plankton concentrations are vertically and horizontally homogeneous most of the time (Sarvala and Jumppanen, 1988). Phytoplankton was sampled together with nutrients and counted with an inverted microscope at the Department of Biology, University of Turku (Sarvala et al., 2000). Zooplankton was sampled at approximately weekly intervals from surface to bottom with a 1 m tube sampler at ten locations selected with a stratified random design (Sarvala et al., 2000). Samples were concentrated with 25 or 50 mm mesh net, and combined by date in the laboratory. Crustacean zooplankton was enumerated at the Department of Biology, University of Turku, from samples until 50–200 individuals of each dominant species had been measured. Eight years of observations collected between 1992 and 2000 were used for this study. Our data set (Fig. 1) contains biomass concentration of Diatomophyceae, Chrysophyceae, nitrogen-fixing Cyanobacteria, minor groups of phytoplankton summed together, total phosphorus concentration (P_tot), total nitrogen concentration (N_tot), water temperature (T), global irradiance (I), grazing zooplankton biomass concentration (Z), and outflow rate (Q).

Table 1

Watershed characteristics of Lake Pyhajarvi

Total area (inclusive of lake's surface)    615 km²
River Ylaneenjoki                           234 km²
River Pyhajoki                              77.5 km²
Remaining area (small sub-basins)           149.5 km²

Table 2

Characteristics of Lake Pyhajarvi

Surface area             155 km²
Volume                   849 million m³
Mean depth               5.4 m
Maximum depth            26 m
Coastline                110 km
Water residence time     3–5 years

2.1.1. Dominant species of Cyanobacteria in Lake Pyhajarvi in 1992–2000

In 1992–1998 all major cyanobacterial blooms were dominated by Anabaena flos-aquae (Lyngb.) Breb., but several other species were moderately abundant: Anabaena lemmermannii P. Richter, Anabaena planctonica Brunnt., Planktothrix rubescens and other Oscillatoriales, Microcystis reinboldii (Richter) Forti, Microcystis wesenbergii (Kom.) Starm., Woronichinia compacta (Lemm.) Kom. and Hind., Merismopedia elegans A. Braun, Snowella lacustris (Chod.) Kom. and Hind., and Chroococcus limneticus Lemm. Gloeotrichia echinulata (J.S. Sm.) P. Richter became more abundant beginning from 1994, but was never dominant in the open lake. Towards the end of the study period, there was a change in species composition and an increase in the number of species (typically >20 cyanobacterial species were determined from each sample). In 1999, Anabaena planctonica Brunnt., Anabaena curva Hill, Cyanodictyon reticulatum (Lemm.) Geitl., and Aphanothece clathrata W. and G.S. West became dominant.

2.2. Modelling of phytoplankton dynamics

The trophic correlation analyses (Helminen and Sarvala, 1997; Sarvala et al., 1998) have revealed that the variation of the late summer phytoplankton biomass in Lake Pyhajarvi is regulated both by bottom-up (total phosphorus) and top-down (planktivorous fish and zooplankton) forces. In the years with a strong year-class of age-0+ vendace the total zooplankton biomass has been depressed and the grazing pressure by zooplankton diminished, allowing an increase in phytoplankton biomass (Helminen and Sarvala, 1997). Only top-down regulation of zooplankton biomass has been detected in Lake Pyhajarvi. The absence of bottom-up regulation of zooplankton by the phytoplankton simplifies the model and enables us to focus our model on phytoplankton only.

The model is relatively standard in specification. The linear trophic correlations previously estimated by Sarvala et al. (1998) and Helminen and Sarvala (1997) are modelled with a first-order reaction term for algal growth, respiration, settling and death by predation. For a recent account, see Zeng et al. (2006). The growth rate coefficient varies in response to temperature, nutrients and light. The non-predatory loss rate is also temperature dependent. The temperature dependency is expressed in exponential form, commonly used in surface water quality modelling (Bowie et al., 1985).

We use the Michaelis–Menten equation to model the total phosphorus and total nitrogen limitation of growth. This equation responds linearly to concentration at middle nutrient levels and approaches a constant value at elevated nutrient values. The limitation by light is similarly calculated. The combined effect of temperature, light and nutrients is calculated multiplicatively. Non-predatory losses that contribute to the loss rate of phytoplankton (respiration, excretion and settling) are coupled together. The grazing by crustaceans, the main grazing zooplankton in Lake Pyhajarvi, also contributes to the decay of the phytoplankton. The grazing loss is a product of zooplankton filtration rate p_i, crustacean zooplankton and phytoplankton biomass concentrations (Bowie et al., 1985). The temperature and half-saturation effects are omitted here.

Fig. 1. Time series of observed variables used in the model: biomass concentration (mg L⁻¹) of A_1, Diatomophyceae; A_2, Chrysophyceae; A_3, nitrogen-fixing Cyanobacteria; A_4, minor groups of phytoplankton summed together; P_tot, total phosphorus concentration (mg L⁻¹); N_tot, total nitrogen concentration (mg L⁻¹); T, water temperature (°C); I, global irradiance (W m⁻²); Z, grazing zooplankton biomass concentration (mgC L⁻¹); Q, outflow rate (m³ s⁻¹). Time axis of each panel: 1992–2000.

Growth and decay mechanisms are integrated into a minimal mass-balance equation system for the wet weight concentration of algae. No spatial variations are taken into account, so the lake is modelled as a continuously stirred tank reactor (CSTR). The use of this kind of model is supported by the previous analyses of the trophic interactions in this lake by Sarvala et al. (1998). Phytoplankton is divided into four groups (which are marked as A_i, i = 1, ..., 4 in the equations). These are chosen to describe the succession of three dominant groups plus an inhomogeneous group that consists of minor species of phytoplankton. The groups are Diatomophyceae, Chrysophyceae, nitrogen-fixing Cyanobacteria and the minor groups. So we arrive at the system of equations

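$$\frac{dA_i}{dt} = \left( \tilde{m}_i - \frac{\tilde{s}_i}{h} - \frac{Q}{V} - p_i Z \right) A_i, \qquad i = 1, 2, 3, 4 \tag{1}$$

with

$$\tilde{m}_i = m_i \, q_i^{\,T - T_{\mathrm{ref}}} \, \frac{I}{K_{Ii} + I} \, \frac{P}{K_{Pi} + P} \, \frac{N}{K_{Ni} + N}, \qquad \tilde{s}_i = s_i \, q_s^{\,T - T_{\mathrm{ref}}}. \tag{2}$$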


Here P and N denote the total amounts of phosphorus and nitrogen minus that included in the phytoplankton:

$$P = P_{\mathrm{tot}} - \sum_{i=1}^{4} a_i A_i, \qquad N = N_{\mathrm{tot}} - \sum_{i=1}^{4} b_i A_i, \tag{3}$$

where the constants a_i and b_i give the nutrient content of the corresponding phytoplankton species. The notation is explained in Table 3. Note that the growth term could contain yet another limitation factor, the suspended solids concentration, which increases light attenuation in water and decreases the growth of phytoplankton. Due to the small observed variation in suspended solids and their rather constant effect on phytoplankton, the effect of suspended solids is omitted from the model.
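As an illustration, the right-hand side of eqs. (1)-(3) can be written as a short Python function. This is a minimal sketch, not the Matlab/Fortran code used in the study: the function names, the dictionary layout of the parameters and control variables, and the conversion of Q from m³ s⁻¹ to m³ d⁻¹ are assumptions made for the example.

```python
SECONDS_PER_DAY = 86400.0  # assumption: Q is given in m3/s, rates in 1/d


def limitation(x, k):
    """Saturation factor x / (k + x) used for light and nutrients."""
    return x / (k + x)


def available_nutrient(total, content, algae):
    """Eq. (3): total nutrient minus the share bound in phytoplankton."""
    return total - sum(c * a for c, a in zip(content, algae))


def rhs(algae, par, ctrl, t_ref=20.0):
    """Eq. (1): dA_i/dt for all groups at one time instant.

    algae: list of A_i
    par:   per-group lists m, s, q, KI, KP, KN, p, a, b and the scalar qs
    ctrl:  control variables T, I, Ptot, Ntot, Z, Q and constants V, h
    """
    P = available_nutrient(ctrl["Ptot"], par["a"], algae)
    N = available_nutrient(ctrl["Ntot"], par["b"], algae)
    dT = ctrl["T"] - t_ref
    flushing = ctrl["Q"] * SECONDS_PER_DAY / ctrl["V"]  # Q/V in 1/d
    out = []
    for i, A in enumerate(algae):
        # eq. (2): temperature, light and nutrient effects multiply
        growth = (par["m"][i] * par["q"][i] ** dT
                  * limitation(ctrl["I"], par["KI"][i])
                  * limitation(P, par["KP"][i])
                  * limitation(N, par["KN"][i]))
        loss = par["s"][i] * par["qs"] ** dT
        grazing = par["p"][i] * ctrl["Z"]
        out.append((growth - loss / ctrl["h"] - flushing - grazing) * A)
    return out
```

The function returns the derivatives only; any standard ODE integrator could be wrapped around it to obtain the modelled concentrations over one open water period.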

2.3. Parameter estimation procedure

In general, we must assume that the mathematical formulation of the model can adequately describe the mean behaviour of the biological system with different environmental conditions (different control variables, see below). The model is parameterized in such a way that it gives enough flexibility to fit the observed data. The parameters should also have meaningful physical and biological interpretations.

There is a tendency to overparameterize an environmental model by including many assumed biological or physical mechanisms in it. Although our model is reduced, the equations (1), (2) and (3) reveal that we have, in principle, 10 parameters to be estimated for each of the four phytoplankton groups. In addition, the initial spring values of the algae are noisy measurements for each of the eight periods measured and should also be treated as unknowns. Thus, a total of 72 unknowns are to be estimated. Many of the parameters are clearly correlated. Moreover, both the control variables and the response data (Fig. 1) have high noise levels. Obviously, it is not possible to accurately estimate the parameter values in such a situation. Some of the parameters may be considered as known from the literature or independent measurements. Another natural remedy is to reduce the model. As noted above, we have done this to some extent, e.g. by dropping the limiting factor of suspended solids. These measures, however, entail the risk of introducing bias to the parameter values, or overly simplifying the model. We will here discuss an approach that provides a way to achieve reasonable results, even in the presence of noisy data and an overparameterized model.

Table 3

Notations and units for the model parameters, data variables and constants

Model parameters
m_i         maximum growth rate at 20 °C (d⁻¹)
s_i         maximum non-predatory loss rate at 20 °C (m d⁻¹)
q_i, q_s    temperature coefficients for growth and non-predatory loss rate
K_Ii        global irradiance half-saturation coefficient (W m⁻²)
K_Pi        phosphorus half-saturation coefficient (mg L⁻¹)
K_Ni        nitrogen half-saturation coefficient (mg L⁻¹)
p_i         zooplankton filtration rate (L mgC⁻¹ d⁻¹)
a_i         phosphorus content of A_i
b_i         nitrogen content of A_i

State and control variables
A_i         phytoplankton concentration, i = 1, 2, 3, 4 (mg L⁻¹)
P           total phosphorus concentration available for the phytoplankton (mg L⁻¹)
P_tot       total phosphorus concentration (mg L⁻¹)
N           total nitrogen concentration available for the phytoplankton (mg L⁻¹)
N_tot       total nitrogen concentration (mg L⁻¹)
Z           zooplankton herbivore (crustacean) carbon mass concentration (mgC L⁻¹)
T, T_ref    temperature, the reference temperature (20 °C)
Q           outflow (m³ s⁻¹)
I           global irradiance (W m⁻²)

Constants
V           volume of lake (m³)
h           depth of lake (m)

The estimation of parameters is done according to the Bayesian paradigm. We want to treat all the uncertainties in data as well as the modelling results as statistical distributions. Instead of a single fit to the data, we want to determine "all" the parameterizations of the model that adequately fit the data (see Figs. 3 and 4). The posterior distributions of the parameters show how well they are identified, and the predictive distributions reveal to what extent the parameter uncertainty is relevant with respect to the model predictions. This, together with prior constraints for the parameters, offers a way to accommodate even "too many" or unidentifiable parameters in a model, provided that the computational routines can handle such situations.

In practice, the approach consists of three key ingredients: firstly, a formulation of the a priori knowledge about the parameter values in the form of prior probability distributions; secondly, a careful statistical analysis of the errors in the measurement data; and thirdly, the use of modern MCMC (Markov chain Monte Carlo) sampling methods to generate the posterior probability distributions of the parameters and modelling results.
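In symbols, the three ingredients are tied together by Bayes' formula. The notation here is an assumption made for illustration: y stands for the observations, q for the vector of unknowns, p(q) for the prior and p(y | q) for the likelihood implied by the error model.

$$p(q \mid y) \;\propto\; p(y \mid q)\, p(q)$$

MCMC sampling needs the right-hand side only up to a constant, so the normalizing integral over q never has to be computed.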

We follow an "objective" Bayesian paradigm. The priors are taken as assumptions on the model and the data. Naturally, good priors are not always available in advance. Their applicability can be tested by using the predictive distributions and comparing them to present data and future observations. If our model does not produce realistic predictions we must either change the model or look for other specifications of the priors. The question is how to retain a meaningful interpretation of the model in terms of interesting features of the system but not to force the unidentified parameters to some given values. If the parameters are well identified by the data, their posterior distribution will reflect the fact and give precise estimates for them. On the other hand, if the information provided by the data for a parameter is weak then its posterior will follow the prior values. Sometimes we can have very precise priors, for example by biological or physiological considerations or by independent previous experiments. If no accurate prior information is available, specifications of the priors must be vague enough not to cause bias in the predictions.

The MCMC computations and adaptive MCMC strategies are demonstrated and described in Malve et al. (2005) and Haario et al. (in press). MCMC is one of the main trends in computational statistics at the moment (Gelman et al., 1995). Still, certain difficulties may be faced when applying the methodology. The computational problems arise from the presence of many correlated parameters. As a remedy, we have developed adaptive MCMC methods that make the procedure statistically efficient and reduce the need for exhaustive hand tuning of the algorithm. A typical obstacle in applying MCMC methods is to find a suitable "proposal" distribution for the generation of new samples. Instead of using a fixed proposal distribution our approach adapts the proposal during the sampling according to the Adaptive Metropolis algorithm (AM). In addition, a number of different scales of the proposal distribution are used employing the Delayed Rejection (DR) method (Haario et al., 2001, 2004, in press).
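The adaptation idea can be illustrated with a short Python sketch. It is not the Matlab toolbox mentioned in the text: the function names, the default tuning constants (init_scale, adapt_start, eps) and the pure-Python Cholesky factorization are assumptions made for the example, and the Delayed Rejection step is left out. The proposal covariance is taken as s_d (Cov + eps I) with s_d = 2.4²/d, the scaling used by Haario et al. (2001), where Cov is the empirical covariance of the chain so far.

```python
import math
import random


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def adaptive_metropolis(log_post, start, n_iter, init_scale=0.1,
                        adapt_start=200, eps=1e-8, seed=0):
    """Random-walk Metropolis with a proposal covariance learned from the chain.

    log_post must return a finite value at `start`; it may return
    -math.inf outside the support of the prior.
    """
    rng = random.Random(seed)
    d = len(start)
    sd = 2.4 ** 2 / d
    x, lp = list(start), log_post(start)
    mean = list(x)                        # running mean of the chain
    cov = [[0.0] * d for _ in range(d)]   # running covariance of the chain
    low = [[init_scale if i == j else 0.0 for j in range(d)] for i in range(d)]
    chain = [list(x)]
    for n in range(1, n_iter):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        prop = [x[i] + sum(low[i][k] * z[k] for k in range(i + 1))
                for i in range(d)]
        lp_prop = log_post(prop)
        # Metropolis acceptance step for a symmetric proposal
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            x, lp = prop, lp_prop
        chain.append(list(x))
        # recursive update of mean and covariance with the new point
        delta = [x[i] - mean[i] for i in range(d)]
        mean = [mean[i] + delta[i] / (n + 1) for i in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += (delta[i] * (x[j] - mean[j]) - cov[i][j]) / (n + 1)
        if n >= adapt_start:
            low = cholesky([[sd * (cov[i][j] + (eps if i == j else 0.0))
                             for j in range(d)] for i in range(d)])
    return chain
```

Refactoring the covariance at every iteration keeps the sketch short; in practice the proposal would typically be updated only at intervals.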

The calculations are done using a combination of Matlab and Fortran subroutines. We have developed a Matlab "Toolbox" for the adaptive MCMC computations. It is available upon request from the authors.

2.3.1. Errors in the measurements

The modelled algal concentrations can be expressed as

$$A_i^{\mathrm{mod}}(t) = f_i(t, x_t, q), \tag{4}$$

where f_i is the numerical solution of the algal dynamics eq. (1), i = 1, ..., 4 are the four algal groups, t is time, and x_t is a vector of time dependent control variables describing the environmental conditions (Fig. 1). The vector q is a time independent set of the model parameters that also includes the concentrations at the initial time t_0.

We use a Gaussian distribution with a square root transformation as an approximation to the true error structure. Some of the observed algal biomasses are low in the beginning of the summer period and much higher during the bloom. The variability in the observations increases with their magnitude. The variance of the original observed values can be seen to be approximately proportional to the observed mean value. The square root transformation stabilizes the residual variance in this case.
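The stabilizing effect can be checked with a first-order (delta method) approximation. This is a standard textbook argument added for illustration; the proportionality constant c is a hypothetical symbol that is not used elsewhere in the paper.

$$\operatorname{Var}(A) \approx c\,\mathrm{E}(A) \quad\Longrightarrow\quad \operatorname{Var}\bigl(\sqrt{A}\bigr) \approx \left( \frac{1}{2\sqrt{\mathrm{E}(A)}} \right)^{2} \operatorname{Var}(A) = \frac{c}{4}$$

Under this approximation the variance on the square root scale no longer depends on the mean level, which is what a constant error standard deviation requires.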

The raw measurements are counts calculated using a microscope, and the biomass concentrations are derived values of these counts. However, each algal group is a combination of several different species of algae that vary during the summer and between the years. This makes the use of a Poisson distribution, typically used with count data, infeasible. The actual error term of the model will contain all the unexplained factors and thus includes several sources of errors other than the pure observational error due to the measurements, which also suggests the use of a Gaussian distribution. It will also produce realistic simulated observations for the range of the time and for different environmental conditions under consideration (Section 3.2).

Our non-linear model (1) describes the dynamics of the system in a non-transformed scale so the model has to be transformed accordingly for the fitting procedure. Thus the observational error, used for calibrating the model parameters against data, is modelled as a Gaussian random variable in such a way that it is additive with respect to the square root of the modelled concentrations:

$$\sqrt{A_i^{\mathrm{obs}}(t)} = \sqrt{A_i^{\mathrm{mod}}(t)} + \varepsilon_i, \tag{5}$$

where ε_i is approximated by a zero mean Gaussian variable, independent of each other and of the observational time t, with a constant standard deviation that may be different for each algal group i.
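The error model of eq. (5) translates directly into a Gaussian log-likelihood on the square root scale. The following Python function is a minimal sketch for one algal group; the function and argument names are assumptions made for the example.

```python
import math


def log_likelihood(obs, mod, sigma):
    """Gaussian log-likelihood of eq. (5) for one algal group.

    obs, mod: observed and modelled concentrations at the same times
    sigma:    error standard deviation on the square root scale
    """
    n = len(obs)
    # sum of squared residuals between square-root transformed values
    ss = sum((math.sqrt(o) - math.sqrt(m)) ** 2 for o, m in zip(obs, mod))
    return -0.5 * n * math.log(2.0 * math.pi * sigma ** 2) - 0.5 * ss / sigma ** 2
```

Summing this quantity over the four groups, each with its own sigma, gives the likelihood part of the log-posterior.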

2.3.2. Definition of prior values for parameters and initial values

The strongest form of a prior is to assume that a parameter is a fixed constant, as obtained, e.g., from literature. Alternatively, a "fixed" constant may be treated as a parameter with a narrow prior distribution. If the posterior distribution then coincides with the prior, we may be satisfied with the prior value, otherwise we might question how well known the constant actually is. If no prior value is known or if we want the posterior value to depend solely on the observed data, a flat "non-informative" uniform prior is preferred, perhaps with a positivity constraint as with many of the parameters in the present case. Nevertheless, every new parameter increases the dimension of the vector to be sampled, and increases the computational burden.

Strong systematic changes in the concentrations actually only happen in the Cyanobacteria group (Fig. 1 and later Fig. 4). Thus, we expect that the parameters of that group will be better identified, while the posteriors for the parameters of the other groups will more likely resemble the respective prior distributions.

For the initial algal concentrations A_i(t_0) there are measured values available for each spring, with the measurement noise known to the extent described in the previous section. Gaussian priors are used, with the measured values as the centre points and variances taken from the measurement noise level. The assumed accuracies of algal concentrations for the groups 1, 2 and 4 are 50%, 30%, and 30% of the observed value, respectively, while for the Cyanobacteria group 3, which has very low initial spring concentrations, we use an absolute value of 0.02 mg L⁻¹ as the standard deviation.

For the maximum growth rates m_i, non-predatory loss rates s_i and zooplankton filtration rate p_i we use non-informative priors, with positivity constraints only.

Estimates of the temperature coefficients q_i of the growth rates differ widely (between 1.0 and 1.5) in the literature, see for example Bowie et al. (1985) p. 303. The standard temperature dependency relation used here is biologically meaningful only for a limited range of the values for the coefficients, and we specify more informative priors for them. The priors are approximated by Gaussian distributions so that they agree with the temperature-growth curves originally published by Canale and Vogel (1974). We use the priors N(1.01, 0.08²), N(1.07, 0.08²), N(1.16, 0.08²) and N(1.07, 0.08²) for the parameter q_i for the four algal groups in the model, respectively, with lower bound 1 for all of them to allow only a positive effect of the temperature. The prior for the nitrogen-fixing Cyanobacteria is highest, suggesting that it is the most sensitive with respect to temperature. For a prior value for the temperature coefficient q_s of the non-predatory loss rate we use N(1.20, 0.09²) according to Bowie et al. (1985).

For the half-saturation parameters we use wide Gaussian priors with additional positivity requirements. They agree with the rather wide literature ranges (Bowie et al., 1985) (Table 4).
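The prior specification for one algal group can be sketched as an unnormalized log-prior in Python. The function names and argument layout are assumptions made for the example; the constraints follow the text (uniform priors with positivity for m_i, s_i and p_i, a Gaussian prior bounded below by 1 for q_i, and wide Gaussian priors with positivity for the half-saturation constants).

```python
import math


def log_gauss(x, mean, sd):
    """Log-density of N(mean, sd^2) up to an additive constant."""
    return -0.5 * ((x - mean) / sd) ** 2


def log_prior_group(m, s, p, q, k_i, k_p, k_n, q_prior, k_priors):
    """Unnormalized log-prior for the parameters of one algal group.

    q_prior:  (mean, sd) of the prior for the temperature coefficient
    k_priors: dict with (mean, sd) for the keys "KI", "KP" and "KN"
    """
    # flat priors with positivity constraints
    if min(m, s, p) <= 0.0:
        return -math.inf
    # Gaussian prior for q with lower bound 1
    if q < 1.0:
        return -math.inf
    # wide Gaussian priors with positivity for half-saturation constants
    if min(k_i, k_p, k_n) <= 0.0:
        return -math.inf
    return (log_gauss(q, *q_prior)
            + log_gauss(k_i, *k_priors["KI"])
            + log_gauss(k_p, *k_priors["KP"])
            + log_gauss(k_n, *k_priors["KN"]))
```

For the Cyanobacteria group the prior values listed in Table 4 would be passed as q_prior=(1.16, 0.08) and k_priors={"KI": (100, 100), "KP": (10, 20), "KN": (10, 20)}.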

Lastly, we used fixed values for the phosphorus and nitrogen content parameters a_i = 4 and b_i = 27, i = 1, ..., 4. We are left with seven unknown parameters for each of the four algal groups and a common temperature coefficient for non-predatory losses q_s. In addition, we have the four initial algal concentration values for each of the eight summers, adding up to 61 parameters to be estimated.

The error term of the model, ε_i in eq. (5), was assumed to follow a Gaussian distribution with unknown variance. For the variance we use a standard non-informative conjugate prior defined by an inverse gamma distribution. Separate error variances are estimated for each of the four algal groups. The estimation of the four unknown error variances can be treated separately from the rest of the parameters due to the prior-posterior conjugacy property and they do not add to the computational burden (Gelman et al., 1995).
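The conjugacy property means that, given the current residuals on the square root scale, the error variance of a group can be drawn directly from an inverse gamma distribution. The Python sketch below shows this standard update; the hyperparameters shape0 and scale0 are hypothetical placeholders, since the values used in the study are not stated here.

```python
import random


def posterior_inv_gamma(residuals, shape0, scale0):
    """Conjugate update for the variance of zero-mean Gaussian residuals.

    Prior:     sigma^2 ~ Inv-Gamma(shape0, scale0)
    Posterior: sigma^2 ~ Inv-Gamma(shape0 + n/2, scale0 + SS/2)
    """
    n = len(residuals)
    ss = sum(r * r for r in residuals)
    return shape0 + 0.5 * n, scale0 + 0.5 * ss


def sample_variance(residuals, shape0, scale0, rng=random):
    """Draw sigma^2 from its conditional posterior."""
    shape, scale = posterior_inv_gamma(residuals, shape0, scale0)
    # if G ~ Gamma(shape, rate=scale) then 1/G ~ Inv-Gamma(shape, scale);
    # gammavariate expects (shape, scale), hence the reciprocal
    return 1.0 / rng.gammavariate(shape, 1.0 / scale)
```

A draw of this kind can be made after each MCMC step for the model parameters, so the variances do not enlarge the vector that the proposal distribution has to cover.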

2.4. Posterior simulation methods

Even with highly correlated parameters with large uncertainties, the predictions and simulations made with the model need not be imprecise. A parameter may be unidentifiable because the values of it have practically no impact on the model predictions. By performing the predictions repeatedly with sampled parameter values from the posterior, this issue is properly taken into account. Posterior values can also give us new insight into the strength of effects on plankton mass occurrences. This insight is valuable in the management of these mass occurrences. We are particularly interested in comparing effects of phosphorus, nitrogen and grazing zooplankton (Crustacea) on phytoplankton biomass.
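Repeating the prediction with sampled parameter values can be sketched as follows. The function below is an illustration under assumptions: `model` is a hypothetical stand-in for a model run that returns one modelled concentration, the error is added on the square root scale as in eq. (5), and clipping negative square roots at zero before back-transforming is a choice made for the example.

```python
import math
import random


def predictive_draws(chain, sigmas, model, rng=random):
    """One simulated observation per posterior sample.

    chain:  parameter vectors sampled by MCMC
    sigmas: matching error standard deviations (square root scale)
    model:  function mapping a parameter vector to a concentration
    """
    draws = []
    for theta, sigma in zip(chain, sigmas):
        root = math.sqrt(model(theta)) + rng.gauss(0.0, sigma)
        # assumption: negative roots are clipped before squaring
        draws.append(max(0.0, root) ** 2)
    return draws
```

The spread of the returned values combines parameter uncertainty and observation error, which is what a predictive distribution is meant to show.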

Table 4

Prior and posterior means and standard deviations of the estimated parameters of the phytoplankton model

Parameter   Prior mean   Prior SD   Posterior mean   Posterior SD
m_1         .            .          0.0886           0.043
m_2         .            .          0.0465           0.033
m_3         .            .          0.329            0.089
m_4         .            .          0.212            0.10
s_1         .            .          0.0845           0.062
s_2         .            .          0.137            0.077
s_3         .            .          0.349            0.20
s_4         .            .          0.0463           0.025
q_1         1.01         0.08       1.14             0.051
q_2         1.07         0.08       1.07             0.049
q_3         1.16         0.08       1.16             0.060
q_4         1.07         0.08       1.13             0.051
q_s         1.20         0.09       1.05             0.060
K_I1        100          100        61.9             48
K_I2        100          100        115              60
K_I3        100          100        16.4             11
K_I4        100          100        134              67
K_P1        2.5          10         10.4             5.3
K_P2        5            10         8.27             4.9
K_P3        10           20         5.50             3.2
K_P4        20           50         77.3             28
K_N1        10           20         14.5             11
K_N2        10           40         32.9             22
K_N3        10           20         21.0             13
K_N4        10           40         45.7             28
p_1         .            .          0.0438           0.036
p_2         .            .          0.0665           0.046
p_3         .            .          1.09             0.33
p_4         .            .          0.0802           0.033

A single dot (.) in the table means that a uniform prior with a positivity constraint is used for the corresponding parameter.


We want to see the effects of various environmental conditions (the control variables in the lake model) on the features of interest of the modelled algal system. One such feature is the mean Cyanobacteria concentration during the summer as this is a good indication of the water quality for recreational use, fisheries and specifically for the good ecological status of the lake. As we are studying what happens during one summer period of the lake we must specify the control variables as continuous profiles. The specification of these profiles for input into the simulations poses some challenges. We base the profiles on observed time series used in the estimation.

Minimum and maximum profiles are constructed from the observations as smoothed versions of corresponding observed values, for example the smoothed minimum daily temperatures over the years. This way we can have a continuous transition from minimum to maximum observed values, and each profile can still be summarized for example by its average over the simulation period. The profiles used in simulations are chosen to vary between the extremes, i.e.

$$\mathrm{Profile}(l) = \frac{1 - l}{2}\,\mathrm{minProfile} + \frac{1 + l}{2}\,\mathrm{maxProfile}, \tag{6}$$

where l is in the interval from −1 to 1. Naturally, any interesting or a priori known feature of the profile can be incorporated and used in "what if" analysis of the system.
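Eq. (6) is a pointwise linear blend of the two extreme profiles. A minimal Python sketch, in which the function name and the argument name lam (for the weight l) are assumptions made for the example:

```python
def profile(lam, min_profile, max_profile):
    """Eq. (6): blend of the minimum and maximum profiles, lam in [-1, 1]."""
    if not -1.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [-1, 1]")
    w_min = (1.0 - lam) / 2.0
    w_max = (1.0 + lam) / 2.0
    return [w_min * lo + w_max * hi for lo, hi in zip(min_profile, max_profile)]
```

With lam = -1 the minimum profile is returned, with lam = 1 the maximum profile, and lam = 0 gives their midpoint, so the simulated conditions stay within the observed range.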

We consider four variables: temperature, zooplankton, total phosphorus concentration and total nitrogen concentration. The profile of global irradiance was kept fixed to some realized profile. The nitrogen concentrations are relatively high through the whole summer and we do not observe any effect of total nitrogen on the maximum Cyanobacteria biomass concentration. Thus the simulations are done on the median total nitrogen profile level only. It should be noted that when constructing the simulation profiles in this way we do not need to extrapolate the model beyond the range of observed data. This allows us to see the effect of varying conditions in a way that agrees with the observed variation in water quality.

For the dimensioning of biomanipulation and nutrient reduction we want to define the conditions under which algal blooms are very unlikely. We formulate this such that there will be less than 5% posterior probability that the mean observed Cyanobacteria concentration during the late summer period (from July 26 to September 15) exceeds the value 0.86 mg L⁻¹. The date span for the averaging is chosen to be comparable with earlier studies (Sarvala et al., 1998). This is the period when the highest cyanobacterial concentrations normally occur and therefore these means are sensitive indicators of cyanobacterial abundance. Moreover, the current lake management guidelines are based on the average algal concentrations. The value 0.86 mg L⁻¹ is our subjective judgment for the boundary between good and acceptable ecological status of this particular lake.
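Given a set of simulated late summer means, one per posterior predictive draw, the criterion reduces to counting exceedances. The Python sketch below illustrates this; the function names are assumptions made for the example, and the defaults 0.86 and 0.05 are the threshold and risk level stated in the text.

```python
def exceedance_probability(late_summer_means, threshold=0.86):
    """Share of simulated late summer means above the threshold (mg/L)."""
    n_over = sum(1 for value in late_summer_means if value > threshold)
    return n_over / len(late_summer_means)


def meets_criterion(late_summer_means, threshold=0.86, max_prob=0.05):
    """True if the exceedance probability stays below the accepted risk."""
    return exceedance_probability(late_summer_means, threshold) < max_prob
```

Evaluating this check over a grid of zooplankton and total phosphorus profiles would indicate which combinations of conditions satisfy the water quality criterion.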

3. Results

3.1. Posterior distributions

We have to run MCMC sampling chains for a vector with dimension 61. Several of the parameters are highly correlated, which further complicates the task. Indeed, the use of basic MCMC schemes like the standard Metropolis-Hastings algorithm tends to lead to a rather time consuming "tuning" before an effective sampling distribution is found. The problem can be successfully dealt with using our adaptive schemes (Haario et al., 2001, in press). As a result, we obtain an MCMC chain of the sampled parameter values, a large matrix of samples from the multidimensional posterior distribution of the unknowns. From it we may generate several marginal posterior distributions of interest.

Using 8 years of observations of the water quality and hydrology of the lake, the parameters of eq. (1) are estimated. Table 4 gives the numerical values used for constructing the priors, as well as the mean and standard deviation values computed from the posterior sample. In Fig. 2 the prior and marginal posterior distributions of the model parameters are graphically presented. As expected, we can see that the parameters corresponding to the Cyanobacteria group in the third column differ most clearly from the prior distributions.

For growth (mi) and for non-predatory loss rate (si), forwhich a uniform prior was used, we can see that the standarddeviations of posterior distributions are quite high, but thevalues remain within published and biologically acceptablerange (Bowie et al., 1985).

For the temperature coefficients (θi) of the growth terms, rather informative priors were chosen, and the posteriors differ quite little from them. A priori, the coefficient for the Cyanobacteria group was taken to be a little higher than for the other groups, and the posterior values agree. However, taking into account the posterior standard deviations of the parameters, no difference can be found between the four coefficients. The posterior distribution of the temperature coefficient θσ for the non-predatory loss rate is located around somewhat smaller values (1.05 ± 0.06) than its prior distribution (1.20 ± 0.09). The ± uncertainties are posterior standard deviations.

The global irradiance half-saturation constants (KIi) have vague priors, and the posterior variances become smaller. Variations in half-saturation constants among the four phytoplankton groups were notable, Cyanobacteria having the lowest value, indicating that Cyanobacteria may tolerate lower light levels.

The half-saturation constants (KPi and KNi) of total phosphorus and nitrogen (the variables over which we can have some control) also had loose priors. The posteriors are flat, too, except for the posteriors of KP3 and KI3 for Cyanobacteria, which have rather well identified peaks. None of the other half-saturation parameters is identified by the data.

Except for the Cyanobacteria, the average posterior filtration rates (pi) (Table 4) are lower than the published range (0.7–1.4 L mgC⁻¹ d⁻¹) (Bowie et al., 1985). Cyanobacteria have a notably larger posterior mean (1.09 ± 0.33). This may indicate a greater effect of grazing on Cyanobacteria compared to the other groups, but it disagrees with the fact that some Finnish Cyanobacteria species are poisonous and that crustacean zooplankton does not favour them as food (Sandgren, 1988). It may also indicate some other, more indirect effect of zooplankton abundance on Cyanobacteria.

The posterior means and accuracies of the standard deviations of the observation error for the four algae groups are 0.25 ± 0.016, 0.14 ± 0.0093, 0.26 ± 0.017, and 0.15 ± 0.0096. These are the standard deviations of the term εi in eq. (5), so they correspond to the observation error on the square root scale.

As a summary of the one-dimensional parameter posterior distributions, we may say that the parameters for Cyanobacteria are better identified and have smaller standard deviations than those of the other three algae groups.

While the one-dimensional marginal posteriors are informative, they can only tell a partial truth of the multidimensional


972 O. Malve et al. / Environmental Modelling & Software 22 (2007) 966–977

[Figure 2: grid of density panels, one row per parameter type (μ1–μ4, σ1–σ4, θ1–θ4, KI1–KI4, KP1–KP4, KN1–KN4, p1–p4) and one column per phytoplankton group.]

Fig. 2. Prior (dashed line) and marginal posterior distributions (solid line) of the estimated model parameters. Uninformative priors are used for μi, σi and pi. μi, maximum growth rate at 20 °C (d⁻¹); σi, non-predatory loss rate at 20 °C (m d⁻¹); θi, temperature coefficients for growth and non-predatory loss rate; KIi, global irradiance half-saturation coefficient (W m⁻²); KPi, phosphorus half-saturation coefficient (µg L⁻¹); KNi, nitrogen half-saturation coefficient (µg L⁻¹); pi, zooplankton grazing coefficient (mgC L⁻¹ d⁻¹).

problem. In studying the identifiability and uncertainty in the estimated values of the model parameters, the inspection of the mutual correlations of the parameters can be revealing. Some of the correlations between the parameters are due to lack of information in the data, some are due to the mathematical structure of the model equations. As an example of the latter case, the correlation between the growth and loss parameters is readily seen: the terms μi and σi/h appear additively with opposite signs in the model equation (1). Adding the same constant to both terms does not change the model at all. The introduction of limiting factors due to temperature, nutrition and light in eq. (2) makes it possible to distinguish between the maximum growth and loss factors, given that the available data are informative. Similarly, although not quite as obviously, the factors limiting the growth terms appear in a multiplicative manner, which is bound to lead to correlations between the respective parameters. This makes it difficult to differentiate the individual effects of the various terms in the model.

In such situations, without any prior restrictions on the parameter values, the two-dimensional marginal posterior regions can be practically infinite. The use of reasonable priors prevents this, and the MCMC sampling is able to reveal the correlations. Fig. 3 exhibits examples of two-dimensional posterior distributions in our case.
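The additive cancellation between the growth and loss terms can be written out explicitly. The following is only a sketch under an assumed form of the net rate, since eq. (1) is not repeated in this section; A_i denotes the biomass of group i, and the constant c is introduced here purely for illustration.

```latex
% Assumed net-rate form without limiting factors (illustration only):
\frac{dA_i}{dt} = \left(\mu_i - \frac{\sigma_i}{h}\right) A_i + \dots
% Shifting both terms by the same constant c leaves the net rate unchanged:
\left(\mu_i + c\right) - \left(\frac{\sigma_i}{h} + c\right) = \mu_i - \frac{\sigma_i}{h}
```

Under this assumption the data alone determine only the difference of the two terms, so the joint posterior of μi and σi stretches along the direction in which both increase together, and only the priors and the limiting factors bound it.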

In the traditional, optimization-based model fitting it is customary to calculate the error bounds for the fitted parameters by linearizing the model at the best fitting argument value


[Figure 3: pairwise scatter plots with density contours for the Cyanobacteria parameters; axis labels μ3, σ3, θ3 and KP3.]

Fig. 3. Two-dimensional marginal posterior distributions for parameters μ3, σ3, θ3, and KI3 of the Cyanobacteria group (see Table 3 for explanations). The dots show the points in the MCMC chain from which the distribution contour lines (for the 50% and 95% regions of the distribution) are constructed using a statistical kernel density estimation method. Distributions drawn along the axes are the corresponding one-dimensional marginal densities.

and then using the formulas from the statistical theory of linear models. While this is always an approximate procedure for non-linear models, there is another, numerical pitfall, especially in cases with weakly identified, strongly correlated parameters: the Jacobian matrix constructed in the linearization becomes numerically singular, making the calculated correlation matrices singular, too, and preventing the calculation of the error bounds of the parameter estimates. In the MCMC approach, on the other hand, we are able to calculate the MCMC sampling chains, from which the correlation coefficients for individual parameter pairs are readily computed.
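The contrast between the two approaches can be made concrete with a toy case. The sketch below assumes a hypothetical model y = (a - b) t, in which only the difference a - b is identified. Its linearized normal matrix is exactly singular, while a sampled chain still yields a correlation coefficient directly. The synthetic "chain" is a stand-in generated for illustration, not output of a real sampler, and all names are assumptions.

```python
import math
import random

def normal_matrix_det(ts):
    """det(J^T J) for the toy model y = (a - b) * t, whose Jacobian rows are (t, -t)."""
    s = sum(t * t for t in ts)
    jtj = [[s, -s], [-s, s]]
    return jtj[0][0] * jtj[1][1] - jtj[0][1] * jtj[1][0]

def synthetic_chain(n=5000, seed=0):
    """Stand-in for posterior samples: a is bounded by its prior, a - b by the data."""
    rng = random.Random(seed)
    a_vals, b_vals = [], []
    for _ in range(n):
        a = rng.uniform(0.2, 0.6)   # prior range keeps a bounded
        d = rng.gauss(0.3, 0.02)    # the data pin down only the difference d = a - b
        a_vals.append(a)
        b_vals.append(a - d)
    return a_vals, b_vals

def pearson(xs, ys):
    """Sample correlation coefficient of two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)
```

With a zero determinant the linearized covariance cannot be inverted at all, whereas `pearson(*synthetic_chain())` returns a value close to one, which is the kind of information read off the chains in the text.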

In Bayesian statistical terminology the term predictive distribution refers to the future values of the response variable predicted by the posterior parameter distribution. This includes the distribution or uncertainty of the "fitted values", the modelled values for the observed data, and also the uncertainty in predicting new results. When using the MCMC sampling approach, the distributions are technically calculated in the same way in both cases: the model forecasts are calculated using sampled parameter values from their posterior distributions. This is repeated often enough that the distribution of the modelled values is reliably obtained. We will first discuss the distributions of the model fits; the use of the model for various simulation purposes is dealt with in Section 3.2.

The variance of the predictive distribution reflects the predictive power of the model. A large variance may be due to uncertainties in the model, effects not taken into account in the model or noise in measurements. In Fig. 4 we have presented the values that correspond to the 95% predictive intervals of the fitted model due to parameter uncertainty as a dark grey zone. The solid line is the median algal concentration (close to the best least squares fit). The light grey areas give the 95% prediction limits for the observations. They contain the parameter uncertainty, the uncertainties due to model "lack-of-fit" (the mis-specification of the model) and the noise of the measurements.
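The two kinds of bands can be sketched as follows. The example assumes a hypothetical one-parameter decay model and a made-up posterior for its rate; only the observation error standard deviation 0.26 on the square-root scale is borrowed from the values reported above. It illustrates the mechanics, not the model of eq. (1).

```python
import math
import random

def percentile(values, q):
    """Empirical quantile (nearest rank) of a list, for 0 <= q <= 1."""
    ordered = sorted(values)
    return ordered[int(round(q * (len(ordered) - 1)))]

def predictive_bands(t, n_draws=5000, obs_sd=0.26, seed=0):
    """95% limits of the fitted model and of new observations at time t."""
    rng = random.Random(seed)
    fitted, predicted = [], []
    for _ in range(n_draws):
        # Hypothetical posterior draw of one rate parameter (stand-in for a chain row).
        k = rng.gauss(0.5, 0.05)
        model_value = 2.0 * math.exp(-k * t)   # toy model, not the paper's eq. (1)
        fitted.append(model_value)
        # Observation error is added on the square-root scale and squared back;
        # the root is truncated at zero so that concentrations stay non-negative.
        root = max(math.sqrt(model_value) + rng.gauss(0.0, obs_sd), 0.0)
        predicted.append(root ** 2)
    return {
        "median": percentile(fitted, 0.5),
        "fit": (percentile(fitted, 0.025), percentile(fitted, 0.975)),
        "obs": (percentile(predicted, 0.025), percentile(predicted, 0.975)),
    }
```

The `fit` interval plays the role of the dark grey zone and the `obs` interval that of the light grey area; in this toy setting the latter is much wider because it also carries the observation noise.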

We can see from Fig. 4 that the model fits the rather noisy data sufficiently well, although the fit is not perfect. The predictive limits for the observations cover the data reasonably well. The same set of parameters was used to model each of the eight years. The cyanobacterial blooms were predicted by the model in every year in which they were actually observed. It is important to note that the predictive limits of the fitted model are far narrower than the predictive limits of the observations, even if many of the parameters are only weakly determined by the data. We shall return to this below when discussing the distributions of various simulation situations: the full prediction limits (model uncertainty plus observational variance) help us to quantify the risks associated with lake management decisions guided by such modelling predictions.

3.2. Posterior simulations

To see the effects of temperature (T), zooplankton (Z) and total phosphorus (Ptot) on nitrogen-fixing Cyanobacteria wet biomass concentration (A3), we perform simulations using the model, the estimated parameters, and varying control variable profiles.

The observed total nitrogen concentration (Ntot) is so high in all the observed years that it does not seem to affect the growth of Cyanobacteria. So the Ntot profile is kept at median


[Figure 4: grid of time-series panels for the years 1992–1999, with one panel per year for each phytoplankton group (diatoms, chrysophyceae, n.fix cyanob, minor); horizontal axes run over months 5–11.]

Fig. 4. Plots of fitted models and uncertainties during the growing season. Circles (○) present the observed algae wet biomass concentrations (mg L⁻¹). Solid lines show the median fits obtained by the MCMC method. Darker areas correspond to 95% posterior limits of the model uncertainty and the lighter areas show uncertainty in predicting new observations. The horizontal axis shows the months of the year.


observed level in all of the simulations. The profile of the fifth control variable in the model, global irradiance (I), is kept fixed to one observed profile.

Simulations of the summer maxima of A3 concentrations, conditioned on the levels of control variables, can easily be obtained to reveal their relative effects. We chose a value that reflects the water quality of the lake: the mean observed A3 concentration during the late summer period from July 26 to September 15. Again the simulations are done on a grid of varying Ptot, Z and temperature profiles. To take into account the uncertainty in both the parameters and the observations, the simulations are repeated with model parameters sampled from their posterior distributions and the observations sampled according to their estimated error distribution. The results can be visualized, for example, by plotting 3-dimensional mean A3 surfaces with averages of the Ptot and Z profiles on the x and y axes and with a separate surface for different temperature profiles. These results can be used to judge the effects of biomanipulation and nutrient reduction.

We can aid decision making by providing probability statements regarding some event of interest, given the control variable profiles. Fig. 5 displays the probability (obtained by the simulation described above) that the mean observed late summer Cyanobacteria concentration exceeds our predefined water quality criterion (0.86 mg L⁻¹) determining good ecological status. Combining the information in the surfaces of Fig. 5, we can produce a more compact representation, shown in Fig. 6. It may be used to estimate the limits on total phosphorus (upper limit) and zooplankton conditions (lower limit) for different mean temperatures that assure attainment of the chosen water quality criterion with 95% probability.
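The probability statements described above reduce to counting simulated outcomes. The sketch below assumes a hypothetical log-linear response of the late summer mean A3 to Ptot, Z and temperature, with invented coefficient distributions, purely to show how an exceedance probability and a minimum zooplankton level could be read off repeated simulations. It is not the phytoplankton model of this study, and every coefficient in it is an assumption.

```python
import math
import random

def exceedance_probability(draws, threshold=0.86):
    """Fraction of simulated late summer means that exceed the criterion."""
    return sum(1 for d in draws if d > threshold) / len(draws)

def simulate_mean_a3(ptot, z, temp, n_draws=4000, seed=0):
    """Simulated late summer mean A3 under a hypothetical toy response."""
    rng = random.Random(seed)   # same seed for every grid point (common random numbers)
    draws = []
    for _ in range(n_draws):
        # Hypothetical "posterior" draws of toy coefficients (NOT estimates from the paper).
        a = rng.gauss(math.log(0.6), 0.15)
        b_t = rng.gauss(0.30, 0.05)
        b_p = rng.gauss(0.02, 0.01)
        b_z = rng.gauss(0.03, 0.005)
        mean = math.exp(a + b_t * (temp - 18.0) + b_p * (ptot - 20.0) - b_z * (z - 45.0))
        # Observation noise on the square-root scale (assumed sd 0.1 for a seasonal mean).
        root = max(math.sqrt(mean) + rng.gauss(0.0, 0.1), 0.0)
        draws.append(root ** 2)
    return draws

def minimum_zooplankton(ptot, temp, z_grid=range(10, 85, 5), risk=0.05):
    """Smallest Z on the grid for which the exceedance probability is at most `risk`."""
    for z in z_grid:
        if exceedance_probability(simulate_mean_a3(ptot, z, temp)) <= risk:
            return z
    return None   # criterion not attainable within the grid
```

Evaluating `exceedance_probability` over a grid of Ptot and Z gives one probability surface per temperature, and tracing where it crosses 0.05 gives one limit line per temperature.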

Calculated limits indicate that with increasing temperature and total phosphorus more zooplankton is needed to compensate for their effects and to attain our Cyanobacteria criterion. For example, in a summer with an average temperature of 18.5 °C, the zooplankton carbon mass concentration should at a minimum be between 45 and 60 µgC L⁻¹, depending on the total phosphorus concentration. If the summer mean total phosphorus concentration

[Figure 6: lines of P(A3 mean > 0.86) = 0.05 in the plane of Z (µg L⁻¹, horizontal axis, 10–80) and Ptot (µg L⁻¹, vertical axis, 15–25), one line for each mean temperature: 16.5, 17, 17.5, 18, 18.5 and 19 °C.]

Fig. 6. Control variable limits to the exceedance of 0.86 mg L⁻¹ summer mean Cyanobacteria with 0.05 probability. Each line is for a different mean temperature profile. Ptot is total phosphorus concentration (µg L⁻¹) and Z is grazing zooplankton biomass concentration (µgC L⁻¹).

[Figure 5: six surface plots of the exceedance probability (prob, 0–1) over Z and Ptot, one panel for each temperature: 16.5, 17, 17.5, 18, 18.5 and 19 °C.]

Fig. 5. Probability that the summer mean Cyanobacteria level is greater than 0.86 mg L⁻¹. Ptot is total phosphorus concentration (µg L⁻¹), temp is water temperature (°C), and Z is grazing zooplankton biomass concentration (µgC L⁻¹).


can be limited below 16 µg L⁻¹, the mean zooplankton carbon mass concentration need not be over 45 µgC L⁻¹. As we see from our observational data in Fig. 1, the Z concentration is often below this limit, and more extensive fishing of planktivorous fish is needed to reach it. Within the observational range, total phosphorus has a marginal effect on Cyanobacteria compared to the zooplankton grazing effect, although the phosphorus effect increases slightly with temperature (Fig. 6). These results agree with the more qualitative results in Sarvala et al. (1998), where increased Z (due to fishing planktivorous fish) was also seen to be more effective than the reduction of total phosphorus, and also with recent experimental comparisons of the functioning of southern and northern lake ecosystems (Vakkilainen et al., 2004).

4. Discussion

Our study aims to estimate trophic effects of nutrients and grazing zooplankton (Crustacea) on phytoplankton groups, specifically on nitrogen-fixing Cyanobacteria, in Lake Pyhäjärvi using a non-linear phytoplankton model and MCMC computational methods for the parameter estimation. Experience from our earlier study (Malve et al., 2005) encouraged us to use the power of MCMC to statistically reveal the uncertainties in water quality predictions with this mathematically more complex model. We explore the common ground between statistical computational methods and mathematical process modelling and their relative usefulness in the management of lake eutrophication (Bärlund et al., 2007).

Numerous studies have been undertaken to examine nutrient and grazing effects on phytoplankton mass occurrences either with statistical ANOVA or least squares fitting of deterministic models. Studies have been based either on controlled laboratory and meso-scale experiments (Vakkilainen et al., 2004) or on monitoring data at the whole lake scale. In controlled experiments, the separation of confounded effects using statistical methods is far more efficient than in uncontrolled observational studies. The use of non-linear dynamic phytoplankton models and observational data, without proper statistical treatment, can be very inefficient and even misleading. Model-based simulation of phytoplankton dynamics can be used to study the average spatial and temporal patterns in a lake system. But to do so in a realistic and statistically sound manner we need careful examination of the uncertainties in the estimation of non-linear models and also realistic modelling of the observation error. The management of lake eutrophication, like the estimation of the frequency of nitrogen-fixing Cyanobacteria blooms, should be based on the best possible combination of mathematical and statistical methods available. This can be accomplished by using a mechanistic modelling approach with the MCMC methods.

Comparison of first order error analysis, Monte Carlo simulation and Kalman filtering of a non-linear, dynamic lake eutrophication model performed by Scavia (1980) revealed huge uncertainty ranges in predictions. Long after that, least squares point estimates of parameters and rough linear error approximations of predictions have been justified by the lack of any other computationally feasible alternatives (Omlin et al., 2001). The linearized approximations are generally valid only if the non-linearities in the model equations are not significant within the parameter uncertainty ranges, which is not generally true for lake eutrophication models. Sometimes model runs are so time-consuming that the sensitivity of the model predictions to parameter values can only be explored manually using published ranges (Malve et al., 2003). However, this change-one-parameter-at-a-time strategy has one serious drawback: we lose the information on the multidimensional correlations between the parameters. Implementation of the European Water Framework Directive has underlined the necessity for uncertainty analysis, as recently discussed by Saloranta et al. (2003). More sophisticated statistical methods are needed for the estimation of parameters and distributions of the model predictions.

The lack of suitable software and the need for problem-specific coding may prevent the use of MCMC methods. Also, MCMC computations are typically quite demanding of CPU time. The model must be simulated repeatedly to explore the posterior distributions of the parameters. A large number of parameters increases the number of simulations needed. The current model, with more than 60 estimated parameters, was simulated several million times, each simulation containing a numerical solution of a set of differential equations.

Although the technical details are not fully explained here, we have used, in principle, quite standard and easily applicable computational methods for the posterior calculations. The exception is that we have developed a suite of adaptive methods that complement the standard Metropolis–Hastings MCMC algorithm, making it more robust and statistically efficient without the need for overwhelming "hand tuning" of the algorithms. Computational details are discussed in a companion article (Haario et al., in press). It has become clear to us, with this present application and with several similar biological and geophysical applications, that this methodology is very useful. It gives accurate and more meaningful results in problems and models that are more complicated than those usually used for statistical estimation in environmental sciences. We are positive that in the future the MCMC methodology will allow more realistic models to be used and the risks in their predictions to be estimated more accurately. Also, these methods will better reveal the problems with overly parameterized models and poor quality of the observations.

Acknowledgements

This research was partly funded by the Academy of Finland's MaDaMe project Development of Bayesian methods with applications in geophysical and environmental research, and also by the Land and Water Technology Foundation, Finland (Maa- ja vesitekniikan tuki ry). Lake data were derived from the archives of the Pyhäjärvi Protection Fund, Southwest Finland Regional Environment Centre, and the University of Turku. Our special thanks are due to Kristiina Vuorio for providing detailed data on Cyanobacteria and other phytoplankton. Data collection and analyses were partly financed through a series of grants from the Academy of Finland to J.S. (e.g. 35619, 63764, 201414).

References

Bowie, G., Mills, W., Porcella, D., Campbell, C., Pagenkopf, J., Rupp, G., Johnson, K., Chan, P., Gherini, S., Chamberlin, C., 1985. Rates, constants, and kinetic formulations in surface water modeling. Tech. Rep. EPA/600/3-85/040, U.S. Environmental Agency, ORD, Athens, GA, ERL.

Bärlund, I., Kirkkala, T., Malve, O., Kämäri, J., 2007. Assessing SWAT model performance in the evaluation of management actions for the implementation of the water framework directive in a Finnish catchment. Environmental Modelling and Software 22 (5), 719–724.

Canale, R.P., Vogel, A.H., 1974. Effects of temperature on phytoplankton growth. ACSE, Journal of the Environmental Engineering Division 100 (EE1), 231.

Gelman, A., Carlin, J., Stern, H., Rubin, D., 1995. Bayesian Data Analysis. Chapman and Hall, London.

Haario, H., Laine, M., Lehtinen, M., Saksman, E., Tamminen, J., 2004. MCMC methods for high dimensional inversion in remote sensing. Journal of the Royal Statistical Society, Series B 66, 591–607.

Haario, H., Saksman, E., Tamminen, J., 2001. An adaptive Metropolis algorithm. Bernoulli 7 (2), 223–242.

Haario, H., Laine, M., Mira, A., Saksman, E. DRAM: Efficient adaptive MCMC. Statistics and Computing, in press.

Helminen, H., Sarvala, J., 1997. Responses of lake Pyhäjärvi (southwestern Finland) to variable recruitment of the major planktivorous fish, vendace. Canadian Journal of Fisheries and Aquatic Science 54, 32–40.

Malve, O., Laine, M., Haario, H., 2005. Estimation of winter respiration rates and prediction of oxygen regime in a lake using Bayesian inference. Ecological Modelling 182 (2), 183–197.

Malve, O., Salo, S., Verta, M., Forsius, J., 2003. Modeling the transport of PCDD/F compounds in a contaminated river and the possible influence of restoration dredging on calculated fluxes. Environmental Science and Technology 37 (15), 3413–3421.

Omlin, M., Brun, R., Reichert, P., 2001. Biogeochemical model of lake Zurich: Sensitivity, identifiability and uncertainty analysis. Ecological Modelling 141 (1–3), 105–123.

Saloranta, T., Kämäri, J., Rekolainen, S., Malve, O., 2003. Benchmark criteria: A tool for selecting appropriate models in the field of water management. Environmental Management 32 (3), 322–333.

Sandgren, C.D., 1988. Strategies of Freshwater Phytoplankton. Cambridge University Press, Cambridge.

Sarvala, J., Helminen, H., Karjalainen, J., 2000. Restoration of Finnish lakes using fish removal: changes in the chlorophyll-phosphorus relationship indicate multiple controlling mechanisms. Verhandlungen Internationale Vereinigung für Limnologie 27, 1473–1479.

Sarvala, J., Jumppanen, K., 1988. Nutrients and planktivorous fish as regulators of productivity in Lake Pyhäjärvi, SW Finland. Aqua Fennica 18, 137–155.

Sarvala, J., Helminen, H., Saarikari, V., Salonen, S., Vuorio, K., 1998. Relations between planktivorous fish abundance, zooplankton and phytoplankton in three lakes of differing productivity. Hydrobiologia 363, 81–95.

Scavia, D., 1980. Uncertainty analysis of lake eutrophication model. Ph.D. thesis, Environmental and Water Resources Engineering, University of Michigan.

Vakkilainen, K., Kairesalo, T., Hietala, J., Balayla, D.M., Becares, E., van de Bund, W., van Donk, E., Fernandez-Alaez, M., Gyllstrom, M., Hansson, L.-A., Miracle, M., Moss, B., Romo, S., Rueda, J., Stephen, D., 2004. Response of zooplankton to nutrient enrichment and fish in shallow lakes: a pan-European mesocosm experiment. Freshwater Biology 49, 1619–1632.

Ventelä, A.-M., Kirkkala, T., Sarvala, J., Mattila, H., 2001. Stopping the eutrophication process of Lake Pyhäjärvi. In: 9th International Conference on the Conservation and Management of Lakes, 11-16 November 2001, Otsu, Japan. Conference Proceedings, Session 3-1, pp. 485–488.

Zeng, X., Rasmussen, T.C., Beck, M.B., Parker, A.K., Lin, Z., 2006. A biogeochemical model for metabolism and nutrient cycling in a southeastern piedmont impoundment. Environmental Modelling and Software 21 (8), 1073–1095.


Publication V

Malve, O. and Qian, S. 2006. Estimating nutrients and chlorophyll a relationships in Finnish Lakes. Environmental Science & Technology 40 (24), pp. 7848–7853. DOI: 10.1021/es061359b.

© 2006 ACS Publications


Estimating Nutrients and Chlorophyll a Relationships in Finnish Lakes

OLLI MALVE*

Finnish Environment Institute, Helsinki, Finland

SONG S. QIAN

Duke University, Nicholas School of the Environment and Earth Sciences, Durham, North Carolina

We model the response of chlorophyll a (a surrogate for the phytoplankton community volume) to variations in lake total phosphorus (TP) and total nitrogen (TN) concentrations. The model is fitted to a large cross-sectional data set from the Finnish lake monitoring network. The objective is to support the Finnish Government in identifying management actions to achieve compliance with the chlorophyll a concentration standard at a given confidence level and to provide tools for the estimation of critical (target) loads for nutrients in monitored lakes. We develop a Bayesian hierarchical linear model which combines the advantages of both the currently preferred non-hierarchical lake-type-specific linear model and the lake-specific linear model fitted separately using data from a single lake. The hierarchical model is less biased at the lake level than the lake type model. In contrast to the lake model, it predicts the lake-specific chlorophyll a response to nutrients outside the lake-specific observational range. The hierarchical model is used to calculate probabilities of the chlorophyll a concentration exceeding the standard under different nitrogen and phosphorus concentration combinations. These probabilities can be used by a lake manager to estimate acceptable nitrogen-phosphorus concentration combinations. We discuss how our study can be useful in implementing the European Water Framework Directive.

1. Introduction

Although more and more information on lake hydrology, chemistry, and biology is collected from well-designed and funded water quality monitoring networks, the lack of adequate data is often the biggest obstacle in developing a water quality management plan (National Research Council (1)). The complexity of natural processes in lakes and reservoirs makes it difficult to transform routinely monitored data into scientific knowledge that can be used for supporting lake-specific management decision making (2). Even with the increased monitoring effort, lake-specific data are often sparse. As a result, data from similar lakes are often pooled to increase the sample size to achieve the necessary statistical power. Once a set of lakes and reservoirs are classified as belonging to the same type, lake-specific observations are often pooled for analysis. Linear regressions are applied to combined observations within a type of lakes and reservoirs. For example, EUTROMOD (3) is a collection of empirical eutrophication models fitted to cross-sectional lake data from different geographical regions in the U.S.

The assumption of homogeneity within a type is rarely realistic. Therefore, neither an empirical model based on the data of all lakes within a type nor a model based on the data of a single lake alone is satisfactory. A Bayesian hierarchical linear model that combines data from multiple types of lakes to predict single lakes within the lake type provides a tradeoff between the models fitted to the data of a lake type and to the data of a single lake. The Bayesian hierarchical model allows aggregation at different levels and provides a mechanism for pooling information to strengthen a model's lake-level predictive power.
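The tradeoff between the lake-type model and the single-lake model can be illustrated in the simplest possible setting. The sketch below assumes an intercept-only normal model with known variances, which is far simpler than the regression described in this paper; the function name and the numbers are hypothetical. The lake-level estimate is a precision-weighted average of the lake's own mean and the type mean.

```python
def partial_pooling_mean(lake_mean, n_obs, type_mean, sigma_y, sigma_lake):
    """Posterior mean of a lake-level mean under a normal-normal model with known variances.

    lake_mean  : sample mean of the lake's own observations (e.g. of log Chla)
    n_obs      : number of observations from that lake
    type_mean  : mean of the lake type the lake belongs to
    sigma_y    : within-lake observation standard deviation (assumed known)
    sigma_lake : between-lake standard deviation within the type (assumed known)
    """
    precision_data = n_obs / sigma_y ** 2
    precision_prior = 1.0 / sigma_lake ** 2
    weighted = precision_data * lake_mean + precision_prior * type_mean
    return weighted / (precision_data + precision_prior)
```

For a lake with a single observation, `partial_pooling_mean(2.0, 1, 1.0, 1.0, 1.0)` lies halfway between the lake and type means, while with 100 observations the estimate stays close to the lake's own mean. This is the sense in which sparsely sampled lakes borrow strength from their type.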

Hierarchical and nonhierarchical linear regressions of chlorophyll a (Chla) on total phosphorus (TP) and total nitrogen (TN) are fitted to the data from the Finnish lake monitoring network. The lakes in the network have been classified into nine lake types for the determination of ecological status (4) by the Finnish Environment Institute (SYKE) (5). The lakes were classified according to expert assessment and to lake morphological and chemical metrics such as depth, surface area, and color. It was believed that the accuracy of the estimates of the standard (natural) ecological status of the lake types, of the human impacts, and of the ecological status of the lakes can be optimized that way.

Updating of parameters and predictive distributions has been proposed for the implementation of the EU Water Framework Directive (7) to facilitate the adaptive decision making process. The models may be fitted and the water quality management decisions refined sequentially over time as new monitoring data become available. Because of the Bayesian nature of our models, we are able to derive a lake-specific predictive probability distribution of Chla concentration, which can be used to estimate the probability of water quality standard violation for each lake. In addition, the posterior distributions of model parameters from this study provide prior distributions for future model revision or for developing lake-specific models.
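Sequential updating, in which a posterior becomes the prior for the next batch of monitoring data, can be sketched with a conjugate normal model. The example assumes a single unknown mean and a known observation variance, so it illustrates the principle only and is not the hierarchical regression of this paper; the function name and the numbers are hypothetical.

```python
def update_normal(prior_mean, prior_var, observations, obs_var):
    """Conjugate update of a normal prior for a mean, given data with known variance."""
    n = len(observations)
    if n == 0:
        return prior_mean, prior_var
    post_precision = 1.0 / prior_var + n / obs_var
    post_mean = (prior_mean / prior_var + sum(observations) / obs_var) / post_precision
    return post_mean, 1.0 / post_precision
```

Updating first with `[1.0, 2.0]` and then with `[3.0]` gives the same posterior as a single update with all three observations, which is why refining the model as new data arrive loses no information in this setting.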

2. Data

National water quality monitoring of Finnish lakes started in 1965 after the passage of the Water Act in 1962. Sampling strategy and analysis methods have been described by Niemi et al. (6). In January 2000, 253 lake sites from the national monitoring network were integrated into the European Environment Agency's Eurowaternet (8) to produce statistically reliable information that allows the European Commission, member states, and the general public to judge the effectiveness of the environmental policy. Information was required on the status of Europe's inland water resources, quality and quantity and how the status relates and responds to pressures on the environment (8).

The use of lake-type-specific Chla response models is based on the assumption that lakes within each type in the Geomorphological Typology of Finnish Lakes (Table 1) are likely to have similar Chla response to nutrient variation.

We use 19 248 July and August observations of TP, TN, and Chla from 2289 Finnish lakes from 1988 to 2004. About 42% of the observations are from July and 58% from August. However, observations are unevenly distributed between years, types, and lakes (Tables 3 and 4, Supporting Information). Of the total of 2289 lakes, 900 lakes have only one observation. The average number of observations is eight (s.d. 26) per lake. One lake has 441, and there are 12 lakes that have more than 150 observations.

Trophic status of the lake types varies (see Table 1, Supporting Information). Humic lake types (5: small, humic, deep lakes; 6: deep, very humic lakes; 8: shallow, humic lakes; and 9: shallow, very humic lakes) have the highest Chla and nutrient concentrations and the lowest TN/TP ratio.

* Corresponding author phone: 358 19 40300 359; fax: 358 19 40300 391; e-mail: [email protected].

10.1021/es061359b CCC: $33.50 xxxx American Chemical Society. VOL. xx, NO. xx, xxxx / ENVIRON. SCI. & TECHNOL. A. PAGE EST: 6. Published on Web 10/24/2006.

Linear correlation between the predictor variables log(TP) and log(TN) is high, with a correlation coefficient of 0.62 (Table 2, Supporting Information), posing a potential problem in separating the effects of these two predictors. The correlations between log(Chla) and the two predictors are similar. To discern the roles of TP and TN in determining Chla concentration, we use the classification and regression tree (CART) procedure (11) as a variable selection method to identify important factors associated with the variation of our response variable Chla (12). The CART model is selected based on a cross-validation procedure to minimize the model's prediction error (11). The variable selected first is the most important one, that is, the one with the greatest effect on Chla. Using log(TP), log(TN), lake type, depth, color, surface area, and year of observation as potential predictors, the resulting tree (Figure 1) includes TP as the most influential predictor. TN plays a role only when TP is larger than 42.75 µg L⁻¹.
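The idea that the first variable chosen by a regression tree is the most influential one can be illustrated with a minimal sketch. The code below is not the CART implementation used in the study: it only searches for the single binary split that minimizes the within-group sum of squares, it omits the cross-validated pruning step, and it runs on synthetic data. The function names and the simulated values are assumptions made for illustration.

```python
import random


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, sse_after_split) for the best single binary split on x."""
    pairs = sorted(zip(x, y))
    best = (None, sse(y))
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold can separate identical predictor values
        threshold = 0.5 * (pairs[i - 1][0] + pairs[i][0])
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


def first_split_variable(predictors, y):
    """Pick the predictor whose best split leaves the smallest residual sum of squares."""
    results = {name: best_split(x, y) for name, x in predictors.items()}
    name = min(results, key=lambda k: results[k][1])
    return name, results[name][0]


# Synthetic data only (not the Finnish monitoring data).
rng = random.Random(1)
log_tp = [rng.uniform(1.0, 5.0) for _ in range(200)]
log_tn = [0.6 * t + rng.gauss(5.0, 0.4) for t in log_tp]           # correlated with TP
log_chla = [1.0 * t - 1.0 + rng.gauss(0.0, 0.3) for t in log_tp]   # driven by TP only

chosen, threshold = first_split_variable({"log_tp": log_tp, "log_tn": log_tn}, log_chla)
print(chosen, round(threshold, 2))
```

Even though the two synthetic predictors are strongly correlated, the split search prefers the one that actually drives the response, which is the behavior the variable selection step relies on.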

Even though the lake type is not an important predictor in the CART model for the entire data set, when the CART procedure is applied separately to each lake type, we discover remarkable differences in the nutrient effects between the lake types. In all lake types, both nutrients have an effect, but in large and deep lakes (types 1, 2, 4, 5, and 6) and in shallow, humic lakes (type 8), TP is more influential. In medium and small non-humic lakes and in shallow non-humic and very humic lakes (types 3, 7, and 9), TN is more influential. These differences between the lake types reveal the usefulness of the geomorphological typology in the prediction of Chla and in lake management.

To further corroborate the role of nutrients, we create two conditioning plots (coplots) (9). The coplots in Figure 1, Supporting Information, show that the log(Chla)-log(TP) relationship is relatively stable for all conditions of log(TN) and lake depth, while the log(Chla)-log(TN) relationship varies as a function of log(TP) and depth. This result indicates a potential interaction between log(TP) and log(TN).

3. Methods

3.1. Bayesian Hierarchical Linear Model. Bayesian methods are currently experiencing an increasing popularity in the sciences as a means of probabilistic inference (13). Among their advantages are (1) the ability to incorporate prior information, (2) the ease with which Bayesian methods can be incorporated into a formal decision analytic context, (3) the explicit handling of uncertainty, and (4) the straightforward ability to assimilate new information in contexts such as adaptive management. Introductions to Bayesian statistics and data analysis can be found in refs 14, 15, and 16.

In a predictive model for lake Chla concentrations, model parameters for different lakes in the same lake type may be similar. Therefore, estimates of these parameters can be expressed in terms of a common prior distribution of model parameters. In other words, we assume that lake-specific model parameters are random variables from a common distribution. Computationally, it is natural to model the data hierarchically. That is, individual observations of Chla concentration are modeled conditional on lake-specific model parameter values, the lake-specific model parameter values are modeled conditional on lake-type-specific parameters, which are themselves modeled conditional on a parameter distribution of all lakes in Finland (eqs 1-4). An important feature of the hierarchical model is that the hierarchical probability distribution imposes dependence among parameters, which allows a hierarchical model to have enough parameters to form a realistic model without overfitting the data (14). Details of the Bayesian hierarchical modeling approach can be found in ref 14. Qian et al. (17, 18) indicated that using a hierarchical modeling approach to pool data from different sources often results in reduced model uncertainty and improved accuracy in estimated model parameters.

TABLE 1. Geomorphological Typology of Finnish Lakes Specified by Finnish Environment Institute^a

lake type   name                               characteristics
I           large, nonhumic lakes              SA > 4,000 ha, color < 30
II          large, humic lakes                 SA > 4,000 ha, color > 30
III         medium and small, nonhumic lakes   SA: 50-4,000 ha, color < 30
IV          medium area, humic deep lakes      SA: 500-4,000 ha, color: 30-90, D > 3 m
V           small, humic, deep lakes           SA: 50-500 ha, color: 30-90, D > 3 m
VI          deep, very humic lakes             color > 90, D > 3 m
VII         shallow, nonhumic lakes            color < 30, D < 3 m
VIII        shallow, humic lakes               color: 30-90, D < 3 m
IX          shallow, very humic lakes          color > 90, D < 3 m

^a SA = surface area, D = depth.

FIGURE 1. Regression tree plot of observed log(Chla) [µg L⁻¹] partitioned with TN and TP concentrations [µg L⁻¹].

The hierarchical linear model is summarized as follows:

$$\log(y_{ijk}) \sim N(X\beta_{ij}, \tau^2) \tag{1}$$

$$X\beta_{ij} = \beta_{0,ij} + \beta_{1,ij} \times \log(TP_{ijk}) + \beta_{2,ij} \times \log(TN_{ijk}) + \beta_{3,ij} \times \log(TP_{ijk}) \times \log(TN_{ijk}) \tag{2}$$

$$\beta_{ij} \sim N(\beta_i, \sigma_i^2) \tag{3}$$

$$\beta_i \sim N(\beta, \sigma^2) \tag{4}$$

where $\log(y_{ijk})$ is the kth observed log(Chla) value from lake j in type i, $X$ is the model matrix containing observed TP and TN values from lake j in type i, $\beta_{ij} = [\beta_{0,ij}, \beta_{1,ij}, \beta_{2,ij}, \beta_{3,ij}]$ is the lake-specific model parameter vector, which consists of the intercept ($\beta_{0,ij}$) and slopes for log(TP) ($\beta_{1,ij}$), log(TN) ($\beta_{2,ij}$), and the combined effect of log(TP) and log(TN) ($\beta_{3,ij}$), $\tau^2$ is the model error variance, $\beta_i = [\beta_{0,i}, \beta_{1,i}, \beta_{2,i}, \beta_{3,i}]$ is a vector of model parameter means for lake type i, $\sigma_i^2 = [\sigma_{0,i}^2, \sigma_{1,i}^2, \sigma_{2,i}^2, \sigma_{3,i}^2]$ is a vector of between-lake variances of model parameters for lake type i, and $\beta = [\beta_0, \beta_1, \beta_2, \beta_3]$ and $\sigma^2 = [\sigma_0^2, \sigma_1^2, \sigma_2^2, \sigma_3^2]$ are the means and between-type variances. Note that the hierarchical notation in eqs 1-4 indicates conditional distributions, i.e., $y_{ijk}$ is normally distributed conditional on $X\beta_{ij}$ and $\tau^2$, $\beta_{ij}$ is normally distributed conditional on $\beta_i$ and $\sigma_i^2$, and $\beta_i$ is normally distributed conditional on $\beta$ and $\sigma^2$. The interaction term is added to the model to account for the non-additive effects of TP and TN indicated by the coplots in Figure 2, Supporting Information.
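The three-level structure of eqs 1-4 may be easier to follow when read as a data-generating process. The sketch below draws type-level coefficients from the national distribution, lake-level coefficients from their type, and observations from their lake. All numerical values are hypothetical, a single between-lake standard deviation vector is shared by all types for brevity (the model in the text allows one per type), and natural logarithms are assumed.

```python
import random


def draw_vector(rng, means, sds):
    """Draw independent normal values, one per coefficient (sds are standard deviations)."""
    return [rng.gauss(m, s) for m, s in zip(means, sds)]


def simulate_hierarchy(rng, n_types=3, lakes_per_type=4, obs_per_lake=5):
    """Generate (type, lake, log_tp, log_tn, log_chla) records following eqs 1-4."""
    # Hypothetical hyperparameters, chosen only for illustration.
    beta_all = [-1.0, 0.8, 0.2, 0.02]       # national means (eq 4)
    sigma_all = [0.3, 0.1, 0.1, 0.01]       # between-type sd (eq 4)
    sigma_type = [0.2, 0.05, 0.05, 0.005]   # between-lake sd within a type (eq 3)
    tau = 0.4                               # observation error sd (eq 1)
    records = []
    for i in range(n_types):
        beta_i = draw_vector(rng, beta_all, sigma_all)        # eq 4
        for j in range(lakes_per_type):
            beta_ij = draw_vector(rng, beta_i, sigma_type)    # eq 3
            for _ in range(obs_per_lake):
                log_tp = rng.uniform(1.5, 4.5)
                log_tn = rng.uniform(5.5, 7.5)
                mean = (beta_ij[0] + beta_ij[1] * log_tp + beta_ij[2] * log_tn
                        + beta_ij[3] * log_tp * log_tn)       # eq 2
                log_chla = rng.gauss(mean, tau)               # eq 1
                records.append((i, j, log_tp, log_tn, log_chla))
    return records


records = simulate_hierarchy(random.Random(42))
print(len(records), records[0])
```

Fitting the model reverses this process: given the observations, MCMC recovers the posterior distribution of the coefficients at all three levels.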

The noninformative prior distributions of $\beta$, $\tau$, $\sigma_i$, and $\sigma$ in eqs 1-4 are given in eqs 5 and 6, where $N(0, 10\,000)$ is the normal distribution of $\beta$ with mean 0 and variance 10 000 and unif(0, 100) is the uniform distribution of $\sigma_i$, $\sigma$, and $\tau$ with lower (0) and upper (100) limits. The prior distributions for $\sigma_i$, $\sigma$, $\tau$, and $\beta$ are considered "non-informative" or vague. The 95% prior interval for $\beta$ is approximately ±200, practically "flat" in the region of interest. The standard non-informative prior for a variance parameter is $p(\sigma^2) \propto 1/\sigma^2$, which arises from assuming that the log of the variance parameter has a uniform distribution on $(-\infty, +\infty)$. This prior is improper, which could lead to an improper posterior distribution. Instead, we used a uniform distribution for the standard deviation, as suggested by Gelman et al. (14).
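The two figures quoted above can be checked with a short worked calculation. This is an illustrative sketch rather than part of the original derivation: the first two lines use the standard normal 97.5% quantile of 1.96, and the last line is the usual change of variables for a density that is flat in $\log\sigma^2$.

```latex
\begin{aligned}
\beta \sim N(0,\,10\,000) &\;\Rightarrow\; \mathrm{sd}(\beta) = \sqrt{10\,000} = 100,\\
\text{95\% prior interval} &= 0 \pm 1.96 \times 100 = \pm 196 \approx \pm 200,\\
p(\log\sigma^2) \propto 1 &\;\Rightarrow\; p(\sigma^2) \propto \left|\frac{d\,\log\sigma^2}{d\,\sigma^2}\right| = \frac{1}{\sigma^2}.
\end{aligned}
```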

3.2. Bayesian Nonhierarchical Linear Models. To compare the hierarchical model to its classical counterparts, a nonhierarchical type-specific dummy variable model is fitted. We combine data from all nine types and use the lake type as a dummy variable (23). By using a lake type dummy variable, the resulting model has type-specific slopes and intercepts and a common model error variance. The common model error variance allows a meaningful model comparison with the hierarchical linear model. The Bayesian linear model with a dummy variable is summarized in eqs 7-9, where $y_{ik}$ is the kth observation of Chla concentration from lakes in type i, $X$ is the model matrix containing observed TP and TN values from all lakes, $\beta_i = [\beta_{0,i}, \beta_{1,i}, \beta_{2,i}, \beta_{3,i}]$ is the lake-type-specific model parameter vector, which consists of the intercept ($\beta_{0,i}$) and slopes for log(TP) ($\beta_{1,i}$), log(TN) ($\beta_{2,i}$), and the interaction of log(TP) and log(TN) ($\beta_{3,i}$), $Z_i = [Z_{0,i}, Z_{1,i}, Z_{2,i}, Z_{3,i}]$ is the 0,1 dummy-coded matrix, and $\tau^2$ is the model error variance. $\mu_{\beta_i} = [\mu_{\beta_{0,i}}, \mu_{\beta_{1,i}}, \mu_{\beta_{2,i}}, \mu_{\beta_{3,i}}]$ and $\sigma_i^2 = [\sigma_{0,i}^2, \sigma_{1,i}^2, \sigma_{2,i}^2, \sigma_{3,i}^2]$ are the mean and the variance for $\beta_i$. The vague prior distributions of $\mu_{\beta_i}$, $\tau$, and $\sigma_i$ are given in eqs 10 and 11.

To see how well lake-specific observations alone identify a non-hierarchical linear regression model, we fit a linear regression model (eqs 12-14) to data sets from selected lakes. We use these lake-specific linear models as a baseline for the hierarchical model and the non-hierarchical dummy variable model.

3.3. Computation and Model Comparison. Analytical solutions to the parameter estimates of the hierarchical model are not known. Therefore, a Markov chain Monte Carlo (MCMC) simulation method (19) is used for simultaneously estimating the distribution parameters by sampling the parameters from their joint posterior distribution. The MCMC method is implemented using the freely available WinBUGS software (20, 21). The MCMC method allows us to sample all unknown parameters ($\beta_{ij}$, $\beta_i$, $\sigma_i^2$, $\sigma^2$, $\tau^2$) from their joint posterior distribution. All inferences are made based on these posterior samples. The number of iterations that the MCMC simulation needs to converge to the true posterior distribution is called the burn-in period. Samples after the burn-in are saved for the statistical inference of the posterior distribution. Multiple MCMC chains with different lengths are run and R statistics (24) are calculated to select the length of the burn-in period. The burn-in period is long enough if R ≈ 1. The burn-in period for the hierarchical model is 45 000 iterations. We take 1000 samples for each unknown quantity from the next 45 000 MCMC iterations to reduce autocorrelation of the sample. The burn-in period for the non-hierarchical models is 10 000 iterations, and 1000 samples are taken for each unknown quantity from the next 10 000 MCMC iterations.
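The R statistic of ref 24 compares the variance between chains with the variance within chains. A minimal sketch of the basic form of this diagnostic for one scalar parameter is given below; it is not the WinBUGS implementation, the function name is an assumption, and the chains are synthetic normal draws rather than MCMC output.

```python
import random
import statistics


def gelman_rubin(chains):
    """Potential scale reduction factor for equal-length chains of one scalar parameter."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])  # W
    between = n * statistics.variance(chain_means)                        # B
    var_plus = (n - 1) / n * within + between / n   # pooled estimate of the posterior variance
    return (var_plus / within) ** 0.5


rng = random.Random(0)
# Three chains sampling the same distribution: the statistic should be close to 1.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(3)]
# Three chains stuck around different values: the statistic should be well above 1.
stuck = [[rng.gauss(offset, 1.0) for _ in range(1000)] for offset in (0.0, 2.0, 4.0)]

r_mixed = gelman_rubin(mixed)
r_stuck = gelman_rubin(stuck)
print(round(r_mixed, 3), round(r_stuck, 3))
```

In practice the statistic is recomputed for increasing chain lengths, and the burn-in is taken as the point after which it stays close to 1 for all monitored parameters.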

The two models are compared using the deviance information criterion (DIC), a Bayesian measure of model complexity and fit (21). DIC is the sum of the posterior mean deviance $\overline{D(\theta)}$, a Bayesian measure of fit or "adequacy," and a complexity measure $p_D$, which corresponds to the trace of the product of Fisher's information and the posterior covariance. The Bayesian deviance $D(\theta)$ is based on −2 log(likelihood), a measure of residual information in the data conditional on parameter $\theta$, or $-2\log[p(Y|\theta)]$. In our case, the data (Y = log(Chla)) are assumed to be from a normal distribution. The Bayesian deviance is given in eq 15, where $f(Y)$ is the mathematical upper limit of the likelihood function (it is reached when the estimated mean $X\beta_{ij}$ is equal to the average observed log(Chla) values from lake j in type i). The smaller the $D(\theta)$, the closer the actual likelihood $p(Y|\theta)$ is to the maximum (hence a better model). The complexity measure is the posterior mean deviance minus the deviance evaluated at the posterior parameter means $\bar{\theta}$ (eq 16). The DIC is defined in eq 17 as a Bayesian measure of model fit penalized by an additional complexity term. A smaller DIC indicates a "better" model.

$$\beta \sim N(0, 10\,000) \tag{5}$$

$$\sigma_i, \sigma, \tau \sim \mathrm{unif}(0, 100) \tag{6}$$

$$\log(y_{ik}) \sim N(X\beta_i Z_i, \tau^2) \tag{7}$$

$$X\beta_i Z_i = \sum_{i=1}^{9} \left[ \beta_{0,i} Z_{0,i} + \beta_{1,i} Z_{1,i} \times \log(TP_{ik}) + \beta_{2,i} Z_{2,i} \times \log(TN_{ik}) + \beta_{3,i} Z_{3,i} \times \log(TP_{ik}) \times \log(TN_{ik}) \right] \tag{8}$$

$$\beta_i \sim N(\mu_{\beta_i}, \sigma_i^2) \tag{9}$$

$$\mu_{\beta_i} \sim N(0, 10\,000) \tag{10}$$

$$\sigma_i, \tau \sim \mathrm{unif}(0, 100) \tag{11}$$

$$\log(y_k) \sim N(X\beta, \tau^2) \tag{12}$$

$$X\beta = \beta_0 + \beta_1 \times \log(TP_k) + \beta_2 \times \log(TN_k) + \beta_3 \times \log(TP_k) \times \log(TN_k) \tag{13}$$

$$\beta \sim N(\mu_\beta, \sigma^2) \tag{14}$$

$$D(\theta) = -2 \log \frac{p(Y|\theta)}{f(Y)} \tag{15}$$

$$p_D = \overline{D(\theta)} - D(\bar{\theta}) \tag{16}$$

$$\mathrm{DIC} = \overline{D(\theta)} + p_D \tag{17}$$

R squared ($R^2$), the statistical measure of how well a regression line approximates real data points, is also used to compare the mean fit of the models. $R^2$ is computed from the ratio of two sources of variation, $SS_{total}$ and $SS_{error}$ (eq 18). $SS_{total}$ is the variability of log(Chla) about its mean. $SS_{error}$ is the variability of log(Chla) about the predicted mean values from the model. If $SS_{error}$ is much smaller than $SS_{total}$ ($0 \ll R^2 < 1$), then the model fits well. In least-squares estimation, $R^2$ cannot have negative values. Here, however, the hierarchical model and the non-hierarchical model are fitted to larger data sets from the entire lake population and from a lake type, respectively, whereas $R^2$ is calculated for a single lake only. Therefore, $SS_{error}$ might be larger than $SS_{total}$ and $R^2$ negative if the model fit is extremely poor.
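The comparison measures of eqs 15-18 can be computed directly from posterior samples. The sketch below assumes a toy one-predictor normal model and hypothetical posterior draws; it is not the WinBUGS calculation. The standardizing term $f(Y)$ of eq 15 is dropped because it depends only on the data, so it cancels in $p_D$ and shifts the DIC of every model by the same constant.

```python
import math
import statistics


def deviance(y, mu, tau):
    """-2 log-likelihood of y under independent N(mu_k, tau^2) errors (f(Y) omitted)."""
    n = len(y)
    squared_error = sum((yk - mk) ** 2 for yk, mk in zip(y, mu))
    return n * math.log(2.0 * math.pi * tau ** 2) + squared_error / tau ** 2


def dic(y, x, draws):
    """Return (DIC, pD) for the toy model mu = b0 + b1 * x; draws are (b0, b1, tau) tuples."""
    devs = [deviance(y, [b0 + b1 * xk for xk in x], tau) for b0, b1, tau in draws]
    mean_dev = statistics.fmean(devs)                          # posterior mean deviance
    b0_bar, b1_bar, tau_bar = (statistics.fmean(c) for c in zip(*draws))
    dev_at_mean = deviance(y, [b0_bar + b1_bar * xk for xk in x], tau_bar)
    p_d = mean_dev - dev_at_mean                               # eq 16
    return mean_dev + p_d, p_d                                 # eq 17


def r_squared(y, predicted):
    """Eq 18; negative when the predictions fit worse than the mean of y."""
    y_bar = statistics.fmean(y)
    ss_total = sum((yk - y_bar) ** 2 for yk in y)
    ss_error = sum((yk - pk) ** 2 for yk, pk in zip(y, predicted))
    return 1.0 - ss_error / ss_total


# Toy data and hypothetical posterior draws, for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.2, 3.8]
draws = [(0.0, 1.0, 0.30), (0.1, 0.95, 0.35), (-0.1, 1.05, 0.25)]
dic_value, p_d = dic(y, x, draws)
print(round(dic_value, 2), round(p_d, 2))

# Predictions from a model fitted to other lakes can miss a single lake badly.
print(r_squared([1.0, 2.0, 3.0], [5.0, 5.0, 5.0]))
```

The last call shows how a negative $R^2$ arises when a pooled model is evaluated on one lake: the predictions are further from the observations than the lake's own mean.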

3.4. Posterior Simulations. Posterior simulations are performed to reveal the effects of the nutrients and to demonstrate the usage of the hierarchical model in lake eutrophication management. Therefore, posterior probabilities of Chla exceeding the criteria are simulated. The TN and TP plane of interest is discretized into 100 × 100 grid cells, and the predictive Chla concentration distribution in each grid cell is calculated. The results are presented as the contour lines of the 80th percentiles of these predictive distributions, an arbitrarily selected number to reflect the potential risk attitude of a lake manager. From these contour lines, we can identify nutrient concentrations that result in an 80th percentile of the Chla distribution at 30 µg/L, which is the nutrient condition that keeps the probability of Chla concentration exceeding the criterion below 20%.

Chla concentration as a function of TP and TN concentration [µg L⁻¹] for Lake Päijänne (large humic lake, type 2) is also simulated with the hierarchical linear model. First, we vary TP concentration within the observational range while TN is held constant (at the 50th percentile). Then, we repeat the simulation holding TP constant at the same percentile while varying TN within the observational range.
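The grid simulation described in this section can be sketched as follows. The posterior draws here are hypothetical stand-ins for MCMC output, natural logarithms are assumed, the interaction coefficient is set to zero for simplicity, and a 10 × 10 grid replaces the 100 × 100 grid of the text to keep the example fast. The function name and the grid limits are assumptions.

```python
import math
import random


def predictive_percentile(draws, tp, tn, rng, q=0.8):
    """Approximate q-th percentile of predicted Chla at (tp, tn), one simulation per draw."""
    log_tp, log_tn = math.log(tp), math.log(tn)
    sims = []
    for b0, b1, b2, b3, tau in draws:
        mean = b0 + b1 * log_tp + b2 * log_tn + b3 * log_tp * log_tn
        sims.append(math.exp(rng.gauss(mean, tau)))   # predictive draw on the original scale
    sims.sort()
    return sims[min(len(sims) - 1, int(q * len(sims)))]


rng = random.Random(7)
# Hypothetical posterior draws (b0, b1, b2, b3, tau); a real analysis would use MCMC output.
draws = [(rng.gauss(-1.5, 0.1), rng.gauss(0.9, 0.03), rng.gauss(0.1, 0.02), 0.0,
          abs(rng.gauss(0.5, 0.02))) for _ in range(500)]

tp_grid = [5.0 + 15.0 * i for i in range(10)]      # TP, µg/L
tn_grid = [300.0 + 180.0 * j for j in range(10)]   # TN, µg/L
surface = [[predictive_percentile(draws, tp, tn, rng) for tp in tp_grid] for tn in tn_grid]

# Grid cells whose 80th percentile stays at or below the 30 µg/L criterion,
# i.e. where the simulated exceedance probability is at most about 20%.
compliant = [(tp, tn) for j, tn in enumerate(tn_grid)
             for i, tp in enumerate(tp_grid) if surface[j][i] <= 30.0]
print(len(compliant), "of", len(tp_grid) * len(tn_grid), "cells comply")
```

Contouring `surface` at 30 µg/L would give the boundary between compliant and non-compliant nutrient combinations for the chosen risk level.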

4. Results

4.1. Model Fit. The hierarchical model is compared to the non-hierarchical type-specific dummy variable model and to the linear lake-specific model. Model fits for four selected lakes are computed to illustrate the differences between the models. Lakes are selected to show the effect of the sample size on the model's fit and on the predictive confidence region. The selected lakes are Lake Onkilampi (shallow humic lake, type 8), Lake Nurmijärvi (large non-humic lake, type 1), Lake Kuhajärvi (shallow non-humic lake, type 7), and Lake Päijänne (large humic lake, type 2). The numbers of observations for these lakes are 3, 7, 22, and 265, respectively. In general, the comparison is overwhelmingly in favor of the hierarchical model compared to the non-hierarchical type-specific model. Median Chla concentrations predicted by the hierarchical model were usually closer to the observed Chla values than the means predicted by the non-hierarchical dummy variable model (Figure 2 and Figure 3, Supporting Information), suggesting that the hierarchical model fits the data far better. This is indicated by $R^2$, which is greater for the hierarchical model. In addition, the deviance and DIC of the hierarchical model are smaller than those of the non-hierarchical dummy variable model (Table 2). The DIC indicates that the increased number of model parameters of the hierarchical model is more than compensated for by the improved model fit.

When using the non-hierarchical lake-type-specific dummy variable model, we treat all lakes within a type as the same and pool individual observations to form a type-specific model. This model represents a weighted average with the weights proportional to each lake's sample size. That is, the lake-type-specific model is heavily weighted by lakes with larger sample sizes. Consequently, the resulting model can be grossly biased for lakes with small sample sizes. This feature is clearly illustrated in the four selected lakes (Figure 2 and Figure 3, Supporting Information). The hierarchical model treats lakes within the same type as exchangeable and fits lake-specific model parameters. But these parameters are assumed to come from the same prior distributions, thereby pooling information from similar lakes. This pooling of information reduces the bias at the lake level and reduces model error variance as well.
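The way pooling trades off lake-level data against type-level information can be seen in the textbook normal-normal case. The expression below is a simplified illustration with a single lake-level mean $\theta_j$ and known variances, not the four-coefficient model of eqs 1-4; $\bar{y}_j$ is the mean of the $n_j$ observations from lake j, $\mu$ is the type-level mean, $\tau^2$ is the observation error variance, and $\sigma^2$ is the between-lake variance.

```latex
\hat{\theta}_j
  = \frac{\dfrac{n_j}{\tau^2}\,\bar{y}_j + \dfrac{1}{\sigma^2}\,\mu}
         {\dfrac{n_j}{\tau^2} + \dfrac{1}{\sigma^2}},
\qquad
\hat{\theta}_j \to \bar{y}_j \ \text{as } n_j \to \infty,
\qquad
\hat{\theta}_j \to \mu \ \text{as } n_j \to 0 .
```

A lake with 265 observations is therefore estimated almost entirely from its own data, while a lake with 3 observations is pulled strongly toward its type mean.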

$$R^2 = 1 - \frac{SS_{error}}{SS_{total}} \tag{18}$$

FIGURE 2. Fit plot. 10th, 50th (circle), and 90th percentiles of predicted Chla concentration [µg L⁻¹] as a function of observed value for four selected lakes: a. Lake Onkilampi (shallow humic lake, type 8), b. Lake Nurmijärvi (large non-humic lake, type 1), c. Lake Kuhajärvi (shallow non-humic lake, type 7), and d. Lake Päijänne (large humic lake, type 2). The 45° line is the 1:1 line (perfect fit). Percentiles have been calculated with the lake-type-specific non-hierarchical model (type), with the hierarchical linear model (hier), and with the lake-specific non-hierarchical model (lake). 10th and 90th percentiles are connected with vertical lines (linear: gray, solid line). R2 denotes R squared.

FIGURE 3. 80th percentile contour lines of predictive Chla concentration in Lake Päijänne (large humic lake, type 2) at 15, 30, 60, and 120 µg L⁻¹ as a function of observed TP and TN concentrations [µg L⁻¹]. Predictions are simulated with the hierarchical linear model. Numbers are observed Chla concentrations [µg L⁻¹].


Lake-specific non-hierarchical linear models are fitted using only data from a specific lake. Despite the better fit of the non-hierarchical lake-specific model compared to its counterparts, the model error variance tends to be large when the sample size is small, but it decreases markedly as the sample size increases (Figure 2).

4.2. Posterior Predictive Distributions. Lake-specific 80th percentile contour lines for Lake Päijänne (large humic lake, type 2) simulated with the hierarchical model (Figure 3) reveal the usefulness of the posterior simulations in water quality management. Simulations have been confined within the part of the observational ranges of TP and TN in large humic lakes (type 2, TP: 2-160, TN: 31-4400) that is below the lake-specific maximum values (TP: 150, TN: 2000). The simulation in Figure 3 included TN values outside the lake-specific observational ranges (TP: 6-150, TN: 300-2000). However, the extrapolation under the hierarchical setting is reasonable because of the pooling of information within and among lake types. This is a distinct advantage compared to the non-hierarchical lake model, which can predict only within the lake-specific observational range. For lakes with few observations this range can be limited. The contour lines for Lake Päijänne are parallel to the y-axis in the observational range, showing clear TP limitation of Chla within this range. In contrast, near the low TN boundary and in the high TP range, TN limitation seems to prevail. From figures similar to Figure 3, a lake manager can read nutrient concentrations that comply with Chla standards with a given certainty.

The effects of TP and TN are also illustrated in the predictive plots (Figure 4). Simulated Chla increases with TP but not very much with TN. The 10th-90th percentile predictive intervals look rather wide at first glance. The predictive interval is the predicted credible interval for individual observations, which is always wider than the commonly presented fitted confidence interval for the mean. The predictive distribution is directly related to the process of lake eutrophication assessment, while the fitted mean is not.
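Why the predictive interval is wider than the interval for the fitted mean follows from the law of total variance. In the sketch below, $\tilde{y}$ denotes a new log(Chla) observation from lake j in type i and $Y$ the observed data; the notation otherwise follows eqs 1 and 2. This is a standard identity given for illustration, not a result from the original analysis.

```latex
\operatorname{Var}(\tilde{y} \mid Y)
  = \underbrace{\operatorname{E}\!\left[\tau^2 \mid Y\right]}_{\text{observation noise}}
  + \underbrace{\operatorname{Var}\!\left(X\beta_{ij} \mid Y\right)}_{\text{uncertainty of the fitted mean}}
  \;\ge\; \operatorname{Var}\!\left(X\beta_{ij} \mid Y\right).
```

More data shrink the second term, but the first term remains, so the predictive interval never collapses to the interval of the mean.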

The collinearity of TP and TN makes it difficult to determine their effects on Chla from the estimated slopes alone. Therefore, the posterior simulations for Lake Päijänne (large humic lake, type 2) (Figures 3 and 4) were calculated. Simulations show very clear TP limitation within the observational range. This indicates accurate separation of the effects despite the high correlation (0.7) between the coefficients $\beta_1$ and $\beta_2$. The collinearity is not transferred to the predictions. The MCMC simulation of the linear coefficient parameters $\beta_0$, $\beta_1$, $\beta_2$, and $\beta_3$ together with their correlation enables the separation of the effects by including the correlation of the parameters in the predictive simulation of Chla.

Despite the collinearity of nutrients, the main effects of their standardized normal deviates ($(\log(TP) - \mu_{\log(TP)})/\sigma^2_{\log(TP)}$) were estimated for the nine lake types with the hierarchical model (Table 3). On average, the main effects agree with the CART model results. The effect of TP is about twice the effect of TN. However, in small humic, deep very humic, and shallow humic/very humic lakes (types 5, 6, 8, and 9), TN effects almost equaled TP effects. Here the main effects are inconsistent with the CART model. The main effects represent the average trend within the observational range, whereas the CART model splits the range into discrete domains. Therefore, the interpretation of the main effects is different.

5. Discussion

To aid in water quality management of Finnish lakes, the relationship between chlorophyll a and the nutrients was investigated. A Bayesian hierarchical linear model with three levels (lake, lake type, and all lake types) was fitted to a cross-sectional data set of 19 248 July and August observations from 2289 lakes in the Finnish lake monitoring network. The lakes in the network have been classified into nine lake types for the determination of ecological status.

FIGURE 4. Chla concentration [µg L⁻¹] as a function of TP and TN concentration [µg L⁻¹] for Lake Päijänne (large humic lake, type 2) predicted with the hierarchical linear model. (50th percentile: dotted line; 10th-90th percentile confidence region: solid lines.) While TP is varied within the observational range, TN is kept constant (50th percentile), and vice versa.

TABLE 2. The Deviance Information Criterion (DIC) and Deviance (D) for the Hierarchical Model (HM) and for the Non-hierarchical Type Specific Dummy Variable (DM) Model

model   DIC      D
HM      25 946   28 358
DM      31 474   31 515

TABLE 3. Main Effects of the Standardized Normal Deviates of Nutrients for the Nine Lake Types Estimated with the Hierarchical Model

type   1     2     3     4     5     6     7     8     9     mean
TP     1.4   1.2   1.3   1.2   0.9   0.8   1.4   0.8   0.6   1.1
TN     0.5   0.3   0.6   0.4   0.5   0.5   0.4   0.6   0.4   0.5

The hierarchical approach has advantages over both lake-specific linear models (unbiased with potentially high model uncertainty) and lake-type-specific models (smaller model uncertainty but potentially biased). The hierarchical model is favored by model evaluation criteria such as R2, DIC, and deviance statistics. The hierarchical model pools data from different levels of the lake hierarchy (lake, lake type, and all lake types) to reduce uncertainty and potential bias in lake-specific predictions.

The Bayesian hierarchical model was used to predict lake chlorophyll a concentrations and to simulate a chlorophyll a standard exceedance probability response surface. This probability response surface is ideally suited for setting nutrient criteria, which can be directly used under a risk assessment framework. Posterior predictive simulations of a Bayesian approach are more informative than the traditional regression modeling approach because the Bayesian credible interval directly conveys the probabilistic risk of a management decision. Using the hierarchical modeling approach allows the model to borrow strength from lakes within a lake type, which is important for lakes with few observations and a limited observational range. When lake-specific data are unavailable, a lake-type-specific model can be used to generate a prior for an initial management strategy.

Using lake-specific model predictive distributions, we can provide information on monitoring network optimization to minimize uncertainty for all lakes. The hierarchical model presented in this paper is proposed for lake eutrophication management in Finland to comply with the European Union Water Framework Directive. The Finnish lake monitoring network has produced a large cross-sectional water quality data set from lakes that have been classified by type. In the future, monitoring effort will be concentrated on the lakes that are likely to violate current eutrophication-related water quality standards and on lakes with high predictive uncertainty. The lake-specific target nutrient loads will be estimated with a hierarchical nutrient retention model together with the hierarchical chlorophyll a model. Fish and zooplankton effects will be added to the model after the collection of the necessary data to predict the effect of biomanipulation.

Target nutrient loads estimated using our hierarchical model will benefit lake management work for lakes with few observations or lakes newly added to the monitoring network. Target load estimates for all lakes will be adjusted as additional monitoring data are collected annually, such that an adaptive management scheme can be developed to implement the European Union Water Framework Directive (7).

Acknowledgments

O.M. was supported in part by Maa- ja vesitekniikan tuki ry, the Center for International Mobility, and the EU FP6 Research Project REBECCA (Relationships Between Ecological and Chemical Status of Surface Waters, Contract SSPICT-2003502158). S.S.Q. was supported by the U.S. Environmental Protection Agency through STAR grant no. RD83244701. We thank the three reviewers and Ariana Sutton-Grier for their constructive comments that improved the manuscript considerably.

Supporting Information Available

Table 1: Observed log(TP), log(TN), log(Chla), and TN/TP ratio within lake types. Table 2: Correlation between log(TP), log(TN), and log(Chla) within lake types. Table 3: The number of observations per year from 1988 to 2004. Table 4: Number of observations within the lake types. Figure 1: Conditioning plot that illustrates the log(Chla) to log(TP) relationship conditioned on log(TN) concentrations and depth. Figure 2: Conditioning plot that illustrates the log(Chla) to log(TN) relationship conditioned on log(TP) concentrations and depth. Figure 3: Fit plot that shows 10th, 50th, and 90th percentiles of predicted chlorophyll a concentration as a function of observed value for shallow, very humic lakes, type 9. This material is available free of charge via the Internet at http://pubs.acs.org.

Literature Cited

(1) Assessing the TMDL Approach to Water Quality Management; National Academy Press: Washington, D.C., 2001.

(2) Berthouex, P. M.; Brown, L. C. Statistics for Environmental Engineers; Lewis Publishers: Boca Raton, FL, 2002.

(3) Hession, W. C.; Storm, D. E.; Haan, C. T.; Reckhow, K. H.; Smolen, M. D. Risk analysis of total maximum daily loads in an uncertain environment using EUTROMOD. Lake Res. Manage. 1996, 12 (3), 331-347.

(4) European Commission. Overview of common intercalibration types and guidelines for the selection of intercalibration sites, ecostat working group, technical report 2.a.; European Commission: Brussels, 2004.

(5) Pilke, A.; Heinonen, P.; Karttunen, K.; Koskenniemi, E.; Lepisto, L.; Pietilainen, O. P.; Rissanen, J.; Vuoristo, H. Finnish draft for typology of lakes and rivers. In Typology and Ecological Classification of Lakes and Rivers; Ruoppa, M., Karttunen, K., Eds.; Nordic Council of Ministers: TemaNord, 2002; pp 42-43.

(6) Niemi, J.; Heinonen, P.; Mitikka, S.; Vuoristo, H.; Pietilainen, O.-P.; Puupponen, M.; Ronka, E. The Finnish Eurowaternet with Information about Finnish Water Resources and Monitoring Strategies, No 445 in Finnish Environment Institute, Environmental Protection, The Finnish Environment; Edita Ltd.: Helsinki, Finland, 2001.

(7) Saloranta, T.; Kamari, J.; Rekolainen, S.; Malve, O. Benchmark criteria: A tool for selecting appropriate models in the field of water management. Environ. Manage. 2003, 32 (3), 322-333.

(8) Nixon, S.; Grath, J.; Bøgestrand, J. Eurowaternet, The European Environment Agency's Monitoring and Information Network for Inland Water Resources. Technical Guidelines for Implementation, technical report 7; European Environment Agency: Copenhagen, Denmark, 1998.

(9) Cleveland, W. S. Visualizing Data; Hobart Press: Summit, New Jersey, 1993.

(10) Devlin, S. J.; Cleveland, W. S.; Grosse, S. J. Regression by local fitting: Methods, prospectives and computational algorithms. J. Econometrics 1988, 37, 87-114.

(11) Breiman, L.; Friedman, J. H.; Olshen, R.; Stone, C. J. Classification and Regression Trees; Wadsworth International Group: Belmont, CA, 1984.

(12) Qian, S. S.; Anderson, C. W. Exploring factors controlling the variability of pesticide concentrations in the Willamette River Basin using tree-based models. Environ. Sci. Technol. 1999, 33, 3332-3340.

(13) Malkoff, D. M. Bayes offers "new" way to make sense of numbers. Science 1999, 286, 1460-1464.

(14) Gelman, A.; Carlin, J. B.; Stern, H. S.; Rubin, D. B. Bayesian Data Analysis; Chapman & Hall: New York, 1995.

(15) Box, G. E. P.; Tiao, G. C. Bayesian Inference in Statistical Analysis; Addison-Wesley: Reading, MA, 1973.

(16) Bernardo, J. M.; Smith, A. F. M. Bayesian Theory; Wiley: West Sussex, England, 1994.

(17) Qian, S. S.; Donnelly, M.; Schmelling, D. C.; Messner, M.; Linden, K. G.; Cotton, C. Ultraviolet light inactivation of protozoa in drinking water: a Bayesian meta-analysis. Water Res. 2004, 38, 317-326.

(18) Qian, S. S.; Linden, K. G.; Donnelly, M. A Bayesian analysis of mouse infectivity data to evaluate the effectiveness of using ultraviolet light as a drinking water disinfectant. Water Res. 2005, 39, 4229-4239.

(19) Gilks, W. R.; Richardson, S.; Spiegelhalter, D. J. Bayesian Statistical Modelling; Wiley: West Sussex, England, 2001.

(20) Spiegelhalter, D. J.; Thomas, A.; Best, N.; Gilks, W. R. BUGS 0.5: Bayesian Inference Using Gibbs Sampling Manual; Medical Research Council Biostatistics Unit, Institute of Public Health: Cambridge, UK, 1996.

(21) Spiegelhalter, D.; Best, N. G.; Carlin, B. P.; van der Linde, A. Bayesian measures of model complexity and fit. J. Royal Stat. Soc., B 2002, 64, 583-639.

(22) Raftery, A. E.; Lewis, S. M. The number of iterations, convergence diagnostics and generic Metropolis algorithms. In Practical Markov chain Monte Carlo; Gilks, W. R., Spiegelhalter, D. J., Richardson, S., Eds.; Chapman and Hall: London, U.K., 1995.

(23) Weisberg, S. Applied Linear Regression; Wiley: New York, 1985.

(24) Gelman, A.; Rubin, D. B. Inference from iterative simulation using multiple sequences. Stat. Sci. 1992, 7, 457-511.

Received for review June 7, 2006. Revised manuscript received September 5, 2006. Accepted September 20, 2006.

ES061359B



Supporting information to Malve & Qian: Estimating nutrients and Chlorophyll a relationships in Finnish Lakes

September 5, 2006

Lake type    Mean TP    Mean TN    Mean Chla    TN/TP ratio percentiles
                                                5 %      95 %
1            11.0       347        4.9          17.1     79.0
2            18.3       485        9.3          14.5     60.0
3            13.3       349        5.9          16.0     77.0
4            20.3       496        11.0         14.1     66.7
5            32.2       582        18.6         9.3      50.0
6            34.1       631        21.2         11.7     33.3
7            20.6       444        12.1         12.1     75.0
8            39.5       715        25.9         10.8     42.9
9            52.2       815        33.9         9.6      31.3
all lakes    27.8       571        16.4         11.7     57.1

Table 1: Mean of observed TP [µg L−1], TN [µg L−1] and Chla [µg L−1], and the 5 % and 95 % percentiles of the TN/TP ratio, within lake types specified by the Finnish Environment Institute.

Type    TP - TN    TP - Chla    TN - Chla
1       0.64       0.75         0.69
2       0.66       0.73         0.55
3       0.65       0.80         0.68
4       0.78       0.83         0.77
5       0.80       0.84         0.79
6       0.76       0.64         0.60
7       0.76       0.87         0.74
8       0.76       0.87         0.74
9       0.78       0.80         0.78
all     0.80       0.83         0.76

Table 2: Correlation between log(TP), log(TN) and log(Chla) within lake types specified by the Finnish Environment Institute.
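The correlations in Table 2 are computed on log-transformed concentrations. A minimal sketch of such a calculation is given below, assuming the Pearson coefficient was used (the text does not name the coefficient). The function names and the sample concentrations are hypothetical and serve only as illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def log_correlation(x, y):
    """Correlation between log(x) and log(y); all inputs must be positive."""
    return pearson([math.log(v) for v in x], [math.log(v) for v in y])


# Hypothetical concentrations in µg/L, invented for illustration only.
tp = [8.0, 12.0, 20.0, 35.0, 60.0]
chla = [3.0, 5.5, 9.0, 21.0, 30.0]
r = log_correlation(tp, chla)
print(f"correlation of log(TP) and log(Chla): {r:.2f}")
```

Applying `log_correlation` separately to the observations of each lake type would yield one row of a table laid out like Table 2.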


Year    N        Year    N        Year    N        Year    N
1988    2        1993    426      1998    1,610    2003    2,220
1989    59       1994    1,478    1999    1,533    2004    774
1990    66       1995    1,621    2000    2,029
1991    78       1996    1,687    2001    1,972
1992    71       1997    1,714    2002    2,088

Table 3: The number of observations (N) per year from 1988 to 2004.

Type    N        Type    N        Type    N
1       485      4       3,949    7       391
2       6,536    5       1,080    8       2,729
3       388      6       1,326    9       2,544

Table 4: Number of observations (N) within the lake types.
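Tables 3 and 4 partition the same set of observations, once by year and once by lake type, so their totals should agree. The short check below transcribes the counts from the two tables. The variable names are illustrative.

```python
# Observation counts transcribed from Table 3 (per year).
obs_per_year = {
    1988: 2, 1989: 59, 1990: 66, 1991: 78, 1992: 71,
    1993: 426, 1994: 1478, 1995: 1621, 1996: 1687, 1997: 1714,
    1998: 1610, 1999: 1533, 2000: 2029, 2001: 1972, 2002: 2088,
    2003: 2220, 2004: 774,
}

# Observation counts transcribed from Table 4 (per lake type).
obs_per_type = {
    1: 485, 2: 6536, 3: 388,
    4: 3949, 5: 1080, 6: 1326,
    7: 391, 8: 2729, 9: 2544,
}

total_by_year = sum(obs_per_year.values())
total_by_type = sum(obs_per_type.values())
print(total_by_year, total_by_type)  # 19428 19428

# Share of all observations that come from the most heavily sampled lake type.
largest_type = max(obs_per_type, key=obs_per_type.get)
share = obs_per_type[largest_type] / total_by_type
print(largest_type, round(share, 3))
```

Both totals come to 19,428 observations, and lake type 2 alone contributes roughly a third of them.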

Coplots (conditioning plots) present a multivariate relationship graphically on a two-dimensional surface, using a series of bivariate scatter plots. Figures 1 and 2 in Supporting Information show the four-dimensional surface of log(Chla) as a function of log(TP), log(TN), and lake depth. In Figure 1, Supporting Information, each panel illustrates the log(Chla)–log(TP) relationship at different log(TN) and depth values (indicated by the location of the shaded bars on top of the panels and on the right-hand side of the panels). The far left panels have the lowest log(TN) values and the far right panels have the highest log(TN) values. The lowest panels show the shallowest lakes and the highest panels the deepest lakes. Taken together, the two figures show that the log(Chla)–log(TP) relationship (Figure 1) is relatively stable regardless of log(TN) and depth, while the log(Chla)–log(TN) relationship (Figure 2) depends on log(TP) and depth values. Although the log(Chla)–log(TN) relationship is noisier, the Loess curve between the variables, which follows the trend of the data, tends to increase with log(TP) and decrease with depth.
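The panel layout described above can be sketched in code. The example below is a simplified illustration only: it splits each conditioning variable into overlapping intervals with roughly equal counts and assigns every record to the panels whose intervals contain it. The interval rule, the function names, the choice of four by two panels and the sample records are all assumptions made for this sketch. They are not taken from the software used to draw Figures 1 and 2.

```python
def overlapping_intervals(values, number=4, overlap=0.5):
    """Return `number` (low, high) intervals with roughly equal counts.

    Adjacent intervals share about `overlap` of their points. This is a
    simplified equal-count scheme, not a port of any plotting library.
    """
    xs = sorted(values)
    n = len(xs)
    if n == 0 or number < 1 or not 0 <= overlap < 1:
        raise ValueError("need data, number >= 1 and 0 <= overlap < 1")
    size = n / (number * (1 - overlap) + overlap)  # points per interval
    step = (1 - overlap) * size                    # shift between intervals
    intervals = []
    for k in range(number):
        lo = min(n - 1, round(k * step))
        hi = min(n - 1, max(lo, round(k * step + size) - 1))
        intervals.append((xs[lo], xs[hi]))
    return intervals


def coplot_panels(rows, given_x, given_y, nx=4, ny=2):
    """Group rows into an ny-by-nx grid of panels by two conditioning variables.

    Panel (i, j) holds the rows that fall in the i-th interval of `given_y`
    and the j-th interval of `given_x`; a row may appear in several panels.
    """
    x_bins = overlapping_intervals([r[given_x] for r in rows], nx)
    y_bins = overlapping_intervals([r[given_y] for r in rows], ny)
    return {
        (i, j): [r for r in rows
                 if ylo <= r[given_y] <= yhi and xlo <= r[given_x] <= xhi]
        for i, (ylo, yhi) in enumerate(y_bins)
        for j, (xlo, xhi) in enumerate(x_bins)
    }


# Hypothetical records, invented for illustration (TN in µg/L, depth in m).
rows = [{"tn": 100 + 50 * i, "depth": 1 + 4 * (i % 5)} for i in range(20)]
panels = coplot_panels(rows, "tn", "depth")
for (i, j), members in sorted(panels.items()):
    print(f"depth interval {i}, TN interval {j}: {len(members)} records")
```

Within each panel, the response would then be plotted against the remaining predictor, for example log(Chla) against log(TP), with a smoother drawn through the points.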


[Figure 1 image: grid of scatter-plot panels. x-axis: log(totp), 0 to 6; y-axis: log(chla), −2 to 6. Conditioning bars: "Given : totn" (scale 0 to 4000) and "Given : depth" (scale 5 to 20).]

Figure 1: Conditioning plot that illustrates the log(Chla) [µg L−1] to log(TP) [µg L−1] relationship conditioned on log(TN) concentrations [µg L−1] and depth [m]. The gray solid line is the Loess curve that follows the trend of the data.


[Figure 2 image: grid of scatter-plot panels. x-axis: log(totn), 4 to 8; y-axis: log(chla), −2 to 6. Conditioning bars: "Given : totp" (scale 0 to 400) and "Given : depth" (scale 5 to 20).]

Figure 2: Conditioning plot that illustrates the log(Chla) [µg L−1] to log(TN) [µg L−1] relationship conditioned on log(TP) concentrations [µg L−1] and depth [m]. The gray solid line is the Loess curve.


[Figure 3 image: two panels of predicted log(Chl) against observed log(Chl), both axes 0 to 6. Panel "h/9": R2 = 0.73. Panel "l/9": R2 = 0.45.]

Figure 3: 10 %, 50 % (circle) and 90 % percentiles of predicted Chla concentration [µg L−1] as a function of observed value for shallow, very humic lakes (type 9). Percentiles have been calculated with the hierarchical linear model (h/9) and with the non-hierarchical, type-specific dummy variable model. The 10 % and 90 % percentiles are connected with vertical gray lines. R2 denotes R squared.
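The quantities shown in Figure 3 can be illustrated with a short sketch. It assumes that R squared is the coefficient of determination, 1 − SS_res/SS_tot, between observed values and predicted medians, and that the percentiles are taken over simulated predictive draws. The caption does not state either definition, so both are assumptions. The function names and the numbers are hypothetical.

```python
import statistics


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def predictive_band(draws):
    """Return the 10 %, 50 % and 90 % percentiles of predictive draws."""
    deciles = statistics.quantiles(draws, n=10, method="inclusive")
    return deciles[0], deciles[4], deciles[8]


# Hypothetical observed and predicted log(Chl) values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted_median = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(observed, predicted_median)
print(f"R2 = {r2:.2f}")

# Hypothetical predictive draws for one observation: the integers 0 to 100.
low, median, high = predictive_band(list(range(101)))
print(low, median, high)
```

In a plot like Figure 3, `median` would be drawn as the circle and `low` and `high` as the ends of the vertical gray line for each observation.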
