129
Brigham Young University Brigham Young University BYU ScholarsArchive BYU ScholarsArchive Theses and Dissertations 2020-03-27 Advancing the Accessibility, Reusability, and Interoperability of Advancing the Accessibility, Reusability, and Interoperability of Environmental Modeling Workflows Through Web Services Environmental Modeling Workflows Through Web Services Xiaohui Qiao Brigham Young University Follow this and additional works at: https://scholarsarchive.byu.edu/etd Part of the Engineering Commons BYU ScholarsArchive Citation BYU ScholarsArchive Citation Qiao, Xiaohui, "Advancing the Accessibility, Reusability, and Interoperability of Environmental Modeling Workflows Through Web Services" (2020). Theses and Dissertations. 8889. https://scholarsarchive.byu.edu/etd/8889 This Dissertation is brought to you for free and open access by BYU ScholarsArchive. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of BYU ScholarsArchive. For more information, please contact [email protected].

Advancing the Accessibility, Reusability, and

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Advancing the Accessibility, Reusability, and

Brigham Young University Brigham Young University

BYU ScholarsArchive BYU ScholarsArchive

Theses and Dissertations

2020-03-27

Advancing the Accessibility, Reusability, and Interoperability of Advancing the Accessibility, Reusability, and Interoperability of

Environmental Modeling Workflows Through Web Services Environmental Modeling Workflows Through Web Services

Xiaohui Qiao Brigham Young University

Follow this and additional works at: https://scholarsarchive.byu.edu/etd

Part of the Engineering Commons

BYU ScholarsArchive Citation BYU ScholarsArchive Citation Qiao, Xiaohui, "Advancing the Accessibility, Reusability, and Interoperability of Environmental Modeling Workflows Through Web Services" (2020). Theses and Dissertations. 8889. https://scholarsarchive.byu.edu/etd/8889

This Dissertation is brought to you for free and open access by BYU ScholarsArchive. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of BYU ScholarsArchive. For more information, please contact [email protected].

Page 2: Advancing the Accessibility, Reusability, and

Advancing the Accessibility, Reusability, and Interoperability

of Environmental Modeling Workflows

Through Web Services

Xiaohui Qiao

A dissertation submitted to the faculty of Brigham Young University

in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

Daniel P. Ames, Chair E. James Nelson Norman L. Jones

Gustavious P. Williams Rollin H. Hotchkiss

Department of Civil and Environmental Engineering

Brigham Young University

Copyright © 2020 Xiaohui Qiao

All Rights Reserved

Page 3: Advancing the Accessibility, Reusability, and

ABSTRACT

Advancing the Accessibility, Reusability, and Interoperability of Environmental Modeling Workflows

Through Web Services

Xiaohui Qiao Department of Civil and Environmental Engineering, BYU

Doctor of Philosophy

Global flood forecasting can benefit developing countries and ungauged regions that lack observational data, computational infrastructure, and human capacity for streamflow modeling. Many technical challenges exist to provide flood predictions on a global scale. First, existing land surface forecasts use coarse resolution grid cells, which provide limited information when used for flood forecasting at local scales. There is, so far, no modeling system that can provide rapid and accurate global flood predictions with low cost. Second, accurate flood predictions often require integrating interdisciplinary models, data sources, and analysis routines into a workflow. Limited accessibility, reusability, and interoperability of models restrict integrated modeling from producing more reliable results. Web services have been demonstrated as an effective way for data and model sharing because of the capability of enabling communication among heterogeneous applications over the internet. However, publishing models or analysis routines as web services is still challenging and, hence, is not commonly done.

To address the above challenges, I present a computational system for global streamflow

prediction, using existing, well-established open source software tools, that quickly downscales the runoff generated from such coarse grid-based land surface models (LSMs) onto high-resolution vector-based stream networks then routes the results using a vector-based river routing model. A set of experiments are conducted to demonstrate the feasibility and credibility of this approach. I also present a tool to publish complex environmental models as web services by adopting the OpenGMS Wrapper System (OGMS-WS) and Docker. The streamflow prediction system is deployed as a web service using this tool, and the service is used to analyze the historical streamflow tendency in Bangladesh. Next, I present a ready-to-use tool called Tethys WPS Server, which provides a simplified and formalized way to expose web app functionality as standardized Open Geospatial Consortium (OGC) Web Processing Services (WPS) alongside a web app’s graphical user interface. Three Tethys web apps are developed to demonstrate how web app functionality(s) can be exposed as WPS using Tethys WPS Server, and to show how these WPS can be coupled to build a complex modeling web app. In sum, this dissertation explores new computational approaches and software tools to advance global streamflow prediction and integrated environmental modeling.

Keywords: global streamflow prediction, environmental modeling, model interoperability, web service, vector-based river routing, container

Page 4: Advancing the Accessibility, Reusability, and

ACKNOWLEDGEMENTS

First and most, I am thankful to Daniel P. Ames who worked hard as my advisor for his

patience, guidance, and generous financial support throughout my PhD study. Dr. Ames is the

most supportive advisor I know. He has given me the most freedom to pursue my research

interests and always provides insightful discussions and helpful suggestions. I especially

appreciate his guidance and being a good role model for me in doing research and writing

papers, which gives me the confidence to continue pursuing my academic career. I am very

grateful to E. James Nelson for inviting me to participate in the global streamflow prediction

research in the last year of my PhD, as well as his guidance and funding support. I also would

like to thank Norman L. Jones, Gustavious P. Williams, and Rollin H. Hotchkiss for their help

and suggestions as members of my advisory committee.

Second, I am thankful to have had the opportunity to collaborate with great colleagues from

BYU Hydroinformatics lab. Especially who worked with me: Chris Edwards, Corey Krewson,

Giovanni Bustamante, Jake Nelson, Jeffrey Sadler, Jiri Kadlec, Jorge Sánchez, Matthew Bayles,

Michael Souffront, Nathan Swain, Sarva Pulla, Scott Christensen, Shawn Crawley, Spencer

McDonald, Stephen Duncan, Rohit Khattar, Wade Roberts, and Yue Shen.

Special thanks to my husband, Zhiyu Li, who has provided generous help and support on

both my academic and personal life. He is always the first person to discuss my research ideas

and the primary resource to get my technical questions answered. Thanks for his faith in me and

his unconditional love and companionship in this long journey. Many thanks to my parents for

helping me take care of my son and for their prompt encouragement during my bad days.

Page 5: Advancing the Accessibility, Reusability, and

Finally, I would like to respectively acknowledge the people and funding support

contributed to each chapter of this dissertation.

The research presented in Chapter 2 and Chapter 3 was supported by NASA ROSES

SERVIR Applied Sciences Team Research Grant No. NNX16AN45G. In Chapter 2, E. James

Nelson coordinated the research. Daniel P. Ames assisted with the research and contributed to

the manuscript. Zhiyu Li provided computational and technical support. Gustavious P. Williams

provided statistical support and greatly improved the manuscript. Wade Roberts, Jorge Sánchez

and Chris Edwards contributed to the experimental studies. Dr. Cédric H. David and Dr. Mir A.

Matin provided valuable suggestions on the manuscript. In Chapter 3, Zhiyu Li and Daniel P.

Ames provided key ideas and contributed to the manuscript. E. James Nelson provided financial

support. Fengyuan Zhang and Min Chen from the OpenGMS group at Nanjing Normal

University provided technical support on deploying the OpenGMS Wrapper System.

The research presented in Chapter 4 was funded by NSF HydroShare Grant No. ACI-

1148453. Zhiyu Li provided significant code contributions. Daniel P. Ames contributed

extensively to the writing. E. James Nelson and Nathan Swain provided significant intellectual

support in terms of the case study and Tethys Platform integration. I also acknowledge the

comments and feedback from Norman L. Jones, Tethys Steering Committee, and BYU

Hydroinformatics Lab on the software.

Xiaohui Qiao

Page 6: Advancing the Accessibility, Reusability, and

v

TABLE OF CONTENTS

LIST OF TABLES ........................................................................................................................ vii LIST OF FIGURES ..................................................................................................................... viii 1 Introduction ............................................................................................................................. 1

Problem Statement .......................................................................................................... 1

Research Objectives ........................................................................................................ 5

Outline of Dissertation .................................................................................................... 7

2 A Systems Approach to Routing Global Gridded Runoff through Local High-resolution Stream Networks for Flood Early Warning Systems ...................................................................... 9

Introduction ..................................................................................................................... 9

Methods......................................................................................................................... 14

2.2.1 Grid-to-vector Mapping ............................................................................................ 14

2.2.2 RAPID River Routing Model ................................................................................... 17

2.2.3 Experimental Design ................................................................................................. 20

2.2.3.1 Comparison with GloFAS......................................................................... 20

2.2.3.2 Sensitivity of Watershed Resolution ......................................................... 22

Results ........................................................................................................................... 24

2.3.1 Comparison with GloFAS......................................................................................... 24

2.3.2 Sensitivity of Watershed Resolution ......................................................................... 29

Discussion and Conclusions ......................................................................................... 32

3 A Container-based Approach for Sharing Complex Environmental Models as Web Services ......................................................................................................................................... 36

Introduction ................................................................................................................... 36

Methods......................................................................................................................... 40

3.2.1 Approach Design ...................................................................................................... 40

3.2.1.1 OpenGMS Wrapper System (OGMS-WS) ............................................... 40

3.2.1.2 Docker ....................................................................................................... 41

3.2.1.3 OGMS-WS-Docker Approach .................................................................. 43

3.2.2 Experimental Design ................................................................................................. 44

3.2.2.1 Streamflow Prediction System .................................................................. 44

3.2.2.2 Streamflow Prediction Service Design ..................................................... 45

3.2.2.3 Service Application in Bangladesh Historical Streamflow Analysis ....... 50

Results ........................................................................................................................... 52

3.3.1 Streamflow Prediction Service ................................................................................. 52

Page 7: Advancing the Accessibility, Reusability, and

vi

3.3.2 Bangladesh Historical Streamflow Analysis ............................................................ 55

Discussion & Conclusion .............................................................................................. 59

4 Simplifying the Deployment of OGC Web Processing Services (WPS) for Environmental Modeling – Introducing Tethys WPS Server ....................................................... 62

Introduction ................................................................................................................... 62

Methods......................................................................................................................... 67

4.2.1 System Design .......................................................................................................... 67

4.2.2 System Implementation ............................................................................................ 71

4.2.3 Experimental Case Study Design.............................................................................. 76

4.2.3.1 HydroProspector Use Case ....................................................................... 76

4.2.3.2 WPS Client Use Case................................................................................ 81

Results ........................................................................................................................... 81

4.3.1 Tethys WPS Server ................................................................................................... 81

4.3.1.1 Interaction with Tethys WPS Server......................................................... 83

4.3.2 Case Study Results .................................................................................................... 85

4.3.2.1 HydroProspector Use Case ....................................................................... 85

4.3.2.2 Third-party WPS Client Use Case ............................................................ 87

4.3.3 Quantitative Analysis ................................................................................................ 88

4.3.4 Qualitative Analysis .................................................................................................. 90

Discussion ..................................................................................................................... 92

Conclusion .................................................................................................................... 95

5 Conclusions ........................................................................................................................... 97

References ................................................................................................................................... 102

Appendix A. Software Availability ........................................................................................... 117

1. New Software Developed in the Dissertation .................................................................... 117

2. Third-party Software Used in the Dissertation .................................................................. 118

Page 8: Advancing the Accessibility, Reusability, and

vii

LIST OF TABLES

Table 2-1: Catchment Basin Properties ........................................................................................ 31

Table 2-2: The Statistical Results of Comparing Streamflow Simulations at Different

Resolutions ............................................................................................................................ 32

Table 3-1 Content of the Model Description Language (MDL) File............................................ 41

Table 3-2: Variable Design of the Streamflow Prediction Service............................................... 48

Table 3-3: Selected Reach Segments ............................................................................................ 56

Table 4-1: Server-Side Implementation of OGC WPS Specification Standard ........................... 72

Table 4-2: Tethys WPS Server HTTP GET + KVP Request APIs .............................................. 84

Table 4-3: Watershed Delineation Process WPS & Reservoir Calculation Process WPS on

Tethys WPS Server................................................................................................................ 87

Table 4-4: Language Composition Comparison ........................................................................... 88

Page 9: Advancing the Accessibility, Reusability, and

viii

LIST OF FIGURES

Figure 2-1: Comparison of grid-based river routing (left) and vector-based river routing (right).

Note: the grid-based routing consists of two steps: first, the runoff in each grid cell

is routed to the nearest downstream river grid cell (black arrows are flow directions

between grids, blue grids represent the river network); then, the water is routed in each

river grid cell (red arrows are the flow directions in the river). In the vector-based

routing, water is directly routed in the river network (blue lines) following the flow

direction (red arrows) ............................................................................................................ 11

Figure 2-2: Area-weighted grid-to-vector mapping method ......................................................... 16

Figure 2-3: The workflow of calculating a weight table for a gridded LSM runoff ..................... 16

Figure 2-4: Schematic diagram of mapping gridded runoff to river networks ............................. 18

Figure 2-5: Schematic diagram of GloFAS reanalysis and ERA- RAPID ................................... 22

Figure 2-6: A map of the stations where RAPID routing results were compared with GloFAS.

Initially 100 stations were selected (all points) however, 20 stations (grey points) were

not used based on contributing area differences between the grid-based and vector-based

stream node. Eighty (80) stations were evaluated (red points) .............................................. 25

Figure 2-7: Comparison of GloFAS reanalysis data with ERA-RAPID results on 35-year

cumulative volume per unit area (CVPUA). ......................................................................... 26

Figure 2-8: The KGE distribution of comparing ERA-RAPID with GloFAS.............................. 26

Figure 2-9: The R distribution of comparing ERA-RAPID with GloFAS ................................... 27

Figure 2-10: Comparison between ERA-RAPID and GloFAS at Dipayal station, Nepal (up)

and Rocky Reach Dam station, United States (down). Red lines represent the results of

routing ERA-Interim/Land data with RAPID, while black lines represent GloFAS

reanalysis results. The left figures show the comparison of 35 years’ streamflow, the

right figures show the comparison of daily average streamflow. .......................................... 29

Figure 2-11: Experimental watersheds in low, medium and high resolutions .............................. 30

Figure 3-1: OGMS-WS workflow diagram .................................................................................. 41

Figure 3-2: The OGMS-WS-Docker approach ............................................................................. 43

Figure 3-3: Workflow of the Streamflow Prediction System (grey boxes are inputs) ................. 45

Page 10: Advancing the Accessibility, Reusability, and

ix

Figure 3-4: Interactions between user, service interface, OGMS-WS, and the model

container ................................................................................................................................ 47

Figure 3-5: The streamflow prediction service deployment package ........................................... 50

Figure 3-6: Continued OGMS-WS GUIs ..................................................................................... 53

Figure 3-7: Streamflow prediction service GUIs .......................................................................... 54

Figure 3-8: High resolution river network in Bangladesh ............................................................ 56

Figure 3-9: Streamflow monthly distribution from 1995 to 2014 ................................................ 57

Figure 3-10: Simulated streamflow of selected reach segments ................................................... 58

Figure 4-1: Tethys WPS Server system design............................................................................. 69

Figure 4-2: General system architecture of Tethys WPS Server .................................................. 70

Figure 4-3: PyWPS workflow ....................................................................................................... 73

Figure 4-4: Integration of PyWPS in Tethys Platform ................................................................. 74

Figure 4-5: WPS access to the algorithm of a Tethys app ............................................................ 75

Figure 4-6: HydroProspector app workflow ................................................................................. 77

Figure 4-7: Watershed delineation process diagram ..................................................................... 78

Figure 4-8: Reservoir calculation process diagram ...................................................................... 79

Figure 4-9: Interactions between User, Template, Controller, and Tethys WPS Server in the

HydroProspector App ............................................................................................................ 80

Figure 4-10: WPS definition template file. The green box contains metadata of the service;

the blue box contains the definitions of inputs and outputs; the red box contains the

actual operation, where can directly call functions in the Tethys app. .................................. 82

Figure 4-11: HydroProspector for Dominican Republic App User Interface ............................... 86

Figure 4-12: The comparison of compute time between HydroProspector app and Storage

Capacity Service app ............................................................................................................. 89

Figure 4-13: Academic status (left) and environmental web app developing experience (right)

of the workshop and survey attendees. .................................................................................. 90

Figure 4-14: Comparison of difficulty of OGC WPS deployment with/without Tethys WPS

Server for survey attendees with different years of web application development

experience (Scale of 1 to 10 where 1 is very easy and 10 is very difficult) .......................... 91

Figure 4-15: Time to deploy WPS processes using Tethys WPS Server for survey attendees

with different years of web application development experience. ........................................ 91

Page 11: Advancing the Accessibility, Reusability, and

1

1 INTRODUCTION

Problem Statement

Flooding is the most frequent natural disasters and a leading cause of natural disaster

fatalities and economic loss worldwide, accounting for over one-third of the total damage and

two-thirds of the impact to people affected by natural disasters (Jonkman 2005; Kousky 2014;

Miao 2019; Tu et al. 2020). Flooding has a long list of impacts, including threatening human life,

causing economic loss, damaging crops (Li et al. 2019; Zhang et al. 2016), triggering fatal

landslides (Martelloni et al. 2012), impacting terrestrial ecosystems (Knapp et al. 2008), stressing

water treatment plants and sewage networks, and affecting public health by inducing outbreaks

of waterborne diseases (Rose et al. 2000). It has been estimated that more than 94 million people

are affected by floods each year globally through property damage, unsafe drinking water,

infrastructure destruction, injury, and loss of life (Emerton et al. 2016). Recent studies have

shown that global climate change is causing more frequent extreme precipitation events in many

regions of the world (Donat et al. 2016; Min et al. 2011; Taylor et al. 2017; Westra et al. 2014),

leading to increased flooding risk. Some studies also have demonstrated a steady increase in

frequency and economic losses of flooding events in recent decades (Hirabayashi et al. 2013;

Schiermeier 2011; Yin et al. 2017).

Effective flood forecasting is becoming increasingly essential for flood management

since it can provide better prediction and timing for people to mitigate the impacts of

Page 12: Advancing the Accessibility, Reusability, and

2

forthcoming hazards. Also, accurate flood forecasting allows for better quantification of

taxpayers’ liabilities, better pricing of crop insurance to enable informed decisions, and assisting

long-term flood control planning and land use management. Effective flood forecasting requires

a precise hydrologic model and sufficient resources, including observation data, computational

infrastructure, and human capacity. Due to limited resources, it is costly or even impossible to

produce accurate and rapid flood predictions in data-scarce areas (Cools et al. 2016; Emerton et

al. 2016). In recent decades, technological advances in earth observations and computational

resources have led to significant improvements in modeling and predicting the hydrologic cycle

at the global scale (Guy J-P. Schumann 2018; Sood and Smakhtin 2015; Ward et al. 2015).

Global runoff modeling results have been used by many researchers to compute river flow and

estimate floods (Alfieri et al. 2013; Hirpa et al. 2018; Wu et al. 2012; Wu et al. 2014). These

studies provide the potential to provide global-scale flood predictions by leveraging global runoff

predictions, filling the need for flood prediction in data-scarce regions.

A practical flood forecasting system requires a high-resolution stream network where

managers can access predictions at local locations. However, global runoff results are typically

generated at relatively low resolutions, which provide very limited information for flood

prediction at local scales. It requires an approach to accurately distribute gridded runoff to flows

at each stream segment and route the resulting flows for streamflow and flood prediction. If the

approach could be developed as an automated system, it could provide data support for

developing countries and ungauged regions that lack sufficient resources to develop the

cyberinfrastructure and human capacity to implement advanced flood forecasting system. There

exist a number of challenges in developing such an automated system at the global scale.

Traditional river-routing models are generally physically-based and grid-based, which require

Page 13: Advancing the Accessibility, Reusability, and

3

significant computing power to perform global scale routing at resolutions fine enough to

provide local-scale solutions (Balazs M. Fekete 2011; Yamazaki et al. 2013). In addition, the

result of traditional routing models is river outflow at each grid cell, which is difficult to convert

to discharge at each river segment for intuitive understanding. Therefore, it is necessary to

advance current methods for large-scale flood forecasting and provide reliable streamflow

predictions at river level for stakeholders and managers to make informed decisions.

A precise model plays an essential role in flood forecasting. With advanced satellite and

computing technologies, there has been growing attention on integrating multidisciplinary data

and models to represent a research question more comprehensively (Bandaragoda et al. 2019;

Chen et al. 2019; Gao et al. 2019). However, it remains challenging to integrate models without

extra efforts because of their heterogeneous formats, execution modes, programming languages,

and supported operating systems. First, environmental models often have high entry barriers. It is

challenging for users to master a new model in a short amount of time, especially models in

unfamiliar domains. Second, low reusability and portability greatly hinder the spreading of

models. Collberg et al. (2014) found that less than 50% of software could even be successfully

installed and model reproducibility significantly reduces year by year because of changing

dependencies and missing third-party resources. Above all, there is an increasing demand to

improve the accessibility, reusability, and interoperability of environmental models (Belete et al.

2017; Voinov and Cerco 2010). Accessibility refers to the ability of a model to be easily

accessed and used. Reusability refers to the ability of a model to remain functional to be used to

develop new models or modeling systems. Interoperability refers to the ability of a model to

work with other models or systems without special effort by the user.

Page 14: Advancing the Accessibility, Reusability, and

4

Model interoperability can be achieved in many ways, such as sharing input and output

files, directly rewriting models into a single software system, or establishing software

architecture principles that facilitate the coupling of independent models (Belete et al. 2017;

Granell et al. 2010). Among these methods, Service-Oriented Architecture (SOA) has gained the

most attention and has been widely used to integrate models and build scientific workflows

(Bosin et al. 2011; Lin et al. 2009; Yue et al. 2016; Zhao et al. 2012b). SOA has significantly

improved dealing with complex problems by decomposing a system into functional components

that communicate via web service (Dubois et al. 2013; Nino-Ruiz et al. 2013; Skøien et al. 2013;

Yang et al. 2009). Each web service in the system can perform an isolated task while the whole

system can address much broader problems (Argent et al. 2006; Castronova et al. 2013;

Schaeffer 2008). The advantages of web services include (1) enabling communication and

interoperation among applications written in different programming languages or running on

different platforms over the Internet (Michaelis and Ames 2009; Nativi et al. 2013); (2)

significantly lowering the entry bar of model execution (Jiang et al. 2017; Xiao et al. 2019;

Zhang et al. 2019).

While numerous environmental web services exist for data distribution and web mapping,

environmental models as web services is still an area of research that has not been widely

investigated and implemented (Belete et al. 2017; Hu et al. 2015; Vitolo et al. 2012; Vitolo et al.

2015). Based on my research, this problem stems from the following three aspects. First, many

studies have demonstrated orchestrating web services for environmental models; however, the

existing literature is more heavily weighted towards specific services for particular models rather

than creating general tools to support publishing models or data analysis workflows as web

services. For example, we wanted to expose existing functions in web apps as web services to

Page 15: Advancing the Accessibility, Reusability, and

5

avoid repetitive work; however, no ready-to-use tools were found for this purpose. Second, there

are some tools can assist publishing analysis workflows as web services, such as GeoServer

(2013), Deegree (Müller 2007), PyWPS (Becchi 2007), 52°North (2018), and Esri ArcGIS.

However, these tools are designed for geospatial analysis in specific programming languages. It

is still challenging to expose complex models as web services, especially models in different

forms, execution modes, programming languages, and supported operating systems. Third, even

with well-designed tools, deploying models can still be a daunting process. Some models are

difficult to set up or even impossible to install in a new environment because of the complex

infrastructures required (network, storage and computing). Some models have conflicts with the

service publishing tool or with each other when installed on the same platform, such as

dependency conflicts, supporting different operating systems/programming languages, etc.

Model isolation management, which refers to isolating each model and preventing the conflicts

between them on the same platform, is still an uninvestigated issue for these service publishing

tools. While these technological problems are particularly evident when considering flood

forecasting, they are also relevant to many other problems in environmental and water resources

engineering including water management, drought modeling and prediction, water resources

planning, environmental restoration, and other related areas. The research objectives presented

below are positioned in terms of flood forecasting but can also be applicable to many, if not all

of these other related areas.

Research Objectives

The following three objectives are defined to address the challenges associated with flood

forecasting, including providing high-resolution global-scale flood predictions and developing

tools to assist web service deployment to facilitate integrated modeling.

Page 16: Advancing the Accessibility, Reusability, and

6

Objective 1: Design and develop a fast and automated method to use runoff from

global/large-scale grid-based land surface models, and to route the runoff into high-resolution

vector-based river networks, ultimately provide river-level streamflow prediction at the global

scale. Based on the limitations of implementing traditional physically-based grid models at the

global scale, I elected to use vector-based river routing models as their demonstrated advantages

in large-scale hydrologic modeling, including computational efficiency, higher resolution

representations of the hydrologic features, and more precise locations for predictions.

Meanwhile, to provide intuitive and useful information for decision-makers, a fast and automated

method is needed to provide reliable streamflow predictions at river level.

Objective 2: Design and develop a tool that can (1) publish heterogeneous complex

environmental models as web services; (2) isolate each model and prevent the conflicts between

them; (3) keep the model(s) functional with changing environment and dependencies. The

developed tool will significantly lower the barrier to publishing models that are different in

execution modes, programming languages, and operating systems as web services, especially for

users with limited programming skills.

Objective 3: Design and develop a tool to simultaneously publish web app functionality

as Open Geospatial Consortium (OGC) Web Processing Services (WPS) when developing a web

app. OGC WPS is the most commonly recognized and has been demonstrated as an efficient

technology for publishing geospatial processes as web services. The developed tool will make

the process of WPS development almost transparent to developers. In this way, users can focus

on algorithm design and web app development and less focus less on the details of the OGC

WPS specification and implementation. In addition, the published WPS services can be directly

Page 17: Advancing the Accessibility, Reusability, and

7

used in developing new web apps or by third-party applications to facilitate integrated

environmental modeling.

Outline of Dissertation

The remainder of this dissertation is structured following a manuscript-based approach as

approved by the BYU Office of Graduate Studies where the following three core chapters are

essentially standalone research articles that have been prepared and submitted for publication in

high quality archival scientific journals. Specifically, Chapters 2 and 4 were published as

standalone papers in Environmental Modelling and Software respectively (Qiao et al. 2019a;

Qiao et al. 2019b) and Chapter 3 is being submitted to International Journal of Digital Earth. A

brief summary of the following chapters is as follows:

Chapter 2 presents the design and development of a new automated computational

system, using existing, well-established open source software tools, that quickly downscales (or

maps) the runoff generated from coarse grid-based land surface models (LSMs) onto high-

resolution vector-based stream networks then routes the results using a vector-based river routing

model. Simulations are conducted with the new system using the ERA-Interim/Land reanalysis

data – a 35-year retrospective gridded runoff data product from the European Center for

Medium-range Weather Forecasts (ECMWF). The simulation results are compared with Global

Flood Awareness System (GloFAS) – a well-established gridded routing model using the same

forcing to assess the credibility of this new system.

Chapter 3 presents a tool to improve publishing complex environmental models as web

services. The OpenGMS Wrapper System (OGMS-WS) is elected as the service publishing tool.

Docker container is used as the model isolation tool to keep the model functional and portable. A

Page 18: Advancing the Accessibility, Reusability, and

8

streamflow prediction service is deployed using this tool and used to analyze the historical

streamflow tendency in Bangladesh.

Chapter 4 presents the development and testing of a ready-to-use WPS implementation

called Tethys WPS Server, which provides a formalized way to expose web app functionality as

standardized WPS alongside a web app's graphical user interface. The WPS server is created

based on Tethys Platform by leveraging PyWPS. Three Tethys web apps are developed to

demonstrate how web app functionality(s) can be exposed as a WPS using Tethys WPS Server,

and to show how these WPSs can be coupled to build a complex modeling web app.

Chapter 5 provides a summary and conclusion on all the work presented in this

dissertation and outlines opportunities for future research.

Software availability is presented in the appendix to describe all the software and web

apps which have been developed and presented in each chapter and all the third-party software

which has been used in this dissertation.

Page 19: Advancing the Accessibility, Reusability, and

9

2 A SYSTEMS APPROACH TO ROUTING GLOBAL GRIDDED RUNOFF

THROUGH LOCAL HIGH-RESOLUTION STREAM NETWORKS FOR

FLOOD EARLY WARNING SYSTEMS

Introduction

Flood early warning systems (FEWS) are important tools for flood management since

they can provide greater prediction and timing of forthcoming hazards so that populations are

more prepared before and more resilient after flood events. Effective FEWS that provide flood

warnings with a reasonable lead time are mostly established at catchment-scale, usually in areas

where sufficient data, technology and human resources are available (Basha and Rus 2007; Guy

J-P. Schumann 2018; Latt and Wittenberg 2014). Due to limited resources, it is costly or even

impossible to build effective and sustainable FEWS in data-scarce areas (Cools et al. 2016;

Emerton et al. 2016). In recent decades, the considerable advances in satellite technology and

high-performance computing has led to significant improvements in modeling and predicting the

hydrologic cycle at the global scale (Guy J-P. Schumann 2018; Sood and Smakhtin 2015; Ward

et al. 2015). The hydrologic processes are usually land surface models (LSMs) that simulate the

water and energy interactions between the atmosphere and earth surface. The LSMs use the

meteorological forcing from weather prediction models as input and provide output as water

balance parameters from the earth’s surface including evapotranspiration, soil moisture, snow,

and runoff (Pitman 2003).

Page 20: Advancing the Accessibility, Reusability, and

10

Among these land surface results, runoff has received most attention because of the

importance of streamflow to society (Bai et al. 2016). Many researchers have performed river

routing using runoff forecasts from LSMs to compute river flow and estimate floods (Alfieri et

al. 2013; Hirpa et al. 2018; Wu et al. 2012; Wu et al. 2014). These studies demonstrate the

potential to develop global or large-scale FEWS by leveraging global runoff predictions. If these

approaches could be automated, they could provide data support for developing countries and

ungauged regions that lack sufficient resources to develop the cyberinfrastructure and human

capacity to implement advanced flood prediction systems. However, the relatively low

resolutions of these LSMs are not practical for flood prediction at local scales. Developing

practical flood prediction systems requires higher resolution stream networks where managers

can select predictions at relevant locations. The development of such systems presents a number

of challenges. First, river routing models have traditionally used a gird-based scheme in which

river segments (calculation units) are discretized as grids (Balazs M. Fekete 2011; Yamazaki et

al. 2013). Significant computational resources are required to perform global or large-scale grid-

based routing at resolutions fine enough to provide local-scale solutions. Second, to better

simulate the hydrodynamic process in river channels, river routing models have been improved

increasingly to include more and more complex physical processes (Shaad 2018; Yamazaki et al.

2013). It is still challenging to implement complex physically-based river routing models at the

global scale because they have more parameters and higher requirement for observation data and

computing power to run and calibrate (Younis and De Roo 2010), which is especially difficult in

data scarce areas. Last, the result of grid-based routing models is river discharge at each grid cell.

It is a challenge to convert from gridded outflow volumes to estimated river discharge at a local

scale so that the benefits of the global models can be maximized. For these systems to be useful

Page 21: Advancing the Accessibility, Reusability, and

11

for flood prediction, it is essential to provide streamflow at a fine resolution or at a river-level

scale.

Vector-based river routing modeling is receiving more attention as it facilitates large-

scale hydrological predictions at finer resolutions and some large-scale models have started to

shift from a grid-based environment towards to a vector-based environment for predictions

(David et al. 2011b; David et al. 2013; Lin et al. 2018). The difference between vector-based

river routing and grid-based river routing is shown in Figure 2-1.

Figure 2-1: Comparison of grid-based river routing (left) and vector-based river routing (right). Note: the grid-based routing consists of two steps: first, the runoff in each grid cell is routed to the nearest downstream river grid cell (black arrows are flow directions between grids, blue grids represent the river network); then, the water is routed in each river grid cell (red arrows are the flow directions in the river). In the vector-based routing, water is directly routed in the river network (blue lines) following the flow direction (red arrows)

The grid-based river routing defines the river network as a set of connected grid cells and

performs the streamflow routing simulation in each river grid cell. In the vector-based river

Page 22: Advancing the Accessibility, Reusability, and

12

routing, the river network is represented as lines and the calculations are implemented in each

river segment. In both cases (grid-based and vector-based) river routing calculations require

parameterization of the computational elements including estimates of cross-sectional geometry,

surface roughness, and slope. For grid-based routing, these parameters are estimated for each

grid cell, whereas for vector-based routing, each stream segment is parameterized separately.

Several recent studies have implemented vector-based river routing with large-scale

LSMs and demonstrated its feasibility and flexibility. David et al. (2011a) replaced the grid river

routing scheme in the SIM-France model with Routing Application for Parallel computatIon of

Discharge (RAPID) and obtained comparably accurate simulations but higher model efficiency.

Lehner and Grill (2013) developed a vector river routing model, HydroROUT, and coupled it

with a global river network database, HydroSHEDS, to support global-scale eco-hydrological

modeling. Mizukami et al. (2016) developed a river network routing tool, mizuRoute that post-

processes the runoff outputs from LSMs and performs continental-scale streamflow simulations.

This tool can use both grid-based and vector-based river networks. The authors demonstrated its

capability to produce streamflow on a vector-based river network over the contiguous United

States (CONUS). Snow et al. (2016) implemented RAPID on the runoff generated by the

ECMWF model (Molteni et al. 1996) to develop a US national-scale streamflow prediction web

application. Tavakoly et al. (2017) performed river flow modeling in Mississippi river basin

using RAPID and high-resolution NHDPlus river data. Their validation results showed that

RAPID has a satisfactory performance in continental-scale river routing. Lin et al. (2018)

integrated RAPID with the community WRF-Hydro framework for continental-scale flood

discharge modeling and demonstrated its computational efficiency and reasonable accuracy in

predicting flood discharge during Hurricane Ike in 2008. In general, the advantages of vector-

Page 23: Advancing the Accessibility, Reusability, and

13

based routing models compared with grid-based models, especially at a large-scale, include

computational efficiency, higher resolution representations of the hydrological features, and

more precise locations for predictions. Research has shown that routing through a vector-based

representation of a river network has more advantages and flexibilities for large spatial domains

than routing through a grid-based representation of river networks at coarse resolutions (Lehner

and Grill 2013; Lin et al. 2018; Mizukami et al. 2016; Singh et al. 2015).

To date, global LSMs have typically been generated at such coarse resolutions that, at

least for streamflow, they provide little value at local scales when used for flood prediction and

warning. At the same time, developing regions of the world lack observational data,

computational infrastructure, and the human capacity to create actionable streamflow, and by

extension, flood forecasts. Global gridded runoff models could fill the need for flood prediction

in developing regions if the large-scale gridded results could be made useful at local scales. This

requires the development and integration of a system that can take output from the LSMs,

accurately distribute flows to vector-based stream segments, and route the resulting flows for

streamflow and flood prediction.

I have designed and developed a computational system, using existing, well established

open source tools, that routes runoff generated from large scale grid-based modeling into high-

resolution vector-based river networks ultimately providing FEWS support for flood

management in data-scarce regions. The contribution of this work to this field is in developing

the tools and data structures to create the automated system, and to develop new algorithms to

distribute grid flow to vector stream segments. This chapter presents both the new system to

provide automated flood forecasts and the method for distributing grid-based flows to vector

stream segments. Experimental studies are conducted to demonstrate the feasibility of the system

Page 24: Advancing the Accessibility, Reusability, and

14

and its flexibility in providing useful flood prediction over large regions. The results have

demonstrated that this system can produce more useful predictions as they can be computed at

more precise locations than grid-based systems.

Section 2.2 presents the method for grid-to-vector flow mapping and describe the

experimental design I used to validate this method. Section 2.3 presents the results of this

integrated system and validation experiments. Section 2.4 provides a detailed discussion on the

results and describes the benefits of this integrated system for flood prediction in data scarce

regions. Section 2.4 also provides conclusions and potential areas for future work. This chapter

describes an integrated system for using LSM data to create useful flood predictions at a local

scale. The contributions of this work are developing the integrated system and creating a new

method for distributing course resolution grid cell flows to high-resolution vector stream

segments. As part of developing this system, I extended the RAPIDpy vector routing model

(Snow et al. 2017) to accept input from additional LSM models.

Methods

2.2.1 Grid-to-vector Mapping

To develop a system that couples course-resolution LSMs with high-resolution vector-

based river routing models is to determine how to distribute the grid-generated runoff data to the

correct river segment. This is relatively straightforward when coupling grid-based run-off with

grid-based river routing models, where runoff from the LSM grid is distributed as river inflow to

the nearest downstream river grid cell. For vector-based routing, unlike grid-based river routing

methods, no such one-to-one relationship exists to distribute the gridded runoff and with the

Page 25: Advancing the Accessibility, Reusability, and

15

correct vectorized river segments. For example, one grid cell might contribute to multiple river

segments or one river segment might accept runoff from multiple grid cells. These complex

relationships make runoff distribution complicated. One solution, based on physical processes, is

to determine the contributing catchment for each river segment and use the runoff from those

grid cells to calculate the water inflow of each river segment.

Studies have adopted different methods for coupling gridded LSMs with vector-based

routing models. David et al. (2013) mapped NLDAS2 gridded runoff to the United States

NHDPlus river network using a catchment centroid-based method. The inflow of a catchment

was calculated using the runoff value of the grid cell where the catchment centroid was located.

Snow et al. (2016) adopted an area-weighted method to convert global runoff depths generated

by the ECMWF model to the runoff volume of each NHDPlus catchment. Lin et al. (2018)

compared the area-weighted method with the catchment centroid-based method and concluded

that the model is more sensitive to the grid-to-vector coupling interface than the grid resolution.

The area-weighted coupling exhibits better results for high-resolution gridded forcing data,

especially when the grid cells of the forcing are much smaller than the catchments.

I used an area-weighted grid-to-vector method, adapted from Snow et al. (2016) shown

as Figure 2-2. This method employs a weight table to convert the calculated gridded runoff

depths to the inflow of each catchment. The weight table contains the area ratio of each LSM

grid cell to the intersected catchments. This area ratio is referred as “weight” in this context.

Figure 2-3 shows the workflow for calculating a weight table for gridded LSM runoff data. First,

a Thiessen polygon feature is computed based on the coordinates (latitude and longitude

attributes) stored with the runoff data in the NetCDF file. Figure 2-2 shows example Thiessen

polygons as grids squares. Each polygon is a square with the given coordinates as centroids.

Page 26: Advancing the Accessibility, Reusability, and

16

Next the algorithm intersects the polygon with data from a drainage file that contains an

identifier for each river segment (the river identifier is referred to as COMID in this research). In

this way, we can obtain a polygon feature that has the attributes of geodesic area and the

intersecting catchment COMID. Finally, the weight table is generated from the intersected

polygon features.

Figure 2-2: Area-weighted grid-to-vector mapping method

Figure 2-3: The workflow of calculating a weight table for a gridded LSM runoff

Page 27: Advancing the Accessibility, Reusability, and

17

Then, the Inflow of a catchment is computed as the sum of all the runoff products from

each intersected LSM grid cell [runoff (i,j)], the area of the LSM grid cell (L2, and L refers to the

grid size), and the weightk of each intersected grid cell [shown as Eq. (2-1), k refers to the

number of grid cells intersected with the catchment].

𝐼𝑛𝑓𝑙𝑜𝑤 = ∑ [𝑟𝑢𝑛𝑜𝑓𝑓 (𝑖, 𝑗) × 𝐿2 × 𝑤𝑒𝑖𝑔ℎ𝑡𝑘]𝑛𝑘=1 (2-1)

2.2.2 RAPID River Routing Model

RAPID is a vector-based river routing model developed by David et al. (2011a) that uses

a matrix-based version of the Muskingum method to simulate the water flow through a vector-

based river network that can range from watershed-scale to global-scale (David et al. 2011b).

The Muskingum routing method has been widely used in vector-based river routing because of

its simplicity and low computational cost compared with other methods (Gill 1978; Tung 1985).

Since its first formal release, RAPID has been used and verified in a number of studies including

continental-scale, high-resolution flow modeling and operational flood forecasting, computation

of river height at the regional scale and other hydrological topics (David et al. 2016). A set of

assisting tools exists to lower the barrier to operating RAPID by users from different disciplines,

including an ArcGIS toolset (Ding 2016) developed by Esri for RAPID input data preprocessing,

and an open source software tool RAPIDpy developed by Snow et al. (2017) to assist in

preparing inputs in the required formats and running RAPID. RAPIDpy supports several large-

scale LSMs, including ECMWF, ERA-Interim (Dee et al. 2011), NLDAS (Xia et al. 2012) and

GLDAS (Lorenz et al. 2015). I extended RAPIDpy to support more models, including ERA5

(Karl Hennermann 2018), HIWAT (Gatlin et al. 2018), and COSMO (Rockel et al. 2008).

Page 28: Advancing the Accessibility, Reusability, and

18

Figure 2-4: Schematic diagram of mapping gridded runoff to river networks

RAPID is developed in FORTRAN and compiled as a dynamic link library (DLL) whose

methods can be called or executed from external software. The RAPID required inputs (shown as

the gray boxes in Figure 2-4) include (1) a set of forcing files of LSM surface and subsurface

runoff; (2) a model initialization file describing the streamflow of each river at the start time of

the simulation; (3) a file describing the water inflow from surface and subsurface runoff into the

upstream point of each river reach; (4) a file documenting the river network's topological

connectivity; (5) two files defining the Muskingum routing parameters (k and x), and other basic

information such as the internal time step and duration of the simulation. All the information of

Page 29: Advancing the Accessibility, Reusability, and

19

required inputs and simulation settings are documented in a “namelist” text file for RAPID to

read at runtime.

River connectivity and Muskingum parameters can be calculated by third-party GIS

software, such as ArcGIS (proprietary), TauDEM (open source) and others. The ArcGIS

extension, Arc Hydro has a tool called “Dendritic Terrain with Unknown Stream Location” that

can delineate watersheds and generate a stream network shapefile from a flow direction file and

a flow accumulation file. These files are directly generated from a DEM file through the “Flow

direction” tool and “Flow accumulation” tool in ArcGIS. Arc Hydro has a tool called “Calculate

the Muskingum Parameters”, which can estimate the Muskingum k and x values for each stream

segment. The value of k is associated with the flow travel time through a stream and calculated

using Eq. (2-2) by multiplying the user-given factor λk by the length of the stream segment over

the flow wave velocity. The value of x for each stream segment is assigned based on the user-

given factor λx.

𝑘𝑗 = 𝜆𝑘 ×𝐿𝑗

𝑣; 𝑥𝑗 = 𝜆𝑥 × 0.1 (David et al. 2013) (2-2)

Where kj and xj are the Muskingum parameters for reach j, Lj is the length of a river

reach and v is flow wave celerity (default value in this tool is 1 km/h or 0.28 m/s). k and x are

two multiplying factors later determined by the RAPID optimization procedure (default value of

k is 0.35, default value of λx is 3). The files of river connectivity and Muskingum parameters

files only need to be generated once for a river network. Then, the weight table, described in

section 2.2.1, is generated through the ArcGIS RAPID preprocessing toolset. All the above steps

only need to be performed once for a river network and an incorporated LSM.

Page 30: Advancing the Accessibility, Reusability, and

20

Once these data have been generated, RAPIDpy is used to write the Muskingum and river

connectivity files in the required formats from the ArcGIS-processed results, create catchment

inflow files with the weight table, populate the “namelist” file, and run RAPID. When RAPIDpy

is used in a forecasting mode, after the simulation is complete, RAPIDpy generates an

initialization file from the streamflow results for the next forecast model run. This offers users

the flexibility to extract different data from the initialization file, including the result of a specific

day or seasonal averages.

2.2.3 Experimental Design

2.2.3.1 Comparison with GloFAS

The Global Flood Awareness System (GloFAS), developed by ECMWF and the Joint

Research Centre of the European Commission, is a coupled hydro-meteorological model that

generates global streamflow predictions for large-scale river basins (Alfieri et al. 2013). GloFAS

provides daily streamflow forecasts of up to 30 days by using the LISFLOOD hydrological

model forced by the surface and subsurface runoff from the ECMWF Integrated Forecast System

(IFS) meteorological forecasts. LISFLOOD is a grid-based routing model that can separately

simulate different hydrological processes that occur in large river basins (Younis and De Roo

2010). The LISFLOOD processes activated in GloFAS include the simulation of groundwater

storage, groundwater flow, and flow routing in river channels (Hirpa et al. 2018). GloFAS also

provides a long-term reanalysis dataset (1980/01 to 2017/12) with daily streamflow at the global

scale with a gridded spatial resolution of 0.1. This reanalysis dataset uses the same hydrological

model but is forced by the runoff from ERA-Interim/Land. ERA-Interim/Land is a global land

surface reanalysis dataset with parameterization-improved HTESSEL land surface model

Page 31: Advancing the Accessibility, Reusability, and

21

(Balsamo et al. 2009; E. L. Wipfler 2011) driven by meteorological forcing from the ERA-

Interim atmospheric reanalysis and observed precipitation adjustments (G. Balsamo 2015). As

shown in Figure 2-5, the HTESSEL model is implemented to simulate the water and energy

fluxes between land surface and atmosphere and estimate the surface and subsurface runoff

required for river routing (Caixin Wang 2018; Clement Albergel 2018; Dee et al. 2011). The

resolution of ERA-Interim/Land surface and subsurface runoff are both 80 km. In GloFAS, the

LISFLOOD model is set up on global coverage with horizontal grid resolution of 0.1 to better

represent the hydrological process at large river basin scale (Shaw et al. 2005). So the gridded

surface and subsurface runoff from ERA-Interim/Land is resampled from 80 km to 0.1 to be

used as input of the LISFLOOD model.

It has been demonstrated that GloFAS can skillfully detect hazardous events in large river

basins and also provide a reasonable streamflow forecast in most parts of the world (Alfieri et al.

2013; Hirpa et al. 2018). To evaluate the performance of this system, using RAPIDpy, for

global-scale streamflow prediction, I routed the surface and subsurface runoff of 35-year

(1980/01-2014/12) ERA-Interim/Land data and compared the resulting routed streamflow with

GloFAS reanalysis data (shown as Figure 2-5). The primary benefit of using this RAPIDpy-

based system with vector-based stream routing instead of relying only on GLoFAS, is that the

gridded surface and subsurface runoff is mapped to a high-resolution river network that includes

streams in much smaller basins than GloFAS. This means that flood predictions can be computed

at more precise locations. Section 2.3.1 provides the results of this experiment.

Page 32: Advancing the Accessibility, Reusability, and

22

Figure 2-5: Schematic diagram of GloFAS reanalysis and ERA- RAPID

2.2.3.2 Sensitivity of Watershed Resolution

Watershed boundaries and river networks are the basis for vector-based river routing. For

many countries and regions in the world, it is difficult and expensive to implement in-situ

measurements to collect hydrographic data, especially at a large spatial scale. For large-scale or

global hydrological modeling, the watershed and river network maps are normally delineated

from digital elevation model (DEM) files. For this method, the resolution and accuracy of the

river network entirely depends on the resolution of the DEM file. In recent years, tremendous

improvements have been made in the availability, quality, and resolution of large-scale

hydrographic datasets due to the availability of large-scale or global earth data obtained from

satellite remote sensing technologies, this includes high-resolution DEM data. The most well-

Page 33: Advancing the Accessibility, Reusability, and

23

known versions of large-scale hydrographic datasets include the US National Hydrography

Dataset Plus Version 2 (McKay 2012), the Australian Hydrological Geospatial Fabric (Atkinson

2008), the European catchments and Rivers network system (Ecrins) (Agency 2012), and the

global-scale hydrographic dataset - HydroSHEDS (Gong et al. 2011). In summary, several

hydrographic datasets exist at different resolutions allowing users to choose the appropriate

dataset based on data availability and study purposes.

To satisfy river continuum (e.g., mass balance), streamflow at each location of the river

network is influenced by upstream processes (Vannote et al. 1980). In RAPID, each reach

segment or catchment is a modeling unit. Each reach segment gets lateral inflow determined

from the catchment runoff and routes these flows to downstream segment. How these catchment

runoff flows are distributed to the stream segments depends on the algorithm used, the size of the

catchment, and the resolution of the river segments. The same outlet of a watershed can have

different numbers of upstream segments under different hydrographic data resolutions (shown as

Figure 2-11). I performed an experiment to evaluate whether the discharge at the same outlet is

affected by the number of upstream segments it has. In other words, the sensitivity of a vector-

based river routing model to the resolution of the hydrographic data. I selected several

watersheds in CONUS with the following criteria: (1) cover an area of several hundred square

miles, (2) have an outlet that corresponds with a stream gauge, and (3) a be a relatively

unregulated area (i.e., with no major reservoirs or diversions). I delineated each watershed at

low, medium, and high resolutions. Then I routed the 35-year ERA-Interim/Land reanalysis data

using RAPID for each of the watersheds defined at different resolutions. The simulated results

are compared and presented in section 2.3.2.

Page 34: Advancing the Accessibility, Reusability, and

24

Results

2.3.1 Comparison with GloFAS

Our group has implemented this system for South Asia, Africa, South America, and

CONUS. To validate the model performance, I randomly selected 100 GloFAS reporting stations

across these areas from GloFAS website (http://www.globalfloods.eu/glofas-forecasting/). Figure

2-6 shows the selected stations. The reason for using GloFAS reporting stations is that those

stations provide the upstream area used in the GloFAS routing model. One challenge I faced

comparing GloFAS reanalysis data with the ERA-RAPID simulated results is that GloFAS uses

grid-based river routing, while the ERA-RAPID system uses vector-based river routing. The

challenge is to correctly match GloFAS grid cells with the vector representation of the streams

used in the RAPID routing scheme. The method for matching the outlet points was to select,

from RAPID model, the largest stream among the streams that interact with the GloFAS grid

cell. Then I calculated the upstream area of the selected stream using watershed delineation

procedures using the same DEM data that generated the streams. I compared the upstream areas

calculated using this method with the upstream areas of these stations provided by GloFAS. I

assumed that stations with an upstream area difference less than 10% between these two models

are matched. I based this threshold by considering that the resolution of GloFAS is 0.1 and the

streams used in this experiment are delineated from 3-arc-sec HydroSHEDs DEM data. I initially

identified 100 GloFAS stations for comparison (all the points in Figure 2-6), of these, 20 points

did not meet the selection criteria (red points in Figure 2-6). Based on above criteria, I selected

and evaluated 80 stations (shown as grey points in Figure 2-6). For each station, I calculated 35-

year cumulative volume per unit area (CVPUA) from GloFAS reanalysis data and ERA-RAPID

simulation results. I compared the 35-year CVPUA of 80 stations from the two models, shown in

Page 35: Advancing the Accessibility, Reusability, and

25

Figure 2-7. The CVPUA values of two models are highly correlated with a high R2 of 0.9818.

This statistic provides additional evidence that the GloFAS and RAPID locations are good

matches at these 80 stations.

Figure 2-6: A map of the stations where RAPID routing results were compared with GloFAS. Initially 100 stations were selected (all points) however, 20 stations (grey points) were not used based on contributing area differences between the grid-based and vector-based stream node. Eighty (80) stations were evaluated (red points)

To compare the two streamflow simulations, I calculated two statistical skill scores,

including the Kling-Gupta efficiency (KGE2012) and the Pearson correlation coefficient (R), for

each station both for the GloFAS reanalysis and the ERA-RAPID. These two metrics are

computed using the Hydrostats library of the HydroErr package (Jackson et al. 2019; Roberts et

al. 2018), a python package that provides statistical analysis for hydrological prediction models.

Page 36: Advancing the Accessibility, Reusability, and

26

Figure 2-7: Comparison of GloFAS reanalysis data with ERA-RAPID results on 35-year cumulative volume per unit area (CVPUA).

Figure 2-8: The KGE distribution of comparing ERA-RAPID with GloFAS

Page 37: Advancing the Accessibility, Reusability, and

27

Figure 2-9: The R distribution of comparing ERA-RAPID with GloFAS

For these results, the median of KGE is 0.73 and the median of R is 0.92. The two

simulations are highly correlated for 89% of the stations (with R > 0.7) and well-fitted for 70%

of the stations (with KGE > 0.5). Figure 2-8 and Figure 2-9 respectively show the distribution of

these two skill scores in the research regions. The simulations show consistency between the two

methods in South Asia, most of South America and Africa, and the west coast and east coast of

CONUS. There are a number of stations in the middle part of CONUS with negative KGE and

low R values. I found these stations all have relatively low CVPUA values, therefore the poor fit

in these stations might because small volume of water that was routed with different models. It

also might be due to the fact that the comparison node for the grid-based system is at the center

of the cell, and the node for the vector-based system is on the stream line, even though I used

Page 38: Advancing the Accessibility, Reusability, and

28

contributing area comparisons to limit these discrepancies, at low flows differences in gauge

locations could create higher variance.

The distributions indicate that ERA-RAPID and GloFAS results are generally consistent;

however, the degree of consistency varies across regions. Figure 2-10 shows the hydrographs

that compare ERA-RAPID with GloFAS at two stations with different level-of-skill scores. The

first station, Dipayal station in Nepal, shows very good match between two simulations. The

second station, Rocky Reach Dam station in the United States, shows a significant difference

between the two datasets. The vector-based ERA-RAPID outputs a lower peak flow than grid-

based GloFAS. The disparity might due to the difference between these two routing models,

GloFAS considers all processes in the grid-based routing, including overland surface routing,

subsurface storage and routing, while RAPID simplifies the explicit representation of these

processes and only focuses on the reach-scale streamflow response. The disparity could also be

because in variations in the location of the comparison points change the reported flow. Overall,

ERA-RAPID results are comparable to GloFAS results, in other words, the higher-resolution

vector-based routing model generated streamflow estimates that are generally consistent with the

results from the GloFAS provides using a gridded-data routing model. I can assume with some

confidence that this high-resolution stream segment forecasts are at least as reliable as the well-

established low-resolution GloFAS results.

Page 39: Advancing the Accessibility, Reusability, and

29

Figure 2-10: Comparison between ERA-RAPID and GloFAS at Dipayal station, Nepal (up) and Rocky Reach Dam station, United States (down). Red lines represent the results of routing ERA-Interim/Land data with RAPID, while black lines represent GloFAS reanalysis results. The left figures show the comparison of 35 years’ streamflow, the right figures show the comparison of daily average streamflow.

2.3.2 Sensitivity of Watershed Resolution

For sensitivity studies, I selected five watersheds across the CONUS, including Meramec

River near Sullivan, MO; East Branch Delaware River at Margaretville, NY; Alsea River near

Tidewater, OR; White River near Fort Apache, AZ; and North Fork Clearwater River near

Canyon Ranger Station, ID (see Figure 2-11). These watersheds are summarized in Table 1.

These watersheds were selected because of the availability of ground-truth flow data; the US

Geological Survey (USGS) provides sufficient historical streamflow observations at these

Page 40: Advancing the Accessibility, Reusability, and

30

locations (more than 35 years) that we can use to evaluate whether the different stream densities,

resulting from different DEM resolutions, affects the simulation performance. All the selected

watersheds have a USGS stream gauge located at an outlet. For comparison, each watershed was

delineated into 3, 7, and around 20 catchment sub-basins for the low, medium, and high

watershed resolutions, respectively (shown as Figure 2-11, detailed information of the sub-basins

sees Table 2-1). For each watershed, I ran the 35-year ERA-Interim/Land reanalysis data (1980-

2014, 80 km resolution, see Section 2.2.3.1 for more dataset information) as forcing at each

watershed resolution and the generated flowrates at the basin mouth.

Figure 2-11: Experimental watersheds in low, medium and high resolutions

Page 41: Advancing the Accessibility, Reusability, and

31

Table 2-1: Catchment Basin Properties

Basin Area [sq. km]

USGS Stream Gauge ID (outlet)

Average Sub-Basin Area [sq. km]

Low

(3 sub-basins)

Medium

(7 sub-basins)

High

(20 sub-basins)

MO 3848 07014500 1283 550 183

NY 422 01413500 141 60 25

OR 857 14306500 286 122 43

AZ 1632 09494000 544 233 71

ID 3356 13340600 1119 479 177

Average 2023 --- 674 289 100

I compared the results for the entire watershed, based on sub-basins delineated at

different resolutions, with each other. Table 2-2 shows the statistical comparison of the 35-year,

model-simulated streamflow at the same watershed outlet generated with different stream

densities. I adopted the following statistical metrics to comprehensively and quantitatively assess

the sensitivity of watershed resolution: KGE, R, and mean absolute average (MAE). Table 2-2

shows that all the comparisons resulted in a high R (average of 0.9849) and KGE (average of

0.9616), and a low MAE (average of 1.605 m3). This small variation among the results from the

different resolution sub-basins demonstrates that the model generates similar streamflow

forecasts using different watershed resolutions. I found that the comparisons between medium

resolution and high-resolution watersheds resulted in a higher R and KGE and lower MAE than

the comparisons of low resolution and medium resolution. This suggests that the simulated

streamflow at the basin mouth changes less as the watershed resolution increases. Essentially, the

watershed resolution has a negligible effect on the simulated streamflow. Since slight differences

exist in the results simulated from different watershed resolutions, we can conclude that the

Page 42: Advancing the Accessibility, Reusability, and

32

prediction performance of the model, while negligible, is slightly affected by the watershed

resolution.

Table 2-2: The Statistical Results of Comparing Streamflow Simulations at Different

Resolutions

Comparison R MAE[m3] KGE

MO: Low vs. Med Res 0.9375 7.3748 0.8985

MO: Med vs. High Res 0.9979 1.2713 0.9948

MO: Low vs. High Res 0.9232 8.0400 0.8928

NY: Low vs. Med Res 0.9831 0.7209 0.8954

NY: Med vs. High Res 0.9999 0.0355 0.9963

NY: Low vs. High Res 0.9813 0.7556 0.8911

OR: Low vs. Med Res 0.9939 0.9949 0.9841

OR: Med vs. High Res 0.9987 0.4330 0.9985

OR: Low vs. High Res 0.9872 1.4190 0.9799

AZ: Low vs. Med Res 0.9943 0.1244 0.9789

AZ: Med vs. High Res 0.9976 0.0844 0.9744

AZ: Low vs. High Res 0.9849 0.2014 0.9522

ID: Low vs. Med Res 0.9982 0.8860 0.9965

ID: Med vs. High Res 0.9996 0.4632 0.9949

ID: Low vs. High Res 0.9964 1.2706 0.9959

Average: Low vs Med 0.9814 2.0202 0.9507

Average: Med vs High 0.9988 0.4575 0.9918

Average: Low vs High 0.9746 2.3373 0.9424

Total Average 0.9849 1.6050 0.9616

Discussion and Conclusions

The motivation of this study was to create and test an enhanced method and develop a

system to route coarse gridded runoff generated from global or national LSMs onto a high-

Page 43: Advancing the Accessibility, Reusability, and

33

resolution vector-based river network and provide flood prediction at local scales. There are

many regions in the world that are hydrologic-data poor and have little or no observed data or

working models to provide the backbone of a regional or national streamflow forecasting system

or FEWS. Leveraging global models can be an efficient way to provide baseline hydrologic

information and supplement whatever other resources are available, but there remains the

question of whether or not the information is useful enough at local scales to be able to make

informed decisions. Managers need flood predictions at specific locations on local stream

networks. I demonstrated the ability to distribute grid-based runoff data for routing on high-

resolution vector-based stream networks that meet this need. This represents a significant step

forward because international development organizations like the World Bank can reinvest

dollars focused on leveraging such hydrologic information into applications and have more

informed decisions by using these tools.

I presented and tested a system, including an innovative downscaling method for

mapping large scale LSM grid cells to high-resolution stream networks, by leveraging the

RAPID vector-based river routing model. I compared routing 35-year ERA-Interim/Land

reanalysis data in RAPID and the results from GloFAS reanalysis data and demonstrated that this

RAPID-based system has comparable performance with GloFAS. However, using results from

this vector-based stream network, we can provide streamflow predictions at much higher

resolutions. This is done with much less computational cost than creating high-resolution gridded

data, as it inherits all the benefits of vector-based routing models. By comparing routing results

using the same forcing data with different river network resolutions of the same watershed, I

found that the river network resolution has a negligible effect on the simulated streamflow using

this system. This means that using this system, including the downscaling method and using

Page 44: Advancing the Accessibility, Reusability, and

34

RAPID model for routing, can produce consistent results regardless of the resolution of

hydrographic datasets the user chooses. This is in major contrast to the performance of grid-

based routing models which is significantly affected by the grid resolution. The system is

developed with open source tools and is available for others to use and extend. Our group have

implemented this system in a “Streamflow Prediction Tool” web app (detailed information sees

Software Availability) that aims to provide global streamflow prediction by mapping gridded

ECMWF runoff forecasts to the global vector-based high-resolution river network. We have

made available the implementation for the South Asia, Africa, South America, and CONUS

regions. We are working to expand this application it to cover the globe.

The primary benefit of this work is to streamline the process of mapping and routing

gridded runoff into high-resolution river network and provide the possibility to simultaneously

produce streamflow forecasts from current existing LSM prediction models. It can provide flood

management support to data-scarce regions by enabling the streamflow estimates at specific

locations using various global or national forecasting systems to generate runoff or forcing data.

This work presents a new method for distributing grid-based runoff to vector stream networks

and serves as a method and guidance for coupling LSMs with other vector-based routing models.

I demonstrated that this mapping process can efficiently convert gridded runoff data to intuitive

and useful river discharge at local scales, provide an easy way to access various global or

national runoff forecasts simultaneously and make informed decisions. I expect that this work

and outcomes can motivate contribution on streamlining different disciplinary models to

maximize current outcomes and facilitate environmental modeling research.

The next steps of this work include introducing approaches to optimize RAPID

parameters, such as flow travel time estimation approach to calculate Muskingum parameter k;

Page 45: Advancing the Accessibility, Reusability, and

35

incorporating data assimilation methods to improve initial states by leveraging observations or

some large-scale LSM modeling systems like GLDAS; incorporating web processing services to

facilitate process interoperability and data processing (Qiao et al. 2019a).

Page 46: Advancing the Accessibility, Reusability, and

36

3 A CONTAINER-BASED APPROACH FOR SHARING COMPLEX

ENVIRONMENTAL MODELS AS WEB SERVICES

Introduction

The complexity of environmental modeling has expanded in parallel with the

technological advances in earth observations and computational resources. Presently, there is a

focus on improving models to represent more details and integrating models from different

disciplines, including atmospheric, hydrologic, ecologic, geomorphic and human processes

(Bandaragoda et al. 2019; Chen et al. 2019; Gao et al. 2019). The increasing model complexity

raises entry barriers. It is challenging for users to master and execute a new model in a short

amount of time, especially when working with models in unfamiliar domains. The model users

may need to expend considerable time acquiring enough knowledge to execute the model in a

correct way. Some models are even difficult to set up as they require many dependencies in

specific versions. Improving the reusability and interoperability of environmental models is

becoming more important and gaining momentum due to increased desire to better understand

and investigate environmental systems (Belete et al. 2017; Voinov and Cerco 2010). Reusability

refers to the ability of a model to remain functional to be used to develop new models or

modeling systems. Interoperability is the ability to work with other models or systems without

special effort by the user.

Page 47: Advancing the Accessibility, Reusability, and

37

To improve the reusability and interoperability of complex models, many studies have

focused on lowering the entry bar of model execution. The primary methods include developing

web-based modeling approaches with simple graphical user interface (Luo et al. 2004; Rajib et

al. 2016), linking heterogenetic models through Application Programmer Interfaces (API)

(Gregersen et al. 2007; Peckham et al. 2013), and publishing models as accessible web services

(Jiang et al. 2017; Xiao et al. 2019; Zhang et al. 2019). All those methods can save considerable

time in organizing hardware and installing software. Compared with the first two methods, the

loosely-coupled, service-oriented approach has a prominent advantage that it enables models

written in different programming languages or running on different platforms communicate over

the Internet through web services (Castronova et al. 2013; Michaelis and Ames 2009). Many

software tools have been developed to assist publishing environmental models or analysis

workflows as web services, such as GeoServer (2013), Deegree (Müller 2007), PyWPS (Becchi

2007), 52°North (2018), Esri ArcGIS, etc. Qiao et al. (2019a) developed a Tethys WPS Server to

publish web app functionality as OGC Web Processing Service to improve the reusability and

interoperability of Python-based functions or models. However, these tools are more designed

for geospatial analysis in specific programming languages. It remains challenging to publish

environmental models in various formats, execution modes, programming languages, and

supported operating systems as web services without extra efforts. Zhang et al. (2019) have

developed a service-oriented wrapper system called OpenGMS Wrapper System (OGMS-WS)

that shares complex models over the Internet by deploying them as web services via a standard

encapsulated method. OGMS-WS also provides a user interface that allows users to search and

run the services, monitor and control the executions (detailed in Section 4.2.1.1). Xiao et al.

Page 48: Advancing the Accessibility, Reusability, and

38

(2019) have demonstrated its flexibility by deploying the Storm Water Management Model

(SWMM) as a web service-oriented system through OGMS-WS.

When publishing a model as a web service through OGMS-WS, the first step is to install

the model on the server where OGMS-WS is deployed. However, deploying models in a new

environment can be a daunting process because some models lack reproducibility and portability.

First, some models are difficult to set up due to required complex infrastructures (network,

storage and computing), and some models are even impossible to install due to the changing

computational environment and dependencies. Third-party resources and dependencies are not

static, and they are updated regularly to fix bugs, add new features, and remove old features. Any

of these changes can make the model unable to execute. Collberg et al. (2014) found that less

than 50% of software could even be successfully installed. Zhao et al. (2012a) found that the

ability and success of re-executing scientific workflows significantly reduces year by year

because of changing dependencies and missing third-party resources. Second, as the processing

power and capacity of servers increases, the need of consolidating various models onto a single

server is becoming more important. It requires the models work well across different server

environments. However, some models have conflicts when installed to the same platform, such

as they both require the same dependency but in different versions (third-party libraries,

programming languages, etc.). Model users have to either fix the conflicts (if the model source

code is available and under control) or implement one model on another platform. Model

isolation management, which refers to isolating each model and preventing the conflicts between

them on the same platform, is not only essential for service publishing tools like OGMS-WS that

require multiple models installed in the same server, but also significant for improving model

reproducibility in general. Many technologic methods have been developed for model isolation

Page 49: Advancing the Accessibility, Reusability, and

39

or process isolation. For running applications that require different operating systems on the

same server, virtual machines (VMs) can capture the operating system and everything running on

it, provide functionality of a physical computer, and be transferred and shared as files. For

running isolated applications with conflicts in the same operating system, containers (e.g.

Docker) can encapsulate an application and all the related dependencies in a virtual container

that can run as a lightweight, standalone, and executable software. Compared to VMs, containers

are more lightweight, portable, and high-performing because they don’t contain an operating

system but share the operating system kernel with the host machine (Boettiger 2014).

In this chapter, I propose an approach for sharing and reusing complex models over the

Internet as web services by leveraging OGMS-WS and Docker. Docker is adopted as the model

isolation tool to encapsulate the model and keep the model reusable and portable. Then the

containerized model is published as accessible web services through OGMS-WS. I choose Linux

operating system for implementation because it is the leading and most commonly used

operating system on servers, but the concept of using containers to publish complex models as

web services can be applied to other operating systems as well. The remainder of this chapter is

organized as follows. First, the system design of this approach is shown in Section 3.2 with the

background introduction on OGMS-WS and Docker. This section also shows the design of an

experimental study on publishing a streamflow prediction system as a web service. In Section

3.3, the system implementation of this approach is presented together with the experimental

study results. Finally, a Discussion and Conclusion section summarizes the benefits of this

approach and outlines opportunities for future research.

Page 50: Advancing the Accessibility, Reusability, and

40

Methods

3.2.1 Approach Design

3.2.1.1 OpenGMS Wrapper System (OGMS-WS)

OpenGMS is a platform for sharing geographic and environmental data and models

among multi-disciplinary users to solve complex geographic problems and conduct integrated

simulation. OGMS-WS is a tool in OpenGMS platform for publishing and sharing models over

the Internet as web services. It provides a standard encapsulated method to deploy models as

RESTful web services and a friendly user interface that allows users to interact with model

services, including search, execute, monitor and control.

To deploy a model as a web service through OGMS-WS (shown as Figure 3-1), a Model

Description Language (MDL) file is required to standardize the model communication with the

wrapper system. The MDL file (shown in Table 3-1) describes the model following an

encapsulation standard, which summarizes heterogeneous models in three interfaces: model

description, model execution, and model deployment. The MDL file provides all the essential

information to encapsulate and deploy the model. In addition, it is used to map the model to a

graphical user interface (GUI) that allows users to manually interact with the model service and a

RESTful API that can be used by third-party applications or clients. A detailed instruction on

documenting the MDL file can be found in Xiao et al. (2019). Then, a model encapsulation file is

required to encapsulate the model. The model encapsulation file defines the interaction between

users and the model service, including accepting requests from users and extracting model

inputs, invoking and executing the model, and returning model outputs as responses to users

when completed.

Page 51: Advancing the Accessibility, Reusability, and

41

Table 3-1 Content of the Model Description Language (MDL) File

Interface Node Information

Model description AttributeSet name, version, abstract, keywords, category

Model execution Behavior inputs, outputs, data formats

Model deployment Runtime entry function, hardware configuration, software dependencies

Figure 3-1: OGMS-WS workflow diagram

3.2.1.2 Docker

Docker is an open source project designed to develop, deploy, and run applications by

using isolated containers (Merkel 2014). Containers allow users to bundle an application with all

of the parts it needs as one package. Containers are more lightweight than virtual machines

because they share the operating system kernel with the host machine. A typical desktop

computer could run no more than a few virtual machines at once but would have no trouble

running 100 Docker containers (Boettiger 2014). Docker was initially designed to package

Page 52: Advancing the Accessibility, Reusability, and

42

applications in Linux systems. Docker images (a container is an instance of an image) share the

Linux kernel of the host machine, which means that Docker images must be based on Linux

system with Linux-compatible software. But Docker Linux containers can be installed and run

locally not only on Linux, but also on platforms that are not based on the Linux Kernel (macOS

and Windows), this is accomplished through the use of a small VirtualBox-based VM running on

the host OS. In recent years, Docker released Windows containers for packaging Windows

applications. Docker Windows containers share the Windows kernel with the host machine and

currently only support deploying in Windows 10 system.

There are many advantages of using Docker in environmental modeling. First, Docker

makes models reusable by providing isolated images in which all the software and dependencies

are already installed, configured and tested. Second, Docker supports most major platforms

(Linux, Windows and macOS). With Docker, models can be deployed easily across platforms. In

this way, model developers can focus on the model design and coding without worrying the

platform on which the model will ultimately run, while users can retain their familiar platform

without considering the compatibility between the model and system. Third, a Docker image is

created through reading a Dockerfile, which is a simple script that defines all the commands,

necessary dependencies with detailed versions, and the OS to assemble the image. The

Dockerfile is a small plain text file that can be easily shared. Docker also provides a public

repository (Docker Hub, https://hub.docker.com/) for publishing and sharing docker images. All

of these Docker capabilities significantly improve model sharing and versioning. Last but not

least, Docker allows users to link any directory on the host OS to the running Docker container.

This allows users to directly use data saved on the host OS and rely on familiar tools and

environments for data collecting, preparing and editing, while still executing code inside the

Page 53: Advancing the Accessibility, Reusability, and

43

controlled development environment of the container. This avoids data transferring across

different platforms.

3.2.1.3 OGMS-WS-Docker Approach

Figure 3-2 shows the design of the OGMS-WS-Docker approach. On the OGMS-WS

server, each model is a deployed as a Docker container in which the model is installed,

configured and tested.

Figure 3-2: The OGMS-WS-Docker approach

The metadata of the model is defined in the MDL file with setting the encapsulation file

as the entry point. The encapsulation file first extract inputs from users’ requests and save them

on the server. The bind mounts method is used to transfer input files and output files between the

server and the container. Then the model can be executed through simple Docker command line

Page 54: Advancing the Accessibility, Reusability, and

44

“docker run”. This approach ensures each model be isolated from one another. The models can

be deployed on all the operating systems that OGMS-WS supports (Windows, Linux, and

macOS) and remain executable no matter how much the sever environment changes.

3.2.2 Experimental Design

This section presents the experimental design of a case study that publishes a streamflow

prediction system as a web service through the OGMS-WS-Docker approach. The aim of this

case study is to demonstrate that this approach can lower the barrier to publishing complex

modeling systems as web services.

3.2.2.1 Streamflow Prediction System

We developed an automated streamflow prediction system to map the runoff generated

from large-scale grid-based LSMs onto high-resolution vector-based stream networks, then route

the results using a vector-based river routing model to provide streamflow forecasts at the river

level (Qiao et al. 2019b). The main benefit of this streamflow prediction system is providing

flood management support to data-scarce regions by enabling the streamflow estimates at

specific locations using global climate forecasts. It has been demonstrated that this system has

comparable accuracy with the well-established GloFAS but provides streamflow predictions on

significantly higher resolution stream networks with much less computational cost and as a data

service. Figure 3-3 shows the workflow of the system. It adopts the RAPID model for river

routing to simulate the water flow through the river network (David et al. 2011b). RAPID is

developed in FORTRAN and compiled as a dynamic link library (DLL). RAPIDpy, an open

source Python-based software tool, is included for preparing inputs in the required formats and

running RAPID (Snow et al. 2017). This system currently supports multiple large-scale LSMs,

Page 55: Advancing the Accessibility, Reusability, and

45

including ERA-Interim (Dee et al. 2011), NLDAS (Xia et al. 2012), GLDAS (Lorenz et al.

2015), ERA5 (Karl Hennermann 2018), HIWAT (Gatlin et al. 2018), and COSMO (Rockel et al.

2008).

Figure 3-3: Workflow of the Streamflow Prediction System (grey boxes are inputs)

3.2.2.2 Streamflow Prediction Service Design

I met some difficulties when implementing the streamflow prediction system directly

through OGMS-WS. First, the RAPID model only supports the Ubuntu 14.04 system and

Page 56: Advancing the Accessibility, Reusability, and

46

requires a number of dependencies in specific versions to be compiled. Any difference would

result in failure to compile the model. It is a daunting process to deploy it in a new platform.

Second, RAPIDpy is recommended to be installed through Conda instead of manually through

the source code because it requires a lot of dependencies in specific versions. Conda is a package

manager that allows users to create an isolated environment for a package with its necessary

dependencies in specific versions. However, when I install RAPIDpy through Conda and run it

through the wrapper system. It can’t work because the wrapper system uses the Python

environment on the host machine by default, while RAPIDpy requires to use the Python

environment in the Conda environment. In this chapter, the streamflow prediction system is

published as a web service through the OGMS-WS-Docker approach. In this way, users can

directly use the service without having to set up the streamflow prediction system locally.

For each service, OGMS-WS creates a GUI that consists of a brief description of the

service and the interactions between users and the service, including interfaces to upload inputs,

buttons to run and cancel service, buttons to download and visualize outputs, and execution

monitoring. Figure 3-4 shows the interactions between user, service GUI, OGMS-WS, and

Docker container throughout the service execution process. First, the user is asked to provide all

the required input data. The service becomes executable when all the required input files are

provided. After clicking on the “Run service” button, a service execute request with all the input

data is sent to OGMS-WS. All the services in OGMS-MS support asynchronous execution by

default, so the service GUI receives a response immediately after submitting an execute request.

The first response confirms that the request is received and accepted by the server, a processing

job has been created and will be run. After verifying all the required input data, the processing

job begins. A Docker container is created based on the Docker image of the streamflow

Page 57: Advancing the Accessibility, Reusability, and

47

predication system. The system in the container is launched and executed with the input data

saved on the server. The service GUI receives the execution response by repeatedly checking the

execution status until the processing job has completed. After that, the execution status on the

service GUI changes to “succeed” and the user can download the result by clicking on the

activated output download button.

Figure 3-4: Interactions between user, service interface, OGMS-WS, and the model container

Based on above process, the first step to implement the streamflow prediction service in

OGMS-WS is to dockerize the streamflow prediction system. Next, I need to design the service,

Page 58: Advancing the Accessibility, Reusability, and

48

including inputs, outputs, and the encapsulation function, and encode the MDL file and the

encapsulation file accordingly. In the end, I need to package all the required files (see Figure 3-5)

and deploy it through OGMS-WS. The design of service variables and the encapsulation function

are described as follows.

(1) Service variables design

The entry point function of the streamflow prediction system has three required

parameters: (1) full path of the LSM surface and subsurface runoff forcing files (in NetCDF

format); (2) full path of the RAPID model input files, including a file documenting the river

network's topological connectivity, two files defining the Muskingum routing parameters (k and

x), a weight table describes the area ratio of each LSM grid cell to the intersected catchments that

are delineated upon the upstream point of each river reach and a model initialization file

describing the streamflow of each river at the start time of the simulation; (3) full path where to

save the simulation result. Consequently, I set three input variables and one output for the service

as Table 3-2.

Table 3-2: Variable Design of the Streamflow Prediction Service

Service variable Data Format

Input 1: lsm_data LSM Runoff NetCDF files Zip file

Input 2: rapid_io_files RAPID input files

River connectivity file

Weight table

Muskingum k file

Muskingum x file

Initialization file (optional)

Zip file

Input 3: python_file A Python script to launch the system Python file

Output: streamflow Simulated streamflow result NetCDF

Page 59: Advancing the Accessibility, Reusability, and

49

(2) Encapsulation function design

The functionalities of the encapsulation file include extracting model inputs from a

request, executing the model, and returning outputs as a response when the simulation is

completed. OGMS-WS provides an encapsulation template file that contains scripts for the

interactions between users and the service. Service developers are responsible for encoding the

model execution steps. For the streamflow prediction service, it includes the following steps to

execute the model: 1) build a Docker container from the Docker image of the streamflow

prediction system, 2) mount the folder with user input files to the Docker container, and 3) run

the model. According to Docker Command Line Interfaces, a command line like the following

can mount a folder to a Docker container and execute the model file in it.

docker run --name [container name] -w [path to the model file in container] -

v [source folder]:[target folder in container] [Docker image] [command to run

the model file]

On the OGMS-WS server, each service instance creates a separate folder. In each service

instance folder, each input variable creates a separate folder that holds the corresponding

uploaded input files. In this way, the service instance folder can be mounted into the Docker

container and the model can be executed by running the mounted file. The command line is:

docker run --name rapidpy -w /home/python_file -v ~/mainProcess:/home

xhqiao89/rapidpy:1.0 python run_rapid.py

In the end, I packaged all the required files following a standard structure shown in

Figure 3-5 and deployed it as a web service through the OGMS-WS user interface.

Page 60: Advancing the Accessibility, Reusability, and

50

Figure 3-5: The streamflow prediction service deployment package

3.2.2.3 Service Application in Bangladesh Historical Streamflow Analysis

This section aims to demonstrate the benefits of publishing complex models as web

services by using the streamflow prediction service to solve a hydrological research problem.

Bangladesh is regularly devastated by flooding due to its low sea level and large rivers. Several

catastrophic floods happened during the past 30 years. Over 75% of the total area of the country

was flooded in the 1998’s flood. However, it is difficult to build an effective flood early warning

system to manage floods for Bangladesh because lacking hydrologic data, technology and human

resources. Long term coherent hydrologic data is essential for building an effective flood early

warning system, such as calculating flood stages and return periods for different flooding levels,

investigating flooding process, model calibration, and assisting planning and decision making for

future extreme floods (Lee et al. 2017). For data-scarce countries like Bangladesh, that lack

sufficient equipment, technology, and human resources to maintain a stable hydrologic data

Page 61: Advancing the Accessibility, Reusability, and

51

collection and management system. In this situation, leveraging satellite data and global models

can be an efficient way to obtain baseline hydrologic information and supplement whatever other

resources are available.

ERA-Interim/Land is a global land surface reanalysis dataset with parameterization-

improved European Center for Medium-range Weather Forecasts (ECMWF) model driven by

meteorological forcing from the ERA-Interim atmospheric reanalysis and observed precipitation

adjustments (Balsamo et al. 2009; E. L. Wipfler 2011; G. Balsamo 2015). ERA-Interim/Land

provides integrated and coherent daily land surface estimates from 1980/01/01 to 2014/12/31. It

has been widely used in hydrological research to provide hydrologic baseline information for

data-scarce regions and used as the initialization of weather prediction models (Clement Albergel

2018; Dee et al. 2011; G. Balsamo 2015). However, the low resolution (80 km) of the ERA-

Interim/Land dataset makes it difficult to provide usable streamflow information that people

would expect to obtain at specific locations on local stream networks instead of the total runoff

volume per grid cell (6400 km2).

In this chapter, I used the streamflow prediction service to convert the low-resolution

ERA Interim/Land runoff reanalysis data into streamflow at local rivers to provide historical

streamflow estimates for Bangladesh. I ran the service with 20 years (1995-2014) ERA-Interim

runoff reanalysis data to investigate the streamflow levels and changes in the main streams of

Bangladesh during this period.

Page 62: Advancing the Accessibility, Reusability, and

52

Results

3.3.1 Streamflow Prediction Service

I have successfully deployed the streamflow prediction model in OGMS-WS and

published it as a web service

(http://cosmo.byu.edu:8060/modelser/preparation/5deef20a230f6b6e9bbae443). The service

requires three inputs: a zip file of the LSM runoff NetCDF files, a zip file of the RAPID input

files, and a “run_rapid.py” Python file to run the model. It returns a NetCDF file with the

simulated streamflow in each river segment. The service can be consumed through either the

service GUI or the service API.

OGMS-WS (see Figure 3-6a) provides multiple modules for different functions. Through

the home page or the left navigation bar, users can check the available local services hosted in

the current OGMS-WS. Users can deploy their own models through the “Deployment” module.

After invoking a service, users can check its execution status in the “Instances” module. When

the service execution is completed, users can check the service configuration, log info, and

results of historical services executions in the “Records” module. Users can check the historical

uploaded data in the “Data cache” module. OGMS-WS also provides other user services like

notifications, system information, linking remote services, etc. After the streamflow prediction

system was deployed in OGMS-WS, the service is listed in the local services list showing its

name, version, type, status, permission, accessibility and allowed operation options (as Figure 3-

6b). Users can click on the “invoking” button to enter the service GUI (as Figure 3-7). The

service GUI provides a brief introduction on the service and all the interactions required to

execute the service, including buttons and popup windows to upload input data files, run and

Page 63: Advancing the Accessibility, Reusability, and

53

cancel the service (as Figure 3-7b). The GUI also shows the service execution status. After the

execution is successfully completed, users can directly download the output through the

download button (as Figure 3-7c).

(a)

(b)

Figure 3-6: Continued OGMS-WS GUIs

Page 64: Advancing the Accessibility, Reusability, and

54

(a)

(b)

(c)

Figure 3-7: Streamflow prediction service GUIs

Page 65: Advancing the Accessibility, Reusability, and

55

Another method to consume the service is through API. The services hosted in OGMS-

WS are RESTful web services. They support all the CRUD (create, retrieve, update, delete)

operations and can be invoked by the HTTP methods (HTTP GET, POST, DELETE, and PUT)

and URL-based requests that RESTful APIs support. The HTTP GET method with a URL

composed of a series of request parameters and input data is commonly used to retrieve

information, send simple requests or requests with simple inputs. To exeute time-consuming

services or services with complex inputs, it is recommanded to use the service client Software

Development Kit (SDK) developed by the OpenGMS group

(https://gitee.com/geomodeling/NJModelServiceSDK).

3.3.2 Bangladesh Historical Streamflow Analysis

I ran the streamflow predictive service for Bangladesh with the ERA-Interim/Land runoff

reanalysis data from 1995/01/01 to 2014/12/31. The OpenGMS service client SDK was used to

interact with the service, including sending requests with input files to the service and receiving

responses with outputs from the service. Through the service, I obtained 20 years daily simulated

streamflow in the high-resolution river network of Bangladesh (shown as the blue lines in Figure

3-8). I selected 4 reach segments (shown as the red circles in Figure 3-8) located in four major

rivers to investigate the streamflow level changes from 1995 to 2014. Reach A locates at the

Ganges river and reach B belongs to the Jamuna river. The Ganges river ang Jamuna river join

together to form the Padma river, where reach C locates. Reach D is the last segment of the

Meghna river before joining with the Padma river. The Meghna river is the longest river in

Bangladesh.

Page 66: Advancing the Accessibility, Reusability, and

56

Figure 3-8: High resolution river network in Bangladesh

Table 3-3: Selected Reach Segments

Reach COMID Located river

A 281577 Ganges

B 273300 Jamuna

C 288491 Padma

D 292364 Meghna

Figure 3-9 shows the boxplots of their monthly average streamflow in these 20 years.

From these results, I found that the rivers in Bangladesh have clear flood and dry seasons every

year. The streamflow varies a lot through the year and the differences between flood season and

Page 67: Advancing the Accessibility, Reusability, and

57

day season can be enoumrous. The river flood season is mainly from June to October. The

streamflow increases significantly and remians high from July to September.

Figure 3-9: Streamflow monthly distribution from 1995 to 2014

Figure 3-10 shows the simulated streamflow of the selected four reach segments from

1995 to 2014. From them, I found that there are a lot of flash floods happened during the flood

season with the streamflow increased rapidly from a couple of hundreds to tens of tousands cubic

meters per second in a few days and then quickly fell down. Flash flood is the most destructive

disaster in Bangladesh as it happens with little or no advance warning for preparation and

evacuation, and it also carries more and coarser sediment than normal floods. In Figure 3-10, I

found significant high streamflow around 1995, 1998, 2004 and 2005, which is consistant with

Page 68: Advancing the Accessibility, Reusability, and

58

the catastrophic flood records of Bangladesh (Wikipedia 2019). Above all, global datasets are

certainly not as accurate as observations. But for hydrologic data poor regions like Bangladesh,

they can quickly convert global datasets into streamflow at rivers by using the service and obtain

some baseline hydrologic information, hence assisting flood management and research.

Figure 3-10: Simulated streamflow of selected reach segments

Page 69: Advancing the Accessibility, Reusability, and

59

Discussion & Conclusion

The motivation of this section is to create an approach to improve publishing complex

environmental models as web services, thereby facilitating model sharing and reuse. Complex

environmental models always have high entry barriers and require complex infrastructure and

dependencies. A lot of research focuses on developing tools for publishing web services,

however; challenges still exist when publishing complex models as web services, such as the

high complexity and low reproducibility of the models, platform compatibility, dependencies

conflicts, and so on. Some models have conflicts with each other when deployed on the same

platform, while some models can’t even be installed or executed due to the changing

environment and dependencies. I designed and presented an approach to publish complex

environmental models as web services by leveraging OGMS-WS and Docker. OGMS-WS was

chosen as the service publishing tool because it is lightweight, low entry bar, and the success in

working with complex geospatial models. But OGMS-WS lacks isolation management of

deployed models, which makes it difficult or impossible to deploy incompatible models on the

same platform. Docker was included as the model isolation tool to resolve conflicts and keep

each model reusable and portable. I published a complex streamflow prediction system as a web

service using this approach and used the service to generate 20 years (1995-2014) high-

resolution streamflow data in Bangladesh from the ERA-interim/Land global land surface

reanalysis dataset to analyze the historical flood trends in Bangladesh.

Based on the experimental case study and our experience, the key outcomes of this

approach for environmental modeling include:

1) Further lowering the barrier to publishing complex environmental models or modeling

systems as long-term functional web services. It has been demonstrated that OGMS-WS can

Page 70: Advancing the Accessibility, Reusability, and

60

significantly lower the barrier to publishing complex environmental models as web services

by providing task management, asynchronous execution, data storage, etc. However, there

still exist two main challenges to implementing web services through OGMS-WS. First,

some models may be difficult or even impossible to be installed and executed in a new and

different environment. Second, it is still a daunting process for users to prepare the model

deployment package for OGMS-WS, including writing an encapsulation function to invoke

the model and packaging all required resources in the correct way. This approach allows

users to set up models in Docker containers with their familiar environments. The

encapsulation function is also unified to Docker command lines instead of scripts to execute

the model, which is much simpler and more uniform.

2) Avoid deployment conflicts. This approach keeps each model isolated to avoid the potential

issues such as dependency or environment conflicts with OGMS-WS or other existing

models. For example, different versions of the same model can be deployed on the same

computer platform as separate services.

3) Cross-platform. I deployed this approach in the Linux system as an example to demonstrate

its utility, it can also be used in Windows and macOS because both OGMS-WS and Docker

are cross-platform software.

4) Facilitate model sharing and reuse. Once a model is published as web service, it can be

directly requested and executed through URLs, which facilitates a model to be accessed

through the Internet without being downloaded and saves users time and effort in model

installation and configuration. The services can be directly integrated with online data

services to perform continuous analysis. In addition, web services provide a standard method

for models written in different programming languages or running on different computer

Page 71: Advancing the Accessibility, Reusability, and

61

platforms to communicate with each other over the Internet, so models can be easily and

flexibly integrated to complex environmental modeling systems through web services.

In addition, this work has demonstrated the benefits of using containers for model

isolation management to avoid the potential conflicts in the service publishing process. This

work can also serve as guidance for applying container on other service publishing systems, and

as a demonstration for using containers for model isolation on servers. I expect that this work

and outcomes can motivate contribution on complex environmental model web services to

maximize current outcomes and facilitate environmental modeling research. I have provided only

a streamflow prediction service case study as a proof of the benefits of publishing complex

environmental models as web services using this approach. The next step of this work is to

implement more web services for complex models to fully demonstrate the utility and understand

the limitations of this approach.

Page 72: Advancing the Accessibility, Reusability, and

62

4 SIMPLIFYING THE DEPLOYMENT OF OGC WEB PROCESSING SERVICES

(WPS) FOR ENVIRONMENTAL MODELING – INTRODUCING TETHYS WPS

SERVER

Introduction

Environmental modeling can be extremely complex because it includes dynamic physical,

chemical, biological, and human processes that can vary at different spatial and temporal scales.

An accurate environmental modeling system often requires integrating multiple models, data

sources, and analysis routines into a workflow to address a research question, and requires some

level of cooperation among experts with different backgrounds. The concept of chaining

interoperable model components is gaining momentum for modeling because such a chain can

potentially answer more questions than the individual models alone (Castronova et al. 2013;

Dubois et al. 2013). Generally, we can achieve model interoperability by sharing input and

output files, directly rewriting models into a single software system, or establishing software

architecture principles that facilitate the coupling of independent models (Belete et al. 2017;

Granell et al. 2010). In the model coupling approach, models are written in a modular way in

which each model performs an isolated task while the whole workflow addresses a much broader

problem. This approach enables each model to remain as flexible, extensible, and reusable as

possible (Argent et al. 2006; Castronova et al. 2013; Schaeffer 2008). However, integrating

multidisciplinary heterogeneous models still involves barriers such as different programming

Page 73: Advancing the Accessibility, Reusability, and

63

languages, operating platforms, and user interfaces. It is also difficult for researchers to

understand models from other disciplines (Jiang et al. 2017; Yue et al. 2016). Hence, setting up a

system to make heterogeneous models easily connected remains a challenge.

Service-Oriented Architecture (SOA) has been widely used to integrate models and build

scientific workflows (Bosin et al. 2011; Lin et al. 2009; Yue et al. 2016; Zhao et al. 2012b). This

standards-based, loosely-coupled, and service-oriented approach emphasizes the decomposition

of a system into functional components that communicate via web services (Castronova and

Goodall 2010). Web services enable communication and interoperation among various

applications over the network by using open standards and protocols, which means applications

written in different programming languages or running on different platforms can seamlessly

exchange data over the Internet through web services (Michaelis and Ames 2009; Nativi et al.

2013). The model developer can therefore focus more on how to structure the internal algorithms

(Goodall et al. 2011). It has been shown that SOA-based applications can address the issues of

data accessibility and service interoperability for environmental models (Granell et al. 2010;

Huang et al. 2011; Mason et al. 2014; Zhao et al. 2012b). Also, it has been demonstrated that

SOA has significantly improved dealing with complex problems by making data and models

available on the Internet as interoperable services (Dubois et al. 2013; Nino-Ruiz et al. 2013;

Skøien et al. 2013; Yang et al. 2009). While numerous environmental web services exist for

water data distribution [e.g. CUAHSI HIS (Maidment et al. 2009), HydroShare (Horsburgh et al.

2015)] and for web mapping (Blower et al. 2013; Palomo et al. 2017), environmental models as

web services is still an area of research that has not been widely investigated and implemented

(Belete et al. 2017; Hu et al. 2015; Vitolo et al. 2012; Vitolo et al. 2015).

Page 74: Advancing the Accessibility, Reusability, and

64

The Open Geospatial Consortium (OGC) has promulgated the Web Processing Service

(WPS) specification, which has been demonstrated as an efficient technology for publishing

geospatial processes and constructing integrated model chains (Castronova et al. 2013; Schaeffer

2008; Tan et al. 2015). OGC is the foremost organization guiding the development of standards

and specifications for geospatial services, and it manages a global consensus process that results

in approved interface and encoding standards that enable interoperability among and between

diverse applications. OGC WPS is a commonly recognized specification for publishing processes

as web services. It specifies a standard method for processing data as a request from client to

server, using eXtensible Markup Language (XML) for communication through the Internet

(Schut and Whiteside 2007). OGC WPS also standardizes the interoperation and communication

between WPS, which provides flexibility to create workflows for different purposes. It has also

enabled openly sharing models over the Internet to address the needs for complex simulations

that involve diverse data and algorithms (Kumar and Saran 2014).

Many recent studies have demonstrated orchestrating OGC WPS into environmental

workflows. Feng et al. (2011) have developed five ecosystem disciplinary models that comply

with the OGC WPS specification, and a Geospatial Model Sharing Interface (GeoMSI) that

provides a service-interactive interface for sharing/accessing models through WPS. A modeling

system that implements a time-driven process simulation of wetland ecosystems has been

established through integrating these five remote services. Vitolo et al. (2012) implemented a

web service-based modeling system for hydrological modeling on the basis of the FUSE

modeling framework (Clark et al. 2008) exposing hydrological processes using an open source

library called PyWPS. Castronova et al. (2013) have built a WPS process of the TOPMODEL

hydrologic model and used it within a client application. They advanced PyWPS to maintain

Page 75: Advancing the Accessibility, Reusability, and

65

state and store data values on the server, in order for the WPS to perform time-dependent

calculations. Kumar and Saran (2014) designed and implemented a WPS process that fetches

data from an OGC Web Coverage Service (WCS) server and performs the Normalized

Difference Vegetative Index (NDVI) calculation. Astsatryan et al. (2015) developed several

vegetation index computation WPS processes using PyWPS, and a user-friendly portal using the

Joomla content management system to enable users to compute either a single vegetation index

or a combination of vegetation indices as a workflow. Some studies also include cloud

computing and parallel computing to improve the performance of complex WPS processes. Tan

et al. (2015) constructed an elastic parallel OGC WPS on a cloud-based cluster to provide high-

performance computation. Astsatryan et al. (2015) designed and applied HPC clusters with an

effective parallelization algorithm to speed up the execution time of the vegetation index

computation WPS.

Compared with web services for water time series data (e.g. CUAHSI WaterOneFlow)

and spatial data (e.g. OGC web mapping and web feature services), the implementation of WPS

for environmental modeling and data analysis is still not particularly common. Indeed, the

existing literature is more heavily weighted towards specific services for particular

environmental workflows rather than general tools to support creating WPS services. One

obvious exception to this observation is the readily available capabilities of the commercial

Environmental Systems Research Institute (Esri) ArcGIS Server software, which can easily

publish data analysis and modeling services. ArcGIS Server uses a proprietary format for

services consumed by the ArcGIS desktop applications, and it can also serve WPS services to be

consumed in web applications and other desktop software (Esri 2018). Notable recent examples

Page 76: Advancing the Accessibility, Reusability, and

66

of data processing services developed and deployed using ArcGIS Server include (Leonard et al.

2014; Yue et al. 2015).

I posit that there is much value in exposing existing models and data analysis workflows

via WPS in addition to GUI for optimal flexibility. This work presents the design, development,

and testing of an OGC WPS system on the Tethys Platform – a collection of existing open source

GIS tools wrapped in a Django/Python framework. I elected to use Tethys Platform as the

development environment because it is open source and provides a complete and relatively

uncomplicated environment for web-based environmental modeling applications development.

Tethys platform is intended for the development and deployment of web “apps” that are

characterized as generally single-purpose, lightweight web applications that are independently

developed, and are installed and uninstalled from a web app portal – mirroring the app paradigm

of mobile devices. (Swain et al. 2016) demonstrate that Tethys Platform successfully lowers the

barrier to developing web apps in the environmental domain by providing (1) a suite of software

that meet the spatial and computational capabilities commonly required for environmental

modeling, including tools for geoprocessing of spatial data, distributed computing, database

management, map rendering and visualization; (2) a programmatic means to use each of the

recommended software systems in a single programming language, Python; (3) a reduction to the

web development skills required to develop web apps; and (4) a web-safe mechanism for

deploying the finished web apps that is flexible enough to work on the most common means for

obtaining hardware (Swain et al. 2016).

The research goal of this chapter was to generate an approach of simultaneously exposing

web app functionality as WPS when developing a web app. This approach makes the process of

WPS development almost transparent to developers. In this way, environmental modelers and

Page 77: Advancing the Accessibility, Reusability, and

67

developers can focus on creating web apps, and Tethys Platform takes care of creating the

corresponding WPS services. In addition, exposing apps’ functions as WPS processes can

improve the interoperability and reusability of apps. It also facilitates the implementation of

complex modeling, including chaining apps for integrated modeling.

The remainder of this chapter is organized as follows. The system design and

implementation of Tethys WPS Server is shown in Section 4.2. I also present the design of a

“HydroProspector” hydrological modeling system to demonstrate how Tethys WPS Server

enables and simplifies environmental workflow modeling. The results of this research are

described in Section 4.3. A detailed discussion on the benefits of this research for environmental

modeling and how it successfully facilitates complex environmental modeling web app

development are presented in Section 4.4. Section 4.5 provides conclusions and opportunities for

future work in this area.

Methods

4.2.1 System Design

The fundamental purpose of Tethys WPS Server is to provide a simple method to expose

function(s) of environmental web apps as WPS. This allows environmental modelers and

developers to focus more on web app development and less on the details of the OGC WPS

specification. A Tethys app project is organized using the Model View Controller (MVC)

software architectural pattern (Leff and Rayfield 2001; Swain 2015). The Model is the app’s data

structure. It directly manages the data, logic and rules of the app. The View is the visualization of

the data. The Controller handles user input and performs interactions on the data model or view.

Page 78: Advancing the Accessibility, Reusability, and

68

A Tethys app receives input datasets (the data Model) from users through a GUI, performs

computation in the backend Controller, and returns outputs to the web browser (the View). The

core part of a Tethys app is its controller functions, which are implemented in Python. Similar to

Tethys apps, the mechanism of WPS also includes accepting input parameters from a client,

executing processes in the backend, and returning outputs to the client. Hence, it is feasible to

disseminate Tethys app functions as WPS processes. The primary difference between Tethys

apps and WPS processes is each Tethys app defines its own specific formats of inputs and

outputs, while the inputs and outputs of WPS processes are uniformly defined in XML format.

Each Tethys app typically contains its own data analysis or modeling workflow to

provide a solution for one specified task or several tasks as shown in Figure 4-1. Here we use the

term, “workflow” to indicate the overall functionality of a web app. A workflow is comprised of

one or more computational processes. Such processes can be hosted within the app itself or can

be included as web processing services provided by other apps. For example, the workflow of a

runoff prediction app includes: a “snap” function that moves a user-defined point to the nearest

river; a “watershed delineation” function that generates watershed boundaries based on the

snapped point; and a "runoff calculation” function that calculates runoff within this watershed

area using the latest precipitation forecasts. Each function of this workflow can be part of other

water resources modeling tasks. With Tethys WPS Server, these functions can be exposed as

OGC WPS processes and reused in the workflow of other Tethys apps or other third-party clients

that support the OGC WPS specification.

Page 79: Advancing the Accessibility, Reusability, and

69

Figure 4-1: Tethys WPS Server system design

The general system architecture of Tethys WPS Server and its relationship with Tethys

Platform is shown in Figure 4-2. Tethys Platform consists of three major components: Tethys

Software Suite, Tethys Software Development Kit (SDK), and Tethys Portal. Tethys Software

Suite synthesizes several Free and Open-Source Software (FOSS) projects that are needed for

developing environmental modeling applications, including GeoServer (2013) for spatial datasets

publishing, 52°North (2018) for geoprocessing tools, PostgreSQL (2018) database with PostGIS

(2018) extension for spatial datasets storage, HTCondor (2018) for distributed computing, and a

number of JavaScript libraries like OpenLayers (2018) and HighCharts (2018) to help

visualization. Tethys SDK provides multiple Python APIs for each supporting software element.

This makes it possible to orchestrate the functionality of each software using Python in web app

Page 80: Advancing the Accessibility, Reusability, and

70

scripts. Tethys Portal is a web portal used to host web apps that developed with Tethys Platform

(Christensen 2016; Swain 2015).

Figure 4-2: General system architecture of Tethys WPS Server

In this study, Tethys WPS Server is implemented as a plugin component that has no

effect on other components and functions of Tethys Platform. The purpose of this design is to (1)

leave apps developed based on former versions of Tethys Platform unaffected and (2) make

exposing WPS a flexible option, which means the app can execute normally with or without

Page 81: Advancing the Accessibility, Reusability, and

71

exposing internal functions as WPS processes. After exposing various functions/processes as

WPS, clients are able to send requests to the server to fetch related information about specific

processes or execute processes. Once the WPS server is invoked by a request, it will

automatically accept input data or fetch data from specified databases (generally with URLs),

perform computation with the data, encode the result into XML and send it back to the client.

4.2.2 System Implementation

The first challenge of implementing Tethys WPS Server is to integrate an existing

software product into Tethys Platform to provide server-side implementation of OGC WPS

interface specification. By “server-side implementation”, I refer specifically to implementing a

WPS server on a web server to handle requests from clients and responses back to clients. By

using existing OGC WPS implementation software, we can avoid having to build this

functionality from scratch and can take advantage of the debugging and testing of the software

by others. Of course, this is the self-evident advantage of using any third party module. As

Tethys Platform is Python-based (Jones et al. 2014) and I aim to directly disseminate app

functions as WPS without too much modification, the implementation project is required to

support Python scripting. Doing so can also facilitate code debugging when developing WPS

processes. Table 4-1 lists several popular software packages that support server-side OGC WPS

implementation. GeoServer WPS and Deegree WPS (Müller 2007) are both Java-based

frameworks and only support Java scripting. 52°North WPS is Java-based implementation.

According to its documentation, 52°North supports Python scripting. However, the Python

support in 52°North was originally designed for use within the Esri ArcGIS Python library in the

Windows operating system – limiting cross-platform usage (Boerboom 2013). There is also a

52°North module for exposing ILWIS GIS (2001) functions as WPS services. Unfortunately,

Page 82: Advancing the Accessibility, Reusability, and

72

there does not appear to be support in 52°North for exposing native Python WPS functions on a

Linux server. ArcGIS Server WPS refers to its ModelBuilder module that supports exposing

ArcGIS geoprocessing services as OGC WPS. PyWPS (2009) is a Python-based server side

implementation of WPS, allowing integration, publishing and execution of Python processes via

the WPS standard. It was originally developed for implementing GRASS GIS (2018a) operations

as OGC WPS processes and has shown to be well-suited for integration with Tethys Platform.

Table 4-1: Server-Side Implementation of OGC WPS Specification Standard

Software Code Base WPS Process Development

Copyright

GeoServer WPS Java Java Open source

52°North WPS Java Java, R, Python(Windows) Open source

ArcGIS Server WPS C++ (ArcObjects) Python (ArcTools) Proprietary

Deegree WPS Java Java Open source

PyWPS Python Python Open source

PyWPS version 4 utilizes Werkzeug (2011), an open source Web Server Gateway

Interface (WSGI) utility library, to make the WPS functionalities written in Python be accessible

over the Internet. There are four core classes in PyWPS: Service, Process, WPSRequest, and

WPSResponse. The Service class streamlines the request pre-processing, process execution, and

the response post-processing into a complete Hypertext Transfer Protocol (HTTP) request-

response cycle. A complete HTTP request-response cycle begins with a client sending

a request message to a server; the server handles the request and sends a response message back

to the client. The Service class is the top-level object representing the WPS service; it can run as

a WSGI application. The Process class specifies the attributes attached to a generalized WPS

Page 83: Advancing the Accessibility, Reusability, and

73

process (including metadata terms and a handler function to hold process logic implementation)

and a set of functions to operate on a Process object. Each WPS process accepts a WPSRequest

argument and returns a WPSResponse object. The workflow of PyWPS is shown in Figure 4-3.

When the WPS server is invoked by a HTTP request, PyWPS utilizes Werkzeug to create a

Werkzeug Request object containing metadata about the HTTP request, and then packages it to a

PyWPS WPSRequest object. OGC WPS specifies three operations that can be requested by

clients and performed by a WPS server, including GetCapabilities, DescribeProcess, and Execute

(see Table 2). For “GetCapabilities” and “DescribeProcess” requests, PyWPS directly returns

Werkzeug response objects. For “execute” requests, PyWPS loads the appropriate WPS process

and passes the WPSRequest to the function. The function is responsible for returning a

WPSResponse object, which can then be converted to a Werkzeug response and sent back to the

web server. Tethys Platform is powered by Django framework, of which the primary deployment

platform is WSGI. When a Tethys page is requested, Django creates a Django HttpRequest

object including the request metadata. Then Django passes the HttpRequest to the specified

function and returns a Django HttpResponse object.

Figure 4-3: PyWPS workflow

Page 84: Advancing the Accessibility, Reusability, and

74

Because Tethys Platform and PyWPS are both WSGI-compliant applications, they can be

deployed on the same web server and coexist as two separate web applications running in

parallel with different Python interpreter instances. However, for a seamless integration it is

preferable for the two applications to use the same Python interpreter instance and thereby share

Python-level objects (variables and functions) as one web application. To allow for this seamless

integration, I choose to ‘downgrade’ PyWPS from a WSGI application to a pure Python package.

This was accomplished by removing the Werkzeug decorator Request application from the

callable functions in the Service class and the WPSResponse class. The slightly modified

PyWPS still has the full implementation of WPS functionality that can be consumed as normal

Python classes or functions from within Tethys. In addition, since the interfaces of PyWPS

classes and functions are unchanged, they still only accept Werkzeug Request objects as input

and return Werkzeug Response objects as output. A conversion between Werkzeug-versioned

Request/Response objects and Django-versioned Request/Response objects is required. The

general workflow is depicted in Figure 4-4.

Figure 4-4: Integration of PyWPS in Tethys Platform

Page 85: Advancing the Accessibility, Reusability, and

75

Figure 4-5: WPS access to the algorithm of a Tethys app

After integrating PyWPS with Tethys Platform, the next step was to establish a method to

connect Tethys apps with the WPS server. In PyWPS, each WPS process is defined as a Process

class in a Python script containing metadata configuration and a handler function holding

detailed execution algorithm. The handler method can directly call outside functions as a whole

or part of the WPS process. To support this behavior, I created a WPS definition file (as shown

in Figure 4-5) that is included with the Tethys app source code, containing a PyWPS Process

class definition template. The app’s functions can be called in the handler function to be

disseminated as WPS processes. In PyWPS, all the WPS processes should be located in the same

directory to be imported in the server. To implement this capability, Tethys Platform

automatically collects all the WPS definition files in each app hosted on the server and creates

Page 86: Advancing the Accessibility, Reusability, and

76

symbolic links of them in a location identified in the WPS server configuration. Under this

approach, the WPS definition files located in deployed Tethys apps are imported to the WPS

server. As shown in Figure 4-5, the app’s algorithm can be written in a separate file that can be

used by both the app GUI and the WPS process. The WPS feature is thereby a new route to

access the app’s algorithm and any update to the algorithm will be automatically available in

both the app GUI and the WPS process.

4.2.3 Experimental Case Study Design

This section presents design of use cases to illustrate (1) bi-directional communication

between Tethys apps and Tethys WPS Server, including how the functions of a Tethys app can

be exposed as WPS processes on Tethys WPS Server and how services hosted on it can be

included in another Tethys app as components of the workflow; (2) consuming services hosted

on Tethys WPS Server with third party WPS clients. The aim of these use cases is to test and

demonstrate how Tethys WPS Server can simplify environmental OGC WPS development and

significantly lower the barrier to developing complex web apps.

4.2.3.1 HydroProspector Use Case

Consider “HydroProspector” as a reservior management model that calculates the storage

capacity curve of a reservior to help decision-makers choose the best dam location. The storage

capacity curve, which refers to the relationship between dam height and reservior storage

capacity, is important for dam planners, designers and operators (Issa et al. 2017). To calculate

the storage capacity curve, the first step is delineating a watershed based on a dam location. The

watershed basin defines the maximum reservior size and works as a boundary for latter

calculation. Then the reservoir’s storage capcity are calculated for different dam heights. Finally,

Page 87: Advancing the Accessibility, Reusability, and

77

the storage capacity curve is plotted using different dam heights with corresponding calculated

storage capcity. The HydroProspector model contains a complex geoprocessing workflow that

requires a relatively long processing time (about 2 minutes, see compute time discussion in

Section 4.3.3). Challenges with developing this app include but are not limited to, complex

geoprocessing coding, handling asynchronous execution, and job management. Furthermore, the

watershed delineation process and reservoir storage calculation process are both essential and

commonly-used in environmnetal modeling. Exposing these processes as web services can make

them more easy to be reused in other complex models. To demonstrate this, I developed a

service-oriented Tethys app that performs HydroProspector model in the Dominican Republic

Country (workflow shown as Figure 4-6).

Figure 4-6: HydroProspector app workflow

Instead of coding from scratch, the app consumes two WPS processes hosted on Tethys

WPS Server: watershed delineation WPS and reservoir calculation WPS. These two WPS

processes are respectively exposed from two Tethys apps:Watershed Delineation for Dominican

Page 88: Advancing the Accessibility, Reusability, and

78

Republic App and Reservoir Calculation for Dominican Republic App. Both of these two Tethys

apps are functionally independent with their own graphical user interfaces.

Both of these two WPS processes use GRASS GIS functions for geoprocessing

calculation. GRASS GIS provides GIS functions that can be accessed via Python code (Neteler

et al. 2008). The watershed delineation service (Figure 4-7) accepts coordinates of a point and

returns the snapped point at nearest stream and a watershed in geographic markup language

(GML) format. The service first performs a GRASS GIS function, r.watershed, to generate a set

of maps indicating flow accumulation and flow direction. Then a point indexing function is

performed to link the given outlet to the nearest stream network via a shortest snap function.

Finally, a watershed basin is calculated with the snapped outlet point and the flow direction map.

Figure 4-7: Watershed delineation process diagram

The reservoir calculation service (Figure 4-8) calculates the boundary and volume of a

reservoir, which is generated based on coordinates of a pour point and a designed water level at

the pour point. A maximum boundary polygon is required as input to subset the digital elevation

model (DEM) file to reduce the execution time of raster calculation process. The service uses the

GRASS GIS function, r.lake, to generate a reservoir raster from the dam point. The resulting

Page 89: Advancing the Accessibility, Reusability, and

79

reservoir raster, based on which the reservoir boundary and volume are generated, contains cells

with values representing reservoir depth and NULL for all other cells beyond the reservoir. Both

the output of the watershed delineation service and one input of the reservoir calculation service

are polygons in the GeoJSON format. Having variables in the same format enables the

connection of these two services. Each of the services hosted in these Tethys WPS Server

instances can be executed using the OGC WPS specification.

Figure 4-8: Reservoir calculation process diagram

The HydroProspector app user interface consists of a user interactive map and a chart

(app code flow shown as Figure 4-9). The user first can add a point anywhere on the map with a

mouse click. After selecting a point, the coordinates of the user-clicked point are passed to the

app controller using Asynchronous JavaScript (AJAX). The app controller reads in the

coordinates and sends it as a request to the watershed delineation service on Tethys WPS Server.

As the watershed delineation service execution is finished, the controller extracts the watershed

and snapped point results and passes them to the map. Using AJAX, the map periodically checks

the controller for progress status and adds the watershed and snapped point to the map once the

Page 90: Advancing the Accessibility, Reusability, and

80

processing is finished. Next, the user defines dam height and interval, then clicks the calculate

button. Each gage height with the watershed and snapped point from the first step are passed to

the controller using AJAX . Then the controller submits a request to the reservoir calculation

service on Tethys WPS Server. Once the service execution is finished and the response is

returned to the controller function, the controller function extracts the reservoir boundary and

volume and passes them to the GUI. To sequentially get the proposed reservoir boundary and

volume results, the data with updated gage height is passed until the former processing is

finished. After each AJAX function is completed, the resevoir boundary is added to the map and

the reservoir volume is drawn in the chart. The process is completed when the storage capacity

values for all gage heights are calculated.

Figure 4-9: Interactions between User, Template, Controller, and Tethys WPS Server in the HydroProspector App

Page 91: Advancing the Accessibility, Reusability, and

81

4.2.3.2 WPS Client Use Case

One advantage of Tethys WPS Server is that the WPS services hosted on it can be

consumed not only by Tethys apps, but also by various types of clients set up in accordance with

the OGC WPS implementation specification. The simplest client of a WPS service is a web

browser. The WPS services can be consumed through a web browser using HTTP GET method

with Key Value Pair (KVP) encoded requests. Responses and exceptions are then returned to the

web browser. However, there are also some third-party clients and libraries capable of

consuming OGC WPS services. In the HydroProspector app, I use OWSLib (2017) to interact

with Tethys WPS Server, including sending requests to the server and receiving responses from

services. OWSLib is a WPS client library that provides a common API for accessing and

executing OGC WPS services. I also used an open source desktop GIS software, QGIS (2018b)

as a WPS client to demonstrate that the WPS processes hosted on Tethys WPS Server conform to

the OGC WPS standard and are able to be consumed by other third-party software that supports

OGC WPS.

Results

4.3.1 Tethys WPS Server

I developed an API for exposing Tethys app functions as OGC WPS processes. When a

new Tethys app project is created, a WPS definition file (Figure 4-10) is automatically included

in the app source code. The WPS definition file provides a template that allows app developers

to expeditiously define a WPS process. Each WPS process is defined as a PyWPS Process class,

which contains an initialization function and a handler function. The initialization function

Page 92: Advancing the Accessibility, Reusability, and

82

includes the definition of inputs, outputs, service metadata, and server-based parameters. The

handler function specifies the actual operation of the WPS process. Other outside functions

defined in the Tethys app can be called in the handler method to become a WPS process or part

of a WPS process. This design allows the WPS process to be updated simultaneously with any

modification of the app.

Figure 4-10: WPS definition template file. The green box contains metadata of the service; the blue box contains the definitions of inputs and outputs; the red box contains the actual operation, where can directly call functions in the Tethys app.

The services described in the WPS definition file are not activated automatically. They

must be activated (or deactivated) using command line statements created to help manage WPS

processes on Tethys WPS Server. These include:

• tethys wps list – Lists all published and unpublished WPS processes. This lists

the available published and unpublished processes on the Linux command line with no

further interaction required from the user.

Page 93: Advancing the Accessibility, Reusability, and

83

• tethys wps publish – Publishes selected WPS processes. This interactive

command results in a prompt to the user requesting the name of a specific app and

associated service to be published.

• tethys wps remove – Removes selected WPS processes. This interactive command

results in a prompt to the user requesting the name of the service to be removed.

The benefits of using command lines include simplifying common management tasks

related to WPS and allowing app developers the flexibility to modify WPS processes without

affecting the apps’ existing operation through its graphical user interface.

4.3.1.1 Interaction with Tethys WPS Server

OGC WPS specifies three operations that can be requested by clients and performed by a

WPS server, including GetCapabilities, DescribeProcess, and Execute (Table 4-2). Supported

data formats for inputs and outputs include literal data (character, string, value, date, etc.),

complex data (vector, raster), and bounding box data (bounding box area with coordinate

reference system). WPS servers can be invoked using the HTTP GET method with a KVP

encoded request or the HTTP POST method using XML format for the request. Generally, the

KVP encoding method is accepted for GetCapabilities and DescribeProcess requests, while the

Execute requests often adopt XML encoding. The KVP encoded requests are written as URLs

composed of a series of request parameters and input data. They are commonly used for simple

requests or requests with simple inputs (like literal data). While sending an Execute request that

includes complex data (vector, raster) or bounding box data, the XML coding method is

recommended over KVP. The inputs can be included directly in the request, or they can

Page 94: Advancing the Accessibility, Reusability, and

84

reference web accessible resources. The outputs can be returned in the form of an XML response

document, either embedded within the response document or stored as web accessible resources.

Tethys WPS Server, as an OGC WPS implementation, supports executing processes

either synchronously or asynchronously. For the synchronous execution, a WPS client sends an

execute request to Tethys WPS Server and keeps listening for a response until the process has

completed and returned the result. In this situation, the client and WPS server should keep

connected. For the asynchronous execution, a WPS client receives a response immediately after

sending an execute request to Tethys WPS Server. The response confirms that the request is

received and accepted by the server, a processing job has been created and will be run. The

response contains execution status and a job identifier, with which the client can check the

execution status. It also contains the location, usually as URL, where the execution result can be

found after the processing job has completed. Typically, asynchronous execution is

recommended for WPS processes, as geoprocessing processes usually contain complex and time-

consuming calculations.

Table 4-2: Tethys WPS Server HTTP GET + KVP Request APIs

GetCapabilities /tethys_wps?service=WPS&request=GetCapabilities

Generic WPS service metadata; the list of processes available on the server.

DescribeProcess /tethys_wps?service=WPS&version=1.0.0&request=DescribeProcess&identifier=<identifier>

Full description of a process including input and output parameters.

Execute /tethys_wps?service=WPS&version=1.0.0&request=Execute&identifier=<identifier>&datainputs=<input1=;input2=…>

Execution of a process using supplied inputs, returning outputs as requested

Page 95: Advancing the Accessibility, Reusability, and

85

4.3.2 Case Study Results

4.3.2.1 HydroProspector Use Case

I deployed three Tethys web apps at the website (http://cosmo.byu.edu/apps): Watershed

Delineation for Dominican Republic App, Reservoir Calculation for Dominican Republic App,

and HydroProspector for Dominican Republic App. From the first two apps, I respectively

exposed a watershed delineation WPS and a reservoir calculation WPS on Tethys WPS Server

(details in Table 4-3). These processes were exposed through the Linux command line statements

described previously. The HydroProspector App was developed by coupling these two WPS

processes into the workflow shown in Figure 4-6. OWSLib is used to interact with the WPS

processes on Tethys WPS Server. The HydroProspector app user interface (Figure 4-11) includes

an interactive map to show watershed and reservoir boundaries, along with a chart to show the

storage capacity curve of designed dams. The user is required to (1) click on the map to select an

outlet point for watershed delineation, and then (2) define a dam height and interval. The app

will dynamically draw the storage capacity curve in the chart and show corresponding reservoir

boundaries with different dam heights. When selecting a data point in the chart, the

corresponding reservoir boundary is shown on the map.

The service-oriented approach presented here is particularly suited to complex or time-

consuming workflows where information must persist beyond server-client transactions. For

example, when directly implementing the HydroProspector model in a single web app, the model

execution time often exceeded the typical HTTP request-response timeout setting and failed to

reply to the request made from the client. The browser timeout can be extended long enough to

account for this, or can be disabled completely; however, this can affect the normal execution of

other web browser requests. In addition, interruption or error can be caused by conflicting

Page 96: Advancing the Accessibility, Reusability, and

86

requests by different clients. To overcome these barriers, a job queue system should be included

to implement asynchronous execution and provide real-time monitoring of different requests. A

database or other persistent data store mechanism is also required for the storage of job

information and results to allow users to check the job status and retrieve result. This approach is

complicated and time-consuming, and it requires advanced web development skills. Tethys WPS

Server, as an OGC WPS implementation, supports storing the execution response and outputs as

web-accessible resources that can be retrieved by the client. It also supports asynchronous

execution, which can keep the execution status updated and accessible while the request is being

processed. By using Tethys WPS Server, app developers can focus more on model design rather

than on web development.

Figure 4-11: HydroProspector for Dominican Republic App User Interface

Page 97: Advancing the Accessibility, Reusability, and

87

Table 4-3: Watershed Delineation Process WPS & Reservoir Calculation Process WPS on

Tethys WPS Server

WPS Watershed Delineation Process

Reservoir Calculation Process

Identifier watersheddelineationprocess reservoircalculationprocess

Abstract This process snaps a given outlet to the nearest stream

and performs watershed delineation within the Dominican Republic

country.

This process returns the boundary and storage capacity of a reservoir based on a given pour

point location, an aimed water level, and maximum reservoir

boundary polygon.

Inputs Outlet longitude: outlet_x (float)

Pour point longitude: point_x (float)

Pour point latitude: point_y (float)

Outlet latitude: outlet_y (float)

Water level at pour point: water_level (float)

Maximum reservoir boundary: max_boundary (text/xml)

Outputs Delineated watershed: watershed (text/xml)

Reservoir volume: lake_volume (float)

Snapped point: snappoint (text/xml)

Reservoir boundary: lake_boundary (text/xml)

status_supported True True

store_supported True True

4.3.2.2 Third-party WPS Client Use Case

I choose QGIS as a WPS client to demonstrate that the WPS processes hosted on Tethys

WPS Server can be used by other third-party clients that support OGC WPS specification. QGIS

was chosen because it is open source and is a comprehensive GIS environment. QGIS has a WPS

Python client plug-in, through which QGIS can directly consume WPS services on WPS servers.

The interactions include listing WPS capabilities, sending request to execute a WPS process and

Page 98: Advancing the Accessibility, Reusability, and

88

visualizing outputs in QGIS interface. I successfully connected to Tethys WPS Server, executed

the Watershed Delineation WPS and the Reservoir Calculation WPS through the QGIS WPS

Client, and displayed the results in the QGIS map. This effort proved to be uncomplicated and

self-explanatory with no unusual results or unpredicted outcomes, hence no further discussion of

this test is given here.

4.3.3 Quantitative Analysis

An analysis of programming language composition was performed to measure how

Tethys WPS Server overcomes the hurdle of complex web app development in environmental

modeling. I compared the HydroProspector app with another Tethys app called Storage Capacity

Service (https://github.com/xhqiao89/storage_capacity_service.git), which implements the same

functionality of the HydroProspector model fully natively without using WPS services. I used the

Count Lines of Code (CLOC) Linux package to count the number of code lines, excluding blank

lines, comment lines, and third party libraries included. One aim of Tethys WPS Server is

lowering the barrier to complex environmental web app development.

Table 4-4: Language Composition Comparison

Note: Front end contains HTML, JavaScript and CSS.

Language

HydroProspector App Storage Capacity Service App

Lines Percent Lines Percent

Python 204 29% 603 55%

Front end 494 71% 500 45%

Total 698 100% 1103 100%

Page 99: Advancing the Accessibility, Reusability, and

89

This is evident by the reduced amount of Python used in the HydroProspector app shown

in Table 4-4, 204 lines compared to 603 lines for the Storage Capacity Service app. In addition,

the Storage Capacity Service app adopts Celery (Solem 2007) for asynchronous execution and

task job management, and PostGIS database for job results storage – none of which is required in

the WPS based HydroProspector.

In addition, I selected 10 hydropower dam sites around Dominican Republic country and

calculated the storage capacity curve with the HydroProspector app and the Storage Capacity

Service app respectively. The dams were all 30 meters high and the interval for calculating the

storage capacity curve was set to 10 meters. The total compute time for each dm site was

recorded as Figure 4-12.

Figure 4-12: The comparison of compute time between HydroProspector app and Storage Capacity Service app

Page 100: Advancing the Accessibility, Reusability, and

90

The average compute time of using the HydroProspector app is 147.2 seconds, while

103.8 seconds with the Storage Capacity Service app. The HydroProspector app takes a longer

time because it calls two independent WPS services, which contains more steps than the pure

storage capacity curve calculation process, such as setting up GRASS GIS environment, creating

GRASS GIS projects, input data preprocessing and outputs post-processing.

4.3.4 Qualitative Analysis

As a qualitative assessment of how Tethys WPS Server lowers the barrier of OGC WPS

deployment, I hosted a workshop and collected results from a questionnaire survey focusing on

the ease of operation and the required time to setup WPS processes with Tethys WPS Server. I

had 10 attendees with various knowledge and experience on environmental web app

development participate the workshop and survey (see Figure 4-13).

Figure 4-13: Academic status (left) and environmental web app developing experience (right) of the workshop and survey attendees.

Page 101: Advancing the Accessibility, Reusability, and

91

Figure 4-14: Comparison of difficulty of OGC WPS deployment with/without Tethys WPS Server for survey attendees with different years of web application development experience (Scale of 1 to 10 where 1 is very easy and 10 is very difficult)

Figure 4-15: Time to deploy WPS processes using Tethys WPS Server for survey attendees with different years of web application development experience.

Page 102: Advancing the Accessibility, Reusability, and

92

The survey generally focused on (1) level of difficulty to deploy a WPS service with

Tethys WPS Sever compared with from creating a WPS using other means; (2) duration of time

to convert an existing app function into a WPS process through defining the WPS definition file

and publishing it through command line functions. Based on the survey results, all the attendees

agree that it is easier to publish app functions as WPS using Tethys WPS Server. Figure 4-14

shows the difficulty level (scale 1 to 10 where 1 is very easy and 10 is very difficult) of

deploying WPS with/without using Tethys WPS Server for the attendees with different years of

web application development experience. Based on the different programming level of the

survey attendees, most of them spend less than 15 minutes to convert an existing app function

into a WPS process and less than 3 minutes to publish it through command lines (shown as

Figure 4-15). Approximate 60% of the attendees estimate that this WPS deployment process

takes less than 5% of the total app development time, while the other 40% estimate it may take

5% to 20%. Overall, with Tethys WPS Server, the time of exposing an app function as a WPS

service is negligible.

Discussion

The motivation of this work was to create a ready-to-use tool to expose web app

functions to standard OGC WPS, thereby facilitating the development of web apps containing

complex environmental models. PyWPS was chosen for server-side OGC WPS implementation

because it is Python-based and supports Python scripts. Tethys Platform was chosen as the

development environment because of its demonstrated advantages in developing web-based

environmental apps. It enables those with limited web development skills to develop web-based

environmental apps using a pre-packaged web app development environment, including

geoprocessing tools for spatial data, map rendering and visualization, distributed computing, and

Page 103: Advancing the Accessibility, Reusability, and

93

database management (Jones et al. 2014). An API was developed for exposing Tethys app

functions as WPS processes. A set of command line functions was developed for WPS

management on the server.

Based on the case studies presented here and my own experience with Tethys WPS

Server, the key outcomes of this work for app developers using Tethys platform include:

1) Ease of implementation - It is possible to add WPS processes without needing to

change well-written codes in existing web apps.

2) Easier maintenance - A WPS process is based on and connected with a web app, so

these services can be debugged, tested, and updated together with the web app, enabling simpler

maintenance of the services.

3) Avoidance of code duplication – As OGC WPS can be used across applications with

different programming languages over different computational platforms, WPS processes hosted

on Tethys WPS server can be used within other Tethys apps, or by third-party applications or

clients which support OGC WPS specification.

4) Lowering the barrier to environmental OGC WPS development and deployment –

Tethys Platform has been demonstrated to successfully lower the barrier to developing and

deploying environmental web apps. Tethys WPS Server allows developers to convert existing

app functions into WPS processes, making the WPS generation process almost transparent to

developers. A WPS process is automatically deployed along with the web app from which it is

defined and can be flexibly activated or deleted by command lines. Therefore, the barrier to

developing OGC WPS processes should be lowered.

Page 104: Advancing the Accessibility, Reusability, and

94

5) Facilitating complex environmental web app development - Implementing complex

environmental models to web apps can be a challenging task because complex models are

normally time-consuming processes, which require task management, asynchronous execution,

data storage, and other related issues. As demonstrated in the case study, decomposing the model

to loosely coupled WPS processes can avoid many web development issues, especially

benefiting environmental scientists and engineers with limited programming skills. Moreover, a

WPS process can contain a series of other WPS processes, which is necessary for environmental

modeling as many complex environmental models contain multiple steps. For example, some

models may first decompose the water system into atmosphere, land surface, and subsurface.

Each component may contain a list of more detailed process-level services, such as land surface

including infiltration, evaporation, runoff, and others, and each process can be decomposed to

multiple smaller services.

In addition to creating a Tethys-specific implementation of WPS and demonstrating its

utility, this work also serves as a general methodological experience and guidance for alternative

implementations of WPS, which may not use the Tethys/Django framework. I have shown that

disaggregation of a complex workflow into multiple modular WPS instances can help avoid

overloading a single server and does yield many of the benefits reported by Castronova et al.

(2013) and Goodall et al. (2011), especially enabling models to be flexibly extended and reused

by clients from different systems. Others can learn from and emulate my success in deploying

GIS-enabled web applications that include both a front-end GUI for end users and a back-end

WPS service, which can be activated and deactivated through simple server side commands.

Maintaining the integrity of PyWPS (with only minor changes) as a separate package that is

linked into the Tethys infrastructure, we gain the benefit of being able to readily upgrade to

Page 105: Advancing the Accessibility, Reusability, and

95

newer versions of this library that may become available in the future. In addition, my

experience with the PyWPS library specifically is highly positive and serves as a

recommendation for this library in similar use cases where Python is the primary server-side

technology.

I expect that these observations and outcomes will have broader application for future

environmental web application developers beyond the Tethys user and developer community. I

also note a few challenges that remain with respect to this work. First, it still requires minor

extra-coding to integrate functions of an app into WPS processes. App developers must manually

define WPS processes following the API, which also causes some level of code duplication in

the app, including request and response data processing. Second, the services hosted on Tethys

WPS Server currently are configured to be accessible to anyone without any authorization or

authentication requirements. The third challenge is not a shortcoming of Tethys WPS Server, but

is a shortcoming of WPS in general and is related to moving large datasets between services,

which can cause network latency. One way to minimize this effect is to avoid passing large sizes

of messages when designing services. This means even though Tethys WPS Server provides an

easy environment to develop WPS, there remains a high requirement for developers' ability to

decompose a complex modeling system into a set of representative services, and design services

with a minimum size of input and output data.

Conclusion

In this chapter, I developed an open source platform called Tethys WPS Server using

PyWPS, which can help web app developers expose app functions to standard OGC WPS

processes. A set of command line functions was created for flexible WPS processes

Page 106: Advancing the Accessibility, Reusability, and

96

management, including publishing, listing, and removal. A Tethys web application coupling two

WPS processes was developed to demonstrate the advantages and benefits of Tethys WPS Server

for complex environmental modeling systems. Through Tethys WPS Server, one or more

functionalities in each Tethys app can be exposed as OGC WPS processes through several

simple steps and deployed along with the Tethys app. With the work presented in this chapter,

Tethys WPS Server can lower the barrier to OGC WPS development and deployment and further

improve complex modeling web applications development in the environmental domain.

The next steps for Tethys WPS Server include upgrading it from a Tethys internal

application to a Django application that can be easily coupled with any Django-based system,

handling the problem of security and user authentication, and providing more examples and

guidance on well-designed modularization and decomposition of complex problems to avoid

large-data-passing problems. Furthermore, I have provided here only a HydroProspector web app

to serve as a proof of benefits for using Tethys WPS Server in complex environmental web app

development. More complicated web apps are needed to fully test the pros and cons of Tethys

WPS Server.

Page 107: Advancing the Accessibility, Reusability, and

97

5 CONCLUSIONS

The primary goals of this dissertation include creating an automated and efficient method

to generate global high-resolution flood forecasting and developing software tools to share

geospatial analysis workflows and environmental models as web services, facilitating complex

environmental modeling. There are three research objectives:

(1) create and test an enhanced method to route coarse gridded runoff generated from

global or national LSMs onto a high-resolution vector-based river network and provide flood

prediction at local scales.

(2) develop a tool to facilitate publishing complex environmental models as web services,

thereby facilitating model sharing and reuse.

(3) develop a ready-to-use tool to expose web app functionality as standard OGC WPS,

thereby facilitating the development of web apps containing complex environmental models.

Chapter 2 addressed the first objective and presented the design and development of a

new automated computational system that quickly downscales the runoff generated from coarse

grid-based LSMs onto high-resolution vector-based stream networks by leveraging the RAPID

vector-based river routing model. I compared routing 35-year ERA-Interim/Land reanalysis data

in this system and the results from GloFAS reanalysis data and demonstrated that this system has

comparable performance with GloFAS. The new system has comparable accuracy with GloFAS

Page 108: Advancing the Accessibility, Reusability, and

98

but provides streamflow predictions on higher resolution stream networks with much less

computational cost. I found that the river network resolution has a negligible effect on the

simulated streamflow from the new system. In other words, we can forecast streamflow for very

small stream segments and potentially improve local flood awareness and response much more

successfully than previously possible using readily available climate forcing from LSMs. The

system has been extended to work with multiple LSM models, which allows developing

countries and data-scarce regions to generate streamflow estimations simultaneously from

various LSM products to guide flood management.

Chapter 3 addressed the second objective and presents a tool to improve publishing

complex environmental models as web services. OGMS-WS was chosen as the service

publishing tool because of its lightweight, low entry bar, and the success in working with

complex geospatial models. Docker was included as the model isolation tool to resolve conflicts

and keep each model functional with changing environments. I published the streamflow

prediction system in Chapter 2 as a web service using this tool. Then, the service was used to

generate 20 years (1995-2014) high-resolution streamflow data in Bangladesh to analyze its

historical flood trends. We demonstrated that this tool can significantly lower the barrier to

publishing complex models as long-term functional web services, avoid deployment conflicts,

and facilitate model sharing and reuse. This work also has demonstrated the benefits of using

containers for model isolation management to sidestep the potential conflicts in the service

publishing process. The tool supports different operating systems, including Linux, Windows

and macOS.

Chapter 4 addressed the third objective and presents the development and testing of a

ready-to-use WPS implementation called Tethys WPS Server, which provides a formalized way

Page 109: Advancing the Accessibility, Reusability, and

99

to expose web app functionality as standardized WPS alongside a web app's graphical user

interface. The WPS server is created based on Tethys Platform by leveraging PyWPS. Tethys

Platform was chosen as the development environment because of its demonstrated advantages in

developing web-based environmental apps, especially for users with limited web development

skills. PyWPS was chosen for server-side OGC WPS implementation because it is Python-based

and supports Python scripts. An API was developed for exposing Tethys app functions as WPS

processes. A set of command-line functions was developed for WPS management on the server.

Three Tethys web apps are developed to demonstrate how web app functionality(s) can be

exposed as WPS using Tethys WPS Server, and to show how these WPS can be coupled to build

a complex modeling web app. I demonstrated that the services hosted on Tethys WPS Server

follow OGC standards correctly and can be used successfully by third-party applications and

clients that support the OGC WPS specification. The advantages of this tool include ease of

implementation and maintenance, avoidance of code duplication, lowering the barrier to

environmental OGC WPS development and deployment, and facilitating complex environmental

web app development.

The work in this dissertation contributes to environmental research in several aspects.

First, the work in Chapter 2 has demonstrated the feasibility to map global gridded runoff data on

high-resolution vector-based stream networks, and the credibility to use global LSMs to provide

flood predictions at local stream locations to meet managers’ needs. By leveraging existing long-

term land surface reanalysis data from global models, this approach can efficiently provide

baseline hydrologic information and supplement for regions that are hydrologic-data poor and

have little or no observed data or working models. The system can produce consistent results

regardless of watershed resolution, this also demonstrated the advantages of using vector-based

Page 110: Advancing the Accessibility, Reusability, and

100

river routing model in large-scale streamflow modeling compared with traditional grid-based

models, which is significantly affected by the grid resolution. Moreover, this work represents a

significant step forward in global or large-scale flood forecasting by providing rapid and accurate

streamflow forecasts with low computational costs. International development organizations like

the World Bank can reinvest dollars focused on leveraging such hydrologic information into

applications and making more informed decisions. In sum, this work provides a new pathway to

exploit the ever-increasing runoff products from land surface models. In the era of big data, I

expect that this work can motivate contribution to developing systems to better utilize earth

observations and global modeling products, improving our understanding of environmental

systems.

The works in Chapter 3 and Chapter 4 have provided solutions to web service

deployment respectively for environmental models and analysis workflows in web apps. The

biggest advantage of web services is enabling being used by clients/applications with different

programming languages over different platforms. The OGMS-WS-Docker approach for complex

models overcomes the shortcoming of current exiting service-publishing tools --- model

reproducibility, isolation, and system compatibility. Tethys WPS Server can publish existing web

app functions as WPS processes through simple command lines, then the service can be used

within other Tethys apps, or by third-party applications or clients which support OGC WPS

specification. These two outcomes significantly lower the barrier to publishing environmental

models and analysis workflows as web services by providing simple API/UI, task management,

asynchronous execution, etc. Others can learn from and emulate my success in model isolation

management and WPS implementations. I expect these works and outcomes can motivate

Page 111: Advancing the Accessibility, Reusability, and

101

contribution to environmental web service development to facilitate integrated environmental

modeling in the future.

The future work includes (1) optimizing the streamflow prediction performance by model

calibration, including reservoir simulation, and improving initial states through absorbing

satellite observations; (2) upgrading Tethys WPS Server to a Django application that can be

easily coupled with any Django-based system, handling the problem of security and user

authentication; (3) conducting more cases to fully test the pros and cons of the OGMS-WS-

Docker approach and Tethys WPS Server.

Page 112: Advancing the Accessibility, Reusability, and

102

REFERENCES

52°North, 2018. Home - 52°North Initiative for Geospatial Open Source Software. In. http://52north.org/.

Alfieri, L., P. Burek, E. Dutra, B. Krzeminski, D. Muraro, J. Thielen & F. Pappenberger, 2013. GloFAS - global ensemble streamflow forecasting and flood early warning. Hydrology and Earth System Sciences 17(3):1161 doi:http://dx.doi.org/10.5194/hess-17-1161-2013.

Argent, R. M., A. Voinov, T. Maxwell, S. M. Cuddy, J. M. Rahman, S. Seaton, R. Vertessy & R. D. Braddock, 2006. Comparing modelling frameworks–a workshop approach. Environmental Modelling & Software 21(7):895-910.

Astsatryan, H., A. Hayrapetyan, W. Narsisian, A. Saribekyan, S. Asmaryan, A. Saghatelyan, V. Muradyan, Y. Guigoz, G. Giuliani & N. Ray, 2015. An interoperable web portal for parallel geoprocessing of satellite image vegetation indices. Earth Science Informatics 8(2):453-460.

Atkinson, R., Power, R., Lemon, D. and O’Hagan, R. , 2008. The Australian Hydrological Geospatial Fabric – Development Methodology and Conceptual Architecture Water for a Healthy Country Flagship Report. Australian Government: Bureau of Meteorology.

Bai, P., X. Liu, T. Yang, K. Liang & C. Liu, 2016. Evaluation of streamflow simulation results of land surface models in GLDAS on the Tibetan plateau. Journal of Geophysical Research: Atmospheres 121(20):12,180-12,197 doi:10.1002/2016jd025501.

Balazs M. Fekete, C. J. V., and Richard B. Lammers 2011. Scaling gridded river networks for macroscale hydrology: Development, analysis, and control of error. Water Resources Research 37(7):13.

Balsamo, G., A. Beljaars, K. Scipal, P. Viterbo, B. v. d. Hurk, M. Hirschi & A. K. Betts, 2009. A Revised Hydrology for the ECMWF Model: Verification from Field Site to Terrestrial

Page 113: Advancing the Accessibility, Reusability, and

103

Water Storage and Impact in the Integrated Forecast System. Journal of Hydrometeorology 10(3):623-643 doi:10.1175/2008jhm1068.1.

Bandaragoda, C., A. Castronova, E. Istanbulluoglu, R. Strauch, S. S. Nudurupati, J. Phuong, J. M. Adams, N. M. Gasparini, K. Barnhart, E. W. H. Hutton, D. E. J. Hobley, N. J. Lyons, G. E. Tucker, D. G. Tarboton, R. Idaszak & S. Wang, 2019. Enabling Collaborative Numerical Modeling in Earth Sciences using Knowledge Infrastructure. Environmental Modelling & Software:104424 doi:https://doi.org/10.1016/j.envsoft.2019.03.020.

Basha, E. & D. Rus, Design of early warning flood detection systems for developing countries. In: 2007 International Conference on Information and Communication Technologies and Development, 15-16 Dec. 2007 2007. p 1-10.

Becchi, J. C. a. L., 2007. Geospatial Processing via Internet on Remote Servers - PyWPS. OSGeo Journal 1:5.

Belete, G. F., A. Voinov & G. F. Laniak, 2017. An overview of the model integration process: From pre-integration assessment to testing. Environmental Modelling & Software 87:49-63 doi:https://doi.org/10.1016/j.envsoft.2016.10.013.

Blower, J. D., A. Gemmell, G. H. Griffiths, K. Haines, A. Santokhee & X. Yang, 2013. A Web Map Service implementation for the visualization of multidimensional gridded environmental data. Environmental Modelling & Software 47:218-224.

Boerboom, J., 2013. Implementing the WPS Standard - Case Study for Dissemination of Coastal and Marine Tools. Delft University of Technology.

Boettiger, C., 2014. An introduction to Docker for reproducible research, with examples from the R environment. ACM SIGOPS Operating Systems Review - Special Issue on Repeatability and Sharing of Experimental Artifacts 49(1):9.

Bosin, A., N. Dessì & B. Pes, 2011. Extending the SOA paradigm to e-Science environments. Future Generation Computer Systems 27(1):20-31 doi:https://doi.org/10.1016/j.future.2010.07.003.

Wang, C., R. M. G., Wang, K., Sebastian Gerland, Mats A. Granskog, 2018. Comparison of ERA5 and ERA-Interim near surface air temperature and precipitation over Arctic sea ice: Effects on sea ice thermodynamics and evolution. The Cryosphere Discussion.

Page 114: Advancing the Accessibility, Reusability, and

104

Castronova, A. M. & J. L. Goodall, 2010. A generic approach for developing process-level hydrologic modeling components. Environmental Modelling & Software 25(7):819-825.

Castronova, A. M., J. L. Goodall & M. M. Elag, 2013. Models as web services using the open geospatial consortium (ogc) web processing service (wps) standard. Environmental Modelling & Software 41:72-83.

Chen, M., S. Yue, G. Lü, H. Lin, C. Yang, Y. Wen, T. Hou, D. Xiao & H. Jiang, 2019. Teamwork-oriented integrated modeling method for geo-problem solving. Environmental Modelling & Software 119:111-123 doi:https://doi.org/10.1016/j.envsoft.2019.05.015.

Christensen, S. D., 2016. A Comprehensive Python Toolkit for Harnessing Cloud-Based High-Throughput Computing to Support Hydrologic Modeling Workflows. Brigham Young University.

Clark, M. P., A. G. Slater, D. E. Rupp, R. A. Woods, J. A. Vrugt, H. V. Gupta, T. Wagener & L. E. Hay, 2008. Framework for Understanding Structural Errors (FUSE): A modular framework to diagnose differences between hydrological models. Water Resources Research 44(12).

Clement Albergel, E. D., Simon Munier, Jean-Christophe Calvet, Joaquin Munoz-Sabater, Patricia de Rosnay, Gianpaolo Balsamo, 2018. ERA-5 and ERA-Interim driven ISBA land surface model simulations: which one performs better? Hydrology & Earth System Sciences 22(6):18.

Collberg, C., T. Proebsting, G. Moraila, A. Shankaran, Z. Shi & A. M. Warren, 2014. Measuring Reproducibility in Computer Systems Research. University of Arizona, http://reproducibility.cs.arizona.edu/tr.pdf, 37.

Cools, J., D. Innocenti & S. O’Brien, 2016. Lessons from flood early warning systems. Environmental Science & Policy 58:117-122 doi:https://doi.org/10.1016/j.envsci.2016.01.006.

David, C. H., J. S. Famiglietti, Z.-L. Yang, F. Habets & D. R. Maidment, 2016. A decade of RAPID—Reflections on the development of an open source geoscience code. Earth and Space Science 3(5):226-244 doi:10.1002/2015EA000142.

David, C. H., F. Habets, D. R. Maidment & Z.-L. Yang, 2011a. RAPID applied to the SIM-France model. Hydrological Processes 25(22):3412-3425 doi:10.1002/hyp.8070.

Page 115: Advancing the Accessibility, Reusability, and

105

David, C. H., D. R. Maidment, G.-Y. Niu, Z.-L. Yang, F. Habets & V. Eijkhout, 2011b. River Network Routing on the NHDPlus Dataset. Journal of Hydrometeorology 12(5):913-934 doi:10.1175/2011jhm1345.1.

David, C. H., Z.-L. Yang & S. Hong, 2013. Regional-scale river flow modeling using off-the-shelf runoff products, thousands of mapped rivers and hundreds of stream flow gauges. Environmental Modelling & Software 42:116-132 doi:https://doi.org/10.1016/j.envsoft.2012.12.011.

Dee, D. P., S. M. Uppala, A. J. Simmons, P. Berrisford, P. Poli, S. Kobayashi, U. Andrae, M. A. Balmaseda, G. Balsamo, P. Bauer, P. Bechtold, A. C. M. Beljaars, L. van de Berg, J. Bidlot, N. Bormann, C. Delsol, R. Dragani, M. Fuentes, A. J. Geer, L. Haimberger, S. B. Healy, H. Hersbach, E. V. Hólm, L. Isaksen, P. Kållberg, M. Köhler, M. Matricardi, A. P. McNally, B. M. Monge-Sanz, J. J. Morcrette, B. K. Park, C. Peubey, P. de Rosnay, C. Tavolato, J. N. Thépaut & F. Vitart, 2011. The ERA-Interim reanalysis: configuration and performance of the data assimilation system. Quarterly Journal of the Royal Meteorological Society 137(656):553-597 doi:10.1002/qj.828.

Ding, D., 2016. Esri ArcGIS Python toolbox for runing RAPID. doi:https://github.com/Esri/python-toolbox-for-rapid.git.

Donat, M. G., A. L. Lowry, L. V. Alexander, P. A. O’Gorman & N. Maher, 2016. More extreme precipitation in the world’s dry and wet regions. Nature Climate Change 6:508 doi:10.1038/nclimate2941. https://www.nature.com/articles/nclimate2941#supplementary-information.

Dubois, G., M. Schulz, J. Skøien, L. Bastin & S. Peedell, 2013. eHabitat, a multi-purpose Web Processing Service for ecological modeling. Environmental Modelling & Software 41:123-133.

E. L. Wipfler, K. M., J. C. van Dam, R. A. Feddes, E. van Meijgaard, L. H. van Ulft, B. van den Hurk,S. J. Zwart4,5, and W. G. M. Bastiaanssen, 2011. Seasonal evaluation of the land surface scheme HTESSEL against remote sensing derived energy fluxes of the Transdanubian region in Hungary. Hydrology and Earth System Sciences 15:15 doi:10.5194/hess-15-1257-2011.

Emerton, R. E., E. M. Stephens, F. Pappenberger, T. C. Pagano, A. H. Weerts, A. W. Wood, P. Salamon, J. D. Brown, N. Hjerdt, C. Donnelly, C. A. Baugh & H. L. Cloke, 2016. Continental and global scale flood forecasting systems. Wiley Interdisciplinary Reviews: Water 3(3):391-418 doi:10.1002/wat2.1137.

Page 116: Advancing the Accessibility, Reusability, and

106

Esri, 2018. WPS services - Documentation | ArcGIS Enterprise. In. http://enterprise.arcgis.com/en/server/latest/publish-services/windows/wps-services.htm Accessed 12-03 2018.

European Environment Agency, 2012. European catchments and Rivers network system (Ecrins). https://www.eea.europa.eu/data-and-maps/data/european-catchments-and-rivers-network.

Feng, M., S. Liu, N. H. Euliss, C. Young & D. M. Mushet, 2011. Prototyping an online wetland ecosystem services model using open model sharing standards. Environmental Modelling & Software 26(4):458-468.

G. Balsamo, C. A., A. Beljaars, S. Boussetta, E. Brun, H. Cloke, D. Dee, E. Dutra, J. Muñoz-Sabater,F. Pappenberger, P. de Rosnay, T. Stockdale, F. Vitart, 2015. ERA-Interim/Land: a global land surface reanalysis data set. Hydrology Earth System Sciences 19:18 doi:10.5194/hess-19-389-2015.

Gao, F., P. Yue, C. Zhang & M. Wang, 2019. Coupling components and services for integrated environmental modelling. Environmental Modelling & Software 118:14-22 doi:https://doi.org/10.1016/j.envsoft.2019.04.003.

Gatlin, P. N., J. L. Case, J. Bell, W. A. Petersen & D. Cecil, 2018. Monitoring Intense Thunderstorms in the Hindu-Kush Himalayan Region. NASA Marshall Space Flight Center; Huntsville, AL, United States.

GeoServer, 2013. GeoServer Web Processing Service Home Page. In. http://docs.geoserver.org/2.3.5/user/extensions/wps/index.html.

Gill, M. A., 1978. Flood routing by the Muskingum method. Journal of Hydrology 36(3):353-363 doi:https://doi.org/10.1016/0022-1694(78)90153-1.

Gong, L., S. Halldin & C.-Y. Xu, 2011. Global-scale river routing—an efficient time-delay algorithm based on HydroSHEDS high-resolution hydrography. Hydrological Processes 25(7):1114-1128 doi:10.1002/hyp.7795.

Goodall, J. L., B. F. Robinson & A. M. Castronova, 2011. Modeling water resource systems using a service-oriented computing paradigm. Environmental Modelling & Software 26(5):573-582 doi:https://doi.org/10.1016/j.envsoft.2010.11.013.

Page 117: Advancing the Accessibility, Reusability, and

107

Granell, C., L. Díaz & M. Gould, 2010. Service-oriented applications for environmental models: Reusable geospatial services. Environmental Modelling & Software 25(2):182-198.

GRASS Development Team, 2018a. Geographic Resources Analysis Support System (GRASS) Software. In. Open Source Geospatial Foundation. https://grass.osgeo.org.

Gregersen, J. B., P. J. A. Gijsbers & S. J. P. Westen, 2007. OpenMI: Open modelling interface. Journal of Hydroinformatics 9(3):175-191 doi:10.2166/hydro.2007.023 %J Journal of Hydroinformatics.

Guy J-P. Schumann, P. D. B., Heiko Apel, Giuseppe T. Aronica, 2018. Global Flood Hazard Mapping, Modeling, and Forecasting: Challenges and Perspectives.

Highsoft Data Visualization Company, 2018. Interactive JavaScript charts for your webpage | HighCharts. In. https://www.highcharts.com/ Accessed 12-03 2018.

Hirabayashi, Y., R. Mahendran, S. Koirala, L. Konoshima, D. Yamazaki, S. Watanabe, H. Kim & S. Kanae, 2013. Global flood risk under climate change. Nature Climate Change 3(9):816-821 doi:10.1038/nclimate1911.

Hirpa, F. A., P. Salamon, H. E. Beck, V. Lorini, L. Alfieri, E. Zsoter & S. J. Dadson, 2018. Calibration of the Global Flood Awareness System (GloFAS) using daily streamflow data. Journal of Hydrology 566:595-606 doi:10.1016/j.jhydrol.2018.09.052.

Horsburgh, J. S., M. M. Morsy, A. M. Castronova, J. L. Goodall, T. Gan, H. Yi, M. J. Stealey & D. G. Tarboton, 2015. Hydroshare: Sharing diverse environmental data types and models as social objects with application to the hydrology domain. JAWRA Journal of the American Water Resources Association.

Hu, Y., X. Cai & B. DuPont, 2015. Design of a web-based application of the coupled multi-agent system model and environmental model for watershed management analysis using Hadoop. Environmental Modelling & Software 70:149-162 doi:https://doi.org/10.1016/j.envsoft.2015.04.011.

Huang, M., D. R. Maidment & Y. Tian, 2011. Using SOA and RIAs for water data discovery and retrieval. Environmental Modelling & Software 26(11):1309-1324 doi:https://doi.org/10.1016/j.envsoft.2011.05.008.

Page 118: Advancing the Accessibility, Reusability, and

108

ILWIS, 2001. Integrated land and water information system user’s guide. International Institute for Aerospace Survey and Earth Sciences (ITC).

Issa, I. E., N. Al-Ansari, G. Sherwany & S. Knutsson, 2017. Evaluation and modification of some empirical and semi-empirical approaches for prediction of area-storage capacity curves in reservoirs of dams. International Journal of Sediment Research 32(1):127-135 doi:https://doi.org/10.1016/j.ijsrc.2015.12.001.

Jackson, E. K., W. Roberts, B. Nelsen, G. P. Williams, E. J. Nelson & D. P. Ames, 2019. Introductory overview: Error metrics for hydrologic modelling – A review of common practices and an open source library to facilitate use and adoption. Environmental Modelling & Software 119:32-48 doi:https://doi.org/10.1016/j.envsoft.2019.05.001.

Jiang, P., M. Elag, P. Kumar, S. D. Peckham, L. Marini & L. Rui, 2017. A service-oriented architecture for coupling web service models using the Basic Model Interface (BMI). Environmental Modelling & Software 92(Supplement C):107-118 doi:https://doi.org/10.1016/j.envsoft.2017.01.021.

Jones, N., J. Nelson, N. Swain, S. Christensen, D. Tarboton & P. Dash, Tethys: A Software Framework for Web-Based Modeling and Decision Support Applications. In: 7th International Congress on Environmental Modelling and Software, DP Ames, NWT Quinn, and AE Rizzoli (Editors) iEMSs, San Diego, California, 2014. p 170-177.

Jonkman, S. N., 2005. Global Perspectives on Loss of Human Life Caused by Floods. Natural Hazards 34(2):151-175 doi:10.1007/s11069-004-8891-3.

Karl Hennermann, P. B., 2018. What are the changes from ERA-Interim to ERA5? In: European Centre for Medium-Range Weather Forecasts. https://confluence.ecmwf.int/pages/viewpage.action?pageId=74764925 2019.

Knapp, A. K., C. Beier, D. D. Briske, A. T. Classen, Y. Luo, M. Reichstein, M. D. Smith, S. D. Smith, J. E. Bell, P. A. Fay, J. L. Heisler, S. W. Leavitt, R. Sherry, B. Smith & E. Weng, 2008. Consequences of More Extreme Precipitation Regimes for Terrestrial Ecosystems. BioScience 58(9):811-821 doi:10.1641/B580908 %J BioScience.

Kousky, C., 2014. Informing climate adaptation: A review of the economic costs of natural disasters. Energy Economics 46:576-592 doi:https://doi.org/10.1016/j.eneco.2013.09.029.

Kralidis, T., 2017. OWSLib 0.16.0 documentation. In. http://geopython.github.io/OWSLib/.

Page 119: Advancing the Accessibility, Reusability, and

109

Kumar, K. & S. Saran, 2014. Web based geoprocessing tool for coverage data handling. The International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences 40(8):1139.

Latt, Z. Z. & H. Wittenberg, 2014. Improving Flood Forecasting in a Developing Country: A Comparative Study of Stepwise Multiple Linear Regression and Artificial Neural Network. Water Resources Management 28(8):2109-2128 doi:10.1007/s11269-014-0600-8.

Lee, K., H. Gao, M. Huang, J. Sheffield & X. Shi, 2017. Development and Application of Improved Long-Term Datasets of Surface Hydrology for Texas %J Advances in Meteorology. 2017:13 doi:10.1155/2017/8485130.

Leff, A. & J. T. Rayfield, Web-application development using the model/view/controller design pattern. In: Enterprise Distributed Object Computing Conference, 2001 EDOC'01 Proceedings Fifth IEEE International, 2001. IEEE, p 118-127.

Lehner, B. & G. Grill, 2013. Global river hydrography and network routing: baseline data and new approaches to study the world's large river systems. Hydrological Processes 27(15):2171-2186 doi:10.1002/hyp.9740.

Leonard, L., C. J. J. E. m. Duffy & software, 2014. Automating data-model workflows at a level 12 HUC scale: Watershed modeling in a distributed computing environment. 61:174-190.

Li, Y., K. Guan, G. D. Schnitkey, E. DeLucia & B. Peng, 2019. Excessive rainfall leads to maize yield loss of a comparable magnitude to extreme drought in the United States. Global Change Biology 25(7):2325-2337 doi:10.1111/gcb.14628.

Lin, C., S. Lu, X. Fei, A. Chebotko, D. Pai, Z. Lai, F. Fotouhi & J. Hua, 2009. A Reference Architecture for Scientific Workflow Management Systems and the VIEW SOA Solution. IEEE Transactions on Services Computing 2(1):79-92 doi:10.1109/TSC.2009.4.

Lin, P., Z.-L. Yang, D. J. Gochis, W. Yu, D. R. Maidment, M. A. Somos-Valenzuela & C. H. David, 2018. Implementation of a vector-based river network routing scheme in the community WRF-Hydro modeling framework for flood discharge simulation. Environmental Modelling & Software 107:1-11 doi:https://doi.org/10.1016/j.envsoft.2018.05.018.

Page 120: Advancing the Accessibility, Reusability, and

110

Lorenz, C., M. J. Tourian, B. Devaraju, N. Sneeuw & H. Kunstmann, 2015. Basin-scale runoff prediction: An Ensemble Kalman Filter framework based on global hydrometeorological data sets. Water Resources Research 51(10):8450-8475 doi:10.1002/2014WR016794.

Luo, W., K. Duffin, E. Peronja, J. Stravers & G. Henry, 2004. A web-based interactive landform simulation model (WILSIM). Computers & Geosciences - COMPUT GEOSCI 30:215-220 doi:10.1016/j.cageo.2004.01.001.

Maidment, D. R., R. P. Hooper, D. G. Tarboton & I. Zaslavsky, 2009. Accessing and Sharing Data Using CUAHSI Water Data Services. In Cluckie, I., et al. (eds) HydrologyHydrogeology and Water Resources. IAHS Publ. 331, Proceedings of Symposium JS4 held in Hyderabad, India, 213-223.

Martelloni, G., S. Segoni, R. Fanti & F. J. L. Catani, 2012. Rainfall thresholds for the forecasting of landslide occurrence at regional scale. 9(4):485-495 doi:10.1007/s10346-011-0308-2.

Mason, S. J. K., S. B. Cleveland, P. Llovet, C. Izurieta & G. C. Poole, 2014. A centralized tool for managing, archiving, and serving point-in-time data in ecological research laboratories. Environmental Modelling & Software 51:59-69 doi:https://doi.org/10.1016/j.envsoft.2013.09.008.

McKay, L., Bondelid, T., Dewald, T., Johnston, J., Moore, R., and Rea, A., 2012. NHDPlus Version 2: User Guide.

Merkel, D., 2014. Docker: lightweight Linux containers for consistent development and deployment. 2014(239 %J Linux J.):Article 2.

MetaCarta, 2018. OpenLayers - Home. In. https://openlayers.org/.

Miao, Q., 2019. Are We Adapting to Floods? Evidence from Global Flooding Fatalities. 39(6):1298-1313 doi:10.1111/risa.13245.

Michaelis, C. D. & D. P. Ames, 2009. Evaluation and implementation of the OGC web processing service for use in client-side GIS. Geoinformatica 13(1):109-120.

Min, S.-K., X. Zhang, F. W. Zwiers & G. C. Hegerl, 2011. Human contribution to more-intense precipitation extremes. Nature 470(7334):378-381 doi:10.1038/nature09763.

Page 121: Advancing the Accessibility, Reusability, and

111

Mizukami, N., M. P. Clark, K. Sampson, B. Nijssen, Y. Mao, H. McMillan, R. J. Viger, S. L. Markstrom, L. E. Hay, R. Woods, J. R. Arnold & L. D. Brekke, 2016. mizuRoute version 1: a river network routing tool for a continental domain water resources applications. Geosci Model Dev 9(6):2223-2238 doi:10.5194/gmd-9-2223-2016.

Molteni, F., R. Buizza, T. N. Palmer & T. Petroliagis, 1996. The ECMWF Ensemble Prediction System: Methodology and validation. Quarterly Journal of the Royal Meteorological Society 122(529):73-119 doi:10.1002/qj.49712252905.

Müller, M., 2007. deegree – Building Blocks for Spatial Data. OSGeo Journal 1:4.

Nativi, S., P. Mazzetti & G. N. Geller, 2013. Environmental model access and interoperability: The GEO Model Web initiative. Environmental Modelling & Software 39(Supplement C):214-228 doi:https://doi.org/10.1016/j.envsoft.2012.03.007.

Neteler, M., D. E. Beaudette, P. Cavallini, L. Lami & J. Cepicky, 2008. GRASS GIS. In Hall, G. B. & M. G. Leahy (eds) Open Source Approaches in Spatial Data Handling. Springer Berlin Heidelberg, Berlin, Heidelberg, 171-199.

Nino-Ruiz, M., I. Bishop & C. Pettit, 2013. Spatial model steering, an exploratory approach to uncertainty awareness in land use allocation. Environmental Modelling & Software 39:70-80 doi:https://doi.org/10.1016/j.envsoft.2012.06.009.

Palomo, I., M. Adamescu, K. J. Bagstad, C. Cazacu, H. Klug & S. Nedkov, 2017. Tools for mapping ecosystem services.

Peckham, S. D., E. W. H. Hutton & B. Norris, 2013. A component-based approach to integrated modeling in the geosciences: The design of CSDMS. Computers & Geosciences 53:3-12 doi:https://doi.org/10.1016/j.cageo.2012.04.002.

Pitman, A. J., 2003. The evolution of, and revolution in, land surface schemes designed for climate models. International Journal of Climatology 23(5):479-510 doi:10.1002/joc.893.

PostGIS Project Steering Committee (PSC), 2018. PostGIS - Spatial and Geographic objects for PostgreSQL. In. https://postgis.net/.

PostGIS Project Steering Committee (PSC), 2018. PostgreSQL: The World's Most Advanced Open Source Database. In. https://www.postgresql.org/ Accessed 12-03 2018.

Page 122: Advancing the Accessibility, Reusability, and

112

PyWPS, 2009. Python Web Processing Service (PyWPS). In. http://pywps.org.

QGIS Development Team, 2018b. QGIS Geographic Information System. In. http://qgis.osgeo.org.

Qiao, X., Z. Li, D. P. Ames, E. J. Nelson & N. R. Swain, 2019a. Simplifying the deployment of OGC web processing services (WPS) for environmental modelling – Introducing Tethys WPS Server. Environmental Modelling & Software 115:38-50 doi:https://doi.org/10.1016/j.envsoft.2019.01.021.

Qiao, X., E. J. Nelson, D. P. Ames, Z. Li, C. H. David, G. P. Williams, W. Roberts, J. L. Sánchez Lozano, C. Edwards, M. Souffront & M. A. Matin, 2019b. A systems approach to routing global gridded runoff through local high-resolution stream networks for flood early warning systems. Environmental Modelling & Software 120:104501 doi:10.1016/j.envsoft.2019.104501.

Rajib, M. A., V. Merwade, I. L. Kim, L. Zhao, C. Song & S. Zhe, 2016. SWATShare – A web platform for collaborative research and education through online sharing, simulation and visualization of SWAT models. Environmental Modelling & Software 75:498-512 doi:https://doi.org/10.1016/j.envsoft.2015.10.032.

Roberts, W., G. P. Williams, E. Jackson, E. J. Nelson & D. P. Ames, 2018. Hydrostats: A Python Package for Characterizing Errors between Observed and Predicted Time Series. Hydrology 5(66).

Rockel, B., A. Will & A. Hense, 2008. The regional climate model COSMO-CLM (CCLM), vol 17.

Rose, J. B., S. Daeschner, D. R. Easterling, F. C. Curriero, S. Lele & J. A. Patz, 2000. Climate and waterborne disease outbreaks. 92(9):77-87 doi:10.1002/j.1551-8833.2000.tb09006.x.

Schaeffer, B., 2008. Towards a transactional web processing service. Proceedings of the GI-Days, Münster.

Schiermeier, Q., 2011. Increased flood risk linked to global warming. Nature 470(7334):316-316 doi:10.1038/470316a.

Schut, P. & A. Whiteside, 2007. OpenGIS web processing service. OGC project document.

Page 123: Advancing the Accessibility, Reusability, and

113

Shaad, K., 2018. Evolution of river-routing schemes in macro-scale models and their potential for watershed management. Hydrological Sciences Journal 63(7):1062-1077 doi:10.1080/02626667.2018.1473871.

Shaw, D., L. W. Martz & A. Pietroniro, 2005. Flow routing in large-scale models using vector addition. Journal of Hydrology 307(1):38-47 doi:https://doi.org/10.1016/j.jhydrol.2004.09.019.

Singh, R. S., J. T. Reager, N. L. Miller & J. S. Famiglietti, 2015. Toward hyper-resolution land-surface modeling: The effects of fine-scale topography and soil texture on CLM4.0 simulations over the Southwestern U.S. Water Resources Research 51(4):2648-2667 doi:10.1002/2014WR015686.

Skøien, J. O., M. Schulz, G. Dubois, I. Fisher, M. Balman, I. May & É. Ó. Tuama, 2013. A Model Web approach to modelling climate change in biomes of Important Bird Areas. Ecological informatics 14:38-43.

Snow, A. D., S. D. Christensen, J. W. Lewis, S. McDonald & T. Whitaker, 2017. erdc/RAPIDpy: 2.6.0 (Version 2.6.0). doi:Zenodo. http://doi.org/10.5281/zenodo.941990.

Snow, A. D., S. D. Christensen, N. R. Swain, E. J. Nelson, D. P. Ames, N. L. Jones, D. Ding, N. S. Noman, C. H. David, F. Pappenberger & E. Zsoter, 2016. A High-Resolution National-Scale Hydrologic Forecast System from a Global Ensemble Land Surface Model. JAWRA Journal of the American Water Resources Association 52(4):950-964 doi:10.1111/1752-1688.12434.

Solem, A., 2007. Celery: Distributed Task Queue. In. http://www.celeryproject.org/ 2018.

Sood, A. & V. Smakhtin, 2015. Global hydrological models: a review. Hydrological Sciences Journal 60(4):549-565 doi:10.1080/02626667.2014.950580.

Swain, N. R., 2015. Tethys Platform: A Development and Hosting Platform for Water Resources Web Apps. Brigham Young University.

Swain, N. R., S. D. Christensen, A. D. Snow, H. Dolder, G. Espinoza-Dávalos, E. Goharian, N. L. Jones, E. J. Nelson, D. P. Ames & S. J. Burian, 2016. A new open source platform for lowering the barrier for environmental web app development. Environmental Modelling & Software 85:11-26.

Page 124: Advancing the Accessibility, Reusability, and

114

Tan, X., L. Di, M. Deng, J. Fu, G. Shao, M. Gao, Z. Sun, X. Ye, Z. Sha & B. Jin, 2015. Building an Elastic Parallel OGC Web Processing Service on a Cloud-Based Cluster: A Case Study of Remote Sensing Data Processing Service. Sustainability 7(10):14245-14258 doi:10.3390/su71014245.

Tavakoly, A. A., A. D. Snow, C. H. David, M. L. Follum, D. R. Maidment & Z.-L. Yang, 2017. Continental-Scale River Flow Modeling of the Mississippi River Basin Using High-Resolution NHDPlus Dataset. JAWRA Journal of the American Water Resources Association 53(2):258-279 doi:10.1111/1752-1688.12456.

Taylor, C. M., D. Belušić, F. Guichard, D. J. Parker, T. Vischel, O. Bock, P. P. Harris, S. Janicot, C. Klein & G. Panthou, 2017. Frequency of extreme Sahelian storms tripled since 1982 in satellite observations. Nature 544:475 doi:10.1038/nature22069.

Tu, H., X. Wang, W. Zhang, H. Peng, Q. Ke & X. Chen, 2020. Flash Flood Early Warning Coupled with Hydrological Simulation and the Rising Rate of the Flood Stage in a Mountainous Small Watershed in Sichuan Province, China. 12(1):255.

Tung, Y. K., 1985. River Flood Routing by Nonlinear Muskingum Method. Journal of Hydraulic Engineering 111(12):1447-1460 doi:10.1061/(ASCE)0733-9429(1985)111:12(1447).

UW-Madison, C. T. a., 2018. HTCondor - Home. In. https://research.cs.wisc.edu/htcondor/.

Vannote, R. L., G. W. Minshall, K. W. Cummins, J. R. Sedell & C. E. Cushing, 1980. The River Continuum Concept. Canadian Journal of Fisheries and Aquatic Sciences 37(1):130-137 doi:10.1139/f80-017.

Vitolo, C., W. Buytaert & D. Reusser, Hydrological Models as Web Services: An Implementation Using OGC Standards. In: Proceedings of the 10th International Conference on Hydroinformatics, HIC, Hamburg, Germany, 2012.

Vitolo, C., Y. Elkhatib, D. Reusser, C. J. A. Macleod & W. Buytaert, 2015. Web technologies for environmental Big Data. Environmental Modelling & Software 63:185-198 doi:https://doi.org/10.1016/j.envsoft.2014.10.007.

Voinov, A. & C. Cerco, 2010. Model integration and the role of data. Environmental Modelling & Software 25(8):965-969 doi:https://doi.org/10.1016/j.envsoft.2010.02.005.

Page 125: Advancing the Accessibility, Reusability, and

115

Ward, P. J., B. Jongman, P. Salamon, A. Simpson, P. Bates, T. De Groeve, S. Muis, E. C. de Perez, R. Rudari, M. A. Trigg & H. C. Winsemius, 2015. Usefulness and limitations of global flood risk models. Nature Climate Change 5:712 doi:10.1038/nclimate2742.

Werkzeug, 2011. Werkzeug - The Python WSGI Utility Library. In. http://werkzeug.pocoo.org/.

Westra, S., H. J. Fowler, J. P. Evans, L. V. Alexander, P. Berg, F. Johnson, E. J. Kendon, G. Lenderink & N. M. Roberts, 2014. Future changes to the intensity and frequency of short-duration extreme rainfall. 52(3):522-555 doi:10.1002/2014rg000464.

Wikipedia, 2019. Floods in Bangladesh. In: Wikipedia. The Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=Floods_in_Bangladesh&oldid=920590097 Accessed October 11 2019.

Wu, H., R. F. Adler, Y. Hong, Y. Tian & F. Policelli, 2012. Evaluation of Global Flood Detection Using Satellite-Based Rainfall and a Hydrologic Model. Journal of Hydrometeorology 13(4):1268-1284 doi:10.1175/jhm-d-11-087.1.

Wu, H., R. F. Adler, Y. Tian, G. J. Huffman, H. Li & J. Wang, 2014. Real-time global flood estimation using satellite-based precipitation and a coupled land surface and routing model. Water Resources Research 50(3):2693-2717 doi:10.1002/2013WR014710.

Xia, Y., K. Mitchell, M. Ek, J. Sheffield, B. Cosgrove, E. Wood, L. Luo, C. Alonge, H. Wei, J. Meng, B. Livneh, D. Lettenmaier, V. Koren, Q. Duan, K. Mo, Y. Fan & D. Mocko, 2012. Continental-scale water and energy flux analysis and validation for the North American Land Data Assimilation System project phase 2 (NLDAS-2): 1. Intercomparison and application of model products. Journal of Geophysical Research: Atmospheres 117(D3) doi:10.1029/2011jd016048.

Xiao, D., M. Chen, Y. Lu, S. Yue & T. Hou, 2019. Research on the Construction Method of the Service-Oriented Web-SWMM System. ISPRS International Journal of Geo-Information 8(6):268.

Yamazaki, D., G. A. M. de Almeida & P. D. Bates, 2013. Improving computational efficiency in global river models by implementing the local inertial flow equation and a vector-based river network map. Water Resources Research 49(11):7221-7235 doi:10.1002/wrcr.20552.

Page 126: Advancing the Accessibility, Reusability, and

116

Yang, C.-T., G.-L. Huang, F.-H. Huang, T.-W. Ho, P.-H. Huang & J.-M. Yang, Soa-based platform for water resource information exchanging. In: 2009 17th International Conference on Geoinformatics, 2009. IEEE, p 1-4.

Yin, D., Y. Liu, A. Padmanabhan, J. Terstriep, J. Rush & S. Wang, 2017. A CyberGIS-Jupyter Framework for Geospatial Analytics at Scale. Paper presented at the Proceedings of the Practice and Experience in Advanced Research Computing 2017 on Sustainability, Success and Impact, New Orleans, LA, USA.

Younis, J. & A. P. J. De Roo, 2010. LISFLOOD: a GIS‐based distributed model for river basin scale water balance and flood simulation AU - Van Der Knijff, J. M. International Journal of Geographical Information Science 24(2):189-212 doi:10.1080/13658810802549154.

Yue, P., M. Zhang, Z. J. E. M. Tan & Software, 2015. A geoprocessing workflow system for environmental monitoring and integrated modelling. 69:128-140.

Yue, S., M. Chen, Y. Wen & G. Lu, 2016. Service-oriented model-encapsulation strategy for sharing and integrating heterogeneous geo-analysis models in an open web environment. ISPRS Journal of Photogrammetry and Remote Sensing 114:258-273.

Zhang, F., M. Chen, D. P. Ames, C. Shen, S. Yue, Y. Wen & G. Lü, 2019. Design and development of a service-oriented wrapper system for sharing and reusing distributed geoanalysis models on the web. Environmental Modelling & Software 111:498-509 doi:https://doi.org/10.1016/j.envsoft.2018.11.002.

Zhang, Q., X. Gu, V. P. Singh, L. Liu & D. Kong, 2016. Flood-induced agricultural loss across China and impacts from climate indices. Global and Planetary Change 139:31-43 doi:https://doi.org/10.1016/j.gloplacha.2015.10.006.

Zhao, J., J. M. Gomez-Perez, K. Belhajjame, G. Klyne, E. Garcia-Cuesta, A. Garrido, K. Hettne, M. Roos, D. D. Roure & C. Goble, Why workflows break — Understanding and combating decay in Taverna workflows. In: 2012 IEEE 8th International Conference on E-Science, 8-12 Oct. 2012 2012a. p 1-9.

Zhao, P., T. Foerster & P. Yue, 2012b. The geoprocessing web. Computers & Geosciences 47:3-12.

Page 127: Advancing the Accessibility, Reusability, and

117

APPENDIX A. SOFTWARE AVAILABILITY

1. New Software Developed in the Dissertation

Chapter 2:

Extended RAPIDpy

o Source code: https://github.com/BYU-Hydroinformatics/RAPIDpy.git.

Chapter 3:

Streamflow prediction service

o http://cosmo.byu.edu:8060/modelser/preparation/5deef20a230f6b6e9bbae443

Streamflow prediction service encapsulation package for OGMS-MS:

o https://github.com/xhqiao89/rapidpy_docker_opengms

Dockerized streamflow prediction system:

o https://hub.docker.com/r/xhqiao89/rapidpy

OpenGMS service client SDK Python version example:

o https://github.com/xhqiao89/OpenGMS_SDK_Python

Chapter 4:

Tethys WPS Server

o Source code: https://github.com/tethysplatform/tethys/tree/tethys_wps

o Web Page: http://cosmo.byu.edu/tethys_wps/?service=WPS&request=GetCapabilities

Page 128: Advancing the Accessibility, Reusability, and

118

o License: The BSD 2-Clause open source license

Tethys WPS Server case study apps

o http://cosmo.byu.edu/apps/. (username : demo; password : demo)

o The source code for these three web apps are available on GitHub in the following

repositories:

o Watershed Delineation for Dominican Republic App:

https://github.com/xhqiao89/watershed_delineation_app

o Reservoir Calculation for Dominican Republic App:

https://github.com/xhqiao89/reservoir_calculation_app

o HydroProspector for Dominican Republic App:

https://github.com/xhqiao89/hydroprospector_app

2. Third-party Software Used in the Dissertation

Tethys Platform

o Source code: https://github.com/tethysplatform/tethys

o Home page: http://www.tethysplatform.org/

o License: The BSD 2-Clause open source license

PyWPS

o Source code: https://github.com/geopython/pywps

o Home page: http://pywps.org/

o License: The MIT license

ArcGIS RAPID preprocessing toolbox

o Developed by Esri:

o Source code: https://github.com/Esri/python-toolbox-for-rapid.git.

Page 129: Advancing the Accessibility, Reusability, and

119

RAPIDpy

o Developed by U.S. Army Engineer Research and Development Center

o Source code: https://github.com/erdc/RAPIDpy.git.

Global “Streamflow Prediction Tool” web app

o https://tethys.byu.edu/apps/streamflow-prediction-tool/ (username : demo; password :

demo).

OGMS-MS source code:

o https://gitee.com/geomodeling/GeoModelServiceContainer

OpenGMS service client SDK:

o Source code: https://gitee.com/geomodeling/NJModelServiceSDK