Péter Kacsuk (Editor)
Science Gateways for Distributed Computing Infrastructures: Development Framework and Exploitation by Scientific User Communities

    Editor
    Péter Kacsuk
    Laboratory of Parallel and Distributed Systems
    Hungarian Academy of Sciences (MTA)
    Budapest, Hungary

    ISBN 978-3-319-11267-1
    ISBN 978-3-319-11268-8 (eBook)
    DOI 10.1007/978-3-319-11268-8

    Library of Congress Control Number: 2014951697

    Springer Cham Heidelberg New York Dordrecht London

    © Springer International Publishing Switzerland 2014

    This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher's location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

    The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

    While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

    Printed on acid-free paper

    Springer is part of Springer Science+Business Media (www.springer.com)

    Acknowledgments

    I would like to say thanks to all the SCI-BUS partners, subcontractors and associated members for their constant feedback, by which they helped with the continuous improvement of the SCI-BUS technology. I owe special thanks to Tibor Gottdank for his help in editing the book.

    Peter Kacsuk
    Coordinator of SCI-BUS

    SCI-BUS project: Chaps. 1–19
    This work is financially supported by the SCI-BUS project funded by the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 283481.

    ER-flow project: Chaps. 5, 9, 10, 11, and 18
    The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 312579 (ER-Flow).

    agINFRA project: Chap. 17
    The agINFRA section, which is supported by EU FP7-Infrastructures agINFRA project no. 283770, was written in close collaboration with Charalampos Thanopoulos, Nikos Manolis, and Andreas Drakos from AgroKnow and Valeria Pesce from FAO. The authors would like to say a special thank you to the project coordinator, Miguel-Angel Sicilia.

    CloudSME project: Chaps. 5 and 19
    The research leading to these results received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 608886 (CloudSME).

    SHIWA project: Chap. 9
    The development of the Coarse-Grained Interoperability concept and the SHIWA Simulation Platform was supported by the EU-funded FP7 Sharing Interoperable Workflows for Large-Scale Scientific Simulations on Available DCIs (SHIWA) project (grant no. 261585).

    VIALACTEA project: Chap. 5
    The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 607380 (VIALACTEA).

    Further Acknowledgments

    Chapter 10: We thank the colleagues who participated in the development, deployment and testing of the AMC-NSG: Paul F.C. Groot, Hurng-Chun Lee, Mostapha al Mourabit, Mark Santcroos, Gerbrand Spaans and Jalmar Teeuw.

    This work is financially supported by the COMMIT project "e-Biobanking with imaging for healthcare", funded by the Nederlandse Organisatie voor Wetenschappelijk Onderzoek (Netherlands Organization for Scientific Research, NWO), and the HPCN UvA project "Computational Neuroscience Gateway", funded by the University of Amsterdam.

    Chapter 11: The authors would like to thank the German Federal Ministry of Education and Research (BMBF) for the opportunity to do research in the MoSGrid project (reference 01IG09006).

    The research leading to these results has also partially been supported by the LSDMA project of the Helmholtz Association of German Research Centers. Special thanks are due to NGI-DE for managing the German Grid infrastructure and all compute centres supporting MoSGrid.

    Chapter 14: The authors are particularly grateful to Prof. Peter Gallagher and his team for the invaluable help and support.

    Chapter 15: This work was supported in part by the Ministry of Education, Science and Technological Development of the Republic of Serbia under projects ON171017, III43007 and NAI-DBEC; by the German Academic Exchange Service (DAAD) under project NAI-DBEC; and by the European Commission under EU FP7 projects SCI-BUS (grant no. 283481), PRACE-3IP (grant no. 312763), HP-SEE (grant no. 261499) and EGI-InSPIRE (grant no. 261323).

    Chapter 17: The authors of the VERCE science gateway section, whose work is supported by the EU FP7-Infrastructures VERCE project (no. 283543), represent a larger team, including Emanuele Cesarotti, Claudia Ramos Garcia, Leong Siew Hoon, Amrey Krause, Lion Krischer, Federica Magnoni, Jonas Matser and Visakh Muraleedharan, who contributed to the implementation.

    The authors of the DRIHM Gateway section, which is supported by EU FP7-Infrastructures DRIHM (project no. 28356), are only representatives of a larger team composed of Andrea Clematis, Antonella Galizia, Alfonso Quarati, Luca Roverelli and Gabriele Zereik from CNR-IMATI; and Dieter Kranzlmüller, Nils Gentschen Felde and Christian Straube from LMU. They would like to say a special thank you to the project coordinator, Antonio Parodi.

    Contents

    Part I WS-PGRADE/gUSE Science Gateway Framework

    1 Introduction to Science Gateways and Science Gateway Frameworks
    Péter Kacsuk

    2 Introduction to the WS-PGRADE/gUSE Science Gateway Framework
    Tibor Gottdank

    3 Workflow Concept of WS-PGRADE/gUSE
    Ákos Balaskó

    4 DCI Bridge: Executing WS-PGRADE Workflows in Distributed Computing Infrastructures
    Miklos Kozlovszky, Krisztián Karóczkai, István Márton, Péter Kacsuk and Tibor Gottdank

    5 Remote Storage Resource Management in WS-PGRADE/gUSE
    Ákos Hajnal, Zoltán Farkas, Péter Kacsuk and Tamás Pintér

    6 WS-PGRADE/gUSE Security
    Zoltán Farkas

    7 WS-PGRADE/gUSE and Clouds
    Zoltán Farkas, Ákos Hajnal and Péter Kacsuk

    8 Developing Science Gateways at Various Levels of Granularity Using WS-PGRADE/gUSE
    Tamás Kiss, Gábor Terstyánszky, Péter Borsody, Péter Kacsuk and Ákos Balaskó

    9 Sharing Science Gateway Artefacts Through Repositories
    Gábor Terstyánszky, Edward Michniak, Tamás Kiss and Ákos Balaskó

    Part II Domain-Specific Science Gateways Customized from the WS-PGRADE/gUSE Framework

    10 Computational Neuroscience Gateway: A Science Gateway Based on the WS-PGRADE/gUSE
    Shayan Shahand, Mohammad Mahdi Jaghoori, Ammar Benabdelkader, Juan Luis Font-Calvo, Jordi Huguet, Matthan W.A. Caan, Antoine H.C. van Kampen and Sílvia D. Olabarriaga

    11 Molecular Simulation Grid (MoSGrid): A Science Gateway Tailored to the Molecular Simulation Community
    Sandra Gesing, Jens Krüger, Richard Grunzke, Luis de la Garza, Sonja Herres-Pawlis and Alexander Hoffmann

    12 Statistical Seismology Science Gateway
    Çelebi Kocair, Cevat Şener and Ayşen D. Akkaya

    13 VisIVO Gateway and VisIVO Mobile for the Astrophysics Community
    Eva Sciacca, Fabio Vitello, Ugo Becciani, Alessandro Costa and Piero Massimino

    14 HELIOGate, a Portal for the Heliophysics Community
    Gabriele Pierantoni and Eoin Carley

    15 Science Gateway for the Serbian Condensed Matter Physics Community
    Dušan Vudragović and Antun Balaž

    Part III Further Applications of WS-PGRADE/gUSE

    16 WS-PGRADE/gUSE-Based Science Gateways in Teaching
    Sílvia Delgado Olabarriaga, Ammar Benabdelkader, Matthan W.A. Caan, Mohammad Mahdi Jaghoori, Jens Krüger, Luis de la Garza, Christopher Mohr, Benjamin Schubert, Anatoli Danezi and Tamas Kiss

    17 WS-PGRADE/gUSE in European Projects
    Tamás Kiss, Péter Kacsuk, Róbert Lovas, Ákos Balaskó, Alessandro Spinuso, Malcolm Atkinson, Daniele D'Agostino, Emanuele Danovaro and Michael Schiffers

    18 Creating Gateway Alliances Using WS-PGRADE/gUSE
    Ugo Becciani, Eva Sciacca, Alessandro Costa, Piero Massimino, Fabio Vitello, Santi Cassisi, Adriano Pietrinferni, Giuliano Castelli, Cristina Knapic, Riccardo Smareglia, Giuliano Taffoni, Claudio Vuerli, Marian Jakubik, Lubos Neslusan, Mel Krokos and Gong-Bo Zhao

    19 Commercial Use of WS-PGRADE/gUSE
    Tamás Kiss, Péter Kacsuk, Éva Takács, Áron Szabó, Péter Tihanyi and Simon J.E. Taylor

    Conclusions and Outlook

    References

    Contributors

    Ayşen D. Akkaya Middle East Technical University, Ankara, Turkey

    Malcolm Atkinson School of Informatics, Data-Intensive Research Group, University of Edinburgh, Edinburgh, Scotland

    Ákos Balaskó Laboratory of Parallel and Distributed Systems, Institute for Computer Science and Control, Hungarian Academy of Sciences, Budapest, Hungary

    Antun Balaž Scientific Computing Laboratory, Institute of Physics Belgrade, University of Belgrade, Belgrade, Serbia

    Ugo Becciani Astrophysical Observatory of Catania, National Institute for Astrophysics (INAF), Catania, Italy

    Ammar Benabdelkader Academic Medical Centre of the University of Amsterdam, Amsterdam, The Netherlands

    Péter Borsody Centre for Parallel Computing, University of Westminster, London, UK

    Matthan W.A. Caan Academic Medical Centre of the University of Amsterdam, Amsterdam, The Netherlands

    Eoin Carley School of Computer Science and Statistics and School of Physics, Trinity College Dublin, Dublin, Ireland

    Santi Cassisi Astronomical Observatory of Teramo, National Institute for Astrophysics (INAF), Teramo, Italy

    Giuliano Castelli Astronomical Observatory of Trieste, National Institute for Astrophysics (INAF), Trieste, Italy

    Alessandro Costa Astrophysical Observatory of Catania, National Institute for Astrophysics (INAF), Catania, Italy

    Daniele D'Agostino Consiglio Nazionale delle Ricerche (CNR-IMATI), Genoa, Italy

    Anatoli Danezi SURFsara, Amsterdam, The Netherlands

    Emanuele Danovaro Consiglio Nazionale delle Ricerche (CNR-IMATI), Genoa, Italy

    Luis de la Garza Applied Bioinformatics Group, University of Tübingen, Tübingen, Germany

    Zoltán Farkas Laboratory of Parallel and Distributed Systems, Institute for Computer Science and Control, Hungarian Academy of Sciences, Budapest, Hungary

    Juan Luis Font-Calvo Academic Medical Centre of the University of Amsterdam, Amsterdam, The Netherlands

    Sandra Gesing Center for Research Computing, Information Technology Center, University of Notre Dame, Notre Dame, USA

    Tibor Gottdank Laboratory of Parallel and Distributed Systems, Institute for Computer Science and Control, Hungarian Academy of Sciences, Budapest, Hungary

    Richard Grunzke Technische Universität Dresden, Dresden, Germany

    Ákos Hajnal Laboratory of Parallel and Distributed Systems, Institute for Computer Science and Control, Hungarian Academy of Sciences, Budapest, Hungary

    Sonja Herres-Pawlis Ludwig-Maximilians-Universität München, Munich, Germany

    Alexander Hoffmann Ludwig-Maximilians-Universität München, Munich, Germany

    Jordi Huguet Academic Medical Centre of the University of Amsterdam, Amsterdam, The Netherlands

    Mohammad Mahdi Jaghoori Academic Medical Centre of the University of Amsterdam, Amsterdam, The Netherlands

    Marian Jakubik Astronomical Institute, Slovak Academy of Sciences, Tatranska Lomnica, The Slovak Republic

    Péter Kacsuk Laboratory of Parallel and Distributed Systems, Institute for Computer Science and Control, Hungarian Academy of Sciences, Budapest, Hungary; Centre for Parallel Computing, University of Westminster, London, UK

    Krisztián Karóczkai Laboratory of Parallel and Distributed Systems, Institute for Computer Science and Control, Hungarian Academy of Sciences, Budapest, Hungary

    Tamás Kiss Centre for Parallel Computing, University of Westminster, London, UK

    Cristina Knapic Astronomical Observatory of Trieste, National Institute for Astrophysics (INAF), Trieste, Italy

    Çelebi Kocair Middle East Technical University, Ankara, Turkey

    Miklos Kozlovszky Laboratory of Parallel and Distributed Systems, Institute for Computer Science and Control, Hungarian Academy of Sciences, Budapest, Hungary

    Mel Krokos University of Portsmouth, Portsmouth, UK

    Jens Krüger Applied Bioinformatics Group, University of Tübingen, Tübingen, Germany

    Róbert Lovas Laboratory of Parallel and Distributed Systems, Institute for Computer Science and Control, Hungarian Academy of Sciences, Budapest, Hungary

    István Márton Laboratory of Parallel and Distributed Systems, Institute for Computer Science and Control, Hungarian Academy of Sciences, Budapest, Hungary

    Piero Massimino Astrophysical Observatory of Catania, National Institute for Astrophysics (INAF), Catania, Italy

    Edward Michniak Centre for Parallel Computing, University of Westminster, London, UK

    Christopher Mohr Applied Bioinformatics Group, University of Tübingen, Tübingen, Germany

    Lubos Neslusan Astronomical Institute, Slovak Academy of Sciences, Tatranska Lomnica, The Slovak Republic

    Sílvia Delgado Olabarriaga Academic Medical Centre of the University of Amsterdam, Amsterdam, The Netherlands

    Gabriele Pierantoni School of Computer Science and Statistics and School of Physics, Trinity College Dublin, Dublin, Ireland

    Adriano Pietrinferni Astronomical Observatory of Teramo, National Institute for Astrophysics (INAF), Teramo, Italy

    Tamás Pintér Laboratory of Parallel and Distributed Systems, Institute for Computer Science and Control, Hungarian Academy of Sciences, Budapest, Hungary

    Michael Schiffers Ludwig-Maximilians-Universität München, Munich, Germany

    Benjamin Schubert Applied Bioinformatics Group, University of Tübingen, Tübingen, Germany

    Eva Sciacca Astrophysical Observatory of Catania, National Institute for Astrophysics (INAF), Catania, Italy

    Cevat Şener Middle East Technical University, Ankara, Turkey

    Shayan Shahand Academic Medical Centre of the University of Amsterdam, Amsterdam, The Netherlands

    Riccardo Smareglia Astronomical Observatory of Trieste, National Institute for Astrophysics (INAF), Trieste, Italy

    Alessandro Spinuso R&D, Koninklijk Nederlands Meteorologisch Instituut (KNMI), De Bilt, The Netherlands

    Áron Szabó E-Group ICT Software Zrt., Budapest, Hungary

    Giuliano Taffoni Astronomical Observatory of Trieste, National Institute for Astrophysics (INAF), Trieste, Italy

    Éva Takács 4D Soft Kft., Budapest, Hungary

    Simon J.E. Taylor Brunel University, London, UK

    Gábor Terstyánszky Centre for Parallel Computing, University of Westminster, London, UK

    Péter Tihanyi E-Group ICT Software Zrt., Budapest, Hungary

    Antoine H.C. van Kampen Academic Medical Centre of the University of Amsterdam, Amsterdam, The Netherlands

    Fabio Vitello Astrophysical Observatory of Catania, National Institute for Astrophysics (INAF), Catania, Italy

    Dušan Vudragović Scientific Computing Laboratory, Institute of Physics Belgrade, University of Belgrade, Belgrade, Serbia

    Claudio Vuerli Astronomical Observatory of Trieste, National Institute for Astrophysics (INAF), Trieste, Italy

    Gong-Bo Zhao University of Portsmouth, Portsmouth, UK

    Abbreviations

    A&A Astronomy and astrophysics
    AEGIS Academic and Educational Grid Initiative of Serbia
    agINFRA Agricultural data infrastructure
    API Application programming interface
    ASM Application-specific module
    BDII Berkeley Database Information Index
    BEDPOSTX Bayesian estimation of diffusion parameters obtained using sampling techniques for modelling crossing fibres
    BES Basic execution service
    BFT Basic file transfer
    BOINC Berkeley Open Infrastructure for Network Computing
    BTSWD Block task to subworkflow decomposition pattern
    CA Certificate authority
    CADDSuite Computer-aided drug design suite
    CDK Chemistry Development Kit
    CE Computing element
    CGI Coarse-grained interoperability
    ChartEx Charter Excavator
    CIARD Coherence in Information for Agricultural Research
    CloudSME Cloud-based Simulation platform for Manufacturing and Engineering
    CMB Cosmic microwave background
    CML Chemical markup language
    CMPC Condensed matter physics community
    COMCAPT Comets capture
    CRL Certificate revocation list
    DAG Directed acyclic graph
    DCI Distributed computing infrastructure
    DFT Density functional theory
    DPM Damage probability matrix
    DRIHM Distributed research infrastructure for hydro-meteorology
    DT Data transport

    DTI Diffusion tensor imaging
    EADR Expected annual damage ratio
    EC2 Elastic Compute Cloud
    EDGeS Enabling Desktop Grids for e-Science
    eDOX Document and records management system developed by E-Group
    EFEHR European Facility for Earthquake Hazard and Risk
    EGI European grid infrastructure/initiative
    EMI European middleware initiative
    EoS Equation of state
    ER-flow Building an European Research Community through Interoperable Workflows and Data
    ESA European Space Agency
    FAO Food and Agriculture Organization of the United Nations
    FDSN International Federation of Digital Seismograph Networks
    FIR Full isochrone run
    FMIT From multiple instance task pattern
    FRANEC Frascati Raphson Newton evolutionary code
    FUSE Filesystem in userspace
    GEMLCA Grid execution management for legacy code applications
    gLite Lightweight middleware for grid computing
    GMBS Generic Metabroker Service
    GSISSH Grid Security Infrastructure Secure Shell
    GT5 Globus Toolkit version 5
    GUI Graphical user interface
    gUSE Grid and Cloud User Support Environment
    GWT Google Web Toolkit
    HMR Hydro-meteorological research
    HPC High performance computing
    HP-SEE High-Performance Computing Infrastructure for South East Europe's Research Communities
    IDB Incarnation data base
    IPDA International Planetary Data Alliance
    iRODS Integrated rule-oriented data system
    IS Information system
    IVOA International Virtual Observatory Alliance
    JDL Job description language
    JSDL Job submission description language
    JSON JavaScript object notation
    KVM Kernel-based virtual machine
    LaSMoG Large Simulation for Modified Gravity
    LCG Large hadron collider computing grid
    LFC LCG file catalogue
    MD Molecular dynamics
    Meso-NH Non-hydrostatic mesoscale atmospheric model

    MESTREAM Meteoroid stream
    MIReG Management Information Resources for eGovernment
    MOOC Massive open online course
    MoSGrid Molecular simulation grid
    MPI Message passing interface
    MRC Metadata and replica catalog
    MRI Magnetic resonance imaging
    MSML Molecular simulation markup language
    NGI National Grid Initiative
    NMI National science foundation Middleware Initiative
    NSG Neuroscience gateway
    NWP Numerical weather prediction
    OCR Optical character recognition
    OCSP Online Certificate Status Protocol
    OGSA Open Grid Services Architecture
    ORFEUS Observatories and Research Facilities for European Seismology
    OSD Object storage device
    OVA Open virtual appliance or application
    PaaS Platform as a service
    PBS Portable Batch System
    PDB Protein data bank
    PE Processing element
    PM Processing manager
    POSIX Portable operating system interface
    PRACE Partnership for Advanced Computing in Europe
    PROVman Provenance manager
    PS Parameter sweep/study
    PSHA Probabilistic seismic hazard assessment
    QC Quantum chemical
    QDR Quad data rate
    RA Registration authority
    Regexp Regular expression
    REST Representational state transfer
    RIBS Real-time interactive basin simulator
    RING Routemap to information nodes and gateways
    S3 Simple storage service
    SaaS Software as a service
    SAGA Simple API for Grid Applications
    SAML Security Assertion Markup Language
    SCI-BUS Scientific Gateway-Based User Support
    SCP Secure copy
    SDF Structure data format
    SE Storage element
    SFTP Secure File Transfer Protocol
    SG Science gateway

    SHA Seismic Hazard Assessment
    SHIWA Sharing Interoperable Workflows for large-scale scientific simulations on Available DCIs
    SM Synthetic model
    SME Small and medium size enterprise
    SMR Synthetic model run
    SOA Service-oriented architecture
    SOAP Simple Object Access Protocol
    SRA Seismic risk analysis
    SRM Storage resource management
    SSF Statistical Seismology Function
    SSO Single sign on
    SSP SHIWA Simulation Platform
    SSS-Gateway Statistical Seismology Science Gateway
    SVO Solar Virtual Observatory
    SWDBT Sub-workflow decomposition to block task pattern
    TIFF Tagged Image File Format
    TLS Transport layer security
    TMIT To multiple instance task pattern
    TOD Time ordered data
    TORQUE Terascale Open-Source Resource and Queue Manager
    TransAT Transport phenomena analysis tool
    UNICORE Uniform Interface to Computing Resources
    VBT VisIVO binary table
    VERCE Virtual Earthquake and seismology Research Community in Europe e-science environment
    VisIVO Visualization Interface for the Virtual Observatory
    VM Virtual machine
    VO Virtual organization
    VOMS Virtual organization membership service
    VTK Visualization toolkit
    W3C-PROV World Wide Web Consortium PROV Data Model
    WFI Workflow interpreter
    WfMS Workflow management system
    WFS Workflow storage
    WLDG Westminster Local Desktop Grid
    WMS Workload management system
    WRF Weather Research and Forecasting
    WRF-ARW Weather Research and Forecasting-Advanced Research WRF
    WRF-NMM Weather Research and Forecasting-Nonhydrostatic Mesoscale Model
    WSE Workflow submission engine
    WS-PGRADE Web Service-Parallel Grid Run-time and Application Development Environment

    WVO Westfocus Virtual Organization
    XACML eXtensible Access Control Markup Language
    XAdES XML Advanced Electronic Signatures
    XNAT eXtensible Neuroimaging Archive Toolkit
    Xpath XML Path Language
    XSEDE Extreme Science and Engineering Discovery Environment
    YAWL Yet Another Workflow Language

    Part I WS-PGRADE/gUSE Science Gateway Framework

    Chapter 1 Introduction to Science Gateways and Science Gateway Frameworks

    Péter Kacsuk

    Abstract This chapter gives a short introduction to the basic architecture and functionalities of science gateways, as well as their development methods. It then briefly describes the EU FP7 SCI-BUS project, which is developing a core science gateway framework called WS-PGRADE/gUSE. A large number of user communities have developed application-oriented science gateways by adapting and customizing the WS-PGRADE/gUSE gateway framework. The chapter also explains the SCI-BUS vision of a collaboration-based SG instance development methodology. Finally, it gives a guide on how to read the rest of the book.

    1.1 Science Gateway Frameworks and Instances

    More and more scientific communities use distributed computing infrastructures (DCIs), including grids and clouds. Unfortunately, using these infrastructures directly is not easy; it requires considerable expertise and skill, as well as a good understanding of how these infrastructures work. Typical scientists, such as chemists and biologists, do not have these skills, and hence they need a high-level, scientific domain-specific user interface that hides all the details of the underlying infrastructure and exposes only the science-specific parts of the applications to be executed in the various DCIs.

    Science gateways are the typical environments that meet these needs. They are usually provided as a web interface that can be accessed from anywhere in the world. Their advantage is that scientists do not have to install anything on their personal desktop machines or mobile devices, and no matter where they travel (conferences, visits to other scientists, etc.), they can access the DCIs and run applications on them. Recognizing these advantages, more and more scientific communities have decided to build such gateways in order to simplify their use of the various DCIs.

    P. Kacsuk (✉)
    Centre for Parallel Computing, University of Westminster, London, UK
    e-mail: [email protected]; [email protected]

    P. Kacsuk
    Laboratory of Parallel and Distributed Systems, Institute for Computer Science and Control, Hungarian Academy of Sciences, Budapest, Hungary

    © Springer International Publishing Switzerland 2014
    P. Kacsuk (ed.), Science Gateways for Distributed Computing Infrastructures,
    DOI 10.1007/978-3-319-11268-8_1

    Using the terminology introduced by the EGI Science Gateway Virtual Team, science gateways (SG) can be divided into two main categories (Lovas 2013): SG frameworks and SG instances. SG frameworks, or generic DCI gateway frameworks, are not specialized for a particular scientific area, and hence scientists from many different areas can use them. National Grid Initiatives (NGIs) are good candidates to set up such gateways to support their very heterogeneous user communities. Typical gateways in this category are the Catania Science Gateway (Rotondo 2012), GridPort (Thomas 2001), Vine Toolkit (Dziubecki 2012), and WS-PGRADE/gUSE (Kacsuk 2012). These gateways usually expose a large set of low-level services to their users. On the one hand, this is an obvious advantage; on the other hand, scientists need a relatively long learning period before they can use all the available features efficiently and exploit the full power of the gateway. The powerful but complex functionalities offered by a generic SG may be too complicated for end users, but they could represent the right abstraction level for IT specialists, who can develop DCI applications for the scientists.

    SG instances, or application-specific SGs, target a well-defined set of scientists, typically working in a specific field of science. They provide a simplified user interface that is highly tailored to the needs of the given scientific community. As a result, scientists need to learn relatively little to use the functionalities provided by the gateway. On the other hand, these services are limited, and hence if a scientist needs a more complex service, for example, one utilizing a new type of DCI, it cannot easily be created and managed by such gateways. There are two options for creating such SG instances.

    The first option is to write the gateway from scratch. Since the services needed by a particular community are typically limited, and there are good technologies for constructing web portals, such as Liferay, it is relatively easy to develop such SG instances (compared with the development of an SG framework). However, such simplified gateways typically support only one particular DCI and may not support advanced features such as workflow execution. Some communities selecting this option may underestimate the manpower and time required to produce a robust gateway that can be offered as a production 24/7 service to the many members of the community. The problems that typically arise once the gateway goes into production and becomes successful are scalability (coping with more users than initially planned) and flexibility (adding new functions requested by the users). Moreover, while building and maintaining such gateways, the different communities usually solve the same technical issues again and again, independently of each other; this duplication could be avoided by reusing and customizing solutions implemented by SG frameworks.

    The other option is to customize an existing versatile SG framework according to the needs of a certain user community. In this case the full power of the underlying portal framework can be exploited, for example, by developing

    4 P. Kacsuk

    comprehensive and sophisticated workflows for the community and hiding these complex workflows behind a simplified application-specific user interface. The advantage of this approach is that DCI access is already solved and provided in a robust way by the SG framework, and hence the user communities can concentrate on producing the application-specific layers of their science gateway. In this way, the redundancy of many different communities developing the same DCI access mechanisms can be avoided. For the same reason, the development time of SG instances can be significantly reduced, and there is a good chance that the science gateway can be built and provided as a production service within the lifetime of the project that requires it. Another advantage is that the cost of producing such a gateway is usually lower than with the first approach. Since the gateway is a customization of an existing robust and scalable SG framework, the resulting production SG instance will also be robust and scalable. The sustainability of such an SG instance is also more certain than with the first method, since the large set of user communities involved in the adaptation, and possibly the further development, of the framework represents a strong lobbying force for obtaining further maintenance and development funding. It is also important that the gateway framework be open source and involve community members in the development and maintenance of the code. When the SG framework is sustainable, the community of the SG instance needs to maintain only a narrow set of user-specific services, and the rest is maintained by the SG framework developer community.

    1.2 Architecture of Science Gateways

    In both SG frameworks and SG instances, two main components should be distinguished:

    • Front-end
    • Back-end

    The role of the front-end is to provide the necessary user interface. In the case of SG instances the interface is highly customized to the particular needs of the scientific user community. For example, chemists and biologists would like to see visualization tools for molecules, whereas meteorologists need various types of map visualizations. The major focus of SG instances should be to develop this kind of specialized user interface to provide the right front-end for the target user community. In the case of an SG framework the interface is typically more generic, providing user interfaces for generic features that might be needed by many different user communities and SG instances. For example, these could include user interfaces for certificate management, file and data management, job submission, workflow creation and management, monitoring, etc. These generic parts of the front-end can also be reused from an SG framework in the implementation of customized SG instances. The quality requirements for a front-end are as follows:

    1 Introduction to Science Gateways and Science Gateway Frameworks 5

    • User-friendliness: provides an intuitive user interface.
    • Efficiency: provides fast response time even for complex user requests.
    • Scalability: provides fast response time even for a large number of simultaneous user requests.
    • Robustness: keeps working under any circumstances and recovers gracefully from exceptions.
    • Extensibility: it must be easy to extend with new interfaces and functionalities.

    Note that the main difficulty of building an SG front-end is not the pretty design of the user interface but meeting the quality requirements listed above. These become really important when the SG is used in production by a large number of scientists. Gateways created from scratch in many cases reach only the prototype level, or if they go into production, they face many difficulties in meeting these quality requirements.

    The back-end provides the DCI access mechanisms needed to realize the typical gateway functionalities, like certificate management, file and data management, job submission, workflow management, monitoring, etc., for various DCIs. The back-end is typically generic, i.e., the same back-end can be used by many different SG instances. Therefore the main advantage of developing SG frameworks and deriving SG instances from them appears in the development of the back-ends. If a generic back-end is developed in a robust way by an SG framework, all the SG instances derived from it can benefit from its robustness with little or no development effort. A good back-end can support several DCI types (clusters, grids, desktop grids, clouds, etc.); therefore one of the distinguishing features of SG frameworks is how many different DCIs they can support and how easily these DCIs can be accessed via the functionalities provided by the SG framework.

    The quality requirements for a back-end are similar to the front-end requirements, although their meaning can be quite different, since the front-end serves users while the back-end manages jobs and service calls:

    • Efficiency: provides fast response time even for complex submitted jobs or service calls.
    • Scalability: provides fast response time even for a very large number (even millions) of simultaneously submitted jobs or service calls.
    • Robustness: keeps working under any circumstances and recovers gracefully from exceptions.
    • Flexibility: ability to manage many different types of DCIs and many concrete instances of DCIs.
    • Extensibility: it must be easy to extend with support for new types of DCIs, new concrete DCIs, and new back-end services.


    1.3 Functionalities of Science Gateways

    A science gateway can have many different functionalities. In fact, each user community typically requires some new functionalities, according to its specific needs, beyond the original, generic functionalities of the SG framework from which it derives its own SG instance. Therefore, here we show only the typical functionalities that are commonly used by many different SG frameworks and SG instances. These functionalities can be grouped according to their relationship to the users and the DCIs:

    DCI-oriented functionalities:

    • Certificate proxy management
    • Job submission
    • Data management
    • Workflow management
    • Monitoring the usage of DCIs
    • Accounting the usage of DCIs

    User-oriented functionalities:

    • User certificate management
    • Workflow editing
    • Job and workflow execution progress visualization
    • Scientific visualization where requested
    • User collaboration support

    In many DCIs, accessing resources requires user authentication, and, unfortunately, different DCIs require different types of authentication mechanisms. If a gateway is to support access to different kinds of DCIs, then it should support all the user authentication methods required by those DCIs. These methods include, for example, X.509 certificate management and certificate proxy management. Chapter 6 (WS-PGRADE/gUSE security) describes the major authentication methods and their support in the WS-PGRADE/gUSE SG framework.

    Users typically want to submit jobs to the different DCIs, and hence the job submission mechanism is a basic service in every science gateway. Again, different DCI types implement different job submission protocols, and a generic gateway framework should be prepared to handle all of them. The WS-PGRADE/gUSE SG framework contains a generic job submission service, called the DCI Bridge, that can submit jobs to all the major DCI types; it is described in detail in Chap. 4. Other SG frameworks also support access to several DCIs, but in a much more restricted way than WS-PGRADE/gUSE does.
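    To make the idea of a single submission service over many DCI types more concrete, the following Python sketch routes DCI-independent job descriptions to per-DCI adapters and checks that the credential each DCI type expects is present before submitting. It is a minimal illustration under stated assumptions, not the DCI Bridge itself: the class names, the `dci_type` keys, and the credential kinds (`x509_proxy`, `api_key`) are hypothetical, and the adapters are toy lambdas rather than real protocol handlers.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Job:
    """Hypothetical DCI-independent job description."""
    executable: str
    arguments: List[str] = field(default_factory=list)
    dci_type: str = "cluster"  # illustrative keys: "cluster", "grid", "cloud", ...


@dataclass
class Adapter:
    """Pairs the credential kind a DCI type expects with a submit function."""
    credential_kind: str
    submit: Callable[[Job, str], str]


class JobSubmissionBridge:
    """Hypothetical single entry point that routes jobs to per-DCI adapters."""

    def __init__(self) -> None:
        self._adapters: Dict[str, Adapter] = {}

    def register(self, dci_type: str, adapter: Adapter) -> None:
        # Supporting a new DCI type means registering one more adapter.
        self._adapters[dci_type] = adapter

    def submit(self, job: Job, credentials: Dict[str, str]) -> str:
        adapter = self._adapters.get(job.dci_type)
        if adapter is None:
            raise ValueError(f"unsupported DCI type: {job.dci_type}")
        # Different DCIs need different credentials, so check before submitting.
        token = credentials.get(adapter.credential_kind)
        if token is None:
            raise PermissionError(
                f"{job.dci_type} requires a {adapter.credential_kind} credential")
        return adapter.submit(job, token)


# Toy adapters standing in for real protocol handlers; they just return an ID.
bridge = JobSubmissionBridge()
bridge.register("grid", Adapter("x509_proxy", lambda job, tok: f"grid-job:{job.executable}"))
bridge.register("cloud", Adapter("api_key", lambda job, tok: f"cloud-job:{job.executable}"))

job_id = bridge.submit(Job("simulate", dci_type="grid"), {"x509_proxy": "proxy-data"})
print(job_id)  # prints grid-job:simulate
```

    In this design, supporting a new DCI type amounts to registering one more adapter, which mirrors the flexibility and extensibility requirements listed for back-ends.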

    Jobs require access to data storage when they are executed. In many cases the different DCIs apply different storage access protocols, which also causes difficulties for gateway developers, who must cope with the variety of these protocols. Executing a job in a certain DCI can require access to data storage maintained in other DCIs. To solve these problems, SCI-BUS developed the Data Avenue service that



    enables access to the most important storage types, even from jobs running in other DCIs. This service and its use in the WS-PGRADE/gUSE SG framework are explained in Chap. 5. Other SG frameworks typically lack this generic approach to accessing various types of data storage. Recently, the EUDAT EU FP7 project also started to develop a generic solution for this problem (Riedel 2013).
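    The protocol-mediation idea behind a service like Data Avenue can be sketched as a transfer service that picks a storage handler from each URL's scheme and pipes data from the source handler to the target handler in chunks. This is a hypothetical Python sketch, not the Data Avenue API: the `TransferService` and `InMemoryStorage` classes are illustrative, and the `gsiftp` and `s3` schemes merely label two in-memory stand-ins rather than real GridFTP or S3 clients.

```python
import io
from typing import Dict, Iterator, Tuple
from urllib.parse import urlparse


class InMemoryStorage:
    """Stand-in for one storage protocol; a real handler would talk GridFTP, S3, etc."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def read_chunks(self, path: str, chunk_size: int) -> Iterator[bytes]:
        stream = io.BytesIO(self.objects[path])
        while chunk := stream.read(chunk_size):
            yield chunk

    def write_chunks(self, path: str, chunks: Iterator[bytes]) -> int:
        # This stand-in collects the chunks; a real handler would forward each one.
        buffer = bytearray()
        for chunk in chunks:
            buffer.extend(chunk)
        self.objects[path] = bytes(buffer)
        return len(buffer)


class TransferService:
    """Hypothetical mediator that copies data between storages with different schemes."""

    def __init__(self) -> None:
        self._handlers: Dict[str, InMemoryStorage] = {}

    def register(self, scheme: str, handler: InMemoryStorage) -> None:
        self._handlers[scheme] = handler

    def _resolve(self, url: str) -> Tuple[InMemoryStorage, str]:
        parsed = urlparse(url)
        handler = self._handlers.get(parsed.scheme)
        if handler is None:
            raise ValueError(f"no handler registered for scheme {parsed.scheme!r}")
        return handler, parsed.netloc + parsed.path

    def copy(self, source_url: str, target_url: str, chunk_size: int = 4) -> int:
        source, source_path = self._resolve(source_url)
        target, target_path = self._resolve(target_url)
        # Chunks flow from the source handler to the target handler, so neither
        # side needs to know the other's protocol.
        return target.write_chunks(target_path, source.read_chunks(source_path, chunk_size))


# Usage with two in-memory stand-ins labeled as a grid and a cloud storage.
grid_storage, cloud_storage = InMemoryStorage(), InMemoryStorage()
grid_storage.objects["se.example.org/data/input.dat"] = b"experiment results"
service = TransferService()
service.register("gsiftp", grid_storage)
service.register("s3", cloud_storage)
copied = service.copy("gsiftp://se.example.org/data/input.dat", "s3://bucket/input.dat")
print(copied)  # prints 18
```

    Keeping the per-protocol logic inside handlers means a job in one DCI can read data held in another DCI's storage without itself implementing that storage's protocol.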

    Beyond simple job submissions and service calls, applications solving complex problems, like scientific simulations, require the creation and execution of scientific workflows. To support these more advanced types of applications, SG frameworks should provide workflow editing and execution services. More and more SG frameworks now offer such workflow support. The WS-PGRADE/gUSE SG framework was designed from the very beginning to include workflow management. This capability of WS-PGRADE/gUSE is described in detail in Chap. 3.
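    As a rough illustration of what workflow execution involves, the sketch below treats a workflow as a directed acyclic graph of steps and runs each step only after all of its predecessors, passing their outputs along. It uses Python's standard `graphlib` module (Python 3.9+). The step names and the sequential, in-process execution are simplifying assumptions: a gateway would submit independent nodes to DCIs in parallel, and the actual WS-PGRADE workflow model is the one described in Chap. 3.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List

Step = Callable[[Dict[str, Any]], Any]


def run_workflow(steps: Dict[str, Step], depends_on: Dict[str, List[str]]) -> Dict[str, Any]:
    """Run each step after all of its predecessors, passing their outputs as inputs.

    Assumes every dependency named in depends_on is itself a key of steps.
    """
    graph = {name: depends_on.get(name, []) for name in steps}
    results: Dict[str, Any] = {}
    # static_order() yields every node after all of its predecessors
    # and raises graphlib.CycleError if the dependencies contain a cycle.
    for name in TopologicalSorter(graph).static_order():
        inputs = {dep: results[dep] for dep in graph[name]}
        results[name] = steps[name](inputs)
    return results


# Hypothetical four-step workflow: generate inputs, two simulations, collect.
steps: Dict[str, Step] = {
    "generate": lambda _: [1, 2, 3],
    "simulate_a": lambda inp: [x * 10 for x in inp["generate"]],
    "simulate_b": lambda inp: [x + 1 for x in inp["generate"]],
    "collect": lambda inp: sum(inp["simulate_a"]) + sum(inp["simulate_b"]),
}
depends_on = {
    "simulate_a": ["generate"],
    "simulate_b": ["generate"],
    "collect": ["simulate_a", "simulate_b"],
}
results = run_workflow(steps, depends_on)
print(results["collect"])  # prints 69
```

    Because `simulate_a` and `simulate_b` depend only on `generate`, a real workflow engine could run them concurrently on different DCIs; the sketch only fixes a valid order.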

    As jobs and workflows are executed in the various DCIs, users should be able to observe how their execution is progressing. Therefore the gateway back-end should be able to collect execution monitoring information from the DCIs, and the front-end should be able to present this information to the users in a comprehensible way. This is such a basic requirement that it is supported by practically every SG framework. On the other hand, providing accounting information on how many resources were used, and at what price, during job and workflow execution is also an important service of science gateways, but it is frequently neglected and left unsupported. The WS-PGRADE/gUSE SG framework provides such an accounting service for commercial clouds when it is used together with the CloudBroker Platform. This facility is explained in Chap. 6.
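    At its core, accounting aggregates per-job resource usage into costs. A minimal Python sketch, assuming hypothetical usage records (workflow ID, resource type, hours) and a hypothetical per-hour price list, might look like the following; it does not reflect how the CloudBroker Platform actually records or reports usage.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class UsageRecord:
    """Hypothetical per-job usage record collected from a DCI."""
    workflow_id: str
    resource_type: str  # e.g. an illustrative cloud instance size
    hours: float


def cost_per_workflow(records: List[UsageRecord],
                      price_per_hour: Dict[str, float]) -> Dict[str, float]:
    """Sum hours times hourly price for every job, grouped by workflow."""
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        totals[record.workflow_id] += record.hours * price_per_hour[record.resource_type]
    return dict(totals)


prices = {"small": 0.5, "large": 2.0}  # hypothetical prices per hour
records = [
    UsageRecord("wf-1", "small", 10),
    UsageRecord("wf-1", "large", 2),
    UsageRecord("wf-2", "small", 4),
]
print(cost_per_workflow(records, prices))  # prints {'wf-1': 9.0, 'wf-2': 2.0}
```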

    User collaboration is needed both inside a user community and among several user communities. WS-PGRADE/gUSE provides an internal Application Repository for collaboration inside a user community, and access to the SHIWA Workflow Repository to help external collaboration among different user communities. These services of WS-PGRADE/gUSE are described in Chap. 9.

    Tools for scientific visualization are typically provided by SG instances rather than by SG frameworks, since scientific visualization is application-dependent. Therefore such tools and services are described in Chaps. 10–15, where the SG instances derived from the WS-PGRADE/gUSE SG framework are introduced.

    1.4 Developers and Users of Science Gateways

    People involved in the creation, operation, and usage of gateways have different roles, and a good gateway should support all of these roles.

    The first category is the gateway developers. Here we have to distinguish SG framework developers from SG instance developers. The primary goal of SG framework developers is to develop the SG framework back-end in a portable way that enables SG instance developers to use it without modifications. Their second goal is to develop the generic part of the front-end and, obviously, to produce and maintain up-to-date documentation. Beyond these tasks



    directly related to the gateway framework development, they should also provide user support, including the evaluation of feature requests and further development of the gateway framework according to new functionality requirements. Developing an SG framework requires a very deep understanding of the underlying infrastructures and the required web technologies. Therefore, to develop an SG framework, the developer community must invest in a long and continuous learning process, which is very costly. As a result, there are only very few SG frameworks, and the number of gateway framework developers is also very low.

    The main task of the SG instance developers is either to customize an existing SG framework for their user community, i.e., to extend the SG framework with new application-specific interfaces, or to develop the SG instance from scratch. In the former (and recommended) case, SG instance developers can concentrate on the application domain-specific features of their SG. In the latter case, they need to learn all those aspects of the underlying DCI middleware and web technologies that SG framework developers need. As a result, they usually create the SG instance much more slowly and with more effort than SG instance developers who choose the customization method. The number of SG instance developers is about an order of magnitude larger than the number of SG framework developers, but in the ideal case the difference would be two orders of magnitude. In the case of the WS-PGRADE/gUSE framework we are getting close to this ideal, since the framework has been adapted by more than 90 different communities that develop SG instances based on it. The WS-PGRADE/gUSE framework helps this customization process by providing a special API, called the Application Specific Module (ASM) API, with which existing workflows can easily be embedded in application-specific portlets (see details in Chap. 3).

    Once the SG frameworks or SG instances are developed, they have to be set up and operated. Here the role of gateway operators comes into play. They should be able to deploy, configure, run, and maintain the gateway service for the user communities. For these purposes, good gateways provide complete and up-to-date documentation, installation and configuration wizards, user management interfaces, etc. These can be developed in a generic way within an SG framework and simply be used (and possibly adapted) by SG instances.

    Once the SG frameworks or SG instances are set up and operating, they are ready for use. We must distinguish two user categories, end-users and application developers, and in fact they need different front-ends. The application developers develop DCI applications, for example, new workflows, which are used by the end-users. The application developers are typically IT people or scientists (chemists, etc.) with a good understanding of the underlying IT technology. They need relatively detailed information on the underlying DCIs, while this information can be partially or completely hidden from the end-users. Therefore, SG frameworks are primarily targeted at application developers, and SG instances are typically designed for end-users. Of course, this typical usage does not exclude the possibility that some SG frameworks are used by end-users or that SG instances provide the front-ends necessary for DCI application development. However, a good practice is to clearly separate and support these two user types, and WS-PGRADE/gUSE supports this concept. It provides a full-scale user interface for workflow developers (called power users) that enables the fast and efficient development of DCI-oriented workflows. On the other hand, its end-user interface concept enables the automatic creation of an end-user interface with limited functionality that can easily be used by scientists who do not know the underlying DCIs. This aspect of the WS-PGRADE/gUSE gateway framework is described in more detail in Chaps. 2 and 8.
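    The end-user interface concept can be pictured as the workflow developer marking which parameters end-users may change, with the gateway deriving a reduced form from those marks. The following Python sketch is purely illustrative: the `Parameter` class, the `end_user_visible` flag, and the example parameter names are assumptions, not the WS-PGRADE/gUSE mechanism.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Parameter:
    name: str
    default: str
    end_user_visible: bool  # hypothetical flag set by the workflow developer


def end_user_form(params: List[Parameter]) -> List[str]:
    """Fields shown in the simplified end-user interface."""
    return [p.name for p in params if p.end_user_visible]


def bind(params: List[Parameter], user_values: Dict[str, str]) -> Dict[str, str]:
    """Merge end-user input with developer defaults; hidden parameters stay fixed."""
    editable = set(end_user_form(params))
    rejected = set(user_values) - editable
    if rejected:
        raise KeyError(f"not editable by end-users: {sorted(rejected)}")
    return {p.name: user_values.get(p.name, p.default) for p in params}


params = [
    Parameter("molecule", "water.pdb", True),
    Parameter("temperature", "300", True),
    Parameter("dci_endpoint", "ce.example.org", False),  # hidden DCI detail
    Parameter("cores", "16", False),
]
print(end_user_form(params))  # prints ['molecule', 'temperature']
print(bind(params, {"temperature": "310"})["temperature"])  # prints 310
```

    The DCI-specific settings keep the values chosen by the power user, so the scientist sees only the two domain parameters.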

    1.5 The SCI-BUS Project

    As written in Sect. 1.1, the recommended way to develop SG instances is the customization methodology. This approach is followed by the SCI-BUS (Science Gateway Based User Support, https://www.sci-bus.eu) EU FP7 project, which develops the WS-PGRADE/gUSE SG framework as well as a customization technology by which a large number of scientific user communities can easily adapt the framework and develop their SG instances. The structure of the project and the related technologies are shown in Fig. 1.1.

    Fig. 1.1 SG instance development methodology and required services (with permission of CloudBroker GmbH)



    The central component of the project is the WS-PGRADE/gUSE gateway framework. This is the basis of all the SG instances developed by project partners, subcontractors, and associated partners. During the project the WS-PGRADE/gUSE framework has been significantly further developed, including the following main features:

    1. Cloud integration via the CloudBroker Platform to access a large variety of commercial and academic clouds (described in detail in Chap. 7)
    2. Direct cloud integration to access academic clouds (see details in Chap. 4)
    3. Robot certificates (see details in Chap. 6)
    4. An efficient and flexible data management system over various DCIs (see details in Chap. 5)
    5. Workflow debugging capabilities in the workflow management system (see details in Chap. 2)

    Functional extension was not the only major goal of the project: the framework was also made robust and efficient, so that a large number of users (in the range of 100–1,000) can use it simultaneously with short response times and the gateway can handle even millions of simultaneous job submissions. Another important aspect was the improvement of the gateway installation procedure, for which an installation wizard and a service wizard have been developed. The documentation of the framework was also significantly improved. It contains 14 documents in the following 4 series:

    1. Blue series for end-users (2 documents)
    2. Green series for gateway administrators (5 documents)
    3. Red series for workflow developers (3 documents)
    4. Orange series for general purposes (4 documents)

    The gateway framework is published at SourceForge (https://sourceforge.net/projects/guse/) and has become very popular: there have been over 15,000 downloads as of the writing of this book. The user forum is very active, with nearly 200 different topics discussed by a large number of participants. Further development of the WS-PGRADE/gUSE gateway framework will not stop when the SCI-BUS project ends at the end of September 2014. The project has also developed a sustainability plan that, together with the large number of users, guarantees the further progress of the WS-PGRADE/gUSE gateway framework. A roadmap of development goals with their expected deadlines can be found on the SCI-BUS web page (http://www.sci-bus.eu), which will be maintained even after the SCI-BUS project is finished.

    As Fig. 1.1 shows, 11 communities participating as project partners have developed application-specific SG instances based on the WS-PGRADE/gUSE gateway framework. These SG instances are the following:

    1. Swiss proteomics gateway
    2. MoSGrid gateway (see details in Chap. 11)
    3. Statistical seismology gateway (see details in Chap. 12)



    4. Business process gateway
    5. Computational neuroscience gateway developed by Amsterdam Medical Center (see details in Chap. 10)
    6. Blender rendering gateway
    7. VisIVO astrophysics gateway (see details in Chap. 13)
    8. PireGrid commercial community gateway
    9. Software building and testing gateway (see details in Chap. 19)
    10. Document Archiving Gateway for citizen web community (see details in Chap. 19)
    11. Heliophysics gateway (see details in Chap. 14)

    Subcontractors of SCI-BUS have also developed SG instances as listed below:

    1. Science gateway for the condensed matter physics community (see details in Chap. 15)
    2. Weather Research and Forecasting science gateway developed by University of Cantabria
    3. Academic Grid Malaysia Scientific Gateway
    4. AdriaScience Gateway developed by Ruđer Bošković Institute
    5. Metal physics science gateway of the G.V. Kurdyumov Institute for Metal Physics
    6. ChartEX Gateway developed by Leiden University

    The condensed matter physics gateway is described in detail in Chap. 15, but the other subcontractors' gateways are not detailed in this book due to its size limitations. The interested reader can find details of these gateways in the public SCI-BUS deliverable D6.2, titled Report on developed and

    Fig. 1.2 SG instance developer communities using SCI-BUS technology (with permission of Elisa Cauhé Martín)



    ported applications and application-specific gateways, which is accessible on the SCI-BUS web page. Figure 1.2 shows the communities that have some relationship with SCI-BUS in building their science gateway instances. Beyond these communities, there are many others without any relationship with SCI-BUS that also use the SCI-BUS gateway technology intensively.

    1.6 Collaboration-Based SG Instance Development Methodology

    SCI-BUS technology helps collaboration among the different types of people developing and using the gateway technology. As already mentioned, repositories at two different levels help collaboration between workflow developers and workflow users. Inside a community using the same gateway, the internal gUSE Application Repository can be used by workflow developers to publish ready-to-use workflows, and scientists in the end-user mode of the gateway can import these workflows from the Application Repository. After parameterization, the workflows can be executed in the target DCIs. Of course, the Application Repository can also be used to support collaboration between workflow developers: a workflow stored in the Application Repository can be taken by any workflow developer belonging to the same gateway community, who can then extend or further develop the imported workflow. Similar activities are supported among workflow developers and end-users belonging to different gateway communities via the SHIWA Workflow Repository. Using the coarse-grained workflow interoperability technique developed in the SHIWA project, this repository and the WS-PGRADE/gUSE gateway enable collaboration even when the different communities use different workflow systems (see details in Chap. 8).
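    The publish/import pattern just described can be sketched with an in-memory repository that stores snapshots and hands out independent copies, so end-users can parameterize and developers can extend a workflow without altering the published version. Everything here is hypothetical: `ApplicationRepository`, its methods, and the `Workflow` fields are illustrative stand-ins, not the interfaces of the gUSE Application Repository or the SHIWA Workflow Repository.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Workflow:
    name: str
    nodes: List[str] = field(default_factory=list)
    parameters: Dict[str, str] = field(default_factory=dict)


class ApplicationRepository:
    """Hypothetical in-memory stand-in for a community workflow repository."""

    def __init__(self) -> None:
        self._published: Dict[str, Workflow] = {}

    def publish(self, workflow: Workflow) -> None:
        # Store a snapshot so later edits by the author do not change the published copy.
        self._published[workflow.name] = copy.deepcopy(workflow)

    def import_workflow(self, name: str) -> Workflow:
        # Every importer gets an independent copy to parameterize or extend.
        return copy.deepcopy(self._published[name])

    def names(self) -> List[str]:
        return sorted(self._published)


repo = ApplicationRepository()
repo.publish(Workflow("docking", ["prepare", "dock", "rank"], {"ligands": "set-A"}))

# An end-user imports the workflow and only sets parameters before execution.
run_copy = repo.import_workflow("docking")
run_copy.parameters["ligands"] = "set-B"

# Another workflow developer imports it, adds a step, and publishes the result.
extended = repo.import_workflow("docking")
extended.nodes.append("visualize")
extended.name = "docking-with-vis"
repo.publish(extended)

print(repo.names())  # prints ['docking', 'docking-with-vis']
```

    Copying on both publish and import is one simple way to keep the shared version stable; a production repository would more likely use explicit versioning.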

    Collaboration is supported not only among workflow developers and workflow users but also among gateway developers. For this purpose SCI-BUS developed and set up the SCI-BUS Portlet Repository, which enables the sharing of Liferay portlets between SG instance developers (see details in Chap. 9). This sharing of existing portlets can further accelerate the customization of gateway frameworks.

    In fact, these repositories, the SG framework stored in the open source SourceForge repository, and the customization concept of SCI-BUS enable the introduction of a collaborative SG instance development methodology. Figure 1.3 shows the services required for this methodology as well as the different types of developers and users related to the SG instance. The steps in developing an SG instance according to this methodology are as follows:

    Step 1: An SG instance developer downloads the WS-PGRADE/gUSE framework from SourceForge and deploys it as a general-purpose science gateway. It contains the major functionalities for workflow developers to develop and run workflows and for end-user scientists to run workflows.



    Step 2: An SG instance developer downloads several domain-specific portlets needed by the target user community from the SCI-BUS Portlet Repository. At this stage, without any development, the community already has a domain-specific gateway. Although it may not be exactly what they want, the users can start to work with it.

    Step 3: An SG instance developer downloads several domain-specific workflows from the SHIWA Workflow Repository and develops new domain-specific portlets on top of them. At this stage, without any workflow development, the community already has an improved domain-specific gateway; although it is not exactly what they want, the users have more portlets to work with. For the sake of mutual collaboration, the SG instance developer uploads the new portlets into the SCI-BUS Portlet Repository so that other communities can take advantage of them.

    Step 4: The workflow developer develops new domain-specific workflows and uploads them to the SHIWA Workflow Repository. She might download other workflows from the SHIWA Workflow Repository and use them to develop new workflows.

    Step 5: An SG instance developer develops new domain-specific portlets on top of the workflows developed in Step 4. At this stage the domain-specific gateway is extended with new portlets specifically designed according to the needs of this community. For the sake of mutual collaboration, the SG instance developer uploads the new portlets into the SCI-BUS Portlet Repository so that other communities can take advantage of them.

    Fig. 1.3 Collaboration-based SG instance development methodology and required services


    Of course, Steps 2–5 can be repeated as many times as required. Every iteration results in a further improved and extended SG instance for the user community.

    1.7 How to Read This Book?

    The main goal of the book is to transfer the knowledge of building science gateways to those communities who would like to develop their own science gateway instance in the future, or who would like to extend or improve their existing science gateway with new functionalities, services, portlets, and workflows. The book summarizes the technologies we have developed in the SCI-BUS project for building general-purpose science gateway frameworks and for customizing such a framework into domain- and application-specific science gateway instances. Since workflows play an increasingly important role in IT-based scientific research, we also show how the SCI-BUS workflow technology can be used and extended with other workflows by using the workflow interoperability technology developed in the EU FP7 SHIWA project and currently actively used in the EU FP7 ER-Flow project (see Chap. 8).

    The book is divided into three main parts. After the current chapter, the first part describes the core SCI-BUS gateway framework technology, WS-PGRADE/gUSE. Chapter 2 gives a generic introduction to the WS-PGRADE/gUSE science gateway framework technology and summarizes its main features. Since all the other chapters build on the knowledge described in that chapter, it is recommended that everyone read it. Similarly, reading Chap. 8 is recommended for every reader, since it explains all the major use-case scenarios where the gateway framework can be applied.

    Chapter 4 describes the DCI Bridge service, which enables access to a large set of DCIs via a common interface based on the OGF BES standard. Since any workflow system or existing gateway can be extended to exploit this service, any reader who is interested in extending their workflow system or gateway with access to such a large set of DCIs should read this chapter. Similarly, Chap. 5 describes the Data Avenue service, which enables file transfer between DCI storages that use different protocols. This is a very generic service that can be used independently of WS-PGRADE/gUSE, and hence readers who would like to extend their workflow manager and gateway to exploit this service should read this chapter.

    The following chapters should be read by those who are interested in learning more about the following aspects of WS-PGRADE/gUSE:

    • Workflow concepts of WS-PGRADE/gUSE (Chap. 3)
    • Executing WS-PGRADE workflows in various Distributed Computing Infrastructures and the DCI Bridge service (Chap. 4)
    • Security aspects of WS-PGRADE/gUSE (Chap. 6)



    • Integration of WS-PGRADE/gUSE and clouds via the CloudBroker Platform (Chap. 7)
    • Data management in WS-PGRADE/gUSE and the Data Avenue service (Chap. 5)
    • Usage scenarios of WS-PGRADE/gUSE (Chap. 8)
    • Community activity support in WS-PGRADE/gUSE via the SHIWA technology and ER-Flow experience (Chap. 9)

    The second part of the book contains concrete use cases that describe how the WS-PGRADE/gUSE gateway framework was customized by SCI-BUS project partners and subcontractors to develop domain-specific science gateway instances. These chapters are completely independent of each other, but since they use different features of the WS-PGRADE/gUSE framework, they build on information described in various chapters of the first part of the book. These chapters are very useful for readers who also want to develop a domain-specific science gateway instance, because they can find many good ideas here on how to adapt the WS-PGRADE/gUSE gateway for their own purposes.

    Some further gateway instance examples that were developed in other EU FP7 projects, like agINFRA, DRIHM, and VERCE, are shown in Chap. 17 in the third part of the book. Chapter 18 shows how different user communities can come together and create a science gateway alliance based on the same gateway technology. Note that many more science gateway instances have been developed based on the WS-PGRADE/gUSE gateway framework, but due to the restricted size of the book they are not described here. However, the interested reader can find these further use cases via the SCI-BUS web page. Part 3 also describes some further application areas of the SCI-BUS gateway technology, including educational and commercial uses. Readers who are interested in the use of SCI-BUS technology in university courses should read Chap. 16. The commercial use of SCI-BUS technology is also possible and was exploited by several companies in the SCI-BUS project; other companies are currently working on commercial applications inside the EU FP7 CloudSME project. These commercial applications of the SCI-BUS technology are described in Chap. 19.

    The book ends with a short Conclusions and Outlook, which covers the future of the SCI-BUS technology.

    1.8 Conclusions

    The goal of this book is to describe the WS-PGRADE/gUSE SG framework and its customization technology, and to show use cases from several user communities where this technology was successfully applied to create application-specific SG instances. Within the SCI-BUS project, 11 partner user communities established their own SG instances as production services, another 6 communities developed their gateways as subcontractors, and 7 associated partners also use the

    16 P. Kacsuk


    SCI-BUS gateway technology. The WS-PGRADE/gUSE SG framework is open-source software that can be downloaded from SourceForge. At the time of writing, it has been downloaded more than 15,000 times, and the number is constantly growing. More than 90 SG instances are deployed worldwide, as shown by the Google map at https://guse.sztaki.hu/MapService/. The technology has therefore matured enough to be used by a large number of user communities, and the significance of this book lies in disseminating this know-how to scientific communities interested in building gateways on the mature technology that SCI-BUS provides.

    1 Introduction to Science Gateways and Science Gateway Frameworks 17


    Chapter 2
    Introduction to the WS-PGRADE/gUSE Science Gateway Framework

    Tibor Gottdank

    Abstract WS-PGRADE/gUSE is a gateway framework that offers a set of high-level grid and cloud services through which interoperation between grids, clouds, and scientific user communities can be achieved. gUSE is also a workflow system that enables scientific communities to compose and execute a series of computational or data manipulation steps in a scientific application on Distributed Computing Infrastructures (DCIs). This chapter summarizes the most important features of WS-PGRADE/gUSE.

    2.1 Introduction

    The Grid and Cloud User Support Environment (gUSE), also known as WS-PGRADE (Web Service Parallel Grid Run-time and Application Development Environment)¹/gUSE, is a renowned European science gateway (SG) framework² that provides users with convenient and easy access to grid and cloud infrastructures as well as to data storage.

    WS-PGRADE/gUSE provides a specific set of enabling technologies (Lovas 2013) as well as front-end and back-end services that together build a generic SG. An enabling technology provides the required software stack to develop SG frameworks and SG instances (that is, to provide a simple user interface that is tailored to the needs of a given scientific community). Typical examples of such enabling technologies are: web application containers (Tomcat, Glassfish, etc.), portal or web application frameworks (Liferay, Spring, etc.), database management

    T. Gottdank
    Laboratory of Parallel and Distributed Systems, Institute for Computer Science and Control, Hungarian Academy of Sciences, Budapest, Hungary
    e-mail: [email protected]

    1 WS-PGRADE is the graphical user interface of gUSE. See a detailed description of WS-PGRADE in Sect. 2.5.
    2 gUSE is the Most Visited SG Framework and Most Visited Workflow System according to the EGI Applications Database (https://appdb.egi.eu/).

    Springer International Publishing Switzerland 2014
    P. Kacsuk (ed.), Science Gateways for Distributed Computing Infrastructures,
    DOI 10.1007/978-3-319-11268-8_2



    systems (MySQL, etc.), and workflow management systems (WS-PGRADE/gUSE, MOTEUR, etc.). With the help of gUSE, scientific communities can compose and execute a series of computational or data manipulation steps in a scientific application on Distributed Computing Infrastructures (DCIs).

    This chapter introduces the key features, the architecture, the common user-level components, and the customization modes of the gUSE framework.

    2.2 What gUSE Offers

    gUSE provides a transparent, web-based interface to access distributed resources, extended by a powerful general-purpose workflow editor and enactment system, which can be used to compose scientific applications into data-flow based workflow structures (Balasko 2013a). gUSE is the only SG framework in Europe that offers a comprehensive and flexible workflow-oriented framework that enables the development, execution, and monitoring of scientific workflows. In addition, the nodes of these workflows can access a large variety of different DCIs, including clusters, grids, desktop grids, and clouds (Kacsuk 2012).

    This SG framework can be used by National Grid Initiatives (NGIs) to support small user communities that cannot afford to develop their own customized SG. The gUSE framework also provides two Application Programming Interfaces (APIs), namely the Application-Specific Module API and the Remote API, to create application-specific SGs according to the needs of different user communities.

    An important requirement in the development of gUSE was to enable the simultaneous handling of a very large number of jobs, even in the range of millions, without compromising the response time at the user interface. To achieve this level of concurrency, the workflow management back-end of gUSE is implemented based on the web service concept of Service Oriented Architecture (SOA) (Kacsuk 2012).

    2.3 Key Features

    Among many other features, the five main capabilities of gUSE are as follows:

    1. gUSE is a general-purpose SG framework through which users can access more than twenty different DCIs³ via the DCI Bridge service, and six different data storage types (HTTP, HTTPS, GSIFTP, S3, SFTP, and SRM) via the Data Avenue service. Both DCI Bridge and Data Avenue were developed as part of the WS-PGRADE/gUSE service stack, but they can also be used as independent services, which enables their use from other types of gateways and workflow systems.

    2. WS-PGRADE/gUSE is a workflow-oriented system. It extends the Directed Acyclic Graph (DAG)-based workflow concept with advanced parameter sweep

    3 The full list of supported DCIs is in Sect. 2.4.

    20 T. Gottdank

    (PS) features by special workflow nodes, condition-dependent workflow execution, and workflow embedding support. Moreover, gUSE extends the concrete workflow concept with the concepts of abstract workflow, workflow instance, and template (see the details in Sect. 2.5).

    3. WS-PGRADE/gUSE supports the development and execution of workflow-based applications. Users of gUSE define their applications as workflows. They can share their applications with each other by exporting them to the internal Application Repository. Other users can import such applications and execute or modify them in their user space.

    4. gUSE supports the fast development of SG instances by a customization technology. gUSE can serve different needs according to community requirements regarding computational power, the complexity of the applications, and the specificity of the user interface, so that the gateway fits the community's needs and matches its terminology.

    5. The most important design aspect of gUSE is flexibility. Flexibility of gUSE is expressed:

    in exploiting parallelism: gUSE enables parallel execution inside a workflow node as well as among workflow nodes. It is possible to use multiple instances of the same workflow with different data files. See details in Chap. 3.

    in the use of DCIs: gUSE can access various DCIs: clusters, cluster grids, desktop grids, supercomputers, and clouds. See details in Chap. 4.

    in data storage access: gUSE workflow nodes can access different data storage services in different DCIs via the Data Avenue Blacktop service. Therefore, file transfer among various storages and workflow nodes can be handled automatically and transparently. See details in Chap. 5.

    in security management: For secure authentication it is possible to use users' personal certificates or robot certificates. See details in Chap. 6.

    in cloud access: A large set of different clouds (Amazon, OpenStack, OpenNebula, etc.) can be accessed by WS-PGRADE/gUSE either directly (see Chap. 4) or via the CloudBroker Platform (see Chap. 7).

    in supported gateway types: gUSE supports different gateway types: general-purpose gateways for national grids (e.g., for the Greek and Italian NGIs), general-purpose gateways for particular DCIs (e.g., the EDGI gateway), general-purpose gateways for specific technologies (e.g., the SHIWA gateway for workflow sharing and interoperation, see Chap. 9), and domain-specific science gateway instances (e.g., the Swiss proteomics portal, MoSGrid gateway, Autodock gateway, Seismology gateway, and VisIVO, see Part 2 of the book).⁴ This aspect of WS-PGRADE/gUSE is described in detail in Sect. 2.6 and in Chap. 8.

    in the use of workflow systems: Users can access many workflows written in various workflow languages from the SHIWA Workflow Repository and use these workflows as embedded workflows inside WS-PGRADE workflow nodes. This feature of WS-PGRADE/gUSE gateways is described in detail in Chap. 9.

    4 The domain-specific science gateways are discussed in Part 2.

    2 Introduction to the WS-PGRADE/gUSE Science Gateway Framework 21


    2.4 Architectural Overview

    The main goal in designing the multitier architecture of WS-PGRADE/gUSE was to enable versatile access to many different kinds of DCIs and data storage through different kinds of user interfaces. This access is technically performed through the DCI Bridge job submission service, which sits at the bottom of the gUSE architectural layers as shown in Fig. 2.1, and via the Data Avenue Blacktop service, which is an independent service provided by SZTAKI (see Chap. 5).

    DCI Bridge⁵ is a web service-based application providing standard access to various DCIs. It connects to the external DCI resources through its DCI plug-ins. When a user submits a workflow, its job components are submitted transparently into the various DCI systems via the DCI Bridge service using its standard OGSA Basic Execution Service 1.0 (BES) interface. As a result, the access protocol and all the technical details of the various DCI systems are totally hidden behind the BES interface. The job description language of BES is the standardized Job Submission Description Language (JSDL). See further details on DCI Bridge in Chap. 4.
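To make the role of JSDL more concrete, the following sketch builds a minimal generic JSDL 1.0 job description for a single POSIX executable, using only the Python standard library. It assumes the core and POSIX-application namespaces of the OGF JSDL 1.0 specification; the function name build_jsdl and the sample job are hypothetical, and the document that DCI Bridge actually receives from gUSE may carry further elements and extensions not shown here.

```python
import xml.etree.ElementTree as ET

# JSDL 1.0 core and POSIX-application namespaces (OGF JSDL 1.0 specification).
JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
POSIX_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"

ET.register_namespace("jsdl", JSDL_NS)
ET.register_namespace("jsdl-posix", POSIX_NS)


def build_jsdl(job_name: str, executable: str, args: list[str]) -> str:
    """Return a minimal JSDL JobDefinition for one POSIX executable (hypothetical helper)."""
    job_def = ET.Element(f"{{{JSDL_NS}}}JobDefinition")
    desc = ET.SubElement(job_def, f"{{{JSDL_NS}}}JobDescription")

    # Identification: a human-readable job name.
    ident = ET.SubElement(desc, f"{{{JSDL_NS}}}JobIdentification")
    ET.SubElement(ident, f"{{{JSDL_NS}}}JobName").text = job_name

    # Application: which program to run and with which arguments.
    app = ET.SubElement(desc, f"{{{JSDL_NS}}}Application")
    posix = ET.SubElement(app, f"{{{POSIX_NS}}}POSIXApplication")
    ET.SubElement(posix, f"{{{POSIX_NS}}}Executable").text = executable
    for arg in args:
        ET.SubElement(posix, f"{{{POSIX_NS}}}Argument").text = arg

    return ET.tostring(job_def, encoding="unicode")


if __name__ == "__main__":
    print(build_jsdl("call-job", "/bin/echo", ["hello", "DCI"]))
```

Because the description is plain XML in a standard schema, the submitting side does not need to know which middleware will eventually run the job; that decision is left to the bridge.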

    The DCIs supported by DCI Bridge are the following:

    Clusters (PBS, LSF, MOAB, SGE)
    Grids (ARC, gLite, GT2, GT4, GT5, UNICORE)
    Supercomputers (e.g., via UNICORE)
    Desktop grids (BOINC)
    Clouds (via CloudBroker Platform, GAE, as well as EC2-based Cloud Access)
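The plug-in idea behind this list can be illustrated with a small dispatch sketch: one entry point receives every job and forwards it to the plug-in registered for the job's target middleware. This is an illustration of the general pattern only, not DCI Bridge's real code; the class names, the job dictionary keys (name, target_dci), and the returned job ids are all hypothetical, and the plug-ins are stubs that do not contact any real middleware.

```python
from abc import ABC, abstractmethod


class DCIPlugin(ABC):
    """Hypothetical plug-in interface: one implementation per middleware."""

    @abstractmethod
    def submit(self, job: dict) -> str:
        """Submit a job to the middleware and return a middleware job id."""


class PBSPlugin(DCIPlugin):
    def submit(self, job: dict) -> str:
        # A real plug-in would talk to the PBS tools here; this stub fakes an id.
        return f"pbs-{job['name']}"


class BOINCPlugin(DCIPlugin):
    def submit(self, job: dict) -> str:
        # Stub for a desktop grid back-end.
        return f"boinc-{job['name']}"


class Bridge:
    """Single entry point that hides which middleware runs a job."""

    def __init__(self) -> None:
        self._plugins: dict[str, DCIPlugin] = {}

    def register(self, middleware: str, plugin: DCIPlugin) -> None:
        self._plugins[middleware] = plugin

    def submit(self, job: dict) -> str:
        target = job["target_dci"]
        if target not in self._plugins:
            raise ValueError(f"no plug-in registered for {target!r}")
        return self._plugins[target].submit(job)


bridge = Bridge()
bridge.register("pbs", PBSPlugin())
bridge.register("boinc", BOINCPlugin())
print(bridge.submit({"name": "call", "target_dci": "boinc"}))  # boinc-call
```

Supporting a new kind of DCI in such a design means writing and registering one more plug-in; callers of the bridge stay unchanged.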

    The middle tier of the gUSE architecture contains the high-level gUSE services. The Workflow Storage stores every piece of information that is needed to define a workflow (graph structure description, input files pointers, output files pointers,

    Fig. 2.1 The three-tier architecture of WS-PGRADE/gUSE

    5 DCI Bridge is discussed in Chap. 4.



    executable code, and target DCI of workflow nodes) except the input files of the workflow. The local input files and the local output files created during workflow execution are stored in the File Storage. The Workflow Interpreter is responsible for the execution of workflows, which are stored in the Workflow Storage. The Information System holds information for users about running workflows and job status. Users of WS-PGRADE gateways work in isolated workspaces, i.e., they see only their own workflows. To enable collaboration among the isolated users, the Application Repository stores WS-PGRADE workflows in one of five possible stages. (Physically, all five categories are stored as zip files.) The five categories of stored workflows are as follows, and collaboration among gateway users is possible via all of them:

    Graph (or abstract workflow): contains information only on the graph structure of the workflow.

    Workflow (or concrete workflow): contains information both on the graph structure and on the configuration parameters (input files pointers, output files pointers, executable code, and target DCI of workflow nodes).

    Template: a workflow that records, for every modifiable parameter of the workflow, whether users may change it or not. Templates play an important role in the automatic generation of executable workflows in the end-user mode of a WS-PGRADE/gUSE gateway (Sect. 2.6).

    Application: a ready-to-use workflow that contains all the embedded workflows, too. This means that all the information needed to execute this workflow application is stored in the corresponding zip file.

    Project: a workflow that is not completed yet and can be further developed by the person who uploaded it into the Application Repository or by another person (so collaborative workflow development among several workflow developers is supported in this way).
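The template category is the one that drives end-user mode, so a small model may help: each parameter carries a value and a flag telling whether users may change it, the modifiable parameters are the ones an end-user form would expose, and instantiation rejects changes to fixed parameters. This is a hypothetical sketch of the idea only; the Template class, its methods, and the sample parameter names are illustrative assumptions, not the gUSE data model or storage format.

```python
from dataclasses import dataclass, field


@dataclass
class Template:
    """Hypothetical model: parameter name -> (default value, modifiable flag)."""
    name: str
    params: dict[str, tuple[str, bool]] = field(default_factory=dict)

    def end_user_fields(self) -> list[str]:
        """Parameters an end-user form would expose (modifiable ones only)."""
        return [k for k, (_, modifiable) in self.params.items() if modifiable]

    def instantiate(self, overrides: dict[str, str]) -> dict[str, str]:
        """Build a concrete parameter set, rejecting changes to fixed parameters."""
        values = {k: v for k, (v, _) in self.params.items()}
        for key, value in overrides.items():
            if key not in self.params:
                raise KeyError(f"unknown parameter {key!r}")
            if not self.params[key][1]:
                raise PermissionError(f"parameter {key!r} is fixed by the template")
            values[key] = value
        return values


tpl = Template("sample", {
    "executable": ("run.sh", False),      # fixed by the workflow developer
    "target_dci": ("cluster-A", False),   # fixed by the workflow developer
    "input_file": ("default.dat", True),  # end-users may change this
})
print(tpl.end_user_fields())                                   # ['input_file']
print(tpl.instantiate({"input_file": "mine.dat"})["input_file"])  # mine.dat
```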

    At the top of the three-tier structure, the presentation tier provides WS-PGRADE, the graphical user interface of the generic SG framework. All functionalities of the underlying services are exposed to the users by portlets residing in a Liferay portlet container, which is part of WS-PGRADE. This layer can be easily customized and extended according to the needs of the SG instances to be derived from gUSE. The next section introduces the essential user-level elements of WS-PGRADE.

    2.5 Introduction to WS-PGRADE

    Most users of gUSE come into contact with the WS-PGRADE portal interface. The WS-PGRADE portal is a Liferay technology-based web portal of gUSE. It can be accessed via the major modern web browsers, such as Chrome and Firefox.


    2.5.1 User Roles

    A member of a gUSE community can be a power user or an end-user in the WS-PGRADE portal environment. The power user, or in other words the workflow developer, develops workflows for the end-user scientists (chemists, biologists, etc.). The power user understands the usage of the underlying DCI and is able to develop complex workflows. This activity requires editing, configuring, and running workflows in the underlying DCI as well as monitoring and testing their execution in the DCIs. To support the work of these power users, WS-PGRADE provides a GUI through which all the required activities of developing workflows are supported. When a workflow is developed for end-user scientists, it should be uploaded to a repository from which scientists can download and execute it. To support this interaction between power users and end-users, gUSE provides the earlier-mentioned Application Repository service in the gUSE services tier, and power users can upload and publish their workflows for end-users via this repository.

    The end-user scientists are generally aware neither of the features of the underlying DCI nor of the structure of the workflows that realize the type of applications they have to run in the DCI(s). For these users, WS-PGRADE provides a simplified end-user GUI where the available functionalities are limited. Typically, end-user scientists can download workflows from the Application Repository, parameterize them, and execute them on the DCI(s) for which these workflows were configured to run. They can also monitor the progress of the running workflows via a simplified monitoring view. Any user of WS-PGRADE can log in to the portal either as a power user or as an end-user, and according to this login she/he sees either the developer view or the end-user view of WS-PGRADE.

    2.5.2 The Three-Phase Process of Workflow Development

    The WS-PGRADE power users (workflow developers) typically perform a three-phase operation sequence (workflow editing, workflow configuration, and workflow execution) as shown in Fig. 2.2. This sequence covers the life-cycle of a workflow, which is the following:

    1. During the editing phase, the user creates the abstract graph of the workflow.
    2. In the configuring phase, the executable, the input/output files, and the target DCI of the workflow nodes (the atomic execution units of the workflow) are specified.
    3. Finally, in the submitting phase, the workflow is submitted, resulting in a workflow instance.
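The three phases can be read as three increasingly specific objects: a bare graph, a graph with per-node configuration, and an instance created by each submission (the text notes further on that a concrete workflow can be submitted several times, each time producing a new instance). The sketch below models only that progression; every class, field, and function name in it is a hypothetical illustration, not part of the WS-PGRADE/gUSE API.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class AbstractWorkflow:
    """Edit phase: only the graph (job names and channels between jobs)."""
    jobs: tuple[str, ...]
    channels: tuple[tuple[str, str], ...]  # (producer job, consumer job)


@dataclass
class ConcreteWorkflow:
    """Configure phase: the graph plus an executable and target DCI per job."""
    graph: AbstractWorkflow
    config: dict[str, dict[str, str]]  # job -> {"executable": ..., "dci": ...}

    def unconfigured_jobs(self) -> list[str]:
        return [job for job in self.graph.jobs if job not in self.config]


@dataclass
class WorkflowInstance:
    """Submit phase: one object per submission of a concrete workflow."""
    workflow: ConcreteWorkflow
    instance_id: int


_next_id = itertools.count(1)  # hypothetical source of instance ids


def submit(wf: ConcreteWorkflow) -> WorkflowInstance:
    missing = wf.unconfigured_jobs()
    if missing:
        raise ValueError(f"cannot submit, unconfigured jobs: {missing}")
    return WorkflowInstance(wf, next(_next_id))


graph = AbstractWorkflow(jobs=("gen", "call"), channels=(("gen", "call"),))
wf = ConcreteWorkflow(graph, {"gen": {"executable": "gen.sh", "dci": "pbs"},
                              "call": {"executable": "call.sh", "dci": "boinc"}})
first, second = submit(wf), submit(wf)
print(first.instance_id, second.instance_id)  # 1 2
```

The same AbstractWorkflow object could be wrapped by several ConcreteWorkflow objects with different executables or DCIs, which mirrors how one graph can yield several concrete workflows.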

    The following sections describe in detail what happens in the three phases.


    2.5.2.1 The Editing Phase: Creation of the Workflow Graph

    The users construct their abstract workflows in this phase. In practice, this phase covers the creation of the workflow graph with the interactive, online graphical workflow designer and visualizer tool, the Graph Editor of WS-PGRADE (Fig. 2.3). The structure of WS-PGRADE workflows is represented by directed acyclic graphs (DAGs), as shown in Fig. 2.3. The DAG-based structure is the static skeleton of a workflow in WS-PGRADE. The nodes of the graph are abstract representations of jobs (or service calls). Each job must have a name, and job names are unique within a given workflow. A job communicates with other jobs of the workflow through input and output ports. A connection from an output port of one job to an input port of a different job is called a channel. Channels are directed edges of the graph, directed from output ports toward input ports. A single port must be either an input or an output port of a given job.
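The structural rules in this paragraph (unique job names, ports that are either input or output, channels from an output port to an input port of a different job, no cycles) can be checked mechanically. The sketch below does so and, using Kahn's topological sort, returns one order in which the jobs could run. The data layout (a dict of port sets per job and 4-tuples for channels) and the function name are assumptions made for illustration; they are not the Graph Editor's internal representation.

```python
from collections import deque

Ports = dict[str, set[str]]          # {"in": {...}, "out": {...}}
Channel = tuple[str, str, str, str]  # (src_job, out_port, dst_job, in_port)


def check_workflow_graph(jobs: dict[str, Ports], channels: list[Channel]) -> list[str]:
    """Validate the graph rules and return one valid execution order.

    Job names are dict keys, so they are unique by construction.
    """
    for name, ports in jobs.items():
        if ports["in"] & ports["out"]:
            raise ValueError(f"{name}: a port cannot be both input and output")

    successors = {name: set() for name in jobs}
    indegree = {name: 0 for name in jobs}
    for src, out_port, dst, in_port in channels:
        if src not in jobs or dst not in jobs or src == dst:
            raise ValueError(f"channel must join two different existing jobs: {src}->{dst}")
        if out_port not in jobs[src]["out"] or in_port not in jobs[dst]["in"]:
            raise ValueError(f"channel {src}.{out_port}->{dst}.{in_port} "
                             "must run from an output port to an input port")
        if dst not in successors[src]:
            successors[src].add(dst)
            indegree[dst] += 1

    # Kahn's topological sort: jobs left unprocessed indicate a cycle.
    ready = deque(sorted(n for n, d in indegree.items() if d == 0))
    order = []
    while ready:
        job = ready.popleft()
        order.append(job)
        for nxt in sorted(successors[job]):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(jobs):
        raise ValueError("the workflow graph contains a cycle, so it is not a DAG")
    return order


jobs = {"gen": {"in": set(), "out": {"o"}},
        "call": {"in": {"i"}, "out": {"o"}},
        "coll": {"in": {"i"}, "out": set()}}
print(check_workflow_graph(jobs, [("gen", "o", "call", "i"), ("call", "o", "coll", "i")]))
# ['gen', 'call', 'coll']
```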

    A job in a workflow may have single and parametric input ports (these are specified in the next phase of workflow development, the configuring phase, when a concrete workflow is defined from the abstract workflow). If a node has only single input ports, it is executed only once, as a single instance processing the single inputs of every input port. Such nodes are called normal nodes. If a node has at least one parametric input port, it is called a parametric node. If a parametric node has one parametric input port, it will be executed in as many instances as the number of files that arrive on the parametric input port (Manual 2014).
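The instance-count rule stated here fits in a few lines. The sketch covers only the two cases the paragraph describes (no parametric port, or exactly one); how WS-PGRADE combines files when a node has several parametric ports is not described in this section, so the sketch deliberately refuses that case instead of guessing. The function name and the port encoding are hypothetical.

```python
def instance_count(input_ports: dict[str, tuple[bool, list[str]]]) -> int:
    """Number of job instances a node runs.

    input_ports: port name -> (is_parametric, files arriving on that port)
    Only the cases described in the text are handled.
    """
    parametric = [files for is_param, files in input_ports.values() if is_param]
    if not parametric:
        return 1                   # normal node: one instance uses the single inputs
    if len(parametric) == 1:
        return len(parametric[0])  # one instance per file on the parametric port
    raise NotImplementedError("several parametric ports: rule not covered here")


print(instance_count({"cfg": (False, ["a.cfg"])}))                         # 1
print(instance_count({"cfg": (False, ["a.cfg"]),
                      "data": (True, ["f1.dat", "f2.dat", "f3.dat"])}))    # 3
```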

    Fig. 2.2 The three generic workflow development phases in WS-PGRADE


    A special but widely used workflow type also supported by WS-PGRADE/gUSE is the so-called parameter sweep or parameter study (PS) workflow, which is typically used for simulations where the same simulated workflow should be executed with many different input sets. DCIs are ideal for PS executions, and therefore performing such PS workflows is their most frequent usage scenario.

    A typical PS workflow contains three nodes (jobs) as shown in Fig. 2.4:

    1. the generator job generates the necessary parameter set;
    2. the parametric job (the call job in the example of Fig. 2.4) executes a specific application in as many instances as there were outputs generated by the generator job; and
    3. the collector job collects and processes the results of the parametric job (for example, by creating statistics based on the results of the different executions).

    If the output port of a generator job is connected to the parametric input port of a parametric job, then this parametric job will be executed for every file generated by the generator job.
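The generator, call, and collector pattern can be simulated locally to show the data flow: the generator emits N values, the call job runs once per value, and the collector reduces all N results to statistics. In this sketch the three functions are stand-ins for real executables, the squaring computation is an arbitrary example, and a thread pool merely stands in for independent DCI resources; none of this reflects how gUSE actually schedules the instances.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def generator(n: int) -> list[float]:
    """Generator job: produce one parameter value per call-job instance."""
    return [0.5 * i for i in range(1, n + 1)]


def call_job(x: float) -> float:
    """Parametric (call) job: stand-in for the real application run on a DCI."""
    return x * x


def collector(results: list[float]) -> dict[str, float]:
    """Collector job: aggregate the outputs of all call-job instances."""
    return {"runs": len(results), "mean": mean(results), "max": max(results)}


params = generator(4)  # [0.5, 1.0, 1.5, 2.0]
# One call-job instance per generated value; the instances are independent,
# so they can run concurrently (threads stand in for DCI resources here).
with ThreadPoolExecutor() as pool:
    results = list(pool.map(call_job, params))
print(collector(results))  # {'runs': 4, 'mean': 1.875, 'max': 4.0}
```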

    Another useful characteristic of WS-PGRADE workflows is the possibility to embed workflows into workflow nodes. Thus, instead of running, for example, an executable inside a workflow node, another WS-PGRADE workflow may run inside the parent workflow node. To embed workflows, users need to use workflows created from so-called templates. A template is a generic workflow where some configuration parameters are fixed. It can serve as a base for creating the definitions of new workflows.⁶
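Workflow embedding is essentially a recursive structure: a node's body is either an executable or a whole workflow. The sketch below shows that recursion with a simplified model in which a workflow is a linear chain of nodes and an executable is a Python callable; real WS-PGRADE workflows are DAGs and embedding goes through templates, which this hypothetical model omits.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Workflow:
    """Simplified workflow: a linear chain of nodes (real ones are DAGs)."""
    nodes: list["Node"]

    def run(self, data):
        for node in self.nodes:
            data = node.run(data)
        return data


@dataclass
class Node:
    name: str
    body: Union[Callable, Workflow]  # an executable stand-in or an embedded workflow

    def run(self, data):
        if isinstance(self.body, Workflow):
            return self.body.run(data)  # the child workflow runs inside this node
        return self.body(data)


inner = Workflow([Node("double", lambda x: 2 * x), Node("inc", lambda x: x + 1)])
outer = Workflow([Node("prep", lambda x: x + 3), Node("embedded", inner)])
print(outer.run(1))  # (1 + 3) * 2 + 1 = 9
```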

    Fig. 2.3 Directed acyclic graph-based structure of a sample workflow in the WS-PGRADE graph editor

    6 The gUSE workflow concept is discussed in detail in Chap. 3.



    2.5.2.2 The Configuring Phase: Setting of Workflow Nodes

    The abstract workflow created in the first (editing) phase represents only the structure (graph) of a workflow; the semantics of the nodes are not defined yet. The abstract workflow can be used to generate various concrete workflows in the configuring phase.

    Concrete workflows are derived from abstract workflows by exactly specifying the workflow nodes and the DCIs where the various nodes should be executed. The concrete workflows generated from a given graph can differ in the semantics of the workflow nodes and in the input and output files associated with the ports.

    The node configuration includes:

    algorithm configuration: determines the functionality of a node;
    target DCI resource configuration: determines where this activity will be executed;
    port configuration: determines what input data the job needs and how the result(s) will be forwarded to the user or to other jobs as inputs.

    A typical node configuration in WS-PGRADE contains the following generic functions (see also Fig. 2.5):

    1. The properties of the node can be defined by the Job executable configuration function (see Fig. 2.5).

    2. The Port configuration function helps users to define the file arguments of the required calculation. Each port configuration entry belonging to the current node is listed and can be made visible.

    3. By using the JDL/RSL function, users can add or remove ads. An ad is removed by associating an empty string with the selected key.

    4. The state and history of the node configuration can be checked by the Job configuration history function.

    5. The error-free state of the configuration can be checked by the Info function.
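The node-configuration functions above map naturally onto a small record with a few operations: executable and target DCI, port-to-file bindings, JDL/RSL ads with the "empty string removes the ad" rule, a history, and an Info-style check. The sketch is a hypothetical model for illustration; the field names, the checks performed by info(), and the sample values are assumptions, not the WS-PGRADE portlet's behaviour or storage format.

```python
from dataclasses import dataclass, field


@dataclass
class NodeConfig:
    """Hypothetical per-node configuration mirroring the listed functions."""
    executable: str = ""                                 # Job executable configuration
    target_dci: str = ""                                 # where the job will run
    ports: dict[str, str] = field(default_factory=dict)  # port name -> file reference
    ads: dict[str, str] = field(default_factory=dict)    # JDL/RSL key-value ads
    history: list[str] = field(default_factory=list)     # Job configuration history

    def set_ad(self, key: str, value: str) -> None:
        """Add an ad; associating an empty string with a key removes it."""
        if value == "":
            self.ads.pop(key, None)
            self.history.append(f"removed ad {key}")
        else:
            self.ads[key] = value
            self.history.append(f"set ad {key}={value}")

    def info(self) -> list[str]:
        """Return configuration problems; an empty list means error-free."""
        problems = []
        if not self.executable:
            problems.append("no executable set")
        if not self.target_dci:
            problems.append("no target DCI set")
        problems += [f"port {p!r} has no file" for p, f in self.ports.items() if not f]
        return problems


cfg = NodeConfig(executable="sim.sh", target_dci="cluster-A", ports={"in0": ""})
cfg.set_ad("StdOutput", "std.out")
cfg.set_ad("StdOutput", "")  # empty string: the ad is removed
print(cfg.ads, cfg.info())   # {} ["port 'in0' has no file"]
```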

    Users can also define breakpoints for every node during workflow configuration in order to control the runtime execution of the created job instances. All instances of a node marked by a breakpoint can be tracked in the job submission process (Manual 2014).

    Fig. 2.4 A graph of a sample parameter study (PS) workflow in WS-PGRADE


    2.5.2.3 The Submitting Phase: Workflow Execution

    After all the properties of the workflow have been set, it can be submitted, resulting in an instance of the workflow. A concrete workflow can be submitted several times (for example, for performance measurements), and every submission results in a new instance of the same concrete workflow.

    The execution of a workflow instance is data driven. The order of execution is determined by the graph structure: a node is activated (the associated job is submitted or the associated service is called) when the required input data elements (usually a file or a set of files) become available at every input port of the node. This node execution is represented as an instance of the created job or service call. One node can be activated with several input sets, and each activation results in a new job or service call instance. The job or service call instances also contain status information, and in the case of successful termination the results of the calculation are represented in the form of data entities associated with the output ports of the corresponding node.
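The data-driven activation rule can be sketched as a loop that repeatedly fires any job whose input ports all hold data and then forwards that job's outputs along its outgoing channels. This is a deliberately simplified model, not the gUSE Workflow Interpreter: each node is activated at most once (the multiple-input-set case is omitted), jobs are Python callables, and the names and data layout are hypothetical. Note that the declaration order of the jobs does not matter; the availability of data alone determines the execution order.

```python
from typing import Callable

Job = tuple[list[str], Callable[[dict], dict]]  # (input ports, job function)


def run_data_driven(jobs: dict[str, Job],
                    channels: list[tuple[str, str, str, str]],
                    user_inputs: dict[tuple[str, str], object]):
    """Activate each job once data is present on every one of its input ports.

    Simplification: each node is activated at most once (one input set).
    channels:    (src_job, out_port, dst_job, in_port)
    user_inputs: (job, in_port) -> data supplied when the workflow is submitted
    """
    available = dict(user_inputs)   # (job, in_port) -> data that has arrived
    outputs: dict[str, dict] = {}   # job -> {out_port: data}
    order: list[str] = []
    progress = True
    while progress:
        progress = False
        for name, (in_ports, func) in jobs.items():
            if name in outputs or not all((name, p) in available for p in in_ports):
                continue            # already ran, or some input is still missing
            outputs[name] = func({p: available[(name, p)] for p in in_ports})
            order.append(name)
            progress = True
            for src, out_port, dst, in_port in channels:
                if src == name:     # forward results along outgoing channels
                    available[(dst, in_port)] = outputs[name][out_port]
    return order, outputs


jobs = {
    "sum":  (["a", "b"], lambda d: {"out": d["a"] + d["b"]}),
    "load": ([], lambda d: {"out": 5}),
    "sq":   (["x"], lambda d: {"out": d["x"] ** 2}),
}
channels = [("load", "out", "sum", "b"), ("sum", "out", "sq", "x")]
order, outputs = run_data_driven(jobs, channels, {("sum", "a"): 2})
print(order)  # ['load', 'sum', 'sq']
```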

    A typical submission scenario contains three main parts (Fig. 2.6): starting the submission, monitoring it, and obtaining the submission result:

    1. A workflow submission can be started by clicking on the Submit button.
    2. Monitoring and observing the submission: the progress of the workflow instance submission can be checked by the Details function. The result is the list of the

    Fig. 2.5 The main elements of the configuration phase in W