Chapter 19 - STFC Science Testbed


  • 7/31/2019 Chapter 19 - STFC Science Testbed


    Chapter 19 STFC Science Testbed

    Background

    For the STFC testbeds a methodology was developed in response to the challenge of digital preservation. This challenge lies in the need to preserve not only the dataset itself but also the ability it has to deliver knowledge to a future user community. The preservation objective is defined by the knowledge that a dataset is capable of imparting to any future designated user community and has a profound impact on the required preservation actions an archive must carry out.

    We sought to incorporate a number of analysis techniques, tools and methods into an overall process capable of producing an actionable preservation plan for scientific data archives. The Implementation Plans

    19.1 Dataset Selection

    Several datasets are used in four scenarios in order to illustrate a number of important points. The datasets come from the archives located in STFC and were acquired from instruments in other locations, as illustrated in Fig. 19.1. For the study we used data from the MST radar in Wales (Fig. 19.2) and Ionosonde data from many stations around the world.

    19.2 Challenges Addressed

    The challenges addressed are that the physical phenomena about which the data is being collected are complex and specialist knowledge is needed to use the data. Moreover the data is in specialised formats and needs specialised software in order to access it. Therefore the risks to the preservation of this data include:

    D. Giaretta, Advanced Digital Preservation, DOI 10.1007/978-3-642-16809-3_19, © Springer-Verlag Berlin Heidelberg 2011


    Fig. 19.1 Examples of acquiring scientific data

    The MST Radar at Capel Dewi near Aberystwyth is the UK's most powerful and versatile wind-profiling instrument. Data can currently be accessed via the British Atmospheric Data Centre. It is a 46.5 MHz pulsed Doppler radar ideally suited for studies of atmospheric winds, waves and turbulence. It is run predominantly in the ST mode (approximately 2-20 km altitude) for which MST radars are unique in their ability to give continuous measurements of the three-dimensional wind vector at high resolution (typically 2-3 min in time and 300 m in altitude).

    Fig. 19.2 MST radar site


    the risk to the continued ability of users to understand and use it, especially since intimate knowledge of the instruments is needed, and, as we will see, this knowledge is not widespread. Much is contained in Web sites, which are probably ephemeral.

    the likelihood that the software currently used to access and analyse the data will not be supported in the long term

    the provenance of the data depends on what is in the head of the repository manager

    the funding of the archives is by no means guaranteed and yet, because much knowledge is linked to key personnel, there is a risk that it will not be possible to hand over the data/information holdings fully to another archive.

    19.3 Preservation Aims

    After discussion with the archive managers and scientists it was agreed that the preservation aims should be to preserve the ability of users to extract from the data a number of key measurements, and to understand them in sufficient detail to use them in scientific analyses.

    The knowledge base of the Designated Community will be somewhat lower than that of the experts, but will still include a broad disciplinary understanding of the subject.

    In order to be in a position to hand over its holdings, some example AIPs must be constructed. Note that we do not attempt to construct AIPs for the whole archive; nevertheless the Representation Information and PDI we capture are applicable to most of the individual datasets. With the ability to create AIPs, the archive would be in a position to hand over its holdings to the next in the chain of preservation if and when this is necessary.

    19.4 Preservation Analysis

    We structure the analysis of the detailed work around constructing the AIP. A number of strategies were considered. Of those eliminated it is worth mentioning that emulation was not regarded as useful by the archive scientists because it restricted the ways in which they could use the data. Similarly transformation of the data might be an option in future but only when other options became too difficult. In order to understand this, a preservation risk analysis was conducted which allows the archive managers to assess when this point is likely to arrive.

    19.5 MST RADAR Scenarios

    Four scenarios are detailed here, for two different instruments. In the interests of brevity we list the actions carried out in each scenario, including where appropriate the use of the Key Components and toolkits.


    19.5.1 STFC1: Simple Scenario MST

    A user from a future designated user community should be able to extract the following information from the data for a given altitude and time

    - Horizontal wind speed and direction
    - Wind shear
    - Signal Velocity
    - Signal Power
    - Aspect Correlated Spectral Width

    MST1.1

    An example of data set specific plotting and analysis programs for the MST would be the MST GNU plot software. This software plots Cartesian product of wind profiles from NetCDF data files. This software was developed by the project scientist due to specialised visualization requirements where finer definition of colour and font was needed than that provided by generic tools. Preservation risks are due to the following user skill requirements and technical dependencies.

    - UNIX http://www.unix.org/ or Linux distribution
    - The user must be able to install python http://www.python.org/ with python-dev module installed with numpy array package and pycdf
    - GNU plot to be installed http://www.gnuplot.info/docs/gnuplot.html and a user must be able to set environmental variables
    - The ability to run required python scripts through a UNIX command line
    - GNU plot template file to format plot output.
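    The dependencies listed above could be checked on a candidate platform with a short script. The following is only a minimal sketch: the module and executable names are assumptions taken from the list, and the function name is hypothetical rather than part of the MST software.

```python
import importlib.util
import shutil

# Names are assumptions based on the dependency list; adjust them to
# whatever the archived plotting scripts actually import and call.
REQUIRED_MODULES = ["numpy", "pycdf"]
REQUIRED_EXECUTABLES = ["python", "gnuplot"]


def missing_dependencies(modules=REQUIRED_MODULES, executables=REQUIRED_EXECUTABLES):
    """Return the Python modules and executables that cannot be found."""
    missing = []
    for name in modules:
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(name) is None:
            missing.append("module:" + name)
    for name in executables:
        # shutil.which searches the directories listed in PATH
        if shutil.which(name) is None:
            missing.append("executable:" + name)
    return missing


for item in missing_dependencies():
    print("missing", item)
```

    A non-empty result indicates which of the skill and platform requirements would have to be met, or documented as Representation Information, before the plotting software could be run.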

    A number of preservation strategies presented themselves

    Emulation Strategy

    One solution is preserving the software through emulation, for example Dioscuri http://dioscuri.sourceforge.net/faq.html. Current work with the PLANETS project http://www.planets-project.eu/news/?id=1190708180 will make Dioscuri capable of running operating systems such as Linux Ubuntu, which should satisfy platform dependencies. With the capture of specified software packages/libraries and the provision of all necessary user instructions this would become a viable strategy.

    Conversion Strategy

    It is additionally possible to convert NetCDF files to another compatible format such as NASA AMES http://badc.nerc.ac.uk/help/formats/NASA-Ames/. We were able to achieve this conversion using the community developed software Nappy


    http://home.badc.rl.ac.uk/astephens/software/nappy/, CDAT http://www2-pcmdi.llnl.gov/cdat and Python. This is a compatible, self-describing ASCII format, so the information should still be accessible and easily understood as long as ASCII-encoded text can still be read. There would however be reluctance to do this, as NASA AMES files are not as easily manipulated, making it more cumbersome to analyse data in the desired manner.

    Preservation by Addition of Representation Information Strategy

    An alternate strategy is to gather documentation relating to the NetCDF file format which contains adequate information for future users to extract the required parameters from the NetCDF file. Currently this information can be found in the BADC support pages on NetCDF http://badc.nerc.ac.uk/help/formats/NetCDF/ which can be archived using the HTTrack tool or adequately referenced. These pages suggest some useful generic software a future user may wish to utilize. If these pages are no longer available or the software is unusable, a user can consult documents from the NetCDF documentation and libraries from Unidata http://www.unidata.ucar.edu/software/NetCDF/docs/. This means that if the future user community still has skills in FORTRAN, C, C++, Python or Java they will be able to easily write software to access the required parameters.

    The BADC decided to opt for the following strategies

    - Referencing BADC support
    - Referencing Unidata support
    - Crystallising out RepInfo from UNIDATA doc library to allow developers to write or extend their own software in the following languages: JAVA, C++, FORTRAN 77, Python

    MST1.2

    The GAP manager can be used to identify a NetCDF file as at risk when BADC or UNIDATA support goes away due to a variety of technical or organisational reasons. This can now be replaced with other RepInfo from the registry repository, which we will take from the NetCDF document library at Unidata, whose longevity is not guaranteed http://www.unidata.ucar.edu/software/netcdf/docs/. We will use this documentation and the real-life BADC user survey to create different designated community profiles with the GAP manager. This will show how we can satisfy the needs of different communities of C++, Fortran, Python and Java programmers who wish to use the data.


    MST1.3

    We explored what is good about NetCDF standardisation and showed that CASPAR supports it by archiving the CF standard name list, monitoring it and using POM to send notification of changes, thereby supporting the semantic integrity of the data.

    NetCDF (network Common Data Form) is an interface for array-orientated data access and a library that provides an implementation of that interface. NetCDF is used extensively in the atmospheric and oceanic science communities. It is a preferred file format of the British Atmospheric Data Centre, which currently provides access to the data. The NetCDF software was developed at the Unidata Program Center in Boulder, Colorado, USA http://www.unidata.ucar.edu/. NetCDF facilitates preservation for the following reasons

    - NetCDF is a portable, self-describing binary data format so is ideal for capture of provenance, descriptive and semantic information.
    - NetCDF is network-transparent, meaning that it can be accessed by computers that store integers, characters and floating-point numbers in different ways. This provides some protection against technology obsolescence.
    - NetCDF datasets can be read and written in a number of languages; these include C, C++, FORTRAN, IDL, Python, Perl, and Java. The spread of languages capable of reading these datasets ensures greater longevity of access because as one language becomes obsolete the community can move to another.
    - The different language implementations are freely available from the UNIDATA Center, and NetCDF is completely and methodically documented in UNIDATA's NetCDF User's Guide, making capture of necessary representation information a relatively easy, low-cost option.
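    One small illustration of the format's portability is that a NetCDF file announces its own variant in its first few bytes, so it can be identified without any NetCDF library. The sketch below uses only the Python standard library; the function name is hypothetical, and it assumes the HDF5 signature sits at the very start of the file, which is the usual case for netCDF-4 files.

```python
def netcdf_variant(path):
    """Identify the on-disk NetCDF variant from the first bytes of a file."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    # Classic-model files start with the characters "CDF" and a version byte.
    if len(head) >= 4 and head[:3] == b"CDF":
        versions = {1: "classic", 2: "64-bit offset", 5: "64-bit data (CDF-5)"}
        return versions.get(head[3], "unknown CDF version")
    # netCDF-4 files are HDF5 files, which start with the HDF5 signature.
    # Any HDF5 file matches this test, not only netCDF-4 ones.
    if head == b"\x89HDF\r\n\x1a\n":
        return "HDF5 container (possibly netCDF-4)"
    return "not recognised as NetCDF"
```

    A check of this kind could be recorded alongside the Physical State of a data file, so that a future user knows which part of the format documentation applies.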

    Several groups have defined conventions for NetCDF files, to enable the exchange of data. BADC has adopted the Climate and Forecasting (CF) conventions for NetCDF data and has created standard names.

    CF conventions are guidelines and recommendations as to where to put information within a NetCDF file, and they provide advice as to what type of information you might want to include. CF conventions allow the creator of the dataset to include representation information and preservation description information in a structured way. Global attributes describe the general properties and origins of the dataset, capturing vital provenance and descriptive information, while local attributes are used.
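    The distinction between global and local (per-variable) attributes can be sketched as a simple completeness check. The attribute names below are ones the CF conventions recommend, but the dictionary is only a stand-in for a real NetCDF file and its contents are invented for illustration; the check itself is a simplification, not a CF compliance test.

```python
# Global attributes recommended by the CF conventions for describing a file.
RECOMMENDED_GLOBAL = ["Conventions", "title", "institution", "source",
                      "history", "references", "comment"]


def cf_attribute_gaps(dataset):
    """List (owner, attribute) pairs that are absent from the dataset model."""
    gaps = [("global", name) for name in RECOMMENDED_GLOBAL
            if name not in dataset.get("global", {})]
    for var_name, attrs in dataset.get("variables", {}).items():
        if "units" not in attrs:
            gaps.append((var_name, "units"))
        # A variable should be described by a standard_name or a long_name.
        if "standard_name" not in attrs and "long_name" not in attrs:
            gaps.append((var_name, "standard_name or long_name"))
    return gaps


# Hypothetical example: values are illustrative, not taken from MST data.
example = {
    "global": {"Conventions": "CF-1.0", "title": "Example wind profile",
               "institution": "Example institution", "source": "radar",
               "history": "created for illustration"},
    "variables": {
        "eastward_wind": {"standard_name": "eastward_wind", "units": "m s-1"},
        "signal_power": {},
    },
}

print(cf_attribute_gaps(example))
```

    Gaps reported against the global attributes point to missing provenance and descriptive information; gaps reported against a variable point to missing semantic information about that measurement.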

    MST1.5

    Archive the MST support website, carry out an assessment of its constituent elements and use the Registry Repository to add basic information on HTML, Word, PDF, JPEG, PNG and PostScript to facilitate preservation of a simple static website. Much additional valuable provenance information has also been recorded in the MST radar support website. Selected pages or the entire site could be archived as Preservation Description Information.


    Fig. 19.3 STFC MST website

    The MST website is currently located at http://mst.nerc.ac.uk (Fig. 19.3). Due to the site's simple structure, which consists of a set of static pages and common file types, it would be a relatively simple operation to run a web archiving tool such as HTTrack (http://www.httrack.com/) to copy the website and add additional RepInfo on HTML, PDF, MS Word and JPEG from the DCC Registry Repository of Representation Information (RRORI). HTTrack is only one of a range of web-archiving tools which are freely available and require minimal skill to operate. However it is worth noting that it is only by virtue of the technical simplicity of the site that it is so relatively easy to archive and preserve.

    MST1.6

    The PACK component was used to create the AIP and add a checksum to it, maintaining the existing directory structure of the data files.
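    The idea of adding checksums while keeping the directory structure can be sketched in a few lines. This is not the PACK component and makes no claim about how PACK works; it is a generic illustration using the Python standard library, with SHA-256 chosen as an assumption.

```python
import hashlib
from pathlib import Path


def checksum_manifest(root, algorithm="sha256"):
    """Map each file's path, relative to root, to the hex digest of its content."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.new(algorithm)
            with path.open("rb") as handle:
                # Read in 1 MiB chunks so large data files are not held in memory.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            # Relative POSIX-style keys preserve the existing directory layout.
            manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest
```

    Recomputing the manifest at a later date and comparing it with the stored one shows whether any file has been altered, added or lost.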

    MST 1.7

    The current directory structure is logical and well thought out. This should be maintained in the AIP package. Details of archiving conventions are recorded in the MST website http://mst.nerc.ac.uk/archiving_conventions.html which will need to be altered by the removal of the BADC from the top of the directory hierarchy structure to avoid confusion.

    /badc/dataset-name/data/data-type-name/YYYY/MM/DD/
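    The directory convention above, with and without the leading BADC level, can be expressed as a small helper. This is a sketch only: the function name and the example dataset and data-type names are hypothetical.

```python
from datetime import date
from pathlib import PurePosixPath


def archive_directory(dataset_name, data_type_name, day, top_level=None):
    """Build dataset-name/data/data-type-name/YYYY/MM/DD under an optional top level."""
    parts = [dataset_name, "data", data_type_name,
             f"{day:%Y}", f"{day:%m}", f"{day:%d}"]
    if top_level:
        parts.insert(0, top_level)
    return PurePosixPath("/", *parts)


# Hypothetical names, shown first as held at the BADC and then as in the AIP.
print(archive_directory("mst", "cartesian", date(2007, 1, 2), top_level="badc"))
print(archive_directory("mst", "cartesian", date(2007, 1, 2)))
```

    Dropping the top level changes only the first path component, so the rest of the hierarchy described in the archiving conventions remains valid inside the AIP.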

    19.5.1.1 Preservation Information Network Model for MST Simple Solution

    A preservation information network model (Fig. 19.4) is a representation of the digital objects, operations and relationships which allow a preservation objective to be met for a future designated community. The model provides a sharable, stable and organized structure for digital objects and their associated requirements. The model also directs the capture and description of digital objects which need to be packaged and stored within an OAIS compliant Archival Information Package.

    [Figure 19.4 node labels: MST NetCDF Cartesian; 1.1 Description directory structure; 1.2 MST website; 1.2.1 Description and provenance; 1.2.2 Instruction for running static website; 1.2.3 UK web archiving consortium; 1.2.3.4; 1.2.3.4.1 Word 97; 1.2.3.4.2 JPEG; 1.2.3.4.3 PDF; 1.2.3.4.4 PNG; 1.2.3.4.5 HTML 4.0; 1.3; 1.3.1 Reference BADC help on NetCDF; 1.3.2 Reference UNIDATA help on NetCDF; 1.3.3 NetCDF tutorial for Developers; 1.3.3.1 Java libraries, API, manual and instructions for developers; 1.3.3.3 Python libraries, API, manual and instructions for developers; 1.3.3.3 C++ libraries, API, manual and instructions for developers; 1.3.3.4 FORTRAN libraries, API, manual and instructions for developers; 1.4 Climate Forecast Standard Terms; 1.4.1 XML; 1.4.1.1 PDF; unattached label: reference to standard]

    Fig. 19.4 Preservation information network model for MST-simple solution

    19.5.1.2 Components of a Preservation Network Model

    Preservation network modelling has many similarities to classic conceptual modelling approaches such as Entity-Relationship or Class diagrams, as it is based upon the idea of making statements about resources. The preservation network model consists of two components: the digital objects and the relationships between them.

    Objects are uniquely identified digital entities capable of an independent existence which possess the following attributes

    Information is a description of the key information contained by the digital object. This information should have been identified during preservation analysis as being the information required to satisfy the preservation objective for the designated user community.

    Location information is the information required by the end user to physically locate and retrieve the object. AIPs may be logical in construction, with key digital objects being distributed and managed within different information systems. This tends to be the case when data is in active use with resources evolving in a dynamic environment.


    Physical State describes the form of the digital object. It should contain sufficient information relating to the version, variant, instance and dependencies.

    Risks: most digital solutions will have inherent risks and a finite lifespan. These include risks such as interpretability of information, technical dependencies or loss of designated community skills. Risks should be recorded against the appropriate object so they can be monitored and the implications of them being realised assessed.

    Termination of the network occurs when a user requires no additional information or assistance to achieve the defined preservation objective, given that the accepted risks will not be imminently realised.

    Relationship captures how two objects are related to one another in order to fulfil the specified preservation objective whilst being utilized by a member of the designated user community.

    Function: in order to satisfy the preservation objective a digital object will perform a specific function, for example the delivery of textual information or the extraction and graphical visualisation of specific parameters.

    Tolerance: not every function is critical for the fulfilment of the preservation objective, with some digital objects included because they enhance the quality of the solution or ease of use. The loss of this function is denoted in the model as a tolerance.

    Quality assurance and testing: the ability of an object to perform the specified function may have been subjected to quality assurance and testing, which may be recorded against the relationship.

    Alternate and Composite relationships can be thought of as logical Or (denoted in diagrams by a diamond) and logical And (denoted in diagrams by a circle) relationships respectively. For a Composite relationship all relationships must function in order to fulfil the required objective; for an Alternate relationship only one relationship needs to function in order to fulfil the specified objective.
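    The components described above map naturally onto a small data structure. The following sketch is an illustrative model only, not the CASPAR implementation: class and field names are hypothetical, the node identifiers are borrowed from Fig. 19.4, and the label given to node 1.3 is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Relationship:
    target: "DigitalObject"
    function: str            # what the target contributes to the objective
    tolerance: bool = False  # True when losing this function is acceptable


@dataclass
class DigitalObject:
    identifier: str
    information: str
    location: str = ""
    physical_state: str = ""
    risks: List[str] = field(default_factory=list)
    usable: bool = True      # False once a recorded risk has been realised
    combine: str = "and"     # "and" = composite, "or" = alternate
    relationships: List[Relationship] = field(default_factory=list)


def objective_met(obj: DigitalObject) -> bool:
    """Return True if obj and everything it critically depends on is usable."""
    if not obj.usable:
        return False
    critical = [r for r in obj.relationships if not r.tolerance]
    if not critical:
        return True  # termination of the network: nothing further is needed
    results = [objective_met(r.target) for r in critical]
    return all(results) if obj.combine == "and" else any(results)


badc_help = DigitalObject("1.3.1", "BADC help on NetCDF", risks=["pages withdrawn"])
unidata_help = DigitalObject("1.3.2", "UNIDATA help on NetCDF")
netcdf_repinfo = DigitalObject(
    "1.3", "RepInfo for NetCDF (assumed label)", combine="or",
    relationships=[Relationship(badc_help, "describes format"),
                   Relationship(unidata_help, "describes format")],
)
data = DigitalObject("MST", "MST NetCDF Cartesian data",
                     relationships=[Relationship(netcdf_repinfo, "makes data readable")])

print(objective_met(data))
```

    Marking `badc_help.usable = False` leaves the objective met because the alternate branch through Unidata still functions; marking both references unusable causes the evaluation to fail, which is the situation a monitored risk is meant to flag.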

    19.5.1.3 Quality Assurance and Testing of MST Simple Solution

    19.5.1.3.1 Overall Solution Validated By

    Curation Manager at the British Atmospheric Data Centre and the NERC Earth Observation Data Centre. His role is to oversee the operations of the data centres, ensuring that they are trusted repositories that deliver data efficiently to users. He has a particular interest in data publication issues. He is also the facility manager for the NERC MST radar facility.

    NERC MST radar facility project scientist, who is part of the committee for the MST radar international workshop. The international workshop on MST radar, held about every 2-3 years, is a major event gathering together experts from all over the world, engaged in research and development of radar techniques to study the mesosphere, stratosphere and troposphere (MST).


    19.5.1.3.2 Elements of Solution Validated as Follows

    MST1.1 Directory Structure: directory structure validation is trivial, as the very simple structure is easy to navigate

    MST1.2 MST website: content is supplied, validated and managed by the project scientist and is subject to community and user group scrutiny

    MST1.2.1 MST website provenance: validated by the website creator and manager

    MST1.2.2 Instructions for running static website: this was tested locally with the group; users were able to unzip and use the website provided they had Firefox/Internet Explorer, Adobe and Word installed on their laptops/PCs

    MST1.2.3 Reference: testing is trivial; the risk that this reference needs to be monitored is accepted

    MST1.2.3.4 Composite strategy: elements of the MST website have been scrutinised by the research team

    We confirmed that the site contained jpeg, png, word, pdf and html files (Fig. 19.5). We then established that use of these file types was stable in the user community. Use of file types is monitored by the BADC, who carry out a regular survey of their user community. We accepted there was a risk that users may at some point in the future not be able to use these files and will use the BADC survey mechanism to monitor the situation. RepInfo for these file types was also added to the AIP so the file types could easily be understood and monitored.
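    The confirmation step described above, checking which file types an archived copy of the site contains, can be sketched as follows. The set of expected suffixes is an assumption derived from the file types named in the text, and the function name is hypothetical.

```python
from collections import Counter
from pathlib import Path

# Suffixes assumed to correspond to the jpeg, png, word, pdf and html files.
EXPECTED_SUFFIXES = {".html", ".htm", ".pdf", ".doc", ".jpg", ".jpeg", ".png"}


def file_type_inventory(site_root):
    """Count files per suffix and list any suffixes that were not expected."""
    counts = Counter(path.suffix.lower()
                     for path in Path(site_root).rglob("*") if path.is_file())
    unexpected = sorted(suffix for suffix in counts
                        if suffix not in EXPECTED_SUFFIXES)
    return counts, unexpected
```

    An empty list of unexpected suffixes suggests that the RepInfo already gathered covers the site; any other result identifies a file type for which further RepInfo would be needed.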

    Fig. 19.5 MST web site files


    MST 1.2.3.4.1 Information on Word 97 supplied by Microsoft

    MST 1.2.3.4.2 Reference to British and ISO standards on JPEG

    MST 1.2.3.4.3 W3C validated description

    MST 1.2.3.4.4 Reference to ISO standard

    MST 1.2.3.4.1 Reference to ISO standard

    MST1.3.1 Reference to BADC software solutions for NetCDF. Tested by CASPAR STFC and IBM Haifa.

    Successfully tested and validated the extraction parameters using software supplied by the BADC Infrastructure Manager. He looks after the software that runs the BADC, including the registration system and dataset access control software

    and

    Met Office Coordinator, who works for the NCAS/British Atmospheric Data Centre but is located in the Hadley Centre for Climate Prediction and Research at the UK Met Office (http://www.metoffice.gov.uk). Main duties involve work with:

    - Global model datasets obtained from the European Centre for Medium Range Weather Forecasts (ECMWF).
    - Liaison with the Met Office regarding scientific and technical interactions.
    - Development of software tools for data extraction, manipulation and delivery (based on Climate Data Analysis Tools (CDAT)).
    - Development of software for data format conversion such as NAppy.

    MST 1.3.2 & 1.3.3.9(14); RepInfo has been subjected to community scrutiny and published by UNIDATA

    The Unidata mission is to provide the data services, tools, and cyber infrastructure leadership that advance Earth system science, enhance educational opportunities, and broaden participation. Unidata, funded primarily by the National Science Foundation, is one of eight programs in the University Corporation for Atmospheric Research (UCAR) Office of Programs (UOP). UOP units create, conduct, and coordinate projects that strengthen education and research in the atmospheric, oceanic and earth sciences.

    Unidata is a diverse community of over 160 institutions vested in the common goal of sharing data, and tools to access and visualize that data. For 20 years Unidata has been providing data, tools, and support to enhance Earth-system education and research. In an era of increasing data complexity, accessibility, and multidisciplinary integration, Unidata provides a rich set of services and tools.

    The Unidata Program Center, as the leader of a broad community:

    - Explores new technologies
    - Evaluates and implements technological standards and tools
    - Advocates for the community
    - Provides leadership in solving community problems in new and creative ways
    - Negotiates for new and valuable data sources
    - Facilitates data discovery and use of digital libraries
    - Enables student-centred learning in the Earth system sciences by promoting use of data and tools in education
    - Values open standards, interoperability, and open-source approaches
    - Develops innovative solutions and new capabilities to solve community needs
    - Stays abreast of computing trends as they pertain to advancing research and education

    MST1.4 CF standard names list. The conventions for climate and forecast (CF) metadata are designed to promote the processing and sharing of files created with the NetCDF API. The CF conventions are increasingly gaining acceptance and have been adopted by a number of projects and groups as a primary standard. The conventions define metadata that provide a definitive description of what the data in each variable represents, and the spatial and temporal properties of the data. This enables users of data from different sources to decide which quantities are comparable, and facilitates building applications with powerful extraction, re-gridding, and display capabilities. The CF conventions generalize and extend the COARDS conventions.
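    Because the standard names list evolves, an archive that holds a snapshot of it needs a way to notice changes. The sketch below compares two snapshots held as plain dictionaries of name to canonical units; it is not the POM component, the function name is hypothetical, and the snapshot contents are illustrative rather than an extract of the published list.

```python
def compare_snapshots(old, new):
    """Report standard names that were added, removed or redefined."""
    shared = set(old) & set(new)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(name for name in shared if old[name] != new[name]),
    }


# Illustrative snapshots: names map to canonical units.
archived = {"eastward_wind": "m s-1", "northward_wind": "m s-1"}
current = {"eastward_wind": "m s-1", "northward_wind": "m s-1",
           "upward_air_velocity": "m s-1"}

print(compare_snapshots(archived, current))
```

    Any non-empty entry in the result would be a trigger for notifying the archive, so that the semantic descriptions held in the AIP can be reviewed against the new list.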

    19.5.1.3.3 Discussion and Validation of CF Metadata Takes Place in Two Formats

    1. CF metadata Trac, and
    2. cf-metadata mailing list.

    The list is then published by Alison Pamment, CF metadata secretary. Alison is a research scientist based at the Rutherford Appleton Laboratory and is responsible for Climate and Forecast (CF) metadata support.

    MST 1.4.1 W3C validated standard

    MST 1.4.1.1 PDF ISO standard

    Inputs needed for the creation of the AIP are illustrated in Fig. 19.6.

    19.5.2 Scenario2 MST-Complex

    A user from a future designated user community should be able to extract the following information from the data for a given altitude and time

    - Horizontal wind speed and direction
    - Wind shear
    - Signal Velocity
    - Signal Power
    - Aspect Correlated Spectral Width


    Fig. 19.6 Preservation information flow for scenario 1 - MST-simple

    The Preservation Information Network is shown in Fig. 19.7. In addition future users should have access to User group notes, MST conference proceedings and peer reviewed literature published by previous data users.

    MST Scenario2 has a higher level preservation objective and can be considered an extension of scenario 1, as the AIP information content is simply extended. The significance of this is that future data users will have access to important information which will help in studying the following types of phenomena captured within the data

    - Precipitation
    - Convection
    - Gravity Waves
    - Rossby Waves
    - Mesoscale and Microscale Structures
    - Fallstreak Clouds
    - Ozone Layering

    19.5.2.1 Preservation Objectives for Scenario2 MST-Complex

    A user from a future designated user community should be able to extract the following information from the data for a given altitude and time

    - Horizontal wind speed and direction
    - Wind shear
    - Signal Velocity
    - Signal Power
    - Aspect Correlated Spectral Width

    Fig. 19.7 Preservation information network model for MST-complex solution

    In addition future users should have access to User group notes, MST conference proceedings and peer reviewed literature published by previous data users.

    MST Scenario2 has a higher level preservation objective and can be considered an extension of scenario 1, as the AIP information content is simply extended. The significance of this is that future data users will have access to important information which will help in studying the following types of phenomena captured within the data

    - Precipitation
    - Convection
    - Gravity Waves
    - Rossby Waves
    - Mesoscale and Microscale Structures
    - Fallstreak Clouds
    - Ozone Layering

    Implementation points based on strategies for scenario 1

    MST1.7 We reviewed the bibliography contained in the website and the quality of its references. We carried out an investigation and review of technical reports which are used heavily at STFC but have not been generated here. The aim was to identify clear cases of reports which have been correctly cited but have not been deposited anywhere, as they have no natural home, and to digitise them for inclusion within the AIP. The website additionally contains a bibliographic record of publications resulting from use of the data. This record contains good quality citations but there would be concerns regarding permanent access to some of these materials; consider the two examples below

    W. Jones and S. P. Kingsley. MST radar observations of meteors. In Proceedings of the Flagstaff (USA) Conference on Asteroids, Comets and Meteors. Lunar and Planetary Institute (NASA Houston), July 1991

    S. P. Kingsley. Radio-astronomical methods of measuring the MST radar antenna. Technical report to MST radar user community, 1989.

    Neither of these two items is currently held by either The British Library http://www.bl.uk/ or The Library of Congress http://catalog.loc.gov/ based on searches of their catalogues. Nor do they exist in the local STFC institutional repository http://epubs.cclrc.ac.uk/. The preservation strategy to deal with this bibliography was to create MARC http://www.loc.gov/marc/ http://www.dcc.ac.uk/diffuse/?s=36 records in XML format for items held by the British Library and to begin the process of obtaining copies of the other items from the current community and digitise them in PDF format for direct inclusion within the AIP.
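    A MARC record in XML format can be sketched with the Python standard library. The example below is a minimal illustration, not a complete or validated catalogue record: it uses the MARCXML namespace and three common fields (100 for the author, 245 for the title, 260 subfield c for the date), omits the leader and control fields, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"


def marcxml_record(author, title, year):
    """Build a skeletal MARCXML record holding author, title and date."""
    ET.register_namespace("marc", MARC_NS)
    record = ET.Element(f"{{{MARC_NS}}}record")
    fields = [
        ("100", "1", " ", "a", author),  # main entry, personal name
        ("245", "1", "0", "a", title),   # title statement
        ("260", " ", " ", "c", year),    # date of publication
    ]
    for tag, ind1, ind2, code, value in fields:
        datafield = ET.SubElement(record, f"{{{MARC_NS}}}datafield",
                                  {"tag": tag, "ind1": ind1, "ind2": ind2})
        subfield = ET.SubElement(datafield, f"{{{MARC_NS}}}subfield",
                                 {"code": code})
        subfield.text = value
    return ET.tostring(record, encoding="unicode")


print(marcxml_record(
    "Kingsley, S. P.",
    "Radio-astronomical methods of measuring the MST radar antenna",
    "1989",
))
```

    A record of this kind acts as a reference within the AIP; where no library holds the item, the digitised PDF itself would need to be included instead.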


    MST1.8

    The international workshop on MST radar is held about every 2-3 years, and is a major event gathering together experts from all over the world, engaged in research and development of radar techniques to study the mesosphere, stratosphere and troposphere (MST). It was additionally attended by young scientists, research students and also new entrants to the field to facilitate close interactions with the experts on all technical and scientific aspects of MST radar techniques. It is this aspect which makes the proceedings an ideal resource for future users who are new to the field.

    Permanent access to these proceedings is again at risk. The MST 10 proceedings are available for download from the internet http://jro.igp.gob.pe/mst10/ and from the British Library. Proceedings 3 and 5-10 are also available from the British Library, meeting 4 is only available from the Library of Congress and unfortunately the proceedings from meetings 1 and 2 have not been deposited in either institution.

    A number of strategies present themselves. Copies of proceedings 1, 2 and 4 could be obtained from the still active community, digitised and incorporated into the AIP. The proceedings which are currently held by the British Library can be obtained, digitised and incorporated into the AIP; alternatively, the XML MARC record can be obtained and incorporated into the AIP as a reference, as there is a high degree of confidence in the permanence of these holdings.

    MST1.9

    The project scientist has again been quite diligent in keeping minutes of the user group meetings, which are run for data-using scientists several times a year. As a result this information is easily captured. It currently resides in the NCAS CEDA repository, which provides easy access to current data users. However, there are no guarantees that this repository will persist in the longer term, so a simple reference in the form of a URL would not be considered sufficient to guarantee permanent access to this material. This leaves two strategies open to the archive. The first involves taking a copy of this material and including it physically within the AIP. The second involves orchestration, where the CEDA repository would be required to alert the custodians of the MST data to the demise of the repository or migration of this material, so that it may be obtained for direct inclusion in the AIP.

    We created a reference to the MST user group minutes held in the newly created CEDA institutional repository for the National Centre for Atmospheric Science (http://cedadocs.badc.rl.ac.uk/). We registered the demise of this repository as a risk to be monitored and recommended the development of an orchestration strategy for material held in this repository, as it is representative of a proliferation of repositories in academia whose longevity is not guaranteed.

    19.5.2.2 Quality Assurance and Testing of MST Complex Solution

    MST 2.5 Bibliography content supplied and validated by the project scientist

    MST 2.5.1 MARC21 specification standard validated by the Library of Congress

    MST 2.5.1 XML specification validated by W3C

    MST 2.5.1.1 & 2.5.2.1 PDF ISO standard

    Inputs needed for the creation of the AIP are illustrated in Fig. 19.8.


    Fig. 19.8 Preservation information flow for scenario 2 - MST-complex

    19.6 Ionosonde Data and the WDC Scenarios

    The World Data Centre (WDC) system was created to archive and distribute data collected from the observational programmes of the 1957–1958 International Geophysical Year. Originally established in the United States, Europe, Russia and Japan, the WDC system has since expanded to other countries and to new scientific disciplines. The WDC system now includes 52 Centres in 12 countries. Its holdings include a wide range of solar, geophysical, environmental and human dimensions data. The WDC for Solar-Terrestrial Physics, based at the Rutherford Appleton Laboratory, holds ionospheric data comprising vertical soundings from over 300 stations, mostly from 1957 onwards, though some stations have data going back to the 1930s.

    The Ionosonde is a basic tool for ionospheric research. Ionosondes are Vertical Incidence radars which record the time of flight of a radio signal swept through a range of frequencies (1–30 MHz) and reflected from the ionised layers of the upper atmosphere (90–800 km) as an ionogram. These results are analysed to give the variation of electron density with height up to the peak of the ionosphere. Such electron-density profiles provide most of the information required for studies of the ionosphere and its effect on radio communications. Only a small fraction of the recorded ionograms are analysed in this way, however, because of the effort required. The traditional input to the WDC has been hourly resolution scaled data, but many stations take soundings at higher resolutions.

    The WDC receives data from the many ionosonde stations around the world through a variety of means including ftp, email and CD-ROM. Data is provided in a number of formats: URSI (simple hourly resolution) and IIWG (more complex, time varying) standard formats, as well as station-specific bulletins. The data stored by the WDC in digital formats comprises 2.9 GB of data in IIWG format and 70 GB of raw MMM, SAO and ART files from Lowell digisondes. The WDC also holds about 40,000 rolls of 16/35 mm film ionograms and ~10,000 monthly bulletins of scaled ionospheric data. Some of this data is already in digital form, but much, particularly the ionogram images, is not yet digitised.

    Many stations' data is provided in IIWG or URSI format directly. This data may be automatically or manually scaled.

    A selection of European stations provide raw format data from Lowell digisondes, a particular make of ionosonde, as part of a COST project. This data is in a proprietary format, but Lowell provides Java-based software for analysis. The WDC uses this software to manipulate this data, particularly from the CCLRC's own Ionospheric Monitoring Group's Ionosondes at Chilton, UK and Stanley, Falkland Islands. The autoscaled data from these stations is also stored in a PostgreSQL database.

    Other stations provide a small set of standard parameters in a station-specific bulletin format which is similar to the paper bulletins traditionally produced from the 1950s onwards. The WDC has some bespoke, configurable software to extract the data from these bulletins and convert it to IIWG format.

    It is important to realise that this is a totally voluntary data collection and archive system. The WDCs have no control over, or means of enforcing, a standard means of data processing or dissemination, though weight of history and ease of use tend to make this the preferred option.

    19.6.1 STFC3: Implementation Plan for Scenario3 Ionosonde-Simple

    The first preservation scenario shows us again supporting and integrating with the existing preservation practices of the World Data Centre, which means creating a consistent global record from 252 stations by extracting a standardised set of parameters from the Ionograms produced around the world. A user from a future designated community should be able to extract the following fourteen standard Ionospheric parameters from the data for a given station and time: fmin, foE, h'E, foEs, h'Es, type of Es, fbEs, foF1, M(3000)F1, h'F, h'F2, foF2, fx and M(3000)F2. They should also be able to understand what these parameters represent. The preservation information flow is shown in Fig. 19.9 and the corresponding information network is shown in Fig. 19.10.

    19.6.1.1 Preservation Information Flow for Scenario3 Ionosonde-Simple

    Fig. 19.9 Preservation information flow for scenario 3 Ionosonde-simple

    [Figure: network model nodes: 1.1 Description of directory structure; 1.2 CSV file of station information; 1.3 IIWG format description; 1.4 URSI parameter code DEDSL dictionary; 1.4.1 DEDSL specification; 1.4.2 XML; 1.4.1.1 & 1.4.2.1 PDF; 1.5 URSI handbooks; 1.5.1 PDF; IIWG]

    Fig. 19.10 Preservation network model for scenario 3 Ionosonde-simple


    19.6.1.2 Implementation Points Based on Strategies for Scenario3

    IO1.1 New RepInfo based on the IIWG format description, removing the need to understand FORTRAN, as is the case with comprehending the current version

    IO1.2 Create DEDSL dictionary for 14 standard parameters and add RepInfo from the Registry Repository on the XML DEDSL standard

    IO1.3 Authenticity information from the current archivist for the 252 stations and the data transformation/ingest process

    IO1.4 Perform CSV dump of station information from Postgres database

    IO1.5 A logical description of the directory structure was created

    IO1.6 PACK was used to create and add a checksum to the AIP, maintaining the existing directory file structure
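
    The fixity step in IO1.6 can be illustrated with a minimal sketch. It is not the PACK tool and does not use its interface; it only assumes that fixity is recorded as one SHA-256 digest per file, keyed by the file's path relative to the archive root so that the existing directory structure is left untouched. The function names and the example directory and file names are hypothetical.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file path (relative to root, '/'-separated) to its digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            manifest[rel] = sha256_of_file(full)
    return manifest


def changed_files(old, new):
    """List paths added, removed or altered between two manifests."""
    return sorted(p for p in set(old) | set(new) if old.get(p) != new.get(p))


if __name__ == "__main__":
    # Hypothetical layout: one station directory holding one data file.
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, "station_a"))
    with open(os.path.join(root, "station_a", "data.txt"), "wb") as handle:
        handle.write(b"example content")
    for rel_path, digest in build_manifest(root).items():
        print(digest, rel_path)
```

    Rebuilding the manifest at a later date and passing both versions to `changed_files` gives a simple fixity check over the unchanged directory tree.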

    19.6.2 STFC4: Implementation Plan for Scenario4 Ionosonde-Complex

    The second preservation scenario for the Ionosonde can only be carried out for 7 European stations, but will allow a consistent Ionogram record for the Chilton site, which dates back to the 1920s. A user from a future designated community should be able to reproduce an Ionogram from the raw mmm/sao data files (see Fig. 19.11) and have access to the Ionospheric Monitoring Group's website, the URSI handbooks of interpretation and the Lowell technical documentation. Being able to preserve the Ionogram record is significant, as it is a much richer source of information, more accurately able to convey the state of the atmosphere when correctly interpreted. The preservation information flow is shown in Fig. 19.12.

    Fig. 19.11 Example plot of output from Ionosonde


    [Figure labels: Content: raw mmm & SAO data files; Representation Information: Structure Information, Semantic Information, Other Representation Information, MMM & SAO file format descriptions, SAO-Explorer, URSI handbook of Ionogram interpretation, Lowell technical documentation, Ionospheric Monitoring group website, description of directory structure; Preservation Description Information: Reference Information, Provenance Information, Context Information, Fixity Information, station code, descriptions and organisational information, Data Archivist, Data Producers, Scientific Organisation; relations: "interpreted using", "adds meaning to"; legend: static information, information expected to evolve over time]

    Fig. 19.12 Preservation information flow for scenario 4 Ionosonde-complex

    19.6.2.1 Implementation Points Based on Strategies for Scenario4

    IO2.1 Archive SAO Explorer with RepInfo from the Registry Repository for Java 5 software

    IO2.2 Digitise and include URSI handbooks of interpretation in the AIP and deposit in the Registry Repository for other repository users

    IO2.3 Digitise and include Lowell technical documentation in the AIP and deposit in the Registry Repository for other repository users

    IO2.4 Archive the Ionospheric Monitoring Group website, carry out an assessment of its constituent elements and use the Registry Repository to add basic information on HTML, Word, PDF, JPEG, PNG and PostScript to facilitate preservation of a simple static website

    IO2.5 Review the bibliography contained in the website and the quality of its references. Carry out an investigation and review of technical reports which are used heavily at STFC but have not been generated here. Identify clear cases of reports which have been correctly cited but have not been deposited anywhere, as they have no natural home, and digitise them for inclusion within the AIP.

    IO2.6 Perform CSV dump of station information from Postgres database

    IO2.7 Create logical description of directory structure

    IO2.8 Use PACK to create and add a checksum to the AIP, maintaining the existing directory structure


    IO2.9 Use the GAP manager to identify a gap based on the demise of the Java virtual machine. Use POM to notify us of the gap and update the AIP with a replacement EAST description of the mmm file structure from the Registry Repository.
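
    The idea behind IO2.9 can be sketched as a walk over a dependency network. The sketch below is not the GAP manager or POM interface; it assumes only that each item in the AIP records what it needs in order to remain usable, and that a gap is any item which depends, directly or indirectly, on something that has been lost. The dictionary contents and the function name are hypothetical, loosely modelled on the items named in scenario 4.

```python
# Hypothetical dependency network: each item lists what it needs to be usable.
DEPENDS_ON = {
    "raw mmm/SAO data files": ["SAO Explorer"],
    "SAO Explorer": ["Java virtual machine"],
    "Java virtual machine": [],
    "URSI handbooks (PDF)": ["PDF specification"],
    "PDF specification": [],
}


def affected_by(lost_item, depends_on):
    """Return every item that directly or indirectly depends on lost_item."""
    affected = set()
    frontier = [lost_item]
    while frontier:
        current = frontier.pop()
        for item, needs in depends_on.items():
            if current in needs and item not in affected:
                affected.add(item)
                frontier.append(item)
    return sorted(affected)


if __name__ == "__main__":
    # Demise of the Java virtual machine opens a gap for the raw data files.
    print(affected_by("Java virtual machine", DEPENDS_ON))

    # Repair: interpret the data through a format description instead of software.
    repaired = dict(DEPENDS_ON)
    repaired["EAST description of mmm file structure"] = []
    repaired["raw mmm/SAO data files"] = ["EAST description of mmm file structure"]
    print(affected_by("Java virtual machine", repaired))
```

    After the repair only the software itself is affected by the loss; the data files depend on the replacement description instead.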

    19.7 Summary of Testbed Checks

    At each of the steps listed above, checks were performed on the Representation Information. For example, for IO1.1 the description of the IIWG format was checked by extracting numbers from the data file using generic tools and comparing these to the values obtained using the current tools.
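
    The comparison described for IO1.1 can be sketched as follows. This is an illustration only: it assumes the two extractions have already been reduced to equal-length sequences of numbers, and the function name, tolerance and sample values are hypothetical rather than taken from the testbed.

```python
import math


def compare_extractions(generic_values, current_tool_values, rel_tol=1e-9):
    """Return (index, generic, current) for each position where values disagree."""
    if len(generic_values) != len(current_tool_values):
        raise ValueError("extractions differ in length")
    mismatches = []
    for index, (generic, current) in enumerate(
        zip(generic_values, current_tool_values)
    ):
        if not math.isclose(generic, current, rel_tol=rel_tol):
            mismatches.append((index, generic, current))
    return mismatches


if __name__ == "__main__":
    # Hypothetical values read with a generic tool and with the current tool.
    from_description = [3.1, 7.25, 2.9]
    from_current_tool = [3.1, 7.5, 2.9]
    for index, generic, current in compare_extractions(
        from_description, from_current_tool
    ):
        print(f"value {index}: generic={generic} current={current}")
```

    An empty result means the format description was sufficient to recover the same numbers as the existing software.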

    The overall check was to go through the AIP with the archive managers and scientists and ensure that they agreed with the Representation Information and PDI which had been captured. This required several iterations, but in the end they were willing to sign off on all the materials.

    Users with the appropriate knowledge base have also been successful in extracting and performing the basic analysis tasks with the specified data. Taking this together with the acceptance by the archive managers and scientists of the preservation analysis, risk analysis and the adequacy of the AIP, we believe that the aims of the testbed have been successfully achieved.