
Development and exploitation of a controlled vocabulary in support of climate modelling

Article

Published Version

Creative Commons: Attribution 3.0 (CC-BY)

Open Access

Moine, M.-P., Valcke, S., Lawrence, B. N., Pascoe, C., Ford, R. W., Alias, A., Balaji, V., Bentley, P., Devine, G., Callaghan, S. A. and Guilyardi, E. (2014) Development and exploitation of a controlled vocabulary in support of climate modelling. Geoscientific Model Development, 7 (2). pp. 479-493. ISSN 1991-9603 doi: https://doi.org/10.5194/gmd-7-479-2014 Available at http://centaur.reading.ac.uk/37827/

It is advisable to refer to the publisher’s version if you intend to cite from the work. See Guidance on citing.

Published version at: http://www.geosci-model-dev.net/7/479/2014/

To link to this article DOI: http://dx.doi.org/10.5194/gmd-7-479-2014

Publisher: European Geosciences Union

All outputs in CentAUR are protected by Intellectual Property Rights law, including copyright law. Copyright and IPR is retained by the creators or other copyright holders. Terms and conditions for use of this material are defined in the End User Agreement.


Geosci. Model Dev., 7, 479–493, 2014
www.geosci-model-dev.net/7/479/2014/
doi:10.5194/gmd-7-479-2014
© Author(s) 2014. CC Attribution 3.0 License.


Development and exploitation of a controlled vocabulary in support of climate modelling

M.-P. Moine¹, S. Valcke¹, B. N. Lawrence²,³,⁴, C. Pascoe⁴, R. W. Ford⁵, A. Alias⁶, V. Balaji⁷, P. Bentley⁸, G. Devine⁹, S. A. Callaghan⁴, and E. Guilyardi⁹,¹⁰

¹ Centre Européen de Recherche et de Formation Avancée en Calcul Scientifique (CERFACS), CERFACS/CNRS SUC URA1875, Toulouse, France
² Department of Meteorology, University of Reading, Reading, UK
³ Centre for Environmental Data Archival, STFC Rutherford Appleton Laboratory, Didcot, UK
⁴ National Centre for Atmospheric Science (NCAS), Natural Environment Research Council, UK
⁵ STFC Daresbury Laboratory, Warrington, UK
⁶ Centre National de Recherches Météorologiques (CNRM), Meteo-France/CNRS, Toulouse, France
⁷ NOAA Geophysical Fluid Dynamics Laboratory (GFDL) and University of Princeton, USA
⁸ Met Office Hadley Centre, Exeter, UK
⁹ National Center for Atmospheric Science (NCAS), University of Reading, Reading, UK
¹⁰ Institut Pierre Simon Laplace (IPSL), CNRS, Paris, France

Correspondence to: M.-P. Moine ([email protected])

Received: 12 April 2013 – Published in Geosci. Model Dev. Discuss.: 23 May 2013
Revised: 20 January 2014 – Accepted: 26 January 2014 – Published: 21 March 2014

Abstract. There are three key components for developing a metadata system: a container structure laying out the key semantic issues of interest and their relationships; an extensible controlled vocabulary providing possible content; and tools to create and manipulate that content. While metadata systems must allow users to enter their own information, the use of a controlled vocabulary both imposes consistency of definition and ensures comparability of the objects described. Here we describe the controlled vocabulary (CV) and metadata creation tool built by the METAFOR project for use in the context of describing the climate models, simulations and experiments of the fifth Coupled Model Intercomparison Project (CMIP5). The CV and resulting tool chain introduced here are designed for extensibility and reuse and should find applicability in many more projects.

1 Introduction

Climate models have experienced outstanding evolution in the last 20 years, driven by scientific improvements and increases in computing capabilities. Additional components of the earth system are being represented with an increasing number of physical processes taken into account. Higher spatial resolution is supported thanks to the emergence of high performance computing platforms. In addition, more and more research centres have been engaging in climate modelling, which increases the number of models involved. One important consequence is the growth of the volume of data produced. Coupled Model Intercomparison Projects (CMIP) initiated and supervised by the World Climate Research Programme (WCRP) are an academic exercise on which climate projection assessment is based. Higher complexity of numerical models, explosion in the volume of data produced and the growing number of contributing modelling groups require a dedicated and expert infrastructure for data quality control, data documentation, data storage and access. Indeed, the ever-growing number of scientific groups producing and using climate model data requires more sophisticated data management systems, including good quality, understandable and shareable data documentation.

The technological part of this infrastructure in CMIP5 is ensured by ESGF (Earth System Grid Federation): several distributed data centres host the data produced by the modelling groups around the world, some of them (PCMDI, BADC, WDCC) being gateways for data publication and download (Williams et al., 2011). It became clear during the set-up of this infrastructure that the definition and adoption of standard metadata (that is, data describing the data) is crucial to guide end-users through data mining, data interpretation or data comparison tasks (Guilyardi et al., 2011) – even outside the climate modelling community itself, for example by the environment and health impact community. Furthermore, climate metadata must describe both the data content and the model and simulations that produced this data. The CMIP5 metadata standardization effort exploited work conducted jointly by the CURATOR project in the US (Dunlap et al., 2008) and the METAFOR project funded by the European Commission (Callaghan et al., 2010). The approach we followed in METAFOR was to define three key metadata components: a conceptual container to store and organize the information (the CIM, Common Information Model); the possible content (the controlled vocabulary); and a methodology to harvest a specific content, i.e. an instance of metadata (the so-called “CMIP5 Questionnaire”). The CIM is introduced in Lawrence et al. (2012). Here we concentrate on the controlled vocabulary (CV) and the specific harvesting tool developed for CMIP5. We begin by setting the context of earth system models and simulations so as to appreciate the challenge raised by climate metadata. We then present a brief inventory of existing metadata systems in the climate area, pointing out gaps and incompleteness and advocating for a unique and encompassing standard. We describe the methodology applied to build the METAFOR CV and its resulting structure based on key elements. Finally, we explain how this CV was used to construct the “CMIP5 Questionnaire” and how it was ingested by other metadata systems like ESGF.

Published by Copernicus Publications on behalf of the European Geosciences Union.

2 Picture of a climate model and climate experiments

Climate study is a highly interdisciplinary science that historically emerged with the convergence of scientific expertise in the research areas related to the earth system, such as oceanography, atmospheric physics, sea ice dynamics, hydrology, etc. As a result, a climate model is a composition of models (hereafter referred to as “components”, some of which map onto “realms” using the nomenclature of Taylor et al. (2011a)), each one being devoted to a specific domain of the climate system. These models are generally assembled by coupling software (see Valcke et al., 2012, for a review). The role of the coupler is to exchange coupling fields at the interface of the component domains (for example, wind stress and radiative fluxes are transmitted from the atmosphere to the ocean, sea surface temperature and currents from the ocean to the atmosphere), performing the spatial remapping from the grid of one component to the other. The resulting global model, including components and the coupler, is therefore referred to as a “coupled model”.

A given model can be run and integrated in time (i.e. a climate simulation can be performed) in a large number of different ways, depending on the temporal and dynamical schemes used, and according to the physical parameterizations selected to model subgrid phenomena within each physical scheme of each component. Initial conditions and external forcing that influence the climate system must be prescribed, e.g. greenhouse gases, volcanoes, aerosol types and concentrations, and land-use changes. By adjusting model parameters such as orbital parameters or solar irradiance and by applying appropriate forcing and initial conditions, climate models can be run for various time durations (seasonal, decadal, centennial, millennial) and reproduce different climatic periods (paleo, present and future).

One particular model configuration is usually targeted at a specific scientific question: for example, to understand the sensitivity of a climate process to horizontal resolution or to provide a projection of future climate under a specific emission scenario. Hence, it is important to document not only the particular configuration, but also why that configuration was chosen. The purpose of an experimental protocol like CMIP5 (Taylor et al., 2011a) is to provide guidance for the set-up of models and simulations, so that the different modelling groups address the same questions in a comparable way with their own model. It is clear that the way the model is scientifically configured (including model parameterizations, initial conditions and forcing) and how it conforms to the experimental requirements is crucial information to interpret and compare results. It is therefore vitally important to preserve this information along with the data.

3 Existing metadata for weather forecast and climate

To ensure interoperability of geo-referenced and weather forecast data products, international organizations like the Open Geospatial Consortium (OGC) and the World Meteorological Organization (WMO) promote adoption of standards. These standards are currently used by national meteorology institutes and production centres of remote sensing and in situ observations all over the world.

The CF convention provides a set of standard names for geophysical variables associated with a precise scientific definition and units. In the CMIP5 framework, CF-NetCDF is the compulsory format for the output data set. Furthermore, CMIP5 output metadata are constrained by the CMIP5 tables, which impose, among other things, short names and units and ensure correspondence with the CF standard names, both for dimensional and physical variables (Taylor and Doutriaux, 2010). Additional low-level¹ metadata are included in the output files as global attributes, for example experiment_id or model_id, which respectively identify the CMIP5 experiment and the coupled model that produced the data set, according to terms defined in the Data Reference Syntax document (DRS) (Taylor et al., 2011b).

¹ The term “low-level metadata” refers to metadata that applies to individual data sets and describes their content (i.e. what the data is), while “high-level metadata” refers to metadata that applies to whole data sets and addresses how the data were produced.

Several previous projects, such as NMM (Numerical Model Source Metadata, University of Reading) and NumSim (Numerical Simulation Discovery Metadata, BADC/NCAS, http://proj.badc.rl.ac.uk/ndg/wiki/NumSim) have tried to address the higher-level metadata issue, i.e. describing not only “what” data are produced but also “how” they were produced (the model and simulation details). NMM and NumSim identified some key terms (e.g. genealogy, boundary condition type, initial condition type, ensemble type, model component, model category) and used ISO standards where relevant. Other specific metadata systems have addressed more technical aspects of climate modelling like the configuration of coupling exchanges between earth system components (BFG, Ford and Riley, 2011; OASIS4, Redler et al., 2010) or the grids on which climate model data is discretized (gridSpec, Balaji Institute, 2007). However, no single integrated high-level¹ metadata system able to encompass the whole “climate modelling” process emerged from these projects, leaving only pieces of metadata, often disconnected. In the previous CMIP phase 3, this resulted in asking scientists to provide additional information about models and simulations in unconstrained text-based documents (the CMIP3 questionnaire, see an extract in Appendix B).

4 The METAFOR controlled vocabulary

Given that the metadata have to address all stages of the modelling process and given that they should serve data discovery and access tools, the prime objective of the METAFOR project was to design a conceptual metadata scheme and develop the associated hosting structure, the Common Information Model (CIM). The CIM defines objects, classes, and their relationships (Lawrence et al., 2012). Through specialized UML (Unified Modelling Language, www.uml.org) packages, the CIM addresses the description of the constituent elements of climate modelling: the “activity” package includes the experimental context and simulations; the “software” package covers the climate model itself; the final data objects produced by simulations and their inputs are described by the “data” package and the numerical grids of the models by the “grid” package; finally, a “shared” package of reusable elements supports some “orphan” classes, such as quality control records and platform descriptions.

To be operational, each individual CIM package needs an associated controlled vocabulary (CV) that defines sets of allowed attributes (name/value pairs). For the “data”, “grid” and “shared” packages, the CV was mainly based on a list of already existing terms, respectively the CF standard, gridSpec and some ISO standards. Vocabularies for the activity and software packages did not exist, and were developed from scratch. In the following we present the resulting “Model Controlled Vocabulary” and the “Simulations and Experiments Controlled Vocabulary”, used in support of CMIP5 to populate the software and activity packages respectively.

4.1 The Model Controlled Vocabulary

The Model Controlled Vocabulary describes the heart of the climate data production chain, that is, the numerical model itself. This work had to define the Model CV starting from scratch and had to go through the early steps of a classical CV building process:

1. identify the relevant and discriminating information (about the climate model components);

2. set an ensemble of appropriate terms (meaningful and non-ambiguous) to synthetically and faithfully express the information;

3. organize these terms hierarchically, with possible inter-dependencies;

4. attach a definition to each term;

5. identify allowed/possible values for each term.

Following the CMIP5 protocol (Taylor et al., 2011a), the first level decomposition of a coupled climate model was mapped onto eight identified realm components: ocean, atmosphere, land surface, land ice, sea ice, atmospheric chemistry, aerosol and ocean biogeochemistry. Each realm component is in its turn made of sub-components, one per main physical or dynamical process. Here the components are logical descriptions of the model, not descriptions of the actual software – it is important that users of these CVs understand the distinction, since with the version of the CIM used, there is not necessarily a direct mapping between the description of the components and the actual layout in software components.

The way of organizing the CV was driven both by the typical structure of the numerical models themselves and by the scientific rationale for gathering ideas into main themes, the two being obviously closely related. The current CV granularity is a compromise driven by the requirements of model intercomparison: to reach a level of detail sufficient to be meaningful and discriminating across the various climate models while avoiding overloaded and overly specific information.

The CV could not be established ad hoc by exploring the model literature alone. The compromise reached is the result of a wide consultation with a number of climate modellers led by one dedicated person in METAFOR. The resulting collaboration of a significant number of scientists from the international climate community, each working with different climate models, was a key part of the CV development. More than 35 experts from 13 research centres representing 6 countries contributed (see list of contributors in Appendix A), each bringing important scientific expertise to help in identifying the model characteristics important to capture and document for intercomparison. During face-to-face meetings or through audio screen-sharing sessions, modellers were asked to tell us about the science and algorithms of the climate model component they developed. The discussions were captured using mindmaps (Freemind software, http://freemind.sourceforge.net/wiki/index.php/Main_Page), one for each realm, which proved to be very appropriate for capturing structured information and feedback on the fly.

Fig. 1. Consultation process with scientists to define the CV for climate model description.

The interviewing and reviewing procedure is illustrated in Fig. 1: following a first-round interview with one realm expert (step 1), revision processes were launched with other scientists from other research centres (step 5). We integrated the feedback in a structured way (steps 2, 3), capturing its precise meaning, getting confirmation when necessary, working out possible conflicting views (step 6), and taking care not to introduce inconsistencies with previously collected CV. Following this consultation process, several iterations led to a consensus among the modellers interviewed. The resulting CV can be seen as the product of a converging process, giving ultimately both the content and the granularity of that content. For instance, the case of CV for atmospheric chemistry and aerosol modelling raised some debate within the scientific community since the CMIP5 steering committee had decided to separate them into two different realms. Intensive and rich scientific discussions and exchanges of views were necessary to reach a consensus.

The resulting scientific CV for climate models has three main categories:

1. the CV for the model realm components, including details of the numerical schemes deployed for dynamical processes (advection, diffusion, transport), for time integration and key information about the parameterizations used to model sub-grid-scale physical processes (e.g. precipitation and clouds in the atmosphere realm; soil hydrology in the land surface realm, gas phase processes in the atmospheric chemistry realm); this is the heart of the Model CV;

2. the CV associated with the numerical grids used by the models for spatial discretization;

3. the CV for describing the way components are coupled together for exchanging coupling fields, including selected terms for spatial regridding and time transformation of these fields; these latter have been derived from vocabulary used for standard configuration of couplers.

4.1.1 Model realm component CV

The complete set of CV for realm components addresses more than 570 leaf parameters over 8 realms².

The CV schema adopted for describing the model components has a hierarchical structure, which we illustrate with the SeaIce realm component (Fig. 2). The CV is made of possibly embedded elements: single “leaf parameters” (name/value pairs; e.g. SchemeType/snow-aging in Fig. 2) are gathered within “parameter groups” containers (e.g. Snow, to follow the same example in Fig. 2), themselves gathered within “components” (e.g. SeaIce_Thermodynamics). Some groups of parameters are “conditional parameter groups” (e.g. if VerticalDiffusion is multi-layer in Fig. 2) depending on the value taken by another parameter (here VerticalDiffusion). The tree structure of these different container families defines the allowable embedding of the controlled vocabularies and their relationships.
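The embedding described above (components containing parameter groups, which contain leaf parameters, with some groups conditional on the value of another parameter) can be sketched as a small Python data model. This is an illustrative sketch only, not the METAFOR tooling: the class and function names are invented here, as are the `Ice` group, the `single-layer` value and the `MultiLayerDetails`/`NumberOfLayers` entries. Only `SeaIce_Thermodynamics`, `Snow`, `SchemeType`/`snow-aging` and `VerticalDiffusion`/`multi-layer` come from the example of Fig. 2.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class LeafParameter:
    """A name/value pair; `allowed` lists the controlled values on offer."""
    name: str
    allowed: List[str] = field(default_factory=list)


@dataclass
class ParameterGroup:
    """A container of leaf parameters.

    When `condition` is (parameter_name, value), the group is a
    "conditional parameter group": it applies only if that parameter
    has been given that value.
    """
    name: str
    parameters: List[LeafParameter] = field(default_factory=list)
    condition: Optional[Tuple[str, str]] = None


@dataclass
class Component:
    """A model component gathering parameter groups."""
    name: str
    groups: List[ParameterGroup] = field(default_factory=list)


def active_groups(component: Component, answers: Dict[str, str]) -> List[str]:
    """Names of the groups that apply, given the values chosen so far."""
    return [
        group.name
        for group in component.groups
        if group.condition is None
        or answers.get(group.condition[0]) == group.condition[1]
    ]


sea_ice_thermodynamics = Component(
    name="SeaIce_Thermodynamics",
    groups=[
        ParameterGroup("Snow", [LeafParameter("SchemeType", ["snow-aging"])]),
        # Hypothetical group holding the VerticalDiffusion parameter.
        ParameterGroup(
            "Ice",
            [LeafParameter("VerticalDiffusion", ["single-layer", "multi-layer"])],
        ),
        # Hypothetical conditional group, active only for multi-layer diffusion.
        ParameterGroup(
            "MultiLayerDetails",
            [LeafParameter("NumberOfLayers")],
            condition=("VerticalDiffusion", "multi-layer"),
        ),
    ],
)
```

With no answers given, `active_groups(sea_ice_thermodynamics, {})` lists only the unconditional groups; once `VerticalDiffusion` is set to `multi-layer`, the conditional group is added to the list.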

The CV forms a semantic database (the possible content) for building a metadata instance (an actual content recorded as a CIM document) for a given model and related simulation. A suite of tools was developed to exploit this semantic database in an automatic way so as to feed downstream tools such as the CMIP5 Questionnaire (see Sect. 5.1). To that end, coding rules were added to the mindmaps. We defined a set of formal typographic rules (e.g. different font formats and icons) to distinguish the different types of CV containers and the different types of choice (exclusive or not) among possible values for parameters or to define the type of expected value (numeric or string). These rules are illustrated in Fig. 2 and detailed in the legend of this figure. A definition of parameters is provided (as attached note, not shown in Fig. 2) and units are prescribed where numeric values are expected.

² See http://METAFORclimate.eu/trac/browser/controlled_vocabularies/branches/cmip5/Software.


Fig. 2. A portion of the sea ice CV, showing the SeaIceThermodynamics sub-component. Black bold font denotes model components; purple is for parameter groups; blue is for conditional parameter groups; brown is for leaf parameters expecting values; black is for possible values for the leaf parameters; red cross icons mark single choice (XOR), green tick mark icons symbolize multiple choice (OR); pencil icons are for free text entry (numeric entries are also possible (not shown)); notebook icons ahead of a leaf parameter indicate that a definition is attached as a footnote.
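The single-choice (XOR), multiple-choice (OR) and free-text rules listed in this legend lend themselves to a simple check on the values entered for a leaf parameter. A minimal sketch follows, assuming (this is not stated in the text) that a multiple-choice parameter needs at least one selected value and that a free-text entry must be non-empty. `check_selection` and its arguments are illustrative names, not part of any METAFOR tool.

```python
from typing import Sequence


def check_selection(kind: str, allowed: Sequence[str], selected: Sequence[str]) -> bool:
    """Check selected values against one leaf parameter's choice rule.

    kind is "XOR" (exactly one allowed value), "OR" (one or more distinct
    allowed values) or "TEXT" (a single non-empty free-text entry).
    """
    if kind == "XOR":
        return len(selected) == 1 and selected[0] in allowed
    if kind == "OR":
        distinct = set(selected)
        return (
            len(selected) >= 1
            and len(distinct) == len(selected)  # no value picked twice
            and distinct.issubset(allowed)
        )
    if kind == "TEXT":
        return len(selected) == 1 and selected[0].strip() != ""
    raise ValueError(f"unknown choice kind: {kind!r}")
```

Keeping the rule type as data next to the list of allowed values mirrors the mindmap convention, where an icon on the leaf parameter carries the same information.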

4.1.2 Model grid CV

With the model grid CV, METAFOR describes the computational grids of the model components. These grids may differ from the grid the data is expressed on, which, according to the CMIP5 guidance, should be described following the gridSpec standard (Balaji Institute, 2007). The model numerical grid CV has to provide information about the horizontal and vertical coordinate system, the vertical coordinate used, the number of levels in the mixed layer and boundary layer, for ocean and atmosphere respectively, etc. A systematic comparison with gridSpec vocabulary was conducted prior to establishing the numerical grid CV so as to reuse terms when possible. A part of this model grid CV, dealing with the vertical coordinate system, is shown in Fig. 3: according to the value of the VerticalCoordinateType leaf parameter, different values for vertical coordinate are proposed (e.g. sigma coordinate is proposed only if the type of vertical coordinate is terrain following).
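The dependency between the type of vertical coordinate and the coordinates on offer amounts to a lookup keyed on the value of `VerticalCoordinateType`. The sketch below encodes only the one pairing given in the text (`sigma` under `terrain following`); the second entry is a made-up placeholder, and the function names are illustrative.

```python
from typing import Dict, List

# Coordinates proposed for each VerticalCoordinateType value.
PROPOSED_COORDINATES: Dict[str, List[str]] = {
    "terrain following": ["sigma"],  # pairing given in the text
    "placeholder type": ["placeholder coordinate"],  # hypothetical entry
}


def proposed_coordinates(coordinate_type: str) -> List[str]:
    """Return the vertical coordinates proposed for a coordinate type."""
    return list(PROPOSED_COORDINATES.get(coordinate_type, []))


def is_proposed(coordinate: str, coordinate_type: str) -> bool:
    """True if the coordinate is on offer for the given coordinate type."""
    return coordinate in proposed_coordinates(coordinate_type)
```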

4.1.3 Coupling exchanges CV

The CV defined in METAFOR to describe the coupling exchanges between the component models should be considered as an elementary first step. For each exchange, the source and target components are identified, and the coupling CV covers the coupling software used, the type of the spatial regridding and time transformation of the fields (if any). As one can see, the coupling exchange CV is currently quite limited.
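To make the scope of this CV concrete, one exchange can be pictured as a record carrying the attributes listed above. The sketch rests on several assumptions: the class name, the attribute names and the example values for coupling software, regridding and time transformation are invented for illustration; the two exchanged fields are the ones cited in Sect. 2.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CouplingExchange:
    """One coupling exchange between a source and a target component."""
    field_name: str
    source: str
    target: str
    coupling_software: str
    spatial_regridding: Optional[str] = None   # None when no regridding applies
    time_transformation: Optional[str] = None  # None when no transformation applies


def fields_received_by(exchanges: List[CouplingExchange], target: str) -> List[str]:
    """Names of the coupling fields delivered to the given component."""
    return [e.field_name for e in exchanges if e.target == target]


EXCHANGES = [
    CouplingExchange(
        "wind stress", "atmosphere", "ocean",
        coupling_software="example-coupler",         # hypothetical value
        spatial_regridding="example-regridding",     # hypothetical value
        time_transformation="example-time-average",  # hypothetical value
    ),
    CouplingExchange(
        "sea surface temperature", "ocean", "atmosphere",
        coupling_software="example-coupler",         # hypothetical value
    ),
]
```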

4.1.4 Climate Model CV evolution and preservation

Even though it is frozen in the context of CMIP5, we expect that, with usage, this climate Model CV will evolve, improve and be reused in other scientific projects. Thus, we will have to manage the evolution and ensure the preservation of this CV, which is the first one encompassing all components of a coupled climate model. To that end, it is planned to set up an international governance committee under the auspices of IS-ENES2 (https://verc.enes.org/ISENES2/), the EU-FP7 (EU’s Seventh Framework Programme for Research) project that follows IS-ENES (InfraStructure for the European Network for Earth System Modelling).

4.2 Controlled vocabulary for simulations and experiments

Fig. 3. A portion of the model grid CV. Same typographic rules as in Fig. 2 are applied. The parameter group VerticalCoordinateSystem gathers information about the vertical coordinate system used by the model.

Although the Model CV discussed above is valid for any climate model, the vocabulary necessary to describe an experimental framework depends on the experiment context and aims. In contrast to the model description, METAFOR was not asked to define a specific vocabulary for experiments and simulations, the latter being extensively defined in the CMIP5 experiment design document (Taylor et al., 2011a). This document addresses two main sets of experiments, long-term and near-term, further subdivided according to distinct scientific purposes: study of a particular time period (e.g. mid-Holocene, Last Glacial Maximum or 20th century long-term experiments), analysis of the climate response to a given forcing scenario (e.g. volcanic eruptions, anthropogenic aerosols) or evaluation of model errors and statistical significance (e.g. atmosphere-only experiment to identify biases due to coupled mode). Each experiment type is characterized by a set of compulsory requirements and additional recommendations. But even among mandatory requirements, some flexibility remains in their concrete implementation. METAFOR work consisted firstly of encoding the CMIP5-defined experiment and simulation vocabulary as specific CV-XML documents so as to make it machine readable. Secondly, it aimed at capturing the characteristics of a simulation that are left to the person configuring the simulation. Thirdly, it proposed a way to tell how the simulation described meets the experiment requirements it is intended to fit; this is ensured by introduction of the “Conformance” concept. In its current state, the CV for conformance is quite restricted, asking by what means experiment requirements are met (if at all). Possible choices are “via standard configuration”, “via model modifications”, “via inputs”, “via combination”, “not applicable” or “not conformant”. Its main function is to enforce a conformance check by metadata providers.
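The six conformance choices form a closed list, so they map naturally onto an enumeration, and the conformance check reduces to finding the requirements for which nothing has been recorded yet. The following is a sketch only: the enumeration and function names are illustrative, and the requirement labels mentioned in the usage note are placeholders rather than CMIP5 labels.

```python
from enum import Enum
from typing import Dict, Iterable, List


class Conformance(Enum):
    """The conformance choices listed in the text."""
    STANDARD_CONFIGURATION = "via standard configuration"
    MODEL_MODIFICATIONS = "via model modifications"
    INPUTS = "via inputs"
    COMBINATION = "via combination"
    NOT_APPLICABLE = "not applicable"
    NOT_CONFORMANT = "not conformant"


def unchecked_requirements(
    requirement_labels: Iterable[str],
    recorded: Dict[str, Conformance],
) -> List[str]:
    """Labels of requirements with no conformance choice recorded yet."""
    return [label for label in requirement_labels if label not in recorded]
```

For instance, with the placeholder labels `req-1` and `req-2` and only `req-1` recorded as `Conformance.INPUTS`, the function returns `['req-2']`, flagging the requirement the metadata provider still has to address.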

The Experiment CV-XML documents containing the specific CMIP5 experiment and simulation CV³ were fixed once and for all and cannot be modified by the climate modellers; they are ready for ingestion into the CMIP5 Questionnaire (see next section) and conform to the CIM activity package class structure. In this CV, experiments are identified by a label, a title, a description and an associated list of requirements. Taking the pre-industrial control experiment as an example (see Fig. 4), “3.1_pi-Control” stands for the experiment label; “Pre-Industrial Control: control experiment against which perturbations are compared” for the experiment title and “Pre-Industrial coupled atmosphere/ocean control run. Imposes non-evolving pre-industrial conditions” for the experiment description. In turn, each requirement has a label, a type and a description attached. To continue with the same example: the requirement with label “3.1.bc.CO2_conc” has “BoundaryCondition” as requirement type and “Prescribed atmospheric concentrations of pre-industrial well mixed gas: Carbon Dioxide” as requirement description.
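To illustrate what a machine-readable encoding of this example might look like, the sketch below builds an XML element for the pre-industrial control experiment with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`experiment`, `requirement`, `label`, `type`, and so on) are assumptions made for illustration; they are not taken from the actual CV-XML schema. The label, title, description and requirement values are those quoted above.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class Requirement:
    label: str
    req_type: str
    description: str


@dataclass
class Experiment:
    label: str
    title: str
    description: str
    requirements: List[Requirement] = field(default_factory=list)

    def to_xml(self) -> ET.Element:
        """Serialize to an element tree (all tag names are hypothetical)."""
        root = ET.Element("experiment", {"label": self.label})
        ET.SubElement(root, "title").text = self.title
        ET.SubElement(root, "description").text = self.description
        for req in self.requirements:
            node = ET.SubElement(
                root, "requirement", {"label": req.label, "type": req.req_type}
            )
            node.text = req.description
        return root


pi_control = Experiment(
    label="3.1_pi-Control",
    title=(
        "Pre-Industrial Control: control experiment against which "
        "perturbations are compared"
    ),
    description=(
        "Pre-Industrial coupled atmosphere/ocean control run. "
        "Imposes non-evolving pre-industrial conditions"
    ),
    requirements=[
        Requirement(
            label="3.1.bc.CO2_conc",
            req_type="BoundaryCondition",
            description=(
                "Prescribed atmospheric concentrations of pre-industrial "
                "well mixed gas: Carbon Dioxide"
            ),
        )
    ],
)

# Text form of the document, ready to be written to a file or parsed back.
xml_text = ET.tostring(pi_control.to_xml(), encoding="unicode")
```

Parsing `xml_text` back with `ET.fromstring` recovers the label and the requirement, which is the property a downstream tool relies on when ingesting such documents.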

³ See http://METAFORclimate.eu/trac/browser/controlled_vocabularies/branches/cmip5/Activity.

Fig. 4. Tree diagram showing information necessary to identify and document an experiment. Example shown is the CMIP5 pre-industrial experiment. The experiment is identified by a label, a title, an associated description and the list of requirements to be fulfilled by the simulations that instantiate this experiment. Each requirement is in its turn identified by a label, a type and a description. Value (text) for these attributes is fixed once and for all by the CMIP5 experiment protocol. Notice that this tree diagram is just illustrative (it is not a CV mindmap).

One or more simulations may support the realization of one particular experiment. Each simulation is identified by a short name, a long name, a description, its DRS member name (“rip” values standing for “realization – initialization method – physics” identifier; see Taylor et al., 2011b), the name of the model used, the hardware platform on which it has been executed, the start date, time extent, or end date. Among these attributes only the model name and the DRS member name are controlled vocabulary (defined within the CMIP5 experiment protocol, as mentioned above). When an experiment requires ensemble runs, one simulation is in its turn described as composed of one or several simulation members, each one being unambiguously identified by its DRS member name (“rip” value). Ensemble type (with the following possible values: Experiment Driven, Initial Condition, Perturbed boundary Conditions, Perturbed Physics or Mixed) is an additional attribute important for capturing in a standard way the perturbation applied to the ensemble members. Figure 5 illustrates how these attributes are filled in for an ensemble simulation labelled decadal1959 that is an instance of the 1.1 decadal experiment.
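Because the DRS member name is controlled, it can be checked mechanically. The sketch below parses a “rip” value of the form `r<N>i<M>p<L>` used in CMIP5 (for example `r1i1p1`) into its realization, initialization method and physics indices, and lists the ensemble types named above. The function and type names are illustrative and are not part of the Questionnaire code.

```python
import re
from typing import NamedTuple

# Ensemble type values as listed in the text.
ENSEMBLE_TYPES = (
    "Experiment Driven",
    "Initial Condition",
    "Perturbed boundary Conditions",
    "Perturbed Physics",
    "Mixed",
)

_RIP_PATTERN = re.compile(r"r(\d+)i(\d+)p(\d+)")


class Rip(NamedTuple):
    realization: int
    initialization_method: int
    physics: int


def parse_rip(member_name: str) -> Rip:
    """Split a DRS member name such as 'r1i1p1' into its three indices."""
    match = _RIP_PATTERN.fullmatch(member_name)
    if match is None:
        raise ValueError(f"not a valid rip value: {member_name!r}")
    return Rip(*(int(part) for part in match.groups()))
```

A value such as `r10i1p1` is thus read as realization 10, initialization method 1 and physics version 1, while a malformed string raises `ValueError` instead of being silently accepted.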

5 From controlled vocabulary to metadata

5.1 Creating instances of CMIP5 metadata

To collect metadata for CMIP5 numerical models, simulations and experiments, METAFOR has constructed what was initially intended to be a “simple questionnaire”. However, it rapidly became clear that a traditional questionnaire based on a linear collection of information would be completely inappropriate for the task, given the amount of information to be collected and given that much of this information would have to be shared and compared, for instance across two simulation descriptions. Moreover, a simple, linear text based questionnaire would have required a huge effort of “by hand” treatment in order to translate information harvested into CIM-instances that ultimately feed the CMIP5 metadata database (see Sect. 5.2 for details on information workflow). Thus a more complex tool was needed, and clearly that tool had to be based on the controlled vocabularies defined for CMIP5 and described in Sect. 4. The name has remained, but the “CMIP5 Questionnaire” should be thought of as a complex metadata entry tool, reproducing the CIM structure and syntax and able to make links between metadata objects referring to each other.

The resulting questionnaire provides support for harvesting all aspects a modeller controls when he or she performs a CMIP5 experiment (see Fig. 6): the model(s) used (including the coupling system), its associated grids, the computational platform it has been run on, the different simulations performed and the experiment they are related to, the input data files and, optionally, the CF standard names of the variables in the file used as a model component input. It allows users to interactively produce CIM metadata documents (see Lawrence et al., 2012, for an explanation of the term “document” in this context) without any knowledge of CIM structures. The CMIP5 Questionnaire has been built using the Python Django web framework (http://www.djangoproject.com/), deployed at the British Atmospheric Data Centre (BADC) and is available online at http://q.cmip5.ceda.ac.uk/.

Fig. 5. Tree diagram showing attributes used to describe an ensemble simulation. Reproduced here is what the CERFACS group filled in (blue text with a red pencil icon) or what it selected (blue text with a green tick mark icon) through the CMIP5 Questionnaire interface (see Sect. 5). What can be deduced from the information given is that the simulation labelled decadal1959 realizes a 1.1 decadal experiment using the CNRM-CM5 model and was run on the NEC-SX8-MF platform. The simulation duration is 30 years (from the beginning of 1960 till the end of 1989). The ensemble is made of 10 members, each being identified by a unique rip value; the rip of the first member is used as the identifier of the ensemble. Members can be distinguished by their initial condition. Histnud_1959 is a mnemonic that refers to an Input modification the user has previously registered. It provides details about the difference between members of the ensemble (here different initial atmospheric states). Notice that this tree diagram is just illustrative (it is not a CV mindmap).

Fig. 6. Partial view of the CMIP5 Questionnaire summary page for the CNRM-CERFACS modelling group and its CNRM-CM5 coupled model (Voldoire et al., 2011).

An illustration of how the Model CV is exploited to build the CMIP5 Questionnaire pages is shown in Fig. 7a. The end result is that the structure of the model component pages in the questionnaire – in terms of, for example, the hierarchy presented and the order of the parameters asked about – is completely controlled by the originating CV mindmap. This flexibility has, of course, been crucial in the development of the questionnaire.

Figure 7a shows the page corresponding to the SeaIceThermodynamics component taken as the example when discussing the Model CV definition process (Fig. 2). The navigation tree on the left provides a hierarchical view of the possible component structure of an earth system model. It strictly reflects the CV structure of the eight realm components as fixed in the mindmaps. The first three frames (from the top of the page) are for generic questions, common to all components (either realm or child): user-defined component names (the component type, here SeaIceThermodynamics, being fixed) and which grid is used by the current component. The next three frames, zoomed in Fig. 7b, contain questions entirely driven by the CV for that component. For example, the fifth frame, which asks a question about the SchemeType (snow-aging, snow-ice or Other), mirrors the Snow parameter group.
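The mapping from CV to page layout can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the nested dictionary layout and the names Question, Frame and build_page are hypothetical and do not reproduce the Django code of the questionnaire. Only the component, parameter group, parameter and values are taken from the SeaIceThermodynamics example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Question:
    """One requested information line; choices feed a drop-down menu."""
    label: str
    choices: List[str]


@dataclass
class Frame:
    """One frame of a component page, mirroring a CV parameter group."""
    title: str
    questions: List[Question] = field(default_factory=list)


def build_page(groups: Dict[str, Dict[str, List[str]]]) -> List[Frame]:
    """Turn parameter groups into frames and leaf parameters into questions."""
    frames = []
    for group_name, parameters in groups.items():
        frame = Frame(title=group_name)
        for parameter_name, values in parameters.items():
            frame.questions.append(Question(parameter_name, list(values)))
        frames.append(frame)
    return frames


# Hypothetical in-memory form of part of the SeaIceThermodynamics CV.
sea_ice_thermodynamics = {
    "Snow": {"SchemeType": ["snow-aging", "snow-ice", "Other"]},
}
page = build_page(sea_ice_thermodynamics)
```

Because the page is derived entirely from the CV structure, reordering or renaming a parameter group in the CV changes the page without touching the page-building code.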

As explained above, the CMIP5 Questionnaire helps the modellers to describe their model using the CV. The questionnaire is also extensible, however, offering the possibility for the user to define parameter-value attributes for each component, and indeed arbitrary additional component structures. Obviously, such flexibility is not in line with the current main scope of standardization. Nevertheless, we considered it important to allow the user to add information that has not been anticipated by the METAFOR CV. Moreover, additional user inputs can help identify parts of the CV that will need to be completed or changed in a post-CMIP5 perspective.

The questionnaire also uses the specific CV defined for the simulation descriptions. The way a given simulation meets the CMIP5 requirements of an experiment is described by a so-called “Conformance” (see Sect. 4.2). Conformance can be reached via modifications of model inputs, changes in the model parameters, slight modifications of the code itself, or via a combination of those. A simulation may not even fully conform to its experiment (for instance when the data producer realizes afterwards, when checking the long list of requirements, that the simulation missed one of them). In this latter case “not conformant” is the minimal amount of information to provide. Figure 8 illustrates how the conformance of a simulation named PICTL to requirements of the Pre industrial Control experiment is captured by the questionnaire.
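A conformance record of this kind could be represented as follows. This is a hedged sketch rather than the CIM data model: the class names, the label strings and the requirement name used in the example are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Method(Enum):
    """Ways in which a simulation can meet a requirement (illustrative labels)."""
    INPUTS = "via inputs"
    PARAMETERS = "via model parameters"
    CODE = "via code modification"


@dataclass(frozen=True)
class Conformance:
    """Links one experiment requirement to the way a simulation meets it."""
    requirement: str
    methods: FrozenSet[Method]  # an empty set stands for "not conformant"
    details: str = ""           # free-text area, always available

    @property
    def label(self) -> str:
        if not self.methods:
            return "not conformant"
        if len(self.methods) > 1:
            return "combination"
        return next(iter(self.methods)).value


# Hypothetical requirement name, for illustration only.
via_inputs = Conformance("example forcing requirement", frozenset({Method.INPUTS}))
missed = Conformance("example forcing requirement", frozenset())
```

Keeping the free-text field alongside the controlled label mirrors the questionnaire, where additional details can always be entered.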

5.2 The information pipeline

It is clear that the METAFOR CV has been built with the intent to go beyond simple vocabulary collection usage. Indeed, it is targeted at automatic ingestion by downstream tools (the CMIP5 Questionnaire, discussed in Sect. 5.1) and for inclusion into OWL (Web Ontology Language) ontologies, e.g. as used in the ESG/CURATOR portal then in use. The Experiment and Simulation CVs were fixed by the CMIP5 protocol and are not likely to evolve in the CMIP5 time frame (hence they were created and stored directly in XML without extra tooling). The CV built for model description is, on the other hand, intentionally managed in a different way (i.e. in mindmaps, see Sect. 4), independently from the software tools using it. The objective is to ensure separation of concerns between building and usage so that the semantic database (the Model CV) and the tools using it (the questionnaire) or hosting it (the CIM) can evolve on their own timeline. However, the mindmap format cannot directly feed these downstream tools: format conversion into a machine-readable format was required. To that end, we developed the software to support the information pipeline illustrated in Fig. 9. This tool chain can be found on the METAFOR SVN repository at http://metaforclimate.eu/trac.
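The format conversion mentioned above can be illustrated with a short Python sketch. It rests on several assumptions that are not taken from the METAFOR tool chain: the mindmap is taken to be an XML file of nested node elements carrying a TEXT attribute, the output element names (component, parametergroup, parameter, value) are hypothetical, and the level of a node is inferred from its depth alone, which ignores nested child components.

```python
import xml.etree.ElementTree as ET

# Assumed mindmap serialization: nested <node TEXT="..."> elements.
MINDMAP = """<map>
  <node TEXT="SeaIceThermodynamics">
    <node TEXT="Snow">
      <node TEXT="SchemeType">
        <node TEXT="snow-aging"/>
        <node TEXT="snow-ice"/>
        <node TEXT="Other"/>
      </node>
    </node>
  </node>
</map>"""

# Hypothetical element names for the translated CV, one per depth level.
LEVELS = ["component", "parametergroup", "parameter", "value"]


def translate(node, depth=0):
    """Rewrite one mindmap node, and recursively its children, as CV XML."""
    tag = LEVELS[min(depth, len(LEVELS) - 1)]
    element = ET.Element(tag, name=node.get("TEXT", ""))
    for child in node.findall("node"):
        element.append(translate(child, depth + 1))
    return element


mindmap_root = ET.fromstring(MINDMAP)
cv_xml = translate(mindmap_root.find("node"))
print(ET.tostring(cv_xml, encoding="unicode"))
```

Keeping the translation as a separate step is what lets the mindmaps remain the authoring format while downstream tools consume plain XML.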

To satisfy CMIP5 Questionnaire needs, a simple XML-CV structure was defined to encode the Model CV based on the mindmap rules and constraints described earlier. A mindmap validator (top-left grey box in Fig. 9), written in XSLT and invoked by Python, was implemented to check that a specific mindmap (top-right red box in Fig. 9) conforms to the defined encoding rules (see Sect. 4.1). If a feature in the mindmap violates a rule (e.g. an element coded as a leaf parameter having a child element), the person responsible for the CV mindmap is asked to make appropriate corrections. Once the validation step is passed, a mindmap translator (top-right grey box in Fig. 9) rewrites the mindmap information into an XML file (middle-right red box in Fig. 9), suitable for ingestion in the questionnaire (middle orange box in Fig. 9). These CV-XML documents are then imported into Django tables and are used to automatically build the part of the questionnaire graphical interface related to the model description. Once filled in, the questionnaire supports three levels of validation (validate in the middle-left, Fig. 9): (i) the CV constraints are directly enforced while filling in the component description (e.g. a page cannot be saved if text is provided where a numeric value is expected); (ii) when documents are exported as XML files, a validation against the CIM XSD (http://www.w3.org/TR/xmlschema11-1/) is automatically enforced; (iii) a Schematron (http://www.schematron.com)-based validation is performed to check deeper levels of coherency between the different parameters. The Schematron validation ensures that parameters relevant only for a given condition are only filled when this condition is met. For example, in the description of the vertical grid, a SurfaceReference is asked only if the VerticalCoordinateType is mass-based. The pages of the questionnaire being non-dynamic, the Schematron function is to check coherency between responses given by the person filling in the questionnaire.
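Validation levels (i) and (iii) can be mimicked in plain Python to show the logic involved. The sketch below is not Schematron and not the questionnaire code: the function names, the NumberOfLevels parameter, the value “space-based” and the placeholder surface reference are assumptions introduced for illustration. Only SurfaceReference, VerticalCoordinateType and “mass-based” come from the example in the text.

```python
def check_types(responses, expected_types):
    """Level (i): flag values whose type does not match the CV constraint."""
    errors = []
    for name, expected in expected_types.items():
        value = responses.get(name)
        if value is not None and not isinstance(value, expected):
            errors.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return errors


def check_conditional(responses, dependent, controller, required_value):
    """Level (iii): a dependent parameter may be filled only if its condition holds."""
    filled = responses.get(dependent) not in (None, "")
    if filled and responses.get(controller) != required_value:
        return [f"{dependent} should be filled only when {controller} is {required_value}"]
    return []


# Hypothetical numeric parameter used to illustrate the type constraint.
TYPES = {"NumberOfLevels": int}

coherent = {
    "VerticalCoordinateType": "mass-based",
    "SurfaceReference": "placeholder-reference",
    "NumberOfLevels": 45,
}
incoherent = {
    "VerticalCoordinateType": "space-based",  # hypothetical alternative value
    "SurfaceReference": "placeholder-reference",
    "NumberOfLevels": "many",
}

for responses in (coherent, incoherent):
    problems = check_types(responses, TYPES) + check_conditional(
        responses, "SurfaceReference", "VerticalCoordinateType", "mass-based"
    )
    print(problems)
```

Since the pages are non-dynamic, a check of this kind has to run on the saved responses rather than hide the dependent question while the form is being filled in.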

Fig. 7a. How the model pages of the CMIP5 Questionnaire automatically inherit from the CV mindmap organization. Components’ hierarchy (realm and child components) determines the model navigation tree (left column, enhanced in the zoom).

Fig. 7b. Continuing Fig. 7a. Each model component mindmap provides the content of the corresponding questionnaire page, and parameter groups in the component mindmap determine the frames in the page; mindmap leaf parameters define the requested information lines in the frames; the list of possible CV values for a given parameter forms the content of drop-down menus (enhanced in the zoom).

To ensure usage by the ESGF gateway interfaces and faceted browsing (Williams et al., 2011), a tool was developed to convert the METAFOR Model CV into an OWL ontology (bottom-right red box in Fig. 9). This ontology was also used to guide the mapping tool which allowed the conversion of CIM documents into gateway RDF (Resource Description Framework, http://www.w3.org/TR/rdf-mt/) triple stores (Lawrence et al., 2012). The conversion of the Model CV into OWL was then the decisive step for the final adoption of the METAFOR CV as CMIP5 metadata for models and simulations (bottom-right yellow box in Fig. 9). Finally, CIM-compliant documents, conforming to the CMIP5 DRS, were broadcast as “atom feeds”, and the corresponding metadata were ready to be included in the CMIP5 metadata catalogue deployed on the ESG portal.
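To give an idea of what expressing a component hierarchy as RDF triples involves, the following Python sketch emits N-Triples lines from a nested dictionary. It does not reproduce the METAFOR conversion tool: the namespace, the Component class and the hasChildComponent predicate are hypothetical, and placing SeaIceThermodynamics under a SeaIce realm is assumed for the example. Only the rdf:type URI is a standard one.

```python
BASE = "http://example.org/metafor-cv#"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
COMPONENT = BASE + "Component"           # hypothetical class
HAS_CHILD = BASE + "hasChildComponent"   # hypothetical predicate


def components_to_triples(tree, parent=None):
    """Walk a nested {name: children} dictionary and collect triples."""
    triples = []
    for name, children in tree.items():
        subject = BASE + name
        triples.append((subject, RDF_TYPE, COMPONENT))
        if parent is not None:
            triples.append((BASE + parent, HAS_CHILD, subject))
        triples.extend(components_to_triples(children, parent=name))
    return triples


def to_ntriples(triples):
    """Serialize URI-only triples, one N-Triples statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


hierarchy = {"SeaIce": {"SeaIceThermodynamics": {}}}
triples = components_to_triples(hierarchy)
print(to_ntriples(triples))
```

A triple store loaded with such statements can answer hierarchy queries, which is the kind of structure faceted browsing relies on.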


Fig. 8. Illustration of the “Conformance” concept in the case of a PICTL simulation performed with the CNRM-CM5 model in the framework of the 3.1 piControl CMIP5 experiment. For the three requirements shown, the conformance is ensured via inputs, which means that the input files used contain the forcings requested. Note that a free-text area to enter additional details is always provided.

Since the original tool chain was developed, a new tool chain has been deployed. The CIM-compliant XML documents are now stored in a database, and client portals use JavaScript code to load the documents across the net and display them.

6 Summary and further work

CMIP5 was conducted by 20 modelling groups that produced about 90 000 years of simulation for a total volume of several petabytes. A CMIP5 climate data user is faced with a large amount and large diversity of data sets archived in CMIP5 data-node centres. In this context, the METAFOR mission was to provide a metadata system to support data preservation, data reuse (both in time and by different research communities), data readability and discovery, and to guarantee the data quality (or conformity). Until now, such an integrated metadata system for climate modelling was missing. The controlled vocabulary for model and simulation should be considered as necessary raw material for such a system.

This paper introduced the controlled vocabulary developed both for generic description of earth system models and as input for the tool developed to collect this description for CMIP5 models and simulations (the CMIP5 Questionnaire). The mindmap technology used facilitated the CV development, ensuring a wide engagement of the scientific community in this process, hiding away the complexity of the underlying ontological concepts (the CIM). The metadata pipeline, which starts from the mindmaps, serves both the metadata entry tool (the CMIP5 Questionnaire) and metadata catalogues such as the ESGF gateways. The cornerstone of the METAFOR CV has indisputably been the engagement of a large number of modellers from the climate community since the early stage of the CV elaboration process. The CV collection produced at the end of the METAFOR project gathers thousands of terms for which the hierarchical arrangement is as important as the terms themselves. Even if it can be improved further, the METAFOR CV is the first one to address the whole climate modelling chain. Available in CMIP5 metadata catalogues and supporting data discovery tools, it is hoped to provide essential services to climate data users.

Fig. 9. Key components of the CV and information pipeline from METAFOR CV (top yellow box) to CMIP5 metadata (bottom-right yellow box).

There are two significant pieces of work yet to be done before the CV can be easily governed and maintained. Firstly, a conversion tool taking the CV XML back to the mindmap format would support the ability to convert between all CV formats. This tool would allow use of the CV XML as the primary preservation and governance artefact, generating mindmaps from those XML instances for websites and human-mediated discussions, for example. Secondly, we need to formalize an HTTP interface for the CV following appropriate standards (see Leadbetter et al., 2011). Beyond these two tasks, the maintenance and governance of the controlled vocabulary and of the associated metadata pipeline need addressing. Gathering feedback from the questionnaire users and finding ways to benefit from this feedback to make the CV evolve should also be strongly considered. The intended focus is the set-up of a real standard, which requires a governance committee to emerge (as planned in the framework of the EU-FP7 ENES2 project). For now, the METAFOR work is extended within the UK JISC-funded PIMMS project (Portable Infrastructure for the METAFOR Metadata System).

Although populating the CMIP5 Questionnaire was not mandatory but only highly recommended by the CMIP5 panel (i.e. not blocking for CMIP5 model outputs publication), about 70 % of the modelling groups contributing to CMIP5 provided metadata through the questionnaire, and more than a thousand CIM documents are stored in the CMIP5 documentation repository in this way. Overall, 78 % of the published documents are attached to the description of experiments and simulations and 12 % describe the models and their grids. Only 2 groups provided a description for their model but not for the simulations they performed, and 8 groups did not provide any metadata at all. Further diagnostics to measure quality and completion rate of the metadata documents would be advisable. A key point for a wider acceptance of the metadata harvesting procedure (the questionnaire and the underlying CIM) is certainly to limit the effort asked of metadata providers. With the CMIP5 Questionnaire, the effort required from the provider was indisputably too high. The logic of the information flow and connections between formal concepts was viewed as somewhat complex. Lessons are to be learned from the METAFOR experience in the context of CMIP5 that should be reinvested in future projects.

While the application to CMIP5 has dominated most of the development thus far, next generations of the questionnaire are currently being developed by the ES-DOC community (Earth System Documentation, http://earthsystemcog.org/projects/es-doc-models/). Initiated during the METAFOR project, a specific CV is being developed to describe the models and simulations used in the ENSEMBLES EU project (http://www.ensembles-eu.org). The US NCPP project (National Climate Predictions and Projections, http://earthsystemcog.org/projects/ncpp/) and EU EURO-CORDEX (Coordinated Downscaling Experiment – European Domain, http://www.euro-cordex.net) are also agreeing on statistical and dynamical downscaling CV for regional climate studies. Finally, one can expect that the METAFOR CV for global climate models will be reused in upcoming or recent EU FP7 initiatives dedicated to climate services such as the SPECS project (Seasonal-to-decadal climate Prediction for the improvement of European Climate Services).

Acknowledgements. METAFOR was funded by the EU 7th Framework Programme as an e-infrastructure (project #211753). The support of the EU FP7 IS-ENES (project #228203) is also acknowledged. This work benefited significantly from the engagement of other METAFOR members and colleagues from the US Earth System Curator project. We also appreciated guidance from the METAFOR advisory committee, in particular Wilco Hazeleger and Karl Taylor.

Edited by: M. Kawamiya


The publication of this article is financed by CNRS-INSU.

References

Balaji Institute: Gridspec – A standard for the description of grids used in Earth System models, available at: http://www.gfdl.noaa.gov/~vb/gridstd/gridstd.html (last access: 18 March 2014), GFDL 2007.

Boucher, O. and Pham, M.: History of sulfate aerosol radiative forcings, Geophys. Res. Lett., 29, 22.1–22.4, doi:10.1029/2001GL014048, 2002.

Callaghan, S. A., Treshansky, A., Moine, M.-P., Guilyardi, E., Alias, A., Balaji, V., Bojariu, R., Cofiño, A. S., Denvil, S., Elkington, M., Ford, R., Kolaninski, M., Lautenschlager, M., Lawrence, B. N., Steenman-Clark, L., and Valcke, S.: The METAFOR project: preserving data through metadata standards for climate models and simulations, in: INTL-DPIF ’10 Proceedings of the 1st International Digital Preservation Interoperability Framework Symposium, article No. 6, doi:10.1145/2039263.2039269, 2010.

Cariolle, D. and Déqué, M.: Southern hemisphere medium-scale waves and total ozone disturbances in a spectral general circulation model, J. Geophys. Res., 91, 10825–10846, 1986.

Cariolle, D., Lasserre-Bigory, A., Royer, J.-F., and Geleyn, J.-F.: A general circulation model simulation of the springtime Antarctic ozone decrease and its impact on mid-latitudes, J. Geophys. Res. Atmos., 95, 1883–1898, 1990.

Dunlap, R., Mark, L., Rugaber, S., Balaji, V., Chastang, J., Cinquini, L., DeLuca, C., Middleton, D., and Murphy, S.: Earth system curator: metadata infrastructure for climate modeling, Earth Sci. Inf., 1, 131–149, doi:10.1007/s12145-008-0016-1, 2008.

Ford, R. W. and Riley, G. D.: The Bespoke Framework Generator, in: Earth System Modelling, Vol. 3, Coupling Software and Strategies, Series: SpringerBriefs in Earth System Sciences, ISBN 978-3-642-23359-3, 2011.

Guilyardi, E., Balaji, V., Callaghan, S., DeLuca, C., Devine, G., Denvil, S., Ford, R., Pascoe, C., Lautenschlager, M., Lawrence, B. N., Steenman-Clark, L., and Valcke, S.: The CMIP5 model and simulation documentation: a new standard for climate modeling metadata, CLIVAR Exchanges, 16, 42–46, 2011.

Lawrence, B. N., Balaji, V., Bentley, P., Callaghan, S., DeLuca, C., Denvil, S., Devine, G., Elkington, M., Ford, R. W., Guilyardi, E., Lautenschlager, M., Morgan, M., Moine, M.-P., Murphy, S., Pascoe, C., Ramthun, H., Slavin, P., Steenman-Clark, L., Toussaint, F., Treshansky, A., and Valcke, S.: Describing Earth system simulations with the Metafor CIM, Geosci. Model Dev., 5, 1493–1500, doi:10.5194/gmd-5-1493-2012, 2012.

Leadbetter, A., Clements, O., and Lowry, R.: Emerging standards in vocabulary server access methods, Geophys. Res. Abstr., EGU2011-A-2143, EGU General Assembly 2011, Vienna, Austria, 2011.

Levitus, S.: Climatological atlas of the world’s oceans, NOAA Professional Paper 13, 173 pp., available at: ftp://ftp.nodc.noaa.gov/pub/data.nodc/woa/PUBLICATIONS/levitus_atlas_1982.pdf (last access: 18 March 2014), 1982.

Redler, R., Valcke, S., and Ritzdorf, H.: OASIS4 – a coupling software for next generation earth system modelling, Geosci. Model Dev., 3, 87–104, doi:10.5194/gmd-3-87-2010, 2010.

Taylor, K. E. and Doutriaux, C.: CMIP5 Model Output Requirements: File Contents and Format, Data Structure and Metadata, available at: http://cmip-pcmdi.llnl.gov/cmip5/docs/CMIP5_output_metadata_requirements.pdf and http://pcmdi-cmip.llnl.gov/cmip5/docs/standard_output.pdf (last access: 18 March 2014), 2010.

Taylor, K. E., Stouffer, R. J., and Meehl, G. A.: An Overview of CMIP5 and the Experiment Design, B. Am. Meteorol. Soc., 93, 485–498, doi:10.1175/BAMS-D-11-00094.1, 2011a.

Taylor, K. E., Balaji, V., Hankin, S., Juckes, M., Lawrence, B. N., and Pascoe, S.: CMIP5 Data Reference Syntax (DRS) and Controlled Vocabularies, available at: http://pcmdi-cmip.llnl.gov/cmip5/docs/cmip5_data_reference_syntax.pdf (last access: 18 March 2014), 2011b.

Valcke, S., Balaji, V., Craig, A., DeLuca, C., Dunlap, R., Ford, R. W., Jacob, R., Larson, J., O’Kuinghttons, R., Riley, G. D., and Vertenstein, M.: Coupling technologies for Earth System Modelling, Geosci. Model Dev., 5, 1589–1596, doi:10.5194/gmd-5-1589-2012, 2012.

Voldoire, A., Sanchez-Gomez, E., Salas y Mélia, D., Decharme, B., Cassou, C., Sénési, S., Valcke, S., Beau, I., Alias, A., Chevallier, M., Déqué, M., Deshayes, J., Douville, H., Fernandez, E., Madec, G., Maisonnave, E., Moine, M.-P., Planton, S., Saint-Martin, D., Szopa, S., Tyteca, S., Alkama, R., Belamari, S., Braun, A., Coquart, L., and Chauvin, F.: The CNRM-CM5.1 global climate model: Description and basic evaluation, Clim. Dynam., 40, 2091–2121, doi:10.1007/s00382-011-1259-y, 2011.

Williams, D. N., Lawrence, B. N., Lautenschlager, M., Middleton, D., and Balaji, V.: The Earth System Grid Federation: Delivering globally accessible petascale data for CMIP5, in: Proceedings of the 32nd Asia-Pacific Advanced Network Meeting, 121–130, New Delhi, doi:10.7125/APAN.32.15, 2011.


Appendix A

List of climate scientists involved in the METAFOR consultation process

The METAFOR project members would like to express their sincere thanks to all the climate scientists who contributed in a significant way to the METAFOR controlled vocabulary elaboration process, sharing their knowledge without restriction and providing excellent guidance and recommendations (in alphabetical order, Table A1 below).

Table A1. List of climate scientists who contributed to the METAFOR controlled vocabulary, whether during face-to-face interviews, phone calls or e-mail exchanges.

Abrahams, Luke, UKCA, UK
Balaji, V., GFDL, USA
Boone, Aaron, CNRM, France
Bopp, Laurent, LSCE-IPSL, France
Braesicke, Peter, UKCA, UK
Bruehl, Christoph, UKCA, UK
Buja, Lawrence, NCAR, USA
Decharme, Bertrand, CNRM, France
Déqué, Michel, CNRM, France
Elkington, Mark, MetOffice, UK
Fichefet, Thierry, UCL-LLN, Belgium
Gibelin, Anne-Laure, CNRM, France
Goosse, Hugues, UCL-LLN, Belgium
Griffies, Stephen, GFDL, USA
Guilyardi, Eric, LOCEAN-IPSL, France
Hagemann, Stefan, MPI, Germany
Horowitz, Larry, GFDL, USA
Hourdin, Fredéric, LMD-IPSL, France
Kageyama, Masa, LSCE-IPSL, France
Khodry, Myriam, IPSL, France
Krinner, Gerhard, LGGE, France
Lawrence, Bryan, NCAS-BADC, UK
Madec, Gurvan, LOCEAN-IPSL, France
Malyshev, Sergey, GFDL, USA
Mann, Graham, Univ. of Leeds, UK
Marti, Olivier, LSCE-IPSL, France
Peuch, Vincent-Henri, CNRM, France
Polcher, Jan, LMD-IPSL, France
Ritz, Catherine, LGGE, France
Salas Y. Melia, David, CNRM, France
Slawitch, Ross, Univ. Maryland, USA
Strand, Gary, NCAR, USA
Van Velthoven, Peter, KNMI, the Netherlands
Vancoppenolle, Martin, UCL-LLN, Belgium
Wyman, Bruce, GFDL, USA

Appendix B

CMIP3 text-based questionnaire

Model Information of Potential Use to the IPCC Lead Authors and the AR4.

CNRM-CM3 (version used for IPCC AR4)

2 August 2005

Model identity:

A. Institution, sponsoring agency, country: Centre National de Recherches Météorologiques, Météo France, France

B. Model name (and names of component atmospheric, ocean, sea ice, etc. models): CNRM-CM3

Atmosphere: ARPEGE-Climat version 3

Ocean: OPA 8.1

Sea ice: GELATO 2

C. Vintage (i.e. year that model version was first used in a published application): 2004

D. General published references and web pages:

http://www.cnrm.meteo.fr/scenario2004/references_eng.html

E. References that document changes over the last 5 years (i.e. since the IPCC TAR) in the coupled model or its components. We are specifically looking for references that document changes in some aspect(s) of model performance.

– descriptions of previous versions of the ARPEGE-Climat model can be found in the following publications:

– Déqué et al. (1994),

– Déqué and Piedelièvre (1995),

– Royer et al. (2002).

F. IPCC model version’s global climate sensitivity (K W$^{-1}$ m$^{2}$) to increase in CO$_2$ and how it was determined (slab ocean expt., transient expt–Gregory method, ±2 K Cess expt., etc.): not yet available

G. Contacts (name and email addresses), as appropriate, for:

1. coupled model: David Salas y Melia, [email protected]

2. atmosphere: Michel Déqué, [email protected]

3. ocean: David Salas y Melia, [email protected]

4. sea ice: David Salas y Melia, [email protected]

5. land surface: Hervé Douville, [email protected]

6. vegetation: Hervé Douville, [email protected]

7. other?

Besides atmosphere, ocean, sea ice, and prescription of land/vegetated surface, what can be included (interactively) and was it active in the model version that produced output stored in the PCMDI database?

A. Atmospheric chemistry?

– Ozone transport with simplified chemistry as described in Cariolle and Déqué (1986) and Cariolle et al. (1990).

B. Interactive biogeochemistry?

– no

C. What aerosols and are indirect effects modelled?

– The distributions of marine, desertic, urban aerosols, sulfate aerosols are specified. Marine and desertic aerosols are constant in all experiments. Urban aerosols vary according to estimates between 1860 and 2000. Sulfate aerosols are specified in all experiments according to Boucher and Pham (2002) data, see http://www-loa.univ-lille1.fr/boucher/sres/ for more details. Note that only the direct effect of anthropogenic sulfate aerosols was taken into account.

D. Dynamic vegetation?

– no

E. Ice sheets?

– fixed

[. . . ]

Component model characteristics (of current IPCC model version):

A. Atmosphere

1. Resolution: triangular truncation T63 with “linear” reduced Gaussian grid equivalent to T42 quadratic grid

2. Numerical scheme/grid (advective and time-stepping schemes; model top; vertical coordinate and number of layers above 200 hPa and below 850 hPa):

– semi-Lagrangian semi-implicit time integration with 30 min time step, 3-hour time step for radiative transfer;

– top layer 0.05 hPa, progressive hybrid sigma-pressure vertical coordinate with 45 layers, 23 layers above 200 hPa, usually 7 layers below 850 hPa (less in regions of high orography)

3. List of prognostic variables (be sure to include, as appropriate, liquid water, chemical species, ice, etc.). Model output variable names are not needed, just a generic descriptive name (e.g. temperature, northward and eastward wind components, etc.)

– temperature, northward and eastward wind components, specific humidity, ozone concentration, surface pressure

4. Name, terse descriptions, and references (journal articles, web pages) for all major parameterizations. Include, as appropriate, descriptions of:

a. Clouds:

– statistical cloud scheme for stratiform clouds based on Ricard and Royer (1993). Convective cloud cover based on the mass-flux transport

b. Convection

– mass-flux convective scheme with Kuo-type closure based on Bougeault (1985); boundary layer based on Louis et al. (1982) with modifications by Mascart et al. (1995). SW, LW radiation based on Fouquart and Morcrette parameterizations implemented in a former version of the ECMWF model (Morcrette JJ, 1990; Morcrette JJ, 1991)

c. any special handling of wind and temperature at top of model:

– relaxation of temperature, linear (Rayleigh) friction for wind

Simulation details (report separately for each IPCC simulation contributed to database at PCMDI)

Picntrl/Run_1

This pre-industrial control simulation was initialized from a coupled simulation of a previous version of the CNRM coupled model that initialized an ocean at rest with temperature and salinity profiles specified from Levitus (1982) climatology, integrated for 30 years with a relaxation of surface temperature to the monthly mean Reynolds climatology for 1950. The CNRM-CM3 version was then integrated for 70 years with pre-industrial 1860 greenhouse gases concentrations as a spin-up. After this spin-up period, results were stored from nominal years 1930 to 2429.