Scott Hausman
Acting Director
National Climatic Data Center
Comprehensive Large-Array Data Stewardship System (CLASS) Update
DAARWG Meeting
December 10, 2010
National Climatic Data Center, DAARWG Meeting: CLASS Update
Overview
• Background
• System Development
• Operational Integration
• Recent Accomplishments
• Future Direction
• Next Steps
Background: Vision
• No place to put the mountain of data. Manage the rapidly growing data volume of major observing and modeling systems.
• Every program is solving the same problem. Eliminate various “stove-pipe” systems and produce a unified “enterprise” data access system to reduce IT cost:
  – Satellite Active Archive (SAA)
  – GOES Active Archive (GAA)
  – Earth Observing System (EOS) Archive
• Can’t access the data we have. Centralize NOAA’s numerous data systems for environmental data access; create a single portal.
• Don’t break anything. Retain, as much as possible, portions and modules of existing legacy systems.
Background: Level 1 Requirements
• Scope. An enterprise-wide IT system supporting long-term, secure storage of, and common access to, environmental datasets and information stewarded by NOAA’s Archives.
1. Large Data Campaigns. Satellites (NPP/JPSS, GOES, POES, DMSP, MetOp), radar (NEXRAD), and models
2. Enterprise Approach
  – Providing common services for the development and operation of IT systems supporting NOAA Archives
  – Consolidating legacy archival storage systems
  – Relieving data producers of responsibility for archival
Background: Contract
• The CLASS support contract was awarded competitively on June 20, 2008 to Diversified Global Partners (DGP) JV LLC
• Small business set-aside: 8(a) mentor-protégé program
  – Protégé company: DB Consulting Group
  – Mentor company: Global Science and Technology (GST) Inc.
• Potential nine-year period of performance (now in year 3):
  – Base Year
  – Four (4) one-year Option Periods
  – Four (4) one-year Award Term Option Periods
• Indefinite Delivery/Indefinite Quantity (IDIQ) contract
• Maximum Ordering Volume of $200M (maximum of nine years)
• Cumulative tasking to date valued at $42.0M
Figure: map of CLASS sites: NGDC (Ops), Boulder, CO; NCDC (Ops), Asheville, NC; Fairmont, WV (Devel); NSOF (Devel/Ingest), Suitland, MD.
Development
• Systems
• Software
• Data Integration
Software Development: Open Archival Information System Reference Model (OAIS-RM)
• CLASS Software Evolution (CLASS-SE). Five-year project to establish configurable ingest
• NOAA Enterprise Archive Access Tool (NEAAT). Enterprise Application Program Interface (API)
Figure: OAIS-RM functional model of the CLASS System. Stakeholders: Producer, Consumer, System Administrators, Data Stewards. Functional entities: Ingest, Preservation Planning, Administration, Data Management, Archival Storage, Access.
Software Development: CLASS Software Evolution (CLASS-SE)
• CLASS-SE provides a configurable ingest capability
• Configurability reduces development costs for storage of and access to new data types
• Applications, or services, are “data-agnostic” and may be applied to selected data types
• The workflow engine supports both collective system operations and resource allocation
  – Allocates services across 1 to N instances of CLASS nodes
  – Individual nodes may perform all or a subset of system capabilities
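To make the configurable-ingest idea concrete, here is a minimal Python sketch. It assumes that each data type is described by a configuration listing data-agnostic services, and that the workflow engine assigns each service step round-robin to the nodes that advertise that capability. Round-robin is simply a stand-in allocation policy; all class, service, and file names are hypothetical, only the site names NSOF, NCDC, and NGDC come from this briefing, and the actual CLASS-SE engine may allocate work quite differently.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Node:
    """A CLASS node and the services it is configured to run (hypothetical)."""
    name: str
    capabilities: set


@dataclass
class DataTypeConfig:
    """Per-data-type ingest recipe: an ordered list of data-agnostic services."""
    data_type: str
    pipeline: list


class WorkflowEngine:
    """Assigns each service step to one of N nodes offering that service (round-robin)."""

    def __init__(self, nodes):
        self.nodes = nodes
        self._rotation = {}  # service name -> round-robin iterator over eligible nodes

    def _pick_node(self, service):
        if service not in self._rotation:
            eligible = [n for n in self.nodes if service in n.capabilities]
            if not eligible:
                raise LookupError(f"no node offers service {service!r}")
            self._rotation[service] = cycle(eligible)
        return next(self._rotation[service])

    def plan(self, config, files):
        """Return (file, service, node name) assignments for a batch of files."""
        return [(f, svc, self._pick_node(svc).name)
                for f in files for svc in config.pipeline]


# Hypothetical deployment: NGDC offers no ingest service, i.e. a node
# performing only a subset of system capabilities.
nodes = [
    Node("NSOF", {"ingest"}),
    Node("NCDC", {"ingest", "archive", "access"}),
    Node("NGDC", {"archive", "access"}),
]
engine = WorkflowEngine(nodes)
cfsr = DataTypeConfig("CFSR", ["ingest", "archive"])
plan = engine.plan(cfsr, ["a.grb2", "b.grb2"])
for assignment in plan:
    print(assignment)
```

Supporting a new data type in this sketch means writing a new DataTypeConfig rather than new code, which mirrors the cost argument made on the slide.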
Software Development: NOAA Enterprise Archive Access Tool (NEAAT)
• Enterprise access to all NOAA archive storage systems
• Satisfies L1RD requirements:
  – CLASS Access Interface
  – Support to GEO-IDE
  – Interface to Legacy Systems
• Supports both data access and stewardship applications
• Service-Oriented Architecture (SOA) middleware
  – Simple plugin adaptor in the integration layer provides the interface to NEAAT
  – Support for open-source tool kits (e.g., OGC)
  – OPeNDAP prototype provides access to climate model data through NOMADS (National Operational Model Archive & Distribution System)
Figure: NEAAT architecture. A customer application reaches CLASS and legacy data systems (Satellites, CFSR, NARR, NOMADS, NCDC HDSS, NCEP Models, ESG, IDEAS, ESSE, SPIDR) through NEAAT plugins and standard protocols (OPeNDAP); the diagram also shows data migration from legacy systems.
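The plugin-adaptor pattern described for NEAAT can be illustrated with a small Python sketch, assuming a common search interface that each back-end adaptor implements and a single access point that fans a query out to all registered plugins. The class names, the search method, and the holdings are hypothetical; this is not the NEAAT API, and real adaptors would speak protocols such as OPeNDAP rather than read in-memory dictionaries.

```python
from abc import ABC, abstractmethod


class ArchivePlugin(ABC):
    """Hypothetical adaptor contract that each archive back end implements."""

    def __init__(self, name):
        self.name = name

    @abstractmethod
    def search(self, dataset):
        """Return identifiers of granules held for `dataset`."""


class InMemoryPlugin(ArchivePlugin):
    """Toy back end standing in for a real system such as CLASS or NOMADS."""

    def __init__(self, name, holdings):
        super().__init__(name)
        self._holdings = holdings  # dataset name -> list of granule ids

    def search(self, dataset):
        return list(self._holdings.get(dataset, []))


class EnterpriseAccess:
    """Single access point that fans a query out to every registered plugin."""

    def __init__(self):
        self._plugins = {}

    def register(self, plugin):
        self._plugins[plugin.name] = plugin

    def search(self, dataset):
        results = {}
        for name, plugin in self._plugins.items():
            hits = plugin.search(dataset)
            if hits:  # only report systems that actually hold the dataset
                results[name] = hits
        return results


api = EnterpriseAccess()
api.register(InMemoryPlugin("CLASS", {"CFSR": ["cfsr_1979.grb2"]}))
api.register(InMemoryPlugin("NOMADS", {"CFSR": ["cfsr_1979.grb2"],
                                       "NARR": ["narr_2000.grb"]}))
print(api.search("NARR"))  # {'NOMADS': ['narr_2000.grb']}
```

The design choice illustrated is that the customer application depends only on the common interface, so adding a legacy system means writing one more adaptor rather than changing the client.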
Data Integration: Data Campaigns (L1RD 5.1.2, 11/6/08)
Data Set | Phase | Status
GOES | complete | operational
POES | complete | operational
DMSP | complete | operational
MetOp | complete | operational
EOS MODIS | - | canceled
Jason 1&2 | complete | operational
Jason 3 | not started | expected volume and complexity undefined
NPP | in progress | see next slide
JPSS (NPOESS) | planning | requirement added to JPSS L1RD
GOES-R | in progress | completed System Definition Review and System Requirements Review; preparing for Preliminary Design Review
NEXRAD | planning |
NCEP Models | planning | CFSR complete; need to refine requirement
Data Integration: New Acquisitions, CLASS Charter Projects (updated 2010-11-12)
# | Project | Provider Request | Charter Request | Charter Delivery | COPB Briefing | COPB Decision | Decision | Ingest Start | Comments | Data Center | Static Volume (TB) | Annual Volume (TB)
1 | National Ice Center Sea Products and Ice Charts | 10/12/2010 | TBD | TBD | TBD | TBD | TBD | TBD | Waiting on info from provider | C | TBD | TBD
2 | OSDPD Blended Total Precipitable Water (TPW) and TPW Anomalies | 6/23/2010 | 7/1/2010 | TBD | TBD | TBD | TBD | TBD | Data are ready on provider side | C | 0.09 | 0.25
3 | OSDPD Global Soil Moisture Product System (SMOPS) | 10/25/2010 | 12/31/2010 | TBD | TBD | TBD | TBD | 8/15/2011 | | C | N/A | 0.008
4 | OSDPD GOME-2 Bromine and Nitrogen Dioxide Products | 12/11/2009 | 11/30/2010 | TBD | TBD | TBD | TBD | 4/30/2011 | On hold in 2010 due to OSDPD freeze | C | N/A | 0.1
5 | NCEP CFS Reanalysis | 9/29/2010 | 5/13/2010 | 10/18/2010 | 6/25/2009 | 6/25/2009 | yes | 12/31/2009 | Complete | C | 175 | N/A
6 | NCEP CFS Reforecast (High Priority) | 9/29/2010 | 5/13/2010 | 10/18/2010 | 6/25/2009 | 6/25/2009 | yes | 11/19/2010 | | C | 65 | N/A
7 | NCEP CFS Reforecast (Low Priority) | 9/29/2010 | 5/13/2010 | 10/18/2010 | 6/25/2009 | 6/25/2009 | yes | TBD | | C | 400 | N/A
8 | NCEP CFS Forecast and Analysis (Operational) | 11/11/2010 | TBD | TBD | TBD | TBD | TBD | TBD | Support in question | C | N/A | 155
9 | ESRL 20th Century Reanalysis, ver. 2 | 10/6/2009 | 5/13/2010 | 10/18/2010 | 6/25/2009 | 6/25/2009 | yes | TBD | Data are ready on provider side | C | 130 | N/A
10 | ESRL Global Reforecast | 8/23/2010 | TBD | TBD | TBD | TBD | TBD | TBD | COPB decision needed | C | 912 | N/A
11 | Advanced Clear-Sky Processor for Oceans (ACSPO) | ? | 3/15/2010 | 4/30/2010 | ? | ? | yes | ? | | O | N/A | ?
12 | Vertical Incidence Pulsed Ionospheric Radar (VIPIR) | ? | 4/13/2010 | 10/22/2010 | TBD | TBD | TBD | ? | | G | N/A | ?
13 | JAXA Global Change Observation Mission - Water (GCOM-W) | ? | 9/1/2010 | 11/15/2010 | TBD | TBD | TBD | ? | | C, ? | N/A | ?
14 | JAXA Global Change Observation Mission - Climate (GCOM-C) | ? | TBD | TBD | TBD | TBD | TBD | ? | | C, ? | N/A | ?
15 | NPP RIPs and Cal/Val | 12/15/2008 | 6/3/2010 | 6/10/2010 | 10/22/2010 | 10/22/2010 | yes | 10/25/2011 | | C | N/A | 565
16 | NPOESS Data Exploitation (NDE), Ph1 | 6/24/2009 | TBD | TBD | TBD | TBD | TBD | 10/25/2011 | | C, O | N/A | 40
17 | NPOESS Data Exploitation (NDE), Ph2 | TBD | TBD | TBD | TBD | TBD | TBD | 10/25/2011 | | C, O | N/A | TBD
18 | NPP | ? | TBD | TBD | TBD | TBD | TBD | 10/25/2011 | | C, ? | N/A | ?
19 | NPOESS / JPSS | ? | TBD | TBD | TBD | TBD | TBD | ? | | C, ? | N/A | ?
20 | CORS | ? | ? | ? | ? | ? | yes | TBD | Waiting on data steward | G | N/A | ?
21 | GOES-R | ? | TBD | TBD | TBD | TBD | TBD | 12/15/2015 | All instruments and supporting data | C, G, O | N/A | 2100
Recent Accomplishments: Archiving Climate Forecast System Reanalysis and Reforecast (CFSRR)
• First climate model data archived on CLASS; access through NOMADS
• 245 TB of CFS reanalysis in tape archive
• 100 TB of “high-priority” data available on disk for rapid access
• Just began ingest of reforecast
• NCDC-NCEP-CLASS project partnership
• Reuse of existing spinning disk
• Agile software development; rapid integration of an open-source access protocol
• One of NCEP’s most important data sets
• Significant jump in NCDC data access
Recent Accomplishments: Continuously Operating Reference Stations (CORS)
• First NOS data set in CLASS
• Complete: CORS File Naming Convention document signed May 2010
• NGDC has established a centralized ingest interface to CLASS
• In process: Interface Control Document
• CORS goals for CLASS version 5.4 release:
  – Archive of forward-looking RINEX files (3173 daily files) and metadata
  – Daily ingest of ~49.5 GB
• Future CORS goals, pending success of the CLASS archive:
  – Archive of forward-looking binary files
  – Archive of historical RINEX and binary files
  – Archive of NGS reanalysis data
• Current NGDC archive total: ~69.0 TB
Figure: NGDC archive of the CORS dataset as of 12/31/2009 (RINEX = 52.2 TB, binary = 7.8 TB). Series: Binary, RINEX, Total (RINEX + binary); x-axis: year, 1994 to 2009; y-axis: GB (uncompressed).
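The two CORS ingest figures on this slide imply some simple arithmetic worth making explicit. A minimal sketch, assuming decimal units (1 GB = 1000 MB, 1 TB = 1000 GB) and a constant daily rate, derives the average size of a daily RINEX file and the annual volume of the forward-looking stream:

```python
# Back-of-envelope check of the CORS ingest figures quoted on the slide.
# Assumes decimal units (1 GB = 1000 MB, 1 TB = 1000 GB) and a constant daily rate.
DAILY_FILES = 3173  # forward-looking RINEX files per day
DAILY_GB = 49.5     # approximate daily ingest, GB

avg_file_mb = DAILY_GB * 1000 / DAILY_FILES  # average size of one daily file
annual_tb = DAILY_GB * 365 / 1000            # one non-leap year of ingest

print(f"average file size: ~{avg_file_mb:.1f} MB")  # ~15.6 MB
print(f"annual ingest:     ~{annual_tb:.1f} TB")    # ~18.1 TB
```

At roughly 18 TB per year, the forward-looking RINEX stream alone would add about a quarter of the current ~69.0 TB NGDC total each year.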
Future Direction: Current Architecture
Simple model based on preservation through two-site replication.
Figure: Archive Data Sources feed an Ingest Node (NSOF) and two Full Nodes (NCDC, NGDC), with replication between the Full Nodes.
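Two-site replication preserves data only if a damaged copy is detected and repaired from the healthy one. A minimal Python sketch, assuming each object’s SHA-256 checksum is recorded at ingest and periodically compared against the copies held at both Full Nodes; the class and method names are hypothetical, and nothing here describes how CLASS actually audits its replicas.

```python
import hashlib


def sha256(data):
    return hashlib.sha256(data).hexdigest()


class TwoSiteArchive:
    """Hypothetical toy model: every object is written to both Full Nodes."""

    def __init__(self, site_a="NCDC", site_b="NGDC"):
        self.sites = {site_a: {}, site_b: {}}
        self.catalog = {}  # object id -> SHA-256 recorded at ingest

    def ingest(self, obj_id, data):
        self.catalog[obj_id] = sha256(data)
        for store in self.sites.values():
            store[obj_id] = data

    def audit_and_repair(self, obj_id):
        """Check each copy against the ingest checksum; restore bad or missing copies."""
        expected = self.catalog[obj_id]
        intact = [site for site, store in self.sites.items()
                  if obj_id in store and sha256(store[obj_id]) == expected]
        if not intact:
            raise RuntimeError(f"{obj_id}: no intact copy at either site")
        source = self.sites[intact[0]][obj_id]
        repaired = []
        for site, store in self.sites.items():
            if site not in intact:
                store[obj_id] = source
                repaired.append(site)
        return repaired


archive = TwoSiteArchive()
archive.ingest("cfsr_1979.grb2", b"GRIB payload")
archive.sites["NGDC"]["cfsr_1979.grb2"] = b"bit rot"  # simulate a damaged copy
print(archive.audit_and_repair("cfsr_1979.grb2"))    # ['NGDC']
```

Comparing against the checksum recorded at ingest, rather than comparing the two copies to each other, is what lets the audit tell which site holds the bad copy.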
Future Direction: Potential Architecture
Figure: potential node types: Data Stewardship Sub-Node (Federated) / Center of Data; Processing Node / Data Processing; Ingest Node / Data Producer; Full Node / Data Center; Cloud Access.
• Increase distribution of nodes
• Federate with Centers of Data providing tiers of service
• Exploit cloud resources for faster access
• Become more hardware-agnostic
Future Direction: Prototype Capability
• NCDC partnership with RENCI and the DataNet Consortium
• Prototype “system of systems” framework: federation of NOAA data systems with the NOAA archive using iRODS (Integrated Rule-Oriented Data System)
• Connectivity to data systems such as RENCI, ORNL, OOI, and Earth System Grid (ESG)
• Pilot project
  – Federate with RENCI to share 70 TB of NEXRAD data
  – Use highly distributed computing to derive a climate-quality precipitation reanalysis, and push the data products to the NCDC archive system (CLASS)
• Future plans to support climate assessments
  – Federate with NOS systems via RENCI to integrate data from OOI with climate data at NCDC
  – Federate with GFDL and ESG to integrate climate model data with in situ and satellite data at NCDC
Next Steps
• #1 Priority: NPP Operational Test & Evaluation
• Prepare FY13 submission
• Release 5.4.1; implement final version of NEAAT
• Complete Cloud Computing Study
• Establish Archive Architecture and ConOps
• Prepare for transition to Climate Service
• Stand up project management staff
• Program review
• Migrate data from legacy systems
DAARWG Engagement
• Need for NOAA-level focus on enterprise infrastructure (beyond “comm-lines”), i.e., a NOAA Program for GEO-IDE development and fielding
• Need for NOAA-level policies, directives, and concepts to constrain operational practices and guide IT investments, e.g., for CLASS
Scott Hausman
Acting Director
NOAA’s National Climatic Data Center (NCDC)
151 Patton Avenue, Room 557
Asheville, NC 28807-5002
828-271-4848
828-271-4246
828-450-9188