ARMADA (Army RDT&E Meteorological Architecture for Data Archival)
4DWX Forecasters Training, 27 February 2008
Outline
– ARMADA
– ARMADA at the Range
– Break
– ARMADA ETL Demonstration
– ARMADA Feedback Session
The significant problems we have cannot be solved at the same level of thinking with which we created them. - Albert Einstein
http://www.quotationspage.com/quote/23588.html
Data Warehouse¹
A data warehouse is an application with a computer database that collects, integrates and stores an organization's data with the aim of producing accurate and timely management of information and supporting data analysis. The practice of data warehousing includes the storage of virtually all transactional data, master data (customer, material), and meta data² at a very detailed level.
¹From Wikipedia http://en.wikipedia.org/wiki/Data_warehouse
²Meta data – data about the data
Paradigms: Data Archival Systems
[Figure: data volume versus time for successive archival paradigms: Human (1800), Strip Charts (1950), Digital (ASCII) (1987), Database (MySQL) (2002), ARMADA¹ (2007).]
¹Data Warehouse
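Meteorological Observing Platforms at Dugway Proving Ground

+---------------------------+--------------------+-------------------+---------------------+------------------------------+-----------------------------+
| Platform / Observing Unit | Spatial:           | Spatial:          | Frequency (Seconds) | Observing Parameters (Count) | Observation Records Per Day |
|                           | Horizontal (Units) | Vertical (Levels) |                     |                              |                             |
+---------------------------+--------------------+-------------------+---------------------+------------------------------+-----------------------------+
| Ceilometers               | 3                  | 20                | 15                  | 1                            | 345,600                     |
| Electric Field Meters     | 28                 | 1                 | 1                   | 4                            | 2,419,200                   |
| PWIDS²                    | 113                | 1                 | 10                  | 7                            | 976,320                     |
| SODAR                     | 3                  | 40                | 900                 | 2                            | 11,520                      |
| Sonics (3D)               | 50                 | 1                 | 0.1                 | 4                            | 43,200,000                  |
| Tethersonde               | 2                  | 5                 | 10                  | 6                            | 86,400                      |
| SAMS¹ (Mesonet)           | 26                 | 1                 | 300                 | 30                           | 7,488                       |
| Present Weather           | 8                  | 1                 | 10                  | 8                            | 23,040                      |
| Wind Profiler             | 2                  | 30                | 1800                | 3                            | 2,880                       |
| Towers (32M)              | 6                  | 5                 | 10                  | 7                            | 259,200                     |
| Rawinsonde                | 5                  | >1000             | N/A                 | 8                            | N/A                         |
+---------------------------+--------------------+-------------------+---------------------+------------------------------+-----------------------------+

Color key: Yellow – Continuous permanent; Orange – Continuous mobile; Light Blue – Manual mobile
¹SAMS – Surface Atmospheric Measuring System
²PWIDS – Portable Weather Instrumentation Data System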
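The slide does not say how the "Observation Records Per Day" figures were derived. Most rows appear consistent with multiplying the number of reports per day (86,400 seconds divided by the reporting frequency) by the number of vertical levels and horizontal units. The sketch below works through that arithmetic; the function name and the formula itself are assumptions inferred from the table, not something the deck states.

```python
SECONDS_PER_DAY = 86_400


def records_per_day(frequency_s: float, levels: int, units: int) -> int:
    """Estimate daily observation records for a continuously reporting platform.

    Assumed formula (inferred from the table, not stated in the deck):
    (seconds per day / reporting frequency) * vertical levels * horizontal units.
    """
    if frequency_s <= 0:
        raise ValueError("frequency_s must be positive")
    reports_per_sensor = SECONDS_PER_DAY / frequency_s
    # round() absorbs floating-point noise from fractional frequencies such as 0.1 s
    return round(reports_per_sensor * levels * units)


if __name__ == "__main__":
    print("Towers (32M):", records_per_day(10, levels=5, units=6))
    print("Sonics (3D):", records_per_day(0.1, levels=1, units=50))
    print("SAMS (Mesonet):", records_per_day(300, levels=1, units=26))
```

The three calls reproduce the Towers, Sonics, and SAMS rows, which shows how strongly the sub-second sonic anemometers dominate the daily record volume.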
Pre-ARMADA Data Collection
Data collection was developed around the observing platform
– Databases: Microsoft Access, Microsoft SQL Server, and MySQL AB
– Text files: delimited, fixed column, and terse ASCII
– Other media: magnetic tape, CDs, and paper
– Examples of method and media used for archiving
  • SAMS – MySQL database on RAID¹
  • Sonics – delimited or terse ASCII on CDs
  • PWIDS – delimited ASCII on CDs
  • Rawinsonde – text files on personal desktop
Metadata
– Typically archived in spreadsheets
– Metadata often archived on different computers
¹RAID – Redundant Arrays of Independent Drives
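Current SAMS Range Database

+-------------------+-------------------------+--------------------------------------------------------------------------+
| Problem Areas     | Problem                 | Solution                                                                 |
+-------------------+-------------------------+--------------------------------------------------------------------------+
| Quality Assurance | None or minimal!        | Create a quality assurance program                                       |
| Range Data        | Not included in SAMS DB | Archive ALL range data                                                   |
| Metadata          | Not archived            | Archive ALL metadata                                                     |
| Data Ingesters    | SAMS only!              | Create an application to extract, transform, and load multiple data sets |
+-------------------+-------------------------+--------------------------------------------------------------------------+

Range data not in the SAMS DB: Rawinsonde, SODARs, PWIDS, Tower
Metadata not archived: Sensors, Calibration, Site History, Location, Sensor Location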
Current Requirements
Any meteorological data and/or metadata collected must be:
– Accessible
– Available
– Usable
Data integration (modeling)
Standardization
– Labels, units, and time
Centralizing the Data
Goal of ARMADA is to centrally archive ALL data and metadata
– Relational database management system (RDBMS)
– Data warehouse
Controlled by a MySQL AB database server
– Single point input and output access
– Applying proper information assurance procedures (Army Regulation 25-2) can reduce single point failures
Standardization of labels and units is applied to all components in the archive
– Database and table names follow a predefined naming convention
– Column names follow Climate Forecast (CF) standard names and associated units (typically SI)
ARMADA System Concept
[Figure: system concept diagram with four numbered components: (1) Archive (Databases), (2) Data Ingest, (3) Data Output, (4) Quality Assurance. Field data sources: Sounding, SODAR, SAMS, PWIDS, Tower, Other/Historical Data. Ingest side: ETL¹ applications reading serial ports, TCP/IP, and text files, plus a web portal (metadata). Output side: end user applications (Web, GUIs, CLI², 4DWX³).]
¹Extract, Transform, Load
²Command Line Interface
³Army RDT&E Four-Dimensional Weather
Standardization of Archive (1)
Self Describing
– Standard naming conventions
  • Database/Table/Column
  • Column/Variable (Climate Forecast Compliant)
    http://cf-pcmdi.llnl.gov/documents/cf-standard-names/2/cf-standard-name-table.html
– International System of Units (SI)
    http://physics.nist.gov/cuu/Units/units.html
Applies to all archive components
Archive Databases (1)
[Figure: archive databases. Platforms: SAMS¹, PWIDS¹, Tower¹. META: Sensor (Fielded, Non-Fielded), Calibration.]
¹Note: site metadata history table resides in platform database
Example of a PWIDS Database
+-------------------------+
| Tables_in_pwids         |
+-------------------------+
| pwids_2007_02           |   Archive Table
| pwids_2007_03           |   Archive Table
| pwids_2007_04           |   Archive Table
| pwids_2007_05           |   Archive Table
| pwids_month             |   Real Time Archive Table
| pwids_site-metadata     |   Site History Table
+-------------------------+
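Example of a PWIDS Table

+------------------------+-----------+--------------------------------------------------------+
| Field                  | Type      | Purpose                                                |
+------------------------+-----------+--------------------------------------------------------+
| date_time              | datetime  | Date time (UTC) of observation                         |
| unit_id                | char(10)  | Is a unique unit id of a platform (i.e., SAMS)         |
| site_id                | char(10)  | The id of the site location and is subject to change   |
| wind_from_direction    | double    | Raw data fields that utilize SI units                  |
| wind_speed             | double    |   (applies to all four wind fields)                    |
| eastward_wind          | double    |   Note: Data types are not restricted to doubles       |
| northward_wind         | double    |                                                        |
| wind_from_direction_qc | tinyint   | Raw data quality control flag                          |
| wind_speed_qc          | tinyint   |   (applies to all four _qc fields)                     |
| eastward_wind_qc       | tinyint   |   Note: flag notation has yet to be determined         |
| northward_wind_qc      | tinyint   |                                                        |
| last_update_date_time  | timestamp | A time stamp (UTC) record of insertion or modification |
+------------------------+-----------+--------------------------------------------------------+

Note: temperature, moisture, and pressure have been left off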
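The four wind columns are related: eastward_wind and northward_wind are the vector components of wind_speed and wind_from_direction. The sketch below uses the standard meteorological convention (direction in degrees clockwise from north, naming where the wind blows from). Whether ARMADA derives the components this way or ingests them directly is not stated in the slides, and the function name is illustrative.

```python
import math


def wind_components(wind_speed: float, wind_from_direction: float) -> tuple[float, float]:
    """Return (eastward_wind, northward_wind) in the same units as wind_speed.

    wind_from_direction is in degrees clockwise from north and gives the
    direction the wind is blowing FROM, hence the negative signs.
    """
    theta = math.radians(wind_from_direction)
    eastward_wind = -wind_speed * math.sin(theta)
    northward_wind = -wind_speed * math.cos(theta)
    return eastward_wind, northward_wind


if __name__ == "__main__":
    # A 10 m/s wind from the west (270 degrees) moves air toward the east.
    u, v = wind_components(10.0, 270.0)
    print(f"eastward_wind={u:.2f} m/s, northward_wind={v:.2f} m/s")
```

A wind from the west therefore has a positive eastward component and a northward component of essentially zero.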
[Figure: map of the WSMR boundaries showing Unit ID = 12 relocated over time among Site ID = X, Site ID = Y, and Site ID = Z; dates shown are 2/12/2006, 4/15/2007, and 2/2/2008.]
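Example of a PWIDS Site Meta Table

+--------------------+----------+-------------------------------------------------------------+
| Fields             | Type     | Purpose                                                     |
+--------------------+----------+-------------------------------------------------------------+
| unique_site_id     | int      | Define a unique number to guarantee no duplicity (auto inc) |
| unit_id            | char(10) | Is a unique unit id of a platform (i.e., SAMS)              |
| site_id            | char(10) | The id of the site location and is subject to change        |
| date_time_start    | datetime | Date time start for validity of record                      |
| date_time_stop     | datetime | Date time end for validity of record                        |
| longitude          | double   | Geodetic longitude (Note western hemisphere is negative)    |
| latitude           | double   | Geodetic latitude (Note northern hemisphere is positive)    |
| surface_altitude   | double   | Altitude at the surface above mean sea level                |
| above_ground_level | double   | Arbitrary altitude of sensors above ground level            |
| notes              | tinytext | Optional notes about the site location                      |
| site_name          | char(50) | Optional name for the site location                         |
+--------------------+----------+-------------------------------------------------------------+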
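Because a unit can move between sites, the date_time_start and date_time_stop columns are what tie an observation to the site where the unit sat at that time. The sketch below shows one way such a lookup could work on in-memory rows. The rows are hypothetical: the dates are borrowed from the relocation figure, but their pairing with sites X, Y, and Z, the half-open validity interval, and the use of None for a still-open record are all assumptions.

```python
from datetime import datetime
from typing import Optional

# Hypothetical site-history rows using the column names from the site meta table.
SITE_HISTORY = [
    {"unit_id": "12", "site_id": "X",
     "date_time_start": datetime(2006, 2, 12), "date_time_stop": datetime(2007, 4, 15)},
    {"unit_id": "12", "site_id": "Y",
     "date_time_start": datetime(2007, 4, 15), "date_time_stop": datetime(2008, 2, 2)},
    {"unit_id": "12", "site_id": "Z",
     "date_time_start": datetime(2008, 2, 2), "date_time_stop": None},  # still deployed
]


def site_at(unit_id: str, when: datetime) -> Optional[str]:
    """Return the site_id where unit_id was located at time `when`, or None."""
    for row in SITE_HISTORY:
        stop = row["date_time_stop"]
        started = row["date_time_start"] <= when
        not_ended = stop is None or when < stop  # half-open interval [start, stop)
        if row["unit_id"] == unit_id and started and not_ended:
            return row["site_id"]
    return None


if __name__ == "__main__":
    print(site_at("12", datetime(2007, 6, 1)))
```

In a database the same lookup would be a join between the observation table and the site metadata table on unit_id with a date range condition.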
Metadata
[Figure: metadata (Sensor, Sensor Location, Calibration, Site Location) reach Users through a Web Interface.]
Options
1. Web portal and archive at DPG
2. Web portal and archive at each range
MEMORIZE
Some table names contain a "-" (dash), which MySQL treats as a special character. To use such a name in a query, enclose it in back ticks "`", found to the left of the number "1" key. Note that back ticks can be used around a name containing any special character.
Example:
select * from `pwids_site-metadata`;
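When queries are built in code rather than typed by hand, the same quoting can be applied by a small helper. The sketch below is illustrative (the helper name is not part of ARMADA); it follows the MySQL rule that a back tick inside a quoted identifier is written as two back ticks.

```python
def quote_identifier(name: str) -> str:
    """Wrap a MySQL identifier in back ticks, doubling any embedded back tick."""
    if not name:
        raise ValueError("identifier must not be empty")
    return "`" + name.replace("`", "``") + "`"


if __name__ == "__main__":
    table = quote_identifier("pwids_site-metadata")
    print(f"SELECT * FROM {table};")
```

Quoting applies only to identifiers such as database, table, and column names; data values should still be passed to the database driver as query parameters.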
Ingest Applications (2)
[Screenshot of an ingest application, with callouts:]
– Configurable title
– Current system clock date/time
– Last data record ingested: site id and date + time
– Turn on/off process buttons
– Enlarged visual indicator of on/off process buttons
– Button to manually force an ingest process (runs only once)
– Turn off/on auto update (check box is on)
– Background colors in last data updated
  • Yellow – idle
  • Green – processing
  • Red – auto update off
  • Orange – an output process is off
  • Magenta – no new data
Configuration File
Provides instructions to File Ingester to:
1. Extract
   Location and architecture of data file
2. Transform
   Converts raw data to ARMADA architecture
3. Load
   Writes or populates the data into ARMADA
• Coded in XML (eXtensible Markup Language)
  – Readable and/or self describing
  – Rigid
Configuration editor is in the plans for FY08
Configuration file
Broken down into two groups
– Global parameters
  • Affect all data
  • Apply to:
    – Database connections
    – Base directory
    – Time of data processing
– Site or Unit parameters
  • Affect individual observations
  • Apply to:
    – Interpolation
    – Calculations
    – Upload location within ARMADA
Example
<?xml version="1.0" ?>
<FILEINGESTER>
  <BASEDIRECTORY Dir="c:/fm" />
  <DISPLAYTITLES Title="FIELD MILL" SiteTitle="FM ID" DateTitle="COOL" TimeTitle="TIME UTC"/>
  <TIMERGROUP TimeDataIO="60" TimeProcess="5" TimeAlarm="60" TimeStationUpdate="45"/>
  <DATABASECONNECT Source="140.196.88.15" Username="remote" Password="remote" Database="field-mills" Port="3307"/>
  <STARTPROCESS Auto="true" Database="true" Output="false" />
  <GLOBALSITEINFO SiteTable="`field-mills_site-metadata`" SingleOBFileCount="28" MultiOBFileCount="0">
    <SINGLEOBFILE FileName="c:/fm_data/EFM001_Tab1sec" StringSplitter="," EndOfLine="\\n"
                  CharsAllowed="-0123456789.:," AppendFilePrefix="DCP0Z_" AppendFileDate="day"
                  OverWrite="True" RemoveFile="True" SkipHeader="4">
      <SITEID SiteID="1" Unit_ID="1" Site_ID="1" ColumnCount="13" VariableCount="6" EquationCount="0"
              SpecialEquations="0" TableName="`field-mills_month_test`" AppendFilePrefix="EFM001_Tab1sec"
              AppendFileDate="none" OverWrite="True" RemoveFile="True">
        <DATETIME MySQLDT="-1" Year4="0" Year2="-1" Month="1" DayOfYear="-1" DayOfMonth="2"
                  HourMinute="-1" Hours="3" Minutes="4" Seconds="5" UTCTimeOffSet="0" SystemTime="false" />
        <INSTANCE>
          <COLUMNVARIABLE ColumnNumber="7" ColumnName="surface_electric_field" Type="Constant" Unit="NONE" />
          <COLUMNVARIABLE ColumnNumber="8" ColumnName="status" Type="Constant" Unit="NONE" />
          <COLUMNVARIABLE ColumnNumber="9" ColumnName="leakage_current" Type="Constant" Unit="NONE" />
          <COLUMNVARIABLE ColumnNumber="10" ColumnName="panel_temperature" Type="Temperature" Unit="C" />
          <COLUMNVARIABLE ColumnNumber="11" ColumnName="battery_voltage" Type="Constant" Unit="NONE" />
          <COLUMNVARIABLE ColumnNumber="12" ColumnName="internal_relative_humidity" Type="Moisture" Unit="%" />
        </INSTANCE>
      </SITEID>
    </SINGLEOBFILE>
    …
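To make the extract and transform steps concrete, the sketch below reads the COLUMNVARIABLE entries from a trimmed copy of the configuration example and uses them to label the fields of one delimited data line. This is not the File Ingester itself: the function names are hypothetical, the sample data line is invented, and ColumnNumber is assumed to be a zero-based column index (suggested by Year4="0" in the DATETIME element, but not confirmed by the slides).

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed excerpt of the configuration example (two variables only).
CONFIG = """<?xml version="1.0" ?>
<FILEINGESTER>
  <GLOBALSITEINFO SiteTable="`field-mills_site-metadata`">
    <SINGLEOBFILE FileName="c:/fm_data/EFM001_Tab1sec" StringSplitter="," SkipHeader="4">
      <SITEID Unit_ID="1" Site_ID="1" TableName="`field-mills_month_test`">
        <INSTANCE>
          <COLUMNVARIABLE ColumnNumber="7" ColumnName="surface_electric_field" Type="Constant" Unit="NONE" />
          <COLUMNVARIABLE ColumnNumber="10" ColumnName="panel_temperature" Type="Temperature" Unit="C" />
        </INSTANCE>
      </SITEID>
    </SINGLEOBFILE>
  </GLOBALSITEINFO>
</FILEINGESTER>
"""


def column_map(config_xml: str) -> dict[int, tuple[str, str]]:
    """Map each ColumnNumber to its (ColumnName, Unit) pair."""
    root = ET.fromstring(config_xml)
    return {
        int(col.get("ColumnNumber")): (col.get("ColumnName"), col.get("Unit"))
        for col in root.iter("COLUMNVARIABLE")
    }


def transform(line: str, columns: dict[int, tuple[str, str]], splitter: str = ",") -> dict[str, float]:
    """Split one raw data line and label the configured columns (assumed zero-based)."""
    fields = line.strip().split(splitter)
    return {name: float(fields[number]) for number, (name, _unit) in columns.items()}


if __name__ == "__main__":
    cols = column_map(CONFIG)
    raw = "2008,2,27,12,0,0,1,125.5,0,0.1,21.5,12.6,35"  # invented sample line
    print(transform(raw, cols))
```

A full ingester would also apply the DATETIME mapping, unit conversions to SI, and the load step into the table named by TableName.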
Output (3)
Commercial or open source applications
– MySQL Query Browser
– Microsoft Excel
Custom applications
– Excel (Macros)
  • Time Series
  • Climatologies
– SAMS Report
– HPAC GUI Data Getter
– PWIDS Display
– Field Meter Display
[Figure: PWIDS Display (Portable Weather Instrumentation Data System)]
[Figure: 2D Electric Field (V/m) Contour Plot]
Quality Assurance (4)
[Figure: quality assurance diagram with labels Installation, Measurements, Communication, Data Collection, ARMADA, and Quality Control.]
ARMADA QC/QA
Current
• Quality Control
  – Some data tests
    • PWIDS
    • Sonics
    • Wind Profiler (NIMA)
  – Data are only QC'ed for customer requested data
  – Not archived in QC format
Goal
• Quality Assurance
  1. QA applied to all data
  2. Create an archive environment to support QA (ARMADA)
  3. Develop a QA program to
     a. Automate process
     b. Manual inspection
     c. Process in near real time
     d. Rigorous follow on tests
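Comparison

+-------------------------------------+-------------------------------------------+
| Current                             | Goal                                      |
+-------------------------------------+-------------------------------------------+
| Single Application Per Data Set     | Single Application For Multiple Data Sets |
| Post Analysis                       | Real Time                                 |
| Higher Labor Costs¹                 | Lower Labor Costs¹                        |
| Manual                              | Automated                                 |
| Lower Confidence¹,²                 | Higher Confidence¹,²                      |
| Range Test + Visual                 | Multiple Test + Visual                    |
+-------------------------------------+-------------------------------------------+

¹Per datum
²End user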
ARMADA QA Flow
[Figure: QA flow diagram. Labels: Raw, Quality Control Server (QCS), Temporary Data Storage, QA Flags, Pass (Pass All Tests), Fail (Fail ≥ 1 Tests), GOLD Standard, Do Nothing.]
QCS: Developed in Python
QA Flag
The net results of QA tests!
Every variable in ARMADA is assigned a QA flag
Results of every test are archived
1. All tests are packed into a type
2. A type can be archived as a single element in the database
3. Initially only pass/fail results will be archived
4. Possible to archive more test result information
QA Flag is composed of 0's and 1's that describe the results of the tests
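The packing idea can be sketched in a few lines: each test gets one bit, and the bits together form a single integer that fits in one database column. The sketch follows the deck's 2-base table (P = 0, F = 1), so a flag of 0 means every test passed. The bit order, the function names, and the example test order are assumptions; the slides note that the flag notation had yet to be determined.

```python
def pack_results(failed: list[bool]) -> int:
    """Pack per-test outcomes into one integer; bit i is 1 if test i failed."""
    flag = 0
    for i, did_fail in enumerate(failed):
        if did_fail:
            flag |= 1 << i
    return flag


def unpack_results(flag: int, n_tests: int) -> list[bool]:
    """Recover the per-test failure list from a packed flag."""
    return [bool((flag >> i) & 1) for i in range(n_tests)]


if __name__ == "__main__":
    # Hypothetical test order: manual, validity, persistence, buddy.
    outcomes = [False, True, False, True]
    flag = pack_results(outcomes)
    print(flag, format(flag, "04b"), unpack_results(flag, 4))
```

Because the whole record of test outcomes is a single integer, a query can find fully passing data with a simple comparison such as `wind_speed_qc = 0`.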
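MySQL Types

+------------+-------+------------+-------------------+-------------------------------------------------------------------------+
| Type       | Bytes | Bits       | Possible Integers | Bit Representation                                                      |
+------------+-------+------------+-------------------+-------------------------------------------------------------------------+
| Bit        | 1/8   | 1 (0 or 1) | 2                 | 1                                                                       |
| Nibble¹    | 1/2   | 4          | 16                | 0101                                                                    |
| Byte       | 1     | 8          | 256               | 01010101                                                                |
| Tiny INT   | 1     | 8          | 256               | 01010101                                                                |
| Small INT  | 2     | 16         | 65,536            | 01010101 01010101                                                       |
| Medium INT | 3     | 24         | 1.68*10^7         | 01010101 01010101 01010101                                              |
| INT        | 4     | 32         | 4.29*10^9         | 01010101 01010101 01010101 01010101                                     |
| Big INT    | 8     | 64         | 1.84*10^19        | 01010101 01010101 01010101 01010101 01010101 01010101 01010101 01010101 |
+------------+-------+------------+-------------------+-------------------------------------------------------------------------+

¹Not a MySQL type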
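The table implies a simple sizing rule: the number of bits needed for all packed test results determines the smallest integer column that can hold the flag. The sketch below encodes that rule using the bit widths from the table. It assumes the columns are declared UNSIGNED so that every bit is usable, and the function name is illustrative.

```python
# Bit widths of the MySQL integer types listed in the table above.
MYSQL_INT_BITS = [
    ("TINYINT", 8),
    ("SMALLINT", 16),
    ("MEDIUMINT", 24),
    ("INT", 32),
    ("BIGINT", 64),
]


def smallest_int_type(n_bits: int) -> str:
    """Return the smallest MySQL integer type (assumed UNSIGNED) holding n_bits bits."""
    if n_bits < 1:
        raise ValueError("n_bits must be at least 1")
    for name, bits in MYSQL_INT_BITS:
        if n_bits <= bits:
            return name
    raise ValueError("more than 64 bits will not fit in a single integer column")


if __name__ == "__main__":
    for tests in (4, 8, 9, 24, 33):
        print(tests, "pass/fail tests ->", smallest_int_type(tests))
```

With one bit per test, the tinyint _qc columns in the PWIDS table example have room for up to eight pass/fail tests.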
Packing Results via Bits
Increasing info per test: 2-Base (Bit), 4-Base, 8-Base (Oct), 16-Base (Hex), … N-Base
2-Base: P = 0, F = 1
4-Base: N = 00, P = 01, A = 10, B = 11
8-Base: 0 = 000, 1 = 001, 2 = 010, 3 = 011, 4 = 100, 5 = 101, 6 = 110, 7 = 111
16-Base: 0 = 0000, 1 = 0001, 2 = 0010, 3 = 0011, 4 = 0100, 5 = 0101, 6 = 0110, 7 = 0111, 8 = 1000, 9 = 1001, A = 1010, B = 1011, C = 1100, D = 1101, E = 1110, F = 1111
[Figure labels: Pass, Fail]
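Validity Check
[Figure: validity test with outcomes Above, Below, and Not Tested.]
Human/DB/Computer (RELATIONSHIP)

+------------+----------+-----------------+
| Human      | Computer | MySQL (Integer) |
+------------+----------+-----------------+
| Not Tested | 00       | 0               |
| Pass       | 01       | 1               |
| Above      | 10       | 2               |
| Below      | 11       | 3               |
+------------+----------+-----------------+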
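The two-bit scheme above carries more information per test than a plain pass/fail bit. The sketch below shows a validity (range) test that returns one of the four codes, plus helpers that pack several two-bit codes into one integer. The code values come from the table; the limits, the function names, and the bit order are assumptions made for illustration.

```python
# Two-bit codes from the Human/Computer/MySQL table.
NOT_TESTED, PASS, ABOVE, BELOW = 0, 1, 2, 3


def validity_code(value, low: float, high: float) -> int:
    """Range test: report whether value is missing, in range, above, or below."""
    if value is None:
        return NOT_TESTED
    if value > high:
        return ABOVE
    if value < low:
        return BELOW
    return PASS


def pack_codes(codes: list[int]) -> int:
    """Pack two-bit codes into one integer; code i occupies bits 2i and 2i+1."""
    flag = 0
    for i, code in enumerate(codes):
        flag |= (code & 0b11) << (2 * i)
    return flag


def unpack_codes(flag: int, n_codes: int) -> list[int]:
    """Recover the list of two-bit codes from a packed flag."""
    return [(flag >> (2 * i)) & 0b11 for i in range(n_codes)]


if __name__ == "__main__":
    # Hypothetical temperature limits in degrees Celsius.
    readings = [25.0, 60.0, -50.0, None]
    codes = [validity_code(r, low=-40.0, high=50.0) for r in readings]
    print(codes, pack_codes(codes))
```

At two bits per test, an 8-bit tinyint flag holds four tests, so richer result codes trade directly against the number of tests a column can record.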
QC Approach
[Figure: Raw data enter the Quality Control Server, which reads a Config File (XML) and produces QA Flags. Tests¹ shown: Manual, Validity, Persistence, Buddy. Example flag bits: 0 1 1 1 0 0 1 1, stored as Temperature_QC = int (Flag).]
¹All tests are performed
ARMADA Implementation
Pass/Fail
A few tests on SAMS
– Manual
– Validity
– Persistence
– Step test
– Buddy Check
Will take years to implement throughout ARMADA!
ARMADA SUMMARY
All data and metadata will be archived in a central location
Repository will be standardized
– Naming conventions
– SI Units
Quality Assure all data
4DWX access
Questions?
ARMADA at the RANGE
SAMS DB Migration to ARMADA
SAMS Ingest will be replaced by File Ingester
– Allow non-fixed number of elements in SAMS data stream
– Interpret numbers and strings
– Capable of archiving additional derived variables (see handout)
SAMS DB will conform to ARMADA standardization
SAMS DB will be reconfigured around range needs
SAMS DB will be rebuilt during range installation
SAMS DB and ARMADA will run simultaneously for a transition period (suggested 3-6 months)
ARMADA Installation
Software
1. File Ingester
   • SAMS
   • Rawinsonde
2. MySQL 5.x
3. Range QC
4. Configurator?
Setup
1. Software install + database setup
2. SAMS + possible other data sets
3. Training
4. Upload old SAMS DB to ARMADA
One week installation (4-5 days)
Starting FY08, except CRTC August 07
Run old SAMS DB simultaneously for 3-6 months
Hardware
[Figure: hardware configurations labeled Acceptable, Recommended, and Optimal across FY 08, FY 09-11, and FY 11-13. Labels shown: Current Hardware (Data Ingest), Quality Assurance, Database + Output (Web), Storage (4TB); Data Ingest, Quality Assurance, Output (Web), Database, Storage (8TB); Desktop, Server, RAID.]
Backups/Redundancy
All raw data will be archived¹
Ingest software will be able to
• Write raw data to local files
• Read raw data files and populate into ARMADA
Database Tables
– Updating tables daily or weekly
– Archive tables once they have been updated
– Will be automatic
Master archive for all ranges located at DPG
– Need to get Port 3306 open through the firewall!
¹At a minimum raw data should be archived on long term transferable media such as CD or DVD
Information Assurance
Adhere to Army Regulation 25-2 "Information Assurance" and comply with local DIACAP (DoD Information Assurance Certification and Accreditation Process) requirements
Tighten up access to ARMADA
– No more global IPs
– No remote root access¹
Metadata web portal will be password protected
Sensitive data can be archived in its own database, and is not limited to the master archive
¹Not possible on PC based systems
Maintenance/Administrative Responsibilities 0-2 Years¹
Range
1. Manage raw data archives
2. Load archived raw data
3. Assist in backup and recovery
4. Maintain metadata
5. Monitor data flow
DPG
1. Primary MySQL admin
2. Backup database tables
3. Master repository
4. ARMADA development
5. Customer support
¹Negotiable
Out Years Objectives
[Timeline by fiscal year (FY) across Fall, Winter, Spring, and Summer]
FY 07
– Install ARMADA at CRTC
FY 08
– Quality Assurance Program Version 1
– Upgrading File Ingester for SODAR/Profiler
– Metadata Web Server operational
– Decommission SAMS DB
– Install ARMADA at ATC, EPG, NVL, RTTC, WSMR, and YPG
– Master repository at DPG
– Replace Legacy Programs (SAMS Report)
FY 09
1. Update Hardware (rack mounted system)
2. Web interface or GUI tools to ARMADA, climatology, time series, etc.
3. Expand ingest capabilities beyond File Ingester
4. Primary range data source to 4DWX
FY 10
1. QA Program Version 2
2. All range data + historical data in ARMADA
3. Develop data mining tools
FY 11-13
1. Migrate all capabilities to Linux?
2. Include non-meteorological data (Test reports, HPAC output, etc.)
Summary
ARMADA is coming!
– CRTC August
– All other Ranges Fall 07 to Winter 08
– Plan on a full week
ARMADA will replace SAMS DB
New hardware is not required for initial install
ARMADA will comply with the local DIACAP
Questions?