A Probabilistic-Spatial Approach to the Quality Control of Climate Observations
Christopher Daly, Wayne Gibson, Matthew Doggett, Joseph Smith, and George Taylor
Spatial Climate Analysis Service
Oregon State University
Corvallis, Oregon, USA
Traditional QC Systems are Categorical and Deterministic
• Data subjected to categorical quality checks
– Designed to uncover mistakes
• Validity determined from test results
– Mistake = flag / toss
– No mistake = no flag / keep
Designed to Work With Human Observing Systems
Alien Electronic Devices are Invading the Climate Observing World!
They’re Everywhere!
Electronic Sensors and Modern Applications Create Challenges for Traditional QC Systems
Situation
• Errors tend to be continuous drift, rather than categorical mistakes
• Increasing usage of computer applications that rely on climate observations
Need
• Continuous estimates, rather than categorical tests, of observation validity
• Quantitative estimates of observational uncertainty, not just flags
More Challenges…
Situation
• Range of applications is increasing rapidly, and each has a different tolerance for outliers
• Data are often more voluminous and disseminated in a more timely manner
Need
• Probabilistic information from which a decision to use an obs can be made, not an up-front decision
• Automated QC methods
An Opportunity
Advances in climate mapping technology now make it possible to estimate a reasonably accurate “expected value” for an observation based on surrounding stations.
Assumption: Spatial consistency is related to observation validity
Useful Characteristics for a Next-Generation Climate QC System
• continuous
• quantitative
• probabilistic
• automated
• spatial
PRISM Probabilistic-Spatial QC (PSQC) System for SNOTEL Data
Uses climate mapping technology and climate statistics to provide a continuous, quantitative confidence probability for each observation, estimate a replacement value, and provide a confidence interval for that replacement.
• Start with daily max/min temperature for all SNOTEL sites, period of record
• Move to precipitation, SWE, soil temperature and moisture
• Develop automated system for near-real time operation at NRCS
Climatological Grid Development
– PRISM must produce a high-quality estimate of temperature at each SNOTEL station each day
– Highest interpolation skill obtained by using a high-quality predictive grid that represents the long-term climatological temperature for that day, rather than a digital elevation grid
– Climatological grid: 0.8 km resolution, 1971-2000
[Figure: Oregon Annual Precipitation mapped at 4 km and 0.8 km resolution]
Leveraging Information Content of High-Quality Climatologies to Create New Maps with Fewer Data and Less Effort
Climatology used in place of DEM as PRISM predictor grid
PRISM Regression of “Weather vs Climate”
PRISM Results
[Figure: 20 July 2000 Tmax vs 1971-2000 Mean July Tmax. Scatter plot with regression line. X-axis: 71-00 Mean July Maximum Temperature (16.5 to 26.5). Y-axis: Daily Maximum Temperature (C) (18 to 34). Stations plotted: 21D12S, 21D35S, 21D13S, 353402, 21D08S, 5211C70E, 324045CC, 3240335C, 21D14S.]
Stn: 21D12S
Date: 2000-07-20
Climate: 21.53
Obs: 26.0
Prediction: 25.75
Slope: 1.4
Y-Intercept: -4.37
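The "weather vs climate" regression above can be sketched in a few lines. This is a minimal illustration, not the PRISM code: `weighted_linear_fit` and `predict` are hypothetical helper names, and the neighbour points and weights are made up so that they lie on the slide's line. Only the slope, intercept, climate value, and observation for station 21D12S come from the slide. With the rounded coefficients shown there, the prediction works out to about 25.77, close to the 25.75 reported.

```python
def weighted_linear_fit(x, y, w):
    """Weighted least-squares fit of y = slope * x + intercept."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(climate, slope, intercept):
    """Daily estimate from the long-term climatological value."""
    return slope * climate + intercept


# Hypothetical neighbouring stations: climatological July Tmax (x),
# daily Tmax (y), and station weights. These are NOT data from the slides.
climate = [17.0, 19.0, 21.0, 23.0, 25.0]
daily = [1.4 * c - 4.37 for c in climate]
weights = [1.0, 2.0, 1.0, 0.5, 1.0]
slope, intercept = weighted_linear_fit(climate, daily, weights)

# Values taken from the slide for station 21D12S on 2000-07-20.
prediction = predict(21.53, 1.4, -4.37)
residual = prediction - 26.0  # R = P - O, as defined later in the deck
print(round(slope, 2), round(intercept, 2), round(prediction, 2), round(residual, 2))
```

In PRISM the weights would come from the knowledge base described on the following slides; here they are arbitrary positive numbers.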
PRISM
Parameter-elevation Regressions on Independent Slopes Model
- Generates gridded estimates of climatic parameters
- Moving-window regression of climate vs. elevation for each grid cell
- Uses nearby station observations
- Spatial climate knowledge base (KBS) weights stations in the regression function by their climatological similarity to the target grid cell
PRISM KBS accounts for spatial variations in climate due to:
- Elevation
- Terrain orientation
- Terrain steepness
- Moisture regime
- Coastal proximity
- Inversion layer
- Long-term climate patterns
PRISM Moving-Window Regression Function
[Figure: weighted linear regression, 1961-90 Mean April Precipitation, Qin Ling Mountains, China]
Rain Shadows: 1961-90 Mean Annual Precipitation, Oregon Cascades
[Figure: map of 1961-90 Mean Annual Precipitation, Cascade Mtns, OR, USA. Labeled locations: Portland, Eugene, Sisters, Redmond, Bend, Mt. Hood, Mt. Jefferson, Three Sisters. Labeled values: 350 mm/yr, 2200 mm/yr, 2500 mm/yr.]
Dominant PRISM KBS Components
- Elevation
- Terrain orientation
- Terrain steepness
- Moisture Regime
Coastal Effects: 1971-00 July Maximum Temperature, Central California Coast
[Figure: map with preferred trajectories marked. Labeled locations: Monterey, San Francisco, Oakland, Fremont, San Jose, Santa Cruz, Hollister, Salinas, Stockton, Sacramento, Pacific Ocean. Labeled temperatures: 34°, 20°, 27°.]
Dominant PRISM KBS Components
- Elevation
- Coastal Proximity
- Inversion Layer
Inversions – 1971-00 July Minimum Temperature, Northwestern California
[Figure: map. Labeled locations: Ukiah, Cloverdale, Lakeport, Willits, Clear Lake, Lake Pilsbury, Pacific Ocean. Labeled temperatures: 12°, 17°, 9°, 16°, 10°, 17°.]
Dominant PRISM KBS Components
- Elevation
- Inversion Layer
- Topographic Index
- Coastal Proximity
PRISM PSQC System Confidence Probability (CP)
Definition of CP: Given the difference between an observation and an expected value (residual), CP is the probability that another observation and expected value from the same time of year would differ by at least as much
Residual distribution: +/- 15 day, +/- 2 year window = 5 yrs, 31 days each (N~155)
[Figure: residual distribution curve labeled with X, Sx, and P]
Confidence Probability Takes Into Account Uncertainty in the System
[Figure: two residual distribution curves labeled with X, Sx, and P, one for Low Overall Skill and one for High Overall Skill]
X = Residual (P-O)
P-value is higher for a given deviation from the mean when Sx is large (low skill)
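The effect of skill on the p-value can be illustrated with a small sketch. It assumes a normal residual distribution and a two-sided test, which the slides do not state explicitly; `two_sided_p` is a hypothetical helper name and the residual and Sx values are invented for illustration.

```python
import math


def two_sided_p(residual, mean_residual, sigma):
    """Two-sided p-value of a z-test, assuming normally distributed residuals."""
    z = (residual - mean_residual) / sigma
    return math.erfc(abs(z) / math.sqrt(2.0))


# Same 4-degree residual under two hypothetical values of Sx.
low_skill = two_sided_p(4.0, 0.0, 4.0)   # large Sx: residual is one sigma out
high_skill = two_sided_p(4.0, 0.0, 1.5)  # small Sx: residual is far in the tail
print(round(low_skill, 3), round(high_skill, 4))
```

The same deviation is unremarkable when the system's residuals are usually large, and highly suspect when they are usually small.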
Interpreting Confidence Probability
Continuous values from 0 – 100%
0% = highly spatially inconsistent observation, reflected in a PRISM prediction that is unusually different than the observation
100% = highly consistent observation, reflected in a PRISM prediction that is relatively close to the observation
Guidelines to date
CP > 30: Use observation as-is
10 < CP < 30: Blend prediction and observation
CP < 10: Use prediction instead of observation
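The guidelines above could be applied as in the sketch below. The slides do not say how the blend is weighted or how CP values of exactly 10 or 30 are handled; the linear ramp used here is an assumption chosen so the result is continuous at both thresholds, and `apply_cp_guideline` is a hypothetical name.

```python
def apply_cp_guideline(obs, pred, cp):
    """Pick a value to use given CP in percent (0 to 100)."""
    if cp > 30.0:
        return obs   # use observation as-is
    if cp < 10.0:
        return pred  # use prediction instead of observation
    # Assumed blend: weight on the observation rises linearly
    # from 0 at CP = 10 to 1 at CP = 30.
    w_obs = (cp - 10.0) / 20.0
    return w_obs * obs + (1.0 - w_obs) * pred


print(apply_cp_guideline(26.0, 20.0, 80.0))
print(apply_cp_guideline(26.0, 20.0, 20.0))
print(apply_cp_guideline(26.0, 20.0, 5.0))
```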
PRISM PSQC Process: 1. Create Database Records
Goal: Enter daily tmax/tmin observations for all networks into database and prepare data
Current Actions:
1. Ingest daily tmin/tmax observations from SNOTEL, COOP, RAWS, Agrimet, ASOS, and first-order networks.
2. Shift AM COOP observations of tmax to previous day (assumes standard diurnal curve, which does not always apply).
3. Convert units to degrees Celsius.
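Actions 2 and 3 can be sketched as follows. The function names and the `morning_observer` flag are hypothetical, and the sketch assumes the COOP value arrives in degrees Fahrenheit, which the slides imply but do not state.

```python
from datetime import date, timedelta


def fahrenheit_to_celsius(temp_f):
    return (temp_f - 32.0) * 5.0 / 9.0


def prepare_coop_tmax(obs_date, tmax_f, morning_observer):
    """Return (valid_date, tmax_c) for one COOP tmax report.

    A morning reading of tmax is assigned to the previous day, on the
    assumption of a standard diurnal curve.
    """
    valid_date = obs_date - timedelta(days=1) if morning_observer else obs_date
    return valid_date, fahrenheit_to_celsius(tmax_f)


valid_date, tmax_c = prepare_coop_tmax(date(2000, 7, 21), 78.8, morning_observer=True)
print(valid_date, round(tmax_c, 1))
```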
PRISM PSQC Process: 2. Single-Station Checks
Goal: Take all QC actions possible at the single-station level, before entering the spatial QC process.
Current Checks:
1. Temperature observation is well above the all-time record maximum or well below the all-time record minimum for the state – flag set and CP set to 0
2. Maximum temperature is less than the minimum temperature – flag set and CP set to 0
3. First daily tmax/tmin observation after a period of missing data – flag set and CP set to 0 (COOP only?)
4. More than 10 consecutive observations with the same value (<+/-1F COOP, <+/-0.1C others), or more than 5 consecutive zero values, is a definite flatliner – flag set and CP set to 0
5. 5-10 consecutive observations with the same value is a potential flatliner, to be assessed by the spatial QC system – flag set and CP unchanged
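Checks 2, 4, and 5 can be sketched as below. This is an illustration under assumptions: "same value" is taken to mean within the tolerance of the first value in the run, every member of a run receives the run's label, and the function names are hypothetical. The record-extreme and missing-data checks are omitted because the slides do not give their thresholds.

```python
def flatliner_labels(values, tol=0.1):
    """Label each observation 'none', 'potential', or 'definite'."""
    labels = ["none"] * len(values)
    i = 0
    while i < len(values):
        j = i + 1
        # Extend the run while values stay within tol of the run's first value.
        while j < len(values) and abs(values[j] - values[i]) < tol:
            j += 1
        n = j - i
        zero_run = values[i] == 0.0
        if n > 10 or (zero_run and n > 5):
            label = "definite"
        elif n >= 5:
            label = "potential"
        else:
            label = "none"
        for k in range(i, j):
            labels[k] = label
        i = j
    return labels


def initial_cp(tmax, tmin, flat_label):
    """Return (flag, cp); cp is None when the check leaves CP unchanged."""
    if tmax < tmin or flat_label == "definite":
        return True, 0.0
    if flat_label == "potential":
        return True, None  # left for the spatial QC system to assess
    return False, None


series = [3.1, 4.0] + [5.0] * 6 + [7.2] + [0.0] * 6
labels = flatliner_labels(series)
print(labels)
```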
PRISM PSQC Process: 3. Spatial QC System
Goal: Through a series of iterations, gradually and systematically “weed out” spatially inconsistent observations from consistent ones
Overview:
1. PRISM is run for each station location for each day, and summary statistics are accumulated
2. Once all days have been run, frequency distributions are developed and confidence probabilities (CP) for each daily station observation are estimated
3. These CP values are used to weight the daily observations in a second iteration of PRISM daily runs
4. Obs with lower CP values are given lower weight, and thus have less influence, in the second set of PRISM predictions, and are also given lower weight in the calculation of the second set of summary statistics
5. CP values are again calculated and passed back to the daily PRISM runs
6. This iterative process continues for about 5 iterations, at which time the CP values have reached equilibrium
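The iteration outlined above can be mimicked with a toy example. This sketch swaps in a leave-one-out weighted mean of the other stations for PRISM, uses a single day instead of a 155-value window, and expresses CP as a fraction (0 to 1) rather than a percentage; all of these are simplifying assumptions, and the station values are invented. The floor of 2 on sigma follows the "effective" standard deviation rule given later in the deck.

```python
import math


def loo_weighted_mean(values, weights, skip):
    """Weighted mean of all values except the one at index `skip`."""
    num = sum(w * v for k, (v, w) in enumerate(zip(values, weights)) if k != skip)
    den = sum(w for k, w in enumerate(weights) if k != skip)
    return num / den


def iterate_cp(obs, n_iter=5, sigma_floor=2.0):
    cp = [1.0] * len(obs)  # first pass: all stations weighted equally
    for _ in range(n_iter):
        # Stand-in for a PRISM run: predict each station in its absence.
        preds = [loo_weighted_mean(obs, cp, i) for i in range(len(obs))]
        resid = [p - o for p, o in zip(preds, obs)]
        # CP-weighted summary statistics of the residuals.
        sw = sum(cp)
        mean_r = sum(w * r for w, r in zip(cp, resid)) / sw
        var_r = sum(w * (r - mean_r) ** 2 for w, r in zip(cp, resid)) / sw
        sigma = max(math.sqrt(var_r), sigma_floor)
        # New CP: two-sided p-value of each residual.
        cp = [math.erfc(abs(r - mean_r) / sigma / math.sqrt(2.0)) for r in resid]
    return cp


# Hypothetical observations; the last station reads about 6 degrees warm.
cp = iterate_cp([10.0, 10.5, 9.8, 10.2, 16.0])
print([round(c, 3) for c in cp])
```

As the inconsistent station loses weight, it stops pulling its neighbours' predictions toward itself, so their CP values rise while its own falls.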
QC Iteration
For each station-day:
• Run PRISM for each station location in its absence, estimating its obs for each day
• PRISM omits nearby stations, singly, and in pairs, to try to better match observation
• Prediction closest to obs is accepted
– Raw PRISM variables: Observation (O), Prediction (P), Residual (R=P-O), PRISM Regression Standard Deviation (S)
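The omission search described above can be sketched with `itertools.combinations`. The predictor here is a plain mean of the remaining neighbours, used only as a stand-in for a PRISM run; `best_prediction`, `mean_predictor`, and the station values are hypothetical.

```python
from itertools import combinations


def mean_predictor(kept):
    """Stand-in for a PRISM run: unweighted mean of remaining neighbours."""
    return sum(kept.values()) / len(kept)


def best_prediction(obs, neighbours, predict):
    """Try omitting no neighbour, each single neighbour, and each pair.

    Returns (prediction, omitted_ids) for the prediction closest to obs.
    Ties keep the earliest candidate, so fewer omissions are preferred.
    """
    ids = sorted(neighbours)
    candidates = [()] + list(combinations(ids, 1)) + list(combinations(ids, 2))
    best = None
    for omitted in candidates:
        kept = {k: v for k, v in neighbours.items() if k not in omitted}
        if not kept:
            continue
        p = predict(kept)
        if best is None or abs(p - obs) < abs(best[0] - obs):
            best = (p, omitted)
    return best


# Hypothetical neighbours; station "C" is far out of line with the rest.
neighbours = {"A": 20.0, "B": 21.0, "C": 30.0, "D": 20.5}
best = best_prediction(20.6, neighbours, mean_predictor)
print(best)
```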
Once all station-days are run:
• Calculate summary statistics for each station for each day
– Mean and std dev of O (Os), P (Ps), R (Rs), and S (Ss)
– +/- 15 day, +/- 2 year window = 5 yrs, 31 days each (N~155)
– 5-day running Standard Deviation (RunSD) as a measure of day-to-day variability (time shifting)
– Potential flatliners: calculate V, the ratio of station’s RunSD (set to 0.3) to that of surrounding stations
• Determine “effective” standard deviation for frequency distribution
– Sigma = Max ( Rs, S, Ss, RunSD, 2 )
• Calculate probability statistics for O, P, R, S, and V for each day
– Probability statistics are p-values from z-tests
– Residual Probability (RP) used as an estimate of overall Confidence Probability (CP) for an observation
– Except in the case of potential flatliners, where CP = min(RP,VP)
• CP used to weight stations in next iteration
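The window, the effective standard deviation, and the final CP rule can be sketched as follows. The function names are hypothetical, the inputs to `effective_sigma` are taken as already computed, and the window builder assumes the target date is not 29 February.

```python
from datetime import date, timedelta


def window_dates(target, half_days=15, half_years=2):
    """Dates within +/- 15 days of the target's calendar day, +/- 2 years."""
    dates = []
    for dy in range(-half_years, half_years + 1):
        # Assumes target is not 29 February (replace() would fail
        # when shifting to a non-leap year).
        centre = target.replace(year=target.year + dy)
        for dd in range(-half_days, half_days + 1):
            dates.append(centre + timedelta(days=dd))
    return dates


def effective_sigma(rs, s, ss, run_sd, floor=2.0):
    """Sigma = Max(Rs, S, Ss, RunSD, 2)."""
    return max(rs, s, ss, run_sd, floor)


def confidence_probability(rp, vp=None):
    """CP = RP, or min(RP, VP) when the obs is a potential flatliner."""
    return rp if vp is None else min(rp, vp)


dates = window_dates(date(2000, 7, 20))
print(len(dates), dates[0], dates[-1])
print(effective_sigma(1.2, 0.8, 1.0, 1.5))
print(confidence_probability(0.4, 0.05))
```

The floor of 2 keeps sigma from collapsing where the residuals happen to be very small, so a modest deviation at a well-predicted station is not given a near-zero CP.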
Drifting sensor: MCKENZIE PASS (21E07S)
[Figure: Observations and CP values, Date: 1996-02-08]
[Figure: Climatology vs Observation and Prediction, Date: 1996-02-08]
Warm Bias: SALT CREEK FALLS (22F04S)
[Figure: Observations, Date: 2000-07-14]
[Figure: Anomalies and CP values, 7-21 July 2000, with 14 July marked]
[Figure: Scatter Plot: Climatology vs Observation, 14 July 2000, with 22F04S and Odell Lake COOP labeled]
[Figure: Tmax Observations, Date: 2000-07-14]
Computing Obstacles
• Computing – currently takes about 60 hours to run PRISM PSQC system for SNOTEL sites in the western US
– 14-processor cluster
• Disk space – we now have > 1 TB, but will probably need more
• Funds are insufficient to “do it right”
Issues to Consider
• How far can the assumption be taken that spatial consistency equates with validity?
• Are continuous and probabilistic QC systems useful for manual observing systems?
• Can a high-quality QC system ever be completely automated?