Machine Learning Methods for the Understanding and Prediction of Climate Systems: Tropical Pacific Ocean Thermocline and ENSO Events
Abstract GC43A-1014
Carlos H. R. Lima - Dept. of Civil and Environmental Engineering, University of Brasilia, Brazil. [email protected]
Upmanu Lall - Water Center, Columbia University, New York, United States. [email protected]
Motivation
How can the dimension of a large, complex climate system with a nonlinear structure be reduced effectively? Potential solution: machine learning methods for nonlinear dimensionality reduction. Principal Component Analysis (PCA) and its extensions have been widely used in climate science to obtain a lower-dimensional picture of the system under investigation. The internal structure of the system is revealed by projecting the original data onto the eigenvectors of the covariance (or correlation) matrix of the system. However, PCA assumes linearity; when the relationships among the variables are nonlinear, PCA can fail to identify the main patterns in the data, and other methods are needed.
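As a toy illustration of the linearity assumption (not taken from the poster), the sketch below computes the share of variance captured by the first principal component for two intrinsically one-dimensional datasets: points on a straight line and points on a half circle. All function names and the synthetic data are assumptions made for this example.

```python
import math

def covariance_2d(points):
    """Population covariance terms (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return sxx, sxy, syy

def eigenvalues_symmetric_2x2(sxx, sxy, syy):
    """Closed-form eigenvalues of [[sxx, sxy], [sxy, syy]], largest first."""
    mean = (sxx + syy) / 2.0
    radius = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    return mean + radius, mean - radius

def first_pc_variance_fraction(points):
    """Fraction of total variance explained by the first principal component."""
    largest, smallest = eigenvalues_symmetric_2x2(*covariance_2d(points))
    return largest / (largest + smallest)

# Both datasets are generated from a single parameter, so both are 1-D curves.
line = [(k / 10.0, 2.0 * k / 10.0) for k in range(21)]
arc = [(math.cos(math.pi * k / 200.0), math.sin(math.pi * k / 200.0))
       for k in range(201)]

fraction_line = first_pc_variance_fraction(line)
fraction_arc = first_pc_variance_fraction(arc)
print(fraction_line, fraction_arc)
```

For the straight line a single component captures essentially all of the variance, while for the half circle a noticeable share is left to the second component even though the curve has only one degree of freedom.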
Climate Dataset
Conclusions and Future Work
Acknowledgment
Maximum variance unfolding (MVU) was originally developed by Weinberger and Saul (2006) and has its origins in Kernel PCA, where a known nonlinear function is used to map the original data to a transformed space (the feature space), in which the structure is expected to be linear. Using the kernel trick, dual PCA can be applied in this space to obtain a lower-dimensional representation of the original data. MVU is a data-driven approach: the nonlinear function is not known, and a kernel matrix is obtained from the original data by semidefinite programming. The goal is to maximize the sum of the eigenvalues (the trace) of the kernel matrix while keeping the local distances implied by the kernel matrix equal to the corresponding local distances in the Gram matrix of the original data. Mathematically, MVU can be expressed as a semidefinite program.
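For reference, the semidefinite program in its commonly stated form (following Weinberger and Saul, 2006) is sketched below. The notation is an assumption made here: $G_{ij} = x_i^\top x_j$ is the Gram matrix of the inputs, $\eta_{ij} = 1$ when $x_i$ and $x_j$ are neighbors, and the centering constraint removes the translational degree of freedom.

```latex
\begin{aligned}
\max_{K} \quad & \operatorname{tr}(K) \\
\text{subject to} \quad & K \succeq 0, \\
& \sum_{i,j} K_{ij} = 0, \\
& K_{ii} - 2K_{ij} + K_{jj} = G_{ii} - 2G_{ij} + G_{jj}
  \quad \text{for all } (i,j) \text{ with } \eta_{ij} = 1 .
\end{aligned}
```

The low-dimensional modes are then obtained from the leading eigenvectors of the learned $K$, as in Kernel PCA.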
• More variance is explained by the MVU modes, possibly due to nonlinearities;
• Monotonic increasing trend in the first MVU mode (not clear in the first PC);
• Patterns of the second and third MVU modes are similar to those of the equivalent PCs, but shifted and more correlated with NINO3;
• The forecast model for December NINO3 based on Lasso regression and MVU/PCA modes shows appreciable skill at lead times of up to eleven months;
• Future work will explore a forecast model for monthly values of ENSO indices as well as for the thermocline/SST fields and other ENSO-related variables.
We thank IRI for providing the climate datasets and K. Q. Weinberger for making his MVU code available. The first author acknowledges the financial support from CAPES through grant # 12515-12-4.
Results
References
• Lima, C. H. R., Lall, U., Jebara, T., Barnston, A. G., 2009. Statistical Prediction of ENSO from Subsurface Sea Temperature Using a Nonlinear Dimensionality Reduction. J. Climate 22, 4501–4519.
• Weinberger, K. Q., Saul, L., 2006. Unsupervised Learning of Image Manifolds by Semidefinite Programming. Int. J. Comp. Vision 70 (1), 77–90.
Our Approach: Maximum Variance Unfolding
Thermocline Modes of Variability
ENSO Correlation and Forecasts
Here we extend previous work (Lima et al., 2009) and apply MVU to the new and updated NOAA/NCEP GODAS subsurface ocean dataset. We focus on the depth of the 20°C isotherm (D20) of the tropical Pacific Ocean, which is a proxy for the thermocline depth and one of the main carriers of ENSO information. Details: we restrict our analysis to the Pacific D20 within the latitude band from 26N to 28S and the longitude band from 122E to 77W. The dataset covers the period from January 1980 through June 2012 and consists of 21009 data points on an equally spaced grid.
A predictive model for the December NINO3 index is explored using the thermocline modes at different lag times as covariates. The model is based on the so-called Lasso regression, which shrinks the model coefficients and usually outperforms ordinary methods of model selection (e.g. AIC, BIC), being particularly useful when the number of predictors is very large, as here.
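The shrinkage behavior of Lasso regression can be illustrated with a minimal coordinate-descent sketch. This is not the authors' implementation: the objective scaling (squared error divided by 2n plus `lam` times the L1 norm of the coefficients), the function names, and the tiny synthetic dataset are all assumptions made for the example.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/(2n))*||y - b0 - X b||^2 + lam*||b||_1 by cyclic updates."""
    n, p = len(X), len(X[0])
    # Center predictors and response so the intercept can be recovered last.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    residual = [yi - y_mean for yi in y]  # residual for beta = 0
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            scale = sum(Xc[i][j] ** 2 for i in range(n)) / n
            if scale == 0.0:
                continue  # constant column carries no information
            # Correlation of column j with the residual that excludes column j.
            rho = sum(Xc[i][j] * (residual[i] + Xc[i][j] * beta[j])
                      for i in range(n)) / n
            updated = soft_threshold(rho, lam) / scale
            delta = updated - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= Xc[i][j] * delta
                beta[j] = updated
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta

# Synthetic data: the response depends on the first predictor only.
X = [[-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0], [1.0, 1.0]]
y = [2.0 * row[0] + 1.0 for row in X]

intercept, beta = lasso_coordinate_descent(X, y, lam=0.5)
_, beta_heavy = lasso_coordinate_descent(X, y, lam=3.0)
print(intercept, beta, beta_heavy)
```

In this example the coefficient of the irrelevant second predictor is set exactly to zero, the coefficient of the relevant one is shrunk below its least-squares value of 2, and a sufficiently large penalty removes both predictors.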
MVU
Question: given high-dimensional inputs $x_i \in \mathbb{R}^D$, $i = 1, \dots, N$, how can we compute outputs $y_i \in \mathbb{R}^d$, where $d < D$, such that nearby points remain nearby and distant ones remain distant?
Mapping: $\Phi : x_i \mapsto \Phi(x_i)$, $x_i \in X$, $i = 1, \dots, N$.
Idea: apply PCA in the space defined by $\Phi(x)$ rather than $x$. However, $D$ can be huge.
Solution: kernel trick. We do not need to compute the mapping explicitly, only the dot products. E.g., for a quadratic mapping $\Phi(\cdot)$: $K(w, z) = \Phi(w)^T \Phi(z) = (w^T z)^2$. Hence, $K_{ij} = \Phi(x_i)^T \Phi(x_j)$.
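The kernel trick can be checked numerically on a small case. The sketch below assumes a specific quadratic feature map for two-dimensional inputs, chosen here for illustration, and compares the kernel matrix built from explicit features with the one built from dot products of the raw inputs alone.

```python
import math

def dot(a, b):
    """Euclidean dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def phi(w):
    """Explicit quadratic feature map for a 2-D input (illustrative choice)."""
    w1, w2 = w
    return (w1 * w1, math.sqrt(2.0) * w1 * w2, w2 * w2)

def quadratic_kernel(w, z):
    """Same inner product as dot(phi(w), phi(z)), without building phi."""
    return dot(w, z) ** 2

def kernel_matrix(points, kernel):
    """N x N matrix with entries kernel(x_i, x_j)."""
    return [[kernel(p, q) for q in points] for p in points]

points = [(1.0, 2.0), (0.5, -1.0), (3.0, 0.0)]
K_explicit = kernel_matrix(points, lambda p, q: dot(phi(p), phi(q)))
K_trick = kernel_matrix(points, quadratic_kernel)

max_gap = max(abs(K_explicit[i][j] - K_trick[i][j])
              for i in range(len(points)) for j in range(len(points)))
print(max_gap)
```

The two matrices agree up to rounding, and only the second route stays cheap when the feature space is large, since it never forms the features.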
Temporal correlation of the D20 gridded data and PCA (left) and MVU (right) modes: first, second and third from top to bottom.
MVU (thicker lines) and PC (thin lines) modes for the thermocline data. The signs of the second and third PCs are inverted for comparison purposes.
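Forecast Model
$$\mathrm{NINO3}(t) = a + \sum_{i=\mathrm{lag}}^{24} \left[ b_i \, \mathrm{MVU1}(t-i) + c_i \, \mathrm{MVU2}(t-i) + d_i \, \mathrm{MVU3}(t-i) \right]$$
10-fold cross-validation: Correlation skill
Panels: 1st Mode, Lag = 3 months; 2nd Mode, Lag = 12 months; 3rd Mode, Lag = 18 months.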
Temporal correlation of SST and PCA (left) and MVU (right) modes.
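The poster's forecast model uses lagged modes as covariates and reports a 10-fold cross-validated correlation skill. A minimal sketch of those two steps follows. It makes several assumptions: the data are synthetic, the function names are hypothetical, and a one-predictor least-squares fit stands in for the Lasso regression to keep the example short.

```python
import math

def lagged_rows(series_by_mode, target, first_lag, last_lag):
    """Pair target[t] with every mode value at t - first_lag ... t - last_lag."""
    X, y = [], []
    for t in range(last_lag, len(target)):
        row = [series[t - i]
               for i in range(first_lag, last_lag + 1)
               for series in series_by_mode]
        X.append(row)
        y.append(target[t])
    return X, y

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (v - mb) for x, v in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

def fit_first_column(X, y):
    """Stand-in model: least squares on the first covariate only."""
    x = [row[0] for row in X]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx, slope

def cross_validated_skill(X, y, k=10):
    """Correlation between held-out predictions and observations over k folds."""
    predictions = [0.0] * len(y)
    for fold in range(k):
        train = [i for i in range(len(y)) if i % k != fold]
        held_out = [i for i in range(len(y)) if i % k == fold]
        intercept, slope = fit_first_column([X[i] for i in train],
                                            [y[i] for i in train])
        for i in held_out:
            predictions[i] = intercept + slope * X[i][0]
    return pearson(predictions, y)

# Synthetic example: the target follows one mode exactly, three steps later.
mode = [math.sin(0.3 * t) for t in range(60)]
target = [0.0, 0.0, 0.0] + [2.0 * mode[t - 3] + 1.0 for t in range(3, 60)]

X, y = lagged_rows([mode], target, first_lag=3, last_lag=5)
skill = cross_validated_skill(X, y, k=10)
print(len(X), len(X[0]), skill)
```

Because the synthetic target is an exact function of the mode at a three-step lag, the cross-validated skill here is essentially 1; with real D20 modes and a Lasso fit over all lagged covariates, the skill would depend on the lead time.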