Upload
clove
View
28
Download
0
Tags:
Embed Size (px)
DESCRIPTION
ORNL Statistics and Data Sciences Understanding Variability and Bringing Rigor to Scientific Investigation. George Ostrouchov Statistics and Data Sciences Group Computer Science and Mathematics Division. Common evolutionary steps: Experimental science and computational science. - PowerPoint PPT Presentation
Citation preview
Presented by
ORNL Statistics and Data Sciences Understanding Variability and Bringing Rigor to Scientific Investigation
George OstrouchovStatistics and Data Sciences Group
Computer Science and Mathematics Division
Beckerman_0611
2 Ostrouchov_SDS_0611
Early computational science relies largelyon intuitive design and visual validation Computational experiments are expensive Petascale data sets are nearly as opaque
as real systems—statistical analysismust select what to visualize
Uncertainty analysis is in its infancy
Statistics is a major partner in bringing computational science to the rigor and efficiency standards of experimental science Methods to see through,
examine, and classify variability Uncertainty quantification Statistical design of experiments
Common evolutionary steps:Experimental science and computational science
Beckerman_0611
3 Ostrouchov_SDS_0611 3 Ostrouchov_SDS_0611
ORNL statistics and data sciences group:
High-dimensional multivariate classification for chem-bio mass spectrometer
Climate: Compact representation and insight in high-dimensional climate simulation space
Network intrusion detectionwithout content analysis
High-dimensional density space classification
Visualization ofhigh-dimensional relationships
Fusion: Mapping clustered heating coefficient space to tokamak cross-section space
Cutting through the fog of high-dimensional variability
Beckerman_0611
4 Ostrouchov_SDS_0611
Transform to reducenon-linearity in distribution(often density-based)
PCA computed via SVD(or ICA, FA, etc.)
Constructionof component movies
Interpretation of spatial, time, and movie components
Pairs of equal singular values indicate periodic motion
Transform to reducenon-linearity in distribution(often density-based)
PCA computed via SVD(or ICA, FA, etc.)
Constructionof component movies
Interpretation of spatial, time, and movie components
Pairs of equal singular values indicate periodic motion
Statistical decomposition of time-varying simulation data
EETG GTC Simulation Data (W. Lee, Z. Lin, and S. Klasky)
Decomposition shows transient wave components in time
ƒ (Xt) = S0 + 1t S1 + … + kt Sk + Et
S1
2t
1t
Beckerman_0611
5 Ostrouchov_SDS_0611
Statistical decomposition of time-varying simulation data
Time dynamicsof component pairsi j stand outin a smallmultiples plot
Movies of any component combinations or residuals provide further viewsof dynamics
¿
ƒ(Xt) = S0 + 1t S1 + … + ktSk + Et
Beckerman_0611
6 Ostrouchov_SDS_0611
Conditional small multiples provide access to high-dimensional relationships Five-dimensional climate
relationship involving surface heat exchange Latent heat flux: x Radiative heat flux: y Various precipitation
regimes: Rows Time of day:
6 a.m. left columns, 6 p.m. right columns
Across a century of increasing CO2; century divided into three parts: Columns (1,4), (2,5), (3,6)
Hexagonal binning on alog scale controls overplotting
Beckerman_0611
7 Ostrouchov_SDS_0611
RobustMap multiscale data decomposition
Hierarchical data decompositionand dimension reduction with O(np) complexity
Axis represents diffe
rence
between data groups
Internal nodes contain axes for data splits
Leaf nodes contain axes for local dimension reductionn data points in p dimensions
C2b1 c1
bk ck
. .. .. .. .. .
C4NULL
C1b1 c1
C3b1 c1
C5b1 c1
bk ck
. .. .. .. .. .
Each set of axes represents a local data group
Beckerman_0611
8 Ostrouchov_SDS_0611
A statistical framework for guiding visualization of massive data setsBrowse a petabyte of data?
To see 1% of a petabyteat 10 MB per second takes35 work days!
Statistical analysis must select views or reduce to quantities of interest in additionto fast rendering of data
More statisticalanalysis
Human
bandwidth
overlo
ad?
Visualization
scalability through
guidance by
statistical analysis
Petabytes
Terabytes
Gigabytes
Megabytes
No analysis
More data
From local models to annotated data in global context through
clustering in a high-dimensional feature space
Beckerman_0611
9 Ostrouchov_SDS_0611
Contact
George OstrouchovStatistics and Data Sciences GroupComputer Science and Mathematics Division(865) [email protected]
9 Ostrouchov_SDS_0611