9
Presented by ORNL Statistics and Data Sciences Understanding Variability and Bringing Rigor to Scientific Investigation George Ostrouchov Statistics and Data Sciences Group Computer Science and Mathematics Division

George Ostrouchov Statistics and Data Sciences Group Computer Science and Mathematics Division

  • Upload
    clove

  • View
    28

  • Download
    0

Embed Size (px)

DESCRIPTION

ORNL Statistics and Data Sciences Understanding Variability and Bringing Rigor to Scientific Investigation. George Ostrouchov Statistics and Data Sciences Group Computer Science and Mathematics Division. Common evolutionary steps: Experimental science and computational science. - PowerPoint PPT Presentation

Citation preview

Page 1: George Ostrouchov Statistics and Data Sciences Group Computer Science and Mathematics Division

Presented by

ORNL Statistics and Data Sciences Understanding Variability and Bringing Rigor to Scientific Investigation

George OstrouchovStatistics and Data Sciences Group

Computer Science and Mathematics Division

Page 2: George Ostrouchov Statistics and Data Sciences Group Computer Science and Mathematics Division

Beckerman_0611

2 Ostrouchov_SDS_0611

Early computational science relies largelyon intuitive design and visual validation Computational experiments are expensive Petascale data sets are nearly as opaque

as real systems—statistical analysismust select what to visualize

Uncertainty analysis is in its infancy

Statistics is a major partner in bringing computational science to the rigor and efficiency standards of experimental science Methods to see through,

examine, and classify variability Uncertainty quantification Statistical design of experiments

Common evolutionary steps:Experimental science and computational science

Page 3: George Ostrouchov Statistics and Data Sciences Group Computer Science and Mathematics Division

Beckerman_0611

3 Ostrouchov_SDS_0611 3 Ostrouchov_SDS_0611

ORNL statistics and data sciences group:

High-dimensional multivariate classification for chem-bio mass spectrometer

Climate: Compact representation and insight in high-dimensional climate simulation space

Network intrusion detectionwithout content analysis

High-dimensional density space classification

Visualization ofhigh-dimensional relationships

Fusion: Mapping clustered heating coefficient space to tokamak cross-section space

Cutting through the fog of high-dimensional variability

Page 4: George Ostrouchov Statistics and Data Sciences Group Computer Science and Mathematics Division

Beckerman_0611

4 Ostrouchov_SDS_0611

Transform to reducenon-linearity in distribution(often density-based)

PCA computed via SVD(or ICA, FA, etc.)

Constructionof component movies

Interpretation of spatial, time, and movie components

Pairs of equal singular values indicate periodic motion

Transform to reducenon-linearity in distribution(often density-based)

PCA computed via SVD(or ICA, FA, etc.)

Constructionof component movies

Interpretation of spatial, time, and movie components

Pairs of equal singular values indicate periodic motion

Statistical decomposition of time-varying simulation data

EETG GTC Simulation Data (W. Lee, Z. Lin, and S. Klasky)

Decomposition shows transient wave components in time

ƒ (Xt) = S0 + 1t S1 + … + kt Sk + Et

S1

2t

1t

Page 5: George Ostrouchov Statistics and Data Sciences Group Computer Science and Mathematics Division

Beckerman_0611

5 Ostrouchov_SDS_0611

Statistical decomposition of time-varying simulation data

Time dynamicsof component pairsi j stand outin a smallmultiples plot

Movies of any component combinations or residuals provide further viewsof dynamics

¿

ƒ(Xt) = S0 + 1t S1 + … + ktSk + Et

Page 6: George Ostrouchov Statistics and Data Sciences Group Computer Science and Mathematics Division

Beckerman_0611

6 Ostrouchov_SDS_0611

Conditional small multiples provide access to high-dimensional relationships Five-dimensional climate

relationship involving surface heat exchange Latent heat flux: x Radiative heat flux: y Various precipitation

regimes: Rows Time of day:

6 a.m. left columns, 6 p.m. right columns

Across a century of increasing CO2; century divided into three parts: Columns (1,4), (2,5), (3,6)

Hexagonal binning on alog scale controls overplotting

Page 7: George Ostrouchov Statistics and Data Sciences Group Computer Science and Mathematics Division

Beckerman_0611

7 Ostrouchov_SDS_0611

RobustMap multiscale data decomposition

Hierarchical data decompositionand dimension reduction with O(np) complexity

Axis represents diffe

rence

between data groups

Internal nodes contain axes for data splits

Leaf nodes contain axes for local dimension reductionn data points in p dimensions

C2b1 c1

bk ck

. .. .. .. .. .

C4NULL

C1b1 c1

C3b1 c1

C5b1 c1

bk ck

. .. .. .. .. .

Each set of axes represents a local data group

Page 8: George Ostrouchov Statistics and Data Sciences Group Computer Science and Mathematics Division

Beckerman_0611

8 Ostrouchov_SDS_0611

A statistical framework for guiding visualization of massive data setsBrowse a petabyte of data?

To see 1% of a petabyteat 10 MB per second takes35 work days!

Statistical analysis must select views or reduce to quantities of interest in additionto fast rendering of data

More statisticalanalysis

Human

bandwidth

overlo

ad?

Visualization

scalability through

guidance by

statistical analysis

Petabytes

Terabytes

Gigabytes

Megabytes

No analysis

More data

From local models to annotated data in global context through

clustering in a high-dimensional feature space

Page 9: George Ostrouchov Statistics and Data Sciences Group Computer Science and Mathematics Division

Beckerman_0611

9 Ostrouchov_SDS_0611

Contact

George OstrouchovStatistics and Data Sciences GroupComputer Science and Mathematics Division(865) [email protected]

9 Ostrouchov_SDS_0611