SAMSI Working Group
March 2007
ISSUES IN THE DESIGN AND ANALYSIS OF COMPUTER EXPERIMENTS
David M. Steinberg
Tel Aviv University
THANK YOUS
Noga Alon
Ronit Steinberg
COLLABORATORS
Dennis Lin
Dizza Bursztyn
Ron Kenett
Henry Wynn
Ron Bates
Sigal Levy
Einat Neuman Ben Ari
Gideon Leonard
Tamir Reisin
Eyal Hashavia
Zeev Somer
PREVIEW
1. Some Applications
• Nuclear Waste Repository
• Ground Response to an Earthquake
• Chemotherapy Simulator
• Optimizing a Piston
2. Designing Computer Experiments
3. Latin Hypercube Designs
4. Rotated Factorial Designs
5. LHD’s as Rotated Factorial Designs
6. Near LHD’s from Rotated Factorials
7. Nuclear Waste Disposal: Quandaries
8. Chemotherapy: Quandaries
9. Ground Shaking: Quandaries
10. GASP Models and Bayesian Regression
Example: Nuclear Waste Repository
RESRAD computes leaching of radioactive isotopes from the repository into the food and water supply.
Time frame is thousands of years, so field study is impossible.
Example: Nuclear Waste Repository
Inputs
• Initial isotope concentrations
• Distribution coefficients of the isotopes
• Lithology of the repository
Outputs
• Maximal dose during 10,000 years
Example: Ground Shaking
What will be the ground response to an earthquake?
An engineering simulator uses a finite element scheme to simulate ground motion. Shaking of the bedrock generates surface motion.
We wish to study the output from the program to aid earthquake preparedness plans.
Example: Ground Shaking
Inputs
• Geometry of the ground surface
• Layers of hard/soft soil below the surface
• Shear velocity, density, elasticity of the soil in each layer
• Amplitude and spectrum at bedrock
Outputs
• Displacement along the surface
• Acceleration along the surface
Example: Chemotherapy Simulator
What is the effect of chemotherapy treatment?
The treatment affects both cancerous and healthy cells in the body.
The goal is to develop treatment protocols that will put the cancer into remission with minimal damage to healthy cells.
Example: Chemotherapy Simulator
Inputs
• Treatment protocol: dosage and timing
• Rate of drug decay
• Rate of cell death
• Rate of cell regeneration
Outputs
• Number of healthy and malignant cells, as a fraction of the initial count
Example: Piston Performance
The piston simulator was written by Kenett and Zacks as a teaching tool for their textbook.
The simulator describes the cycle time of a piston and is based on the physics governing the piston.
Variation in output is related to tolerances in the inputs.
The goal was to achieve a target cycle time with minimal variation.
Example: Piston Performance
Output
• Cycle time
Inputs
A: Piston Weight (kg)
B: Piston Surface Area (m²)
C: Initial Gas Volume (m³)
D: Spring Coefficient (N/m)
E: Atmospheric Pressure (N/m²)
F: Ambient Temperature (K)
G: Gas Temperature (K)
Latin Hypercube Designs
Latin Hypercube designs (LHD’s) are the most popular class of experimental plan for computer experiments.
LHD’s place the input levels for each factor on a uniform grid.
The levels are then “mated” across factors by randomly permuting the column for each factor.
McKay, Beckman and Conover, Technometrics, 1979.
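The construction just described can be sketched in a few lines. This is a minimal illustration, assuming a grid on [0, 1] that includes both endpoints (as in the 11-level example on the next slide); the function name `latin_hypercube` and the seed are hypothetical choices, not part of the talk.

```python
import numpy as np

def latin_hypercube(n_runs, n_factors, rng):
    """Return an n_runs x n_factors design on the grid 0, 1/(n-1), ..., 1."""
    grid = np.linspace(0.0, 1.0, n_runs)   # uniform grid of levels
    # "Mate" the levels: permute the grid independently for each factor.
    columns = [rng.permutation(grid) for _ in range(n_factors)]
    return np.column_stack(columns)

rng = np.random.default_rng(0)
design = latin_hypercube(11, 3, rng)
print(design)
```

Every column contains each grid level exactly once, so each one-dimensional projection of the design is perfectly uniform.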
Latin Hypercube Designs
Example of a Latin Hypercube design for 3 factors.
Initial Grids          Shuffled Grids
1     1     1          1     0.3   0.5
0.9   0.9   0.9        0.9   0.4   0.2
0.8   0.8   0.8        0.8   1     0.7
0.7   0.7   0.7        0.7   0.6   0
0.6   0.6   0.6        0.6   0.2   1
0.5   0.5   0.5        0.5   0.7   0.9
0.4   0.4   0.4        0.4   0     0.1
0.3   0.3   0.3        0.3   0.9   0.6
0.2   0.2   0.2        0.2   0.5   0.4
0.1   0.1   0.1        0.1   0.8   0.8
0     0     0          0     0.1   0.3
Latin Hypercube Designs
Some 2-factor projections from a 250-run LHD.
Latin Hypercube Designs
Other mating schemes have been suggested to obtain columns with low correlation.
Ye showed how to get 2m-2 fully orthogonal columns with $2^m$ runs.
Butler showed how to get orthogonality with respect to a trigonometric regression model and 2m runs.
How many orthogonal columns are possible?
Rotated Factorial Designs
Bursztyn and Steinberg developed experimental plans with many levels in which linear effects are orthogonal.
Start with a “standard” first-order orthogonal design, like a $2^{k-p}$ fractional factorial: $D$, with $D'D = nI$.
“Rotate” the design using a rotation matrix $R$ (so $R'R = I$): $D \to DR$.
Then $(DR)'(DR) = R'D'DR = nR'R = nI$.
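A quick numerical check of this identity may help. The sketch below assumes a full $2^3$ factorial as the base design and takes an arbitrary orthogonal matrix from a QR decomposition; that choice of R is purely illustrative and is not the rotation used in the talk.

```python
import itertools
import numpy as np

# Full 2^3 factorial in +/-1 coding, so D'D = n I with n = 8.
D = np.array(list(itertools.product([-1.0, 1.0], repeat=3)))
n, k = D.shape

# Illustrative orthogonal matrix (assumption): the Q factor of a random matrix.
rng = np.random.default_rng(1)
R, _ = np.linalg.qr(rng.standard_normal((k, k)))

DR = D @ R
gram = DR.T @ DR              # should equal n * I up to rounding
print(np.round(gram, 10))
```

The rotated design has many distinct levels per factor, yet its columns remain orthogonal for first-order effects.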
LHD’s as Rotated Factorial Designs
Steinberg and Lin showed how to rotate two-level factorials into Latin Hypercube designs with a large number of first-order orthogonal columns.
This work combines a rotation idea in Bursztyn and Steinberg with another rotation idea developed by Lin and Beattie.
LHD’s as Rotated Factorial Designs
Lin and Beattie: rotate $2^k$ factorials to Latin Hypercube designs.
The intuition:
• The levels in each column of a LHD form an arithmetic sequence.
• Each entry in a column of $DR$ is a linear combination of the entries in the corresponding row of $D$ (the $2^k$ design).
• The rows of $D$ are a binary expansion of the odd integers.
• Using appropriate powers of 2 as the elements in $R$, each column in $DR$ is an integer sequence.
LHD’s as Rotated Factorial Designs
Weights:    2    -4     1     Weighted Sums
           -1    -1    -1         1
            1    -1    -1         5
           -1     1    -1        -7
            1     1    -1        -3
           -1    -1     1         3
            1    -1     1         7
           -1     1     1        -5
            1     1     1        -1
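The weighted sums on this slide can be reproduced directly. The sketch below rebuilds the $2^3$ design in the slide's run order and applies the weights (2, -4, 1); the variable names are illustrative.

```python
import itertools
import numpy as np

# itertools.product varies the last factor fastest; reversing each row gives
# the slide's order, in which the first factor changes fastest.
D = np.array([row[::-1] for row in itertools.product([-1, 1], repeat=3)])
weights = np.array([2, -4, 1])

sums = D @ weights
print(sums.tolist())          # weighted sum for each run
print(sorted(sums.tolist()))  # the same values in increasing order
```

The sums are a permutation of the odd integers from -7 to 7, which is exactly an equally spaced Latin hypercube column for 8 runs.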
LHD’s as Rotated Factorial Designs
Lin and Beattie: rotate $2^k$ factorials to Latin Hypercube designs.
• Can we organize weights for multiple columns in a rotation matrix $R$?
• Yes, provided $R$ is t by t, where t is a power of 2.
• A simple recursive scheme gives the rotation matrices.
• Original proposal limited to full factorial designs $2^k$, where k is a power of 2.
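LHD’s as Rotated Factorial Designs
Lin and Beattie: rotate $2^k$ factorials to Latin Hypercube designs.
$$R_0 = 1, \qquad R_1 = \frac{1}{\sqrt{5}} \begin{pmatrix} 1 & -2 \\ 2 & 1 \end{pmatrix}, \qquad R_j = \frac{1}{\sqrt{1 + 2^{2^j}}} \begin{pmatrix} R_{j-1} & -2^{2^{j-1}} R_{j-1} \\ 2^{2^{j-1}} R_{j-1} & R_{j-1} \end{pmatrix}$$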
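The recursive scheme can be checked numerically. The sketch below builds the unscaled integer version of the rotation (the normalizing constants are omitted so that all entries stay integers) and assumes the sign convention with the minus sign in the upper-right block; the function name `unscaled_rotation` is hypothetical.

```python
import itertools
import numpy as np

def unscaled_rotation(c):
    """Integer matrix V_c of size 2^c x 2^c, proportional to a rotation."""
    V = np.array([[1]])
    for j in range(1, c + 1):
        w = 2 ** (2 ** (j - 1))              # block weight 2^(2^(j-1))
        V = np.block([[V, -w * V], [w * V, V]])
    return V

V2 = unscaled_rotation(2)                     # 4 x 4 weight matrix
D = np.array(list(itertools.product([-1, 1], repeat=4)))   # 2^4 factorial
X = D @ V2                                    # rotated (unscaled) design

print(V2)
print(np.sort(X, axis=0)[:, 0].tolist())      # levels of the first column
```

Each column of `V2` uses the magnitudes 1, 2, 4, 8 once, so every column of `X` runs through the odd integers from -15 to 15, and the columns of `X` stay mutually orthogonal.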
LHD’s as Rotated Factorial Designs
Bursztyn and Steinberg showed that fractional factorial designs can also be rotated.
First, the design must be decomposed into sets of factors, each of which is a full factorial.
LHD’s as Rotated Factorial Designs
Steinberg and Lin:
$$[D_1 \mid \cdots \mid D_t] \begin{pmatrix} R & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & R \end{pmatrix} = [D_1 R \mid \cdots \mid D_t R]$$
The resulting design is an orthogonal Latin hypercube.
Bursztyn & Steinberg
Lin & Beattie
LHD’s as Rotated Factorial Designs
The construction requires that each set of columns be a full factorial design.
Suppose we start with a saturated fractional factorial with $2^m$ runs.
How can we “group” the columns to achieve the maximum number of full factorials?
LHD’s as Rotated Factorial Designs
We can order the columns so that each set of m consecutive columns is a full factorial.
1. Identify the columns as the non-zero points in GF($2^m$).
2. All non-zero points (hence all columns) can be obtained as $x^j \bmod p(x)$, where $p(x)$ is a primitive polynomial of GF($2^m$).
3. Order the columns by the order of the powers.
4. A set of m consecutive columns is not a full factorial if it has a linear dependency. It is easy to show that this implies a linear dependency in the first m columns, which is impossible because the first m columns are the m factors themselves.
LHD’s as Rotated Factorial Designs
1. Identify the columns as the non-zero points in GF($2^m$), the Galois Field of binary vectors of length m.
The column of 1’s is matched with (0,0,…,0).
The column for A is matched with (1,0,…,0).
The column for B is matched with (0,1,0,…,0).
The column for AB is matched with (1,1,0,…,0).
In general, the column for any interaction is matched with a vector with 1’s marking the factors involved in the interaction.
LHD’s as Rotated Factorial Designs
1. Identify the columns as the non-zero points in GF($2^m$), the Galois Field of binary vectors of length m.
Each binary vector is used to represent a polynomial with binary coefficients.
AC     (1,0,1,0,0,0)    $1 + x^2$
BDF    (0,1,0,1,0,1)    $x + x^3 + x^5$
LHD’s as Rotated Factorial Designs
2. All non-zero points (hence all columns) can be obtained as $x^j \bmod p(x)$, where $p(x)$ is a primitive polynomial of GF($2^m$).
GF theory: there exists a primitive polynomial, $p(x)$, that can be used to generate all the non-zero polynomials in GF($2^m$).
The primitive polynomial is a binary polynomial of degree m. Recall that m is the number of factors, so we want to generate all polynomials of degree m-1 or less.
All calculations are carried out modulo 2.
LHD’s as Rotated Factorial Designs
2. All non-zero points (hence all columns) can be obtained as $x^j \bmod p(x)$, where $p(x)$ is a primitive polynomial of GF($2^m$).
For example, with m=4, a primitive polynomial is $1+x+x^4$.
$x^0 \equiv 1$ (A)
$x^1 \equiv x$ (B)
$x^2 \equiv x^2$ (C)
$x^3 \equiv x^3$ (D)
$x^4 \equiv 1+x$ (AB)
$x^5 \equiv x+x^2$ (BC)
etc.
If we continue, we find all the non-zero polynomials.
Every set of m successive columns is a full factorial.
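The ordering and the resulting design can be sketched for m = 4. The code below stores each polynomial as a bitmask (bit i is the coefficient of $x^i$), generates the powers of x modulo $1+x+x^4$, checks that every 4 consecutive columns are linearly independent over GF(2), and then rotates three groups of 4 columns. The helper names and the 4 x 4 integer weight matrix `V2` (an unscaled rotation of the Lin and Beattie type, with one particular sign convention) are assumptions made for illustration.

```python
import numpy as np

def powers_of_x(m, p):
    """x^0, ..., x^(2^m - 2) modulo p(x), as bitmasks (bit i <-> x^i)."""
    out, v = [], 1
    for _ in range(2 ** m - 1):
        out.append(v)
        v <<= 1                      # multiply by x
        if (v >> m) & 1:             # degree reached m: reduce modulo p(x)
            v ^= p
    return out

def gf2_rank(vectors):
    """Rank over GF(2) of bitmask vectors, using an XOR basis."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)        # clear the leading bit of b if present
        if v:
            basis.append(v)
    return len(basis)

m, p = 4, 0b10011                    # p(x) = 1 + x + x^4
order = powers_of_x(m, p)            # starts with A, B, C, D, AB, BC, ...
windows_ok = all(gf2_rank(order[j:j + m]) == m
                 for j in range(len(order) - m + 1))

# Saturated 16-run two-level design: entry is (-1)^(u . v) for run u, column v.
D = np.array([[1 - 2 * (bin(u & v).count("1") % 2) for v in order]
              for u in range(2 ** m)])

# Assumed unscaled rotation weights (magnitudes 1, 2, 4, 8 in every column).
V2 = np.array([[1, -2, -4, 8], [2, 1, -8, -4], [4, -8, 1, -2], [8, 4, 2, 1]])
X = np.hstack([D[:, 4 * g:4 * g + 4] @ V2 for g in range(3)])
print(order, windows_ok, X.shape)
```

Under these assumptions `X` is a 16-run, 12-column design in which every column takes each odd integer from -15 to 15 once and all columns are mutually orthogonal.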
LHD’s as Rotated Factorial Designs
The rotated designs are a special class of Latin Hypercubes with an external orthogonal array structure (U-designs).
For each pair of columns, ¼ of all the points are in each quadrant.
For many pairs, finer divisions hold.
LHD’s as Rotated Factorial Designs
Some 2-factor projections from the design of the ground-shaking study.
LHD’s as Rotated Factorial Designs
Points may “clump” in low-dimensional projections.
In high dimensions, points do not clump.
The rotation is isometric, so the inter-point differences are like those in the original factorial, except for “shrinking” the final design back to a hypercube.
LHD’s as Rotated Factorial Designs
Steinberg and Lin show that these rotated designs have good statistical properties as screening designs.
Main effects have low aliasing with second order effects (by comparison with randomly mated LHC designs or randomly chosen U-designs).
LHD’s as Rotated Factorial Designs
Suppose you use the design to fit a simple first-order regression model, to “screen” the most influential factors:
$Y = X\beta + \varepsilon$.
But the true dependence involves additional regression terms:
$Y = X\beta + Z\gamma$.
Then $\hat\beta = (X'X)^{-1}X'Y = \beta + (X'X)^{-1}X'Z\gamma = \beta + A\gamma$.
The matrix $A = (X'X)^{-1}X'Z$ is known as the alias matrix.
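A small worked case shows what the alias matrix looks like. The sketch below uses a 4-run $2^{3-1}$ fractional factorial with generator C = AB (an illustrative choice, not one of the designs compared in the talk), a first-order screening model, and the three two-factor interactions as the extra terms.

```python
import itertools
import numpy as np

base = np.array(list(itertools.product([-1.0, 1.0], repeat=2)))
a, b = base[:, 0], base[:, 1]
c = a * b                                     # generator C = AB

X = np.column_stack([np.ones(4), a, b, c])    # intercept and main effects
Z = np.column_stack([a * b, a * c, b * c])    # extra terms: AB, AC, BC

# Alias matrix A = (X'X)^{-1} X'Z; rows follow X, columns follow Z.
A = np.linalg.solve(X.T @ X, X.T @ Z)
print(A)
```

Here each main effect is fully aliased with one interaction (A with BC, B with AC, C with AB), the worst case for screening. A good screening design keeps the entries of `A` small instead.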
LHD’s as Rotated Factorial Designs
The alias matrix depends on the design, the model used for screening, and the extra terms in Z.
A good screening design should have small values in A for simple screening models and somewhat more complex extra terms.
Bursztyn and Steinberg, JSPI (2006), 1103-1119.
LHD’s as Rotated Factorial Designs
We compared 16-run, 12-factor designs, with a first-order screening model and extra terms of second order.
The alternatives: a standard LHD (best of 100 random choices) and an OA-based LHD (best of 100 random choices).
LHD’s as Rotated Factorial Designs
The percent of entries in A that were < 0.1:
                  Two-factor interactions   Pure Quadratics
Orthogonal LHD    65.0%                     74.3%
Standard LHD      30.7%                     50.7%
OA-based LHD      52.7%                     45.8%
LHD’s as Rotated Factorial Designs
For the standard and OA-based LHD’s, the results shown are the best found for 100 random designs.
For the orthogonal LHD, all non-isomorphic groupings of columns into 3 sets of 4 columns were found. Results were very similar for all groupings.
LHD’s as Rotated Factorial Designs
A design with n/2 columns, all orthogonal to each other and to all possible second-order effects, can be constructed using the same ideas.
The trick is in the choice of the starting design.
We rotate the resolution IV “foldover” design. The rotation preserves the foldover property and that, in turn, guarantees the orthogonality properties.
The GF($2^m$) structure again provides a way to group the columns into full factorials.
Near LHD’s from Rotated Factorials
Orthogonal designs that are nearly LHD’s can be obtained by rotating other base designs.
Example: use the 48-run Plackett-Burman design as the base design.
Rotate 40 factors in 5 groups of 8.
The rotated design has all columns orthogonal. It is also a U-design.
It is nearly a Latin Hypercube.
Near LHD’s from Rotated Factorials
Below is a q-q plot for one of the factors against a uniform distribution.
Nuclear Waste Repository: Quandaries
• Main goal is to assess which input factors have greatest influence on output: Sensitivity Analysis.
• For example: given a proposed site, which factors should be measured?
• Output data are highly skewed, with many 0’s (configurations with no leaching into the drinking water).
• What is the best way to summarize the results?
RESRAD
RESRAD is a computer model designed to estimate radiation doses and risks from RESidual RADioactive materials.
RESRAD simulates radiation doses and cancer risks for a variety of pathways in the environment (e.g. drinking water, food chain, atmosphere).
Developed at Argonne National Laboratory. http://web.ead.anl.gov/resrad/
RESRAD
Number of input parameters can reach hundreds.
Most parameters are difficult/expensive to measure or control and are subject to wide ranges of uncertainty.
RESRAD
Typical RESRAD output.
[Figure: DOSE: U-238, All Pathways Summed. Dose (0 to 5.00E+04) vs. Years (10 to 10000, log scale). Prob_run.RAD 07/04/2004 11:53, Includes All Pathways.]
Our Case Study
Twenty-seven input parameters.
Initial radionuclide is U238 buried at a depth of 2 meters.
Lithology is one-dimensional, with contaminated, unsaturated and saturated layers above groundwater.
Our Case Study
Wide uncertainties for inputs.
Many have log-normal distributions as a reflection of scientific uncertainty.
The distribution coefficients for U234 and U238 should be identical.
Outcome: maximal annual dose during 10k years.
Our Case Study
Use RESRAD’s built-in capability for sensitivity analysis.
Options include:
• One-factor-at-a-time analysis.
• Random samples of input settings.
• Latin Hypercube samples.
• Different input parameter distributions (e.g. uniform, normal, log-normal).
• Specified rank correlations of inputs.
Our Case Study
Limitations include:
• Inability to enforce equality of inputs.
• Limited ability to trace dose across time.
• Built-in analyses.
Our Case Study
We generated 900 training points, using 3 LHS’s of 300 runs each.
Most inputs were sampled from lognormal distributions.
The Kd’s for U238 and U234 were given a rank correlation of 0.99.
The Kd’s of the same isotope in different layers were given rank correlations of 0.3.
A separate test set of 300 test points was generated from 3 100-run LHS’s.
A second test set was generated with some of the original inputs at fixed values.
Our Case Study
A typical plot of the output vs. a strong input.
[Figure: Max Dose vs. U238 Kd, Unsaturated Layer (log scale, 10 to 500).]
Our Case Study
A typical plot of the output vs. a strong input.
[Figure: Max Dose vs. Soil Depth (0 to 200).]
Our Case Study
On the 900 training points, 76% had no migration at all into the water supply.
The migration on the remaining 24% was highly skewed.
[Figure: two normal quantile plots of Max Dose vs. Quantiles of Standard Normal, one on the original scale and one on a log scale.]
Our Case Study
Below is a normal plot of the log maximal dose for samples with a maximal dose of at least 0.01.
[Figure: Max Dose (log scale, 10^-2 to 10^4) vs. Quantiles of Standard Normal.]
Our Case Study
RESRAD provides automatic sensitivity output, which includes:
• Partial correlation and regression coefficient of outcome with each input.
• Rank correlation and rank regression coefficient of outcome with each input.
Often these measures point to quite different inputs as being most influential.
Coefficient = PCC SRC PRCC SRRC
Repetition = 1 1 1 1
_______________________________________________ _________ _________ _________
Description of Probabilistic Variable Sig Coeff Sig Coeff Sig Coeff Sig Coeff
_________________________________________ _____ ___ _____ ___ _____ ___ _____
Concentration of U-238 7 0.07 11 0.07 17 -0.04 21 -0.03
Kd of U-238 in Contaminated Zone 10 -0.06 2 -0.18 23 0.01 16 0.04
Kd of U-238 in Unsaturated Zone 1 13 -0.06 6 -0.14 19 -0.02 12 -0.07
Kd of U-238 in Saturated Zone 20 0.03 9 0.08 10 -0.07 3 -0.21
Kd of U-234 in Contaminated Zone 12 0.06 4 0.18 20 -0.02 14 -0.06
Kd of U-234 in Unsaturated Zone 1 26 -0.01 23 -0.01 11 -0.06 4 -0.20
Kd of U-234 in Saturated Zone 6 -0.08 1 -0.22 18 0.02 13 0.07
Kd of Th-230 in Contaminated Zone 23 -0.01 24 -0.01 16 0.04 20 0.03
Kd of Th-230 in Unsaturated Zone 1 5 -0.08 10 -0.08 14 0.04 18 0.03
Kd of Th-230 in Saturated Zone 3 0.12 7 0.12 27 0.00 27 0.00
Kd of Ra-226 in Contaminated Zone 9 -0.07 12 -0.07 9 -0.09 11 -0.07
Kd of Ra-226 in Unsaturated Zone 1 22 -0.02 22 -0.02 4 -0.19 6 -0.15
Kd of Ra-226 in Saturated Zone 24 0.01 25 0.01 13 -0.05 17 -0.04
Kd of Pb-210 in Contaminated Zone 15 0.05 16 0.05 15 -0.04 19 -0.03
Kd of Pb-210 in Unsaturated Zone 1 14 0.05 15 0.05 26 0.01 26 0.01
Kd of Pb-210 in Saturated Zone 19 0.03 19 0.03 25 -0.01 25 -0.01
Precipitation 8 0.07 13 0.06 2 0.29 2 0.22
Saturated zone hydraulic conductivity 2 0.17 5 0.16 8 0.09 10 0.07
Saturated zone hydraulic gradient 17 0.04 18 0.04 21 -0.02 22 -0.01
Well pump intake depth 21 0.03 21 0.02 12 0.06 15 0.04
Well pumping rate 18 0.03 20 0.03 22 0.02 23 0.01
Thickness of Unsaturated zone 1 1 -0.19 3 -0.18 1 -0.52 1 -0.44
Hydraulic Conduct of Unsat zone 1 25 0.01 26 0.01 24 -0.01 24 -0.01
Total Porosity 27 0.00 27 0.00 5 0.18 7 0.14
Saturated zone total porosity 4 0.11 8 0.10 6 0.18 8 0.13
Effective Porosity 11 -0.06 14 -0.06 3 -0.26 5 -0.19
Saturated zone effective porosity 16 -0.04 17 -0.04 7 -0.16 9 -0.11
____________________________________ _____ _____ _____ _____
R-SQUARE 0.16 0.16 0.48 0.48
____________________________________ _____ _____ _____ _____
Our Case Study
• The partial correlations and regressions are dominated by a small number of very large doses.
• The rank analyses largely ignore the large dose information.
• The partial correlations have trouble with highly correlated inputs.
• The partial regressions may overstate the importance of highly correlated inputs.
• All the measures consider only linear dependence.
Our Case Study
We applied a two-phase analysis:
1. Find which inputs are associated with having a maximal dose of at least 0.1.
2. Among doses of at least 0.1, find which inputs are associated with high doses.
Presence/Absence
The first analysis treats the outcome as binary.
Contamination of at least 0.1 was found in 18% of the training cases.
Logistic regression and CART were used to fit predictive models for having a maximal dose of at least 0.1.
Presence/Absence
U234 Kd’s were not used due to the high correlation with U238 Kd’s.
Ten input factors were included in the final logistic model, along with some quadratic terms and some interactions.
Presence/Absence
Below is a plot of presence of contamination in the test data vs. phat.
[Figure: Contamination (0 to 1) vs. Phat (0 to 1).]
Presence/Absence
The CART model was not as successful.
It often exploited “unimportant” variables for final splits.
It often ignored important variables.
Overall the dependence of “presence of contamination” on the inputs appears to be too smooth to be picked up well by CART.
Other methods could certainly be tried.
Presence/Absence
The following table summarizes the success on the test data.
phat       Contamination (Logistic)   Contamination (CART)
<0.0001    0/119                      10/126
<0.001     1/154                      10/126
<0.01      4/181                      13/175
<0.1       10/212                     19/211
<0.2       12/224                     19/219
>0.5       37/49                      31/51
Max Dose
To predict the maximal dose, when contamination is present, several different regression models were run after transforming to a log scale.
We also fitted a GASP model using the PeRK Software from Brian Williams at Los Alamos National Labs.
Max Dose
The final regression model included 14 input factors, with mostly linear effects.
The mean error on the test data (with contamination) was 0.08 with a SD of 0.74.
Training data SD was 0.70.
[Figure: Max Dose (log scale, 10^-2 to 10^4) vs. Regression Prediction.]
Max Dose
Linear regression with all inputs has a test case SD of 0.82.
GASP model using 10 main inputs.
Mean error 0.10.
SD 0.86.
[Figure: Max Dose (log scale, 10^-2 to 10^4) vs. GASP Prediction.]
Max Dose
Projection pursuit regression model.
Mean error 0.03.
SD 0.83.
Similar to linear regression.
[Figure: Max Dose (log scale, 10^-2 to 10^4) vs. Projection Pursuit Prediction.]
Nuclear Waste Repository: Quandaries
• What is the best way to summarize the results?
• Some factors were influential in determining the maximal dose, if present, but were not important for presence/absence.
• An important question is “how deep” an input configuration sits in the “no migration” region.
Chemotherapy: Quandaries
Ground Shaking: Quandaries
• Both of these studies have multivariate output.
• In the chemotherapy study, we generate curves (vs. time) of the cell concentrations.
• In the ground shaking study, we get output at a grid of spatial values (on the ground surface). At each site, we have motion, velocity and acceleration as a function of frequency.
Chemotherapy: Quandaries
Ground Shaking: Quandaries
• What are effective ways to summarize this high dimensional output?
Ground Shaking: Quandaries
• Approach has been to compute simple low dimensional summaries.
• Focus on acceleration (of highest engineering importance).
• Summarize across frequencies by computing root mean square acceleration.
• Model RMS acceleration as a function of the spatial locations.
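The frequency summary can be written in one line. The sketch below assumes the simulator output is stored as an array with one row per surface site and one column per frequency; that layout, the function name `rms_acceleration`, and the numbers are all hypothetical.

```python
import numpy as np

def rms_acceleration(accel):
    """Root mean square over the frequency axis (last axis) for each site."""
    accel = np.asarray(accel, dtype=float)
    return np.sqrt(np.mean(accel ** 2, axis=-1))

# Hypothetical output: 2 surface sites x 4 frequencies.
accel = [[1.0, -1.0, 1.0, -1.0],
         [3.0, 4.0, 3.0, 4.0]]
rms = rms_acceleration(accel)
print(rms)
```

The result is one number per site, which can then be modeled as a function of spatial location and the input factors.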
Chemotherapy: Quandaries
• Data are much more “dense” in time than in the input factor space.
• We have looked at several methods for explicitly modeling the time dependence, then modeling those functions in terms of the input factors.
Chemotherapy is given for 11 hrs: A (virtual!) patient is exposed to 3.95 mg of a steroid that decomposes at a rate of 1.487 mg/(cm³·hr). Cancer cells grow at a rate of 0.0697.
We track the number of cancer cells in the patient’s body throughout the treatment duration, recording results every 6 minutes.
The result is a time dependent curve.
Chemotherapy Data
[Figure: cell-count curve for one run; horizontal axis 0 to 100, vertical axis 0.985 to 0.995.]
Chemotherapy data for several protocols
[Figure: panels of curves, one per protocol; horizontal axis 0 to 400, vertical axis ticks 0.0 to 0.8.]
Chemotherapy data for several protocols
What is a good approach to model the data?
Some options:
• Fit a B-spline to each curve, then model the parameters as a function of the inputs. Might add constraints using models for specific time points.
• Derive basis functions from the observed curves via functional cluster analysis. Should this be applied to the “raw” data or to “scaled” data?
[Figure: panels of curves, one per test setting; horizontal axis 0 to 400.]
Results from the B-spline models, on 20 independent test settings.
Data driven basis functions
1. Select k - number of basis functions
2. Define distance function
3. Cluster data into k disjoint groups
4. Use cluster means as basis functions
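The four steps above can be sketched as follows. This is a rough illustration only: it assumes Euclidean distance between sampled curves and a plain k-means (Lloyd) iteration as the clustering method, and it uses synthetic curves; none of these choices is confirmed by the talk, and the function name `kmeans_curves` is hypothetical.

```python
import numpy as np

def kmeans_curves(curves, k, n_iter=50):
    """Lloyd's algorithm with Euclidean distance between sampled curves."""
    # Deterministic start: k curves spread evenly through the data set.
    centers = curves[np.linspace(0, len(curves) - 1, k).astype(int)].copy()
    for _ in range(n_iter):
        dist = ((curves[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):          # keep old center if a cluster empties
                centers[j] = curves[labels == j].mean(axis=0)
    return centers, labels

t = np.linspace(0.0, 1.0, 50)
rng = np.random.default_rng(0)
# Synthetic curves (assumption): 10 decaying and 10 rising shapes plus noise.
decay = np.exp(-3 * t) + 0.01 * rng.standard_normal((10, t.size))
rise = t ** 2 + 0.01 * rng.standard_normal((10, t.size))
curves = np.vstack([decay, rise])

basis, labels = kmeans_curves(curves, k=2)   # steps 1-4: cluster means as basis
# Express every curve in the basis by least squares.
coef, *_ = np.linalg.lstsq(basis.T, curves.T, rcond=None)
fitted = (basis.T @ coef).T
print(labels.tolist())
```

The coefficients in `coef` (one column per curve) are the low-dimensional summaries that would then be modeled in terms of the input factors.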
Shape vs. Scale
Consider these functions:
[Figure: example functions.]
After transformation
[Figure: transformed curves; horizontal axis 0 to 400, vertical axis 1.0 to 2.0.]
4 clusters for chemotherapy data
[Figure: cluster curves; horizontal axis 0 to 400, vertical axis 1.0 to 2.0.]
First stage results
[Figure: panels of curves; horizontal axis 0 to 400, vertical axis roughly 1.0 to 2.0.]