Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA-0003525. SAND NO. 2019-3409 C
Using Bayesian Neural Networks for UQ of Hyperspectral Image Target Detection
AUTHORS
Daniel Ries, Braden Smith, Josh Zollweg, Lauren Hund
Outline
1. What is a neural network?
2. What is a Bayesian neural network (BNN)?
3. Estimating BNNs with variational inference
4. Hyperspectral images and Megascene
5. BNN on Megascene
6. Concluding remarks
What is a neural network?
Logistic Regression as a Neural Network
Suppose $Y_1, \ldots, Y_n$ are binary random variables, and $x_1, \ldots, x_n$ are $p$-length vectors.
If we want to estimate $P(Y_i = 1 \mid x_i)$, we could use logistic regression:
$$Y_i \stackrel{iid}{\sim} \text{Bernoulli}(\pi_i), \qquad \log\!\left(\frac{\pi_i}{1 - \pi_i}\right) = x_i'\beta$$
Logistic Regression as a Neural Network
[Diagram: inputs $x_1, \ldots, x_p$ feed a single node that computes $z = \sum_{i=1}^{p} x_i w_i + b$; the activation gives $f(z) = \pi = P(Y = 1 \mid \boldsymbol{x})$, followed by $g(\pi)$.]
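As a concrete illustration, here is a minimal NumPy sketch of the forward pass this diagram describes: a single node forms the linear combination of the inputs and pushes it through the inverse-logit. The data, weights, and names are hypothetical placeholders, not anything from the study.

```python
import numpy as np

def sigmoid(z):
    """Inverse-logit link: maps the linear predictor z to a probability."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical data: n observations, p features.
rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.normal(size=(n, p))

# Logistic regression viewed as a single-node network:
# one weight per input plus a bias, then the sigmoid activation.
w = rng.normal(size=p)
b = 0.0
z = X @ w + b      # z_i = x_i' w + b
pi = sigmoid(z)    # pi_i = P(Y_i = 1 | x_i)
```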
A Simple Neural Network
[Diagram: inputs $x_1, \ldots, x_p$ feed two hidden nodes, $z_{11} = \sum_{i=1}^{p} x_i w_{1i} + b_1$ and $z_{12} = \sum_{i=1}^{p} x_i w_{2i} + b_2$, with activations $v_{11} = f_1(z_{11})$ and $v_{12} = f_1(z_{12})$; the output node computes $z_{21} = \sum_{i=1}^{2} v_{1i} w_{3i} + b_3$ and $\pi = f_2(z_{21}) = P(Y = 1 \mid \boldsymbol{x})$, followed by $g(\pi)$.]
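A minimal NumPy sketch of this two-hidden-unit forward pass. The slide leaves the hidden activation $f_1$ generic, so tanh is used here as a placeholder, and all shapes and names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Inverse-logit: f2 on the slide."""
    return 1.0 / (1.0 + np.exp(-z))

def simple_nn_forward(x, W1, b1, w2, b2, f1=np.tanh):
    """Two-hidden-unit forward pass:
    z1 = W1 @ x + b1 gives z_11, z_12; v1 = f1(z1) gives v_11, v_12;
    z2 = w2 . v1 + b2 and pi = sigmoid(z2) = P(Y=1 | x)."""
    z1 = W1 @ x + b1      # two linear combinations of the inputs
    v1 = f1(z1)           # hidden activations
    z2 = w2 @ v1 + b2     # one linear combination of the hidden units
    return sigmoid(z2)

# Hypothetical sizes: p inputs, 2 hidden units, 1 output probability.
rng = np.random.default_rng(1)
p = 5
x = rng.normal(size=p)
pi = simple_nn_forward(x, W1=rng.normal(size=(2, p)), b1=rng.normal(size=2),
                       w2=rng.normal(size=2), b2=rng.normal())
```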
More Complicated Neural Network
[Diagram: a deeper network. Hidden layer 1 has $K_1$ nodes with $z_{1j} = \sum_{i=1}^{p} x_i w_{1ji} + b_{1j}$ and $v_{1j} = f_1(z_{1j})$; hidden layer 2 has $K_2$ nodes with $z_{2j} = \sum_{i=1}^{K_1} v_{1i} w_{2ji} + b_{2j}$; and so on through $L$ layers, ending with $z_L = \sum_i v_{(L-1)i} w_{Li} + b_L$ and $\pi = f(z_L) = P(Y = 1 \mid \boldsymbol{x})$, followed by $g(\pi)$.]
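The same pattern generalizes: each layer applies an affine map followed by an activation. Below is a minimal NumPy sketch of a generic forward pass; the widths $K_1$, $K_2$ and the activation choices are hypothetical placeholders.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nn_forward(x, weights, biases, activations):
    """Generic feed-forward pass: layer l computes v_l = f_l(W_l v_{l-1} + b_l)."""
    v = x
    for W, b, f in zip(weights, biases, activations):
        v = f(W @ v + b)
    return v

# Hypothetical network with hidden widths K1 = 4, K2 = 3 and a sigmoid output.
rng = np.random.default_rng(2)
p, K1, K2 = 5, 4, 3
x = rng.normal(size=p)
weights = [rng.normal(size=(K1, p)), rng.normal(size=(K2, K1)), rng.normal(size=(1, K2))]
biases = [rng.normal(size=K1), rng.normal(size=K2), rng.normal(size=1)]
pi = nn_forward(x, weights, biases, activations=[np.tanh, np.tanh, sigmoid])  # shape (1,)
```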
What is a Bayesian Neural Network?
For those of you who “do machine learning, not statistics”: BNNs are simply NNs that have an implied loss function (you don’t have to choose, and you get UQ for free!)
For those of you who “do statistics, not machine learning”: BNNs are simply a highly nonlinear model where you put a prior on your parameters (and a Bernoulli likelihood for our binary data)
For those of you who “do both”: You’re not fooling anyone, I know you associate with one or the other.
For those of you who “do neither”: You are already “resting your eyes.”
An Example BNN
$$Y_i \sim \text{Bernoulli}(\pi_i)$$
$$\pi_i = f_2(z_{21}) = f_2\!\left(\sum_{j=1}^{2} f_1\!\left(\sum_{k=1}^{p} x_k w_{jk} + b_j\right) w_{2j} + b_3\right)$$
$$w_{jk} \stackrel{iid}{\sim} \mathcal{N}(0, 1)\ \forall j, k, \qquad b_l \stackrel{iid}{\sim} \mathcal{N}(0, 1),\ l = 1, 2, 3$$
[Diagram: the same two-hidden-node network as on the “A Simple Neural Network” slide, now with priors on its weights and biases.]
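A minimal NumPy sketch of what these priors imply: sampling every weight and bias from $\mathcal{N}(0,1)$ and pushing an input through the network gives Monte Carlo draws of $\pi$ (here, draws from the prior predictive). The function name, the tanh choice for $f_1$, and the example input are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bnn_prior_predictive(x, n_draws=1000, f1=np.tanh, seed=0):
    """Monte Carlo draws of pi = P(Y=1 | x) under the N(0,1) priors above.

    Each draw samples every weight w_jk, w_2j and bias b_l from N(0,1),
    runs the two-hidden-unit forward pass, and records the resulting pi."""
    rng = np.random.default_rng(seed)
    p = x.shape[0]
    pis = np.empty(n_draws)
    for s in range(n_draws):
        W1 = rng.normal(size=(2, p))   # w_jk
        b1 = rng.normal(size=2)        # b_1, b_2
        w2 = rng.normal(size=2)        # w_2j
        b3 = rng.normal()              # b_3
        pis[s] = sigmoid(w2 @ f1(W1 @ x + b1) + b3)
    return pis

pis = bnn_prior_predictive(np.random.default_rng(3).normal(size=5))
print(pis.mean(), np.quantile(pis, [0.1, 0.9]))  # prior mean and an 80% interval
```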
Estimating (or Training) BNNs
Bayesians love to sample their posterior distributions via MCMC.
But this can be slowwww.
Enter variational inference:
• Instead of sampling, approximate the posterior with $q(\theta)$ by solving
$$\operatorname*{arg\,min}_{q^*} \mathrm{KL}\!\left(q^*(\theta) \,\|\, p(\mathbf{w}, \mathbf{b} \mid y)\right) = \operatorname*{arg\,min}_{q^*} \mathrm{KL}\!\left(q^*(\theta) \,\|\, p(\mathbf{w}, \mathbf{b}, y)\right)$$
• Put restrictions on the set of candidate distributions $q^*$
• The result is a loss function on which we can do gradient descent as usual (see the sketch below)
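A minimal sketch of that loss, assuming the restriction to a mean-field Gaussian $q(\theta)$ with $\mathcal{N}(0,1)$ priors: the KL term then has a closed form, and the expected log-likelihood is approximated with reparameterized Monte Carlo draws. The names (`neg_elbo`, `forward`, `mu`, `log_sigma`) are hypothetical, `forward` stands in for a BNN forward pass mapping a flattened parameter vector to per-observation probabilities, and in practice this loss is minimized by gradient descent in TensorFlow/Edward rather than evaluated by hand in NumPy.

```python
import numpy as np

def neg_elbo(mu, log_sigma, X, y, forward, n_mc=10, seed=0):
    """Monte Carlo estimate of the variational loss (negative ELBO) for a
    mean-field Gaussian q(theta) = N(mu, diag(sigma^2)) and N(0,1) priors:

        loss = KL(q || prior) - E_q[ log p(y | theta, X) ],

    with the expectation approximated via reparameterized draws
    theta = mu + sigma * eps, eps ~ N(0, I)."""
    rng = np.random.default_rng(seed)
    sigma = np.exp(log_sigma)
    # Closed-form KL between N(mu, sigma^2) and the N(0,1) prior, summed over parameters.
    kl = np.sum(0.5 * (mu**2 + sigma**2 - 1.0) - log_sigma)
    # Monte Carlo estimate of the expected Bernoulli log-likelihood.
    ell = 0.0
    for _ in range(n_mc):
        theta = mu + sigma * rng.normal(size=mu.shape)
        pi = np.clip(forward(X, theta), 1e-7, 1 - 1e-7)
        ell += np.sum(y * np.log(pi) + (1 - y) * np.log(1 - pi)) / n_mc
    return kl - ell
```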
Hyperspectral Images (HSI)
• Airborne spectrometers construct an $(x, y, z)$ tensor (see the sketch below)
• $x$ and $y$ describe the spatial dimensions
• $z$ describes the spectrum at a single pixel $(x, y)$
• HSI target detection identifies specific materials by their reflectance
• The target object might be smaller than the projection of a pixel on the ground
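A small NumPy sketch of how such a cube is typically handled for pixelwise detection; the dimensions and names here are hypothetical, not Megascene's.

```python
import numpy as np

# Hypothetical HSI cube: x and y are spatial, z indexes the spectral bands.
height, width, bands = 200, 300, 128
cube = np.zeros((height, width, bands), dtype=np.float32)

# The spectrum at one pixel (x, y) is a single length-`bands` vector;
# pixelwise target detection treats each such vector as one model input.
spectrum = cube[50, 75, :]
assert spectrum.shape == (bands,)

# Flatten to a (pixels, bands) design matrix for pixelwise classification.
X = cube.reshape(-1, bands)
```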
Megascene
• Simulated HSI scene using the DIRSIG model
• Targets inserted using paint reflectance and bidirectional reflectance distribution function data from NEFDS
• 3 environments and 3 times of day, for 9 total scenes
• 125 targets placed randomly throughout each scene
• Targets range from large to subpixel (radius of 0.1 m to 4 m), with sizes distributed nearly uniformly
• Some targets only partially visible
BNN on Megascene
• Architecture: Input-10-5-2-Output (sketched below)
• Activations: Hyperbolic tangent between hidden layers, inverse logit for output
• Priors: Standard normal on all weights and biases
• Optimizer: Stochastic gradient descent with momentum
• Training data: Left half of the mid-latitude summer environment at 1200
• Test data: Right half of all environments at all times
• Software: Edward and TensorFlow
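For orientation, a hedged Keras sketch of a non-Bayesian analogue of this configuration, mirroring only the layer widths, activations, and optimizer family; the actual model placed the $\mathcal{N}(0,1)$ priors above on all weights and biases and was fit by variational inference in Edward. The band count, learning rate, and momentum value are hypothetical.

```python
import tensorflow as tf

# Non-Bayesian analogue of the Input-10-5-2-Output architecture.
bands = 128  # hypothetical number of spectral bands
model = tf.keras.Sequential([
    tf.keras.Input(shape=(bands,)),
    tf.keras.layers.Dense(10, activation='tanh'),
    tf.keras.layers.Dense(5, activation='tanh'),
    tf.keras.layers.Dense(2, activation='tanh'),
    tf.keras.layers.Dense(1, activation='sigmoid'),  # inverse-logit output
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
              loss='binary_crossentropy')
```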
Test Set Performance
[Result panels: full test set and high confidence set.]
High confidence set: 80% credible interval for $\pi \in (0, 0.2) \cup (0.8, 1)$
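A minimal NumPy sketch of how such a high-confidence subset could be formed from posterior draws of $\pi$; the array name and shape are hypothetical.

```python
import numpy as np

def high_confidence_mask(pi_draws, level=0.80):
    """Flag pixels whose `level` credible interval for pi lies entirely in
    (0, 0.2) or entirely in (0.8, 1), i.e. confidently background or target.

    `pi_draws` holds posterior draws of pi, shape (n_draws, n_pixels)."""
    alpha = (1.0 - level) / 2.0
    lo, hi = np.quantile(pi_draws, [alpha, 1.0 - alpha], axis=0)
    return (hi < 0.2) | (lo > 0.8)
```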
Test Set Performance
Moving Forward
There is still a lot of work to be done on BNNs
• Explore deeper and wider, more complicated (convolutional) architectures
• Full tuning of hyperparameters
• Explore effects of the prior
• Assess coverage of credible intervals
Thank you for listening!
Test Set Performance
[Figure grid: rows for the BNN and a standard NN; columns for the full test set and the high confidence set.]
Test Set Performance
Abundance vs Probability