
Discovery of the Source of Contaminant Release

Devina Sanjaya Henry Qin

1 Introduction

The ability of computers to model contaminant release events and predict the source of release in real time is crucial in various applications, especially environmental safety monitoring and homeland security. In the event of unintentional industrial accidents or biological attacks in urban environments, an immediate, accurate response is required. A real-time computer program that identifies the contaminant, locates the source of the release, and predicts the subsequent path of contamination can assist the decision making process for evacuation and countermeasures.

The contaminant source inversion problem involves intricate geometry, uncertain flow conditions, and limited, noisy sensor readings. Moreover, the problem is generally ill-conditioned in the sense that small changes in the sensor readings can cause large changes in the calculated source of release [2]. This makes single-point deterministic calculations fragile: some inputs produce nearly identical outputs, especially once measurement error is taken into account. Statistical approaches increase robustness, but they often require large samples and numerous forward simulations, which quickly becomes computationally expensive.

Multiple previous studies have sought to reduce this computational cost through techniques such as grid coarsening, reduced-order modeling, and stochastic expansions. Previous studies have also applied uncertainty quantification methods to analyze the propagation of input uncertainties. In this paper, we combine machine learning models and computational fluid dynamics to discover the source of release for large-scale problems in real time. Various machine learning models are evaluated for robustness to noisy sensor readings and limited training data. We also compare our results with statistical results from Markov Chain Monte Carlo (MCMC) with a single walker [3] and ensemble walkers [1].

2 Data Format

Our training and test data are obtained using the computational fluid dynamics software XFLOW, developed by Dr. Fidkowski at the University of Michigan, Ann Arbor. For the 2D case, we simulate a contaminant release around cross sections of buildings (see Figure 1 (left)). Five sensors are placed around the buildings in a pseudorandom fashion, without iteration or tuning. Each sensor takes three readings spaced equally in time, for a total of 15 sensor readings. A spatial approximation order of p = 2 is used, and the Peclet number for the simulations, based on the mean velocity and domain size in the x-direction, is Pe = 100. We use the 15 sensor readings as input features to our machine learning models, and X and Y as the output features. Each forward simulation used to obtain sensor readings completed in less than 1 minute when parallelized over 8 processors.

Figure 1: Mesh and setup used during CFD simulation for the 2D (left) and 3D (right) cases. Sensor readings from both cases are used as our training and test sets.


For the 3D case, we simulate contaminant release in a realistic urban area (see Figure 1 (right)); this domain is the same as in Lieberman et al. [4]. There are 10 sensors placed around the buildings, chosen in a pseudorandom fashion without iteration or tuning. Each sensor provides 4 readings, for a total of 40 sensor readings. A spatial approximation order of p = 1 was used, and the Peclet number for the simulations, based on the mean velocity and the domain extent in the direction of velocity, was Pe = 50. Our input features are the 40 sensor readings, and our outputs are X, Y, Z, and Amplitude. Each forward simulation takes about 8 minutes on 100 processors.

3 Implementation & Discussion

In this section, we discuss how we apply machine learning models to our problem. All of our models are trained using the statistical programming language R or Matlab. For the purposes of the discussion below, we assume that we are only attempting to predict X, since predicting the other variables (Y in the 2D case; Y, Z, and Amplitude in the 3D case) is symmetric. Moreover, to create realistic test cases, sensor errors are considered; we consider both uniform and Gaussian error distributions. Due to time constraints, we mainly discuss the results from the 2D case.

3.1 Test Error Definition

All reported test errors are defined as a percentage of the interval of the parameter over which we are making predictions. Equation (1) shows how we compute the test error when predicting X.

%Test Error = |x − x̂| / (max(X) − min(X)) (1)

where x is the true value of X, x̂ is the predicted value of X, max(X) is the maximum of the interval of X, and min(X) is the minimum of the interval of X. In the 2D case, X ∈ [0, 1], and in the 3D case, X ∈ [0, 4.71]. We believe this definition of test error percentage, rather than the standard definition of (Expected − Actual)/Actual, allows us to fairly compare predictions across examples with different true locations of the source of release. Intuitively, since we are trying to predict a location rather than a quantity, an error of 0.01 model units should be interpreted the same way whether it comes from a ground truth of 0.1 or a ground truth of 0.2, and our definition of error reflects this. Furthermore, this percentage error definition enables us to compare the test errors between the 2D and 3D cases.
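As a concrete illustration, this error metric reduces to a one-line helper (a Python sketch for illustration only; our models themselves were trained in R and Matlab, and the function name is ours):

```python
def percent_test_error(x_true, x_pred, x_min, x_max):
    """Equation (1): absolute error as a percentage of the parameter interval."""
    return abs(x_true - x_pred) / (x_max - x_min) * 100.0

# In the 2D case the interval is [0, 1], so an error of 0.01 model units is a
# 1% test error whether the ground truth is 0.1 or 0.2.
```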

3.2 Perfect Sensor Readings

First, we consider the case where all sensor readings are perfect. For our 2D test case, we have 144 examples in total: 72 for training and 72 for testing. We found that ordinary linear regression, which directly models the output values as a linear combination of the raw input feature values, does not perform well, with a mean error of 24%. However, linear regression with logarithmic feature mapping (see equation 2) performs quite well, with a mean error of 1%. This suggests that there is a log relationship between the sensor readings and the location of contaminant release.

X = β0 + β1 log r1 + β2 log r2 + ...+ β15 log r15 (2)

To use the log-transformed model, we first replace any readings that are less than or equal to zero with the fixed constant 1×10−10, take the log of each sensor reading, and then apply multiple linear regression. Figure 2 (left) shows the residuals plot from our testing: a mean error of 1% and a standard deviation of 1% in predicting the source of release.
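The preprocessing and fit described above can be sketched as follows (a minimal NumPy illustration rather than our original R code; the helper names are ours, and the floor constant matches the 1e-10 replacement value used in our experiments):

```python
import numpy as np

def log_features(R, floor=1e-10):
    """Clamp readings below a small positive floor (the paper replaces
    non-positive readings with 1e-10), then take elementwise logs."""
    return np.log(np.maximum(R, floor))

def fit_log_linear(R, x):
    """Least-squares fit of x = beta0 + sum_i beta_i * log(r_i), equation (2)."""
    Phi = np.hstack([np.ones((R.shape[0], 1)), log_features(R)])
    beta, *_ = np.linalg.lstsq(Phi, x, rcond=None)
    return beta

def predict_log_linear(beta, R):
    """Evaluate the fitted log-linear model on new sensor readings."""
    Phi = np.hstack([np.ones((R.shape[0], 1)), log_features(R)])
    return Phi @ beta
```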

Next, we consider the 3D case. Here, we have 256 examples in total: 200 examples for training and 56 examples for testing. As with the 2D case, plain linear regression does not perform well, with a mean error of 22.5%, while linear regression with logarithmic feature mapping works well. Performing the same steps as before, we found a mean error of 0.26% and a standard deviation of 0.26%, as shown in Figure 2 (right). We acknowledge that these errors are unusually low, but do not claim that they will generalize to other 3D simulations, even assuming perfect sensors.


Figure 2: Test residuals of linear regression with logarithmic feature mapping for the 2D (left) and 3D (right) cases with perfect sensor readings.

3.3 Sensor Readings with Uniform Error Distribution

Having successfully modeled the simple case, we moved on to a more complex case: uniform, substantial sensor error. Based on knowledge of the field, 1×10−2 is a reasonable sensor error. To apply the uniform error, we add this fixed constant to each of our sensor readings and treat the “perturbed” sensor readings as the new raw features. During training, we naturally assume that the constant is unknown, as it would be in practice. Our training and test sets are the same as in the previous case.

Unfortunately, this more complex case clearly demonstrated that our previous method was not robust against uniform sensor error, as our test errors increased by an order of magnitude. This behavior is consistent with the ill-conditioned nature of the inverse problem. In retrospect, we could have anticipated this. When systematic sensor error is added, the true model starts to look like the function in equation 3, while we were still trying to model it using equation 2.

X = β0 + β1 log (r1 + ε) + β2 log (r2 + ε) + ...+ β15 log (r15 + ε) (3)

We tried to model equation 3 using R's nonlinear least squares routine nls, but we ran into singularity problems. Since direct model fitting did not pan out, we implemented a hill-climbing algorithm in an attempt to greedily discover the value of the hidden constant ε. More specifically, our hill-climbing algorithm varies the value of ε to find the maximum R2 statistic for a least squares fit against log (ri − ε). The procedure is as follows:

1. Initialize a step size s to a constant 0.001.

2. Choose a random starting value ε0 ∈ [1× 10−2, 5× 10−2].

3. Fit a least squares model using the features log (ri − ε0).

4. Fit two more least squares models using ε = ε0 − s and ε = ε0 + s.

5. Set ε0 to be equal to the ε in the model above which produced the highest R2 statistic.

6. Halve the step size s, so s ← s/2.

7. If R2 > 0.99, terminate the algorithm and report ε0. Otherwise, return to Step 3.

The above algorithm is able to pinpoint ε well in the 2D case, and thus we can substitute ε into equation 3 before modeling the data. However, this hill-climbing algorithm does not work well in the 3D case due to multiple local maxima.
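The hill-climbing procedure above can be sketched as follows (again a NumPy illustration rather than our R implementation; helper names, the iteration cap, and the R2 floor constant are ours):

```python
import numpy as np

def fit_r2(R, x, eps, floor=1e-10):
    """R^2 of a least-squares fit of x against [1, log(r_i - eps)]."""
    F = np.log(np.maximum(R - eps, floor))
    Phi = np.hstack([np.ones((R.shape[0], 1)), F])
    beta, *_ = np.linalg.lstsq(Phi, x, rcond=None)
    resid = x - Phi @ beta
    return 1.0 - np.sum(resid**2) / np.sum((x - x.mean())**2)

def hill_climb_eps(R, x, s=1e-3, tol_r2=0.99, max_iter=200, rng=None):
    """Greedily search for the hidden sensor-error constant (steps 1-7)."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.uniform(1e-2, 5e-2)                  # step 2: random start
    for _ in range(max_iter):
        candidates = [eps - s, eps, eps + s]       # steps 3-4: fit at eps +/- s
        scores = [fit_r2(R, x, e) for e in candidates]
        eps = candidates[int(np.argmax(scores))]   # step 5: keep the best fit
        s /= 2.0                                   # step 6: halve the step
        if max(scores) > tol_r2:                   # step 7: terminate on R^2
            break
    return eps
```

Note that because the step size halves every iteration, the search can move at most 2s from its starting point, which is one way to see why multiple local maxima defeat it in the 3D case.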

3.4 Mixed Sensor Readings

Now, we consider the case where some sensor readings might happen to contain no errors while others have uniform or Gaussian-distributed errors. To simulate these errors, we first replicate the original set of examples three times, creating a new data set with three times the number of examples. Next, we add a constant error term to one full replica, add Gaussian-distributed errors (mean 0, standard deviation 1×10−2) to the second full replica, and then randomize the order of the data set. From this mixed data set, we randomly select half the examples for use in training and hold out the other half as a test set. Multiple machine learning models are trained: linear regression, linear regression with logarithmic feature mapping, locally weighted linear regression with logarithmic feature mapping, decision trees, boosting, random forests, and K-nearest neighbors. Figure 3 compares the models on several testing error metrics (mean, standard deviation, median, and 90th percentile) for our 2D case. Random forest gives both the lowest mean testing error, about 5%, and the lowest 90th percentile error, about 10%. Figure 4 shows the test residuals of random forest for our 2D and 3D cases.
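The construction of the mixed data set can be sketched as follows (a NumPy illustration; the `bias` and `sigma` defaults reflect the 1e-2 error level above, and the helper name is ours):

```python
import numpy as np

def make_mixed_dataset(R, x, bias=1e-2, sigma=1e-2, rng=None):
    """Triplicate the examples: one clean copy, one with a constant bias,
    one with zero-mean Gaussian noise; then shuffle and split in half."""
    rng = np.random.default_rng() if rng is None else rng
    R_mixed = np.vstack([R, R + bias, R + rng.normal(0.0, sigma, size=R.shape)])
    x_mixed = np.concatenate([x, x, x])
    perm = rng.permutation(len(x_mixed))       # randomize the combined order
    R_mixed, x_mixed = R_mixed[perm], x_mixed[perm]
    half = len(x_mixed) // 2                   # half for training, half held out
    return (R_mixed[:half], x_mixed[:half]), (R_mixed[half:], x_mixed[half:])
```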


Figure 3: Testing error metrics of all machine learning methods applied to the 2D case with mixed sensor readings.


Figure 4: Test residuals of random forest applied to the 2D (left) and 3D (right) cases with mixed sensor readings. Note that these figures are not on the same scale because the 2D and 3D cases have different ranges for their dimensions.


3.5 Comparison with MCMC

We compare our results with statistical results from MCMC with single and ensemble walkers presented in [5]. Using MCMC and noisy sensor readings with error 1×10−2, we obtained errors of less than 1% in predicting X for both the 2D and 3D cases. Although the results from MCMC are far more accurate, it takes substantial time to obtain a single prediction because generating a set of sensor readings at each proposed location of the MCMC walker(s) is time consuming. For instance, the 2D case converges in about 4 minutes using 32 processors, and the 3D case converges in about 6 minutes using 100 processors. On the other hand, we can train a random forest and evaluate hundreds of examples in less than 1.5 seconds on a single processor.

4 Conclusion

To summarize, we make the following contributions in this paper.

• With perfect sensor data, the relationship between sensor readings and the contaminant source is a simple log-linear one.

• Out of all the models we experimented with, Random Forest proved to be the most robust against noisy data.

• Compared to MCMC, supervised learning requires far less computational power, but is less accurate and less robust to noisy data.

• To increase the robustness of supervised learning to noisy data, more research is required.

To the best of our knowledge, predicting the location of contaminant release in a realistic setting remains an open problem.

5 Acknowledgement

We gratefully acknowledge Dr. Fidkowski at the University of Michigan, Ann Arbor for the use of his computing resources and simulation software (XFLOW) in generating our training and test data.

References

[1] J. Goodman and J. Weare. Ensemble samplers with affine invariance. Communications in Applied Mathematics and Computational Science, 5(1):65–80, 2010.

[2] J. Hadamard. Lectures on the Cauchy Problem in Linear Partial Differential Equations. Yale University Press, 1923.

[3] W.K. Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1):97–109, 1970.

[4] C. Lieberman, K. Fidkowski, K. Willcox, and B. van Bloemen Waanders. Hessian-based model reduction: large-scale inversion and prediction. International Journal for Numerical Methods in Fluids, 2012.

[5] D. Sanjaya, I. Tobasco, and K. Fidkowski. Adjoint-accelerated statistical and deterministic inversion of atmospheric contaminant transport. Unpublished.
