benjamin-neal
Statistics Presentation
Ch En 475 Unit Operations
Quantifying variables (i.e., answering a question with a number)
1. Directly measure the variable (referred to as a “measured” variable)
ex. Temperature measured with a thermocouple
2. Calculate the variable from “measured” or “tabulated” variables (referred to as a “calculated” variable)
ex. Mass flow rate ṁ = ρAv (ρ, A, v measured or tabulated)
Each has some error or uncertainty.
Some definitions:
x̄ = sample mean; s = sample standard deviation
μ = exact mean; σ = exact standard deviation
As the sampling becomes larger:
x̄ → μ, s → σ (t chart → z chart)
Not valid if bias exists (i.e., calibration is off)
A. Error of Measured Variable
Several measurements are obtained for a single variable (e.g., T).
Questions
• What is the true value?
• How confident are you?
• Is the value different on different days?
• Let’s assume a “normal” Gaussian distribution
• For small sampling: s is known
• For large sampling: σ is assumed
How do you determine the error?
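$\bar{x} = \dfrac{1}{n}\sum_i x_i$
Small sample: $s^2 = \dfrac{1}{n-1}\sum_i (x_i - \bar{x})^2$
Use t tables for this approach (this is the approach we’ll pursue).
Large sample (n > 30): $\sigma^2 \approx \dfrac{1}{n-1}\sum_i (x_i - \bar{x})^2$
Use z tables for this approach (we don’t often have this much data).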
Example
n Temp
1 40.1
2 39.2
3 43.2
4 47.2
5 38.6
6 40.4
7 37.7
$\bar{x} = \dfrac{40.1 + 39.2 + 43.2 + 47.2 + 38.6 + 40.4 + 37.7}{7} = 40.9$
$s^2 = \dfrac{(40.1-40.9)^2 + (39.2-40.9)^2 + (43.2-40.9)^2 + (47.2-40.9)^2 + (38.6-40.9)^2 + (40.4-40.9)^2 + (37.7-40.9)^2}{7-1} = 10.7$
$s = 3.27$
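The arithmetic above can be checked with a short Python sketch (Python is our choice for illustration only). It applies the small-sample formulas, with n − 1 in the variance denominator, to the seven temperature readings; the standard library's `statistics.stdev` uses the same n − 1 definition, which the `assert` confirms.

```python
import math
import statistics

# Temperature readings from the example table
temps = [40.1, 39.2, 43.2, 47.2, 38.6, 40.4, 37.7]
n = len(temps)

# Sample mean: x̄ = (1/n) Σ x_i
xbar = sum(temps) / n

# Sample variance with n - 1 in the denominator (small-sample form)
s2 = sum((x - xbar) ** 2 for x in temps) / (n - 1)
s = math.sqrt(s2)

# statistics.stdev uses the same n - 1 definition, so the two should agree
assert math.isclose(s, statistics.stdev(temps))

print(round(xbar, 1), round(s2, 1), round(s, 2))  # 40.9 10.7 3.27
```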
Standard Deviation Summary (normal distribution)
40.9 ± 3.27 (1s): 68.3% of data are within this range
40.9 ± (3.27 × 2) (2s): 95.4% of data are within this range
40.9 ± (3.27 × 3) (3s): 99.7% of data are within this range
If normal distribution is questionable, use Chebyshev's inequality:
At least 50% of the data are within 1.4s of the mean.
At least 75% of the data are within 2s of the mean.
At least 89% of the data are within 3s of the mean.
The above ranges don’t state how accurate the mean is; they give only the % of data within the given range.
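To compare the two sets of ranges, the illustrative sketch below computes the normal-distribution coverage within ±k standard deviations from the error function, and Chebyshev's bound 1 − 1/k², which holds for any distribution. One detail the code makes visible: at exactly k = 1.4 Chebyshev's bound is 49%; the 50% figure corresponds to k = √2 ≈ 1.41.

```python
import math

def normal_coverage(k):
    """Fraction of a normal population within ±k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

def chebyshev_min_coverage(k):
    """Chebyshev's lower bound on the fraction within ±k standard deviations (k > 1)."""
    return 1 - 1 / k**2

for k in (1, 2, 3):
    print(f"normal, ±{k}s: {normal_coverage(k):.1%}")
for k in (math.sqrt(2), 2, 3):
    print(f"Chebyshev, ±{k:.2f}s: at least {chebyshev_min_coverage(k):.0%}")
```

Run as-is, it should print 68.3%, 95.4% and 99.7% for the normal case and 50%, 75% and 89% for Chebyshev, matching the slide.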
Student t-test (gives confidence of where μ (not the data) is located)
$\mu = \bar{x} \pm t_{\alpha,\nu}\dfrac{s}{\sqrt{n}}$
where 2α = 1 − probability and ν = n − 1 = 6
Prob.   α       t       ±t·s/√n
90%     0.05    1.943   2.40
95%     0.025   2.447   3.02
99%     0.005   3.707   4.58
[Figure: 2-tail t distribution showing the measured mean (40.9) and the true mean, with 5% in each tail]
T-test Summary
41 ± 2: 90% confident μ is somewhere in this range
41 ± 3: 95% confident μ is somewhere in this range
41 ± 5: 99% confident μ is somewhere in this range
μ = exact mean; 40.9 is the sample mean
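A minimal sketch of the confidence interval on the mean, μ = x̄ ± t·s/√n, using the t values from the table above. They are hard-coded rather than computed, so the sketch covers only ν = 6; a statistics package could supply t for other ν.

```python
import math

# t values for nu = n - 1 = 6, copied from the table above
# (two-tailed probability -> t_{alpha, nu})
T_NU6 = {0.90: 1.943, 0.95: 2.447, 0.99: 3.707}

def mean_half_width(s, n, t):
    """Half-width of the confidence interval on the mean: t * s / sqrt(n)."""
    return t * s / math.sqrt(n)

xbar, s, n = 40.9, 3.27, 7
half_widths = {p: mean_half_width(s, n, t) for p, t in T_NU6.items()}

for p, hw in half_widths.items():
    print(f"{p:.0%} confident mu is in {xbar} ± {hw:.2f}")
```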
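Comparing averages of measured variables
Day 1: $n_x = 7$, $\bar{x} = 40.9$, $s_x = 3.27$
Day 2: $n_y = 9$, $\bar{y} = 37.2$, $s_y = 2.67$
What is your confidence that $\mu_x \neq \mu_y$ (i.e., they are different)?
$t = \dfrac{\bar{x} - \bar{y}}{\sqrt{\dfrac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n_x + n_y - 2}\left(\dfrac{1}{n_x} + \dfrac{1}{n_y}\right)}} = 2.5$, with $\nu = n_x + n_y - 2 = 14$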
[Figure: 1-tail t distribution; area 1 − α = confident different, tail area α = confident same]
Larger t: more likely different
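The comparison can be sketched in Python as below. Because the standard library has no t distribution, the sketch integrates Student's t density numerically (Simpson's rule, our own choice of method) to get the one-tail confidence 1 − α; the helper names `t_pdf`, `t_cdf` and `pooled_t` are hypothetical. Treat it as an illustration, not a replacement for t tables or a statistics package.

```python
import math

def t_pdf(t, nu):
    """Probability density of Student's t distribution with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + t * t / nu) ** (-(nu + 1) / 2)

def t_cdf(t, nu, steps=2000):
    """P(T <= t), integrating the pdf from 0 to |t| with Simpson's rule (steps even)."""
    h = abs(t) / steps
    total = t_pdf(0.0, nu) + t_pdf(abs(t), nu)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, nu)
    area = total * h / 3  # area under the curve between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area

def pooled_t(xbar, sx, nx, ybar, sy, ny):
    """Two-sample t statistic with pooled variance; returns (t, degrees of freedom)."""
    nu = nx + ny - 2
    sp2 = ((nx - 1) * sx**2 + (ny - 1) * sy**2) / nu
    t = (xbar - ybar) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nu

# Day 1 (x) and Day 2 (y) summary statistics
t, nu = pooled_t(40.9, 3.27, 7, 37.2, 2.67, 9)
confidence_different = t_cdf(t, nu)  # 1-tail: 1 - alpha
print(f"t = {t:.2f}, nu = {nu}, confident different: {confidence_different:.1%}")
```

With the Day 1 and Day 2 values it gives t ≈ 2.5 with ν = 14, and a one-tail confidence between 98% and 99% that the two days differ.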
Problem 1
1. Calculate the average and s for both sets of data.
2. Find the range in which 95.4% of the data falls (for each set).
3. Determine the range for μ for each set at 95% probability.
4. At what confidence are the Day 1 and Day 2 pressures different?
Data point   Pressure, Day 1   Pressure, Day 2
1 750 730
2 760 750
3 752 762
4 747 749
5 754 737
B. Uncertainty of Calculated Variable
Calculate a variable from multiple input (measured, tabulated, …) variables (i.e., ṁ = ρAv).
Each input variable has its own error.
What is the uncertainty of your “calculated” value?
Example: You take measurements of ρ, A, v to determine ṁ = ρAv. What is the range of ṁ and its associated uncertainty?
Details are provided in Applied Engineering Statistics, Chapters 8 and 14 (R.M. Bethea and R.R. Rhinehart, 1991).
To obtain the uncertainty of a “calculated” variable:
• DO NOT just calculate the variable for each set of data and then average and take the standard deviation.
• DO calculate the uncertainty from the errors of the input variables: use “uncertainty” for “calculated” variables and “error” for input variables.
Plan: obtain the max error (δ) for each input variable, then obtain the uncertainty of the calculated variable.
Method 1: Propagation of max error - brute force
Method 2: Propagation of max error - analytical
Method 3: Propagation of variance - analytical
Method 4: Propagation of variance - brute force (Monte Carlo simulation)
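The slide names Methods 1 to 3 without spelling them out. As a hedged sketch, the code below uses what we take to be the standard forms of each (Bethea and Rhinehart is the authoritative treatment): Method 1 evaluates the function at every combination x_i ± δ_i, Method 2 sums |∂f/∂x_i|·δ_i, and Method 3 adds (∂f/∂x_i·σ_i)² in quadrature, assuming independent inputs. It borrows the density example that appears later in this section (ρ = 4M/(πD²L), with M, L, D and their s values), takes δ = 2.5s per Rule 1, and estimates derivatives by central differences; all function names are hypothetical.

```python
import itertools
import math

def density(M, L, D):
    """rho = 4M / (pi D^2 L): mass divided by the volume of a cylinder."""
    return 4.0 * M / (math.pi * D**2 * L)

def partials(f, x, h=1e-6):
    """Central-difference estimates of df/dx_i at the point x."""
    grads = []
    for i, xi in enumerate(x):
        step = h * abs(xi) if xi != 0 else h
        up, down = list(x), list(x)
        up[i] += step
        down[i] -= step
        grads.append((f(*up) - f(*down)) / (2 * step))
    return grads

def max_error_brute_force(f, x, delta):
    """Method 1: evaluate f at every corner x_i ± delta_i; keep the largest deviation."""
    y0 = f(*x)
    return max(
        abs(f(*[xi + sign * di for xi, sign, di in zip(x, signs, delta)]) - y0)
        for signs in itertools.product((-1, 1), repeat=len(x))
    )

def max_error_analytical(f, x, delta):
    """Method 2: delta_y = sum |df/dx_i| * delta_i (linearized)."""
    return sum(abs(g) * d for g, d in zip(partials(f, x), delta))

def std_analytical(f, x, sigma):
    """Method 3: sigma_y = sqrt(sum (df/dx_i * sigma_i)^2), independent inputs assumed."""
    return math.sqrt(sum((g * s) ** 2 for g, s in zip(partials(f, x), sigma)))

x = [5.0, 0.75, 0.14]             # M (kg), L (m), D (m)
sigma = [0.05, 0.01, 0.005]       # sample standard deviations
delta = [2.5 * s for s in sigma]  # Rule 1: delta ≈ 2.5 s

rho = density(*x)
m1 = max_error_brute_force(density, x, delta)
m2 = max_error_analytical(density, x, delta)
m3 = std_analytical(density, x, sigma)
print(f"rho = {rho:.1f}; Method 1: ±{m1:.1f}; Method 2: ±{m2:.1f}; Method 3 (1.96σ): ±{1.96 * m3:.1f}")
```

The linearized Method 2 (about ±103 kg/m³) comes out smaller than brute-force Method 1 (about ±121 kg/m³) because ρ is strongly nonlinear in D; Method 3's 1.96σ (about ±62 kg/m³) is close to the Monte Carlo result later in the section.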
Value and Uncertainty
• Value is used to make decisions; you need to know the uncertainty of the value
• Potential ethical and societal impact
• How do you determine the uncertainty of the value?
Sources of uncertainty (from Rhinehart, Applied Engineering Statistics, 1991):
1. Estimation - we guess!
2. Discrimination - device accuracy (single data point)
3. Calibration - may not be exact (error of curve fit)
4. Technique - e.g., measure ID rather than OD
5. Constants and data - not always exact!
6. Noise - which reading do we take?
7. Model and equations - e.g., ideal gas law vs. real gas
8. Humans - transposing, …
Estimates of Error (δ) for input variables (δ’s are propagated to find uncertainty)
1. Measured: measure multiple times; obtain s; δ ≈ 2.5s. Reason: about 99% of data is within ±2.5s.
Example: s = 2.3 °C for a thermocouple, δ = 5.8 °C
2. Tabulated: δ ≈ 2.5 units in the place of the last reported significant digit. Reason: assumes the last digit is ±2.5 (±0 assumes it is perfect; ±5 assumes the next digit to the left is fuzzy).
Example: ρ = 1.3 g/ml at 0 °C, δ = 0.25 g/ml
Example: People = 127,000, δ = 2500 people
3. Manufacturer spec or calibration accuracy: use the given spec or accuracy data.
Example: Pump spec is ±1 ml/min, δ = 1 ml/min
4. Variable from regression (e.g., calibration curve): δ ≈ 2.5 × standard error (the standard error is the standard deviation of the residuals).
Example: Velocity is a slope with standard error = 2 m/s, δ = 5 m/s
5. Judgment for a variable: use judgment for δ.
Example: Read pressure to ±1 psi, δ = 1 psi
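Rules 1, 2 and 4 all reduce to multiplying a scale by 2.5; a tiny sketch with hypothetical helper names makes the arithmetic explicit. For tabulated values the caller supplies the place value of the last significant digit (0.1 for 1.3, 1000 for 127,000); working that out automatically from a number's written form is left out on purpose, because trailing zeros are ambiguous.

```python
def delta_from_std(s):
    """Rules 1 and 4: delta ≈ 2.5 × (standard deviation or standard error)."""
    return 2.5 * s

def delta_tabulated(last_digit_place):
    """Rule 2: delta ≈ 2.5 units in the place of the last reported significant digit.

    last_digit_place is, e.g., 0.1 for 1.3 or 1000 for 127,000 (supplied by the caller).
    """
    return 2.5 * last_digit_place

examples = {
    "thermocouple, s = 2.3 °C": delta_from_std(2.3),
    "rho = 1.3 g/ml": delta_tabulated(0.1),
    "people = 127,000": delta_tabulated(1000),
    "velocity slope, std error = 2 m/s": delta_from_std(2),
}
for name, d in examples.items():
    print(f"{name}: delta ≈ {d:g}")
```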
If none of the above rules apply, give your best guess.
Example: Data from a computer show that the flow rate is 562 ml/min ± 3 ml/min (stdev of computer noise). Your calibration shows 510 ml/min ± 8 ml/min (stdev). What flow rate do you use, and what is δ?
In the following propagation methods, it’s assumed that there is no bias in the values used - let’s assume this for all lab projects.
Method 4: Monte Carlo Simulation (propagation of variance - brute force)
• Choose N (N is very large, e.g., 100,000) random ±δ_i from a normal distribution with standard deviation σ_i for each variable, and add them to the mean to obtain N values with errors: $x'_i = x_i + \delta_i$
  • rnorm(N, μ, σ) in Mathcad generates N random numbers from a normal distribution with mean μ and standard deviation σ
• Find N values of the calculated variable using the generated x'_i values.
• Determine the mean and standard deviation of the N calculated values:
  • $y = y_{avg} \pm 1.96\sqrt{s_y^2}$ (95%)
  • $y = y_{avg} \pm 2.57\sqrt{s_y^2}$ (99%)
Monte Carlo Simulation Example
Estimate the uncertainty in the critical compressibility factor of a fluid if Tc = 514 ± 2 K, Pc = 61.37 ± 0.6 bar, and Vc = 0.168 ± 0.002 m³/kmol.
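A sketch of Method 4 applied to this example, in Python rather than Mathcad (`random.Random.gauss` plays the role of `rnorm`). Two assumptions are not from the slide and are flagged in the code: Zc is computed from its definition Zc = PcVc/(RTc) with R = 0.08314 m³·bar/(kmol·K), and each ± value is treated as one standard deviation σ_i. If the ± values were instead δ ≈ 2.5s, the σ's passed in would be smaller by that factor.

```python
import random
import statistics

R = 0.08314  # ASSUMPTION: gas constant in m^3·bar/(kmol·K)

def z_critical(Tc, Pc, Vc):
    """ASSUMPTION: critical compressibility factor from its definition Zc = Pc·Vc / (R·Tc)."""
    return Pc * Vc / (R * Tc)

def monte_carlo(f, means, sigmas, N=100_000, seed=1):
    """Propagate variance by brute force: add normal noise to each input, evaluate f N times."""
    rng = random.Random(seed)
    ys = [f(*(rng.gauss(mu, sd) for mu, sd in zip(means, sigmas))) for _ in range(N)]
    return statistics.fmean(ys), statistics.stdev(ys)

# ASSUMPTION: each ± value in the example is treated as one standard deviation
means = (514.0, 61.37, 0.168)  # Tc (K), Pc (bar), Vc (m^3/kmol)
sigmas = (2.0, 0.6, 0.002)

y_avg, s_y = monte_carlo(z_critical, means, sigmas)
print(f"Zc = {y_avg:.4f} ± {1.96 * s_y:.4f} (95%)")
print(f"Zc = {y_avg:.4f} ± {2.57 * s_y:.4f} (99%)")
```

Under these assumptions the result should be close to Zc ≈ 0.241 with s_y ≈ 0.004, i.e., a 95% interval of roughly ±0.0075.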
Example: Propagation of variance
Calculate ρ and its 95% probable error, where
$\rho = \dfrac{4M}{\pi D^2 L}$
All independent variables were measured multiple times (Rule 1); averages and s are given:
M = 5.0 kg, s = 0.05 kg
L = 0.75 m, s = 0.01 m
D = 0.14 m, s = 0.005 m
Monte Carlo Example Problem
$\rho = \dfrac{4M}{\pi L D^2}$
Mav := 5       Lav := 0.75     Dav := 0.14
sM := 0.05     sL := 0.01      sD := 0.005
dM := 2.5·sM   dL := 2.5·sL    dD := 2.5·sD
Monte Carlo:
Mtrial := rnorm(N, Mav, sM)   Ltrial := rnorm(N, Lav, sL)   Dtrial := rnorm(N, Dav, sD)
ρav := mean(4·Mtrial / (π·Ltrial·Dtrial²)) = 434.738
s := stdev(4·Mtrial / (π·Ltrial·Dtrial²)) = 32.091
uncertainty := 1.96·s = 62.899
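The Mathcad worksheet above translates almost line for line into Python. This sketch assumes N = 100,000 (the worksheet does not show its N) and fixes the random seed so the run is reproducible. As in the worksheet, the trials are drawn with the sample standard deviations sM, sL, sD; the dM, dL, dD values defined there are not used in the draws.

```python
import math
import random
import statistics

N = 100_000                # ASSUMPTION: number of trials (the slides suggest N ~ 100,000)
rng = random.Random(475)   # fixed seed so reruns give the same numbers

Mav, Lav, Dav = 5.0, 0.75, 0.14
sM, sL, sD = 0.05, 0.01, 0.005

# Python analogue of Mathcad's rnorm(N, mean, sd)
Mtrial = [rng.gauss(Mav, sM) for _ in range(N)]
Ltrial = [rng.gauss(Lav, sL) for _ in range(N)]
Dtrial = [rng.gauss(Dav, sD) for _ in range(N)]

# Calculated variable for every trial, then its mean and standard deviation
rho = [4 * M / (math.pi * L * D**2) for M, L, D in zip(Mtrial, Ltrial, Dtrial)]
rho_av = statistics.fmean(rho)
s = statistics.stdev(rho)
uncertainty = 1.96 * s  # 95% probable error
print(f"rho = {rho_av:.1f} ± {uncertainty:.1f} kg/m^3 (95%)")
```

Because the draws are random, the printed values will not match 434.738 and 32.091 exactly, but they should be close to the worksheet's ρ ≈ 435 and s ≈ 32.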
Problem 2
The DIPPR Database reports the following measured property values with associated uncertainties for methyl propionate:
• Tc = 530.6 K ± 1%
• Pc = 40.04 bar ± 3%
• Vc = 0.282 m³/kmol ± 5%
Uncertainties are essentially 95% confidence intervals, or 1.96s.
Determine Zc and use a Monte Carlo calculation to determine a 95% confidence interval.
Overall Summary
• Measured variables: use average, std dev (data range), and Student t-test (mean range and mean comparison)
• Calculated variable: determine uncertainty
  -- Max error: propagating error with brute force
  -- Max error: propagating error analytically
  -- Probable error: propagating variance analytically
  -- Probable error: propagating variance with brute force (Monte Carlo)
Data and Statistical Expectations
1. Summary of raw data (table format)
2. Sample calculations, including statistical calculations
3. Summary of all calculations (table format is helpful)
4. If measured variable: average and standard deviation, confidence of mean
5. If calculated variable: uncertainty using Monte Carlo (if possible)