Elec471 Embedded Computer Systems Chapter 4, Probability and Statistics

By Prof. Tim Johnson, PE

Wentworth Institute of Technology

Boston, MA

Theory and Design for Mechanical Measurement

by Richard Figliola

Content

• Introduction

• Statistical Measurement Theory

• Infinite & Finite Statistics

• Chi-squared distribution

• Regression Analysis

Introduction

• For any set of measurement data an average and standard deviation from the average can be determined.

• The question is how well this average represents all the measurements in the set.

• Would a different set of measurements be exactly the same?

• Do the variations meet the tolerances?

• How well do the results describe the measurement?

Statistical Goals

1. A single value that best characterizes the average of the data set.

2. A value that gives the variation in the data set from the average.

3. A probability that indicates how well the single average value represents the true average value of the variable measured.

Statistical Measurement Theory

• Definition: a sample is a set of data obtained during repeated measurements of a variable under fixed operating conditions.

• An assumption is that systematic error in the measurement is negligible—the average error in a data set is zero.

• The true value is denoted $x'$. The average, also known as the mean, is denoted $\bar{x}$. The uncertainty interval is $u_x$ and the probability level is P%, so a result is stated as $x' = \bar{x} \pm u_x$ (P%).

Random Variables

• One characteristic about measurements is a random scattering of the values obtained that collect around a central value. This behavior is called central tendency.

• In this sense, the measured variable behaves as a random variable.

• If the variable is continuous in time or space then it is a continuous random variable.

• If the variable can take only discrete values, then it is a discrete random variable.

• Probability deals with the concept that certain values of a variable will repeat with some frequency of occurrence.

Probability Density Functions

• The accumulation of the data points repeating about a central point creates a density that occurs with a certain probability.

• The central value and those values scattered about it can be determined from the probability density of the measured variable.

• The frequency with which the measured variable assumes a particular value is described by its probability density.

Problem Example 4.1 for small data sets

• This problem develops a statistical analysis of the data set from 20 sample measurements.

• Each sample, x, is numbered sequentially i from 1 to 20 where 20 is the total number of samples, N.

• The condition for this sampling is that all readings are taken under identical operating conditions.

Problem example continued

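• The data is grouped into K small intervals between the minimum and maximum measured values.

• Each interval is defined by its midpoint $x_j$ and half-width $\delta x$:

\[ x_j - \delta x \le x_i < x_j + \delta x \]

• Rule: choose K so that at least one interval has 5 members ($n_j \ge 5$).

• A formula to calculate K is:

\[ K = 1.87\,(N - 1)^{0.40} + 1 \]

• As N becomes large, this formula tends to $K \approx N^{1/2}$.

• $n_j$ represents the number of data points in each interval, where j = 1 to K:

\[ \sum_{j=1}^{K} n_j = N \]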

Problem example continued

• The formula above states that the sum of the number of occurrences in each interval is equal to the total number of samples.

• Let $f_j = n_j / N$ be the frequency of occurrence in each interval. The area under the percent frequency distribution curve then always equals the total frequency of occurrence, or 100%:

\[ 100 \times \sum_{j=1}^{K} f_j = 100\% \]

Probability Density Function (PDF) formula for this example

\[ p(x) = \lim_{N \to \infty,\ \delta x \to 0} \frac{n_j}{N\,(2\,\delta x)} \]

Figure 4.2 Histogram and frequency distribution for data in Table 4.1

INSERT FIGURE

The probability density function, p(x), above defines the probability that a measured variable might assume a particular value on any individual measurement. Plotted, it displays the central tendency: the interval that contains the best estimate of the true mean value.
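The grouping steps of this example can be sketched in a few lines of Python. This is only an illustration: the function name `frequency_distribution` is made up here, the interval count assumes the estimate K = 1.87(N − 1)^0.40 + 1, and the readings are invented values, not the data of Table 4.1.

```python
def frequency_distribution(samples, k=None):
    """Group samples into K equal-width intervals.

    Returns (midpoints, counts, frequencies), where counts[j] is n_j and
    frequencies[j] is f_j = n_j / N.
    """
    n = len(samples)
    lo, hi = min(samples), max(samples)
    if n < 2 or hi == lo:
        raise ValueError("need at least two distinct sample values")
    if k is None:
        # Assumed interval-count estimate: K = 1.87 (N - 1)^0.40 + 1
        k = round(1.87 * (n - 1) ** 0.40 + 1)
    width = (hi - lo) / k  # full interval width, 2 * delta_x
    midpoints = [lo + (j + 0.5) * width for j in range(k)]
    counts = [0] * k
    for x in samples:
        # The maximum value would index one past the end, so clamp it
        j = min(int((x - lo) / width), k - 1)
        counts[j] += 1
    frequencies = [c / n for c in counts]
    return midpoints, counts, frequencies


# Hypothetical readings, invented for illustration only
readings = [0.1 * i for i in range(20)]
midpoints, counts, frequencies = frequency_distribution(readings)
for x_j, n_j, f_j in zip(midpoints, counts, frequencies):
    print(f"x_j = {x_j:.3f}  n_j = {n_j}  f_j = {f_j:.2f}")
```

Whatever data is used, the counts should sum to N and the frequencies to 1, which is a quick check on any histogram you build by hand or in Excel.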

Other Types of Distributions

Normal: used for most physical properties that are continuous or regular in time or space, where variations are due to random error.

Log normal: used for failure or durability projections; events whose outcomes tend to be skewed toward the extremity of the distribution.

Poisson: used for events that occur randomly in time; p(x) refers to the probability of observing x events over time.

Types of Distributions con’t

Weibull: used in fatigue tests; similar to log normal applications.

Binomial: used in situations describing the number of occurrences, n, of a particular outcome during N independent tests, where the probability of that outcome, P, is the same in every test.

Rule

• Regardless of the type of distribution a variable can be described by its mean value and variance.

Calculation of the true mean value x’

\[ x' = \lim_{T \to \infty} \frac{1}{T} \int_{0}^{T} x(t)\,dt \approx \int_{-\infty}^{\infty} x\,p(x)\,dx \]

If the measured variable is described by discrete data xi where i=1 to N

\[ x' = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} x_i \]

Calculation of the true variance, or the width of the data variation

The standard deviation, σ, takes one last step after the variance is calculated: take the square root of the variance, which is defined as:

\[ \sigma^2 = \lim_{T \to \infty} \frac{1}{T} \int_{0}^{T} [x(t) - x']^2\,dt \approx \int_{-\infty}^{\infty} (x - x')^2\,p(x)\,dx \]

Or for discrete data:

\[ \sigma^2 = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} (x_i - x')^2 \]

Infinite Statistics

• There are some fundamental difficulties in working with infinite sets…indicated in the integrals calculating the mean value and variance.

• Infinite statistics introduces the connection between probability and statistics.

• One useful distribution used to introduce infinite statistics is the normal or Gaussian distribution.

Gaussian Distribution

• This distribution is symmetrical about the central tendency and has the familiar bell-curve shape.

• The PDF of a Gaussian distribution is:

\[ p(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left[ -\frac{(x - x')^2}{2\sigma^2} \right] \]

• Let $z_1 = (x_1 - x')/\sigma$ be the standardized normal deviate for the z variable, which specifies an interval on p(x).

INSERT Figure 4.3, page 118

How to use this chart:

• If (x1 − x')/σ = 1.00, then p(z1) = 0.3413. The one-sided probability is 34.13%, so the double-sided value for one standard deviation is 68.26%.

• If (x1 − x')/σ = 2.00, then p(z1) = 0.4772. The one-sided probability is 47.72%, so the double-sided value for two standard deviations is 95.44%.

• Two standard deviations therefore means that about 95% of the values of x fall within x' ± 2σ.
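The chart values can also be computed rather than looked up. The sketch below assumes the chart tabulates the one-sided area of the standard normal distribution between 0 and z1, which equals 0.5·erf(z1/√2); the function name `half_interval_probability` is made up for this illustration.

```python
import math


def half_interval_probability(z1):
    """One-sided area under the standard normal curve between 0 and z1."""
    return 0.5 * math.erf(z1 / math.sqrt(2))


for z1 in (1.0, 2.0):
    one_sided = half_interval_probability(z1)
    # Doubling gives the symmetric interval x' - z1*sigma to x' + z1*sigma
    print(f"z1 = {z1:.2f}: one-sided {one_sided:.4f}, double-sided {2 * one_sided:.4f}")
```

To four decimal places this reproduces the chart entries 0.3413 and 0.4772. The double-sided figures quoted on the slide come from doubling those rounded entries, so they can differ from the computed values in the last digit.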

Finite Statistics

• Finite statistics is used to estimate the true mean and true variance from a finite sample.

• It provides only an estimate of these values and describes only the behavior of the sample.

• Its estimates are called:

• the sample mean value, $\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$

• the sample variance, $S_x^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2$

• the sample standard deviation, $S_x = \sqrt{S_x^2}$

• These equations are reasonable regardless of the type of PDF for the sample.
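As a rough sketch, the three finite-statistics estimates can be computed directly from a list of readings. The function name `sample_statistics` is made up here, and the variance is assumed to use the N − 1 divisor (one degree of freedom is used up by the sample mean).

```python
import math


def sample_statistics(samples):
    """Return (sample mean, sample variance, sample standard deviation)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(samples) / n
    # N - 1 in the divisor: the mean has already been estimated from the data
    variance = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return mean, variance, math.sqrt(variance)


# Hypothetical readings, invented for illustration only
mean, variance, std_dev = sample_statistics([1.0, 2.0, 3.0, 4.0, 5.0])
print(mean, variance, std_dev)
```

Excel's AVERAGE, VAR.S and STDEV.S functions use the same definitions, so they can be used to cross-check a hand calculation.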

Extending finite statistics: the t estimator

• The degrees of freedom, v, in a statistical estimate equate to the number of data points minus the number of previously determined statistical parameters used in estimating that value.

• The z variable, which sets the interval in units of the standard deviation, can be weighted to compensate for the difference between the finite statistical estimate and the expected infinite statistics for the variable.

• The variable $t_{v,P}$ is the t estimator. The product $t_{v,P} S_x$ represents a precision interval, given at probability P%, within which one should expect any measured value to fall:

\[ x_i = \bar{x} \pm t_{v,P}\,S_x \quad (P\%) \]

• In Table 4-4, you obtain t using v and Pxx (where xx is the probability desired). See the example on the next slide.

From the example where N = 20, to calculate xi, subtract 3 to get v = 17. Then pick the percent confidence that xi is included in the range, say 90%. In that case, tv,P = 1.74. This is the factor that Sx is multiplied by in the equation on the last slide.

Standard Deviation of the Means

• Finite sample sets will have somewhat different statistics.

• The sample mean values will be normally distributed about the true mean.

• The variance of the distribution of the mean values that could be expected can be estimated through the standard deviation of the means:

\[ S_{\bar{x}} = \frac{S_x}{\sqrt{N}} \]
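A minimal sketch of how the t estimator and the standard deviation of the means are used together. It assumes the usual forms x_i = x̄ ± t·S_x for a single measurement and x' = x̄ ± t·S_x/√N for the true mean; the t value is a table lookup passed in by the caller, and the function name and numbers are made up for illustration.

```python
import math


def precision_intervals(mean, s_x, n, t):
    """Return two (low, high) intervals for a chosen t value.

    The first is where a single measurement is expected to fall;
    the second is where the true mean is expected to fall.
    """
    s_mean = s_x / math.sqrt(n)  # standard deviation of the means
    single = (mean - t * s_x, mean + t * s_x)
    true_mean = (mean - t * s_mean, mean + t * s_mean)
    return single, true_mean


# Hypothetical values: mean 10.0, S_x 2.0, N = 16, and t = 2.0 standing in
# for a value read from the t table at the chosen v and P%
single, true_mean = precision_intervals(10.0, 2.0, 16, 2.0)
print("single measurement:", single)
print("true mean:", true_mean)
```

The interval for the true mean is narrower by a factor of √N, which is why taking more readings tightens the estimate of the mean but not the scatter of individual readings.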

Pooled Statistics

• Replications are independent estimates of the same measured value. Their data represent separate data samples that can be combined to provide a better statistical estimate of a measured variable.

• Samples that are grouped in a manner so as to determine a common set of statistics are said to be pooled.

• Use M replications of N samples and the equations on the next slide for pooled data.

Pooled Statistics Equations

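• Pooled mean of x:

\[ \langle \bar{x} \rangle = \frac{1}{MN} \sum_{j=1}^{M} \sum_{i=1}^{N} x_{ij} \]

• Pooled standard deviation of x:

\[ \langle S_x \rangle = \sqrt{ \frac{1}{M(N-1)} \sum_{j=1}^{M} \sum_{i=1}^{N} (x_{ij} - \bar{x}_j)^2 } = \sqrt{ \frac{1}{M} \sum_{j=1}^{M} S_{x_j}^2 } \]

• Pooled standard deviation of the means of x:

\[ \langle S_{\bar{x}} \rangle = \frac{\langle S_x \rangle}{\sqrt{MN}} \]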

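Pooled Statistics Equations if Data Sets Are Not Equal in Size

• The replications can be weighted by their particular degrees of freedom, $v_j = N_j - 1$.

• Pooled mean of x is defined by its weighted mean:

\[ \langle \bar{x} \rangle = \frac{\sum_{j=1}^{M} N_j\,\bar{x}_j}{\sum_{j=1}^{M} N_j} \]

• where j refers to a particular data set.

• Pooled standard deviation:

\[ \langle S_x \rangle = \sqrt{ \frac{\sum_{j=1}^{M} v_j\,S_{x_j}^2}{\sum_{j=1}^{M} v_j} } \]

• See the text for the other definitions involving the degrees of freedom.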
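The pooled calculations can be sketched as a single function that handles replications of unequal size, with the equal-size case falling out when every replication has the same N. Several things here are assumptions: each replication is weighted by v_j = N_j − 1, the pooled standard deviation of the means is taken as the pooled standard deviation divided by the square root of the total number of samples, and the function name and data are invented.

```python
import math


def pooled_statistics(replications):
    """Pool M replications, each a list of at least two samples.

    Returns (pooled mean, pooled standard deviation,
    pooled standard deviation of the means).
    """
    sizes = [len(r) for r in replications]
    if min(sizes) < 2:
        raise ValueError("each replication needs at least two samples")
    means = [sum(r) / len(r) for r in replications]
    variances = [
        sum((x - m) ** 2 for x in r) / (len(r) - 1)
        for r, m in zip(replications, means)
    ]
    total = sum(sizes)
    dof = [n - 1 for n in sizes]  # v_j = N_j - 1
    pooled_mean = sum(n * m for n, m in zip(sizes, means)) / total
    pooled_std = math.sqrt(sum(v * s2 for v, s2 in zip(dof, variances)) / sum(dof))
    pooled_std_of_means = pooled_std / math.sqrt(total)
    return pooled_mean, pooled_std, pooled_std_of_means


# Hypothetical replications of unequal size, invented for illustration only
print(pooled_statistics([[1.0, 2.0, 3.0], [4.0, 6.0]]))
```

Weighting by degrees of freedom keeps a short replication from counting as much as a long one when the scatter is estimated.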

Chi-Squared Distribution

• The chi-squared distribution allows you to estimate an interval for the variance (σ²) within a stated probability for N data points.

• For the normal distribution, the chi-squared statistic is $\chi^2 = v S_x^2 / \sigma^2$, where v is the degrees of freedom, defined as N − 1.

\[ P\left( \chi^2_{1-\alpha/2} \le \chi^2 \le \chi^2_{\alpha/2} \right) = 1 - \alpha \]
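Rearranging χ² = v·S_x²/σ² inside the probability statement gives bounds on the variance: v·S_x²/χ²(α/2) ≤ σ² ≤ v·S_x²/χ²(1−α/2). The sketch below applies that rearrangement. The function name and the S_x value are made up for illustration, and the two χ² values are assumed to be read from a standard chi-squared table by the caller.

```python
def variance_interval(s_x, n, chi2_upper, chi2_lower):
    """Bounds on the true variance sigma^2 at probability 1 - alpha.

    chi2_upper is the table value at alpha/2 and chi2_lower the table
    value at 1 - alpha/2, both for v = n - 1 degrees of freedom.
    """
    v = n - 1
    return v * s_x ** 2 / chi2_upper, v * s_x ** 2 / chi2_lower


# Hypothetical sample: S_x = 0.5 from N = 11 readings (v = 10).
# Table values for v = 10 at 95%: 20.483 (alpha/2 = 0.025) and 3.247 (0.975).
low, high = variance_interval(0.5, 11, 20.483, 3.247)
print(f"{low:.4f} <= sigma^2 <= {high:.4f}")
```

The larger table value produces the lower bound, so the interval is not symmetric about S_x².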

Level of Significance

• In summary, the chi-squared probability is equal to 1 − α:

\[ P(\chi^2) = 1 - \alpha \]

• α is called the level of significance.

• The lower the χ² value, the better a data set fits the assumed distribution function.

• Thus, the higher α (the level of significance), the better the fit.

• This is called the goodness-of-fit test.

Regression Analysis

• Regression analysis is used to establish a relationship between the measured variable and an independent variable.

• The analysis develops a formula that allows you to calculate one value given the other.

• This analysis is used to fit a curve to data.

• The deviation between the actual data and the curve is denoted by the notation: Sxy
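For the simplest case, a straight line, the regression can be sketched with the standard least-squares formulas. The function name `linear_fit` is made up, the deviation between data and curve is assumed to be the standard error of the fit with N − 2 degrees of freedom, and R² is computed as 1 minus the ratio of residual to total sum of squares. The data is invented.

```python
import math


def linear_fit(x, y):
    """Least-squares line y = intercept + slope * x.

    Returns (slope, intercept, standard error of the fit, R squared).
    """
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("need at least three matched (x, y) pairs")
    sum_x, sum_y = sum(x), sum(y)
    sum_xx = sum(xi * xi for xi in x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    residual_ss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Two coefficients were fitted, so N - 2 degrees of freedom remain
    std_error = math.sqrt(residual_ss / (n - 2))
    mean_y = sum_y / n
    total_ss = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - residual_ss / total_ss
    return slope, intercept, std_error, r_squared


# Hypothetical data, invented for illustration only
print(linear_fit([0.0, 1.0, 2.0], [0.0, 1.0, 4.0]))
```

A small standard error and an R² near 1 both indicate that the line describes the data well; an R² near 0 means the line explains almost none of the variation.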

Applying Statistics

• Excel provides some statistical analysis tools that are useful for this lab, such as Average, Variance, Standard Deviation, and Regression Analysis.

• Use Insert Function to add these formulas at the bottom of a column of numbers (remember to label what each number represents…).

• Regression analysis is available using Trendlines (select the option to display the R² value on the graph).

• At the end of this lab you should be able to point to the better of the two measurement systems and state the reason based on your mathematical analysis.

Sample spreadsheet

Measurement System Error Estimate

The error in the system is directly related to the error in the calculation of the permeability. Using standard values for area, length, current, and number of turns, you should be able to calculate the permeability and arrive at the accepted value of 4π × 10⁻⁷ H/m. Column G is my calculated value, and the difference (the error) is in Column H. Finding the average of the error and its standard deviation is an adequate assessment of the error in the measurement system.

Reproducibility

Reproducibility refers to the closeness of agreement in results obtained from duplicate tests carried out under changed conditions of measurement. Combining the results from all the various lab groups allows us to measure it by graphing the difference between the implied µ and the actual value of µ and adding a linear Trendline. Adding the R² value to the chart shows the closeness of the fit and, in this case, using the sample data for Example 4-1, the lack of reproducibility.

Here we’ve added a linear Trendline to the graph of the differences to determine the R2 value. The value shows the repeatability error. If R2 is large there is a good likelihood of being able to repeat the test in a predictable fashion. For the instance shown, R2 is very, very small.

Creating a Histogram

• It is impossible to detect any pattern in the numbers by looking at the data on the previous slide.

• The only observation is that it looks like noise.

• A histogram can bring order to disarray.

• Statistical software packages can create one, or you can write your own software.

• Complete the homework on this topic and learn how to make a histogram using Excel.
