Certified Quality Engineer Programme (CQE)
Module 6: Quantitative Methods, Part 1
By Associate Professor Dr Sha’ri M. Yusof
Faculty of Mechanical Engineering
Universiti Teknologi Malaysia, Skudai, Johor
2
Basic Concepts Of Probability
Probability is a measure that describes the chance that an event will occur.
It is a dimensionless number ranging from zero to one, with 0 meaning an impossible event and 1 referring to an event that is certain to occur.
Probability of 0.5 means the event is just as likely to occur as not.
3
Basic Concepts Of Statistics
The word statistics has two generally accepted meanings:
1) A collection of quantitative data pertaining to any subject or group, especially when the data are systematically gathered and collated.
2) The science that deals with the collection, tabulation, analysis, interpretation and presentation of quantitative data.
4
Basic Concepts Of Statistics
The use of statistics in quality engineering deals with the second meaning and involves collecting, tabulating, analyzing, interpreting and presenting data.
5
Collecting And Summarizing Data
Descriptive statistics are used to describe and analyze a subject or group.
The analytical techniques summarize data by computing a measure of central tendency and a measure of the dispersion.
6
Measure of Central Tendency
A measure of central tendency of a distribution is a number that describes the central position of the data or how the data tend to build up in the center.
Three measures are commonly used:
1) average
2) median
3) mode
7
Average
It is the sum of all the observations divided by the number of observations.
Three different techniques are available for calculating the average:
1) ungrouped data
2) grouped data
3) weighted average
8
Average
Ungrouped data. This method is used when the data are unorganized. The average is represented by the symbol $\bar{x}$, which is read as “x bar” and is given by the formula:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \dots + x_n}{n}$$
where $\bar{x}$ = average
$n$ = number of observed values
$x_1, x_2, \dots, x_n$ = observed values identified by the subscripts 1, 2, ..., n or the general subscript i
$\Sigma$ = symbol meaning “sum of”
9
Example
A food inspector examined a random sample of 7 cans of tuna to determine the percent of foreign impurities. The following data were recorded: 1.8, 2.1, 1.7, 1.6, 0.9, 2.7 and 1.8. Compute the sample mean.
$$\bar{x} = \frac{\sum x_i}{n} = \frac{1.8+2.1+1.7+1.6+0.9+2.7+1.8}{7} = 1.8\% \text{ impurities}$$
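The tuna example can be checked with a few lines of Python. This is only a minimal sketch: the variable names are illustrative, and the standard-library `statistics.mean` is used as a cross-check on the hand formula.

```python
from statistics import mean

# Percent foreign impurities in the 7 sampled cans (data from the example)
impurities = [1.8, 2.1, 1.7, 1.6, 0.9, 2.7, 1.8]

# x-bar = (sum of the observations) / (number of observations)
manual_mean = sum(impurities) / len(impurities)

# Cross-check with the standard library
library_mean = mean(impurities)

print(f"manual: {manual_mean:.2f}%   statistics.mean: {library_mean:.2f}%")
```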
10
Exercise
In studying the drying time of a new acrylic paint, the data in hours, were coded by subtracting 5.0 from the observation.
Find the sample mean and sample standard deviation (s) for the drying times of 10 panels of wood using the paint if the coded measurements are: 1.4, 0.8, 2.4, 0.5, 1.3, 2.8, 3.6, 3.2, 2.0, 1.9
11
Grouped data. When data have been grouped into a frequency distribution, the following technique is applicable. The formula for the average of grouped data is
$$\bar{x} = \frac{\sum_{i=1}^{h} f_i x_i}{n} = \frac{f_1 x_1 + f_2 x_2 + \dots + f_h x_h}{f_1 + f_2 + \dots + f_h}$$
where $n$ = sum of the frequencies
$f_i$ = frequency in a cell or frequency of an observed value
$x_i$ = cell midpoint or an observed value
$h$ = no. of cells or no. of observed values
12
Example
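Frequency distribution for weights of 50 components

| Class interval, weight (g) | Class boundary | Class mid-point ($x_i$) | No. of pieces ($f_i$) | $f_i x_i$ | $f_i x_i^2$ |
|---|---|---|---|---|---|
| 7 – 9 | 6.5 – 9.5 | 8 | 2 | | |
| 10 – 12 | 9.5 – 12.5 | 11 | 8 | | |
| 13 – 15 | 12.5 – 15.5 | 14 | 14 | | |
| 16 – 18 | 15.5 – 18.5 | 17 | 19 | | |
| 19 – 21 | 18.5 – 21.5 | 20 | 7 | | |
| Totals ($\Sigma$) | | | 50 | | |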
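The grouped-data average for this table can be sketched in Python as follows. The midpoints and frequencies are taken from the table; the variable names are illustrative only.

```python
# Class midpoints (x_i) and frequencies (f_i) from the weights table
midpoints = [8, 11, 14, 17, 20]
frequencies = [2, 8, 14, 19, 7]

# n is the sum of the frequencies
n = sum(frequencies)

# Sum of f_i * x_i over all cells
sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))

# Grouped average: x-bar = sum(f_i * x_i) / n
grouped_mean = sum_fx / n

print(f"n = {n}, sum of f*x = {sum_fx}, grouped mean = {grouped_mean:.2f} g")
```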
13
Weighted average
When a number of averages are combined with different frequencies, a weighted average can be computed.
The formula for the weighted average is given by:
$$\bar{x}_w = \frac{\sum w_i \bar{x}_i}{\sum w_i}$$
where $\bar{x}_w$ = weighted average
$w_i$ = weight of the i-th average
$\bar{x}_i$ = i-th average
14
Example – weighted average
On a trip a family bought 21.3 litres of gasoline at 1.21 per litre, 18.7 litres at 1.29 per litre, and 23.5 litres at 1.25 per litre. Find the mean price per litre.
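One way to work this example is to treat the litres bought as the weights and the prices as the values being averaged. The sketch below follows that reading; the variable names are illustrative.

```python
# (litres bought, price per litre) for each purchase in the example
purchases = [(21.3, 1.21), (18.7, 1.29), (23.5, 1.25)]

# Weighted average: sum(w_i * x_i) / sum(w_i), with litres as the weights
total_cost = sum(litres * price for litres, price in purchases)
total_litres = sum(litres for litres, _ in purchases)
mean_price = total_cost / total_litres

print(f"mean price per litre = {mean_price:.3f}")
```

The result differs slightly from the simple average of the three prices because more litres were bought at some prices than at others.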
15
Median
Median is the middle value for a set of data arranged in an increasing or decreasing order
Case 1 - when the number of data in the series is odd – middle value
Case 2 - when the number of data is even - median is the average of the two middle numbers
Example (Case 1) – 5 test results: 82, 93, 86, 92, 79. What is the median? Arrange the data in order: 79, 82, 86, 92, 93. Answer = 86
Example (Case 2) – The nicotine contents for a random sample of 6 cigarettes of a certain brand are found to be 2.3, 2.7, 2.5, 2.9, 3.1 and 1.9 milligrams.
If we arrange them in increasing order of magnitude, we get 1.9, 2.3, 2.5, 2.7, 2.9, 3.1, and the median is the mean of 2.5 and 2.7.
Therefore, median = (2.5 + 2.7)/2 = 2.6 milligrams
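Both median cases can be reproduced with the standard-library `statistics.median`, which sorts the data and averages the two middle values when the count is even. A minimal sketch with illustrative variable names:

```python
from statistics import median

# Case 1: odd number of observations (5 test results)
test_results = [82, 93, 86, 92, 79]

# Case 2: even number of observations (nicotine content, milligrams)
nicotine_mg = [2.3, 2.7, 2.5, 2.9, 3.1, 1.9]

median_odd = median(test_results)    # middle value of the sorted data
median_even = median(nicotine_mg)    # mean of the two middle values

print("sorted test results:", sorted(test_results), "median:", median_odd)
print("sorted nicotine data:", sorted(nicotine_mg), f"median: {median_even:.1f}")
```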
16
Median
Grouped data. When data are grouped into a frequency distribution, the median is obtained by finding the cell that contains the middle value and then interpolating within the cell. Formula for computing the median:
$$M_d = L_m + \left(\frac{n/2 - cf_m}{f_m}\right) I$$
where $M_d$ = median
$L_m$ = lower boundary of the cell with the median
$n$ = total no. of observations
$f_m$ = frequency of the median cell
$cf_m$ = cumulative frequency of all cells below $L_m$
$I$ = cell interval
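As an illustration, the interpolation formula can be applied to the frequency distribution for the weights of 50 components given earlier (class boundaries 6.5 to 21.5, cell interval 3). The function name and structure below are assumptions made for this sketch.

```python
def grouped_median(lower_boundaries, frequencies, cell_interval):
    """Median of grouped data: Md = Lm + ((n/2 - cfm) / fm) * I."""
    n = sum(frequencies)
    if n <= 0:
        raise ValueError("frequencies must sum to a positive number")
    half = n / 2
    cumulative = 0  # cfm: cumulative frequency of all cells below the current one
    for lower, f in zip(lower_boundaries, frequencies):
        if cumulative + f >= half:
            # This cell contains the middle value; interpolate within it
            return lower + (half - cumulative) / f * cell_interval
        cumulative += f
    raise ValueError("median cell not found")


# Lower class boundaries and frequencies from the weights table
lower_boundaries = [6.5, 9.5, 12.5, 15.5, 18.5]
frequencies = [2, 8, 14, 19, 7]

md = grouped_median(lower_boundaries, frequencies, cell_interval=3.0)
print(f"grouped median = {md:.2f} g")
```

Here n/2 = 25, the cumulative frequency below the 15.5 boundary is 24, and the median cell has frequency 19, so the median lies a little above 15.5.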
17
Mode
The mode of a set of numbers (data) is the value that occurs with the highest frequency.
It is possible for the mode to be nonexistent in a series of numbers or to have more than one value.
A series of numbers is referred to as unimodal if it has one mode, bimodal if it has two modes and multimodal if there are more than two modes.
When data are grouped into a frequency distribution, the midpoint of the cell with the highest frequency is the mode, since this point represents the highest point (highest frequency) of the histogram.
18
Measure of Dispersion
Measures of dispersion describe how the data are spread out from the average or scattered on each side of the central value.
Common measures:
1) range (simplest)
2) standard deviation
3) variance
19
Range
The range of a series of numbers is the difference between the largest and smallest values or observations.
$$R = X_h - X_l$$
where $R$ = range
$X_h$ = highest observation in a series
$X_l$ = lowest observation in a series
Example – The temperatures for a process were recorded as 40.2, 38.7, 42.5, 39.6, 40.9. What is the value of the range?
20
Standard Deviation
The standard deviation is a numerical value, in the units of the observed values, that measures the spread (variation) of the data.
A large standard deviation indicates greater variability of the data than a smaller standard deviation. It is given by the formula:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$$
where $s$ = sample standard deviation
$x_i$ = i-th observed value
$\bar{x}$ = average
$n$ = number of data (observed values)
It is a reference value that measures the dispersion in the data.
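The dispersion measures can be sketched in Python using the five process temperatures from the range example. The hand formula for s is compared with the standard-library `statistics.stdev`, which also divides by n − 1; the variable names are illustrative.

```python
from math import sqrt
from statistics import stdev

# Process temperatures from the range example
temperatures = [40.2, 38.7, 42.5, 39.6, 40.9]

# Range: R = highest observation - lowest observation
data_range = max(temperatures) - min(temperatures)

# Sample standard deviation: s = sqrt( sum((x_i - x_bar)^2) / (n - 1) )
n = len(temperatures)
x_bar = sum(temperatures) / n
s_manual = sqrt(sum((x - x_bar) ** 2 for x in temperatures) / (n - 1))

# Cross-check with the standard library
s_library = stdev(temperatures)

print(f"range = {data_range:.1f}")
print(f"s (manual) = {s_manual:.3f}, s (statistics.stdev) = {s_library:.3f}")
```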
21
Exercise
A car manufacturer tested a random sample of 10 steel-belted tyres of a certain brand and recorded the following tread wear: 48000, 53000, 45000, 61000, 59000, 56000, 63000, 49000, 53000 and 54000 kilometers. Find the standard deviation of this set of data.
22
Collecting And Summarizing Data
Consider the data below, which represent the lives of 40 similar car batteries recorded to the nearest tenth of a year. What can you learn from these numbers?
2.2 4.1 3.5 4.5 3.2 3.7 3.0 2.6
3.4 1.6 3.1 3.3 3.8 3.1 4.7 3.7
2.5 4.3 3.4 3.6 2.9 3.3 3.9 3.1
3.3 3.1 3.7 4.4 3.2 4.1 1.9 3.4
4.7 3.8 3.2 2.6 3.9 3.0 4.2 3.5
23
Frequency distribution
Group a large number of data into different classes (groups) and determine the number of observations that fall into each group.
Decide the number of classes: too few loses information, while too many makes the summary meaningless.
Usually choose 5 – 20 classes.
Let us choose 7 classes. The class width must be large enough for the classes to cover all the data.
Approximate width: find the range and divide by the number of classes = (4.7 − 1.6)/7 = 0.443. The width should have the same number of decimal places as the data, therefore choose the value 0.5.
24
Frequency distribution
Decide where to start the bottom interval: start at 1.5, so the lower boundary is 1.45. Then add the width, 1.45 + 0.5 = 1.95, and continue for the other classes.
The midpoint of the first class is (1.5 + 1.9)/2 = 1.7.
Count the number of observations in each class and record it in the table.
Total the frequencies to check that all data have been counted.
25
Frequency distribution
| Class interval | Class boundaries | Class midpoint | Frequency |
|---|---|---|---|
| 1.5 – 1.9 | 1.45 – 1.95 | 1.7 | 2 |
| 2.0 – 2.4 | 1.95 – 2.45 | 2.2 | 1 |
| 2.5 – 2.9 | 2.45 – 2.95 | 2.7 | 4 |
| 3.0 – 3.4 | 2.95 – 3.45 | 3.2 | 15 |
| 3.5 – 3.9 | 3.45 – 3.95 | 3.7 | 10 |
| 4.0 – 4.4 | 3.95 – 4.45 | 4.2 | 5 |
| 4.5 – 4.9 | 4.45 – 4.95 | 4.7 | 3 |
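The tally behind this table can be reproduced with a short Python sketch. It assumes the choices made above (7 classes, width 0.5, lowest class boundary 1.45) and uses the 40 battery lives as listed; the variable names are illustrative.

```python
# Lives of the 40 car batteries (years), as listed in the data
battery_lives = [
    2.2, 4.1, 3.5, 4.5, 3.2, 3.7, 3.0, 2.6,
    3.4, 1.6, 3.1, 3.3, 3.8, 3.1, 4.7, 3.7,
    2.5, 4.3, 3.4, 3.6, 2.9, 3.3, 3.9, 3.1,
    3.3, 3.1, 3.7, 4.4, 3.2, 4.1, 1.9, 3.4,
    4.7, 3.8, 3.2, 2.6, 3.9, 3.0, 4.2, 3.5,
]

lower_boundary = 1.45   # lower class boundary of the bottom interval
class_width = 0.5
num_classes = 7

# Tally: each observation falls in the class whose boundaries enclose it.
# Boundaries carry one more decimal place than the data, so no value
# can sit exactly on a boundary.
frequencies = [0] * num_classes
for life in battery_lives:
    index = int((life - lower_boundary) / class_width)
    frequencies[index] += 1

for k, freq in enumerate(frequencies):
    low = lower_boundary + k * class_width
    high = low + class_width
    midpoint = (low + high) / 2
    print(f"{low:.2f} to {high:.2f}   midpoint {midpoint:.1f}   frequency {freq}")

# Check that every observation has been counted
print("total:", sum(frequencies))
```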
26
Graphical Representation
Frequency Histogram
[Figure: frequency histogram of battery lives. Horizontal axis: battery lives, by class boundaries 1.45 – 1.95 through 4.45 – 4.95. Vertical axis: frequency, 0 to 16.]
27
General steps for Constructing FD
1. Decide the number of class intervals (groups) required.
2. Determine the range.
3. Divide the range by the no. of classes to estimate the approximate width of the interval.
4. List the lower class limit of the bottom interval and its lower class boundary. Add the class width to the lower class boundary to get the upper class boundary.
5. List all the class limits and class boundaries by adding the class width to the limits and boundaries of the previous interval.
6. Determine the class marks (midpoints) by averaging the class limits or class boundaries.
7. Tally the frequencies for each class.
8. Sum the frequency column and check against the total no. of observations.
28
PROBABILITY DISTRIBUTION
1) Discrete Distribution
Specific values such as the integers 0, 1, 2, 3 are used.
Typical discrete probability distributions are the binomial and the Poisson.
29
Binomial Probability Distribution
Applicable to discrete probability problems that have an infinite number of items or that have a steady stream of items coming from a work center.
The binomial is applied to problems that have attributes such as conforming or nonconforming, success or failure, pass or fail and heads or tails.
It corresponds to successive terms in the binomial expansion, which is
$$(p+q)^n = p^n + np^{n-1}q + \frac{n(n-1)}{2}p^{n-2}q^2 + \dots + q^n$$
where $p$ = probability of an event such as a nonconforming unit (proportion nonconforming)
$q = 1 - p$ = probability of a nonevent such as a conforming unit (proportion conforming)
$n$ = number of trials or the sample size.
30
The binomial formula for a single term is
$$P(d) = \frac{n!}{d!\,(n-d)!}\; p_0^{\,d}\, q_0^{\,n-d}$$
where $P(d)$ = probability of d nonconforming units
$n$ = number in the sample
$d$ = number nonconforming in the sample
$p_0$ = proportion (fraction) nonconforming in the population
$q_0$ = proportion (fraction) conforming ($1 - p_0$) in the population
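The single-term binomial formula can be sketched in Python with `math.comb` for the n!/(d!(n − d)!) factor. The sample size n = 5 and proportion nonconforming p0 = 0.10 below are hypothetical values chosen only for illustration; they do not come from the module.

```python
from math import comb


def binomial_probability(d, n, p0):
    """P(d) = n! / (d! (n - d)!) * p0^d * q0^(n - d), with q0 = 1 - p0."""
    q0 = 1 - p0
    return comb(n, d) * p0 ** d * q0 ** (n - d)


# Hypothetical illustration: sample of 5 from a process that is 10% nonconforming
n, p0 = 5, 0.10
probabilities = [binomial_probability(d, n, p0) for d in range(n + 1)]

for d, prob in enumerate(probabilities):
    print(f"P({d}) = {prob:.5f}")

# The terms of the expansion of (p + q)^n must add up to 1
print(f"sum of all terms = {sum(probabilities):.5f}")
```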
31
Poisson Probability Distribution
Applicable to many situations that involve observations per unit of time or observations per unit of amount.
Applicable when n is quite large and $p_0$ is small. The formula for the Poisson distribution is:
$$P(c) = \frac{(np_0)^c}{c!}\, e^{-np_0}$$
where $c$ = count or number of events of a given classification occurring in a sample, such as the count of nonconformities, cars, customers or machine breakdowns
$np_0$ = average count or average number of events of a given classification occurring in a sample
$e$ = 2.718281
The Poisson distribution can be used as an approximation for the binomial in some situations; the symbol c then has the same meaning as d.
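The following sketch evaluates the Poisson formula and compares it with the binomial for a large sample and a small proportion nonconforming. The values n = 100 and p0 = 0.02 are hypothetical and chosen only to illustrate the approximation.

```python
from math import comb, exp, factorial


def poisson_probability(c, np0):
    """P(c) = (np0)^c * e^(-np0) / c!"""
    return np0 ** c * exp(-np0) / factorial(c)


def binomial_probability(d, n, p0):
    """Binomial single term, used here only for comparison."""
    return comb(n, d) * p0 ** d * (1 - p0) ** (n - d)


# Hypothetical illustration: large n, small p0, so np0 = 2
n, p0 = 100, 0.02
np0 = n * p0

differences = []
for c in range(6):
    p_poisson = poisson_probability(c, np0)
    p_binomial = binomial_probability(c, n, p0)
    differences.append(abs(p_poisson - p_binomial))
    print(f"c = {c}: Poisson {p_poisson:.4f}   binomial {p_binomial:.4f}")
```

For these values the two distributions agree to about two decimal places, which is the sense in which the Poisson serves as an approximation.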
32
Probability Distribution
2) Continuous Distributions
Used when measurable data such as meters, kilograms and ohms are involved.
Only the normal distribution is of sufficient importance in quality control.
33
Normal Probability Distribution
All normal distributions of continuous variables can be converted to the standardized normal distribution by using the standardized normal value, z.
The formula for the standardized normal curve is:
$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2} = 0.3989\, e^{-z^2/2}$$
where $\pi$ = 3.14159
$e$ = 2.71828
$$z = \frac{x_i - \mu}{\sigma}$$
34
i. Relationship to the Mean and Standard Deviation
There is a definite relationship among the mean, the standard deviation and the normal curve.
μ, the mean, is the value at which the center of the mountain (the peak of the curve) is located.
σ, the standard deviation, is the lateral length of the mountain measured from the center at approximately ⅔ of its height.
The larger the standard deviation, the flatter the curve (data are widely dispersed) and the smaller the standard deviation, the more peaked the curve (data are narrowly dispersed).
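If the standard deviation is zero, all values are identical to the mean and there is no curve. Refer to the figure below:
[Figure: normal curve centered at μ on the x axis, with σ marked as the lateral distance from the center at approximately ⅔ of the peak height.]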
35
ii. What is 4-sigma control
A relationship exists between the standard deviation and the area under the normal curve, as shown in the figure below.
Its relation with ±σ tells us that finished products within the range μ ± σ are 68.3% of all those produced.
The relation of the mountain with ±4σ means that products within the range μ ± 4σ are 99.99% of all those produced.
[Figure: relation of σ with the mountain (normal curve). Areas under the curve: ±σ (68.3%), ±2σ (95%), ±3σ (99.7%), ±4σ (99.99%).]
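The percentages quoted for ±σ through ±4σ can be checked numerically. For a normal distribution the area within μ ± kσ equals erf(k/√2); the sketch below uses `math.erf` from the standard library, and the function name is illustrative.

```python
from math import erf, sqrt


def area_within(k):
    """Fraction of a normal population lying within mu ± k*sigma."""
    return erf(k / sqrt(2))


for k in (1, 2, 3, 4):
    print(f"within ±{k} sigma: {100 * area_within(k):.2f}%")
```

The computed areas agree with the rounded figures on the slide.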
36
STATISTICAL DECISION MAKING
1) Hypothesis Testing
• The hypothesis may be concerned with a parameter or with the type of population.
• We are concerned with one (or more) parameters and compare the observed sample statistics with the hypothesized parameter.
37
Elements of Testing a Hypothesis on One Parameter, for example, µ
1. Basic assumptions are made which are assumed true and not open to question in the test. Commonly the type of population is assumed, for example, that it is normal.
2. A null hypothesis is stated. Although in a sense a null hypothesis is an assumption, it is not treated as one. Instead the hypothesis is under test and may be rejected, whereas we never use our test to reject the basic assumptions in 1.
3. Rejecting a hypothesis when it is actually true is committing an error of the first kind.
4. Because of variability it is in general impossible or infeasible to make a test for which the probability α of an error of the first kind is zero. Nevertheless we do want to keep the risk α at some specified low value, perhaps 0.01.
38
5. A sample statistic (an estimator) for the parameter in question is chosen.
6. An alternative hypothesis is next chosen, containing other values of the parameter considered possible, or of economic or scientific interest.
7. The chosen risk α in 4, the type of alternative hypothesis in 6 and knowledge of the distribution of the statistic in 5 enable us to set a critical region or rejection region for the statistic. The critical region will have two parts for alternatives such as µ ≠ 100 but one part for those such as µ < 100 or µ > 100.
8. An error of the second kind is committed when we accept the null hypothesis when in fact it is not true. The probability of an error of the second kind is called β.
9. In general, it is well to draw an operating characteristic (OC) curve giving the probability of acceptance of the null hypothesis for each value of the parameter.
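The elements above can be illustrated with a two-sided test of the null hypothesis µ = 100 against the alternative µ ≠ 100. The sketch assumes a normal population with known standard deviation (a z test), which is one common choice and not something the module specifies. The sample mean, σ and n are hypothetical numbers.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_z_test(sample_mean, mu0, sigma, n, alpha):
    """Test H0: mu = mu0 against H1: mu != mu0, assuming sigma is known."""
    standard_error = sigma / sqrt(n)
    z = (sample_mean - mu0) / standard_error
    # Two-part critical region: reject when |z| exceeds the critical value,
    # with alpha/2 of the risk placed in each tail
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    reject = abs(z) > z_critical
    return z, z_critical, reject


# Hypothetical data: sample mean 103 from n = 16, known sigma = 4, alpha = 0.01
z, z_critical, reject = two_sided_z_test(
    sample_mean=103.0, mu0=100.0, sigma=4.0, n=16, alpha=0.01
)

print(f"z = {z:.2f}, critical value = ±{z_critical:.3f}, reject H0: {reject}")
```

For a one-sided alternative such as µ > 100, the whole risk α would be placed in one tail and the critical value would become `NormalDist().inv_cdf(1 - alpha)`.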
39
STATISTICAL DECISION MAKING
2) Analysis of Variances
Theoretical Formulas
• If z = ax + by where a and b are constant coefficients, the mean of z is given by
$$\mu_z = a\mu_x + b\mu_y$$
where $\mu_x$ and $\mu_y$ are the means of x and y.
• If x and y are independent, the variance of z is given by
$$\sigma_z^2 = a^2\sigma_x^2 + b^2\sigma_y^2$$
where $\sigma_x^2$ and $\sigma_y^2$ are the variances of x and y.
• The variance becomes a sum even when the particular case involves a difference of random variables. This characteristic is called the additivity of variances.
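The additivity of variances can be verified exactly for a small case. The sketch below takes x and y as two independent fair dice (a hypothetical example, not from the module), lists every equally likely pair, and computes the mean and variance of the difference z = x − y using exact fractions.

```python
from fractions import Fraction
from itertools import product


def mean_and_variance(values):
    """Population mean and variance of equally likely values, as exact fractions."""
    values = list(values)
    n = len(values)
    mu = Fraction(sum(values), n)
    variance = sum((v - mu) ** 2 for v in values) / n
    return mu, variance


# Hypothetical independent variables: the faces of two fair dice
x_values = range(1, 7)
y_values = range(1, 7)

mu_x, var_x = mean_and_variance(x_values)
mu_y, var_y = mean_and_variance(y_values)

# All 36 pairs are equally likely, which is what independence means here
z_values = [x - y for x, y in product(x_values, y_values)]
mu_z, var_z = mean_and_variance(z_values)

print(f"mean of z = {mu_z}  (mu_x - mu_y = {mu_x - mu_y})")
print(f"variance of z = {var_z}  (var_x + var_y = {var_x + var_y})")
```

Although z is a difference, its variance is the sum of the two variances, not their difference.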
40
The Expectation and The Variance Of Sample Means.
When n measurements are taken from a population with population mean $\mu$ and population variance $\sigma^2$, and the values of the measurements are $x_1, x_2, \dots, x_n$ and their mean is y, then
$$y = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n}x_1 + \frac{1}{n}x_2 + \dots + \frac{1}{n}x_n$$
The expectation and the variance of y are given by
$$\mu_y = \mu \quad \text{and} \quad \sigma_y^2 = \left(\frac{1}{n}\right)^2 n\,\sigma^2 = \frac{\sigma^2}{n}$$
This is a well known formula for the distribution of the sample mean.
41
When Random Variables Are Not Independent.
The additivity of variances works well when the random variables are mutually independent.
Two random variables are said to be independent when the value of one variable varies without any relation to the other variable.
If x and y are independent, the mean and the variance of z = x − y will be
$$\mu_z = \mu_x - \mu_y \quad \text{and} \quad \sigma_z^2 = \sigma_x^2 + \sigma_y^2$$
When x and y are not independent, the mean and the variance of z = ax + by are given by
$$\mu_z = a\mu_x + b\mu_y \quad \text{and} \quad \sigma_z^2 = a^2\sigma_x^2 + b^2\sigma_y^2 + 2ab\,\rho\,\sigma_x\sigma_y$$
$\rho$ is the correlation coefficient, which shows the degree of relationship between two variables. The value of $\rho$ is between −1 and +1.
The stronger the relationship between the two variables, the closer the absolute value of $\rho$ is to 1.
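The formula for variables that are not independent can be checked on a small set of paired values. The data below are hypothetical; the sketch computes the variance of z = ax + by directly and again from the variances of x and y plus the correlation term, using population (divide by n) formulas throughout.

```python
from math import sqrt


def pmean(values):
    return sum(values) / len(values)


def pvariance(values):
    m = pmean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


# Hypothetical paired observations in which y tends to rise with x
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]
a, b = 1.0, -1.0   # z = x - y

mean_x, mean_y = pmean(x), pmean(y)
sigma_x, sigma_y = sqrt(pvariance(x)), sqrt(pvariance(y))

# Correlation coefficient rho = covariance / (sigma_x * sigma_y)
covariance = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / len(x)
rho = covariance / (sigma_x * sigma_y)

# Variance of z computed directly from the z values
z = [a * xi + b * yi for xi, yi in zip(x, y)]
var_z_direct = pvariance(z)

# Variance of z from the formula with the correlation term
var_z_formula = (a ** 2 * sigma_x ** 2 + b ** 2 * sigma_y ** 2
                 + 2 * a * b * rho * sigma_x * sigma_y)

print(f"rho = {rho:.2f}")
print(f"variance of z: direct {var_z_direct:.2f}, formula {var_z_formula:.2f}")
```

Ignoring the correlation term here would give 10 rather than 3.6, which shows why simple additivity cannot be used when the variables are related.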
42
MODEL RELATIONSHIPS BETWEEN VARIABLES
1) Simple Linear Regression
A straight line of the form $y = \alpha + \beta x$ is generally called a regression line, where y is the response variable (or dependent variable) and x is the explanatory (or independent) variable.
Also, $\alpha$ is a constant and $\beta$ is called a regression coefficient.
The quantitative way of grasping the relation between x and y in a regression form is called regression analysis.
43
Various Scatter Diagrams Having The Same Regression Line.
[Figures 1 – 4: four scatter diagrams having the same regression line.]
44
Model Relationship Between Variables
2) Simple Linear Correlation
There are many types of scattering patterns, and some representative types are as follows:
[Figures: scatter diagrams showing positive correlation and negative correlation.]
45
When y increases with x, this is a positive correlation. The opposite case, in which y decreases as x increases, is called a negative correlation.
The method of judging the existence of correlation by making a scatter diagram and calculating the correlation coefficient is called correlation analysis.
For either correlation analysis or regression analysis, the starting point is a scatter diagram.
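As a sketch of how a regression line and a correlation coefficient might be computed from the points of a scatter diagram, the code below uses the ordinary least-squares formulas. The least-squares method, the function name and the (x, y) data are assumptions for illustration; the module itself does not give a fitting method or data.

```python
from math import sqrt


def regression_and_correlation(x, y):
    """Least-squares intercept and slope of y on x, plus correlation coefficient r."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                      # regression coefficient
    intercept = mean_y - slope * mean_x    # constant term
    r = sxy / sqrt(sxx * syy)              # correlation coefficient
    return intercept, slope, r


# Hypothetical scatter-diagram data in which y increases with x
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope, r = regression_and_correlation(x, y)
print(f"regression line: y = {intercept:.2f} + {slope:.2f} x,  r = {r:.4f}")

# Reversing y makes it decrease as x increases, giving a negative correlation
_, slope_neg, r_neg = regression_and_correlation(x, list(reversed(y)))
print(f"reversed data: slope = {slope_neg:.2f}, r = {r_neg:.4f}")
```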