Certified Quality Engineer Programme (CQE)

Module 6 Quantitative Methods Part 1

By Associate Professor Dr Sha’ri M. Yusof

Faculty of Mechanical Engineering

Universiti Teknologi Malaysia, Skudai, Johor

2

Basic Concepts Of Probability

Probability is a measure that describes the chance that an event will occur.

It is a dimensionless number ranging from zero to one, with 0 meaning an impossible event and 1 referring to an event that is certain to occur.

Probability of 0.5 means the event is just as likely to occur as not.

3

Basic Concepts Of Statistics

The word statistics has two generally accepted meanings:

1) A collection of quantitative data pertaining to any subject or group, especially when the data are systematically gathered and collated.

2) The science that deals with the collection, tabulation, analysis, interpretation and presentation of quantitative data.

4

Basic Concepts Of Statistics

The use of statistics in quality engineering deals with the second meaning and involves collecting, tabulating, analyzing, interpreting and presenting data.

5

Collecting And Summarizing Data

Descriptive statistics are used to describe and analyze a subject or group. These analytical techniques summarize data by computing a measure of central tendency and a measure of the dispersion.

6

Measure of Central Tendency

A measure of central tendency of a distribution is a number that describes the central position of the data or how the data tend to build up in the center.

Three measures are commonly used:

1) average

2) median

3) mode

7

Average

It is the sum of all the observations divided by the number of observations.

Three different techniques are available for calculating the average:

1) ungrouped data

2) grouped data

3) weighted average

8

Average

Ungrouped data. This method is used when the data are unorganized. The average is represented by the symbol $\bar{x}$, which is read as “x bar”, and is given by the formula

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \dots + x_n}{n}$$

where $\bar{x}$ = average

n = number of observed values

$x_1, x_2, \dots, x_n$ = observed values identified by the subscripts 1, 2, ..., n or the general subscript i

$\sum$ = symbol meaning “sum of”

9

Example

A food inspector examined a random sample of 7 cans of tuna to determine the percent of foreign impurities. The following data were recorded:

1.8, 2.1, 1.7, 1.6, 0.9, 2.7 and 1.8

Compute the sample mean.

$$\bar{x} = \frac{\sum x_i}{n} = \frac{1.8 + 2.1 + 1.7 + 1.6 + 0.9 + 2.7 + 1.8}{7} = 1.8\% \text{ impurities}$$
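The arithmetic in this example can be checked with a few lines of Python. This is only a minimal sketch; the variable names are chosen here for illustration.

```python
# Percent impurities recorded for the 7 cans of tuna in the example.
impurities = [1.8, 2.1, 1.7, 1.6, 0.9, 2.7, 1.8]

# Sample mean: sum of the observations divided by the number of observations.
x_bar = sum(impurities) / len(impurities)

print(round(x_bar, 1))  # 1.8
```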

10

Exercise

In studying the drying time of a new acrylic paint, the data, in hours, were coded by subtracting 5.0 from each observation.

Find the sample mean and sample standard deviation (s) for the drying times of 10 panels of wood using the paint if the coded measurements are:

1.4, 0.8, 2.4, 0.5, 1.3, 2.8, 3.6, 3.2, 2.0, 1.9

11

Grouped data. When data have been grouped into a frequency distribution, the following technique is applicable. The formula for the average of grouped data is

$$\bar{x} = \frac{\sum_{i=1}^{h} f_i x_i}{n} = \frac{f_1 x_1 + f_2 x_2 + \dots + f_h x_h}{f_1 + f_2 + \dots + f_h}$$

where n = sum of the frequencies

$f_i$ = frequency in a cell or frequency of an observed value

$x_i$ = cell midpoint or an observed value

h = no. of cells or no. of observed values

12

Example

Frequency Distribution for Weights of 50 components

Class interval,   Class boundary   Class mid-point   No. of pieces   fi·xi   fi·xi²
weight (g)                         (xi)              (fi)
7 – 9             6.5 – 9.5        8                 2
10 – 12           9.5 – 12.5       11                8
13 – 15           12.5 – 15.5      14                14
16 – 18           15.5 – 18.5      17                19
19 – 21           18.5 – 21.5      20                7
Totals (Σ)                                           50
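As a rough check of the grouped-data formula, the sketch below fills in the fi·xi column for this table and divides by the total frequency. The list names are illustrative only.

```python
# Cell midpoints (xi) and frequencies (fi) from the weights table.
midpoints = [8, 11, 14, 17, 20]
frequencies = [2, 8, 14, 19, 7]

n = sum(frequencies)                                         # sum of the frequencies
sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))  # total of the fi*xi column
x_bar = sum_fx / n                                           # average of the grouped data

print(n, sum_fx, x_bar)  # 50 763 15.26
```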

13

Weighted average

When a number of averages are combined with different frequencies, a weighted average can be computed.

The formula for the weighted average is given by:

$$\bar{x}_w = \frac{\sum_{i} w_i \bar{x}_i}{\sum_{i} w_i}$$

where $\bar{x}_w$ = weighted average

$\bar{x}_i$ = the i-th average

$w_i$ = weight of the i-th average

14

Example – weighted average

On a trip a family bought 21.3 litres of gasoline at 1.21 per litre, 18.7 litres at 1.29 per litre, and 23.5 litres at 1.25 per litre. Find the mean price per litre.
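One way to work this example is to treat the litres bought as the weights and the prices as the values being averaged. The sketch below does exactly that; the variable names are illustrative.

```python
# Litres bought at each stop are the weights wi; prices per litre are the values.
litres = [21.3, 18.7, 23.5]
prices = [1.21, 1.29, 1.25]

total_cost = sum(w * x for w, x in zip(litres, prices))  # sum of wi * xi
mean_price = total_cost / sum(litres)                     # divided by the sum of wi

print(round(mean_price, 3))  # 1.248
```

The simple unweighted average of the three prices would be 1.25, so the weighting makes a small but visible difference here.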

15

Median

Median is the middle value for a set of data arranged in an increasing or decreasing order

Case 1 - when the number of data in the series is odd – middle value

Case 2 - when the number of data is even - median is the average of the two middle numbers

Example (Case 1) – 5 test results: 82, 93, 86, 92, 79. What is the median? Arrange the data in order: 79, 82, 86, 92, 93. Answer = 86

Example (Case 2) – The nicotine contents for a random sample of 6 cigarettes of a certain brand are found to be 2.3, 2.7, 2.5, 2.9, 3.1 and 1.9.

If we arrange them in increasing order of magnitude, we get 1.9, 2.3, 2.5, 2.7, 2.9, 3.1, and the median is the mean of 2.5 and 2.7.

Therefore, the median = (2.5 + 2.7)/2 = 2.6 milligrams

16

Median

Grouped Data. When data are grouped into a frequency distribution, the median is obtained by finding the cell that has the middle number and then interpolating within the cell. The formula for computing the median is:

$$M_d = L_m + \left(\frac{n/2 - cf_m}{f_m}\right) I$$

where $M_d$ = median

$L_m$ = lower boundary of the cell with the median

n = total no. of observations

$f_m$ = frequency of the median cell

$cf_m$ = cumulative frequency of all cells below $L_m$

I = cell interval
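The interpolation can be sketched as a small function. As an illustration it is applied to the frequency distribution for the weights of 50 components shown earlier in this module; the function name and argument layout are assumptions made for this sketch.

```python
def grouped_median(boundaries, frequencies):
    """Median of grouped data: Md = Lm + ((n/2 - cfm) / fm) * I."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("no observations")
    cumulative = 0  # cfm: cumulative frequency of all cells below the current one
    for (lower, upper), f in zip(boundaries, frequencies):
        if cumulative + f >= n / 2:
            # This cell contains the middle observation; interpolate within it.
            return lower + (n / 2 - cumulative) / f * (upper - lower)
        cumulative += f


# Class boundaries and frequencies from the weights table (50 components).
boundaries = [(6.5, 9.5), (9.5, 12.5), (12.5, 15.5), (15.5, 18.5), (18.5, 21.5)]
frequencies = [2, 8, 14, 19, 7]

md = grouped_median(boundaries, frequencies)
print(round(md, 2))  # 15.66
```

Here n/2 = 25, the median cell is 15.5 – 18.5 with fm = 19, and cfm = 24, so Md = 15.5 + (1/19)(3).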

17

Mode

The mode of a set of numbers (data) is the value that occurs with the highest frequency.

It is possible for the mode to be nonexistent in a series of numbers or to have more than one value.

A series of numbers is referred to as unimodal if it has one mode, bimodal if it has two modes and multimodal if there are more than two modes.

When data are grouped into a frequency distribution, the midpoint of the cell with the highest frequency is the mode, since this point represents the highest point (highest frequency) of the histogram.
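A short sketch of both cases follows. The ungrouped series are hypothetical numbers made up for illustration; the grouped case reuses the midpoints and frequencies of the weights table from earlier in this module. Note that Python's `statistics.multimode` returns every value when no value repeats, which corresponds to the "no mode" situation described above.

```python
from statistics import multimode

# Hypothetical ungrouped series (not from the slides).
unimodal = [3, 5, 5, 7, 9]
bimodal = [1, 1, 2, 4, 4, 6]

print(multimode(unimodal))  # [5]
print(multimode(bimodal))   # [1, 4]

# Grouped data: the mode is the midpoint of the cell with the highest frequency.
midpoints = [8, 11, 14, 17, 20]
frequencies = [2, 8, 14, 19, 7]
grouped_mode = midpoints[frequencies.index(max(frequencies))]

print(grouped_mode)  # 17
```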

18

Measure of Dispersion

Measures of dispersion describe how the data are spread out from the average or scattered on each side of the central value.

Common measures:

Range (simplest)

Standard deviation

Variance

19

Range

The range of a series of numbers is the difference between the largest and smallest values or observations.

$$R = X_h - X_l$$

where R = range

$X_h$ = highest observation in a series

$X_l$ = lowest observation in a series

Example – The temperatures recorded for a process were 40.2, 38.7, 42.5, 39.6, 40.9. What is the value of the range?
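The range for this example can be worked out directly from the definition, as in the minimal sketch below.

```python
# Process temperatures from the example.
temperatures = [40.2, 38.7, 42.5, 39.6, 40.9]

# R = Xh - Xl: highest observation minus lowest observation.
r = max(temperatures) - min(temperatures)

print(round(r, 1))  # 3.8
```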

20

Standard Deviation

Standard deviation is a numerical value, in the units of the observed values, that measures the spreading (variation) of the data.

A large standard deviation indicates greater variability of the data than a smaller standard deviation. It is given by the formula:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$$

where s = sample standard deviation

$x_i$ = i-th observed value

$\bar{x}$ = average

n = number of data (observed values)

It is a reference value that measures the dispersion in the data.

21

Exercise

A car manufacturer tested a random sample of 10 steel-belted tyres of a certain brand and recorded the following tread wear: 48000, 53000, 45000, 61000, 59000, 56000, 63000, 49000, 53000 and 54000 kilometers. Find the standard deviation of this set of data.
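One possible way to check an answer to this exercise is sketched below. It applies the sample standard deviation formula (divisor n − 1) step by step and compares the result with Python's built-in `statistics.stdev`, which uses the same divisor.

```python
from math import sqrt
from statistics import stdev

# Tread wear in kilometers for the 10 tyres in the exercise.
wear = [48000, 53000, 45000, 61000, 59000, 56000, 63000, 49000, 53000, 54000]

n = len(wear)
x_bar = sum(wear) / n                               # average
sum_sq_dev = sum((x - x_bar) ** 2 for x in wear)    # sum of squared deviations
s = sqrt(sum_sq_dev / (n - 1))                      # sample standard deviation

print(x_bar)        # 54100.0
print(round(s, 1))  # 5801.3
```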

22

Collecting And Summarizing Data

Consider the data below, which represent the lives of 40 similar car batteries recorded to the nearest tenth of a year. What can you learn from these numbers?

2.2 4.1 3.5 4.5 3.2 3.7 3.0 2.6

3.4 1.6 3.1 3.3 3.8 3.1 4.7 3.7

2.5 4.3 3.4 3.6 2.9 3.3 3.9 3.1

3.3 3.1 3.7 4.4 3.2 4.1 1.9 3.4

4.7 3.8 3.2 2.6 3.9 3.0 4.2 3.5

23

Frequency distribution

Group a large number of data into different classes (groups) and determine the number of observations that fall into each group.

Decide the number of classes: too few loses information, too many also gives little meaning.

Usually choose 5 – 20 classes. Let us choose 7 classes; the class width must be enough to take in all the data.

Approximate width: find the range and divide by the number of classes = (4.7 − 1.6)/7 = 0.443. The width should have the same number of decimal places as the data, therefore choose the value 0.5.

24

Frequency distribution

Decide where to start the bottom interval: start at 1.5, so the lower boundary is 1.45. Then add the width, 1.45 + 0.5 = 1.95, and continue for the others.

The midpoint of the bottom interval is (1.5 + 1.9)/2 = 1.7.

Count the number of observations in each class and record it in the table.

Total the frequencies to check that all data have been counted.

25

Frequency distribution

Class interval   Class boundaries   Class midpoint   Frequency
1.5 – 1.9        1.45 – 1.95        1.7              2
2.0 – 2.4        1.95 – 2.45        2.2              1
2.5 – 2.9        2.45 – 2.95        2.7              4
3.0 – 3.4        2.95 – 3.45        3.2              15
3.5 – 3.9        3.45 – 3.95        3.7              10
4.0 – 4.4        3.95 – 4.45        4.2              5
4.5 – 4.9        4.45 – 4.95        4.7              3

26

Graphical Representation

Frequency Histogram

[Figure: frequency histogram of battery lives. Horizontal axis: battery lives, with class boundaries from 1.45 – 1.95 to 4.45 – 4.95. Vertical axis: frequency, 0 to 16.]

27

General steps for Constructing FD

1. Decide the number of class intervals (groups) required

2. Determine the range

3. Divide the range by the no. of classes to estimate the approximate width of the interval

4. List the lower class limit of the bottom interval and the lower class boundary. Add the class width to the lower class boundary to get the upper class boundary

5. List all the class limits and class boundaries by adding the class width to the limits and boundaries of the previous interval

6. Determine the class marks (midpoints) by averaging the class limits or class boundaries

7. Tally the frequencies for each class

8. Sum the frequency column and check against the total no. of observations
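The steps above can be sketched in Python using the 40 battery lives from this module, with 7 classes of width 0.5 starting at 1.5. Working in integer tenths of a year is a choice made for this sketch so that values sitting on a class limit are not misplaced by floating-point rounding.

```python
# Lives of the 40 car batteries, in years, recorded to the nearest tenth.
lives = [
    2.2, 4.1, 3.5, 4.5, 3.2, 3.7, 3.0, 2.6,
    3.4, 1.6, 3.1, 3.3, 3.8, 3.1, 4.7, 3.7,
    2.5, 4.3, 3.4, 3.6, 2.9, 3.3, 3.9, 3.1,
    3.3, 3.1, 3.7, 4.4, 3.2, 4.1, 1.9, 3.4,
    4.7, 3.8, 3.2, 2.6, 3.9, 3.0, 4.2, 3.5,
]

num_classes = 7   # step 1: number of classes
start = 15        # lower class limit of the bottom interval, in tenths (1.5)
step = 5          # class width, in tenths (0.5)

# Step 7: tally the frequency for each class.
frequencies = [0] * num_classes
for x in lives:
    tenths = round(x * 10)
    frequencies[(tenths - start) // step] += 1

# Steps 4 to 6: limits, boundaries and midpoint of every class.
for k, f in enumerate(frequencies):
    lo = (start + k * step) / 10
    hi = (start + (k + 1) * step - 1) / 10
    print(f"{lo:.1f} - {hi:.1f} | {lo - 0.05:.2f} - {hi + 0.05:.2f} | "
          f"midpoint {(lo + hi) / 2:.1f} | frequency {f}")

# Step 8: the frequencies must add up to the number of observations.
print(sum(frequencies) == len(lives))
```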

28

PROBABILITY DISTRIBUTION

1) Discrete Distribution

Specific values such as the integers 0, 1, 2, 3 are used.

Typical discrete probability distributions are the binomial and the Poisson.

29

Binomial Probability Distribution

Applicable to discrete probability problems that have an infinite number of items or that have a steady stream of items coming from a work center.

The binomial is applied to problems that have attributes such as conforming or nonconforming, success or failure, pass or fail and heads or tails.

It corresponds to successive terms in the binomial expansion, which is

$$(p+q)^n = p^n + n p^{n-1} q + \frac{n(n-1)}{2} p^{n-2} q^2 + \dots + q^n$$

where p = probability of an event such as a nonconforming unit (proportion nonconforming)

q = 1 − p = probability of a nonevent such as a conforming unit (proportion conforming)

n = number of trials or the sample size.

30

The binomial formula for a single term is

$$P(d) = \frac{n!}{d!\,(n-d)!}\; p_0^{\,d}\; q_0^{\,n-d}$$

where P(d) = probability of d nonconforming units

n = number in the sample

d = number nonconforming in the sample

$p_0$ = proportion (fraction) nonconforming in the population

$q_0$ = proportion (fraction) conforming ($1 - p_0$) in the population
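The single-term formula translates almost directly into code. In the sketch below the sample size n = 5 and the proportion nonconforming p0 = 0.10 are hypothetical values chosen only to illustrate the calculation.

```python
from math import comb


def binomial_pmf(d, n, p0):
    """P(d) = n! / (d! (n - d)!) * p0**d * q0**(n - d), with q0 = 1 - p0."""
    q0 = 1 - p0
    return comb(n, d) * p0 ** d * q0 ** (n - d)


# Hypothetical case: sample of 5 units from a stream that is 10% nonconforming.
n, p0 = 5, 0.10
probs = [binomial_pmf(d, n, p0) for d in range(n + 1)]

print(round(probs[0], 4))    # 0.5905  (probability of no nonconforming units)
print(round(sum(probs), 6))  # 1.0     (the terms of the expansion sum to one)
```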

31

Poisson Probability Distribution

Applicable to many situations that involve observations per unit of time or observations per unit of amount.

Applicable when n is quite large and $p_0$ is small. The formula for the Poisson distribution is:

$$P(c) = \frac{(np_0)^c}{c!}\, e^{-np_0}$$

where c = count or number of events of a given classification occurring in a sample, such as a count of nonconformities, cars, customers or machine breakdowns

$np_0$ = average count or average number of events of a given classification occurring in a sample

e = 2.718281

The Poisson distribution can be used as an approximation for the binomial in some situations; the symbol c then has the same meaning as d.
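The sketch below evaluates the Poisson formula and compares it with the exact binomial probability for a case where n is large and p0 is small. The values n = 100 and p0 = 0.02 are hypothetical and serve only to show how close the approximation can be.

```python
from math import comb, exp, factorial


def poisson_pmf(c, np0):
    """P(c) = (np0)**c * e**(-np0) / c!"""
    return np0 ** c * exp(-np0) / factorial(c)


def binomial_pmf(d, n, p0):
    """Exact binomial probability of d nonconforming units in a sample of n."""
    return comb(n, d) * p0 ** d * (1 - p0) ** (n - d)


# Hypothetical case: n = 100 and p0 = 0.02, so the average count np0 is 2.
n, p0 = 100, 0.02
for c in range(5):
    print(c, round(poisson_pmf(c, n * p0), 4), round(binomial_pmf(c, n, p0), 4))
```

In this case the two columns agree to about two decimal places for every count shown.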

32

Probability Distribution

2) Continuous Distributions

Used when measurable data such as meters, kilograms and ohms are involved.

Only the normal distribution is of sufficient importance in quality control.

33

Normal Probability Distribution

All normal distributions of continuous variables can be converted to the standardized normal distribution by using the standardized normal value, z.

The formula for the standardized normal curve is

$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2} = 0.3989\, e^{-z^2/2}$$

where π = 3.14159

e = 2.71828

$z = \dfrac{x_i - \mu}{\sigma}$

34

i. Relationship to the Mean and Standard Deviation

There is a definite relationship among the mean, the standard deviation and the normal curve.

μ, the mean, is the value at which the center of the mountain is located.

σ, the standard deviation, is the lateral length of the mountain from the center at approximately ⅔ of its height.

The larger the standard deviation, the flatter the curve (data are widely dispersed) and the smaller the standard deviation, the more peaked the curve (data are narrowly dispersed).

If the standard deviation is zero, all values are identical to the mean and there is no curve. Refer to the figure below;

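[Figure: normal curve along the x axis, centered at μ, with σ marked as the lateral distance from the center at approx. ⅔ of the curve's height.]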

35

ii. What is 4-sigma control

A relationship exists between the standard deviation and the area under the normal curve, as shown in the figure below.

The relation with ±σ tells us that finished products within the range μ ± σ are 68.3% of all produced.

The relation of the mountain with ±4σ means that products within the range μ ± 4σ are 99.99% of all produced.

[Figure: relation of σ with the mountain. Areas under the normal curve along the x axis: ±σ (68.3%), ±2σ (95%), ±3σ (99.7%), ±4σ (99.99%).]
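The percentages quoted for ±σ to ±4σ can be reproduced numerically. The sketch below evaluates the standardized normal curve and uses the error function from Python's `math` module to get the area between −k and +k standard deviations; the function names are illustrative.

```python
from math import erf, exp, pi, sqrt


def standard_normal_pdf(z):
    """f(z) = exp(-z**2 / 2) / sqrt(2 * pi)"""
    return exp(-z ** 2 / 2) / sqrt(2 * pi)


def area_within(k):
    """Area under the standardized normal curve between -k and +k."""
    return erf(k / sqrt(2))


print(round(standard_normal_pdf(0), 4))  # 0.3989

for k in (1, 2, 3, 4):
    print(f"+/-{k} sigma: {100 * area_within(k):.2f}%")
# +/-1 sigma: 68.27%
# +/-2 sigma: 95.45%
# +/-3 sigma: 99.73%
# +/-4 sigma: 99.99%
```

The figures in the slide are these values rounded, for example 95.45% quoted as 95%.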

36

STATISTICAL DECISION MAKING

1) Hypothesis Testing

• The hypothesis may be concerned with a parameter or with the type of population.

• We are concerned with one (or more) parameters and compare the observed sample statistics with the hypothesized parameter.

37

Elements of Testing a Hypothesis on One Parameter, for example, µ

1. Basic assumptions are made which are assumed true and not open to question in the test. Commonly the type of population is assumed, for example, that it is normal.

2. A null hypothesis is in a sense an assumption, but it is not a basic assumption. Instead, the hypothesis is under test and may be rejected, whereas we never use our test to reject the basic assumptions.

3. Rejecting a hypothesis when it is actually true is committing an error of the first kind.

4. Because of variability it is in general impossible or infeasible to make a test for which the probability α of an error of the first kind is zero. Nevertheless we do want to keep the risk α at some specified low value, perhaps 0.01.

38

5. A sample statistic (an estimator) for the parameter in question is chosen.

6. An alternative hypothesis is next chosen, containing other values of the parameter considered possible, or of economic or scientific interest.

7. The chosen risk α in 4, the type of alternative hypothesis in 6 and knowledge of the distribution of the statistic in 5 enable us to set a critical region or rejection region for the statistic. The critical region will have two parts for alternatives such as µ ≠ 100 but one part for those such as µ < 100 or µ > 100.

8. An error of the second kind is committed when we accept the null hypothesis when in fact it is not true. The probability of an error of the second kind is called β.

9. In general, it is well to draw an operating characteristic (OC) curve giving the probability of acceptance of the null hypothesis for each value of the parameter.
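The elements above can be illustrated with a two-sided test of the null hypothesis µ = 100 against the alternative µ ≠ 100. The sketch assumes a normal population with a known standard deviation (a z test); that choice of test, the sample values, σ = 1.0 and α = 0.01 are all hypothetical and are not taken from the slides.

```python
from math import sqrt
from statistics import NormalDist, mean

STD_NORMAL = NormalDist()  # standardized normal distribution


def two_sided_z_test(sample, mu0, sigma, alpha):
    """Return (z, critical value, reject?) for H0: mu = mu0 vs H1: mu != mu0."""
    z = (mean(sample) - mu0) / (sigma / sqrt(len(sample)))
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)  # two-part critical region
    return z, z_crit, abs(z) > z_crit


def acceptance_probability(mu, mu0, sigma, n, alpha):
    """One point of the OC curve: chance of accepting H0 when the true mean is mu."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = (mu - mu0) / (sigma / sqrt(n))
    return STD_NORMAL.cdf(z_crit - shift) - STD_NORMAL.cdf(-z_crit - shift)


# Hypothetical sample of 9 measurements.
sample = [101.2, 99.8, 100.5, 102.1, 100.9, 101.7, 99.6, 101.4, 100.8]
z, z_crit, reject = two_sided_z_test(sample, mu0=100, sigma=1.0, alpha=0.01)
print(round(z, 2), round(z_crit, 2), reject)  # 2.67 2.58 True

# Probability of an error of the second kind if the true mean were 101.
beta = acceptance_probability(101, 100, 1.0, len(sample), 0.01)
print(round(beta, 3))
```

Evaluating `acceptance_probability` over a range of true means traces out the OC curve mentioned in item 9; at the hypothesized value it returns 1 − α.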

39

STATISTICAL DECISION MAKING

2) Analysis of Variances

Theoretical Formulas

• If z = ax + by, where a and b are constant coefficients, the mean of z is given by $\mu_z = a\mu_x + b\mu_y$, where $\mu_x$ and $\mu_y$ are the means of x and y.

• If x and y are independent, the variance of z is given by $\sigma_z^2 = a^2\sigma_x^2 + b^2\sigma_y^2$, where $\sigma_x^2$ and $\sigma_y^2$ are the variances of x and y.

• The variance becomes a sum even when the particular case involves a difference of random variables. This characteristic is called the additivity of variances.

40

The Expectation and The Variance Of Sample Means.

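When n measurements are taken from a population with population mean μ and population variance $\sigma^2$, and the values of the measurements are $x_1, x_2, \dots, x_n$ and their mean is y, then

$$y = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n}x_1 + \frac{1}{n}x_2 + \dots + \frac{1}{n}x_n$$

The expectation and the variance of y are obtained by

$$\mu_y = \mu \quad \text{and} \quad \sigma_y^2 = \left(\frac{1}{n}\right)^2 n\,\sigma^2 = \frac{\sigma^2}{n}.$$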

This is a well known formula for the distribution of the sample mean.
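The result can be verified exactly on a small case. The sketch below uses a hypothetical population (the six faces of a fair die) and lists every possible sample of n = 2 independent draws, then compares the variance of the sample means with the population variance divided by n. Exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product


def mean_and_variance(values):
    """Mean and variance of a list in which every value is equally likely."""
    m = Fraction(sum(values), len(values))
    v = sum((Fraction(x) - m) ** 2 for x in values) / len(values)
    return m, v


population = [1, 2, 3, 4, 5, 6]  # hypothetical population: faces of a fair die
n = 2                            # number of measurements per sample

mu, sigma2 = mean_and_variance(population)

# Every equally likely sample of n independent draws, and the mean of each.
sample_means = [Fraction(sum(s), n) for s in product(population, repeat=n)]
mu_y, sigma2_y = mean_and_variance(sample_means)

print(mu, sigma2)      # 7/2 35/12
print(mu_y, sigma2_y)  # 7/2 35/24
```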

41

When Random Variables Are Not Independent.

The additivity of variances works well when the random variables are mutually independent.

Two random variables are said to be independent when the value of one variable varies without any relation to the other variable.

If x and y are independent, the mean and the variance of z = x − y will be $\mu_z = \mu_x - \mu_y$ and $\sigma_z^2 = \sigma_x^2 + \sigma_y^2$.

When x and y are not independent, the mean and the variance of z = ax + by are given by $\mu_z = a\mu_x + b\mu_y$ and

$$\sigma_z^2 = a^2\sigma_x^2 + b^2\sigma_y^2 + 2ab\,\rho\,\sigma_x\sigma_y$$

ρ is the correlation coefficient, which shows the degree of relationship between two variables. The value of ρ is between −1 and +1.

The stronger the relationship between the two variables, the closer the absolute value of ρ is to 1.
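A small numerical check of the variance formula for dependent variables follows. The paired values of x and y are hypothetical; the sketch computes the variance of z = x − y directly and again from the variances of x and y and their correlation coefficient, and the two results agree.

```python
from math import sqrt
from statistics import fmean, pvariance


def correlation(xs, ys):
    """Correlation coefficient computed from paired values."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / sqrt(pvariance(xs) * pvariance(ys))


# Hypothetical paired data in which y tends to rise with x (not independent).
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]
a, b = 1.0, -1.0                      # z = x - y

z = [a * xi + b * yi for xi, yi in zip(x, y)]
rho = correlation(x, y)
sigma_x, sigma_y = sqrt(pvariance(x)), sqrt(pvariance(y))

direct = pvariance(z)
formula = a**2 * sigma_x**2 + b**2 * sigma_y**2 + 2 * a * b * rho * sigma_x * sigma_y

print(round(rho, 2), round(direct, 2), round(formula, 2))  # 0.8 3.6 3.6
```

Treating x and y as independent would give 8 + 2 = 10 instead of 3.6, which shows how much the correlation term matters here.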

42

MODEL RELATIONSHIPS BETWEEN VARIABLES

1) Simple Linear Regression

A straight line of the form y = α + βx is generally called a regression line, where y is the response variable (or dependent variable) and x is the explanatory (or independent) variable.

Also, α is a constant and β is called a regression coefficient.

The quantitative way of grasping the relation between x and y in a regression form is called regression analysis.
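The slides do not say how the regression line is fitted. A common choice, assumed here, is the method of least squares, which the sketch below applies to hypothetical (x, y) pairs to obtain the constant (intercept) and the regression coefficient (slope).

```python
def fit_line(xs, ys):
    """Least-squares straight line; returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx                  # regression coefficient
    intercept = mean_y - slope * mean_x  # constant term
    return intercept, slope


# Hypothetical data: explanatory variable x and response variable y.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(x, y)
print(round(intercept, 2), round(slope, 2))  # 0.05 1.99
```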

43

Various Scatter Diagrams Having The Same Regression Line

[Figures 1 to 4: four scatter diagrams that share the same regression line.]

44

Model Relationship Between Variables

2) Simple Linear Correlation

There are many types of scattering patterns; some representative types are as follows:

[Figures: scatter diagrams showing positive correlation and negative correlation.]

45

When y increases with x, this is a positive correlation. In the opposite case, y decreases as x increases; this is called a negative correlation.

The method of judging the existence of correlation by making a scatter diagram and calculating the correlation coefficient is called correlation analysis.

For either correlation analysis or regression analysis, the starting point is a scatter diagram.
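A minimal sketch of the calculation side of correlation analysis is given below. It computes the correlation coefficient from sums of squares and products for two hypothetical data sets, one with a positive and one with a negative correlation.

```python
from math import sqrt


def correlation_coefficient(xs, ys):
    """r = Sxy / sqrt(Sxx * Syy), computed from paired observations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 5]      # y tends to increase with x
y_down = [5, 4, 5, 4, 2]    # y tends to decrease as x increases

r_pos = correlation_coefficient(x, y_up)
r_neg = correlation_coefficient(x, y_down)
print(round(r_pos, 3), round(r_neg, 3))  # 0.775 -0.775
```

The sign of the coefficient matches the direction seen on the scatter diagram, and its absolute value indicates how tightly the points follow a straight line.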