Lean Six Sigma New GB v 5.0 Analyze

TCS Internal

Lean Six Sigma Green Belt Training ANALYSE PHASE

22 November 2013 Copyright 2013 Tata Consultancy Services Limited

DMAIC Roadmap

Define

Identify Project CTQs

Develop Project Charter

Prepare High Level Process map, SIPOC

Measure

Establish Performance Standard

Assess Measurement System Variation

Estimate Current Capability

Identify Potential Causes

Sampling & Data Collection

Analyze

Identify variation using Graphical analysis

Prioritize & Validate causes

Improve

Define y= f (x)

Identify Solutions

Prioritize and Implement Solutions

Control

Optimize & refine solutions

Control X's & Monitor Y's

Measure actual benefits

Close & Hand-over project

Understand As-Is process

Complete Stakeholder analysis

Measure improvements

Define Measure Analyze Improve Control


Why Analyze ?

To understand the problem and identify root causes

To avoid actions based on intuition, preconceived ideas & symptoms

To develop sustainable process improvements for long term benefits

Recalibrate project scope

Establish performance goals for the process

Find the Xs that affect Y most


Identify The Vital Few

[Diagram: Input Measures (Xs) → Process (Process Measures (Xs)) → Outputs (Ys)]

Variation in Output Y depends on process as well as input variables (Xs)


The Funnel Effect

Y = f(x1, x2, x3, x4, …, xn)

Root-cause identification is the task of elimination

30+ variables

15-20 variables

10-15 variables

5-10 variables

3-5 variables

ANALYSE


Analyze Phase Flow:

Identify Variation using Graphical Tools: Box Plot, Scatter Plot, Pareto Analysis

Validate Causes: Hypothesis testing


Source: Donald Wheeler: Understanding Variation

Why Graphical Analysis

Graphs help us understand the nature of variation

Graphs make nature of data more accessible to the human mind

Graphs help display the context of the data

Graphs should be the primary presentation tool in data analysis

If you can't show it graphically, you probably don't have a good conclusion


Box Plot

Purpose:

To begin an understanding of the distribution of the data

To get a quick, graphical comparison of two or more processes

When:

First stages of data analysis

Upper limit = Q3 + 1.5 (Q3 - Q1)

Upper whisker: maximum observation that falls within the upper limit

75th Percentile (Q3)

Median (50th Percentile)

25th Percentile (Q1)

Lower whisker: minimum observation that falls within the lower limit

Lower limit = Q1 - 1.5 (Q3 - Q1)

* Outlier: any point outside the lower or upper limit
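The limits and whiskers described above can be sketched in a few lines of Python. This is an illustration only, not Minitab's routine: the function name box_plot_stats is hypothetical, and the quartiles come from the standard library's statistics.quantiles with its default "exclusive" method, which may differ slightly from the interpolation another tool uses. The sample values are the payment days from the one-sample t-test example later in this deck.

```python
import statistics

def box_plot_stats(values):
    """Return the quantities a box plot draws for one group of data."""
    data = sorted(values)
    # Default "exclusive" method: Q1 sits at position (n + 1) / 4.
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    inside = [v for v in data if lower_limit <= v <= upper_limit]
    outliers = [v for v in data if v < lower_limit or v > upper_limit]
    return {
        "q1": q1, "median": median, "q3": q3,
        "lower_limit": lower_limit, "upper_limit": upper_limit,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": outliers,
    }

# Payment days from the one-sample t-test example in this deck.
days = [22, 23, 22, 25, 28, 27, 28, 25, 23, 21]
stats = box_plot_stats(days)
print(stats)
```

With this data no point falls outside the limits, so the whiskers end at the smallest and largest observations.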


Box Plot

Things to look for in a Box Plot:

Are the boxes about equal or different?

Do the groups appear normal (symmetrical box halves and whiskers) or skewed?

Are there outliers?

[Figure: Boxplots of Op1 Cycl and Op2 Cycl (means are indicated by solid circles)]


Box Plot Example

Minitab Command: Graph > Box plot

Minitab Command: Graph > Histogram


Box Plot - Example

[Figure: Boxplot of TAT- Agent 1, TAT- Agent 2 (y-axis: Data)]

[Figure: Histogram of TAT- Agent 1 and Histogram of TAT- Agent 2 (x-axis: TAT, y-axis: Frequency)]

Can you now interpret Box Plots?


Scatter Plot

Scatter Plot tool can be used when

Both X and Y are in continuous format

If we want to associate Y with a single X

To judge the strength of relationship between Y and X

The strength of that relationship is denoted by the Coefficient of Correlation, r


Correlation

r is always between -1 & +1.

Positive value of r means both variables move in the same direction

Negative value of r means the variables move in opposite directions

Zero value of r means no linear correlation between the two variables

Higher the absolute value of r, stronger the correlation between Y & X
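The properties of r listed above can be checked with a short Python sketch of the Pearson formula. The function name pearson_r and the data are hypothetical, chosen only to show a same-direction case, an inverse case, and a curved relationship with no linear trend.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values chosen only to show the three basic cases.
x = [-2, -1, 0, 1, 2]
r_same = pearson_r(x, [2 * v + 1 for v in x])   # Y rises with X
r_inverse = pearson_r(x, [-3 * v for v in x])   # Y falls as X rises
r_curved = pearson_r(x, [v * v for v in x])     # Y depends on X, but not linearly
print(r_same, r_inverse, r_curved)
```

The third case is a reminder that r near zero rules out only a linear relationship, which is why the scatter plot should be looked at alongside r.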


Correlation measures the linear association between the Output (Y) and one input variable (X) only

Types of Correlations

[Figure: six scatter plots of y (effect) against x (cause), n=30 each]

Positive Correlation: r=0.9

Negative Correlation: r=-0.9

Positive Correlation May Be Present: r=0.6

Negative Correlation May Be Present: r=-0.6

No Linear Correlation: r=0.0

No Correlation: r=0.0


Scatter Plot & Correlation - Example

Minitab Command: Stat > Basic Statistics > Correlation
Variables: On-boarding Test Score & Floor Performance Score

Minitab Command: Graph > Scatter plot
Y variables: Floor Performance Score, X variables: On-boarding Test Score

Minitab output:
Correlations: On-boarding Test Score, Floor Performance Score
Pearson correlation (r) of On-boarding Test Score and Floor Performance Score = 0.786

[Figure: Scatterplot of Floor Performance Score vs On-boarding Test Score]

The r value indicates a reasonably strong positive correlation.


Scatter Plot Vs Correlation Analysis

Scatter Plot: Suggests a relationship between two variables but does not quantify it

Correlation Analysis: Quantifies the strength or degree of the relationship in terms of the Coefficient of Correlation, r


Pareto

What is it ?

The Pareto Principle states that only a "vital few" factors are responsible for producing most of the problems. This principle can be applied to quality improvement to the extent that a great majority of problems (80%) are produced by a few key causes (20%). If we correct these few key causes, we will have a greater probability of success.

Why use it ?

For the team to quickly focus its efforts on the key causes of a problem.

When to use it ?

Data is Discrete, i.e., classified into types with frequencies for each type.


Pareto - Example

Minitab Command: Stat > Quality Tools > Pareto Chart

Chart defects table: Query Type for "Labels in" & Total received for "Frequencies in"


Pareto - Example

Which factors do you consider as vital from the above Pareto chart?

[Figure: Pareto Chart of No. of Queries rec'd (x-axis: Sub type; left axis: Total, 0 to 20000; right axis: Percent, 0 to 100)]

Values printed under the chart, in bar order:

Total     4116  3431  2709  2685  1749   800   506   269   247   234   864
Percent   23.4  19.5  15.4  15.2   9.9   4.5   2.9   1.5   1.4   1.3   4.9
Cum %     23.4  42.9  58.2  73.5  83.4  88.0  90.8  92.4  93.8  95.1 100.0

Sub type labels as extracted: Other, ANONYMOUS, FAST, CARD & PIN, PAYMENT QUERY, CLARIFICATION MAIL, CARDS RELATED, FRAUD RELATED, CARD CANCELLATION, CUSTOMER INFORMATION, DISPUTE, ADDRESS CHANGE
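The percent and cumulative percent figures behind a Pareto chart are simple to reproduce. The sketch below is an illustration under stated assumptions: the function names pareto_table and vital_few are hypothetical, the counts are the Total values of the chart above, and the "Type n" labels are placeholders because the label for each bar is not used here.

```python
def pareto_table(counts, other_label="Other"):
    """Rows of (label, count, percent, cumulative percent), largest first.

    A category named `other_label` is kept last, as Pareto charts usually
    show the catch-all bar at the end.
    """
    items = sorted(
        counts.items(),
        key=lambda kv: (kv[0] == other_label, -kv[1]),
    )
    total = sum(counts.values())
    rows, running = [], 0
    for label, count in items:
        running += count
        rows.append((label, count, 100 * count / total, 100 * running / total))
    return rows

def vital_few(rows, threshold=80.0):
    """Leading categories needed for the cumulative percent to reach threshold."""
    selected = []
    for row in rows:
        selected.append(row[0])
        if row[3] >= threshold:
            break
    return selected

# Totals from the chart above; the "Type n" labels are placeholders.
totals = [4116, 3431, 2709, 2685, 1749, 800, 506, 269, 247, 234]
counts = {f"Type {i}": t for i, t in enumerate(totals, start=1)}
counts["Other"] = 864

rows = pareto_table(counts)
for label, count, pct, cum in rows:
    print(f"{label:8s} {count:6d} {pct:5.1f} {cum:6.1f}")
print(vital_few(rows))
```

The 80% threshold is the conventional reading of the Pareto principle; a team may reasonably pick a different cut-off.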


Analyze Phase Flow:

Identify Variation using Graphical Tools: Box Plot, Scatter Plot, Pareto Analysis

Validate Causes: Hypothesis testing


What is Hypothesis Testing

Measurements are organized into statistics to provide insight into spread, shape, consistency and location of the process

A hypothesis test is simply comparing reality to an assumption and asking "Did things change?"

A hypothesis test is testing whether real data fits the model

A hypothesis test is comparing a statistic to a hypothesis


Statistical Inference Relies On Sampling From A Population Of Data

Core of Hypothesis Testing

Sampling From a Population

[Diagram: Entire Population of Data (Parameters: μ, σ) → Sample (Statistics: x̄, s, etc.) → Analysis → Statistical Inference]

μ = mean of the population
σ = standard deviation of the population
x̄ = mean of the sample
s = standard deviation of the sample

Sampling saves costs and time.

Sampling provides a good alternative to collecting all the data.

Identifying a specific confidence level allows us to make reasonable business decisions.


Common terms in Hypothesis Testing

The Null Hypothesis (H0): States that there is no difference. It is assumed to be true unless proven otherwise. You never prove it, you only fail to reject it.

The Alternative Hypothesis (Ha): The statement that we would like to show is true. It usually defines the direction of desirable change. The alternative hypothesis can be: >, <, or ≠


Hypothesis Testing: P Value

P < α : Reject Ho
P > α : Fail to reject Ho

Alpha (α) is the maximum acceptable probability of making a Type I error. (In other words, USL for Type I error.)

The p-value is the probability of getting a result at least as extreme as the one observed if the null hypothesis is true. Rejecting a null hypothesis that is actually true is a Type I error.

For most decisions, the acceptance level of a Type I error is set at α = 0.05.

Thus, any p-value less than 0.05 means we reject the null hypothesis.


Hypothesis Testing Road Map

Determining statistical differences within and between populations

Hypothesis testing

Continuous data

  Comparing Means
    one sample: 1-sample t-test
    two samples: 2-sample t-test
    multiple samples: ANOVA

  Comparing Variances
    Test of equal variances

Discrete data

  Comparing Proportions
    1-Proportion test
    two sample: 2-Proportion test

  Chi-square test


Process Scenarios for Hypothesis tests

Tool: Process Scenarios

1 Sample t-test: To compare a team's performance against target. Data set containing performance scores like Daily/ Weekly scores. Sample size can be less than 30 as well but higher is better.

2 Sample t-test: To compare one team's performance against another, or to compare performance of a team before and after improvement. Data set containing performance scores like Daily/ Weekly scores. Sample size can be less than 30 as well but higher is better.

ANOVA: To compare performance of multiple teams on a metric like Quality score. Data set containing performance scores like Daily/ Weekly scores of multiple teams.

Test of equal variances: To compare variance or Std deviation of one team's performance with another. Data set containing performance scores like Daily/ Weekly scores.

1-Proportion test: To compare proportion defects/ defectives of a team against a target.

2-Proportion test: To compare proportion defects/ defectives of a team against another team.

Chi-square test: To check association between variables, like whether there is any association between two teams w.r.t. their Error types.


Hypothesis Testing: t-Test Procedure

t Test is mainly used to test differences in means. Theoretically t test can be used for even small sample sizes (as small as 10) when data is normally distributed.

Null hypothesis is that the averages of two groups are the same.

Ho : μ1 = μ2
Ha : μ1 >, <, or ≠ μ2


One Sample T-test

One sample T- test is used to compare the performance of a process with the set standard/ historical data/ target.

e.g. The historical average CSI of a process is 4.35. Process Manager is interested in understanding the present CSI based on the data collected in last 15 days.


One-sample T-test

Example: Organization ABC is measuring the no. of days to get money from XYZ after invoices are sent. Historical data suggests that earlier payments were received within 25 days; since then, some improvement actions were implemented. The process team wanted to check whether the improvement actions had any impact on the performance.

The sample data was collected. The times taken for receiving the payments are: 22, 23, 22, 25, 28, 27, 28, 25, 23, 21 days.

Establish, with 95% confidence, whether we get money in 25 days.

Instructions: Stat > Basic stat > 1 sample t

Enter data as:
Variable: C1 Days
Test Mean: 25
Alternative: Not Equal


One-sample T-test using Minitab

Minitab Output (Stat > Basic Stat > 1 Sample t)

T-Test of the Mean

Test of mu = 25.000 vs mu not = 25.000

Variable   N    Mean  StDev  SE Mean      T     P
Days      10  24.400  2.591    0.819  -0.73  0.48

Interpretation: Since p > 0.05, there is not enough evidence that the improvement plan made any difference in the process performance.

Since P is > 0.05, _________ Null Hypothesis
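The T and P columns of the Minitab output can be reproduced by hand. The following is a minimal Python sketch using only the standard library; the helper names t_pdf, two_sided_p and one_sample_t are hypothetical, and the p-value is approximated by integrating the t density with Simpson's rule, which is an assumption of this sketch rather than the method Minitab uses internally.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 minus the area between -|t| and +|t| (Simpson's rule)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    area = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3  # area between 0 and |t|
    return max(0.0, 1 - 2 * area)

def one_sample_t(sample, target):
    """t statistic, degrees of freedom and two-sided p-value against a target mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    t = (mean - target) / se
    return t, n - 1, two_sided_p(t, n - 1)

# Payment days from the example above, tested against the target of 25 days.
days = [22, 23, 22, 25, 28, 27, 28, 25, 23, 21]
t, df, p = one_sample_t(days, target=25)
print(f"t = {t:.2f}, df = {df}, p = {p:.2f}")
```

The printed values can be compared with the T and P columns in the Minitab output above.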


Hypothesis Testing: 2 Sample T Test

2 Sample T test is used for comparing the averages of 2 sets of readings.

Test is used when the dependent variable (response or Y) is continuous and the independent variable (factor or X) is discrete.

Test can be performed on data from independent samples stacked in a single column with a second discrete variable in another column.

Variances may be equal or unequal.

The null hypothesis is that the population means are not different.

H0: μ1 = μ2

Ha: μ1 >, <, or ≠ μ2



Example :

The time required for installing a software by new and experienced engineers is given below. Establish whether experienced engineers are better.

Experienced 15.80,14.19, 15.32, 14.65, 12.25, 15.42, 12.92, 13.98, 16.28,14.53

New 16.10, 17.24, 17.65, 16.8, 18.42, 18.12, 15.24, 16.14, 15.26, 14.65



Stat > Basic Statistics > 2 Sample t..


Hypothesis Testing: 2 Sample T Test

Two-Sample T-Test and CI: Experienced, New

Two-sample T for Experienced vs New
              N   Mean  StDev  SE Mean
Experienced  10  14.53   1.26     0.40
New          10  16.56   1.29     0.41

Difference = mu (Experienced) - mu (New)
Estimate for difference: -2.028
95% CI for difference: (-3.234, -0.822)
T-Test of difference = 0 (vs not =): T-Value = -3.55  P-Value = 0.002  DF = 17

Two-Sample T-Test and CI: Experienced, New

Two-sample T for Experienced vs New
              N   Mean  StDev  SE Mean
Experienced  10  14.53   1.26     0.40
New          10  16.56   1.29     0.41

Difference = mu (Experienced) - mu (New)
Estimate for difference: -2.028
95% upper bound for difference: -1.034
T-Test of difference = 0 (vs
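The T-Value and DF in the output above are consistent with a two-sample t-test that does not assume equal variances (Welch's form), with the degrees of freedom truncated to a whole number. A minimal Python sketch under that assumption follows; the function name welch_t is hypothetical, and the p-value is left to a t table or statistical software.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """t statistic and Welch-Satterthwaite degrees of freedom (unequal variances)."""
    va = statistics.variance(sample_a) / len(sample_a)  # squared SE of mean A
    vb = statistics.variance(sample_b) / len(sample_b)  # squared SE of mean B
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (
        va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1)
    )
    return t, df

# Installation times from the example above.
experienced = [15.80, 14.19, 15.32, 14.65, 12.25, 15.42, 12.92, 13.98, 16.28, 14.53]
new = [16.10, 17.24, 17.65, 16.8, 18.42, 18.12, 15.24, 16.14, 15.26, 14.65]

t, df = welch_t(experienced, new)
print(f"t = {t:.2f}, df = {df:.2f} (truncated: {int(df)})")
```

A negative t here means the experienced engineers' mean time is lower than the new engineers' mean time.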


Analysis Of Variance (ANOVA)

One-way ANOVA is used to compare several sample means for two or more levels of a single factor (groups of data). In this sense, it is an extension of a two-sample t-test.

Comparing all groups at once with ANOVA is preferable to comparing two groups at a time with the two-sample t-test (pooled variance).

Hypothesis: H0: μ1 = μ2 = μ3 = … versus Ha: there is at least one difference


ANOVA Assumption

The purpose of one-way ANOVA is to compare means. The means of different groups of data can only be compared if the variances within each group are statistically the same.

ANOVA has two assumptions:

Data for each group should be normal.

The data sets have equal variances.
H0: σ1² = σ2² = σ3² = … versus
Ha: there is at least one difference

ANOVA is reasonably robust: it can still give good results when the assumptions are only moderately violated.


ANOVA: Example

A contact centre receives calls for different processes within the organization. The Contact Centre head wanted to understand whether the response time is affected by the process.

Response time data was collected for the 3 processes for doing ANOVA analysis.

Process C   Process B   Process A
6.5         7           6
6           6           5.4
6           6.5         4.4
7           5.5         5.5
6.5         6           4.5
7           6.5         5
6           7           6
7.5         6           7
6           5.5         6
5.5         5           4.5
5.5         4           3.5
6           6.5         3.5
7           6           4


ANOVA: Assumptions Testing

Stat > Basic Statistics > Graphical Summary

Summary for Process B
Anderson-Darling Normality Test: A-Squared 0.46, P-Value 0.212
Mean 5.9615, StDev 0.8282, Variance 0.6859, Skewness -1.02716, Kurtosis 1.44419, N 13
Minimum 4.0000, 1st Quartile 5.5000, Median 6.0000, 3rd Quartile 6.5000, Maximum 7.0000
95% Confidence Interval for Mean: 5.4611 to 6.4620
95% Confidence Interval for Median: 5.5000 to 6.5000
95% Confidence Interval for StDev: 0.5939 to 1.3671

Summary for Process C
Anderson-Darling Normality Test: A-Squared 0.59, P-Value 0.101
Mean 6.3462, StDev 0.6253, Variance 0.3910, Skewness 0.387879, Kurtosis -0.844201, N 13
Minimum 5.5000, 1st Quartile 6.0000, Median 6.0000, 3rd Quartile 7.0000, Maximum 7.5000
95% Confidence Interval for Mean: 5.9683 to 6.7240
95% Confidence Interval for Median: 6.0000 to 7.0000
95% Confidence Interval for StDev: 0.4484 to 1.0322

All three process response time data pass the normality test.

Even if the data is not normal, one can go ahead with test of equal variances.


ANOVA: Assumptions Testing

Assumption Testing:

Variances testing requires stacked data.

Stat > ANOVA > Test for Equal Variances


ANOVA: Example

Test for Equal Variances: Stacked versus Process

95% Bonferroni confidence intervals for standard deviations

Process     N     Lower    StDev    Upper
Process A  13  0.717000  1.07094  2.00301
Process B  13  0.554475  0.82819  1.54898
Process C  13  0.418654  0.62532  1.16955

Bartlett's Test (Normal Distribution)
Test statistic = 3.25, p-value = 0.197

Levene's Test (Any Continuous Distribution)
Test statistic = 1.84, p-value = 0.173

[Figure: Test for Equal Variances for Stacked (95% Bonferroni Confidence Intervals for StDevs, by Process)]

Since p value > 0.05 through Bartlett's test, data passes the test of equal variances assumption.
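For readers who want to see where Bartlett's test statistic comes from, here is a minimal Python sketch of the textbook formula applied to the response time data of the example. The function name bartlett is hypothetical. The p-value line relies on a special case: with three groups the statistic has 2 degrees of freedom, where the chi-square tail area is exactly exp(-x / 2). For other numbers of groups a chi-square table or statistical software is needed.

```python
import math
import statistics

def bartlett(*groups):
    """Bartlett's statistic for equal variances and its degrees of freedom."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    variances = [statistics.variance(g) for g in groups]
    n_total = sum(sizes)
    # Pooled variance: within-group sums of squares over total error df.
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (n_total - k)
    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances)
    )
    correction = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / (n_total - k)) / (3 * (k - 1))
    return numerator / correction, k - 1

# Response times from the ANOVA example.
process_a = [6, 5.4, 4.4, 5.5, 4.5, 5, 6, 7, 6, 4.5, 3.5, 3.5, 4]
process_b = [7, 6, 6.5, 5.5, 6, 6.5, 7, 6, 5.5, 5, 4, 6.5, 6]
process_c = [6.5, 6, 6, 7, 6.5, 7, 6, 7.5, 6, 5.5, 5.5, 6, 7]

stat, df = bartlett(process_a, process_b, process_c)
# Three groups give 2 degrees of freedom, where the chi-square tail area is exp(-x / 2).
p = math.exp(-stat / 2)
print(f"Bartlett statistic = {stat:.2f}, df = {df}, p = {p:.3f}")
```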


ANOVA: Example

Stat > ANOVA > One way (Unstacked)..


ANOVA: Example

One-way ANOVA: Process A, Process B, Process C

Source  DF      SS     MS     F      P
Factor   2  12.043  6.022  8.12  0.001
Error   36  26.686  0.741
Total   38  38.729

S = 0.8610   R-Sq = 31.10%   R-Sq(adj) = 27.27%

Individual 95% CIs For Mean Based on Pooled StDev

Level       N    Mean   StDev
Process A  13  5.0231  1.0709
Process B  13  5.9615  0.8282
Process C  13  6.3462  0.6253

Pooled StDev = 0.8610

Interpretation:

Since p < 0.05, the difference in the response time is significant and the process can be called a significant factor.
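The DF, SS, MS and F columns of the ANOVA table can be rebuilt from the raw response times. The Python sketch below is an illustration only; the function name one_way_anova is hypothetical, and the p-value is deliberately left out because it needs the F distribution from a table or statistical software.

```python
def one_way_anova(groups):
    """Return the one-way ANOVA table entries for a dict of label -> values."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)
    ss_factor = 0.0  # variation between group means
    ss_error = 0.0   # variation within groups
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_factor += len(values) * (group_mean - grand_mean) ** 2
        ss_error += sum((v - group_mean) ** 2 for v in values)
    df_factor = len(groups) - 1
    df_error = len(all_values) - len(groups)
    ms_factor = ss_factor / df_factor
    ms_error = ss_error / df_error
    return {
        "df_factor": df_factor, "df_error": df_error,
        "ss_factor": ss_factor, "ss_error": ss_error,
        "ms_factor": ms_factor, "ms_error": ms_error,
        "f": ms_factor / ms_error,
    }

# Response times from the ANOVA example.
data = {
    "Process A": [6, 5.4, 4.4, 5.5, 4.5, 5, 6, 7, 6, 4.5, 3.5, 3.5, 4],
    "Process B": [7, 6, 6.5, 5.5, 6, 6.5, 7, 6, 5.5, 5, 4, 6.5, 6],
    "Process C": [6.5, 6, 6, 7, 6.5, 7, 6, 7.5, 6, 5.5, 5.5, 6, 7],
}
table = one_way_anova(data)
for name, value in table.items():
    print(f"{name}: {value:.3f}")
```

A large F means the spread between the process means is large relative to the spread within each process.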


Proportion Testing

Proportion Testing is used to understand whether the proportion created by the factor level is significant.

It can be of 2 types:

One Proportion Test:

Ho : PA = P0

Ha : PA >, <, or ≠ P0

Two Proportion Test:

Ho : PA = PB

Ha : PA >, <, or ≠ PB


Proportion Testing Example

1 Proportion Test:

An HR Services complaints resolution process is meant for resolving the complaints raised by associates. The data provided in the table shows the % of complaints resolved by the process within the 8-hour timeline. The process manager claims that the process is resolving at least 30% of the complaints on more than 80% of the occasions. Is it possible to use the 1 P test for validating the claim of the process manager?


1 Proportion Test: Example

Complaints Resolved % by Day

Day   %     Day   %     Day   %
  1  25      11  29      21  33
  2  35      12  32      22  34
  3  30      13  31      23  36
  4  36      14  28      24  28
  5  32      15  35      25  30
  6  33      16  25      26  29
  7  34      17  35      27  32
  8  36      18  30      28  31
  9  28      19  36      29  28
 10  30      20  32      30  35

Data Suggests:

Total no. of trials: 30

No. of events of complaints resolved >= 30% : 22

One Proportion Test:

Ho : PA = 0.8

Ha : PA > 0.8

Issues



Stat > Basic Statistics > 1 Proportion



Minitab Project Report

Test and CI for One Proportion

Test of p = 0.8 vs p > 0.8

                          95% Lower    Exact
Sample   X   N  Sample p      Bound  P-Value
1       22  30  0.733333   0.570066    0.871

Interpretation:

Since p > 0.05 through 1 P test, we cannot conclude that the team is resolving at least 30% of complaints per day more than 80% of the time. The process manager's claim of providing resolution on more than 80% of the occasions is not supported by the data.
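The exact P-Value in the output above is a binomial tail probability: the chance of seeing 22 or more qualifying days out of 30 if the true proportion were 0.8. A minimal Python sketch follows; the function name binomial_upper_tail is hypothetical, and the daily percentages are the ones from the example data.

```python
from math import comb

def binomial_upper_tail(successes, trials, p0):
    """Exact P(X >= successes) when X ~ Binomial(trials, p0)."""
    return sum(
        comb(trials, k) * p0 ** k * (1 - p0) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Complaints resolved % for days 1 to 30, from the example data.
resolved_pct = [
    25, 35, 30, 36, 32, 33, 34, 36, 28, 30,
    29, 32, 31, 28, 35, 25, 35, 30, 36, 32,
    33, 34, 36, 28, 30, 29, 32, 31, 28, 35,
]
hits = sum(1 for pct in resolved_pct if pct >= 30)  # days meeting the 30% mark
p_value = binomial_upper_tail(hits, len(resolved_pct), 0.8)
print(hits, len(resolved_pct), round(p_value, 3))
```

Because the alternative is "greater than 0.8", only the upper tail is summed.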



2 Proportion Test: In an invoice processing process, the process manager is thinking of giving promotion to one of the team members A and B. For this he wants to look at the last 7 days of invoices processed by them for getting a feel of the better performer. Can you use 2 P test for identifying the better performer?

Data Suggests:

Team Member A:

Total no. of invoices resolved: 60

Total no. of invoices without error: 32

Team Member B:

Total no. of invoices resolved: 65

Total no. of invoices without error: 48



Stat > Basic Statistics > 2 Proportion

Test and CI for Two Proportions

Sample   X   N  Sample p
1       32  60  0.533333
2       48  65  0.738462

Difference = p (1) - p (2)
Estimate for difference: -0.205128
95% upper bound for difference: -0.0663404
Test for difference = 0 (vs < 0): Z = -2.43  P-Value = 0.008

Fisher's exact test: P-Value = 0.014

Interpretation:

Since p < 0.05 through 2 P test, the performance of Team Member A can be considered significantly less than performance of team member B. Hence process manager can select member B for promotion.
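The Z value in the output above is consistent with a normal-approximation test that estimates the variance of each proportion separately (unpooled). A minimal Python sketch under that assumption follows; the function names two_proportion_z and normal_cdf are hypothetical.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for p1 - p2 using separate (unpooled) variance estimates."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se

def normal_cdf(z):
    """Standard normal cumulative probability, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Error-free invoices: Team Member A 32 of 60, Team Member B 48 of 65.
z = two_proportion_z(32, 60, 48, 65)
p_lower = normal_cdf(z)  # one-sided alternative: p(A) - p(B) < 0
print(f"z = {z:.2f}, one-sided p = {p_lower:.3f}")
```

A pooled variance estimate is another common choice and gives a slightly different z, so the method should be stated when results are reported.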


2 Proportion Test - Exercise

On auditing two pizza outlets, 7 deliveries were late out of 155 in first one and 22 deliveries were late out of 200 in the second one. Find with 99% of confidence if the two proportions are different.


Contingency Table

Contingency table is used when both output and input variables are attribute (discrete) in nature. It uses the Chi-square test to reach a conclusion.

Chi Square Test:

Ho : Y is independent of X

Ha : Y is not independent of X


Contingency Table : Example

During a project for looking into the recruitment possibilities, the Personnel Department wanted to understand whether the chances of being hired are dependent upon the age of the person. Can the linkage between age and chances of being hired be statistically validated?

Hypothesis:

Ho : Hiring of a person is independent of his/ her age

Ha : Hiring of a person is not independent of his/ her age


        Hired  Not Hired  Total
Old        30        150    180
Young      45        230    275
Total      75        380    455

Data was collected for all the candidates who were taken through the recruitment process in last one year.

Old: > 35 Years

Young:


Contingency Table : Example

Stat > Table > Chi Square Test

Each cell must have an expected count of >= 5 for going ahead with the test.


Chi-Square Test

Expected counts are printed below observed counts

        Hired  Not Hire  Total
1          30       150    180
        29.67    150.33
2          45       230    275
        45.33    229.67
Total      75       380    455

Chi-Sq = 0.004 + 0.001 + 0.002 + 0.000 = 0.007
DF = 1, P-Value = 0.932

Interpretation:

Since p > 0.05, there is no evidence that the hiring of a candidate depends upon his/ her age.
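The expected counts and the chi-square statistic in the output above can be reproduced from the observed table. The Python sketch below is an illustration only; the function name chi_square_independence is hypothetical. The p-value line uses a special case: for a 2 x 2 table (1 degree of freedom) the chi-square tail area equals erfc(sqrt(x / 2)). Larger tables need a chi-square table or statistical software.

```python
import math

def chi_square_independence(table):
    """Chi-square statistic, degrees of freedom and expected counts for a table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / grand for c in col_totals] for r in row_totals]
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected

# Rows: Old, Young. Columns: Hired, Not Hired.
observed = [[30, 150], [45, 230]]
stat, df, expected = chi_square_independence(observed)
# For a 2 x 2 table (df = 1) the chi-square tail area equals erfc(sqrt(x / 2)).
p = math.erfc(math.sqrt(stat / 2))
print(f"Chi-Sq = {stat:.3f}, DF = {df}, P-Value = {p:.3f}")
```

The expected counts returned by the function are also what the minimum cell count rule for this test is checked against.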

Contingency Table: Analysis in Minitab

The contingency table analysis compares the observed values with the expected values. If there is independence, we expect the difference between them, and therefore the chi-square statistic, to be close to 0. The further away from 0 we are, the more likely the variables are dependent. To help us make that decision, we only need to look at the p value.

Chi Square Test : Exercise

Are ladies more likely to be right handed compared to gentlemen?

Hypothesis:

Ho : There is no relationship between gender & dexterity

Ha : There is a relationship between gender & dexterity


End of Analyse Phase
