ravi-kumar
six sigma
TCS Internal
Lean Six Sigma Green Belt Training ANALYSE PHASE
22 November 2013 Copyright 2013 Tata Consultancy Services limited 2
DMAIC Roadmap
Define
Identify Project CTQs
Develop Project Charter
Prepare High Level Process map, SIPOC
Measure
Establish Performance Standard
Assess Measurement System Variation
Estimate Current Capability
Identify Potential Causes
Sampling & Data Collection
Analyze
Identify variation using Graphical analysis
Prioritize & Validate causes
Improve
Define y = f(x)
Identify Solutions
Prioritize and Implement Solutions
Control
Optimize & refine solutions
Control X's & Monitor Y's
Measure actual benefits
Close & Hand-over project
Understand As-Is process
Complete Stakeholder analysis
Measure improvements
Why Analyze?
To understand the problem and identify root causes
To avoid actions based on intuition, preconceived ideas and symptoms
To develop sustainable process improvements for long-term benefits
Recalibrate project scope
Establish performance goals for the process
Find the Xs that affect Y most
Identify The Vital Few
[Diagram: Input Measures (Xs) -> Process, with Process Measures (Xs) -> Outputs (Ys)]
Variation in Output Y depends on process as well as input variables (Xs)
The Funnel Effect
Y = f(x1, x2, x3, x4, ..., xn)
Root-cause identification is a task of elimination
[Funnel diagram, ANALYSE: 30+ variables -> 15-20 variables -> 10-15 variables -> 5-10 variables -> 3-5 variables]
Analyze Phase Flow:
Identify Variation using Graphical Tools: Box Plot, Scatter Plot, Pareto Analysis
Validate Causes: Hypothesis testing
Why Graphical Analysis
Graphs help us understand the nature of variation
Graphs make the nature of data more accessible to the human mind
Graphs help display the context of the data
Graphs should be the primary presentation tool in data analysis
If you can't show it graphically, you probably don't have a good conclusion
Source: Donald Wheeler: Understanding Variation
Box Plot
Purpose:
To begin an understanding of the distribution of the data
To get a quick, graphical comparison of two or more processes
When:
First stages of data analysis
Anatomy of a box plot:
Upper whisker: maximum observation that falls within the upper limit = Q3 + 1.5 (Q3 - Q1)
Box: 75th Percentile (Q3), Median (50th Percentile), 25th Percentile (Q1)
Lower whisker: minimum observation that falls within the lower limit = Q1 - 1.5 (Q3 - Q1)
* Outlier: any point outside the lower or upper limit
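The limit formulas above can be sketched in a few lines of Python. This is a minimal illustration, not Minitab's implementation: the turnaround-time values are hypothetical, and statistical packages interpolate quartiles in slightly different ways, so Q1 and Q3 may differ a little from tool to tool.

```python
import statistics

def box_plot_summary(values):
    """Quartiles, outlier limits, whisker ends and outliers for one sample."""
    data = sorted(values)
    # "exclusive" interpolates at position p * (n + 1); other tools may differ.
    q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    inside = [v for v in data if lower_limit <= v <= upper_limit]
    outliers = [v for v in data if v < lower_limit or v > upper_limit]
    return {
        "q1": q1, "median": median, "q3": q3,
        "lower_limit": lower_limit, "upper_limit": upper_limit,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": outliers,
    }

# Hypothetical turnaround times (not taken from the training data).
tat = [21, 22, 22, 23, 23, 25, 25, 27, 28, 28, 40]
summary = box_plot_summary(tat)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The whiskers stop at the last observations inside the limits, so the single value beyond the upper limit is reported as an outlier rather than stretching the whisker.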
Box Plot
Things to look for in a Box Plot:
Are the boxes about equal or different?
Do the groups appear normal (symmetrical box halves and whiskers) or skewed?
Are there outliers?
[Figure: Boxplots of Op1 Cycl and Op2 Cycl (means are indicated by solid circles)]
Box Plot Example
Minitab Command: Graph > Box plot; Graph > Histogram
Box Plot - Example
[Figure: Boxplot of TAT- Agent 1, TAT- Agent 2 (y-axis: Data)]
[Figures: Histogram of TAT- Agent 1 and Histogram of TAT- Agent 2 (y-axis: Frequency)]
Can you now interpret Box Plots?
Scatter Plot
Scatter Plot tool can be used when:
Both X and Y are in continuous format
We want to associate Y with a single X
We want to judge the strength of the relationship between Y and X
The strength of that relationship is denoted by the Coefficient of Correlation, r
Correlation
r is always between -1 and +1.
A positive value of r means both variables move in the same direction
A negative value of r means the variables move in opposite directions
A zero value of r means no linear correlation between the two variables
The higher the absolute value of r, the stronger the correlation between Y and X
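As a rough illustration of how r is obtained, the sketch below computes the Pearson coefficient from its definition. The two score lists are hypothetical values invented for the example; they are not the training data set.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of the same length (at least 2 points)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores, for illustration only.
test_scores = [50, 55, 60, 65, 70, 75]
floor_scores = [72, 80, 78, 88, 90, 97]
r = pearson_r(test_scores, floor_scores)
print(f"r = {r:.3f}")
```

A perfectly increasing straight line gives r = +1 and a perfectly decreasing one gives r = -1; real data falls somewhere in between.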
Types of Correlations
Correlation measures the linear association between the Output (Y) and one input variable (X) only
[Figure: six scatter plots of y (effect) against x (cause), n=30 in each]
Positive Correlation: r=0.9
Positive Correlation May Be Present: r=0.6
Negative Correlation: r=-0.9
Negative Correlation May Be Present: r=-0.6
No Linear Correlation: r=0.0
No Correlation: r=0.0
Scatter Plot & Correlation - Example
Minitab Command: Stat > Basic Statistics > Correlation
Variables: On-boarding Test Score & Floor Performance Score
Minitab Command: Graph > Scatter plot
Y variables: Floor Performance Score, X variables: On-boarding Test Score
Scatter Plot & Correlation - Example
Minitab output:
Correlations: On-boarding Test Score, Floor Performance Score
Pearson correlation (r) of On-boarding Test Score and Floor Performance Score = 0.786
[Figure: Scatterplot of Floor Performance Score vs On-boarding Test Score]
The r value indicates a reasonably strong positive correlation.
Scatter Plot Vs Correlation Analysis
Scatter Plot: suggests a relationship between two variables but does not quantify it
Correlation Analysis: quantifies the strength or degree of the relationship in terms of the Coefficient of Correlation, r
Pareto
What is it ?
The Pareto Principle states that only a "vital few" factors are responsible for producing most of the problems. This principle can be applied to quality improvement to the extent that a great majority of problems (80%) are produced by a few key causes (20%). If we correct these few key causes, we will have a greater probability of success.
Why use it ?
For the team to quickly focus its efforts on the key causes of a problem.
When to use it ?
Data is discrete, i.e., classified into types with frequencies for each type.
Pareto - Example
Minitab Command: Stat > Quality Tools > Pareto Chart
Chart defects table: Query Type for "Labels in" & Total received for "Frequencies in"
Pareto - Example
[Figure: Pareto Chart of No. of Queries rec'd (x-axis: Sub type; left y-axis: Total; right y-axis: Percent)]
Sub types shown: Other, ANONYMOUS, FAST CARD & PIN, PAYMENT QUERY, CLARIFICATION MAIL, CARDS RELATED, FRAUD RELATED, CARD CANCELLATION, CUSTOMER INFORMATION, DISPUTE, ADDRESS CHANGE
Bar          Total   Percent   Cum %
1             4116      23.4    23.4
2             3431      19.5    42.9
3             2709      15.4    58.2
4             2685      15.2    73.5
5             1749       9.9    83.4
6              800       4.5    88.0
7              506       2.9    90.8
8              269       1.5    92.4
9              247       1.4    93.8
10             234       1.3    95.1
11 (Other)     864       4.9   100.0
Which factors do you consider as vital from the above Pareto chart?
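The Percent and Cum % rows of a Pareto chart are simple running totals. The sketch below reproduces them from the totals shown on the chart; the labels "Type 1" to "Type 10" are placeholders introduced here, and the 80% cut-off is only the usual rule of thumb, not a fixed rule.

```python
def pareto_table(counts, other_label="Other"):
    """Sort categories by count (descending, 'Other' last) and add percentages."""
    named = sorted(
        (item for item in counts if item[0] != other_label),
        key=lambda item: item[1],
        reverse=True,
    )
    ordered = named + [item for item in counts if item[0] == other_label]
    total = sum(count for _, count in ordered)
    rows, running = [], 0
    for label, count in ordered:
        running += count
        rows.append((label, count, 100 * count / total, 100 * running / total))
    return rows

# Totals from the chart; the "Type n" labels are placeholders, not real sub types.
totals = [4116, 3431, 2709, 2685, 1749, 800, 506, 269, 247, 234]
counts = [(f"Type {i}", c) for i, c in enumerate(totals, start=1)] + [("Other", 864)]

rows = pareto_table(counts)
for label, count, pct, cum in rows:
    print(f"{label:8s} {count:6d} {pct:6.1f} {cum:6.1f}")

# "Vital few": the leading categories needed to reach roughly 80% of the total.
vital_few = []
for row in rows:
    vital_few.append(row[0])
    if row[3] >= 80:
        break
print("Vital few:", vital_few)
```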
Analyze Phase Flow:
Identify Variation using Graphical Tools: Box Plot, Scatter Plot, Pareto Analysis
Validate Causes: Hypothesis testing
What is Hypothesis Testing
Measurements are organized into statistics to provide insight into spread, shape, consistency and location of the process
A hypothesis test is simply comparing reality to an assumption and asking "Did things change?"
A hypothesis test is testing whether real data fits the model
A hypothesis test is comparing a statistic to a hypothesis
Statistical Inference Relies On Sampling From A Population Of Data
Core of Hypothesis Testing
[Diagram: Sampling From a Population. Entire Population of Data (Parameters: μ, σ) -> Sample (Statistics: x̄, s, etc.) -> Analysis -> Statistical Inference]
μ = mean of the population
σ = standard deviation of the population
x̄ = mean of the sample
s = standard deviation of the sample
Sampling saves costs and time.
Sampling provides a good alternative to collecting all the data.
Identifying a specific confidence level allows us to make reasonable business decisions.
Common terms in Hypothesis Testing
The Null Hypothesis (H0): The statement that there is no difference. It is assumed to be true unless proven otherwise. You never prove it, you only fail to reject it.
The Alternative Hypothesis (Ha): The statement that we would like to show is true. It usually defines the direction of desirable change.
The alternative hypothesis can be : >,
Hypothesis Testing P Value
P < α : Reject H0
P > α : Fail to reject H0
Alpha (α) is the maximum acceptable probability of making a Type I error (in other words, the USL for Type I error).
The p-value is the probability of seeing a result at least as extreme as the sample when the null hypothesis is true. Rejecting a null hypothesis that is actually true is a Type I error.
For most decisions, the acceptance level of a Type I error is set at α = 0.05.
Thus, any p-value less than 0.05 means we reject the null hypothesis.
Hypothesis Testing Road Map
Hypothesis testing: determining statistical differences within and between populations
Continuous data
  Comparing Means
    one sample: 1-sample t-test
    two samples: 2-sample t-test
    multiple samples: ANOVA
  Comparing Variances: Test of equal variances
Discrete data
  Comparing Proportions
    one sample: 1-Proportion test
    two sample: 2-Proportion test
  Chi-square test
Process Scenarios for Hypothesis tests
Tool: Process Scenarios
1 Sample t-test: To compare a team's performance against a target. Data set containing performance scores like daily/weekly scores. Sample size can be less than 30, but higher is better.
2 Sample t-test: To compare one team's performance against another, or to compare the performance of a team before and after improvement. Data set containing performance scores like daily/weekly scores. Sample size can be less than 30, but higher is better.
ANOVA: To compare the performance of multiple teams on a metric like Quality score. Data set containing performance scores like daily/weekly scores of multiple teams.
Test of equal variances: To compare the variance or Std deviation of one team's performance with another. Data set containing performance scores like daily/weekly scores.
1-Proportion test: To compare the proportion of defects/defectives of a team against a target.
2-Proportion test: To compare the proportion of defects/defectives of a team against another team.
Chi-square test: To check association between variables, e.g. whether there is any association between two teams w.r.t. their error types.
Hypothesis Testing t-Test Procedure
The t-test is mainly used to test differences in means. Theoretically, the t-test can be used even for small sample sizes (as small as 10) when the data is normally distributed.
The null hypothesis is that the averages of the two groups are the same.
H0 : μ1 = μ2
Ha : μ1 ≠ μ2 (or, for a one-sided test, μ1 > μ2 or μ1 < μ2)
One Sample T-test
One sample T- test is used to compare the performance of a process with the set standard/ historical data/ target.
e.g. The historical average CSI of a process is 4.35. Process Manager is interested in understanding the present CSI based on the data collected in last 15 days.
One-sample T-test
Example: Organization ABC is measuring the number of days to get money from XYZ after invoices are sent. Historical data suggests that earlier payments were received within 25 days; however, some improvement actions were implemented. The process team wanted to check whether the improvement plans have had any impact on the performance.
The sample data was collected. The times taken to receive the payments are: 22, 23, 22, 25, 28, 27, 28, 25, 23, 21 days.
Establish whether we get money in 25 days with 95% confidence.
Instructions: Stat > Basic stat > 1 sample t
Enter data as: Variable C1 Days; Test Mean: 25; Alternative: Not Equal
One-sample T-test using Minitab
Stat > Basic Stat > 1 Sample t
Minitab Output
T-Test of the Mean
Test of mu = 25.000 vs mu not = 25.000
Variable    N    Mean   StDev  SE Mean      T     P
Days       10  24.400   2.591    0.819  -0.73  0.48
Interpretation: Since p > 0.05, we fail to reject the null hypothesis: there is no statistical evidence that the improvement plan changed the process performance.
Since P is > 0.05, _________ Null Hypothesis
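The figures in the Minitab output can be checked by hand. The sketch below recomputes the mean, standard deviation, standard error and T value from the ten payment times. Because the Python standard library has no t distribution, the p-value is replaced by a comparison against a critical value taken from a t-table (2.262 for 9 degrees of freedom, two-sided, 5%); that constant is an input assumed here, not something computed.

```python
import math
import statistics

# Payment times (days) from the example, and the historical target.
days = [22, 23, 22, 25, 28, 27, 28, 25, 23, 21]
target = 25

n = len(days)
mean = statistics.mean(days)
stdev = statistics.stdev(days)          # sample standard deviation (n - 1)
se_mean = stdev / math.sqrt(n)          # standard error of the mean
t_stat = (mean - target) / se_mean

# Two-sided 5% critical value for n - 1 = 9 degrees of freedom (t-table value).
T_CRIT = 2.262
reject_h0 = abs(t_stat) > T_CRIT

print(f"N={n} Mean={mean:.3f} StDev={stdev:.3f} SE Mean={se_mean:.3f} T={t_stat:.2f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```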
Hypothesis Testing 2 Sample T Test
The 2 Sample T test is used for comparing the averages of 2 sets of readings.
The test is used when the dependent variable (response or Y) is continuous and the independent variable (factor or X) is discrete.
The test can be performed on data from independent samples stacked in a single column, with a second discrete variable in another column.
Variances may be equal or unequal.
The null hypothesis is that the means of the two groups are not different.
H0: μ1 = μ2
Ha: μ1 ≠ μ2 (or, for a one-sided test, μ1 > μ2 or μ1 < μ2)
Hypothesis Testing 2 Sample T Test
Example:
The time required for installing software by new and experienced engineers is given below. Establish whether experienced engineers are better.
Experienced: 15.80, 14.19, 15.32, 14.65, 12.25, 15.42, 12.92, 13.98, 16.28, 14.53
New: 16.10, 17.24, 17.65, 16.8, 18.42, 18.12, 15.24, 16.14, 15.26, 14.65
Hypothesis Testing 2 Sample T Test
Stat > Basic Statistics > 2 Sample t
Two-Sample T-Test and CI: Experienced, New
Two-sample T for Experienced vs New
              N   Mean  StDev  SE Mean
Experienced  10  14.53   1.26     0.40
New          10  16.56   1.29     0.41
Difference = mu (Experienced) - mu (New)
Estimate for difference: -2.028
95% CI for difference: (-3.234, -0.822)
T-Test of difference = 0 (vs not =): T-Value = -3.55 P-Value = 0.002 DF = 17
Two-Sample T-Test and CI: Experienced, New
Two-sample T for Experienced vs New
              N   Mean  StDev  SE Mean
Experienced  10  14.53   1.26     0.40
New          10  16.56   1.29     0.41
Difference = mu (Experienced) - mu (New)
Estimate for difference: -2.028
95% upper bound for difference: -1.034
T-Test of difference = 0 (vs
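The T-Value and DF in this output are consistent with an unequal-variance (Welch) two-sample t-test. The sketch below recomputes them from the raw installation times under that assumption. As before, the standard library has no t distribution, so the decision uses a one-sided 5% critical value from a t-table (1.740 for 17 degrees of freedom), which is an assumed input.

```python
import math
import statistics

experienced = [15.80, 14.19, 15.32, 14.65, 12.25, 15.42, 12.92, 13.98, 16.28, 14.53]
new = [16.10, 17.24, 17.65, 16.8, 18.42, 18.12, 15.24, 16.14, 15.26, 14.65]

def welch_t(sample_1, sample_2):
    """T statistic and degrees of freedom without assuming equal variances."""
    n1, n2 = len(sample_1), len(sample_2)
    v1 = statistics.variance(sample_1) / n1
    v2 = statistics.variance(sample_2) / n2
    diff = statistics.mean(sample_1) - statistics.mean(sample_2)
    t_value = diff / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return diff, t_value, df

diff, t_stat, df = welch_t(experienced, new)

# One-sided test (Ha: experienced mean < new mean); t-table value for 17 df at 5%.
T_CRIT_ONE_SIDED = 1.740
experienced_faster = t_stat < -T_CRIT_ONE_SIDED

print(f"Estimate for difference: {diff:.3f}")
print(f"T-Value = {t_stat:.2f}, DF = {int(df)}")
print("Experienced engineers are significantly faster:", experienced_faster)
```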
Analysis Of Variance (ANOVA)
One-way ANOVA is used to compare several sample means for two or more levels of a single factor (groups of data). In this sense, it is an extension of a two-sample t-test.
Comparing all groups at once with ANOVA is preferable to comparing two groups at a time with the two-sample t-test (pooled variance).
Hypothesis: H0: μ1 = μ2 = μ3 = ... versus Ha: there is at least one difference
ANOVA Assumption
The purpose of one-way ANOVA is to compare means. The means of different groups of data can only be compared if the variances within each group are statistically the same.
ANOVA has two assumptions:
Data for each group should be normal
The data sets have equal variances. H0: σ1² = σ2² = σ3² = ... versus Ha: there is at least one difference
ANOVA is fairly robust and can still give a good result when the assumptions are only moderately violated.
ANOVA: Example
A contact centre receives calls for different processes within the organization. The Contact Centre head wanted to understand whether the response time is affected by the process.
Response time data was collected for the 3 processes for ANOVA analysis.
Process C  Process B  Process A
6.5        7          6
6          6          5.4
6          6.5        4.4
7          5.5        5.5
6.5        6          4.5
7          6.5        5
6          7          6
7.5        6          7
6          5.5        6
5.5        5          4.5
5.5        4          3.5
6          6.5        3.5
7          6          4
ANOVA: Assumptions Testing
Stat > Basic Statistics > Graphical Summary
Summary for Process B
Anderson-Darling Normality Test: A-Squared 0.46, P-Value 0.212
Mean 5.9615, StDev 0.8282, Variance 0.6859, Skewness -1.02716, Kurtosis 1.44419, N 13
Minimum 4.0000, 1st Quartile 5.5000, Median 6.0000, 3rd Quartile 6.5000, Maximum 7.0000
95% Confidence Interval for Mean: 5.4611 to 6.4620
95% Confidence Interval for Median: 5.5000 to 6.5000
95% Confidence Interval for StDev: 0.5939 to 1.3671
Summary for Process C
Anderson-Darling Normality Test: A-Squared 0.59, P-Value 0.101
Mean 6.3462, StDev 0.6253, Variance 0.3910, Skewness 0.387879, Kurtosis -0.844201, N 13
Minimum 5.5000, 1st Quartile 6.0000, Median 6.0000, 3rd Quartile 7.0000, Maximum 7.5000
95% Confidence Interval for Mean: 5.9683 to 6.7240
95% Confidence Interval for Median: 6.0000 to 7.0000
95% Confidence Interval for StDev: 0.4484 to 1.0322
All three process response time data pass the normality test.
Even if the data is not normal, one can go ahead with test of equal variances.
ANOVA: Assumptions Testing
Assumption Testing:
Variances testing requires stacked data.
Stat > ANOVA > Test for Equal Variances
ANOVA: Example
Test for Equal Variances: Stacked versus Process
95% Bonferroni confidence intervals for standard deviations
Process     N   Lower     StDev    Upper
Process A  13   0.717000  1.07094  2.00301
Process B  13   0.554475  0.82819  1.54898
Process C  13   0.418654  0.62532  1.16955
Bartlett's Test (Normal Distribution): Test statistic = 3.25, p-value = 0.197
Levene's Test (Any Continuous Distribution): Test statistic = 1.84, p-value = 0.173
[Figure: Test for Equal Variances for Stacked; 95% Bonferroni Confidence Intervals for StDevs, by Process]
Since p value > 0.05 through Bartlett's test, data passes the test of equal variances assumption.
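For readers who want to see where the Bartlett test statistic comes from, the sketch below applies the standard textbook formula to the three response-time columns. It is an illustration rather than Minitab's code. The p-value is replaced by a comparison with the 5% chi-square critical value for 2 degrees of freedom (5.991, from a chi-square table), which is an assumed input.

```python
import math
import statistics

process_a = [6, 5.4, 4.4, 5.5, 4.5, 5, 6, 7, 6, 4.5, 3.5, 3.5, 4]
process_b = [7, 6, 6.5, 5.5, 6, 6.5, 7, 6, 5.5, 5, 4, 6.5, 6]
process_c = [6.5, 6, 6, 7, 6.5, 7, 6, 7.5, 6, 5.5, 5.5, 6, 7]

def bartlett_statistic(*groups):
    """Bartlett's test statistic for equality of variances across groups."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    variances = [statistics.variance(g) for g in groups]
    total_n = sum(sizes)
    # Pooled variance, weighted by each group's degrees of freedom.
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (total_n - k)
    numerator = (total_n - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances)
    )
    correction = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / (total_n - k)) / (3 * (k - 1))
    return numerator / correction

stat = bartlett_statistic(process_a, process_b, process_c)

# 5% critical value of chi-square with k - 1 = 2 degrees of freedom (table value).
CHI2_CRIT = 5.991
equal_variances_plausible = stat < CHI2_CRIT

print(f"Bartlett test statistic = {stat:.2f}")
print("Equal variances assumption holds:", equal_variances_plausible)
```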
ANOVA: Example
Stat > ANOVA > One way (Unstacked)
One-way ANOVA: Process A, Process B, Process C
Source  DF      SS     MS     F      P
Factor   2  12.043  6.022  8.12  0.001
Error   36  26.686  0.741
Total   38  38.729
S = 0.8610   R-Sq = 31.10%   R-Sq(adj) = 27.27%
Level       N    Mean   StDev
Process A  13  5.0231  1.0709
Process B  13  5.9615  0.8282
Process C  13  6.3462  0.6253
Pooled StDev = 0.8610
[Figure: Individual 95% CIs for Mean Based on Pooled StDev, axis from 4.80 to 6.60]
Interpretation:
Since p < 0.05, the difference in the response time is significant and the process can be called a significant factor.
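The ANOVA table can be rebuilt from the raw data by splitting the total variation into a between-process part (Factor) and a within-process part (Error). The sketch below does this with the standard library only. The decision uses a 5% critical value of roughly 3.26 for F(2, 36), read from an F-table and assumed here rather than computed.

```python
import statistics

groups = {
    "Process A": [6, 5.4, 4.4, 5.5, 4.5, 5, 6, 7, 6, 4.5, 3.5, 3.5, 4],
    "Process B": [7, 6, 6.5, 5.5, 6, 6.5, 7, 6, 5.5, 5, 4, 6.5, 6],
    "Process C": [6.5, 6, 6, 7, 6.5, 7, 6, 7.5, 6, 5.5, 5.5, 6, 7],
}

all_values = [v for values in groups.values() for v in values]
grand_mean = statistics.mean(all_values)

# Between-group (Factor) and within-group (Error) sums of squares.
ss_factor = sum(
    len(values) * (statistics.mean(values) - grand_mean) ** 2
    for values in groups.values()
)
ss_error = sum(
    (v - statistics.mean(values)) ** 2
    for values in groups.values()
    for v in values
)

df_factor = len(groups) - 1
df_error = len(all_values) - len(groups)
ms_factor = ss_factor / df_factor
ms_error = ss_error / df_error
f_stat = ms_factor / ms_error

# Approximate 5% critical value for F(2, 36), taken from an F-table.
F_CRIT = 3.26
process_is_significant = f_stat > F_CRIT

print(f"Factor: DF={df_factor} SS={ss_factor:.3f} MS={ms_factor:.3f} F={f_stat:.2f}")
print(f"Error:  DF={df_error} SS={ss_error:.3f} MS={ms_error:.3f}")
print("Process is a significant factor:", process_is_significant)
```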
Proportion Testing
Proportion Testing is used to understand whether the proportion created by the factor level is significant.
It can be of 2 types:
One Proportion Test:
Ho : PA = P0
Ha : PA >, ≠ or < P0
Two Proportion Test:
Ho : PA = PB
Ha : PA >, ≠ or < PB
Proportion Testing Example
1 Proportion Test:
An HR Services complaints resolution process is meant for resolving the complaints raised by associates. The data provided in the table gives the % of complaints resolved by the process within the 8-hour timeline. The process manager claims that the process is resolving at least 30% of the complaints on more than 80% of the occasions. Is it possible to use the 1 P test for validating the claim of the process manager?
1 Proportion Test: Example
Complaints Resolved % by Day
Days 1-10:  25, 35, 30, 36, 32, 33, 34, 36, 28, 30
Days 11-20: 29, 32, 31, 28, 35, 25, 35, 30, 36, 32
Days 21-30: 33, 34, 36, 28, 30, 29, 32, 31, 28, 35
Data Suggests:
Total no. of trials: 30
No. of events of complaints resolved >= 30% : 22
One Proportion Test:
Ho : PA = 0.8
Ha : PA > 0.8
Issues
1 Proportion Test: Example
Stat > Basic Statistics > 1 Proportion
Minitab Project Report
Test and CI for One Proportion
Test of p = 0.8 vs p > 0.8
Sample   X   N  Sample p  95% Lower Bound  Exact P-Value
1       22  30  0.733333         0.570066          0.871
Interpretation:
Since p > 0.05 through the 1 P test, we fail to reject the null hypothesis: the data does not support saying that the team is resolving at least 30% of complaints per day more than 80% of the time. The process manager's claim of providing resolution on more than 80% of the occasions is not supported.
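The exact P-Value here is a binomial tail probability: the chance of seeing 22 or more qualifying days out of 30 if the true proportion were exactly 0.8. A minimal sketch of that calculation, first recounting the events from the daily percentages:

```python
import math

# Daily "complaints resolved %" values from the table (30 days).
resolved_pct = [
    25, 35, 30, 36, 32, 33, 34, 36, 28, 30,
    29, 32, 31, 28, 35, 25, 35, 30, 36, 32,
    33, 34, 36, 28, 30, 29, 32, 31, 28, 35,
]

trials = len(resolved_pct)
events = sum(1 for pct in resolved_pct if pct >= 30)   # days meeting the 30% mark
p0 = 0.8                                                # hypothesised proportion

def binomial_upper_tail(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x, n + 1))

# Exact one-sided p-value for Ha: p > p0.
p_value = binomial_upper_tail(events, trials, p0)

print(f"X = {events}, N = {trials}, Sample p = {events / trials:.6f}")
print(f"Exact P-Value = {p_value:.3f}")
print("Reject H0" if p_value < 0.05 else "Fail to reject H0")
```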
2 Proportion Test: Example
2 Proportion Test: In an invoice processing process, the process manager is thinking of giving a promotion to one of the team members A and B. For this, he wants to look at the last 7 days of invoices processed by them to get a feel for the better performer. Can you use the 2 P test for identifying the better performer?
Data Suggests:
Team Member A:
Total no. of invoices resolved: 60
Total no. of invoices without error: 32
Team Member B:
Total no. of invoices resolved: 65
Total no. of invoices without error: 48
2 Proportion Test: Example
Stat > Basic Statistics > 2 Proportion
Test and CI for Two Proportions
Sample   X   N  Sample p
1       32  60  0.533333
2       48  65  0.738462
Difference = p (1) - p (2)
Estimate for difference: -0.205128
95% upper bound for difference: -0.0663404
Test for difference = 0 (vs < 0): Z = -2.43 P-Value = 0.008
Fisher's exact test: P-Value = 0.014
Interpretation:
Since p < 0.05 through the 2 P test, the performance of Team Member A can be considered significantly lower than the performance of Team Member B. Hence the process manager can select member B for promotion.
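The Z value and P-Value above can be reproduced with a normal-approximation test that uses a separate (unpooled) standard error for each proportion. That choice is an assumption made here because it matches the reported figures. Fisher's exact test is not covered by this sketch.

```python
import math
from statistics import NormalDist

# Error-free invoices (X) out of invoices resolved (N) for each team member.
x_a, n_a = 32, 60
x_b, n_b = 48, 65

p_a = x_a / n_a
p_b = x_b / n_b
difference = p_a - p_b

# Unpooled standard error of the difference between two proportions.
std_error = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
z = difference / std_error

# One-sided test, Ha: p(A) - p(B) < 0, so the p-value is the lower tail.
p_value = NormalDist().cdf(z)

# One-sided 95% upper bound for the difference.
upper_bound = difference + NormalDist().inv_cdf(0.95) * std_error

print(f"Estimate for difference: {difference:.6f}")
print(f"95% upper bound for difference: {upper_bound:.4f}")
print(f"Z = {z:.2f}  P-Value = {p_value:.3f}")
```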
2 Proportion Test - Exercise
On auditing two pizza outlets, 7 deliveries were late out of 155 in first one and 22 deliveries were late out of 200 in the second one. Find with 99% of confidence if the two proportions are different.
Contingency Table
A contingency table is used when both output and input variables are attribute in nature. It uses the Chi square test to reach a conclusion.
Chi Square Test:
Ho : Y is independent of X
Ha : Y is not independent of X
Contingency Table : Example
During a project for looking into the recruitment possibilities, the Personnel Department wanted to understand whether the chances of being hired are dependent upon the age of the person. Can the linkage between age and chances of being hired be statistically validated?
Hypothesis:
Ho : Hiring of a person is independent of his/ her age
Ha : Hiring of a person is not independent of his/ her age
        Hired  Not Hired  Total
Old        30        150    180
Young      45        230    275
Total      75        380    455
Data was collected for all the candidates who were taken through the recruitment process in last one year.
Old: > 35 Years
Young:
Contingency Table : Example
Stat > Table > Chi Square Test
Each cell must have a count of >=5 for going ahead with the test.
Contingency Table: Analysis in Minitab
Chi-Square Test
Expected counts are printed below observed counts
        Hired  Not Hired  Total
1          30        150    180
        29.67     150.33
2          45        230    275
        45.33     229.67
Total      75        380    455
Chi-Sq = 0.004 + 0.001 + 0.002 + 0.000 = 0.007
DF = 1, P-Value = 0.932
Interpretation:
Since p > 0.05, there is no statistical evidence that the hiring of a candidate depends on his/ her age.
[Figure: chi-square distribution, axis from 0 to 5]
The contingency table analysis compares the observed values with the values expected under independence. If there is independence, we expect the chi-square statistic to be close to 0. The further away from 0 it is, the more likely the variables are dependent. To help us make that decision, we only need to look at the p value.
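The expected counts and the Chi-Sq value can be rebuilt directly from the observed table: each expected count is (row total x column total) / grand total, and each cell contributes (observed - expected)² / expected. The sketch below does that for the hiring data. The p-value shortcut via `math.erfc` is valid only for 1 degree of freedom, i.e. a 2 x 2 table like this one.

```python
import math

# Observed counts: rows are Old, Young; columns are Hired, Not Hired.
observed = [
    [30, 150],
    [45, 230],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence for each cell.
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

chi_sq = sum(
    (obs - exp) ** 2 / exp
    for obs_row, exp_row in zip(observed, expected)
    for obs, exp in zip(obs_row, exp_row)
)
dof = (len(observed) - 1) * (len(observed[0]) - 1)

# For exactly 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_sq / 2))

print("Expected counts:", [[round(e, 2) for e in row] for row in expected])
print(f"Chi-Sq = {chi_sq:.3f}, DF = {dof}, P-Value = {p_value:.3f}")
```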
Chi Square Test : Exercise
Are ladies more likely to be right handed compared to gentlemen?
Hypothesis:
Ho : There is no relationship between gender & dexterity
Ha : There is a relationship between gender & dexterity
End of Analyse Phase