A Six Sigma Analysis of Mobile Data Usage
Minitab Insights Conference
September 13, 2016
Brandon Theiss, PE, ESQ
Brandon.Theiss@gmail.com
Motivation
Is my current mobile data plan with Republic Wireless optimal given my data usage?
Learning Objectives
• Apply the Six Sigma Methodology to Non-Traditional Applications
• Utilize Monte Carlo simulations to make predictions
• Utilize Non-Parametric Hypothesis testing
• Utilize Process Capability to determine specification limitations for non-normal data
4 Major Mobile Phone Carriers
Plans Offered By Verizon
20% of Verizon customers charged overages in past year*
Plans Offered By AT&T
28% of AT&T customers charged overages in past year*
Plans Offered By T-Mobile
12% of T-Mobile customers charged overages in past year*
Plans Offered By Sprint
5% of Sprint customers charged overages in past year*
Plans Offered By Republic Wireless
The Data Set
Data was collected from March 23, 2015 through March 24, 2016
Comparison of Carriers Small Data Plans
Data Speed Potentially Decreased
Comparison of Carriers Medium Data Plans
Data Speed Potentially Decreased
Comparison of Carriers Large Data Plans
Comparison of Carriers X-Large Data Plans
Chart of Annual Cost
[Figure: bar chart of Annual Cost ($0.00 to $1,600.00) by Plan. Plans in axis order: ATT (15GB), Verizon (12GB), Verizon (1GB), Sprint (1GB), ATT (2GB), Republic (5GB), Verizon (3GB), Sprint (12GB), T-Mobile (10GB), Verizon (6GB), ATT (5GB), Sprint (3GB), Sprint (6GB), T-Mobile (6GB), Republic (3GB), T-Mobile (2GB), Republic (2GB)]
How Much Would Each Plan have cost for the Year?
Summary Report for Total Monthly Usage
Anderson-Darling Normality Test: A-Squared 0.25, P-Value 0.687
Mean 3755.8
StDev 1107.0
Variance 1225527.2
Skewness 0.314666
Kurtosis -0.123559
N 12
Minimum 1911.6
1st Quartile 3053.3
Median 3504.6
3rd Quartile 4588.6
Maximum 5905.4
95% Confidence Interval for Mean: 3052.5 to 4459.2
95% Confidence Interval for Median: 3054.0 to 4587.4
95% Confidence Interval for StDev: 784.2 to 1879.6
[Figure: histogram of Total Monthly Usage with boxplot and 95% confidence intervals for the mean and median]
A First Statistical Approach (monthly data)
Is The Data Normally Distributed?
I-MR Chart of Total Monthly Usage
Individual Value chart (Observation 1 to 12): X-bar = 3756, UCL = 6424, LCL = 1088
Moving Range chart (Observation 1 to 12): MR-bar = 1003, UCL = 3278, LCL = 0
Is The Data In Statistical Control?
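The limits printed on the I-MR chart can be checked by hand. The sketch below assumes the usual constants for a moving range of two consecutive points (2.66 for the individuals chart, 3.267 for the moving range chart); the function names are illustrative and are not taken from the presentation.

```python
from statistics import mean

# Standard I-MR constants for a moving range of two consecutive points.
E2 = 2.66    # 3 / d2, with d2 = 1.128
D4 = 3.267   # upper-limit factor for the moving range chart


def imr_limits(values):
    """Centre lines and control limits for an I-MR chart from raw values."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    x_bar = mean(values)
    mr_bar = mean(moving_ranges)
    return {
        "x_bar": x_bar,
        "x_ucl": x_bar + E2 * mr_bar,
        "x_lcl": x_bar - E2 * mr_bar,
        "mr_bar": mr_bar,
        "mr_ucl": D4 * mr_bar,
        "mr_lcl": 0.0,
    }


def limits_from_summary(x_bar, mr_bar):
    """Same limits, starting from the summary values printed on a chart."""
    return (x_bar - E2 * mr_bar, x_bar + E2 * mr_bar, D4 * mr_bar)


# Values read off the slide: X-bar = 3756, MR-bar = 1003.
lcl, ucl, mr_ucl = limits_from_summary(3756, 1003)
print(round(lcl), round(ucl), round(mr_ucl))
```

With the rounded chart values as input, the individuals limits agree with the slide; the moving range limit lands within a unit or two of the printed 3278, which is what one would expect if Minitab used an unrounded MR-bar.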
Plan Annual Cost
ATT (2GB) $ 1,065.00
Sprint (1GB) $ 1,065.00
Is a 1GB (1,000MB) Limit Appropriate?
Is a 2GB (2,000MB) Limit Appropriate?
Plan Annual Cost
Republic (2GB) $ 480.00
T-Mobile (2GB) $ 600.00
ATT (2GB) $ 1,065.00
Plan Annual Cost
Republic (3GB) $ 660.00
Sprint (3GB) $ 840.00
Verizon (3GB) $ 1,020.00
Is a 3GB (3,000MB) Limit Appropriate?
Plan Annual Cost
ATT (5GB) $ 915.00
Republic (5GB) $ 1,020.00
ATT (5GB) $ 1,500.00
Is a 5GB (5,000MB) Limit Appropriate?
Plan Annual Cost
T-Mobile (6GB) $ 780.00
Sprint (6GB) $ 780.00
Verizon (6GB) $ 960.00
Is a 6GB (6,000MB) Limit Appropriate?
Plan Annual Cost
T-Mobile (10GB) $ 960.00
Is a 10GB (10,000MB) Limit Appropriate?
~6 Sigma!
Plan Annual Cost
Sprint (12GB) $ 960.00
Verizon (12GB) $ 1,200.00
Is a 12GB (12,000MB) Limit Appropriate?
Greater than 6 Sigma!
Time Series Plot of Data Usage
[Figure: daily Data Usage (y-axis, 0 to 1200) versus Date (x-axis, 3/24/2015 to 2/19/2016)]
A Second Statistical Approach (daily data)
Summary Report for Data Usage
Anderson-Darling Normality Test: A-Squared 27.78, P-Value <0.005
Mean 123.14
StDev 102.64
Variance 10535.65
Skewness 3.9407
Kurtosis 26.1682
N 366
Minimum 0.00
1st Quartile 69.13
Median 96.70
3rd Quartile 138.00
Maximum 1100.00
95% Confidence Interval for Mean: 112.59 to 133.69
95% Confidence Interval for Median: 88.25 to 102.97
95% Confidence Interval for StDev: 95.71 to 110.67
[Figure: histogram of Data Usage with boxplot and 95% confidence intervals for the mean and median]
Descriptive Statistics On Daily Usage
If The Data Is Not Normal What Approximates The Data?
The Johnson Transformation of the Data
I-MR Chart of Transformed Data Usage
Individual Value chart (by Billing Cycle): X-bar = -0.003, UCL = 2.430, LCL = -2.436
Moving Range chart (by Billing Cycle): MR-bar = 0.915, UCL = 2.989, LCL = 0
Points flagged "1" appear on the chart.
Is the Data In Statistical Control?
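Boxplot of Data Usage
[Figure: boxplots of Data Usage (y-axis, 0 to 1200) by Billing Cycle (x-axis, 1 to 12). Mean labels on the plot, in extraction order (the cycle each belongs to is not recoverable): 106.752, 61.6645, 103.348, 89.1433, 104.029, 150.58, 156.348, 190.497, 101.303, 122.071, 153.747, 137.239]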
A Third Statistical Approach
One-way ANOVA: Data Usage versus Billing Cycle

Method
Null hypothesis          All means are equal
Alternative hypothesis   At least one mean is different
Significance level       α = 0.05
Equal variances were assumed for the analysis.

Factor Information
Factor          Levels   Values
Billing Cycle   12       1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12

Analysis of Variance
Source          DF    Adj SS    Adj MS   F-Value   P-Value
Billing Cycle   11    429109    39010    4.04      0.000
Error           354   3416405   9651
Total           365   3845514

Model Summary
S         R-sq     R-sq(adj)   R-sq(pred)
98.2388   11.16%   8.40%       4.99%
Is There Statistically Significant Difference Between The Months?
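The derived figures in the ANOVA output follow directly from the sums of squares and degrees of freedom. A minimal sketch of that arithmetic, using only the numbers in the ANOVA table (the function name is illustrative):

```python
def anova_summary(ss_factor, df_factor, ss_error, df_error):
    """Recompute F, S and R-squared values from a one-way ANOVA table."""
    ms_factor = ss_factor / df_factor      # mean square between groups
    ms_error = ss_error / df_error         # mean square within groups
    ss_total = ss_factor + ss_error
    df_total = df_factor + df_error
    return {
        "F": ms_factor / ms_error,
        "S": ms_error ** 0.5,
        "R-sq": 100 * ss_factor / ss_total,
        "R-sq(adj)": 100 * (1 - ms_error / (ss_total / df_total)),
    }


# Sums of squares and degrees of freedom from the Billing Cycle ANOVA table.
result = anova_summary(429109, 11, 3416405, 354)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

The low R-sq is worth noting alongside the small p-value: billing cycle is a statistically significant factor, yet it accounts for only about a ninth of the day-to-day variation.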
But ANOVA Requires The Data to be Normal
Kruskal-Wallis Test: Data Usage versus Billing Cycle
Kruskal-Wallis Test on Data Usage

Billing
Cycle       N    Median   Ave Rank       Z
1          31    108.80      217.0    1.84
2          30    130.50      249.8    3.58
3          31     88.40      187.4    0.21
4          30     85.60      160.9   -1.22
5          31    137.90      265.5    4.51
6          31    129.40      234.7    2.82
7          30     88.15      182.3   -0.07
8          31     93.80      187.9    0.24
9          30     75.70      135.9   -2.57
10         31     75.00      148.9   -1.90
11         31     62.50       86.3   -5.35
12         29     73.20      142.6   -2.17
Overall   366                183.5

H = 82.19  DF = 11  P = 0.000
H = 82.19  DF = 11  P = 0.000 (adjusted for ties)
A First Non-Parametric Approach
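For readers without Minitab, the Kruskal-Wallis H statistic is straightforward to compute. The sketch below gives a tie-corrected H from raw observations and a shortcut that recovers H from the N and Ave Rank columns of the output; both function names are illustrative, and the daily values themselves are not reproduced here.

```python
def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H from raw observations per group."""
    pooled = sorted(x for g in groups for x in g)
    n_total = len(pooled)
    rank_of = {}     # value -> average rank (ties share the mean rank)
    tie_term = 0
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2   # mean of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        i = j
    h = 12 / (n_total * (n_total + 1)) * sum(
        sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups
    ) - 3 * (n_total + 1)
    return h / (1 - tie_term / (n_total ** 3 - n_total))


def h_from_rank_summary(sizes, avg_ranks):
    """Uncorrected H from group sizes and average ranks."""
    n_total = sum(sizes)
    centre = (n_total + 1) / 2
    return 12 / (n_total * (n_total + 1)) * sum(
        n * (r - centre) ** 2 for n, r in zip(sizes, avg_ranks)
    )


# N and Ave Rank columns from the Billing Cycle output.
SIZES = [31, 30, 31, 30, 31, 31, 30, 31, 30, 31, 31, 29]
AVG_RANKS = [217.0, 249.8, 187.4, 160.9, 265.5, 234.7,
             182.3, 187.9, 135.9, 148.9, 86.3, 142.6]
print(round(h_from_rank_summary(SIZES, AVG_RANKS), 1))
```

Because the average ranks in the output are rounded to one decimal, the shortcut lands close to, but not exactly on, the reported H of 82.19.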
Histogram of Data Usage
Panel variable: Billing Cycle
[Figure: twelve histogram panels, one per Billing Cycle (1 to 12), each showing Frequency (y-axis, 0 to 20) versus Data Usage (x-axis, 0 to 1050)]
Kruskal-Wallis Test Requires The Distributions To Have Similar Shapes
Mood Median Test: Data Usage versus Billing Cycle
Mood median test for Data Usage
Chi-Square = 70.53  DF = 11  P = 0.000

Billing
Cycle    N≤    N>    Median   Q3-Q1
1        10    21    109      68
2         4    26    131      45
3        17    14     88      59
4        19    11     86      46
5         5    26    138     156
6         8    23    129      81
7        16    14     88      78
8        17    14     94      44
9        21     9     76      44
10       22     9     75      46
11       26     5     63      36
12       18    11     73      83

Overall median = 97
[Individual 95.0% CIs for each median were plotted as a character graph on a scale from 60 to 240; the alignment is not recoverable from the extraction]
A Second Non-Parametric Approach
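The Mood median chi-square can be reproduced from the N≤ and N> columns alone, since the test is an ordinary Pearson chi-square on the 2 x 12 table of counts at or below versus above the overall median. A minimal sketch (the function name is illustrative):

```python
def mood_median_chi_square(at_or_below, above):
    """Pearson chi-square and degrees of freedom for a 2 x k median table."""
    col_totals = [a + b for a, b in zip(at_or_below, above)]
    row_totals = [sum(at_or_below), sum(above)]
    grand = sum(row_totals)
    chi2 = 0.0
    for row, row_total in zip((at_or_below, above), row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand
            chi2 += (observed - expected) ** 2 / expected
    return chi2, len(col_totals) - 1


# Counts per billing cycle from the Mood median output.
N_LE = [10, 4, 17, 19, 5, 8, 16, 17, 21, 22, 26, 18]
N_GT = [21, 26, 14, 11, 26, 23, 14, 14, 9, 9, 5, 11]

chi2, df = mood_median_chi_square(N_LE, N_GT)
print(f"Chi-Square = {chi2:.2f}  DF = {df}")
```

Billing cycles 2, 5 and 11 contribute most of the statistic, which matches the cycles with the most lopsided counts in the table.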
A Fourth Statistical Approach
Boxplot of Data Usage
[Figure: boxplots of Data Usage (y-axis, 0 to 1200) by Day Of Week. Mean labels: Sunday 163.094, Monday 127.687, Tuesday 87.934, Wednesday 116.76, Thursday 125.612, Friday 117.36, Saturday 124.35]
A Fifth Statistical Approach (by days of the week)
Histogram of Data Usage
Panel variable: Day Of Week
[Figure: seven histogram panels, Sunday through Saturday, each showing Frequency (y-axis, 0 to 20) versus Data Usage (x-axis, 0 to 1050)]
What Do The Distributions Of Each Day Look Like?
SUNDAY
Summary Report for Data Usage
Anderson-Darling Normality Test: A-Squared 4.65, P-Value <0.005
Mean 163.09
StDev 169.77
Variance 28821.13
Skewness 3.7420
Kurtosis 18.3156
N 52
Minimum 0.00
1st Quartile 74.32
Median 120.15
3rd Quartile 197.35
Maximum 1100.00
95% Confidence Interval for Mean: 115.83 to 210.36
95% Confidence Interval for Median: 84.71 to 152.41
95% Confidence Interval for StDev: 142.27 to 210.53
Sunday Descriptive Statistics
What Distribution Models Sunday?
Histogram of Data Usage, 3-Parameter Weibull
Shape # 1.369
Scale # 125.2
Thresh # 26.73
N 50
# This estimated historical parameter is used in the calculations.
[Figure: histogram of Data Usage (x-axis) versus Frequency (y-axis) with the fitted 3-parameter Weibull curve]
A 3-Parameter Weibull Models Sunday Data
Red Bars indicate outliers that were excluded from parameter determination
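To make the fitted parameters concrete, the sketch below evaluates a 3-parameter Weibull (shape, scale, threshold) using the Sunday values from the slide. It assumes the common parameterisation in which the threshold shifts the distribution to the right; the function names are illustrative.

```python
import math


def weibull3_cdf(x, shape, scale, thresh):
    """P(X <= x) for a Weibull shifted right by `thresh`."""
    if x <= thresh:
        return 0.0
    return 1.0 - math.exp(-(((x - thresh) / scale) ** shape))


def weibull3_mean(shape, scale, thresh):
    """Mean of the shifted Weibull."""
    return thresh + scale * math.gamma(1 + 1 / shape)


def weibull3_quantile(p, shape, scale, thresh):
    """Inverse CDF, for 0 <= p < 1."""
    return thresh + scale * (-math.log(1 - p)) ** (1 / shape)


# Sunday parameters as printed on the slide.
SUNDAY = dict(shape=1.369, scale=125.2, thresh=26.73)

print(f"mean   = {weibull3_mean(**SUNDAY):.1f} MB")
print(f"median = {weibull3_quantile(0.5, **SUNDAY):.1f} MB")
print(f"P(usage > 500 MB) = {1 - weibull3_cdf(500, **SUNDAY):.4f}")
```

The fitted median sits close to the observed Sunday median of 120.15 MB, while the fitted mean is below the observed 163.09 MB, as expected when the largest observations were left out of the parameter estimation.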
MONDAY
Summary Report for Data Usage
Anderson-Darling Normality Test: A-Squared 5.23, P-Value <0.005
Mean 127.69
StDev 119.76
Variance 14342.75
Skewness 2.55234
Kurtosis 7.19780
N 52
Minimum 0.00
1st Quartile 67.92
Median 89.85
3rd Quartile 134.92
Maximum 619.30
95% Confidence Interval for Mean: 94.34 to 161.03
95% Confidence Interval for Median: 78.71 to 108.15
95% Confidence Interval for StDev: 100.37 to 148.52
Monday Descriptive Statistics
What Distribution Models Monday?
Histogram of Data Usage, 3-Parameter Weibull
Shape # 1.916
Scale # 74.12
Thresh # 29.30
N 48
# This estimated historical parameter is used in the calculations.
[Figure: histogram of Data Usage (x-axis) versus Frequency (y-axis) with the fitted 3-parameter Weibull curve]
A 3-Parameter Weibull Models Monday Data
Red Bars indicate outliers that were excluded from parameter determination
TUESDAY
Summary Report for Data Usage
Anderson-Darling Normality Test: A-Squared 1.76, P-Value <0.005
Mean 87.934
StDev 45.017
Variance 2026.544
Skewness 2.02797
Kurtosis 7.44336
N 53
Minimum 0.000
1st Quartile 61.250
Median 81.400
3rd Quartile 105.600
Maximum 289.700
95% Confidence Interval for Mean: 75.526 to 100.342
95% Confidence Interval for Median: 72.217 to 89.345
95% Confidence Interval for StDev: 37.785 to 55.699
Tuesday Descriptive Statistics
What Distribution Models Tuesday?
Histogram of Data Usage, 3-Parameter Weibull
Shape # 1.882
Scale # 57.02
Thresh # 34.60
N 51
# This estimated historical parameter is used in the calculations.
[Figure: histogram of Data Usage (x-axis) versus Frequency (y-axis) with the fitted 3-parameter Weibull curve]
A 3-Parameter Weibull Models Tuesday Data
Red Bars indicate outliers that were excluded from parameter determination
WEDNESDAY
Summary Report for Data Usage
Anderson-Darling Normality Test: A-Squared 2.07, P-Value <0.005
Mean 116.76
StDev 70.53
Variance 4974.95
Skewness 1.10549
Kurtosis 0.67508
N 53
Minimum 0.00
1st Quartile 69.00
Median 97.10
3rd Quartile 154.00
Maximum 321.50
95% Confidence Interval for Mean: 97.32 to 136.20
95% Confidence Interval for Median: 77.27 to 113.79
95% Confidence Interval for StDev: 59.20 to 87.27
Wednesday Descriptive Statistics
What Distribution Models Wednesday?
Histogram of Data Usage, 3-Parameter Weibull
Shape 1.430
Scale 104.1
Thresh 24.66
N 52
[Figure: histogram of Data Usage (x-axis) versus Frequency (y-axis) with the fitted 3-parameter Weibull curve]
A 3-Parameter Weibull Models Wednesday Data
THURSDAY
Summary Report for Data Usage
Anderson-Darling Normality Test: A-Squared 1.99, P-Value <0.005
Mean 125.61
StDev 73.13
Variance 5347.80
Skewness 2.03570
Kurtosis 6.42894
N 52
Minimum 42.10
1st Quartile 74.90
Median 102.00
3rd Quartile 163.27
Maximum 449.50
95% Confidence Interval for Mean: 105.25 to 145.97
95% Confidence Interval for Median: 81.45 to 134.75
95% Confidence Interval for StDev: 61.29 to 90.69
Thursday Descriptive Statistics
What Distribution Models Thursday?
Histogram of Data Usage, 3-Parameter Weibull
Shape # 1.364
Scale # 85.54
Thresh # 40.89
N 52
# This estimated historical parameter is used in the calculations.
[Figure: histogram of Data Usage (x-axis) versus Frequency (y-axis) with the fitted 3-parameter Weibull curve]
A 3-Parameter Weibull Models Thursday Data
Red Bars indicate outliers that were excluded from parameter determination
FRIDAY
Summary Report for Data Usage
Anderson-Darling Normality Test: A-Squared 4.30, P-Value <0.005
Mean 117.36
StDev 81.84
Variance 6697.42
Skewness 2.21566
Kurtosis 5.35910
N 52
Minimum 10.70
1st Quartile 67.70
Median 100.95
3rd Quartile 122.95
Maximum 435.30
95% Confidence Interval for Mean: 94.58 to 140.14
95% Confidence Interval for Median: 84.26 to 105.70
95% Confidence Interval for StDev: 68.58 to 101.49
Friday Descriptive Statistics
What Distribution Models Friday?
Histogram of Data Usage, 3-Parameter Weibull
Shape # 1.670
Scale # 61.32
Thresh # 39.17
N 51
# This estimated historical parameter is used in the calculations.
[Figure: histogram of Data Usage (x-axis) versus Frequency (y-axis) with the fitted 3-parameter Weibull curve]
A 3-Parameter Weibull Models Friday Data
Red Bars indicate outliers that were excluded from parameter determination
SATURDAY
Summary Report for Data Usage
Anderson-Darling Normality Test: A-Squared 4.52, P-Value <0.005
Mean 124.35
StDev 100.17
Variance 10033.47
Skewness 2.79744
Kurtosis 9.97571
N 52
Minimum 0.00
1st Quartile 69.73
Median 101.85
3rd Quartile 137.40
Maximum 597.70
95% Confidence Interval for Mean: 96.46 to 152.24
95% Confidence Interval for Median: 82.46 to 121.30
95% Confidence Interval for StDev: 83.94 to 124.22
Saturday Descriptive Statistics
What Distribution Models Saturday?
Histogram of Data Usage, 3-Parameter Weibull
Shape # 1.246
Scale # 69.33
Thresh # 44.41
N 50
# This estimated historical parameter is used in the calculations.
[Figure: histogram of Data Usage (x-axis) versus Frequency (y-axis) with the fitted 3-parameter Weibull curve]
A 3-Parameter Weibull Models Saturday Data
Red Bars indicate outliers that were excluded from parameter determination
THE SIMULATION
The Simulation Equation
Bill 1 =
  Tuesday + Wednesday + Thursday + Friday + Saturday
  + Sunday + Monday + Tuesday + Wednesday + Thursday + Friday + Saturday
  + Sunday + Monday + Tuesday + Wednesday + Thursday + Friday + Saturday
  + Sunday + Monday + Tuesday + Wednesday + Thursday + Friday + Saturday
  + Sunday + Monday + Tuesday + Wednesday + Thursday
(each term is one simulated daily usage value for that day of the week; 31 terms in total)
Sunday Monday Tuesday Wednesday Thursday Friday Saturday Total
Bill1 4 4 5 5 5 4 4 31
Bill2 4 4 4 4 4 5 5 30
Bill3 5 5 5 4 4 4 4 31
Bill4 4 4 4 5 5 4 4 30
Bill5 5 4 4 4 4 5 5 31
Bill6 4 5 5 5 4 4 4 31
Bill7 4 4 4 4 5 5 4 30
Bill8 5 5 4 4 4 4 5 31
Bill9 4 4 5 5 4 4 4 30
Bill10 4 4 4 4 5 5 5 31
Bill11 5 5 5 4 4 4 4 31
Bill12 4 4 4 5 4 4 4 29
The Simulation Parameters
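A Monte Carlo model of this kind can be sketched with the standard library alone. The sketch below draws each day's usage from the 3-parameter Weibull fitted to that weekday and sums the draws according to the day counts per billing cycle. Several things here are assumptions rather than the presenter's method: days are drawn independently, a billing cycle is picked at random for each run, and the excluded outliers are not modelled, so the output will not necessarily match the results shown in the slides.

```python
import random
from statistics import mean

# (shape, scale, threshold) per weekday, as printed on the Weibull slides.
WEIBULL = {
    "Sunday": (1.369, 125.2, 26.73),
    "Monday": (1.916, 74.12, 29.30),
    "Tuesday": (1.882, 57.02, 34.60),
    "Wednesday": (1.430, 104.1, 24.66),
    "Thursday": (1.364, 85.54, 40.89),
    "Friday": (1.670, 61.32, 39.17),
    "Saturday": (1.246, 69.33, 44.41),
}
DAYS = list(WEIBULL)

# Number of each weekday (Sunday first) in Bill1 .. Bill12.
DAY_COUNTS = [
    (4, 4, 5, 5, 5, 4, 4), (4, 4, 4, 4, 4, 5, 5), (5, 5, 5, 4, 4, 4, 4),
    (4, 4, 4, 5, 5, 4, 4), (5, 4, 4, 4, 4, 5, 5), (4, 5, 5, 5, 4, 4, 4),
    (4, 4, 4, 4, 5, 5, 4), (5, 5, 4, 4, 4, 4, 5), (4, 4, 5, 5, 4, 4, 4),
    (4, 4, 4, 4, 5, 5, 5), (5, 5, 5, 4, 4, 4, 4), (4, 4, 4, 5, 4, 4, 4),
]


def draw_day(day, rng):
    """One simulated day of usage (MB) for the given weekday."""
    shape, scale, thresh = WEIBULL[day]
    # random.weibullvariate(alpha, beta): alpha is the scale, beta the shape.
    return thresh + rng.weibullvariate(scale, shape)


def simulate_bill(counts, rng):
    """Total usage for one billing cycle with the given weekday counts."""
    return sum(draw_day(day, rng)
               for day, n in zip(DAYS, counts) for _ in range(n))


def simulate(n_runs=10_000, seed=1):
    """Simulated monthly totals; the cycle is chosen at random (assumption)."""
    rng = random.Random(seed)
    return [simulate_bill(rng.choice(DAY_COUNTS), rng) for _ in range(n_runs)]


def share_over(totals, limit_mb):
    """Fraction of simulated months whose usage exceeds the limit."""
    return sum(t > limit_mb for t in totals) / len(totals)


totals = simulate()
print(f"mean simulated bill: {mean(totals):.0f} MB")
for gb in (2, 3, 5, 6):
    print(f"share of months over {gb}GB: {share_over(totals, gb * 1000):.2%}")
```

Seeding the generator keeps a run repeatable; changing `seed` or `n_runs` gives a feel for how stable the tail percentages are.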
The Simulation Results
ASSESSING CAPABILITY FROM SIMULATION RESULTS
Is a 1GB (1,000MB) Limit Appropriate?
Is a 2GB (2,000MB) Limit Appropriate?
Is a 3GB (3,000MB) Limit Appropriate?
Is a 4GB (4,000MB) Limit Appropriate?
Is a 5GB (5,000MB) Limit Appropriate?
Is a 6GB (6,000MB) Limit Appropriate?
Is a 10GB (10,000MB) Limit Appropriate?
Is a 12GB (12,000MB) Limit Appropriate?
Data Usage (GB)   <1       1-2      2-3       3-4       4-5       5-6      >6
Probability       0.000%   0.330%   29.190%   52.890%   15.850%   1.650%   0.090%

Plan            Base Charge   1-2     2-3     3-4     4-5     5-6     >6      Expected Monthly Charge
Sprint (1GB)    $ 40.00       $ 0.05  $ 4.38  $ 7.93  $ 2.38  $ 0.25  $ 0.01  $ 55.00
Sprint (3GB)    $ 50.00       -       -       $ 7.93  $ 2.38  $ 0.25  $ 0.01  $ 60.57
Sprint (6GB)    $ 65.00       -       -       -       -       -       -       $ 65.00
VZ (1GB)        $ 50.00       $ 0.05  $ 4.38  $ 7.93  $ 2.38  $ 0.25  $ 0.01  $ 65.00
ATT (2GB)       $ 55.00       -       $ 4.38  $ 7.93  $ 2.38  $ 0.25  $ 0.01  $ 69.95
ATT (5GB)       $ 75.00       -       -       -       -       $ 0.25  $ 0.01  $ 75.26
VZ (3GB)        $ 65.00       -       -       $ 7.93  $ 2.38  $ 0.25  $ 0.01  $ 75.57
Sprint (12GB)   $ 80.00       -       -       -       -       -       -       $ 80.00
VZ (6GB)        $ 80.00       -       -       -       -       -       $ 0.01  $ 80.01
VZ (12GB)       $ 100.00      -       -       -       -       -       -       $ 100.00
ATT (15GB)      $ 125.00      -       -       -       -       -       -       $ 125.00
Plan Selection Based on Simulation
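The expected monthly charges in the plan table are consistent with a simple rule: the base charge plus a flat overage multiplied by the simulated probability of exceeding the plan's limit. The sketch below assumes a $15 overage, which is inferred from the table's arithmetic (for example 29.190% x $15 = $4.38) and is not stated in the slides; the function name is illustrative. It only covers limits up to 6GB because the last probability bucket is open-ended.

```python
# Simulated probability of a month's usage falling in each bucket,
# keyed by the bucket's upper edge in GB (the last bucket is "> 6").
BUCKETS = [
    (1, 0.00000), (2, 0.00330), (3, 0.29190), (4, 0.52890),
    (5, 0.15850), (6, 0.01650), (float("inf"), 0.00090),
]
OVERAGE = 15.00  # assumption: flat charge implied by the table's arithmetic


def expected_monthly_charge(base_price, cap_gb, overage=OVERAGE):
    """Base price plus overage weighted by the chance of exceeding the cap."""
    if cap_gb > 6:
        raise ValueError("buckets above 6GB are not resolved in the table")
    p_over = sum(p for upper, p in BUCKETS if upper > cap_gb)
    return base_price + overage * p_over


for plan, base, cap in [("Sprint (1GB)", 40, 1), ("Sprint (3GB)", 50, 3),
                        ("ATT (2GB)", 55, 2), ("ATT (5GB)", 75, 5)]:
    print(f"{plan}: ${expected_monthly_charge(base, cap):.2f}")
```

Plans that show no overage in the table, such as Sprint (6GB), can be handled by passing `overage=0`.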
         Measured            Simulation
Limit    Ppk      %          Ppk       %
1GB      -0.83    99.36%     -1.22     100%
2GB      -0.53    94.36%     -0.7047   99.67%
3GB      -0.23    75.26%     -0.1984   70.48%
5GB      0.37     13.05%     0.81      1.74%
6GB      0.68     2.13%      1.31      0.09%
10GB     1.88     0.00%      3.33      0.00%
12GB     2.48     0.00%      4.35      0.00%
Comparison of Simulated and Measured Capability
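The Measured columns appear to follow from the monthly summary statistics (mean 3755.8 MB, StDev 1107.0 MB) treated as normal with only an upper specification limit: Ppk = (USL - mean) / (3 x StDev), and the percentage is the share of months expected above the limit. That reading is an inference from the numbers rather than something the slides state, and the function names below are illustrative.

```python
from statistics import NormalDist

MEAN_MB = 3755.8    # mean of the 12 monthly totals
STDEV_MB = 1107.0   # standard deviation of the 12 monthly totals


def ppk_upper(limit_mb, mean=MEAN_MB, stdev=STDEV_MB):
    """One-sided capability index against an upper limit."""
    return (limit_mb - mean) / (3 * stdev)


def percent_over(limit_mb, mean=MEAN_MB, stdev=STDEV_MB):
    """Normal-theory percentage of months above the limit."""
    return 100 * (1 - NormalDist(mean, stdev).cdf(limit_mb))


for gb in (1, 2, 3, 5, 6, 10, 12):
    limit = gb * 1000
    print(f"{gb}GB  Ppk = {ppk_upper(limit):.2f}  "
          f"over = {percent_over(limit):.2f}%")
```

The Simulation columns cannot be rebuilt the same way, since they depend on the simulated distribution of monthly totals rather than on a normal approximation.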
Conclusion
• Mobile Phone Data usage can be analyzed using:
– Descriptive Statistics
– Run Charts
– Probability Plots
– Control Chart
– Process Capability
• Non-Normal Data requires different hypothesis tests, including:
– Kruskal-Wallis
– Mood Median
• A Stochastic Simulation Model can be created by:
– Determining a distribution that characterizes each factor
– Specifying a mathematical relationship between the factors
• A Process Capability analysis on simulated data can be used to determine specification limits
Questions?
Contact Information:
Brandon R. Theiss, PE
Rutgers School of Law- Camden
Brandon.Theiss@Rutgers.edu