Practical Statistical Testing

Adrian Cuyugan, 18 August 2014

Agenda

I. T-test and Z-test

II. Chi-square Test of Independence

III. I-MR Control Charts

IV. Binary Logistic Regression

V. Data Sources

T-test and Z-test

Testing the Difference of Means on a Two-Tailed Test

Problem Statement

Is there a significant difference between the mean of Forecasted Calls and the mean of Calls Offered?

Data Overview

Daily forecasted call volume is produced automatically in Blue Pumpkin and is prepared by the Global Workforce Management Team. Staffing, two years of historical data and other factors are used to produce this forecast.

Calls offered are the actual calls that came into the IVR as initiated by the user. The data sample covers April to July 2014.

The data has a bimodal shape because weekends have fewer calls. If the underlying problem is highly sensitive, it is wiser to perform two separate analyses, one on weekdays and one on weekends. Because the goal here is only to compare the two groups, forecasted and offered, weekdays and weekends are combined. Another implication of removing the weekends is that the data would be extremely skewed to the left.

The samples are collected from two different populations over the same time period, each is less than 10% of its population, and each has more than 30 observations, which is enough to perform inference. Assuming that the data is normally distributed, we can start exploring the data further.

Hypothesis

H0 – Mean Forecasted Calls = Mean Calls Offered; the difference in means is 0.

HA – Mean Forecasted Calls ≠ Mean Calls Offered; the difference in means is not 0.


Exploratory

The difference between the forecasted calls and the calls offered varies each month. Taken as a whole, the difference in the averages is 6.4 calls in favor of forecasted calls. It is also noticeable that the difference in the standard deviations is just 0.25 calls in favor of calls offered.

                   Min.   1st Qu.   Median   Mean    3rd Qu.   Max.   SD      n
Calls Forecasted   13     23        188      155.1   221       278    89.7    122
Calls Offered      7      23.25     177.5    148.7   210.5     287    89.95   122


Variance test

A non-significant p-value does not mean that the variances are equal, only that there is insufficient evidence to reject the null hypothesis that the variances are equal.

F-test = 0.9945
Numerator DF = 121, Denominator DF = 121
95% confidence interval = 0.695185, 1.422628
p-value = 0.9758

T-test

t = 0.5588
df = 241.998
95% CI = -16.22862, 29.08108
Mean of Forecasted Calls = 155.1230
Mean of Calls Offered = 148.6967
p-value = 0.5768

Z-test

z = 0.5588
95% CI = -16.11532, 28.96778
Mean of Forecasted Calls = 155.1230
Mean of Calls Offered = 148.6967
p-value = 0.5763
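The reported test statistic, degrees of freedom and z-based results can be approximately reproduced from the summary statistics alone. The following is a minimal Python sketch, not the original analysis: the function name `welch_from_summary` is hypothetical, the standard deviations are the rounded values from the summary table, and only the normal-based (z) p-value and interval are computed because the Python standard library has no t-distribution.

```python
import math

def welch_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Return (statistic, Welch degrees of freedom, standard error)
    for the difference of two independent sample means."""
    v1 = sd1 ** 2 / n1            # squared standard error of mean 1
    v2 = sd2 ** 2 / n2            # squared standard error of mean 2
    se = math.sqrt(v1 + v2)       # standard error of the difference
    stat = (mean1 - mean2) / se
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return stat, df, se

# Summary statistics as reported on the slides (SDs are rounded values).
stat, df, se = welch_from_summary(155.1230, 89.7, 122, 148.6967, 89.95, 122)

# Two-tailed p-value under the standard normal distribution (z-test).
p_z = math.erfc(abs(stat) / math.sqrt(2))

# 95% confidence interval for the difference using the normal critical value.
diff = 155.1230 - 148.6967
z_crit = 1.959964
ci_z = (diff - z_crit * se, diff + z_crit * se)

print(f"statistic = {stat:.4f}, Welch df = {df:.3f}")
print(f"z-test p-value = {p_z:.4f}, 95% CI = ({ci_z[0]:.2f}, {ci_z[1]:.2f})")
```

With 122 observations per group the t and normal distributions are nearly identical, which is why the t-test and z-test results differ only in the third or fourth decimal place.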

Results

Statistics Language

With 241.998 degrees of freedom, the probability of observing a t-score at least as extreme as 0.5588 in either direction from independent samples is 0.5768. This is greater than 0.05, so the null hypothesis is not rejected. The confidence interval extends on both sides of 0, which further supports not rejecting the null hypothesis.

Business Language

The test finds no statistically significant difference between the daily number of calls forecasted by Global Workforce Management and the number of calls offered to Voice Center. Barring unusual events, the number of agents forecasted to answer the calls is expected to be sufficient to pass the Abandoned % target. This test supports a passing Abandoned % rate (although only calls abandoned after more than 30 seconds are measured).

Chi-square Test of Independence

Finding association between two categorical variables

Problem Statement

Is there a significant relationship between CSAT Survey Result and Reported Source?

Hypothesis

H0 – CSAT Survey Result and Source are independent.

HA – CSAT Survey Result and Source are dependent.

Exploratory Data Overview

A survey is sent to the user after the logged ticket has been marked as resolved. There are 8 questions included in the survey, each scored from 1 to 5, where 5 is the highest.

Since this test can only be done on categorical variables, CSAT Survey Result is used as a dichotomous response variable that indicates the survey result as 1 = Positive (success) and 0 = Negative (failure).

The explanatory variable is Reported Source. Service Desk only creates tickets from two sources: Phone and Email. The sample is less than 10% of the population, and every expected frequency is at least 5. As the observed counts are more than enough to perform inference, no bootstrap calculation is done.

Contingency Table

                              CSAT Result
Reported Source   Contents    Pos      Neg      Row Total
Email             Observed    241      50       291
                  Expected    252.5    38.5
                  Row %       82.8%    17.2%    28.9%
                  Col %       27.6%    37.6%
Phone             Observed    632      83       715
                  Expected    620.5    94.5
                  Row %       88.4%    11.6%    71.1%
                  Col %       72.4%    62.4%
Col Total         Observed    873      133      1006
                  %           86.8%    13.2%

Results

Statistics Language

Chi-square = 5.6
DF = 1
P-value = 0.0180

The probability that a chi-square statistic with one degree of freedom is more extreme than 5.6 is 0.0180. Since this p-value is less than 0.05, the null hypothesis of independence is rejected.
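The chi-square statistic can be recomputed from the observed counts in the contingency table. The sketch below is illustrative only: the function name `chi_square_independence` is hypothetical, it computes the Pearson statistic without a continuity correction, and the p-value shortcut through `math.erfc` is valid only for one degree of freedom.

```python
import math

def chi_square_independence(rows):
    """Pearson chi-square statistic for a table of observed counts.
    Returns (chi2, degrees of freedom, expected counts)."""
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    total = sum(row_totals)
    # Expected count under independence: row total * column total / grand total
    expected = [[rt * ct / total for ct in col_totals] for rt in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(rows, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return chi2, df, expected

# Observed counts from the contingency table: rows are Email, Phone;
# columns are Positive, Negative.
observed = [[241, 50], [632, 83]]
chi2, df, expected = chi_square_independence(observed)

# Upper-tail probability of a chi-square variable; this form holds for df = 1 only.
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi-square = {chi2:.2f}, df = {df}, p-value = {p_value:.4f}")
print("expected counts:", [[round(e, 1) for e in row] for row in expected])
```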

Business Language

It is concluded that there is a statistically significant relationship between CSAT and Reported Source: the share of positive survey results differs between tickets reported by Phone and by Email. Other confounding factors may affect this relationship, such as the resolution time, how the service desk or the resolver responded to the ticket, or the problem itself. These cannot be assessed here, because this test only examines the relationship between the given variables (CSAT ~ Reported Source).

Individuals-Moving Range Control Charts

Is it within control?

Problem Statement

Are there any highly unusual events or systemic patterns that spiked the number of calls received by Voice Center?

Data Overview

When plotted on an individuals chart, the data shows a seasonal pattern that recurs after every 5 data points; these recurring points are the weekends. To produce a more sensible observation, this analysis only covers weekdays. A separate analysis covering weekends can be done if needed.


Observations – Nelson Rules

Special Causes

Rule #1 - Two data points fall below the lower control limit, which is highly unusual on a normal working day. They are caused by two holidays in the United States: Memorial Day and the Fourth of July.

Common Causes

• Rule #2 - There are no runs of more than 8 consecutive points below or above the center line.
• Rule #3 - There are no 6 consecutive points showing an increasing or decreasing trend.
• Rule #4 - The data points come close to oscillating, as they are very random.
• Rule #5 - There are no points that are very close to the limits.
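The control limits behind Rule #1 can be sketched as follows. This is an illustrative example, not the chart from the analysis: the call counts are hypothetical (the daily data is not included in the slides), the function name `imr_limits` is an assumption, and the constants 2.66 and 3.267 are the standard I-MR chart factors for a moving range of two consecutive points.

```python
def imr_limits(values):
    """Center line and control limits for an individuals (I) chart,
    plus the upper limit for the moving range (MR) chart."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = sum(values) / len(values)
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return {
        "center": center,
        "ucl": center + 2.66 * mr_bar,   # upper control limit, I chart
        "lcl": center - 2.66 * mr_bar,   # lower control limit, I chart
        "mr_bar": mr_bar,
        "mr_ucl": 3.267 * mr_bar,        # upper control limit, MR chart
    }

# HYPOTHETICAL weekday call counts; index 9 mimics a holiday.
calls = [200, 204, 198, 202, 196, 201, 199, 203, 197, 60,
         202, 198, 200, 204, 199, 201]

limits = imr_limits(calls)

# Rule #1: flag any point outside the control limits.
flagged = [i for i, v in enumerate(calls)
           if v > limits["ucl"] or v < limits["lcl"]]

print({k: round(v, 3) for k, v in limits.items()})
print("points outside the limits:", flagged)
```

A single extreme point also inflates the average moving range and therefore widens the limits, so in practice known special causes such as holidays are often excluded before the limits are recalculated.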


Trend

Looking at the whole dataset plotted as a time series, and not just at whether each data point is within control, we can further test whether there is an overall trend.

Based on the trend component of the decomposition, there is no obvious pattern. An additive model is used because we do not assume that the number of calls increases as time progresses.

A 5-day forecast using exponential smoothing can be done if needed.

Binary Logistic Regression

Predicting the outcome of a binary categorical dependent variable

Problem Statement

What are the odds and the probability of a positive CSAT survey result based on the ticket age (resolution time), reported source, VIP user status, status reason of the ticket and resolution method?

Data Overview

Each case in the dataset is a survey response from a user. The sample is from August 2013 to July 2014. Only tickets that have been resolved by groups are included in this analysis.

The CSAT survey consists of 9 questions. The first 8 questions can be rated by the respondent from 1 to 5, where 5 is the highest. The last question is free-form text in which the respondent can provide comments. The result of the survey is the sum of the eight rated questions. It would be easier to regress the outcome on the questions, since the total score is computed mathematically from them, but for this analysis different variables were used to predict the outcome of the survey.

As previously tested, the survey result is dependent on the reported source. In this test, we can determine whether this predictor is still significant.

Some data munging was done to produce categorical variables from continuous variables (dummy coding). Continuous variables can also be used as predictors.

Response

CSAT Result
1 – Positive (Success)
0 – Negative (Failure)

Predictors

Ticket Age
1 – < 3 Days
2 – < 7 Days
3 – < 15 Days
4 – < 30 Days
5 – 30+ Days

Reported Source
1 – Phone
2 – Email

VIP
1 – Yes
2 – No

Status Reason
1 – First Call Resolved
2 – Status Call
3 – Others

Resolution Method
1 – Service Desk Assisted
2 – Remote Control
3 – On-site Support
4 – Self-service


Model 1

Assessing the fit of the model

The probability that a chi-square statistic with 11 degrees of freedom is more extreme than 31.56 is less than 0.05 (p-value = 0.0008970423). When all of the degrees of freedom have been used, only the following predictors are significant to the response variable:
• Ticket Age
• Status Reason

Because this model has many insignificant predictors, a better model can be built, even though this one is a good fit against an empty model.


Model 2

Assessing the fit of the model

The probability that a chi-square statistic with 6 degrees of freedom is more extreme than 26.71 is less than 0.05 (p-value = 0.0001640877).

Overall effect

Since the model is a good fit against the null, we can proceed with further diagnostics to test the overall effect of each of the two predictors, because each has more than two categories:

Predictor        Categories   Wald test Chi-square   df   p-value
Ticket Age       5            10.2                   4    0.037
Status Reason    3            7.6                    2    0.022
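The p-values reported for the model fit and the Wald tests are upper-tail chi-square probabilities, and they can be checked from the statistic and its degrees of freedom. The sketch below is an assumption-laden illustration, not the original R output: `chi2_sf` is a hypothetical helper that uses the closed-form series for integer degrees of freedom so that it needs only the Python standard library.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable
    with a positive integer number of degrees of freedom."""
    if df < 1 or x < 0:
        raise ValueError("df must be >= 1 and x must be >= 0")
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2-1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: normal tail plus a finite series in odd powers of sqrt(x)
    tail = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    total = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return tail + total

# Statistics as reported on the slides.
reported = {
    "Model 1 fit": (31.56, 11),
    "Model 2 fit": (26.71, 6),
    "Wald, Ticket Age": (10.2, 4),
    "Wald, Status Reason": (7.6, 2),
}
for name, (stat, df) in reported.items():
    print(f"{name}: chi-square = {stat}, df = {df}, p-value = {chi2_sf(stat, df):.6f}")
```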

Both of the categorical predictors are significant. This means that there are statistically significant differences between categories: between < 7 Days and < 15 Days, and between Status Call and Others.

Dummy coding and base categories

You might notice that < 3 Days and First Call Resolved are both missing from the generalized linear model summary. This is because R uses these categories as the base in computing the coefficients: if the ticket has been resolved in less than 3 days, the coefficient is 0. The same calculation applies to Status Reason.

Unlike in a linear regression, these coefficients are very hard to interpret because they are on the logit scale. We can compute the odds ratio for each predictor category by exponentiating its coefficient, and the probability by converting the logit back with the logistic function.
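The conversion from logit coefficients to odds ratios and probabilities can be sketched in a few lines. The coefficient values below are hypothetical, chosen only for illustration, since the fitted coefficients are not listed in the text; `odds_ratio` and `inv_logit` are illustrative helper names.

```python
import math

def odds_ratio(coef):
    """Odds ratio of a category relative to the base category."""
    return math.exp(coef)

def inv_logit(eta):
    """Convert a value on the logit (log-odds) scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# HYPOTHETICAL coefficients on the logit scale; base categories have coefficient 0.
intercept = 2.0
ticket_age = {"< 3 Days": 0.0, "< 7 Days": -0.5}
status_reason = {"First Call Resolved": 0.0, "Status Call": -0.8}

for age, b_age in ticket_age.items():
    for reason, b_reason in status_reason.items():
        eta = intercept + b_age + b_reason
        print(f"{age:9s} {reason:20s} "
              f"OR(age) = {odds_ratio(b_age):.3f}  "
              f"OR(reason) = {odds_ratio(b_reason):.3f}  "
              f"P(positive) = {inv_logit(eta):.3f}")
```

A base category always has an odds ratio of exactly 1, because its coefficient is 0; every other category is read as a multiple of the base category's odds.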

Odds Ratio

For a change from the base category to another category of a predictor, the odds of having a positive survey are multiplied by the value in the OddsRatio table. This is more interpretable than the logit coefficients from the previous slide.

TicketAgeClass   StatusReason2          Prob
< 3 Days         First Call Resolved    100.00%
< 3 Days         Status Call            100.00%
< 3 Days         Others                 100.00%
...
30+ Days         First Call Resolved    37.50%
30+ Days         Status Call            37.50%
30+ Days         Others                 37.50%

Since the base categories are < 3 Days and First Call Resolved, the odds of a positive CSAT survey are 10 times higher for tickets that have been resolved in less than 3 days with a status reason of First Call Resolved than for tickets that took longer to resolve and have a different status reason.

Probability

The probability of a success, or positive CSAT survey, may vary because of the variation in the data, but the significant predictors that we have finalized, Ticket Age and Status Reason, give us an idea of what the outcome of the survey would be.

Thank You
