THE W PROJECT
HAILEE REIST, ANDY SORENSEN, RYAN KEMP
BACKGROUND INFORMATION
• The main objective of this project was to work with Sheila
Kittleson at the W to determine, using data analysis techniques,
what time the W should close in the summer.
• To do this, we received data containing the dates and times of
all check-ins from 6:00 PM onward for the past year and for the
past four summers.
• We approached the problem mainly by examining the customers
who tend to return throughout the summer.
SAMPLE RAW DATA
The following is a sample of the data set (10,372 observations) of all check-ins by non-student members in the last year (May 2013-April 2014).
SAMPLE RAW DATA (CONT.)
The following is a sample of the data set (11,592 observations) of all check-ins in the months of May, June, July, and August in the years 2010-2013.
PLOT SUMMARIES OF DATA
This graph shows the number of members who have checked into the W in the past year, starting with May 2013 and ending with April 2014.
[Figure: Total Activity per Month (2013-2014); x-axis: month, May through April; y-axis: 0-1800]
This plot shows the ages of all W members who were active (checked in) during the past year (May 2013-April 2014).
This plot shows the times at which members checked in after 6:00 PM over the past year (May 2013-April 2014).
This plot shows member activity after 6:00 PM for each month from May 2013 through April 2014.
[Figure: Average Time; x-axis: month, May through April; y-axis: average check-in time, 635-660]
This graph shows the average check-in time for each month from May 2013 through April 2014 (all check-ins after 6:00 PM).
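How the times are averaged matters. As a hedged sketch, assume `Time` is stored as an hmm number on a 12-hour clock (so 635 means 6:35 PM), which is consistent with the `Time >= 800` filter used for 8:00 PM in the R code later in this report. Converting to minutes before averaging avoids results such as 680 for check-ins at 6:50 and 7:10, whose actual average is 7:00 PM. The helper names and the `checkins` data frame are hypothetical.

```r
# Hypothetical helpers; assumes Time is an hmm number on a 12-hour clock
# (e.g. 635 = 6:35 PM) and that all check-ins fall between 6:00 and 11:59 PM.

# Convert an hmm time to minutes after 6:00 PM.
to_minutes_after_6pm <- function(hmm) {
  (hmm %/% 100 - 6) * 60 + hmm %% 100
}

# Convert minutes after 6:00 PM back to hmm form (rounded to whole minutes).
to_hmm <- function(minutes) {
  minutes <- round(minutes)
  (6 + minutes %/% 60) * 100 + minutes %% 60
}

# Average check-in time per month, averaged in minutes rather than hmm units.
# Returns a numeric vector named by month.
monthly_average_time <- function(checkins) {
  mins <- sapply(split(to_minutes_after_6pm(checkins$Time), checkins$Month), mean)
  to_hmm(mins)
}
```

Averaging in minutes keeps every result a valid clock time, which a plain mean of hmm numbers does not guarantee.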
This chart shows, by month, the number of members who checked into the W after 8:00 PM.
[Figure: After 8:00 PM; x-axis: month, May through April; y-axis: 0-120]
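Monthly counts like those in this chart could be produced along the following lines. This is only a sketch: it assumes the same hmm time coding (800 = 8:00 PM) and column names (`Month`, `Time`, `Account..Number`) as the report's R code, and the function name is hypothetical. Because the caption speaks of members, a `distinct` option counts each account at most once per month; the slides do not state which of the two the original chart counted.

```r
# Count check-ins at or after 'cutoff' (hmm coding, 800 = 8:00 PM) per month.
# 'checkins' is a hypothetical data frame with Month, Time, Account..Number.
late_checkins_by_month <- function(checkins, cutoff = 800, distinct = FALSE) {
  # Month order used in the charts: May through April.
  month_order <- month.name[c(5:12, 1:4)]
  late <- checkins[checkins$Time >= cutoff, ]
  if (distinct) {
    # Count each member at most once per month.
    late <- late[!duplicated(late[, c("Month", "Account..Number")]), ]
  }
  # Months with no late check-ins still appear, with a count of 0.
  table(factor(late$Month, levels = month_order))
}
```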
SUMMARY OF DATA ANALYSIS TECHNIQUES
Finding All Even Return Customers
R INPUT:
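# 'summer' holds all May-August check-ins for 2010-2013 (loaded earlier, e.g. with read.csv)
may2013  <- subset(summer, Month == "May"  & Year == "2013")
june2013 <- subset(summer, Month == "June" & Year == "2013")
# For each May check-in: row index of the same account in June, or 0 if it never appears
may2013$j13 <- match(may2013$Account..Number, june2013$Account..Number, nomatch = 0)
# Keep May check-ins at or after 8:00 PM (Time coded as 800) whose account returned in June
mj2013 <- subset(may2013, j13 != 0 & Time >= 800)
write.csv(mj2013, file = "mj2013.csv")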
R OUTPUT:
Percentage Return of Clients in Summer Months
R INPUT and OUTPUT:
EXCEL OUTPUT:
For each year, this graph shows the number of people who checked in after 8:00 PM in May and also checked in after 6:00 PM at some point in June, July, or August, divided by the total number of people who checked in after 8:00 PM in May of that year.
[Figure: Percent Return; bars for May-June, May-July, and May-August in each year, 2010-2013; y-axis: percent return, 0%-70%]
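The ratio described above can be written as a short R function. This is a sketch, not the authors' code: it assumes a data frame shaped like `summer` in the report's R code, counts distinct account numbers (people) rather than check-in rows, and treats every record in the later month as an after-6:00 PM check-in, since the data set only covers check-ins from 6:00 PM onward. The function name is hypothetical.

```r
# Percentage of May late-evening customers (Time >= cutoff, hmm coding) who
# checked in again during 'later_month' of the same year.
# 'summer' is assumed to have Month, Year, Time, and Account..Number columns.
percent_return <- function(summer, year, later_month, cutoff = 800) {
  # Distinct accounts that checked in at or after the cutoff in May.
  may_late <- unique(summer$Account..Number[
    summer$Month == "May" & summer$Year == year & summer$Time >= cutoff])
  # Distinct accounts that checked in at any recorded time in the later month.
  later <- unique(summer$Account..Number[
    summer$Month == later_month & summer$Year == year])
  # Share of May late-evening customers found in the later month, as a percent.
  100 * mean(may_late %in% later)
}
```

Calling it for each year and each of June, July, and August yields one bar per year and month pair.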
This plot shows the percentage decline in returning check-ins in 2013, relative to the 2010-2012 average, for each of the three summer months (June, July, and August) compared with May.
[Figure: Percent Decline; bars for May-June, May-July, and May-August; axis: percent decline, 0%-60%]
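One plausible reading of this plot is the relative drop of each 2013 return percentage from its 2010-2012 average, that is 100 × (average - 2013 value) / average. The sketch below encodes that reading; both the interpretation and the function name are assumptions.

```r
# Percentage decline of a 2013 value relative to the mean of earlier years.
# 'baseline_years' would hold, e.g., the 2010-2012 return percentages for one
# month pair (May-June, May-July, or May-August); 'value_2013' is the 2013 figure.
percent_decline <- function(baseline_years, value_2013) {
  baseline <- mean(baseline_years)
  100 * (baseline - value_2013) / baseline
}
```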
HYPOTHESIS TESTING
H0: Age has no effect on whether or not a customer returns in
the summer, when the hours are changed.
HA: Age has an effect on whether or not a customer returns in
the summer, when the hours are changed.
TESTING THE HYPOTHESIS
EXCEL INPUT:
=IF(COUNTIF('Macintosh HD:Users:Andy:Desktop:[Summer Repeat Customers.xlsx]Summer Repeats'!$H$3:$H$73,F2)>0, TRUE, FALSE)
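The COUNTIF test above flags each May customer whose account number appears in the 'Summer Repeats' list. A sketch of the same step in R, building a table shaped like the `mayrepeats` data frame used in the glm() calls that follow, assuming hypothetical inputs `may` (columns `Account..Number` and `Age`) and `returners` (a named list of account-number vectors, one per summer month):

```r
# Add one TRUE/FALSE column per month to the May customer table, mirroring
# =IF(COUNTIF(range, F2) > 0, TRUE, FALSE) applied to every row.
# 'may' and 'returners' are hypothetical inputs.
build_repeat_table <- function(may, returners) {
  out <- may
  for (m in names(returners)) {
    # TRUE when the account number appears anywhere in that month's list.
    out[[m]] <- out$Account..Number %in% returners[[m]]
  }
  out
}
```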
EXCEL OUTPUT:
R INPUT AND OUTPUT
June:
> mod=glm(June== "TRUE"~Age, data=mayrepeats, family="binomial")
> summary(mod)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.07109 0.25441 -4.210 2.55e-05 ***
Age 0.02694 0.00585 4.605 4.12e-06 ***
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
> xyplot(June + fitted(mod)~ Age, data=mayrepeats)
Age coefficient by month:
Month    Estimate   Std. Error   z value   Pr(>|z|)
June     0.02694    0.00585      4.605     4.12e-06 ***
July     0.033744   0.006032     5.594     2.21e-08 ***
August   0.042307   0.006844     6.182     6.35e-10 ***
CONCLUSION OF HYPOTHESIS TESTING
Reject the null hypothesis.
The p-values for Age are very small (4.12e-06, 2.21e-08, and 6.35e-10 for June, July, and August), far below the 0.05 level.
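Because these are logistic-regression coefficients, exponentiating them gives the multiplicative change in the odds of returning per additional year of age. Using the Age estimates reported above, the odds rise by roughly 2.7%, 3.4%, and 4.3% per year for June, July, and August. A short sketch of that calculation:

```r
# Age estimates from the June, July, and August models above.
age_coef <- c(June = 0.02694, July = 0.033744, August = 0.042307)

# exp(coefficient) is the odds ratio for one additional year of age.
odds_ratio_per_year <- exp(age_coef)

# Over ten years of age the odds are multiplied by exp(10 * coefficient).
odds_ratio_per_decade <- exp(10 * age_coef)
```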
Example: How likely is a 40-year-old customer from May to return in June?
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.07109 0.25441 -4.210 2.55e-05 ***
Age 0.02694 0.00585 4.605 4.12e-06 ***
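Link value:
$$\hat{\eta} = -1.07109 + 0.02694 \times 40 = 0.00651$$
Inverse link (logistic function):
$$\hat{p} = \frac{e^{\hat{\eta}}}{1 + e^{\hat{\eta}}} = \frac{e^{0.00651}}{1 + e^{0.00651}} \approx 0.5016$$
A 40-year-old May customer therefore has a probability of about 0.50 of returning in June.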
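The hand calculation above can be checked in R with `plogis()`, which computes exp(x) / (1 + exp(x)). This sketch plugs in the June intercept and Age estimates reported above; the function name is hypothetical. With the fitted model object itself, `predict(mod, newdata = data.frame(Age = 40), type = "response")` would give the same probability directly.

```r
# Predicted probability that a May customer of a given age returns in June,
# using the intercept and Age estimates from the June model above.
june_return_prob <- function(age, intercept = -1.07109, slope = 0.02694) {
  # plogis() is the inverse logit: exp(x) / (1 + exp(x)).
  plogis(intercept + slope * age)
}
```

For example, `june_return_prob(40)` gives about 0.5016, matching the calculation above, and the function accepts a vector of ages.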
Conclusion: The generalized linear model (logistic regression) suggests that older individuals are more likely to return in the summer.
JUNE EXAMPLE
FINAL CONCLUSION
• Based on the data analysis summarized in this report, a
closing time of 9:00 PM during June, July, and August is
suggested.
• Total Customers and Total Returning Customers both declined
when the closing time was changed from 9:00 PM to 8:45 PM.
• The hypothesis testing summarized in this report suggests that
older clients are more likely to return in the summer.
• From this finding, older clients should be contacted first when
asking members about their preferences for the W’s hours.
• Financial data was not made available, so the conclusions drawn
in this report may not reflect what is most cost-effective.
• For example, the percentage decline in returning customers
could be weighed against the financial consequences of staying
open later or closing earlier.