W Project Presentation

THE W PROJECT HAILEE REIST, ANDY SORENSEN, RYAN KEMP

BACKGROUND INFORMATION

• The main objective of this project was to work with Sheila Kittleson in the W to figure out what time the W should close in the summer by utilizing data analysis techniques.

• To do this, data was received containing check-in times and dates from 6:00 PM onward for the previous calendar year and the past four summers.

• This problem was approached mainly by looking at the customers that tend to return throughout the summer.

SAMPLE RAW DATA

The following is a sample of the data set (10,372 observations) of all check-ins by non-student members in the last year (May 2013-April 2014).

SAMPLE RAW DATA (CONT.)

The following is a sample of the data set (11,592 observations) of all check-ins in the months of May, June, July, and August in the years 2010-2013.

PLOT SUMMARIES OF DATA

This graph shows the number of members who have checked into the W in the past year, starting with May 2013 and ending with April 2014.

[Figure: Total Activity per Month (2013-2014). Horizontal axis: month, May through April. Vertical axis: 0 to 1800.]

This plot shows the ages of all members of the W who have been active (checked in) in the last year (May 2013-April 2014).

This plot shows the times at which members checked in over the past year (May 2013-April 2014; after 6:00 PM).

This plot shows the activity of members after 6:00 PM in each month, starting in May 2013 and ending in April 2014.

[Figure: Average Time. Horizontal axis: month, May through April. Vertical axis: Average Time, 635 to 660.]

This graph shows the average time at which members checked in during each month, starting in May 2013 and ending in April 2014 (all data is after 6:00 PM).
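One caution when averaging check-in times: if Time is recorded as a clock-style integer such as 635 for 6:35 PM (an assumption, suggested by the Time>=800 filter in the R code in this report), a plain mean treats 6:59 and 7:00 as 41 units apart rather than one minute. A minimal sketch of a minute-based average follows. The data values and the helpers hhmm_to_minutes and minutes_to_hhmm are made up for illustration and are not part of the original analysis.

```r
# Hypothetical helpers: convert clock-style integers (635 = 6:35 PM) to minutes and back.
hhmm_to_minutes <- function(t) (t %/% 100) * 60 + (t %% 100)
minutes_to_hhmm <- function(m) (m %/% 60) * 100 + (m %% 60)

# Made-up check-ins for illustration only; not the W's data.
checkins <- data.frame(
  Month = c("May", "May", "June", "June"),
  Time  = c(650, 710, 630, 830)
)

# Average in minutes, then convert the result back to clock style.
checkins$Minutes <- hhmm_to_minutes(checkins$Time)
avg <- aggregate(Minutes ~ Month, data = checkins, FUN = mean)
avg$AvgTime <- minutes_to_hhmm(round(avg$Minutes))
print(avg)
```

With these made-up values, the raw mean of 650 and 710 would be 680, which is not a valid clock time, while the minute-based mean for May is 7:00.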

This chart shows the number of members (by month) that have checked into the W after 8:00 PM.

[Figure: After 8:00 PM. Horizontal axis: month, May through April. Vertical axis: 0 to 120.]
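A count like the one charted above could be sketched in R as follows. This assumes Time is a clock-style integer and that each member should be counted once per month; the data is made up, and only the column names are taken from the R code in this report.

```r
# Made-up check-ins for illustration only; not the W's data.
checkins <- data.frame(
  Account..Number = c(1, 1, 2, 3, 3, 4),
  Month = c("May", "May", "May", "June", "June", "June"),
  Time  = c(815, 830, 700, 805, 640, 900)
)

# Keep check-ins at or after 8:00 PM.
late <- subset(checkins, Time >= 800)

# Count each member once per month, even with several late check-ins.
late_unique <- unique(late[, c("Account..Number", "Month")])
late_counts <- table(factor(late_unique$Month, levels = c("May", "June")))
print(late_counts)
```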

SUMMARY OF DATA ANALYSIS TECHNIQUES

Finding All Even Return Customers

R INPUT:

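# All May 2013 and June 2013 check-ins
may2013 <- subset(summer, Month == "May" & Year == "2013")
june2013 <- subset(summer, Month == "June" & Year == "2013")

# Position of each May account in the June check-ins (0 = no June check-in)
may2013$j13 <- match(may2013$Account..Number, june2013$Account..Number, nomatch = 0)

# Keep May check-ins at or after 8:00 PM by members who also checked in during June
mj2013 <- subset(may2013, j13 != 0 & Time >= 800)

write.csv(mj2013, file = "mj2013.csv")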

R OUTPUT:

Percentage Return of Clients in Summer Months

R INPUT and OUTPUT:

EXCEL OUTPUT:

This graph shows, for each year, the number of people who checked in after 8:00 PM in May and also checked in at some time after 6:00 PM during June, July, or August, divided by the total number of people who checked in after 8:00 PM in May of that year.

[Figure: Percent Return. Horizontal axis: May-to-June, May-to-July, and May-to-August pairs for each of 2010, 2011, 2012, and 2013. Vertical axis: 0.0% to 70.0%.]
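The percent return described above could be computed along these lines. This is a sketch under stated assumptions: the data is made up, percent_return is a hypothetical helper, and the data set is assumed to hold only check-ins after 6:00 PM, so no extra time filter is applied to the later month.

```r
# Made-up data for illustration; column names follow the R code in this report.
summer <- data.frame(
  Account..Number = c(101, 102, 103, 104, 101, 103, 105),
  Month = c("May", "May", "May", "May", "June", "June", "June"),
  Year  = c("2013", "2013", "2013", "2013", "2013", "2013", "2013"),
  Time  = c(815, 830, 845, 700, 630, 810, 900)
)

# Hypothetical helper: share of late-May members who check in again in `month`.
percent_return <- function(data, month, year) {
  may_late <- unique(subset(data, Month == "May" & Year == year & Time >= 800)$Account..Number)
  later <- unique(subset(data, Month == month & Year == year)$Account..Number)
  100 * mean(may_late %in% later)
}

pr <- percent_return(summer, "June", "2013")
print(pr)
```

Here three members checked in after 8:00 PM in May and two of them came back in June, so the made-up return rate is two thirds.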

This plot shows how much returning check-ins declined in 2013 relative to the 2010-2012 average, for each of the three summer months (June, July, and August) paired with May.

[Figure: Percent Decline. Categories: mayjune, mayjuly, mayaugust. Value axis: 0.0% to 60.0%.]
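One plausible reading of this percent decline is the relative drop of the 2013 return rate from the 2010-2012 average. The sketch below follows that reading only as an assumption; the rates are hypothetical numbers, not the values behind the plot.

```r
# Hypothetical return rates (percent) for illustration only; not the report's values.
return_rate <- data.frame(
  Pair  = c("mayjune", "mayjuly", "mayaugust"),
  y2010 = c(60, 50, 40),
  y2011 = c(62, 52, 44),
  y2012 = c(58, 48, 36),
  y2013 = c(45, 30, 20)
)

# Baseline: average return rate over 2010-2012 for each pair of months.
baseline <- rowMeans(return_rate[, c("y2010", "y2011", "y2012")])

# Relative decline of 2013 from the baseline, in percent.
return_rate$PercentDecline <- 100 * (baseline - return_rate$y2013) / baseline
print(return_rate[, c("Pair", "PercentDecline")])
```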

HYPOTHESIS TESTING

H0: Age has no effect on whether or not a customer returns in the summer, when the hours are changed.

HA: Age has an effect on whether or not a customer returns in the summer, when the hours are changed.

TESTING THE HYPOTHESIS

EXCEL INPUT:

=IF(COUNTIF('Macintosh HD:Users:Andy:Desktop:[Summer Repeat Customers.xlsx]Summer Repeats'!$H$3:$H$73,F2)>0, TRUE, FALSE)
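The same membership flag can be sketched in R with %in%, which returns TRUE when a value appears in a vector. The account numbers and the names may_customers and repeat_accounts are made up for illustration.

```r
# Made-up May customers; the column name follows the R code in this report.
may_customers <- data.frame(Account..Number = c(101, 102, 103, 104))

# Stand-in for the account column of the Summer Repeats sheet.
repeat_accounts <- c(103, 101, 250)

# TRUE when the May customer also appears in the repeat list.
may_customers$June <- may_customers$Account..Number %in% repeat_accounts
print(may_customers)
```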

EXCEL OUTPUT:

R INPUT AND OUTPUT

June:

> mod=glm(June== "TRUE"~Age, data=mayrepeats, family="binomial")

> summary(mod)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.07109    0.25441  -4.210 2.55e-05 ***
Age          0.02694    0.00585   4.605 4.12e-06 ***

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

> library(lattice)  # xyplot() comes from the lattice package
> xyplot(June + fitted(mod) ~ Age, data=mayrepeats)

Age coefficient by month:

Month   Estimate  Std. Error  z value  Pr(>|z|)
JUNE    0.02694   0.00585     4.605    4.12e-06 ***
JULY    0.033744  0.006032    5.594    2.21e-08 ***
AUGUST  0.042307  0.006844    6.182    6.35e-10 ***
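For readers who want to reproduce the shape of this fit without the original data, the sketch below runs the same kind of logistic regression on simulated data. Everything about the data is an assumption made for illustration: the sample size, the age range, and the true coefficients used to simulate returns. The estimates it prints are therefore not comparable to the values above.

```r
set.seed(1)
n <- 500

# Simulated stand-in for mayrepeats; Age and June are the column names used above.
sim <- data.frame(Age = round(runif(n, 18, 80)))
sim$June <- rbinom(n, 1, plogis(-1 + 0.03 * sim$Age)) == 1

# Logistic regression of return status on age.
sim_fit <- glm(June ~ Age, data = sim, family = binomial)

# Estimate, standard error, z value, and p-value for Age.
age_row <- coef(summary(sim_fit))["Age", ]
print(age_row)
```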

CONCLUSION OF HYPOTHESIS TESTING

Reject the Null Hypothesis

Small p-values for Age: 4.12e-06 (June), 2.21e-08 (July), and 6.35e-10 (August)
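These p-values can be checked from the reported estimates and standard errors, since each z value is the estimate divided by its standard error and the two-sided p-value comes from the standard normal distribution. A short sketch using only the numbers reported above:

```r
# Age estimates and standard errors as reported for each month.
est <- c(June = 0.02694, July = 0.033744, August = 0.042307)
se  <- c(June = 0.00585, July = 0.006032, August = 0.006844)

# Wald z statistic and two-sided p-value.
z <- est / se
p <- 2 * pnorm(-abs(z))
print(signif(cbind(z, p), 3))
```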

JUNE EXAMPLE

Example: Does a 40-year-old customer in May return in June?

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.07109    0.25441  -4.210 2.55e-05 ***
Age          0.02694    0.00585   4.605 4.12e-06 ***

Link value: 0.02694 * 40 - 1.07109 = 0.00651

Inverse logit: e^(0.00651) / (1 + e^(0.00651)), which is approximately 0.5016

Probability that a 40-year-old customer returns in June: about 0.50

Conclusion: The generalized linear model (logistic regression) suggests that older individuals are more likely to return in the summer.
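The June example can be reproduced in R with plogis, the inverse logit function. The coefficients are the June estimates reported above; the extra ages are added only to illustrate how the fitted probability rises with age.

```r
# June coefficients as reported in the model summary.
intercept <- -1.07109
slope_age <- 0.02694

# Link value and fitted probability for a 40-year-old customer.
link <- intercept + slope_age * 40
prob <- plogis(link)  # same as exp(link) / (1 + exp(link))
print(c(link = link, prob = prob))

# Fitted probabilities at a few other ages, for comparison.
ages <- c(20, 40, 60)
probs <- plogis(intercept + slope_age * ages)
print(round(probs, 3))
```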

FINAL CONCLUSION

• From the data analysis conducted as summarized in this report, a closing time of 9:00 PM during June, July, and August is suggested.

• Total customers and total returning customers both declined when the closing time was changed from 9:00 PM to 8:45 PM.

• The hypothesis testing summarized in this report suggests that older clients are more likely to return in the summer.

• From this finding, older clients should be contacted first when asking members about their preferences for the W’s hours.

• Financial data was not made available, so the conclusions drawn in this report may not reflect what is most cost effective.

• For example, the percentage decline in returning customers could be compared with the financial consequences of staying open later or closing earlier.