Regression Model Building Predicting Number of Crew Members of Cruise Ships


Data Description

• n = 158 cruise ships
• Dependent variable: Crew Size (100s)
• Potential predictor variables:
  - Age (2013 – Year Built)
  - Tonnage (1000s of tons)
  - Passengers (100s)
  - Length (100s of feet)
  - Cabins (100s)
  - Passenger Density (space per passenger: Tonnage/Passengers, in tons per passenger)


Data – First 20 Cases

Ship         Cruise Line  Age  Tonnage  Pssngrs  Length  Cabins  PassDens  Crew
Journey      Azamara        6   30.277     6.94    5.94    3.55     42.64  3.55
Quest        Azamara        6   30.277     6.94    5.94    3.55     42.64  3.55
Celebration  Carnival      26   47.262    14.86    7.22    7.43      31.8   6.7
Conquest     Carnival      11      110    29.74    9.53   14.88     36.99  19.1
Destiny      Carnival      17  101.353    26.42    8.92   13.21     38.36    10
Ecstasy      Carnival      22   70.367    20.52    8.55    10.2     34.29   9.2
Elation      Carnival      15   70.367    20.52    8.55    10.2     34.29   9.2
Fantasy      Carnival      23   70.367    20.56    8.55   10.22     34.23   9.2
Fascination  Carnival      19   70.367    20.52    8.55    10.2     34.29   9.2
Freedom      Carnival       6  110.239       37    9.51   14.87     29.79  11.5
Glory        Carnival      10      110    29.74    9.51   14.87     36.99  11.6
Holiday      Carnival      28   46.052    14.52    7.27    7.26     31.72   6.6
Imagination  Carnival      18   70.367    20.52    8.55    10.2     34.29   9.2
Inspiration  Carnival      17   70.367    20.52    8.55    10.2     34.29   9.2
Legend       Carnival      11       86    21.24    9.63   10.62     40.49   9.3
Liberty*     Carnival       8      110    29.74    9.51   14.87     36.99  11.6
Miracle      Carnival       9     88.5    21.24    9.63   10.62     41.67  10.3
Paradise     Carnival      15   70.367    20.52    8.55    10.2     34.29   9.2
Pride        Carnival      12     88.5    21.24    9.63   11.62     41.67   9.3
Sensation    Carnival      20   70.367    20.52    8.55    10.2     34.29   9.2
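The PassDens values look like space per passenger rather than passengers per unit of space. As a quick consistency check, the sketch below recomputes 10 × Tonnage / Passengers for the 20 listed ships and flags any row that disagrees with the reported PassDens by more than 0.01 (a tolerance assumed here to absorb two-decimal rounding). Only values printed in the table are used; `implied_density` and `flag_mismatches` are hypothetical helper names.

```python
# First 20 cases from the data slide:
# (ship, tonnage in 1000s of tons, passengers in 100s, PassDens as reported)
ROWS = [
    ("Journey", 30.277, 6.94, 42.64),
    ("Quest", 30.277, 6.94, 42.64),
    ("Celebration", 47.262, 14.86, 31.8),
    ("Conquest", 110.0, 29.74, 36.99),
    ("Destiny", 101.353, 26.42, 38.36),
    ("Ecstasy", 70.367, 20.52, 34.29),
    ("Elation", 70.367, 20.52, 34.29),
    ("Fantasy", 70.367, 20.56, 34.23),
    ("Fascination", 70.367, 20.52, 34.29),
    ("Freedom", 110.239, 37.0, 29.79),
    ("Glory", 110.0, 29.74, 36.99),
    ("Holiday", 46.052, 14.52, 31.72),
    ("Imagination", 70.367, 20.52, 34.29),
    ("Inspiration", 70.367, 20.52, 34.29),
    ("Legend", 86.0, 21.24, 40.49),
    ("Liberty*", 110.0, 29.74, 36.99),
    ("Miracle", 88.5, 21.24, 41.67),
    ("Paradise", 70.367, 20.52, 34.29),
    ("Pride", 88.5, 21.24, 41.67),
    ("Sensation", 70.367, 20.52, 34.29),
]


def implied_density(tonnage, passengers):
    """Tons per passenger: (tonnage * 1000) / (passengers * 100)."""
    return 10.0 * tonnage / passengers


def flag_mismatches(rows, tol=0.01):
    """Ships whose reported PassDens differs from Tonnage/Passengers by more than tol."""
    return [ship for ship, tonnage, passengers, dens in rows
            if abs(implied_density(tonnage, passengers) - dens) > tol]


print(flag_mismatches(ROWS))
```

Eighteen of the 20 rows agree. The two Azamara ships are flagged: their reported 42.64 would correspond to about 7.10 hundred passengers rather than 6.94, so one of those two values appears inconsistent in the source data.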


Full Model (6 Predictors, 7 Parameters, n = 158)

Regression Statistics
Multiple R            0.9615
R Square              0.9245
Adjusted R Square     0.9215
Standard Error        0.9819
Observations          158

ANOVA
              df       SS       MS       F   Significance F
Regression     6   1781.5    296.9   308.0           0.0000
Residual     151    145.6      1.0
Total        157   1927.1

            Coefficients  Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept       -0.52134         1.05703   -0.493    0.6226      -2.610       1.567
Age             -0.01254         0.01420   -0.884    0.3783      -0.041       0.016
Tonnage          0.01324         0.01189    1.113    0.2673      -0.010       0.037
Pssngrs         -0.14976         0.04759   -3.147    0.0020      -0.244      -0.056
Length           0.40348         0.11445    3.525    0.0006       0.177       0.630
Cabins           0.80163         0.08922    8.985    0.0000       0.625       0.978
PassDens        -0.00066         0.01581   -0.042    0.9669      -0.032       0.031
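The regression statistics above follow directly from the ANOVA sums of squares. A short sketch that recomputes them from the rounded SS values on this slide, so a difference in the last digit is expected:

```python
import math

# ANOVA quantities for the full model (6 predictors, p' = 7 parameters, n = 158)
n, p_prime = 158, 7
ss_total = 1927.1
ss_resid = 145.6
ss_reg = ss_total - ss_resid

df_reg, df_resid = p_prime - 1, n - p_prime
ms_reg = ss_reg / df_reg
ms_resid = ss_resid / df_resid

r2 = ss_reg / ss_total
adj_r2 = 1 - (ss_resid / df_resid) / (ss_total / (n - 1))
std_error = math.sqrt(ms_resid)   # standard error of regression = sqrt(MS Residual)
f_stat = ms_reg / ms_resid
multiple_r = math.sqrt(r2)

print(f"R = {multiple_r:.4f}, R^2 = {r2:.4f}, adj R^2 = {adj_r2:.4f}")
print(f"s = {std_error:.4f}, F = {f_stat:.1f}")
```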


Backward Elimination – Model Based AIC (minimize)

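$$\mathrm{AIC}(\text{Model}) = n\ln\!\left(\frac{SSE(\text{Model})}{n}\right) + 2\,p'(\text{Model}) + \text{constant}$$

where p'(Model) is the number of parameters in the model (including the intercept).

Full Model (7 parms, constant = 0):

$$\mathrm{AIC} = 158\ln\!\left(\frac{145.57}{158}\right) + 2(7) = 1.055$$

Round 2 (PassDens dropped, 6 parms):

$$\mathrm{AIC} = 158\ln\!\left(\frac{145.57}{158}\right) + 2(6) = -0.943$$

Round 3 (Age dropped, 5 parms):

$$\mathrm{AIC} = 158\ln\!\left(\frac{146.39}{158}\right) + 2(5) = -2.062$$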

Full Model     Df       SS      RSS      AIC
- passdens      1    0.002   145.57   -0.943
- age           1    0.753   146.32    -0.13
- tonnage       1    1.195   146.77    0.347
<none>                       145.57    1.055
- passengers    1    9.548   155.12    9.092
- length        1    11.98   157.55   11.551
- cabins        1   77.821   223.39   66.721

Round 2        Df       SS      RSS      AIC
- age           1    0.815   146.39   -2.062
<none>                       145.57   -0.943
- tonnage       1    2.007   147.58    -0.78
- length        1   12.069   157.64    9.641
- passengers    1   14.027    159.6   11.591
- cabins        1   79.556   225.13   65.944

Round 3        Df       SS      RSS      AIC
<none>                       146.39   -2.062
- tonnage       1    3.866   150.25    0.056
- length        1   11.739   158.13    8.126
- passengers    1   14.275   160.66    10.64
- cabins        1   78.861   225.25   64.028
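The three backward rounds can be replayed from the RSS column alone, because this AIC depends on a model only through its RSS and parameter count. The sketch below is a minimal illustration, not the software routine that produced the output: `RSS_TABLE` holds only the fits printed on this slide (keyed by predictor set), and `aic` uses constant = 0 as on the slide. Because the RSS values are rounded to two decimals, the recomputed AIC values match the slide only to within a few thousandths.

```python
import math

N = 158  # number of ships


def model(*names):
    return frozenset(names)


ALL6 = ("age", "tonnage", "passengers", "length", "cabins", "passdens")

# RSS for every fit printed in the backward-elimination output (keyed by predictor set)
RSS_TABLE = {
    model(*ALL6): 145.57,
    # Round 1 candidates: full model minus one predictor
    model("age", "tonnage", "passengers", "length", "cabins"): 145.57,
    model("tonnage", "passengers", "length", "cabins", "passdens"): 146.32,
    model("age", "passengers", "length", "cabins", "passdens"): 146.77,
    model("age", "tonnage", "length", "cabins", "passdens"): 155.12,
    model("age", "tonnage", "passengers", "cabins", "passdens"): 157.55,
    model("age", "tonnage", "passengers", "length", "passdens"): 223.39,
    # Round 2 candidates
    model("tonnage", "passengers", "length", "cabins"): 146.39,
    model("age", "passengers", "length", "cabins"): 147.58,
    model("age", "tonnage", "passengers", "cabins"): 157.64,
    model("age", "tonnage", "length", "cabins"): 159.6,
    model("age", "tonnage", "passengers", "length"): 225.13,
    # Round 3 candidates
    model("passengers", "length", "cabins"): 150.25,
    model("tonnage", "passengers", "cabins"): 158.13,
    model("tonnage", "length", "cabins"): 160.66,
    model("tonnage", "passengers", "length"): 225.25,
}


def aic(rss, n_predictors, n=N):
    """AIC = n ln(RSS/n) + 2p', where p' = predictors + intercept; constant = 0."""
    return n * math.log(rss / n) + 2 * (n_predictors + 1)


def backward_eliminate(start, rss_table):
    """Drop the predictor whose removal lowers AIC most; stop when no drop helps."""
    current, path = frozenset(start), []
    while True:
        best_aic = aic(rss_table[current], len(current))
        best_drop = None
        for var in sorted(current):
            candidate = current - {var}
            if candidate not in rss_table:
                continue  # fit not printed on the slide
            cand_aic = aic(rss_table[candidate], len(candidate))
            if cand_aic < best_aic:
                best_aic, best_drop = cand_aic, var
        if best_drop is None:
            return current, path
        path.append((best_drop, round(best_aic, 3)))
        current = current - {best_drop}


final, path = backward_eliminate(ALL6, RSS_TABLE)
print(path)           # predictors dropped, with the AIC after each drop
print(sorted(final))
```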


Forward Selection (AIC Based)

Null Model: SSE = SS(Total) = 1927.08

$$\mathrm{AIC} = 158\ln\!\left(\frac{1927.08}{158}\right) + 2(1) = 397.18$$

Null Model     Df       SS      RSS      AIC
+ cabins        1  1742.21   184.88    28.82
+ tonnage       1  1658.03   269.05     88.1
+ passengers    1  1614.23   312.86   111.94
+ length        1   1546.6   380.49   142.86
+ age           1   542.66  1384.42   346.93
+ passdens      1     46.6  1880.48   395.32
<none>                      1927.08   397.18

Round 2        Df       SS      RSS      AIC
+ length        1  22.9636   161.91   9.8661
+ passdens      1  14.9541   169.92  17.4948
+ tonnage       1  12.5135   172.36   19.748
+ passengers    1   7.0656   177.81  24.6647
+ age           1   5.4442   179.43  26.0989
<none>                       184.88  28.8215

Round 3        Df       SS      RSS      AIC
+ passengers    1  11.6609   150.25   0.0565
+ passdens      1   6.3732   155.54   5.5212
<none>                       161.91   9.8661
+ age           1   1.9702   159.94   9.9317
+ tonnage       1   1.2514   160.66  10.6402

Round 4        Df       SS      RSS      AIC
+ tonnage       1   3.8656   146.39 -2.06164
+ age           1   2.6733   147.58 -0.77996
+ passdens      1   2.5635   147.69 -0.66241
<none>                       150.25   0.0565

Round 5        Df       SS      RSS      AIC
<none>                       146.39 -2.06164
+ age           1  0.81467   145.57 -0.94339
+ passdens      1  0.06366   146.32 -0.13037


Stepwise Regression (AIC Based)

Null Model     Df       SS      RSS      AIC
+ cabins        1  1742.21   184.88    28.82
+ tonnage       1  1658.03   269.05     88.1
+ passengers    1  1614.23   312.86   111.94
+ length        1   1546.6   380.49   142.86
+ age           1   542.66  1384.42   346.93
+ passdens      1     46.6  1880.48   395.32
<none>                      1927.08   397.18

Round 2        Df       SS      RSS      AIC
+ length        1    22.96   161.91     9.87
+ passdens      1    14.95   169.92    17.49
+ tonnage       1    12.51   172.36    19.75
+ passengers    1     7.07   177.81    24.66
+ age           1     5.44   179.43     26.1
<none>                       184.88    28.82
- cabins        1  1742.21  1927.08   397.18

Round 3        Df       SS      RSS      AIC
+ passengers    1   11.661   150.25    0.056
+ passdens      1    6.373   155.54    5.521
<none>                       161.91    9.866
+ age           1     1.97   159.94    9.932
+ tonnage       1    1.251   160.66    10.64
- length        1   22.964   184.88   28.821
- cabins        1  218.571   380.49  142.859

Round 4        Df       SS      RSS      AIC
+ tonnage       1    3.866   146.39   -2.062
+ age           1    2.673   147.58    -0.78
+ passdens      1    2.563   147.69   -0.662
<none>                       150.25    0.056
- passengers    1   11.661   161.91    9.866
- length        1   27.559   177.81   24.665
- cabins        1   95.781   246.03   75.974

Round 5        Df       SS      RSS      AIC
<none>                       146.39   -2.062
+ age           1    0.815   145.57   -0.943
+ passdens      1    0.064   146.32    -0.13
- tonnage       1    3.866   150.25    0.056
- length        1   11.739   158.13    8.126
- passengers    1   14.275   160.66    10.64
- cabins        1   78.861   225.25   64.028


Summary of Automated Models

• Backward Elimination
  - Drop Passenger Density (AIC drops from 1.055 to -0.943)
  - Drop Age (AIC drops from -0.943 to -2.062)
  - Stop: keep Tonnage, Passengers, Length, Cabins

• Forward Selection
  - Add Cabins (AIC drops from 397.18 to 28.82)
  - Add Length (AIC drops from 28.82 to 9.8661)
  - Add Passengers (AIC drops from 9.8661 to 0.0565)
  - Add Tonnage (AIC drops from 0.0565 to -2.06)
  - Stop: keep Tonnage, Passengers, Length, Cabins

• Stepwise: same path as Forward Selection (in every round, dropping a variable already in the model would raise AIC)
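Forward selection and stepwise search can be replayed the same way. In this sketch (an illustration, not the software routine that produced the output), `greedy_aic(allow_drop=False)` only adds predictors, while `allow_drop=True` also considers removing one in each round; fits not printed on the slides are simply skipped. Both runs use only the RSS values shown in the forward-selection and stepwise output.

```python
import math

N = 158
PREDICTORS = ("age", "tonnage", "passengers", "length", "cabins", "passdens")


def model(*names):
    return frozenset(names)


# RSS values printed in the forward-selection and stepwise output (keyed by predictor set)
RSS = {
    model(): 1927.08,
    model("cabins"): 184.88, model("tonnage"): 269.05, model("passengers"): 312.86,
    model("length"): 380.49, model("age"): 1384.42, model("passdens"): 1880.48,
    model("cabins", "length"): 161.91, model("cabins", "passdens"): 169.92,
    model("cabins", "tonnage"): 172.36, model("cabins", "passengers"): 177.81,
    model("cabins", "age"): 179.43,
    model("cabins", "length", "passengers"): 150.25,
    model("cabins", "length", "passdens"): 155.54,
    model("cabins", "length", "age"): 159.94,
    model("cabins", "length", "tonnage"): 160.66,
    model("cabins", "length", "passengers", "tonnage"): 146.39,
    model("cabins", "length", "passengers", "age"): 147.58,
    model("cabins", "length", "passengers", "passdens"): 147.69,
    model("cabins", "length", "passengers", "tonnage", "age"): 145.57,
    model("cabins", "length", "passengers", "tonnage", "passdens"): 146.32,
    # fits that appear only as "-" (drop) candidates in the stepwise output
    model("length", "passengers"): 246.03,
    model("cabins", "passengers", "tonnage"): 158.13,
    model("length", "passengers", "tonnage"): 225.25,
}


def aic(m):
    """AIC = n ln(RSS/n) + 2p', p' = predictors + intercept (constant = 0)."""
    return N * math.log(RSS[m] / N) + 2 * (len(m) + 1)


def greedy_aic(allow_drop):
    """Forward selection (allow_drop=False) or stepwise (allow_drop=True) from the null model."""
    current, path = model(), []
    while True:
        moves = [("+" + v, current | {v}) for v in PREDICTORS if v not in current]
        if allow_drop:
            moves += [("-" + v, current - {v}) for v in current]
        # score only the fits whose RSS was printed on the slides
        scored = [(aic(m), label, m) for label, m in moves if m in RSS]
        best = min(scored, key=lambda t: t[0], default=None)
        if best is None or best[0] >= aic(current):
            return current, path
        path.append(best[1])
        current = best[2]


for allow_drop in (False, True):
    final, path = greedy_aic(allow_drop)
    print("stepwise" if allow_drop else "forward", path, sorted(final))
```

Both searches take the path +cabins, +length, +passengers, +tonnage and stop when neither adding Age or PassDens nor dropping any included variable lowers AIC further.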


All Possible (Subset) Regressions

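p' = number of parameters (including intercept) in the model

$$R^2(\text{Model}) = \frac{SS_{\text{Regression}}(\text{Model})}{SS_{\text{Total}}} = 1 - \frac{SS_{\text{Residual}}(\text{Model})}{SS_{\text{Total}}} \qquad \text{Goal: maximize (within reason)}$$

$$\text{Adj-}R^2(\text{Model}) = 1 - \left(\frac{n-1}{n-p'}\right)\frac{SS_{\text{Residual}}(\text{Model})}{SS_{\text{Total}}} \qquad \text{Goal: maximize}$$

$$C_p = \frac{SS_{\text{Residual}}(\text{Model})}{s^2} - (n - 2p'), \quad \text{where } s^2 = MS_{\text{Residual}}(\text{Full Model}) \qquad \text{Goal: } C_p \le p'$$

$$\mathrm{BIC}(\text{Model}) = n\ln\!\left(\frac{SS_{\text{Residual}}(\text{Model})}{n}\right) + p'\ln(n) + \text{constant} \qquad \text{Goal: minimize}$$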
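The four criteria translate directly into code. This sketch checks them against the Tonnage + Passengers + Length + Cabins row of the subset table, using SS(Residual) = 146.39 from the AIC output. One inference worth flagging: the table's BIC column appears to use constant = -n ln(SS(Total)/n), i.e. BIC = n ln(SS_Residual/SS_Total) + p' ln(n). With that constant the full-model value -372.692 is reproduced, but the constant is reverse-engineered from the numbers, not stated on the slide; `TABLE_CONST` is a hypothetical name for it.

```python
import math

N = 158
SS_TOTAL = 1927.08        # SS(Total) = RSS of the null model
S2_FULL = 145.57 / 151    # s^2 = MS(Residual) of the full model, n - p' = 151


def r_squared(ss_resid):
    return 1 - ss_resid / SS_TOTAL


def adj_r_squared(ss_resid, p_prime, n=N):
    return 1 - (n - 1) / (n - p_prime) * ss_resid / SS_TOTAL


def mallows_cp(ss_resid, p_prime, n=N):
    return ss_resid / S2_FULL - (n - 2 * p_prime)


def bic(ss_resid, p_prime, n=N, constant=0.0):
    return n * math.log(ss_resid / n) + p_prime * math.log(n) + constant


# Assumed constant that appears to match the table's BIC column
TABLE_CONST = -N * math.log(SS_TOTAL / N)

# Tonnage + Passengers + Length + Cabins: p' = 5, SS(Residual) = 146.39
sse, p_prime = 146.39, 5
print(round(r_squared(sse), 3), round(adj_r_squared(sse, p_prime), 3))
print(round(mallows_cp(sse, p_prime), 3), round(bic(sse, p_prime, constant=TABLE_CONST), 3))
```

For the full model, C_p equals p' = 7 by construction, since its own MS(Residual) is used as s².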


All Possible (Subset) Regressions (Best 4 per Group; 1 = term included)

#preds  Int  Age  Ton  Pass  Lngth  Cabin  PassDen   R-Sq  Adj-R2       Cp       BIC
     1    1    0    0     0      0      1        0  0.904   0.903   37.772  -360.238
     1    1    0    1     0      0      0        0   0.86   0.859  125.086  -300.954
     1    1    0    0     1      0      0        0  0.838   0.837  170.523  -277.122
     1    1    0    0     0      1      0        0  0.803   0.801  240.675  -246.201
     2    1    0    0     0      1      1        0  0.916   0.915   15.952  -376.131
     2    1    0    0     0      0      1        1  0.912   0.911   24.261  -368.502
     2    1    0    1     0      0      1        0  0.911   0.909   26.792  -366.249
     2    1    0    0     1      0      1        0  0.908   0.907   32.443  -361.332
     3    1    0    0     1      1      1        0  0.922   0.921    5.857  -382.878
     3    1    0    0     0      1      1        1  0.919   0.918   11.341  -377.413
     3    1    0    1     1      0      1        0  0.918   0.916   14.023  -374.808
     3    1    1    0     0      1      1        0  0.917   0.915   15.909  -373.002
     4    1    0    1     1      1      1        0  0.924   0.922    3.847  -381.933
     4    1    1    0     1      1      1        0  0.923   0.921    5.084  -380.652
     4    1    0    0     1      1      1        1  0.923   0.921    5.197  -380.534
     4    1    0    1     0      1      1        1  0.919   0.917   13.056  -372.631
     5    1    1    1     1      1      1        0  0.924   0.922    5.002  -377.752
     5    1    0    1     1      1      1        1  0.924   0.922    5.781  -376.939
     5    1    1    0     1      1      1        1  0.924   0.921     6.24  -376.462
     5    1    1    1     0      1      1        1   0.92   0.917   14.904  -367.717
     6    1    1    1     1      1      1        1  0.924   0.921        7  -372.692
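Reading the subset table by criterion: the sketch below picks the best row under each of Adj-R², C_p, and BIC, using the values exactly as printed. Ties at three decimals are broken by Python's `max`, which keeps the first (here the smallest) model; that tie-breaking rule is an assumption of the sketch, not something the slide specifies.

```python
# Rows of the subset table: (predictors in model, p', R-Sq, Adj-R2, Cp, BIC)
TABLE = [
    ("Cabin", 2, 0.904, 0.903, 37.772, -360.238),
    ("Ton", 2, 0.86, 0.859, 125.086, -300.954),
    ("Pass", 2, 0.838, 0.837, 170.523, -277.122),
    ("Lngth", 2, 0.803, 0.801, 240.675, -246.201),
    ("Lngth Cabin", 3, 0.916, 0.915, 15.952, -376.131),
    ("Cabin PassDen", 3, 0.912, 0.911, 24.261, -368.502),
    ("Ton Cabin", 3, 0.911, 0.909, 26.792, -366.249),
    ("Pass Cabin", 3, 0.908, 0.907, 32.443, -361.332),
    ("Pass Lngth Cabin", 4, 0.922, 0.921, 5.857, -382.878),
    ("Lngth Cabin PassDen", 4, 0.919, 0.918, 11.341, -377.413),
    ("Ton Pass Cabin", 4, 0.918, 0.916, 14.023, -374.808),
    ("Age Lngth Cabin", 4, 0.917, 0.915, 15.909, -373.002),
    ("Ton Pass Lngth Cabin", 5, 0.924, 0.922, 3.847, -381.933),
    ("Age Pass Lngth Cabin", 5, 0.923, 0.921, 5.084, -380.652),
    ("Pass Lngth Cabin PassDen", 5, 0.923, 0.921, 5.197, -380.534),
    ("Ton Lngth Cabin PassDen", 5, 0.919, 0.917, 13.056, -372.631),
    ("Age Ton Pass Lngth Cabin", 6, 0.924, 0.922, 5.002, -377.752),
    ("Ton Pass Lngth Cabin PassDen", 6, 0.924, 0.922, 5.781, -376.939),
    ("Age Pass Lngth Cabin PassDen", 6, 0.924, 0.921, 6.24, -376.462),
    ("Age Ton Lngth Cabin PassDen", 6, 0.92, 0.917, 14.904, -367.717),
    ("All six", 7, 0.924, 0.921, 7.0, -372.692),
]

best_adj_r2 = max(TABLE, key=lambda row: row[3])  # first of the tied 0.922 rows
best_cp = min(TABLE, key=lambda row: row[4])
best_bic = min(TABLE, key=lambda row: row[5])

for label, row in [("Adj-R2", best_adj_r2), ("Cp", best_cp), ("BIC", best_bic)]:
    print(f"{label:>6}: {row[0]} (p' = {row[1]})")
```

Adj-R² and C_p pick Tonnage + Passengers + Length + Cabins, the same model as the AIC searches, while BIC, whose penalty per parameter is ln(158) ≈ 5.06 rather than 2, prefers the three-predictor model without Tonnage.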


Cross-Validation

• Hold-out sample (training sample = 100, validation sample = 58)
  - Fit the model on the training sample and obtain the regression estimates
  - Apply the training-sample estimates to the validation sample's X levels to get predicted values; MSEP = Σ(obs - pred)²/n, summed over the n = 58 validation cases
  - Fit the model on the validation sample and compare its regression coefficients with those of the training-sample model

• PRESS statistic (delete observations one at a time)
  - Fit the model with each observation deleted, one at a time
  - Obtain the residual for each observation from the fit that excluded it
  - PRESS = Σ(obs - pred(deleted))²

• K-fold cross-validation
  - Extension of PRESS in which K groups of cases are deleted in turn
  - Useful for computationally intensive models (not OLS, where PRESS can be obtained from a single fit)
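A minimal K-fold sketch, under assumptions of its own: it uses only the 20 ships listed on the data slide, a one-predictor model (Crew on Cabins) fit by closed-form least squares, and a fixed random seed for assigning cases to folds. None of these choices come from the slides, and `fit_line` and `k_fold_sse` are hypothetical helper names; the point is the mechanics. With K = n every fold holds a single case, so the result equals PRESS.

```python
import random

# Cabins (100s) and Crew (100s) for the 20 ships on the data slide, in table order
CABINS = [3.55, 3.55, 7.43, 14.88, 13.21, 10.2, 10.2, 10.22, 10.2, 14.87,
          14.87, 7.26, 10.2, 10.2, 10.62, 14.87, 10.62, 10.2, 11.62, 10.2]
CREW = [3.55, 3.55, 6.7, 19.1, 10.0, 9.2, 9.2, 9.2, 9.2, 11.5,
        11.6, 6.6, 9.2, 9.2, 9.3, 11.6, 10.3, 9.2, 9.3, 9.2]


def fit_line(xs, ys):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def k_fold_sse(xs, ys, k, seed=0):
    """Sum of squared prediction errors when each of k groups is held out in turn."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]   # round-robin split of the shuffled cases
    sse = 0.0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        b0, b1 = fit_line([xs[i] for i in train], [ys[i] for i in train])
        sse += sum((ys[i] - (b0 + b1 * xs[i])) ** 2 for i in fold)
    return sse


n = len(CREW)
print("5-fold SSE / n:", k_fold_sse(CABINS, CREW, k=5) / n)
print("K = n (PRESS / n):", k_fold_sse(CABINS, CREW, k=n) / n)
```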


Hold-Out Sample – n_in = 100, n_out = 58

Training Sample   Estimate  Std Err   t-stat  P-Value
(Intercept)        -1.1018   0.7735   -1.424   0.1576
tonnage             0.0048   0.0118    0.407   0.6851
passengers         -0.1919   0.0545   -3.525   0.0007
length              0.4565   0.1457    3.132   0.0023
cabins              0.9506   0.1451    6.551   0.0000

Validation Sample Estimate  Std Err   t-stat  P-Value
(Intercept)        -0.0970   0.9142   -0.106   0.9159
tonnage             0.0286   0.0124    2.303   0.0252
passengers         -0.1234   0.0582   -2.119   0.0388
length              0.2321   0.1917    1.211   0.2313
cabins              0.7058   0.1060    6.656   0.0000

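Coefficients keep their signs, but significance levels change a lot. Tonnage is not significant in the training sample (P = 0.6851) but is in the validation sample (P = 0.0252), while Length is significant in the training sample (P = 0.0023) but not in the validation sample (P = 0.2313).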

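$$\mathrm{MSEP} = \frac{1}{n_V}\sum_{i=1}^{n_V}\left(y_{iV} - \hat{y}_{iTV}\right)^2 = 0.7578$$

$$\mathrm{Bias}^2 = \left[\frac{1}{n_V}\sum_{i=1}^{n_V}\left(y_{iV} - \hat{y}_{iTV}\right)\right]^2 = 0.0005182738$$

$$\text{Percent Bias of MSEP} = 100\,\frac{\mathrm{Bias}^2}{\mathrm{MSEP}} = 100\left(\frac{0.0005182738}{0.7578}\right) = 0.06838787\ (\%)$$

where y_iV is the i-th validation observation and ŷ_iTV is its prediction from the training-sample regression estimates.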
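The MSEP decomposition takes only a few lines of code. `msep_summary` and `percent_bias` are hypothetical helper names; the toy vectors in the usage lines are illustrative only and are not the cruise-ship validation data, while the last line plugs in the Bias and MSEP values reported on the slide. Because MSEP = Bias² + (1/n_V)Σ(e_i - ē)², the percentage measures how much of the prediction error is systematic rather than random.

```python
def percent_bias(bias, msep):
    """Percent of MSEP attributable to squared bias."""
    return 100.0 * bias ** 2 / msep


def msep_summary(y_valid, y_pred):
    """Return (MSEP, bias, percent bias) for hold-out predictions.

    y_valid: observed responses in the validation sample
    y_pred:  predictions for those cases from the training-sample coefficients
    """
    n = len(y_valid)
    errors = [y - yh for y, yh in zip(y_valid, y_pred)]
    msep = sum(e * e for e in errors) / n
    bias = sum(errors) / n
    return msep, bias, percent_bias(bias, msep)


# Toy vectors (hypothetical), just to exercise the function
print(msep_summary([2.0, 4.0], [1.0, 2.0]))
# Summary values reported on the slide
print(percent_bias(-0.02276563, 0.7578))
```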


Testing Bias = 0 from Training data to Validation

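With prediction errors e_i = y_iV - ŷ_iTV in the validation sample (n_V = 58):

$$\bar{e} = -0.02276563, \qquad s_e = 0.8778456, \qquad s_{\bar{e}} = \frac{s_e}{\sqrt{n_V}} = \frac{0.8778456}{\sqrt{58}} = 0.1152668$$

$$t_s = \frac{\bar{e}}{s_{\bar{e}}} = \frac{-0.02276563}{0.1152668} = -0.1975$$

No evidence of systematic bias: |t_s| is far below the two-sided 5% critical value (about 2.00 with 57 df), so the training-sample fit neither systematically over- nor under-predicts the validation sample.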
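The same arithmetic in code, using only the three summary numbers on the slide. The last lines also check that those summaries are consistent with the MSEP of 0.7578 through MSEP = ē² + ((n_V - 1)/n_V) s_e², which holds only if s_e was computed with the n_V - 1 divisor; that divisor is an assumption about how s_e was obtained.

```python
import math

n_v = 58                 # validation sample size
mean_err = -0.02276563   # mean of e_i = y_iV - yhat_iTV
sd_err = 0.8778456       # sample SD of the e_i

se_mean = sd_err / math.sqrt(n_v)   # standard error of the mean error
t_stat = mean_err / se_mean
print(f"s_ebar = {se_mean:.7f}  t = {t_stat:.4f}")

# Consistency check, assuming sd_err uses the (n_v - 1) divisor:
# MSEP = ebar^2 + ((n_v - 1) / n_v) * s_e^2
msep = mean_err ** 2 + (n_v - 1) / n_v * sd_err ** 2
print(f"MSEP = {msep:.4f}")
```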


PRESS Statistic

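$$\hat{Y}_{i(i)} = \hat{\beta}_{0(i)} + \hat{\beta}_{1(i)} X_{i1} + \cdots + \hat{\beta}_{p'-1(i)} X_{i,p'-1}$$

where the regression was fit without case i.

$$\mathrm{PRESS} = \sum_{i=1}^{n}\left(Y_i - \hat{Y}_{i(i)}\right)^2$$

Compare PRESS/n with MS(Residual) for the full model.

Note: the deleted residual can be obtained from the single fit to all n cases,

$$Y_i - \hat{Y}_{i(i)} = \frac{Y_i - \hat{Y}_i}{1 - h_{ii}}$$

where h_ii is the i-th diagonal element of

$$P = X(X'X)^{-1}X'$$

$$\mathrm{PRESS}/n = 0.9801 \qquad MS(\text{Residual}) = 0.96$$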

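The model appears to be valid: PRESS/n (0.9801) is very close to MS(Residual) (0.96), so the prediction error for deleted cases is only slightly larger than the in-sample residual variance.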
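The deleted-residual shortcut can be checked numerically. This sketch assumes the Tonnage + Passengers + Length + Cabins model and, because only the first 20 cases are printed, fits it to those 20 ships alone; its PRESS/n will therefore not match the 0.9801 reported for all 158 ships. It computes PRESS twice, once from a single fit via e_i/(1 - h_ii) and once by refitting without each case, and the two should agree up to floating-point rounding. Plain Python lists keep it dependency-free; the helper names (`inverse`, `ols`, `press_from_hat`, `press_by_refitting`) are hypothetical, and for real work a linear-algebra library would be preferable.

```python
# First 20 ships from the data slide: (tonnage, passengers, length, cabins, crew)
DATA = [
    (30.277, 6.94, 5.94, 3.55, 3.55), (30.277, 6.94, 5.94, 3.55, 3.55),
    (47.262, 14.86, 7.22, 7.43, 6.7), (110.0, 29.74, 9.53, 14.88, 19.1),
    (101.353, 26.42, 8.92, 13.21, 10.0), (70.367, 20.52, 8.55, 10.2, 9.2),
    (70.367, 20.52, 8.55, 10.2, 9.2), (70.367, 20.56, 8.55, 10.22, 9.2),
    (70.367, 20.52, 8.55, 10.2, 9.2), (110.239, 37.0, 9.51, 14.87, 11.5),
    (110.0, 29.74, 9.51, 14.87, 11.6), (46.052, 14.52, 7.27, 7.26, 6.6),
    (70.367, 20.52, 8.55, 10.2, 9.2), (70.367, 20.52, 8.55, 10.2, 9.2),
    (86.0, 21.24, 9.63, 10.62, 9.3), (110.0, 29.74, 9.51, 14.87, 11.6),
    (88.5, 21.24, 9.63, 10.62, 10.3), (70.367, 20.52, 8.55, 10.2, 9.2),
    (88.5, 21.24, 9.63, 11.62, 9.3), (70.367, 20.52, 8.55, 10.2, 9.2),
]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(a)
    m = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def design_row(case):
    """Intercept plus tonnage, passengers, length, cabins."""
    return [1.0, *case[:4]]


def ols(cases):
    """Return (beta, (X'X)^-1) for crew regressed on the four predictors."""
    x = [design_row(c) for c in cases]
    y = [c[4] for c in cases]
    p = len(x[0])
    xtx = [[sum(xi[a] * xi[b] for xi in x) for b in range(p)] for a in range(p)]
    xty = [sum(xi[a] * yi for xi, yi in zip(x, y)) for a in range(p)]
    xtx_inv = inverse(xtx)
    beta = [sum(xtx_inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    return beta, xtx_inv


def predict(beta, xi):
    return sum(b * v for b, v in zip(beta, xi))


def press_from_hat(cases):
    """One fit: deleted residual = e_i / (1 - h_ii), with h_ii = x_i'(X'X)^-1 x_i."""
    beta, xtx_inv = ols(cases)
    total = 0.0
    for c in cases:
        xi = design_row(c)
        p = len(xi)
        h_ii = sum(xi[a] * xtx_inv[a][b] * xi[b] for a in range(p) for b in range(p))
        total += ((c[4] - predict(beta, xi)) / (1 - h_ii)) ** 2
    return total


def press_by_refitting(cases):
    """Brute force: refit without case i, then predict case i."""
    total = 0.0
    for i, c in enumerate(cases):
        beta, _ = ols(cases[:i] + cases[i + 1:])
        total += (c[4] - predict(beta, design_row(c))) ** 2
    return total


n = len(DATA)
print(press_from_hat(DATA) / n, press_by_refitting(DATA) / n)
```

The single-fit version needs one matrix inverse instead of n refits, which is why the slides describe K-fold deletion as mainly useful for models without such a shortcut.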