
Linear Regression Model-The National Veterans' Organization


Kirandeep Kaur

BSIS 610

Analysis Collaborators: Yesenia Ortiz and Divya Tanna

December 12, 2014

This research report is divided into the following sections.

1. How the National Veterans’ Organization can use a model for predicting donors’ gift amounts.
2. An explanation of using descriptive statistics on the output variable, average gift amount.
3. Graphs that portray the strength of relationships between each of the 11 variables and the output variable, average gift amount.
4. A multiple linear regression model that includes only selected input variables and their relationships with average gift amount.
5. Recommendations for the multiple regression model.

1. Introduction:

A model can help the National Veterans’ Organization predict donors’ average gift amounts, that is, how much individuals donate to the organization on average. Lawrence Henze describes four advantages of building such a model. First, a model enables the organization to analyze everyone in its database and select individuals who are more likely to donate based on their profiles. Second, a model helps the organization better understand which potential donors are likely to donate the most. Third, a model allows the organization to select a marketing technique that effectively targets prospective donors. Fourth, a model saves the organization time and energy by focusing its efforts on the individuals who are most likely to donate.


2. Average Gift Descriptive Statistics:

Descriptive Statistics

Statistic             Value ($)
Mean                  10.9828652
Median                9.4
Mode                  15
Standard Deviation    7.50158549

On average, individuals donate $10.98 to the National Veterans’ Organization.

The mode is $15, meaning that $15 is the most frequently occurring average gift amount in the data. This does not imply that most individuals donate $15.

Since the mean ($10.98) is higher than the median ($9.40), the distribution is skewed to the right. This is consistent with high outliers in the raw data, such as $30 and $50.

Because the standard deviation is computed from deviations from the mean, these outliers inflate the standard deviation as well as the mean.

Therefore, without a model, analysts could not predict precisely how much each person will donate: a typical gift deviates from the mean by about $7.50.
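The effect of high outliers on these statistics can be illustrated with a short Python sketch. The gift amounts below are hypothetical values chosen for illustration, not the report's data, and the sample standard deviation is an assumption about how the spread is measured.

```python
import statistics

# Hypothetical average gift amounts in dollars (illustrative only).
gifts = [5, 7, 8, 9, 10, 15, 15, 15, 30, 50]

mean = statistics.mean(gifts)
median = statistics.median(gifts)
mode = statistics.mode(gifts)
stdev = statistics.stdev(gifts)  # sample standard deviation

# Removing the two largest gifts shows how much they pull the
# mean and the standard deviation upward.
trimmed = [g for g in gifts if g < 30]
trimmed_mean = statistics.mean(trimmed)
trimmed_stdev = statistics.stdev(trimmed)

print(f"mean={mean:.2f} median={median:.2f} mode={mode} stdev={stdev:.2f}")
print(f"without outliers: mean={trimmed_mean:.2f} stdev={trimmed_stdev:.2f}")
```

In this toy sample the mean sits above the median, and both the mean and the standard deviation drop once the two large gifts are removed, which mirrors the pattern described above.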


Bin ($)    Frequency    Cumulative %
10         1184         56.41%
20         761          92.66%
30         120          98.38%
40         18           99.24%
50         6            99.52%
60         2            99.62%
70         3            99.76%
80         2            99.86%
90         1            99.90%
100        2            100.00%
110        0            100.00%
120        0            100.00%
More       0            100.00%

The following points pertain to the frequency table and histogram.

56.41 percent of individuals in the data donate $10 or less on average.

98.38 percent of individuals donate $30 or less on average.

100 percent of individuals donate $100 or less on average.
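The cumulative percentages can be reproduced from the bin frequencies alone. The sketch below uses the frequencies from the table; the empty bins (110, 120 and More) are omitted because they do not change the totals, and the variable names are illustrative.

```python
from itertools import accumulate

# Upper bin edges ($) and frequencies taken from the frequency table.
bins = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
freq = [1184, 761, 120, 18, 6, 2, 3, 2, 1, 2]

total = sum(freq)
# Running total of donors at or below each bin edge, as a percent of all donors.
cum_pct = [100 * running / total for running in accumulate(freq)]

for edge, count, pct in zip(bins, freq, cum_pct):
    print(f"{edge:>4} {count:>6} {pct:7.2f}%")
```

Each cumulative percentage is the share of donors whose average gift falls at or below the bin's upper edge.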

[Figure: Histogram of AVGGIFT by bin. Series: Frequency and Cumulative %]


3. Potential Input Variables

Homeowners and non-homeowners donate approximately the same amount on average.

There is no correlation between homeowner and average gift amount (AVGGIFT).

The linear regression model will exclude the input variable, homeowner, because the trendline is nearly flat and there is no relationship between homeowner and AVGGIFT.

[Figure: Relationship Between Homeowner and AVGGIFT ($). Trendline: y = -0.0298x + 11.005, R² = 3E-06]


The R² of 0.0011 states that the number of children (NUMCHILD) explains only 0.11 percent of the variation in average gift amount (AVGGIFT). Therefore, the R² shows a weak correlation between the two variables.

As parents have more children, they tend to donate less on average to the organization.

The linear regression model will exclude this input variable because only a weak correlation between NUMCHILD and AVGGIFT exists.

[Figure: Relationship Between NUMCHILD and AVGGIFT ($). Trendline: y = -0.6572x + 11.693, R² = 0.0011]
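To make the trendline equation and R² on each graph concrete, here is a minimal sketch of a least-squares line fit in plain Python. The function name `fit_line` and the sample points are hypothetical; this is not necessarily the tool that produced the report's trendlines.

```python
def fit_line(xs, ys):
    """Return (slope, intercept, r_squared) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R² = 1 - (unexplained variation / total variation)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_resid = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, 1 - ss_resid / ss_total


# Hypothetical points: number of children vs. average gift ($).
slope, intercept, r2 = fit_line([0, 1, 2, 3, 4], [12, 10, 11, 9, 10])
print(f"y = {slope:.4f}x + {intercept:.3f}, R² = {r2:.4f}")

# R² is a proportion, so it is multiplied by 100 to read it as a percent:
# an R² of 0.0011 corresponds to 0.11 percent of the variation explained.
print(f"{0.0011 * 100:.2f} percent")
```

The slope gives the direction of the relationship, while R² gives the share of the variation in the output that the input explains.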


Average gift amounts change very little as individuals’ incomes increase or decrease.

Based on this finding, only a weak correlation between income and average gift amount (AVGGIFT) exists.

The linear regression model will exclude the input variable, income, because only a weak correlation between income and AVGGIFT exists.

[Figure: Relationship Between Income and AVGGIFT ($). Trendline: y = 0.5505x + 8.8438, R² = 0.0145]


Both genders appear to donate approximately the same amount on average.

Based on this finding, no relationship between gender and average gift amount (AVGGIFT) exists.

The linear regression model will exclude the input variable, gender, because gender has no correlation with AVGGIFT and the trendline is nearly flat.

The R² of 0.0082 states that wealth explains only 0.82 percent of the variation in average gift amount (AVGGIFT), which shows essentially no correlation between the two variables.

The linear regression model will exclude wealth as an input variable because there is no meaningful relationship between wealth and AVGGIFT and the trendline is nearly flat.

[Figure: Relationship Between Gender and AVGGIFT ($). Trendline: y = -1.1609x + 11.686, R² = 0.0057]

[Figure: Relationship Between Wealth and AVGGIFT ($). Trendline: y = 0.2721x + 9.2329, R² = 0.0082]


The graph shows that the higher the average home value (HV) in a potential donor’s neighborhood, the more the donor tends to give on average, as indicated by the trendline’s positive slope.

Although the R² (0.0257) is higher than in most of the other graphs, there is still only a weak relationship between HV and average gift amount (AVGGIFT).

The linear regression model will include HV as an input because the R² is higher for this relationship than for most of the other relationships (from the other graphs).

Also, the linear regression model will include this input variable because the trendline slopes upward.

[Figure: Relationship Between HV and AVGGIFT ($). X-axis: HV ($ in hundreds). Trendline: y = 0.0013x + 9.53, R² = 0.0257]


As donors’ median family income (ICMED) increases, the average gift amount tends to increase slightly, as indicated by the trendline’s positive slope.

Although the R² is higher for ICMED and average gift amount (AVGGIFT) than in most of the other graphs, there is still only a weak correlation between the two variables.

The linear regression model will include the input variable, ICMED, because the R² of 0.0187 (1.87 percent of the variation explained) is higher for ICMED and AVGGIFT than for several other input variables with AVGGIFT.

In addition, the linear regression model will include ICMED because the trendline slopes upward.

[Figure: Relationship Between ICMED and AVGGIFT ($). X-axis: ICMED ($ in hundreds). Trendline: y = 0.006x + 8.6391, R² = 0.0187]


In general, as average family income (ICAVG) increases, the average gift amount tends to increase slightly, as indicated by the trendline’s positive slope.

Although the R² of 0.0155 (1.55 percent of the variation explained) is high for ICAVG and average gift amount (AVGGIFT) compared to several other graphs, there is still only a weak correlation between the two variables.

The linear regression model includes ICAVG because the R² is high compared to the other graphs and the trendline slopes upward.

[Figure: Relationship Between ICAVG and AVGGIFT ($). X-axis: Average Family Income ($ in hundreds). Trendline: y = 0.0056x + 8.561, R² = 0.0155]


As the percentage of households earning less than $15K in a potential donor’s neighborhood (IC15) increases, the average gift amount tends to decrease. In other words, donors from neighborhoods with a larger share of low-income households tend to give less on average.

Since the R² is low, there is only a weak correlation between IC15 and average gift amount (AVGGIFT).

Since the R² of 0.0047 (0.47 percent of the variation explained) is in approximately the same low range as most R² values from the other graphs, the linear regression model will exclude the input variable IC15.

[Figure: Relationship Between IC15 and AVGGIFT. X-axis: IC15 (%). Trendline: y = -0.0427x + 11.611, R² = 0.0047]


NUMPROM refers to the number of promotions the organization sends to donors.

In general, as individuals receive more promotions, they tend to donate less on average to the organization.

Although the R² of 0.0277 (2.77 percent of the variation explained) is higher than most of the other R² values, there is still only a weak correlation between number of promotions (NUMPROM) and average gift amount (AVGGIFT).

The linear regression model will include NUMPROM because the trendline slopes downward and the R² is high compared to most of the R² values in the other graphs.

[Figure: Relationship Between NUMPROM and AVGGIFT. Trendline: y = -0.0556x + 13.654, R² = 0.0277]


There is essentially no correlation between number of months since last donation (TOTALMONTHS) and average gift amount (AVGGIFT).

The linear regression model will exclude TOTALMONTHS as an input variable because there is no meaningful correlation between TOTALMONTHS and AVGGIFT. In other words, the linear regression model will exclude TOTALMONTHS because the trendline is nearly flat.
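The trendline R² values reported on the eleven graphs in this section can be compared side by side. The sketch below ranks them; the upper-case labels for homeowner, income, gender and wealth are illustrative names, and the cutoff of four variables is an assumption chosen to match the number of inputs the report carries forward, not a rule stated in the report.

```python
# R² of each single-variable trendline, as printed on the graphs in this section.
r_squared = {
    "HOMEOWNER": 3e-06,
    "NUMCHILD": 0.0011,
    "INCOME": 0.0145,
    "GENDER": 0.0057,
    "WEALTH": 0.0082,
    "HV": 0.0257,
    "ICMED": 0.0187,
    "ICAVG": 0.0155,
    "IC15": 0.0047,
    "NUMPROM": 0.0277,
    "TOTALMONTHS": 0.0023,
}

# Sort from the strongest to the weakest single-variable relationship.
ranked = sorted(r_squared.items(), key=lambda item: item[1], reverse=True)
top_four = [name for name, _ in ranked[:4]]

for name, value in ranked:
    print(f"{name:<12} R² = {value:.4f} ({value * 100:.2f}% of variation)")
print("Highest four:", top_four)
```

The four variables with the highest R² are NUMPROM, HV, ICMED and ICAVG, which are the inputs included in the regression model.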

[Figure: Relationship Between TOTALMONTHS and AVGGIFT. Trendline: y = 0.0897x + 8.1683, R² = 0.0023]

4. Multiple Linear Regression Model

Regression Model

Input Variable    Coefficient
Intercept         12.6402
HV                0.0001
ICMED             0.0162
ICAVG             -0.0158
NUMPROM           -0.0475

(a) Multiple Linear Regression Model

Average Gift Amount = 12.6402
    + (0.0001 * HV)
    + (0.0162 * ICMED)
    + (-0.0158 * ICAVG)
    + (-0.0475 * NUMPROM)

*NOTE: The values substituted for HV, ICMED, ICAVG and NUMPROM in the calculation that follows are the medians of these variables.

Average Gift Amount = 12.6402 + (0.0001 * 822) + (0.0162 * 357) + (-0.0158 * 398) + (-0.0475 * 46)

Average Gift Amount = 12.6402 + 0.0822 + 5.7834 - 6.2884 - 2.185

Average Gift Amount = $10.03
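The calculation above can be checked with a few lines of Python. The coefficients and median values come from the report; the function name `predict_avg_gift` and the dictionary layout are illustrative.

```python
# Coefficients of the multiple linear regression model, from the table in this section.
INTERCEPT = 12.6402
COEFFICIENTS = {"HV": 0.0001, "ICMED": 0.0162, "ICAVG": -0.0158, "NUMPROM": -0.0475}

# Median value of each input variable, as used in the report's worked example.
MEDIANS = {"HV": 822, "ICMED": 357, "ICAVG": 398, "NUMPROM": 46}


def predict_avg_gift(inputs):
    """Intercept plus the sum of each coefficient times its input value."""
    return INTERCEPT + sum(COEFFICIENTS[name] * inputs[name] for name in COEFFICIENTS)


predicted = predict_avg_gift(MEDIANS)
print(f"Predicted average gift: ${predicted:.2f}")
```

Passing a different dictionary of input values gives the predicted average gift for another donor profile.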

(b) Summary of Training Sample Statistics

R²: 0.0483

The R² of 0.0483 means that the selected input variables together explain only about 4.83 percent of the variation in the output variable, average gift amount (AVGGIFT), which is a weak relationship.

RMS Error: 7.325

The training RMS error is about 0.015 lower than the standard deviation from the multiple regression model (7.3399).

Standard Deviation

Descriptive Statistics (Without Model)    7.501585
With Multiple Regression Model            7.3399

Since the standard deviation from the multiple linear regression model is slightly lower than the standard deviation from the descriptive statistics, the linear regression model would make a slightly more precise prediction of how much individuals would donate on average.

Basically, the lower the standard deviation, the more precise the prediction of the output variable.

(c) Description of How Well the Model Performs on New Data.

Data Set           RMS Error
Training Data      7.325
Validation Data    7.257

Validation RMS error - training RMS error = 7.257 - 7.325 = -0.068

Regardless of whether the difference is negative or positive, the training RMS error and the validation RMS error are very close.

Since the validation RMS error is close to the training RMS error, the multiple linear regression model appears to perform about as well on new data as it does on the training data.
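For reference, RMS error is the square root of the mean squared difference between actual and predicted values. The sketch below defines it and then compares the two RMS errors reported above; the function name `rms_error` and the sample values passed to it are hypothetical.

```python
import math


def rms_error(actual, predicted):
    """Square root of the mean squared difference between actual and predicted values."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical actual and predicted gifts ($), only to show the function in use.
example = rms_error([10, 20, 30], [12, 18, 33])
print(f"Example RMS error: {example:.3f}")

# RMS errors reported for the training and validation data.
training_rmse = 7.325
validation_rmse = 7.257

gap = validation_rmse - training_rmse
relative_gap = gap / training_rmse
print(f"Difference: {gap:.3f} ({relative_gap:.2%} of the training RMS error)")
```

Expressed relative to the training RMS error, the gap is under one percent, which supports reading the two errors as close.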

5. Recommendations:

To improve the linear regression model, analysts should consider interest group (association with veterans or with individuals currently serving in the army, air force, etc.) and church attendance as additional input variables.

Based on Burks (2014), analysts should add interest group as an additional input variable. Individuals who know a veteran or someone currently serving in the army, air force, etc. may empathize with the organization’s cause and may donate more to the organization than individuals who do not. In addition, by showing “empathy in their marketing messages, these charitable organizations can reach more people who want to offer monetary aid to those causes” (Burks 2014).

With this variable, analysts could confirm whether individuals who know a veteran or someone currently serving in the army, air force, etc. are more likely to donate to the National Veterans’ Organization.

Furthermore, analysts should consider adding church attendance as an additional input variable because “giving in a religious setting occurs on a regular basis, the habit of giving can develop quite readily” (DiDonato 2012). This suggests that individuals who attend church regularly may tend to donate more on average.

With this variable, analysts could gain insight into whether regular churchgoers tend to donate more on average.


References

Burks, Robin (2014, July 29). A new study reveals why some people donate to charity more than others. Retrieved from http://www.techtimes.com/articles/11536/20140729/a-new-study-reveals-why-some-people-donate-to-charity-more-than-others.htm#ixzz3LhgDU8zx

DiDonato, Nicholas C. (2012, September 30). Not conservatives, but religious people, more charitable. Retrieved from http://www.patheos.com/blogs/scienceonreligion/2012/09/not-conservatives-but-religious-people-more-charitable/

Henze, Lawrence. Using Statistical Modeling to Increase Donations. Retrieved from https://www.blackbaud.com/files/resources/downloads/WhitePaper_TargetAnalytics_StatisticalModeling.pdf