IMPROVING CUSTOMER PROSPECTING IN WEALTH MANAGEMENT
John Vide, Engagement Lead, Big Data, Dell EMC [email protected]
Wei Lin, Sr. Manager & Chief Data Scientist - Big Data, Dell [email protected]
Fernanda Campello de Souza, Sr. Data Scientist, Dell [email protected]
Mauro Damo, Sr. Data Scientist, Dell [email protected]
Martin Costa, Advisory Consultant, Dell [email protected]
2016 EMC Proven Professional Knowledge Sharing 2
Table of Contents
Objective
Approach
Data Sources
    American Community Survey
    Net Worth Distribution
    Internal Revenue Service
    CDC
    Yahoo Finance
Data Generation
    Step 1: Select from census microdata
    Step 2: Generate risk capital
    Step 3: Initial marginal annual savings generation
    Step 4: Initial investment goals generation
    Step 5: Initial risk profile generation
    Step 6: Life events generation for 10 years
    Step 7: Investment decisions/outcomes for 10 years
    Step 8: Lifetime value generation
Predictive Models
Return On Investment
Conclusions
References
Appendix: Solution Architecture
Disclaimer: The views, processes or methodologies published in this article are those of the authors.
They do not necessarily reflect Dell EMC’s views, processes or methodologies.
Dell, EMC and other trademarks are trademarks of Dell Inc. or its subsidiaries.
Objective
A successful wealth management practice requires prospecting for and attracting new clients. This process can be substantially improved by employing a Big Data approach to identify potential high value prospects for the wealth manager to target.
To drive a successful practice for themselves and their firms, wealth managers need to identify
potential new client prospects and seek to bring them in as new customers. In today’s wealth
management marketplace, the process of identifying potential new customers has not changed
much over the decades: personal relationships, client referrals, some client demographic data
in a sales support system, and perhaps a spreadsheet on the financial advisor’s laptop. A
quantitative, data-driven Big Data approach offers a more efficient, effective, and
ultimately more profitable way to identify and track the prospects likely to add the most value
to a financial advisor’s book of business.
This Customer Prospecting model provides the analytic tools to consume prospect data,
investment returns, risk preferences, and other variables to drive the calculation of the relative
value of individual prospects. That value can be used to prioritize marketing and other outreach
efforts in soliciting the prospect’s business.
To identify this value, we predict the lifetime value (LTV), or profitability, of potential wealth
management clients to inform the financial advisor’s (FA) management of the prospect list.
Target Outcome: Improved efficiency and accuracy of client prospecting
• Quantitative basis for supporting FA’s new client pipeline
• Reduced overall client acquisition costs
This provides increased FA focus on developing the most potentially profitable relationships.
Approach
A wealth management firm’s client database typically contains detailed historical information
on a client’s demographics as well as past investment decisions and outcomes (both to the
client and to the firm). This information can be leveraged to understand what type of client
tends to bring more value to the firm over a certain period of time. Historical investment
information for multiple clients over a long period of time (e.g., 10 years) allows not only for an
analysis of the immediate profit opportunities clients may offer, but also for projecting what their
lifetime value to the firm would be, taking into account the natural evolution of a client’s income
and financial responsibilities. In general, this approach correlates a client’s demographic and
financial characteristics at a given point in time with the cumulative profit they brought to the
firm over the subsequent period of 10 years, referred to as Lifetime Value (LTV).
Since there was no actual wealth management data available to be used in this analysis, a client
dataset was simulated, based on a mixture of related real-world data (census microdata from
2013) and reasonable assumptions around life events and risk/investment behavior. The
simulated clients can have a combination of three types of investment goals, based on
demographic information: home, retirement, and education (college). Each goal and
corresponding value is determined based on a number of parameters such as income, age,
children, children’s age, education, and more.
Clients also make investment decisions according to one of five risk profiles:
conservative, moderately conservative, moderate, moderately aggressive, and aggressive.
Real-world returns of Vanguard funds represent the three major asset classes that make up the
asset allocation: Equities, Fixed Income, and Money Market.
The simulated dataset was then used as input for building predictive models of LTV, with
clients’ demographic and financial characteristics as predictors. By establishing and calculating
key investment goals based on a prospect’s individual parameters, and extrapolating these
values over time while including likely life changing events, a prioritized prospect list can enable
wealth managers to more effectively target high value prospects and provide insight into their
likely investment goals. Revenues and costs to the firm are represented by relative Assets under
Management (AUM) fees that support a rational profit margin.
The diagram in Figure 1 gives an overview of our approach.
Figure 1 Solution overview
The data generation process and predictive models utilized are discussed in detail in
subsequent sections. The Data Sources section details the data sources used. The Data
Generation section discusses how data sources and assumptions about life events and
risk/investment behavior are combined to produce the dataset for predictive analysis. The
Predictive Models section discusses the predictive models examined and describes the final
model selected.
Data Sources
This project used multiple data sources to build a broad picture of the finances of US
citizens, covering assets, taxes paid, total compensation, margin, and profitability. This
information is the basis for simulating how much of their disposable assets each customer will
have available to invest with our wealth management firm.
We used four data sources:
• American Community Survey – The U.S. Census Bureau conducts this survey every year to
profile the people and housing of the United States and Puerto Rico.
• Internal Revenue Service – Provides the relevant tax rates that US citizens pay.
• Centers for Disease Control and Prevention (CDC) – Provides life expectancy, in years, for
the American people.
• Asset information from Yahoo Finance – Market data on asset prices.
These data sources provide what we need for the data simulation described in the Data
Generation section.
American Community Survey
Net Worth Distribution
To calculate our Net Worth distribution, we used two tables from Census data.
The first table, Net Worth Distribution – Means, had two critical pieces of information: Median
Net Worth by quintile and Monthly Household Income by quintile. The second table was
Standard Error of Net Worth distribution. It provides the variability of the Net Worth based on
the Household Income. Using these two tables, it was possible to estimate the Net Worth of US
citizens in the ACS survey.
Table 1 Net Worth Distribution - Means
Columns are grouped by Net Worth Quintiles; each quintile reports Median Net Worth and Mean Net Worth.
| Characteristic | Number of Households (In thousands) | Lowest Quintile: Median Net Worth | Lowest Quintile: Mean Net Worth | Second Quintile: Median Net Worth | Second Quintile: Mean Net Worth | Third Quintile: Median Net Worth | Third Quintile: Mean Net Worth | Fourth Quintile: Median Net Worth | Fourth Quintile: Mean Net Worth | Highest Quintile: Median Net Worth | Highest Quintile: Mean Net Worth |
|---|---|---|---|---|---|---|---|---|---|---|---|
| HOUSEHOLDS | | | | | | | | | | | |
| Total | 118,689 | -6,029 | -32,066 | 7,263 | 9,979 | 68,839 | 71,708 | 205,985 | 214,470 | 630,754 | 1,430,907 |
| Monthly household income | | | | | | | | | | | |
| Lowest quintile | 23,724 | 0 | -17,266 | 464 | 483 | 4,825 | 5,650 | 54,000 | 56,685 | 242,200 | 448,817 |
| Second quintile | 23,748 | -7,086 | -23,902 | 2,413 | 3,116 | 24,288 | 27,340 | 110,574 | 114,744 | 337,795 | 692,625 |
| Third quintile | 23,733 | -10,668 | -32,505 | 9,613 | 11,287 | 58,253 | 60,098 | 154,325 | 161,749 | 447,555 | 994,767 |
| Fourth quintile | 23,745 | -8,661 | -36,716 | 34,732 | 36,404 | 113,620 | 116,514 | 266,776 | 275,249 | 684,326 | 1,101,948 |
| Highest quintile | 23,738 | 8,073 | -21,576 | 127,339 | 128,006 | 292,796 | 294,391 | 565,936 | 579,188 | 1,309,350 | 3,509,628 |
| Household net worth | | | | | | | | | | | |
| $1 to $4,999 | 10,815 | 120 | 198 | 1,000 | 944 | 1,692 | 1,669 | 2,580 | 2,617 | 4,049 | 4,079 |
| $5,000 to $9,999 | 5,678 | 5,468 | 5,477 | 6,539 | 6,547 | 7,113 | 7,143 | 7,923 | 7,914 | 9,163 | 9,203 |
| $10,000 to $24,999 | 7,793 | 11,000 | 10,989 | 13,526 | 13,465 | 16,040 | 16,045 | 19,397 | 19,445 | 23,000 | 22,993 |
| $25,000 to $49,999 | 8,229 | 27,109 | 27,036 | 31,202 | 31,409 | 36,524 | 36,591 | 42,000 | 41,989 | 47,442 | 47,420 |
| $50,000 to $99,999 | 12,369 | 54,047 | 53,995 | 62,226 | 62,548 | 72,000 | 72,109 | 83,114 | 82,923 | 94,036 | 93,989 |
| $100,000 to $249,999 | 21,293 | 109,668 | 109,929 | 134,013 | 133,868 | 159,226 | 159,717 | 190,965 | 190,466 | 227,526 | 227,793 |
| $250,000 to $499,999 | 15,010 | 265,424 | 266,076 | 301,723 | 302,346 | 342,000 | 343,745 | 398,944 | 398,377 | 460,876 | 462,976 |
| $500,000 and over | 16,021 | 542,196 | 543,523 | 660,489 | 663,196 | 836,340 | 846,608 | 1,187,172 | 1,212,343 | 2,302,769 | 6,347,066 |
NOTE: Excludes group quarters. Individual outliers that highly influenced the mean value for categories were excluded. Federal surveys now give
respondents the option of reporting more than one race. There are two basic ways of defining a race group. A group such as Black may be defined as those
who reported Black and no other race (the race-alone or single-race concept) or as those who reported Black regardless of whether they also reported
another race (the race alone-or-in-combination concept). This table shows data using the first approach (race-alone). The use of the single race population
does not imply that it is the preferred method of presenting or analyzing data. The U.S. Census Bureau uses a variety of approaches. Because Hispanics
may be any race, data in this table for Hispanics overlap slightly with data for the Black population. Data for American Indians and Alaska Natives are not
shown because of their small sample size. The race or Hispanic origin of the householder designates the race or Hispanic origin of the household. The
estimates in this table are based on responses from a sample of the population and may differ from the actual values because of sampling variability and
other factors. As a result, apparent differences between the estimates for two or more groups may not be statistically significant. For information on
sampling and nonsampling error see: http://www.census.gov/programs-surveys/sipp/methodology/sampling.html
Source: U.S. Census Bureau, Survey of Income and Program Participation, 2008 Panel, Wave 10
Internet Release Date: 8/21/2014
Footnotes:
NOTE: Median net worth statistics within quintiles of the net worth distribution are at the 10th, 30th, 50th, 70th, and 90th percentiles. Net worth quintiles may
be of different sizes due to values that overlap quintile breaks. Due to weighting, the median of the third quintile may not match the median of the entire
distribution of net worth. The bottom net worth quintile includes households with zero or negative net worth. Net worth may be zero or negative when a
household’s gross wealth is zero or because the value of a household’s liabilities exceeds the value of its assets. The "Number of Households" column
reflects the number of households for each characteristic. More information on methodology can be found in the Methodology section of 'Distribution of
Household Wealth in the U.S.: 2000 to 2011.pdf' at http://www.census.gov/people/wealth/ .
We combined the Median and Standard Error Net Worth tables into a single table that gives, for
each monthly Household Income quintile (bounded by the lower percentile, percentileLL, and the
upper percentile, percentileUL), the mean and standard error of each Net Worth quintile. This
combined table is used in the Data Generation phase.
Table 2 Net Worth Distribution - Mean and Standard Deviation
Table 6. Standard Errors for Distribution of Net Worth, By Net Worth Quintiles and Selected Characteristics: 2011
Columns are grouped by Net Worth Quintiles; each quintile reports the Standard Error for Median Net Worth and the Standard Error for Mean Net Worth.
| Characteristic | Number of Households (In thousands) | Lowest Quintile: Standard Error for Median Net Worth | Lowest Quintile: Standard Error for Mean Net Worth | Second Quintile: Standard Error for Median Net Worth | Second Quintile: Standard Error for Mean Net Worth | Third Quintile: Standard Error for Median Net Worth | Third Quintile: Standard Error for Mean Net Worth | Fourth Quintile: Standard Error for Median Net Worth | Fourth Quintile: Standard Error for Mean Net Worth | Highest Quintile: Standard Error for Median Net Worth | Highest Quintile: Standard Error for Mean Net Worth |
|---|---|---|---|---|---|---|---|---|---|---|---|
| HOUSEHOLDS | | | | | | | | | | | |
| Total | 118,689 | 403 | 1,371 | 106 | 115 | 665 | 351 | 1,475 | 910 | 7,033 | 155,575 |
| Monthly household income | | | | | | | | | | | |
| Lowest quintile | 23,724 | 95 | 1,598 | 51 | 18 | 216 | 115 | 1,712 | 837 | 7,248 | 35,789 |
| Second quintile | 23,748 | 495 | 1,402 | 106 | 75 | 662 | 471 | 1,785 | 949 | 7,474 | 101,681 |
| Third quintile | 23,733 | 943 | 3,428 | 275 | 210 | 939 | 690 | 1,816 | 1,163 | 8,266 | 285,960 |
| Fourth quintile | 23,745 | 1,131 | 2,565 | 905 | 570 | 1,721 | 935 | 3,323 | 1,980 | 15,940 | 93,628 |
| Highest quintile | 23,738 | 1,403 | 4,914 | 2,150 | 1,158 | 3,280 | 2,029 | 6,245 | 3,296 | 24,032 | 714,044 |
| Household net worth | | | | | | | | | | | |
| $1 to $4,999 | 10,815 | 21 | 8 | 1 | 9 | 23 | 12 | 19 | 16 | 31 | 23 |
| $5,000 to $9,999 | 5,678 | 42 | 22 | 34 | 20 | 0 | 4 | 44 | 22 | 27 | 24 |
| $10,000 to $24,999 | 7,793 | 66 | 39 | 73 | 38 | 61 | 45 | 98 | 52 | 98 | 64 |
| $25,000 to $49,999 | 8,229 | 135 | 67 | 135 | 77 | 116 | 86 | 134 | 87 | 134 | 74 |
| $50,000 to $99,999 | 12,369 | 190 | 102 | 150 | 106 | 263 | 142 | 282 | 127 | 233 | 145 |
| $100,000 to $249,999 | 21,293 | 355 | 192 | 462 | 239 | 400 | 259 | 662 | 342 | 681 | 347 |
| $250,000 to $499,999 | 15,010 | 506 | 334 | 800 | 404 | 1,113 | 480 | 906 | 598 | 1,269 | 784 |
| $500,000 and over | 16,021 | 2,145 | 949 | 2,264 | 1,348 | 5,127 | 2,574 | 11,110 | 5,688 | 45,938 | 1,097,883 |
Internet Release Date: 8/21/2014
Source: U.S. Census Bureau, Survey of Income and Program Participation, 2008 Panel, Wave 10
Footnotes:
NOTE: Standard errors have been calculated using replicate weights. Standard error of 0 indicates that the width of the confidence interval for this estimate
rounds to zero. Median net worth statistics within quintiles of the net worth distribution are at the 10th, 30th, 50th, 70th, and 90th percentiles. Net worth
quintiles may be of different sizes due to values that overlap quintile breaks. Due to weighting, the median of the third quintile may not match the median of
the entire distribution of net worth. The bottom net worth quintile includes households with zero or negative net worth. Net worth may be zero or negative
when a household’s gross wealth is zero or because the value of a household’s liabilities exceeds the value of its assets. The "Number of Households"
column reflects the number of households for each characteristic. More information on methodology can be found in the Methodology section of 'Distribution
of Household Wealth in the U.S.: 2000 to 2011.pdf' at http://www.census.gov/people/wealth/ .
| percentileLL | percentileUL | q1m | q2m | q3m | q4m | q5m | q1se | q2se | q3se | q4se | q5se | limit_dollars |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0% | 20.0% | (17,266) | 483 | 5,650 | 56,685 | 448,817 | 1,598 | 18 | 115 | 837 | 35,789 | 27,218 |
| 20.0% | 40.0% | (23,902) | 3,116 | 27,340 | 114,744 | 692,625 | 1,402 | 75 | 471 | 949 | 101,681 | 48,502 |
| 40.0% | 60.0% | (32,505) | 11,287 | 60,098 | 161,749 | 994,767 | 3,428 | 210 | 690 | 1,163 | 285,960 | 75,000 |
| 60.0% | 80.0% | (36,716) | 36,404 | 116,514 | 275,249 | 1,101,948 | 2,565 | 570 | 935 | 1,980 | 93,628 | 115,866 |
| 80.0% | 100.0% | (21,576) | 128,006 | 294,391 | 579,188 | 3,509,628 | 4,914 | 1,158 | 2,029 | 3,296 | 714,044 | |
Internal Revenue Service
From the IRS, we used the federal tax rates to calculate how much federal tax is owed based on
demographic information such as marital status. Depending on marital status and total income, a
US citizen owes a different percentage of income in federal tax; the income ranges that set
these percentages are commonly referred to as tax brackets. This analysis did not consider any
deductions when computing the tax owed. We used the 2013 brackets table to calculate the
federal taxes owed.
This analysis also ignored any state or local taxes. In practice, these further increase the
overall tax burden on most prospects because most states and localities levy some form of
income, sales, or other tax.
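As an illustration of how a bracket table turns total income into tax owed, the following minimal sketch applies marginal rates bracket by bracket, with no deductions, as described above. The function name `federal_tax` and the constant `SINGLE_2013` are hypothetical, and the thresholds are our understanding of the 2013 single-filer schedule; they should be checked against the bracket tables in this section before any use.

```python
# Assumed 2013 federal schedule for single filers: (upper bound of bracket, marginal rate).
# These values are an assumption to be verified against the paper's bracket tables.
SINGLE_2013 = [
    (8_925, 0.10),
    (36_250, 0.15),
    (87_850, 0.25),
    (183_250, 0.28),
    (398_350, 0.33),
    (400_000, 0.35),
    (float("inf"), 0.396),
]


def federal_tax(income, brackets=SINGLE_2013):
    """Tax owed on `income` under marginal brackets, ignoring all deductions."""
    tax = 0.0
    lower = 0.0
    for upper, rate in brackets:
        if income <= lower:
            break
        # Only the slice of income that falls inside this bracket is taxed at its rate.
        tax += (min(income, upper) - lower) * rate
        lower = upper
    return tax


if __name__ == "__main__":
    print(f"{federal_tax(100_000):,.2f}")
```

A separate bracket list per filing status (married filing jointly, head of household, and so on) can be passed through the `brackets` argument.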
Table 3 Married Taxpayers Filing Jointly
Table 4 Head of Household
Table 5 Individual Taxpayers
Table 6 Married Taxpayers Filing Separate
Based on the marital designations in ACS, we use the Married and Single bracket tables to
calculate the federal taxes for married and single prospects, respectively.
A more accurate estimate would also include the state and city taxes for each US citizen,
based on where they live.
CDC
The CDC life expectancy table gives the life expectancy of US citizens at specific ages. This
information is needed to calculate an individual’s retirement goal: life expectancy determines
how accurately we can calculate the total amount of savings a US citizen will have available to
use in the retirement phase of their life.
Figure 2 Life expectancy by age and gender
For example, according to Figure 2, a girl born today has a life expectancy of more than 80
years, and a man who is 76 years old today can expect to live more than 10 more years.
Table 7 Financial Obligations as % of Disposable Income
Table 1175. Household Debt-Service Payments and Financial Obligations as a Percentage of Disposable Personal Income
[As of end of year, seasonally adjusted.]
| Year | Household debt service ratio | Financial obligations ratio: Total | Financial obligations ratio: Renter | Financial obligations ratio: Homeowner, Total | Financial obligations ratio: Homeowner, Mortgage | Financial obligations ratio: Homeowner, Consumer |
|---|---|---|---|---|---|---|
| 1990 | 12.03 | 17.46 | 24.85 | 15.57 | 10.32 | 5.24 |
| 1995 | 11.67 | 17.10 | 26.67 | 14.80 | 9.29 | 5.50 |
| 2000 | 12.59 | 17.66 | 30.44 | 15.13 | 8.83 | 6.30 |
| 2002 | 13.24 | 18.19 | 28.92 | 16.04 | 9.40 | 6.64 |
| 2003 | 13.21 | 17.91 | 26.59 | 16.19 | 9.57 | 6.62 |
| 2004 | 13.31 | 17.93 | 25.41 | 16.46 | 9.91 | 6.55 |
| 2005 | 13.77 | 18.46 | 25.19 | 17.12 | 10.57 | 6.55 |
| 2006 | 13.87 | 18.65 | 25.38 | 17.33 | 11.07 | 6.26 |
| 2007 | 13.89 | 18.76 | 25.02 | 17.48 | 11.25 | 6.24 |
| 2008 | 13.51 | 18.43 | 25.24 | 17.05 | 11.00 | 6.04 |
| 2009 | 12.67 | 17.63 | 24.76 | 16.16 | 10.69 | 5.47 |
| 2010 | 11.75 | 16.64 | 23.88 | 15.13 | 10.15 | 4.98 |
Source: Board of Governors of the Federal Reserve System, "Household Debt Service and
Financial Obligations Ratios";
<http://www.federalreserve.gov/releases/housedebt/default.htm>.
For more information: http://www.federalreserve.gov/rnd.htm
Internet release date: 9/30/2011
Yahoo Finance
We used this data to simulate investment performance and market volatility. We split the
portfolio into three asset classes: Money Market, Fixed Income (Bonds), and Equity (Stocks),
and chose a benchmark for each of them: Vanguard Short-Term Treasury Investment (VFISX) for
Money Market, Vanguard Total Bond Market Index Investment (VBMFX) for Fixed Income, and
Vanguard Total Stock Market Index Investment (VTSMX) for Equity. Each benchmark supplies the
return and volatility of its asset class.
The return of each asset class is calculated for each quarter using this formula:
$x_i = \left(\frac{p_i}{p_{i-1}} - 1\right) \times 100$
$x_i$ -> the percentage return of the index at the end of quarter $i$
$p_i$ -> the closing price of the index at the end of quarter $i$
$p_{i-1}$ -> the closing price of the index on the last trading day of quarter $i-1$
The volatility of each asset class is the standard deviation of its quarterly returns over a
5-year period. The formula to calculate volatility is:
$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \mu\right)^2}$
$x_i$ -> return of a specific quarter
$\mu$ -> average return of the index
$n$ -> number of quarters
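The return and volatility calculations can be sketched in a few lines. This is an illustrative sketch rather than the code used in the project: the function names are hypothetical, the input is assumed to be a list of quarter-end closing prices in chronological order, and the standard deviation is assumed to be the population form (dividing by n); a sample form would divide by n - 1 instead.

```python
import math


def quarterly_returns(closes):
    """Percentage return for each quarter from consecutive quarter-end closing prices."""
    return [(p / prev - 1.0) * 100.0 for prev, p in zip(closes, closes[1:])]


def volatility(returns):
    """Standard deviation of quarterly returns (population form, an assumption)."""
    n = len(returns)
    mu = sum(returns) / n  # average return of the index
    return math.sqrt(sum((x - mu) ** 2 for x in returns) / n)


if __name__ == "__main__":
    # Made-up closing prices, for illustration only.
    closes = [100.0, 102.0, 101.0, 104.0, 107.0]
    r = quarterly_returns(closes)
    print([round(x, 2) for x in r], round(volatility(r), 2))
```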
The following charts show the closing price of each of the three benchmarks over the last 11
years: VFISX for Money Market (Cash), VBMFX for Fixed Income (Bonds), and VTSMX for Equity
(Stocks).
Table 8 includes average quarterly return and volatility for all three asset classes:
Table 8 Average quarterly return and volatility of assets
| | VFISX | VBMFX | VTSMX |
|---|---|---|---|
| Return | 0.67% | 1.01% | 2.10% |
| Volatility | 0.86% | 1.78% | 7.16% |
Using this table it will be possible to calculate the total amount of money invested with the
wealth management firm in the Data Generation phase.
Data Generation
Our data generation process is divided into two phases: generation of initial individual financial
characteristics corresponding to a snapshot in 2005 and generation of life events and financial
decisions/outcomes for a subsequent 10-year period, until 2014. The main steps are:
1) Census data filtering: select individuals from census microdata
2) Initial risk capital generation: use summary statistics relating household income to net worth, along with income and home ownership information from census microdata to estimate initial risk capital (assets available to invest) for each client
3) Initial marginal annual savings generation: use census microdata and tax (IRS) information to estimate the amount a client is able to save in a year from regular income, after accounting for typical expenses (initial marginal annual savings)
4) Initial investment goals generation: Use demographic and financial characteristics to generate client’s investment goals
5) Initial risk profile generation: Use demographics and financial characteristics and investment goals to generate a client’s risk profile
6) Life events generation for 10 years: Generate life events for a period of 10 years, which can update personal income, investment goals, and risk profile
7) Investment decisions/outcomes for 10 years: Generate 10 years of investment decisions and outcomes, updating client’s risk capital each quarter and risk profile every two years
8) Lifetime value (10-year profit) generation: Generate lifetime value for each client as the sum of the profit they generated to the firm in all quarters from 2005 to 2014
Step 1: Select from census microdata
We selected from census microdata only individuals with annual income above $100,000.
Step 2: Generate risk capital
To generate risk capital for an individual we first identify in which household income quintile of
Table 1 they fall. Given our dataset only contains individuals with high income relative to the
entire US population, all individuals fall in the top two household income quintiles (quintiles 4
and 5). We assume that individuals also fall in one of the top two net worth quintiles associated
with their household income level (randomly selected). We generate an individual’s net worth
following the process below. Note that since Table 1 is in 2011 dollars we converted values to
2013 dollars to match census microdata values.
We later adjust the net worth generated for each individual so as to create a positive
correlation between net worth and household income in the dataset (net worth increasing with
household income) as follows:
$\text{Adjusted Networth} = e^{\,0.995 \log(\text{Household Income}) + \sqrt{1 - 0.995^2}\,\log(\text{Networth})}$
Since the net worth statistics in Table 1 include home value as an asset and this value is illiquid
(not available for investment), we make our final estimate for risk capital by deducting the
home value from net worth for all home owners who do not pay a mortgage. For home owners
who pay a mortgage we deduct 80% of home value from net worth (assuming that, in that case,
not all of the home value is illiquid). Note that we eliminate from the dataset any individuals with
generated risk capital below $60,000, assuming they would be unlikely to be prospected
by a top wealth management firm. The final distribution for risk capital is shown in Figure 3.
Net worth generation process:
1) Identify household income quintile (4 or 5)
2) Generate networth quintile (quintile 4 or 5, randomly selected)
3) Generate networth from a lognormal distribution with the following parameters:
- Household income quintile 4
o Networth quintile 4: mean $259,668.87, standard deviation $1,867.92
o Networth quintile 5: mean $1,039,573.60, standard deviation $88,328.30
- Household income quintile 5
o Networth quintile 4: mean $546,403.77, standard deviation $3,109.43
o Networth quintile 5: mean $3,310,969.8, standard deviation $673,626.42
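The risk capital generation in Step 2 can be sketched as follows. This is a minimal illustration, not the project's code. All function and constant names are hypothetical. It assumes that the mean and standard deviation in the process above describe net worth itself (so they are converted to the parameters of the underlying normal distribution), that "log" in the adjustment formula is the natural logarithm, and that individuals who do not own a home have nothing deducted.

```python
import math
import random

# (household income quintile, net worth quintile) -> (mean, standard deviation) in dollars,
# as listed in the net worth generation process.
NET_WORTH_PARAMS = {
    (4, 4): (259_668.87, 1_867.92),
    (4, 5): (1_039_573.60, 88_328.30),
    (5, 4): (546_403.77, 3_109.43),
    (5, 5): (3_310_969.80, 673_626.42),
}
MIN_RISK_CAPITAL = 60_000  # individuals below this threshold are dropped


def lognormal_params(mean, sd):
    """Convert the mean/sd of a lognormal variable into (mu, sigma) of its logarithm."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)


def generate_net_worth(income_quintile, rng):
    """Pick net worth quintile 4 or 5 at random, then draw a lognormal net worth."""
    net_worth_quintile = rng.choice([4, 5])
    mu, sigma = lognormal_params(*NET_WORTH_PARAMS[(income_quintile, net_worth_quintile)])
    return rng.lognormvariate(mu, sigma)


def adjusted_net_worth(household_income, net_worth, rho=0.995):
    """Apply the adjustment formula that ties net worth to household income."""
    return math.exp(rho * math.log(household_income)
                    + math.sqrt(1.0 - rho ** 2) * math.log(net_worth))


def risk_capital(net_worth, home_value, owns_home, pays_mortgage):
    """Deduct 100% of home value (no mortgage) or 80% (mortgage) from net worth."""
    if not owns_home:
        return net_worth  # assumption: nothing to deduct for non-owners
    share = 0.8 if pays_mortgage else 1.0
    return net_worth - share * home_value


if __name__ == "__main__":
    rng = random.Random(42)
    net_worth = adjusted_net_worth(180_000, generate_net_worth(5, rng))
    capital = risk_capital(net_worth, 150_000, owns_home=True, pays_mortgage=True)
    print(round(capital, 2), capital >= MIN_RISK_CAPITAL)
```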
Figure 3 Risk capital distribution
Step 3: Initial marginal annual savings generation
We generate initial marginal annual savings by subtracting an estimate of an individual’s
expenses from their personal annual income according to census data (variable PINCP adjusted
to 2013 dollars). Our estimate of annual expenses includes four components:
- Taxes: based on reference values from Tables 3-6
- Leisure expenses: base value computed as a percentage of total compensation
o Single: 5%
o Married, no children: 10%
o Family with children under 5 years old: 15%
o Family with children between 5 and 17 years old: 20%
o Family with children between 5 and 17 years old and under 5 years old: 25%
Final leisure expenses are generated from a normal distribution with mean equal to
the base value and standard deviation equal to approximately half of the base value.
- Transportation expenses: base value computed as a percentage of total compensation
o Vehicle owner: 25%
o No vehicle: 30%
Final transportation expenses are generated from a normal distribution with mean
equal to the base value and standard deviation equal to approximately half of the
base value.
- Household expenses: sum of expenses with electricity, condo fees, gas, all mortgage payments, rent, fuel cost, fire/hazard/flood insurance, and water (from census microdata)
Individuals with generated marginal annual savings below 0 (i.e. accumulating annual debt) are
eliminated from the dataset.
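The savings calculation in Step 3 can be sketched as follows. This is an illustrative sketch under stated assumptions: the names are hypothetical, personal annual income is used as the "total compensation" base for the percentages, taxes and household expenses are supplied by the caller, and negative draws from the normal distribution are clipped to zero (the text does not say how they are handled).

```python
import random

# Base leisure expense as a share of total compensation, by family type.
LEISURE_SHARE = {
    "single": 0.05,
    "married_no_children": 0.10,
    "children_under_5": 0.15,
    "children_5_to_17": 0.20,
    "children_both_ranges": 0.25,
}
# Base transportation expense as a share of total compensation, keyed by vehicle ownership.
TRANSPORT_SHARE = {True: 0.25, False: 0.30}


def draw_expense(base, rng):
    """Normal draw with mean `base` and sd of half the base; clipped at zero (assumption)."""
    return max(0.0, rng.gauss(base, 0.5 * base))


def marginal_annual_savings(income, family_type, owns_vehicle, taxes,
                            household_expenses, rng):
    """Personal annual income minus taxes, leisure, transportation, and household expenses."""
    leisure = draw_expense(LEISURE_SHARE[family_type] * income, rng)
    transport = draw_expense(TRANSPORT_SHARE[owns_vehicle] * income, rng)
    return income - (taxes + leisure + transport + household_expenses)


if __name__ == "__main__":
    rng = random.Random(7)
    savings = marginal_annual_savings(200_000, "single", True, 45_000, 30_000, rng)
    # Individuals with negative savings would be removed from the dataset.
    print(round(savings, 2), savings >= 0)
```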
Step 4: Initial investment goals generation
We focus our investment goals generation on three main types of goal: home goal, education
goal, and retirement goal. Each investment goal for a client has three components: a
flag indicating whether the client has that particular goal, the goal value (total dollar amount
to achieve), and the goal years (number of years within which the client aims to achieve the goal).
Tables 9-11 summarize the demographic variables we used as drivers to generate each goal:
Table 9 Drivers of home goal
Table 10 Drivers of education goal
Table 11 Drivers of retirement goal
For each type of investment, the goal flag is randomly generated, with the probability of being
true depending on the drivers listed in Tables 9-11.
The home goal value is generated from a linear regression model fit on individuals in the census
dataset who are home owners. The linear regression predicts home value taking as input personal
income, number of vehicles, employment, state of residence, number and age of children, marital
status, number of times married, gender, and age (as listed in Table 9). The number of years to
home goal is generated as the number of years it would take the individual to accumulate the
home value using only their marginal annual savings.
The education goal value is generated based on the number of children and household income,
with average college expenses per child being higher for wealthier families. The number of years
to education goal is generated based on the number and age of children. Families with children
aged 6-17 years will have fewer years to education goal than families with children under 6
years old.
The retirement goal value is generated by multiplying current annual expenses by the estimated
life years past retirement (assuming retirement at age 65 and life expectancy by age and gender
as in Figure 2). The number of years to retirement goal is generated as the number of years it
would take the individual to accumulate the retirement value using only their marginal annual
savings.
Figure 4 shows the distribution of generated goal values for all clients.
Figure 4 Distribution of investment goal values
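The retirement goal rules described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function and parameter names are hypothetical, the remaining life expectancy is assumed to be looked up from the life tables beforehand, and rounding the number of years up to a whole year is an assumption.

```python
import math


def retirement_goal_value(annual_expenses, age, remaining_life_years, retirement_age=65):
    """Goal value = current annual expenses x expected years lived past retirement.

    remaining_life_years is the life expectancy at the client's current age
    (assumed to come from a life table by age and gender).
    """
    expected_age_at_death = age + remaining_life_years
    years_past_retirement = max(0.0, expected_age_at_death - retirement_age)
    return annual_expenses * years_past_retirement


def years_to_goal(goal_value, annual_accumulation):
    """Years needed to reach the goal by accumulation alone (no investment return).

    Rounding up to whole years is an assumption of this sketch.
    """
    if annual_accumulation <= 0:
        raise ValueError("annual accumulation must be positive")
    return math.ceil(goal_value / annual_accumulation)
```

For example, a 45-year-old with 35 remaining life years and $60,000 of annual expenses would get a goal of 15 years of expenses past age 65.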
Step 5: Initial risk profile generation
We measure risk profile in a 5-level scale: conservative, moderately conservative, moderate,
moderately aggressive, and aggressive. In order to generate individual risk profiles we first
generate a risk score for each client using a set of nine drivers. Each driver is given a score of -1,
0, or +1, as detailed in Table 12. The sum of all driver scores for a client can range from -9 to +8,
spanning a 17-point interval. The risk score for a client is then computed as follows, dividing the
sum of scores for all drivers by 17:
$$\text{Risk Score} = \frac{\sum_{i=1}^{9} (\text{score for driver } i)}{17}$$
Table 12 Drivers of risk score
Driver 1: Gap between the number of years remaining until their latest goal deadline and the number of years necessary for them to achieve the goal through accumulation of annual savings (no investment)
Reason: Individuals who aim to accumulate a large amount in a short period of time (compared with their savings capabilities) would tend to be more aggressive investors
Driver score: Individuals are divided into 3 bins of equal size (number of individuals in each bin is the same) according to their gap. The driver score is then assigned: lower range: -1; middle range: 0; upper range: +1

Driver 2: Risk capital
Reason: Wealthy individuals tend to be more aggressive investors
Driver score: Individuals are divided into 3 bins of equal size (number of individuals in each bin is the same) according to their risk capital. The driver score is then assigned: lower range: -1; middle range: 0; upper range: +1

Driver 3: Income
Reason: Wealthy individuals tend to be more aggressive investors
Driver score: Individuals are divided into 3 bins of equal size (number of individuals in each bin is the same) according to their household income. The driver score is then assigned: lower range: -1; middle range: 0; upper range: +1

Driver 4: Educational attainment
Reason: Highly educated people tend to be more aggressive investors [7]
Driver score: No college degree: -1; College degree: 0; Graduate degree: +1

Driver 5: Age
Reason: People in the 21-59 age range tend to be more aggressive investors [7]
Driver score: Aged 21-59: 0; Not aged 21-59: -1

Driver 6: Marital status
Reason: Married individuals tend to invest more aggressively than unmarried individuals [7]
Driver score: Married: +1; Not married: -1

Driver 7: Employment status
Reason: Unemployed individuals tend to be more risk averse, while employers (entrepreneurs) tend to be more risk prone [7]
Driver score: Unemployed: -1; Employed: 0; Employer: +1

Driver 8: Education goal not yet achieved
Reason: Individuals are less likely to risk compromising education funds by investing aggressively
Driver score: Goal not yet achieved: -1; No goal to achieve: +1

Driver 9: Home goal not yet achieved
Reason: Individuals are less likely to risk compromising home purchasing funds by investing aggressively
Driver score: Goal not yet achieved: -1; No goal to achieve: +1
Figure 5 shows the distribution of risk propensity scores for all clients.
Figure 5 Distribution of risk score
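As an illustration of how the nine driver scores combine into a risk score, the sketch below assigns tercile scores to the three continuous drivers and fixed scores to the six categorical ones. The dictionary keys and category labels are hypothetical names chosen for this example, and ties in the continuous drivers are broken by sort order, which is an assumption.

```python
def tercile_scores(values):
    """Score each value -1, 0, or +1 by rank tercile (three bins of equal size)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0] * n
    for rank, idx in enumerate(order):
        scores[idx] = rank * 3 // n - 1  # tercile 0, 1, 2 maps to -1, 0, +1
    return scores


# Hypothetical category labels for drivers 4 and 7.
EDUCATION_SCORE = {"no_college": -1, "college": 0, "graduate": 1}
EMPLOYMENT_SCORE = {"unemployed": -1, "employed": 0, "employer": 1}


def categorical_scores(client):
    """Scores for drivers 4-9 of a single client (a dict with hypothetical keys)."""
    return [
        EDUCATION_SCORE[client["education"]],
        0 if 21 <= client["age"] <= 59 else -1,
        1 if client["married"] else -1,
        EMPLOYMENT_SCORE[client["employment"]],
        -1 if client["education_goal_pending"] else 1,
        -1 if client["home_goal_pending"] else 1,
    ]


def risk_score(driver_scores):
    """Sum of the nine driver scores divided by 17."""
    if len(driver_scores) != 9:
        raise ValueError("expected nine driver scores")
    return sum(driver_scores) / 17
```

Because driver 5 can contribute at most 0, the highest attainable score is 8/17 and the lowest is -9/17.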
We then place individuals into one of the 5 risk profiles according to their risk score, with the
lowest scores mapping into a conservative risk profile and highest scores mapping into an
aggressive risk profile. The thresholds we used are shown in Table 13.
Table 13 Risk profile distribution
Risk Score Risk Profile % of Clients
< -0.05 Conservative 18.58%
-0.05 to 0.1 Moderately Conservative 24.42%
0.1 to 0.2 Moderate 26.35%
0.2 to 0.3 Moderately Aggressive 22.16%
> 0.3 Aggressive 8.48%
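The thresholds in Table 13 amount to a simple lookup. A minimal sketch follows; the table does not say which profile a score lying exactly on a boundary belongs to, so this sketch assumes it goes to the higher profile. Since risk scores are multiples of 1/17, no score actually falls on one of these thresholds.

```python
from bisect import bisect_right

# Upper bounds of the first four profiles, taken from Table 13.
THRESHOLDS = [-0.05, 0.1, 0.2, 0.3]
PROFILES = [
    "Conservative",
    "Moderately Conservative",
    "Moderate",
    "Moderately Aggressive",
    "Aggressive",
]


def risk_profile(score):
    """Map a risk score to one of the five risk profiles."""
    return PROFILES[bisect_right(THRESHOLDS, score)]
```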
Step 6: Life events generation for 10 years
After generating the full initial profiles for all clients in 2005 (Steps 1-5), we simulate each
client’s demographic evolution through time for a 10-year period, with possible implications to
annual income and financial goals. The assumptions are detailed in Table 14.
Table 14 Assumptions for life events generation
Variable Rule
Age Increase by 1 each year
Goal Years (Education, Retirement, Home) Decrease by 1 each year
Years to Retirement Decrease by 1 each year
Marital Status For clients who are single, change status to married in any given year with the following probabilities:
- Aged < 20: 15%
- Aged 20-25: 30%
- Aged 25-30: 20%
- Aged 30-35: 5%
- Aged 35-40: 5%
Number of Children For clients who are married, increase the number of children in any given year with the following probabilities:
- Aged < 20: 15%
- Aged 20-25: 30%
- Aged 25-30: 20%
- Aged 30-35: 5%
- Aged 35-40: 5%
Educational Attainment Increase educational attainment in any given year by one level with the following probabilities:
- Aged 21-30 without college degree: 80%
- Aged 21-30 with college degree: 20%
- Aged 28-55 with graduate degree: 5%
Personal Income In any given year increase personal income as follows:
- Aged <= 55: 4% increase
- Aged >= 55: 4% decrease
Education Goal For clients who increased the number of children, generate goal dimensions (flag, value, and years) as described in Step 4
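A one-year step of this simulation might look like the sketch below. It covers only age, goal years, marital status, number of children, and personal income; educational attainment and goal regeneration are omitted. Several choices are assumptions, not taken from Table 14: the age bins are treated as half-open (a 25-year-old falls in the 25-30 bin), clients aged 40 or older get a probability of 0, a client aged exactly 55 gets the 4% decrease, and the age before the birthday is used for all lookups. The dictionary keys are hypothetical.

```python
import random

# (exclusive upper age bound, annual probability), shared by the marriage and
# child rules in Table 14.
EVENT_PROBABILITY = [(20, 0.15), (25, 0.30), (30, 0.20), (35, 0.05), (40, 0.05)]


def annual_probability(age):
    """Probability of a marriage or a new child in a given year."""
    for upper_bound, probability in EVENT_PROBABILITY:
        if age < upper_bound:
            return probability
    return 0.0  # assumption: no events at age 40 and above


def advance_one_year(client, rng):
    """Return a copy of the client advanced by one simulated year."""
    updated = dict(client)
    updated["age"] = client["age"] + 1
    updated["goal_years"] = {g: y - 1 for g, y in client["goal_years"].items()}
    happened = rng.random() < annual_probability(client["age"])
    if happened and not client["married"]:
        updated["married"] = True
    elif happened and client["married"]:
        updated["children"] = client["children"] + 1
    growth = 1.04 if client["age"] < 55 else 0.96
    updated["income"] = client["income"] * growth
    return updated
```

Passing the random generator in explicitly (for example `random.Random(2005)`) keeps a simulated cohort reproducible.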
Step 7: Investment decisions/outcomes for 10 years
We simulate a client’s investment decisions based on two factors: their risk capital (dollar
amount available to invest) and their risk profile (determines how their investment dollars are
allocated among different asset types). Table 15 details our assumptions on how clients with
the different risk profiles allocate their risk capital among three types of assets: Money Market,
Bonds, and Stock.
Table 15 Asset allocation by risk profile
At the beginning of each quarter a client is assumed to allocate their full risk capital according
to Table 15. The revenue and cost a client generates to the wealth management firm in each
quarter is then computed based on their asset allocations according to the fees and costs
assumptions in Tables 16-17.
Table 16 Management fees assumptions
Management Fees (revenue to the firm)
Assets Range Money Market Bonds Stocks
< $500K 1.50% 2.00% 3.00%
$500K-$1,000K 1.00% 1.50% 2.50%
$1,000K-$2,000K 0.75% 1.00% 2.00%
> $2,000K 0.50% 0.75% 1.50%
Table 17 Cost assumptions
Cost (to the firm) of Managing Assets
Assets Range Money Market Bonds Stocks
< $500K 1.000% 1.250% 1.750%
$500K-$1,000K 0.750% 1.000% 1.500%
$1,000K-$2,000K 0.500% 0.750% 1.250%
> $2,000K 0.250% 0.500% 1.000%
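The fee and cost schedules can be applied as in the following sketch. Three points are assumptions rather than statements from the tables: the rates are treated as annual and divided by four for a quarter, the assets range is selected by the client's total risk capital, and a capital amount lying exactly on a boundary falls into the higher range. The allocation weights in the usage note are hypothetical.

```python
from bisect import bisect_right

TIER_BOUNDS = [500_000, 1_000_000, 2_000_000]  # assets range boundaries in dollars

FEE_RATES = {  # Table 16: management fees (revenue to the firm)
    "money_market": [0.0150, 0.0100, 0.0075, 0.0050],
    "bonds": [0.0200, 0.0150, 0.0100, 0.0075],
    "stocks": [0.0300, 0.0250, 0.0200, 0.0150],
}
COST_RATES = {  # Table 17: cost (to the firm) of managing assets
    "money_market": [0.0100, 0.0075, 0.0050, 0.0025],
    "bonds": [0.0125, 0.0100, 0.0075, 0.0050],
    "stocks": [0.0175, 0.0150, 0.0125, 0.0100],
}


def tier(risk_capital):
    """Index of the assets range (0 to 3) for a given amount of risk capital."""
    return bisect_right(TIER_BOUNDS, risk_capital)


def quarterly_revenue_and_cost(risk_capital, allocation, periods_per_year=4):
    """Return (revenue, cost) for one quarter.

    allocation maps asset type to the share of risk capital invested in it.
    """
    t = tier(risk_capital)
    revenue = 0.0
    cost = 0.0
    for asset, weight in allocation.items():
        amount = risk_capital * weight
        revenue += amount * FEE_RATES[asset][t] / periods_per_year
        cost += amount * COST_RATES[asset][t] / periods_per_year
    return revenue, cost
```

Under these assumptions, a client with $400,000 split 20/30/50 across money market, bonds, and stocks would generate $2,400 of revenue and $1,450 of cost in a quarter, for a quarterly profit of $950.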
At the end of each quarter a client’s risk capital is updated as follows:
$$\text{Risk Capital at Quarter End} = \text{Risk Capital at Quarter Start} + \text{Return on Investment} - \text{Management Fees} + \frac{\text{Marginal Annual Savings}}{4},$$
where the return on investment for each asset is simulated from a normal distribution based on
mean historical quarterly returns and volatilities (reference values in Table 8):
- Money Market: Vanguard Short-Term Treasury Inv (VFISX)
- Bonds: Vanguard Total Bond Market Index (VBMFX)
- Stock: Vanguard Total Stock Market Index (VTSMX)
Note that the risk profile for each client is recalculated every two years to account for changes
in behavior caused by updated investment goals and risk capital.
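The quarterly update of risk capital can be sketched as follows. The function name and arguments are hypothetical, and the mean and volatility passed in for each asset are placeholders to be replaced with the reference values from Table 8. Drawing one independent normal return per asset per quarter is an assumption of this sketch.

```python
import random


def simulate_quarter(risk_capital, allocation, return_params,
                     management_fees, marginal_annual_savings, rng):
    """Risk capital at quarter end, following the update rule above.

    allocation:    asset -> share of risk capital
    return_params: asset -> (mean quarterly return, quarterly volatility)
    rng:           a random.Random instance, so runs can be reproduced
    """
    return_on_investment = 0.0
    for asset, weight in allocation.items():
        mean, volatility = return_params[asset]
        simulated_return = rng.gauss(mean, volatility)
        return_on_investment += risk_capital * weight * simulated_return
    return (risk_capital + return_on_investment
            - management_fees + marginal_annual_savings / 4)
```

Calling this function 40 times in sequence, and recomputing the allocation whenever the risk profile is recalculated, gives one client's 10-year trajectory.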
Step 8: Lifetime value generation
The lifetime value (10-year profit) for each client is computed by adding the profit they brought
to the firm in all quarters between 2005 and 2014. The final distribution for 10-year profit is
shown in Figure 6.
Figure 6 Distribution of lifetime value (10-year profit)
Predictive Models
To select our final model for predicting lifetime value (LTV), we evaluated the predictive
performance of Decision Tree, Random Forest, and Linear Regression models (with different
sets of predictor variables) using a sample of unseen data containing 43,356 prospective clients.
The model with best performance was a Linear Regression model that takes as input sex, age,
presence of home purchasing goal, presence of retirement saving goal, number of children,
marital status, risk capital (initial amount of assets to invest with the firm), and risk profile. The
Linear Regression predictive model estimates the LTV as the profit a prospective client will bring
to the firm over 10 years. When tested on the unseen data, our model predicts LTV with an
average absolute percent error of 10%.
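For reference, the error measure can be computed as below, assuming that "average absolute percent error" refers to the conventional mean absolute percentage error. The function name is hypothetical, and clients with an actual profit of exactly zero would have to be excluded beforehand.

```python
def mean_absolute_percentage_error(actual, predicted):
    """Mean of |actual - predicted| / |actual|, returned as a fraction."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        if a == 0:
            raise ValueError("percentage error is undefined for a zero actual value")
        total += abs(a - p) / abs(a)
    return total / len(actual)
```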
In our testing sample of prospective clients, the top 10% bring on average significantly more
profit to the firm than the bottom 90%. We compared two methods for selecting the top
10% of prospects: ordering by 10-year profit estimate and ordering by initial risk capital. Even
though initial risk capital is a very strong predictor of 10-year profit (initial risk capital explains
88% of the variability in 10-year profit), our model was able to select a set of more profitable
prospects. Each of the prospects identified by our model was, on average over 10 years, $1,795
more profitable. With economies of scale across thousands of potential prospects, there is
significant potential profit improvement for financial advisors and their wealth management
firms. Tables 18-19 show some differences in characteristics between the top 4,000 prospects
selected using our model and the bottom 39,356 prospects.
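The comparison between the two selection methods can be illustrated with a short sketch. Each prospect is represented as a dictionary with hypothetical keys `predicted_profit`, `risk_capital`, and `actual_profit`. The uplift is the difference in mean actual profit between the two selected sets.

```python
def top_fraction(prospects, key, fraction=0.10):
    """Return the top share of prospects when ordered by the given key."""
    count = max(1, int(len(prospects) * fraction))
    return sorted(prospects, key=key, reverse=True)[:count]


def mean_actual_profit(selected):
    """Average realized 10-year profit of a set of prospects."""
    return sum(p["actual_profit"] for p in selected) / len(selected)


def selection_uplift(prospects, fraction=0.10):
    """Mean actual profit of model-ranked picks minus capital-ranked picks."""
    by_model = top_fraction(prospects, lambda p: p["predicted_profit"], fraction)
    by_capital = top_fraction(prospects, lambda p: p["risk_capital"], fraction)
    return mean_actual_profit(by_model) - mean_actual_profit(by_capital)
```

A positive uplift means that ranking by the model's profit estimate picks a more profitable set than ranking by initial risk capital alone.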
Table 18 Top 4,000 versus bottom prospects - discrete variables. All differences are significant at the 0.1% level.
Top 4,000 Prospects Bottom 39,356 Prospects
With Bachelor’s degree or above 88.6% 79.2%
Married 98.1% 95.2%
With children under 17 52.4% 44.1%
Employed 26.2% 13.6%
Unemployed 2.3% 5.1%
Table 19 Top 4,000 versus bottom prospects - continuous variables. All differences are significant at the 0.1% level.
Top 4,000 Prospects Bottom 39,356 Prospects
Mean St. Dev. Mean St. Dev.
10-year Profit 212,872.60 47,071.89 87,678.00 34,022.82
Household Income 557,167.30 177,506.30 217,830.20 100,082.10
Property Value 614,896.00 484,706.80 413,265.50 301,548.10
Risk Capital 1,646,908.70 558,658.70 492,573.80 289,894.50
Age 50.99 10.40 52.37 11.56
Number of Children 1.03 1.17 0.83 1.10
Return On Investment
Our ROI model includes an average number of accounts per FA, an expected account churn
percentage (historically, this is largely driven by market performance), and a net new addition
of a number of prospects per year per 1000 financial advisors. Accounting for startup costs of
hardware and software to run this analysis as well as resource costs to build and support the
model, we generated a strongly positive ROI and IRR over a 5-year period.
Number of clients varies according to firm size, investment focus, and other factors. Since firm
sizes also vary greatly, we calculated the ROI based on 1000 advisors. Bigger firms will realize
greater economies of scale.
Customer churn generally varies between 1% and 10% per year and is largely driven by market returns. In any
given year, the number of clients can increase or decrease due to market returns, personal
circumstances, and other factors.
Our ROI Model assumes that three new prospects per FA become customers each year (1.5% is a
reasonable growth rate above customer churn). Most growth for wealth management firms is
through growth of Assets under Management which can be achieved in 2 ways: (1)
compounded annual growth in assets due to investment returns (we assume to be 5% in our
ROI Model) and (2) Net new assets acquired (especially important for more junior FAs growing
their client base).
With a $1,795 increase in profit over 10 years for each prospect identified by this model, the
average increase is about $180 per client per year.
With our ROI Model predicting about $180 more profit per client per year and three net new
clients per year per FA, for every 1000 FAs we can expect an increase of profit to the firm of
$540,000 annually.
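The arithmetic behind this estimate is reproduced in the sketch below. The function name is hypothetical. The calculation counts only the clients added in a given year, as the text does, and does not accumulate clients added in earlier years.

```python
def annual_profit_uplift(profit_per_client_year, new_clients_per_fa, advisors):
    """Extra annual profit to the firm from better-targeted new clients."""
    return profit_per_client_year * new_clients_per_fa * advisors


# $1,795 over 10 years is $179.50 per client per year, rounded to $180 in the text.
exact_per_year = 1795 / 10
rounded_estimate = annual_profit_uplift(180, 3, 1000)            # 540000
unrounded_estimate = annual_profit_uplift(exact_per_year, 3, 1000)  # 538500.0
```

Using the unrounded $179.50 gives $538,500, so the rounding accounts for a difference of $1,500 per year.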
Conclusions
We conclude that this model can provide a useful and quantitative approach to more efficiently
identify prospects for FAs to target. Utilizing the concept of lifetime value (LTV) and applying it
through our model, we can help the FA target a more profitable set of prospects and focus
energies to attract them as clients. This model helps to achieve more profitable client
relationships through more efficient use of FA time marketing and selling to prospects, and
reduced overall client acquisition costs through this more efficient targeting.
Wealth management firms of different sizes have different profit margins, average numbers of
customers per FA, revenue per FA, etc. Nevertheless, this model can be utilized by firms of all sizes,
both by senior FAs prospecting for more lucrative clients and by junior FAs looking to grow
their book of business through more clients.
This model can be further refined and made much more effective with a wealth management
firm’s actual client and prospect data that:
- captures many more of the nuances of the assumptions and exceptions to assumptions that have been made with our simulated data
- incorporates the many details of real-world relationships between market events and customer behavior
References
[1] American Community Survey – Information Guide, U.S. Census Bureau, Economics and Statistics
Administration, U.S. Department of Commerce, June 2003.
[2] IRS Announces 2013 Tax Rates, Standard Deduction Amounts And More – Forbes -
http://www.forbes.com/sites/kellyphillipserb/2013/01/15/irs-announces-2013-tax-rates-standard-
deduction-amounts-and-more/#50eaf631675b , January 2013
[3] Standard Errors for Distribution of Net Worth, By Net Worth Quintiles and Selected Characteristics:
2011. https://www.census.gov/people/wealth/. U.S. Census Bureau, Survey of Income and Program
Participation, 2008 Panel, Wave 10
[4] Table 2. Life table for males: United States, 2010. U.S. Department of Health and Human Services,
Centers for Disease Control and Prevention, National Center for Health Statistics, National Vital
Statistics Report, November 2014.
[5] Table 3. Life table for females: United States, 2010. U.S. Department of Health and Human Services,
Centers for Disease Control and Prevention, National Center for Health Statistics, National Vital
Statistics Report, November 2014.
[6] Yahoo Finance, http://finance.yahoo.com/ .
[7] A. Scorbureanu and A. Holzhausen, The Composite Index of Propensity to Risk - CIPR, Economic
Research & Corporate Development Working Paper No. 147, 2011, pp. 7.
Appendix: Solution Architecture
The solution architecture chosen was quite specific for this project. We used components that
were the best fit for use with structured data, meeting our needs of low latency and
comparatively small data volumes.
The following software components were used:
- SQL Database (pgAdmin – Postgres)
- RStudio
For the hardware, we used a third-party cloud service vendor with the following component:
- SQL server – Postgres
We utilized a straightforward architecture. Because the total volume of data was 19 gigabytes,
there was no real-time processing requirement, and the data was based on CSV files, we used a
micro server with 1 core and 1 GiB of RAM.
The data flow process is described below and represented in the diagram that follows:
- Develop code in R that uses the US Census API and FTP to extract the information and load it
into a Postgres server.
- Transform the US census data and generate the goals for all individuals in R, using
auxiliary tables.
- After this transformation, upload the results back to the Postgres server.
- From Postgres, the data can be accessed using any language such as R,
Python, or Spark. We built an algorithm in R to generate new data about the customers.
- We built the models in R and also used R to provide data to a dashboard;
any visualization tool that works with Postgres could be used instead.
The diagram that follows shows the data flow between software components.
Dell EMC believes the information in this publication is accurate as of its publication date. The
information is subject to change without notice.
THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” DELL EMC MAKES NO
REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS
PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
FOR A PARTICULAR PURPOSE.
Use, copying and distribution of any Dell EMC software described in this publication requires an
applicable software license.
Dell, EMC and other trademarks are trademarks of Dell Inc. or its subsidiaries.