
PING WANG, JOANNA R. BAKER

Procedures to Improve the House List Segment Tests

PING WANG received his BS from Northeast University, China, an MBA from Appalachian State University, and his PhD from the University of Georgia. He is currently an assistant professor at James Madison University. His research interests include production planning and scheduling models, applications of artificial neural networks, sales forecasting, time series forecasting, mailing selections, and classifications. His work has been presented at both national and regional meetings, and his research has been published in journals and in several conference proceedings.

JOANNA R. BAKER received her BA and MS from the University of Maine and her PhD from Clemson University. She is the department head of Information and Decision Sciences at James Madison University, where her primary research interests are in public sector applications of mathematical programming and multiobjective decision making. She has published numerous journal articles and serves as a consultant to the National Institute of Justice. She is a member of the Board of the Decision Sciences Institute, the Board of INFORMS, and Southeast DSI.

The authors wish to express their appreciation to Dr. Don Schultz and the two referees for their constructive comments.


ABSTRACT An important aspect of direct marketing research focuses on developing and segmenting a house list (customer database) using various geographic, socioeconomic, and recency, frequency, and monetary (RFM) measures. For a typical promotion, direct marketers may take a simple random sample from the house list as a test mailing to forecast the segment rollout response rates. Decisions about the final rollout are made so that only the segments with response rates over the prespecified threshold, or break-even response rate, will be used. In this article, it is shown that the commonly used simple random sampling procedure may seriously underestimate the variability of the rollout response rates of segments with higher test response rates, overforecast the potential number of buyers from the rollout, and inflate forecast accuracy. Several procedures are proposed to improve the house list tests, and examples are used to compare the new procedures with the existing one. Results show that the proposed house list test procedures provide more statistically efficient, cost-effective, and reliable forecasts for segment response rates while improving the accuracy of the forecasts generated.

© 1996 John Wiley & Sons, Inc. and Direct Marketing Educational Foundation, Inc. CCC 0892-0591/96/02024-12

JOURNAL OF DIRECT MARKETING VOLUME 10 NUMBER 2 SPRING 1996

INTRODUCTION

A house list or customer database is an indispensable asset to direct marketers. Direct marketers spend a great deal of time studying and segmenting the house list based on geographic, socioeconomic, and recency, frequency, and monetary (RFM) measures. Targeting a product or service to the appropriate segments will result in a greater likelihood of success and higher profitability. Generally, direct marketers employ test mailings to assess profitable segments before making final rollout decisions. Test response rates are used to forecast rollout response rates for each segment, to determine the final mailing quantities and estimate potential orders, and to plan for the purchase or production of the product or service. Overforecasting will result in high inventory costs and the risk of carrying products that may never be sold. Similarly, underforecasting may result in an out-of-stock condition with a loss of sales.

A house list differs from rental lists and other new lists. Names on the house list are accumulated over time through various promotions by the list owners, who tend to know their house list much better than a new or outside list. Based on their experience with a house list, they tend to have more knowledge about the likely response to a test and may even have some feeling about which segments will perform better in a specific promotion.

The primary goal of a new list test is to estimate the rollout response rate through the observed test response rate. Conversely, for a house list test the primary goal is to identify those segments whose forecasted rollout response rates are higher than the prespecified threshold value, or minimum satisfactory response rate, $p_{ms}$. The threshold value $p_{ms}$ comes from a break-even analysis and other economic tradeoffs. In balancing cost and benefit, direct marketers try to mail only to the higher response segments. As in list testing and rollout (1,4,6,7,9,12), the rollout response rates for the top-performing segments (those with higher test response rates) are the most likely to fall short of expectations; because the rollout is made up of exactly these higher-response segments, its response is also the most likely to fall short, with higher variability and more difficulty in forecasting. Based on a test response rate of 0.5% and a sample size of 100,000, a 95% confidence interval for the rollout response rate implies that the forecast error is at most ±8.74%, measured as the ratio of the half-width of the confidence interval to the test response rate (10). In reality, however, the errors under the same conditions may range from 20% to 50% of the test response rate. The large variability of the estimated list response rate is the result of market and customer dynamics such as the following:

1. Competitors promoting the same or similar products and/or services.
2. Changes in the composition of the list, such as aging, media exposures, and prices (9).
3. Changes in socioeconomic conditions over time.
4. A cannibalization effect, which may occur when house list owners promote many similar products or services during the same period of time, so that the sale of one product cancels the sale of another.
5. The addition of new names to the house list through other sales, and the deletion of names from a house list for reasons such as inactivity, poor credit, and so on.
6. “Shoddy targeting,” or rollout to the wrong segments as a result of human error (3); such errors are more likely to occur in new or rental list mailings selected by outside list brokers.
7. Statistical sampling errors, including nonrandom or biased samples.
8. The use of small sample sizes for each segment of the house list in the test mailing. The number of test names from each segment is determined by the random sampling process and is generally proportional to the size of the segment, which often yields small sample sizes for some top-performing segments. This point is explained further in the next section.

Unfortunately, many of the effects listed are exogenous and are not under the direct control of the list users.

TABLE 1 Segment Test Results by Simple Random Sampling

Segment | Segment Size | Test Size | Test Orders | Test Response (%) | Rollout Response (%) | Forecast Orders
--- | --- | --- | --- | --- | --- | ---
A | 8,000 | 827 | 13 | 1.57 | 1.50 | 108
B | 12,000 | 1,231 | 9 | 0.73 | 0.70 | 75
C | 14,000 | 1,396 | 7 | 0.50 | 0.40 | 50
D | 8,000 | 789 | 1 | 0.10 | 0.10 | 7
E | 58,000 | 5,800 | 2 | 0.04 | 0.04 | 20

TABLE 2 95% Confidence Ranges for Rollout Response Rates by Simple Random Sampling

Segment | 95% Confidence Range (Z score = 1.96) | Rollout Decision
--- | --- | ---
A | (0.72-2.42) | Yes
B | (0.25-1.20) | Not Sure
C | (0.13-0.87) | Not Sure
D | (0.00-0.38) | No
E | (0.00-0.05) | No
A & B | (0.62-1.51) |

Note: Confidence range for $p_{ri}$: $\hat{p}_{ti} \pm Z_{(1-\alpha/2)}\, s_{p_{ti}}$, with standard deviation $s_{p_{ti}} = \sqrt{\hat{p}_{ti}(1 - \hat{p}_{ti})/n_i}$.

Many articles have been written on list testing (1,2,4,8,9,11,12), but very few have addressed house list segment testing specifically. Generally, the guidelines for list testing are, with some modification, applicable to house list segment testing. However, some of the features and problems of house list segment testing differ from those associated with list testing. By focusing on these differences, this study proposes several procedures to improve house list segment testing.

This article reviews the existing list test procedures, their variability, and the accuracy of forecast versus actual rollout response, then proposes several procedures for house list segment test sampling, including stratified sampling and Jain's procedure within the segment (9). It also provides step-by-step descriptions of how these procedures can be used in segment testing and demonstrates how Bayesian statistics can be used to derive posterior estimates of the rollout response rates that are expected to be more accurate than the test response rates. Examples are presented to compare the forecasting performance of the various procedures in terms of forecast variability, reliability, and accuracy.

MAILING LIST TEST PROCEDURES: A REVIEW

It is common practice to use list test procedures for house list segment tests and to forecast the rollout response rates at the segment level. Simple random sampling is the most widely accepted procedure (4). When using simple random sampling, it is assumed that a typical sample can be taken from the universe (here, the house list) and that the results will represent the typical customer responses in the rollout, even though the rollout will occur several months later and the house list may have changed.

Assume that, for a specific mailing, each customer mailed from the house list has one decision to make: either accept the offer (buy) or reject it (not buy). Each customer mailed has the same probability of responding, $p$. Accordingly, the observed individual response $x$ from the mailing takes the value 1 (bought) or 0 (not bought), with mean $p$ and variance $pq$, where $q = 1 - p$ (1); the number of responses among the $n$ names mailed therefore follows a binomial distribution. Because list testing deals with large or very large samples, and $np$ and $nq$ will be larger than 5, the normal approximation to the binomial distribution can be used (10).

The first question is: How many names should be tested? Croy's (4) procedure is used to answer this question. The procedure is as follows. Specify the minimum satisfactory response rate $p_{ms}$, the expected test response rate $p_{et}$, the acceptable forecast accuracy as measured by the maximum error $e_{ma}$, and the confidence placed on the forecast. The $p_{ms}$ can come from a break-even or other economic analysis and simply provides a cutoff response rate for the test. If the anticipated rollout response rate is lower than $p_{ms}$, the list will not be rolled out. Direct marketers can estimate the expected test response rate $p_{et}$ from their prior knowledge of similar mailings. The expected test response rate $p_{et}$ should be greater than the minimum satisfactory response rate $p_{ms}$, given the distribution of the response variable. For a house list segment test, $p_{et}$ may often be smaller than $p_{ms}$ because of the larger sampling errors of small samples at the segment level. Thus, it is important to design a segment level test that weeds out as many nonprofitable segments as possible. Also important is the maximum tolerable forecast error $e_{ma}$, which indicates the maximum allowable forecast error on either side of the expected test response rate $p_{et}$, given as a percentage of $p_{et}$. Typical values are 10% to 20% of $p_{et}$. The last item is the level of confidence desired for the forecast, that is, the certainty that the confidence interval around the test response rate will include the rollout response rate. A 90% level of confidence means that one is 90% certain that the rollout response rate is within the test response rate $\pm e_{ma}$. Stated differently, about 10% of the intervals constructed in this manner will not contain the true rollout response rate. The risk of concluding from the test mailing that the rollout response rate is outside the confidence interval when, in fact, the true rollout response rate is within it is specified as the $\alpha$ risk; the level of confidence is $(1 - \alpha)$.

TABLE 3 Test Mailing Results and Roll Response Forecast by Jain's (9) Procedure

Segment | Segment Size | Test Size | Test Orders | Test Response (%) | Rollout Response (%) | Forecast Orders
--- | --- | --- | --- | --- | --- | ---
A | 8,000 | 800 | 12 | 1.50 | 1.50 | 108
B | 12,000 | 1,200 | 8 | 0.70 | 0.70 | 75
C | 14,000 | 1,400 | 6 | 0.40 | 0.40 | 50
D | 8,000 | 800 | 1 | 0.10 | 0.10 | 7
E | 58,000 | 5,800 | 2 | 0.04 | 0.04 | 20

In direct marketing, the rollout response rates are more likely to be lower than the test response rates (1,4,6,7,9,12). Direct marketers, therefore, are more concerned with the lower limit. In actual applications, because the normal distribution is used, the level of confidence $(1 - \alpha)$ is represented by the Z score, the number of standard deviations away from the expected test response rate $p_{et}$. A Z score of 1.96 corresponds to the 95% level of confidence, and a Z score of 1.645 corresponds to 90%.
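As a quick check of these values, the two-sided score $Z_{(1-\alpha/2)}$ can be computed from the inverse standard normal CDF. The sketch below assumes Python 3.8+ and uses the standard-library `statistics.NormalDist`; the helper name `z_score` is our own illustration, not part of Croy's procedure.

```python
from statistics import NormalDist

def z_score(confidence: float) -> float:
    """Two-sided Z score Z_(1 - alpha/2) for a confidence level such as 0.95."""
    alpha = 1.0 - confidence
    # Inverse CDF of the standard normal at 1 - alpha/2
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

if __name__ == "__main__":
    for level in (0.90, 0.95):
        print(f"{level:.0%} confidence -> Z = {z_score(level):.3f}")
    # prints 1.645 for 90% and 1.960 for 95%
```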

Given the preceding assumptions, the number of names to test is given as follows (10):

$$ n_t = \frac{Z_{(1-\alpha/2)}^{2}\, p_{et}\,(100 - p_{et})}{e_{ma}^{2}} \qquad (1) $$

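where both $p_{et}$ and $e_{ma}$ are expressed as percentages ranging from 0 to 100. Using $p_t$ as the actual test response rate, the forecast range (confidence interval) for the rollout response rate $p_r$ is (10):

$$ p_t - Z_{(1-\alpha/2)}\, s_{p_t} \;\le\; p_r \;\le\; p_t + Z_{(1-\alpha/2)}\, s_{p_t} \qquad (2) $$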

where $s_{p_t} = \sqrt{p_t(1 - p_t)/n_t}$ is the standard deviation of the test response rate $p_t$. This forecast range gives the lower and upper limits for the rollout response rate at the specified level of confidence. There are three possible situations regarding the forecast range for the rollout response rate and the minimum satisfactory response rate $p_{ms}$:

1. If $p_{ms}$ is greater than the upper limit specified by Equation 2, the test response rate is significantly lower than $p_{ms}$. The decision is to reject the rollout of the list.

2. If $p_{ms}$ is smaller than the lower limit specified by Equation 2, the test response rate is significantly higher than $p_{ms}$. The decision is to roll out the list.


3. In those cases where $p_{ms}$ is within the confidence range, no decision is clearly indicated. This outcome is an artifact of statistical inference and operationally implies a large gray region within which the decision maker's judgment must be applied. Some improvement may be gained by examining Equation 2.
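The forecast range of Equation 2 and the three-way rule above can be sketched as follows. This is a minimal illustration, assuming rates are passed as fractions (0.005 for 0.5%) and the normal approximation holds; the function names `forecast_range` and `rollout_decision` are hypothetical. The usage lines also reproduce the 8.74% relative error quoted in the introduction for $p_t = 0.5\%$ and $n_t = 100{,}000$.

```python
from math import sqrt
from statistics import NormalDist

def forecast_range(p_t: float, n_t: int, confidence: float = 0.95) -> tuple[float, float]:
    """Equation 2: normal-approximation confidence range for the rollout rate."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    s_pt = sqrt(p_t * (1 - p_t) / n_t)  # standard deviation of the test response rate
    return p_t - z * s_pt, p_t + z * s_pt

def rollout_decision(p_t: float, n_t: int, p_ms: float, confidence: float = 0.95) -> str:
    """Three-way rule comparing the cutoff p_ms with the forecast range."""
    lower, upper = forecast_range(p_t, n_t, confidence)
    if p_ms > upper:
        return "reject"      # test rate significantly below p_ms
    if p_ms < lower:
        return "roll out"    # test rate significantly above p_ms
    return "gray area"       # p_ms inside the range: judgment required

if __name__ == "__main__":
    lo, hi = forecast_range(0.005, 100_000)
    # Half-width as a share of the test rate (about 8.74%)
    print(f"relative error = {(hi - lo) / 2 / 0.005:.2%}")
    print(rollout_decision(0.005, 100_000, p_ms=0.004))  # roll out
```

Note that a larger $n_t$ narrows the range and shrinks the gray area, which is the trade-off discussed next.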

The width of the forecast range is determined by $Z_{(1-\alpha/2)}\, s_{p_t}$; the smaller this quantity, the more cases in which $p_{ms}$ falls either below the lower limit or above the upper limit. Unfortunately, the middle gray area cannot be eliminated entirely unless the whole list is mailed, which negates any cost savings obtained by sampling. The quantity can be reduced in two ways: $Z_{(1-\alpha/2)}$ is smaller for lower levels of confidence, and $s_{p_t} = \sqrt{p_t(1 - p_t)/n_t}$ is smaller for larger $n_t$. Direct marketers making their final decisions on the level of confidence and the test sample size should evaluate the possible consequences before making a change. Although larger test samples cost more to mail, small sample sizes and a lower confidence level may result in more marginal lists being mailed.

To apply the simple random sampling procedure in the house list segment test, it is assumed that a representative mix of names from each segment will be sampled. Furthermore, the responses from each test segment are assumed to be representative of that segment as a whole. Provided these assumptions hold, the segment test response rate $p_{ti}$ for a house list can be used to estimate the segment rollout response rate $p_{ri}$ for the same house list.

The significant difference between segment test sampling and list test sampling is that the sample size in Equation 1 is linked to the whole list, not to the individual segments. However, the decisions about whether to roll out are made at the segment level. The following example illustrates the method for segment testing.

Suppose the house list has 100,000 names and a house list segment test is prepared. The estimates obtained are $p_{ms} = 0.5\%$, $p_{et} = 0.3\%$, and $e_{ma} = 0.075\%$ (25% of $p_{et}$), and the level of confidence is $(1 - \alpha) = 95\%$. The total sample size is calculated by Equation 1 to be 20,428. Because the goal here is to estimate the segment rollout responses, the sample size $n_t$ calculated by Equation 1 may not be as useful as it is in a whole list test. In particular, Equation 1, which in theory should be used to select the sample size for each segment, cannot be applied at the segment level, because it could require a sample larger than the segment itself; for example, 10,000 names might be required to achieve the specified forecast accuracy at the specified confidence level for Segment A in Table 1, which contains only 8,000 names. Direct marketers also may not wish to test a large sample of names prior to the rollout, because of higher test mailing costs, exposing potential buyers to nonoptimal prices, or poor packaging of the products. Thus, the overall test sample size is not as important in a segment test as it is in a list test. Once a test mailing size of 10,000 names is decided, a random sample of 10,000 names is drawn from the house list. A systematic random sample may actually be used to draw the names, for example, every 10th name. The mail is sent out and the results are summarized in Table 1.
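Equation 1 can be checked against this example with a few lines of code. This is a sketch under stated assumptions: the function name `names_to_test` is hypothetical, rates are on the 0-100 percentage scale used by Equation 1, and fractional names are rounded up. Taking $Z = 1.96$, the two-sided value for 95% confidence used for the forecast ranges in this example, reproduces the 20,428 names quoted above.

```python
from math import ceil

def names_to_test(p_et: float, e_ma: float, z: float) -> int:
    """Equation 1: n = Z^2 * p_et * (100 - p_et) / e_ma^2.

    p_et and e_ma are percentages on a 0-100 scale (0.3 means 0.3%).
    The result is rounded up to a whole number of names (an assumption).
    """
    return ceil(z ** 2 * p_et * (100 - p_et) / e_ma ** 2)

if __name__ == "__main__":
    # p_et = 0.3%, e_ma = 0.075%, Z = 1.96
    print(names_to_test(p_et=0.3, e_ma=0.075, z=1.96))  # 20428
```

Because $n_t$ grows with $1/e_{ma}^2$, halving the tolerable error roughly quadruples the test size, which is one reason segment-level accuracy targets quickly become impractical.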

TABLE 4 Sensitivity Analyses for Test Results

Segment A Orders | Segment A Resp (%) | Segment B Orders | Segment B Resp (%) | Segment C Orders | Segment C Resp (%)
--- | --- | --- | --- | --- | ---
12 | 1.50 | 8 | 0.67 | 6 | 0.43
11 | 1.38 | 7 | 0.58 | 7 | 0.50
10 | 1.25 | 6 | 0.50 | 8 | 0.57
 | | 5 | 0.42 | 9 | 0.64
4 | 0.50 | 4 | 0.33 | |

The forecast range is based on the 95% confidence level ($Z = 1.96$) and is given by Equation 2. The results are given in Table 2; $\hat{p}_{ti}$ and $s_{p_{ti}}$ are the observed segment test response rate and its standard deviation, respectively (both expressed as percentages), and $n_i$ is the sample size for the $i$th segment. Based on the results in Tables 1 and 2, Segment A would be mailed; Segments D and E would not be mailed. Segments B and C fall into the undecided or gray area. By examining Table 1, it can be seen that $s_{p_{ti}}$ and the width of the range for $p_{ri}$ are much larger than $s_{p_t}$ and the width of the range for $p_r$ when the entire list is considered. In this instance, the problem derives from the size of the sample: the sample size $n_i$ for each segment is smaller than that derived from Equation 1.
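The Table 2 ranges for Segments A, B, and C and for the pooled A & B can be recomputed from the test sizes and orders in Table 1. The sketch below is illustrative only: it applies Equation 2 with $Z = 1.96$ and $p_{ms} = 0.50\%$, floors the lower limit at zero as the table does, and the names `confidence_range` and `P_MS` are our own. Results may differ from Table 2 in the second decimal (Segment B, for instance), which would be consistent with the table having been computed from rounded rates.

```python
from math import sqrt

Z = 1.96      # 95% confidence, as in Table 2
P_MS = 0.50   # cutoff response rate, percent

# (test size, test orders) from Table 1
segments = {"A": (827, 13), "B": (1231, 9), "C": (1396, 7)}
segments["A & B"] = (827 + 1231, 13 + 9)  # pooled top two segments

def confidence_range(n: int, orders: int, z: float = Z) -> tuple[float, float]:
    """Equation 2 for one segment; limits in percent, lower limit floored at 0."""
    p = orders / n
    s = sqrt(p * (1 - p) / n)
    return max(0.0, 100 * (p - z * s)), 100 * (p + z * s)

for name, (n, orders) in segments.items():
    lower, upper = confidence_range(n, orders)
    if P_MS < lower:
        decision = "Yes"
    elif P_MS > upper:
        decision = "No"
    else:
        decision = "Not Sure"
    print(f"{name:6s} ({lower:.2f}-{upper:.2f})  {decision}")
```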

Several decision rules are applied in this situation. Some direct marketers will simply roll out Segment B and not Segment C, based on whether the segment test response rate $p_{ti}$ is greater than the cutoff response rate $p_{ms} = 0.50\%$. Others may apply a drop-off factor from their past mailing experience, for instance, 10-20%, to the observed test segment response rate and then make the decision. For Segment C in Table 1, for instance, deflating the test response rate of 0.50% by 20% results in an adjusted test response rate of 0.40%, which is less than a $p_{ms}$ of 0.50%; thus, Segment C is not rolled out.
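The two rules of thumb can be written side by side for the gray-area segments. This is a minimal sketch; the function names are hypothetical, and rates are in percent as in Table 1. Neither rule accounts for sampling error, which is the weakness discussed next.

```python
P_MS = 0.50  # cutoff (break-even) response rate, percent

def naive_rule(test_rate: float, p_ms: float = P_MS) -> bool:
    """Roll out when the observed test rate exceeds the cutoff."""
    return test_rate > p_ms

def drop_off_rule(test_rate: float, drop_off: float, p_ms: float = P_MS) -> bool:
    """Deflate the test rate by an experience-based drop-off factor, then compare."""
    adjusted = test_rate * (1 - drop_off)
    return adjusted > p_ms

if __name__ == "__main__":
    # Test response rates (%) of the gray-area segments in Table 1
    for name, rate in {"B": 0.73, "C": 0.50}.items():
        print(name, naive_rule(rate), drop_off_rule(rate, drop_off=0.20))
    # B True True
    # C False False
```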

How reliable are the resulting decisions? In the preceding example, if the actual response rate for Segment B is only $p_r = 0.42\%$, lower than the cutoff response rate $p_{ms} = 0.50\%$, the direct marketer would lose money if Segment B were mailed. The additional cost is the result of sampling errors caused by small sample sizes for the top-performing segments. Simple random sampling treats the top and bottom segments the same, but the direct marketer will mail only to the top segments. If the direct marketer has any information or knowledge about the segments in the house list, it should be used so that different segments can be treated differently.

In the next section, methods to improve the reliability of house list segment test sampling are discussed, and some existing statistical procedures are applied to house list segment testing. Two of the procedures introduced in the next section rely on a list testing procedure from Jain (9), which is outlined briefly here for convenience.

Jain's (9) List Testing Procedure

1. Take a random sample of $n$ names from a list; the sample is used to predict the response rates in the test and rollout.
2. Systematically split the $n$ names into $k$ equal key segments of $n_i = n/k$ names each, so that the first $n_1$ names are assigned to Key 1, the next $n_2$ to Key 2, and so on.
3. Summarize the results and record the test response rate, as a percentage, for each key segment. On this basis, the list is judged to be either homogeneous or heterogeneous: the less variation among the responses of the different key segments, the more homogeneous the list, and vice versa.
4. Compute the standard deviation of the response with the following equation:

$$ s_J = \sqrt{\frac{\sum_{i=1}^{k} n_i\,(p_{ti} - \bar{p})^2}{n}} \qquad (3) $$

where $n_i$ is the number of names in the $i$th segment, $n$ is the total sample size, and $p_{ti}$ and $\bar{p}$ are the segment and the mean response rates, respectively.
5. The confidence interval for the rollout response rate is given as

$$ \bar{p} - Z_{(1-\alpha/2)}\, s_J \;\le\; p_r \;\le\; \bar{p} + Z_{(1-\alpha/2)}\, s_J \qquad (4) $$

where $Z_{(1-\alpha/2)}$ is the Z score corresponding to the level of confidence placed on the estimate.

TABLE 5 Test Mailing Results and Roll Response Forecasts by Stratified Sampling

Segment | Test Size | Test Orders | Test Resp (%) | Forecasted Roll Orders | Real Roll Orders
--- | --- | --- | --- | --- | ---
A | 1,600 | 18 | 1.1 | 70 | 57
B | 2,400 | 11 | 0.46 | 44 | 40
C | 2,800 | 17 | 0.60 | 67 | 80
D | 1,600 | 2 | 0.15 | 9 | 11
E | 1,600 | 0 | 0.03 | 16 | 40

The key difference between Jain's procedure and simple random sampling is that the standard deviation $s_J$ in Equation 3 can be much larger than the one calculated by the standard formula for the response, $s_p = \sqrt{p(1 - p)/n}$, and Jain's empirical results suggest that rollout response rates are distributed with $s_J$ rather than $s_p$.
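The contrast between the two standard deviations can be illustrated with the four 400-name subsegments of Segment A reported in Table 6, treated here as Jain's key segments. This is a sketch: the helper names are hypothetical, and rates are in percent. Equation 3 gives $\bar{p} = 1.125\%$ and $s_J \approx 0.280\%$, matching the Jain row for Segment A in Table 7, while the binomial formula gives a smaller value of about 0.264%.

```python
from math import sqrt

def jain_sd(sizes: list[int], rates: list[float]) -> tuple[float, float]:
    """Equation 3: size-weighted spread of key-segment response rates.

    rates are in percent; returns (mean rate p_bar, s_J), both in percent.
    """
    n = sum(sizes)
    p_bar = sum(n_i * p_i for n_i, p_i in zip(sizes, rates)) / n
    s_j = sqrt(sum(n_i * (p_i - p_bar) ** 2 for n_i, p_i in zip(sizes, rates)) / n)
    return p_bar, s_j

def binomial_sd(p_percent: float, n: int) -> float:
    """Standard formula s_p = sqrt(p(1 - p)/n), returned in percent."""
    p = p_percent / 100
    return 100 * sqrt(p * (1 - p) / n)

if __name__ == "__main__":
    sizes = [400, 400, 400, 400]        # subsegments A1-A4 (Table 6)
    rates = [1.50, 1.25, 1.00, 0.75]    # their response rates, percent
    p_bar, s_j = jain_sd(sizes, rates)
    s_p = binomial_sd(p_bar, sum(sizes))
    print(f"p_bar = {p_bar:.3f}%, s_J = {s_j:.3f}%, s_p = {s_p:.3f}%")
```

Because $s_J$ is driven by the observed spread between keys, it captures heterogeneity that the binomial formula, which assumes a single common response probability, cannot.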

If the purpose of a test mailing is to estimate the response rate for the whole house list and decide whether the whole list should be mailed, then Jain's (9) procedure can be used directly. In the next section, some new procedures are derived to improve house list segment testing, and Jain's procedure is then applied within these new methods.

PROCEDURES TO IMPROVE HOUSE LIST TEST SAMPLING

Four procedures proposed in this section employ stratified sampling. Stratified sampling is a sampling procedure that takes simple random samples from each stratum (10). The basic advantage of stratified sampling is that samples are drawn from homogeneous strata. The variability within each stratum should be much smaller than that in a simple random sample administered to the whole house list. Thus, if a population can be broken down into homogeneous groups or strata, as in the house list case, stratified sampling is more efficient and cost-effective. In addition, this approach provides more reliable estimates of the population parameters, the response rates and their variances. Stratified sampling provides a better estimate of the segment response rates because it starts from the segment, and it does not significantly increase the cost of sampling.

Stratified Sampling for the House List Segment Test

Four stratified house list test procedures are proposed. They differ in terms of how the house list is segmented, how many details are known about each segment, and the overall objective of the test.

PROCEDURE 1 is basically stratified simple random sampling (10). After the number of names for the test is decided with Equation 1, a simple random sample is taken from each individual segment, where the sample size of a segment is proportional to the size of the segment. In this way a well-mixed sample is taken from each segment. The standard deviation $s_p$ is then used to calculate the confidence range for the rollout response rate.

PROCEDURE 2 uses Procedure 1 along with Jain's procedure adapted to the house list segment test. Each known segment is assigned a key, and segments may be of different sizes. Jain's procedure and its standard deviation (Equation 3) are applied to determine the confidence range for the rollout response rate. In many cases, valuable top segments may be smaller, and Jain's procedure would draw almost the same proportion of names from each segment. However, the smaller number of names from such segments may not provide reliable response estimates because of sampling errors. In fact, depending upon the size of the sample, even one more or one fewer response from a top segment may have a significant impact on the forecast of the segment rollout response. Silverman (12) referred to this as a “regression to the mean” effect; Ehrman (7) used the term “test-retest” effect. The rollout responses almost always fall short, a problem that may be overcome using a third procedure.
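The proportional allocation used in Procedure 1 amounts to a single scaling step. The sketch below is illustrative; the function name is hypothetical, and simple rounding is assumed, so totals can drift by a name or two for other segment sizes. For the five-segment, 100,000-name example list with a 10,000-name test, it yields the same test sizes shown in Table 3.

```python
def proportional_allocation(segment_sizes: dict[str, int], n_total: int) -> dict[str, int]:
    """Procedure 1: test names per segment proportional to segment size."""
    list_size = sum(segment_sizes.values())
    return {name: round(n_total * size / list_size) for name, size in segment_sizes.items()}

if __name__ == "__main__":
    sizes = {"A": 8_000, "B": 12_000, "C": 14_000, "D": 8_000, "E": 58_000}
    print(proportional_allocation(sizes, 10_000))
    # {'A': 800, 'B': 1200, 'C': 1400, 'D': 800, 'E': 5800}
```

Under this allocation the large, low-response Segment E receives 58% of the test names, which is the imbalance that Procedure 3 addresses.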

PROCEDURE 3 is also a stratified sampling-based procedure. It increases the proportion of names from the top segments and reduces the names from the bottom segments before Jain's procedure is used. Several criteria can be used to decide how many names to take from each segment [for a complete discussion, see Neter et al. (10)]. One criterion is to relate the cost of sampling to the proportion of samples. Another is to use the standard deviation to weight the proportion of samples, or

$$ n_i = n\,\frac{N_i s_i}{\sum_{j} N_j s_j} \qquad (5) $$

where $s_i = \sqrt{p_i(1 - p_i)/N_i}$ is the sample standard deviation and $N_i$ is the $i$th segment size.
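The weighting in Equation 5 can be sketched as below, using the definition of $s_i$ given in the text. The prior response rates are assumptions made only for illustration (we borrow the rollout rates listed in Table 3 as rough guesses), and the function name is hypothetical. Compared with proportional allocation, the weighting shifts test names from the large, low-response Segment E toward the top segments. Any cap on the share of a top segment that is tested, such as the 15-25% mentioned below, would have to be applied separately.

```python
from math import sqrt

def variance_weighted_allocation(sizes: dict[str, int], rates: dict[str, float],
                                 n_total: int) -> dict[str, int]:
    """Equation 5: n_i = n * N_i s_i / sum_j N_j s_j.

    sizes: segment sizes N_i; rates: prior response-rate guesses p_i (fractions).
    s_i follows the text's definition sqrt(p_i (1 - p_i) / N_i).
    """
    weights = {k: sizes[k] * sqrt(rates[k] * (1 - rates[k]) / sizes[k]) for k in sizes}
    total = sum(weights.values())
    return {k: round(n_total * w / total) for k, w in weights.items()}

if __name__ == "__main__":
    sizes = {"A": 8_000, "B": 12_000, "C": 14_000, "D": 8_000, "E": 58_000}
    # Hypothetical prior rates (fractions), taken from Table 3 for illustration
    rates = {"A": 0.015, "B": 0.007, "C": 0.004, "D": 0.001, "E": 0.0004}
    print(variance_weighted_allocation(sizes, rates, 10_000))
```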


TABLE 6 Segment Test Results

Subsegment | Subsegment Size | Orders | Resp (%)
--- | --- | --- | ---
A1 | 400 | 6 | 1.50
A2 | 400 | 5 | 1.25
A3 | 400 | 4 | 1.00
A4 | 400 | 3 | 0.75
Total | 1,600 | 18 | 1.125

There are several potential advantages to weighting the proportion of samples by a variance measure. First, because direct marketers are most interested in mailing to the top segments, testing more names from the top segments tends to improve reliability and reduce variability. At the same time, direct marketers will be very reluctant to test the whole top segment, because opportunities to sell their best offer may be lost. Thus, for top segments with a smaller number of names, a prespecified proportion, say, 15-25% of the segment, may be sufficient. Second, testing resources are concentrated on the segments that are likely to be mailed, which is more cost-effective. In short, the basic change is to test more names from smaller, top segments and fewer from larger, low-response segments. If little information about the house list is available and the list is not well segmented, Jain's (9) procedure should be used initially. As more information about the house list becomes available, the procedures proposed here should improve the quality of the estimates of rollout response rates.

PROCEDURE 4 is basically the same as Procedure 3, with the addition of Jain's (9) procedure within a given segment to develop the confidence range for the segment response rate. This is referred to as stratified sampling with double segmentation, because each segment is further partitioned using Jain's procedure. Typically, a house list contains millions of names, and a test mailing may be as large as 100,000. With 50 prespecified segments, for example, the average segment can contain tens of thousands of names. Jain's procedure can be applied to divide each segment into subsegments. From the test results, segment response rates can be estimated more closely and heterogeneity within segments can be identified. The segment response rate is

$$ \bar{p}_j = \frac{1}{k_j}\sum_{i=1}^{k_j} p_{ij} \qquad (6) $$

where $j$ is the segment index, $k_j$ is the number of subsegments used, and $p_{ij}$ is the response rate of subsegment $i$ in the $j$th segment. The standard deviation of the segment response rate is

$$ s_j = \sqrt{\frac{\sum_{i=1}^{k_j} n_{ij}\,(p_{ij} - \bar{p}_j)^2}{n_j}} \qquad (7) $$

where $n_{ij}$ is the number of names in subsegment $i$ of segment $j$, $n_j = \sum_{i=1}^{k_j} n_{ij}$, and $s_j$ is the standard deviation of the response rate $\bar{p}_j$. Because decisions are made at the segment level, a segment confidence interval based on the segment response rate and standard deviation should yield more reliable results.
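Procedure 4 can be sketched end to end, starting from raw 0/1 responses within one segment. This is an illustration under stated assumptions: the helper names are hypothetical, keys are formed from consecutive equal-size blocks as in Jain's step 2 (leftover names beyond a multiple of $k$ are ignored), and the segment's 1,600 test names are rebuilt from the subsegment counts in Table 6. The output recovers $\bar{p}_j = 1.125\%$ and $s_j \approx 0.280\%$ for Segment A. The interval limits depend on the Z value chosen; 1.96 is used here.

```python
from math import sqrt

def split_into_keys(responses: list[int], k: int) -> list[list[int]]:
    """Consecutive equal-size blocks: first n/k names -> key 1, and so on.

    Any leftover names beyond k * (n // k) are ignored in this sketch.
    """
    size = len(responses) // k
    return [responses[i * size:(i + 1) * size] for i in range(k)]

def segment_estimate(responses: list[int], k: int) -> tuple[float, float]:
    """Equations 6 and 7 for one segment; returns (p_bar_j, s_j) in percent."""
    keys = split_into_keys(responses, k)
    rates = [100 * sum(key) / len(key) for key in keys]  # p_ij, percent
    sizes = [len(key) for key in keys]                   # n_ij
    n_j = sum(sizes)
    p_bar = sum(rates) / k                               # Equation 6
    s_j = sqrt(sum(n * (p - p_bar) ** 2 for n, p in zip(sizes, rates)) / n_j)  # Equation 7
    return p_bar, s_j

if __name__ == "__main__":
    # Rebuild Segment A from Table 6: 6, 5, 4, 3 orders in each block of 400 names
    segment_a = []
    for orders in (6, 5, 4, 3):
        segment_a += [1] * orders + [0] * (400 - orders)
    p_bar, s_j = segment_estimate(segment_a, k=4)
    lower, upper = p_bar - 1.96 * s_j, p_bar + 1.96 * s_j
    print(f"p = {p_bar:.3f}%, s = {s_j:.3f}%, range ({lower:.3f}, {upper:.3f})")
```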

Bayes's Statistics for Estimating Rollout Response Rate

PROCEDURE 5 involves the application of Bayesian statistics (5) to update the estimates of the segment test response rates. There are many cases where direct marketers use the test response rates as a predictor of rollout success. However, the test response rates may not be indicative of the rollout response rates for a given mailing. Ehrman demonstrated how Bayesian statistics can be used to revise the test response rates using the decision maker's prior assessments and opinions on the likelihood of the rollout response rates for the mailing. Fairly close estimates of the test segment response rates may be available prior to the test mailing, for example, from historical averages for a product family. Using Bayesian statistics, the direct marketer's prior knowledge about the house list can be incorporated into the test process. The following equation is used to determine the adjusted (posterior) segment response rates:

where pa(new) is the updated segment response rate, pa andp, are the prior and posterior response rates, and sa2 and sjI2 are the variances of those re- sponse rates, respectively. This procedure adjusts

JOLJKNAI 0 1 LIIKFCT MARKETING VOLUME 10 NUMBER 2 SPRING 1996 31

for the effect of regression toward the mean and yields a more reliable forecast (5).
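A minimal Python sketch of the update, which combines the prior and test rates with each weighted by the inverse of its variance, assuming the usual normal approximation behind this kind of formula. The function name and the input variances are hypothetical. The posterior variance it also returns is the standard result for this weighting and is included for illustration only. Because the test-rate variances behind Table 8 are not reported, the example is not meant to reproduce that table.

```python
def posterior_rate(prior_rate, prior_var, test_rate, test_var):
    """Combine a prior and a test response rate, each weighted by 1/variance.

    Rates and variances must use consistent units (e.g., percent and percent^2).
    Returns (posterior rate, posterior variance).
    """
    w_prior = 1.0 / prior_var  # precision of the prior estimate
    w_test = 1.0 / test_var    # precision of the test estimate
    posterior = (w_prior * prior_rate + w_test * test_rate) / (w_prior + w_test)
    return posterior, 1.0 / (w_prior + w_test)


# Hypothetical segment: prior 0.30% (variance 0.04), test 0.70% (variance 0.12).
rate, var = posterior_rate(0.30, 0.04, 0.70, 0.12)
print(f"posterior = {rate:.2f}%, variance = {var:.3f}")
```

Here the noisier test result (0.70%) is pulled most of the way back toward the better-established prior (0.30%), giving 0.40%. This shrinkage is how the update counteracts regression toward the mean.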

The five procedures described are suitable for different situations and require different data about the house list. However, with the wide use of various computer-based decision-support systems, the use of one or all of these procedures should become routine.

EXAMPLES AND COMPARISONS

The mailing quantity and response rates used for the analyses that follow are for illustrative purposes only and are not actual estimates. To obtain appropriate test mailing sizes and the estimated response rates, users should consult Jain (9) and others (2, 4, 5, 8).

Assume a house list containing 100,000 names is segmented into five segments, labeled A, B, C, D, and E, based on some prespecified criteria. A break-even analysis indicates that a 0.50% response rate is the cutoff point. The objective is to test the house list in order to identify the segments having a predicted response above the threshold of 0.50%. Based on prior experience, it is determined that the overall response rate would be only about 0.30% if the whole house list were rolled, which would result in a loss of money. Therefore, it is necessary to mail only to segments with a higher probability of buying. For simplicity, let n = 10,000 be the size of the test mailing. Procedure 1 will provide well-mixed samples for each segment. More benefits can be realized by using the other proposed procedures.

TABLE 7 Confidence Range Estimates for Segment Rollout Response Rates

Segment and Method   p (%)   s (%)   Lower Limit   Upper Limit
A   Stat.            1.125   0.261    0.589         1.611
A   Jain             1.125   0.280    0.566         1.684
B   Stat.            0.458   0.138    0.189         0.731
B   Jain             0.458   0.336   -0.214         1.130
C   Stat.            0.607   0.146    0.314         0.886
C   Jain             0.607   0.226    0.155         1.059

Note: The equation for $s_p$ by the statistical method is $\sqrt{p(1-p)/n}$, and the equation for $s_{pJ}$ by Jain’s method is $\sqrt{\sum n_i (p_i - \bar{p})^2 / n}$.

TABLE 8 Posterior Estimates for Rollout Response Rates

Segment   Prior (%)   Variance   Test (%)   Posterior (%)
A         1.0         0.99       1.50       1.20
B         0.30        0.30       0.70       0.42
C         0.60        0.70       0.40       0.51
D         0.15        0.15       0.10       0.12
E         0.03        0.03       0.04       0.03

Procedure 2

After determining the sample size, use the stratified sampling to draw names from each segment. When Jain’s sampling procedure is used, each segment is labeled as A, B, C, D, or E. Table 3 gives the results of the test and the forecast for the rollout segment response rates. Since a single random sample is taken, assume the same proportion of test samples from each segment as given in Table 3.

The statistical estimation leads to the following confidence interval for the overall response rate, p = 0.0029, or 0.29% (29 test orders/sample size of 10,000). The standard deviation is given by

$$ s_p = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.0029\,(1 - 0.0029)}{10{,}000}} = 0.0538\% $$

At the 95% confidence level, with a Z score of 1.96, the estimated range for the rollout response is $p \pm Z s_p$, or 0.29% ± 1.96 × 0.0538%, that is, from 0.1846% to 0.3954%. This means that if the same test were repeated 100 times on the same house list under the same conditions, about 95 of the 100 intervals constructed this way would contain the true, unknown rollout response rate. Applying Jain’s (9) equations,

$$ \bar{p} = \frac{\sum n_i p_i}{\sum n_i} = 0.29\%, \qquad s_{pJ} = \sqrt{\frac{\sum n_i (p_i - \bar{p})^2}{n}} = 0.373\% $$

where $\bar{p}$ is the mean response rate and $s_{pJ}$ is the standard deviation of the response rate, the 95% confidence estimate for the rollout response rate is $\bar{p} \pm Z s_{pJ}$ = 0.29% ± 1.96 × 0.373%, or from 0.00% to 1.02%. (The lower limit is calculated as -0.44%, and thus 0% is used.) Based on these results, there is a 95% “certainty” that the true rollout response rate under the same conditions will be between 0% and 1.02%. Jain’s estimate of the standard deviation, $s_{pJ}$, is larger than the $s_p$ calculated using basic statistical methods, and thus leads to wider intervals. Jain’s empirical results suggest that actual rollout responses fall within the ranges his method gives, so it provides better forecasts. The $s_p$ confidence range lies entirely inside Jain’s; that is, Jain’s confidence range is wider on both sides. This implies that, for cases where the test response rates lie between the lower limits of the two confidence ranges, the basic statistical formulas would recommend rejecting the mailing, whereas Jain’s procedure would suggest no decision. For cases where the test response rates lie between the upper limits of the two confidence ranges, the basic statistical formulas would recommend a rollout, but Jain’s procedure would again withhold that recommendation. The use of Jain’s procedure makes the confidence ranges wider than those calculated with basic statistical formulas and results in more no-decision cases. As Jain’s empirical results indicate, the confidence ranges generated using the basic formulas (without Jain’s procedure) are much narrower and result in poor forecasts. Consider the cases where the test response rates lie between the upper limits of the two confidence ranges: their rollout response rates would likely be lower than their test response rates (a poor forecast), because these should have been no-decision cases under Jain’s procedure.
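The two overall intervals can be checked with a short Python sketch. Rates are expressed as fractions rather than percentages, and the function names are hypothetical. Jain’s standard deviation (0.373%) is taken from the text rather than recomputed, because the per-segment test counts behind it are not reproduced here.

```python
import math


def binomial_interval(orders, n, z=1.96):
    """Basic statistical interval p +/- z * sqrt(p(1-p)/n)."""
    p = orders / n
    s_p = math.sqrt(p * (1 - p) / n)
    return p, s_p, (p - z * s_p, p + z * s_p)


def jain_interval(p_bar, s_pj, z=1.96):
    """Interval p_bar +/- z * s_pJ, with a negative lower limit replaced by 0."""
    return max(0.0, p_bar - z * s_pj), p_bar + z * s_pj


p, s_p, (low, high) = binomial_interval(29, 10_000)
print(f"p = {p:.2%}, s_p = {s_p:.4%}, range {low:.4%} to {high:.4%}")

j_low, j_high = jain_interval(0.0029, 0.00373)
print(f"Jain range {j_low:.2%} to {j_high:.2%}")
```

The basic interval comes out at roughly 0.185% to 0.395%, while Jain’s runs from 0% to about 1.02%, so the narrower range sits entirely inside the wider one.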

As the objective is to identify profitable segments to mail, not to mail the whole house list, the preceding calculations do not help directly in making the rollout decisions. To make decisions at the segment level, a common practice is to look at the test segment response rates. If a segment’s response rate is higher than the cutoff response rate of 0.50%, a rollout to that segment follows; otherwise, the segment is not rolled. Additional tests for the segments with test responses around 0.50% are not recommended because of cost, time, and competition. In this case, a rollout to Segments A and B should be made, with forecast orders of 108 for Segment A and 75 for Segment B. The actual rollout response rates are listed in Table 4. The real responses are 0.90% for Segment A and 0.42% for Segment B, so the procedure overforecasts both top segments. Segment C, which has a real response of 0.72% and 90 orders, is underforecast. The potential costs of this poor mailing decision would include the extra cost of the products, inventory costs, and the loss of sales.

The drop-off in the rollout is the result of sampling errors. Because only the top segments or top-performing lists were mailed, the effect of regression toward the mean will always tend to push the real response toward the true, unknown mean response. For the segments that perform best in the tests, the true, unknown responses are more likely to be lower. To offset this problem, some direct marketers apply a drop-off factor of about 10-30% to the test response used in the forecast for rollout mailings. The results are not encouraging. Large errors, more than 30-50%, are not unusual and result in large inventories of unsold goods and unsatisfied customers: the products in inventory are not the ones customers want, and the products customers want are not readily available. The primary reason for this problem is the sampling procedure. When simple random sampling is used, the same proportion of names is drawn from each segment. The top-performing segments are usually small, so very few of their names are included in the tests, and one more or one less test order for a given segment might change the rollout decision. Because the segment test sample sizes of these top-performing segments are small, the variation in their test response rates is normally higher: the smaller the test sample size, the larger the variation. To further illustrate the importance of the segment test sample sizes for top-performing segments, Table 4 analyzes the sensitivity of rollout decisions as the test orders change for Segments A, B, and C.
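The selection effect can be seen in a small Monte Carlo sketch. For illustration only, it assumes five segments that all share the same true response rate of 0.50%. Each segment is tested with the same number of names, and the segment with the best test result is picked for rollout. The parameter values and the function name are hypothetical.

```python
import random


def mean_winning_test_rate(true_rate=0.005, names_per_segment=1000,
                           n_segments=5, trials=200, seed=1):
    """Average test response rate of the best-looking segment.

    Every segment has the same true rate, so any excess of the result over
    true_rate is pure selection bias (the rollout would regress to true_rate).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        test_rates = []
        for _ in range(n_segments):
            # Each test name responds independently with probability true_rate.
            orders = sum(rng.random() < true_rate for _ in range(names_per_segment))
            test_rates.append(orders / names_per_segment)
        total += max(test_rates)  # the segment that "wins" the test
    return total / trials


for names in (500, 2000):
    print(names, f"{mean_winning_test_rate(names_per_segment=names):.3%}")
```

Because every segment’s true rate is 0.50%, rolling the winner should return only 0.50%, even though its test rate is well above that on average. The gap is larger with 500 test names per segment than with 2,000, which mirrors the point that small top segments produce the most misleading test results.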

For Segment A, where the objective is to avoid overforecasting (stocking more inventory than is demanded), rollout is recommended down to four test orders. For Segment B, a roll would not be recommended if there were only five orders (three fewer than the test result), suggesting that sampling error may be a problem: the decision rests on a small and potentially unreliable set of sample results. For Segment C, a roll would be recommended if one or two more test orders had been obtained. If Segments A and B are rolled, there would be 109 orders (0.61% response) from the rollout and 29 orders from the tests, or 138 total orders with a 0.493% overall response rate. If Segments A and C are rolled, there would be 154 orders (0.78% response) from the rollout and 29 from the test, or 183 total orders with a 0.614% overall response. To avoid these costly mistakes, the sampling procedure should be revised.
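The sensitivity analysis can be sketched as follows, using exact fractions to avoid rounding problems at the 0.50% cutoff. The function names are hypothetical, and so is the example segment (1,150 test names with 8 orders): it is chosen only to resemble a small top segment and is not taken from Table 4.

```python
import math
from fractions import Fraction

CUTOFF = Fraction(1, 200)  # 0.50% break-even response rate


def rolls(orders, test_names, cutoff=CUTOFF):
    """Roll the segment only if its test response rate exceeds the cutoff."""
    return Fraction(orders, test_names) > cutoff


def min_orders_to_roll(test_names, cutoff=CUTOFF):
    """Smallest number of test orders whose response rate beats the cutoff."""
    return math.floor(cutoff * test_names) + 1


def sensitivity(orders, test_names, spread=3):
    """Rollout decision for order counts within +/- spread of the observed count."""
    return {k: rolls(k, test_names)
            for k in range(max(0, orders - spread), orders + spread + 1)}


print(min_orders_to_roll(1150))
print(sensitivity(8, 1150))
```

With these made-up numbers, six orders are enough to roll, so losing three of the eight observed orders flips the decision. The margin is that thin when a segment contributes only about a thousand test names.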

Procedure 3

This stratified sampling procedure calls for random sampling at the stratum level, and the criteria for deciding how many names to draw from each stratum also vary. A rule of thumb, based on the rough estimate of test response, is sufficient. For the current example, samples from the four small and top segments can be doubled and the samples from the large- and low-response segments can be reduced. More samples from the top segments will enhance the reliability of the estimates for these segments. Table 5 outlines the results.

Because more names are sampled from the top segments, their test response rates will be closer to the true, unknown rollout responses. Segment A has a response rate of 1.1%, Segment B 0.46%, and Segment C 0.60%. Thus, the decision will be to roll out Segments A and C but not Segment B. The test average response is 0.48%, and the 95% confidence range for the rollout response, obtained using basic statistical formulas, is 0.3473-0.6191%. The number of orders from the rollout is 137, plus 48 from the test, for a total of 185 (a 0.67% overall response rate). This is a 34% increase in orders (185 versus 138) without any extra cost for either the test or the rollout. It is possible that Segment B would be rolled even without any other information, simply because it is so close to the cutoff point.
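The rule-of-thumb allocation can be sketched in Python. The segment sizes below are hypothetical: they sum to the 100,000-name list of the example, but the actual sizes are not given. Segments A, B, and C are assumed to be the small top segments. Their proportional shares are doubled, and the remaining budget is split proportionally across the other segments so the test stays at 10,000 names.

```python
import math


def allocate(segment_sizes, total_sample, boost):
    """Stratified test allocation with a multiplier for selected segments.

    segment_sizes: dict segment -> names on the house list
    boost: dict segment -> multiplier applied to its proportional share
    The remaining names are shared by the other segments in proportion to size.
    """
    list_size = sum(segment_sizes.values())
    alloc = {seg: round(total_sample * segment_sizes[seg] / list_size * mult)
             for seg, mult in boost.items()}
    others = [seg for seg in segment_sizes if seg not in boost]
    if others:
        remaining = total_sample - sum(alloc.values())
        others_size = sum(segment_sizes[seg] for seg in others)
        for seg in others[:-1]:
            alloc[seg] = round(remaining * segment_sizes[seg] / others_size)
        # The last segment takes whatever is left, so the total is exact.
        alloc[others[-1]] = total_sample - sum(alloc.values())
    return alloc


def std_error(p, n):
    """Basic binomial standard error of a test response rate."""
    return math.sqrt(p * (1 - p) / n)


sizes = {"A": 2_000, "B": 3_000, "C": 5_000, "D": 40_000, "E": 50_000}
plan = allocate(sizes, 10_000, boost={"A": 2, "B": 2, "C": 2})
print(plan)
# Doubling a segment's test size shrinks its standard error by a factor of 1/sqrt(2).
print(f"{std_error(0.011, 200):.3%} -> {std_error(0.011, 400):.3%}")
```

The total test size is unchanged. Only its distribution shifts toward the segments whose decisions are most fragile.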

Procedure 4: Stratified Sampling with Double Segmentations

The reasons for applying Jain’s procedure at the house list segment level are twofold. First, a more accurate forecast of the response rates of the top-performing segments is desirable. Second, some of the potential differences within each existing segment need to be identified. These differences are real; they show that the existing methods are not adequate and call for further improvement. The results obtained when Jain’s procedure is used are shown in Table 6. The confidence ranges for the rollout responses are given in Table 7. From Table 7, the following can be seen:

- Jain’s method gives a wider interval estimate for the rollout segment response, which may be able to account for some unexplained variations due to the many uncontrollable factors listed previously.
- Any segment confidence interval with a lower limit greater than the cutoff response rate (e.g., Segment A) indicates that the segment response rate is significantly larger than the cutoff point. The segment should be rolled.
- Any segment confidence interval with an upper limit smaller than the cutoff response rate indicates that the segment response rate is significantly lower than the cutoff point. The segment should not be rolled.
- Any segment confidence interval that includes the cutoff point indicates that the test response rate is not significantly larger or smaller than the cutoff point. Whether the actual rollout response will be higher or lower than the cutoff point is unknown. Ideally, additional tests should be conducted as suggested by Jain (9). Otherwise, direct marketers need to be advised of the risk of their decisions regarding these segments and the possible costs associated with the outcomes of these decisions. These economic analyses may justify additional tests in the future.
- Comparing the results for Segments B and C, Segment B not only has a much smaller lower limit but also has higher variability. It is thus riskier to roll Segment B than Segment C.
- In Segments B and C, some of the subsegments have a much higher response rate than others. This calls for additional screening of the subsegments and may lead to more homogeneous groupings of the house list.
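The three interval-based decision rules can be written as a small function. The function name is hypothetical; the limits are the Jain-method rows of Table 7 and the 0.50% cutoff from the example, all in percent.

```python
def interval_decision(lower, upper, cutoff=0.50):
    """Classify a segment from its confidence interval (all values in percent)."""
    if lower > cutoff:
        return "roll"
    if upper < cutoff:
        return "do not roll"
    return "no decision: test further or weigh the risk"


# Jain-method limits from Table 7 (percent).
jain_limits = {"A": (0.566, 1.684), "B": (-0.214, 1.130), "C": (0.155, 1.059)}
for segment, (low, high) in jain_limits.items():
    print(segment, interval_decision(low, high))
```

Only Segment A clears the cutoff outright; B and C both straddle it, which is why they fall into the no-decision category discussed in the list.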

Using Empirical Bayes in Segment Rollout Decisions

Ehrman (5) applied Bayesian statistics in direct marketing. In many cases, especially given the increased capability afforded by computers and decision support systems, it is much easier to consolidate expert and historical estimates for segment responses. Table 8 lists some possible prior estimates for the test response rates before the test mailing is conducted. For Segment B, the posterior response is 0.42%, which is below the cutoff point, so the segment should not be mailed. For Segment C, the posterior response is 0.51%, which is higher than the cutoff, so the segment should be mailed.
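Applying the 0.50% cutoff to the posterior column of Table 8 is a simple filter. The sketch below restates that decision in Python (values in percent; variable names are hypothetical).

```python
CUTOFF = 0.50  # break-even response rate, percent

# Posterior (%) column of Table 8.
posterior = {"A": 1.20, "B": 0.42, "C": 0.51, "D": 0.12, "E": 0.03}

to_mail = [seg for seg, rate in posterior.items() if rate > CUTOFF]
margins = {seg: round(rate - CUTOFF, 2) for seg, rate in posterior.items()}
print(to_mail)   # segments whose posterior clears the cutoff
print(margins)   # distance of each posterior from the cutoff
```

The margins make the same point as the text: Segment C clears the cutoff by only 0.01 percentage points, so the Bayesian adjustment, not the raw test rate, is what tips the decision.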

CONCLUDING REMARKS

In this article it was demonstrated that simple random sampling may not be effective for house list segment tests. Furthermore, some list test sampling procedures can be effectively adapted to house list test sampling to improve the reliability of the forecast. Two procedures are particularly useful: stratified sampling and Jain’s procedure. Because segmented house lists typically have some homogeneous segments, top-performing segments may be fairly small. The results presented here suggest that analyses and decisions should focus on segment-level details rather than the house list as a whole when estimating segment rollout response rates. The following guidelines for house list segment testing are suggested:

- Segment the house list whenever it is appropriate.
- Use stratified sampling within each segment to draw samples.
- Maintain the historical averages for segment response rates.
- Use Bayesian statistics to update the test response rates.
- Use Procedure 4 to estimate the segment rollout response rates.
- Use Jain’s strategy (9) to determine which segments should be rolled and which should be tested further.

House list segment tests are complex tasks. This article shows that well-designed procedures will help direct marketers target their products and services to the right groups of customers and thus increase effectiveness and profitability.

REFERENCES

1. Allenby, G. M. and Blattberg, R. C. (1987), “A New Theory of Direct Market Testing, or Why Your Rollout Results Do Not Match Your Test Results,” Journal of Direct Marketing, 1(4), 24-37.
2. Berger, P. and Magliozzi, T. (1992), “The Effect of Sample Size and Proportion of Buyers in the Sample on the Performance of List Segmentation Equations Generated by Regression Analysis,” Journal of Direct Marketing, 6(1), 13-22.
3. Blair, K. C. (1991), “‘Shoddy Targeting’ and the Disparity between Test and Rollout Response Rates,” Journal of Direct Marketing, 5(2), 31-33.
4. Croy, C. D. (1988), “How Many Names Should We Test?” Journal of Direct Marketing, 2(4), 41-49.
5. Ehrman, C. M. (1988), “Bayesian Statistics for Direct Marketers,” Journal of Direct Marketing, 2(1), 43-48.
6. Ehrman, C. M. (1990), “Correcting for ‘Regression to the Mean’ in List Selection Decisions,” Journal of Direct Marketing, 4(2), 21-34.
7. Ehrman, C. M. (1990), “On the Test-Retest Effect on List Selection, Some New Insights,” Journal of Direct Marketing, 4(3), 61-69.
8. Hansotia, B. J. (1990), “Sample Size and Design of Experiment Issues in Testing Offers,” Journal of Direct Marketing, 4(4), 15-25.
9. Jain, C. L. (1995), “How to Forecast the Rollout Response of a Mailing List from a Sample Test in Direct Mail,” Journal of Direct Marketing, 9(1), 29-36.
10. Neter, J., Wasserman, W., and Whitmore, G. A. (1988), Applied Statistics, Needham Heights, MA: Allyn and Bacon.
11. Shepard, D. (1989), “The Statisticians, What Did We Ever Do to Them?” Journal of Direct Marketing, 3(1), 34-37.
12. Silverman, B. (1986), “Why Rollouts Almost Never Do as Well as Tests,” Journal of Direct Marketing Research, 1 (Summer/Fall), 105-115.
