
Chapter 4: Design of Experiments

4.1 Why Experiment?

4.2 Introduction

4.3 Multi-Factor Experiments

4.4 Orthogonality and Blocking

4.5 Business Experiments with Continuous Responses

4.6 Recommended Reading

4.1 Why Experiment?

Objectives

Explain the role of experiments in answering business questions.

You Need to Know

Work is full of questions that you need answered.

Some have answers that require only a lookup:
• What is the policy regarding the use of demographic variables in predictive models?
• When did you last send a marketing e-mail to segment 17?

Others do not have readily available answers:
• Does it really matter whether you use first-class postage when sending direct mailings for a cruise line?
• How should you advertise if you want to maximize the sales-to-expenditure ratio for football tickets?

Statistical Models Can Answer Questions

The models that you learn to use in this course can answer many of the questions that you have.
• Do you have the data to perform an analysis and answer the question?
• Did you account for the variables that are in your control as well as the variables over which you have no control?

Questions Often Mean Comparing Things

Does your question imply that a comparison is needed?
• First-class versus bulk-rate postage
• Primetime versus late-night advertising

Did you conduct an experiment?

Consider This…

What is the question that you want to answer?

What is the population that you want the answer to pertain to?

What kinds of things do you want to compare that you can control?

How is the outcome measured (Yobs)?

What else impacts Yobs that you cannot control?

Consider This…

What is the question that you want to answer?
1. Does postage make a difference in the response rate?
2. Is it worth the extra expense to advertise tickets for a football game in primetime?

What is the population that you want the answer to pertain to?
1. The “luxury traveler” segment
2. Football fans

What kinds of things do you want to compare that you can control?
1. The class of postage on the offer envelope
2. Whether the tickets are advertised during primetime (expensive) or late night (inexpensive)

How is the outcome measured (Yobs)?
1. The number of responses from each postage group
2. Ticket sales in the week following each type of advertisement

What else impacts Yobs that you cannot control?
1. Gender, vacation already taken that year, children
2. The team’s season performance (wins, losses), disposable income of viewing markets, broadcasting lineup

Who Cares about Things You Cannot Control? You do!

Three figures (not reproduced here) compare analyses that account for:
• only the things in the experiment that you can control
• the things that you can control plus one thing that you cannot control
• the things that you can control plus two things that you cannot control
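A small simulation can make the point of these figures concrete. The sketch below is hypothetical: it invents a purchase amount that depends on a controlled factor (a catalog sticker) and on one attribute that you cannot control (whether the customer has children), with made-up effect sizes. Grouping by the uncontrolled attribute as well as the factor leaves much less unexplained variation, which makes the factor's effect easier to detect.

```python
import random
import statistics

random.seed(2024)  # fixed seed so the illustration is reproducible


def simulate(n=2000):
    """Hypothetical data: one controlled factor plus one uncontrolled attribute."""
    rows = []
    for _ in range(n):
        sticker = random.randint(0, 1)   # controlled: sticker on the catalog or not
        children = random.randint(0, 1)  # uncontrolled: customer has children or not
        spend = 50 + 10 * sticker + 30 * children + random.gauss(0, 5)
        rows.append((sticker, children, spend))
    return rows


def residual_sd(rows, group_key):
    """Pooled SD of spend around each group's mean (groups defined by group_key)."""
    groups = {}
    for row in rows:
        groups.setdefault(group_key(row), []).append(row[2])
    sum_sq = 0.0
    for values in groups.values():
        center = statistics.fmean(values)
        sum_sq += sum((v - center) ** 2 for v in values)
    return (sum_sq / (len(rows) - len(groups))) ** 0.5


rows = simulate()
sd_factor_only = residual_sd(rows, lambda r: r[0])            # sticker only
sd_with_nuisance = residual_sd(rows, lambda r: (r[0], r[1]))  # sticker and children
print(f"Residual SD, sticker only:       {sd_factor_only:.1f}")
print(f"Residual SD, sticker + children: {sd_with_nuisance:.1f}")
```

With these invented settings, the residual SD drops from about 16 to about 5 once the uncontrolled attribute is accounted for.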

Work smarter: design an experiment!

Idea Exchange

Have you ever conducted an experiment? If so, what was the business or scientific objective?

Web-based experiments are popular because they are relatively inexpensive to implement and they can be modified in real time. Can you describe any Web experiments that you have seen?

What kinds of factors might influence click-through behavior on, for example, an ad for insurance? For retail clothing? For other types of products and services?

4.2 Introduction

Objectives
• Define experimental design concepts and terminology.
• Relate experimental design concepts and terminology to business marketing concepts and terminology.

Basic Terms in Design of Experiments (DOE)

Response

Factor

Factor Level

Effect

Power

Experimental Unit

Treatment

Replication

Balance

Orthogonality

Basic Terms in DOE: Response

A response is the variable of interest in the analysis. It is sometimes called the target or dependent variable.

Examples include the following:
• Response rate to direct mail solicitations
• Default (“Bad”) rate among credit customers
• Balance transfer amount
• Fraud
• Number of items purchased from a catalog
• Spend, six months after acquisition

Basic Terms in DOE: Factor

A factor is an independent variable that is a potential source of variation in the response metric.

Examples include the following:
• Teaser or introductory APR
• Color of envelope
• Balance transfer fee
• Presence or absence of a sticker on a catalog
• First-class versus third-class mail
• Others?

Basic Terms in DOE: Factor Level

A factor level is a particular value, or setting, of a factor.

Examples include the following:
• 1.99% introductory APR
• White envelope
• 2% balance transfer fee
• Airline mile reward offer
• Third-class mail
• Others?

Basic Terms in DOE: Effect

An effect captures and measures the relationship between changes in factor levels and changes in the response metric.

Examples of an Effect
• A 1% increase in introductory APR yields a 20% decrease in response rate.
• The white envelope has a 22% higher response rate than the grey envelope.
• An offer with a sticker on it garners $10 more in purchases than an offer without one.
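Effects like these are reported either as an absolute difference (the $10 sticker example) or as a relative lift (the 22% envelope example). A small sketch of both calculations follows; the function names are illustrative, and the response rates and purchase amounts are invented so that they reproduce the figures quoted above.

```python
def absolute_effect(treatment_mean: float, control_mean: float) -> float:
    """Difference in the response metric between two factor levels."""
    return treatment_mean - control_mean


def relative_lift(treatment_rate: float, control_rate: float) -> float:
    """Relative change; 0.22 means '22% higher than the control level'."""
    return (treatment_rate - control_rate) / control_rate


# Hypothetical numbers, for illustration only
white_rate, grey_rate = 0.0122, 0.0100      # response rates by envelope color
sticker_spend, plain_spend = 85.0, 75.0     # average purchases ($) per offer

print(f"Envelope lift:  {relative_lift(white_rate, grey_rate):.0%}")
print(f"Sticker effect: ${absolute_effect(sticker_spend, plain_spend):.2f}")
```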

Basic Terms in DOE: Treatment

A treatment is a combination of all of the factors, each at one level. In a typical marketing context, a treatment constitutes a unique offer.

Examples include the following:
• 1.99% Intro Rate, in a White Envelope, no BT Fee
• 0% Intro Rate, in a Grey Envelope, 2% BT Fee
• 1.99% Intro Rate, in a Grey Envelope, 2% BT Fee
• 0% Intro Rate, in a White Envelope, no BT Fee

There are eight possible treatments when you have three factors, each at two levels.
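The count comes from multiplying the number of levels of each factor: 2 × 2 × 2 = 8. A short sketch enumerates the full set with `itertools.product`, using the three factors from the examples above; the dictionary layout is just one convenient way to hold the levels.

```python
from itertools import product

# Factor levels taken from the treatment examples above
factors = {
    "Intro Rate": ["0%", "1.99%"],
    "Envelope": ["White", "Grey"],
    "BT Fee": ["no BT Fee", "2% BT Fee"],
}

# Each treatment picks exactly one level of every factor
treatments = list(product(*factors.values()))

for number, levels in enumerate(treatments, start=1):
    print(number, dict(zip(factors, levels)))

print(len(treatments), "treatments")
```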


Other Terms in DOE
• An experimental unit is the smallest unit to which a treatment can be applied.
• Replication occurs when more than one experimental unit receives the same treatment.
• Power is the probability that you will detect an effect, if one exists.
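To get a feel for power, the sketch below uses the common normal approximation for a two-sided test comparing two response rates. The rates (1.0% versus 1.2%), group sizes, and 5% significance level are hypothetical, and the approximation ignores the negligible chance of rejecting in the wrong direction; real planning would use a dedicated power procedure.

```python
from statistics import NormalDist


def power_two_proportions(p_control: float, p_test: float,
                          n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided z-test of p_test versus p_control.

    Normal approximation with an unpooled standard error; the small probability
    of rejecting in the wrong direction is ignored.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    se = ((p_control * (1 - p_control) + p_test * (1 - p_test)) / n_per_group) ** 0.5
    return z.cdf(abs(p_test - p_control) / se - z_crit)


# Hypothetical: 1.0% versus 1.2% response, for several group sizes
for n in (5_000, 25_000, 50_000):
    print(n, round(power_two_proportions(0.010, 0.012, n), 3))
```

Power rises with the number of experimental units per group, which is why test volumes matter so much in the designs that follow.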

4.3 Multi-Factor Experiments

Objectives
• Define multifactor experiments.
• State the advantages of multifactor experiments versus a sequence of one-factor-at-a-time (OFAT) tests.
• Explain how experimental units should be allocated to the treatments.
• Define the term interaction.
• Analyze a simple multifactor experiment and identify interactions.

Two Factors, Each at Two Levels

Example: credit card solicitation with an introductory, or teaser, rate
• The introductory (Intro) rate is High or Low.
• The go-to (Goto) rate is High or Low.

One Factor at a Time

[Figure: grid of Intro rate (0%, 2.99%) by Goto rate (4.99%, 7.99%). For the Intro test, Goto = ??; for the Goto test, Intro = ??]

One Factor at a Time
• Intro test: hold Goto constant at 4.99%.
• Goto test: hold Intro constant at 0%.

[Figure: grid of Intro rate (0%, 2.99%) by Goto rate (4.99%, 7.99%)]

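One Factor at a Time

[Figure: grid of Intro rate (0%, 2.99%) by Goto rate (4.99%, 7.99%) with three tested cells: "Control" (0% Intro, 4.99% Goto), "Intro Test" (2.99% Intro, 4.99% Goto), and "Goto Test" (0% Intro, 7.99% Goto)]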

Typical Volumes

[Figure: each of the three OFAT cells receives 50,000 experimental units]

Efficiency

VP of Marketing: Either a large numerator or a small denominator, or both!

Experiment Designer: Can you quantify these terms?
• Number of items tested
• Margin of error
• Financial costs
• Total sample size

For the OFAT test, the number of items tested is two terms (the Intro effect and the Goto effect), and the total sample size is 150,000 observations.

Efficiency?!?

[Figure: the OFAT grid of Intro rate (0%, 2.99%) by Goto rate (4.99%, 7.99%)]

Each test uses only two-thirds of the data: the Intro test ignores the Goto Test cell, and the Goto test ignores the Intro Test cell.

One Factor at a Time

[Figure: several different three-cell arrangements of the OFAT test on the grid of Intro rate (0%, 2.99%) by Goto rate (4.99%, 7.99%)]

• There are many different ways to arrange the “same” test.
• They all assume no interaction between Intro and Goto.
• None of these eliminates the potential for bias in the estimates.

Pick a Treatment Set

[Figure: alternative treatment sets on the grid of Intro rate (0%, 2.99%) by Goto rate (4.99%, 7.99%)]

Detecting Interactions between Factors

[Figure: response rate versus Intro rate (Low, High), with separate lines for Low Goto and High Goto]
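An interaction shows up in this kind of plot as non-parallel lines. Numerically, it is the Intro effect at High Goto minus the Intro effect at Low Goto. A sketch with four invented cell response rates (the values are hypothetical, chosen only so that the lines are not parallel):

```python
# Hypothetical response rates for the four (Intro, Goto) cells
rates = {
    ("Low", "Low"): 0.020,
    ("High", "Low"): 0.015,
    ("Low", "High"): 0.012,
    ("High", "High"): 0.011,
}


def intro_effect(goto_level: str) -> float:
    """Change in response rate from Low to High Intro at a fixed Goto level."""
    return rates[("High", goto_level)] - rates[("Low", goto_level)]


effect_at_low_goto = intro_effect("Low")
effect_at_high_goto = intro_effect("High")
# A nonzero difference means the Intro effect depends on Goto (an interaction);
# with real data you would also test whether it differs significantly from zero.
interaction = effect_at_high_goto - effect_at_low_goto

print(f"Intro effect at Low Goto:  {effect_at_low_goto:+.3f}")
print(f"Intro effect at High Goto: {effect_at_high_goto:+.3f}")
print(f"Difference (interaction):  {interaction:+.3f}")
```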

Factorial Arrangement of the Treatments
• Permits the testing and estimation of an Intro x Goto interaction term.
• Increases the precision of estimates for the same test volumes.
• Can use every individual in every test. Combinations of factor levels provide replication for individual factors.

[Figure: all four cells of the grid of Intro rate (0%, 2.99%) by Goto rate (4.99%, 7.99%) are tested]

Efficiency! Reuse Observations
• The Intro test uses every observation.
• The Goto test uses every observation.

[Figure: the full 2-by-2 factorial grid of Intro rate (0%, 2.99%) by Goto rate (4.99%, 7.99%)]

Efficiency! Additional Tests
• Four treatment means provide four degrees of freedom: one for the intercept and three for the Intro, Goto, and Intro x Goto terms.
• This treatment structure enables the estimation of the Intro x Goto interaction term.

[Figure: the full 2-by-2 factorial grid of Intro rate (0%, 2.99%) by Goto rate (4.99%, 7.99%)]

Efficiency! Same or Smaller Sample Size

In the OFAT approach, each treatment receives 50,000 experimental units, and one-third of that data is ignored at each stage of the analysis. To have the same power, the factorial test requires only 50,000 observations in each marginal total:
• 25,000 experimental units in each of the four cells
• 50,000 at the Low level and 50,000 at the High level of each factor
• 100,000 units in total
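The arithmetic behind this comparison can be sketched as follows. The sketch assumes no interaction and a hypothetical response rate of about 1% in every cell, so all cells share the same variance. The standard error of a main-effect comparison depends on the number of units on each side of it: 50,000 versus 50,000 in both designs, yet the factorial design needs 100,000 units in total instead of 150,000.

```python
def se_difference(p: float, n_a: int, n_b: int) -> float:
    """Standard error of a difference between two response rates near p."""
    return (p * (1 - p) * (1 / n_a + 1 / n_b)) ** 0.5


P = 0.01  # hypothetical common response rate

# OFAT: Control, Intro Test, and Goto Test cells with 50,000 units each
ofat_cells = {"control": 50_000, "intro_test": 50_000, "goto_test": 50_000}
ofat_total = sum(ofat_cells.values())
ofat_se_intro = se_difference(P, ofat_cells["control"], ofat_cells["intro_test"])

# 2-by-2 factorial: 25,000 per cell; the Intro effect compares marginal totals
cell = 25_000
factorial_total = 4 * cell
factorial_se_intro = se_difference(P, 2 * cell, 2 * cell)

print(f"OFAT:      total {ofat_total:,}, SE of Intro effect {ofat_se_intro:.5f}")
print(f"Factorial: total {factorial_total:,}, SE of Intro effect {factorial_se_intro:.5f}")
```

The same 100,000 factorial units also support the Goto comparison and the interaction estimate.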

Efficiency?

Balance of the marginal totals might not be all that is required. Each of the following allocations has 50,000 units at each level of each factor:
• 50,000 and 50,000 in two cells, with none in the other two
• 49,999 and 49,999 in two cells, with 1 and 1 in the other two
• 49,500 and 49,500 in two cells, with 500 and 500 in the other two
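One way to see why balanced margins are not enough: when the main effect is estimated from the four cell means (as in a model that includes the interaction), its variance is proportional to the sum of 1/n over the cells. The sketch below assumes a common within-cell variance and compares the allocations above; the cell ordering and function name are illustrative.

```python
import math


def effect_variance_factor(cells):
    """Var(main effect) divided by sigma^2, for an effect estimated from cell means.

    cells = (n_LL, n_LH, n_HL, n_HH): counts for (Intro, Goto) = Low/Low,
    Low/High, High/Low, High/High. The Intro main effect is the average of the
    two High-Intro cell means minus the average of the two Low-Intro cell means,
    so its variance is sigma^2 / 4 * sum(1 / n) over the four cells.
    """
    if any(n == 0 for n in cells):
        return math.inf  # an empty cell makes the estimate impossible
    return sum(1 / n for n in cells) / 4


allocations = {
    "balanced": (25_000, 25_000, 25_000, 25_000),
    "49,500 / 500": (49_500, 500, 500, 49_500),
    "49,999 / 1": (49_999, 1, 1, 49_999),
    "50,000 / 0": (50_000, 0, 0, 50_000),
}

for label, cells in allocations.items():
    print(f"{label:>14}: variance factor = {effect_variance_factor(cells):.6f}")
```

Every allocation has the same marginal totals, but only the balanced one keeps the variance small.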

Efficiency Is Still a Balancing Act

Balancing the sample size over all of the treatments seems like a reasonable goal: 25,000 units in each of the four cells, 100,000 in total.

Randomization

After the treatment structure is defined, the next step is to randomly assign treatments to experimental units. A typical approach to randomizing 100,000 customers to four treatments includes the following steps:
1. Define the population of interest.
2. Select a simple random sample from the population equal to the total sample size (for example, 100,000).
3. Randomly partition the sample into four equal groups (for example, 25,000 each).
4. Assign each group to one of the four treatments.
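The steps above can be sketched in a few lines. Python's `random.sample` draws a simple random sample without replacement and returns it in random order, so consecutive slices form a random partition. The population of customer IDs, the treatment labels, and the `randomize` helper are hypothetical.

```python
import random


def randomize(population, total_size, treatments, seed=None):
    """Simple random sample, then partition it into equal treatment groups."""
    if total_size % len(treatments):
        raise ValueError("total_size must divide evenly among treatments")
    rng = random.Random(seed)
    # Step 2: simple random sample without replacement (already in random order)
    sample = rng.sample(population, total_size)
    group_size = total_size // len(treatments)
    # Steps 3 and 4: consecutive slices become the treatment groups
    return {
        t: sample[i * group_size:(i + 1) * group_size]
        for i, t in enumerate(treatments)
    }


# Step 1 (hypothetical): a population of 1,000,000 customer IDs
population = range(1_000_000)
treatments = ["Intro Low/Goto Low", "Intro Low/Goto High",
              "Intro High/Goto Low", "Intro High/Goto High"]
groups = randomize(population, 100_000, treatments, seed=42)

for t, ids in groups.items():
    print(t, len(ids))
```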

Analyzing a 2-by-2 Factorial Experiment with Interaction

Credit Card Case Study

Task: Use SAS Enterprise Guide to graph, analyze, and interpret the results of the two-factor experiment that tests two levels each of intro rate and goto rate.

Analyzing a 2-by-2 Factorial Experiment with No Interaction

Credit Card Case Study

Task: Use SAS Enterprise Guide to graph, analyze, and interpret the results of the two-factor experiment that tests two levels each of intro rate and goto rate when no interaction is present.

Idea Exchange

Consider the previous experiment.
• What attributes of the customer might affect an individual’s likelihood to respond to an offer?
• How could you use your knowledge of these attributes to improve the study’s design and treatment structure?
• How could you use your knowledge of the attributes to improve the analysis of the experimental data?

Exercise

This exercise reinforces the concepts discussed previously.

4.4 Orthogonality and Blocking

Objectives
• Explain the concept of orthogonality and why it is important.
• Explain the concept of blocking and why it is useful.
• Analyze and interpret a multifactor experiment with blocks.

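Orthogonality

Another ideal property of an experimental design is orthogonality among the elements of interest. There are at least three ways to think about the importance of this property:
• Algebraic interpretation: matrices behave well (the columns of the design matrix are mutually orthogonal).
• Geometric interpretation: pictures look nice (the effect directions are at right angles to one another).
• Statistical interpretation: estimates have low variance (the effect estimates are uncorrelated with one another).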

Two-Level Full Factorial Coding

I    A    B    AB
+1   +1   +1   +1
+1   +1   -1   -1
+1   -1   +1   -1
+1   -1   -1   +1

The effect of factor A is estimated from column A, the effect of factor B from column B, and the interaction effect AB from column AB, which is the row-by-row product of columns A and B.
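The orthogonality of this coding can be checked directly: the dot product of any two different columns is zero. The sketch below copies the coding table, verifies this, and then estimates each effect as the average response at +1 minus the average response at -1. The four mean responses (one per row) are invented for illustration.

```python
from itertools import combinations

# Coding table from above: each list is a column, each position a treatment row
design = {
    "I":  [+1, +1, +1, +1],
    "A":  [+1, +1, -1, -1],
    "B":  [+1, -1, +1, -1],
    "AB": [+1, -1, -1, +1],
}

# Orthogonality: every pair of distinct columns has a dot product of 0
for (name1, col1), (name2, col2) in combinations(design.items(), 2):
    dot = sum(x * y for x, y in zip(col1, col2))
    print(f"{name1} . {name2} = {dot}")

# Hypothetical mean responses for the four rows
y = [0.011, 0.015, 0.012, 0.020]

# Effect = mean at +1 minus mean at -1 (two rows on each side, hence / 2)
effects = {
    name: sum(c * v for c, v in zip(col, y)) / 2
    for name, col in design.items() if name != "I"
}
print(effects)
```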

Factorial Arrangement versus OFAT: Factorial Treatment Structure

Pros
+ Reuses observations (more power for fewer experimental units)
+ Tests for interactions
+ Guarantees balanced and orthogonal treatment plans
+ Is an efficient way to test many factors

Cons
- Can be more complicated to set up
- Can be more complicated to sell to a non-technical audience

Factorial Arrangement versus OFAT: One-Factor-at-a-Time Tests

Pros
+ Are easy to set up (A/B and Champion/Challenger tests are typical in many industries)
+ Might yield lower per-unit printing costs
+ Have a clear “control” offer and clear test offers
+ Do not require users to learn new words such as “balance” and “orthogonality”!

Cons
+/- Permit simple analysis that could be done with pencil and paper!
- Do not allow a test for interactions
- Represent an inefficient use of experimental units

Blocking

It is typical to use the same test statistic for H0: p(men) = p(women) as for H0: p(red envelope) = p(blue envelope).

Are these factors equivalent from the perspective of experimental design?

Blocking

You can control features of the offer that you make:
• Creative
• Color
• Pricing
• Duration of offer

Any restrictions are typically self-imposed. These are usually factors in the test, not blocks.

You cannot control features of your experimental units:
• Risk profile
• Responsiveness
• Geography
• Age
• Gender

Restrictions here are typically features of the population of interest, and these features are often treated as blocks.

Blocking

Blocks are groups of experimental units that are homogeneous in some way. Typically, they represent nuisance variability.

Blocks might or might not be randomly selected.

Because units exist in blocks, rather than being assigned to them, blocks reflect a restriction on the randomization in an experiment.
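Because blocks restrict the randomization, one simple approach is to randomize separately within each block. The sketch below is one such scheme (shuffle the units in each block, then deal them out to the treatments in turn), not a procedure prescribed by the case study. The customers, their risk tiers, and the treatment labels are hypothetical.

```python
import random
from collections import Counter, defaultdict


def block_randomize(units, block_of, treatments, seed=None):
    """Randomly assign treatments separately within each block.

    Units are grouped by block, shuffled, and dealt to the treatments in turn,
    so each block is split across the treatments as evenly as possible.
    """
    rng = random.Random(seed)
    by_block = defaultdict(list)
    for unit in units:
        by_block[block_of(unit)].append(unit)
    assignment = {}
    for members in by_block.values():
        rng.shuffle(members)
        for i, unit in enumerate(members):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment


# Hypothetical customers as (id, risk tier); the risk tier is the block
tier_rng = random.Random(7)
customers = [(cid, tier_rng.choice(["low", "medium", "high"])) for cid in range(1_200)]
treatments = ["A", "B", "C", "D"]
assignment = block_randomize(customers, lambda c: c[1], treatments, seed=1)

counts = Counter((c[1], assignment[c]) for c in customers)
for (tier, t), n in sorted(counts.items()):
    print(tier, t, n)
```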

Analyzing an Experiment with Blocks

Credit Card Case Study

Task: Incorporate a continuous measure, such as a risk score, into an experiment as a block or factor.
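One common way to turn a continuous measure into blocks is to cut it at its quantiles, so that each block holds about the same number of customers. The sketch below assumes tertiles (three blocks) and uses invented risk scores; the number of blocks is a design choice, not something the case study specifies.

```python
import bisect
import random
import statistics


def score_to_block(scores, n_blocks=3):
    """Assign each score to a quantile-based block: 0 = lowest scores."""
    cut_points = statistics.quantiles(scores, n=n_blocks)  # n_blocks - 1 cut points
    return [bisect.bisect_right(cut_points, s) for s in scores]


# Hypothetical risk scores for 300 customers (made-up range, for illustration)
rng = random.Random(3)
scores = [rng.uniform(300, 850) for _ in range(300)]
blocks = score_to_block(scores)

for b in sorted(set(blocks)):
    print(f"block {b}: {blocks.count(b)} customers")
```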

Idea Exchange

Consider the kinds of variables that you have no control over. These variables might be important for some types of product offers but not others. What types of product offers might have different response rates based on the following characteristics?
• Risk profile
• Geographic regions such as north, south, east, and west
• Age
• Gender
• Urban, suburban, rural

Can you think of others?

Statistically Well-Formulated Model

A well-formulated model maintains the hierarchy of the terms in the model as model reduction is performed: an interaction stays in the model only while all of the lower-order terms that it contains also stay. Terms are removed one at a time, and the model is refit before any more terms are removed.

Two factors:
Intercept
A   B
A*B

Three factors:
Intercept
A   B   C
A*B   A*C   B*C
A*B*C
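The hierarchy rule can be checked mechanically: a set of terms is well formulated when, for every interaction it contains, all lower-order terms built from the same factors are also present. A small sketch follows, assuming terms are written with `*` between factor names as on these slides; the function name is illustrative.

```python
from itertools import combinations


def is_well_formulated(terms):
    """True if every lower-order term contained in each interaction is present.

    Terms look like "A", "B", "A*B", "A*B*C"; the intercept is implied.
    """
    present = {frozenset(t.split("*")) for t in terms}
    for term in present:
        for size in range(1, len(term)):
            for sub in combinations(sorted(term), size):
                if frozenset(sub) not in present:
                    return False
    return True


print(is_well_formulated(["A", "B", "A*B"]))        # hierarchical
print(is_well_formulated(["A", "A*B"]))             # B is missing
print(is_well_formulated(["A", "B", "C", "A*B", "A*C", "B*C", "A*B*C"]))
```

During model reduction, you would run a check like this before each refit to confirm that the term you plan to drop is not contained in an interaction that remains.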


Exercise

This exercise reinforces the concepts discussed previously.

4.5 Business Experiments with Continuous Responses

Objectives
• Name several continuous response variables that you might encounter in business experiments.
• Describe issues related to analyzing business experiments with continuous responses.

The Response Variable

In many business applications, the key target variables of interest are binary and can be expressed as a proportion:
• Did the customer purchase a product? (What proportion of customers purchased?)
• Did the product fail? (What proportion of products failed?)
• Did the customer churn? (What proportion of customers churned?)
• Was a purchase fraudulent? (What proportion of purchases are fraudulent?)
• Was there a claim on the policy? (What proportion of policies have claims?)

The Response Variable

It is also common to find continuous responses in business models:
• Revenue per store
• Number of new customers following an advertising campaign
• Customer value per mailing
• Time until churn
• Wait time on hold in a call center
• Expected lifetime for a manufactured product
• Average profit per SKU

Where Traditional Statistics Meet the Road

Ordinary least squares (OLS) regression and ANOVA models (linear models) are designed to handle continuous responses.

However, not all continuous responses are suitable for OLS models.
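To see why some continuous responses strain OLS assumptions, consider a right-skewed measure such as wait time on hold. The sketch below simulates hypothetical hold times from an exponential distribution (an assumption chosen only for illustration). The mean is pulled well above the median and the skewness is strongly positive, unlike the roughly normal, constant-variance errors that OLS-based tests assume.

```python
import random
import statistics

random.seed(11)  # fixed seed so the illustration is reproducible

# Hypothetical call-center hold times in minutes: exponential with mean 4
hold_times = [random.expovariate(1 / 4) for _ in range(10_000)]

mean = statistics.fmean(hold_times)
median = statistics.median(hold_times)
sd = statistics.pstdev(hold_times)
# Sample skewness: average cubed standardized deviation (0 for symmetric data)
skewness = statistics.fmean(((x - mean) / sd) ** 3 for x in hold_times)

print(f"mean={mean:.2f}  median={median:.2f}  skewness={skewness:.2f}")
```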

The Distribution

[Figure: distributions of revenue per store, customer value per mailing, and wait time on hold in a call center]

Experimental Design and Response Type

Design and analysis go hand in hand. Design the experiment so that the analysis will be easy.

Fortunately, the design of the experiment does not depend on the type of response variable that the data generates.

The same experimental design can be used for evaluating response rate, customer dollar value, lift in revenue, and many other features, regardless of whether they are continuous or categorical.

How Do You Analyze These Response Variables?

Many statistical techniques are available for modeling continuous responses that suit neither logistic regression nor OLS.

Advances in computing power and technology make such techniques available for business applications through statistical software.

These techniques require an in-depth understanding of advanced and specialized statistical concepts and should be used under the direction of a skilled statistician.

Idea Exchange

How could you incorporate what you know about the cost and profit resulting from different settings (for example, the cost of postage or the higher profit from a higher APR) to help you design an experiment?

4.6 Recommended Reading

Recommended Reading

Ariely, Dan. “Why Businesses Don’t Experiment.” Harvard Business Review, April 2010. http://hbr.org/2010/04/column-why-businesses-dont-experiment/ar/1

Davenport, Thomas. “How to Design Smart Business Experiments.” Harvard Business Review, February 2009. http://hbr.org/2009/02/how-to-design-smart-business-experiments/ar/1

May, Thornton. The New Know: Innovation Powered by Analytics. New York: Wiley, 2010. Chapters 2 and 3.

Recommended