Observational versus experimental studies - web.pdx.edujong/S243LSW17/Lectures/ch07.pdfrandom sample of 1,535 national adults. Using random assignment, 719 heard the question in Form

Observational versus experimental studies

Observational study: Record data on individuals without attempting to influence the responses.

Experimental study: Deliberately impose a treatment on individuals and record their responses. Influential factors can be controlled.

In 1992, several major medical organizations said that women should take hormones such as estrogen after menopause, because women who took hormones seemed to reduce their risk of a heart attack by 35% to 50%.

By 2002, several studies concluded that hormone replacement does not reduce the risk of heart attacks. These studies had assigned women to either hormone replacement or to dummy pills. The assignment was done by a coin toss.

Access to medical care, lifestyle, and socioeconomic status are possible confounding variables.

Random assignment of subjects to treatments
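
Assignment by coin toss, as in the hormone replacement studies above, can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `assign_by_coin_toss`, the subject IDs, and the seed are assumptions made for the example, not details of the actual studies.

```python
import random


def assign_by_coin_toss(subjects, seed=None):
    """Assign each subject to 'hormone' or 'placebo' by an independent fair coin toss."""
    rng = random.Random(seed)
    groups = {"hormone": [], "placebo": []}
    for subject in subjects:
        # Chance alone decides the arm; the subject's own characteristics play no role.
        arm = "hormone" if rng.random() < 0.5 else "placebo"
        groups[arm].append(subject)
    return groups


# Hypothetical subject IDs, for illustration only.
subjects = [f"subject_{i:03d}" for i in range(1, 201)]
groups = assign_by_coin_toss(subjects, seed=2002)
print({arm: len(members) for arm, members in groups.items()})
```

Because each toss is independent, the two groups need not be exactly the same size, but every subject has the same chance of landing in either group.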

A 2013 Gallup study investigated how phrasing affects the opinions of Americans regarding physician-assisted suicide. Telephone interviews were conducted with a random sample of 1,535 national adults. Using random assignment, 719 heard the question in Form A and 816 the one in Form B.

Form A: When a person has a disease that cannot be cured, do you think doctors should be allowed by law to end the patient’s life by some painless means if the patient and his or her family request it?

Form B: When a person has a disease that cannot be cured and is living in severe pain, do you think doctors should or should not be allowed by law to assist the patient to commit suicide if the patient requests it?

70% of those given Form A answered “should be allowed”, compared with only 51% of those given Form B. What type of study is this?

A. Observational study.

B. Randomized experiment.

C. Neither. This is just anecdotal evidence.

The individuals surveyed did not choose which question they heard. This was done by Gallup, using random assignment. Therefore this is a comparative randomized experiment.

Confounding

Two variables are confounded when their effects on a response variable cannot be distinguished.

Observational studies often fail to yield clear causal conclusions, because the explanatory variable is confounded with lurking variables.
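
A small simulation may help make confounding concrete. The sketch below is a toy model, not data from any real study: every probability in it is an assumption chosen for illustration. Hormones have no effect at all in this model, yet the observational comparison shows a gap because access to care (the lurking variable) influences both who takes hormones and who has a heart attack. Random assignment removes the gap.

```python
import random


def heart_attack_rates(n, randomized, seed=0):
    """Return heart-attack rates (hormone takers, non-takers) in a toy population."""
    rng = random.Random(seed)
    counts = {True: [0, 0], False: [0, 0]}  # takes_hormones -> [heart attacks, women]
    for _ in range(n):
        good_access = rng.random() < 0.5  # lurking variable (assumed 50/50)
        if randomized:
            takes_hormones = rng.random() < 0.5  # coin toss, ignores access
        else:
            # Assumed: women with good access to care choose hormones more often.
            takes_hormones = rng.random() < (0.8 if good_access else 0.2)
        # Assumed: risk depends on access only; hormones have no effect here.
        heart_attack = rng.random() < (0.05 if good_access else 0.10)
        counts[takes_hormones][0] += heart_attack
        counts[takes_hormones][1] += 1
    return tuple(counts[t][0] / counts[t][1] for t in (True, False))


observational = heart_attack_rates(200_000, randomized=False, seed=1)
experimental = heart_attack_rates(200_000, randomized=True, seed=1)
print("observational (takers, non-takers):", observational)
print("randomized    (takers, non-takers):", experimental)
```

Under these assumptions the observational rates come out near 6% for takers and 9% for non-takers, while the randomized rates are both near 7.5%. The apparent benefit in the observational comparison is entirely due to the lurking variable.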

CONFOUNDING?

Experiments provide an opportunity for manipulating the environment and confounding variables.

An observational study. Higher education and higher access to health care are confounded.

Population versus sample

Sample: The part of the population we actually examine and for which we do have data.

A statistic is a number summarizing a characteristic of a sample.

Population: The entire group of individuals in which we are interested but can’t usually assess directly.

A parameter is a number summarizing a characteristic of the population.

Parameters are often referred to by a Greek letter, but statistics are typically labeled using English letters.
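
The distinction between a parameter and a statistic can be illustrated with a made-up population. In the sketch below, the population of heights, its size, and the sample size of 50 are all assumptions for the example; `mu` plays the role of the parameter and `x_bar` the role of the statistic.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population: heights (cm) of 10,000 individuals.
population = [rng.gauss(170, 10) for _ in range(10_000)]
mu = statistics.mean(population)  # parameter: describes the whole population

# In practice we only see a sample, here a random sample of 50 individuals.
sample = rng.sample(population, 50)
x_bar = statistics.mean(sample)  # statistic: describes the sample

print(f"parameter mu    = {mu:.2f}")
print(f"statistic x_bar = {x_bar:.2f}")
```

The statistic changes from sample to sample, whereas the parameter is a fixed (and usually unknown) number.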

The role of randomness in sampling

How do you select the individuals/units in a sample?

Voluntary response sampling: individuals choose to be involved (biased)

Convenience sampling: ask whoever is around (mall, street) or take the next 10 units (biased)

Probability sampling: individuals or units are randomly selected; the sampling process is unbiased
Ann Landers summarizing responses of readers: 70% of the ~10,000 parents who wrote in said that having kids was not worth it; if they had to do it over again, they wouldn’t.

But a random sample showed that 91% of parents WOULD have kids again.

What do you think explains such drastically different responses?

Would you expect very different responses on the potential legalizing of marijuana if you asked the first people you saw on the parking lot of a university or the first people you saw on the parking lot of a church?

Most letters to newspapers are written by disgruntled people. They are not representative of the population of interest (all parents).

People you find in one location may be very different from the people you would find at a different location.
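
The Ann Landers result can be reproduced with a short calculation. The sketch below takes the 9% of parents who would not have kids again from the random sample above; the two write-in probabilities are pure assumptions, picked only to show how a modest difference in willingness to write can turn a 9% minority into a roughly 70% majority of the letters.

```python
def share_of_no_among_letters(p_no, write_if_no, write_if_yes):
    """Fraction of letters that say 'not worth it', given who bothers to write."""
    no_letters = p_no * write_if_no
    yes_letters = (1 - p_no) * write_if_yes
    return no_letters / (no_letters + yes_letters)


p_no = 0.09  # from the random sample: 91% would have kids again

# Assumed: unhappy parents are 24 times more likely to write in (2.4% vs 0.1%).
share = share_of_no_among_letters(p_no, write_if_no=0.024, write_if_yes=0.001)
print(f"share of 'not worth it' letters: {share:.2f}")

# If everyone were equally likely to write, the letters would mirror the population.
print(share_of_no_among_letters(p_no, write_if_no=0.01, write_if_yes=0.01))
```

With these assumed probabilities the share of negative letters is about 0.70, even though only 9% of parents hold that view.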

The simple random sample

A Simple Random Sample (SRS) is made of randomly selected individuals. Each individual in the population has the same probability of being in the sample. All possible samples of size n have the same chance of being drawn.

How to choose an SRS?

Draw from a hat (lottery style)

Flip a coin

Use a table of published random numbers (Table A)

Use software that generates random numbers
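
As an illustration of the software option, Python's standard `random.sample` draws without replacement so that every subset of the requested size is equally likely. The helper name `simple_random_sample`, the population of 20 numbered units, and the seed are assumptions for this sketch.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw an SRS of size n: every subset of n units is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)  # sampling without replacement


# Hypothetical population of 20 units labeled 1 to 20.
units = list(range(1, 21))
srs = simple_random_sample(units, 5, seed=103)
print(sorted(srs))
```

Fixing the seed makes the draw reproducible; omit it to get a fresh sample each time.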

Choosing a simple random sample with Table A

We need to select a random sample of 5 from a class of 20 students.

1) List and number all members of the population, which is the class of 20.

2) The number 20 is two digits long.

3) Parse the list of random digits into numbers that are two digits long. Here we chose to start with line 103, for no particular reason.

45 46 71 17 09 77 55 80 00 95 32 86 32 94 85 82 22 69 00 56

01 Alison

02 Amy

03 Brigitte

04 Darwin

05 Emily

06 Fernando

07 George

08 Harry

09 Henry

10 John

11 Kate

12 Max

13 Moe

14 Nancy

15 Ned

16 Paul

17 Ramon

18 Rupert

19 Tom

20 Victoria

• Remember that 1 is 01, 2 is 02, etc.

• If you were to hit 17 again before getting five people, don’t sample Ramon twice; just keep going.

4) Choose a random sample of size 5 by reading through the list of two-digit numbers, starting with line 103 and continuing on.

5) The first five random numbers matching numbers assigned to people make the SRS.

45 46 71 17 09 77 55 80 00 95 32 86 32 94 85 82 22 69 00 56

52 71 13 88 89 93 07 46 02 …

The first individual selected is Ramon, number 17. Then Henry (09). That’s all we can get from line 103.

We then move on to line 104. The next three to be selected are Moe, George, and Amy (13, 07, and 02).
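
The Table A procedure above can be traced in code. The sketch below uses only the digits shown for lines 103 and 104 and the class roster as listed; the function name `sample_from_digits` is an assumption for the example.

```python
LINE_103 = "45 46 71 17 09 77 55 80 00 95 32 86 32 94 85 82 22 69 00 56"
LINE_104 = "52 71 13 88 89 93 07 46 02"  # only the digits shown above

ROSTER = [
    "Alison", "Amy", "Brigitte", "Darwin", "Emily", "Fernando", "George",
    "Harry", "Henry", "John", "Kate", "Max", "Moe", "Nancy", "Ned", "Paul",
    "Ramon", "Rupert", "Tom", "Victoria",
]


def sample_from_digits(digit_lines, roster, n):
    """Read two-digit numbers in order; keep the first n that match unused labels."""
    labels = {i + 1: name for i, name in enumerate(roster)}  # 01 -> Alison, ...
    chosen = []
    for line in digit_lines:
        for token in line.split():
            number = int(token)
            # Skip numbers with no matching label (00, 21-99) and repeats.
            if number in labels and labels[number] not in chosen:
                chosen.append(labels[number])
                if len(chosen) == n:
                    return chosen
    raise ValueError("ran out of random digits before the sample was complete")


selected = sample_from_digits([LINE_103, LINE_104], ROSTER, 5)
print(selected)  # → ['Ramon', 'Henry', 'Moe', 'George', 'Amy']
```

The result matches the hand-worked selection: 17 and 09 from line 103, then 13, 07, and 02 from line 104.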

Other probability samples

A stratified random sample: make sure your sample has x, y, z% of individuals of certain types.

A multistage sample: select your final sample in stages, by sampling within a sample within a sample.

The National Youth Tobacco Survey administered in schools uses a sampling procedure to generate a nationally representative sample of students in grades 6–12. Sampling is probabilistic and consists of selecting:

1) Counties as Primary Sampling Units (PSU).

2) Schools within each selected PSU.

3) Classes within each selected school.

America's State of Mind report was based on a probability sample of Medco's de-identified database of members with 24 months of continuous insurance enrollment. Sampling was stratified by age group and sex to match the demographics of the whole customer base.

Economical but statistical analysis is more complex than for an SRS.
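
Stratification by age group and sex, as in the America's State of Mind example, might look roughly like the sketch below. The member records, stratum sizes, and the function name `stratified_sample` are hypothetical; the sketch uses proportional allocation, which is one common choice and not necessarily what the report's authors did.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(units, stratum_of, n, seed=None):
    """Draw an SRS within each stratum, sized in proportion to the stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        # Proportional allocation; with awkward sizes, rounding may not sum to n.
        k = round(n * len(members) / len(units))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical member database: (id, age_group, sex), with made-up stratum sizes.
sizes = {("18-44", "F"): 300, ("18-44", "M"): 200, ("45+", "F"): 300, ("45+", "M"): 200}
members = [
    (f"{age}-{sex}-{i}", age, sex)
    for (age, sex), size in sizes.items()
    for i in range(size)
]

sample = stratified_sample(members, lambda m: (m[1], m[2]), n=100, seed=24)
print(Counter((age, sex) for _, age, sex in sample))
```

Each stratum contributes to the sample in the same proportion as in the full database, so the sample matches the demographics by construction rather than by luck.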

Sample surveys

A sample survey is an observational study that relies on a random sample drawn from the entire population.

Opinion polls are sample surveys that typically use voter registries or telephone numbers to select their samples.

In epidemiology, sample surveys are used to establish the incidence (rate of new cases per year) and the prevalence (rate of all cases at one point in time) of various medical conditions, diseases, and lifestyles. These are typically stratified or multistage samples.
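
The difference between incidence and prevalence is easy to see with numbers. The counts below are invented for illustration, and the function names are assumptions for this sketch.

```python
def incidence_rate(new_cases, population_at_risk, years=1.0):
    """New cases per person per year."""
    return new_cases / (population_at_risk * years)


def prevalence(existing_cases, population):
    """Share of the population with the condition at one point in time."""
    return existing_cases / population


# Hypothetical survey results, for illustration only.
inc = incidence_rate(new_cases=150, population_at_risk=50_000, years=1)
prev = prevalence(existing_cases=1_200, population=60_000)

print(f"incidence:  {inc * 1000:.1f} new cases per 1,000 people per year")
print(f"prevalence: {prev * 100:.1f}% of the population")
```

A long-lasting condition can have a low incidence but a high prevalence, because existing cases accumulate over time.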

Some survey challenges

Undercoverage: Parts of the population are systematically left out.

Nonresponse: Some people choose not to answer/participate.

Wording effects: Biased or leading questions and complicated or confusing statements can influence survey results.

Response bias: Fancy term for lying or forgetting (especially on sensitive/personal issues). Can be exacerbated by survey method (in person vs. by phone or online).

Surveys of households omit homeless individuals. Surveys conducted online tend to include a younger and more urban population.

Remedy?

Stratified sampling

1995-2002

How bad is nonresponse?

The Census Bureau’s American Community Survey (ACS): ~ 2.5%. Via mail with reminders. Response is mandatory.

University of Chicago’s General Social Survey (GSS): ~ 30%. In person.

Pew Research Center methodology survey: up to ~ 90% in 2012.

Private polling firms such as SurveyUSA: ~ 90% as of 2002 (stopped showing after that). Phone (with interviewer or automated call) or online.

A 2013 Gallup study investigated how phrasing affects the opinions of Americans regarding physician-assisted suicide. Telephone interviews were conducted with a random sample of 1,535 national adults. Using random assignment, 719 heard the question in Form A and 816 the one in Form B.

Form A: When a person has a disease that cannot be cured, do you think doctors should be allowed by law to end the patient’s life by some painless means if the patient and his or her family request it?

Form B: When a person has a disease that cannot be cured and is living in severe pain, do you think doctors should or should not be allowed by law to assist the patient to commit suicide if the patient requests it?

Question wording resulted in a substantial difference in opinions: 70% of those given Form A answered “should be allowed”, compared with only 51% of those given Form B.

Some examples of possible response bias

Comparative observational studies

Case-control studies start with 2 random samples of individuals with different outcomes, and look for exposure factors in the subjects’ past (“retrospective”).

Individuals with the condition are cases, and those without are controls.

Good for studying rare conditions. Selecting controls is challenging.

Cohort studies enlist individuals of a common demographic, and keep track of them over a long period of time (“prospective”). Individuals who later develop a condition are compared with those who don’t.

Cohort studies examine the compounded effect of factors over time.

Good for studying common conditions. Very expensive.

Aflatoxicosis epidemics

Aflatoxins are secreted by a fungus found in damaged crops and can cause severe poisoning and death.

The Kenya Ministry of Health investigated a 2004 outbreak of aflatoxicosis resulting in over 300 cases of liver failure. A sample of 40 case-patients and 80 healthy controls were asked how they had stored and prepared their maize.

The case-patients were randomly selected from a list of individuals admitted to a hospital during the 2004 outbreak for unexplained acute jaundice.

Control individuals were selected to be as similar to the case-patients as possible, yet randomly selected.

Preliminary data suggested that soil, microclimate, and farming practices might have played a role, but not age or gender.

For each case-patient, two individuals from the patient’s village with no history of jaundice symptoms were randomly selected.

The study findings indicate that improper corn storage was a principal source of poisoning.

The Nurses’ Health Study is one of the largest prospective observational studies designed to examine factors that may affect major chronic diseases in women.

Since 1976, the study has followed a cohort of over 100,000 registered nurses. Every two years, they receive a follow-up questionnaire about diseases and health-related topics. Response rate: ~ 90% each time.

2007 report on age-related memory loss:

About 20,000 women ages 70+ had completed telephone interviews every two years to assess their memory with a set of cognitive tests. One of the findings: the more women walked during their late 50s and 60s, the better their memory score was at age 70 and older.

However, we cannot unambiguously conclude that walking has a protective effect against memory loss.
