Chapter 3: Producing Data

Types of data collected

• Anecdotal data – data collected haphazardly (not representative!!)

• Available data – data that already exist (examples: the internet, the library, the Census Bureau, ...)

• Gathering your own data – takes money and time


Some terminology

• Population – the entire group of individuals or objects of interest (answers the question: Who?)

• Sample – the subset of the population on which information is obtained.

• Census – a "sample" that consists of the entire population.

• Variable – a characteristic of interest recorded for each individual.


Observational study vs. Experiment

• Observational study – a study that observes individuals and measures variables of interest but does not attempt to influence the response.

• Experiment – a study that deliberately imposes some treatment on individuals in order to observe their response.


Types of variables

• Response variable – the outcome measured in the study.

• Explanatory variable – variable(s) that attempt to explain changes in the response.

Examples: smoking (explanatory) and lung cancer (response)

Running on a treadmill (explanatory) and heart rate (response)


Classroom Examples

One study of cell phones and the risk of brain cancer looked at a group of 469 people who have brain cancer. The investigators matched each cancer patient with a person of the same sex, age, and race who did not have brain cancer, then asked about use of cell phones. Result: "Our data suggest that use of handheld cellular telephones is not associated with the risk of brain cancer." Is this an observational study or an experiment? Why? What are the explanatory and response variables?

A typical hour of prime-time television shows 3 to 5 violent acts. Linking family interviews and police records shows a clear association between time spent watching TV as a child and later aggressive behavior. Is this an observational study or an experiment? What are the explanatory and response variables? Suggest some lurking variables that could explain the aggressive behavior.

An educational software company wants to compare the effectiveness of its computer animation for teaching cell biology with that of a textbook presentation. The company tests the biological knowledge of each of a group of first-year college students, then randomly divides them into two groups. One group uses the animation, and the other studies the text. The company retests all the students and compares the increase in understanding of cell biology in the two groups. Is this an observational study or an experiment? What are the explanatory and response variables?


3.1 Design of Experiments

• Experimental units – the individuals on which the experiment is done.

• Treatment – a specific experimental condition applied to the units.

• Factors – the explanatory variables in an experiment.

• Placebo – a dummy (fake) treatment used to control for psychological effects.

Example: Gastric freezing is a clever treatment for ulcers in the upper intestine. The patient swallows a deflated balloon with tubes attached, then a refrigerated liquid is pumped through the balloon for an hour (the idea is that cooling reduces acid production and relieves the ulcers). An experiment reported in the Journal of the American Medical Association showed that gastric freezing did reduce acid production and relieve ulcer pain. A later experiment included a control group that received a placebo: 34% of the treatment group improved, compared with 38% of the placebo group, so gastric freezing did no better than the placebo.

• Joint effects – the effect of a combination of levels of two or more factors. Example: A maker of fabric for clothing is setting up a new line to finish the raw fabric. The line will use either metal rollers or natural-bristle rollers to raise the surface of the fabric, a dyeing cycle time of either 30 or 40 minutes, and a temperature of either 150 or 175 degrees Celsius. Four specimens of fabric will be subjected to each treatment and scored for quality. What are the factors and the treatments? How many units (fabric specimens) does the experiment require?
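One way to check the counting in the fabric example is to list every combination of factor levels, since each combination is one treatment. This is a minimal R sketch; the column names `roller`, `cycle_min`, and `temp_C` are hypothetical labels chosen for illustration.

```r
# Each row is one treatment: one level of each of the three factors.
# Column names are hypothetical labels for the factors in the example.
treatments <- expand.grid(
  roller    = c("metal", "natural-bristle"),
  cycle_min = c(30, 40),    # dyeing cycle time, minutes
  temp_C    = c(150, 175)   # temperature, degrees Celsius
)
print(treatments)

n_treatments <- nrow(treatments)                   # 2 x 2 x 2 = 8 treatments
specimens_per_treatment <- 4
n_units <- n_treatments * specimens_per_treatment  # 8 x 4 = 32 specimens
cat("Treatments:", n_treatments, "\nFabric specimens needed:", n_units, "\n")
```

The number of treatments is the product of the numbers of levels of the factors, and the number of units is that product times the number of specimens per treatment.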


Experiments continued

• Experiments provide good evidence for causation, because the experimenter can control lurking variables.

• Confounded variables – variable(s) associated with the response but not of interest in the study, whose effects on the response cannot be separated from the effect of the explanatory variable.

• Bias – a design is biased if it systematically favors certain outcomes.


Experiments continued

• Randomization is very important in experiments: it helps ensure that the groups are as similar as possible before the treatments are applied.

• The three principles of experimental design are
– Control
– Randomize
– Repeat

• How can we randomize? Draw names out of a hat, use a table of random digits, use computer software (or a calculator), or, for telephone surveys, use random digit dialing.


Using R to randomize

• First, set the seed so the randomization can be reproduced (any whole number works; 2009 is arbitrary):
> set.seed(2009)

• Then draw the sample, where n is the number of units and size is how many to select:
> sample(1:n, size, replace = FALSE)

• Use the selected units as one group and the remaining units as the other to assign the class to two groups.
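The steps on this slide can be put together to split a class into two groups. This is a minimal sketch; the roster of 20 students is hypothetical, and the seed value is arbitrary.

```r
# Hypothetical class roster of 20 students (names are placeholders)
students <- paste("Student", 1:20)
n <- length(students)

set.seed(2009)  # arbitrary seed; fixing it makes the split reproducible

# Randomly choose 10 of the n positions, without replacement
picked <- sample(1:n, size = 10, replace = FALSE)

group1 <- students[picked]   # e.g. treatment group
group2 <- students[-picked]  # everyone not picked, e.g. control group
print(group1)
print(group2)
```

Selecting positions and then taking "the rest" for the second group guarantees that every student lands in exactly one group.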


Completely Randomized Design (with one treatment group and one control group)

Random assignment → Group 1: Treatment / Group 2: Control → Compare results


More on Experiments

• Single blind – the individual receiving the treatment does not know which treatment they are receiving.

• Double blind – neither the individual receiving the treatment nor the individual recording the outcome knows which treatment was administered.


Block design

• One way to control for confounding variables is to block on them.

• A block design first divides the experimental units into blocks according to the "blocking variable" (for example, if one is blocking on gender, first place the units into a female block and a male block). Units are then randomly assigned to treatments separately within each block.


Example of Block Design

• The progress of a type of cancer differs in women and men. A clinical experiment to compare three therapies for this cancer therefore treats sex as a blocking variable. Two separate randomizations are done, one assigning the female subjects to the treatments and the other assigning the male subjects. Draw this design.
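The two separate randomizations in this example can be sketched in R by shuffling treatment labels within each block. The subject counts (12 women, 9 men) and the therapy names are hypothetical, chosen only so each therapy can appear equally often within each block.

```r
set.seed(1)  # arbitrary seed for reproducibility
therapies <- c("Therapy A", "Therapy B", "Therapy C")  # placeholder names

# Hypothetical subjects: 12 women and 9 men
subjects <- data.frame(
  id  = 1:21,
  sex = c(rep("female", 12), rep("male", 9))
)

# One randomization per block: within each sex, shuffle a set of therapy
# labels containing each therapy equally often.
subjects$therapy <- NA_character_
for (s in unique(subjects$sex)) {
  rows   <- which(subjects$sex == s)
  labels <- rep(therapies, length.out = length(rows))  # equal counts
  subjects$therapy[rows] <- sample(labels)             # random permutation
}

# Every therapy appears equally often within each block
print(table(subjects$sex, subjects$therapy))
```

Because the shuffle happens inside each block, differences between women and men cannot be confused with differences between therapies.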


Matched Pairs Design

• A matched pairs design is a special type of block design.

• It can compare only two treatments (hence the "pairs").

• Each block usually consists of units that are as similar as possible (the same subject measured twice, twins, husband and wife).


Example of Matched Pairs Design

• Does talking on a hands-free cell phone distract drivers? Undergraduate students "drove" in a high-fidelity driving simulator equipped with a hands-free cell phone. Each student drove once while talking on the cell phone and once without talking on the cell phone. The order for each student was randomly assigned. The car ahead brakes: how quickly does the subject respond?
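Randomizing the order within each pair can be sketched in R as a coin flip per student. The number of students (8) and the condition labels are hypothetical; a coin flip does not force exactly half of the students to start with the phone, which is one possible design choice among several.

```r
set.seed(42)      # arbitrary seed
n_students <- 8   # hypothetical number of drivers
conditions <- c("phone", "no phone")

# For each student, randomly choose which condition comes first (coin flip)
first  <- sample(conditions, size = n_students, replace = TRUE)
second <- ifelse(first == "phone", "no phone", "phone")

order_plan <- data.frame(student = 1:n_students, first = first, second = second)
print(order_plan)
```

Each student still receives both treatments; only the order is random, so learning or fatigue effects are not tied to one condition.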


3.2 Sampling Design

• Voluntary response samples (call-in polls, comment cards) are very biased, because people with strong opinions are the most likely to respond: a bad sampling design.

• We want a probability sample: a sample chosen by chance. We will look at four types in this course.


Different Types of Probability Samples

• SRS (simple random sample) – every set of n individuals has the same chance of being the sample selected.

• Stratified random sample – first divide the population into groups (strata), then take an SRS from each stratum.

• Cluster sample – first divide the population into clusters, then take an SRS of clusters (once a cluster is chosen, every unit in that cluster is in the sample).
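The three designs above can be compared side by side in R. This is a minimal sketch on a hypothetical population of 100 units; the stratum labels A to D, the 10 clusters, and the sample sizes are all made up for illustration.

```r
set.seed(7)  # arbitrary seed
# Hypothetical population: 100 units, 4 strata of 25, 10 clusters of 10
pop <- data.frame(
  id      = 1:100,
  stratum = rep(c("A", "B", "C", "D"), each = 25),
  cluster = rep(1:10, times = 10)
)

# SRS: every set of 12 units is equally likely to be chosen
srs <- pop[sample(nrow(pop), size = 12), ]

# Stratified random sample: an SRS of 3 units within each stratum
strat <- do.call(rbind, lapply(split(pop, pop$stratum), function(s) {
  s[sample(nrow(s), size = 3), ]
}))

# Cluster sample: an SRS of 2 clusters, keeping every unit in them
chosen <- sample(unique(pop$cluster), size = 2)
clus <- pop[pop$cluster %in% chosen, ]

print(table(strat$stratum))
print(sort(unique(clus$cluster)))
```

The stratified sample always contains exactly 3 units from each stratum, while the cluster sample contains every unit from the clusters that happen to be drawn.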


Probability samples continued

• Multistage sampling design – the sample is chosen in stages, and at each stage a probability sample is obtained.

• Problems with sample surveys
– Undercoverage
– Nonresponse
– Response bias
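The multistage design in the first bullet can be illustrated with a two-stage sample in R: a probability sample of groups, then a probability sample of individuals within each chosen group. The population of 20 schools with 30 students each, and the stage sizes, are hypothetical.

```r
set.seed(11)  # arbitrary seed
# Hypothetical two-level population: 20 schools, 30 students in each
pop <- data.frame(
  school  = rep(1:20, each = 30),
  student = 1:600
)

# Stage 1: SRS of 4 schools
schools <- sample(unique(pop$school), size = 4)

# Stage 2: SRS of 5 students within each chosen school
stage2 <- lapply(schools, function(k) {
  in_school <- pop[pop$school == k, ]
  in_school[sample(nrow(in_school), size = 5), ]
})
sample_ms <- do.call(rbind, stage2)
print(sample_ms)
```

Unlike a cluster sample, which keeps every unit in a chosen cluster, the second stage here samples only some students from each chosen school.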


Toward statistical inference

• Use information from a sample (known) to infer about the population (unknown).

• Statistic – a number computed from a sample.

• Parameter – a number that describes the population.

• Sampling variability – the value of a statistic will differ from one sample to the next.

• Sample statistics have a predictable pattern over repeated sampling (referred to as the sampling distribution).
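Sampling variability and the sampling distribution can be seen by simulation. This is a minimal R sketch assuming a hypothetical population of 10,000 people of whom 60% answer "yes" (so the parameter is p = 0.6); it draws 1,000 SRSs of size 100 and records the sample proportion from each.

```r
set.seed(2024)  # arbitrary seed
# Hypothetical population: 10,000 people, 60% answer "yes" (coded 1)
population <- c(rep(1, 6000), rep(0, 4000))
p <- mean(population)  # parameter

# 1,000 SRSs of size 100; each sample proportion is a statistic
p_hat <- replicate(1000, mean(sample(population, size = 100)))

# The statistic varies from sample to sample ...
print(head(p_hat))
# ... but its values follow a pattern centered near the parameter
print(c(parameter = p, mean_of_p_hat = mean(p_hat), sd_of_p_hat = sd(p_hat)))
```

Running `hist(p_hat)` afterward draws the simulated sampling distribution.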


Bias and variability

[Figure 3.14: Introduction to the Practice of Statistics, Sixth Edition, © 2009 W.H. Freeman and Company]


[Figure 3.15: Introduction to the Practice of Statistics, Sixth Edition, © 2009 W.H. Freeman and Company]


3.4 Continued

• We want statistics that are unbiased and have low variability.

• How can we eliminate, or at least reduce, bias? Use random sampling and good measuring instruments.

• How can we increase precision (reduce variability)? Use a larger sample.

• Population size does not affect precision, as long as the population is much larger than the sample!!! Sample size does.
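The claim about population size and precision can be checked by simulation. In this R sketch, the 60% "yes" rate, the population sizes, and the number of repetitions are hypothetical choices. The number of "yes" answers in an SRS of a given size follows a hypergeometric distribution, so `rhyper()` draws it directly without building a huge population vector.

```r
set.seed(99)  # arbitrary seed
# Standard deviation of the sample proportion p-hat over many SRSs of a
# given size from a population of size N in which a fraction p say "yes".
sd_phat <- function(N, size, p = 0.6, reps = 5000) {
  yes <- round(p * N)
  counts <- rhyper(reps, m = yes, n = N - yes, k = size)  # "yes" count per SRS
  sd(counts / size)
}

# Same sample size, very different population sizes: nearly the same spread
same_n <- sapply(c(1e4, 1e5, 1e6), sd_phat, size = 100)
print(same_n)

# Same population, larger sample: smaller spread (more precision)
bigger_n <- sapply(c(100, 400), function(k) sd_phat(N = 1e6, size = k))
print(bigger_n)
```

With samples of 100, the spread is essentially the same whether the population has ten thousand or a million members. Quadrupling the sample size roughly halves the spread.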


Statistical Significance

[Definition, p. 184: Introduction to the Practice of Statistics, Sixth Edition, © 2009 W.H. Freeman and Company]


Ethics

[Definition, p. 225: Introduction to the Practice of Statistics, Sixth Edition, © 2009 W.H. Freeman and Company]