Luc Wouters October 2013 Statistical Thinking and Smart Experimental Design

Slide 3: Statistical Thinking and Reasoning in Experimental Research
- 9h00-9h30: Welcome, coffee
- 11h00-11h15: Comments & questions, coffee break
- 12h30-13h15: Lunch
- 15h00-15h15: Comments & questions, coffee break
- 16h45-17h15: Summary, concluding remarks, comments & questions

Slide 4: Statistical Thinking and Reasoning in Experimental Research
- Introduction
- Smart Research Design by Statistical Thinking
- Planning the Experiment
- Principles of Experimental Design
- Common Designs in Biosciences
- The Required Number of Replicates - Sample Size
- The Statistical Analysis
- The Study Protocol
- Interpretation & Reporting
- Concluding Remarks

Slide 5: Statistical Thinking and Smart Experimental Design. I. Introduction

Slide 6: There is a problem with research in the biosciences
- Scholl et al. (Cell, 2009): Synthetic lethal interaction between oncogenic KRAS dependency and STK33 suppression in human cancer cells
- Conclusion: cancer tumors could be destroyed by targeting the STK33 protein
- Amgen Inc.: 24 researchers, 6 months of lab work
- Unable to replicate the results of Scholl et al.

Slide 7: There is a problem with research in the biosciences
- Séralini et al. (Food Chem Toxicol, 2012): Long term toxicity of a Roundup herbicide and a Roundup-tolerant genetically modified maize
- Conclusion: GM maize and low concentrations of Roundup herbicide cause toxic effects (tumors) in rats
- Severe impact on the general public and on the interests of industry
- Referendum on labeling of GM food in California; bans on importation of GMOs in Russia and Kenya
Slide 8: Séralini et al. The critiques
- European Food Safety Authority (2012) review: inadequate design, analysis and reporting
- Wrong strain of rats
- Number of animals too small

Slide 9: The problem of doing good science. Some critical reviews
- Ioannidis, 2005
- Kilkenny, 2009
- Loscalzo, 2012

Slide 10: Things get even worse. Retraction rates
- 21% of retractions were due to error
- 67% were due to misconduct, including fraud or suspected fraud

Slide 11: The integrity of our profession is put into question

Slide 12: What's wrong with research in the biosciences?
Slide 13: Course objectives
Increase the quality of research reports by:
- Statistical Thinking: to provide elements of a generic methodology to design insightful experiments
- Statistical Reasoning: to provide guidance for interpreting and reporting results

Slide 14: This course prepares you for a dialogue

Slide 15: Statistical thinking leads to highly productive research
- Set of statistical precepts (rules) accepted by his scientists
- Kept research proceeding in an orderly and planned fashion
- Open mind for the unexpected
- World record: 77 approved medicines over a period of 40 years
- 200 research papers/year

Slide 16: Statistical Thinking and Smart Experimental Design. II. Smart Research Design by Statistical Thinking

Slide 17: The architecture of experimental research. The two types of studies

Slide 18: The architecture of experimental research. Research is a phased process

Slide 19: The architecture of experimental research. Research is an iterative process

Slide 20: Modulating between the concrete and the abstract

Slide 21: Research styles may vary
- The novelist
- The data salvager
- The lab freak
- The smart researcher
- Phases: Definition, Design, Data Collection, Analysis, Reporting

Slide 22: The smart researcher
- Time spent planning and designing an experiment at the outset saves time and money in the long run
- Optimizes the lab work and reduces the time spent in the lab
- The design of the experiment is most important and governs how data are analyzed
- Minimizes time spent at the data analysis stage
- Can look ahead to the reporting phase with peace of mind, since the early phases of the study are carefully planned and formalized (proposal, protocol)

Slide 23: The design of insightful experiments is a process informed by statistical thinking

Slide 24: The Seven Principles of Statistical Thinking
1. Time spent thinking on the conceptualization and design of an experiment is time wisely spent
2. The design of an experiment reflects the contributions from different sources of variability
3. The design of an experiment balances between its internal validity and external validity
4. Good experimental practice provides the clue to bias minimization
5. Good experimental design is the clue to the control of variability
6. Experimental design integrates various disciplines
7. A priori consideration of statistical power is an indispensable pillar of an effective experiment

Slide 25: Statistical Thinking and Smart Experimental Design. III. Planning the Experiment

Slide 26: The planning process

Slide 27: The planning process

Slide 28: The planning process

Slide 29: Limit number of research objectives
Example Séralini study:
- 10 treatment groups, females and males = 20 groups
- Only 10 animals/group
- 50 animals/group is the gold standard in toxicology
Number of research objectives should be limited (≤ 3 objectives)

Slide 30: Importance of auxiliary hypotheses
Example Séralini study:
- Use of Sprague-Dawley breed of rats as model for human cancer
- Known to have higher mortality in 2-year studies
- Prone to develop spontaneous tumors
Reliance on auxiliary hypotheses is the rule rather than the exception in testing scientific hypotheses (Hempel, 1966)

Slide 31: Types of experiments

Slide 32: Abuse of exploratory experiments
- Most published research is exploratory
- Too often published as if confirmatory experiments
- Do not unequivocally answer the research question
- Use of the same data that generated research hypotheses to prove these hypotheses involves circular reasoning
- Inferential statistics should be interpreted and published as for exploratory purposes only
- The Séralini study was actually conceived as exploratory

Slide 33: Role of the pilot experiment

Slide 34: Types of experiments: objective, usage

Slide 35: Statistical Thinking and Smart Experimental Design. IV. Principles of Statistical Design

Slide 36: Some terminology
- Factor: the condition that we manipulate in the experiment (e.g. concentration of a drug, temperature)
- Factor level: the value of that condition (e.g. 10^-5 M, 40 °C)
- Treatment: combination of factor levels (e.g. 10^-5 M and 40 °C)
- Response or dependent variable: characteristic that is measured
- Experimental unit: the smallest physical entity to which a treatment is independently applied
- Observational unit: physical entity on which the response is measured
- See Appendix A, Glossary of Statistical Terms

Slide 37: The structure of the response
Assumptions:
- Additive model
- Treatment effects constant
- Treatment effects in one unit do not affect other units

Slide 38: Defining the experimental unit

Slide 39: The choice of the experimental unit
- Temme et al. (2001): comparison of two genetic strains of mice, wild-type and connexin32-deficient
- Diameters of bile canaliculi in livers
- Means ± SEM from 3 livers
- Statistically significant difference between two strains (P < 0.005 = 1/200)
- Experimental unit = cell
- Is this correct?

Slide 40: The choice of the experimental unit
- Experimental unit = animal
- Results not conclusive at all
- Wrong choice made out of ignorance, or out of convenience?
- This was published in a peer-reviewed journal!
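
The point of Slides 39 and 40 can be illustrated with a minimal Python sketch. The diameters below are invented for illustration (they are not the Temme et al. data), and the helper name `animal_means` is hypothetical. The sketch collapses cell-level measurements to one mean per animal, so that the sample size used in any later comparison is the number of animals, not the number of cells.

```python
from statistics import mean

# Hypothetical cell-level measurements: {strain: {animal_id: [diameters]}}.
measurements = {
    "wild-type": {"wt1": [1.0, 1.2, 1.1], "wt2": [0.9, 1.0, 1.1], "wt3": [1.3, 1.2, 1.1]},
    "deficient": {"ko1": [1.4, 1.5, 1.3], "ko2": [1.1, 1.2, 1.0], "ko3": [1.6, 1.5, 1.7]},
}

def animal_means(per_animal):
    """Collapse observational units (cells) to one value per experimental unit (animal)."""
    return [mean(values) for values in per_animal.values()]

# One summary value per animal; these are the data points for the strain comparison.
summary = {strain: animal_means(animals) for strain, animals in measurements.items()}

# Number of cells measured versus number of independent experimental units.
n_cells = {s: sum(len(v) for v in a.values()) for s, a in measurements.items()}
n_animals = {s: len(v) for s, v in summary.items()}

for strain in measurements:
    print(strain, "cells:", n_cells[strain], "animals (n for the test):", n_animals[strain])
```

Treating each cell as an independent replicate would inflate n from 3 to 9 per strain in this toy data set, which is the kind of overstatement the slides warn against.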

Slide 41: The choice of the experimental unit. Rivenson study on N-nitrosamines
- Rats housed 3/cage
- Treatment supplied in drinking water
- Impossible to treat any 2 rats differently
- Rats within a cage are not independent
- Rat is not the experimental unit
- Experimental unit = cage
Same remarks for the Séralini study:
- Rats housed 2/cage
- Rats in the same cage are not independent of one another
- Number of experimental units is not 10/treatment, but 5/treatment

Slide 42: The choice of the experimental unit. The reverse case
- Skin irritation test in volunteers
- 2 compounds
- Backs of volunteers divided in 8 test areas
- Each compound randomly assigned to 4 different test areas
- Experimental unit = test area, not volunteer

Slide 43: Why engineers don't care very much about statistics

Slide 44: The basic dilemma: balancing internal and external validity

Slide 45: Failure to minimize bias ruins an experiment
- Failure to minimize bias results in lost experiments
- Failure to control variability can sometimes be remedied after the fact
- Allocation to treatment is the most important source of bias

Slide 46: Basic strategies for maximizing Signal/Noise ratio

Slide 47
- A control is a standard treatment condition against which all others may be compared

Slide 48
- Single blinding: the condition under which investigators are uninformed as to the treatment condition of experimental units. Neutralizes investigator bias.
- Double blinding: the condition under which both experimenter and observer are uninformed as to the treatment condition of experimental units. Neutralizes investigator bias and bias in the observer's scoring system.
- In toxicological histopathology, both blinded and unblinded evaluation are recommended

Slide 49
- Group blinding: entire treatment groups are coded: A, B, C
- Individual blinding: each experimental unit is coded individually; codes are kept in a randomization list; involves an independent person who maintains the list and prepares the treatments

Slide 50
- Describes practical actions
- Guidelines for lab technicians
- Manipulation of experimental units (animals, etc.)
- Materials used
- Logistics
- Definitions of measurements and scoring methods
- Data collection & processing
- Personal responsibilities
A detailed protocol is imperative to minimize potential bias

Slide 51: Requirements for a good experiment

Slide 52
- Calibration is an operation that compares the output of a measurement device to standards of known value, leading to correction of the values indicated by the measurement device
- Neutralizes bias in the investigator's measurement system

Slide 53
- Randomization ensures that the effect of uncontrolled sources of variability has equal probability in all treatment groups
- Randomization provides a rigorous method to assess the uncertainty (standard error) in treatment comparisons
- Formal randomization involves a chance-dependent mechanism, e.g. flip of a coin, by computer (Appendix B)
- Formal randomization is not haphazard allocation
- Moment of randomization: chosen such that the randomization covers all substantial sources of variation, i.e. immediately before treatment administration
- Additional randomization may be required at different stages

Slide 54: Randomized versus Systematic Allocation
- First unit treatment A, second treatment B, etc., yielding sequence AB, AB, AB
- Other sequences such as AB BA, BA AB, etc. are possible
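
Slide 53 mentions randomization by computer, and Slide 54 contrasts it with systematic allocation. As a minimal sketch of that contrast (Python standard library only; the function name `randomize`, the unit labels and the seed are illustrative assumptions, and this is not the procedure of the course's Appendix B), a seeded pseudo-random generator can stand in for the chance mechanism:

```python
import random

def randomize(units, treatments, seed=None):
    """Balanced random allocation: every treatment receives an equal share of the units."""
    if len(units) % len(treatments) != 0:
        raise ValueError("number of units must be a multiple of the number of treatments")
    rng = random.Random(seed)  # chance mechanism; a fixed seed makes the list reproducible
    labels = list(treatments) * (len(units) // len(treatments))
    rng.shuffle(labels)        # random order of the treatment labels
    return dict(zip(units, labels))

units = [f"unit{i}" for i in range(1, 13)]

# Systematic allocation as on the slide: A, B, A, B, ...
systematic = {u: "AB"[i % 2] for i, u in enumerate(units)}

# Formal randomization of the same 12 units over the same 2 treatments.
randomized = randomize(units, ["A", "B"], seed=2013)

print(systematic)
print(randomized)
```

Recording the seed together with the resulting list is one way to keep the allocation auditable; whether a seeded generator is acceptable in place of a physical device is a choice for the study protocol.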
- Any systematic arrangement might coincide with a specific pattern in the variability, giving a biased estimate of the treatment effect
- Randomization is a necessary condition for statistical analysis; without randomization the analysis is not justified

Slide 55: Improvements of Randomization Schemes
- Balancing means causes imbalance in variability
- Invalidates subsequent statistical analysis

Slide 56: Haphazard Allocation versus Formal Randomization
- Haphazard treatment allocation is NOT the same as formal randomization
- Haphazard allocation is subjective and can introduce bias
- Proper randomization requires a physical randomizing device
Example: effect of 2 diets on body weight of rats
- 12 animals arrive in 1 cage
- Technician takes animals out of cage
- First 6 animals diet A, remaining 6 diet B
- Isn't this random?
- Heavy animals react slower and are easier to catch than smaller animals
- First 6 animals weigh more than the remaining 6

Slide 57: Moment of Randomization
Example:
- Rat brain cells harvested and seeded on Petri dishes (1 animal = 1 dish)
- Dishes randomly divided in two groups
- Two sets of Petri dishes placed in incubator
- After 72 hours group 1 treated with drug A and group 2 with drug B
Consequences:
- Effect of treatment confounded with systematic errors due to incubation
- Randomization should be done as late as possible, just before treatment application
- Randomization sequence should be maintained throughout the experiment

Slide 58: Basic strategies for maximizing Signal/Noise ratio

Slide 59
- Simple random sample: a selection of units from a defined target population such that each unit has an equal chance of being selected
- Neutralizes sampling bias
- Provides the foundation for the population model of inference
- Particularly important in genetic research (Nahon & Shoemaker)
- A random sample is a stringent requirement that is very difficult to fulfill in biomedical research
- Switch to the randomization model of statistical inference, in which inference is restricted to the actual experiment only

Slide 60
Reducing the intrinsic variability and bias by:
- use of genetically uniform animals
- use of phenotypically uniform animals
- environmental control
- nutritional control
- acclimatization
- measurement system
Beware of external validity

Slide 61: Basic strategies for maximizing Signal/Noise ratio

Slide 62
- Replication can be an effective strategy to control variability (precision)
- Replication is an expensive strategy to control variability
- Precision of an experiment is quantified by the standard error of the difference between two treatment means
- Replication is also needed to estimate SD
- Replication is only effective at the level of the true experimental unit

Slide 63
- Replication is in most cases only effective at the level of the true experimental unit
- Replication at the subsample level (observational unit) only makes sense when the variability at this level is substantial compared to the variability between the experimental units

Slide 64
- A blocking factor is a known and unavoidable source of variability that adds or subtracts an offset to every experimental unit in the block
- Typical blocking factors: age group, microtiter plate, run of assay, batch of material, subject (animal), date, operator

Slide 65
- A covariate is an uncontrollable but measurable attribute of the experimental unit or its environment that is unaffected by the treatments but may have an influence on the measured response
- Covariates filter out one particular source of variation.
- Rather than blocking, it represents a quantifiable (by coefficient) attribute of the system studied
- Typical covariates: baseline measurement, weight, age, ambient temperature
- Covariance adjustment requires slopes to be parallel

Slide 66: Requirements for a good experiment

Slide 67: Simplicity of design
- Complex design: protocol adherence? Statistical analysis?
- Simple designs require a simple analysis

Slide 68: The calculation of uncertainty (error)
- "It is possible and indeed it is all too frequent, for an experiment to be so conducted that no valid estimate of error is available" (Fisher, 1935)
- Validity of error estimation requires that experimental units respond independently to treatment and differ only in a random way from experimental units in other treatment groups
- Without a valid estimate of error, a valid statistical analysis is impossible

Slide 69: Statistical Thinking and Smart Experimental Design. V. Common Designs in Biosciences

Slide 70: The jungle of experimental designs
- Split Plot
- Completely Randomized Design
- Randomized Complete Block Design
- Latin Square Design
- Greco-Latin Square Design
- Youden Square Design
- Factorial Design
- Plackett-Burman Design

Slide 71: Experimental design is an integrative concept
- An experimental design is a synthetic approach to minimize bias and control variability

Slide 72: Three aspects of experimental design determine complexity and resources

Slide 73: Completely Randomized Design
- Studies the effect of a single independent variable (treatment)
- One-way treatment design
- Completely randomized error-control design
- Most common design (default design), easy to implement
- Lack of precision in comparisons is the major drawback

Slide 74: Completely Randomized Design. Example 1
- Effect of treatment on proliferation of gastric epithelial cells in rats
- 2 drugs & 2 controls (vehicle) = 4 experimental groups
- 10 animals/group, individual housing in cages
- Cages numbered and treatments assigned randomly to cage numbers
- Cages sequentially put in rack
- Blinding of laboratory staff and of histological evaluation
- Treatment code = cage number (individual blinding)

Slide 75: Completely Randomized Design. Example 2
- Bias due to plate location effects in microtiter plates (Burrows et al., 1984) causes parabolic patterns
- Underlying causes unknown
- Solution by Faessel (1999): random allocation of treatments to the wells
- Bias of plate location is now introduced as random variation
- Randomization can be used to transform bias into noise

Slide 76: A convenient way to randomize microtiter plates
- Dilutions made in tube rack 1
- Tubes are numbered
- MS Excel used to make randomization map
- Map taped on top of rack 2
- Tubes from rack 1 are pushed to corresponding number in rack 2
- Multipipette used to apply drugs to 96-well plate
- Results entered in MS Excel and sorted
Alternative methods:
- design (Burrows et al. 1984, Schlain et al. 2001)
- statistical model (Straetemans et al. 2005)
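
The randomization map of Slide 76 is made in MS Excel on the slide; the same idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author's spreadsheet: the function names are hypothetical, 96 numbered tubes are assumed (one per well), and positions are labeled in the usual A1 to H12 plate notation.

```python
import random

def randomization_map(n_tubes, seed=None):
    """Return a random permutation: position i of rack 2 receives tube perm[i] of rack 1."""
    rng = random.Random(seed)
    tubes = list(range(1, n_tubes + 1))
    rng.shuffle(tubes)
    return tubes

def as_plate(tubes, n_rows=8, n_cols=12):
    """Lay the permutation out row by row as a well map (rows A-H, columns 1-12)."""
    if len(tubes) != n_rows * n_cols:
        raise ValueError("number of tubes must equal the number of wells")
    rows = "ABCDEFGH"[:n_rows]
    return {f"{rows[r]}{c + 1}": tubes[r * n_cols + c]
            for r in range(n_rows) for c in range(n_cols)}

layout = as_plate(randomization_map(96, seed=76))

# Sorting the wells by tube number restores the original dilution order,
# which corresponds to the "results entered and sorted" step on the slide.
by_tube = sorted(layout.items(), key=lambda item: item[1])
print(by_tube[:3])
```

Keeping the map (or the seed that generated it) with the raw data allows the read-out of each well to be traced back to its dilution.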

Slide 77: Factorial Design (full factorial)
- Studies the joint effect of two or more independent variables (factors)
- Considers main effects and interactions
- Factorial treatment design
- Completely randomized error-control design

Slide 78: Factorial Design. Example
- No interaction, additive effects
- Interaction present, superadditive effects

Slide 79: Factorial Designs. Interaction and Drug Synergy
- A hypothetical factorial experiment of a drug combined with itself at 0 and 0.2 mg/l
- Synergy of a drug with itself?
- The pharmacological concept of drug synergy requires more than just statistical interaction

Slide 80: Factorial Design. Further examples
- Effects of 2 different culture media and 2 different incubation times on growth of a viral culture
- Microarray experiments: interaction between mutants and time of observation
- For cDNA microarrays specialized procedures are required (see Glonek and Solomon, 2004; Banerjee and Mukerjee, 2007)
- The need for a factorial experiment should be recognized at the design phase
- The analysis should make use of the proper methods

Slide 81: Factorial Design. Higher dimensional designs
- 3 x 3 x 3 design = 27 treatments, 2 replicates = 54 experimental units: is this manageable?
- Interpretation of higher (3-way) interactions?
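
To make the size of the 3 x 3 x 3 example concrete, the following Python sketch enumerates the treatment combinations and puts the replicated runs in random order. Factor names, level labels and the seed are placeholders chosen for illustration only.

```python
import itertools
import random

# Hypothetical factors with three levels each (names are placeholders).
factors = {
    "A": ["a1", "a2", "a3"],
    "B": ["b1", "b2", "b3"],
    "C": ["c1", "c2", "c3"],
}
replicates = 2

# Every treatment is one combination of factor levels: 3 x 3 x 3 = 27.
treatments = list(itertools.product(*factors.values()))

# Each treatment is replicated, giving 27 x 2 = 54 experimental units.
runs = treatments * replicates

# Random run order for the completely randomized error-control design.
random.Random(81).shuffle(runs)

print(len(treatments), "treatments,", len(runs), "experimental units")
```

Adding a fourth three-level factor would triple both counts, which is why the number of factors and levels is worth fixing before any lab work starts.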
- Neglect higher order interactions
- Do not test ALL combinations
- Fractional Factorial Design

Slide 82: Factorial Design. Efficiency of factorial design
- No important interaction from statistical AND scientific point of view
- Effect of factor A: [(13 - 18) + (10 - 15)]/2 = -5, d.f. = 4 x (n - 1)
- Effect of factor B: [(15 - 18) + (10 - 13)]/2 = -3, d.f. = 4 x (n - 1)
- Interaction = (18 + 10) - (15 + 13) = 0
- Increase external validity
- Only valid method to study interaction between factors

Slide 83: Randomized Complete Block Design
- Studies the effect of a single independent variable (treatment) in presence of a single isolated extraneous source of variability (blocks)
- One-way treatment design
- One-way block error-control design
- Increase of precision
- Eliminates possible imbalance in blocking factor
- Assumes treatment effect is the same among all blocks
- Blocking characteristic must be related to response, or else loss of efficiency due to loss in df

Slide 84: Completely Randomized Design. Example
- Effect of treatment on proliferation of gastric epithelial cells in rats
- 2 drugs & 2 controls (vehicle) = 4 experimental groups
- 10 animals/group, individual housing in cages
- Cages numbered and treatments assigned randomly to cage numbers
- Blinding of laboratory staff
- Shelf height will probably influence the results
- 1 shelf = 1 block
- 8 animals (cages)/shelf, 5 shelves
- Randomize treatments per shelf
- 2 animals/shelf/treatment
- Not restricted to shelf height, other characteristics can also be included (e.g. body weight)

Slide 85: Randomized Complete Block Design. Paired Design
- Special case of RCB design with only 2 treatments and 2 EU/block
- Example: cardiomyocyte experiment

Slide 86: Randomized Complete Block Design. Paired Design. Example
Experimental design and results:
- Left panel ignores pairing: groups overlap, standard error drug-vehicle = 7.83
- Right panel: pairs are connected by lines, consistent increase in drug-treated dishes, standard error drug-vehicle = 2.51
- The CRD requires 8 times more EUs than the paired design

Slide 87: Latin Square Design
- Studies the effect of a single independent variable (treatment) in presence of two isolated, extraneous sources of variability (blocks)
- One-way treatment design
- Two-way block error-control design
- In one k x k Latin square only k EUs/treatment
- More EUs by stacking Latin squares upon each other or next to each other
- Randomizing the rows (columns) of the stacked LS design can be advantageous
- Latin squares can be used to eliminate row-column effects in 96-well microtiter plates, 1 plate = 2 x (5 x 5 LS)

Slide 88: Split Plot Design
- A split-plot design recognizes two types of experimental units: plots and subplots
- Two-way crossed (factorial) treatment design
- Split plot error-control design
- Plots randomly assigned to primary factor (color)
- Subplots randomly assigned to secondary factor (symbol)

Slide 89: Split Plot design. Example
- Effects of 2 diets and vitamin supplement on weight gain in rats
- Rats housed 2/cage (independence?)
- Cages randomly allocated to diet A or diet B
- Individual rats, color marked, randomly selected to receive vitamin or not
- Main plot factor = diet
- Subplot factor = vitamin

Slide 90: Repeated Measures Design
- Two independent variables: time and treatment
- Each is associated with different experimental units
- Special case of split plot design

Slide 91: Statistical Thinking and Smart Experimental Design. VI. The Required Number of Replicates - Sample Size

Slide 92: Determining sample size is a risk-cost assessment
- Diagram labels: Replicates, Cost; Uncertainty, Confidence
- Choose sample size just big enough to give confidence that any biologically meaningful effect can be detected

Slide 93: The context of biomedical experiments
- Statistical context: point estimation, interval estimation, hypothesis testing
- Sample size depends on: assumptions, study specifications, study objectives, study design

Slide 94: The hypothesis testing context
- Population model
- Null hypothesis, alternative hypothesis
- Specify allowable false positive & false negative rate: level of significance 0.01, 0.05, 0.10; power 80% or 90%
- Sample size
- Alternative hypothesis, effect size: size of difference, variability
- Number of treatments, number of blocks

Slide 95: Major determinants of sample size

Slide 96: Approaches for determining sample size
Formal power analysis approaches:
- Require input of significance level, power, and alternative hypothesis (i.e. effect size: smallest difference, variability (SD))
- R package pwr (Champely, 2009)
Lehr's equation:
- n per group = 16 / d^2, with d the effect size
- single comparison (2 groups)
- significance level 0.05
- power 80%, other powers possible
Rule of thumb: Mead's Resource Requirement Equation
- E = N - T - B
- E = df for error, 12 ≤ E ≤ 20; for blocked experiments E ≥ 6
- N = df total sample size (no. of EUs - 1)
- T = df for treatment combination
- B = df for blocks and covariates
- df = number of occurrences - 1

Slide 97: Approaches for determining sample size. Examples
- Effect size d = difference in means / standard deviation
- d = 0.2 small; d = 0.5 medium; d = 0.8 large (Cohen)
- Standard deviation of either group or pooled SD
- Cardiomyocytes, completely randomized design: Treated - Control = 10; SD = 12.5; d = 10/12.5 = 0.8
- Lehr's equation: 16/0.64 = 25

    > require(pwr)
    > pwr.t.test(d=0.8,power=0.8,sig.level=0.05,type="two.sample",
    +            alternative="two.sided")

         Two-sample t test power calculation

                  n = 25.52457
                  d = 0.8
          sig.level = 0.05
              power = 0.8
        alternative = two.sided

    NOTE: n is number in *each* group

Slide 98: Approaches for determining sample size. Examples (cont.)
Rule of thumb: Mead's Resource Requirement Equation
- E = N - T - B
- E = df for error, 12 ≤ E ≤ 20; for blocked experiments E ≥ 6
- N = df total sample size (no. of EUs - 1)
- T = df for treatment combination
- B = df for blocks and covariates
- Cardiomyocytes, completely randomized: N = 9, T = 1, B = 0, so E = 9 - 1 - 0 = 8, too small (min. = 12)
- Cardiomyocytes, paired experiment: N = 9, T = 1, B = 4, so E = 9 - 1 - 4 = 4, too small (min. = 6)
= 6) Slide 99 Multiplicity and sample size 99 Slide 100 Multiplicity and sample size 100 2 tests: 20% increase RSS 3 tests: 30% increase RSS 4 tests: 40% increase RSS 10 tests: 70% increase RSS 2 tests: 20% increase RSS 3 tests: 30% increase RSS 4 tests: 40% increase RSS 10 tests: 70% increase RSS For single comparison, large sample sizes required for medium ( = 0.5) to large ( = 0.8) effects Effects in early research usually very large For single comparison, large sample sizes required for medium ( = 0.5) to large ( = 0.8) effects Effects in early research usually very large Slide 101 Statistical Thinking and Smart Experimental Design VII. The Statistical Analysis 101 Slide 102 The Statistical Triangle Design, Model, and Analysis are Interdependent 102 Slide 103 Every experimental design is underpinned by a statistical model 103 Statistical analysis: Fit model to data Compare treatment effect(s) with error Statistical analysis: Fit model to data Compare treatment effect(s) with error Slide 104 Measurement theory defines allowable operations and statements 104 Nominal Naming Ex. Color, gender, etc. Frequency tables Naming Ex. Color, gender, etc. Frequency tables Ordinal Ranking and ordering Ex. Scoring as none, minor, moderate,. Differences between scores do not reflect same distance Frequency tables, median values, IQR Ranking and ordering Ex. Scoring as none, minor, moderate,. Differences between scores do not reflect same distance Frequency tables, median values, IQR Interval Adding, subtracting (no ratios) Arbitrary zero, 30C 2 x 15C Ex. Temperature in degrees Fahrenheit or Celsius Means, standard deviations, etc. Adding, subtracting (no ratios) Arbitrary zero, 30C 2 x 15C Ex. Temperature in degrees Fahrenheit or Celsius Means, standard deviations, etc. Ratio Differences and ratios make sense True zero point, negative numbers impossible Ex. 
Concentrations, duration, temperature in degrees Kelvin, age, counts, Geometric means, coefficient of variation Differences and ratios make sense True zero point, negative numbers impossible Ex. Concentrations, duration, temperature in degrees Kelvin, age, counts, Geometric means, coefficient of variation Slide 105 Significance tests 105 Randomization Model Null hypothesis (H 0 ): no difference in response between treatment groups The concept of the p-value is often misunderstood Exp. Data Observed Test Statistic t Observed Test Statistic t P-value = Probability of obtaining a value for the test statistic that is at least as extreme as t assuming H 0 is true Consider H 0 as true p Statistical Model Reject H 0 Do not reject H 0 Yes No Slide 106 Significance tests Example Cardiomyocytes 106 Mean diff. = 7% viable MC standard error = 2.51 Test statistic t = 7/2.51 = 2.79 Test statistic t = 7/2.51 = 2.79 Assume H 0 is true Distribution of possible values of t with mean 0 P(t 2.79) = 0.024 Probability of obtaining an increase of 2.79 or more is 0.024, provided H 0 is true Slide 107 Significance tests Two-sided tests 107 Mean diff. = 7% viable MC standard error = 2.51 Test statistic t = 7/2.51 = 2.79 Test statistic t = 7/2.51 = 2.79 Assume H 0 is true Distribution of possible values of t with mean 0 P(|t| 2.79) = 0.048 Probability of obtaining a difference of 2.79 or more is 0.048, provided H 0 is true Results in opposite direction are also of interest Two-sided tests are the rule! 
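The one-sided and two-sided p-values of the cardiomyocyte example can be recomputed from the t distribution. The following is a minimal sketch in base R, not the author's own code. It assumes 4 error degrees of freedom (5 pairs, matching E = 4 in the paired Mead's equation example); the variable names are illustrative. Because the inputs are the rounded slide values, the results should be close to, but need not exactly match, the 0.024 and 0.048 quoted on the slides.

```r
# Cardiomyocyte example: recompute the p-values from the reported summary values.
mean_diff <- 7      # mean difference drug - vehicle (% viable myocytes)
se_diff   <- 2.51   # standard error of the paired difference
df_error  <- 4      # ASSUMPTION: 5 pairs, hence 5 - 1 = 4 degrees of freedom

# Observed test statistic
t_obs <- mean_diff / se_diff

# One-sided p-value: P(T >= t_obs) when H0 is true
p_one_sided <- pt(t_obs, df = df_error, lower.tail = FALSE)

# Two-sided p-value: P(|T| >= |t_obs|) when H0 is true
p_two_sided <- 2 * pt(abs(t_obs), df = df_error, lower.tail = FALSE)

cat(sprintf("t = %.2f, one-sided p = %.3f, two-sided p = %.3f\n",
            t_obs, p_one_sided, p_two_sided))
```

Since the t distribution is symmetric around 0, the two-sided p-value is exactly twice the one-sided one, which is why a result that is significant at 0.05 one-sided can sit right at the border in a two-sided test.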
Slide 108 Statistics always makes assumptions
- Parametric methods: distribution of error term, randomization, independence, equality of variance
- Nonparametric methods: randomization, independence
- Always verify assumptions: at planning (historical data) and at analysis time, before the actual analysis

Slide 109 Multiplicity
- Sources of multiplicity: multiple comparisons, multiple variables, multiple measurements (in time)
- Example: 20 doses of a drug, each tested against its control at a significance level of 0.05
- Assume all null hypotheses are true, i.e. no dose differs from control
- P(correct decision of no difference) = 1 − 0.05 = 0.95
- P(ALL 20 decisions correct) = 0.95^20 = 0.36
- P(at least 1 mistake) = P(NOT ALL 20 correct) = 1 − 0.36 = 0.64
- The Curse of Multiplicity is of particular importance in microarray data (30,000 genes × 0.05 = 1,500 false positives)
- Multiplicity and how to deal with it must be recognized at the planning stage

Slide 110 Statistical Thinking and Smart Experimental Design. VIII. The Study Protocol

Slide 111 The writing of the study protocol finalizes the research design phase
- Research protocol: rehearses the research logic; forms the basis for reporting
  Contents: general hypothesis; working hypothesis; experimental design rationale (treatment, error control, sampling design); measures to minimize bias (randomization, etc.); measurement methods; statistical analysis
- Technical protocol: describes practical actions; guidelines for lab technicians
  Contents: manipulation of animals; materials; logistics; data collection & processing; personal responsibilities
- Writing the Study Protocol is already working on the Study Report

Slide 112 Statistical Thinking and Smart Experimental Design. IX. Interpretation & Reporting

Slide 113 Points to consider in the Methods Section
Experimental design, weaknesses & strengths:
- Size and number of experimental groups
- How were experimental units assigned to treatments?
- Was proper randomization used or not?
- Was it a blinded study, and how was blinding introduced?
- Was blocking used, and how?
- How were outcomes assessed?
- In case of ambiguity, which was the experimental unit and which the observational unit, and why?
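The questions on randomization and blocking are easier to answer in a Methods section when the allocation was generated by a script that can be rerun. The following is a minimal sketch in base R for the shelf-blocked rat example (5 shelves as blocks, 4 experimental groups, 2 cages per group per shelf). The treatment labels, the seed and the column names are assumptions made for illustration, not part of the original study.

```r
# Randomized complete block allocation: treatments randomized within each shelf.
set.seed(2013)  # ASSUMPTION: any fixed seed works; record it in the protocol

treatments <- c("drug_A", "drug_B", "vehicle_A", "vehicle_B")  # hypothetical labels
n_shelves  <- 5   # 1 shelf = 1 block
n_per_cell <- 2   # animals (cages) per shelf per treatment

# For every shelf, shuffle the 8 treatment labels over the 8 cage positions
allocation <- do.call(rbind, lapply(seq_len(n_shelves), function(shelf) {
  data.frame(
    shelf     = shelf,
    position  = seq_len(length(treatments) * n_per_cell),
    treatment = sample(rep(treatments, each = n_per_cell))
  )
}))

# Balance check: each shelf should hold every treatment exactly twice
balance <- table(allocation$shelf, allocation$treatment)
print(balance)
```

Shuffling inside each shelf, rather than over all 40 cages at once, is what makes shelf a block: every treatment appears equally often at every shelf height, so a shelf effect cannot be confused with a treatment effect.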
Statistical Methods:
- Report and justify statistical methods with enough detail
- Supply the level of significance and, when applicable, the direction of the test (one-sided, two-sided)
- Recognize multiplicity and justify the strategy used to deal with it
- Reference the software and its version

Slide 114 Points to consider in the Results Section: Summarizing the Data
- Always report the number of experimental units used in the analysis
- For, and only for, normally distributed data, the mean and standard deviation (SD) are sufficient statistics
- SD is a descriptive statistic about spread, i.e. variability
- Report as mean (SD), NOT as mean ± SD
- SEM is a measure of precision of the mean (does not make much sense)
- Not normally distributed: median values and interquartile range
- Extremely small datasets (n < 6): raw data
- Mean ± 2 × SD can lead to ridiculous values if data are not normally distributed (concentrations, durations, counts, etc.)
- Spurious precision detracts from a paper's readability and credibility

Slide 115 Points to consider in the Results Section: Graphical Displays
- Complement tabular presentations
- Better suited for identifying patterns: a picture says more than a thousand words
- Whenever possible, display individual data (e.g. use of the beeswarm package in R)
[Figure: individual data, median values and 95% confidence intervals]
[Figure: longitudinal data (measurements over time), individual subject profiles]

Slide 116 Points to consider in the Results Section: Significance Tests
- Specify the method of analysis and do not leave ambiguity; e.g. "statistical methods included ANOVA, regression analysis as well as tests of significance" in the methods section is not specific enough
- Tests of significance should be two-sided
- Two-group tests allow a direction for the alternative hypothesis, e.g. treated > control; this is a one-sided alternative
- Use of one-sided tests must be justified, and the direction of the test must be stated beforehand (in the protocol)
- For tests that allow a one-sided alternative, it must be stated whether a two-sided or one-sided test was used
- Report exact p-values rather than p
