Chance Models, Hypothesis Testing, Power
Q560: Experimental Methods in Cognitive Science, Lecture 6

Stick/Switch
Which strategy is correct in the "real" world? (Slide figure: observed sample vs. population.)

Probability and Samples
So far, we've talked about samples of size 1. In an experiment, we take a sample of several observations and try to make generalizations back to the population. How do we estimate how good a representation of the population our sample is? The distribution of sample means contains all sample means of size n that can be obtained from a population.

Sample Means
Let's do an example with a very small population of 4 scores: X: 2, 4, 6, 8. We construct a distribution of sample means for n=2.
Step 1: Write down all 16 possible samples.
Step 2: Draw the distribution of sample means.
Things to note about the distribution:
1. The mean of the sample means equals the mean of the population.
2. The shape looks normal.
3. We can use this distribution to answer questions about probabilities.

Central Limit Theorem
For any population with mean μ and standard deviation σ, the distribution of sample means for sample size n will have a mean of μ and a standard deviation of σ/√n, and will approach a normal distribution as n approaches infinity.
Even though we can't compute all possible samples of size n from a real population, the Central Limit Theorem tells us that for any DSM of samples of size n:
1) the mean of the DSM equals μ;
2) the standard deviation of the DSM equals σ/√n;
3) the DSM will approach a normal distribution as n approaches infinity, even if the population was not normally distributed.
The mean of the distribution of sample means is called the expected value of M. The standard deviation of the distribution of sample means is called the standard error of M.
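The tiny four-score example can be checked by brute force. This sketch (variable names are my own) enumerates all 16 possible samples of size n=2, sampled with replacement, and confirms that the mean of the sample means equals the population mean:

```python
from itertools import product
from statistics import mean

population = [2, 4, 6, 8]

# All 16 ordered samples of size n=2, drawn with replacement
samples = list(product(population, repeat=2))
sample_means = [mean(s) for s in samples]

print(len(samples))        # 16 possible samples
print(mean(population))    # population mean = 5
print(mean(sample_means))  # mean of sample means is also 5
```

Plotting a histogram of `sample_means` would reproduce the slide's roughly normal-looking distribution.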
Standard error: σ_M = σ/√n
Standard deviation: the standard distance between a score X and the population mean μ.
Standard error: the standard distance between a sample mean M and the population mean μ.

Law of Large Numbers
The larger a sample, the better its mean approximates the mean of the population. (Demo: visualizing sampling distributions and the CLT.)

Probability and the DSM
We can use the distribution of sample means to find probabilities (= proportions!). For example: given a population, how likely is it to obtain a sample of size n with a certain M?
Example: SAT scores (μ=500, σ=100). Take a sample of n=25. What is p(M > 540)?
σ_M = 100/√25 = 20; z = (540 - 500)/20 = 2.00; p = .0228.
Another example: SAT scores (μ=500, σ=100). Take a sample of n=25. What range of values for M can be expected 80% of the time (prediction)?

Using the Standard Error
The standard error tells us how much error, on average, should exist between a sample mean and the population mean. As the sample size n increases, the standard error decreases.

Hypothesis Testing
A hypothesis test uses sample data to evaluate a hypothesis about a population parameter. The basic logic of hypothesis testing:
1. State a hypothesis about a population.
2. Obtain a random sample from the population.
3. Compare the sample data with the population: if consistent, accept the hypothesis; if inconsistent, reject the hypothesis.

The 4 Steps of Hypothesis Testing
1. State the hypotheses.
2. Set the decision criteria.
3. Collect data and compute the sample statistic.
4. Make a decision (accept/reject).

Step 1: State the hypotheses.

Step 2: Set the criteria. Consider the distribution of sample means if H0 is true, and divide it into two sections:
1. Sample means likely to be obtained if H0 is true.
2. Sample means very unlikely to be obtained if H0 is true.

Step 3: Collect data/statistics. Select a random sample and perform the experiment.
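Both SAT questions can be checked numerically. This sketch models the distribution of sample means with Python's `statistics.NormalDist` (the variable names are my own):

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 500, 100, 25
se = sigma / sqrt(n)               # standard error = 100/5 = 20
dsm = NormalDist(mu=mu, sigma=se)  # distribution of sample means

# p(M > 540): z = (540 - 500)/20 = 2.00
p_above_540 = 1 - dsm.cdf(540)
print(round(p_above_540, 4))       # ~0.0228, matching the slide

# Middle 80% prediction range for M (z = +/-1.28)
lo, hi = dsm.inv_cdf(0.10), dsm.inv_cdf(0.90)
print(round(lo, 1), round(hi, 1))  # ~474.4 to ~525.6
```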
Compute the sample statistic, e.g. the sample mean. Locate the sample statistic within the hypothesized distribution (use a z-score). Is the sample statistic located within the critical region?

Step 4: Decision
Possibility 1: the sample statistic is within the critical region. Reject H0.
Possibility 2: the sample statistic is not within the critical region. Do not reject H0.
We reject or fail to reject the null; we cannot prove the alternate hypothesis. It is easier to demonstrate that a hypothesis is false than to demonstrate that it is true.

Hypothesis Testing: An Example
It is known that corn in Bloomington grows to an average height of μ=72 inches (σ=6) six months after being planted. We are studying the effect of Plant Food 6000 on corn growth. We randomly select a sample of 40 seeds from the above population and plant them, using PF-6000 each week for six months. At the end of the six-month period, our sample has a mean height of M=78 inches. Go through the steps of hypothesis testing and draw a conclusion about PF-6000.
1. State hypotheses: null and alternate, in both sentence and parameter notation.
2. Chance model/critical region: calculate and draw the distribution of sample means (DSM); determine the alpha level (.05); calculate the upper and lower cutoffs for means that will be considered unlikely due to chance (for α=.05, z_crit = ±1.96).
3. Collect data/compute the test statistic (done for us).
4. Hypothesis decision and conclusion: does M_obt exceed M_crit?
If yes, reject the null; if no, we cannot reject the null. Write a sentence stating the conclusion about the effect of the IV.

Step 1: State Hypotheses
Null: PF-6000 will not have an effect on corn growth (H0: μ = 72).
Alt: PF-6000 will have an effect on corn growth (H1: μ ≠ 72).

Step 2: Chance Model and Critical Value
a) Distribution of sample means: μ_M = 72, σ_M = 6/√40 ≈ 0.95.
b) Set the alpha level: α = .05, so z_crit = ±1.96. Draw the sampling distribution and shade in the critical region.
c) Compute the critical values that correspond to z_crit. This is the range of means we will tolerate as due to chance; beyond these values, an obtained sample mean is unlikely to have come from the expected sampling distribution. Pencil these values onto the sampling distribution: M_crit = 72 ± 1.96(0.95) ≈ 70.14 (lower) and 73.86 (upper).

Step 3: Do Experiment
This is the part where we actually draw the sample, conduct the experiment, and compute the sample statistic (so far, the mean). For this question, that part has already been done for us; we just need to compare the obtained sample mean to our chance model to determine whether any discrepancy between our sample and the original population is due to:
1. sampling error, or
2. a true effect of our manipulation.

Step 4: Decision and Conclusion
If M_obt exceeds either critical value (i.e., falls outside the chance range), we reject H0; otherwise, we cannot reject H0.
M_obt = 78 and M_crit ≈ 73.86, so M_obt exceeds M_crit: reject H0.
Conclusion: We reject the null hypothesis that the chemical does not produce a difference, and conclude that PF-6000 has an effect on corn growth.

Directional Tests
Directional = one-tailed. In a one-tailed test, the hypotheses make a statement about the expected direction of an effect.
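The corn example's arithmetic can be verified with a short script (a sketch with my own variable names; `NormalDist().inv_cdf` recovers z_crit in place of a table lookup):

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 72, 6, 40   # known population of corn heights (inches)
m_obt = 78                 # sample mean after six months of PF-6000
alpha = 0.05

se = sigma / sqrt(n)                          # standard error, ~0.95
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed, ~1.96
m_crit_lo = mu - z_crit * se                  # ~70.14
m_crit_hi = mu + z_crit * se                  # ~73.86

z_obt = (m_obt - mu) / se                     # ~6.32
reject_h0 = m_obt < m_crit_lo or m_obt > m_crit_hi
print(reject_h0)   # True: 78 is far outside the chance range
```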
Example: an experimental test of a dietary drug (expected: a reduction in food intake).
H0: no reduction in food intake.
H1: food intake is reduced.

Errors and Uncertainty
A hypothesis test may produce an erroneous result (a wrong decision). Two types of errors can be made:
Type I error: concluding there is an effect when there really is not. H0 is rejected while in fact the treatment has no effect. Example: the experimental treatment (behavior, drug, etc.) actually has no effect, but sampling error makes the sample data look that way. The alpha level is the probability that the test will lead to a Type I error; the researcher controls the magnitude of Type I error by setting α.
Type II error: concluding there is no effect when there really is. A treatment effect really exists, but the hypothesis test fails to detect it (for example, when the treatment effect is small). The probability of a Type II error is β.

Summary of possible outcomes of a statistical decision:
H0 true, reject H0: Type I error (p = α).
H0 true, retain H0: correct decision (p = 1-α).
H0 false, reject H0: correct decision, power (p = 1-β).
H0 false, retain H0: Type II error (p = β).

Statistical Power
Power is the probability of obtaining sample data in the critical region when H0 is actually false: the probability of detecting an effect if one indeed exists. Power is difficult to specify because it depends in part on the magnitude of any treatment effect.
Example: power if the treatment effect is 20 points; power if the treatment effect is 40 points.

Factors Affecting Power
1. Alpha (lowering α reduces power).
2. Sample size (increasing n increases power, because the standard error goes down).
3. Effect size (the bigger the effect, the greater the power, because the distance between the distributions is bigger).
4. Tails (a one-tailed hypothesis test is more powerful than a two-tailed hypothesis test).

p and α
Sample means located in the critical region have p < α (reject H0).
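The slides do not give numbers for the 20- and 40-point power examples. Assuming they reuse the earlier SAT parameters (μ=500, σ=100, n=25, α=.05, two-tailed), power can be computed as the probability that the sample mean lands in the critical region when the true mean has shifted; this is only a sketch under those assumed parameters:

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, alpha = 500, 100, 25, 0.05   # assumed SAT-style parameters
se = sigma / sqrt(n)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
lo, hi = mu0 - z_crit * se, mu0 + z_crit * se   # retain-H0 region

def power(effect):
    # Probability the sample mean falls in the critical region
    # when the true mean is mu0 + effect
    true_dsm = NormalDist(mu=mu0 + effect, sigma=se)
    return true_dsm.cdf(lo) + (1 - true_dsm.cdf(hi))

print(round(power(20), 3))   # modest power for a 20-point effect
print(round(power(40), 3))   # much higher power for a 40-point effect
```

Under these assumptions the bigger effect yields substantially more power, which is exactly the "effect size" factor listed above.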
Sample means located outside of the critical region have p > α (retain H0).

Why Not a z-Test: An Example
It is thought that we are genetically hardwired to recognize human faces. In a preferential-looking paradigm, newborns are presented with two stimuli: one representing a face, and one containing the same features but in a different configuration. The experimenter records how long the infants look at the face stimulus during a 60-sec presentation (let's assume they always look at one or the other). By chance, we would expect them to look at the face stimulus for only 30 seconds, but they look for 35 seconds. Is this effect significant?

Sample Variance
We don't know the variability of the population, but we do know the variability of the sample.
Sample variance: s² = SS/(n-1) = SS/df
Sample standard deviation: s = √s²

Estimated Standard Error
We can use the estimated standard error as an estimate of the real standard error.
Estimated standard error: s_M = s/√n (standard error: σ_M = σ/√n)

t-Statistic
Substituting the estimated standard error into the formula for the z-score gives us:
t = (M - μ) / s_M
The t-statistic approximates a z-score, using the sample variance instead of the population variance (which is unknown). How well does that work?

Degrees of Freedom and the t Statistic
Degrees of freedom describes the number of scores in a sample that are free to vary: df = n-1. The greater df, the better the t-statistic approximates the z-score. The set of t statistics for a given df forms a t distribution. For large df (large n), the t distribution approximates the normal distribution.

Hypothesis Tests Using the t Statistic
Same procedure as with z-scores, except using the t statistic instead.
Step 1: State the hypotheses in terms of the population parameter μ.
Step 2: Determine the critical region using α and df, looking up the critical value of t.
Step 3: Collect data and calculate the value of t using the estimated standard error.
Step 4: Decide, based on whether the t value for the sample falls within the critical region.

One-Sample t Test: An Example
We'll go back to our preferential-looking paradigm and newborn babies. We show them the two stimuli for 60 seconds and measure how long they look at the facial configuration. Our null assumption is that they will not look at it for longer than half the time: μ ≤ 30. Our alternate hypothesis is that they will look at the face stimulus longer, because face recognition is hardwired in their brain, not learned (directional). Our sample of n = 26 babies looks at the face stimulus for M = 35 seconds, s = 16 seconds. Test our hypotheses (α = .05, one-tailed).

Step 1: Hypotheses
Null: babies look at the face stimulus for less than or equal to half the time (H0: μ ≤ 30).
Alternate: babies look at the face stimulus for more than half the time (H1: μ > 30).

Step 2: Determine the Critical Region
The population variance is not known, so we use the sample variance to estimate it. With n = 26 babies, df = n-1 = 25. Look up the value of t at the limit of the critical region in a critical-values-of-t table. With α = .05, one-tailed: t_crit = +1.708.

Step 3: Calculate the t Statistic from the Sample
a) Sample variance: s² = 16² = 256
b) Estimated standard error: s_M = s/√n = 16/√26 ≈ 3.14
c) t statistic: t = (M - μ)/s_M = (35 - 30)/3.14 ≈ 1.59

Step 4: Decision and Conclusion
t_obt = 1.59 does not exceed t_crit = 1.708, so we must retain the null hypothesis.
Conclusion: Babies do not look at the face stimulus more often than chance, t(25) = +1.59, n.s., one-tailed. Our results do not support the hypothesis that face processing is innate.

Stick/Switch
Which strategy is correct in the "real" world? (Slide figure: observed sample vs. population.)
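The babies t test can be reproduced in a few lines (a sketch with my own variable names; t_crit = 1.708 is taken from the slide's table lookup rather than computed):

```python
from math import sqrt

mu = 30          # H0: babies look at the face stimulus half the time
n, m_obt, s = 26, 35, 16
t_crit = 1.708   # from a t table: alpha = .05, one-tailed, df = 25

s2 = s ** 2                    # sample variance = 256
s_m = s / sqrt(n)              # estimated standard error, ~3.14
t_obt = (m_obt - mu) / s_m     # ~1.59

reject_h0 = t_obt > t_crit
print(reject_h0)               # False: retain the null hypothesis
```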