
Math 321 - Statistics


DESCRIPTION

Slides from Dr. Michael Minnotte's University of North Dakota class, Math 321 - Applied Statistical Methods, from the Summer 2015 semester. The class uses Principles of Statistics for Engineers and Scientists by William Navidi, first edition.


  • Math 321 - Dr. Minnotte 1


    Introduction: What is Statistics?

    Definition: Statistics is the science of measurement and decision-making under conditions of uncertainty, randomness, and variability.

    More briefly: Statistics is the field of dealing with data.

    In statistics, we make observations, to collect information, to help make decisions.

    If that sounds familiar, it should. We do that sort of thing every day, in every field of study, and in our everyday life.

    In statistics, we simply formalize this process mathematically. This allows us to recognize smaller differences than might otherwise be found, and to make decisions under conditions of greater uncertainty.

    The term "statistic" is also used to describe any bit of numerical information, like the 6.3% unemployment rate in April 2014 or the 15,143 students enrolled at UND in Fall 2013.

    These numerical bits of data are thrown at us every time we read the newspaper, or watch TV news, or read a journal in our field.

    Just as words should be read with understanding, so should statistics. If we uncritically accept the numbers others give us, we open ourselves to believing misinformation.


  • Math 321 - Dr. Minnotte 2

    Statistics are an important tool in almost every field. In this class, we'll look at examples like:

    How can doctors tell if a new vaccine really works?

    How can irrigation engineers use past river flow rates to predict future flows?

    How can polltakers use responses from a few thousand voters to predict the results of an election in which more than a hundred million people vote?

    What are some other examples of statistics in practice?


    The Challenger Disaster: A Statistical Cautionary Tale

    In 1986, a lack of statistical thinking contributed to a tragedy: the explosion of the space shuttle Challenger.

    The destruction of the Challenger killed seven astronauts, including Christa McAuliffe, a 37-year-old teacher selected to be the first teacher in space, and set the U.S. manned space program back several years.


  • Math 321 - Dr. Minnotte 3

    The solid rocket motors used to launch the space shuttles are shipped to the Kennedy Space Center in four pieces. Large rubber O-rings are used to seal the three joints between the pieces.

    The Challenger explosion occurred when one of the O-rings failed to seal quickly enough to prevent hot gases from escaping from the rocket and igniting the large external fuel tank.

    Implicated in the failure was the unusually cold (for Florida) launch temperature of 29°F.

    The night before the launch, forecasters predicted a temperature of 31°F for the launch time.

    A three-hour teleconference took place between people at:

    Morton Thiokol (manufacturer of the rocket motors)

    Marshall Space Flight Center (NASA center for motor design control), and

    Kennedy Space Center.


    There was concern that the cold temperatures could lead to problems with the O-rings.

    In 7 out of 23 previous launches, some O-ring damage had occurred.

    Some participants recommended delaying the launch until the temperature rose above 53°F, the lowest previous launch temperature, at which the greatest number of damaged O-rings had occurred.

  • Math 321 - Dr. Minnotte 4

    In the end, the recommendation was made to launch on schedule, in part because of the following plot.

    The plot shows temperature vs. number of damaged O-rings for the 7 affected launches.

    The relationship seems limited, at most.

    What error was made preparing this plot?


  • Math 321 - Dr. Minnotte 5

    By only including the launches in which incidents occurred, the investigators left out some important information!

    When the data from all 23 launches is plotted, a temperature dependence becomes obvious.

    All 4 launches below 66°F had damage. Only 3 out of 16 flights above that temperature suffered damage.

    Note where 31°F or 29°F would appear on that plot.

    More sophisticated analyses are possible, but unnecessary.

    Had the concerned engineers presented the complete data in such a format, they might well have convinced the decision-makers to delay the launch and prevented the tragedy.

    There's more to this story, so we'll return to it later in the semester.

    Chapter 1: Univariate Data

    Populations and Samples

    Definition: A population consists of all potential observations from a distribution of interest.

    In an enumerative study, the population will be tangible, real and finite, and might be represented by a sampling frame listing the members of the population.

    o Examples include populations of people, or corporations, or items in a shipment.

  • Math 321 - Dr. Minnotte 6


    In an analytic study, we study an ongoing process, and the conceptual population is infinite and simply a useful theoretical construct. No sampling frame is possible.

    o Examples include populations of rainfall over time, or objects coming off an ongoing assembly line, or repeated measurements of the same underlying weight.

    As an investigator, you have a great deal of flexibility in defining the population of interest.


    Example: We are interested in the ages of UND students. What are some possible relevant populations?

    Example: A quality engineer wishes to study the volume of milk in containers coming off a production line. What are possible populations?

    Example: We wish to examine the incidence of obesity in preteen children. What is an appropriate population?


    Once we have defined our population, we take a sample from that population.

    Measurements from each member of the sample will be the observations which make up the dataset we will analyze.

    Example: Student ages.

  • Math 321 - Dr. Minnotte 7

    Experiments

    Suppose that a chemical engineer wants to determine how the concentration of a catalyst affects the yield of a process.

    The engineer can run the process several times, changing the concentration each time and comparing the yields that result.

    This sort of experiment is called a controlled experiment because the values of the concentration variable are under the control of the experimenter.

    Observational Studies

    There are many situations in which scientists cannot control the variables of interest.

    Many studies have been conducted to determine the effect of cigarette smoking on the risk of lung cancer. In these studies, rates of cancer among smokers are compared with rates among nonsmokers.

    The experimenter cannot control who smokes and who doesn't.

    This kind of study is called an observational study.

    When we study a sample, we must make sure it is representative of the population.

    One option is a census, or complete enumeration, of everyone in the population. What are some problems with this approach?

  • Math 321 - Dr. Minnotte 8


    Usually, the best solution is to take a random sample, choosing your sample with planned probability methods.

    The most basic such method is called a simple random sample (SRS).

    In an SRS, we draw individuals out of the population with the equivalent of drawing names out of a (well-mixed) hat.

    Each subset of the population of the appropriate size is equally likely to make up the sample.

    This is theoretically convenient, but often hard to arrange in practice.

    When viewed in order, or over time, the observations of an SRS should not show any noticeable pattern or trend.

    An SRS is not guaranteed to reflect the population perfectly.

    SRSs always differ in some ways from each other; occasionally a sample is substantially different from the population.

    This phenomenon is known as sampling variation.
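A minimal sketch of drawing simple random samples, assuming a hypothetical population of 1,000 student ages (the values are made up for illustration and are not course data). Python's `random.sample` draws without replacement, so every subset of the requested size is equally likely; repeating the draw shows sampling variation in the sample mean.

```python
import random
import statistics

# Hypothetical population: ages of 1,000 students (illustrative values only).
rng = random.Random(321)  # fixed seed so the run is repeatable
population = [rng.randint(17, 40) for _ in range(1000)]


def simple_random_sample(pop, n, rng):
    """Draw n items without replacement; every size-n subset is equally likely."""
    return rng.sample(pop, n)


# Five independent samples of size 25. Their means differ from each other
# and from the population mean; that spread is sampling variation.
sample_means = [
    statistics.mean(simple_random_sample(population, 25, rng))
    for _ in range(5)
]

print("population mean:", statistics.mean(population))
print("sample means:   ", sample_means)
```

The sample size of 25 is 2.5% of this population, which stays under the 5% guideline for treating the sampled items as independent.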


  • Math 321 - Dr. Minnotte 9

    The items in a sample are independent if knowing the values of some of the items does not help to predict the values of the others.

    Items in a simple random sample may be treated as independent in most cases encountered in practice. The exception occurs when the population is finite and the sample comprises a large fraction (more than 5%) of the population.

    Samples of Convenience

    A nonrandom sample, or sample of convenience, may be easier to collect, but may be nonrepresentative in some important ways.

    Such a sample may bias your results, making them worthless (or at least a whole lot less trustworthy).


    Example: We are interested in the size of hometowns for all U.S. college students, but only sample at UND.

    Example: We want to survey UND students on math anxiety, and pick a class to interview:

    Math 321? Upper-division English?

  • Math 321 - Dr. Minnotte 10

    Example: Not everyone will consent to test a new AIDS vaccine. We could give those who consent the vaccine, and leave those who don't alone to be the control group.

    What about a historical control (compare vaccinated group with past infection rates)?

    Terminology and Notation

    From each individual person or object in our sample, we are generally interested only in a small number of characteristics.

    Each characteristic we record will be called a variable, and assigned a letter from the end of the alphabet.

    Data that we collect may be of two main types:

    1) Categorical: classifying the subject into one of several distinct groups.

    o X = Sex
    o T = Hair Color
    o W = Zip Code

    2) Numerical: data recorded as a number, where operations like averages make sense.

    o Y = Age
    o U = Rainfall
    o Z = Volume of milk

  • Math 321 - Dr. Minnotte 11

    We also classify datasets based on how many variables we measure on each individual.

    If we only collect a single variable (e.g. age), we say the dataset is univariate.

    If we collect two variables for each individual (e.g. age and sex), we say it is bivariate.

    With still more variables, we say that it is trivariate, quadrivariate, and so on, or more commonly, that it is multivariate.

    We often use subscripts on the variable name (letter) to indicate specific observations in a dataset, such as $X_1, X_2, \ldots, X_n$.

    A subscript of i (occasionally j or k) indicates a specific, but arbitrary, observation.

    We usually reserve the label n for the number of observations (the sample size).


    There are two primary branches of statistics:

    1) Descriptive statistics simply attempts to simplify and understand a dataset.

    2) Inferential statistics attempts to say (infer) something about the broader population or distribution from which the data was drawn.

    Descriptive statistics are simpler, so we'll start there.

  • Math 321 - Dr. Minnotte 12

    Summary Statistics (1.2)

    Given data $X_1, X_2, \ldots, X_n$, we frequently use sample statistics to summarize the dataset.

    A statistic is anything which may be calculated from a dataset. A sample statistic simply makes clear that it derives from a sample.

    Use of sample statistics can improve our understanding of the data, as well as make it easier to communicate with others about it.

    The Sample Mean

    The most important feature of a dataset to describe is generally its location, or the location of its center.

    The most commonly used statistic for center is the familiar average, or sample mean.

    Definition: The sample mean of data $X_1, X_2, \ldots, X_n$ is

    $$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

    Example: Stocks:

  • Math 321 - Dr. Minnotte 13

    To understand how the mean works, suppose we were to take a very thin yardstick or similarly marked board, and place a small (equal) weight at the mark for each observation's value.

    The mean may be thought of as the point where this would balance.

    Outliers

    An outlier is an observation which is very different from the rest of the sample. For univariate data, this means it is much larger or much smaller than the rest.

    Outliers should be carefully examined. Often they are the result of measurement or recording errors.

    If so, they should be fixed or deleted. Correct but unusual values, however, should be kept.

    The sample mean is not robust (resistant to outliers). Changing even one observation can change the sample mean as much as we want.

    Example: Mistype the final stock return as 374 (instead of 37.4). What is the sample mean now?
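To make the lack of robustness concrete, here is a small sketch. The list of returns is illustrative only (it is not the course's stock dataset); it simply ends in 37.4 so that the same typing mistake can be reproduced. Changing one value by some amount moves the mean by that amount divided by n.

```python
import statistics

# Illustrative returns in percent; NOT the course's stock dataset.
returns = [-7.2, 1.3, 12.0, 17.6, 23.5, 37.4]
mistyped = returns[:-1] + [374.0]  # final value typed without its decimal point

mean_ok = statistics.mean(returns)
mean_bad = statistics.mean(mistyped)

# A single changed value moves the mean by (change in value) / n.
shift = (374.0 - 37.4) / len(returns)

print(f"mean with 37.4: {mean_ok:.2f}")
print(f"mean with 374:  {mean_bad:.2f}")
print(f"shift:          {shift:.2f}")
```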

  • Math 321 - Dr. Minnotte 14

    Measures of Variability

    After center, the second-most-used feature to describe a sample is its variability, or spread.

    The simplest measure of variability is the range, the difference between the maximum and minimum values.

    $R = \max(X) - \min(X)$

    Unfortunately, the range both wastes most of the data and is maximally non-robust, using only the two extreme data points, so it is rarely used.

    A better solution looks at the deviations from the mean, $X_i - \bar{X}$. This removes the effect of the mean (location), and looks only at the variability around the mean.

    One option: Look at the average deviation from the mean.

    Problem: Positive deviations cancel out negative ones, and the average deviation from the mean is always 0.
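The cancellation can be checked in one line, using only the definition of the sample mean (the sum of the observations divided by n, so the sum of the observations equals n times the mean):

```latex
\sum_{i=1}^{n} (X_i - \bar{X})
  = \sum_{i=1}^{n} X_i - n\bar{X}
  = n\bar{X} - n\bar{X}
  = 0 .
```

Since the sum is always 0, so is the average deviation, whatever the data.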


  • Math 321 - Dr. Minnotte 15

    We could take absolute values of the deviations, but for a few theoretical reasons, it's better to look at the squared deviations instead.

    Definition: The sample variance, $s^2$, measures the spread of a dataset.

    $$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$$

    Definition: The sample standard deviation, $s$, is the square root of the sample variance.

    $$s = \sqrt{s^2}$$

    Use of the definition formula is tedious, as it requires finding and squaring each of the n deviations from the mean.

    It is usually simpler to calculate $s^2$ using the following computation formula.

    $$s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right)$$
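A short sketch comparing the two ways of computing the sample variance, assuming the usual n - 1 divisor. The data values are hypothetical, chosen so the arithmetic is easy to follow by hand.

```python
import math


def variance_definition(xs):
    """Definition formula: sum of squared deviations from the mean, over n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def variance_computation(xs):
    """Computation formula: (sum of squares - n * mean^2), over n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return (sum(x * x for x in xs) - n * mean ** 2) / (n - 1)


data = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]  # hypothetical observations, mean 5

s2_def = variance_definition(data)
s2_comp = variance_computation(data)
s = math.sqrt(s2_def)

print(s2_def, s2_comp, s)
```

Both functions return the same value up to rounding. With real data far from zero, the computation formula subtracts two large numbers, so the definition formula is usually the safer choice in software.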


    Example: What are the variance and standard deviation of the stocks data?

  • Math 321 - Dr. Minnotte 16


    The sample variance and standard deviation are measures of the spread of a dataset, and estimates of the variance and standard deviation of the underlying population or distribution.

    Like the sample mean, they are not robust.

    Example: Stocks, replace 37.4 with 374: $s^2 = ?$ $s = ?$

    While very useful practically and theoretically, the variance and standard deviation are a little tricky intuitively.

    One helpful rule of thumb:

    o About 2/3 of data should fall in $\bar{X} \pm s$
    o About 95% of data should fall in $\bar{X} \pm 2s$
    o Almost all data should fall in $\bar{X} \pm 3s$

    Example: Stock data:

    If $X_1, \ldots, X_n$ is a sample, and $Y_i = a + bX_i$, where $a$ and $b$ are constants, then

    $$\bar{Y} = a + b\bar{X}, \qquad s_Y^2 = b^2 s_X^2, \qquad s_Y = |b|\,s_X$$

    This is most commonly needed if we change units for our data.

  • Math 321 - Dr. Minnotte 17

    Example: Let $X_1, \ldots, X_n$ be a sample of temperatures measured in degrees Celsius, with $\bar{X} = 30$. Let $Y_1, \ldots, Y_n$ be the same temperatures in degrees Fahrenheit, $Y_i = \frac{9}{5}X_i + 32$. What is $\bar{Y}$?

    Example: Let the variance of the Celsius temperatures be $s_X^2 = 25$.

    What is the standard deviation? What is the variance of the Fahrenheit temperatures? The s.d.?
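The unit-change rules can be checked numerically. This sketch uses a hypothetical set of five Celsius readings (made up for illustration, with a mean of 30) and confirms that converting to Fahrenheit shifts and scales the mean, while the standard deviation is only scaled.

```python
import statistics

celsius = [22.0, 27.5, 30.0, 31.5, 39.0]  # hypothetical readings, mean 30
a, b = 32.0, 9.0 / 5.0                     # Fahrenheit = a + b * Celsius
fahrenheit = [a + b * x for x in celsius]

mean_c = statistics.mean(celsius)
sd_c = statistics.stdev(celsius)           # sample s.d. (n - 1 divisor)
mean_f = statistics.mean(fahrenheit)
sd_f = statistics.stdev(fahrenheit)

print("mean:", mean_c, "->", mean_f, "| rule gives", a + b * mean_c)
print("s.d.:", sd_c, "->", sd_f, "| rule gives", abs(b) * sd_c)
```

The added constant 32 changes the mean but has no effect on the spread.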

    Order Statistics and Robust Measures of Center and Spread

    Definition: The $i$th order statistic, $X_{(i)}$, is the $i$th smallest value when the X's are sorted. The minimum is $X_{(1)}$, the second smallest $X_{(2)}$, and so on up to the maximum, $X_{(n)}$.

    Example: Stock data (sorted):

    $X_{(1)} = -7.2$, $X_{(4)} = 1.3$, $X_{(20)} = 37.4$, and so on.

    Because outliers will always be in the first or last few order statistics, values computed from middle order statistics will be very robust.

  • Math 321 - Dr. Minnotte 18

    Definition: The sample median is the middle of the sorted data.

    If $n$ is odd, the sample median is the $\frac{n+1}{2}$th order statistic.

    If $n$ is even, it is the average of the $\frac{n}{2}$th and $\frac{n+2}{2}$th order statistics.

    Example: Stocks: sample median = ?

    The sample median has 50% of the data on either side of it.

    The sample median is very robust; changing one or a few observations won't change it much, if at all.

    Example: Stocks: Replace 37.4 with 374, and the sample median remains 17.6.
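The same typing mistake leaves the median untouched. The values below are illustrative, not the course's stock dataset; they are arranged so that the middle value is 17.6, matching the median quoted above.

```python
import statistics

# Illustrative returns in percent; NOT the course's stock dataset.
returns = [-7.2, 1.3, 12.0, 17.6, 23.5, 28.9, 37.4]
mistyped = returns[:-1] + [374.0]  # largest value typed without its decimal point

median_ok = statistics.median(returns)
median_bad = statistics.median(mistyped)

# The largest value stays the largest, so the middle order statistic is unchanged.
print(median_ok, median_bad)
```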

    Quartiles

    The quartiles of the data divide the sample into quarters.

    The first quartile, $Q_1$, splits the lowest quarter of the sample from the rest.

    o If $(n+1)/4$ is an integer, $Q_1$ is the $(n+1)/4$th order statistic.
    o If $(n+1)/4$ is not an integer, $Q_1$ is the average of the two order statistics on either side.

    The third quartile, $Q_3$, splits the highest quarter from the rest.

    Find it in the same way as $Q_1$, but using $3(n+1)/4$.

  • Math 321 - Dr. Minnotte 19


    Example: Sorted stocks:

    Q1 = ? Q3 = ?

    Definition: The sample interquartile range is a robust measure of spread, found as the difference between the sample quartiles, $\mathrm{IQR} = Q_3 - Q_1$.

    Example: Stocks: IQR = ?

    Note: Changing 37.4 to 374 doesn't change $Q_1$, $Q_3$, or IQR.

    Percentiles

    Definition: The $p$th sample percentile has (roughly) $p\%$ of the data below it, and $(100-p)\%$ above it.

    Compute $p(n+1)/100$. If this is an integer, use that order statistic. If not, average the two closest order statistics.

    The median and quartiles are just special names for the 50th, 25th, and 75th percentiles.
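The percentile rule above can be sketched as a small function. The function name and the sample values are hypothetical. Statistical packages often interpolate between order statistics instead of averaging them, so their results can differ slightly from this rule.

```python
def sample_percentile(xs, p):
    """p-th sample percentile using the p(n + 1)/100 rule described above."""
    ordered = sorted(xs)
    n = len(ordered)
    position = p * (n + 1) / 100  # 1-based position among the order statistics
    if position < 1 or position > n:
        raise ValueError("percentile position falls outside the sample")
    lower = int(position)  # whole-number part of the position
    if position == lower:
        return ordered[lower - 1]
    # Not an integer: average the order statistics on either side.
    return (ordered[lower - 1] + ordered[lower]) / 2


data = [3, 7, 8, 12, 13, 14, 18, 21, 25, 30, 41]  # hypothetical sample, n = 11

q1 = sample_percentile(data, 25)      # position 3
median = sample_percentile(data, 50)  # position 6
q3 = sample_percentile(data, 75)      # position 9
iqr = q3 - q1

print(q1, median, q3, iqr)
```

With n = 11 all three positions are whole numbers. Dropping the last value (n = 10) puts the median at position 5.5, so it becomes the average of the 5th and 6th order statistics.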

  • Math 321 - Dr. Minnotte 20

    Example: Descriptive Statistics in Minitab

    Descriptive Statistics: Stock Returns 1976-1995

    Variable          Mean   StDev  Variance  Minimum    Q1  Median     Q3  Maximum
    Stock Returns 19  15.37  13.66    186.49    -7.20  5.48   17.60  28.90    37.40

    Variable            IQR
    Stock Returns 19  23.43

    Basic Statistical Graphics (1.3)

    Some of the most powerful tools available for understanding a dataset are graphics which we can use to look at our data.

    It's very hard to get much use out of large tables or long columns of numbers. But the human eye is very good at picking out patterns in pictures.

    Bar Charts

    Given categorical data, the most useful plot available is usually a simple bar chart.

    A bar is drawn for each category, with the height proportional to the count (frequency) or percentage found in that category.

    Other measurements for each category may also be compared.

  • Math 321 - Dr. Minnotte 21

    Example: Television Picture Grades: Perfect, Good, Satisfactory, Fail

    Category      Count
    Perfect          64
    Good             47
    Satisfactory     33
    Fail              6
    Total           150

    Spaces between the bars show categories.

    Bars should start at 0 and show full height (no truncation!). Otherwise, relative heights get distorted.

  • Math 321 - Dr. Minnotte 22

    Unless there is a strong natural ordering (e.g. poor-fair-good-excellent; not alphabetical), bars should be sorted in ascending or descending order. This makes comparisons between close values much easier.

  • Math 321 - Dr. Minnotte 23


    Many categories or long category names may be better served by horizontal bars.

    3-D perspective looks fancy but hurts clarity; it is usually a bad idea.

  • Math 321 - Dr. Minnotte 24

    A stacked bar chart includes a second categorical variable, but focuses on the totals for the main category of the bars.

    [Figure: stacked bar chart, "Individuals on the Titanic". Bars for 1st Class, 2nd Class, 3rd Class, and Crew, each split into Survived and Died; vertical axis from 0 to 1000.]

    A clustered bar chart focuses on the counts of the specific combinations of categories, and is useful for comparing the distribution of one variable for different values of the other.

    [Figure: clustered bar chart with separate Died and Survived bars for 1st Class, 2nd Class, 3rd Class, and Crew; vertical axis from 0 to 800.]

    Example Minitab Bar Charts


  • Math 321 - Dr. Minnotte 25

    Pie Charts

    The other common chart for categorical data.

    A pie chart should only be used when the categories represent (all of the) parts of some whole, and so should always plot percentages.

  • Math 321 - Dr. Minnotte 26

    Each category's slice gets an angle equal to 360° times that category's proportion of the whole.

    Comparing angles is much more difficult than comparing heights or lengths. Bar charts are almost always more effective.

    3-D pie charts are the work of the devil. (Probably worse than no chart.)

    Minitab:


  • Math 321 - Dr. Minnotte 27

    Dotplots

    Dotplots are simple plots which are very useful for looking at univariate numeric data, especially when the sample size is small or there are many ties in the data.

    Each observation is plotted at its location above an appropriate number line. If there are ties, one dot is stacked for each tied observation.

    Example: Temperature (°F) at launch of the first 25 space shuttle launches.

    66 70 69 80 68
    67 72 73 70 57
    63 78 70 67 53
    75 67 70 81 76
    79 75 76 58 31
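A rough text-only dotplot of these 25 temperatures can be produced as follows. This is a sketch turned on its side (the number line runs down the page, and tied values stack to the right); it is not how Minitab draws the plot.

```python
from collections import Counter

# Launch temperatures (°F) from the example above.
temps = [66, 70, 69, 80, 68, 67, 72, 73, 70, 57, 63, 78, 70,
         67, 53, 75, 67, 70, 81, 76, 79, 75, 76, 58, 31]

counts = Counter(temps)  # how many launches at each temperature

# One row per degree so that gaps in the data stay visible.
for value in range(min(temps), max(temps) + 1):
    print(f"{value:3d} | " + "*" * counts[value])
```

The long empty stretch between 31 and 53 makes the coldest launch stand out immediately.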

    Histograms

    A histogram is a bar chart for numerical data.

    The shape of the histogram describes the shape of the distribution of the data.

    If you have a large, randomly collected sample, the shape is also descriptive of the population the sample was taken from.

    Your book also describes stem-and-leaf plots, which are similar, but rarely used.

  • Math 321 - Dr. Minnotte 28

    Constructing a Histogram

    1) Find the minimum and maximum of the data.

    2) Break that interval into class intervals.

    5-20 classes is often a good start. More for large samples, fewer for small ones.

    A reasonable rule of thumb is

    Select your classes so that each is of equal width.

    3) Find the frequencies (counts, $n_i$) and relative frequencies ($f_i = n_i/n$) in each class.

    4) Plot the bar chart with a bar over each class whose height equals $f_i$ or $n_i$.
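The construction steps can be sketched in code using the 25 launch temperatures from the dotplot example. The class boundaries here (six classes of width 10 starting at 30) are just one reasonable choice, not the one used in the slides, and the function name is hypothetical.

```python
# Launch temperatures (°F) from the dotplot example.
temps = [66, 70, 69, 80, 68, 67, 72, 73, 70, 57, 63, 78, 70,
         67, 53, 75, 67, 70, 81, 76, 79, 75, 76, 58, 31]


def class_frequencies(xs, start, width, k):
    """Count observations in k equal-width classes [start + j*width, start + (j+1)*width)."""
    counts = [0] * k
    for x in xs:
        j = int((x - start) // width)
        if not 0 <= j < k:
            raise ValueError(f"{x} falls outside the classes")
        counts[j] += 1
    return counts


start, width, k = 30, 10, 6  # classes [30,40), [40,50), ..., [80,90)
n_i = class_frequencies(temps, start, width, k)  # frequencies
f_i = [c / len(temps) for c in n_i]              # relative frequencies

for j, (c, f) in enumerate(zip(n_i, f_i)):
    low = start + j * width
    print(f"[{low}, {low + width}): n_i = {c:2d}  f_i = {f:.2f}  " + "#" * c)
```

Rerunning with a different `start` or `width` is a quick way to see how much the picture depends on those choices.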


    Example: Stock Data (Annual Rate of Return, 1976-1995):

  • Math 321 - Dr. Minnotte 29


    The shape of the histogram tells us about the distribution. Some things to look for include:

    Is the distribution left-skewed? Symmetric? Right-skewed?


    Is the distribution bimodal?

    Multimodal?

    Are there any outliers?

    It's a good idea to look at several choices of bin width and location, as different choices here can produce dramatically different histograms.

    Features that remain in many histograms are likely to be trustworthy; those that only appear sometimes are less certain.

  • Math 321 - Dr. Minnotte 30


    Example: Milk Fill Weights Data


  • Math 321 - Dr. Minnotte 31


    Boxplots

    Definition: A boxplot is another graphical tool for displaying a sample:

  • Math 321 - Dr. Minnotte 32


    The box goes from the first to the third quartile, with a line at the median.

    For boxplots, outliers are usually defined as any values below $Q_1 - 1.5 \times \mathrm{IQR}$ or above $Q_3 + 1.5 \times \mathrm{IQR}$. Those points are marked individually.

    The whiskers go from the quartiles to the least and greatest values among the non-outliers.
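As a sketch of how the pieces of a boxplot are computed, the code below applies the (n + 1)/4 quartile rule from earlier in the chapter to the 25 launch temperatures from the dotplot example. The helper function name is hypothetical, and software such as Minitab may compute quartiles slightly differently.

```python
# Launch temperatures (°F) from the dotplot example.
temps = [66, 70, 69, 80, 68, 67, 72, 73, 70, 57, 63, 78, 70,
         67, 53, 75, 67, 70, 81, 76, 79, 75, 76, 58, 31]
ordered = sorted(temps)
n = len(ordered)


def order_stat_average(ordered, position):
    """Order statistic at a 1-based position; average the two neighbours if fractional."""
    lower = int(position)
    if position == lower:
        return ordered[lower - 1]
    return (ordered[lower - 1] + ordered[lower]) / 2


q1 = order_stat_average(ordered, (n + 1) / 4)
q3 = order_stat_average(ordered, 3 * (n + 1) / 4)
iqr = q3 - q1

low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr

# Points beyond the fences are plotted individually as outliers.
outliers = [x for x in ordered if x < low_fence or x > high_fence]
# Whiskers reach to the most extreme values that are not outliers.
inside = [x for x in ordered if low_fence <= x <= high_fence]
whiskers = (min(inside), max(inside))

print("Q1, Q3, IQR:", q1, q3, iqr)
print("fences:", low_fence, high_fence)
print("outliers:", outliers, "whiskers:", whiskers)
```

Under this rule only the 31°F launch is flagged; 53°F sits exactly on the lower fence and so is not below it.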


    Boxplots are much less informative than histograms for a single distribution, so the histogram is usually preferable.

    On the other hand, comparing histograms is difficult, while comparing boxplots is easy.

    Use boxplots to compare 2-20 (or more) distributions.


    Example: Fish length data

  • Math 321 - Dr. Minnotte 33


    Example: Circuit board data by board.


    Ch. 2: Bivariate Data

    Statistics is most powerful when looking at relationships between variables.

    In the simplest case, this involves looking at pairs of measurements made on the same subjects, (x, y).

    Recall, such data is called bivariate (two variables).

    Examples:

    o Heights and weights of a group of people.
    o ACT score and Freshman GPA for college students.
    o January and April average temperatures for many years at a specified location.
    o January and February inflows of the Nile river at a location.

  • Math 321 - Dr. Minnotte 34


    We usually picture our variables in a cause-and-effect relationship.

    The explanatory (independent, predictor) variable, x, is assumed to play some role in determining the value of the response (dependent) variable, y.

    x → y

    Scatterplots (2.1)

    Definition: A scatterplot is the most common graph for displaying bivariate data. It consists of plotting each point at $(x_i, y_i)$, on a standard x-y graph.

    The pattern formed by the points describes the relationship between the variables.


  • Math 321 - Dr. Minnotte 35


  • Math 321 - Dr. Minnotte 36


    Minitab Scatterplot:

    Correlation

    Suppose we have a sample of (x, y) pairs and compute the sample means, $\bar{x}$ and $\bar{y}$.

    For each observation $(x_i, y_i)$, compute the product of the two deviations from the means, $(x_i - \bar{x})(y_i - \bar{y})$.

    Dividing the scatterplot at the means results in two quadrants where the product is positive, and two where it is negative.

  • Math 321 - Dr. Minnotte 37

    For a scatterplot with a positive relationship, most of the products will have a positive sign, and the sum will be positive.

    Likewise, if the picture shows a negative relationship, the sum of the products will be negative.

    Unfortunately, the exact value of the sum depends on the units and spread (as measured by standard deviation) of the variables.

    Dividing by measures of spread for x and y solves this issue.

    Then

    $$r = \frac{1}{n-1}\sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$

    is a good, unitless measure of the linear relationship between x and y called the correlation coefficient.
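A minimal sketch of the calculation, assuming the usual form of the correlation coefficient: the sum of products of standardized deviations, divided by n - 1. The (x, y) values are hypothetical and are not the Nile flow data.

```python
import math


def correlation(xs, ys):
    """Sample correlation: average product of standardized deviations (n - 1 divisor)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    total = sum(((x - mean_x) / s_x) * ((y - mean_y) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


x = [1, 2, 3, 4, 5]             # hypothetical explanatory values
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical responses, roughly linear in x

r = correlation(x, y)
r_rescaled = correlation([100 * v for v in x], y)  # change the units of x
r_swapped = correlation(y, x)                      # swap the roles of x and y

print(r, r_rescaled, r_swapped)
```

All three printed values agree up to rounding, since rescaling x or swapping the variables leaves r unchanged.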


    Example: Nile flow data: n=115

    What is r?

  • Math 321 - Dr. Minnotte 38

    Properties of r

    1. The value of r does not depend on the units of x or y. We will not change r if we multiply all x's, all y's, or both by a positive constant or if we add any constant to all x's, all y's, or both.

    2. The value of r does not depend on which variable is labeled x.

    3. Correlation is always between -1 and +1.

    4. The sign of r shows whether the relationship between x and y is positive or negative.

    Properties of r (continued)

    5. The absolute value of r measures the strength of the linear relationship between x and y. Roughly speaking:

    a. If |r| < 0.5, the relationship (if any) is weak.
    b. If 0.5 < |r| < 0.8, the association is moderate.
    c. If 0.8 < |r| < 1.0, the association is strong.
    d. If |r| = 1.0, the association is perfect. This occurs only when all (x, y) points fall in a perfect line.

    Note that strength is often context- and discipline-dependent. An engineer might find any correlation less than .95 to be weak, while a social scientist might find a correlation of .3 to be very strong.

  • Math 321 - Dr. Minnotte 39

    Properties of r (continued)

    6. The correlation coefficient cannot measure the strength of a nonlinear (curved) relationship.
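A small illustration of this property, using made-up points that lie exactly on the parabola y = x². Here y is completely determined by x, yet the correlation is 0 because the relationship has no linear component.

```python
import math


def correlation(xs, ys):
    """Sample correlation, written with sums of deviations (the n - 1 factors cancel)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = [-3, -2, -1, 0, 1, 2, 3]  # hypothetical values, symmetric about 0
y = [v ** 2 for v in x]       # perfect curved relationship

r_curve = correlation(x, y)

print(r_curve)
```

Plotting the data first is the only reliable way to catch a pattern like this.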

    7. Outliers can also lead to an inappropriate value, in either direction!

  • Math 321 - Dr. Minnotte 40


    High correlation indicates strong association, not necessarily causality.

    If |r| is large, there are at least 3 possible explanations:

    1) x determines y
    2) y determines x
    3) Some third value, z, (called a confounding factor) determines both x and y.

    Example: Weekly surveys show that per capita chocolate consumption is strongly correlated with traffic fatalities.

    Should driving under the influence of chocolate be outlawed?

    Do people eat a lot of chocolate at funerals?

    Is there a third explanation that makes more sense?

    Math 321 - Dr. Minnotte 121

    Example: Over time, ministers' salaries in Massachusetts are strongly correlated with the price of rum in Havana. What is the causal relationship here?

    Example: Children's shoe size is correlated with size of vocabulary. What is the causal relationship?

  • Math 321 - Dr. Minnotte 41

    One advantage of well-designed randomized, controlled experiments is that potential confounding factors should be (roughly) balanced between levels of the independent variable we are investigating, so should be much less likely to produce a spurious correlation.

    Math 321 - Dr. Minnotte 122

    Math 321 - Dr. Minnotte 123

    Linear Regression (2.2-2.3)

    Definition: Regression involves modeling and predicting the values of one response variable, based on the observed values of one or more other explanatory variables.

    We'll focus on the case of simple linear regression, where a straight line is fit to a scatterplot of x and y.

    Math 321 - Dr. Minnotte 124

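    We want an equation for a line of the form

    $$y = \beta_0 + \beta_1 x$$

    The most common way to estimate $\beta_0$ and $\beta_1$ uses the least squares fit, minimizing

    $$\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$

    This leads to the least squares estimates,

    $$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$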
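
The least squares estimates can be computed directly from paired data. The following is a minimal sketch, assuming the standard closed-form solution (slope = sum of cross-deviations divided by sum of squared x-deviations, intercept = mean of y minus slope times mean of x). The function name `least_squares` and the sample data are illustrative assumptions, not taken from the slides.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up illustrative data (not the Nile flow data from the slides).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = least_squares(xs, ys)
print(f"fitted line: y = {b0:.3f} + {b1:.3f} x")
```

Any points that lie exactly on a line should return that line, which gives a quick sanity check of the function.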

  • Math 321 - Dr. Minnotte 42

    Deviations from a potential regression line:

    Math 321 - Dr. Minnotte 125

    Math 321 - Dr. Minnotte 126

    The least squares line best fits the scatter plot.

    Math 321 - Dr. Minnotte 127

    Example: Nile flow data

    What is the least-squares line for this data, and what should we predict the flow for February to be if January's was 3?

  • Math 321 - Dr. Minnotte 43

    Math 321 - Dr. Minnotte 128

    What would we predict for February from a January value of 10?

    Is this likely to be a valid prediction? (Recall, January's mean is about 4, and its standard deviation is about 1.)

    Extrapolation outside the range of the data is dangerous.

    Math 321 - Dr. Minnotte 129

    Math 321 - Dr. Minnotte 130

    Residuals and Goodness-of-Fit

    Definition: Given a data set (xi, yi) and an associated fitted regression model, the fitted value for observation i is

    $$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$$

    Definition: The residual for i is

    $$e_i = y_i - \hat{y}_i$$

    The smaller the residuals, the better x and the regression line are at predicting y.

  • Math 321 - Dr. Minnotte 44

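    The error sum of squares (SSE) is

    $$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

    SSE is usually compared to the total sum of squares, SST:

    $$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

    and the regression sum of squares, SSR:

    $$SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$$

    To avoid having to calculate all the residuals, we may use the computing formula:

    SSE = SST - SSR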

    Math 321 - Dr. Minnotte 131

    Math 321 - Dr. Minnotte 132

    The coefficient of determination, r², measures the proportion of the total variation of y which is explained by x:

    $$r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

    The closer r² is to 1, the more successful the relationship is at explaining the variation in y.

    As the notation suggests, the coefficient of determination is the square of the correlation coefficient.

    Math 321 - Dr. Minnotte 133

  • Math 321 - Dr. Minnotte 45

    Math 321 - Dr. Minnotte 134

    Example: Nile flow data:

    Find SST, SSR, SSE, and r².

    What do these say about our predictions?

    Note: r = 0.933.

    The coefficient of determination r² is found as R-Sq in Minitab output.

    The sums of squares may be found in the SS column of the Analysis of Variance table.

    The regression equation is
    February Inflow = - 0.4698 + 0.8362 January Inflow

    S = 0.330519   R-Sq = 87.1%   R-Sq(adj) = 87.0%

    Analysis of Variance
    Source        DF       SS       MS        F      P
    Regression     1  83.3794  83.3794   763.25  0.000
    Error        113  12.3444   0.1092
    Total        114  95.7238
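
The Minitab figures can be cross-checked with a few lines of arithmetic. This sketch only reuses the coefficients and sums of squares printed in the output above; the function name `predict_february` is an illustrative assumption.

```python
# Values copied from the Minitab output in the slides.
b0, b1 = -0.4698, 0.8362                    # intercept and slope
ssr, sse, sst = 83.3794, 12.3444, 95.7238   # regression, error, total SS


def predict_february(january):
    """Predicted February inflow from the fitted line."""
    return b0 + b1 * january


r_squared = ssr / sst     # proportion of variation explained
r = r_squared ** 0.5      # positive root, since the slope is positive

print(round(predict_february(3), 4))    # prediction inside the data range
print(round(predict_february(10), 4))   # extrapolation far outside the data
print(round(sst - ssr, 4))              # computing formula for SSE
print(round(r_squared, 3), round(r, 3))
```

The prediction for a January value of 10 is produced just as readily as the one for 3, which is exactly why extrapolation is dangerous: the formula gives no warning that 10 lies far outside the observed January values.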

    Math 321 - Dr. Minnotte 135

    Math 321 - Dr. Minnotte 136

    Chapter 3: Probability

    Definition: Probability is the branch of mathematics dealing with chance, randomness, and uncertainty.

    Probability provides most of the mathematical foundation for inferential statistics.

  • Math 321 - Dr. Minnotte 46

    Math 321 - Dr. Minnotte 137

    Definition: A situation for which the outcome cannot be determined in advance is called an experiment.

    Examples: The roll of a die. The draw of a card. The lifetime of an electronic component.

    Math 321 - Dr. Minnotte 138

    Definition: The sample space, S, of an experiment is the set of all possible outcomes.

    Examples: Die: S = {1, 2, 3, 4, 5, 6} Card: S = ? Component: S = ?

    An experiment with several steps can be visually represented by a tree diagram:

    Example: Toss a coin three times:

    Math 321 - Dr. Minnotte 139

  • Math 321 - Dr. Minnotte 47

    Math 321 - Dr. Minnotte 140

    Events

    Definition: Set A is a subset of set B (A ⊂ B) if every element of A is also in B.

    Example: S = {1, 2, 3, 4, 5, 6}
    A = {1, 3, 5} ⊂ S
    B = {1, 2, 6, 7} ⊄ S

    Every set is a subset of itself.

    The empty set, ∅, consisting of no elements, is a subset of every set.

    Math 321 - Dr. Minnotte 141

    Definition: Any interesting subset of the sample space can be called an event.

    Examples: Die: A = odd numbers = {1, 3, 5} Card: B = ? Component: C = ?

    The individual outcomes which make up S are sometimes called simple events.

    Math 321 - Dr. Minnotte 142

    Combining Events

    For subsets A and B of S (A ⊂ S, B ⊂ S):

    1) The union of A and B (A ∪ B) is the set consisting of all elements found in A, B, or both.

    Keyword: or

    Example: S = {1, 2, 3, 4, 5, 6}; A = {1, 3, 5} ⊂ S; B = {1, 2, 3} ⊂ S; A ∪ B = ?

  • Math 321 - Dr. Minnotte 48

    Math 321 - Dr. Minnotte 143

    2) The intersection of A and B (A ∩ B) is the set consisting of all elements found in both A and B.

    Keywords: and, both

    Example: S = {1, 2, 3, 4, 5, 6}; A = {1, 3, 5}; B = {1, 2, 3}; A ∩ B = ?

    Math 321 - Dr. Minnotte 144

    3) The complement of A (A^c) is the set consisting of all elements of S not found in A.

    Keyword: not

    Example: S = {1, 2, 3, 4, 5, 6}; A = {1, 3, 5}; A^c = ?

    Math 321 - Dr. Minnotte 145

    4) Sets A and B are said to be mutually exclusive if there are no elements in both A and B. That is, if A ∩ B = ∅ (the empty set).

    Example: S = {1, 2, 3, 4, 5, 6}; A = {1, 3, 5}; C = {4, 6}. A ∩ C = ∅, so A and C are mutually exclusive.
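
The four set operations above map directly onto Python's built-in `set` type. This is a minimal sketch using the die sample space from the examples; the variable names are illustrative.

```python
S = {1, 2, 3, 4, 5, 6}   # sample space for one die roll
A = {1, 3, 5}
B = {1, 2, 3}
C = {4, 6}

union = A | B                          # elements in A, B, or both ("or")
intersection = A & B                   # elements in both A and B ("and")
complement_A = S - A                   # elements of S not in A ("not")
mutually_exclusive = A.isdisjoint(C)   # True when A and C share no elements

print(union, intersection, complement_A, mutually_exclusive)
print(A <= S, {1, 2, 6, 7} <= S)       # subset checks
```

The complement is always taken relative to the sample space S, which is why it is written as a set difference from S rather than as a standalone operation.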

  • Math 321 - Dr. Minnotte 49

    Example: Three coin tosses.

    S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}

    Let A = "First toss is a head" = ?
    Let B = "Last toss is a head" = ?

    What simple events make up the event A and B? A or B? Not A?

    Are A and B mutually exclusive?

    Math 321 - Dr. Minnotte 146

    Math 321 - Dr. Minnotte 147

    The Axioms of Probability

    Definition: A probability function P(·) is a function from subsets of S (events) to the real numbers which satisfies the following axioms of probability:

    1) P(S) = 1.
    2) 0 ≤ P(A) ≤ 1 for all events A.
    3) If A and B are mutually exclusive, P(A ∪ B) = P(A) + P(B).

    Math 321 - Dr. Minnotte 148

    Example: A fair die. P(1) = 1/6, P(2) = 1/6, P(3) = 1/6, P(4) = 1/6,

    P(5) = 1/6, P(6) = 1/6.

    Probabilities of bigger events are found by axiom 3:

    P({1,3}) = P(1) + P(3) = 1/6 + 1/6 = 2/6 = 1/3 P({1,3,5}) = ?

  • Math 321 - Dr. Minnotte 50

    Math 321 - Dr. Minnotte 149

    Example: A biased die. P(1) = 1/12, P(2) = 1/6, P(3) = 1/6, P(4) = 1/6,

    P(5) = 1/6, P(6) = 3/12 = 1/4.

    Note:

    (as required by axiom 2) P({1,3}) = P(1) + P(3) = 1/12 + 1/6 = 1/4 P({1,3,5}) = ?
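
The biased-die example can be checked with exact fractions. This is a small sketch, assuming the outcome probabilities listed above; the helper name `prob` is an illustrative assumption.

```python
from fractions import Fraction as F

# Outcome probabilities for the biased die in the example.
biased = {1: F(1, 12), 2: F(1, 6), 3: F(1, 6), 4: F(1, 6), 5: F(1, 6), 6: F(1, 4)}


def prob(event, pmf):
    """Probability of an event (a set of outcomes), by additivity (axiom 3)."""
    return sum((pmf[outcome] for outcome in event), F(0))


total = prob(set(biased), biased)   # probability of the whole sample space
p_13 = prob({1, 3}, biased)

print(total, p_13)
```

Using `Fraction` avoids floating-point rounding, so the check that the whole sample space has probability 1 is exact.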

    Math 321 - Dr. Minnotte 150

    When applied to real experiments, probability measures (long-term) likelihood: if the experiment is repeated many times, event A should occur roughly P(A) fraction of the time.

    Math 321 - Dr. Minnotte 151

    Additional Properties of Probability

    The axioms of probability imply some additional properties:

    1) For any event A, P(A^c) = 1 - P(A).

    This is sometimes called the complementary events rule, or the opposites rule.

    Show:

    Note: Since S^c = ∅, P(∅) = 0.

  • Math 321 - Dr. Minnotte 51

    Math 321 - Dr. Minnotte 152

    2) For any events A and B, P(A ∪ B) = P(A) + P(B) - P(A ∩ B).

    This is sometimes called the general addition rule.

    Show:

    Note: if A and B are mutually exclusive, P(A ∩ B) = P(∅) = 0, so this is the same as axiom 3.

    Math 321 - Dr. Minnotte 153

    Example: A fair die.

    P(1) = 1/6, P(2) = 1/6, P(3) = 1/6, P(4) = 1/6, P(5) = 1/6, P(6) = 1/6.

    A = {1, 3, 5}, P(A) = 3/6 = 1/2.
    B = {1, 2}, P(B) = 2/6 = 1/3.
    P(A^c) = ?

    A ∩ B = {1}, P(A ∩ B) = 1/6.
    P(A ∪ B) = ?

    Math 321 - Dr. Minnotte 154

    We don't need to know the entire probability function to use these.

    Example: Lifetime of a component (T). Suppose we know:

    P(A) = P(T ≤ 60) = .47
    P(B) = P(40 ≤ T ≤ 80) = .34
    P(A ∩ B) = P(40 ≤ T ≤ 60) = .26

    Then:

    P(T > 60) = ?
    P(lifetime no more than 80) = ?

  • Math 321 - Dr. Minnotte 52

    Math 321 - Dr. Minnotte 155

    Example: Suppose the probability that an integrated circuit chip has defective etching is 0.12. The probability that the chip has a crack defect is 0.29. And the probability of both defects is 0.07.

    What is the probability the chip does not have defective etching?

    What is the probability it has at least one defect?

    What is the probability it has neither defect?
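
The three questions follow from the complement rule and the general addition rule. A minimal sketch using the probabilities stated in the example (variable names are illustrative):

```python
from fractions import Fraction

p_etch = Fraction(12, 100)    # P(defective etching)
p_crack = Fraction(29, 100)   # P(crack defect)
p_both = Fraction(7, 100)     # P(both defects)

p_no_etch = 1 - p_etch                        # complement rule
p_at_least_one = p_etch + p_crack - p_both    # general addition rule
p_neither = 1 - p_at_least_one                # complement of "at least one"

print(float(p_no_etch), float(p_at_least_one), float(p_neither))
```

Subtracting `p_both` matters here: simply adding the two defect probabilities would count chips with both defects twice.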

    Math 321 - Dr. Minnotte 156

    Equally Likely Outcomes

    If S consists of N equally likely outcomes, and event A consists of k of them, P(A) = k/N.

    Example: A fair die (see slides 148, 153).

    Example: Draw a card at random from a standard deck (52 cards, 13 spades). What is the probability of drawing a spade?

    Example: A shipment of 1000 hard drives contains 6 which do not work. If we draw one at random, what is the probability of selecting a defective drive?

    Math 321 - Dr. Minnotte 157

    Conditional Probability (3.2)

    Suppose we have partial information about the outcome of an experiment. In particular, suppose we know that the event B has occurred.

    We may use this information to revise the probability of another event, A.

    We call the revised probability a conditional probability, as it depends on the condition of B being true.

  • Math 321 - Dr. Minnotte 53

    Math 321 - Dr. Minnotte 158

    Example: Fair die.

    Let A = {1, 3, 5}; P(A) = 3/6 = 1/2
    B = {1, 2, 3}; P(B) = 3/6 = 1/2
    P(A ∩ B) = P({1, 3}) = 2/6 = 1/3

    If I roll the die and, without showing you, tell you event B has occurred (I rolled no greater than 3), now what is the probability of event A?

    Math 321 - Dr. Minnotte 159

    Since B has occurred, the sample space reduces to B: {1, 2, 3}.

    Two of the three possibilities are odd (in A), and the chances are still equal. So P(A|B) = 2/3.

    Once we know the roll is 3 or less, the probability increases to 2/3 that it's odd.

    Math 321 - Dr. Minnotte 160

    Definition: The conditional probability of A given B is

    $$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

    (undefined if P(B) = 0).

    This is the probability, given that event B has occurred, that event A has also occurred.

    Die: P(A|B) = P(A ∩ B) / P(B) = (1/3) / (1/2) = 2/3.

  • Math 321 - Dr. Minnotte 54

    Example (continued from slide 155): P(defective etching) = 0.12. P(crack defect) = 0.29. P(etching and crack defects) = 0.07.

    If a chip has a crack defect, what is the (conditional) probability that it also has defective etching?

    Math 321 - Dr. Minnotte 161

    What is the probability that a chip has a crack defect but satisfactory etching?

    If a chip has a crack defect, what is the probability that it has satisfactory etching?

    Note: P(A|B) = 1 - P(A^c|B), just like P(A) = 1 - P(A^c).

    Math 321 - Dr. Minnotte 162

    If a chip has defective etching, what is the probability that it also has a crack defect?

    In general, P(A|B) and P(B|A) are not equal, and knowing one does not by itself tell us the other.
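
The chip questions on the preceding slides all use the ratio P(A and B) / P(B). A minimal sketch with the stated probabilities; the helper name `conditional` is an illustrative assumption.

```python
from fractions import Fraction

p_etch = Fraction(12, 100)    # P(defective etching)
p_crack = Fraction(29, 100)   # P(crack defect)
p_both = Fraction(7, 100)     # P(both defects)


def conditional(p_joint, p_given):
    """P(A | B) = P(A and B) / P(B); undefined when P(B) = 0."""
    if p_given == 0:
        raise ValueError("conditional probability undefined when P(B) = 0")
    return p_joint / p_given


p_etch_given_crack = conditional(p_both, p_crack)
# Crack defect with satisfactory etching: remove the "both" part from "crack".
p_crack_and_ok_etch = p_crack - p_both
p_ok_etch_given_crack = conditional(p_crack_and_ok_etch, p_crack)
p_crack_given_etch = conditional(p_both, p_etch)

print(p_etch_given_crack, p_ok_etch_given_crack, p_crack_given_etch)
```

Comparing `p_etch_given_crack` with `p_crack_given_etch` shows how different the two directions of conditioning can be, even though both use the same joint probability.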

    Math 321 - Dr. Minnotte 163

  • Math 321 - Dr. Minnotte 55

    Math 321 - Dr. Minnotte 164

    Independence

    Definition: If P(A ∩ B) = P(A) P(B), we say A and B are independent.

    If A and B are independent, P(A) > 0, P(B) > 0, then

    $$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A)\,P(B)}{P(B)} = P(A)$$

    Likewise, P(B|A) = P(B). Your book uses this as the definition of independence.

    Math 321 - Dr. Minnotte 165

    Assuming P(A) > 0, P(B) > 0, any one of

    P(A ∩ B) = P(A) P(B)
    P(A|B) = P(A)
    P(B|A) = P(B)

    proves independence and the other two.

    Math 321 - Dr. Minnotte 166

    Example: Draw one card at random from a well-shuffled deck. Define:

    A = {draw a club} B = {draw an ace} C = {draw a red card}

    Are A and B independent? A and C?
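
One way to check independence is to enumerate the deck and compare P(A and B) with P(A) P(B). This is a sketch under the assumption of a standard 52-card deck with equally likely cards; the helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))   # 52 equally likely (rank, suit) cards


def prob(event):
    """P(event) = (number of cards satisfying event) / (number of cards)."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))


def independent(e1, e2):
    """True when P(e1 and e2) equals P(e1) * P(e2)."""
    return prob(lambda c: e1(c) and e2(c)) == prob(e1) * prob(e2)


def is_club(card):
    return card[1] == "clubs"


def is_ace(card):
    return card[0] == "A"


def is_red(card):
    return card[1] in ("diamonds", "hearts")


print(independent(is_club, is_ace))   # A and B
print(independent(is_club, is_red))   # A and C
```

Clubs and red cards never occur together, so they are mutually exclusive, and the product test fails for them.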

  • Math 321 - Dr. Minnotte 56

    Note that events being mutually exclusive and their being independent is not the same thing.

    Show: If P(A) > 0, P(B) > 0, and A and B are mutually exclusive, they cannot be independent!

    Math 321 - Dr. Minnotte 167

    Math 321 - Dr. Minnotte 168

    We'll often assume independence to calculate probabilities of intersections.

    Example: Roll a red die and a black die.
    A = {red 6}, P(A) = 1/6 (fair dice)
    B = {black 6}, P(B) = 1/6

    Results on one die shouldn't influence the other, so we assume independence.

    P(double-sixes) = P(A ∩ B) = P(A) P(B) = (1/6)(1/6) = 1/36.

    Math 321 - Dr. Minnotte 169

    This extends to more than 2 events.

    The multiplication law for independent events says that if events A1, A2, ..., An are independent (that is, knowledge of any combination of the Ai's does not change the probabilities of the remainder), then

    P(A1 ∩ A2 ∩ ... ∩ An) = P(A1) P(A2) ... P(An).

    Note: this is the probability that all n events occur.

  • Math 321 - Dr. Minnotte 57

    Math 321 - Dr. Minnotte 170

    Example: Flip a fair coin 4 times.

    Let Ai = {Flip i is a head}. P(Ai) = 1/2, i = 1, 2, 3, 4

    Separate flips are independent. (Why?)

    P(4 heads) = P(A1 ∩ A2 ∩ A3 ∩ A4)
    = P(A1) P(A2) P(A3) P(A4)
    = (1/2) (1/2) (1/2) (1/2)
    = 1/16.

    Math 321 - Dr. Minnotte 171

    Example: Draw a card from a standard deck 3 times with replacement (replace and reshuffle after each draw).

    Let Ai = {Draw i is a spade}. P(Ai) = 13/52 = 1/4, i = 1, 2, 3 Separate draws are independent. (Why?) P(3 spades) = ?

    Math 321 - Dr. Minnotte 172

    What if events aren't independent?

    Recall,

    $$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

    Therefore, P(A ∩ B) = P(A|B) P(B).

    The general multiplication law:

    P(A1 and A2) = P(A1) P(A2|A1).

  • Math 321 - Dr. Minnotte 58

    Math 321 - Dr. Minnotte 173

    Example: Suppose we have 4 cards, labeled 1, 2, 3, and 4. Suppose we draw two at random without replacement. What is the probability both cards are odd?

    Math 321 - Dr. Minnotte 174

    Example: Suppose we draw two cards at random without replacement from a standard deck. What is the probability both cards are spades?
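
Both without-replacement examples can be checked by listing every ordered pair of draws and counting. This is a minimal sketch, assuming each ordered pair is equally likely; the helper `prob_both` and the deck encoding are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations


def prob_both(items, wanted):
    """P(both of two draws without replacement satisfy `wanted`)."""
    pairs = list(permutations(items, 2))   # all ordered draws of two items
    hits = sum(1 for a, b in pairs if wanted(a) and wanted(b))
    return Fraction(hits, len(pairs))


# Four cards labeled 1 to 4: both odd.
p_odd = prob_both([1, 2, 3, 4], lambda card: card % 2 == 1)

# Standard deck encoded as (rank, suit): both spades.
deck = [(rank, suit) for suit in "SHDC" for rank in range(1, 14)]
p_spades = prob_both(deck, lambda card: card[1] == "S")

# The general multiplication law gives the same answers:
# P(first) * P(second | first).
law_odd = Fraction(2, 4) * Fraction(1, 3)
law_spades = Fraction(13, 52) * Fraction(12, 51)

print(p_odd, law_odd)
print(p_spades, law_spades)
```

The second factor in each product is a conditional probability: after one favorable card is removed, both the favorable count and the deck size drop by one.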

    Math 321 - Dr. Minnotte 175

    Random Variables (3.3)

    Definition: A random variable is a random number. It is obtained by assigning a number to each outcome of an experiment.

    Example: Roll a die. The number rolled is a random variable.

  • Math 321 - Dr. Minnotte 59

    Math 321 - Dr. Minnotte 176

    Example: Flip a coin 5 times. Is the sequence of heads and tails a random variable (Example: HHTHT)?

    Some random variables we could generate from 5 coin flips:

    X = # H
    Y = # H - # T
    Z = # H before first T

    We usually denote random variables by capital letters from the end of the alphabet.

    Math 321 - Dr. Minnotte 177

    Example: Select a rat at random from a large colony. What are some possible random variables?

    Math 321 - Dr. Minnotte 178

    There are two main types of random variables: discrete and continuous.

    Definition: A discrete random variable can only take on a specified (countable) list of values. There is a gap between any two elements in its sample space.

    In practice, these are usually counts of some sort, and thus whole numbers.

    Example: Number of heads in 5 coin flips.

  • Math 321 - Dr. Minnotte 60

    Math 321 - Dr. Minnotte 179

    Definition: A continuous random variable may take any real number in some (set of) interval(s).

    Examples: Weight, lifetime.

    We will need to deal differently with discrete and continuous random variables.

    Math 321 - Dr. Minnotte 180

    Discrete Random Variables

    Definition: The probability mass function (p.m.f.) of a discrete random variable X is a function p(·) from the support of X to the real numbers, where

    p(x) = P(X = x) .

    Notation: X: capital letter, indicates a random variable. x: lowercase letter, indicates a specific value.

    Math 321 - Dr. Minnotte 181

    Example: Let X be the roll of a fair die.

    S = {1, 2, 3, 4, 5, 6}
    p(1) = P(X = 1) = 1/6
    p(2) = P(X = 2) = 1/6
    and so on.

    We might write

    p(x) = 1/6, x ∈ {1, 2, 3, 4, 5, 6}

  • Math 321 - Dr. Minnotte 61

    Math 321 - Dr. Minnotte 182

    Example: An industrial plant has 3 machines. The probability that X machines are operating at a given random time may be found from

    x      0     1     2     3
    p(x)   0.12  0.27  0.46  0.15

    Math 321 - Dr. Minnotte 183

    The laws of probability tell us that:

    1) ? ≤ p(x) ≤ ? for all x

    2) Σ p(x) = ? (summing over all x ∈ S)

    Math 321 - Dr. Minnotte 184

    A p.m.f. is plotted as spikes:

  • Math 321 - Dr. Minnotte 62

    Or as a probability histogram, with areas equal to probabilities:

    Math 321 - Dr. Minnotte 185

    Math 321 - Dr. Minnotte 186

    Math 321 - Dr. Minnotte 187

  • Math 321 - Dr. Minnotte 63

    Math 321 - Dr. Minnotte 188

    Continuous Random Variables

    Recall, a continuous random variable may take any value in some real interval.

    Continuous random variables are typically measurements (length, weight, lifetime, etc.).

    Math 321 - Dr. Minnotte 189

    With continuous random variables, we can't use a p.m.f. to find probabilities. Instead:

    Definition: A probability density function (density, p.d.f.), f(x), is a function which determines the probability properties of a continuous random variable. If X ~ f(x), then

    $$P(a \le X \le b) = \int_a^b f(x)\,dx$$

    Math 321 - Dr. Minnotte 190

    If f(x) is a p.d.f.:

    f(x) ? for all x, and

    Note: for a continuous random variable,

    Why?

  • Math 321 - Dr. Minnotte 64

    Math 321 - Dr. Minnotte 191

    Example: a continuous random variable has p.d.f.

    Is f(x) a true p.d.f.?

    Math 321 - Dr. Minnotte 192

    Example (continued): What is the probability that X will be between 0.5 and 1.0?

    P(2.5 ≤ X ≤ 3.0) = ?

    P(0.2 ≤ X ≤ 0.2) = ?

    P(X < 1.0) = ?

    Math 321 - Dr. Minnotte 193

  • Math 321 - Dr. Minnotte 65

    Math 321 - Dr. Minnotte 194

    Definition: The cumulative distribution function (c.d.f.), F(x), of a random variable is defined as

    F(x) = P(X ≤ x).

    If X is continuous,

    $$F(x) = \int_{-\infty}^{x} f(t)\,dt$$

    Math 321 - Dr. Minnotte 195

    Properties of continuous c.d.f.s:

    1) lim F(x) = 0 as x → -∞
    2) lim F(x) = 1 as x → ∞
    3) F is nondecreasing (if x < y, F(x) ≤ F(y)).
    4) P(a ≤ X ≤ b) = P(X ≤ b) - P(X ≤ a) = F(b) - F(a).

    This is often easier than integrating f(x).

    Math 321 - Dr. Minnotte 196

    Example (back to earlier p.d.f.):

    P(0.5 ≤ X ≤ 1.0) = ? (Compare to slide 192.)

  • Math 321 - Dr. Minnotte 66

    Math 321 - Dr. Minnotte 197

    The Population Mean

    Definition: The population mean (expectation, expected value) of random variable X is

    $$\mu = E(X) = \sum_{x} x\,p(x)$$

    if X is discrete, and

    $$\mu = E(X) = \int_{-\infty}^{\infty} x\,f(x)\,dx$$

    if X is continuous.

    It can be thought of as the long-term average of X, or the mean of a sample that follows the distribution of X perfectly.

    Math 321 - Dr. Minnotte 198

    Example: Die roll
    p(x) = 1/6, x ∈ {1, 2, ..., 6}
    μ = ?

    Example: Machines

    x      0     1     2     3
    p(x)   0.12  0.27  0.46  0.15

    μ = ?

    Math 321 - Dr. Minnotte 199

    Example:

    μ = ?

    Example:

    μ = ?

  • Math 321 - Dr. Minnotte 67

    Math 321 - Dr. Minnotte 200

    Expectations of Functions of Random Variables

    Given a random variable, X, suppose we are really interested in a function, h(X).

    The expected value of h(X) is

    $$E[h(X)] = \sum_{x} h(x)\,p(x)$$

    if X is discrete, and

    $$E[h(X)] = \int_{-\infty}^{\infty} h(x)\,f(x)\,dx$$

    if X is continuous.

    Example: X ~ p(x) = , x = 1, 2. What is E(X²)?

    Note: In general, E[h(X)] ≠ h[E(X)].

    Example: For the above p.m.f., what is E(X)? [E(X)]²?

    Is E(X²) = [E(X)]²?

    Math 321 - Dr. Minnotte 201

    Math 321 - Dr. Minnotte 202

    The Population Variance and Standard Deviation

    Just as we have a population mean to measure the center of a distribution, the population variance and standard deviation measure a distribution's spread.

  • Math 321 - Dr. Minnotte 68

    Math 321 - Dr. Minnotte 203

    Definition: Let X be a random variable with mean μ. Then the population variance of X, σ², is

    $$\sigma^2 = V(X) = E[(X - \mu)^2] = E(X^2) - \mu^2$$

    Definition: The population standard deviation, σ, of random variable X is the square root of the variance of X.

    Math 321 - Dr. Minnotte 204

    Example: Die roll
    p(x) = 1/6, x ∈ {1, 2, ..., 6}
    μ = ? E(X²) = ? V(X) = ? σ = ?

    Example: p(x) = 1/2, x ∈ {3, 4}
    μ = ? E(X²) = ? V(X) = ? σ = ?

    Math 321 - Dr. Minnotte 205

    Example: Machines

    x      0     1     2     3
    p(x)   0.12  0.27  0.46  0.15

    μ = ? E(X²) = ?

    V(X) = ?

    σ = ?
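
For a discrete random variable, the mean and variance are weighted sums over the p.m.f. The sketch below assumes the computing formula V(X) = E(X²) - μ² and represents a p.m.f. as a dictionary from values to probabilities; the function names are illustrative.

```python
from fractions import Fraction
from math import sqrt


def mean(pmf):
    """E(X): sum of x * p(x) over the support."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """V(X) = E(X^2) - [E(X)]^2."""
    mu = mean(pmf)
    return sum(x * x * p for x, p in pmf.items()) - mu * mu


die = {x: Fraction(1, 6) for x in range(1, 7)}
machines = {0: Fraction(12, 100), 1: Fraction(27, 100),
            2: Fraction(46, 100), 3: Fraction(15, 100)}

for name, pmf in [("die", die), ("machines", machines)]:
    mu, var = mean(pmf), variance(pmf)
    print(name, float(mu), float(var), sqrt(var))
```

The die variance matches the value 35/12 quoted later in the slides, which is a useful check on the computing formula.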

  • Math 321 - Dr. Minnotte 69

    Math 321 - Dr. Minnotte 206

    Example:

    μ = ? E(X²) = ?

    V(X) = ?

    σ = ?

    Math 321 - Dr. Minnotte 207

    Linear Functions of Random Variables (3.4)

    Recall, a linear function (or linear combination) of variables x1, x2, ..., xn, is a function of the form

    f(x1, x2, ..., xn) = a1x1 + a2x2 + ... + anxn + b

    where b and all of the ai's are fixed constants.

    Math 321 - Dr. Minnotte 208

    Given any random variables X1, X2, ..., Xn and known constants a1, a2, ..., an, and b, then

    E(a1X1 + a2X2 + ... + anXn + b) = a1E(X1) + a2E(X2) + ... + anE(Xn) + b.

    To find the expectation of a linear combination of random variables, we need only know the constants and the expectation of each random variable individually.

  • Math 321 - Dr. Minnotte 70

    Math 321 - Dr. Minnotte 209

    Example: Let X be a random temperature measured in degrees Celsius, with E(X) = 10. Let Y be the same temperature in degrees Fahrenheit, Y = 9/5 X + 32. What is E(Y)?

    Example: The expectation of the roll of a fair die is 3.5. What is the expectation of the sum of four such rolls?

    Independent Random Variables

    Recall, events are said to be independent if knowledge of one does not affect the probability of the other.

    Likewise, random variables X and Y are independent if knowing the value of X does not affect probabilities of Y, no matter what value X takes (and vice-versa).

    Math 321 - Dr. Minnotte 210

    Math 321 - Dr. Minnotte 211

    If X and Y are independent, any event involving X alone will be independent from any event involving Y alone.

    P(X ∈ A and Y ∈ B) = P(X ∈ A) P(Y ∈ B) for any A and B.

    Draws with replacement are independent.

    Draws in a simple random sample are not independent, but may be treated as though they are if the sample size is much smaller than the population size.

  • Math 321 - Dr. Minnotte 71

    Math 321 - Dr. Minnotte 212

    If the random variables are independent, then

    V(a1X1 + a2X2 + ... + anXn + b) = a1² V(X1) + a2² V(X2) + ... + an² V(Xn).

    Notes:
    The shift b does not affect the variance.
    The coefficients ai are squared.
    Dependent random variables require a more complex formula.

    Math 321 - Dr. Minnotte 213

    Example: Let the variance of the Celsius temperature X be V(X) = 25.

    What is the standard deviation of X?

    What is the variance of Y = 9/5 X + 32?

    What is the standard deviation of Y?

    Math 321 - Dr. Minnotte 214

    Example: The variance of the roll of a fair die is 35/12. What is the variance of the sum of four such rolls?

    If we take a single roll and multiply it by 4, what is the variance of the result? Why is this different?
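
The difference between the two cases can be seen by enumerating equally likely outcomes. This sketch assumes fair, independent dice; the helper `variance` here computes a population variance over an explicit list of equally likely values and is an illustrative assumption.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)


def variance(values):
    """Population variance of a list of equally likely values."""
    values = list(values)
    n = len(values)
    mu = Fraction(sum(values), n)
    return sum((v - mu) ** 2 for v in values) / n


var_one = variance(faces)                                                 # single roll
var_sum4 = variance(sum(rolls) for rolls in product(faces, repeat=4))     # X1+X2+X3+X4
var_4x = variance(4 * x for x in faces)                                   # 4 * X1

print(var_one, var_sum4, var_4x)
```

The sum of four independent rolls has four coefficients of 1, so its variance is 4 V(X). Multiplying one roll by 4 has a single coefficient of 4, which is squared, giving 16 V(X).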

  • Math 321 - Dr. Minnotte 72

    Math 321 - Dr. Minnotte 215

    Suppose X and Y each have mean 10 and variance 4. What are the mean and variance of Z = X - Y?

    Math 321 - Dr. Minnotte 216

    Mean and Variance of the Sample Mean

    An important special case concerns the sample mean of the Xi's,

    $$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$

    Note that $\bar{X}$ is a linear combination of the Xi's.

    Math 321 - Dr. Minnotte 217

    Theorem: If X1, X2, ..., Xn are independent random variables, each with E(Xi) = μ and V(Xi) = σ², then

    $$E(\bar{X}) = \mu$$

    and

    $$V(\bar{X}) = \frac{\sigma^2}{n}$$

    Proof:
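
One possible derivation, sketched under the assumption that the theorem states the standard results E(X̄) = μ and V(X̄) = σ²/n. It treats X̄ as a linear combination with every coefficient equal to 1/n and b = 0, then applies the linear-combination rules for expectation and (using independence) for variance.

```latex
\begin{align*}
E(\bar{X}) &= E\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n}\sum_{i=1}^{n} E(X_i)
            = \frac{1}{n}\,(n\mu) = \mu, \\[4pt]
V(\bar{X}) &= V\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n^2}\sum_{i=1}^{n} V(X_i)
            = \frac{1}{n^2}\,(n\sigma^2) = \frac{\sigma^2}{n}.
\end{align*}
```

Independence is needed only for the variance line; the expectation line holds for any random variables with a common mean.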

  • Math 321 - Dr. Minnotte 73

    Math 321 - Dr. Minnotte 218

    Example: A (possibly biased) coin has probability p of coming up heads. We flip it and let X = 1 if heads, 0 if tails.

    What are E(X) and V(X)?

    Suppose we flip it n times, and look at

    Chapter 4: Common Distributions

    Often we will have useful mathematical forms which represent entire families of distributions.

    These distributions include one or more constants (called parameters) which must be specified to define a specific distribution.

    We will concentrate on two especially important families, the binomial and normal distributions.

    Math 321 - Dr. Minnotte 219

    The Binomial Distribution (4.1)

    The binomial distribution is the most important common named family of discrete distributions.

    Recall, a discrete distribution is described by a probability mass function p(·), where

    p(0) = P(X = 0)
    p(1) = P(X = 1)
    and so on.

    Math 321 - Dr. Minnotte 220

  • Math 321 - Dr. Minnotte 74

    Suppose our experiment consists of trials with only two possible outcomes.

    One outcome, called a success, occurs with probability p.

    The other outcome is called a failure, and occurs with probability (1 - p).

    Such a process is called a Bernoulli trial (after 17th-century probabilist James Bernoulli).

    The binomial distribution looks at a fixed number of independent identical Bernoulli trials, and counts the number of successes.

    Math 321 - Dr. Minnotte 221

    Example: Suppose silicon computer chips are made in pairs, and that 30% of all chips produced are defective.

    Also assume that the chips in a pair are independent of each other.

    Out of pairs in which the first chip is good, the second is defective in 30% of pairs. This remains true for pairs in which the first chip is defective.

    Math 321 - Dr. Minnotte 222

    Out of all pairs, 70% will have a good first chip. Out of those, 70% will also have a good second chip. Overall, 70% of 70%, or 49% (.7*.7 = .49) will have two good chips.

    Likewise, 30% of that 70%, or 21% overall (.7*.3 = .21) will have a good first chip and a defective second chip.

    By the same reasoning, 30% will have a defective first chip, and 70% of those (21% overall) will have a good second chip.

    Finally, 30% of 30%, or 9% will have both chips defective.

    Math 321 - Dr. Minnotte 223

  • Math 321 - Dr. Minnotte 75

    If we let the letter S (for success) represent a good chip, and F (for failure) represent a defective one, we can summarize as:

    P(SS) = .7*.7 = .49 P(SF) = .7*.3 = .21 P(FS) = .3*.7 = .21 P(FF) = .3*.3 = .09

    Math 321 - Dr. Minnotte 224

    Now let X be the number of good chips produced in a pair.

    Then X can take the values 0, 1, or 2.

    From the above, p(0) = P(X = 0) = P(FF) = .09 p(2) = P(X = 2) = P(SS) = .49 p(1) = P(X = 1) = P(SF or FS) = .21 + .21

    = .42

    Math 321 - Dr. Minnotte 225

    What if the chips are produced in sets of 4?

    If we want the probability of a set consisting of 2 good and 2 defective chips, we can think about the case of SSFF: the first and second chips are good, while the third and fourth are defective.

    The probability of this particular outcome will be .7*.7*.3*.3 = .0441 or 4.41%.

    Math 321 - Dr. Minnotte 226

  • Math 321 - Dr. Minnotte 76

    But there are other ways we can have two successes and two failures (5 other ways, in this case):

    P(SSFF) = .7*.7*.3*.3 = .0441
    P(SFSF) = .7*.3*.7*.3 = .0441
    P(SFFS) = .7*.3*.3*.7 = .0441
    P(FSSF) = .3*.7*.7*.3 = .0441
    P(FSFS) = .3*.7*.3*.7 = .0441
    P(FFSS) = .3*.3*.7*.7 = .0441

    Overall, p(2) = P(X = 2) = 6*.0441 = .2646.

    Math 321 - Dr. Minnotte 227

    Math 321 - Dr. Minnotte 228

    In general, suppose we have an experiment consisting of n independent Bernoulli trials.

    Those trials which satisfy the condition we wish to count are called successes, and occur with probability p.

    The remaining trials are called failures; these occur with probability (1 - p).

    Let X be the number of successes in the full experiment.

    Math 321 - Dr. Minnotte 229

    If these conditions are true, we say that X, the number of successes in the experiment, has a binomial distribution with parameters n and p.

    X ~ Binomial(n, p) or X ~ Bin(n, p).

    The mass function for X is:

    $$p(x) = P(X = x) = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x}, \qquad x = 0, 1, \ldots, n$$

  • Math 321 - Dr. Minnotte 77

    Math 321 - Dr. Minnotte 230

    Note: the exclamation mark is pronounced "factorial."

    Given n items, n! is the number of arrangements, and is found as

    n! = n × (n-1) × (n-2) × ... × 2 × 1.

    Since there is one (empty) way to arrange 0 objects, we define 0! = 1.

    Example: The chips (30% defective) are produced in batches of 4. Let X be the number of good chips in a batch.

    What distribution does X follow?

    What is p(2)?

    What is the probability that a random batch will contain no more than one good chip?
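
The batch-of-4 example can be worked with a short binomial helper. This is a minimal sketch, assuming X ~ Bin(4, 0.7) for the number of good chips (since 30% are defective); the function names `binom_pmf` and `binom_cdf` are illustrative, and `math.comb` supplies the count n! / (x! (n-x)!).

```python
from math import comb


def binom_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Bin(n, p)."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))


n, p = 4, 0.7                     # 4 chips per batch, P(good chip) = 0.7
p_two_good = binom_pmf(2, n, p)   # same count as the six SSFF-type orderings
p_at_most_one = binom_cdf(1, n, p)

print(round(p_two_good, 4), round(p_at_most_one, 4))
```

The value of `p_two_good` should agree with the 6 × .0441 calculation done by listing the orderings one at a time.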

    Math 321 - Dr. Minnotte 231

    Example: In a genetics study, a second-generation cross of pure green peas with pure yellow peas leads to pods where p = P(yellow) = 3/4.

    If pods contain 8 seeds, what is the probability that a random pod will contain 6 yellow seeds?

    What is the probability that a random pod will contain at least 6 yellow seeds?

    Math 321 - Dr. Minnotte 232

  • Math 321 - Dr. Minnotte 78

    Table A.1 in your book can save calculations by providing probabilities of P(X ≤ x) for n ≤ 20 and certain values of p.

    Example: Draw 16 times with replacement from a standard deck, and let X = number of spades drawn.

    Find P(X > 6).

    Math 321 - Dr. Minnotte 233

    With standard distributions, the mean and variance may generally be found as a function of the parameters.

    If X ~ Binomial(n, p), then μ = np.

    Example: If 75% of all seeds are yellow, and each pod contains 8 seeds, what is the mean number of yellow seeds per pod?

    Example: If we have 4 fair coins which we flip as a batch, what is the mean number of heads?

    Math 321 - Dr. Minnotte 234

    Additionally, if X ~ Bin(n, p), then σ² = np(1 - p).

    Example: X = # yellow seeds ~ Bin(8, .75). What are the variance and standard deviation of X?

    Example: X = # heads in 4 flips ~ Bin(4, .5). What are the variance and standard deviation of X?

    Math 321 - Dr. Minnotte 235

  • Math 321 - Dr. Minnotte 79

    Recall, draws without replacement (simple random samples) are not independent.

    However, we may do calculations as though they are independent (including binomial calculations) as long as the sample size is small (less than 5%) compared to the population size.

    Math 321 - Dr. Minnotte 236

    Example: A lot of several thousand components contains 7% defective. We sample 8 at random.

    What is the probability of no defective components in our sample?

    What is the probability of at least one defective?

    What is the expected number of defectives in our sample?
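
Because the sample of 8 is far below 5% of a lot of several thousand, the draws can be treated as independent and the count of defectives as approximately binomial. The sketch below makes that assumption explicit, taking X ~ Bin(8, 0.07); the variable names are illustrative.

```python
n, p = 8, 0.07   # sample size and proportion defective (binomial approximation)

p_none = (1 - p) ** n          # all 8 sampled components are good
p_at_least_one = 1 - p_none    # complement rule
expected_defectives = n * p    # mean of a Bin(n, p) random variable

print(round(p_none, 4), round(p_at_least_one, 4), round(expected_defectives, 2))
```

Computing "at least one" through the complement avoids summing eight separate binomial terms.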

    Math 321 - Dr. Minnotte 237

    Math 321 - Dr. Minnotte 238

    The Normal Distribution (4.3)

    The continuous normal (or Gaussian) distribution has two parameters, μ and σ². If X ~ N(μ, σ²),

    $$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / (2\sigma^2)}$$

    This distribution is often seen in practice, and is also very important theoretically.

  • Math 321 - Dr. Minnotte 80

    Math 321 - Dr. Minnotte 239

    The normal p.d.f. is a bell-shaped curve, symmetric around, and with its peak at, μ. E(X) = μ.

    Its width is determined by σ²; large values of σ² imply a wide, low curve, while small values imply a narrow, tall one. V(X) = σ².

    Math 321 - Dr. Minnotte 240

    An important special case is the standard normal distribution, with μ = 0 and σ² = 1.

    We usually identify standard normal variables with the letter Z.

    If Z is standard normal, Z ~ N(0, 1) and the density of Z is

    $$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$$

    Math 321 - Dr. Minnotte 241

    There is no closed-form integral for the normal probability density function, so we can't find probabilities that way.

    To find normal probabilities, we must use computer programs (which themselves use numeric integration), or tables such as Table A.2 (p. 521-522, and inside the front cover of your book) of the standard normal distribution.

  • Math 321 - Dr. Minnotte 81


    Examples: $P(Z \le 1.00) = ?$

    $P(Z > 1.00) = ?$

    $P(-2.00 \le Z \le 0.75) = ?$
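    In place of Table A.2, the standard normal c.d.f. can be evaluated directly. The following is a minimal sketch using Python's standard-library NormalDist; the variable names are ours, and the inequalities are read as "at most 1.00", "greater than 1.00", and "between -2.00 and 0.75".

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)  # standard normal

p_below_1 = Z.cdf(1.00)                  # area to the left of 1.00
p_above_1 = 1 - Z.cdf(1.00)              # area to the right of 1.00
p_between = Z.cdf(0.75) - Z.cdf(-2.00)   # area between -2.00 and 0.75

print(f"P(Z <= 1.00)          = {p_below_1:.4f}")
print(f"P(Z > 1.00)           = {p_above_1:.4f}")
print(f"P(-2.00 <= Z <= 0.75) = {p_between:.4f}")
```

    Because Z is continuous, using strict or non-strict inequalities gives the same probabilities.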


    For $X \sim N(\mu, \sigma^2)$, we find proportions by converting to standard units.

    If $X \sim N(\mu, \sigma^2)$, then $Z = (X - \mu)/\sigma \sim N(0, 1)$. Remember to convert both sides of any inequality the same way.

  • Math 321 - Dr. Minnotte 82


    Examples: Let $X \sim N(3, 4)$. $P(X \le 6.00) = ?$

    P(X > 4.00) = ?
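    The conversion to standard units can be sketched as follows. Note that N(3, 4) lists the variance, so the standard deviation is 2; the variable names are ours.

```python
from math import sqrt
from statistics import NormalDist

mu, var = 3.0, 4.0
sigma = sqrt(var)              # second parameter is the variance, so sigma = 2

Z = NormalDist()               # standard normal
z_6 = (6.00 - mu) / sigma      # standard units for x = 6.00
z_4 = (4.00 - mu) / sigma      # standard units for x = 4.00

p_below_6 = Z.cdf(z_6)         # P(X <= 6.00) = P(Z <= 1.5)
p_above_4 = 1 - Z.cdf(z_4)     # P(X > 4.00)  = P(Z > 0.5)

print(f"z = {z_6:.2f}, P(X <= 6.00) = {p_below_6:.4f}")
print(f"z = {z_4:.2f}, P(X > 4.00)  = {p_above_4:.4f}")
```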


    Normal Percentiles

    Just as for samples, the pth percentile of a distribution has p% of the probability below it, and (100 - p)% above.

    We find percentiles for the normal distribution using Table A.2 again, but reading from the inside out.

    Since probabilities are in the middle of the table, start there.

    Read to the outside to find the percentile.


    Example: Z ~ N(0, 1). What is the 70th percentile of Z?

    Example: What is the 25th percentile of Z?

  • Math 321 - Dr. Minnotte 83


    For non-standard normal variables, first find the desired percentile for the standard normal, then use the fact that since $Z = (X - \mu)/\sigma$, therefore $X = \mu + \sigma Z$.

    Example: $X \sim N(10, 25)$. What is the 95th percentile of X?
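    Reading the table "from the inside out" corresponds to evaluating the inverse c.d.f. The sketch below, with our own variable names, finds the standard normal percentiles from the examples and then rescales with X = mu + sigma * Z.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()               # standard normal

z_70 = Z.inv_cdf(0.70)         # 70th percentile of Z
z_25 = Z.inv_cdf(0.25)         # 25th percentile of Z (negative, below the mean)

mu, var = 10.0, 25.0           # X ~ N(10, 25): variance 25, so sigma = 5
sigma = sqrt(var)
x_95 = mu + sigma * Z.inv_cdf(0.95)   # 95th percentile of X

print(f"70th percentile of Z: {z_70:.3f}")
print(f"25th percentile of Z: {z_25:.3f}")
print(f"95th percentile of X: {x_95:.2f}")
```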

    Besides the binomial and normal distributions, there are a number of other named families of distributions with useful properties.

    For example, the Poisson distribution (Section 4.2) is useful for modeling random counts in a fixed interval of time or space.

    See Sections 4.4-4.6 for discussion of the lognormal, exponential, gamma, and Weibull distributions, which are useful for modeling continuous histograms which are positively skewed and unimodal.


    Sampling Distributions (4.8)

    Suppose random variable X is drawn from some distribution f. (X ~ f)

    Now suppose we generate n of these random variables, $X_1, \ldots, X_n$, independently from f.

    We say that $X_1, \ldots, X_n$ make a random sample from f.

    Sometimes we say that $X_1, \ldots, X_n$ are i.i.d. (independent and identically distributed) from f.


  • Math 321 - Dr. Minnotte 84

    Since the Xs make a sample, we can compute sample statistics such as the mean, $\bar{X}$.

    Recall (3.4), since the Xs are random, so is $\bar{X}$, and since it is a number, $\bar{X}$ is itself a random variable with a distribution.

    This distribution is referred to as the sampling distribution of $\bar{X}$ and plays a large role in inferential statistics.


    Example: Let $p_X(x) = 1/3$, x = 1, 2, 3, and let $X_1$ and $X_2$ be independent draws from $p_X(x)$.

    Now let $\bar{X} = (X_1 + X_2)/2$ be the average of $X_1$ and $X_2$.

    Note that $\bar{X}$ is also a discrete random variable, and therefore has a probability mass function.

    What is the mass function (sampling distribution) of $\bar{X}$?
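    One way to obtain this mass function is to enumerate all nine equally likely (X1, X2) pairs and add up the probability attached to each value of the average. The sketch below does this with exact fractions; the names are ours.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

values = [1, 2, 3]
p_each = Fraction(1, 3)          # p_X(x) = 1/3 for each value

pmf_mean = defaultdict(Fraction) # maps a value of the average to its probability
for x1, x2 in product(values, repeat=2):
    xbar = Fraction(x1 + x2, 2)
    pmf_mean[xbar] += p_each * p_each   # independence: multiply the two probabilities

for xbar in sorted(pmf_mean):
    print(f"P(mean = {float(xbar):.1f}) = {pmf_mean[xbar]}")
```

    The result is symmetric about 2 and more concentrated there than the original distribution, which is the pattern the later slides describe.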


    Example: Suppose X ~ N(50, 4). A histogram of 1000 Xs looks like this:


  • Math 321 - Dr. Minnotte 85

    Sample 25 Xs and compute $\bar{X}$.

    If we repeat this process 1000 times, we get a histogram such as this:


    Note that $\bar{X}$ has a distribution that:

    Is centered on 50 ($\mu$);

    Is narrower than the solid normal curve for the individual Xs: the variance and standard deviation of $\bar{X}$ are smaller than those of X.

    Remains bell-shaped and (roughly?) normal.
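    The experiment behind these observations can be imitated with a short simulation. This is a sketch under our own assumptions (a fixed seed of 321, and Python's random.gauss as the source of N(50, 4) draws); it is not the code used to produce the slides' histograms.

```python
import random
from statistics import mean, stdev

random.seed(321)                 # arbitrary seed so the run is repeatable
mu, sigma = 50, 2                # X ~ N(50, 4): variance 4, so sigma = 2
n, reps = 25, 1000               # 25 observations per sample, 1000 samples

# One sample mean per repetition
xbars = [mean(random.gauss(mu, sigma) for _ in range(n)) for _ in range(reps)]

center = mean(xbars)             # should be near mu = 50
spread = stdev(xbars)            # should be near sigma / sqrt(n) = 0.4

print(f"mean of the sample means: {center:.3f}")
print(f"s.d. of the sample means: {spread:.3f}")
```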

    Understanding the distributions of sample statistics and their relationships to the associated population parameters is the basis of most of inferential statistics.


    In general, if a sample statistic is used to estimate a population parameter:

    The sampling distribution of the statistic is centered on (or at least near) the parameter.

    The spread of the sampling distribution will decrease as the sample size gets larger.

    As the sample size gets larger, the shape of the sampling distribution will usually get more and more bell-shaped (normal).


  • Math 321 - Dr. Minnotte 86

    Let $\bar{X}$ be the sample mean of a random sample $X_1, X_2, \ldots, X_n$ from a population or process with mean $\mu$ and standard deviation $\sigma$. Then (recall, Section 3.4):

    The mean of the sampling distribution of $\bar{X}$, $\mu_{\bar{X}}$, is $\mu$, the population mean, regardless of sample size n.

    The standard deviation of the sampling distribution of $\bar{X}$, $\sigma_{\bar{X}}$, is $\sigma/\sqrt{n}$, the population standard deviation divided by the square root of the sample size.


    Sampling Distributions of the Mean

    The standard deviation of the sample mean, $\sigma_{\bar{X}}$, is often called the standard error of the sample mean.

    This emphasizes that it describes a sampling distribution, not a population.


    As the sample size gets larger, we have more information and can make better estimates, so the standard error decreases.

    (Note, however, that the square root means we have diminishing returns; each new observation provides less new information than the previous one.)

    The larger the sample, the closer $\bar{X}$ is likely to be to $\mu$.


  • Math 321 - Dr. Minnotte 87


    If our original population has a normal distribution, the sampling distribution of $\bar{X}$ is also normal, regardless of sample size.

    Example: An automated filling machine fills soft drink cans with a volume that has a normal distribution with $\sigma = 0.05$ ounces.

    If we sample 4 cans and take the sample mean, what is the probability that $\bar{X}$ will be within 0.04 ounces of the population mean $\mu$?
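    Since the population is normal, the sample mean is exactly normal with standard error sigma / sqrt(n), so the question reduces to a standard normal probability. A minimal sketch, with our own variable names:

```python
from math import sqrt
from statistics import NormalDist

sigma, n = 0.05, 4
tolerance = 0.04                       # "within 0.04 ounces of mu"

se = sigma / sqrt(n)                   # standard error of the sample mean
z = tolerance / se                     # tolerance in standard units

Z = NormalDist()
prob_within = Z.cdf(z) - Z.cdf(-z)     # P(-z <= Z <= z)

print(f"standard error = {se:.3f}")
print(f"z = {z:.2f}")
print(f"P(|Xbar - mu| <= 0.04) = {prob_within:.4f}")
```

    The unknown value of the population mean never enters the calculation, because only the distance from it matters.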


    The Central Limit Theorem

    The Central Limit Theorem is the most important theorem in statistics.

    It shows the importance of the normal distribution, and provides the justification of many of the most fundamental statistical methods.


  • Math 321 - Dr. Minnotte 88


    If we know that a population or process has a normal distribution, we know that the sampling distribution of $\bar{X}$ will also be normal. This allows us to compute useful probabilities.

    Unfortunately, we often do not know the population distribution (or perhaps we know that it is not normal).

    Fortunately, this is not always required.


    The sample mean (or sum) of a large number of independent random variables has a sampling distribution which is approximately normal, no matter what distribution the original random variables come from.

    This important result is the Central Limit Theorem.


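    Theorem (Central Limit Theorem): If $X_1, X_2, \ldots, X_n$ are independent random variables from a population or process with mean $\mu$ and standard deviation $\sigma$, then as long as n is sufficiently large,

    $$\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right) \text{ approximately, and } \sum_{i=1}^{n} X_i \sim N(n\mu, n\sigma^2) \text{ approximately.}$$

    We can use this to find probabilities for sums or averages, without knowing the distribution of the $X_i$s!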

  • Math 321 - Dr. Minnotte 89


    Example: The (population) mean time required for maintenance on an air-conditioning unit is 1 hour, and the standard deviation is also 1 hour. A company operates 50 such units.

    Could we find the probability that the maintenance on a single unit requires more than 2 hours from the information given?


    What is the probability that the average time for maintenance will be more than 75 minutes?

    What is the probability that the total time for maintenance will be less than 40 hours?
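    For the average and the total over 50 units, the Central Limit Theorem supplies approximate normal distributions even though the distribution of a single maintenance time is not given (which is why the single-unit question cannot be answered from the information provided). The sketch below works in hours, so 75 minutes is 1.25 hours; the names are ours.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 1.0, 1.0, 50            # hours; 50 units

# Average time: approximately N(mu, sigma^2 / n)
avg_dist = NormalDist(mu, sigma / sqrt(n))
p_avg_over_75min = 1 - avg_dist.cdf(1.25)

# Total time: approximately N(n * mu, n * sigma^2)
total_dist = NormalDist(n * mu, sigma * sqrt(n))
p_total_under_40h = total_dist.cdf(40)

print(f"P(average > 75 minutes) ~ {p_avg_over_75min:.4f}")
print(f"P(total < 40 hours)     ~ {p_total_under_40h:.4f}")
```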

  • Math 321 - Dr. Minnotte 90

    How large is large?

    As a general rule, $n \ge 30$ is usually large enough that the Central Limit Theorem is reasonable.

    Symmetric populations can get by with much less, often as few as 10, or even fewer.

    Highly skewed populations require more. 50 or more should be fairly safe in all but the worst cases.


    The Normal Approximation to the Binomial Distribution

    Recall, if X ~ B(n, p), then E(X) = np and V(X) = np(1-p).

    If the particular values of n and p lead to a binomial distribution which is not very skewed, the N(np, np(1-p)) distribution can be a good approximation to the B(n,p) distribution.

    We usually require that $np \ge 10$ and $n(1-p) \ge 10$.


    Example: Roll a die 120 times and count the number of 6s rolled (X).

    What distribution does X follow?

    What are E(X) and V(X)?

    What is $P(X \ge 25)$?


  • Math 321 - Dr. Minnotte 91

    The true binomial probability is 0.136.

    We're pretty close, but we can do better.

    Binomial probabilities are located entirely on the integers, but normal probabilities are smeared out over the whole real line (remember the probability histogram).

    We'll get a better approximation if we use a continuity correction, by taking the normal probability from (x - .5) to (x + .5) to approximate the binomial P(X = x).


    So, for X ~ B(120, 1/6),

    $P(X \ge 25) = P(X \ge 24.5) =$

    Example: If X~Bin(120, 1/6), use the normal approximation to estimate P(15 < X < 25).
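    To see what the continuity correction buys, the sketch below compares the exact binomial value of P(X >= 25) for X ~ B(120, 1/6) with the normal approximation taken at 25 and at 24.5. The variable names are ours, and the exact value is computed by summing the binomial mass function directly.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 120, 1 / 6
mean = n * p                             # E(X) = np = 20
sd = sqrt(n * p * (1 - p))               # sqrt of V(X) = np(1-p)
approx = NormalDist(mean, sd)

# Exact binomial tail: sum of P(X = k) for k = 25, ..., 120
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(25, n + 1))

plain = 1 - approx.cdf(25)               # no continuity correction
corrected = 1 - approx.cdf(24.5)         # with continuity correction

print(f"exact binomial       : {exact:.4f}")
print(f"normal, no correction: {plain:.4f}")
print(f"normal, corrected    : {corrected:.4f}")
```

    The same idea handles the strict inequalities in the second example: 15 < X < 25 means X runs from 16 to 24, so the corrected normal range is 15.5 to 24.5.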


  • Math 321 - Dr. Minnotte 92


    Chapter 5: Statistical Estimation

    The remainder of the course will focus on inferential statistics.

    Recall, in probability, we generally know the distribution in question and wish to calculate something about particular outcomes or events.

    In inferential statistics, we have a sample, and wish to use that information to say something about the population or distribution the sample was drawn from.


    [Diagram: Population and Sample, linked by Probability (from population to sample) and Inferential Statistics (from sample to population).]


    Recall: A parameter is an unknown quantity related to a population or distribution.

    A statistic is a known quantity which can be calculated from a dataset.

    Estimation uses a statistic (what we know) to tell us something about an unknown parameter (what we wish we knew).

  • Math 321 - Dr. Minnotte 93


    Definition: A point estimate of a parameter, $\theta$, is a statistic, $\hat{\theta}$, which represents a best guess for $\theta$.

    Example: We have an unknown distribution, X ~ f(x), and we wish to know the unknown parameter $\mu = E(X)$. We take a sample $X_1, X_2, \ldots, X_n$, and estimate $\mu$ with the known statistic $\bar{X}$.

    Point Estimation (5.1)


    Other common point estimates:

    Estimate $V(X) = \sigma^2$ with $s^2$.

    If X ~ Binomial(n, p) (n known, p unknown), estimate p with $\hat{p} = X/n$.

    All of our standard sample statistics (median, quartiles, etc.) are good estimates of the corresponding population or distribution parameters.

    Properties of Estimates

    There are a few properties that we like to see in a parameter estimate.

    On average (over many samples), an estimate should give the correct value for the parameter. If the mean of the sampling distribution of our estimate is the parameter we are estimating, that is, $E(\hat{\theta}) = \theta$, we say that $\hat{\theta}$ is an unbiased estimate of $\theta$.


  • Math 321 - Dr. Minnotte 94

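    Example: We know that $E(\bar{X}) = \mu$, so $\bar{X}$ is an unbiased estimate of $\mu$.

    Also, $E(s^2) = \sigma^2$ and $E(\hat{p}) = p$ (proof: $E(\hat{p}) = E(X/n) = E(X)/n = np/n = p$), so the sample variance and proportion are unbiased estimates of the population variance and proportion.

    This is why we divide by (n - 1) instead of n to find $s^2$.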


    On the other hand, the sample standard deviation, s, has $E(s) \ne \sigma$, so s is a biased estimate for $\sigma$.

    Fortunately, the bias (defined as $E(s) - \sigma$, or more generally, $E(\hat{\theta}) - \theta$) is small, especially as n gets large.
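    The difference between the unbiased sample variance and the biased sample standard deviation shows up clearly in a simulation with a small sample size. This is only a sketch: the normal population with standard deviation 2, the sample size of 5, and the seed are our own choices, picked so that the bias in s is visible.

```python
import random
from math import sqrt

random.seed(321)                   # arbitrary seed so the run is repeatable
mu, sigma = 0.0, 2.0               # hypothetical population: variance is 4
n, reps = 5, 20000                 # small samples make the bias in s visible


def sample_variance(xs):
    """Sample variance with the (n - 1) divisor."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


s2_values = []
for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    s2_values.append(sample_variance(xs))

mean_s2 = sum(s2_values) / reps                   # estimates E(s^2)
mean_s = sum(sqrt(v) for v in s2_values) / reps   # estimates E(s)

print(f"average s^2 = {mean_s2:.3f}  (population variance = {sigma**2:.1f})")
print(f"average s   = {mean_s:.3f}  (population s.d.     = {sigma:.1f})")
```

    The average of the sample variances lands near 4, while the average of the sample standard deviations falls noticeably below 2.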


    Note that just because an estimate is unbiased, does not guarantee that it will give you the exact parameter on this (or possibly, any) sample.

    Example: X ~ Binomial(n = 25, p = 0.3). Even though $\hat{p}$ is unbiased for p, there is no value of X that will give $\hat{p} = 0.3$.

    Remember our sampling distributions; an unbiased estimate's distribution will be centered correctly, but it will still have some spread.


  • Math 321 - Dr. Minnotte 95

    The variance of the sampling distribution of our estimate measures that spread and is also important in measuring how well it performs.


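    We combine these two aspects into a single measure, the mean squared error:

    $$\mathrm{MSE}(\hat{\theta}) = E\big[(\hat{\theta} - \theta)^2\big] = \big[E(\hat{\theta}) - \theta\big]^2 + V(\hat{\theta})$$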

    A small MSE means that both bias and variance are small.


    Example: Suppose $X_1$ and $X_2$ are independent, with $E(X_1) = E(X_2) = \mu$ and $V(X_1) = V(X_2) = \sigma^2$.

    Let

    Find:


  • Math 321 - Dr. Minnotte 96

    Example (continued): Let

    Find:

    For what values of $\mu$ and $\sigma^2$ is


    Confidence Intervals (5.2)

    Having a good estimate is a good first step in learning about a population parameter.

    We should also be interested in how close our estimate is likely to be to the parameter.

    One approach is to calculate the standard error, remembering that we will usually be within 2-3 standard errors of the parameter (if we use an unbiased estimate).


    Another way to look at this issue is that we know our estimate is incorrect. (We just don't know by exactly how much.)

    We can improve this situation by expanding our point estimate to an interval estimate, providing a range of plausible values for $\theta$.

    Done carefully, we can identify how likely it is that our interval includes $\theta$.

  • Math 321 - Dr. Minnotte 97


    If our sample size, n, is large, we can use the Central Limit Theorem to give us the following.


    Therefore, the interval

    $$\bar{X} \pm 1.96\, \frac{s}{\sqrt{n}}$$

    is a random interval which covers the population mean $\mu$ with probability 0.95.

    We call such an interval a 95% confidence interval.

    This represents a set of plausible values of $\mu$ that are consistent with the data.

    Example: In a random sample of 80 auto body shops, the cost to repair a particular kind of damage has mean $472.36 and standard deviation $62.35.

    What is the 95% confidence interval for the mean of this population?
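    The large-sample 95% interval for this example is the sample mean plus or minus 1.96 standard errors, with s standing in for the population standard deviation. A minimal sketch, with our own variable names:

```python
from math import sqrt

n, xbar, s = 80, 472.36, 62.35     # sample size, sample mean, sample s.d. (dollars)
z_crit = 1.96                      # critical value for 95% confidence

margin = z_crit * s / sqrt(n)      # half-width of the interval
lower, upper = xbar - margin, xbar + margin

print(f"standard error = {s / sqrt(n):.3f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```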


  • Math 321 - Dr. Minnotte 98


    Is it correct to say $P(458.70 \le \mu \le 486.02) = 0.95$?

    No! Nothing inside the probability statement is random. Recall:

    The random parts are the sample statistics.

    The interval is random, not the population parameter, $\mu$.


    If we constructed many 95% confidence intervals from independent datasets, we'd get many different sample means and sample standard deviations, and each would lead to a different confidence interval.

    In the long run, about 95% of these different confidence intervals would contain the true parameter $\mu$.

    Remember, randomness is in the sample and the interval, not in the parameter!
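    This long-run interpretation can be illustrated by simulation: fix a population mean, draw many independent samples, build a 95% interval from each, and count how often the fixed mean is covered. The sketch below is an illustration under our own assumptions (a normal population with hypothetical mean 472 and standard deviation 62, samples of 80, and a fixed seed).

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(321)                   # arbitrary seed so the run is repeatable
mu, sigma = 472.0, 62.0            # hypothetical "true" population values
n, reps = 80, 2000

covered = 0
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar, s = mean(sample), stdev(sample)
    margin = 1.96 * s / sqrt(n)
    # mu never changes; the interval does, from sample to sample
    if xbar - margin <= mu <= xbar + margin:
        covered += 1

coverage = covered / reps
print(f"fraction of intervals covering mu: {coverage:.3f}")
```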


  • Math 321 - Dr. Minnotte 99


    We call the value 95% the confidence level. We say we are 95% confident that the population mean lies within the computed interval.

    We can select other confidence levels if desired, by replacing the critical value 1.96 with the Z-percentile that gives the appropriate center probability.

    A confidence level of 95% (1.96) is most common, but levels of 90% (1.645) and 99% (2.575) are also often used.

    In general, define $z_p$ to be the value above which there is probability p in the tail of the standard normal distribution.

    Then $z_p$ will be the 100(1-p)th percentile of the standard normal distribution.

    For a $100(1-\alpha)\%$ confidence interval, we use the critical value $z_{\alpha/2}$.

    Example: What critical value would we use for an 80% confidence interval?
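    A critical value for any confidence level can be read off the inverse standard normal c.d.f. The helper below, whose name critical_value is ours, returns the value with tail probability alpha/2 above it.

```python
from statistics import NormalDist


def critical_value(confidence):
    """Two-sided critical value: the 100(1 - alpha/2)th percentile of N(0, 1)."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


z_80 = critical_value(0.80)        # alpha = 0.20, so 0.10 in each tail

for level in (0.80, 0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {critical_value(level):.3f}")
```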


  • Math 321 - Dr. Minnotte 100


    What factors affect the length (precision) of the confidence interval?

    s: If s is bigger, $\bar{X}$ is less accurate, and the interval must be wider.

    Confidence level: To be more confident of including the true value, we must make the interval wider.

    n: As n gets bigger, the standard error of $\bar{X}$ gets smaller, and the interval gets narrower.


    If we require a 95% confidence interval of error width (interval half-width) no more than w, we can compute a (rough) minimum sample size if we have an estimate or upper bound for s:

    $$n \ge \left(\frac{1.96\, s}{w}\right)^2$$

    Of course, we can substitute the appropriate Z critical value to find sample sizes for other confidence levels.

  • Math 321 - Dr. Minnotte 101


    Example: Milk fill weights. n = 50, $\bar{x} = 2.0727$, s = 0.0711. Find a 95% confidence interval for $\mu$.

    w = ?

    If we require $w \le 0.01$, how big should n be?
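    The milk example can be worked by computing the half-width w = 1.96 s / sqrt(n) for the current sample, and then solving that same relation for n at the target half-width. This is a sketch with our own variable names; it treats s = 0.0711 as the planning estimate of the standard deviation and rounds the sample size up.

```python
from math import ceil, sqrt

n, xbar, s = 50, 2.0727, 0.0711
z_crit = 1.96                               # 95% confidence

w = z_crit * s / sqrt(n)                    # current half-width
interval = (xbar - w, xbar + w)

w_target = 0.01
n_min = ceil((z_crit * s / w_target) ** 2)  # smallest n with half-width <= target

print(f"w = {w:.4f}")
print(f"95% CI: ({interval[0]:.4f}, {interval[1]:.4f})")
print(f"minimum n for w <= {w_target}: {n_min}")
```

    Cutting the half-width roughly in half requires about four times as many observations, which is the diminishing-returns effect of the square root.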

    Confidence Bounds

    Sometimes, we only wish to know a lower (or upper) bound on $\mu$.

    We can generate one-sided confidence intervals, also called confidence bounds, in a similar way to the usual two-sided case.


    If we have a large sample, then:

    A 95% lower confidence bound for $\mu$ is $\bar{X} - 1.645\, s/\sqrt{n}$.

    A 95% upper confidence bound for $\mu$ is $\bar{X} + 1.645\, s/\sqrt{n}$.

    To get 90%, 99%, or $100(1-\alpha)\%$ bounds, replace 1.645 with 1.28, 2.33, or $z_\alpha$, respectively.


  • Math 321 - Dr. Minnotte 102

    Example: A sample of 48 shear strength measurements gives a mean of 17.17 N/mm² and a standard deviation of 3.28 N/mm².

    If we only care that the population mean shear strength is great enough, find a 90% lower bound on $\mu$.
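    A one-sided bound puts the entire 10% error probability in one tail, so the critical value is the 90th percentile of the standard normal (about 1.28) rather than the two-sided value. A minimal sketch, with our own variable names:

```python
from math import sqrt
from statistics import NormalDist

n, xbar, s = 48, 17.17, 3.28             # shear strength sample, N/mm^2

z_crit = NormalDist().inv_cdf(0.90)      # one-sided 90% critical value
lower_bound = xbar - z_crit * s / sqrt(n)

print(f"critical value  = {z_crit:.3f}")
print(f"90% lower bound = {lower_bound:.2f} N/mm^2")
```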


    For our normal-based confidence interval and level to be valid, we must know (or at least assume) that:

    The sample is a random draw from the population.

    The sample size n is large enough that the sample mean is approximately normally distributed and that s is a good estimate of $\sigma$.


    Chapter 6: Hypothesis Testing

    Estimation (both point and interval) is useful for providing an idea of the value of a population parameter.

    Frequently, we may wish to investigate a more specific question about a parameter. For this purpose, we use the other major branch of inferential statistics, hypothesis testing.

  • Math 321 - Dr. Minnotte 103

    One-Sample Z-Tests (6.1-6.2)

    Example: (Milk data) Suppose our bottle-filling machine is supposed to dispense 2.04 L of milk. Recall, a sample of size 50 gave $\bar{x} = 2.0727$, s = 0.0711. Does the machine need to be recalibrated?

    To answer this, let's assume that the machine is working properly, and see how likely we are to get a sample mean as far or further from the expected value as the sample mean we actually saw (2.0727).


    More formally, we choose a null hypothesis, H0.

    This is a statement about a population parameter (say, $\mu$), generally that it is equal to the value of interest (denoted $\mu_0$).

    Usually, the null hypothesis means "everything is as it should be," or "nothing interesting is happening."

    Here: H0: $\mu = 2.04$ ($= \mu_0$)


    We also choose an alternative hypothesis, H1, that the null is incorrect.

    H1: $\mu \ne 2.04$

    The alternative is literally simply that the null is incorrect, but this is often the more interesting or important result.

  • Math 321 - Dr. Minnotte 104


    Next, we compute a test statistic, under the assumption that H0 is correct.

    For large-sample tests on the population mean, $\mu$, we usually use the z-statistic:

    $$z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

    Here: z = ?

    If H0 is true, $\mu = \mu_0$ and z ~ N(0, 1) (approximately). Is z a typical value from a N(0, 1) distribution?


    Formally, we find a P-value, the probability that a sample from the null distribution would give a test statistic as unusual as, or more unusual than, the one we just saw.

    Since H1: $\mu \ne 2.04$, we use a two-sided P-value: $P = P(|z| \ge 3.25)$ (z ~ N(0,1)).

    From our table, if z ~ N(0,1), $P(|z| \ge 3.25) = .0012$.


    So we have two possibilities:

    1) H0 is correct, $\mu = 2.04$, and we got very unlucky to happen to get the (roughly) 1 in 800 chance to get $\bar{x} \ge 2.0727$ (or the equally unusual $\bar{x} \le 2.0073$), or

    2) H0 is wrong.

    Which seems more reasonable to believe?

    Since P is so small, we reject H0 and decide the filling machine does require recalibration.
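    The numbers in the milk example can be reproduced in a few lines. This is a sketch of the large-sample two-sided z-test as described above, with our own variable names; it uses the normal c.d.f. directly rather than the table, so the P-value may differ from the table value in the last digit.

```python
from math import sqrt
from statistics import NormalDist

n, xbar, s = 50, 2.0727, 0.0711
mu_0 = 2.04                                    # value of the mean under H0

z = (xbar - mu_0) / (s / sqrt(n))              # test statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided: both tails count

print(f"z = {z:.2f}")
print(f"two-sided P-value = {p_value:.4f}")
```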

  • Math 321 - Dr. Minnotte 105


    All hypothesis tests follow this general pattern:

    1) We observe some difference in a sample and wish to decide if it reflects a true difference in the population.

    2) Identify the null and alternative hypotheses.

    3) Compute a test statistic which has a known distribution when the null hypothesis is true.

    4) Find a P-value: the probability of a statistic as or more unusual than the one we observed, when the null hypothesis is true.

    5) If P is small, reject the null hypothesis. Otherwise, fail to reject it.


    This basic pattern holds for many different tests on different parameters with different assumptions.

    For questions about the population mean for a single population, we often use the one-sample z-test demonstrated above.


    Details on the one-sample z-test:

    1) We have a single population, and a specific value, $\mu_0$, we wish to consider for the population mean.

    This may be a known population mean for some related population (see next example).

    Or it may be a desired population mean (example: milk data).

    A sample from the population will give a sample mean different from $\mu_0$, even if that is the actual population mean.

  • Math 321 - Dr. Minnotte 106


    2) Identify H0 and H1.

    H1 is a statement that something interesting is going on. It is usually what we wish to prove.

    We should decide if we care about a one-sided or two-sided alternative, ideally before we ever see data.

    Two-sided: H0: $\mu = \mu_0$ vs. H1: $\mu \ne \mu_0$.

    One-sided: H0: $\mu \le \mu_0$ vs. H1: $\mu > \mu_0$

    or: H0: $\mu \ge \mu_0$ vs. H1: $\mu < \mu_0$

    We always compute z and P using $\mu_0$, so $\mu = \mu_0$ is always part of H0.


    Example: A newspaper article says that college freshmen average 7.5 hours per week at parties.

    We suspect the number is lower at our college.

    H0 = ?

    H1 = ?

    Math