PS - Handbook.pdf

8/19/2019 PS - Handbook.pdf


CONTENTS

Lecture 1: Exploratory Data Analysis and Descriptive Statistics
1.1 Studying one variable at a time
1.2 Studying two variables at a time
1.3 Studying more than two variables at a time
1.4 Effect of Transformation
1.5 Percentiles
1.6 Exercises

Lecture 2: Probability
2.1 Probability
2.2 Operations with Probability
2.3 Additive Rules of Probability
2.4 Complement of Event A
2.5 Conditional Probability
2.6 Independent Events
2.7 Intersection of events A and B
2.8 Bayes’ Rule
2.9 Exercises

Lecture 3: Discrete Random variables
3.1 Introduction
3.2 Bernoulli Distribution
3.3 Binomial Distribution
3.4 Poisson Distribution
3.5 Exercises

Lecture 4: Continuous Random variables
4.1 Introduction
4.2 Exponential Distribution
4.3 Exercises

Lecture 5: Normal Distribution
5.1 Introduction
5.2 Normal as an approximating distribution
5.3 Exercises

Lecture 6: Random Sampling and Sampling Distributions
6.1 Introduction
6.2 Exercises

Lecture 7: Test of Hypotheses
7.1 Introduction to Hypothesis Testing
7.2 Procedure
7.3 Confidence intervals for hypothesis testing
7.4 Proportions
7.5 Sample size
7.6 Hypothesis Test for Proportions
7.7 Exercises

Lecture 8: Type I and II Errors
8.1 Introduction
8.2 Type I and II Errors
8.3 Exercises

Lecture 9: Further Hypothesis Tests
9.1 Introduction
9.2 Comparison of two population means
9.3 The difference between two proportions
9.4 Paired samples
9.5 Exercises

Lecture 10: Inference for Variance
10.1 Introduction
10.2 Confidence Interval for σ²
10.3 Confidence Interval for the Ratio of Two Variances
10.4 Significance Test of Hypotheses about a Variance
10.5 Significance Test of Hypotheses about Two Variances
10.6 Exercises

Lecture 11: Chi-squared Test
11.1 Goodness-of-fit Test
11.2 Test for Homogeneity
11.3 Continuity Correction
11.4 Exercises

Lecture 12: Regression Analysis
12.1 Introduction
12.2 Correlation
12.3 Regression
12.4 Exercises

Lab Assessment one
Lab Assessment two
Lab Assessment three
Lab Assessment four
Lab Assessment five
Lab Assessment six
Lab Assessment seven
Lab Assessment eight
Lab Assessment nine
Lab Assessment ten



LECTURE 1

EXPLORATORY DATA ANALYSIS AND DESCRIPTIVE STATISTICS

Statistics can be divided into two major areas. Descriptive statistics comprises the statistical methods dealing with the collection, tabulation and summarization of data, so as to present meaningful information. Statistical inference, on the other hand, consists of the methods used to analyse and interpret data so that the statistician can draw meaningful conclusions about the population from which the data were collected. The two subfields are interrelated: descriptive statistics organizes the collected data in a systematic manner, and statistical inference analyses the data to produce meaningful conclusions from it.

A population is the totality of the observations with which a statistician is concerned. The observations could refer to anything of interest, such as persons, animals or objects; they need not be limited to people. The size of the population is defined to be the number of observations in the population. In collecting data concerning a population, the statistician is often interested in arriving at conclusions about the entire population.

A sample is a subset of a population. A random sample of n observations is a sample of n observations selected in such a way that every possible sample of n observations from the population has the same probability of being selected. Such samples are considered to be unbiased.

Often, a sample of the population is taken, data are collected from it, and inferences about the population are made based on the analysis of the sample data.
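The definition of a random sample can be illustrated with a short Python sketch. It assumes the population is small enough to list in memory. The sketch relies on the standard-library method `random.Random.sample`, which draws `n` distinct units without replacement so that every subset of size `n` is equally likely. The helper name `simple_random_sample` and the population of units labelled 1 to 100 are hypothetical, chosen only for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Return n distinct units drawn without replacement, so that every
    subset of size n has the same probability of being selected."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(population, n)

# Hypothetical population: 100 units labelled 1 to 100
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=1)
print(sorted(sample))
```

Fixing the seed is only for reproducibility. Leaving it as `None` gives a fresh random sample on every run.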

1.1 Studying one variable at a time

A stem-and-leaf plot is a graphical display showing the frequency of values in specified intervals. It is useful for small amounts of data, as it retains the actual numerical values.

Example:

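Stem | Leaf
1 | 3 4 5 6 6 6
2 | 0 0 0 0 1 1 2 2 3 3 4 5
3 | 2 2 3 3 4 4 4 4 4 8
4 | 1 1 1 2 3 4 5 6 7
5 | 2 2 3 3 6 7 8 9
6 | 2 6 9
7 | 4 5
8 | 9

Here the stem is the integer part of each value and the leaf is its first decimal digit, so 1 | 3 represents 1.3.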
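A stem-and-leaf display like the one above can be produced with a few lines of Python. This minimal sketch assumes non-negative data recorded to one decimal place, with the integer part as the stem and the first decimal digit as the leaf. The function name `stem_and_leaf` is hypothetical. The usage line rebuilds the last three rows of the example from the values they represent.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Build stem-and-leaf rows for data recorded to one decimal place.

    Stem = integer part, leaf = first decimal digit (so "1 | 3" means 1.3).
    Assumes non-negative values with at most one decimal place.
    """
    rows = defaultdict(list)
    for v in sorted(values):
        tenths = round(v * 10)           # e.g. 1.3 -> 13, avoids float error
        stem, leaf = divmod(tenths, 10)  # 13 -> stem 1, leaf 3
        rows[stem].append(leaf)
    lines = []
    for stem in range(min(rows), max(rows) + 1):  # keep empty stems visible
        leaves = " ".join(str(leaf) for leaf in rows[stem])
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

for line in stem_and_leaf([6.2, 6.6, 6.9, 7.4, 7.5, 8.9]):
    print(line)
```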

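A histogram is a graphical way to display the shape of a distribution. The data are grouped into class intervals, and a bar is drawn over each interval with height equal to the number (frequency) of observations in that interval.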



A box-plot is a graphical summary of the distribution of a variable. The minimum, the first quartile (Q1), the median, the third quartile (Q3) and the maximum are used to construct a box-plot. This is called the five-number summary.

1. The ends of the box are at the quartiles.

2. Mark the median with a line.

3. Observations more than 1.5 × IQR outside the box (below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR) are considered to be outliers and are marked with stars. Here IQR = Q3 - Q1 is the interquartile range.

4. Whiskers extend from the ends of the box to the smallest and largest observations that are not outliers.
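The four construction rules above can be sketched in Python. This is an illustration, not a definitive implementation. It takes the quartiles from the standard-library `statistics.quantiles` with `method="inclusive"`. Other textbooks and packages use slightly different quartile conventions, so Q1 and Q3 may differ a little from hand calculations. The function name `box_plot_summary` is hypothetical. The data are the ten values used in Examples 1.1 to 1.4, with a hypothetical extra value 9.0 added to produce an outlier.

```python
import statistics

def box_plot_summary(data):
    """Five-number summary, IQR, whisker ends and outliers (1.5 * IQR rule)."""
    xs = sorted(data)
    # Quartile convention is an assumption; packages differ slightly
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in xs if x < low_fence or x > high_fence]
    inside = [x for x in xs if low_fence <= x <= high_fence]
    return {
        "min": xs[0], "Q1": q1, "median": median, "Q3": q3, "max": xs[-1],
        "IQR": iqr,
        "whiskers": (inside[0], inside[-1]),  # most extreme non-outliers
        "outliers": outliers,                 # these would be marked with stars
    }

values = [1.2, 1.5, 1.9, 2.4, 2.4, 2.5, 2.6, 3.0, 3.5, 3.8, 9.0]
summary = box_plot_summary(values)
print(summary)
```

Here IQR = 3.25 - 2.15 = 1.1, so the fences are 0.5 and 4.9. The value 9.0 is therefore flagged as an outlier, and the upper whisker stops at 3.8.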

Figure: Histogram of BMI (class midpoints against frequency; Mean 25.74, StDev 2.389, N 250) and box plot of BMI.

Mean

The statistical mean of a set of observations is the average of the measurements in a set of data. The population mean and sample mean are defined as follows:

Given the set of data values x1, x2, ..., xN from a finite population of size N, the population mean is calculated as

$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$

Given the set of data values x1, x2, ..., xn from a sample of size n, the sample mean is calculated as

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

The sample mean is often used as an estimator of the mean of the population from which the sample was taken. In fact, the sample mean is an unbiased estimator of the population mean: its expected value equals μ.

A trimmed mean of a set of values is a mean calculated after a specified percentage of the largest and smallest values has been excluded.
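A trimmed mean can be sketched in Python as follows. The rounding convention is an assumption: this sketch drops k = floor(n × percent / 100) values from each end of the sorted data, and books and software packages differ on this detail. The function name `trimmed_mean` is hypothetical. The data are those of Example 1.1.

```python
import math

def trimmed_mean(data, percent):
    """Mean after removing `percent` % of the values from EACH end of the sorted data.

    Assumption: k = floor(n * percent / 100) values are dropped at each end;
    conventions for rounding k differ between books and software.
    """
    xs = sorted(data)
    k = math.floor(len(xs) * percent / 100)
    kept = xs[k:len(xs) - k]
    if not kept:
        raise ValueError("percent is too large: no values left")
    return sum(kept) / len(kept)

example = [1.2, 1.5, 2.6, 3.8, 2.4, 1.9, 3.5, 2.5, 2.4, 3.0]  # data of Example 1.1
print(trimmed_mean(example, 10))  # drops 1.2 and 3.8
```

With 10 % trimming, the smallest (1.2) and largest (3.8) values are dropped. The mean of the remaining eight values is 19.8 / 8 = 2.475, slightly below the untrimmed mean of 2.48.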

Median

The median of a set of observations is the value that, when the observations are arranged in ascending or descending order, satisfies the following condition:

1. If the number of observations is odd, the median is the middle value.

2. If the number of observations is even, the median is the average of the two middle values.

The median is the same as the 50th percentile of a set of data.

Mode

The mode of a set of observations is the specific value that occurs with the greatest frequency. A set of observations may have more than one mode, if several values all occur with the greatest frequency. A mode may also not exist; this is the case if all the observations occur with the same frequency.

Another measure of central location that is occasionally used is the midrange. It is computed as the average of the smallest and largest values in a set of data.

Example 1.1: Given the following set of data

1.2 1.5 2.6 3.8 2.4 1.9 3.5 2.5 2.4 3.0

It can be sorted in ascending order:

1.2 1.5 1.9 2.4 2.4 2.5 2.6 3.0 3.5 3.8

The mean, median and mode are computed as follows:



$$\bar{x} = \frac{1.2 + 1.5 + 2.6 + 3.8 + 2.4 + 1.9 + 3.5 + 2.5 + 2.4 + 3.0}{10} = \frac{24.8}{10} = 2.48$$

$$\tilde{x} = \frac{2.4 + 2.5}{2} = 2.45$$

The mode is 2.4, since it is the only value that occurs twice.

The midrange is (1.2 + 3.8) / 2 = 2.5.

Note that the mean, median and mode of this set of data are very close to each other. This suggests that the data are roughly symmetrically distributed.

Range

The range of a set of observations is the absolute value of the difference between the largest and smallest values in the set. It measures the size of the smallest contiguous interval of real numbers that encompasses all the data values.

Example 1.2: Given the following sorted data:

1.2 1.5 1.9 2.4 2.4 2.5 2.6 3.0 3.5 3.8

The range of this set of data is 3.8 - 1.2 = 2.6.
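The hand calculations in Examples 1.1 and 1.2 can be checked with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative.

```python
import statistics

data = [1.2, 1.5, 2.6, 3.8, 2.4, 1.9, 3.5, 2.5, 2.4, 3.0]  # Example 1.1

mean = statistics.mean(data)
median = statistics.median(data)      # n is even: average of the two middle values
modes = statistics.multimode(data)    # a list, because a data set may have several modes
midrange = (min(data) + max(data)) / 2
data_range = max(data) - min(data)    # Example 1.2

print(f"mean = {mean:.2f}, median = {median:.2f}, modes = {modes}, "
      f"midrange = {midrange:.2f}, range = {data_range:.1f}")
```

Note that `statistics.multimode` returns every value when all values occur equally often. The text says a mode does not exist in that case, so a caller would need to check for that situation separately.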

Variance and Standard Deviation

The variance of a set of data measures how widely the data values are spread about the mean. It is based on the squares of the differences between each data value and the mean.

The population and sample variance are calculated as follows:

Given the set of data values x1, x2, ..., xN from a finite population of size N, the population variance is calculated as

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$$

Given the set of data values x1, x2, ..., xn from a sample of size n, the sample variance is calculated as

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$



Note that the population variance is simply the arithmetic mean of the squares of the differences between each data value in the population and the mean. The formula for the sample variance is similar, except that the denominator in the fraction is (n - 1) instead of N. With this denominator, the sample variance is an unbiased estimator of the variance of the population to which the sample belongs.

The standard deviation of a set of data is the positive square root of the variance.

Example 1.3: Given the following sorted data:

1.2 1.5 1.9 2.4 2.4 2.5 2.6 3.0 3.5 3.8

x̄ = 2.48, as computed earlier.

$$s^2 = \frac{1}{10-1}\big[(1.2 - 2.48)^2 + (1.5 - 2.48)^2 + (1.9 - 2.48)^2 + (2.4 - 2.48)^2 + (2.4 - 2.48)^2 + (2.5 - 2.48)^2 + (2.6 - 2.48)^2 + (3.0 - 2.48)^2 + (3.5 - 2.48)^2 + (3.8 - 2.48)^2\big]$$

$$= \frac{1}{9}(1.6384 + 0.9604 + 0.3364 + 0.0064 + 0.0064 + 0.0004 + 0.0144 + 0.2704 + 1.0404 + 1.7424) = \frac{6.0160}{9} \approx 0.6684$$

$$s = \sqrt{0.6684} \approx 0.8176$$

The sample variance can also be calculated as follows:

$$s^2 = \frac{1}{n(n-1)}\left[n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right]$$

Example 1.4: Given the above data, we can calculate s² using the above formula:

$$\sum_{i=1}^{n} x_i^2 = 1.2^2 + 1.5^2 + 1.9^2 + 2.4^2 + 2.4^2 + 2.5^2 + 2.6^2 + 3.0^2 + 3.5^2 + 3.8^2$$

$$= 1.44 + 2.25 + 3.61 + 5.76 + 5.76 + 6.25 + 6.76 + 9.00 + 12.25 + 14.44 = 67.52$$

$$s^2 = \frac{1}{10 \times 9}\left(10 \times 67.52 - 24.8^2\right) = \frac{675.2 - 615.04}{90} \approx 0.6684$$
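Examples 1.3 and 1.4 can be reproduced in Python. This sketch computes the sample variance with both the definitional and the computational formula, then cross-checks the result against the standard-library `statistics.variance`, which also divides by n - 1.

```python
import math
import statistics

data = [1.2, 1.5, 1.9, 2.4, 2.4, 2.5, 2.6, 3.0, 3.5, 3.8]  # Examples 1.3 and 1.4
n = len(data)
xbar = sum(data) / n

# Definitional formula: squared deviations from the mean, divided by n - 1
s2_def = sum((x - xbar) ** 2 for x in data) / (n - 1)

# Computational formula: [n * sum(x^2) - (sum x)^2] / [n(n - 1)]
sum_x = sum(data)
sum_x2 = sum(x * x for x in data)
s2_short = (n * sum_x2 - sum_x ** 2) / (n * (n - 1))

s = math.sqrt(s2_def)               # sample standard deviation
s2_lib = statistics.variance(data)  # library cross-check (also uses n - 1)

print(f"s^2 = {s2_def:.4f} (shortcut {s2_short:.4f}, library {s2_lib:.4f}), s = {s:.4f}")
```

The computational formula is convenient by hand. However, for data whose mean is large relative to their spread, it can lose precision through cancellation, which is why software generally prefers the deviation form.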

1.2 Studying two variables at a time

A two-way frequency table gives the number of cases within each combination of categories of two qualitative variables.

A scatter plot is a two-dimensional graphical display of two quantitative variables.



1.3 Studying more than two variables at a time

A multiway frequency table, or multidimensional contingency table, displays the number of cases within each combination of categories of several qualitative variables.

1.4 Effect of Transformation

A transformation of a variable is a mathematical manipulation of each value of the variable. When we make a transformation, we transform the original scale of measurement for the variable to a new scale.

Many statistical techniques require that the data be approximately normally distributed, so we often apply a transformation to the data. If the data are skewed to the right, we can try the natural logarithm or the square root. If the data are skewed to the left, we can try a power transformation with an exponent greater than one (for example, squaring). All these transformations are non-linear.



1.5 Percentiles

Percentiles are values in a given set of observations that divide the data into 100 equal parts. These values can be denoted by P1, P2, ..., P99, where

1 % of the data falls below (is less than or equal to) P1
2 % of the data falls below P2
:
:
99 % of the data falls below P99

Percentiles can be calculated using a sorted list of observations or the cumulative frequency distribution table corresponding to the observations. The latter method assumes that the values in a class interval are uniformly distributed within it; linear interpolation is then used to calculate the percentiles. As this assumption is often untrue, percentile values can differ depending on whether raw data or frequency distributions were used in the computation. Therefore, percentiles are often treated as estimates of the value below which a certain percentage of the observations fall.

Example 1.5: Given the following sorted list of observations:

0.7 0.8 0.9 1.1 1.2 1.4 1.9 2.2 2.2 2.3

2.5 3.1 3.2 3.3 3.4 3.8 3.9 4.0 4.1 4.2

4.3 4.6 4.7 5.0 5.2 5.5 5.6 5.8 5.9 6.1

6.4 6.6 6.8 7.0 7.7 8.2 8.9 9.2 9.5 9.9

P75 = 6.1, since 40 x 75 % = 30 and 6.1 is the 30th ranked value.

P45 = 4.0, since 40 x 45 % = 18 and 4.0 is the 18th ranked value.

P62 = 5.2, since 40 x 62 % = 24.8 and 5.2 is the 25th ranked value.

This set of observations has the following cumulative frequency distribution:

Measurements   Cumulative Frequency   Relative Cumulative Frequency
0.0 - 1.0       3                     0.075
1.0 - 2.0       7                     0.175
2.0 - 3.0      11                     0.275
3.0 - 4.0      18                     0.450
4.0 - 5.0      24                     0.600
5.0 - 6.0      29                     0.725
6.0 - 7.0      34                     0.850
7.0 - 8.0      35                     0.875
8.0 - 9.0      37                     0.925
9.0 - 10.0     40                     1.000
Totals         40                     1.000

The percentiles can also be calculated from the cumulative frequency distribution table, using linear interpolation to arrive at estimates:

$$P_{75} \approx 6.0 + 1.0 \times \frac{0.75 - 0.725}{0.850 - 0.725} = 6.0 + \frac{0.025}{0.125} = 6.2$$

where 6.0 is the upper class limit of the interval 5.0 - 6.0 (relative cumulative frequency 0.725), and 0.850 is the relative cumulative frequency of the next interval, 6.0 - 7.0, which has class width 1.0.

P45 ≈ 4.0, since the interval 3.0 - 4.0 has a relative cumulative frequency of exactly 0.45.

$$P_{62} \approx 5.0 + 1.0 \times \frac{0.620 - 0.600}{0.725 - 0.600} = 5.0 + \frac{0.020}{0.125} = 5.16$$

where 5.0 is the upper class limit of the interval 4.0 - 5.0 (relative cumulative frequency 0.600), and 0.725 is the relative cumulative frequency of the next interval, 5.0 - 6.0, which has class width 1.0.

The values of P75 and P62 differ between the two methods of calculation, while the value of P45 is the same for both methods.
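Both ways of computing percentiles in the example above can be sketched in Python. Both function names are hypothetical.

- `percentile_rank` follows the raw-data rule the worked values appear to use: take the value at rank n × p / 100, rounded up to the next whole number. This rounding rule is inferred from the worked values rather than stated in the text.
- `percentile_grouped` interpolates linearly within the class that contains the percentile, assuming observations are spread uniformly across each class.

`Fraction` keeps the arithmetic exact, so boundary cases such as P45 are not disturbed by floating-point error.

```python
import math
from fractions import Fraction

def percentile_rank(sorted_data, p):
    """Raw-data rule: P_p is the value at rank ceil(n * p / 100) (1-based)."""
    n = len(sorted_data)
    rank = max(1, math.ceil(Fraction(n * p, 100)))  # exact, no float rounding
    return sorted_data[rank - 1]

def percentile_grouped(lower_limits, width, counts, p):
    """Grouped-data estimate: linear interpolation within the class containing P_p,
    assuming the observations are spread uniformly inside each class."""
    n = sum(counts)
    target = Fraction(n * p, 100)  # number of observations that should lie below P_p
    below = 0                      # cumulative count before the current class
    for lower, count in zip(lower_limits, counts):
        if count and below + count >= target:
            return float(lower + width * (target - below) / count)
        below += count
    raise ValueError("p must be between 0 and 100")

# Sorted observations from the percentile example
data = [0.7, 0.8, 0.9, 1.1, 1.2, 1.4, 1.9, 2.2, 2.2, 2.3,
        2.5, 3.1, 3.2, 3.3, 3.4, 3.8, 3.9, 4.0, 4.1, 4.2,
        4.3, 4.6, 4.7, 5.0, 5.2, 5.5, 5.6, 5.8, 5.9, 6.1,
        6.4, 6.6, 6.8, 7.0, 7.7, 8.2, 8.9, 9.2, 9.5, 9.9]
lower_limits = range(0, 10)              # classes 0-1, 1-2, ..., 9-10
counts = [3, 4, 4, 7, 6, 5, 5, 1, 2, 3]  # class frequencies from the table

for p in (75, 45, 62):
    print(f"P{p}: raw data {percentile_rank(data, p)}, "
          f"grouped {percentile_grouped(lower_limits, 1, counts, p):.2f}")
```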

Deciles are values in a given set of observations that divide the data into 10 equal parts. These values can be denoted by D1, D2, ..., D9, where

10 % of the data falls below D1
20 % of the data falls below D2
:
:
90 % of the data falls below D9

It is easy to see that

D1 = P10   D4 = P40   D7 = P70
D2 = P20   D5 = P50   D8 = P80
D3 = P30   D6 = P60   D9 = P90

Quartiles are values in a given set of observations that divide the data into 4 equal parts. These values can be denoted by Q1, Q2 and Q3, where

25 % of the data falls below Q1
50 % of the data falls below Q2
75 % of the data falls below Q3

Again, it is obvious that Q1 = P25, Q2 = P50 (the median) and Q3 = P75. Deciles and quartiles are calculated in the same manner as percentiles.

1.6 Exercises

1. Construct a box-plot using the five-number summary (minimum, Q1, median, Q3, maximum), given as 48, 63, 70, 81 and 100 respectively.

2. Consider the following strength measurements.

66 117 132 111 107 85 89 79 91 97 138 103

111 86 78 96 93 101 102 110 95 96 88 122 115 92 137 91 84 96 97 100 105 104 137 80 104



In 1984-1985, 482,528 men and 496,949 women received bachelor’s degrees; 143,390 men and 142,861 women received master’s degrees; 21,700 men and 11,243 women received doctorates; and 50,455 men and 24,608 women received first professional degrees.

(a) Arrange this information in one or more frequency tables.

(b) Discuss the relationship between sex and degree, separately for the two academic years.

(c) Discuss the relationship between year and degree, separately for men and women.

1.7 Further Exercises (Probability and Statistics: Walpole, Myers and Myers, 8th edition)

Exercises, page 52, Questions 1.13 and 1.14



LECTURE 2

PROBABILITY

So far we have used tools of data analysis to learn about a collection of information. In formal statistical analysis, we go beyond the goals of data analysis. In general, statistical analysis (inference) involves making probability statements about populations based on what we observe in our samples. This lecture discusses the ideas in probability that are needed for formal statistical inference.

Statisticians use the word experiment to describe any process that generates a set of data. More precisely, an experiment is a process leading to a well-defined observation or outcome.

A simple example of a statistical experiment is the tossing of a coin. This experiment has only two possible outcomes, heads and tails. We are particularly interested in the observations obtained by repeating the experiment several times under the same conditions. In most cases the outcomes will depend on chance and therefore cannot be predicted with certainty. When a coin is tossed repeatedly, we cannot be certain that a given toss will result in a head. However, we know the entire set of possibilities for each toss.

The sample space is the set of all possible outcomes of the experiment and is denoted by S. Each of the possible outcomes is called an element or a member of the sample space, or simply a sample point. If the sample space has a finite number of elements, we can list them. For example, the sample space S of possible outcomes when a coin is tossed may be written as

S = {H, T}

where H and T correspond to “heads” and “tails”.

In some experiments it is helpful to list the elements of the sample space systematically by means of a tree diagram.

Sample spaces with a large or infinite number of sample points are best described by a statement or a rule. For example, if the possible outcomes of an experiment are the set of cities in the world with a population over 1 million, the sample space is written

S = { x | x is a city with a population over 1 million }

A finite sample space is a sample space that contains a finite number of outcomes. The sample spaces for tossing a coin, drawing from a bag of mixed-colour balls and dealing from a regular 52-card deck are examples of finite (and therefore discrete) sample spaces.

A continuous sample space is a sample space that contains an interval of values. Sample spaces that contain the outcomes of temperature readings, height measurements and salaries are examples of continuous sample spaces.

An event is a subset of the sample space and is denoted by a capital letter such as E. An event may contain some, all or none of the outcomes comprising the sample space. If the event contains only one sample point, it is a simple event. If it contains two or more sample points, it is a compound event. If it contains no sample points, it is the null event, denoted by the empty set ∅.



For any given experiment we may be interested in the occurrence of certain events rather than in the outcome of a specific element in the sample space. For example, we may be interested in the event A that the outcome when a die is tossed is divisible by 3. This will occur if the outcome is an element of the subset A = {3, 6} of the sample space S = {1, 2, 3, 4, 5, 6} of the die-tossing experiment.

To each event we assign a collection of sample points, which constitute a subset of the sample space. That subset consists of all the elements for which the event occurs.

The complement of an event A with respect to S is the subset of all the elements of S that are not in A. We denote the complement of A by the symbol A'.

For example, consider the sample space S = {a, b, c, d, e}. Let A = {b, d}. Then A' = {a, c, e}.

The intersection of two events A and B, denoted by the symbol A ∩ B, is the event containing all elements that are common to A and B.

In the tossing of a die we might let A be the event that an even number occurs and B the event that a number greater than 3 shows. Then A = {2, 4, 6} and B = {4, 5, 6} are subsets of the same sample space S = {1, 2, 3, 4, 5, 6}. Both A and B will occur on a given toss if the outcome is an element of the subset {4, 6}, which is the intersection of A and B. So A ∩ B = {4, 6}.

For certain statistical experiments it is usual to define two events that cannot occur simultaneously. Such events are said to be mutually exclusive.

Two events A and B are mutually exclusive, or disjoint, if A ∩ B = ∅, that is, if A and B have no elements in common.

The union of the two events A and B, denoted by the symbol A ∪ B, is the event containing all the elements that belong to A or B or both.

Example 2.1:

Consider tossing a die and observing the number that appears on the top face. This has a well-defined outcome: the top face can be 1, 2, 3, 4, 5 or 6. So this can be taken as an experiment. The sample space S of the experiment is S = {1, 2, 3, 4, 5, 6}.

Figure: Throwing the die (a tree diagram with one branch for each outcome 1, 2, 3, 4, 5, 6)

S consists of 6 definite outcomes, so S is a finite sample space.

Some events on this sample space are: an even number occurs, an odd number occurs, and a number greater than 3 occurs. Let A be the event that an even number occurs, B the event that an odd number occurs, and C the event that a number greater than 3 occurs. Then

A = {2, 4, 6}
B = {1, 3, 5}
C = {4, 5, 6}
B ∪ C = {1, 3, 4, 5, 6}
B ∩ C = {5}
A ∩ B = ∅, so A and B are mutually exclusive.

2.1 Probability

The probability of an event is the chance or likelihood of the event occurring.

In this chapter we consider only those experiments for which the sample space contains a finite number of elements. The probability of an outcome or sample point is a real number between 0 and 1 that measures the likelihood that the outcome or sample point will actually occur. A sample point that absolutely cannot occur has a probability of 0, while a sample point that will always occur has a probability of 1; all other sample points are assigned probabilities between these two extremes.

A probability function assigns a unique number, or probability, to each outcome.

The probability of an event A is the sum of the probabilities of all the sample points in A and is denoted by P(A).

If event A is a subset of the sample space S, then 0 ≤ P(A) ≤ 1. If A = ∅, then P(A) = P(∅) = 0; if A = S, then P(A) = P(S) = 1. Otherwise, the value of P(A) is between 0 and 1.

Example 2.2: A coin is tossed twice. What is the probability that at least one head occurs?

Solution:

The sample space for this experiment is S = {HH, HT, TH, TT}. If the coin is balanced, each of these outcomes is equally likely to occur. Therefore, we assign a probability of w to each sample point. Then 4w = 1, or w = 1/4. If A represents the event of at least one head occurring, then A = {HH, HT, TH} and

$$P(A) = \frac{1}{4} + \frac{1}{4} + \frac{1}{4} = \frac{3}{4}$$

If an experiment can result in any one of N different equally likely outcomes, and if exactly n of these outcomes correspond to event A, then the probability of event A is

$$P(A) = \frac{n}{N}$$
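The rule P(A) = n/N can be applied mechanically by listing the sample space and counting the favourable outcomes. This Python sketch does that for Example 2.2, using `itertools.product` to enumerate the outcomes of two tosses in the same way a tree diagram would. The helper name `classical_probability` is hypothetical, and the rule is valid only when all outcomes are equally likely.

```python
from fractions import Fraction
from itertools import product

def classical_probability(sample_space, event):
    """P(A) = n / N for a finite sample space of equally likely outcomes.

    `event` is a predicate that returns True for the outcomes in A.
    """
    outcomes = list(sample_space)
    n = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(n, len(outcomes))

# Example 2.2: a balanced coin tossed twice -> S = {HH, HT, TH, TT}
two_tosses = ["".join(t) for t in product("HT", repeat=2)]
p_at_least_one_head = classical_probability(two_tosses, lambda o: "H" in o)
print(two_tosses, p_at_least_one_head)  # ['HH', 'HT', 'TH', 'TT'] 3/4
```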

Example 2.3:

A mixture of candies contains 6 mints, 4 toffees and 3 chocolates. If a person makes a random selection of one of these candies, find the probability of getting (a) a mint, or (b) a toffee or a chocolate.

Solution:

Let M, T and C represent the events that the person selects, respectively, a mint, a toffee or a chocolate candy. The total number of candies is 13, all of which are equally likely to be selected.

(a) Since 6 of the 13 candies are mints, the probability of event M, selecting a mint at random, is

$$P(M) = \frac{6}{13}$$

(b) Since 7 of the 13 candies are toffees or chocolates, it follows that

$$P(T \cup C) = \frac{7}{13}$$

2.2 Operations with Probability

Often it is easier to calculate the probability of an event from the probabilities of other events. This is especially true if the event in question can be represented as the union or intersection of two other events, or as the complement of some event.

Since events are sets, the formulas used to calculate the probabilities of unions, intersections and complements of events are similar to the corresponding rules for sets.



2.3 Additive Rules of Probability

Given that event A and event B are subsets of the sample space S, the following rules hold.

Union of events A and B (Additive Rule of Probability)

If A and B are any two events, then

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

where P(A ∪ B) is the probability that event A or event B (or both) occurs and P(A ∩ B) is the probability that both events A and B occur.

If A and B are mutually exclusive, then

$$P(A \cup B) = P(A) + P(B)$$

since, if events A and B are mutually exclusive (i.e. A and B cannot occur together), P(A ∩ B) = P(∅) = 0.

In general, if the events A1, A2, A3, ..., An are mutually exclusive, then

$$P(A_1 \cup A_2 \cup \cdots \cup A_n) = P(A_1) + P(A_2) + \cdots + P(A_n)$$

2.4 Complement of event A

Since A ∪ A' = S, and the event A and its complement A' are mutually exclusive,

$$P(A) + P(A') = P(A \cup A') = P(S) = 1$$

and therefore

$$P(A') = 1 - P(A)$$

Example 2.4: What is the probability of getting a total of 7 or 11 when a pair of dice is tossed?

Solution:

Let A be the event that a total of 7 occurs and B the event that a total of 11 comes up. A total of 7 occurs for 6 of the 36 sample points, and a total of 11 occurs for only 2 of the sample points. Since all sample points are equally likely, we have P(A) = 6/36 = 1/6 and P(B) = 2/36 = 1/18. The events A and B are mutually exclusive, since a total of 7 and a total of 11 cannot both occur on the same toss. Therefore,

$$P(A \cup B) = P(A) + P(B) = \frac{1}{6} + \frac{1}{18} = \frac{2}{9}$$

This result could also have been obtained by counting the total number of points for the event A ∪ B, namely 8, and writing

$$P(A \cup B) = \frac{n}{N} = \frac{8}{36} = \frac{2}{9}$$
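Example 2.4 can be verified by brute force. The Python sketch below enumerates all 36 equally likely outcomes for a pair of dice with `itertools.product`, counts the favourable outcomes for each event, and checks the additive rule. Here P(A ∩ B) comes out as 0 because the two events are mutually exclusive. The helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))  # 36 equally likely (first, second) outcomes

def prob(event):
    """Classical probability: favourable outcomes / 36."""
    return Fraction(sum(1 for outcome in S if event(outcome)), len(S))

def total_7(o):    # event A
    return sum(o) == 7

def total_11(o):   # event B
    return sum(o) == 11

p_a = prob(total_7)
p_b = prob(total_11)
p_a_and_b = prob(lambda o: total_7(o) and total_11(o))
p_a_or_b = prob(lambda o: total_7(o) or total_11(o))

assert p_a_or_b == p_a + p_b - p_a_and_b  # additive rule
print(p_a, p_b, p_a_and_b, p_a_or_b)      # 1/6 1/18 0 2/9
```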


$$P(D \mid A) = \frac{P(D \cap A)}{P(A)} = \frac{0.78}{0.82} \approx 0.95$$

2.6 Independent Events

Conditional probability allows the probability of an event to be revised in the light of additional information, and it also helps us to understand the concept of independent events. In the above example P(D|A) differs from P(D). This suggests that the occurrence of A influenced D. However, consider events A and B with P(A|B) = P(A). In other words, the occurrence of B has no impact on the probability that A occurs. Here the occurrence of A is independent of the occurrence of B.

Two events A and B are independent if and only if

P(B|A) = P(B) and P(A|B) = P(A).

Otherwise, A and B are dependent.

2.7 Intersection of events A and B (Multiplicative Rules of probability)

If in an experiment the events A and B can both occur, then

$$P(A \cap B) = P(A)\,P(B \mid A)$$

Thus the probability that both A and B occur is equal to the probability that A occurs multiplied by the probability that B occurs, given that A occurs. Since the events A ∩ B and B ∩ A are equivalent, it follows from the above rule that we can also write

$$P(A \cap B) = P(B \cap A) = P(B)\,P(A \mid B)$$

If two events A and B are independent, then

$$P(A \cap B) = P(A)\,P(B)$$

If, in an experiment, the events A1, A2, A3, ..., Ak can occur, then

$$P(A_1 \cap A_2 \cap \cdots \cap A_k) = P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1 \cap A_2) \cdots P(A_k \mid A_1 \cap A_2 \cap \cdots \cap A_{k-1})$$

If the events A1, A2, A3, ..., Ak are independent, then

$$P(A_1 \cap A_2 \cap \cdots \cap A_k) = P(A_1)\,P(A_2)\,P(A_3) \cdots P(A_k)$$

Example 2.5: A card is drawn from a regular deck of 52 cards. Event A is the event that the card drawn is a Jack. Event B is the event that the card drawn is a diamond. Find the probability that the

a. card drawn is a diamond and a Jack.
b. card drawn is a Jack given that the card is a diamond.
c. card drawn is a diamond given that the card is a Jack.

Solution:

P(A) = 4 / 52 = 1 / 13, since there are 4 Jacks in the deck
P(B) = 13 / 52 = 1 / 4, since there are 13 diamonds in the deck
P(A ∩ B) = 1 / 52, since there is only 1 Jack of diamonds in the deck
P(A|B) = P(A ∩ B) / P(B) = (1 / 52) / (13 / 52) = 1 / 13 = P(A)
P(B|A) = P(A ∩ B) / P(A) = (1 / 52) / (4 / 52) = 1 / 4 = P(B)

Since P(A|B) = P(A) and P(B|A) = P(B), events A and B are independent. We can therefore also obtain P(A ∩ B) from the multiplicative rule: P(A ∩ B) = P(A) P(B) = (1/13)(1/4) = 1/52.
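Example 2.5 can be checked by building the deck explicitly. This Python sketch represents each card as a hypothetical (rank, suit) pair and computes P(A), P(B), P(A ∩ B) and both conditional probabilities by counting. It then confirms the independence condition P(A ∩ B) = P(A) P(B). `Fraction` keeps the results exact.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))  # 52 equally likely (rank, suit) cards

def prob(event):
    """Classical probability: favourable cards / 52."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def is_jack(card):      # event A
    return card[0] == "J"

def is_diamond(card):   # event B
    return card[1] == "diamonds"

p_a = prob(is_jack)
p_b = prob(is_diamond)
p_a_and_b = prob(lambda card: is_jack(card) and is_diamond(card))
p_a_given_b = p_a_and_b / p_b   # P(A|B) = P(A ∩ B) / P(B)
p_b_given_a = p_a_and_b / p_a   # P(B|A) = P(A ∩ B) / P(A)

print(p_a, p_b, p_a_and_b, p_a_given_b, p_b_given_a)  # 1/13 1/4 1/52 1/13 1/4
print(p_a_and_b == p_a * p_b)                         # True: A and B are independent
```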

Example 2.6: A bag contains 6 blue balls and 4 red balls. Two balls are drawn from the bag, one after the other, without replacement. Calculate the probability that at least one of the balls drawn is blue.

Solution:

Let event A be the event that the first ball is blue, and let event B be the event that the second ball is blue. Then event A' is the event that the first ball is red, and event B' is the event that the second ball is red.

Since there are 6 blue balls out of a total of 10 balls, the probability of choosing a blue ball in the first drawing is 6/10. If a blue ball is taken out, there will be only 5 blue balls and 9 balls in total left, so the probability of choosing a blue ball will be 5/9. On the other hand, if the first ball is red, there will be 6 blue balls and 9 balls in total, so the probability of getting a blue ball would be 6/9 (or 2/3).

Therefore

P(A) = 6/10
P(A') = 1 - 6/10 = 4/10
P(B|A) = 5/9
P(B|A') = 2/3
P(A ∩ B) = P(A) P(B|A) = (6/10)(5/9) = 1/3
P(A' ∩ B) = P(A') P(B|A') = (4/10)(2/3) = 4/15

Since A and A' are mutually exclusive events, (B ∩ A) and (B ∩ A') are also mutually exclusive events. Thus, we can calculate P(B) as follows:

P(B) = P(B ∩ S) = P(B ∩ (A ∪ A')) = P((B ∩ A) ∪ (B ∩ A')) = P(B ∩ A) + P(B ∩ A') = 1/3 + 4/15 = 9/15 = 3/5

P(A ∪ B) = P(A) + P(B) - P(A ∩ B) = 6/10 + 9/15 - 1/3 = 13/15

A ∪ B is the event that at least one of the two balls drawn is blue. Equivalently, P(A ∪ B) is the probability that the first ball is blue, plus the probability that the first ball is red and the second ball is blue. Thus,

P(A ∪ B) = P(A) + P(A' ∩ B) = 6/10 + 4/15 = 13/15

This second method gives the same result. Note that events A and B are not independent, since P(B|A) = 5/9 differs from P(B) = 3/5.
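The answers to Example 2.6 can be confirmed by enumeration. This Python sketch labels the ten balls individually and lists all 10 × 9 = 90 equally likely ordered draws without replacement using `itertools.permutations`. It then counts outcomes for each event. The last line shows that P(A ∩ B) ≠ P(A) P(B), so A and B are dependent.

```python
from fractions import Fraction
from itertools import permutations

balls = ["blue"] * 6 + ["red"] * 4
# Ordered pairs of distinct balls: 10 * 9 = 90 equally likely draws without replacement
draws = list(permutations(range(len(balls)), 2))

def prob(event):
    """Classical probability: favourable ordered draws / 90."""
    return Fraction(sum(1 for d in draws if event(d)), len(draws))

def first_blue(d):    # event A
    return balls[d[0]] == "blue"

def second_blue(d):   # event B
    return balls[d[1]] == "blue"

p_a = prob(first_blue)
p_b = prob(second_blue)
p_a_and_b = prob(lambda d: first_blue(d) and second_blue(d))
p_a_or_b = prob(lambda d: first_blue(d) or second_blue(d))

print(p_a, p_b, p_a_and_b, p_a_or_b)  # 3/5 3/5 1/3 13/15
print(p_a_and_b == p_a * p_b)         # False: A and B are dependent
```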

2.8 Bayes’ Rule

If the set of events A1, A2, ..., An constitutes a partition of the sample space S, and event B is a subset of S, then

$$B = B \cap S = B \cap (A_1 \cup A_2 \cup \cdots \cup A_n) = (B \cap A_1) \cup (B \cap A_2) \cup \cdots \cup (B \cap A_n)$$

As the events A1, A2, ..., An are mutually exclusive, the events (B ∩ Ai), where i ∈ {1, 2, ..., n}, are also mutually exclusive. Assuming that none of the events A1, A2, ..., An is null, i.e. P(Ai) ≠ 0 for i ∈ {1, 2, ..., n},

$$P(B) = P(B \cap A_1) + P(B \cap A_2) + \cdots + P(B \cap A_n) = P(A_1)\,P(B \mid A_1) + P(A_2)\,P(B \mid A_2) + \cdots + P(A_n)\,P(B \mid A_n)$$

Theorem of total probability

If the events A1, A2, ..., Ak constitute a partition of the sample space S such that P(Ai) ≠ 0 for i = 1, 2, ..., k, then for any event B of S,

$$P(B) = \sum_{i=1}^{k} P(B \cap A_i) = \sum_{i=1}^{k} P(A_i)\,P(B \mid A_i)$$

From the definition of conditional probability,

$$P(A_i \mid B) = \frac{P(A_i \cap B)}{P(B)} = \frac{P(A_i)\,P(B \mid A_i)}{P(B)}$$

Thus we have derived Bayes' Rule, which states the following:

Bayes' Rule:

If the set of events A1, A2, ..., An constitutes a partition of the sample space S, with P(Ai) ≠ 0 for i ∈ {1, 2, ..., n}, and event B is a subset of S with P(B) ≠ 0, then

$$P(A_i \mid B) = \frac{P(A_i)\,P(B \mid A_i)}{P(A_1)\,P(B \mid A_1) + P(A_2)\,P(B \mid A_2) + \cdots + P(A_n)\,P(B \mid A_n)}$$

Example 2.7: A family had plans to go fishing on a Sunday afternoon, but their plans were dependent on the weather at noon on Sunday. If it was sunny, there was a 90 % chance that they would go fishing. If it was cloudy, the probability that they would go fishing would drop to 50 %. And if it was raining, the chances dropped to 15 %. The weather prediction, which we can assume to be accurate, called for a 10 % chance of rain, a 25 % chance of clouds and a 65 % chance of sunshine.



Let F be the event that the family goes fishing,
S the event that the weather is sunny at noon on Sunday,
C the event that the weather is cloudy at noon on Sunday, and
R the event that the weather is rainy at noon on Sunday.

Assuming that the family ends up going fishing, find the probability of each type of weather occurring.

Solution:

P(S) = 0.65, P(C) = 0.25, P(R) = 0.10

Note that P(S) + P(C) + P(R) = 1, and of course S, C and R are mutually exclusive events.

P(F|S) = 0.90, P(F|C) = 0.50, P(F|R) = 0.15

P(F) = P(F|S) P(S) + P(F|C) P(C) + P(F|R) P(R)

= (0.90)(0.65) + (0.50)(0.25) + (0.15)(0.10)

= 0.585 + 0.125 + 0.015

= 0.725

Assuming that the family ends up going fishing, the probability of each type of weather occurring is as follows.

P(S|F) = probability of sunny weather, given that the family went fishing:

$$P(S \mid F) = \frac{P(F \mid S)\,P(S)}{P(F)} = \frac{(0.90)(0.65)}{0.725} = 0.807$$

P(C|F) = probability of cloudy weather, given that the family went fishing:

$$P(C \mid F) = \frac{P(F \mid C)\,P(C)}{P(F)} = \frac{(0.50)(0.25)}{0.725} = 0.172$$

P(R|F) = probability of rainy weather, given that the family went fishing:

$$P(R \mid F) = \frac{P(F \mid R)\,P(R)}{P(F)} = \frac{(0.15)(0.10)}{0.725} = 0.021$$

Note that P(S|F) + P(C|F) + P(R|F) = 0.807 + 0.172 + 0.021 = 1.000
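The two steps of Example 2.7 can be sketched in Python: the theorem of total probability for P(F), followed by Bayes' rule for each posterior. The function name `total_probability_and_posteriors` and the dictionary layout (prior P(Ai) and likelihood P(B | Ai) keyed by the same labels) are illustrative choices, not a standard API.

```python
def total_probability_and_posteriors(priors, likelihoods):
    """Return P(B) by the theorem of total probability and P(A_i | B) by Bayes' rule.

    priors[k] = P(A_k) for a partition A_1, ..., A_n; likelihoods[k] = P(B | A_k).
    """
    joint = {k: priors[k] * likelihoods[k] for k in priors}  # P(A_k) P(B | A_k)
    p_b = sum(joint.values())
    if p_b == 0:
        raise ValueError("P(B) must be non-zero")
    posteriors = {k: v / p_b for k, v in joint.items()}
    return p_b, posteriors

# Example 2.7: weather at noon on Sunday (the partition) and going fishing (event F)
priors = {"sunny": 0.65, "cloudy": 0.25, "rainy": 0.10}
likelihoods = {"sunny": 0.90, "cloudy": 0.50, "rainy": 0.15}

p_f, posteriors = total_probability_and_posteriors(priors, likelihoods)
print(round(p_f, 3), {k: round(v, 3) for k, v in posteriors.items()})
# 0.725 {'sunny': 0.807, 'cloudy': 0.172, 'rainy': 0.021}
```

Dividing every joint probability by the same P(F) is what guarantees that the posteriors sum to 1, matching the check at the end of the example.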

2.9 Exercises (Extracted from Schaum’s Series by Walpole & Mayer)

1. A pair of dice is tossed and the two numbers appearing on the top are recorded. Draw the samplespace and find the number of elements in each of the following events:

(a) A = { two numbers are equal }(b) B = { sum is 10 or more }(c) C = { 5 appears on first die }(d) D = { 5 appears on at least one die }

2. Determine the probability p of each event:(a) An even number appears in the toss of a fair die.


23/108

(b) At least one tail appears in the toss of 3 fair die.(c) A white marble appears in the random drawing of 1 marble from a box containing 4 white

marbles, 3 red marbles and 5 blue marbles.

3. A box contains 15 billiard balls, which are numbered from 1 to 15. A ball is drawn at random and the number recorded. Find the probability P that the number is:

(a) even
(b) less than 5
(c) even and less than 5
(d) even or less than 5

4. A class contains 10 men and 20 women of which half the women and half the men have brown eyes. Find the probability P that a person chosen at random is a man or has brown eyes.

5. A sample space S consists of 4 elements, that is, $S = \{a_1, a_2, a_3, a_4\}$. Under which of the following functions P does S become a probability space?

(a) $P(a_1) = 0.4,\ P(a_2) = 0.3,\ P(a_3) = 0.2,\ P(a_4) = 0.3$
(b) $P(a_1) = 0.4,\ P(a_2) = 0.2,\ P(a_3) = 0.7,\ P(a_4) = 0.1$
(c) $P(a_1) = 0.4,\ P(a_2) = 0.2,\ P(a_3) = 0.1,\ P(a_4) = 0.3$
(d) $P(a_1) = 0.4,\ P(a_2) = 0,\ P(a_3) = 0.5,\ P(a_4) = 0.1$

6. Suppose A and B are events with $P(A) = 0.6$, $P(B) = 0.3$ and $P(A \cap B) = 0.2$. Find the probability that:

(a) A does not occur.
(b) B does not occur.
(c) A or B occurs.
(d) Neither A nor B occurs.

7. Three fair coins, a penny, a nickel, and a dime, are tossed. Find the probability p that they are all heads if:

(a) the penny is heads;
(b) at least one of the coins is heads;
(c) the dime is tails.

8. A billiard ball is drawn at random from a box containing 15 billiard balls numbered 1 to 15, and the number n is recorded.

(a) Find the probability p that n exceeds 10.
(b) If n is even, find the probability p that n exceeds 10.

9. In a certain college, 25 percent of the students failed mathematics, 15 percent failed chemistry, and 10 percent failed both mathematics and chemistry. A student is selected at random.

(a) If the student failed chemistry, what is the probability that he or she failed mathematics?
(b) If the student failed mathematics, what is the probability that he or she failed chemistry?
(c) What is the probability that the student failed mathematics or chemistry?
(d) What is the probability that the student failed neither mathematics nor chemistry?

10. Find $P(B \mid A)$ if: (a) A is a subset of B; (b) A and B are mutually exclusive (disjoint). Assume $P(A) > 0$.

11. If the probabilities are, respectively, 0.09, 0.15, 0.21 and 0.23 that a person purchasing a new automobile will choose the color green, white, red or blue, what is the probability that a given buyer will purchase a new automobile that comes in one of those colors?

12. Suppose that a factory has a fuse box containing 20 fuses, of which 5 are defective. If 2 fuses are selected at random and removed from the box in succession without replacing the first, what is the probability that both fuses are defective?

13. Items sampled on a production line may be classified as defective (D) or non-defective (N). List the elements in the sample space if the sampling process terminates:

(a) after 4 items have been sampled;
(b) after 3 defectives in a row have been observed or 4 items have been sampled;
(c) when the first defective is observed.

Suppose that 5% of the products are defective.

(d) Find the probability of exactly 2 defective items if sampling process (a) is adopted.
(e) In sampling process (c), what is the probability that the sampling process is terminated before the 3rd item is sampled?

14. It is compulsory for the driver of a car to wear a seat belt while driving. The results of a survey show that not all drivers are wearing seat belts.

Age      Driver wearing seat belt   Driver not wearing seat belt
< 40     375                        52
>= 40    425                        148

Use the data to estimate the probability that a randomly chosen driver
(a) is wearing a seat belt;
(b) is under 40 and wearing a seat belt.
(c) Suppose the randomly chosen driver is under 40. What is the probability that the driver is wearing a seat belt?

15. In a certain region of the country it is known from past experience that the probability of selecting an adult over 40 years of age with cancer is 0.05. If the probability of a doctor correctly diagnosing a person with cancer as having the disease is 0.78 and the probability of incorrectly diagnosing a person without cancer as having the disease is 0.06, what is the probability that a person is diagnosed as having cancer?

16. In a certain assembly plant, three machines, B1, B2, and B3, make 30%, 45%, and 25%, respectively, of the products. It is known from past experience that 2%, 3%, and 2% of the products made by each machine, respectively, are defective. Now, suppose that a finished product is randomly selected. What is the probability that it is defective?

17. Suppose that three machines at a factory are used to produce a large quantity of identical parts. The production machines have different capacities: Machine A has a large capacity and produces 60% of the parts, while machines B and C produce 30% and 10% of the parts, respectively. Historical data indicate that 10% of the parts produced by Machine A are defective, compared to 30% for Machine B and 40% for Machine C.

(a) Complete the following table.

Machine   Defective   Nondefective   Total
A
B
C
Total                                100

(b) Given that a part is defective, what are the updated (conditional) probabilities that it was produced by machine A, B or C?

2.9 Further Exercises (Probability and statistics: Walpole, Myers and Myers – 8th edition)

Exercises - Page 97 - Question 2.109, 2.110, 2.111, 2.112, 2.129

LECTURE 3

DISCRETE RANDOM VARIABLES

3.1 Introduction



Definition: A random variable X is a numerically valued variable defined on the sample space, $X: S \to \mathbb{R}$.

We say that X is a discrete random variable if it can take only a countable set of values, e.g. integer values.

Consider tossing a fair coin. We know that the outcome is either a head or a tail.

P(head) = 1/2, P(tail) = 1/2

If we denote the number of heads by X, then

P(X = 1) = 1/2, P(X = 0) = 1/2

X is an example of a random variable. Note that a random variable is usually labelled with a capital

letter (say X). The realised value of the random variable X is denoted by x.

Definition: If we have a discrete random variable X taking values $x_1, x_2, \ldots, x_n$ with probabilities $p_1, p_2, \ldots, p_n$ respectively, where

$$p_i \ge 0 \ \text{for all } i \quad\text{and}\quad p_1 + p_2 + \cdots + p_n = 1,$$

then this defines a discrete probability distribution for X. Although we have written the random variable X as taking a finite set of values in this definition, it also holds for an X which takes an infinite countable set of values, e.g. all non-negative integers.

We may write $P(X = x_i)$ as $p_i$. This is sometimes referred to as the probability function for X.

Example 3.1: Two fair dice are thrown. Let X be the sum of the values on the faces turned uppermost. Find the probability distribution for X.

The probability distribution is as follows.

x          2     3     4     5     6     7     8     9     10    11    12
P(X = x)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

Note that X = 2 if and only if both dice show 1. Also, X = 3 if and only if one die shows 1 and the

other 2.

Note that the sum of the probabilities is one and all are positive so this is a valid probability distribution.
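The table can be checked by brute-force enumeration of the 36 equally likely outcomes. This is a small Python sketch assuming both dice are fair and independent; `Fraction` keeps the probabilities exact (it prints them in lowest terms, so 6/36 appears as 1/6).

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes.
outcomes = list(product(range(1, 7), repeat=2))
counts = Counter(a + b for a, b in outcomes)

# Probability distribution of X = sum of the two faces, as exact fractions.
dist = {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

for x, p in dist.items():
    print(x, p)

# Validity check: every probability is positive and they sum to one.
assert all(p > 0 for p in dist.values())
assert sum(dist.values()) == 1
```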

Example 3.2: The discrete random variable X has probability function given by

P(X = x) = cx², x = 1, 2, 3, 4. Find c.

x          1   2    3    4
P(X = x)   c   4c   9c   16c

We know that c + 4c + 9c + 16c = 30c = 1 and hence c = 1/30.

Definition: Suppose X is a discrete random variable taking values $x_1, x_2, \ldots, x_n$ with probabilities $p_1, p_2, \ldots, p_n$. Then the mean or expected value of X, written as μ or E[X], is given by

$$\mu = E(X) = \sum_{i=1}^{n} p_i x_i.$$

If X takes an infinite number of values the sum is taken over all values of i.

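To justify this definition, suppose we had a sample of x values where $x_i$ occurs with frequency $f_i$. Then the sample mean would be

$$\bar{x} = \frac{\sum_i f_i x_i}{\sum_i f_i} = \sum_i \frac{f_i}{\sum_j f_j}\, x_i.$$

In the limit, if we collect enough data, the relative frequency $f_i / \sum_j f_j$ tends to $p_i$.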

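Example 3.3: A die is thrown; what is the mean (or expected) score?

$$E(X) = 1 \cdot \tfrac{1}{6} + 2 \cdot \tfrac{1}{6} + 3 \cdot \tfrac{1}{6} + 4 \cdot \tfrac{1}{6} + 5 \cdot \tfrac{1}{6} + 6 \cdot \tfrac{1}{6} = 3.5$$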

Note that the expected value of a random variable is not necessarily a value the random variable can take. The expected score when we throw a fair die is 3.5, but a die cannot show this value. Think of the expected value or mean of a random variable as a measure of where the distribution is centred.

The expectation of any function of a random variable, g(X) say, is defined in a similar way.

Definition: If X is a discrete random variable then the expectation of g(X) is given by

$$E[g(X)] = \sum_{i=1}^{n} p_i\, g(x_i).$$

We can also define the variance and standard deviation of a random variable.

Definition: If X is a discrete random variable then its variance, written Var[X], is defined by

$$Var[X] = \sum_{i=1}^{n} p_i (x_i - \mu)^2.$$

The standard deviation of X is the positive square root of the variance of X. By multiplying out the bracket it is straightforward to see that the variance is given by

$$Var[X] = \sum_{i} p_i x_i^2 - \mu^2 \quad\text{or}\quad Var[X] = E[X^2] - (E[X])^2.$$

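Example 3.4: A die is thrown; what is the variance of the score?

$$E[X^2] = 1^2 \cdot \tfrac{1}{6} + 2^2 \cdot \tfrac{1}{6} + 3^2 \cdot \tfrac{1}{6} + 4^2 \cdot \tfrac{1}{6} + 5^2 \cdot \tfrac{1}{6} + 6^2 \cdot \tfrac{1}{6} = \frac{91}{6}$$

$$Var[X] = \frac{91}{6} - \frac{49}{4} = \frac{35}{12}$$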

The variance gives an idea of how spread out the distribution is.
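Examples 3.3 and 3.4 can be reproduced with exact rational arithmetic. The sketch below uses Python's standard `fractions` module to compute E[X], E[X²] and Var[X] = E[X²] - (E[X])² for a fair die.

```python
from fractions import Fraction

# Fair die: each face 1..6 has probability 1/6.
faces = range(1, 7)
p = Fraction(1, 6)

mean = sum(x * p for x in faces)              # E[X]
second_moment = sum(x**2 * p for x in faces)  # E[X^2]
variance = second_moment - mean**2            # Var[X] = E[X^2] - (E[X])^2

print(mean, second_moment, variance)  # 7/2 91/6 35/12
```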



3.2 Bernoulli Distribution

x          0       1
P(X = x)   1 - p   p

Example 3.5 : Toss a coin once. Let p be the probability of getting a head and

X = 0 if T occurs

= 1 if H occurs

Then X ~ Bernoulli (p).

3.3 Binomial Distribution

If we have n independent Bernoulli trials, each with probability of success equal to p, then the probability of r successes is given by the binomial probability

$$P(X = r) = \binom{n}{r} p^{r} (1-p)^{n-r}, \qquad r = 0, 1, 2, \ldots, n.$$

Thus, if we consider the random variable X which is the number of successes in n Bernoulli trials, then P(X = r) is given by the binomial probability with parameters n and p.

Statistical Table 1 gives the probability of r or more successes in n independent trials with probability of success p. For example, if we wanted the probability of obtaining 23 or more heads in 50 tosses of a fair coin, we find that the answer is 0.76006.

Example 3.6: Suppose that 5% of the articles made by a factory are defective. What is the probability of finding 1 defective in a sample of 10 from a very large batch? Since it is a large batch we may treat this as sampling with replacement, and the number of defectives, X, will have a binomial distribution with n = 10 and p = 0.05. Thus

$$P(X = 1) = \binom{10}{1}(0.05)^1(0.95)^9 = 0.315.$$

We can also find this quantity from the tables:

$$P(X = 1) = P(X \ge 1) - P(X \ge 2) = 0.40126 - 0.08614 = 0.31512.$$
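A minimal Python sketch of the binomial probability, and of the table-style tail probability P(X ≥ r), applied to Example 3.6. It assumes Python 3.8 or later for `math.comb`; the helper names `binom_pmf` and `binom_tail` are hypothetical.

```python
from math import comb


def binom_pmf(r, n, p):
    """P(X = r) for X ~ Binomial(n, p)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


def binom_tail(r, n, p):
    """P(X >= r), the kind of quantity tabulated in Statistical Table 1."""
    return sum(binom_pmf(k, n, p) for k in range(r, n + 1))


n, p = 10, 0.05
print(round(binom_pmf(1, n, p), 5))                         # 0.31512
print(round(binom_tail(1, n, p) - binom_tail(2, n, p), 5))  # 0.31512
```

Both routes give the same value, mirroring the direct formula and the table look-up in the text.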

The tables are only given for some values of n and p so are not always useful, but you should know how to use them. Note that although p is only given up to 0.5, we can always turn a problem where the probability of a 'success' is greater than 0.5 into a question about 'failures', which will have probability less than 0.5. An example of this is given next.

Example 3.7: Fifty seeds were planted and it is known that the probability of any seed germinating is

0.8. Assuming that the number of seeds germinating follows a binomial distribution, using tables find

the probabilities of the following events:
(a) exactly 40 seeds germinate,
(b) more than 12 seeds fail to germinate,
(c) more than 38 but fewer than 45 seeds germinate.


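For a binomial random variable X with parameters n and p, the mean is derived as follows:

$$
\begin{aligned}
E[X] &= \sum_{r=0}^{n} r\,P(X=r) = \sum_{r=1}^{n} r\,\frac{n!}{r!\,(n-r)!}\,p^{r}(1-p)^{n-r} \\
&= \sum_{r=1}^{n} \frac{n!}{(r-1)!\,(n-r)!}\,p^{r}(1-p)^{n-r} \\
&= np \sum_{r=1}^{n} \binom{n-1}{r-1} p^{r-1}(1-p)^{n-r} \\
&= np \sum_{k=0}^{n-1} \binom{n-1}{k} p^{k}(1-p)^{n-1-k} \qquad (k = r-1) \\
&= np\,[p + (1-p)]^{n-1} = np.
\end{aligned}
$$

Recall that $Var[X] = E[X^2] - (E[X])^2$. Now $E[X^2] = E[X(X-1)] + E[X]$. Then

$$
\begin{aligned}
E[X(X-1)] &= \sum_{r=0}^{n} r(r-1)\,P(X=r) = \sum_{r=2}^{n} \frac{n!}{(r-2)!\,(n-r)!}\,p^{r}(1-p)^{n-r} \\
&= n(n-1)p^{2} \sum_{r=2}^{n} \binom{n-2}{r-2} p^{r-2}(1-p)^{n-r} \\
&= n(n-1)p^{2} \sum_{k=0}^{n-2} \binom{n-2}{k} p^{k}(1-p)^{n-2-k} \qquad (k = r-2) \\
&= n(n-1)p^{2}\,[p + (1-p)]^{n-2} = n(n-1)p^{2}.
\end{aligned}
$$

Thus

$$Var[X] = n(n-1)p^{2} + np - (np)^{2} = np - np^{2} = np(1-p).$$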

Example 3.8: The random variable X has a binomial distribution with parameters n=100 and p=0.8.

Find the mean and the variance of X.

The mean = np = 80, the variance is np (1- p) =16
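The results E[X] = np and Var[X] = np(1 - p) can be checked numerically by summing over the whole distribution. This sketch uses the values of Example 3.8 and, like the derivation, obtains the variance as E[X(X - 1)] + E[X] - (E[X])². It assumes Python 3.8+ for `math.comb`.

```python
from math import comb


def binom_pmf(r, n, p):
    """P(X = r) for X ~ Binomial(n, p)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


n, p = 100, 0.8
probs = [binom_pmf(r, n, p) for r in range(n + 1)]

mean = sum(r * pr for r, pr in enumerate(probs))
# Factorial moment E[X(X-1)], which the derivation shows equals n(n-1)p^2.
fact2 = sum(r * (r - 1) * pr for r, pr in enumerate(probs))
variance = fact2 + mean - mean**2

print(round(mean, 6), round(variance, 6))  # 80.0 16.0
```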

3.4 Poisson Distribution

Suppose events occur at random at an average rate of λ per minute. Examples include radioactive decay and arrivals in a queue. Then the number of events which occur in one minute is said to have a Poisson distribution with parameter λ. If X has a Poisson distribution then

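$$P(X = r) = \frac{e^{-\lambda}\lambda^{r}}{r!}, \qquad r = 0, 1, 2, \ldots$$

where λ > 0. Note that

$$\sum_{r=0}^{\infty} \frac{e^{-\lambda}\lambda^{r}}{r!} = e^{-\lambda} \sum_{r=0}^{\infty} \frac{\lambda^{r}}{r!} = e^{-\lambda} e^{\lambda} = 1.$$

So this is a valid probability distribution.

It can be shown that E(X) = λ and V(X) = λ.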

Statistical Table (3) gives the probability that a Poisson random variable with mean λ will be greater than or equal to r, in the same way as the binomial tables. For example, suppose that X is a Poisson random variable with mean 2.0. Find (1) P(X = 2), (2) P(X < 2), (3) P(X ≥ 3).

(1) P(X = 2) = P(X ≥ 2) - P(X ≥ 3) = 0.59399 - 0.32332 = 0.27067
(2) P(X < 2) = 1 - P(X ≥ 2) = 1 - 0.59399 = 0.40601
(3) P(X ≥ 3) = 0.32332
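The table look-ups above can be reproduced from the Poisson probability function directly. In this Python sketch, `poisson_tail(r, lam)` is a hypothetical helper that plays the role of Statistical Table (3) by returning P(X ≥ r).

```python
from math import exp, factorial


def poisson_pmf(r, lam):
    """P(X = r) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**r / factorial(r)


def poisson_tail(r, lam):
    """P(X >= r) = 1 - P(X <= r - 1)."""
    return 1 - sum(poisson_pmf(k, lam) for k in range(r))


lam = 2.0
print(round(poisson_tail(2, lam) - poisson_tail(3, lam), 5))  # P(X = 2):  0.27067
print(round(1 - poisson_tail(2, lam), 5))                     # P(X < 2):  0.40601
print(round(poisson_tail(3, lam), 5))                         # P(X >= 3): 0.32332
```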

A useful property of the Poisson distribution is that if the number of events in one unit of time is Poisson with mean λ, then the number of events in k units of time is Poisson with mean kλ. This can be useful in calculating probabilities of numbers of events in a time period different to that for which information is given.

Note that if X has a binomial distribution with parameters n and p then E(X) = np and Var[X] = np(1 - p). Now if p is small then 1 - p is close to one and np(1 - p) ≈ np. This suggests that if p is small we may be able to approximate X by a Poisson random variable with mean np. So long as p is small (say p < 0.1) and n is large (say n > 50), a binomially distributed random variable is well approximated by a Poisson random variable of mean np.

Example 3.9: If X has a binomial distribution with n = 100 and p = 0.01, then from the tables

P(X = 1) = 0.36973
P(X ≥ 2) = 0.26424
P(X ≥ 1) = 0.63397

The corresponding quantities from the Poisson tables with λ = 1 are

P(Y = 1) = 0.36788
P(Y ≥ 2) = 0.26424
P(Y ≥ 1) = 0.63212
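To see the approximation of Example 3.9 without tables, the Python sketch below computes the same three quantities for Binomial(100, 0.01) and Poisson(1). The helper functions are illustrative and assume Python 3.8+ for `math.comb`.

```python
from math import comb, exp, factorial


def binom_pmf(r, n, p):
    """P(X = r) for X ~ Binomial(n, p)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


def poisson_pmf(r, lam):
    """P(Y = r) for Y ~ Poisson(lam)."""
    return exp(-lam) * lam**r / factorial(r)


def summary(pmf):
    """Return (P(=1), P(>=2), P(>=1)) for a pmf on the non-negative integers."""
    p_ge1 = 1 - pmf(0)
    p1 = pmf(1)
    return p1, p_ge1 - p1, p_ge1


n, p = 100, 0.01
binom = summary(lambda r: binom_pmf(r, n, p))
pois = summary(lambda r: poisson_pmf(r, n * p))
print([round(v, 5) for v in binom])  # [0.36973, 0.26424, 0.63397]
print([round(v, 5) for v in pois])   # [0.36788, 0.26424, 0.63212]
```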

Example 3.10: The probability that a car has a defective gearbox is 0.02. If I check the gearboxes of 140 cars, what is a suitable approximation to the probability that I find

(a) 2 defectives (b) more than 5 defectives (c) fewer than 4 defectives?

Let X be the number of defective gearboxes that I find. Then X has a binomial distribution with n = 140 and p = 0.02. Since n is large and p is small, a Poisson random variable with mean λ = np = 2.8 will give a good approximation to X. From the tables,

(a) P(X = 2) = P(X ≥ 2) - P(X ≥ 3) = 0.76892 - 0.53055 = 0.238
(b) P(X > 5) = P(X ≥ 6) = 0.065
(c) P(X < 4) = 1 - P(X ≥ 4) = 1 - 0.30806 = 0.692
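The sketch below compares the Poisson approximation used in Example 3.10 with the exact binomial probabilities for n = 140 and p = 0.02, which gives a feel for how close 'a good approximation' is here. It is a Python illustration (3.8+ for `math.comb`), not part of the handbook's method.

```python
from math import comb, exp, factorial

n, p = 140, 0.02
lam = n * p  # Poisson mean used in the approximation, 2.8


def binom_pmf(r):
    """Exact binomial P(X = r) with n = 140, p = 0.02."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


def poisson_pmf(r):
    """Poisson P(X = r) with mean lam = np."""
    return exp(-lam) * lam**r / factorial(r)


results = {}
for name, pmf in [("binomial (exact)", binom_pmf), ("Poisson (approx.)", poisson_pmf)]:
    a = pmf(2)                             # (a) exactly 2 defectives
    b = 1 - sum(pmf(r) for r in range(6))  # (b) P(X > 5) = 1 - P(X <= 5)
    c = sum(pmf(r) for r in range(4))      # (c) P(X < 4) = P(X <= 3)
    results[name] = (a, b, c)
    print(name, [round(v, 3) for v in (a, b, c)])
```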

3.5 Exercises

1. A manufacturing process produces components which are free from any faults with probability p. Find the probability that in a sample of size 50 from a large batch there are fewer than 4

faulty components when p = 0.95. Find the probability that in a sample of size 50 there are

fewer than 10 faulty when p = 0.75.

2. Use the table to give a suitable approximation to the probability that 5 X where X is a binomial random variable with parameters p = 0.05 and n = 400.

3. A car-pooling study shows that the number of passengers, X, in a car (excluding the driver) is likely to assume the values 0, 1, 2, 3 and 4 with probabilities given by the table.

X 0 1 2 3 4

P(X=x) 0.7 0.1 0.1 0.05 0.05

(a) Determine the probability of at least two passengers in a car.
(b) Find the cumulative distribution function of X and sketch it.
(c) Calculate
(i) E(X)
(ii) E(X²)
(iii) V(X)

4. Suppose that in late summer, the Fremantle Surf Life Saving club makes an average of two surf rescues per day. Use the Poisson probability distribution to determine the probability that

(a) More than two rescues are made on a particular day.

(b) Five surf rescues are made in a 3-day period.

3.6 Further Exercises (Probability and statistics: Walpole, Myers and Myers – 8th Edition)

1. Exercises - Page 189 - Question 5.51 – 5.70



LECTURE 4

CONTINUOUS RANDOM VARIABLES

4.1 Introduction

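A random variable X is a numerically valued variable defined on the sample space, $X: S \to \mathbb{R}$.

We say that X is a continuous random variable if it can take any value in an interval (or union of intervals) of real numbers, with probabilities given by a density as in the following definition.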

Definition: If X is a continuous random variable then there exists a non-negative function, f(x), called the probability density function of X, such that

$$P(a \le X \le b) = \int_a^b f(x)\,dx \quad\text{and}\quad \int_{-\infty}^{\infty} f(x)\,dx = 1.$$

Note that any function, which is non-negative and integrates to one is a possible probability density

function for a random variable X. As with discrete random variables some density functions are

commonly used to model continuous random variables. It is also convenient to define the following

function.

Definition: The cumulative distribution function F(t) of a continuous random variable X is defined by

$$F(t) = P(X \le t) = \int_{-\infty}^{t} f(x)\,dx.$$

Note that for a discrete random variable the cumulative distribution function P(X ≤ x) is a step function with steps of height P(X = x) at the values that X can take. The continuous version can be thought of as a limiting case when all values of x in an interval are possible.

Note that the cumulative distribution function is always non-decreasing and

$$\lim_{x \to -\infty} F(x) = 0, \qquad \lim_{x \to \infty} F(x) = 1.$$

We define the mean of a continuous random variable as follows.

Definition: If X is a continuous random variable with probability density function f(x) then the mean or expected value of X, E[X] or μ, is defined by

$$E[X] = \int_{-\infty}^{\infty} x\,f(x)\,dx.$$

We define the expectation of a function of X in a similar way.

Definition: If X is a continuous random variable with probability density function f(x) then the expected value of g(X) is defined by

$$E[g(X)] = \int_{-\infty}^{\infty} g(x)\,f(x)\,dx.$$

Similarly the variance is defined as follows.

Definition: If X is a continuous random variable with probability density function f(x) then the variance of X, Var[X], is defined by

$$Var[X] = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2.$$

We can also define the median of a continuous random variable.

Definition: If X is a continuous random variable with probability density function f(x) then the median of X is the value m satisfying the equation

$$\int_{-\infty}^{m} f(x)\,dx = \int_{m}^{\infty} f(x)\,dx = \frac{1}{2}.$$

It is the value such that X is equally likely to be more than the median as less than it.

Example 4.1: A random variable X has probability density function

$$f(x) = \begin{cases} c\,x^2(1-x) & \text{if } 0 \le x \le 1, \\ 0 & \text{otherwise.} \end{cases}$$

1. Determine c.

2. Find E[X].

3. Find Var[X].

4. Show that the median m satisfies the equation $6m^4 - 8m^3 + 1 = 0$.

Solution:

1. We know that $\int_{-\infty}^{\infty} f(x)\,dx = 1$, so

$$\int_0^1 c\,(x^2 - x^3)\,dx = c\left[\frac{x^3}{3} - \frac{x^4}{4}\right]_0^1 = \frac{c}{12} = 1,$$

and hence c = 12.

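2.

$$E[X] = 12\int_0^1 (x^3 - x^4)\,dx = 12\left[\frac{x^4}{4} - \frac{x^5}{5}\right]_0^1 = \frac{12}{20} = \frac{3}{5}$$

3.

$$E[X^2] = 12\int_0^1 (x^4 - x^5)\,dx = 12\left[\frac{x^5}{5} - \frac{x^6}{6}\right]_0^1 = \frac{12}{30} = \frac{2}{5}$$

Thus

$$Var[X] = \frac{2}{5} - \left(\frac{3}{5}\right)^2 = \frac{1}{25}.$$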

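4. The median m satisfies

$$12\int_0^m (x^2 - x^3)\,dx = 0.5 \;\Rightarrow\; 12\left(\frac{m^3}{3} - \frac{m^4}{4}\right) = 0.5 \;\Rightarrow\; 4m^3 - 3m^4 = 0.5 \;\Rightarrow\; 6m^4 - 8m^3 + 1 = 0.$$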
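The example only asks us to show that the median satisfies 6m⁴ - 8m³ + 1 = 0. If the numerical value is wanted, a simple root finder will do; the Python sketch below uses bisection, a technique chosen here for illustration rather than one prescribed by the text. Since g'(m) = 24m²(m - 1) < 0 on (0, 1), the equation has exactly one root there.

```python
def g(m):
    """Left-hand side of the median equation 6m^4 - 8m^3 + 1 = 0."""
    return 6 * m**4 - 8 * m**3 + 1


def cdf(x):
    """F(x) = 12(x^3/3 - x^4/4) = 4x^3 - 3x^4 for 0 <= x <= 1."""
    return 4 * x**3 - 3 * x**4


def bisect(f, lo, hi, tol=1e-12):
    """Root of f in [lo, hi] by repeated halving; f(lo), f(hi) must differ in sign."""
    if f(lo) * f(hi) > 0:
        raise ValueError("f must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid  # sign change lies in the left half
        else:
            lo = mid  # sign change lies in the right half
    return (lo + hi) / 2


m = bisect(g, 0.0, 1.0)
print(round(m, 4))       # 0.6143
print(round(cdf(m), 6))  # 0.5
```

Evaluating the c.d.f. at the computed root confirms that half of the probability lies below it.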

4.2 Exponential Distribution

The exponential distribution can be used to model the lifetimes of components. It is also linked to the Poisson distribution: if events occur at random at rate λ, so that the number of events in unit time has a Poisson distribution with mean λ, then the time between successive events follows an exponential distribution with parameter λ.

The probability density function for an exponential distribution with parameter λ > 0 is

$$f(x) = \begin{cases} \lambda e^{-\lambda x} & \text{if } x \ge 0, \\ 0 & \text{otherwise.} \end{cases}$$

We shall check first that this is a valid p.d.f. Clearly f(x) ≥ 0. Also

$$\int_0^{\infty} \lambda e^{-\lambda x}\,dx = \left[-e^{-\lambda x}\right]_0^{\infty} = 1.$$

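To find the mean we use integration by parts:

$$E[X] = \int_0^{\infty} x\,\lambda e^{-\lambda x}\,dx = \left[-x e^{-\lambda x}\right]_0^{\infty} + \int_0^{\infty} e^{-\lambda x}\,dx = 0 + \left[-\frac{1}{\lambda}e^{-\lambda x}\right]_0^{\infty} = \frac{1}{\lambda}.$$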

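To find the variance we first find E[X²]. This is also done by integration by parts:

$$E[X^2] = \int_0^{\infty} x^2\,\lambda e^{-\lambda x}\,dx = \left[-x^2 e^{-\lambda x}\right]_0^{\infty} + \int_0^{\infty} 2x\,e^{-\lambda x}\,dx = \frac{2}{\lambda}\int_0^{\infty} x\,\lambda e^{-\lambda x}\,dx = \frac{2}{\lambda}E[X] = \frac{2}{\lambda^2}.$$

Therefore

$$Var[X] = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}.$$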

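The cumulative distribution function is given by

$$F(x) = P(X \le x) = \begin{cases} 0 & \text{if } x < 0, \\ \int_0^x f(t)\,dt & \text{if } x \ge 0. \end{cases}$$

Now

$$\int_0^x \lambda e^{-\lambda t}\,dt = \left[-e^{-\lambda t}\right]_0^x = 1 - e^{-\lambda x}.$$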

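The median m is given by F(m) = 1/2. Therefore, substituting into the c.d.f.,

$$1 - e^{-\lambda m} = \tfrac{1}{2} \;\Rightarrow\; e^{-\lambda m} = \tfrac{1}{2} \;\Rightarrow\; -\lambda m = \ln\tfrac{1}{2} = -\ln 2 \;\Rightarrow\; m = \frac{\ln 2}{\lambda}.$$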
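A quick simulation can make the results E[X] = 1/λ, Var[X] = 1/λ² and m = ln 2/λ concrete. The rate λ = 0.5 and the seed are arbitrary assumptions for illustration; note that Python's `random.expovariate` takes the rate λ (not the mean) as its argument.

```python
import math
import random
import statistics

lam = 0.5                  # hypothetical rate, chosen only for illustration
rng = random.Random(2019)  # fixed seed so the run is reproducible
sample = [rng.expovariate(lam) for _ in range(200_000)]

n = len(sample)
sim_mean = sum(sample) / n
# Sample variance with divisor n - 1.
sim_var = sum((x - sim_mean) ** 2 for x in sample) / (n - 1)
sim_median = statistics.median(sample)

print(f"mean     {sim_mean:.3f}  theory {1 / lam:.3f}")
print(f"variance {sim_var:.3f}  theory {1 / lam**2:.3f}")
print(f"median   {sim_median:.3f}  theory {math.log(2) / lam:.3f}")
```

With this many draws the simulated values typically agree with the formulas to about two decimal places.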

4.3 Exercises

1. The random variable X has probability density function 3)2()( xc x f for 0


2. Assume that the continuous random variable X has the probability density function

$$f(x) = \begin{cases} k(9 - 4x^2) & \text{for } 0 \le x \le 3/2, \\ 0 & \text{otherwise.} \end{cases}$$

(a) Calculate the value of k.
(b) Find the mean and variance of X.
(c) Find the cumulative distribution function of X.
(d) Find the median of X.
(e) Find P(1/2 < X < 1).

3. The time (in hours) between successive calls has an exponential distribution with parameter 1/6. What is the probability of waiting more than 15 minutes between any two successive calls?

4. Identify and name the continuous random variables from the following list of variables:

X: the number of automobile accidents per year in Virginia.

Y: the length of time to play 18 holes of golf.

M: the amount of milk produced yearly by a particular cow.

N: the number of eggs laid each month by a hen.

P: the number of building permits issued each month in a certain city.

Q: the weight of grain produced per acre.


1. Exercises - Page 112- Question 3.7, 3.9, 3.12, 3.21



LECTURE 5

NORMAL DISTRIBUTION

5.1 Introduction

The normal, or Gaussian, distribution is the most commonly used distribution in statistics. A normally distributed random variable with mean μ and variance σ² has probability density function

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left[-\frac{(x-\mu)^2}{2\sigma^2}\right], \qquad -\infty < x < \infty.$$

It is denoted by X ~ N(μ, σ²).

If X is normally distributed with mean 0 and variance 1, then we write X ~ N(0, 1). Its probability density function is usually written as φ(x) and is given by

$$\phi(x) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{x^2}{2}\right), \qquad -\infty < x < \infty.$$

The cumulative distribution function is denoted by Φ(x).

We can calculate probabilities for a normal distribution from the standard normal using

$$\frac{X - \mu}{\sigma} \sim N(0, 1).$$

Statistical Table (4) gives the probability that a standard normal random variable, i.e. with mean zero and variance 1, is larger than a specified value x, i.e. 1 - Φ(x). In using the tables we utilise the symmetry of the normal distribution, and the fact that P(Z < 0) = P(Z > 0) = 0.5.

Example 5.1: Calculate the probabilities of the following events.

(i) Z < -2.45
(ii) (Z < -2.1) ∪ (Z > 2.1)
(iii) 0 < Z < 1.2

Solution:

(i) By symmetry P ( Z < -2.45) = P (Z > 2.45) = 0.00714

(ii) By symmetry P[(Z < -2.1) ∪ (Z > 2.1)] = 2 P(Z > 2.1) = 2 × 0.01786 = 0.03572

(iii) P[Z > 1.2] = 0.11507, so P[Z < 1.2] = 1 - 0.11507 = 0.88493

P[0 < Z < 1.2] = 0.88493 - 0.5 = 0.38493
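Instead of Statistical Table (4), the standard normal c.d.f. Φ can be evaluated with Python's `statistics.NormalDist` (Python 3.8+). The sketch reproduces Example 5.1; the last decimal place can differ slightly from the worked values because table entries are rounded before further arithmetic (for instance, 0.01786 is doubled in part (ii)).

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def upper(z):
    """P(Z > z) = 1 - Phi(z), the quantity given in Statistical Table 4."""
    return 1 - Z.cdf(z)


p_i = Z.cdf(-2.45)               # (i)   P(Z < -2.45)
p_ii = Z.cdf(-2.1) + upper(2.1)  # (ii)  P(Z < -2.1) + P(Z > 2.1)
p_iii = Z.cdf(1.2) - Z.cdf(0)    # (iii) P(0 < Z < 1.2)
print(round(p_i, 5), round(p_ii, 5), round(p_iii, 5))
```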

Example 5.2: It is known that in a certain district the heights of adult males are normally distributed

with mean 175cm and standard deviation 7cm. Find the probability that a man selected at random from

this district will be

(a) over 182cm tall.

(b) between 170cm and 181cm tall.

(c) under 179cm tall.

Let X be the height of the selected man.


42/108

Then ) N(175,7~X 2 Z = (X-175)/7 ~ N (0,1)

(a) P( X > 182) = P ( Z > (182 – 175)/7) = P (Z > 1) = 0.159

(b) P(170 < X < 181) = P(-5/7 < Z < 6/7)

= P(Z > -5/7) - P(Z > 6/7)

= 0.7625 - 0.1957 = 0.567

(c) P(X < 179) = P(Z < 4/7) = 1 - P(Z > 4/7) = 1 - 0.284 = 0.716
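The same `statistics.NormalDist` class (Python 3.8+) accepts a mean and standard deviation directly, so Example 5.2 can be checked without standardising by hand.

```python
from statistics import NormalDist

X = NormalDist(mu=175, sigma=7)  # heights in cm

p_a = 1 - X.cdf(182)             # (a) P(X > 182)
p_b = X.cdf(181) - X.cdf(170)    # (b) P(170 < X < 181)
p_c = X.cdf(179)                 # (c) P(X < 179)
print(round(p_a, 3), round(p_b, 3), round(p_c, 3))  # 0.159 0.567 0.716
```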

5.2 Normal as an Approximating Distribution

When n is large and p moderate we may use the normal distribution to approximate binomial probabilities. Note that as we are approximating a discrete random variable by a continuous one, we have to employ a continuity correction.

For an integer-valued random variable, P(X < x) = P(X ≤ x - 1). We approximate these quantities by P(Y < x - 1/2). We illustrate the technique in the following example.

Example 5.3: A fair coin is tossed 150 times. Find a suitable approximation to the

probability of each of the following events.

(a) more than 70 heads

(b) fewer than 82 heads
(c) more than 72 but fewer than 79 heads.

Let X be the number of heads thrown; then X has a binomial distribution with n = 150 and p = 1/2. As n is large and p moderate we may approximate X by Y, a normal random variable with mean np = 75 and variance np(1 - p) = 37.5.

(a) We require P(X > 70), which is the same as P(X ≥ 71), so we approximate by P(Y > 70.5):

P(Z > (70.5 - 75)/√37.5) = P(Z > -0.735) = 0.769

(b) We require P(X < 82), which is the same as P(X ≤ 81), so we approximate by P(Y < 81.5):

P(Z < (81.5 - 75)/√37.5) = P(Z < 1.06) = 1 - 0.145 = 0.855

(c) We require P(72 < X < 79), which is the same as P(73 ≤ X ≤ 78), and thus we approximate by P(72.5 < Y < 78.5):

P(-0.408 < Z < 0.571) = P(Z > -0.408) - P(Z > 0.571) = 0.658 - 0.284 = 0.374
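To judge how well the continuity-corrected normal approximation works in Example 5.3, this Python sketch compares it with the exact binomial probabilities for n = 150 and p = 0.5 (Python 3.8+ for `math.comb` and `statistics.NormalDist`). The approximations differ from the worked values only in the third decimal place, because the text rounds z before using the table.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 150, 0.5
Y = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))  # N(75, 37.5)


def exact(lo, hi):
    """Exact binomial P(lo <= X <= hi) for integer bounds."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(lo, hi + 1))


cases = {
    # (a) more than 70 heads: P(X >= 71), approximated by P(Y > 70.5)
    "a": (exact(71, n), 1 - Y.cdf(70.5)),
    # (b) fewer than 82 heads: P(X <= 81), approximated by P(Y < 81.5)
    "b": (exact(0, 81), Y.cdf(81.5)),
    # (c) 73 <= X <= 78, approximated by P(72.5 < Y < 78.5)
    "c": (exact(73, 78), Y.cdf(78.5) - Y.cdf(72.5)),
}
for label, (ex, approx) in cases.items():
    print(label, "exact", round(ex, 4), "normal approx.", round(approx, 4))
```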

We may similarly approximate a Poisson random variable by a normal one of the same mean and

variance so long as this mean is moderately large. We again have to use the continuity correction.

Example 5.4: A radioactive source emits particles at random at an average rate of 36 per hour. Find

an approximation to the probability that more than 40 particles are emitted in one hour.

Let X be the number of particles emitted in one hour. Then X has a Poisson distribution with mean 36

and variance 36. We can approximate X by Y which has a N(36, 36) distribution. We require P(X > 40). This is approximately P(Y > 40.5):

$$P(Y > 40.5) = P\left(Z > \frac{40.5 - 36}{6}\right) = P(Z > 0.75) = 0.2266.$$
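A similar check for Example 5.4 compares the exact Poisson probability P(X > 40) with the normal approximation P(Y > 40.5). In this Python sketch the Poisson terms are built recursively, P(X = r) = P(X = r - 1)·λ/r, to avoid large factorials.

```python
from math import exp, sqrt
from statistics import NormalDist

lam = 36

# Exact Poisson P(X <= 40), accumulating each term from the previous one.
term = exp(-lam)  # P(X = 0)
cdf = term
for r in range(1, 41):
    term *= lam / r  # P(X = r) = P(X = r - 1) * lam / r
    cdf += term

exact = 1 - cdf                                         # P(X > 40)
approx = 1 - NormalDist(lam, sqrt(lam)).cdf(40.5)       # P(Y > 40.5)
print(round(exact, 4), round(approx, 4))
```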

5.3 Exercises

1. The sample data consists of the values:

0.325 0.317 0.375 0.325 0.508 0.117 0.150 0.317 0.275 0.383

(i) What is the percentage of values within one standard deviation of the mean?
(ii) What is the percentage of values within two standard deviations of the mean?
Do they appear to come from a Normal Distribution? Justify your answer.

2. Construct a Normal probability plot using SPSS for the data given in (1) and explain how it could be used for checking normality.

3. 94 95 30 98 76 73 95 97 86 91 85 70 96

70 91 72 97 97 84 28 19 90 77 58 58 47

48 28 20 65

(a) Plot these data.
(b) Find the mean and the standard deviation for this data.
(c) Let X be a Gaussian (Normal) random variable with the mean and standard deviation you calculated in part (b). Find the following probabilities.
(i) P(X < 30)
(ii) P(X > 90)
(iii) P(50 < X < 80)

(d) Find the proportion of data values that are
(i) less than 30
(ii) greater than 90
(iii) from 50 to 80

(e) Can the distribution of these values be approximated by a Normal distribution?

4. The lengths of a batch of bolts are assumed normally distributed with mean 4cm and standard

deviation 0.1cm. What is the probability that a bolt selected at random will be more than

4.1655cm in length? (Give answer to 5 dp)

5. A coin is to be tossed 100 times.

(a) Assuming the coin is biased with P(head) =0.6, use a normal approximation to estimate

the probability that between 56 and 63 heads occur.

(b) Assume P(head)=0.99. Use a suitable approximation to estimate the probability that

exactly 99 heads occur. (Do not calculate the exact binomial probability).





LECTURE 6

RANDOM SAMPLING AND SAMPLING DISTRIBUTIONS

6.1 Introduction

Definition: The sampling distribution of a statistic (a random variable computed from the sample) is the distribution of all its possible values over all possible samples. If the sample is a random sample of size n from an infinite population, then $x_1, x_2, \ldots, x_n$ are independent random variables, each with the same distribution (i.e. the same p.d.f. or probability function) as the population, so that

$$E(x_i) = \mu, \qquad Var(x_i) = \sigma^2.$$

Theorem 1: Averaging over all random samples of size n from an arbitrary population with mean μ and variance σ², the sample mean $\bar{x}$ and sample variance s² have the following three properties:

$E(\bar{x}) = \mu$, i.e. $\bar{x}$ is an unbiased estimator of μ.

$Var(\bar{x}) = \sigma^2/n$, i.e. the variability of $\bar{x}$ as an estimator decreases with n.

$E[s^2] = \sigma^2$, i.e. s² is an unbiased estimator of σ².

Thus s²/n is used as an unbiased estimate of the variance of $\bar{x}$ as an estimator of μ.

Example 6.1: An infinite population is described by an asymmetrical discrete distribution with just two values: -3 with probability 0.3 and +1 with probability 0.7. Thus we have

$$\mu = E[X] = (-3)(0.3) + (1)(0.7) = -0.2$$

$$\sigma^2 = Var[X] = E[X^2] - \mu^2 = (-3)^2(0.3) + (1)^2(0.7) - (-0.2)^2 = 3.36$$

These are the values of the (usually unknown) population parameters. Let us now look at all samples of size 3. There are infinitely many, but they contain only four distinct combinations of values, which we can tabulate as follows:

Sample observations   $\bar{x}$   s²     P(sample)
-3, -3, -3            -3          0      (0.3)³ = 0.027
-3, -3, 1             -5/3        32/6   3(0.3)² × 0.7 = 0.189
-3, 1, 1              -1/3        32/6   3(0.7)² × 0.3 = 0.441
1, 1, 1               1           0      (0.7)³ = 0.343

Thus we see that $\bar{x}$ as an estimate of μ is 2.8 below the 'true' value in 2.7% of samples, 1.2 above in 34.3% of samples, etc. Taking the average over all samples, or equivalently the expectation over the sampling distribution, we see that

$$E(\bar{x}) = (-3)(0.027) + \left(-\tfrac{5}{3}\right)(0.189) + \left(-\tfrac{1}{3}\right)(0.441) + (1)(0.343) = -0.2$$

exactly, confirming the first result of Theorem 1.

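Then

$$Var[\bar{x}] = E[\bar{x}^2] - (E[\bar{x}])^2 = (-3)^2(0.027) + \left(-\tfrac{5}{3}\right)^2(0.189) + \left(-\tfrac{1}{3}\right)^2(0.441) + (1)^2(0.343) - (-0.2)^2 = 1.12 = \frac{\sigma^2}{3},$$

which confirms the second result. The third result is verified for this example by averaging over all possible values of s², thus: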

$$E(s^2) = 0(0.027) + \tfrac{32}{6}(0.189) + \tfrac{32}{6}(0.441) + 0(0.343) = 3.36 = \sigma^2$$
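The hand calculations of Example 6.1 can be verified by enumerating all 2³ = 8 ordered samples of size 3 with exact fractions. This Python sketch recomputes μ, σ², E(x̄), Var(x̄) and E(s²), with s² using the divisor n - 1.

```python
from fractions import Fraction
from itertools import product

# Population: -3 with probability 0.3, +1 with probability 0.7 (exact fractions).
pop = {-3: Fraction(3, 10), 1: Fraction(7, 10)}
n = 3

mu = sum(x * p for x, p in pop.items())
sigma2 = sum(x**2 * p for x, p in pop.items()) - mu**2

e_xbar = e_xbar2 = e_s2 = Fraction(0)
for sample in product(pop, repeat=n):  # all 8 ordered samples of size 3
    prob = Fraction(1)
    for x in sample:
        prob *= pop[x]                 # independent draws
    xbar = Fraction(sum(sample), n)
    s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)
    e_xbar += prob * xbar
    e_xbar2 += prob * xbar**2
    e_s2 += prob * s2

var_xbar = e_xbar2 - e_xbar**2
print(mu, sigma2)                # -1/5 84/25
print(e_xbar, var_xbar, e_s2)    # -1/5 28/25 84/25
```

In decimals, 84/25 = 3.36 and 28/25 = 1.12, matching the values above.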

Proof of Theorem 1

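$$\bar{x} = \frac{1}{n}(x_1 + \cdots + x_n)$$

$$E[\bar{x}] = \frac{1}{n}\left(E[x_1] + \cdots + E[x_n]\right) = \frac{1}{n}(\mu + \cdots + \mu) = \frac{1}{n}(n\mu) = \mu$$

$$Var[\bar{x}] = \frac{1}{n^2}Var[x_1 + \cdots + x_n] = \frac{1}{n^2}\left(Var[x_1] + \cdots + Var[x_n]\right) = \frac{1}{n^2}(n\sigma^2) = \frac{\sigma^2}{n}$$

Note: $x_1, \ldots, x_n$ are independent (random sample), which is what allows the variance of the sum to be written as the sum of the variances.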

The theorem shows that σ/√n is the standard deviation of the sampling distribution of $\bar{x}$. A sample estimate of this variability is s/√n, which is called the (estimated) standard error of the (sample) mean.

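Theorem 2 (Central Limit Theorem): as n → ∞, the sampling distribution of $\bar{x}$ tends to a Normal distribution with the same mean μ and variance σ²/n (more precisely, the standardised mean $(\bar{x} - \mu)/(\sigma/\sqrt{n})$ tends to N(0, 1)).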

The importance of this result is that we do not need to know the form or type of the original population distribution if our sample size is sufficiently large. We can instead use the Normal distribution for statistical inference with the knowledge that the probabilities we calculate will be good approximations to the true (but generally unknown) probabilities.

$$\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right) \quad\text{approximately for large } n.$$

Then using the properties of the Normal distribution we can say that for large n,

$$P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le u\right)$$

can be found approximately for any specified value u without knowing the original form of the population.

If, however, we do know the form of the population and it follows a Normal distribution, then for any sample size n it can be shown that

$$\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right).$$

Thus

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$

has a sampling distribution which is Standard Normal for any n (Table 4).

As the population standard deviation σ is often unknown, replacing it by the corresponding sample quantity s changes the sampling distribution.

However, provided the underlying population is Normal, it can be shown that

$$T = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$

has a 'Student's t-distribution' with ν 'degrees of freedom', where ν = n - 1 (named after W. S. Gosset, who took the pseudonym 'Student'). The percentiles of this distribution are given in Table 7.

Another distribution, which arises from random samples of Normal populations, is the 'chi-square' distribution, whose percentage points are given in Table 8. It can be shown that

$$V = \frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1},$$

the chi-square distribution with n - 1 degrees of freedom, whatever the value of μ. Yet another distribution is the (Fisher) F-distribution, with percentage points in Table 9. The F and χ² distributions are used for statistical inference on the variances of Normal populations, as well as for wider application in Goodness-of-Fit tests.

Note:

Using SPSS it is possible to check these distributional results empirically by generating a sufficient number of random samples from a Normal population.
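Outside SPSS, the same empirical check can be sketched in Python with the standard `random` and `statistics` modules. The population N(10, 2²), the sample size n = 5 and the number of repetitions are arbitrary illustrative choices; the critical value 2.776 is the standard tabulated upper 2.5% point of t with 4 degrees of freedom.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed for reproducibility
mu, sigma, n, reps = 10.0, 2.0, 5, 10_000  # illustrative values, not from the handbook
T_CRIT = 2.776  # upper 2.5% point of Student's t with n - 1 = 4 degrees of freedom

xbars, s2s, vs = [], [], []
t_exceed = 0
for _ in range(reps):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.fmean(sample)
    s2 = statistics.variance(sample)  # sample variance, divisor n - 1
    xbars.append(xbar)
    s2s.append(s2)
    vs.append((n - 1) * s2 / sigma**2)  # should behave like chi-square with n - 1 d.f.
    t = (xbar - mu) / math.sqrt(s2 / n)
    if abs(t) > T_CRIT:
        t_exceed += 1

print("mean of x-bar ", round(statistics.fmean(xbars), 3), "theory", mu)
print("var of x-bar  ", round(statistics.variance(xbars), 3), "theory", sigma**2 / n)
print("mean of s^2   ", round(statistics.fmean(s2s), 3), "theory", sigma**2)
print("mean of V     ", round(statistics.fmean(vs), 3), "theory", n - 1)
print("P(|T| > 2.776)", round(t_exceed / reps, 3), "theory", 0.05)
```

The last line illustrates why the t-distribution is needed: with σ replaced by s, about 5% of the |T| values exceed 2.776, whereas the Normal cut-off 1.96 would be exceeded noticeably more often for n = 5.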

6.2 Exercises

1. The heights of 1000 students are approximately normally distributed with a mean of 174.5 cm and a standard deviation of 6.9 cm. If 200 random samples of size 25 are drawn from this population and the means recorded, determine

(a) the expected mean and standard deviation of the sampling distribution of the mean;
(b) the number of sample means that fall between 172.5 and 175.8 cm inclusive;
(c) the number of sample means that fall below 172 cm.

2. Show that the sample variance is unchanged if a constant is added to or subtracted from each value

of the sample.

3. If the size of a sample is 36 and the standard error of the mean is 2, what must the size of the sample become if the standard error is to be reduced to 1.2?

4. The amount of time that a drive-through bank teller spends on a customer is a random variable with a mean of 3.2 minutes and a standard deviation of 1.6 minutes. If a random sample of 64 customers is observed, find the probability that their mean time at the teller's counter is

(a) at most 2.7 minutes;
(b) more than 3.5 minutes;
(c) at least 3.2 minutes but less than 3.4 minutes.

5. If all possible samples of size 16 are drawn from a normal population with mean equal to 50 and standard deviation equal to 5, what is the probability that a sample mean $\bar{X}$ will fall in the interval from $\mu_{\bar{X}} - 1.9\sigma_{\bar{X}}$ to $\mu_{\bar{X}} - 0.4\sigma_{\bar{X}}$? Assume that the sample means can be measured to any degree of accuracy.





LECTURE 7

TEST OF HYPOTHESES

7.1 Introduction to Hypothesis Testing

Definition 1 The null hypothesis H0 is a statement about the value of the parameter of interest. A simple null hypothesis specifies the population distribution exactly. We examine the data to see whether they support, or provide evidence against, the null hypothesis H0.

The alternative hypothesis H1 describes only the possibilities (there may be many) that we are prepared to consider if H0 is not true.

Definition 2 The test statistic for H0 versus H1 is a random variable with known (or approximately known) distribution when H0 is assumed to be true (‘under H0’). The observed value of the test statistic can indicate departures from H0 in favour of H1.

Definition 3 The P-value is the probability, under H0, of observing a value of the test statistic at least as extreme as the value actually observed, where ‘extreme’ values are those that indicate departures from H0 in favour of H1. If the P-value is less than or equal to a chosen significance level $\alpha$, we say the test is statistically significant (at level $\alpha$).

7.2 Procedure

Null and alternative hypotheses: A clear statement of both should be given in terms of the population parameter of interest, together with a short verbal interpretation.

Test statistic: The formula, in terms of sample statistics such as the mean and standard deviation, should be stated together with its (sampling) distribution under the null hypothesis. The observed value of the test statistic should then be calculated to at least three significant figures.

Assess evidence: The P-value should be used to form a verbal statement or conclusion regarding the truth or otherwise of the null hypothesis. Finally, a verbal interpretation of this conclusion should be given for the non-statistician.

Depending on the conclusion reached (if any), the investigator may wish to quote a confidence interval for the parameter at the desired level.

Example 7.1: Articles produced by a manufacturer should have mean length 4 cm and standard deviation 0.02 cm. A test sample of size 10 from a large batch of production has $\bar{x} = 4.01$ cm. Is there evidence that the unknown mean length $\mu$, say, of articles in the batch is unsatisfactory?



The null hypothesis is $H_0: \mu = 4$ (batch satisfactory), to be tested against the alternative $H_1: \mu \neq 4$ (batch unsatisfactory).

We need a test statistic whose distribution is known under the null hypothesis, i.e. assuming H0 to be true. We know that, in general, for random samples of size n from a Normal population

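$$\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right),$$

so, under H0,

$$\bar{X} \sim N\!\left(4, \frac{(0.02)^2}{10}\right) \quad\text{and}\quad Z = \frac{\bar{X} - 4}{0.02/\sqrt{10}} \sim N(0,1),$$

that is, Z is standard Normal. Large values of Z (either positive or negative) indicate departures from H0 in favour of H1, and the observed value of Z is

$$z = \frac{4.01 - 4}{0.02/\sqrt{10}} = 1.58.$$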

So the probability of observing a value of Z at least as extreme as this (the P-value) is

$$P(Z > 1.58) + P(Z < -1.58) = 2 \times P(Z > 1.58) = 0.1141,$$

using the symmetry of the Normal distribution.

Thus there is an 11.4% chance of observing a sample result this extreme, or more so, even if the batch is satisfactory.

We therefore conclude that there is no evidence against the null hypothesis.
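The two-sided calculation can be reproduced in a few lines. This is a minimal sketch assuming SciPy is available. The function name `z_test_two_sided` is hypothetical, and `norm.sf` (the upper-tail probability $P(Z > z)$) stands in for the Normal table.

```python
import math
from scipy.stats import norm

def z_test_two_sided(xbar, mu0, sigma, n):
    """z statistic and two-sided P-value for H0: mu = mu0 when sigma is known."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    p_value = 2 * norm.sf(abs(z))  # P(Z > |z|) + P(Z < -|z|), using symmetry
    return z, p_value

z, p = z_test_two_sided(xbar=4.01, mu0=4.0, sigma=0.02, n=10)
print(f"z = {z:.3f}, two-sided P-value = {p:.4f}")
```

Here z is not rounded to 1.58, so the P-value agrees with the tabulated 0.1141 only to about three decimal places (roughly 0.114).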

Note that this was a ‘two-sided’ or ‘two-tailed’ test, as the alternative hypothesis is two-sided, namely $\mu \neq 4$. Suppose instead there were a legal requirement of a maximum mean length of 4 cm. Then we would not be concerned with the possibility that $\mu < 4$. Instead, we would test $H_0: \mu = 4$ against $H_1: \mu > 4$, asking whether there is sufficient evidence in the data to make us worry about failing the requirement. The test statistic and its observed value would be the same as before. However, only large positive values of Z, and not negative ones, would indicate departures in favour of H1, so the P-value is just $P(Z > 1.58) = 0.057$. Now we have slight evidence against H0 in favour of H1, i.e. slight evidence that the batch may fail to meet the legal requirement. This is called a ‘one-tailed’ or ‘one-sided’ test, as the alternative hypothesis is one-sided, namely $\mu > 4$.
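For the one-sided version only the upper tail counts. This is a minimal sketch assuming SciPy; the function name `z_test_upper` is hypothetical. It shows that the statistic is unchanged and only the tail used for the P-value differs.

```python
import math
from scipy.stats import norm

def z_test_upper(xbar, mu0, sigma, n):
    """z statistic and one-sided P-value for H0: mu = mu0 versus H1: mu > mu0."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    p_value = norm.sf(z)  # only large positive z counts against H0
    return z, p_value

z, p = z_test_upper(xbar=4.01, mu0=4.0, sigma=0.02, n=10)
print(f"z = {z:.3f}, one-sided P-value = {p:.4f}")
```

The one-sided P-value is half the two-sided one here, because the observed z lies in the tail named by H1.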

However, the assumption that the population variance $\sigma^2$ is known is often unrealistic:

Example 7.2: A random sample of 5 men had a mean height $\bar{x}$ of 70 inches and a sample standard deviation s of 2 inches. Is there any evidence in these data against the (null) hypothesis that the mean of the population is 67 inches? To test $H_0: \mu = 67$ versus $H_1: \mu \neq 67$ we need a test statistic whose distribution is known under H0. Such a statistic is

$$T = \frac{\bar{X} - 67}{s/\sqrt{5}} \sim t_4 \quad \text{under } H_0.$$



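That is, Student’s ‘t’ with 4 degrees of freedom. The observed value of T is $(70 - 67)/(2/\sqrt{5}) = 3.35$. To calculate P we must therefore refer this value to the percentage points of the t-distribution with 4 degrees of freedom (d.o.f.). Now $t_4(0.025) = 2.776$ and $t_4(0.01) = 3.747$ lie on either side of our observed value, so the two-sided P-value lies between $2 \times 0.01 = 0.02$ and $2 \times 0.025 = 0.05$.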

Alternatively, we can use the two-tailed probabilities (the $2\alpha$ values) on the second row of Table 7 to arrive at the same answer. We cannot therefore say exactly what the probability of obtaining a value at least as extreme as the one observed is, but we can specify it within a suitable range. This is sufficient to conclude that there is moderate evidence against the null hypothesis. So even this small sample provides some evidence that the population mean height differs from 67 inches.
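Software can give the exact P-value that the table only brackets. This is a minimal sketch assuming SciPy. The helper `t_test_from_summary` is hypothetical and works from the summary statistics in Example 7.2 rather than raw data. It also recomputes the two table values used to bracket the observed T.

```python
import math
from scipy import stats

def t_test_from_summary(xbar, s, n, mu0):
    """One-sample t statistic, d.o.f. and two-sided P-value from summary statistics."""
    df = n - 1
    t = (xbar - mu0) / (s / math.sqrt(n))
    p_value = 2 * stats.t.sf(abs(t), df)
    return t, df, p_value

t, df, p = t_test_from_summary(xbar=70, s=2, n=5, mu0=67)

# Upper percentage points t_4(0.025) and t_4(0.01), as read from Table 7.
lower = stats.t.ppf(0.975, df)
upper = stats.t.ppf(0.99, df)
print(f"T = {t:.3f} on {df} d.o.f.; {lower:.3f} < T < {upper:.3f}; P = {p:.4f}")
```

The exact two-sided P-value falls inside the range 0.02 to 0.05 obtained from the table.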

7.3 Confidence Intervals for Hypothesis Testing

Often we may be asked to estimate the population mean, $\mu$, rather than test a hypothesis about it. Or we may have performed a test and found evidence against the null hypothesis, casting doubt on our original hypothesised value. We can (and indeed must) give an estimate of uncertainty along with our best estimate of $\mu$, which is $\bar{x}$, the sample mean.

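Whatever the value of $\mu$,

$$P\left[-1.96 \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le 1.96\right] = 0.95;$$

cross-multiplying, we get

$$P\left[-1.96\,\sigma/\sqrt{n} \le \bar{X} - \mu \le 1.96\,\sigma/\sqrt{n}\right] = 0.95.$$

Subtracting $\bar{X}$ gives

$$P\left[-\bar{X} - 1.96\,\sigma/\sqrt{n} \le -\mu \le -\bar{X} + 1.96\,\sigma/\sqrt{n}\right] = 0.95,$$

and finally multiplying by $-1$ (which reverses the inequalities) gives

$$P\left[\bar{X} - 1.96\,\sigma/\sqrt{n} \le \mu \le \bar{X} + 1.96\,\sigma/\sqrt{n}\right] = 0.95.$$

This is true whatever the value of $\mu$, so we can say that the random interval $(\bar{X} - 1.96\,\sigma/\sqrt{n},\ \bar{X} + 1.96\,\sigma/\sqrt{n})$ has a probability of 0.95 of containing or covering the value of $\mu$;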

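that is, 95% of all samples will give intervals (calculated according to this formula) that contain the true value of the population mean. This interval is called a 95% confidence interval for $\mu$. Note that the 95% describes the procedure, not any one interval. Once a specific sample has been observed, the calculated interval either contains $\mu$ or it does not, so there is no guarantee that any particular interval contains $\mu$.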
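The ‘95% of all samples’ interpretation can be illustrated by simulation. This is a minimal sketch assuming NumPy. The true mean and standard deviation borrow the Example 7.1 figures, while the seed and the number of replications are arbitrary choices. It repeats the sampling many times and records how often the interval $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$ covers the true $\mu$.

```python
import numpy as np

def coverage_of_z_interval(mu=4.0, sigma=0.02, n=10, reps=20_000, seed=7):
    """Proportion of simulated intervals xbar +/- 1.96*sigma/sqrt(n) that contain mu."""
    rng = np.random.default_rng(seed)
    xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
    half_width = 1.96 * sigma / np.sqrt(n)
    covered = (xbar - half_width <= mu) & (mu <= xbar + half_width)
    return covered.mean()

coverage = coverage_of_z_interval()
print(f"proportion of intervals covering mu: {coverage:.3f}")  # should be near 0.95
```

Each individual simulated interval either covers $\mu$ or misses it. Only the long-run proportion is close to 0.95.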

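In general, a $100(1 - \alpha)\%$ confidence interval for $\mu$ is given by

$$\left[\bar{x} \pm z(\alpha/2)\,\frac{\sigma}{\sqrt{n}}\right]$$

when $\sigma^2$ is known. Here $z(\alpha/2)$ denotes the upper $100(\alpha/2)\%$ point of the standard Normal distribution; for example, $z(0.025) = 1.96$. When $\sigma^2$ is unknown, the interval is

$$\left[\bar{x} \pm t_{n-1}(\alpha/2)\,\frac{s}{\sqrt{n}}\right],$$

where $t_{n-1}(\alpha/2)$ is the corresponding upper percentage point of the t-distribution with $n-1$ degrees of freedom.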
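The two general formulas can be wrapped in one helper. This is a minimal sketch assuming SciPy. The name `mean_confidence_interval` and its keyword arguments are hypothetical. The function uses the Normal point $z(\alpha/2)$ when `sigma` is supplied, and otherwise uses the t point $t_{n-1}(\alpha/2)$ with the sample standard deviation `s`.

```python
import math
from scipy import stats

def mean_confidence_interval(xbar, n, alpha=0.05, sigma=None, s=None):
    """100(1 - alpha)% confidence interval for a Normal mean.

    Uses z(alpha/2) when sigma is known, otherwise t_{n-1}(alpha/2) with s.
    """
    if sigma is not None:
        crit, scale = stats.norm.ppf(1 - alpha / 2), sigma
    elif s is not None:
        crit, scale = stats.t.ppf(1 - alpha / 2, df=n - 1), s
    else:
        raise ValueError("supply either sigma (known) or s (sample s.d.)")
    half_width = crit * scale / math.sqrt(n)
    return xbar - half_width, xbar + half_width

print(mean_confidence_interval(4.01, 10, sigma=0.02))  # Example 7.1 figures, sigma known
print(mean_confidence_interval(70, 5, s=2))             # Example 7.2 figures, sigma unknown
```

Computing the critical value with `ppf` in place of a table lookup lets the same function serve any $\alpha$.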



Example 7.1 revisited: The above argument can be used to give a 95% confidence interval, as we are assuming that the population variance $\sigma^2$ is known. Thus

$$\left(\bar{X} - 1.96 \times \frac{0.02}{\sqrt{10}},\ \bar{X} + 1.96 \times \frac{0.02}{\sqrt{10}}\right)$$

is a 95% confidence interval for $\mu$, and substituting the observed value $\bar{x} = 4.01$ we obtain (3.9976, 4.0224).

This interval includes 4 cm. Therefore there is no evidence that the articles in the batch are unsatisfactory, which agrees with the two-sided test carried out earlier.
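The link between the confidence interval and the two-sided test can be checked numerically. This is a minimal sketch assuming SciPy. It uses the Example 7.1 figures, and the grid of hypothesised means is an arbitrary illustration. For this z interval and a two-sided test at the same 5% level, a hypothesised value lies inside the 95% interval exactly when its two-sided P-value exceeds 0.05.

```python
import math
from scipy.stats import norm

xbar, sigma, n = 4.01, 0.02, 10
se = sigma / math.sqrt(n)
z_crit = norm.ppf(0.975)                      # z(0.025), about 1.96
ci = (xbar - z_crit * se, xbar + z_crit * se)

def two_sided_p(mu0):
    """Two-sided P-value for H0: mu = mu0 with sigma known."""
    return 2 * norm.sf(abs(xbar - mu0) / se)

# For each hypothesised mean, "inside the 95% CI" matches "P-value > 0.05".
for mu0 in (3.99, 4.00, 4.02, 4.03):
    inside = ci[0] <= mu0 <= ci[1]
    print(f"mu0 = {mu0:.2f}: inside CI = {inside}, P = {two_sided_p(mu0):.4f}")
```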

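Example 7.2 revisited: When $\sigma^2$ is unknown, we use the random interval

$$\left(\bar{X} - 2.776\,\frac{s}{\sqrt{5}},\ \bar{X} + 2.776\,\frac{s}{\sqrt{5}}\right),$$

which contains $\mu$ with probability 0.95 as $t_4(0.025) = 2.776$ from Table 7, there being o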
