Statistics lecture 9 (chapter 8)

[Overview diagram: Data collection → Raw data → Descriptive statistics (graphs; measures of location and spread) → Information → Statistical inference (estimation; hypothesis testing) → Decision making]

• Hypothesis testing is a procedure for making inferences about a population

• A hypothesis test gives the opportunity to test whether a change has occurred or a real difference exists

• A hypothesis is a statement or claim that something is true

– Null hypothesis H0

• No effect or no difference

• Must be declared true/false

– Alternative hypothesis H1

• True if H0 found to be false

– If H0 is false – reject H0 and accept H1

– If H0 is true – accept H0 and reject H1

• H0 always contains the = sign

• State the null and alternative hypotheses that would be used to test each of the following statements:

• A manufacturer claims that the average life of a transistor is at least 1000 hours (h)

H0: µ = 1000

H1: µ < 1000

• A pharmaceutical firm maintains that the average time for a certain drug

to take effect is 15 mins

H0: µ = 15

H1: µ ≠ 15

• The mean starting salary of graduates is higher than R50000 per annum

H0: µ = 50000

H1: µ > 50000


• Hypothesis testing consists of 6 steps:

1. State the hypothesis

2. State the value of α

3. Calculate the test statistic

4. Determine the critical value

5. Make a decision

6. Draw conclusion


Hypothesis Testing Step 1

State the hypothesis

H0 – Null hypothesis

H1 – Alternative Hypothesis

One/Two populations

Two tailed

H0: parameter = ?

H1: parameter ≠ ?

Right tailed

H0: parameter = ?

H1: parameter > ?

Left tailed

H0: parameter = ?

H1: parameter < ?

State the value of α

Decision            Actual situation
                    H0 is true          H0 is false
Do not reject H0    Correct decision
Reject H0                               Correct decision


α - Probability of Type I error

Level of significance

1%, 5%, 10%

Determine critical value/s


State the value of α

Two types of errors:

Decision            Actual situation
                    H0 is true            H0 is false
Do not reject H0    Correct decision      Type II error = β
Reject H0           Type I error = α      Correct decision


LEVEL OF SIGNIFICANCE

• Probability (α) of committing a type I error is called the LEVEL OF SIGNIFICANCE

• α is specified before the test is performed

• You can control the Type I error by deciding, before the test is performed, what risk level you are willing to take in rejecting H0 when it is in fact true

• Researchers usually select α levels of 0.05 or smaller
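The meaning of α as the probability of a Type I error can be illustrated with a small simulation. This is only a sketch: the population values (mean 100, standard deviation 15), the sample size and the number of trials are hypothetical and not part of the lecture. H0 is true in every trial, so each rejection is a Type I error, and the share of rejections should land close to α.

```python
import random
from statistics import NormalDist, mean

def type_i_error_rate(alpha=0.05, n=30, trials=2000, seed=1):
    """Share of right-tailed z-tests that reject H0 although H0 is true."""
    rng = random.Random(seed)
    mu0, sigma = 100.0, 15.0                    # hypothetical population, H0 is true
    z_crit = NormalDist().inv_cdf(1 - alpha)    # critical value Z(1-alpha)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (mean(sample) - mu0) / (sigma / n ** 0.5)
        if z >= z_crit:                         # rejecting a true H0 is a Type I error
            rejections += 1
    return rejections / trials

rate = type_i_error_rate()
print(f"alpha = 0.05, observed Type I error rate = {rate:.3f}")
```

Choosing a smaller α in the call (for example `type_i_error_rate(alpha=0.01)`) should lower the observed rate accordingly.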


There are different test statistics for testing:

• Single population – Mean, proportion, variance

• Difference between two populations – Means, proportions, variances


Calculate value of test statistic

REJECTION AND NON-REJECTION REGIONS

• To decide whether H0 will be rejected or not, a value called the TEST STATISTIC has to be calculated by using certain sample results

• Distribution of test statistic often follows a normal or t distribution.

• Distribution can be divided into 2 regions:

– A region of rejection

– A region of non-rejection

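Determine the critical value/values

- Critical value – from tables

[Figure: rejection regions under the H0 distribution. Right tailed (H1: parameter > ?): area α in the right tail. Two tailed (H1: parameter ≠ ?): area α/2 in each tail. Left tailed (H1: parameter < ?): area α in the left tail.]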

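Make decision

[Figure: decision regions under the H0 distribution. Right tailed (H1: parameter > ?): accept H0 to the left of the critical value, reject H0 in the right tail (area α). Two tailed (H1: parameter ≠ ?): reject H0 in both tails (area α/2 each), accept H0 in between. Left tailed (H1: parameter < ?): reject H0 in the left tail (area α), accept H0 to the right of the critical value.]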

Draw conclusion

[Figure: the calculated test statistic is located on the H0 distribution; it falls either in the "Accept H0" region or in the "Reject H0" region.]

STEPS OF A HYPOTHESIS TEST

Step 1 • State the null and alternative hypotheses

Step 2 • State the values of α

Step 3 • Calculate the value of the test statistic

Step 4 • Determine the critical value

Step 5 • Make a decision using decision rule or graph

Step 6 • Draw a conclusion


Concept Questions

• 1 – 12 page 261 textbook


Hypothesis test for Population Mean, n ≥ 30

- population need not be normally distributed

- the sample mean will be approximately normally distributed

Testing H0: μ = μ0 for n ≥ 30

Alternative hypothesis    Decision rule: Reject H0 if
H1: μ ≠ μ0                |z| ≥ Z1-α/2
H1: μ > μ0                z ≥ Z1-α
H1: μ < μ0                z ≤ -Z1-α

Test statistic: z = (x̄ − μ0) / (s / √n)    (use σ instead of s if known)

• Example

– It will be cost effective to employ an additional staff member at a well-known take-away restaurant if the average sales are more than R11 000 per day.

– A sample of 60 days was selected and the average sales for the 60 days were R11 841 with a standard deviation of R1 630.

– Test if it will be cost effective to employ an additional staff member.

– Assume a normally distributed population. Use α = 0,05.

• Solution

– The population of interest is the daily sales

– We want to show that the average sales is more than

R11 000 per day.

– H1 : μ > 11 000

– The null hypothesis must specify a single value of the

parameter

– H0 : μ = 11 000

– Need to test if R11 841 (x̄) is significantly more than R11 000 (μ)

[Figure: right-tailed test on the H0 distribution, with the critical value z1-α to the right of 0]

• Solution

– H0 : μ = 11 000

– H1 : μ > 11 000

– α = 0,05

– z = (x̄ − μ0) / (s / √n) = (11 841 − 11 000) / (1 630 / √60) = 841 / 210,43 ≈ 4,00

– Critical value Z1-α = Z0.95 = 1,65

– Reject H0 because 4,00 ≥ 1,65 (right-tailed test: accept H0 below 1,65, reject H0 at or above 1,65)

At α = 0,05 it will be cost effective to employ an additional staff member: the average daily sales are more than R11 000
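The calculation for the restaurant example can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative, and the critical value is computed with `NormalDist.inv_cdf` rather than read from the tables (it gives about 1,645, which the tables round to 1,65).

```python
from math import sqrt
from statistics import NormalDist

# Values from the example: sample mean, hypothesised mean, sample std dev, sample size
x_bar, mu0, s, n = 11841, 11000, 1630, 60
alpha = 0.05

z = (x_bar - mu0) / (s / sqrt(n))           # test statistic
z_crit = NormalDist().inv_cdf(1 - alpha)    # Z(0.95)
reject_h0 = z >= z_crit                     # right-tailed decision rule

print(f"z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {reject_h0}")
```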


• Hypothesis test for Population Mean, n < 30

– If σ is unknown we use s to estimate σ

– We need to replace the normal distribution with the

t-distribution with (n - 1) degrees of freedom

Testing H0: μ = μ0 for n < 30

Alternative hypothesis    Decision rule: Reject H0 if
H1: μ ≠ μ0                |t| ≥ tn-1;1-α/2
H1: μ > μ0                t ≥ tn-1;1-α
H1: μ < μ0                t ≤ -tn-1;1-α

Test statistic: t = (x̄ − μ0) / (s / √n)

Concept Questions

• Questions 13 – 19, page 267, textbook

• Example

– Health care is a major issue worldwide. One of the concerns is the waiting time for patients at clinics.

– Government claims that patients will wait less than 30 minutes on average to see a doctor.

– A random sample of 25 patients revealed that their average waiting time was 28 minutes with a standard deviation of 8 minutes.

– At a 1% level of significance can we say that the claim from government is correct?

• Solution

– The population of interest is the waiting time at clinics

– Want to test the claim that the waiting time is less than 30 minutes

– H1 : μ < 30

– The null hypothesis must specify a single value of the parameter

– H0 : μ = 30

– Need to test if 28 (x̄) is significantly less than 30 (μ)

• Solution

– H0 : μ = 30

– H1 : μ < 30

– α = 0,01

– t = (x̄ − μ0) / (s / √n) = (28 − 30) / (8 / √25) = −1,25

– Critical value −tn-1;1-α = −t24;0.99 = −2,492

– Accept H0 because −1,25 > −2,492 (left-tailed test: reject H0 at or below −2,492, accept H0 above it)

At α = 0,01 we cannot say that the average waiting time at clinics is less than 30 minutes
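The waiting-time example can be reproduced in the same way. A minimal sketch: the Python standard library has no t distribution, so the critical value 2,492 is simply the table value t24;0.99 quoted in the example, entered as a constant.

```python
from math import sqrt

# Values from the example: sample mean, hypothesised mean, sample std dev, sample size
x_bar, mu0, s, n = 28, 30, 8, 25

t = (x_bar - mu0) / (s / sqrt(n))   # test statistic with n - 1 = 24 degrees of freedom
t_crit = 2.492                      # table value t(24; 0.99), taken from the example
reject_h0 = t <= -t_crit            # left-tailed decision rule

print(f"t = {t:.2f}, critical value = {-t_crit}, reject H0: {reject_h0}")
```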


• Hypothesis testing for Population proportion

– Sample proportion: p̂ = x / n = number of successes / sample size

– Proportion always between 0 and 1

Testing H0: p = p0 for n ≥ 30

Alternative hypothesis    Decision rule: Reject H0 if
H1: p ≠ p0                |z| ≥ Z1-α/2
H1: p > p0                z ≥ Z1-α
H1: p < p0                z ≤ -Z1-α

Test statistic: z = (p̂ − p0) / √(p0(1 − p0) / n)


• Example

– A market research company investigates the claim of a supplier that 35% of potential buyers prefer their brand of milk

– A survey was done in several supermarkets and it was found that 61 of the 145 shoppers indicated that they will buy the specific brand of milk

– Assist the research company in testing the claim of the supplier at a 10% level of significance.

• Solution

– The population of interest is the proportion of buyers

– Want to test the claim that the proportion is 35% = 0,35

– H0 : p = 0,35

– The alternative hypothesis must specify that the proportion is not 35%

– H1 : p ≠ 0,35

– Sample proportion: p̂ = x / n = number of successes / sample size = 61 / 145 ≈ 0,42

– Need to test if 0,42 (p̂) is significantly different from 0,35 (p)

• Solution

– H0 : p = 0,35

– H1 : p ≠ 0,35

– α = 0,10, so α/2 = 0,05 in each tail

– z = (p̂ − p0) / √(p0(1 − p0) / n) = (0,42 − 0,35) / √(0,35(1 − 0,35) / 145) ≈ 1,77

– Critical values ±Z1-α/2 = ±Z0.95 = ±1,65

– Reject H0 because 1,77 ≥ 1,65 (two-tailed test: reject H0 at or below −1,65 or at or above +1,65, accept H0 in between)

At α = 0,10 we cannot say that 35% of the clients will prefer the brand of milk
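The milk-brand example can be checked as follows. A minimal sketch with illustrative variable names: here the sample proportion is not rounded to 0,42 before it is used, so the statistic comes out slightly larger (about 1,78) than the hand calculation, with the same decision.

```python
from math import sqrt
from statistics import NormalDist

x, n = 61, 145          # number of successes and sample size from the example
p0 = 0.35               # proportion claimed by the supplier
alpha = 0.10

p_hat = x / n                                    # sample proportion
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)       # test statistic
z_crit = NormalDist().inv_cdf(1 - alpha / 2)     # Z(0.95) for a two-tailed test
reject_h0 = abs(z) >= z_crit

print(f"p_hat = {p_hat:.4f}, z = {z:.3f}, critical value = ±{z_crit:.3f}, reject H0: {reject_h0}")
```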


• Hypothesis testing for Population Variance

– Draw conclusions about variability in population

– χ² distribution with (n − 1) degrees of freedom

Testing H0: σ² = σ0²

Alternative hypothesis    Decision rule: Reject H0 if
H1: σ² ≠ σ0²              χ² ≤ χ²n-1;α/2 or χ² ≥ χ²n-1;1-α/2
H1: σ² > σ0²              χ² ≥ χ²n-1;1-α
H1: σ² < σ0²              χ² ≤ χ²n-1;α

Test statistic: χ² = (n − 1)s² / σ0²

Concept Questions

• Questions 20 – 24 page 272

• Example

– The variance in the content of a 340 ml can of beer should not be more than 10 ml².

– To test the validity of this, a sample of 25 cans of beer revealed a variance of 12 ml².

– At a 5% level of significance can we say the variation in the content of the cans is too large?

• Solution

– The population of interest is the variation in content

– Want to test the claim that the variance is more than 10 ml²

– H1 : σ² > 10

– The null hypothesis must specify that the variance is 10 ml²

– H0 : σ² = 10

– Need to test if 12 (s²) is significantly more than 10 (σ²)

• Solution

– H0 : σ² = 10

– H1 : σ² > 10

– α = 0,05

– χ² = (n − 1)s² / σ0² = (25 − 1)(12) / 10 = 28,8

– Critical value χ²n-1;1-α = χ²24;0.95 = 36,42

– Accept H0 because 28,8 < 36,42 (right-tailed test: reject H0 at or above 36,42)

At α = 0,05 we cannot say that the variation in the content of the cans is more than 10 ml²
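The beer-can example can be verified with a short calculation. A minimal sketch: the standard library has no χ² distribution, so the critical value 36,42 is the table value χ²24;0.95 quoted in the example, entered as a constant.

```python
n = 25            # number of cans sampled
s2 = 12.0         # sample variance (ml²)
sigma0_sq = 10.0  # variance under H0 (ml²)

chi2 = (n - 1) * s2 / sigma0_sq   # test statistic with n - 1 = 24 degrees of freedom
chi2_crit = 36.42                 # table value chi²(24; 0.95), taken from the example
reject_h0 = chi2 >= chi2_crit     # right-tailed decision rule

print(f"chi2 = {chi2:.1f}, critical value = {chi2_crit}, reject H0: {reject_h0}")
```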


• Hypothesis tests for comparing two Populations

– Difference between two means

• Independent samples (drawn from different samples; the samples have no relation): large samples or small samples

• Dependent samples (the samples are related)

– Difference between two proportions

– Difference between two variances

• Hypothesis tests for comparing two Populations

– H0: Population 1 parameter = Population 2 parameter

– H1: Population 1 parameter ≠ Population 2 parameter

– H1: Population 1 parameter > Population 2 parameter

– H1: Population 1 parameter < Population 2 parameter

Parameters compared: μ1, σ1² or p1 (population 1) with μ2, σ2² or p2 (population 2)

Difference between two Population Means,

independent samples, n1 ≥ 30 and n2 ≥ 30

Testing H0: μ1 = μ2 for n1 ≥ 30 and n2 ≥ 30

Alternative hypothesis    Decision rule: Reject H0 if
H1: μ1 ≠ μ2               |z| ≥ Z1-α/2
H1: μ1 > μ2               z ≥ Z1-α
H1: μ1 < μ2               z ≤ -Z1-α

Test statistic: z = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2)    (use σ1² and σ2² if known)

Example

A leading television manufacturer purchases cathode ray tubes from 2 businesses (A and B). A random sample of 36 cathode ray tubes from business A showed a mean lifetime of 7.2 years and a standard deviation of 0.8 years, while a random sample of 40 cathode ray tubes from business B showed a mean lifetime of 6.7 years and a standard deviation of 0.7 years. Test at a 1% significance level whether the mean lifetime of the cathode ray tubes from business A is longer than the mean lifetime of those from business B.

Answer

Company A: n1 = 36, x̄1 = 7,2 and s1 = 0,8.

Company B: n2 = 40, x̄2 = 6,7 and s2 = 0,7.

H0: μ1 = μ2

H1: μ1 > μ2

α = 0,01

z = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2) = (7,2 − 6,7) / √(0,8²/36 + 0,7²/40) = 0,5 / 0,1733 = 2,8852

Z1-α = Z0,99 = 2,33

Therefore, reject H0 because 2,8852 ≥ 2,33.

There is enough evidence to say that the mean lifetime of the tubes from Company A is longer than that of Company B.
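The two-sample calculation for the cathode ray tubes can be sketched as a small helper. The function name `two_sample_z` is illustrative and not part of the lecture; the critical value is computed with `NormalDist.inv_cdf` (about 2,326, which the tables round to 2,33).

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(x1, s1, n1, x2, s2, n2):
    """z statistic for the difference between two means, large independent samples."""
    return (x1 - x2) / sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

# Values from the example: business A is sample 1, business B is sample 2
z = two_sample_z(7.2, 0.8, 36, 6.7, 0.7, 40)
z_crit = NormalDist().inv_cdf(1 - 0.01)   # Z(0.99) for a right-tailed test
reject_h0 = z >= z_crit

print(f"z = {z:.4f}, critical value = {z_crit:.3f}, reject H0: {reject_h0}")
```

The same helper applies to any pair of large independent samples; for a left-tailed test the decision rule becomes `z <= -z_crit`.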


Difference between two Population Means, independent

samples, n1 < 30 and n2 < 30

Testing H0: μ1 = μ2 for n1 < 30 and n2 < 30

Alternative hypothesis    Decision rule: Reject H0 if
H1: μ1 ≠ μ2               |t| ≥ tn1+n2-2;1-α/2
H1: μ1 > μ2               t ≥ tn1+n2-2;1-α
H1: μ1 < μ2               t ≤ -tn1+n2-2;1-α

Test statistic: t = (x̄1 − x̄2) / (sp √(1/n1 + 1/n2))

with sp = √(((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2))

Difference between two Population Means, dependent

samples – pairs of observations

Observation       1             2             3             ...   n
Sample 1          X11           X12           X13           ...   X1n
Sample 2          X21           X22           X23           ...   X2n
Difference (d)    d1            d2            d3            ...   dn
                  (X11 − X21)   (X12 − X22)   (X13 − X23)   ...   (X1n − X2n)

d̄ = Σd / n and sd = √(Σ(d − d̄)² / (n − 1))

Difference between two Population Means, dependent

samples

Testing H0: μ1 = μ2

Alternative hypothesis    Decision rule: Reject H0 if
H1: μ1 ≠ μ2               |t| ≥ tn-1;1-α/2
H1: μ1 > μ2               t ≥ tn-1;1-α
H1: μ1 < μ2               t ≤ -tn-1;1-α

Test statistic: t = d̄ / (sd / √n)

Concept Questions

• Questions 25 – 29 p 283, textbook

Difference between two Population Proportions, large

independent samples

Testing H0: p1 = p2 for n1 ≥ 30 and n2 ≥ 30

Alternative hypothesis    Decision rule: Reject H0 if
H1: p1 ≠ p2               |z| ≥ Z1-α/2
H1: p1 > p2               z ≥ Z1-α
H1: p1 < p2               z ≤ -Z1-α

Test statistic: z = (p̂1 − p̂2) / √(p̂(1 − p̂)(1/n1 + 1/n2))

where p̂ = (n1 p̂1 + n2 p̂2) / (n1 + n2)

Difference between two Population Variances, large

independent samples

Testing H0: σ1² = σ2²

Alternative hypothesis    Decision rule: Reject H0 if
H1: σ1² ≠ σ2²             F ≥ Fn1-1;n2-1;α/2
H1: σ1² > σ2²             F ≥ Fn1-1;n2-1;α

Assume population 1 has the larger variance. Thus always: s1² > s2²

Test statistic: F = s1² / s2²

• Example

– There is a belief that people staying in Cape Town travel less than people staying in Johannesburg.

– Random samples of 43 people in Cape Town and 39 in Johannesburg were drawn.

– For each person the distance travelled during October was recorded.

– Test the belief at a 5% level of significance

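• Solution

– The population of interest is the km travelled

– Samples are large and independent

– We want to show that people in Cape Town (population 1) travel less than people in Johannesburg (population 2)

– H1 : μ1 < μ2

– The null hypothesis must specify a single value of the parameter

– H0 : μ1 = μ2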

• From the data:

                      Cape Town    Johannesburg
Sample mean           x̄1 = 604     x̄2 = 633
Sample size           n1 = 43      n2 = 39
Standard deviation    s1 = 64      s2 = 103

– H0: μ1 = μ2

– H1: μ1 < μ2

– α = 0,05

– z = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2) = (604 − 633) / √(64²/43 + 103²/39) ≈ −1,51

– Critical value −Z1-α = −Z0.95 = −1,65

– Accept H0 because −1,51 > −1,65 (left-tailed test: reject H0 at or below −1,65)

The belief that people staying in Cape Town travel less than people staying in Johannesburg is not supported at a 5% level of significance

Example

• Pathological laboratories have a problem with the time it takes a blood sample to be analyzed. They hope that by introducing some new equipment, the time taken will be reduced.

– Blood samples for 10 different types of test were analyzed by the traditional laboratories and by the newly equipped laboratories.

– The time, in minutes, was captured for each test.

– Did the time reduce? α = 0,10


• Solution

– The population of interest is the test times

– The samples are dependent

– We want to show that the new times (population 2) are less than the old times (population 1)

– H1 : μ1 > μ2

– The null hypothesis must specify a single value of the parameter

– H0 : μ1 = μ2

- Calculate the difference for each blood sample (existing lab − new lab)

- Calculate the average of the differences and the standard deviation of the differences

Blood sample    Existing lab    New lab    Difference
1               47              70         -23
2               65              83         -18
3               59              78         -19
4               61              46         15
5               75              74         1
6               65              56         9
7               73              74         -1
8               85              52         33
9               97              99         -2
10              84              57         27

d̄ = 2,2

sd = 19,14

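- The hypothesis test for this problem is

H0: μ1 = μ2

H1: μ1 > μ2

α = 0,10

- The test statistic is t = d̄ / (sd / √n) = 2,2 / (19,14 / √10) ≈ 0,36

- Critical value tn-1;1-α = t9;0.90 = 1,383

- Accept H0 because 0,36 < 1,383 (right-tailed test: reject H0 at or above 1,383)

Using α = 0,10, introducing the new equipment did not reduce the time taken.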
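The paired calculation for the blood-sample data can be reproduced from the table. A minimal sketch with illustrative variable names; the critical value 1,383 is the table value t9;0.90 quoted in the example, entered as a constant because the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

# Analysis times in minutes, from the table in the example
existing_lab = [47, 65, 59, 61, 75, 65, 73, 85, 97, 84]
new_lab      = [70, 83, 78, 46, 74, 56, 74, 52, 99, 57]

d = [old - new for old, new in zip(existing_lab, new_lab)]   # paired differences
d_bar = mean(d)                                              # average difference
s_d = stdev(d)                                               # sample std dev (divides by n - 1)
t = d_bar / (s_d / sqrt(len(d)))                             # test statistic, n - 1 = 9 d.f.

t_crit = 1.383               # table value t(9; 0.90), taken from the example
reject_h0 = t >= t_crit      # right-tailed decision rule

print(f"d_bar = {d_bar:.1f}, s_d = {s_d:.2f}, t = {t:.2f}, reject H0: {reject_h0}")
```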


• Example

– A clothing manufacturer introduced two new swimsuit ranges on the market.

– Of the 266 clients asked if they will wear range A, 85 indicated they will.

– Of the 192 clients asked if they will wear range B, 50 indicated they will.

– Can we say there is a difference in the preferences for the two ranges? Use α = 0,05.

• Solution

– The population of interest is the proportion of clients who will wear the clothing

– We want to determine if the proportion for range A differs from the proportion for range B

– H1: p1 ≠ p2

– The null hypothesis must specify a single value of the parameter

– H0: p1 = p2

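• From the data:

– Range A: p̂1 = 85 / 266 ≈ 0,32     Range B: p̂2 = 50 / 192 ≈ 0,26

– Pooled proportion: p̂ = (266(0,32) + 192(0,26)) / (266 + 192) ≈ 0,29

– H0: p1 = p2

– H1: p1 ≠ p2

– α = 0,05

– z = (p̂1 − p̂2) / √(p̂(1 − p̂)(1/n1 + 1/n2)) = (0,32 − 0,26) / √(0,29(1 − 0,29)(1/266 + 1/192)) ≈ 1,40

– Critical values ±Z1-α/2 = ±Z0.975 = ±1,96

– Accept H0 because −1,96 < 1,40 < +1,96 (two-tailed test: reject H0 at or below −1,96 or at or above +1,96)

There is no evidence of a difference in the preferences for the two ranges at α = 0,05.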
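The swimsuit example can be checked numerically. A minimal sketch with illustrative variable names: the proportions are not rounded to two decimals before use, so the statistic comes out near 1,37; it stays inside the interval from −1,96 to +1,96, so H0 is not rejected.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 85, 266    # range A: clients who will wear it, clients asked
x2, n2 = 50, 192    # range B: clients who will wear it, clients asked
alpha = 0.05

p1_hat, p2_hat = x1 / n1, x2 / n2
p_pooled = (n1 * p1_hat + n2 * p2_hat) / (n1 + n2)     # same as (x1 + x2) / (n1 + n2)
z = (p1_hat - p2_hat) / sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))

z_crit = NormalDist().inv_cdf(1 - alpha / 2)           # Z(0.975) for a two-tailed test
reject_h0 = abs(z) >= z_crit

print(f"z = {z:.3f}, critical value = ±{z_crit:.2f}, reject H0: {reject_h0}")
```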


• Example

– An important measure to determine service delivery in the banking sector is the variability in the service times.

– An experiment was conducted to compare the service times of two bank tellers.

– The results from the experiment:

• Teller A: nA = 18 and s²A = 4,03

• Teller B: nB = 26 and s²B = 9,49

– Can we say that the variance in service time of teller A is less than the variance of teller B at a 5% level of significance?

– We will then test if the variance in service time of teller B is more than the variance of teller A: σ²B > σ²A

Remember: Population 1 has the larger variance. Thus always: s1² > s2²

• Solution

– The population of interest is the variation in the service time of the two bank tellers.

– We want to determine if the variation of teller B is more than the variation of teller A.

– Teller B has the larger sample variance, so teller B is population 1 and teller A is population 2.

– H1: σ1² > σ2²

– The null hypothesis must specify a single value of the parameter

– H0: σ1² = σ2²

• From the data:

– The results from the experiment:

• Teller A: nA = 18 and s²A = 4,03

• Teller B: nB = 26 and s²B = 9,49

– H0: σ1² = σ2²

– H1: σ1² > σ2²

– α = 0,05

– F = s1² / s2² = 9,49 / 4,03 ≈ 2,35

– Critical value F26-1;18-1;0,05 = 2,18

– Reject H0 because 2,35 ≥ 2,18 (accept H0 below 2,18, reject H0 at or above 2,18)

Variation of teller B is more than the variation of teller A at a 5% level of significance.
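The variance-ratio calculation for the two tellers can be sketched as follows. The variable names are illustrative; the critical value 2,18 is the table value F25;17;0,05 quoted in the example, entered as a constant because the standard library has no F distribution.

```python
# Sample variances and sample sizes from the example
s2_a, n_a = 4.03, 18    # teller A
s2_b, n_b = 9.49, 26    # teller B

# Population 1 is the one with the larger sample variance (teller B here)
(s2_1, n_1), (s2_2, n_2) = sorted([(s2_a, n_a), (s2_b, n_b)], reverse=True)

F = s2_1 / s2_2                   # test statistic
df = (n_1 - 1, n_2 - 1)           # degrees of freedom (25, 17)
F_crit = 2.18                     # table value F(25; 17; 0.05), taken from the example
reject_h0 = F >= F_crit

print(f"F = {F:.2f} with df = {df}, critical value = {F_crit}, reject H0: {reject_h0}")
```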

Concept Questions

• Questions 30 -31, p 289, textbook


HOMEWORK

• Self review test, p289

• Izmivo

• Revision test

• Supplementary Questions p292 – 298
