
PASS Notes


Sheldon and Juliet Wednesday 2-3pm Red Centre M010

QMB PASS Week 3 2010

1. INTRO Q: Can you explain the difference between the following pairs of terms?

a) descriptive statistics vs. inferential statistics? b) population vs. sample? c) parameter vs. statistic? d) discrete vs. continuous random variable?

2. DESCRIPTIVE TECHNIQUES

a) Tabular

A variable is some characteristic of a population or sample.

o Values are the possible observations of the variable.
o Data are the actual observed values of a variable.

We can classify data into 3 different categories:

o i) interval/quantitative/numerical
o ii) nominal/qualitative/categorical
o iii) ordinal

For nominal data, we can draw up a table to describe the data with a column each for:

o a) class (the mutually exclusive categories into which the data are grouped)
o b) frequency (the number of observations that fall into each class)
o c) relative frequency (the number of observations in a class as a percentage of the total number of observations)

b) Graphical

If we were to describe nominal data, then we would use either a bar chart (for frequencies) or a pie chart (for relative frequencies).

But where we have interval data, we can consider using a histogram, the steps being:

o 1. Collect interval data.
o 2. Create classes / class limits for the data, for which you could consider using Sturges’ Formula or the Class Width formula.
o 3. Plot the data on a graph with frequency on the Y axis.
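Step 2 can be sketched in a few lines of Python. The notes do not write the formulas out, so this assumes the common textbook forms: Sturges' formula as 1 + 3.3 log10(n) classes, and class width as (largest - smallest) / number of classes. The function names are made up for illustration, and the data are the test scores from Question 4 of the Question Bank.

```python
import math

def sturges_classes(n):
    """Number of classes suggested by Sturges' formula (assumed form: 1 + 3.3*log10(n))."""
    return round(1 + 3.3 * math.log10(n))

def class_width(data, classes):
    """Assumed class width formula: (largest - smallest) / number of classes."""
    return (max(data) - min(data)) / classes

scores = [88, 76, 67, 90, 98, 68, 75, 86, 82, 90]
classes = sturges_classes(len(scores))   # 1 + 3.3*log10(10) = 4.3, rounded to 4
width = class_width(scores, classes)     # (98 - 67) / 4 = 7.75

print(classes, width)
```

In practice the width is usually rounded up to a convenient number (here, say, 8) so that the class limits are easy to read.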

It is important to be able to describe shapes of histograms ~ a skill often tested in tutorial and final exams. Let’s look at that now.

Describing the shapes of histograms

a) Symmetry o Your data may be symmetrical or non-symmetrical. Use common sense.

Topics to be covered: 1. Introduction to Statistics 2. Descriptive techniques – Tabular, Graphical, Numerical 3. Introduction to Linear Regression


b) Skewness

o Positive skew
  o Tail to the right
  o Mode < Median < Mean

o Negative skew
  o Tail to the left
  o Mean < Median < Mode

c) Modal classes

o The modal class is the class with the largest number of observations.
o The 3 descriptions which could come in handy in describing your histogram are:

i) unimodal histogram = a histogram with one peak.
ii) bimodal histogram = a histogram with two peaks.
iii) bell-shaped histogram = a symmetric unimodal histogram.

Apart from the histogram, you should also be familiar with stem and leaf displays and ogives.

c) A separate note on bivariate relations

Bivariate relations are an extension of univariate analyses to characterise relationships between variables. You could represent them graphically using a scatter plot (a time series plot being one particular case), or in a table using a contingency table.

d) Numerical

We can measure interval data in terms of central location:

o i) the mean / arithmetic mean = the average of the observations.
o ii) the median = the middle observation after the observations have been ordered.
o iii) the mode = the observation that occurs with the greatest frequency.

Question time...

1. Oh no! How do we find the median if we have an even number of observations?
2. What are the advantages and disadvantages of each method of measuring central location?

We can also measure interval data with respect to variability – the spread of the data:

o i) range = largest – smallest observation

o ii) variance:

population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

o iii) standard deviation – simply take the square root of the variance. Also recall the empirical rule and Chebysheff’s Theorem when required to interpret the standard deviation of your data.

o iv) coefficient of variation = standard deviation / mean


We can also measure with respect to relative standing.

o i) percentiles – the Pth percentile is the value for which P% of the observations are less than that value and (100 – P)% are greater than that value.

location of a percentile, $L_P$: $L_P = (n + 1)\frac{P}{100}$

o ii) as an extension of percentile theory, we can measure the interquartile range

interquartile range = 75th – 25th percentile = upper – lower quartile

this measures the spread of the middle 50% of observations
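The location formula can be tried out in Python. This is a minimal sketch, assuming that a fractional location means interpolating between the two neighbouring ordered observations; the function name `percentile` is made up for illustration, and the data are the numbers from Question 1 of the Question Bank.

```python
def percentile(data, p):
    """Value at location L_P = (n + 1) * P / 100, interpolating between neighbours."""
    ordered = sorted(data)
    n = len(ordered)
    location = (n + 1) * p / 100      # 1-based position in the ordered data
    lower = int(location)             # whole part of the location
    fraction = location - lower       # how far to move towards the next observation
    # Keep extreme percentiles inside the data
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

data = [2, 3, 3, 6, 8, 9, 14, 16, 17, 20]
lower_quartile = percentile(data, 25)   # location 2.75
median = percentile(data, 50)           # location 5.5
upper_quartile = percentile(data, 75)   # location 8.25
iqr = upper_quartile - lower_quartile

print(lower_quartile, median, upper_quartile, iqr)
```

The results agree with the Question Bank answers for the lower quartile (3), median (8.5) and upper quartile (16.25). Other software may use a different interpolation rule and give slightly different quartiles.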

o At this point, make sure you’ve checked out what box plots and outliers are.

3. INTRODUCTION TO LINEAR REGRESSION

Recall from your lecture or earlier from this PASS class about bivariate relations ~ in particular, think scatter plots and the types of relationships between the variables we spot when we look at the plot. If we want to determine the intercept and slope of the relationship between the variables on the X and Y axes, we need values that will give us the line of best fit.

To do so, we need to minimise the residual sum of squares by utilising the least squares method.

Much more on this and linear regression generally in the second half of the course, but for the moment, take note of these equations and formulas:

1. assume the regression equation: $\hat{y} = b_0 + b_1 x$

2. then: $b_1 = \frac{s_{xy}}{s_x^2}$

and $b_0 = \bar{y} - b_1 \bar{x}$

where $s_{xy}$ is the sample covariance of x and y, and $s_x^2$ is the sample variance of x.
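A minimal sketch of the least squares method, assuming the standard simple regression line y-hat = b0 + b1.x with slope b1 = (sample covariance of x and y) / (sample variance of x) and intercept b0 = y-bar - b1.x-bar. The function name and the data are made up for illustration; the points lie exactly on y = 1 + 2x so the answer is easy to check by eye.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the line of best fit for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample covariance of x and y, and sample variance of x
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_xx = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data lying exactly on y = 1 + 2x
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
intercept, slope = least_squares(xs, ys)

print(intercept, slope)
```

With real data the points will not sit exactly on a line, and the same two formulas give the line that minimises the residual sum of squares.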


QUESTION BANK

1. Using these numbers: 2 3 3 6 8 9 14 16 17 20, find the:

a. mean
b. median (8.5)
c. mode
d. lower quartile (3)
e. upper quartile (16.25)
f. interquartile range

2. You’re an investment banker and work 22.5 hours a day. Your monthly pay in recent months has looked like this cuz you’re a money machine: $23,000 $36,500 $47,200 $20,200 $61,300

a. What’s the sample mean? ($37,640)
b. How about sample variance? (292,743,000 squared dollars)
c. And sample standard deviation? ($17,109.73)

3. A set of test scores has a mean of 890 and standard deviation of 120. What’s the coefficient of variation?

4. Check out these test scores: 88 76 67 90 98 68 75 86 82 90. Calculate:

a. sample mean
b. sample standard deviation
c. coefficient of variation
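If you want to check your working for Questions 2 to 4, Python's standard `statistics` module computes the sample versions of these measures (its `variance` and `stdev` divide by n - 1). This is only a checking aid with made-up variable names; you still need to be able to do the calculations by hand.

```python
import statistics

# Question 2: monthly pay
pay = [23000, 36500, 47200, 20200, 61300]
mean_pay = statistics.mean(pay)
var_pay = statistics.variance(pay)    # sample variance (divides by n - 1)
sd_pay = statistics.stdev(pay)        # square root of the sample variance

# Question 3: coefficient of variation = standard deviation / mean
cv_q3 = 120 / 890

# Question 4: test scores
scores = [88, 76, 67, 90, 98, 68, 75, 86, 82, 90]
mean_scores = statistics.mean(scores)
sd_scores = statistics.stdev(scores)
cv_scores = sd_scores / mean_scores

print(mean_pay, var_pay, round(sd_pay, 2))
print(round(cv_q3, 4), mean_scores, round(sd_scores, 4), round(cv_scores, 4))
```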


BES PASS Week 4 Aims:

To learn about Data Collection and Random Sampling

To understand Joint, Marginal and Conditional Probability

To learn the probability rules and apply it to sampling with/without replacement

1. Methods of Data Collection

Remembering that “data” are merely the observed values of a variable, and that a variable is just something that is of interest to us, we can use the following methods of data collection to observe these variables.

Direct Observation – measures the actual behaviour or outcomes o E.g. Asking people whether they’ve bought a product because of an advertisement

Experimental Data – imposes a treatment and measures the resulting behaviour or outcomes o E.g. Asking people to try aspirin and see whether they suffer fewer heart attacks or not

Surveys

Self administered Surveys – Surveys sent to people who then mail back with their responses

Personal Interviews

Telephone Interviews

Q. What do you think are the pros and cons for each of the methods of data collection? Hint: Think of the costs, response rate, purpose and biases that may arise

2. Random Sampling

The primary incentive for examining a sample rather than a population is cost. Compiling statistics is usually expensive: imagine conducting an experiment on 10,000 people, asking them to take an aspirin every day for 3 weeks and then come back to be tested!

Main Concept: We can make inferences about our Target Population from the Sampled Population if the sample statistic comes quite close to the parameter it is designed to estimate

There are 3 different types of sampling plans:

Simple Random Sample: A sample selected in such a way that every possible sample with the same number of observations is equally likely to be chosen

o E.g. Drawing ticket stubs in a raffle to determine the winner

Stratified Random Sample: Separating the population into ‘strata’ and then drawing simple random samples from each stratum

Cluster sampling: is a simple random sample of groups or clusters of elements
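The three sampling plans can be sketched with Python's `random` module. Everything here is hypothetical: the population of 100 student ids, the two made-up strata ("commerce" and "science"), the clusters of 10 consecutive ids, and the sample sizes are only there to show how the plans differ.

```python
import random

rng = random.Random(2010)   # fixed seed (arbitrary) so the sketch is repeatable

# Hypothetical population: 100 students, each tagged with a made-up stratum
population = [
    (student_id, "commerce" if student_id < 60 else "science")
    for student_id in range(100)
]

# Simple random sample: every possible sample of 10 students is equally likely
simple_sample = rng.sample(population, 10)

# Stratified random sample: a simple random sample drawn within each stratum
strata = {}
for student in population:
    strata.setdefault(student[1], []).append(student)
stratified_sample = []
for members in strata.values():
    stratified_sample.extend(rng.sample(members, 5))

# Cluster sample: a simple random sample of whole groups (blocks of 10 ids)
clusters = [population[i:i + 10] for i in range(0, 100, 10)]
cluster_sample = [student for cluster in rng.sample(clusters, 2) for student in cluster]

print(len(simple_sample), len(stratified_sample), len(cluster_sample))
```

Note how the stratified sample is guaranteed to contain students from both strata, whereas the simple random sample and the cluster sample are not.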

From these samples of observations, two main types of error arise:

1. Sampling Error – the difference between the sample and the population that exists only because of the observations that happened to be selected for the sample
2. Non-sampling Error – more serious than sampling error, and due to mistakes made in the acquisition of data or to sample observations being selected improperly

Q. Discuss with the person next to you, examples of non-sampling errors.

3. Probability Questions to think about...

a) Independence v Mutually Exclusive b) Joint v Marginal Probability c) Intersection v Union


Conditional Probability is the probability of an event A occurring given that another event B has occurred. It is represented by P(A | B), which is read as “Given that B has occurred, what is the probability of A occurring?” Expanding this we get...

P(A | B) = P(A and B) / P(B)

One of the reasons we compute conditional probability is to find whether two events are related, i.e. we want to know whether they are independent events. If they are independent, the probability of one event is not affected by the occurrence of the other event:

P(A | B) = P(A) and P(B | A) = P(B)

4. Other Rules

The Multiplication Rule: is used to calculate the joint probability of two events. It is based on the conditional probability formula, multiplying both sides by P(B), i.e.

P(A and B) = P(B).P(A|B)

For independent events,

P(A and B) = P(A).P(B) since P(A|B) = P(A)

The Complement Rule: The complement of event A (denoted A^C) is the event that occurs when event A does NOT occur, i.e.

P(A^C) = 1 – P(A)

The Addition Rule: allows us to calculate the probability of the union of two events. The probability that event A, OR event B, OR both occur is:

P(A or B) = P(A) + P(B) – P(A and B)

For mutually exclusive events,

P(A or B) = P(A) + P(B)

5. Sampling with or without replacement

If we were to sample from a finite (limited size) population, we could:

a) Select without replacement: each time you select an observation you remove it from the pile, so the outcome of each selection depends on the outcomes of previous selections because the size of the population is getting smaller each time.

b) Select with replacement: each time you select an observation you place it back into the pile, so the population size stays the same and the outcomes of each selection are independent of one another.
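The difference shows up as soon as you apply the multiplication rule. Here is a small sketch using a made-up example that is not in the notes: drawing two cards from a standard 52-card deck and asking for the probability that both are aces.

```python
from fractions import Fraction

aces, deck = 4, 52   # hypothetical example: two draws from a standard deck

# Without replacement: the second draw depends on the first,
# so P(A and B) = P(A).P(B|A) with one ace and one card fewer in the pile
without_replacement = Fraction(aces, deck) * Fraction(aces - 1, deck - 1)

# With replacement: the draws are independent, so P(A and B) = P(A).P(B)
with_replacement = Fraction(aces, deck) * Fraction(aces, deck)

print(without_replacement, with_replacement)   # 1/221 and 1/169
```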


GROUP EXERCISE 1

| | Male | Female | Row Total |
| High Distinction | 75 | 61 | 136 |
| Pass | 215 | 155 | 370 |
| Column Total | 290 | 216 | 506 |

1. P(Female)
2. P(High Dist)
3. P(Female ∪ High Dist)
4. P(Pass)
5. P(Pass | Male)
6. Which ones of the above are Marginal Probabilities?
7. Which ones are Joint Probabilities?

GROUP EXERCISE 2

Probability Trees

Probability trees are a very neat and fast way of working out many probability problems.

Example: (QMB Final ’99s2): “An advertising executive is studying the television viewing habits of married men and women during prime-time hours. The executive has determined that during prime-time, husbands are watching television 60% of the time. It has also been determined that when the husband is watching television, 40% of the time the wife is also watching. When the husband is not watching television, 30% of the time the wife is watching television.”

i. Find the probability that the wife is watching television. (0.36)
ii. Find the probability that, if the wife is watching television, the husband is also watching television. (0.6667)
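Both group exercises can be checked with a short Python sketch. The counts come from the Group Exercise 1 table and the percentages from the probability tree example; the variable and function names are made up for illustration.

```python
from fractions import Fraction

# Counts from the Group Exercise 1 table: (grade, gender) -> number of students
counts = {
    ("High Distinction", "Male"): 75,
    ("High Distinction", "Female"): 61,
    ("Pass", "Male"): 215,
    ("Pass", "Female"): 155,
}
total = sum(counts.values())

def marginal(index, label):
    """Marginal probability: add up a whole row (index 0) or column (index 1)."""
    return Fraction(sum(c for key, c in counts.items() if key[index] == label), total)

p_female = marginal(1, "Female")
p_hd = marginal(0, "High Distinction")
p_female_and_hd = Fraction(counts[("High Distinction", "Female")], total)   # joint
p_female_or_hd = p_female + p_hd - p_female_and_hd                          # addition rule
p_pass_given_male = Fraction(counts[("Pass", "Male")], total) / marginal(1, "Male")

# Group Exercise 2: follow the branches of the probability tree
p_h = 0.6                # husband watching
p_w_given_h = 0.4        # wife watching, given husband watching
p_w_given_not_h = 0.3    # wife watching, given husband not watching
p_w = p_h * p_w_given_h + (1 - p_h) * p_w_given_not_h
p_h_given_w = p_h * p_w_given_h / p_w

print(p_female, p_hd, p_female_or_hd, p_pass_given_male)
print(round(p_w, 4), round(p_h_given_w, 4))
```

Each path along the tree is an application of the multiplication rule, and the last line is the conditional probability formula applied to the two paths that end with the wife watching.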


Useful Practice Questions

1. Data set:

2 3 3 4 4 5 5 5
5 6 6 7 7 8 9 17

a. Find the mean and the mode of this data set (2 marks)

b. Find the median and the third quartile of this data set (2 marks)

2. Suppose A and B are mutually exclusive events. If P(A) = 0.4 and P(B) = 0.2, then P(A|B) = ?

3. Two teams, A and B, are of equal ability, so each has a probability of 0.5 of defeating the other. Assume that the outcome of any game is independent of the outcome of any other game. What is the probability that team A wins 4 games in a row?

4. Approximately 30% of the sales representatives hired by a firm quit in less than 1 year. Suppose that two sales representatives are hired and assume that the first sales representative’s behaviour is independent of the second sales representative’s behaviour.

a. What is the probability that both quit within the year?
b. Find the probability that exactly one representative quits

5. A group of individuals concerned about environmental problems claims that 30% of the adults in a certain town have been adversely affected by a new nuclear power plant that pollutes the air and causes lung damage. To test their claim, you randomly select 4 adults of the town

a. If the environmental group is correct, what is the probability that all 4 people have been adversely affected?

b. What is the probability that at least one of the 4 individuals has been adversely affected?

Answers

1. a) mean = 6, mode = 5
   b) median = 5, third quartile (75th percentile) = 7, observing that 50% of data points are below 5 and 75% below 7
2. 0
3. 0.0625
4. a) 0.09
   b) 0.42
5. a) 0.0081
   b) 0.7599


QMB PASS Week 5 2010

1. RANDOM VARIABLES & PROBABILITY DISTRIBUTIONS

Here are a few questions as a warm up.

1. What is a random variable?
2. Can you recall from Week 3 PASS the difference between a discrete and continuous random variable?

If we are happy with this, we now approach the concept of a probability distribution, which is a table, formula, or graph that describes the values of a random variable and the probability associated with those values. For discrete probability distributions, there are 2 requirements:

1. 0 ≤ P(x) ≤ 1, for all x.
2. ∑ P(x) = 1.

Let’s think about the methods/techniques we can use to describe the population/probability distribution. From memory, or using your lecture notes, fill out the following table with assistance from group members around you:

ANALYSING PROBABILITY DISTRIBUTIONS

| Term | Definition | Formula |
| Population Mean aka. Expected Value of X | | |
| Population Variance | | (Full) |
| | | (Shortcut) |
| Population Standard Deviation | | |

We also come across a new concept of the laws of expected value & variance. These are:

a) Expected Value

1. E(C) = C
2. E(X+C) = E(X) + C
3. E(CX) = C.E(X)

Topics to be covered: 1. Random variables & Probability Distributions 2. Bivariate Distributions 3. Applications in Finance: Portfolio Diversification & Asset Allocation


b) Variance

1. V(C) = 0
2. V(X+C) = V(X)
3. V(CX) = C²V(X)

Now, try these questions.

Q1. Sheldon has trouble sleeping at night because sometimes there is this one girl who calls him up at like 3am in the morning for no reason. It means he’s in a bad mood the next day. It happens so much he could actually create a probability distribution for it:

| Number of times she calls Sheldon | Probability she will call Sheldon |
| 1 | .05 |
| 2 | .12 |
| 3 | .20 |
| 4 | .30 |
| 5 | .15 |
| 6 | .10 |
| 7 | .08 |

Help Sheldon compute the mean and variance of the number of times the annoying girl calls him. (Mean = 4, variance = 2.40)

Q2. Continuing on, this girl is crazy. Every time she walks past a Louis Vuitton store, she has this burning temptation to buy a LV handbag. She used to buy, like, 2 or 3 at a time, but now that Sheldon dumped her, she’s more reluctant to buy one these days. This is the probability distribution for the number of LV handbags she buys each time she goes out:

| Number of LV handbags she wants to buy | Probability she buys that number of handbags |
| 0 | .10 |
| 1 | .25 |
| 2 | .40 |
| 3 | .20 |
| 4 | .05 |

How many LV handbags should we expect her to buy on Thursday night? (1.85)
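Q1 and Q2 can be checked with a few lines of Python. This sketch assumes the usual definitions: E(X) is the sum of x.P(x), and the variance uses the shortcut form E(X²) - μ² (the formula table above is left for you to fill in, so treat these as the standard textbook versions). The function names are made up for illustration.

```python
def expected_value(dist):
    """E(X) = sum of x * P(x) over all values x."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Shortcut form: V(X) = E(X^2) - mu^2."""
    mu = expected_value(dist)
    return sum(x ** 2 * p for x, p in dist.items()) - mu ** 2

# Q1: number of calls -> probability
calls = {1: .05, 2: .12, 3: .20, 4: .30, 5: .15, 6: .10, 7: .08}
# Q2: number of handbags -> probability
handbags = {0: .10, 1: .25, 2: .40, 3: .20, 4: .05}

# Requirement 2 for a discrete distribution: probabilities sum to 1
assert abs(sum(calls.values()) - 1) < 1e-9
assert abs(sum(handbags.values()) - 1) < 1e-9

print(round(expected_value(calls), 4), round(variance(calls), 4))
print(round(expected_value(handbags), 4))
```

The output matches the answers given in the questions (mean 4, variance 2.40, and 1.85 handbags).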


2. BIVARIATE DISTRIBUTIONS

Do you recall bivariate relations from Week 3 PASS? We now come across the concept of bivariate distributions, which provide the probabilities of combinations of 2 variables. There are 2 measures that are important in describing the bivariate distribution.

IMPORTANT FORMULAS FOR BIVARIATE DISTRIBUTIONS

| Term | Formula |
| Covariance (Full) | $COV(X,Y) = \sum_x \sum_y (x - \mu_X)(y - \mu_Y) P(x,y)$ |
| Covariance (Shortcut) | $COV(X,Y) = \sum_x \sum_y x y P(x,y) - \mu_X \mu_Y$ |
| Coefficient of Correlation | $\rho = \frac{COV(X,Y)}{\sigma_X \sigma_Y}$ |

Importantly, we also have laws of expected value & variance for the sum of 2 variables too:

1. E(X+Y) = E(X) + E(Y)
2. V(X+Y) = V(X) + V(Y) + 2.COV(X,Y)

...noting that if X and Y are independent, then COV(X,Y) = 0.

Group Question

This question is quite long so divide parts up with your partner to get it done in time.

Sheldon and Juliet are PASS leaders by day, and drug dealers by night. Let X and Y be the weight in kilograms of drugs Sheldon and Juliet sell each night respectively.

Bivariate Probability Distribution:

| | X = 0 | X = 1 | X = 2 | Total |
| Y = 0 | .12 | .42 | .06 | .6 |
| Y = 1 | .21 | .06 | .03 | .3 |
| Y = 2 | .07 | .02 | .01 | .1 |
| Total | .4 | .5 | .1 | 1.00 |

You are given the following information to assist you:

E(X) = .7, V(X) = .41, E(Y) = .5, V(Y) = .45

a) Calculate the covariance using either the full or shortcut formula. (-.15)
b) Calculate the coefficient of correlation between the kilograms of drugs sold by Sheldon and Juliet. (-.35)
c) Draw an inference/conclusion regarding your findings in part b).
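Parts a) and b) can be checked with the sketch below. It assumes the standard shortcut form COV(X,Y) = E(XY) - E(X).E(Y) and the correlation ρ = COV(X,Y) / (σX.σY); the dictionary layout and variable names are made up for illustration.

```python
import math

# joint[(x, y)] = P(X = x and Y = y), read from the table in the Group Question
joint = {
    (0, 0): .12, (1, 0): .42, (2, 0): .06,
    (0, 1): .21, (1, 1): .06, (2, 1): .03,
    (0, 2): .07, (1, 2): .02, (2, 2): .01,
}

# Values given in the question
mu_x, var_x = .7, .41
mu_y, var_y = .5, .45

# Shortcut form: COV(X, Y) = E(XY) - E(X).E(Y)
e_xy = sum(x * y * p for (x, y), p in joint.items())
cov_xy = e_xy - mu_x * mu_y

# Coefficient of correlation
rho = cov_xy / math.sqrt(var_x * var_y)

print(round(cov_xy, 2), round(rho, 2))   # -0.15 -0.35
```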


3. APPLICATIONS IN FINANCE: PORTFOLIO DIVERSIFICATION & ASSET ALLOCATION

Here are some questions to consider with the person sitting next to you:

1. Why should we diversify our portfolio’s investments?
2. How do we diversify?
3. Why did your lecturer include it in the lecture slides, i.e. how is diversification related to statistics?

FORMULAS FOR A PORTFOLIO OF 2 STOCKS

| Term | Formula |
| Mean | $E(R_p) = w_1 E(R_1) + w_2 E(R_2)$ |
| Variance | $V(R_p) = w_1^2 \sigma_1^2 + w_2^2 \sigma_2^2 + 2 w_1 w_2 \rho \sigma_1 \sigma_2$ |

Question

Sheldon has also joined the recent craze of investing in English football clubs. This is what his investment portfolio looks like:

| Stock | Manchester United (#1) | Liverpool (#2) |
| Proportion of Portfolio | .30 | .70 |
| Mean | .12 | .25 |
| Standard Deviation | .02 | .15 |

For each of the following coefficients of correlation, calculate the expected value and standard deviation of the portfolio.

a) ρ = .5 (.211, .1081)
b) ρ = .2 (.211, .1064)
c) ρ = 0 (.211, .1052)
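The two portfolio formulas translate directly into Python. The function names are made up for illustration; the inputs are the figures from Sheldon's portfolio table.

```python
from math import sqrt

def portfolio_mean(w1, mean1, w2, mean2):
    """E(Rp) = w1.E(R1) + w2.E(R2)"""
    return w1 * mean1 + w2 * mean2

def portfolio_sd(w1, sd1, w2, sd2, rho):
    """Square root of V(Rp) = w1^2.sd1^2 + w2^2.sd2^2 + 2.w1.w2.rho.sd1.sd2"""
    variance = w1 ** 2 * sd1 ** 2 + w2 ** 2 * sd2 ** 2 + 2 * w1 * w2 * rho * sd1 * sd2
    return sqrt(variance)

# Manchester United (#1) and Liverpool (#2)
expected = portfolio_mean(.30, .12, .70, .25)
sds = {rho: portfolio_sd(.30, .02, .70, .15, rho) for rho in (.5, .2, 0)}

print(round(expected, 3))
for rho, sd in sds.items():
    print(rho, round(sd, 4))
```

The expected return does not depend on ρ, but the standard deviation falls as ρ falls, which is the statistical reason for diversifying.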


BES PASS Week 6 Aims:

To understand the last of the discrete probability distributions – binomial distribution

To introduce continuous probability distributions – uniform distribution

Question

Recall our discussion of discrete and continuous random variables.

Discrete = countable/finite

Continuous = range of values/infinite number of values in a given interval

Which of the following are discrete and which are continuous?

a) The number of goals scored in 20 attempts (discrete)
b) The time it takes to write an essay (continuous)
c) The number of people in a bar (discrete)
d) The temperature inside a room (continuous)
e) The amount of energy used by a computer (continuous)

1. Binomial Distribution

Let’s recall the properties of a Binomial Experiment. There are 4, so give it a shot!

1) Fixed number of trials (n)
2) Two possible outcomes: success and failure
3) P(Success) = p and P(Failure) = (1-p)
4) Trials are independent

Examples: Flipping a coin 10 times, drawing 5 cards out of a shuffled deck with replacement (so that the draws stay independent)

Note: In a binomial experiment, there is an assumption of ‘a sequence of Bernoulli trials’, i.e. the random variables are independently and identically distributed (iid)

Binomial Random Variable

The probability of ‘x’ successes in a binomial experiment with ‘n’ trials and probability of success ‘p’ is

o X ~ Bin(n,p)

o $P(X = x) = {}^nC_x \, p^x q^{n-x}$, where q = 1 – p

N.B. Learn to use Binomial tables!!!

P(X = k) – Individual binomial probability
P(X ≤ k) – Cumulative binomial probability
P(X > k) – Survivor probability

Also, from Perms and Combs,

$^nC_r = \frac{n!}{r!(n-r)!}$

Which means we can also write our Binomial Function as:

$P(X = x) = \frac{n!}{x!(n-x)!} p^x (1-p)^{n-x}$

Mean and Variance of a Binomial Distribution

μ = E(X) = np

σ² = Var(X) = np(1-p)

o σ = √(np(1-p))

Exercises:

1. Sheldon knows that 15% of all the girls he goes out with want expensive presents during the first month of dating. He decides to test this theory out and goes out with 6 girls. Assume the behaviour of each girl is independent of the others. What’s the probability that:

a) All six girls will require expensive presents during the first month of dating? (0.0000)
b) 1 of them will demand an expensive present during the first month of dating? (0.3993)
c) At least 3 of them will require expensive presents during the first month of dating? (Hint: use cumulative binomial probabilities) (0.0473)

2. The Koch Electric Company makes electric shavers. If the probability that an electric shaver is defective is 0.01, what is the probability of the following in a shipment of 500 electric shavers that: a) None are defective? (0.0067) b) One is defective? (0.0337) c) More than three are defective? (0.735)

3. A plumber installs six hot water heaters in a housing development. The probability that any individual heater will last more than 10 years is 0.7, and their life lengths are independent. Let X denote the number of water heaters that last more than 10 years. a) Find the probability that more than 3 of the water heaters will last more than 10 years (0.7443) b) Find the mean and variance of the random variable X (4.2; 1.26)

4. A quality control manager for a manufacturer has instituted “acceptance sampling” in order to monitor the quality of incoming parts that are bought in bulk. The policy is that all incoming parts are checked by selecting at random 10 parts and then determining whether each part contains any defects or not. If 2 or more parts are found to have defects then the entire order is rejected and is returned to the supplier. What is the probability that an order from a particular supplier is rejected if that supplier is known to have 5% of parts with defects? (0.0861)

5. The probabilities that three independent members of a committee will vote in favour of electing a PASS leader as president are 0.2, 0.3 and 0.5, respectively. The probability that at most one member of the committee will elect a PASS leader is? (0.75)
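If you don't have binomial tables handy, the binomial function can be evaluated directly. This sketch uses Python's `math.comb` for nCx; the helper names `binom_pmf` and `binom_cdf` are made up for illustration, and the checks cover Exercises 1, 3 and 4.

```python
from math import comb

def binom_pmf(x, n, p):
    """Individual probability P(X = x) = nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binom_cdf(k, n, p):
    """Cumulative probability P(X <= k)."""
    return sum(binom_pmf(x, n, p) for x in range(k + 1))

# Exercise 1: n = 6, p = 0.15
ex1_b = binom_pmf(1, 6, 0.15)          # exactly one
ex1_c = 1 - binom_cdf(2, 6, 0.15)      # at least 3 = 1 - P(X <= 2)

# Exercise 3: n = 6, p = 0.7
ex3_a = 1 - binom_cdf(3, 6, 0.7)       # more than 3
ex3_mean = 6 * 0.7                     # np
ex3_var = 6 * 0.7 * (1 - 0.7)          # np(1 - p)

# Exercise 4: n = 10, p = 0.05, order rejected if 2 or more defects
ex4 = 1 - binom_cdf(1, 10, 0.05)

print(round(ex1_b, 4), round(ex1_c, 4), round(ex3_a, 4), round(ex4, 4))
```

Note the pattern for "at least" and "more than" questions: work out the cumulative probability of the complement and subtract it from 1.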


Taking our focus to continuous random variables,

o P(X = x) = 0 since there are an infinite number of values that X can take
o P(X < k) = P(X ≤ k)

Probability Density Function

Consider a probability histogram for a random variable

o If we make the widths of the columns so small that they are approximately continuous, the tops of the columns will form a smooth curve

The function f(x) that traces this curve over the range a ≤ x ≤ b is a probability density function if:

1) f(x) ≥ 0 for all x between a and b
2) The total area under the curve between a and b is 1

2. Uniform Distribution

o Also known as a rectangular probability distribution (from its shape)
o A distribution where all values of the random variable within the range a ≤ X ≤ b are equally likely to occur
o Defined by the function:

f(x) = 1 / (b – a), where a ≤ x ≤ b

Taking the example from your lecture notes... Store deliveries 7-8am. Let X = no. of minutes after 7:00am.

Some formulas for uniform distribution:

E(X) = (a + b)/2

Median = (a + b)/2

Var(X) = (b – a)²/12

P(x1 ≤ X ≤ x2) = Area under graph = (x2 – x1) × 1/(b – a)

Exercises:

1. The time before a baby cries is a uniformly distributed random variable between 0 and 30 minutes

a) Find the probability density function (1/30)
b) Find the probability that a baby cries within 20 minutes (0.67)
c) Find the probability that a baby does not cry within 10 minutes (0.67)
d) Find the probability that a baby cries between 15 minutes and 20 minutes (0.17)

2. If the random variable X is uniformly distributed between 2 and 10:

a) Calculate the formula of the probability density function. What type of line would this be if drawn?
b) Calculate P (2 ≤ X ≤ 10), that is, find the probability that X will assume a value of between 2 and 10 (P (2 ≤ X ≤ 10) equals 1, as the area under the curve must be equal to 1, that is, all the possible outcomes fall within this range)
c) Find the mean and variance of X (6; 5 ⅓)
d) Calculate P (2 ≤ X ≤ 8) (0.75)
e) Calculate P (X = 6) (0)

3. If X, a continuous random variable, is symmetric about μ, is P (X < μ - 2) equal to P (X > μ + 2)? (yes)

4. If X, a continuous random variable, is symmetric about X = 2, find P (X > 2) (0.5)
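Because a uniform probability is just the area of a rectangle, the uniform exercises are easy to check in code. The function names below are made up for illustration.

```python
def uniform_prob(x1, x2, a, b):
    """P(x1 <= X <= x2) for X uniform on [a, b]: overlap width times height 1/(b - a)."""
    lo, hi = max(x1, a), min(x2, b)
    return max(hi - lo, 0) / (b - a)

def uniform_mean(a, b):
    return (a + b) / 2

def uniform_var(a, b):
    return (b - a) ** 2 / 12

# Exercise 1: time before a baby cries, uniform on [0, 30] minutes
cries_within_20 = uniform_prob(0, 20, 0, 30)
no_cry_within_10 = uniform_prob(10, 30, 0, 30)   # cries after the 10 minute mark
cries_15_to_20 = uniform_prob(15, 20, 0, 30)

# Exercise 2: X uniform on [2, 10]
mean_x = uniform_mean(2, 10)
var_x = uniform_var(2, 10)
p_2_to_8 = uniform_prob(2, 8, 2, 10)
p_exactly_6 = uniform_prob(6, 6, 2, 10)          # zero width, so zero area

print(round(cries_within_20, 2), round(no_cry_within_10, 2), round(cries_15_to_20, 2))
print(mean_x, round(var_x, 4), p_2_to_8, p_exactly_6)
```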


BES PASS Week 7 2010

1. THE NORMAL DISTRIBUTION

a) Just for starters – a warm-up question

In the space below, draw what a normal distribution looks like. Then, in a different colour, show what happens when:

a) the mean increases/decreases
b) the standard deviation increases/decreases

b) Calculating normal probabilities

To calculate the probability that a normal random variable falls into any interval, we need to compute the area in the interval under the curve. But since that’s too hard (we need calculus), we can use the probability tables, provided we standardise this random variable.

THE STANDARDISED NORMAL RANDOM VARIABLE

Z = ( X – μ ) / σ

Class Example

You make an investment in stocks with an average return of 10%. Find the probability that you will lose money:

a) if the standard deviation of returns is 5% (0.0228)
b) if the standard deviation of returns is 10% (0.1587)

Clue! Use the tables.

Topics to be covered: 1. The Normal Distribution & Finding Probabilities 2. The Normal Approximation to the Binomial 3. Concepts of Estimation


c) Finding values of Z

Just then, we focused on working out Z, then using that to work out the probability of something. However, questions often ask us to ‘reverse engineer’ the process by giving us a probability first and asking us to work out what Z is. This is the reverse of the previous process.

FINDING VALUES OF Z, GIVEN A PROBABILITY

ZA = The value of Z such that the area to its right under the standard normal curve is A, i.e. ZA = the value of a standard normal random variable such that

P ( Z > ZA ) = A

Question

a) Find Z0.025 (1.96)
b) Find Z0.05 (1.645)

d) ZA and percentiles

ZA & PERCENTILES

ZA = 100 ( 1 – A ) th percentile of a standard normal random variable. e.g. Using question (b) from above, Z0.05 = 1.645 = the 95th percentile.

Let’s do some questions

1. The amount of time students spend each week on Facebook (FB) is a normally distributed random variable with a mean of 7.5 hours and a standard deviation of 2.1 hours.

a) What proportion of students go on FB for more than 10 hours per week? (.1170)
b) Find the probability that a student spends between 7 and 9 hours on FB. (.3559)
c) What proportion of students spend less than 3 hours on FB? (.0162)
d) What is the amount of time below which only 5% of students spend on FB? (4.05 hours)

2. An analysis of the amount of interest paid monthly by Visa cardholders reveals that the amount is normally distributed with a mean of $27 and a standard deviation of $7.

a) What proportion of the cardholders pay more than $30 in interest? (.3336)
b) What proportion of the cardholders pay more than $40 in interest? (.0314)
c) What proportion of the cardholders pay less than $15 in interest? (.0436)
d) What interest payment is exceeded by only 20% of the cardholders? ($32.88)
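Python's standard library has a `NormalDist` class that can stand in for the tables when checking answers. In this sketch `cdf` gives the area to the left of a value and `inv_cdf` goes the other way, from an area back to a value. The helper name `z_a` is made up for illustration. Expect small differences from the table answers, because the tables round Z to 2 decimal places first.

```python
from statistics import NormalDist

standard = NormalDist()   # standard normal: mean 0, standard deviation 1

def z_a(a):
    """Z_A: the value with area A to its right under the standard normal curve."""
    return standard.inv_cdf(1 - a)

# Class example: P(return < 0) when the mean return is 10%
p_loss_sd5 = NormalDist(10, 5).cdf(0)     # Z = (0 - 10) / 5 = -2
p_loss_sd10 = NormalDist(10, 10).cdf(0)   # Z = (0 - 10) / 10 = -1

# Question 1: hours on FB ~ Normal(7.5, 2.1)
fb = NormalDist(7.5, 2.1)
more_than_10 = 1 - fb.cdf(10)
between_7_and_9 = fb.cdf(9) - fb.cdf(7)
fifth_percentile = fb.inv_cdf(0.05)       # same as 7.5 - z_a(0.05) * 2.1

print(round(z_a(0.05), 3), round(z_a(0.025), 2))
print(round(p_loss_sd5, 4), round(p_loss_sd10, 4))
print(round(more_than_10, 4), round(between_7_and_9, 4), round(fifth_percentile, 2))
```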


2. THE NORMAL APPROXIMATION TO THE BINOMIAL

a) Why we approximate the binomial by the normal

Discrete distributions such as the binomial distribution are not that easy to draw inferences from, but drawing inferences is the reason why we need sampling distributions. Because of this, we approximate the binomial distribution by a normal distribution, drawing a bell-shaped curve that smooths out the tops of the rectangles in the histogram.

b) Nuts and bolts of normal approximation to the binomial

You should recall from last week that, for the binomial distribution:

BINOMIAL FORMULAS

mean: μ = n.p
standard deviation: σ = √( n p ( 1 – p ) )

Note, however, that we can’t directly apply the normal to the binomial. We actually need a continuity correction factor of 0.5 to adjust for the approximation. In particular:

USING THE CONTINUITY CORRECTION FACTOR

Let Y be the normal random variable approximating the binomial random variable X.

P ( X = x ) ≈ P ( x – 0.5 < Y < x + 0.5 )
P ( X ≤ x ) ≈ P ( Y < x + 0.5 )
P ( X ≥ x ) ≈ P ( Y > x – 0.5 )

c) Some questions to try

3. Juliet and Sheldon are stars of the next Batman movie, the Dark PASS Leader. Juliet is Batwoman and Sheldon is Joker. We are in that moment close to the end of the film where Sheldon as Joker flips the very special coin with probability of heads as 0.95. Juliet doesn’t believe such a coin exists, so Sheldon, being Joker, messes around and flips it 100 times. What is the probability that:

a) Juliet sees 100 heads flipped? (0.0133)
b) Juliet sees at least 90 heads flipped? (0.9941)
c) Juliet sees no more than 98 heads flipped? (0.9463)
d) Juliet saves Gotham city from Sheldon’s destruction? (No chance at all, ever...)


4. Anyway...so...moving on...Sheldon and Juliet have now become avatars. Sheldon is Corporal Jake Sully from Earth and Juliet is Neytiri from Na’vi. They are fighting against the evil humans who want to destroy their magic tree, from where they generate energy to teach their PASS classes. Sheldon is actually from the human team but betrays them and links up with Juliet to fight against the evil humans. Over the past few centuries, they have fought 50 wars, with the probability of the Na’vi winning being a massive 10%. The possible outcomes of wars are only winning and losing – there is no such thing as ‘drawing’ a war. Calculate the probability that:

a) Na’vi wins the war 8 times? (0.0695)
b) Na’vi wins the war at least 3 times? (0.8810)
c) Na’vi wins the war no more than 7 times? (0.8810)
d) Sheldon as Corporal Jake Sully will officially turn into an avatar (110%)...I will teach you in PASS class next week in the other world.
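The continuity correction rules can be sketched in Python and applied to Question 4 (n = 50, p = 0.10). The function names are made up for illustration, and `exact_eq` is included only so you can compare the approximation with the exact binomial probability. The approximate answers differ slightly from the table-based ones because the tables round Z to 2 decimal places.

```python
from math import comb, sqrt
from statistics import NormalDist

def normal_approx(n, p):
    """Normal random variable Y with the binomial's mean np and sd sqrt(np(1 - p))."""
    return NormalDist(n * p, sqrt(n * p * (1 - p)))

def approx_eq(x, n, p):
    """P(X = x) is approximately P(x - 0.5 < Y < x + 0.5)."""
    y = normal_approx(n, p)
    return y.cdf(x + 0.5) - y.cdf(x - 0.5)

def approx_le(x, n, p):
    """P(X <= x) is approximately P(Y < x + 0.5)."""
    return normal_approx(n, p).cdf(x + 0.5)

def approx_ge(x, n, p):
    """P(X >= x) is approximately P(Y > x - 0.5)."""
    return 1 - normal_approx(n, p).cdf(x - 0.5)

def exact_eq(x, n, p):
    """Exact binomial probability, for comparison."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 50, 0.10
wins_8 = approx_eq(8, n, p)
at_least_3 = approx_ge(3, n, p)
at_most_7 = approx_le(7, n, p)

print(round(wins_8, 4), round(at_least_3, 4), round(at_most_7, 4))
print(round(exact_eq(8, n, p), 4))
```

Parts b) and c) give the same answer because 2.5 and 7.5 are the same distance either side of the mean of 5, and the normal curve is symmetric.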

3. CONCEPTS OF ESTIMATION

a) Some chilled questions to consider with the person next to you...

What is the purpose of estimation?

What is the difference between a point estimator and an interval estimator?

b) The 3 desirable qualities of estimators

1. Unbiasedness

2. Consistency

3. Relative efficiency


BES PASS Week 8

Aims:

Learn about the sampling distribution of the mean

Learn the Central Limit Theorem (CLT)

Learn about confidence intervals: estimation of the population mean when the population variance is known

Selecting Sample size

Q. Recall the difference between: a) population vs sample b) parameter vs statistic

1. Sampling Distribution of the mean: the distribution of all possible values that the sample mean can take, computed from samples of the same size drawn from the same population. It allows us to estimate the population parameter using a sample statistic.

o The population of a random variable will have certain parameters, e.g. mean μ and variance σ²

o For a particular sample of size n, the sample statistic is unlikely to be the same as its population parameter

This difference is known as sampling error: the cost of sampling, which can be reduced by taking larger samples. (NB: the standard deviation of the sampling distribution of the mean is called the standard error.)

o Different samples (of size n) will have different sample statistics i.e. Sample mean/variance will vary for each sample

o Taking repeated samples of size n, the distribution of this statistic can be computed

Properties of the Sampling Distribution of the Sample Mean

1) μ_X̄ = μ

2) σ²_X̄ = σ²/n and σ_X̄ = σ/√n

3) If X is normal, X̄ is normal. If X is non-normal, X̄ is approximately normal for sufficiently large sample sizes.

2. Central Limit Theorem

Definition: The sampling distribution of the mean of a random sample drawn from any population is approximately normal for a sufficiently large sample size. The larger the sample size, the more closely the sampling distribution of X̄ will resemble a normal distribution.

Distribution of X̄ | Mean | Variance | Distribution
Population parameter | μ | σ² |
Samples (any size) from normal population | μ | σ²/n | Normal
Samples (n ≥ 30) from non-normal population | μ | σ²/n | Approx. normal (by the CLT)


Group Exercise 1

A sample of 100 observations is drawn from a normal population, with μ = 1000 and σ = 200.

Find: a) P(X̄ > 1050) b) P(X̄ < 960) c) P(X̄ > 1100)

Group Exercise 2

In a certain PASS community, 60% of all leaders are in favour of electing Sheldon as the ‘genius’. A random sample of 200 leaders is taken. What is the probability that 100 or fewer of these leaders favour the election of Sheldon as the one and only ‘genius’? (0.0025)

Group Exercise 3

Cadbury Yowie chocolates are known to have a mean weight of 27g and a variance of 6.25g squared (so σ = √6.25 = 2.5g). If a random sample of 60 Yowies is examined, find the probability that its average is:

a) Below 26g (0.0010)

b) Between 27.50g and 28g (0.0597)

Group Exercise 4

A basketball coach is seeking tall recruits who are smart enough to be eligible for college. The recruit must be at least 74 inches tall and have an IQ of 115 or above. Height and IQ are independent of one another. IQ is normally distributed with mean 100 and standard deviation 12, and height is normally distributed with mean 70 inches and standard deviation 2 inches. What percentage of the population satisfies the coach’s requirements? (0.24%)

Additional Questions for you to do...

1. The amount of time lawyers devote to their jobs per week is normally distributed with a mean of 52 hours and standard deviation of 6 hours.

a.) Find the probability that the mean amount of work per week for three randomly selected lawyers is more than 60 hours. (0.0104)

b.) If the strict boss finds out that the average time worked by his 7 employees is less than 48 hours, he will fire them all. What is the probability they will be fired? (0.0389)

2. The time it takes for a statistics professor to mark his mid-session test is normally distributed with a mean of 4.8 mins and a standard deviation of 1.3 mins. If there are 60 students in the class, what is the probability that he needs more than 5 hours to mark all the mid-session tests? (0.1170)

3. Pierre’s goose farm claims that its jars of foie gras have a mean weight of 250g and a standard deviation of 6g. After buying 36 jars, before eating them on petits blinis with some fig jam, salt and pepper, you weighed them and found them to have a mean of 245g. What general statement can we make about Pierre’s claim? (Pierre is a lying Frenchman)

4. The mean of a population is 18.75, and the standard deviation is 7.8:

a) If a sample of 50 values is taken, what is the probability that the sample mean is greater than 20? (0.1292)

b) If a sample of 100 is taken, what is the probability that the sample mean is greater than 20? (0.0548)
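As a rough check on question 4, the sampling distribution of the mean can be coded directly from its two properties (mean μ, standard error σ/√n). This is a sketch using only the Python standard library; the function name is hypothetical, and the normal shape is justified here by the CLT since n ≥ 30.

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_mean_above(mu, sigma, n, value):
    """P(X-bar > value) when X-bar is (approximately) N(mu, sigma^2 / n)."""
    standard_error = sigma / sqrt(n)
    return 1 - NormalDist(mu=mu, sigma=standard_error).cdf(value)


# Question 4: population mean 18.75, standard deviation 7.8
p_n50 = prob_sample_mean_above(18.75, 7.8, 50, 20)
p_n100 = prob_sample_mean_above(18.75, 7.8, 100, 20)
print(round(p_n50, 4), round(p_n100, 4))
```

Doubling the sample size shrinks the standard error, so the same sample mean of 20 becomes a less likely outcome. The values differ from the table-based answers only in the third or fourth decimal place.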


3. Confidence Interval

Recall:
o Point Estimators – produce a single estimate of the parameter of interest
o Interval Estimators – produce a range of values and attach a degree of confidence to that interval

Confidence Interval: an interval estimator defined by the confidence level (1 – α). This implies that we start with the confidence level we want and then work out the width of the interval. Deriving it step by step...

1. Recall the standard normal: Z = (X̄ – μ) / (σ/√n)

2. Employ the definition of a confidence interval (a symmetrical interval): P( –Zα/2 < (X̄ – μ) / (σ/√n) < Zα/2 ) = 1 – α

3. Rearranging: P( X̄ – Zα/2 σ/√n < μ < X̄ + Zα/2 σ/√n ) = 1 – α, where X̄ – Zα/2 σ/√n is the Lower Confidence Limit (LCL) and X̄ + Zα/2 σ/√n is the Upper Confidence Limit (UCL).

Therefore the confidence interval estimator (confidence level 1 – α) is: X̄ ± Zα/2 σ/√n

How do we interpret this interval?

Confidence Level (1 – α) | α | α/2 | Zα/2
90% | 0.1 | 0.05 | 1.645
95% | 0.05 | 0.025 | 1.960
98% | 0.02 | 0.01 | 2.326
99% | 0.01 | 0.005 | 2.576
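The critical values in this table can be reproduced with the inverse normal CDF, and the same value feeds the interval X̄ ± Zα/2 σ/√n. The sketch below is illustrative only (function names are assumptions, standard library only); the interval uses σ = 40, x̄ = 136 and n = 160 from Group Exercise 5(b) later in these notes.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(confidence):
    """Return z_{alpha/2} for a two-sided interval with the given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def z_confidence_interval(xbar, sigma, n, confidence):
    """Return (LCL, UCL) for the population mean when sigma is known."""
    margin = z_critical(confidence) * sigma / sqrt(n)
    return xbar - margin, xbar + margin


for level in (0.90, 0.95, 0.98, 0.99):
    print(level, round(z_critical(level), 3))

lcl, ucl = z_confidence_interval(xbar=136, sigma=40, n=160, confidence=0.95)
print(round(lcl, 2), round(ucl, 2))
```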

4. Selecting sample size

Previously, we determined our level of confidence first and then the width of the interval. However, sometimes the width of the confidence interval may be determined before sampling. To calculate the sample size based on a desired sampling error, we use the formula (solve for n from the above confidence interval formula):

n = ( Zα/2 σ / B )²

where B = the sampling error, and is equal to Zα/2 σ/√n. *Note that this is equal to half the width of the confidence interval. Always round n up to the next whole number.

Consider what happens to the width of a CI when:
o Standard deviation changes
o Confidence level changes
o Sample size changes
o Sample mean changes


Group Exercise 5

If we know that σ = 40, and we obtain a sample mean of 136, construct a 95% confidence interval for the population mean using a sample size of:

a) 20 ( 118.47 ≤ μ ≤ 153.53 )

b) 160 ( 129.80 ≤ μ ≤ 142.20 )

Group Exercise 6

If we know that σ = 40, and we obtain a sample mean of 136 using 25 values, construct a confidence interval for the population mean having:

a) A confidence level of 99% ( 115.392 ≤ μ ≤ 156.608 )

b) A confidence level of 50% ( 130.60 ≤ μ ≤ 141.40 )

Group Exercise 7

John wants to estimate the average time it takes for customers to have lunch at his new cafe. He knows from past experience that the standard deviation will be 18 minutes. John wants to use a confidence level of 90% and have a sampling error no greater than 3 minutes. How many customers does he need to time? (98)
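The sample-size calculation is just the margin of error Zα/2 σ/√n solved for n and rounded up. Here is a minimal sketch with John's numbers; the function name is hypothetical and only the Python standard library is used.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(sigma, bound, confidence):
    """Smallest n such that z_{alpha/2} * sigma / sqrt(n) <= bound."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / bound) ** 2)


# John's cafe: sigma = 18 minutes, sampling error B = 3 minutes, 90% confidence
n_customers = required_sample_size(sigma=18, bound=3, confidence=0.90)
print(n_customers)
```

Rounding up rather than to the nearest integer matters: rounding down would give a sampling error slightly larger than the bound John asked for.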

Additional Questions for you to do...

1. The average mark for an exam was estimated to be between 65% and 75% using a confidence coefficient of 90%. Indicate whether the following statements are true or false:

a) 90% of students scored a mark between 65% and 75%

b) If random samples were taken, then 90% of the samples would have a mean between 65% and 75%

c) The probability that the average population mark will be contained in this interval is 90%

d) The probability that the average population mark will fall within this interval is 90%

2. In an article about disinflation, various investments were examined. The investments included stocks, bonds, and real estate. Suppose that a random sample of 200 rates of return on real estate investments was computed and recorded. The sample mean was calculated to be 12.10% return. Assuming that the standard deviation of all rates of return on real estate investments is 2.1%, estimate the mean rate of return on all real estate investments with 90% confidence. Interpret the estimate. (11.86% : 12.34%)

3. An economist wants to estimate the mean annual income of households in a particular district. It is assumed that the population standard deviation is $4000. The economist wants the estimate to be within D = $500 of the true mean with a 95% level of confidence. Calculate the sample size required.

4. Starting annual salaries for university graduates with business degrees are believed to have a standard deviation of approximately $1800. A 95% confidence interval estimate of the mean annual starting salary is desired. How large a sample should be taken if we want to be 95% confident that the maximum sampling error is:

a. $500

b. $200

5. A medical researcher wants to investigate the amount of time it takes for patients’ headache pain to be relieved after taking a new prescription painkiller. She plans to use statistical methods to estimate the mean of the population of relief times. She believes that the population is normally distributed with a standard deviation of 20 minutes. How large a sample should she take to achieve 90% confidence to within 1 minute?


BES PASS Week 9 2010

1. HYPOTHESIS TESTING

a) Setting up a model to help us think about hypothesis testing.

What exactly is a hypothesis test? We can try and simplify the issue by using the most popular model of thinking – a criminal trial. Imagine Sheldon has been arrested for drink driving...although this would never happen. Suppose that we set up 2 hypotheses to test:

o H0: Sheldon is innocent (the null hypothesis)
o H1: Sheldon is guilty (the alternative hypothesis)

The testing procedure begins with the assumption that the null hypothesis is true.
o ie. Assume Sheldon is innocent – assume he is not a drink driver.

What is the goal of hypothesis testing? It is to determine whether there is enough evidence to infer that the alternative hypothesis is true. In statistics, we speak of the result of the hypothesis test in 1 of 2 ways:

a) ‘rejecting the null hypothesis in favour of the alternative’
b) or ‘not rejecting the null hypothesis in favour of the alternative’

Notice that we don’t say that we ‘accept’ the null hypothesis...why?

b) Errors induced when running hypothesis tests

There are 2 possible errors:

o A Type I error occurs when we reject a true null hypothesis.
ie. Sheldon is innocent (Juliet was the actual drink driver), but he is still wrongly convicted.
P ( Type I error ) = α

o A Type II error occurs when we do not reject a false null hypothesis.
ie. Sheldon is actually guilty of drink driving, but he is acquitted.
P ( Type II error ) = β

c) Group discussion exercise

In the following 2 scenarios, identify what the null and alternative hypotheses would be:

1. You are considering whether you should apply to be a PASS leader in 2011. If you succeed, a life of fame, fortune and happiness awaits you. If you fail, no one will like you. Should you apply?

2. You are faced with 2 investments. One is very risky, but the potential returns are high. The other is safe, but the potential is quite limited. Which one should you choose?

Topics to be covered:
1. Hypothesis Testing – Type I & Type II errors
2. Test about the mean when the population standard deviation is known
3. p-values


2. TESTING THE POPULATION MEAN WHEN THE POPULATION STANDARD DEVIATION IS KNOWN

Let’s go through an example as a class (adapted from Keller) to illustrate the required process.

a) Factual scenario

o Sheldon and Juliet run an Asian mini-goods store where they sell, amongst other things, Hello Kitty mobile phone chains and Totoro pillows. Juliet wants to introduce a new profit strategy - selling Easyway drinks too.
o They determine the new profit strategy will be cost-effective only if the mean monthly account is more than $170.
o A random sample of 400 accounts is drawn, for which the sample mean is $178.
o Juliet knows the accounts are approximately normally distributed with a standard deviation of $65.
o Can we conclude the new strategy will be cost-effective, if we run the test at the 5% significance level (95% confidence)?

b) Setting up the model

What are the null and alternative hypotheses?

H0:

H1:

NB: There are 2 methods in which we can proceed with this problem, using either:

a) the rejection region method
b) the p-value approach

We will consider both methods.

c) The rejection region method

The rejection region is a range of values such that if the test statistic falls into that range, we decide to reject the null hypothesis in favour of the alternative hypothesis. Show how we can use the rejection region method to solve this problem in the space below. Draw diagrams where appropriate.


d) The p-value approach

Why do we have 2 methods to do the same problem? Well...there are actually some drawbacks to using the rejection region method. Can you think of some?

The p-value of a test is the probability of observing a test statistic at least as extreme as the one computed, given that the null hypothesis is true. So in our example here, what would the p-value be?

e) Interpreting the p-value

Interpreting the p-value we just calculated, this means that the probability of observing a sample mean at least as large as 178 from a population whose mean is 170 is _______, which is very small.

o In other words, we have just observed an unlikely event, an event so unlikely that we seriously doubt the assumption that began the process (that the null hypothesis is true).
o Consequently, we have reason to reject the null hypothesis and support the alternative.

However, one thing must be clear to you – the p-value is not the probability that the null hypothesis is true. You cannot make a probability statement about a parameter; it is not a random variable. This is a similar case to what we did last week with confidence interval estimators.

You may notice that the further the sample mean is from the hypothesized mean, the smaller the p-value is.

We can develop this further to use the idea of significance to describe the p-value. Let’s fill out the table below together:

DESCRIBING THE P-VALUE

Range of p-value | Amount of evidence to infer that the alternative hypothesis is true | Term for level of significance
< 0.01 | Overwhelming |
0.01 – 0.05 | Strong |
0.05 – 0.10 | Weak |
> 0.10 | None |

f) The p-value and rejection region methods

Note that another way to make the rejection / non-rejection decision is to compare the p-value with the selected value of the significance level.

COMPARING P-VALUES WITH THE SIGNIFICANCE LEVEL

o If p-value < α → p-value is small enough to reject the null hypothesis.
o If p-value > α → we do not reject the null hypothesis.


g) A note – be careful when you draw conclusions from hypothesis tests

HOW TO DRAW CONCLUSIONS FROM HYPOTHESIS TESTS

o If we reject the null hypothesis, we conclude that there is enough statistical evidence to infer that the alternative hypothesis is true.
o If we do not reject the null hypothesis, we conclude that there is not enough statistical evidence to infer that the alternative hypothesis is true.

NB: You cannot prove that either the null or the alternative hypothesis is true.

h) One and Two Tail tests

ONE VS TWO TAIL TESTS

o One Tail tests are of the form:
H0: μ = μ0
H1: μ > μ0 or H1: μ < μ0

o Two Tail tests are of the form:
H0: μ = μ0
H1: μ ≠ μ0

THE ADJUSTMENT REQUIRED FOR RUNNING TWO-TAIL TESTS

1. Change the rejection region into a two-tail rejection region of the form z < –zα/2 or z > zα/2
2. See if the Z-score you calculate falls into either of these 2 rejection regions.
3. In terms of the p-value, you need to determine the p-value in both tails (i.e. double the one-tail area).

3. PRACTICE QUESTIONS

1. Juliet gets really annoyed when her BES students take too long to finish her class tests. She’s shot a few of them in the leg before. To investigate further, she randomly samples 10 students and measures the amount of time they spend doing a BES test. The results are listed below. Assuming that the times are normally distributed with a standard deviation of 2 minutes, test to determine whether Juliet can infer at the 5% significance level that the mean amount of time spent on the tests is greater than 6 minutes. Data: 8 11 5 6 7 8 6 4 8 3. (Answer: z = .95, p-value = .1711, no.)

2. Sheldon owns a telecommunications company called ‘Shel-Tel’ which specializes in providing cheap phone call rates back to Hong Kong. Suppose the mean and standard deviation of monthly long-distance bills of customers are $17.09 and $3.87 respectively. Sheldon takes a random sample of 100 customers and recalculates their monthly bills using rates quoted by a leading competitor, obtaining a sample mean of $17.55. Can we conclude at the 5% significance level that there is a difference between the average Shel-Tel bill and that of the competitor? (Answer: z = 1.19, p-value = .2340, no.)
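Both practice questions follow the same recipe: compute z, convert it to a p-value for the relevant tail(s), and compare with α. The following Python sketch (standard library only; the function and its `tail` argument are illustrative, not from the course) runs question 1 as an upper-tail test and question 2 as a two-tail test.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_test(xbar, mu0, sigma, n, tail):
    """Return (z, p_value) for a test of H0: mu = mu0 with known sigma.

    tail is "upper" (H1: mu > mu0), "lower" (H1: mu < mu0) or "two" (H1: mu != mu0).
    """
    z = (xbar - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    if tail == "upper":
        p_value = 1 - std_normal.cdf(z)
    elif tail == "lower":
        p_value = std_normal.cdf(z)
    elif tail == "two":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))  # area in both tails
    else:
        raise ValueError("tail must be 'upper', 'lower' or 'two'")
    return z, p_value


# Question 1: H0: mu = 6, H1: mu > 6, sigma = 2
times = [8, 11, 5, 6, 7, 8, 6, 4, 8, 3]
z1, p1 = z_test(mean(times), 6, 2, len(times), "upper")

# Question 2: H0: mu = 17.09, H1: mu != 17.09, sigma = 3.87, n = 100
z2, p2 = z_test(17.55, 17.09, 3.87, 100, "two")

alpha = 0.05
print(round(z1, 2), round(p1, 4), p1 < alpha)
print(round(z2, 2), round(p2, 4), p2 < alpha)
```

In both cases the p-value exceeds α = 0.05, so the null hypothesis is not rejected, matching the worksheet answers up to table rounding.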


[Figure: the hypothesized mean distribution (under H0) and the actual mean distribution, marking the rejection region, the non-rejection region where H0 should have been rejected but was not (Type II error), and the correctly rejected region when H0 is false.]

BES PASS WEEK 10

Aims:

Learning to calculate the probability of Type II errors and the Power of the Test

Hypothesis Testing when population variance is unknown

Sampling distribution of the sample proportion

1. Some clarification...

Sample Mean: X̄ ~ N(μ_X̄, σ²_X̄) – subscripts to show this is different to the population

Hypothesis testing: testing where our X̄ lies in relation to our hypothesised μ0
o Methods: critical values for X̄ (not a confidence interval) and critical values using z-scores
o State H0 with a strong equality sign (=) and your conclusion with the level of significance.
o The value of α is called our significance level

2. Type I and Type II errors

ERRORS | Given a true H0 | Given a false H0
Reject H0 | Type I error | Correct Decision
Do not reject H0 | Correct Decision | Type II error

Type I error occurs when we reject a true null hypothesis.
P (Type I error) = α, i.e. P (Reject H0 | H0 is true) = α
The significance level α is usually given as 0.01, 0.05 or 0.10.

Type II error occurs when we do not reject a false null hypothesis.
P (Type II error) = β, i.e. P (Do not reject H0 | H0 is false) = β

NB. There is a trade-off between the two types of errors. For a fixed sample size, decreasing our significance level α increases β, and vice versa.

Power of the test

The power of the test is the probability of correctly rejecting a false null hypothesis. Power = 1 – β. NB. 1 – β ≠ α!!!!

Steps:

1. Draw the distribution of X̄ under the null hypothesis H0, centred at μ0.

2. Find the critical value c and the rejection region for the α level of significance. The area of the rejection region is P(Type I error).

3. Draw a new distribution of X̄ centred at the true population mean μ1 (a value consistent with H1). The area of this distribution that falls in the non-rejection region is P(Type II error).

For an upper tailed test: P(Type II error) = P(X̄ < c | μ = μ1) - see diagram

For a lower tailed test: P(Type II error) = P(X̄ > c | μ = μ1)


[Figure: the standard normal (Z) curve compared with the t-distribution.]

Problems for you to do...

1) N.S.W. Police are testing if vehicles are exceeding the speed limit of 90km/hr on South Dowling Road. A sample of 81 vehicles yields a mean driving speed of 98km/hr. If the population of vehicle speeds is normally distributed with a standard deviation of 25 km/hour, test the hypothesis, at the 5% level of significance, H0: μ = 90; H1: μ > 90. If H0 is rejected, calculate β, the probability of Type II error, given that the true μ = 100. (β = 0.0253; power of test = 0.9747; we reject H0)

2) Miss Rose was researching dress sizes. She had thought the mean dress size was 9. But her suspicion is that it will be larger than that. Thus, being the relative unknown and incredible mathematician she was, Miss Rose decided to do a hypothesis test. She found the population to be normally distributed, with standard deviation of 4. If α = 0.05 and the sample size was 64, calculate the power of the test if the mean was actually

a. 9.5 (0.2595)

b. 10 (0.6387)

3) What will be the answer for (a) and (b) in the above example if Miss Rose only suspected that the mean size was not 9? (0.17, 0.516)
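The three steps for an upper-tailed test translate directly into code: find the critical value c under H0, then measure how much of the true distribution of X̄ lies below c. The sketch below is illustrative (hypothetical function name, standard library only) and uses the numbers from problems 1 and 2.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_beta(mu0, mu1, sigma, n, alpha):
    """Return (critical value c, beta, power) for H0: mu = mu0 vs H1: mu > mu0."""
    standard_error = sigma / sqrt(n)
    # Step 2: critical value for X-bar under H0
    c = mu0 + NormalDist().inv_cdf(1 - alpha) * standard_error
    # Step 3: P(X-bar < c) when the true mean is mu1
    beta = NormalDist(mu=mu1, sigma=standard_error).cdf(c)
    return c, beta, 1 - beta


# Problem 1: speed limit test, true mean assumed to be 100 km/hr
c, beta, power = upper_tail_beta(mu0=90, mu1=100, sigma=25, n=81, alpha=0.05)
print(round(c, 2), round(beta, 4), round(power, 4))

# Problem 2: dress sizes, true mean 9.5 and 10
power_a = upper_tail_beta(9, 9.5, 4, 64, 0.05)[2]
power_b = upper_tail_beta(9, 10, 4, 64, 0.05)[2]
print(round(power_a, 4), round(power_b, 4))
```

The power rises as the true mean moves further from μ0, because less of the true distribution is left in the non-rejection region.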

3. T-distribution

So far, the problems we have dealt with assume that the population variance σ² is known.

This is unrealistic; we’re more likely to know the sample variance s².

Note that s² is an unbiased and consistent estimator of σ².

For large sample sizes (n > 30):
o By the CLT, X̄ is approximately normal regardless of the population distribution
o The standardised test statistic remains approximately normal even when replacing σ with s
o Use Z-scores and the normal distribution table

For small sample sizes (n < 30):
o Must use the t-distribution
o Similar to the normal distribution but with “fatter” tails
o Its variance is determined by the ‘degrees of freedom’, v
o The t-distribution is only valid if the underlying distribution is normal

We have a new t-statistic:

t = (x̄ – μ) / (s/√n)

with critical value tα,v, where P( t > tα,v ) = α and v = n – 1 (degrees of freedom)

Confidence Intervals when σ² is unknown

The procedure is still the same. The only difference is that we replace our z-score with a t-score, and σ with s!!!

x̄ ± tα/2,v · s/√n, v = n – 1

Where α = significance level and 1 – α is the confidence level


Hypothesis Testing when σ² is unknown

Procedure also remains the same except we replace our z-score with a t-score and σ with s

Using the unstandardised method, our critical values for X̄ are as follows, assuming H0: μ = μ0:

If H1: μ > μ0 then c = μ0 + tα,v (s/√n). If X̄ > c, reject H0, otherwise we do not reject

If H1: μ < μ0 then c = μ0 – tα,v (s/√n). If X̄ < c, reject H0, otherwise we do not reject

If H1: μ ≠ μ0 then c = μ0 ± tα/2,v (s/√n). If X̄ < μ0 – tα/2,v (s/√n) or X̄ > μ0 + tα/2,v (s/√n), reject H0, otherwise we do not reject

Or using the standardised method, our critical values for t are tα,v (one-tailed test) or tα/2,v (two-tailed test), where α = level of significance and v = n – 1 (degrees of freedom). Assuming H0: μ = μ0:

o If H1: μ > μ0 reject H0 if t > tα,v

o If H1: μ < μ0 reject H0 if t < –tα,v

o If H1: μ ≠ μ0 reject H0 if t > tα/2,v or t < –tα/2,v

Problems for you to do…

4) Sheldon owns a farm and needs to know the number of strawberries that can be picked on weekday mornings. On a sample of 8 Monday mornings, the number of strawberries picked between 7am and 9am is counted. Assume the population is normal. Construct a 99% confidence interval for the population mean if the sample mean is 1500 and the sample standard deviation is 300. (1128.88, 1871.12)

5) Petit Restaurant claims to sell at least 60 cakes per day. Assume Petit’s sales are approximately normally distributed. To test Petit’s claims, 16 days are selected at random and tested. The sample yields a mean of 56 and a sample standard deviation of 5.25. Perform the test of Petit’s claims against a suitable alternative, assuming α = 0.05. (Reject null)

6) A car rental company is interested in the amount of time its vehicles are out of operation for repair work. A random sample of 12 cars showed that, over the past year, the numbers of days each had been inoperative were as follows: 15, 11, 19, 24, 6, 18, 20, 15, 18, 12, 14, 19. Given that the population is normally distributed, find a 99% confidence interval for the actual mean. (11.618, 20.216)

7) In a study to determine the capability of the BES PASS leaders, Judith, the PASS coordinator, has to measure the mean exam mark of the students who attend their classes. She takes 9 random students and their exam marks. The sample mean and standard deviation were 80 percent and 4 marks respectively. Assuming that the marks are normally distributed, calculate a 95% confidence interval for the true mean exam mark.
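Problems 4 and 5 can be checked with a few lines of Python. This is a sketch only: the standard library has no t-distribution, so the critical values are assumed to be read from a t table (t0.005,7 = 3.499 and t0.05,15 = 1.753) and passed in by hand; the function names are made up for illustration.

```python
from math import sqrt


def t_confidence_interval(xbar, s, n, t_crit):
    """x-bar +/- t_{alpha/2, n-1} * s / sqrt(n), with t_crit taken from a t table."""
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin


def t_statistic(xbar, mu0, s, n):
    """Standardised test statistic t = (x-bar - mu0) / (s / sqrt(n))."""
    return (xbar - mu0) / (s / sqrt(n))


# Problem 4: 99% CI, n = 8, so v = 7 and t_{0.005, 7} = 3.499 (table value)
low, high = t_confidence_interval(xbar=1500, s=300, n=8, t_crit=3.499)
print(round(low, 2), round(high, 2))

# Problem 5: H0: mu = 60, H1: mu < 60, n = 16, so v = 15 and t_{0.05, 15} = 1.753
t_petit = t_statistic(xbar=56, mu0=60, s=5.25, n=16)
reject_petit = t_petit < -1.753
print(round(t_petit, 3), reject_petit)
```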


4. Sampling Distribution of a Sample Proportion

Recall the binomial distribution, where X is the number of successes in a fixed number of trials:

n = no. of trials

p = probability of success

q = (1 – p) = probability of failure

E(X) = np and Var(X) = npq

Similar to the distribution of a sample mean, if we take many samples of size n and calculate the sample proportion of successes, p̂ = X/n, for each of them, you will find:

E(p̂) = p

Var(p̂) = pq/n

What is the distribution of p̂? By the CLT, for large sample sizes X is approximately normal, therefore the sample proportion p̂ is also approximately normal:

p̂ ~ N(p, pq/n), and therefore our z-score is Z = (p̂ – p) / √(pq/n)

Our confidence interval for the Population Proportion (p) is p̂ ± Zα/2 √(p̂q̂/n), where q̂ = 1 – p̂

Hypothesis Testing for the Population Proportion (p)

Assuming H0: p = p0 our critical values are:

o If H1:p < p0 then p* = p0 – Zα√(p0q0/n)

o If H1:p > p0 then p* = p0 + Zα√(p0q0/n)

o If H1:p ≠ p0 then p*= p0 ± Zα/2√(p0q0/n)

Or using the standardised test-statistic:

o If H1:p < p0 , reject H0 if Z < -Zα

o If H1:p > p0 , reject H0 if Z > Zα

o If H1:p ≠ p0 , reject H0 if Z < -Zα/2 or Z > Zα/2

Problems for you to do..again….last one!

8) The proportion of families buying milk from Company A in a certain city is p = 0.6. A random sample of 10 families shows that 4 buy milk from Company A.

a) Conduct a hypothesis test with a null H0: p = 0.6 against the alternative H1: p < 0.6. Find the critical values using both unstandardised and standardised methods at the 5% significance level. (Do not reject null)

b) Construct a 95% confidence interval for p. Does this interval include 0.6? [0.096, 0.704]

If we reject the null when 3 or fewer families buy milk from Company A:

c) Find the probability of committing a Type I error. (0.055)

d) If the true proportion of families buying milk from Company A is p = 0.5, what is the probability of committing a Type II error based on the above decision rule? (0.828)
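Problem 8 can be worked through in a short Python sketch (standard library only; variable and function names are illustrative). Parts (a) and (b) use the normal approximation formulas from this section, while parts (c) and (d) use exact binomial probabilities for the decision rule "reject when X ≤ 3". Note that n = 10 is small, so the normal approximation in (a) and (b) is rough.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p0, p_hat = 10, 0.6, 4 / 10

# (a) lower-tailed test of H0: p = 0.6 at the 5% level
se_null = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se_null               # standardised test statistic
z_crit = -NormalDist().inv_cdf(0.95)     # -z_0.05
p_star = p0 + z_crit * se_null           # unstandardised critical value
reject = z < z_crit

# (b) 95% confidence interval for p
margin = NormalDist().inv_cdf(0.975) * sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - margin, p_hat + margin)


def binomial_cdf(k, trials, p):
    """Exact P(X <= k) for X ~ Binomial(trials, p)."""
    return sum(comb(trials, i) * p**i * (1 - p) ** (trials - i) for i in range(k + 1))


# (c) Type I error: reject (X <= 3) although p = 0.6 is true
type_one = binomial_cdf(3, n, 0.6)
# (d) Type II error: do not reject (X >= 4) although the true p is 0.5
type_two = 1 - binomial_cdf(3, n, 0.5)

print(round(z, 3), round(p_star, 3), reject)
print(round(ci[0], 3), round(ci[1], 3))
print(round(type_one, 3), round(type_two, 3))
```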


Sheldon & Juliet Wednesday 10-11am OMB229

BES PASS Week 11 2010

1. SIMPLE LINEAR REGRESSION – AN INTRODUCTION

Introducing regression

Regression analysis is used to predict the value of one variable on the basis of other variables.

The technique involves developing a mathematical equation/model that describes the relationship between the dependent variable (Y) and the independent variables (X1, X2, X3, …, Xk), where k = the number of independent variables.

Regardless of why regression analysis is performed, we begin by developing this mathematical equation/model that describes the relationship between the dependent variable and independent variables.

The Simple Linear Regression Model (aka. the First-Order Linear Model)

THE SIMPLE LINEAR REGRESSION MODEL

y = β0 + β1x + ε

In order to investigate the relationship between x and y, we need to estimate the coefficients β0 and β1 using the least squares method, with which you had a friendly encounter in Week 2. The estimates are written b0 and b1.

Least Squares Method

Why is it called the ‘least squares method’? Recall that when we draw a line through a set of sample data, we aim for the best line – the line of best fit. In particular, this line is the one which is closest to the sample data points; the line that minimizes the sum of the squared differences between the points and the line.

LEAST SQUARES LINE COEFFICIENTS

b1 = sxy / sx²

b0 = ȳ − b1x̄

Class Example

The annual bonuses (millions) of 6 football players from Chelsea FC [the 2010 Premier League (clearly dominating Man Utd) AND FA Cup Champions] with different years of experience are recorded as follows. The manager, Carlo Ancelotti, has hired you as his private statistician to determine the relationship between annual bonus and years of experience.

Years of experience (x) 1 2 3 4 5 6

Annual Bonus (y) 6 1 9 5 17 12

Topics to be covered:
1. Simple linear regression
2. Assumptions of the regression model
3. Methods of assessing and analysing the model


Frank Lampard has already performed some initial calculations for you:

SOME HELPFUL DATA FOR THIS QUESTION

Σxi = 21

Σyi = 50

Σxiyi = 212

Σxi² = 91

(all sums taken over i = 1 to n)

WHAT YOU NEED TO CALCULATE

sxy =

sx² =

b1 =

x̄ =

ȳ =

b0 =

Finally, the least squares simple regression line is:

ŷ =

HOW TO INTERPRET THE SIMPLE REGRESSION LINE

Advise Carlo on the relationship between bonuses and years of experience at Chelsea FC.
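Once you have attempted the blanks by hand, the calculation can be checked with a short script. This sketch assumes the usual shortcut forms sxy = [Σxiyi − (Σxi)(Σyi)/n] / (n − 1) and sx² = [Σxi² − (Σxi)²/n] / (n − 1), which are not written out in these notes, and uses Frank Lampard's sums.

```python
# Sums supplied for the Chelsea FC example (n = 6 players)
n = 6
sum_x, sum_y, sum_xy, sum_x2 = 21, 50, 212, 91

# Sample covariance and sample variance of x via the shortcut formulas
s_xy = (sum_xy - sum_x * sum_y / n) / (n - 1)
s_x2 = (sum_x2 - sum_x**2 / n) / (n - 1)

# Least squares coefficients
x_bar, y_bar = sum_x / n, sum_y / n
b1 = s_xy / s_x2
b0 = y_bar - b1 * x_bar

print(round(s_xy, 3), round(s_x2, 3))
print(round(b1, 3), round(b0, 3))
```

The slope b1 is the estimated change in annual bonus (in millions) for each extra year of experience, and b0 is the estimated bonus at zero years of experience, which lies outside the observed range of x and should be interpreted with care.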

2. ASSUMPTIONS OF THE REGRESSION MODEL

THE 7 ASSUMPTIONS OF THE REGRESSION MODEL

1. 2. 3. 4. 5. 6. 7.


3. METHODS OF ASSESSING AND ANALYSING THE MODEL

Introduction

Having established the required conditions for our assessment methods to be valid in the previous section, we can now look at the methods to assess our regression model.

However, we need to look at the concept of the sum of squares for error, which forms the foundation for all these methods.

Sum of squares for error

Recall that the least squares method determines the coefficients that minimize the sum of squared deviations between the points and the line defined by the coefficients – aka. the sum of squares for error (SSE).

SHORT-CUT FORMULA FOR SSE

SSE = Σ(yi − ŷi)² = (n − 1)( sy² − sxy² / sx² ), with the sum taken over i = 1 to n

Method 1: Standard Error of the Estimate (SEE)

STANDARD ERROR OF ESTIMATE

sε = √( SSE / (n − 2) )

QUESTION

1. Calculate the standard error of estimate for Chelsea FC. (4.503)

2. Interpret what it tells you about the model’s fit.
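The standard error of estimate can be computed straight from the raw Chelsea FC data. The sketch below (standard library only) calculates SSE twice, once from the residuals and once from the shortcut (n − 1)(sy² − sxy²/sx²), which is assumed here to be the short-cut formula intended in these notes, and confirms the two agree.

```python
from math import sqrt

x = [1, 2, 3, 4, 5, 6]      # years of experience
y = [6, 1, 9, 5, 17, 12]    # annual bonus (millions)
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
s_x2 = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
s_y2 = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)

b1 = s_xy / s_x2
b0 = y_bar - b1 * x_bar

# SSE from the residuals y_i - y_hat_i
sse_direct = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
# SSE from the shortcut formula
sse_shortcut = (n - 1) * (s_y2 - s_xy**2 / s_x2)

s_eps = sqrt(sse_direct / (n - 2))
print(round(sse_direct, 3), round(sse_shortcut, 3), round(s_eps, 3))
```

A useful way to judge the size of sε is to compare it with ȳ: here sε is more than half of the mean bonus, which suggests the line does not fit the data closely.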

Method 2: The t-test of the slope (a hypothesis test)

In this method of assessing the regression model, we look in particular at the slope of the simple regression line and run a hypothesis test on it. Steps are below.

Step 1: Set up hypothesis test

H0: β1 = 0 (ALWAYS)

H1: β1 ≠, >, < 0

Step 2: Find rejection region

If our test statistic falls within the rejection region, we can conclude that the variables are linearly related.

If β1 > 0, then the variables are positively related.

If β1 < 0, they are negatively (inversely) related.

Since β1 is our X coefficient, this means that a one unit increase in X is associated with a change of β1 in Y, on average.


QUESTION

1. Perform a hypothesis t-test of the slope for Chelsea FC at 5% significance. (t-stat = 1.964, do not reject null)

2. Interpret what it tells you about the model’s fit.

Method 3: Coefficient of Determination

The coefficient of determination, R², allows us to determine the strength of a linear relationship.

R² = the proportion of the variation in the dependent variable that is explained by variation in the independent variable.

To fully understand this, we will need to break down the total variation in y, as follows:

COEFFICIENT OF DETERMINATION

R² = sxy² / (sx² sy²) = 1 − SSE / Σ(yi − ȳ)² = [ Σ(yi − ȳ)² − SSE ] / Σ(yi − ȳ)² = EXPLAINED VARIATION / VARIATION IN Y

QUESTION

1. Calculate the coefficient of determination for Chelsea FC. (0.491)

2. Interpret what this tells you about the regression model.
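The two forms of the R² formula can be compared numerically on the Chelsea FC data. This is an illustrative sketch with the standard library only; both routes should give the same number.

```python
x = [1, 2, 3, 4, 5, 6]      # years of experience
y = [6, 1, 9, 5, 17, 12]    # annual bonus (millions)
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
s_x2 = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
s_y2 = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)

b1 = s_xy / s_x2
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
total_variation = sum((yi - y_bar) ** 2 for yi in y)

r2_from_covariance = s_xy**2 / (s_x2 * s_y2)
r2_from_sse = 1 - sse / total_variation   # explained variation / variation in y
print(round(r2_from_covariance, 3), round(r2_from_sse, 3))
```

An R² of about 0.49 says that roughly half of the variation in bonuses is explained by years of experience; the remainder is unexplained by this model.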

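Critical Values & Decision Rule (for Step 2)

H1: β1 > 0: reject Ho if t > tα, n-2

H1: β1 < 0: reject Ho if t < - tα, n-2

H1: β1 ≠ 0: reject Ho if |t| > tα/2, n-2

Step 3: Calculate test statistic

$t = \frac{b_1 - \beta_1}{s_{b_1}}$, where $s_{b_1} = \frac{s_\varepsilon}{\sqrt{(n-1)s_x^2}}$

Step 4: Conclusion

If we don’t reject Ho, there is not enough evidence to conclude that y is linearly related to x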
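The t-test of the slope can be sketched in Python as below. The data are hypothetical, the function name is an assumption, and the critical value is passed in by hand because it comes from a t-table (here the two-tailed 5% value for 4 degrees of freedom).

```python
import math

def slope_t_test(x, y, t_crit, beta1_null=0.0):
    """Two-tailed t-test of the slope; returns (t statistic, reject H0?)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_x2 = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

    b1 = s_xy / s_x2
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

    s_eps = math.sqrt(sse / (n - 2))             # standard error of estimate
    s_b1 = s_eps / math.sqrt((n - 1) * s_x2)     # standard error of the slope
    t = (b1 - beta1_null) / s_b1
    return t, abs(t) > t_crit

# Hypothetical data with n = 6, so n - 2 = 4 degrees of freedom.
x = [1, 2, 3, 4, 5, 6]
y = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1]
T_CRIT = 2.776  # t_{0.025, 4} from a t-table
t_stat, reject = slope_t_test(x, y, T_CRIT)
print(t_stat, reject)
```

For a one-tailed alternative, compare t itself (not |t|) against the one-tailed critical value.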


Sheldon and Juliet Wednesday 2-3pm RC M010

BES PASS Week 12

Aim:

Learn about prediction in linear regression

Learn about the multiple regression model

Prediction in linear regression

We can use our model to forecast or estimate values of Y (the dependent variable).

Point Prediction

o Use the fitted regression line to predict a value of Y for a given level of X

i.e. ŷ = b0 + b1x

o NB. This prediction is less reliable if the value of X falls outside the range of X values used to fit the line (extrapolation)

o This point estimate does not provide any information on how close our predicted value is to the true value.

Thus, we use:

Interval Prediction

Formula

Prediction Interval

This prediction interval is used to predict a one-time occurrence for a particular value of the dependent variable

Confidence Interval Estimator of the Expected Value of Y

This is the confidence interval used to predict the mean of y or the long-run average of y.

Why is there a missing “1” under the square root for the confidence interval estimator? Ans. There is less error in estimating a mean value as opposed to predicting an individual value
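For reference, the two interval formulas in their usual textbook form, written in the notation used earlier in these notes. The symbol $x_g$ for the given value of x is an assumption of this sketch.

```latex
% Prediction interval for one individual value of y at x = x_g
\hat{y} \pm t_{\alpha/2,\,n-2}\; s_\varepsilon
  \sqrt{1 + \frac{1}{n} + \frac{(x_g - \bar{x})^2}{(n-1)s_x^2}}

% Confidence interval estimator of the expected value of y at x = x_g
\hat{y} \pm t_{\alpha/2,\,n-2}\; s_\varepsilon
  \sqrt{\frac{1}{n} + \frac{(x_g - \bar{x})^2}{(n-1)s_x^2}}
```

The only difference is the leading 1 under the square root, which is why the prediction interval is always the wider of the two.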


Class Exercise

In television’s early years, most commercials were 60 seconds long. Now, however, commercials can be any length. The objective of commercials remains the same: to have as many viewers as possible remember the product in a favorable way. A total of 60 participants were shown advertisements of varying length and each was given a test score based on what they would remember. Using the data set (Keller, 16.06):

a) Determine the least squares line of test scores on the length of the advertisement.

b) Interpret the coefficients and their significance. Comment on the overall fit of the model.

c) Predict with 95% confidence the memory test score of a viewer who watches a 36 second commercial.

d) Estimate with 95% confidence the mean memory test score of people who watch 36 second commercials.

Multiple Regression

Recall the assumptions of a classical linear regression model.

Problem? We only measured the effect of ONE variable on the model.

All the other factors were omitted and included in the error term (ε). This can cause confounding and omitted variable bias. Bias occurs when:

Omitted variable is correlated with the explanatory (independent) variable, and

Omitted variable is a determinant of the dependent variable

This violates the zero conditional mean (ZCM) assumption and therefore, OLS estimates are no longer unbiased.

Our new population regression model is:

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon$

Interpretation of β1

Measures the effect of a change in X1 holding X2, X3, ... , Xk constant

Also known as the partial effect of X1 holding all other explanatory variables constant

What happens if the variables X2, X3, ... , Xk are omitted and these variables are correlated with X1?

Omitted variables will appear in the disturbance/error term

ZCM assumption will be violated (error term now correlated with independent variable)

Produces a biased estimator of β1 (will also include the effect of other variables on Y)
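Omitted variable bias can be illustrated with a small simulation. Everything here is made up for the sketch: the true model is y = 1 + 2·x1 + 3·x2 + error, with x2 built to be correlated with x1. Regressing y on x1 alone then gives a slope that also picks up part of the effect of x2.

```python
import random

def ols_slope(x, y):
    # Simple regression slope b1 = s_xy / s_x^2 (the n - 1 factors cancel).
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    return s_xy / s_xx

def short_regression_slope(n=5000, seed=0):
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    # x2 is correlated with x1: x2 = 0.5 * x1 + noise
    x2 = [0.5 * a + rng.gauss(0, 1) for a in x1]
    # Hypothetical true model: beta1 = 2, beta2 = 3
    y = [1.0 + 2.0 * a + 3.0 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
    # Regress y on x1 only, leaving x2 in the error term.
    return ols_slope(x1, y)

slope = short_regression_slope()
print(slope)  # should land near 2 + 3 * 0.5 = 3.5, not near the true beta1 = 2
```

The estimate settles around β1 plus β2 times the slope of x2 on x1, so the bias disappears only if x2 is uncorrelated with x1 or has no effect on y.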

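Also, for the class exercise above (advertisement length and test scores), use the following summary statistics:

$\bar{x} = 38$, $s_x^2 = 193.90$

$\bar{y} = 13.80$, $s_y^2 = 47.96$

$s_{xy} = 57.86$

n = 60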
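One possible way to work the advertisement exercise from these summary statistics is sketched below in Python. The critical value is an assumption read from a t-table (approximately t for 0.025 in the upper tail with 58 degrees of freedom), and the function names are hypothetical. The intervals use the standard formulas with and without the extra 1 under the square root.

```python
import math

# Summary statistics quoted in the notes for the advertisement exercise.
n = 60
x_bar, s_x2 = 38.0, 193.90
y_bar, s_y2 = 13.80, 47.96
s_xy = 57.86

# Assumption: approximate table value of t_{0.025, 58}.
T_CRIT = 2.002

def fit_from_summary():
    b1 = s_xy / s_x2                                # slope
    b0 = y_bar - b1 * x_bar                         # intercept
    sse = (n - 1) * (s_y2 - s_xy ** 2 / s_x2)       # short-cut SSE
    s_eps = math.sqrt(sse / (n - 2))                # standard error of estimate
    return b0, b1, s_eps

def intervals(x_g):
    """Point prediction, prediction interval and mean interval at x = x_g."""
    b0, b1, s_eps = fit_from_summary()
    y_hat = b0 + b1 * x_g
    core = 1 / n + (x_g - x_bar) ** 2 / ((n - 1) * s_x2)
    pred_half = T_CRIT * s_eps * math.sqrt(1 + core)  # individual value
    mean_half = T_CRIT * s_eps * math.sqrt(core)      # expected value
    return (y_hat,
            (y_hat - pred_half, y_hat + pred_half),
            (y_hat - mean_half, y_hat + mean_half))

b0, b1, s_eps = fit_from_summary()
y_hat, prediction_interval, mean_interval = intervals(36)
print(b0, b1, s_eps)
print(y_hat, prediction_interval, mean_interval)
```

The point prediction at 36 seconds comes out at roughly 13.2, and the interval for an individual viewer is much wider than the interval for the mean score.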


Essentially, the process of multiple regression remains the same as linear regression

Minimising $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ gives us $\hat{y}_i = b_0 + b_1 x_{1i} + \dots + b_k x_{ki}$

Additional Assumption (to the 7 you already have)

No perfect (multi)collinearity, i.e. no exact linear relationships between the explanatory variables. This is particularly important for dummy variables.

Assessing the model

Standard Error of Estimate

$s_\varepsilon = \sqrt{\frac{SSE}{n-k-1}}$ where k is the number of explanatory variables

Hypothesis Testing

$t = \frac{b_i - \beta_i}{s_{b_i}}$ where v = n-k-1

Coefficient of Determination – Adjusted R2

Excel prints off an additional R2 statistic which is called the coefficient of determination adjusted for degrees of freedom.

This is because adding an extra explanatory variable never lowers R2, even if that variable explains very little.

$\text{Adjusted } R^2 = 1 - \frac{SSE/(n-k-1)}{\sum (y_i - \bar{y})^2/(n-1)}$

o If n is much larger than k, unadjusted and adjusted R2 will be similar

o If SSE is quite different from 0 and k is large relative to n, the values of unadjusted and adjusted R2 will differ substantially
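A short Python sketch of the adjusted R2 calculation. The SSE, total variation, n and k values are invented for illustration, and the function names are assumptions.

```python
def r_squared(sse, total_variation):
    # Unadjusted R^2 = 1 - SSE / sum of (y_i - y_bar)^2
    return 1 - sse / total_variation

def adjusted_r_squared(sse, total_variation, n, k):
    # Adjusted R^2 = 1 - [SSE / (n - k - 1)] / [total variation / (n - 1)]
    return 1 - (sse / (n - k - 1)) / (total_variation / (n - 1))

# Hypothetical values: 25 observations, 3 explanatory variables.
SSE, TOTAL, N, K = 40.0, 100.0, 25, 3
print(r_squared(SSE, TOTAL), adjusted_r_squared(SSE, TOTAL, N, K))

# With many more observations the two measures move closer together.
print(adjusted_r_squared(SSE, TOTAL, 2500, K))
```

Because the adjustment penalises each extra explanatory variable, the adjusted value can fall when a weak variable is added even though the unadjusted value cannot.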

Class Exercise: (Adapted from QMB Final Exam S2 2007)

The Human Rights and Equal Opportunity Commission has asked you for further analysis on gender discrimination in the law firms (see Question 2), including an examination of the difference in income. In order to provide some evidence on whether differences in the incomes of male and female lawyers are due to discrimination or some other factors, a regression model is constructed based on human capital theory. The model is specified as follows:

HRINCOMEi = β0 + β1EXPi + β2FIRMSIZE2i + β3FIRMSIZE3i + β4FIRMSIZE4i + β5PARTNERi + β6FEMALEi + Ui

HRINCOME = Hourly income from legal practice in dollars (i.e. total income divided by hours worked)

EXP = Experience measured as the number of years working as a lawyer

FIRMSIZE1 = Dummy variable equal to 1 if firm has between 1 - 10 lawyers inclusive, and 0 otherwise

FIRMSIZE2 = Dummy variable equal to 1 if firm has between 11 - 50 lawyers inclusive, and 0 otherwise

FIRMSIZE3 = Dummy variable equal to 1 if firm has between 51 - 200 lawyers inclusive, and 0 otherwise

FIRMSIZE4 = Dummy variable equal to 1 if firm has equal to or greater than 201 lawyers, and 0 otherwise

PARTNER = Dummy variable equal to 1 if lawyer is a partner, and 0 otherwise

FEMALE = Dummy variable equal to 1 if lawyer is female, and 0 otherwise.


The regression was estimated by Ordinary Least Squares and a portion of the EXCEL output is reproduced below in Table 3:

a) The sample mean of HRINCOME for males is $59 and for females is $34. Why is this difference not necessarily evidence of gender discrimination? [2 marks]

b) Use the regression output to conclude whether there is evidence of gender discrimination in hourly incomes. Justify your answer. [3 marks]

c) Interpret the estimate for the EXP variable in terms of both economic and statistical significance. Is it consistent with your expectations? Discuss. [3 marks]

d) Test the null hypothesis that β5 is equal to zero against the alternative that it is greater than zero. Use a 1% significance level. [1 mark]

e) What are the "Standard Error" and "R Square" statistics reported amongst the "Regression Statistics" in the EXCEL output? Interpret the R Square result for this regression model. [3 marks]

f) Calculate the predicted hourly income for a male lawyer with 10 years experience who works in a firm with 20 lawyers but who is not a partner. [1 mark]

Distributions thus far:

o Binomial Distribution (Week 6)

o Uniform Distribution (Week 6)

o Normal Distribution (Week 7)

o Distribution of the Sample Mean (Week 8)

o T-Distribution (Week 10)

o Distribution of the Sample Proportion (Week 10)

Next week....(Our last week!)

Chi-Squared Distribution

Revision on whatever we decided today...Confidence Interval, Hypothesis Testing?


Sheldon & Juliet Wednesday 2-3pm Red Centre M010

BES PASS Week 13 2010

1. CHI-SQUARED DISTRIBUTION

WARM-UP QUESTION

1. What does the chi-squared distribution look like? 2. What is the effect of increasing v (the degrees of freedom)?

IMPORTANT STATS FOR THE CHI-SQUARED DISTRIBUTION

Chi-squared random variable: χ2

Mean: E (χ2) = v

Variance: V (χ2) = 2v

HOW TO DETERMINE CHI-SQUARED VALUES

χ2 > 0 (always)

$\chi^2_{A,v}$ = the value of χ2 with area A to its right

$\chi^2_{1-A,v}$ = the value of χ2 with area A to its left

use the table of values at the back of your yellow booklet

2. INFERENCES ABOUT A POPULATION VARIANCE

There are 2 ways of drawing inferences about a population variance:

1) Confidence Interval Estimator of σ2
2) Run a hypothesis test on σ2

1) Confidence Interval Estimator of σ2

CONFIDENCE INTERVAL ESTIMATOR OF σ2

lower confidence limit (LCL) $= \frac{(n-1)s^2}{\chi^2_{\alpha/2}}$

upper confidence limit (UCL) $= \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$
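The confidence interval estimator can be sketched in Python using the numbers from practice question 2 in these notes (n = 15, s2 = 12, 90% confidence). The two chi-squared values are assumptions read from a standard table for n - 1 = 14 degrees of freedom, and the function name is hypothetical.

```python
def variance_confidence_interval(n, s2, chi2_upper_tail, chi2_lower_tail):
    """chi2_upper_tail is chi^2_{alpha/2}; chi2_lower_tail is chi^2_{1-alpha/2}."""
    lcl = (n - 1) * s2 / chi2_upper_tail   # larger divisor gives the lower limit
    ucl = (n - 1) * s2 / chi2_lower_tail   # smaller divisor gives the upper limit
    return lcl, ucl

# Assumed table values for 14 degrees of freedom.
CHI2_05_14 = 23.6848   # area 0.05 to the right
CHI2_95_14 = 6.57063   # area 0.95 to the right

lcl, ucl = variance_confidence_interval(15, 12, CHI2_05_14, CHI2_95_14)
print(lcl, ucl)
```

The result is close to the answer quoted for that question, about (7.09, 25.57).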

Topics to be covered:
1. Chi-squared distribution
2. Inferences about population variance
3. Chi-squared Goodness-of-Fit Test
4. Chi-squared Test of a Contingency Table


2) Running a hypothesis test on σ2

PRACTICE QUESTIONS

1. The sample variance of a random sample of 50 observations from a normal population was found to be s2 = 80. Can we infer at the 1% significance level that σ2 is less than 100? (No)

2. Estimate σ2 with 90% confidence given that n=15 and s2=12. (7.0932, 25.5684)

3. CHI-SQUARED GOODNESS OF FIT TEST

The purpose of a Chi-squared goodness of fit test is to examine whether observed & expected frequencies are the same in a multinomial experiment. But first, let’s check out some stats for multinomial experiments.

PROPERTIES OF MULTINOMIAL EXPERIMENTS

Fixed number of trials (n)

Outcome of each trial falls into one of k categories (cells)

p1 + p2 + p3 + ... + pk = 1

Each trial is independent.

Steps for running a hypothesis test on σ2 (Section 2, part 2)

Step 1: Define Hypothesis Test

H0: σ2 = 1

H1: σ2 ≠, >, < 1

Step 2: Establish Rejection Region (v = n − 1)

If H1: σ2 > 1, RR: $\chi^2 > \chi^2_{\alpha,v}$

If H1: σ2 < 1, RR: $\chi^2 < \chi^2_{1-\alpha,v}$

If H1: σ2 ≠ 1, RR: $\chi^2 > \chi^2_{\alpha/2,v}$ or $\chi^2 < \chi^2_{1-\alpha/2,v}$

Step 3: State Decision Rule

If our test statistic falls within the rejection region, there is sufficient evidence to suggest that the population variance ≠ 1

Step 4: Calculate Test Statistic

$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$, where σ2 is the value specified in H0

Step 5: Conclusion

Do we have enough evidence to reject H0, that the population variance = 1?
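A rough Python sketch of the test on σ2, using practice question 1 from these notes (n = 50, s2 = 80, H1: σ2 < 100 at the 1% level). The critical value for 49 degrees of freedom is an assumed approximation, since many printed tables skip that row; check it against your own table or software. The function name is hypothetical.

```python
def variance_test(n, s2, sigma2_null, chi2_crit, alternative):
    """One-tailed chi-squared test on a variance; returns (statistic, reject H0?)."""
    stat = (n - 1) * s2 / sigma2_null
    if alternative == "less":
        reject = stat < chi2_crit      # lower-tail rejection region
    elif alternative == "greater":
        reject = stat > chi2_crit      # upper-tail rejection region
    else:
        raise ValueError("alternative must be 'less' or 'greater'")
    return stat, reject

# Assumption: chi^2_{0.99, 49} is approximately 28.94.
CHI2_99_49 = 28.94

stat, reject = variance_test(50, 80, 100, CHI2_99_49, "less")
print(stat, reject)
```

The statistic is 39.2, which is above the lower-tail cutoff, so H0 is not rejected, in line with the "(No)" answer given for that question.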


FREQUENCY

Frequency = the number of outcomes falling into each of the k cells/categories.

It is notated by f1, f2, f3, ..., fk (where fi = the observed frequency of outcomes falling into cell i)

f1 + f2 + f3 + ... + fk = n

How to run a Chi-squared Goodness-of-Fit test – another flow-chart by Shel

PRACTICE QUESTION

3. We would like to make inferences about the market shares of Dell, HP, Apple, and the rest at the 5% significance level. In a random sample of 200 computers, we find that 48 are Dell, 42 are HP, 12 are Apple and 98 are the rest. Test the hypothesis that:

H0: p1=0.2, p2=0.2, p3=0.1, p4=0.5

H1: At least one pi is not equal to its specified value (Answer: Don’t reject H0 at 5% significance level)

Step 1: Check the Rule of Five

For each cell, ei ≥ 5, where ei = npi

Step 2: Define hypothesis test

H0: p1 = ..., p2 = ..., p3 = ..., etc.

H1: at least one of the pi ≠ its specified value

Step 3: Critical Value, Rejection Region

Rejection region: $\chi^2 > \chi^2_{\alpha,k-1}$

Step 4: Decision Rule

If our test statistic falls within the rejection region, there is sufficient evidence to suggest that at least one observed frequency differs from its expected value.

Step 5: Calculate Test Statistic

$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$

Step 6: Conclusion

Do we have enough evidence to reject H0 and conclude that at least one of the pi ≠ its specified value?
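The goodness-of-fit steps can be sketched in Python with the market share practice question from these notes. The function name is an assumption, and the critical value is the table value for area 0.05 to the right with k - 1 = 3 degrees of freedom.

```python
def goodness_of_fit(observed, probabilities):
    """Returns (chi-squared statistic, expected frequencies)."""
    n = sum(observed)
    expected = [n * p for p in probabilities]      # e_i = n * p_i
    if min(expected) < 5:
        raise ValueError("rule of five violated: some e_i < 5")
    stat = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    return stat, expected

# Dell, HP, Apple, the rest (sample of 200 computers).
observed = [48, 42, 12, 98]
h0_probabilities = [0.2, 0.2, 0.1, 0.5]

CHI2_05_3 = 7.8147  # table value: area 0.05 to the right, 3 degrees of freedom

stat, expected = goodness_of_fit(observed, h0_probabilities)
reject = stat > CHI2_05_3
print(stat, expected, reject)
```

The statistic works out to 4.94, below the critical value, so H0 is not rejected at the 5% level.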


4. CHI-SQUARED TEST OF A CONTINGENCY TABLE

The purpose of running a Chi-squared test of a contingency table is to determine whether there’s enough evidence to infer that

a) 2 nominal variables are related

b) differences exist between 2 or more populations of nominal variables

How to run a Chi-squared test of a Contingency Table – the final flow-chart Shel will ever make for you...

PRACTICE QUESTION

4. Test the hypothesis that income and education are independent at the 1% significance level.

Education/Income   < $50k   $50k - $100k   > $100k   TOTAL
Secondary            40          30            12       82
Tertiary             30          40            20       90
Doctorate             1          12            15       28
TOTAL                71          82            47      200

(Answer: reject H0, i.e. the variables are not independent.)

Step 1: Define hypothesis test

H0: variables are independent

H1: variables are dependent.

Step 2: Rejection region, critical value

Rejection region: $\chi^2 > \chi^2_{\alpha,v}$

where v = ( r – 1 ) ( c – 1 )

Step 3: Decision rule

If our test statistic falls within the rejection region, there is sufficient evidence to suggest the variables are dependent.

Step 4: Calculate test statistic

$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$, summing over all k cells of the table

where

$e_{ij} = \frac{(\text{total of row } i) \times (\text{total of column } j)}{\text{sample size}}$

Step 5: Conclusion

Do we have enough evidence to reject H0 that the variables are independent?
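A minimal Python sketch of the contingency table test, applied to the education and income practice question from these notes. The function name is an assumption, and the critical value is the table value for area 0.01 to the right with (3 - 1)(3 - 1) = 4 degrees of freedom.

```python
def contingency_chi2(table):
    """Returns (chi-squared statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, f in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected frequency e_ij
            stat += (f - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)     # v = (r - 1)(c - 1)
    return stat, df

# Rows: Secondary, Tertiary, Doctorate. Columns: < $50k, $50k - $100k, > $100k.
table = [
    [40, 30, 12],
    [30, 40, 20],
    [1, 12, 15],
]

CHI2_01_4 = 13.2767  # table value: area 0.01 to the right, 4 degrees of freedom

stat, df = contingency_chi2(table)
reject = stat > CHI2_01_4
print(stat, df, reject)
```

The statistic is roughly 26.5, well inside the rejection region, which matches the quoted answer that the variables are not independent.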
