Sheldon and Juliet Wednesday 2-3pm Red Centre M010
QMB PASS Week 3 2010
1. INTRO Q: Can you explain the difference between the following pairs of terms?
a) descriptive statistics vs. inferential statistics? b) population vs. sample? c) parameter vs. statistic? d) discrete vs. continuous random variable?
2. DESCRIPTIVE TECHNIQUES
a) Tabular
A variable is some characteristic of a population or sample.
o Values are the possible observations of the variable.
o Data are the actual observed values of a variable.
We can classify data into 3 different categories:
o i) interval/quantitative/numerical
o ii) nominal/qualitative/categorical
o iii) ordinal
For nominal data, we can draw up a table to describe the data with a column each for:
o a) class (one of a set of mutually exclusive categories into which the data are grouped)
o b) frequency (the number of observations that fall into each class)
o c) relative frequency (the number of observations in a class as a percentage of the total number of observations)
b) Graphical
If we were to describe nominal data, then we would use either a bar chart (for frequencies) or a pie chart (for relative frequencies).
But where we have interval data, we can consider using a histogram. The steps are:
o 1. Collect interval data.
o 2. Create classes / class limits for the data, for which you could consider using Sturges' Formula or the Class Width formula.
o 3. Plot the data on a graph with the classes on the X axis and frequency on the Y axis.
It is important to be able to describe shapes of histograms ~ a skill often tested in tutorials and final exams. Let's look at that now.
Describing the shapes of histograms
a) Symmetry
o Your data may be symmetrical or non-symmetrical. Use common sense.
Topics to be covered: 1. Introduction to Statistics 2. Descriptive techniques – Tabular, Graphical, Numerical 3. Introduction to Linear Regression
b) Skewness
o Positive skew
   o Tail to the right
   o Mode < Median < Mean
o Negative skew
   o Tail to the left
   o Mean < Median < Mode
c) Modal classes
o The modal class is the class with the largest number of observations.
o The 3 descriptions which could come in handy in describing your histogram are:
   i) unimodal histogram = a histogram with one peak.
   ii) bimodal histogram = a histogram with two peaks.
   iii) bell-shaped histogram = a symmetric unimodal histogram.
Apart from the histogram, you should also be familiar with stem and leaf displays and ogives.
c) A separate note on bivariate relations
Bivariate relations are an extension of univariate analyses to characterise relationships between variables. You could represent them graphically using a scatter plot (or, for data collected over time, a time series plot), or in a table using a contingency table.
d) Numerical
We can measure interval data in terms of central location:
o i) the mean / arithmetic mean = the sum of the observations divided by the number of observations.
o ii) the median = the middle observation after the data have been ordered.
o iii) the mode = the observation that occurs with the greatest frequency.
Question time...
1. Oh no! How do we find the median if we have an even number of observations?
2. What are the advantages and disadvantages of each method of measuring central location?
We can also measure interval data with respect to variability – the spread of the data:
o i) range = largest – smallest observation
o ii) variance:
   population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
   sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
o iii) standard deviation – simply take the square root of the variance. Also recall the Empirical Rule and Chebysheff's Theorem when required to interpret the standard deviation of your data.
o iv) coefficient of variation = standard deviation / mean
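The measures of central location and variability above can be checked with a few lines of Python. This is only a minimal sketch using the standard library; the data are the monthly pay figures from Question 2 of the Question Bank, and the variable names are our own choice rather than anything from the lecture notes.

```python
import statistics

# Monthly pay figures from Question Bank Q2, treated as a sample
pay = [23000, 36500, 47200, 20200, 61300]

mean = statistics.mean(pay)             # arithmetic mean
median = statistics.median(pay)         # middle value after ordering
sample_var = statistics.variance(pay)   # sample variance, divides by n - 1
pop_var = statistics.pvariance(pay)     # population variance, divides by N
sample_sd = statistics.stdev(pay)       # square root of the sample variance
cv = sample_sd / mean                   # coefficient of variation
data_range = max(pay) - min(pay)        # largest - smallest observation

print(mean, median, sample_var, round(sample_sd, 2), round(cv, 4), data_range)
```

Note the two variance functions: `variance` uses the n – 1 divisor (sample) while `pvariance` uses N (population), so they give different numbers for the same data.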
We can also measure with respect to relative standing.
o i) percentiles – the Pth percentile is the value for which P% of the observations are less than that value and (100 – P)% are greater than that value.
   location of a percentile, L_P: $L_P = (n + 1) \frac{P}{100}$
o ii) extending percentile theory, we can measure the interquartile range:
   interquartile range = 75th – 25th percentile = upper – lower quartile
   this measures the spread of the middle 50% of observations
o At this point, make sure you've checked out what box plots and outliers are.
3. INTRODUCTION TO LINEAR REGRESSION
Recall from your lecture, or from earlier in this PASS class, the idea of bivariate relations ~ in particular, think scatter plots and the types of relationships between the variables we spot when we look at the plot. If we want to determine the intercept and slope of a straight-line relationship between the X and Y variables, we need the values that give us the line of best fit.
To do so, we minimise the residual sum of squares using the least squares method.
Much more on this and linear regression generally in the second half of the course, but for the moment, take note of these equations:
1. assume the regression equation: $\hat{y} = b_0 + b_1 x$
2. then: $b_1 = \frac{s_{xy}}{s_x^2}$ and $b_0 = \bar{y} - b_1 \bar{x}$
where $s_{xy}$ is the sample covariance of X and Y and $s_x^2$ is the sample variance of X.
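As a rough illustration of the least squares method, the sketch below computes the slope as the sample covariance of X and Y divided by the sample variance of X, and the intercept as the mean of Y minus the slope times the mean of X. The function name and the data are hypothetical, made up purely so the answer is easy to check by eye.

```python
def least_squares(x, y):
    """Return (b0, b1) for the fitted line y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sample covariance of x and y, and sample variance of x (both divide by n - 1)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    s_xx = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    b1 = s_xy / s_xx          # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1

# Hypothetical data that lie exactly on the line y = 1 + 2x
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
b0, b1 = least_squares(x, y)
print(b0, b1)
```

With real data the points will not sit exactly on a line, and the fitted line is the one that makes the sum of squared residuals as small as possible.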
QUESTION BANK
1. Using these numbers: 2 3 3 6 8 9 14 16 17 20, find the:
a. mean
b. median (8.5)
c. mode
d. lower quartile (3)
e. upper quartile (16.25)
f. interquartile range
2. You're an investment banker and work 22.5 hours a day. Your monthly pay in recent months has looked like this cuz you're a money machine: $23,000 $36,500 $47,200 $20,200 $61,300
a. What's the sample mean? ($37,640)
b. How about sample variance? (292,743,000 dollars squared)
c. And sample standard deviation? ($17,109.73)
3. A set of test scores has a mean of 890 and standard deviation of 120. What's the coefficient of variation?
4. Check out these test scores: 88 76 67 90 98 68 75 86 82 90. Calculate:
a. sample mean
b. sample standard deviation
c. coefficient of variation
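The quartile answers to Question 1 follow from the percentile location formula, location = (n + 1) × P / 100. Here is a minimal sketch of that calculation. The function name is ours, and the linear interpolation between neighbouring observations when the location is fractional is an assumption on our part, although it does reproduce the answers quoted above.

```python
def percentile(data, p):
    """Value at the p-th percentile using the location (n + 1) * p / 100."""
    xs = sorted(data)
    n = len(xs)
    loc = (n + 1) * p / 100      # 1-based position in the ordered data
    lower = int(loc)             # whole part of the position
    frac = loc - lower           # fractional part of the position
    if lower < 1:
        return xs[0]
    if lower >= n:
        return xs[-1]
    # Interpolate between the two neighbouring ordered observations
    return xs[lower - 1] + frac * (xs[lower] - xs[lower - 1])

scores = [2, 3, 3, 6, 8, 9, 14, 16, 17, 20]   # Question Bank Q1
q1 = percentile(scores, 25)
med = percentile(scores, 50)
q3 = percentile(scores, 75)
print(q1, med, q3, q3 - q1)
```

Software packages use several slightly different percentile conventions, so a spreadsheet or statistics library may give a different quartile for the same data.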
BES PASS Week 4
Aims:
To learn about Data Collection and Random Sampling
To understand Joint, Marginal and Conditional Probability
To learn the probability rules and apply them to sampling with/without replacement
1. Methods of Data Collection
Remembering that "data" are just the observed values of a variable, and that a variable is just something that is of interest to us, we can use the following methods of data collection to observe these variables.
Direct Observation – measures the actual behaviour or outcomes
o E.g. Asking people whether they've bought a product because of an advertisement
Experimental Data – imposes a treatment and measures the resulting behaviour or outcomes
o E.g. Asking people to try aspirin and seeing whether they suffer fewer heart attacks or not
Surveys
Self administered Surveys – Surveys sent to people who then mail back with their responses
Personal Interviews
Telephone Interviews
Q. What do you think are the pros and cons for each of the methods of data collection? Hint: Think of the costs, response rate, purpose and biases that may arise
2. Random Sampling
The primary incentive for examining a sample rather than a population is cost. Compiling statistics is usually expensive: imagine conducting an experiment on 10,000 people, asking them to take an aspirin every day for 3 weeks and then bringing them all back for testing!
Main Concept: We can make inferences about the target population from a sample, provided the sample statistic comes quite close to the parameter it is designed to estimate.
There are 3 different types of sampling plans:
Simple Random Sample: A sample selected in such a way that every possible sample with the same number of observations is equally likely to be chosen
o E.g. Drawing ticket stubs in a raffle to determine the winner
Stratified Random Sample: Separating the population into ‘strata’ and then drawing simple random samples from each stratum
Cluster sampling: is a simple random sample of groups or clusters of elements
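The three sampling plans can be sketched in a few lines of Python. Everything here is hypothetical: the population of 100 numbered students, the two faculties used as strata, and the clusters of 10 are invented only to show how the selection differs between plans.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100 student IDs, each tagged with a faculty (stratum)
population = [(i, "Business" if i < 60 else "Science") for i in range(100)]

# Simple random sample: every possible sample of size 10 is equally likely
srs = random.sample(population, 10)

# Stratified random sample: a simple random sample of 5 from each stratum
strata = {}
for unit in population:
    strata.setdefault(unit[1], []).append(unit)
stratified = [u for group in strata.values() for u in random.sample(group, 5)]

# Cluster sample: split into 10 clusters of 10, then pick 2 whole clusters at random
clusters = [population[i:i + 10] for i in range(0, 100, 10)]
cluster_sample = [u for c in random.sample(clusters, 2) for u in c]

print(len(srs), len(stratified), len(cluster_sample))
```

The stratified sample is guaranteed to contain both faculties, whereas the simple random sample and the cluster sample are not.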
From these samples of observations, two main types of error arise:
1. Sampling Error – the difference between the sample and the population that exists only because of the observations that happened to be selected for the sample
2. Non-sampling Error – more serious than sampling error; due to mistakes made in the acquisition of data or to sample observations being selected improperly
Q. Discuss with the person next to you, examples of non-sampling errors.
3. Probability Questions to think about...
a) Independence v Mutually Exclusive b) Joint v Marginal Probability c) Intersection v Union
Conditional Probability is the probability of an event A occurring given that another event B has occurred. It is represented by P(A | B), which is read as "Given that B has occurred, what is the probability of A occurring?" Expanding this we get...
P(A | B) = P(A and B) / P(B)
One of the reasons we compute conditional probability is to find out whether two events are related, i.e. we want to know whether they are independent events. If they are independent, the probability of one event is not affected by the occurrence of the other event:
P(A | B) = P(A) and P(B | A) = P(B)
4. Other Rules
The Multiplication Rule: is used to calculate the joint probability of two events. It comes from the conditional probability formula, multiplying both sides by P(B), i.e.
P(A and B) = P(B).P(A|B)
For independent events,
P(A and B) = P(A).P(B) since P(A|B) = P(A)
The Complement Rule: The complement of event A (denoted A^C) is the event that occurs when event A does NOT occur, i.e.
P(A^C) = 1 – P(A)
The Addition Rule: allows us to calculate the probability of the union of two events. The probability that event A, OR event B, OR both occur is:
P(A or B) = P(A) + P(B) – P(A and B)
For mutually exclusive events,
P(A or B) = P(A) + P(B)
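To see the rules working together, here is a small sketch using exact fractions. The experiment (one roll of a fair die) and the events A and B are hypothetical choices of ours, not taken from the lecture notes.

```python
from fractions import Fraction

# Hypothetical experiment: one roll of a fair six-sided die
outcomes = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}      # event A: an even number
B = {4, 5, 6}      # event B: a number greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

p_a_and_b = prob(A & B)                     # joint probability (intersection)
p_a_given_b = p_a_and_b / prob(B)           # conditional probability P(A | B)
p_a_or_b = prob(A) + prob(B) - p_a_and_b    # addition rule
p_not_a = 1 - prob(A)                       # complement rule
independent = (p_a_given_b == prob(A))      # independence check

print(p_a_and_b, p_a_given_b, p_a_or_b, p_not_a, independent)
```

Here P(A | B) differs from P(A), so these two events are not independent, and the addition rule gives the same answer as counting the outcomes in the union directly.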
5. Sampling with or without replacement
If we were to sample from a finite (limited-size) population, we could:
a) Select without replacement: each time you select an observation you remove it from the pile. The outcome of each selection will depend on the outcomes of previous selections, because the size of the population is getting smaller each time.
b) Select with replacement: each time you select an observation you put it back into the pile. Effectively this means the population size stays the same and the outcomes of each selection will be independent of one another.
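The difference shows up as soon as we multiply probabilities. The sketch below uses a hypothetical example of our own (drawing 2 cards from a standard 52-card deck and asking for two aces) and exact fractions.

```python
from fractions import Fraction

# Hypothetical example: drawing 2 cards from a 52-card deck containing 4 aces
aces, deck = 4, 52

# Without replacement: the second draw depends on the first,
# so P(A and B) = P(A).P(B|A)
p_two_aces_without = Fraction(aces, deck) * Fraction(aces - 1, deck - 1)

# With replacement: the draws are independent,
# so P(A and B) = P(A).P(B)
p_two_aces_with = Fraction(aces, deck) * Fraction(aces, deck)

print(p_two_aces_without)  # 1/221
print(p_two_aces_with)     # 1/169
```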
GROUP EXERCISE 1

                    Male    Female    Row Total
High Distinction     75       61        136
Pass                215      155        370
Column Total        290      216        506

Using the table, find:
1. P(Female)
2. P(High Dist)
3. P(Female ∪ High Dist)
4. P(Pass)
5. P(Pass | Male)
6. Which ones of the above are Marginal Probabilities?
7. Which ones are Joint Probabilities?

GROUP EXERCISE 2
Probability Trees
Probability trees are a very neat and fast way of working out many probability problems.
Example: (QMB Final '99s2): "An advertising executive is studying the television viewing habits of married men and women during prime-time hours. The executive has determined that during prime-time, husbands are watching television 60% of the time. It has also been determined that when the husband is watching television, 40% of the time the wife is also watching. When the husband is not watching television, 30% of the time the wife is watching television."
i. Find the probability that the wife is watching television. (0.36)
ii. Find the probability that, if the wife is watching television, the husband is also watching television. (0.6667)
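The probability tree in Group Exercise 2 can be traced step by step in code. This is only a sketch of the working, with variable names of our own choosing: multiply along each branch to get joint probabilities, add the branches that end in "wife watching", then apply the conditional probability formula.

```python
# Probabilities given in the QMB Final '99s2 question
p_h = 0.60               # P(husband watching)
p_w_given_h = 0.40       # P(wife watching | husband watching)
p_w_given_not_h = 0.30   # P(wife watching | husband not watching)

# Multiply along each branch of the tree to get joint probabilities
p_h_and_w = p_h * p_w_given_h
p_not_h_and_w = (1 - p_h) * p_w_given_not_h

# i. Add the branches that end in "wife watching"
p_w = p_h_and_w + p_not_h_and_w

# ii. Conditional probability: P(husband | wife) = P(husband and wife) / P(wife)
p_h_given_w = p_h_and_w / p_w

print(round(p_w, 4), round(p_h_given_w, 4))
```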
Useful Practice Questions
1. Consider the following data set:
2 3 3 4 4 5 5 5
5 6 6 7 7 8 9 17
a. Find the mean and the mode of this data set (2 marks)
b. Find the median and the third quartile of this data set (2 marks)
2. Suppose A and B are mutually exclusive events. If P(A) = 0.4 and P(B) = 0.2, then P(A | B) = ?
3. Two teams, A and B, are of equal ability, so each has a probability of 0.5 of defeating the other. Assume that the outcome of any game is independent of the outcome of any other game. What is the probability that team A wins 4 games in a row?
4. Approximately 30% of the sales representatives hired by a firm quit in less than 1 year. Suppose that two sales representatives are hired and assume that the first sales representative's behaviour is independent of the second sales representative's behaviour.
a. What is the probability that both quit within the year?
b. Find the probability that exactly one representative quits.
5. A group of individuals concerned about environmental problems claims that 30% of the adults in a certain town have been adversely affected by a new nuclear power plant that pollutes the air and causes lung damage. To test their claim, you randomly select 4 adults of the town.
a. If the environmental group is correct, what is the probability that all 4 people have been adversely affected?
b. What is the probability that at least one of the 4 individuals has been adversely affected?
Answers
1. a) mean = 6, mode = 5
   b) median = 5, third quartile = 7 (50% of the data points lie at or below 5 and 75% at or below 7)
2. 0
3. 0.0625
4. a) 0.09
   b) 0.42
5. a) 0.0081
   b) 0.7599
QMB PASS Week 5 2010
1. RANDOM VARIABLES & PROBABILITY DISTRIBUTIONS
Here are a few questions as a warm up.
1. What is a random variable?
2. Can you recall from Week 3 PASS the difference between a discrete and continuous random variable?
If we are happy with this, we now approach the concept of a probability distribution, which is a table, formula, or graph that describes the values of a random variable and the probability associated with those values. For discrete probability distributions, there are 2 requirements:
1. 0 ≤ P(x) ≤ 1, for all x.
2. ∑ P(x) = 1.
Let's think about the methods/techniques we can use to describe the population/probability distribution. From memory, or using your lecture notes, fill out the following table with assistance from group members around you:
ANALYSING PROBABILITY DISTRIBUTIONS
Term Definition Formula
Population Mean aka. Expected Value of X
Population Variance
(Full)
(Shortcut)
Population Standard Deviation
We also come across a new concept of the laws of expected value & variance. These are:
a) Expected Value
1. E(C) = C
2. E(X+C) = E(X) + C
3. E(CX) = C.E(X)
Topics to be covered: 1. Random variables & Probability Distributions 2. Bivariate Distributions 3. Applications in Finance: Portfolio Diversification & Asset Allocation
b) Variance
1. V(C) = 0
2. V(X+C) = V(X)
3. V(CX) = C².V(X)
Now, try these questions.
Q1. Sheldon has trouble sleeping at night because sometimes there is this one girl who calls him up at like 3am in the morning for no reason. It means he’s in a bad mood the next day. It happens so much he could actually create a probability distribution for it:
Number of times she calls Sheldon Probability she will call Sheldon
1 .05
2 .12
3 .20
4 .30
5 .15
6 .10
7 .08
Help Sheldon compute the mean and variance of the number of times the annoying girl calls him. (Mean = 4, variance = 2.40)
Q2. Continuing on, this girl is crazy. Every time she walks past a Louis Vuitton store, she has this burning temptation to buy a LV handbag. She used to buy, like, 2 or 3 at a time, but now that Sheldon dumped her, she’s more reluctant to buy one these days. This is the probability distribution for the number of LV handbags she buys each time she goes out:
Number of LV handbags she wants to buy
Probability she buys that number of handbags
0 .10
1 .25
2 .40
3 .20
4 .05
How many LV handbags should we expect her to buy on Thursday night? (1.85)
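The answers to Q1 and Q2 can be checked with a short sketch. The function name is ours; it computes the mean as the sum of x.P(x) and the variance with the shortcut "expected value of X squared, minus the mean squared", after confirming the probabilities sum to 1.

```python
def mean_and_variance(dist):
    """dist maps each value x to P(x); returns (E(X), V(X))."""
    assert abs(sum(dist.values()) - 1) < 1e-9   # requirement: probabilities sum to 1
    mu = sum(x * p for x, p in dist.items())
    # Shortcut: V(X) = E(X^2) - mu^2
    var = sum(x ** 2 * p for x, p in dist.items()) - mu ** 2
    return mu, var

calls = {1: .05, 2: .12, 3: .20, 4: .30, 5: .15, 6: .10, 7: .08}   # Q1
handbags = {0: .10, 1: .25, 2: .40, 3: .20, 4: .05}                # Q2

mu_calls, var_calls = mean_and_variance(calls)
mu_bags, var_bags = mean_and_variance(handbags)
print(round(mu_calls, 2), round(var_calls, 2), round(mu_bags, 2))
```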
2. BIVARIATE DISTRIBUTIONS
Do you recall bivariate relations from Week 3 PASS? We now come across the concept of bivariate distributions, which provide the probabilities of combinations of 2 variables. There are 2 measures that are important in describing the bivariate distribution.
IMPORTANT FORMULAS FOR BIVARIATE DISTRIBUTIONS
Term    Formula
Covariance (Full)    $COV(X,Y) = \sum_{x} \sum_{y} (x - \mu_X)(y - \mu_Y) P(x,y)$
Covariance (Shortcut)    $COV(X,Y) = \sum_{x} \sum_{y} x y P(x,y) - \mu_X \mu_Y$
Coefficient of Correlation    $\rho = \frac{COV(X,Y)}{\sigma_X \sigma_Y}$
Importantly, we also have laws of expected value & variance for the sum of 2 variables too:
1. E(X+Y) = E(X) + E(Y)
2. V(X+Y) = V(X) + V(Y) + 2.COV(X,Y)
...noting that if X and Y are independent, then COV(X,Y) = 0.
Group Question
This question is quite long so divide parts up with your partner to get it done in time.
Sheldon and Juliet are PASS leaders by day, and drug dealers by night. Let X and Y be the weight in kilograms of drugs Sheldon and Juliet sell each night respectively.
Bivariate Probability Distribution:

           X = 0    X = 1    X = 2    Total
Y = 0       .12      .42      .06      .6
Y = 1       .21      .06      .03      .3
Y = 2       .07      .02      .01      .1
Total       .4       .5       .1      1.00

You are given the following information to assist you:
E(X) = .7    V(X) = .41    E(Y) = .5    V(Y) = .45
a) Calculate the covariance using either the full or shortcut formula. (-.15)
b) Calculate the coefficient of correlation between the kilograms of drugs sold by Sheldon and Juliet. (-.35)
c) Draw an inference/conclusion regarding your findings in part b).
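Parts a) and b) can be verified with the sketch below, which stores the joint probabilities in a dictionary keyed by (x, y). The layout and names are our own. The covariance is computed as the sum of x.y.P(x, y) minus E(X).E(Y), and the coefficient of correlation as the covariance divided by the product of the two standard deviations.

```python
import math

# Joint probabilities P(x, y) from the group question, keyed by (x, y)
joint = {
    (0, 0): .12, (1, 0): .42, (2, 0): .06,
    (0, 1): .21, (1, 1): .06, (2, 1): .03,
    (0, 2): .07, (1, 2): .02, (2, 2): .01,
}

e_x = sum(x * p for (x, y), p in joint.items())
e_y = sum(y * p for (x, y), p in joint.items())
v_x = sum(x ** 2 * p for (x, y), p in joint.items()) - e_x ** 2
v_y = sum(y ** 2 * p for (x, y), p in joint.items()) - e_y ** 2

# Covariance: sum of x*y*P(x, y) minus E(X)*E(Y)
cov = sum(x * y * p for (x, y), p in joint.items()) - e_x * e_y
rho = cov / math.sqrt(v_x * v_y)   # coefficient of correlation

print(round(e_x, 2), round(v_x, 2), round(e_y, 2), round(v_y, 2))
print(round(cov, 2), round(rho, 2))
```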
3. APPLICATIONS IN FINANCE: PORTFOLIO DIVERSIFICATION & ASSET ALLOCATION
Here are some questions to consider with the person sitting next to you:
1. Why should we diversify our portfolio's investments?
2. How do we diversify?
3. Why did your lecturer include it in the lecture slides – ie. how is diversification related to statistics?
FORMULAS FOR A PORTFOLIO OF 2 STOCKS
Term    Formula
Mean    E(Rp) = w1.E(R1) + w2.E(R2)
Variance    V(Rp) = w1².σ1² + w2².σ2² + 2.w1.w2.ρ.σ1.σ2
Question
Sheldon has also joined the recent craze of investing in English football clubs. This is what his investment portfolio looks like:
Stock                      Manchester United (#1)    Liverpool (#2)
Proportion of Portfolio            .30                    .70
Mean                               .12                    .25
Standard Deviation                 .02                    .15

For each of the following coefficients of correlation, calculate the expected value and standard deviation of the portfolio.
a) ρ = .5 (.211, .1081)
b) ρ = .2 (.211, .1064)
c) ρ = 0 (.211, .1052)
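The three cases differ only in ρ, so a small helper function makes the pattern easy to see. This is a sketch with names of our own choosing that applies the two-stock formulas for E(Rp) and V(Rp) directly.

```python
import math

def portfolio(w1, mu1, sd1, w2, mu2, sd2, rho):
    """Expected return and standard deviation of a two-stock portfolio."""
    mean = w1 * mu1 + w2 * mu2
    var = (w1 * sd1) ** 2 + (w2 * sd2) ** 2 + 2 * w1 * w2 * rho * sd1 * sd2
    return mean, math.sqrt(var)

results = {}
for rho in (0.5, 0.2, 0.0):
    mean, sd = portfolio(0.30, 0.12, 0.02, 0.70, 0.25, 0.15, rho)
    results[rho] = (round(mean, 3), round(sd, 4))
    print(rho, results[rho])
```

The expected return does not depend on ρ, while the standard deviation falls as ρ falls, which is the statistical case for diversification.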
BES PASS Week 6
Aims:
To understand the last of the discrete probability distributions – binomial distribution
To introduce continuous probability distributions – uniform distribution
Question
Recall our discussion of discrete and continuous random variables.
Discrete = a countable number of possible values
Continuous = any value in a given interval (infinitely many possible values)
Which of the following are discrete and which are continuous?
a) The number of goals scored in 20 attempts (discrete)
b) The time it takes to write an essay (continuous)
c) The number of people in a bar (discrete)
d) The temperature inside a room (continuous)
e) The amount of energy used by a computer (continuous)
1. Binomial Distribution
Let's recall the properties of a Binomial Experiment. There are 4, so give it a shot!
1) Fixed number of trials (n)
2) Two possible outcomes on each trial: success and failure
3) P(Success) = p and P(Failure) = (1 – p), the same on every trial
4) Trials are independent
Examples: flipping a coin 10 times; drawing 5 cards from a shuffled deck with replacement and counting the hearts (without replacement the draws would not be independent)
Note: A binomial experiment assumes a sequence of Bernoulli trials, i.e. the trial outcomes are independently and identically distributed (iid).
Binomial Random Variable
The probability of 'x' successes in a binomial experiment with 'n' trials and probability of success 'p' is
o X ~ Bin(n,p)
o P(X = x) = nCx p^x q^(n–x), where q = 1 – p
N.B. Learn to use Binomial tables!!!
P(X = k) – Individual binomial probability
P(X ≤ k) – Cumulative binomial probability
P(X > k) – Survivor probability
Also, from Perms and Combs,
$nCr = \frac{n!}{r!\,(n - r)!}$
Which means we can also write our Binomial Function as:
$P(X = x) = \frac{n!}{x!\,(n - x)!}\, p^x (1 - p)^{n - x}$
Mean and Variance of a Binomial Distribution
μ = E(X) = np
σ² = Var(X) = np(1 – p)
o σ = √( np(1 – p) )
Exercises:
1. Sheldon knows that 15% of all the girls he goes out with want expensive presents during the first month of dating. He decides to test this theory out and goes out with 6 girls. Assume the girls' demands are independent of one another. What's the probability that:
a) All six girls will require expensive presents during the first month of dating? (0.0000)
b) 1 of them will demand an expensive present during the first month of dating? (0.3993)
c) At least 3 of them will require expensive presents during the first month of dating? (Hint: use cumulative binomial probabilities) (0.0473)
2. The Koch Electric Company makes electric shavers. If the probability that an electric shaver is defective is 0.01, what is the probability of the following in a shipment of 500 electric shavers:
a) None are defective? (0.0067)
b) One is defective? (0.0337)
c) More than three are defective? (0.735)
3. A plumber installs six hot water heaters in a housing development. The probability that any individual heater will last more than 10 years is 0.7, and their life lengths are independent. Let X denote the number of water heaters that last more than 10 years.
a) Find the probability that more than 3 of the water heaters will last more than 10 years (0.7443)
b) Find the mean and variance of the random variable X (4.2; 1.26)
4. A quality control manager for a manufacturer has instituted "acceptance sampling" in order to monitor the quality of incoming parts that are bought in bulk. The policy is that all incoming parts are checked by selecting at random 10 parts and then determining whether each part contains any defects or not. If 2 or more parts are found to have defects then the entire order is rejected and is returned to the supplier. What is the probability that an order from a particular supplier is rejected if that supplier is known to have 5% of parts with defects? (0.0861)
5. The probabilities that three independent members of a committee will vote in favour of electing a PASS leader as president are 0.2, 0.3 and 0.5, respectively. What is the probability that at most one member of the committee will vote for a PASS leader? (0.75)
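If you don't have binomial tables to hand, the binomial probability function P(X = x) = nCx p^x (1 – p)^(n – x) is easy to code. The sketch below uses `math.comb` for nCx (available in Python 3.8 and later); the helper names are ours. It checks the answers to Exercises 1, 3 and 4.

```python
from math import comb

def binom_pmf(x, n, p):
    """Individual binomial probability P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binom_cdf(k, n, p):
    """Cumulative binomial probability P(X <= k)."""
    return sum(binom_pmf(x, n, p) for x in range(k + 1))

# Exercise 1: n = 6, p = 0.15
ex1b = binom_pmf(1, 6, 0.15)         # exactly one
ex1c = 1 - binom_cdf(2, 6, 0.15)     # at least three = 1 - P(X <= 2)
# Exercise 3: n = 6, p = 0.7
ex3a = 1 - binom_cdf(3, 6, 0.7)      # more than three
ex3_mean, ex3_var = 6 * 0.7, 6 * 0.7 * 0.3
# Exercise 4: n = 10, p = 0.05, order rejected if two or more defects
ex4 = 1 - binom_cdf(1, 10, 0.05)

print(round(ex1b, 4), round(ex1c, 4), round(ex3a, 4), round(ex4, 4))
```

Exercise 5 is not a binomial experiment, because the probability of a "yes" vote differs between the three members, so it has to be worked out branch by branch.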
Turning our focus to continuous random variables:
o P(X = x) = 0, since there are infinitely many possible values
o P(X < k) = P(X ≤ k)
Probability Density Function
Consider a probability histogram for a random variable.
o If we make the widths of the columns smaller and smaller, the tops of the columns will form a smooth curve.
The function describing that curve is our probability density function f(x), defined over the range a ≤ x ≤ b; probabilities are given by areas under the curve. It must satisfy:
1) f(x) ≥ 0 for all x between a and b
2) The total area under the curve between a and b is 1
2. Uniform Distribution
o Also known as a rectangular probability distribution (from its shape)
o A distribution where all values of the random variable within the range a ≤ X ≤ b are equally likely to occur
o Defined by the function:
   f(x) = 1/(b – a), where a ≤ x ≤ b
Taking the example from your lecture notes...
Store deliveries 7-8am
Let X = no. of minutes after 7:00am
Some formulas for uniform distribution:
E(X) = (a + b)/2
Median = (a + b)/2
Var(X) = (b – a)²/12
P(x1 ≤ X ≤ x2) = Area under graph = (x2 – x1) × 1/(b – a)
Exercises:
1. The time before a baby cries is a uniformly distributed random variable between 0 and 30 minutes.
a) Find the probability density function (f(x) = 1/30 for 0 ≤ x ≤ 30)
b) Find the probability that a baby cries within 20 minutes (0.67)
c) Find the probability that a baby does not cry within 10 minutes (0.67)
d) Find the probability that a baby cries between 15 minutes and 20 minutes (0.17)
2. If the random variable X is uniformly distributed between 2 and 10:
a) Find the formula of the probability density function. What type of line would this be if drawn?
b) Calculate P (2 ≤ X ≤ 10), that is, find the probability that X will assume a value of between 2 and 10 (P (2 ≤ X ≤ 10) equals 1, as the area under the curve must be equal to 1, that is, all the possible outcomes fall within this range)
c) Find the mean and variance of X (6; 5 ⅓)
d) Calculate P (2 ≤ X ≤ 8) (0.75)
e) Calculate P (X = 6) (0)
3. If X, a continuous random variable, is symmetric about μ, is P (X < μ - 2) equal to P (X > μ + 2)? (yes)
4. If X, a continuous random variable, is symmetric about X = 2, find P (X > 2) (0.5)
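Because the uniform density is a flat line at height 1/(b – a), every probability is just the area of a rectangle. The sketch below (function names are ours) uses that idea to check Exercises 1 and 2.

```python
def uniform_prob(x1, x2, a, b):
    """P(x1 <= X <= x2) for X uniform on [a, b]: rectangle width times height 1/(b - a)."""
    lo, hi = max(x1, a), min(x2, b)     # clip the interval to [a, b]
    return max(hi - lo, 0) / (b - a)

def uniform_mean_var(a, b):
    """Mean (a + b)/2 and variance (b - a)^2 / 12 of a uniform distribution."""
    return (a + b) / 2, (b - a) ** 2 / 12

# Exercise 1: time before a baby cries, uniform on [0, 30] minutes
p_within_20 = uniform_prob(0, 20, 0, 30)
p_not_within_10 = uniform_prob(10, 30, 0, 30)
p_15_to_20 = uniform_prob(15, 20, 0, 30)

# Exercise 2: uniform on [2, 10]
mean, var = uniform_mean_var(2, 10)
p_2_to_8 = uniform_prob(2, 8, 2, 10)
p_exactly_6 = uniform_prob(6, 6, 2, 10)   # zero width, so zero area

print(round(p_within_20, 2), round(p_not_within_10, 2), round(p_15_to_20, 2))
print(mean, round(var, 2), p_2_to_8, p_exactly_6)
```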
BES PASS Week 7 2010
1. THE NORMAL DISTRIBUTION
a) Just for starters – a warm-up question
In the space below, draw what a normal distribution looks like. Then, in a different colour, show what happens when:
a) the mean increases/decreases
b) the standard deviation increases/decreases
b) Calculating normal probabilities
To calculate the probability that a normal random variable falls into any interval, we need to compute the area under the curve over that interval. But since that's too hard (we need calculus), we can use the probability tables, provided we standardise the random variable.
THE STANDARDISED NORMAL RANDOM VARIABLE
Z = ( X – μ ) / σ
Class Example
You invest in stocks whose returns are normally distributed with an average return of 10%. Find the probability that you will lose money:
a) if the standard deviation of returns is 5% (0.0228)
b) if the standard deviation of returns is 10% (0.1587)
Clue! Use the tables.
Topics to be covered: 1. The Normal Distribution & Finding Probabilities 2. The Normal Approximation to the Binomial 3. Concepts of Estimation
c) Finding values of Z
Just then, we focused on working out Z, then using that to work out the probability of something. However, questions often ask us to 'reverse engineer' the process by giving us a probability first and asking us to work out what Z is. This is the reverse of the previous process.
FINDING VALUES OF Z, GIVEN A PROBABILITY
ZA = The value of Z such that the area to its right under the standard normal curve is A.
ie. ZA = The value of a standard normal random variable such that P ( Z > ZA ) = A
Question
a) Find Z0.025 (1.96)
b) Find Z0.05 (1.645)
d) ZA and percentiles
ZA & PERCENTILES
ZA = 100 ( 1 – A ) th percentile of a standard normal random variable.
eg. Using question (b) from above, Z0.05 = 1.645 = the 95th percentile.
Let's do some questions
1. The amount of time students spend each week on Facebook (FB) is a normally distributed random variable with a mean of 7.5 hours and a standard deviation of 2.1 hours.
a) What proportion of students go on FB for more than 10 hours per week? (.1170)
b) Find the probability that a student spends between 7 and 9 hours on FB. (.3559)
c) What proportion of students spend less than 3 hours on FB? (.0162)
d) What is the amount of time below which only 5% of students spend on FB? (4.05 hours)
2. An analysis of the amount of interest paid monthly by Visa cardholders reveals that the amount is normally distributed with a mean of $27 and a standard deviation of $7.
a) What proportion of the cardholders pay more than $30 in interest? (.3336)
b) What proportion of the cardholders pay more than $40 in interest? (.0314)
c) What proportion of the cardholders pay less than $15 in interest? (.0436)
d) What interest payment is exceeded by only 20% of the cardholders? ($32.88)
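In the exam you will use the tables, but the answers can be sanity-checked in Python with `statistics.NormalDist` (available from Python 3.8). This is a sketch only: `cdf` gives the area to the left of a value and `inv_cdf` goes the other way, from an area to a value. Expect small differences in the third or fourth decimal place, because the table method rounds Z to two decimal places first.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal: mean 0, standard deviation 1

# Class example: average return 10%, standard deviation 5%; P(return < 0)
returns = NormalDist(mu=0.10, sigma=0.05)
p_loss = returns.cdf(0)

# Finding ZA: area A to the right means area 1 - A to the left
z_025 = Z.inv_cdf(1 - 0.025)
z_05 = Z.inv_cdf(1 - 0.05)

# Question 1: Facebook hours, mean 7.5, standard deviation 2.1
fb = NormalDist(mu=7.5, sigma=2.1)
p_over_10 = 1 - fb.cdf(10)        # part a)
hours_5pct = fb.inv_cdf(0.05)     # part d): 5% of students fall below this

print(round(p_loss, 4), round(z_025, 2), round(z_05, 3))
print(round(p_over_10, 4), round(hours_5pct, 2))
```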
2. THE NORMAL APPROXIMATION TO THE BINOMIAL
a) Why we approximate the binomial by the normal
Discrete distributions such as the binomial distribution are not that easy to draw inferences from, and drawing inferences is the reason we need sampling distributions. Because of this, we approximate the binomial distribution by a normal distribution, drawing a bell-shaped curve that smooths out the tops of the rectangles in the histogram.
b) Nuts and bolts of normal approximation to the binomial
You should recall from last week that, for the binomial distribution:
BINOMIAL FORMULAS
mean: μ = n.p
standard deviation: σ = √( n.p.( 1 – p ) )
Note, however, that we can't directly apply the normal to the binomial. We need a continuity correction factor of 0.5 to adjust for approximating a discrete distribution with a continuous one. In particular:
USING THE CONTINUITY CORRECTION FACTOR
Let Y be the normal random variable approximating the binomial random variable X.
P ( X = x ) ≈ P ( x – 0.5 < Y < x + 0.5 )
P ( X ≤ x ) ≈ P ( Y < x + 0.5 )
P ( X ≥ x ) ≈ P ( Y > x – 0.5 )
c) Some questions to try
3. Juliet and Sheldon are stars of the next Batman movie, the Dark PASS Leader. Juliet is Batwoman and Sheldon is Joker. We are in that moment close to the end of the film where Sheldon as Joker flips the very special coin with probability of heads as 0.95. Juliet doesn't believe such a coin exists, so Sheldon, being Joker, messes around and flips it 100 times. What is the probability that:
a) Juliet sees 100 heads flipped? (0.0133)
b) Juliet sees at least 90 heads flipped? (0.9941)
c) Juliet sees no more than 98 heads flipped? (0.9463)
d) Juliet saves Gotham city from Sheldon's destruction? (No chance at all, ever...)
4. Anyway...so...moving on...Sheldon and Juliet have now become avatars. Sheldon is Corporal Jake Sully from Earth and Juliet is Neytiri of the Na'vi. They are fighting against the evil humans who want to destroy their magic tree, from where they generate energy to teach their PASS classes. Sheldon is actually from the human team but betrays them and links up with Juliet to fight against the evil humans. Over the past few centuries, they fight 50 wars, with the probability of the Na'vi winning being a massive 10%. The possible outcomes of wars are only winning and losing – there is no such thing as 'drawing' a war. Calculate the probability that:
a) Na'vi wins the war 8 times? (0.0695)
b) Na'vi wins the war at least 3 times? (0.8810)
c) Na'vi wins the war no more than 7 times? (0.8810)
d) Sheldon as Corporal Jake Sully will officially turn into an avatar (110%)...I will teach you in PASS class next week in the other world.
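The continuity correction is easy to get wrong by half a unit, so here is a sketch of Question 4 worked in Python. The helper name is ours; it builds the approximating normal with mean n.p and standard deviation √(n.p.(1 – p)), and each probability then shifts the boundary by 0.5 in the direction that keeps the whole rectangle for the value of X you want. The results differ slightly from the table answers, which round Z to two decimal places.

```python
from math import sqrt
from statistics import NormalDist

def normal_approx(n, p):
    """Normal distribution with the same mean and standard deviation as Bin(n, p)."""
    return NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

# Question 4: n = 50 wars, p = 0.10 chance of a Na'vi win
Y = normal_approx(50, 0.10)
p_exactly_8 = Y.cdf(8.5) - Y.cdf(7.5)   # P(X = 8)  is approx. P(7.5 < Y < 8.5)
p_at_least_3 = 1 - Y.cdf(2.5)           # P(X >= 3) is approx. P(Y > 2.5)
p_at_most_7 = Y.cdf(7.5)                # P(X <= 7) is approx. P(Y < 7.5)

print(round(p_exactly_8, 3), round(p_at_least_3, 3), round(p_at_most_7, 3))
```

Parts b) and c) give the same answer because 2.5 and 7.5 are the same distance either side of the mean of 5.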
3. CONCEPTS OF ESTIMATION
a) Some chilled questions to consider with the person next to you...
What is the purpose of estimation?
What is the difference between a point estimator and an interval estimator?
b) The 3 desirable qualities of estimators
1. Unbiasedness
2. Consistency
3. Relative efficiency
BES PASS Week 8
Aims:
Learn about the sampling distribution of the mean
Learn the Central Limit Theorem (CLT)
Learn about Confidence Intervals
o Estimation of the population mean when the population variance is known
Selecting sample size
Q. Recall the difference between:
a) population vs sample
b) parameter vs statistic
1. Sampling Distribution of the mean
The sampling distribution of the mean is the distribution of all possible values that can be assumed by that statistic, computed from samples of the same size drawn from the same population, i.e. it allows us to estimate the population parameter using a sample statistic.
o The population of a random variable will have certain parameters, e.g. mean μ and variance σ²
o For a particular sample of size n, the sample statistic is unlikely to be the same as its population parameter
   Known as sampling error: the cost of sampling, which can be reduced by taking larger samples. (NB: Standard deviation of the sampling distribution of the mean = Standard Error)
o Different samples (of size n) will have different sample statistics, i.e. the sample mean/variance will vary from sample to sample
o Taking repeated samples of size n, the distribution of this statistic can be computed
Properties of the Sampling Distribution of the Sample Mean
1) $\mu_{\bar{x}} = \mu$
2) $\sigma^2_{\bar{x}} = \frac{\sigma^2}{n}$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
3) If X is normal, X̄ is normal. If X is non-normal, X̄ is approximately normal for sufficiently large sample sizes.
2. Central Limit Theorem
Definition: The sampling distribution of the mean of a random sample drawn from any population is approximately normal for a sufficiently large sample size. The larger the sample size, the more closely the sampling distribution of X̅ will resemble a normal distributi on.
                                               Mean   Variance   Distribution
Population parameter                           μ      σ²
Samples (any size) from normal population      μ      σ²/n       Normal
Samples (n ≥ 30) from non-normal population    μ      σ²/n       Approx. normal (by the CLT)
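The table can be checked by simulation. The sketch below is only an illustration: the exponential population (mean 2, so σ = 2), the sample size of 36 and the number of repeats are all assumptions chosen for the demo, not values from these notes. It draws many samples from a clearly non-normal population and compares the mean and standard deviation of the sample means with μ and σ/√n.

```python
import random
import statistics

def sample_means(draw, n, repeats, seed=0):
    """Return `repeats` sample means, each computed from `n` draws of draw(rng)."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(repeats)]

# Assumed population for the demo: exponential with mean 2 (so sigma is also 2).
mu, sigma, n = 2.0, 2.0, 36
means = sample_means(lambda rng: rng.expovariate(1 / mu), n=n, repeats=5000)

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)

print(f"mean of sample means: {mean_of_means:.3f}  (theory: mu = {mu})")
print(f"sd of sample means:   {sd_of_means:.3f}  (theory: sigma/sqrt(n) = {sigma / n ** 0.5:.3f})")
```

Plotting a histogram of `means` should also show the roughly bell-shaped curve that the CLT predicts, even though the population itself is heavily skewed.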
Group Exercise 1
A sample of 100 observations is drawn from a normal population with μ = 1000 and σ = 200. Find the probabilities for the sample mean X̅ at the cut-off values 1050, 960 and 1100.
Group Exercise 2
In a certain PASS community, 60% of all leaders are in favour of electing Sheldon as the ‘genius’. A random sample of 200 leaders is taken. What is the probability that 100 or fewer of these leaders favour the election of Sheldon as the one and only ‘genius’? (0.0025)
Group Exercise 3
Cadbury Yowie chocolates are known to have a mean weight of 27g and a standard deviation of 6.25g. If a random sample of 60 Yowies is examined, find the probability that its average weight is:
a) Below 26g (0.1075)
b) Between 27.50g and 28g (0.1601)
Group Exercise 4
A basketball coach is seeking tall recruits who are smart enough to be eligible for college. The
recruit must be at least 74 inches tall and have an IQ of 115 or above. Height and IQ are
independent of one another. IQ is normally distributed with mean 100 and standard deviation 12,
and height is normally distributed with mean 70 and standard deviation 2 inches. What percentage
of the population satisfies the coach’s requirements? (0.24%)
Additional Questions for you to do...
1. The amount of time lawyers devote to their jobs per week is normally distributed with a mean of 52 hours and standard deviation of 6 hours.
a.) Find the probability that the mean amount of work per week for three randomly selected lawyers is more than 60 hours. (0.0104)
b.) If the strict boss finds out that the average time worked by his 7 employees is less than 48 hours, he will fire them all. What is the probability they will be fired? (0.0389)
2. The time it takes for a statistics professor to mark his mid-session test is normally distributed with a mean of 4.8 mins and a standard deviation of 1.3 mins. If there are 60 students in the class, what is the probability that he needs more than 5 hours to mark all the mid-session tests? (0.1170)
3. Pierre’s goose farm claims that its jars of foie gras have a mean weight of 250g and a standard deviation of 6g. After buying 36 jars, before eating them on petits blinis with some fig jam, salt and pepper, you weighed them and found them to have a mean of 245g. What general statement can we make about Pierre’s claim? (Pierre is a lying Frenchman)
4. The mean of a population is 18.75, and the standard deviation is 7.8:
a) If a sample of 50 values is taken, what is the probability that the sample mean is greater than 20?
(0.1292)
b) If a sample of 100 is taken, what is the probability that the sample mean is greater than 20? (0.0548)
3. Confidence Interval
Recall: o Point Estimators – produce a single estimate of the parameter of interest o Interval Estimators – produce a range of values and attach a degree of confidence with that interval
Confidence Interval: an interval estimator defined by the confidence level (1-α). This implies that we start with the confidence level we want and then work out the width of the interval. Deriving it step by step...
1. Recall the standard normal: Z = (X̅ – μ) / (σ/√n)
2. Employ the definition of a confidence interval (symmetrical interval): P(-zα/2 < (X̅ – μ)/(σ/√n) < zα/2) = 1-α
3. Rearranging: P(X̅ - zα/2 σ/√n < μ < X̅ + zα/2 σ/√n) = 1-α
   Lower Confidence Limit (LCL) = X̅ - zα/2 σ/√n
   Upper Confidence Limit (UCL) = X̅ + zα/2 σ/√n
Therefore the confidence interval estimator (confidence level 1-α) is: X̅ ± zα/2 σ/√n
How do we interpret this interval?
Confidence Level (1 - α)   α   α/2   zα/2
90% 0.1 0.05 1.645
95% 0.05 0.025 1.960
98% 0.02 0.01 2.326
99% 0.01 0.005 2.576
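As a rough sketch of how the interval X̅ ± zα/2 σ/√n is computed in practice, the function below looks up zα/2 with Python's standard-library `NormalDist` instead of the table. The function name `z_confidence_interval` is made up for this illustration; the numbers are those of Group Exercise 5(b) (σ = 40, sample mean 136, n = 160, 95% confidence).

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, confidence):
    """Confidence interval for the mean when the population sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}, e.g. about 1.96 for 95%
    half_width = z * sigma / sqrt(n)          # half the width of the interval
    return xbar - half_width, xbar + half_width

lcl, ucl = z_confidence_interval(xbar=136, sigma=40, n=160, confidence=0.95)
print(f"{lcl:.2f} <= mu <= {ucl:.2f}")
```

Re-running it with a different `confidence`, `sigma` or `n` is a quick way to see how each one changes the width of the interval.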
4. Selecting sample size
Previously, we determined our level of confidence first and then the width of the interval. However, sometimes the width of the confidence interval is fixed before sampling. To calculate the sample size for a desired sampling error, we solve the confidence interval formula for n:
n = (zα/2 σ / B)²
where B = the bound on the sampling error, equal to zα/2 σ/√n. *Note that B is half the width of the confidence interval. Round n up to the next whole number.
Consider what happens to the width of a CI when: o Standard deviation changes o Confidence level changes o Sample size changes o Sample mean changes
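Solving B = zα/2 σ/√n for n gives n = (zα/2 σ / B)², rounded up. A minimal sketch of that calculation follows; the function name `required_sample_size` is an assumption for illustration, and the inputs are John's cafe from Group Exercise 7 (σ = 18, B = 3 minutes, 90% confidence).

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma, bound, confidence):
    """Smallest n whose sampling error is at most `bound` (sigma known)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{alpha/2}
    return ceil((z * sigma / bound) ** 2)                # always round up

n_cafe = required_sample_size(sigma=18, bound=3, confidence=0.90)
print(n_cafe)
```

Rounding up rather than to the nearest integer matters: rounding down would leave the sampling error slightly above the bound.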
Group Exercise 5
If we know that σ = 40, and we obtain a sample mean of 136, construct a 95% confidence interval for the population mean using a sample size of:
a) 20 (118.47 ≤ μ ≤ 153.53)
b) 160 (129.80 ≤ μ ≤ 142.20)
Group Exercise 6
If we know that σ = 40, and we obtain a sample mean of 136 using 25 values, construct a confidence interval for the population mean having:
a) A confidence level of 99% (115.392 ≤ μ ≤ 156.608) b) A confidence level of 50% (130.6 ≤ μ ≤ 141.40)
Group Exercise 7
John wants to estimate the average time it takes for customers to have lunch at his new cafe. He knows from past experience that the standard deviation will be 18 minutes. John wants to use a confidence level of 90% and have a sampling error no greater than 3 minutes. How many customers does he need to time? (98)
Additional Questions for you to do... 1. The average mark for an exam was between 65% and 75% using a confidence coefficient of 90%. Indicate
whether the following statements are true or false: a) 90% of students scored a mark between 65% and 75% b) If random samples were taken, then 90% of the samples would have a mean between 65% and 75% c) The probability that the average population mark will be contained in this interval is 90% d) The probability that the average population mark will fall between this interval is 90%
2. In an article about disinflation, various investments were examined.
The investments included stocks, bonds, and real estate. Suppose that a random sample of 200 rates of return on real estate investments was computed and recorded. The sample mean was calculated to be 12.10% return. Assuming that the standard deviation of all rates of return on real estate investments is 2.1%, estimate the mean rate of return on all real estate investments with 90% confidence. Interpret the estimate. (11.86% : 12.34%)
3. An economist wants to estimate the mean annual income of households in a particular district. It is assumed that the population standard deviation is $4000. The economist wants the estimate to be within B = $500 of the true mean with a 95% level of confidence. Calculate the sample size required.
4. Starting annual salaries for university graduates with business degrees are believed to have a standard deviation of approximately $1800. A 95% confidence interval estimate of the mean annual starting salary is desired. How large a sample should be taken if we want to be 95% confident that the maximum sampling error is:
a. $500
b. $200
5. A medical researcher wants to investigate the amount of time it takes for patients’ headache pain to be relieved after taking a new prescription painkiller. She plans to use statistical methods to estimate the mean of the population of relief times. She believes that the population is normally distributed with a standard deviation of 20 minutes. How large a sample should she take to achieve 90% confidence to within 1 minute?
Sheldon and Juliet Wednesday 2-3pm Red Centre M010
BES PASS Week 9 2010
1. HYPOTHESIS TESTING a) Setting up a model to help us think about hypothesis testing. What exactly is a hypothesis test? We can try and simplify the issue by using the most popular model of thinking – a criminal trial. Imagine Sheldon has been arrested for drink driving...although this would never happen. Suppose that we set up 2 hypotheses to test:
o H0: Sheldon is innocent (the null hypothesis) o H1: Sheldon is guilty (the alternative hypothesis)
The testing procedure begins with the assumption that the null hypothesis is true. o ie. Assume Sheldon is innocent – assume he is not a drink driver.
What is the goal of hypothesis testing? It is to determine whether there is enough evidence to infer that the alternative hypothesis is true. In statistics, we speak of the result of a hypothesis test in one of 2 ways:
a) ‘rejecting the null hypothesis in favour of the alternative’
b) or ‘not rejecting the null hypothesis in favour of the alternative’
Notice that we don’t say that we ‘accept’ the null hypothesis...why?
b) Errors induced when running hypothesis tests
There are 2 possible errors:
o A Type I error occurs when we reject a true null hypothesis. ie. Sheldon is innocent (Juliet was the actual drink driver), but he is still wrongly convicted. P ( Type I error ) = α
o A Type II error occurs when we do not reject a false null hypothesis. ie. Sheldon is actually guilty of drink driving, but he is acquitted. P ( Type II error ) = β
c) Group discussion exercise In the following 2 scenarios, identify what the null and alternative hypotheses would be: 1. You are considering whether you should apply to be a PASS leader in 2011. If you
succeed, a life of fame, fortune and happiness awaits you. If you fail, no one will like you. Should you apply?
2. You are faced with 2 investments. One is very risky, but the potential returns are high.
The other is safe, but the potential is quite limited. Which one should you choose?
Topics to be covered: 1. Hypothesis Testing – Type I & Type II errors 2. Test about the mean when the population standard deviation is known 3. p-values
2. TESTING THE POPULATION MEAN WHEN THE POPULATION STANDARD DEVIATION IS KNOWN. Let’s go through an example as a class (adapted from Keller) to illustrate the required process. a) Factual scenario Sheldon and Juliet run an Asian mini-goods store where they sell, amongst other things,
Hello Kitty mobile phone chains and Totoro pillows. Juliet wants to introduce a new profit strategy - selling Easyway drinks too.
They determine the new profit strategy will be cost-effective only if the mean monthly account is more than $170.
A random sample of 400 accounts is drawn, for which the sample mean is $178. Juliet knows the accounts are approximately normally distributed with a standard
deviation of $65. Can we conclude the new strategy will be cost-effective, if we run the test at 95%
confidence? b) Setting up the model What is the null and alternative hypothesis?
H0: H1:
NB: There are 2 methods in which we can proceed with this problem, using either:
a) the rejection region method b) the p-value approach
We will consider both methods. c) The rejection region method The rejection region is a range of values such that if the test statistic falls into that range, we decide to reject the null hypothesis in favour of the alternative hypothesis. Show how we can use the rejection region method to solve this problem in the space below. Draw diagrams where appropriate.
d) The p-value approach Why do we have 2 methods to do the same problem? Well...there are actually some
drawbacks to using the rejection region method. Can you think of some? The p-value of a test is the probability of observing a test statistic at least as extreme as
the one computed given that the null hypothesis is true. So in our example here, what would the p-value be? e) Interpreting the p-value Interpreting the p-value we just calculated, this means that the probability of observing
a sample mean at least as large as 178 from a population whose mean is 170 is _______, which is very small. o In other words, we have just observed an unlikely event, an event so unlikely that
we seriously doubt the assumption that began the process (that the null hypothesis is true).
o Consequently, we have reason to reject the null hypothesis and support the alternative.
However, one thing must be clear to you – the p-value is not the probability that the null hypothesis is true. You cannot make a probability statement about a parameter; it is not a random variable. This is a similar case to what we did last week with confidence interval estimators.
You may notice that the further the sample mean is from the hypothesized mean, the smaller the p-value is.
We can develop this further to use the idea of significance to describe the p-value. Let’s fill out the table below together:
DESCRIBING THE P-VALUE
Range of p-value   Amount of evidence to infer that the          Term for level of significance
                   alternative hypothesis is true
< 0.01             Overwhelming                                  ________
0.01 – 0.05        Strong                                        ________
0.05 – 0.10        Weak                                          ________
> 0.10             None                                          ________
f) The p-value and rejection region methods Note that another way to make the rejection / non-rejection decision is to compare the
p-value with the selected value of the significance level.
COMPARING P-VALUES WITH THE SIGNIFICANCE LEVEL
o If p-value < α → p-value is small enough to reject the null hypothesis. o If p-value > α → we do not reject the null hypothesis.
g) A note – be careful when you draw conclusions from hypothesis tests
HOW TO DRAW CONCLUSIONS FROM HYPOTHESIS TESTS
o If we reject the null hypothesis, we conclude that there is enough statistical evidence to infer that the alternative hypothesis is true.
o If we do not reject the null hypothesis, we conclude that there is not enough statistical evidence to infer that the alternative hypothesis is true.
NB: You cannot prove that either the null or the alternative hypothesis is true.
h) One and Two Tail tests
ONE VS TWO TAIL TESTS
o One Tail tests are of the form: H0: μ = μ0 against H1: μ > μ0 or H1: μ < μ0
o Two Tail tests are of the form: H0: μ = μ0 against H1: μ ≠ μ0
THE ADJUSTMENT REQUIRED FOR RUNNING TWO-TAIL TESTS
1. Change the rejection region into a two-tail rejection region of the form z < -zα/2 or z > zα/2
2. See if the z-score you calculate falls into either of these 2 rejection regions.
3. In terms of the p-value, you need to add the probability in both tails (i.e. double the one-tail area).
3. PRACTICE QUESTIONS
1. Juliet gets really annoyed when her BES students take too long to finish her class tests. She’s shot a few of them in the leg before. To investigate further, she randomly samples 10 students and measures the amount of time they spend doing a BES test. The results are listed below. Assuming that the times are normally distributed with a standard deviation of 2 minutes, test to determine whether Juliet can infer at the 5% significance level that the mean amount of time spent on the tests is greater than 6 minutes. Data: 8 11 5 6 7 8 6 4 8 3. (Answer: z = .95, p-value = .1711, no.)
2. Sheldon owns a telecommunications company called ‘Shel-Tel’ which specializes in providing cheap phone call rates back to Hong Kong. Suppose the mean and standard deviation of monthly long-distance bills of customers are $17.09 and $3.87 respectively. Sheldon takes a random sample of 100 customers and recalculates their monthly bills using rates quoted by a leading competitor, obtaining a sample mean of $17.55. Can we conclude at the 5% significance level that there is a difference between the average Shel-Tel bill and that of the competitor? (Answer: z = 1.19, p-value = .2340, no.)
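The z-test with known σ can be sketched in a few lines. The helper `z_test` and its `tail` argument are names invented for this illustration; it returns the standardised test statistic and the p-value, which you then compare with α. It is applied to the mini-goods store example (treated here as an upper-tail test of H0: μ = 170 against H1: μ > 170) and to the two-tail Shel-Tel question.

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, tail):
    """z statistic and p-value; tail is 'upper', 'lower' or 'two'."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    if tail == "upper":
        p_value = 1 - std_normal.cdf(z)
    elif tail == "lower":
        p_value = std_normal.cdf(z)
    elif tail == "two":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))   # both tails
    else:
        raise ValueError("tail must be 'upper', 'lower' or 'two'")
    return z, p_value

alpha = 0.05
z_store, p_store = z_test(xbar=178, mu0=170, sigma=65, n=400, tail="upper")
z_tel, p_tel = z_test(xbar=17.55, mu0=17.09, sigma=3.87, n=100, tail="two")

for name, z, p in [("store", z_store, p_store), ("Shel-Tel", z_tel, p_tel)]:
    decision = "reject H0" if p < alpha else "do not reject H0"
    print(f"{name}: z = {z:.2f}, p-value = {p:.4f} -> {decision}")
```

Because the code does not round z before looking up the probability, its p-values can differ from table-based answers in the third or fourth decimal place.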
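[Figure: Type II error when H0 is false. Two overlapping curves: the hypothesized mean distribution (under H0), divided at the critical value into a non-rejection region and a rejection region, and the actual mean distribution. The part of the actual distribution in the rejection region is correctly rejected; the part in the non-rejection region is not rejected, but SHOULD be.]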
BES PASS WEEK 10 Aims:
Learning to calculate the probability of Type II errors and the Power of the Test
Hypothesis Testing when population variance is unknown
Sampling distribution of the sample proportion
1. Some clarification...
Sample Mean: X̅ ~ N(μX̅, σ²X̅) – subscripts to show this is different to the population
Hypothesis testing: testing where our X̅ lies in relation to our hypothesised μ0
o Methods: Critical values for X̅ (not a confidence interval) and critical values using z-scores
o State H0 with a strong equality sign (=) and your conclusion with the level of significance.
o Value of α is called our significance level
2. Type I and Type II errors
ERRORS             Given a true H0     Given a false H0
Reject H0          Type I error        Correct Decision
Do not reject H0   Correct Decision    Type II error
Type I error occurs when we reject a true null hypothesis The significance level α is usually given as 0.01, 0.05 or 0.10 P (Type I error) = α P (Reject H0 | H0 is true) = α
Type II error occurs when we do not reject a false null hypothesis P ( Type II error) = β P (Do not reject H0 | H0 is false) = β
NB. There is a trade-off between the two types of errors. Changing our significance level α will produce resultant changes in β.
Power of the test
The power of the test is the probability of correctly rejecting a false null hypothesis. Power = 1 - β. NB. 1 – β ≠ α!!!!
Steps:
1. Draw the distribution of X̅ under the null hypothesis H0, centred at μ0.
2. Find the critical value c and the rejection region for the α level of significance. The area in the rejection region is P(Type I error).
3. Draw a new distribution centred at the true population mean μ1 (a value consistent with H1). The area of this distribution that falls in the non-rejection region is P(Type II error).
For an upper tailed test: P(Type II error) = P(X̅ < c | μ = μ1) - see diagram
For a lower tailed test: P(Type II error) = P(X̅ > c | μ = μ1)
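The three steps for an upper-tailed test translate directly into code. This is a sketch only, with the function name `upper_tail_beta` invented for the illustration; the inputs are the N.S.W. Police speed problem from these notes (μ0 = 90, true μ1 = 100, σ = 25, n = 81, α = 0.05).

```python
from math import sqrt
from statistics import NormalDist

def upper_tail_beta(mu0, mu1, sigma, n, alpha):
    """Critical value c and P(Type II error) for H0: mu = mu0 vs H1: mu > mu0."""
    se = sigma / sqrt(n)                             # standard error of the sample mean
    c = mu0 + NormalDist().inv_cdf(1 - alpha) * se   # step 2: critical value under H0
    beta = NormalDist(mu1, se).cdf(c)                # step 3: P(Xbar < c | mu = mu1)
    return c, beta

c, beta = upper_tail_beta(mu0=90, mu1=100, sigma=25, n=81, alpha=0.05)
power = 1 - beta
print(f"c = {c:.2f}, beta = {beta:.4f}, power = {power:.4f}")
```

For a lower-tailed test the same idea applies with c = μ0 minus the margin and β taken as the area above c.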
[Figure: the standard normal (Z) curve compared with the t-distribution curve]
Problems for you to do...
1) N.S.W. Police are testing if vehicles are exceeding the speed limit of 90km/hr on South Dowling Road. A sample of 81 vehicles yields a mean driving speed of 98km/hr. If the population of vehicle speeds is normally distributed with a standard deviation of 25 km/hr, test the hypothesis, at the 5% level of significance, H0: μ = 90; H1: μ > 90. If H0 is rejected, calculate β, the probability of Type II error, given that the true μ = 100. (β = 0.0253; power of test = 0.9747; we reject H0)
2) Miss Rose was researching dress sizes. She had thought the mean dress size was 9. But her suspicion is that it will be larger than that. Thus, being the relative unknown and incredible mathematician she was, Miss Rose decided to do a hypothesis test. She found the population to be normally distributed, with a standard deviation of 4. If α = 0.05 and the sample size was 64, calculate the power of the test if the mean was actually
a. 9.5 (0.2595)
b. 10 (0.6387)
3) What will be the answer for (a) and (b) in the above example if Miss Rose only suspected that the mean size was not 9? (0.17, 0.516)
3. T-distribution
So far, the problems we have dealt with assume that the population variance σ² is known.
This is unrealistic; we’re more likely to know the sample variance s².
Note that s² is an unbiased and consistent estimator of σ².
For large sample sizes (n > 30):
o By the CLT, X̅ is approximately normal regardless of the population distribution
o The standardised test statistic remains approximately normal even when replacing σ with s
o Use Z-scores and the normal distribution table
For small sample sizes (n < 30):
o Must use the t-distribution
o Similar to the normal distribution but with “fatter” tails
o Its variance is determined by the ‘degrees of freedom’, v
o The t-distribution is only valid if the underlying distribution is normal
We have a new t-statistic:
t = (x̅ − μ) / (s/√n)
where the critical value tα,v satisfies P(t > tα,v) = α and v = n-1 (degrees of freedom)
Confidence Intervals when σ² is unknown
The procedure is still the same. The only difference is that we replace our z-score with a t-score, and σ with s!!!
x̅ ± tα/2,v · s/√n, with v = n − 1
Where α = significance level and 1 – α is the confidence level
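A short sketch of the t-based interval x̅ ± tα/2,v · s/√n. Python's standard library has no t-distribution, so the critical value is passed in after being read from a t-table; the value 3.499 for t0.005,7 is taken from a standard table and is an input assumed here, not something computed by the code. The function name is also made up for the illustration. The numbers are those of the strawberry farm problem in these notes (n = 8, sample mean 1500, s = 300, 99% confidence).

```python
from math import sqrt

def t_confidence_interval(xbar, s, n, t_crit):
    """Interval for the mean when sigma is unknown; t_crit = t_{alpha/2, n-1} from a table."""
    half_width = t_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width

# 99% confidence with n = 8 gives v = 7 degrees of freedom; table value t_{0.005,7} = 3.499
lower, upper = t_confidence_interval(xbar=1500, s=300, n=8, t_crit=3.499)
print(f"({lower:.2f}, {upper:.2f})")
```

Swapping in the z-value 2.576 for the same data gives a noticeably narrower interval, which is the practical effect of the fatter tails.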
Hypothesis Testing when σ² is unknown
Procedure also remains the same except we replace our z-score with a t-score and σ with s
Using the unstandardised method, our critical values for X̅ are (assuming H0: μ = μ0):
o If H1: μ > μ0 then c = μ0 + tα,v(s/√n). If X̅ > c, reject H0, otherwise we do not reject
o If H1: μ < μ0 then c = μ0 – tα,v(s/√n). If X̅ < c, reject H0, otherwise we do not reject
o If H1: μ ≠ μ0 then c1 = μ0 – tα/2,v(s/√n) and c2 = μ0 + tα/2,v(s/√n). If X̅ < c1 or X̅ > c2, reject H0, otherwise we do not reject
Or using the standardised method, our critical values for t are tα,v (one-tailed test) or tα/2,v (two-tailed test), where α = level of significance and v = n-1 (degrees of freedom). Assuming H0: μ = μ0:
o If H1: μ > μ0 reject H0 if t > tα,v
o If H1: μ < μ0 reject H0 if t < -tα,v
o If H1: μ ≠ μ0 reject H0 if t > tα/2,v or t < -tα/2,v
Problems for you to do…
4) Sheldon owns a farm and needs to know the number of strawberries that can be picked on weekday mornings. On a sample of 8 Monday mornings, the number of strawberries picked between 7am and 9am is counted. Assume the population is normal. Construct a 99% confidence interval for the population mean if the sample mean is 1500 and the sample standard deviation is 300. (1128.88, 1871.12)
5) Petit Restaurant claims to sell at least 60 cakes per day. Assume Petit’s sales are approximately normally distributed. To test Petit’s claim, 16 days are selected at random and tested. The sample yields a mean of 56 and a sample standard deviation of 5.25. Perform the test of Petit’s claim against a suitable alternative, assuming α = 0.05. (Reject null)
6) A car rental company is interested in the amount of time its vehicles are out of operation for repair work. A random sample of 12 cars showed that, over the past year, the numbers of days each had been inoperative were as follows: 15, 11, 19, 24, 6, 18, 20, 15, 18, 12, 14, 19. Given that the population is normally distributed, find a 99% confidence interval for the actual mean. (11.618, 20.216)
7) In a study to determine the capability of the BES PASS leaders, Judith, the PASS coordinator, has to measure the mean exam marks of every student that attends his class. She takes 9 random students and their exam marks. The sample mean and standard deviation were 80 percent and 4 marks respectively. Assuming that the marks are normally distributed, calculate a 95% confidence interval for the true mean exam mark.
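The standardised method for problem 5 can be sketched as follows. The set-up assumed here is H0: μ = 60 against H1: μ < 60 (a lower-tail test of Petit's "at least 60" claim), and the critical value 1.753 for t0.05,15 is read from a standard t-table rather than computed; `t_statistic` is a name chosen for the illustration.

```python
from math import sqrt

def t_statistic(xbar, mu0, s, n):
    """Standardised test statistic when sigma is estimated by s."""
    return (xbar - mu0) / (s / sqrt(n))

t_stat = t_statistic(xbar=56, mu0=60, s=5.25, n=16)
t_crit = 1.753                      # table value t_{0.05, 15}, an input read from a t-table
reject_null = t_stat < -t_crit      # lower-tail rejection region: t < -t_{alpha, v}

print(f"t = {t_stat:.3f}, reject H0: {reject_null}")
```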
4. Sampling Distribution of a Sample Proportion Recall the binomial distribution where X is the number of successes for a fixed number of trials
n = no. of trials
p = probability of successes q = (1-p) = probability of failure
E(X) = np and Var(X) = npq
Similar to the distribution of a sample mean, if we take many samples of size n and calculate the sample proportion of successes, p̂ = X/n, for each of them, you will find:
E(p̂) = p
Var(p̂) = pq/n
What is the distribution of p̂? By the CLT, for large sample sizes X is approximately normal, therefore the sample proportion p̂ is also approximately normal:
p̂ ~ N(p, pq/n) and therefore our z-score is z = (p̂ − p) / √(pq/n)
Our confidence interval for the Population Proportion (p) is p̂ ± zα/2 √(p̂q̂/n), where q̂ = 1 − p̂
Hypothesis Testing for the Population Proportion (p)
Assuming H0: p = p0 our critical values are:
o If H1:p < p0 then p* = p0 – Zα√(p0q0/n)
o If H1:p > p0 then p* = p0 + Zα√(p0q0/n)
o If H1:p ≠ p0 then p*= p0 ± Zα/2√(p0q0/n)
Or using the standardised test-statistic:
o If H1:p < p0 , reject H0 if Z < -Zα
o If H1:p > p0 , reject H0 if Z > Zα
o If H1:p ≠ p0 , reject H0 if Z < -Zα/2 or Z > Zα/2
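Under the normal approximation, the test statistic is z = (p̂ − p0)/√(p0q0/n), and the usual large-sample interval is p̂ ± zα/2 √(p̂q̂/n). The sketch below implements both; the function names are invented for this illustration. It uses the milk example from the problems in these notes (4 of 10 families, p0 = 0.6), although with n = 10 the normal approximation is rough at best.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z(p_hat, p0, n):
    """Standardised test statistic for H0: p = p0 (uses p0 in the standard error)."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

def proportion_ci(p_hat, n, confidence):
    """Large-sample confidence interval for p (uses p_hat in the standard error)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

p_hat = 4 / 10
z_milk = proportion_z(p_hat, p0=0.6, n=10)
z_crit = NormalDist().inv_cdf(0.05)          # lower-tail critical value, about -1.645
ci_low, ci_high = proportion_ci(p_hat, n=10, confidence=0.95)

print(f"z = {z_milk:.3f}, reject H0 (p < 0.6): {z_milk < z_crit}")
print(f"95% CI: [{ci_low:.3f}, {ci_high:.3f}]")
```

Note the design choice: the test uses p0 in the standard error because it assumes H0 is true, while the interval uses p̂ because no hypothesised value is available.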
Problems for you to do..again….last one!
8) The proportion of families buying milk from Company A in a certain city is p = 0.6. A random sample of 10 families shows that 4 buy milk from Company A.
a) Conduct a hypothesis test with a null H0: p = 0.6 against the alternative H1: p < 0.6. Find the critical values using both unstandardised and standardised methods at the 5% significance level. (Do not reject null)
b) Construct a 95% confidence interval for p. Does this interval include 0.6? [0.096, 0.704]
If we reject the null when 3 or fewer families buy milk from Company A:
c) Find the probability of committing a Type I error. (0.055)
d) If the true proportion of families buying milk from Company A is p = 0.5, what is the probability of committing a Type II error based on the above decision rule? (0.828)
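Parts (c) and (d) use the exact binomial distribution rather than the normal approximation, since the decision rule is stated in terms of the count X. A minimal sketch, with `binom_cdf` as a helper name chosen for this illustration:

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

# Decision rule: reject H0 when 3 or fewer of the 10 families buy from Company A.
type_1 = binom_cdf(3, 10, 0.6)        # P(reject H0 | p = 0.6)
type_2 = 1 - binom_cdf(3, 10, 0.5)    # P(do not reject H0 | p = 0.5) = P(X >= 4)

print(f"P(Type I) = {type_1:.3f}, P(Type II) = {type_2:.3f}")
```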
Sheldon & Juliet Wednesday 10-11am OMB229
BES PASS Week 11 2010
1. SIMPLE LINEAR REGRESSION – AN INTRODUCTION Introducing regression
Regression analysis is used to predict the value of one variable on the basis of other variables.
The technique involves developing a mathematical equation/model that describes the relationship between the dependent variable (Y) and the independent variables (X1, X2, X3, …, Xk), where k = the number of independent variables.
Regardless of why regression analysis is performed, we begin by developing this mathematical equation/model that describes the relationship between the dependent variable and independent variables.
The Simple Linear Regression Model (aka. the First-Order Linear Model)
THE SIMPLE LINEAR REGRESSION MODEL
y = β0 + β1x + ε
In order to investigate the relationship between x and y, we need to calculate the value of the coefficients β0 and β1 using the least squares method, with whom you had a friendly encounter in Week 2.
Least Squares Method
Why is it called the ‘least squares method’? Recall that when we draw a line through a set of sample data, we aim for the best line – the line of best fit. In particular, this line is the one which is closest to the sample data points; the line that minimizes the sum of the squared differences between the points and the line.
LEAST SQUARES LINE COEFFICIENTS
b1 = sxy / sx²
b0 = y̅ − b1x̅
Class Example
The annual bonuses (millions) of 6 football players from Chelsea FC [the 2010 Premier League (clearly dominating Man Utd) AND FA Cup Champions] with different years of experience are recorded as follows. The manager, Carlo Ancelotti, has hired you as his private statistician to determine the relationship between annual bonus and years of experience.
Years of experience (x) 1 2 3 4 5 6
Annual Bonus (y) 6 1 9 5 17 12
Topics to be covered: 1. Simple linear regression 2. Assumptions of the regression model 3. Methods of assessing and analysing the model
Frank Lampard has already performed some initial calculations for you:
SOME HELPFUL DATA FOR THIS QUESTION
Σxi = 21
Σyi = 50
Σxiyi = 212
Σxi² = 91
(each sum is taken over i = 1 to n)
WHAT YOU NEED TO CALCULATE
sxy =
sx2 =
b1 =
x̅ =
y̅ =
b0 =
Finally, the least squares simple regression line is:
ŷ =
HOW TO INTERPRET THE SIMPLE REGRESSION LINE
Advise Carlo on the relationship between bonuses and years of experience at Chelsea FC.
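Once you have worked the class example by hand, the least squares coefficients b1 = sxy/sx² and b0 = y̅ − b1x̅ can be checked with a few lines of code. This is a sketch using only the standard library; `least_squares` is a name chosen for the illustration, and the data are the six Chelsea FC observations from the table.

```python
from statistics import mean, variance

def least_squares(x, y):
    """Return (b0, b1) for the least squares line y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    # Sample covariance s_xy and sample variance s_x^2 (both divide by n - 1).
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    s_x2 = variance(x)
    b1 = s_xy / s_x2
    b0 = y_bar - b1 * x_bar
    return b0, b1

years = [1, 2, 3, 4, 5, 6]
bonus = [6, 1, 9, 5, 17, 12]
b0, b1 = least_squares(years, bonus)
print(f"y-hat = {b0:.4f} + {b1:.4f} x")
```

The slope is read as the estimated change in annual bonus (in millions) for each additional year of experience.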
2. ASSUMPTIONS OF THE REGRESSION MODEL
THE 7 ASSUMPTIONS OF THE REGRESSION MODEL
1. 2. 3. 4. 5. 6. 7.
3. METHODS OF ASSESSING AND ANALYSING THE MODEL Introduction
Having established the required conditions for our assessment methods to be valid in the previous section, we can now look at the methods to assess our regression model.
However, we need to look at the concept of the sum of squares for error, which forms the foundation for all these methods.
Sum of squares for error
Recall that the least squares method determines the coefficients that minimize the sum of squared
deviations between the points and the line defined by the coefficients – aka. the sum of squares for
error (SSE).
SHORT-CUT FORMULA FOR SSE
SSE = Σ(yi − ŷi)² = (n − 1)(sy² − sxy²/sx²), with the sum taken over i = 1 to n
Method 1: Standard Error of the Estimate (SEE)
STANDARD ERROR OF ESTIMATE
sε = √(SSE / (n − 2))
QUESTION
1. Calculate the standard error of estimate for Chelsea FC. (4.503) 2. Interpret what it tells you about the model’s fit.
Method 2: The t-test of the slope (a hypothesis test)
In this method of assessing the regression model, we look in particular at the slope of the simple regression line and run a hypothesis test on it. Steps are below.
Step 1: Set up hypothesis test
Ho: β1 = 0 (ALWAYS)
H1: β1 ≠, >, < 0
Step 2: Find rejection region
If our test statistic falls within the rejection region, we can conclude that the variables are linearly related.
If β1 > 0, then the variables are positively related.
If β1 < 0, they are inversely related.
Since β1 is our X coefficient, this means that a one unit change in X will cause a β1 change in Y.
QUESTION
1. Perform a hypothesis t-test of the slope for Chelsea FC at 5% significance. (t-stat = 1.964, do not reject null). 2. Interpret what it tells you about the model’s fit.
Method 3: Coefficient of Determination
The coefficient of determination, R2, allows us to determine the strength of a linear relationship.
R2 = the amount of variation in the dependent variable that is explained by variation in the independent variable.
To fully understand this, we will need to break down the total variation in y, as follows:
COEFFICIENT OF DETERMINATION
R² = sxy² / (sx² sy²) = 1 − SSE / Σ(yi − y̅)² = [Σ(yi − y̅)² − SSE] / Σ(yi − y̅)² = EXPLAINED VARIATION / VARIATION IN Y
QUESTION
1. Calculate the coefficient of determination for Chelsea FC. (0.491) 2. Interpret what this tells you about the regression model.
Critical Values & Decision Rule (reject Ho if):
H1: β1 > 0   t > tα, n-2
H1: β1 < 0   t < -tα, n-2
H1: β1 ≠ 0   |t| > tα/2, n-2
Step 3: Calculate test statistic
t = (b1 − β1) / sb1, where sb1 = sε / √((n − 1)sx²)
Step 4: Conclusion
If we don’t reject Ho, we cannot conclude that y is linearly related to x.
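The three assessment methods (standard error of estimate, t-test of the slope, coefficient of determination) all build on SSE, so they can be computed together. The sketch below does this for the Chelsea FC data with the standard library only; the function name `assess_fit` and the dictionary keys are choices made for this illustration. The t statistic is for Ho: β1 = 0, and the critical value still has to come from a t-table with n − 2 degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance

def assess_fit(x, y):
    """SSE, standard error of estimate, slope t statistic and R^2 for simple regression."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x2, s_y2 = variance(x), variance(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    b1 = s_xy / s_x2
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_eps = sqrt(sse / (n - 2))                 # standard error of estimate
    s_b1 = s_eps / sqrt((n - 1) * s_x2)         # standard error of the slope
    return {
        "sse": sse,
        "s_eps": s_eps,
        "t_stat": (b1 - 0) / s_b1,              # test statistic under Ho: beta1 = 0
        "r2": s_xy ** 2 / (s_x2 * s_y2),
    }

fit = assess_fit([1, 2, 3, 4, 5, 6], [6, 1, 9, 5, 17, 12])
for name, value in fit.items():
    print(f"{name}: {value:.4f}")
```

A useful cross-check is that the short-cut formula (n − 1)(sy² − sxy²/sx²) gives the same SSE as summing the squared residuals directly.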
Sheldon and Juliet Wednesday 2-3pm RC M010
BES PASS Week 12
Aim:
Learn about the prediction in linear regression
Learn about the multiple regression model
Prediction in linear regression We can use our model to forecast or estimate values of Y (dependent variable)
Point Prediction
o Use the fitted regression line to predict a value of Y for a given level of X
i.e. ŷ = b0 + b1x
o NB. This prediction is less reliable if the value of X falls outside the range of X values used to fit the line (extrapolation)
o This point estimate does not provide any information on how close our predicted value is to the true value.
Thus..we use,
Interval Prediction
Formula
Prediction Interval: ŷ ± tα/2,n−2 · sε · √(1 + 1/n + (xg − x̅)²/((n − 1)sx²)), where xg is the given value of x
This prediction interval is used to predict a one-time occurrence for a particular value of the dependent variable
Confidence Interval Estimator of the Expected Value of Y: ŷ ± tα/2,n−2 · sε · √(1/n + (xg − x̅)²/((n − 1)sx²))
This is the confidence interval used to predict the mean of y or the long-run average of y.
Why is there a missing “1” under the square root for the confidence interval estimator? Ans. There is less error in estimating a mean value as opposed to predicting an individual value
Class Exercise
In television’s early years, most commercials were 60 seconds long. Now, however, commercials can be any length. The objective of commercials remains the same: to have as many viewers as possible remember the product in a favorable way. A total of 60 participants were shown advertisements of varying length and each was given a test score based on what they remembered. Using the data set (Keller, 16.06):
a) Determine the least squares line of test scores on the length of the advertisement.
b) Interpret the coefficients and their significance. Comment on the overall fit of the model.
c) Predict with 95% confidence the memory test score of a viewer who watches a 36 second commercial.
d) Estimate with 95% confidence the mean memory test score of people who watch 36 second commercials.
Multiple Regression
Recall the assumptions of a classical linear regression model.
Problem? We only measured the effect of ONE variable on the model.
All the other factors were omitted and absorbed into the error term (ε). This can cause confounding and omitted variable bias. Bias occurs when:
o The omitted variable is correlated with the explanatory (independent) variable
o The omitted variable is a determinant of the dependent variable
This violates the zero conditional mean (ZCM) assumption and therefore OLS estimates are no longer unbiased.
Our new population regression model is: y = β0 + β1x1 + β2x2 + ... + βkxk + ε
Interpretation of β1
Measures the effect of a change in X1 holding X2, X3, ... , Xk constant
Also known as the partial effect of X1 holding all other explanatory variables constant
What happens if the variables X2, X3, ... , Xk are omitted and these variables are correlated with X1?
Omitted variables will appear in the disturbance/error term
ZCM assumption will be violated (error term now correlated with independent variable)
Produces a biased estimator of β1 (will also include the effect of other variables on Y)
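A small simulation can make the bias visible. The sketch below is illustrative only: the coefficients (β1 = 2, β2 = 3), the strength of the link between X1 and X2 (0.8) and the function names are all made up. It compares the slope on X1 when X2 is left out with the slope on X1 when X2 is included.

```python
import random

def cov(a, b):
    """Sample covariance (n - 1 denominator); cov(a, a) is the sample variance."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def simulate(n=5000, beta1=2.0, beta2=3.0, link=0.8, seed=1):
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [link * v + rng.gauss(0, 1) for v in x1]  # X2 is correlated with X1
    y = [1.0 + beta1 * a + beta2 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

    # Short regression: y on x1 only, so x2 ends up in the error term
    short_b1 = cov(x1, y) / cov(x1, x1)

    # Long regression: y on x1 and x2 (closed-form OLS for two regressors)
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    long_b1 = (s1y * s22 - s2y * s12) / (s11 * s22 - s12 ** 2)
    return short_b1, long_b1

short_b1, long_b1 = simulate()
print("slope on X1, X2 omitted :", round(short_b1, 2))
print("slope on X1, X2 included:", round(long_b1, 2))
```

With these made-up numbers the short regression slope should land near β1 + β2 × 0.8 = 4.4 rather than 2, because it also picks up the effect of X2 on Y.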
Also,
$\bar{x} = 38$, $s_x^2 = 193.90$
$\bar{y} = 13.80$, $s_y^2 = 47.96$
$s_{xy} = 57.86$
n = 60
Essentially, the process of multiple regression remains the same as simple linear regression:
Minimising SSE
$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
gives us
$$\hat{y}_i = b_0 + b_1 x_{1i} + \cdots + b_k x_{ki}$$
Additional Assumption (to the 7 you already have)
No perfect (multi)collinearity or exact linear relationships between the explanatory variables
This is particularly important for dummy variables
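To see why dummy variables are the usual culprit, here is a short Python sketch. It borrows the firm-size categories from the class exercise later in these notes; the list of firm sizes and the helper names are hypothetical. If every category gets its own dummy and the model also has an intercept, the dummies add up to the intercept column, which is an exact linear relationship.

```python
from fractions import Fraction

def firm_size_dummies(n_lawyers):
    """Return (FIRMSIZE1, FIRMSIZE2, FIRMSIZE3, FIRMSIZE4) for one firm."""
    return (
        int(1 <= n_lawyers <= 10),
        int(11 <= n_lawyers <= 50),
        int(51 <= n_lawyers <= 200),
        int(n_lawyers >= 201),
    )

def column_rank(rows):
    """Rank of a matrix (list of rows) using exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot available in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

firm_sizes = [3, 8, 20, 45, 120, 150, 300, 500]  # hypothetical firms

# Intercept plus ALL four dummies: 5 columns
all_dummies = [(1,) + firm_size_dummies(s) for s in firm_sizes]
# Intercept plus three dummies (FIRMSIZE1 dropped as the base group): 4 columns
drop_first = [(1,) + firm_size_dummies(s)[1:] for s in firm_sizes]

print("columns:", len(all_dummies[0]), "rank:", column_rank(all_dummies))
print("columns:", len(drop_first[0]), "rank:", column_rank(drop_first))
```

When the rank is below the number of columns, the coefficients cannot all be estimated separately. Dropping one dummy (the base category) removes the problem, which is consistent with the exercise model including only FIRMSIZE2 to FIRMSIZE4.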
Assessing the model
Standard Error of Estimate
$$s_\varepsilon = \sqrt{\frac{SSE}{n-k-1}}$$
where k is the number of explanatory variables
Hypothesis Testing
$$t = \frac{b_i - \beta_i}{s_{b_i}}$$
where v = n − k − 1
Coefficient of Determination – Adjusted R2
Excel prints off an additional R2 statistic which is called the coefficient of determination adjusted for
degrees of freedom.
This is because adding an extra explanatory variable never lowers the unadjusted R2, even when the variable explains nothing.
$$\text{Adjusted } R^2 = 1 - \frac{SSE/(n-k-1)}{\sum (y_i - \bar{y})^2/(n-1)}$$
o If n is much larger than k, unadjusted and adjusted R2 will be similar
o If SSE is quite different from 0 and k is large relative to n, the values of unadjusted and
adjusted R2 will differ substantially
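The two bullet points can be checked with a few lines of Python. This is only a sketch: the function name and the SSE, total sum of squares and sample sizes are made-up numbers, not output from any data set in these notes.

```python
import math

def fit_statistics(sse, sst, n, k):
    """Standard error of estimate, R^2 and adjusted R^2.

    sse: sum of squared errors
    sst: total sum of squares, sum of (y_i - y_bar)^2
    n:   number of observations
    k:   number of explanatory variables
    """
    s_e = math.sqrt(sse / (n - k - 1))
    r2 = 1 - sse / sst
    adj_r2 = 1 - (sse / (n - k - 1)) / (sst / (n - 1))
    return s_e, r2, adj_r2

# Hypothetical values: same SSE and SST, small sample vs large sample
for n in (25, 1000):
    s_e, r2, adj_r2 = fit_statistics(sse=40, sst=100, n=n, k=4)
    print(f"n={n}: s_e={s_e:.4f}, R2={r2:.4f}, adjusted R2={adj_r2:.4f}")
```

With n = 25 and k = 4 the adjusted value drops noticeably below the unadjusted one; with n = 1000 the two are almost identical.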
Class Exercise: (Adapted from QMB Final Exam S2 2007)
The Human Rights and Equal Opportunity Commission has asked you for further analysis on gender discrimination in the law firms (see Question 2), including an examination of the difference in income. In order to provide some evidence on whether differences in the incomes of male and female lawyers are due to discrimination or some other factors, a regression model is constructed based on human capital theory. The model is specified as follows:
HRINCOMEi = β0 + β1EXPi + β2FIRMSIZE2i + β3FIRMSIZE3i + β4FIRMSIZE4i + β5PARTNERi + β6FEMALEi + Ui
HRINCOME = Hourly income from legal practice in dollars (i.e. total income divided by hours worked)
EXP = Experience measured as the number of years working as a lawyer
FIRMSIZE1 = Dummy variable equal to 1 if firm has between 1 - 10 lawyers inclusive, and 0 otherwise
FIRMSIZE2 = Dummy variable equal to 1 if firm has between 11 - 50 lawyers inclusive, and 0 otherwise
FIRMSIZE3 = Dummy variable equal to 1 if firm has between 51 - 200 lawyers inclusive, and 0 otherwise
FIRMSIZE4 = Dummy variable equal to 1 if firm has equal to or greater than 201 lawyers, and 0 otherwise
PARTNER = Dummy variable equal to 1 if lawyer is a partner, and 0 otherwise
FEMALE = Dummy variable equal to 1 if lawyer is female, and 0 otherwise.
The regression was estimated by Ordinary Least Squares and a portion of the EXCEL output is reproduced below in Table 3:
a) The sample mean of HRINCOME for males is $59 and for females is $34. Why is this difference not necessarily evidence of gender discrimination? [2 marks]
b) Use the regression output to conclude whether there is evidence of gender discrimination in hourly incomes. Justify your answer. [3 marks]
c) Interpret the estimate for the EXP variable in terms of both economic and statistical significance. Is it consistent with your expectations? Discuss. [3 marks]
d) Test the null hypothesis that β5 is equal to zero against the alternative that it is greater than zero. Use a 1% significance level. [1 mark]
e) What are the "Standard Error" and "R Square" statistics reported amongst the "Regression Statistics" in the EXCEL output? Interpret the R Square result for this regression model. [3 marks]
f) Calculate the predicted hourly income for a male lawyer with 10 years' experience who works in a firm with 20 lawyers but who is not a partner. [1 mark]
Distributions thus far:
o Binomial Distribution (Week 6)
o Uniform Distribution (Week 6)
o Normal Distribution (Week 7)
o Distribution of the Sample Mean (Week 8)
o T-Distribution (Week 10)
o Distribution of the Sample Proportion (Week 10)
Next week....(Our last week!)
Chi-Squared Distribution
Revision on whatever we decided today...Confidence Interval, Hypothesis Testing?
Sheldon & Juliet Wednesday 2-3pm Red Centre M010
BES PASS Week 13 2010
1. CHI-SQUARED DISTRIBUTION
WARM-UP QUESTION
1. What does the chi-squared distribution look like?
2. What is the effect of increasing v (the degrees of freedom)?
IMPORTANT STATS FOR THE CHI-SQUARED DISTRIBUTION
Chi-squared random variable: χ2
Mean: E (χ2) = v
Variance: V (χ2) = 2v
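The mean and variance above can be checked by simulation. The sketch below relies on the standard fact that a chi-squared variable with v degrees of freedom is the sum of v squared independent standard normal variables; the function name, number of draws and seed are arbitrary choices.

```python
import random
import statistics

def chi_squared_draws(v, n_draws=20000, seed=0):
    """Simulate chi-squared(v) values as sums of v squared standard normals."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0, 1) ** 2 for _ in range(v)) for _ in range(n_draws)]

for v in (2, 5, 10):
    draws = chi_squared_draws(v)
    print(
        f"v={v}: sample mean={statistics.mean(draws):.2f} (theory {v}), "
        f"sample variance={statistics.variance(draws):.2f} (theory {2 * v})"
    )
```

As v increases, both the centre and the spread of the simulated values grow, which is a useful hint for the warm-up question about the effect of increasing v.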
HOW TO DETERMINE CHI-SQUARED VALUES
χ2 > 0 (always)
$\chi^2_{A,v}$ = the value of χ2 with area A to its right
$\chi^2_{1-A,v}$ = the value of χ2 with area A to its left (area 1 − A to its right)
use the table of values at the back of your yellow booklet
2. INFERENCES ABOUT A POPULATION VARIANCE
There are 2 ways of drawing inferences about a population variance:
1) Confidence Interval Estimator of σ2
2) Run a hypothesis test on σ2
1) Confidence Interval Estimator of σ2
CONFIDENCE INTERVAL ESTIMATOR OF σ2
lower confidence limit (LCL)
$$LCL = \frac{(n-1)s^2}{\chi^2_{\alpha/2}}$$
upper confidence limit (UCL)
$$UCL = \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$$
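As a quick check of the two limits, the following Python sketch plugs in the numbers from practice question 2 in these notes (n = 15, s2 = 12, 90% confidence). The function name is made up, and the two critical values are table look-ups for 14 degrees of freedom (chi-squared 0.05 and 0.95), passed in by hand rather than computed.

```python
def variance_confidence_interval(n, s2, chi2_upper, chi2_lower):
    """Confidence interval for the population variance.

    chi2_upper: chi-squared value with area alpha/2 to its right (n - 1 df)
    chi2_lower: chi-squared value with area 1 - alpha/2 to its right (n - 1 df)
    """
    lcl = (n - 1) * s2 / chi2_upper
    ucl = (n - 1) * s2 / chi2_lower
    return lcl, ucl

# Table values for v = 14: chi2(0.05) = 23.6848 and chi2(0.95) = 6.57063
lcl, ucl = variance_confidence_interval(n=15, s2=12, chi2_upper=23.6848, chi2_lower=6.57063)
print(f"LCL = {lcl:.4f}, UCL = {ucl:.4f}")
```

Note that the larger critical value gives the lower limit, because it sits in the denominator.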
Topics to be covered:
1. Chi-squared distribution
2. Inferences about population variance
3. Chi-squared Goodness-of-Fit Test
4. Chi-squared Test of a Contingency Table
2) Running a hypothesis test on σ2
PRACTICE QUESTIONS
1. The sample variance of a random sample of 50 observations from a normal population was found to be s2 = 80. Can we infer at the 1% significance level that σ2 is less than 100? (No)
2. Estimate σ2 with 90% confidence given that n = 15 and s2 = 12. (7.0932, 25.5684)
3. CHI-SQUARED GOODNESS OF FIT TEST
The purpose of a Chi-squared goodness of fit test is to examine whether observed & expected frequencies are the same in a multinomial experiment. But first, let's check out some stats for multinomial experiments.
PROPERTIES OF MULTINOMIAL EXPERIMENTS
Fixed number of trials (n)
Outcome of each trial falls into one of k categories (cells)
p1 + p2 + p3 + ... + pk = 1
Each trial is independent.
Step 1: Define Hypothesis Test
H0: σ2 = 1
H1: σ2 ≠, >, < 1
Step 2: Establish Rejection Region (with v = n − 1)
If H1: σ2 > 1, RR: $\chi^2 > \chi^2_{\alpha,v}$
If H1: σ2 < 1, RR: $\chi^2 < \chi^2_{1-\alpha,v}$
If H1: σ2 ≠ 1, RR: $\chi^2 > \chi^2_{\alpha/2,v}$ or $\chi^2 < \chi^2_{1-\alpha/2,v}$
Step 3: State Decision Rule
If our test statistic falls within the rejection region, there is sufficient evidence to reject H0 in favour of H1 (e.g. that the population variance ≠ 1).
Step 4: Calculate Test Statistic
$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$
where σ2 is the value specified in H0
Step 5: Conclusion
Do we have enough evidence to reject H0, that the population variance = 1?
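The five steps can be sketched in Python using practice question 1 from these notes (n = 50, s2 = 80, H0: σ2 = 100 against H1: σ2 < 100 at the 1% level). The function names are made up, and the critical value is entered by hand: 28.94 is an approximate value for chi-squared with area 0.99 to its right and 49 degrees of freedom, so look up the exact figure in your own table.

```python
def variance_test_statistic(n, s2, sigma0_sq):
    """Chi-squared test statistic (n - 1) s^2 / sigma0^2, with n - 1 df."""
    return (n - 1) * s2 / sigma0_sq

def reject_h0(stat, alternative, crit_upper=None, crit_lower=None):
    """Apply the rejection region for the chosen alternative hypothesis."""
    if alternative == "greater":      # H1: sigma^2 > sigma0^2
        return stat > crit_upper
    if alternative == "less":         # H1: sigma^2 < sigma0^2
        return stat < crit_lower
    if alternative == "two-sided":    # H1: sigma^2 != sigma0^2
        return stat > crit_upper or stat < crit_lower
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")

stat = variance_test_statistic(n=50, s2=80, sigma0_sq=100)
CRIT_LOWER = 28.94  # assumption: approximate table value, area 0.99 to the right, v = 49
decision = reject_h0(stat, "less", crit_lower=CRIT_LOWER)
print(f"test statistic = {stat:.2f}, reject H0: {decision}")
```

The test statistic does not fall below the critical value, so H0 is not rejected, which agrees with the "(No)" given for that question.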
FREQUENCY
Frequency = the number of outcomes falling into each of the k cells/categories.
It is notated by f1, f2, f3, ..., fk (where fi = the observed frequency of outcomes falling into cell i)
f1 + f2 + f3 + ... + fk = n
How to run a Chi-squared Goodness-of-Fit test – another flow-chart by Shel
PRACTICE QUESTION
3. We would like to make inferences about the market shares of Dell, HP, Apple, and the rest at the 5% significance level. In a random sample of 200 computers, we find that 48 are Dell, 42 are HP, 12 are Apple and 98 are the rest. Test the hypothesis that:
H0: p1=0.2, p2=0.2, p3=0.1, p4=0.5
H1: At least one pi is not equal to its specified value
(Answer: Don't reject H0 at 5% significance level)
Step 1: Check the Rule of Five
For each cell, ei ≥ 5, where ei = npi
Step 2: Define hypothesis test
H0: p1 = ..., p2 = ..., p3 = ..., etc.
H1: at least one of the pi ≠ its specified value
Step 3: Critical Value, Rejection Region
Rejection region: $\chi^2 > \chi^2_{\alpha,k-1}$
Step 4: Decision Rule
If our test statistic falls within the rejection region, there is sufficient evidence to suggest the observed frequency of a multinomial variable ≠ its expected value
Step 5: Calculate Test Statistic
$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$
Step 6: Conclusion
Do we have enough evidence to reject H0 and conclude that at least one of the pi ≠ its specified value?
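The steps above can be sketched in Python with the market share data from practice question 3. The function name is made up, and the critical value 7.81473 is a table look-up (area 0.05 to the right, k − 1 = 3 degrees of freedom) entered by hand.

```python
def goodness_of_fit(observed, probs):
    """Return the chi-squared statistic and the expected frequencies."""
    n = sum(observed)
    expected = [n * p for p in probs]
    # Step 1: rule of five
    if min(expected) < 5:
        raise ValueError("Rule of five violated: an expected frequency is below 5")
    # Step 5: sum of (f_i - e_i)^2 / e_i over all k cells
    stat = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    return stat, expected

observed = [48, 42, 12, 98]        # Dell, HP, Apple, the rest
h0_probs = [0.2, 0.2, 0.1, 0.5]    # proportions specified under H0
CRITICAL_VALUE = 7.81473           # table value: alpha = 0.05, v = k - 1 = 3

stat, expected = goodness_of_fit(observed, h0_probs)
print("expected frequencies:", expected)
print(f"test statistic = {stat:.2f}, reject H0: {stat > CRITICAL_VALUE}")
```

The statistic stays below the critical value, which matches the answer given for the practice question (don't reject H0).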
4. CHI-SQUARED TEST OF A CONTINGENCY TABLE
The purpose of running a Chi-squared test of a contingency table is to determine whether there's enough evidence to infer that
a) 2 nominal variables are related
b) differences exist between 2 or more populations of nominal variables
How to run a Chi-squared test of a Contingency Table – the final flow-chart Shel will ever make for you...
PRACTICE QUESTION
4. Test the hypothesis that income and education are independent at the 1% significance level.
Education/Income   < $50k   $50k - $100k   > $100k   TOTAL
Secondary              40             30        12      82
Tertiary               30             40        20      90
Doctorate               1             12        15      28
TOTAL                  71             82        47     200
(Answer: reject H0, i.e. the variables are not independent.)
Step 1: Define hypothesis test
H0: variables are independent
H1: variables are dependent.
Step 2: Rejection region, critical value
Rejection region: $\chi^2 > \chi^2_{\alpha,v}$
where v = ( r – 1 ) ( c – 1 )
Step 3: Decision rule
If our test statistic falls within the rejection region, there is sufficient evidence to suggest the variables are dependent.
Step 4: Calculate test statistic
$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$
where the sum runs over all k cells of the table and
$$e_{ij} = \frac{(\text{total of row } i)(\text{total of column } j)}{\text{sample size}}$$
Step 5: Conclusion
Do we have enough evidence to reject H0 that the variables are independent?
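The same procedure can be sketched in Python with the education and income table from practice question 4. The function name is made up, and the critical value 13.2767 is a table look-up (area 0.01 to the right, 4 degrees of freedom) entered by hand.

```python
def contingency_test(table):
    """Chi-squared statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, f in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency e_ij
            stat += (f - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)    # v = (r - 1)(c - 1)
    return stat, df

# Rows: Secondary, Tertiary, Doctorate; columns: < $50k, $50k - $100k, > $100k
table = [
    [40, 30, 12],
    [30, 40, 20],
    [1, 12, 15],
]
CRITICAL_VALUE = 13.2767  # table value: alpha = 0.01, v = 4

stat, df = contingency_test(table)
print(f"test statistic = {stat:.2f}, df = {df}, reject H0: {stat > CRITICAL_VALUE}")
```

The statistic is well above the critical value, which matches the answer given for the practice question (reject H0, the variables are not independent).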