Special continuous random variables


Page 1: Special continuous random variables

Special continuous random variables

Page 2: Special continuous random variables

1. Uniform distribution

2. Normal probability distributions

Page 12: Special continuous random variables

A RANDOM VARIABLE X WHOSE DISTRIBUTION

HAS THE SHAPE OF A NORMAL CURVE IS CALLED

A NORMAL RANDOM VARIABLE.

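This random variable X is said to be normally distributed with mean μ and standard deviation σ if its probability distribution is given by

\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty \]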

Page 13: Special continuous random variables

PROPERTIES OF A

NORMAL DISTRIBUTION

The normal curve is symmetrical about the mean μ;

The mean is at the middle and divides the area into halves;

The total area under the curve is equal to 1;

It is completely determined by its mean μ and standard

deviation σ (or variance σ²).

Note:

In a normal distribution, only 2 parameters are needed,

namely μ and σ².
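The properties listed above can be checked numerically. The following is a minimal sketch, not part of the original slides: the helper names `normal_pdf` and `trapezoid_area` are hypothetical, and the values μ = 2 and σ = 1/3 are arbitrary illustration choices. It confirms that the density is symmetric about μ, that the total area is 1, and that half of the area lies to the left of the mean.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def trapezoid_area(f, a, b, steps=20_000):
    """Approximate the integral of f over [a, b] with the trapezoidal rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

mu, sigma = 2.0, 1.0 / 3.0  # arbitrary illustration values (assumption)

def density(x):
    return normal_pdf(x, mu, sigma)

# Symmetry: points equally far from mu on either side have the same density.
left = density(mu - 0.5)
right = density(mu + 0.5)

# Total area: mu +/- 10 sigma captures essentially the whole curve.
total_area = trapezoid_area(density, mu - 10 * sigma, mu + 10 * sigma)

# Half of the area lies to the left of the mean.
left_area = trapezoid_area(density, mu - 10 * sigma, mu)

print(left == right)
print(round(total_area, 6), round(left_area, 6))
```

Changing `mu` and `sigma` moves and rescales the curve, but the three checks keep holding, which is what "completely determined by μ and σ" means in practice.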

Page 14: Special continuous random variables

AREA UNDER THE NORMAL

CURVE USING INTEGRATION

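The probability that a normal random variable X falls in a particular interval [a, b] is the area under the curve bounded by x = a and x = b. It is given by

\[ P(a \le X \le b) = \int_a^b \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \, dx \]

and the area depends upon the values of μ and σ.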
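In practice this area is rarely integrated by hand. The sketch below is an illustration rather than anything from the slides: it evaluates the area with the error function `math.erf` and cross-checks it against `statistics.NormalDist` from the Python standard library. The helper names, the parameters μ = 10, σ = 2 and the interval [8, 12] are hypothetical.

```python
import math
from statistics import NormalDist

def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal X, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def interval_probability(a, b, mu, sigma):
    """P(a <= X <= b): area under the normal curve between x = a and x = b."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)

# Hypothetical parameters and interval, chosen only for illustration.
mu, sigma = 10.0, 2.0
a, b = 8.0, 12.0

p_erf = interval_probability(a, b, mu, sigma)

# Cross-check with the standard library's normal distribution object.
dist = NormalDist(mu, sigma)
p_lib = dist.cdf(b) - dist.cdf(a)

print(round(p_erf, 4), round(p_lib, 4))  # both about 0.6827
```

The interval here spans one standard deviation on each side of the mean, so the result is the familiar 68% figure.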

Page 15: Special continuous random variables

THE STANDARD NORMAL

DISTRIBUTION

It makes life a lot easier for us if we standardize our normal curve, with

a mean of zero and a standard deviation of 1 unit.

If we have the standardized situation of μ = 0 and σ = 1, then we have:

\[ f(z) = \frac{1}{\sqrt{2\pi}} \, e^{-z^2/2} \]

We can transform all the observations of any normal random variable X with mean μ and variance σ² to a new set of observations of another normal random variable Z with mean 0 and variance 1 using the following transformation:

\[ Z = \frac{X - \mu}{\sigma} \]
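As a rough sketch of the transformation applied to data, the snippet below subtracts the mean from each observation and divides by the standard deviation. The observations and the helper name `standardize` are hypothetical, and μ and σ are assumed to be the mean and population standard deviation of the observations themselves.

```python
from statistics import mean, pstdev

def standardize(values, mu, sigma):
    """Apply z = (x - mu) / sigma to every observation."""
    return [(x - mu) / sigma for x in values]

# Hypothetical observations (assumption, not from the slides).
observations = [4.0, 8.0, 6.0, 5.0, 3.0, 10.0]
mu = mean(observations)        # 6.0
sigma = pstdev(observations)   # population standard deviation

z_scores = standardize(observations, mu, sigma)

# After standardizing, the values have mean 0 and standard deviation 1
# (up to floating-point rounding).
z_mean = mean(z_scores)
z_sd = pstdev(z_scores)
print(round(abs(z_mean), 10), round(z_sd, 10))
```

Each z-score says how many standard deviations an observation lies above (positive) or below (negative) the mean.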

Page 16: Special continuous random variables

EXAMPLE

Say μ=2 and σ=1/3 in a normal distribution.

The graph of the normal distribution is as follows:

Page 17: Special continuous random variables

The following graph (that we also saw earlier) represents the same information, but it has been standardized so that μ = 0 and σ = 1 (with the above graph superimposed for comparison):

The two graphs have different μ and σ, but have the same area.

The new distribution of the normal random variable Z with mean 0 and variance 1 (or standard deviation 1) is called a standard normal distribution. Standardizing the distribution like this makes it much easier to calculate probabilities.
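The claim that the two curves enclose the same area can be illustrated with the μ = 2, σ = 1/3 example. This is a sketch under assumptions: the interval [1.8, 2.5] is a hypothetical choice, and `statistics.NormalDist` stands in for a table of standard normal probabilities.

```python
from statistics import NormalDist

mu, sigma = 2.0, 1.0 / 3.0   # values from the example above
a, b = 1.8, 2.5              # hypothetical interval (assumption)

x_dist = NormalDist(mu, sigma)   # the original normal distribution
z_dist = NormalDist(0.0, 1.0)    # the standard normal distribution

# Area between a and b under the original curve.
p_x = x_dist.cdf(b) - x_dist.cdf(a)

# Area between the standardized endpoints under the standard normal curve.
z_a = (a - mu) / sigma
z_b = (b - mu) / sigma
p_z = z_dist.cdf(z_b) - z_dist.cdf(z_a)

print(round(z_a, 2), round(z_b, 2))
print(round(p_x, 6), round(p_z, 6))
```

Because the two probabilities agree, a single table (or function) for Z is enough to answer questions about any normal X.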

Page 18: Special continuous random variables

MEAN

The mean describes the central tendency of the distribution. It defines the

location of the peak for normal distributions. Most values cluster

around the mean. On a graph, changing the mean shifts the entire

curve left or right on the X-axis.

Page 19: Special continuous random variables

STANDARD DEVIATION

The standard deviation is a measure of variability. It defines the

width of the normal distribution. The standard deviation

determines how far away from the mean the values tend to fall. It

represents the typical distance between the observations and the

average.

Page 20: Special continuous random variables

Unfortunately, population parameters are usually unknown

because it’s generally impossible to measure an entire population.

However, you can use random samples to calculate estimates of

these parameters.

Statisticians represent sample estimates of these parameters

using x̅ for the sample mean and s for the sample standard

deviation.
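A small simulation can show how x̅ and s estimate μ and σ. Everything in this sketch is assumed for illustration: the "population" values μ = 170 and σ = 8, the sample size of 500, and the fixed random seed.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population parameters that we normally could not observe.
MU, SIGMA = 170.0, 8.0

# Draw a random sample from the population.
sample = [rng.gauss(MU, SIGMA) for _ in range(500)]

x_bar = mean(sample)  # sample mean, an estimate of mu
s = stdev(sample)     # sample standard deviation (n - 1 denominator), an estimate of sigma

print(round(x_bar, 2), round(s, 2))
```

The estimates land close to, but not exactly on, the population values; a different sample would give slightly different numbers.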

Page 21: Special continuous random variables

POPULATION

In statistics, a population is the complete set of all objects or

people of interest. Typically, studies define their population of

interest at the outset. Populations can be finite in size, though

potentially very large.

For example,

All valves produced by a specific manufacturing plant

All adult females in Ukraine

All smokers

Populations can also have an infinite size. For example, infinite

populations are used for all possible results of a sequence of

trials, such as flipping a coin.

Page 22: Special continuous random variables

COMMON PROPERTIES FOR ALL

FORMS OF THE NORMAL

DISTRIBUTION

They’re all symmetric. The normal distribution cannot

model skewed distributions.

The mean, median, and mode are all equal.

Half of the population is less than the mean and half is greater than

the mean.

The Empirical Rule allows you to determine the proportion of

values that fall within certain distances from the mean.

Page 23: Special continuous random variables

MEDIAN

The median is the middle of the data. Half of the observations are less

than or equal to it and half of the observations are greater than or

equal to it. The median is equivalent to the second quartile or the 50th

percentile.

For example, if the weights of five apples are 5, 5, 6, 7, and 8, the

median apple weight is 6 because it is the middle value. If there is an

even number of observations, you take the average of the two middle

values.
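The apple example can be reproduced with Python's standard `statistics` module. In this brief sketch, the sixth weight (9) is a hypothetical addition used to show the even-count case.

```python
from statistics import median

apple_weights = [5, 5, 6, 7, 8]
median_five = median(apple_weights)
print(median_five)  # 6, the middle of the five sorted values

# With an even number of observations, the two middle values are averaged.
six_apples = [5, 5, 6, 7, 8, 9]  # the 9 is a hypothetical extra apple
median_six = median(six_apples)
print(median_six)   # 6.5, the average of 6 and 7
```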

Page 24: Special continuous random variables

MODE

The mode is the value that occurs most frequently in a set of

observations. You can find the mode simply by counting the

number of times each value occurs in a data set.

For example, if the weights of five apples are 5, 5, 6, 7, and 8, the

apple weight mode is 5 because it is the most frequent value.

Identifying the mode can help you understand your distribution.
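The counting procedure can be sketched as follows, using `collections.Counter` for the tally and `statistics.mode` as a cross-check. The variable names are illustrative.

```python
from collections import Counter
from statistics import mode

apple_weights = [5, 5, 6, 7, 8]

# Count how many times each weight occurs.
counts = Counter(apple_weights)
most_common_value, frequency = counts.most_common(1)[0]
print(most_common_value, frequency)  # 5 2

# The standard library gives the same answer directly.
apple_mode = mode(apple_weights)
print(apple_mode)                    # 5
```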

Page 25: Special continuous random variables

THE EMPIRICAL RULE FOR THE

NORMAL DISTRIBUTION

When you have normally distributed data, the standard deviation becomes

particularly valuable. You can use it to determine the proportion of the values

that fall within a specified number of standard deviations from the mean. For

example, in a normal distribution, 68% of the observations fall within +/- 1

standard deviation from the mean. This property is part of the Empirical Rule,

which describes the percentage of the data that fall within specific numbers of

standard deviations from the mean for bell-shaped curves.

Page 26: Special continuous random variables

Mean +/- standard deviations    Percentage of data contained
1                               68%
2                               95%
3                               99.7%
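The percentages in the table are rounded. For a normal distribution, the probability of falling within k standard deviations of the mean equals erf(k/√2), which the sketch below evaluates; the function name `within_k_sigma` is hypothetical.

```python
import math

def within_k_sigma(k):
    """P(|X - mu| <= k * sigma) for any normal X, which equals erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2.0))

coverage = {k: within_k_sigma(k) for k in (1, 2, 3)}
for k, p in coverage.items():
    print(f"within {k} standard deviation(s): {100 * p:.2f}%")
# within 1 standard deviation(s): 68.27%
# within 2 standard deviation(s): 95.45%
# within 3 standard deviation(s): 99.73%
```

The result does not depend on μ or σ, which is why one table serves every normal distribution.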

Page 27: Special continuous random variables

RANGE

Let’s start with the range because it is the most straightforward

measure of variability to calculate and the simplest to understand.

The range of a dataset is the difference between the largest and

smallest values in that dataset. For example, in the two datasets

below, dataset 1 has a range of 38 – 20 = 18 while dataset 2 has

a range of 52 – 11 = 41. Dataset 2 has a broader range and,

hence, more variability than dataset 1.
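The calculation is a one-liner in code. The two datasets are not reproduced in the text, so the lists below are hypothetical: only their smallest and largest values (20 and 38, 11 and 52) come from the slide, and the values in between are invented.

```python
def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

# Hypothetical datasets: only the extremes match the text.
dataset_1 = [20, 21, 24, 27, 30, 33, 38]
dataset_2 = [11, 16, 24, 29, 35, 47, 52]

range_1 = data_range(dataset_1)
range_2 = data_range(dataset_2)
print(range_1)  # 18
print(range_2)  # 41
```

Because only the two extreme values enter the calculation, a single unusual observation can change the range substantially.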

Page 28: Special continuous random variables

THE INTERQUARTILE RANGE (IQR) . . .

AND OTHER PERCENTILES

The interquartile range is the middle half of the data. To visualize it, think about the median value that splits the dataset in half. Similarly, you can divide the data into quarters. The three values that separate these quarters are called quartiles and are denoted, from low to high, Q1, Q2, and Q3, where Q2 is the median. The lowest quarter of the dataset, containing the smallest values, lies below Q1. The highest quarter, containing the largest values, lies above Q3. The interquartile range covers the middle half of the data, between the lower and upper quartiles. In other words, it includes the 50% of data points that fall between Q1 and Q3, and its width is Q3 – Q1.
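One way to compute the quartiles and the interquartile range is `statistics.quantiles` from the Python standard library, sketched below on hypothetical data. Note that several quartile conventions exist, so other software may report slightly different cut points for the same data.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]  # hypothetical data (assumption)

# quantiles(..., n=4) returns the three cut points Q1, Q2 (the median) and Q3,
# using the module's default "exclusive" method.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

print(q1, q2, q3, iqr)  # 3.0 6.0 9.0 6.0
```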

Page 39: Special continuous random variables

Suppose that Mr. N is one of the company's clients, and exactly 20% of the

clients are older than Mr. N. How old is Mr. N?

Since 20% of the clients are older than Mr. N, the age of Mr. N is the 80th

percentile of the r.v. X. Therefore, if we use c to denote the age of Mr. N,

we have that F (c) = 0.80. Then:

Therefore, Mr. N is 53.42 years old.
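Finding a percentile means inverting the cumulative distribution function F. The sketch below shows the mechanics with `statistics.NormalDist.inv_cdf`. The distribution of the client ages X does not survive in this text, so two things here are assumptions: that X is normal, and the parameters μ = 45 and σ = 10. With these assumed values the result happens to round to 53.42, but that does not establish what the original parameters were.

```python
from statistics import NormalDist

# Hypothetical parameters: assumed for illustration, not taken from the slides.
mu, sigma = 45.0, 10.0

ages = NormalDist(mu, sigma)

# The 80th percentile: the value c with F(c) = 0.80.
c = ages.inv_cdf(0.80)

# Equivalent route through the standard normal: c = mu + sigma * z_0.80.
z_80 = NormalDist().inv_cdf(0.80)
c_via_z = mu + sigma * z_80

print(round(z_80, 4))  # about 0.8416
print(round(c, 2), round(c_via_z, 2))
```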

Page 40: Special continuous random variables

WHY THE NORMAL

DISTRIBUTION IS IMPORTANT

Some statistical hypothesis tests assume that the data follow a

normal distribution. However, there’s more to it than only whether

the data are normally distributed.

Linear and nonlinear regression both assume that

the residuals follow a normal distribution.

The central limit theorem states that, as the sample size increases, the sampling distribution of the mean approaches a normal distribution, even when the underlying distribution of the original variable is non-normal (provided that distribution has a finite variance).
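The central limit theorem can be illustrated with a short simulation. This is a sketch under assumptions: the skewed source distribution (exponential with rate 1), the sample size of 50, the 2000 repetitions and the random seed are all illustrative choices.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)  # fixed seed for repeatability

def sample_mean(n):
    """Mean of n draws from an exponential distribution with rate 1 (strongly skewed)."""
    return mean([rng.expovariate(1.0) for _ in range(n)])

n = 50        # size of each sample
reps = 2000   # number of samples drawn
means = [sample_mean(n) for _ in range(reps)]

# For this distribution the population mean and standard deviation are both 1,
# so the sample means should centre near 1 with spread near 1 / sqrt(n).
centre = mean(means)
spread = stdev(means)
expected_spread = 1 / n ** 0.5

# Under normality about 68% of the sample means fall within one spread of the centre.
share_within_one = sum(abs(m - centre) <= spread for m in means) / reps

print(round(centre, 3), round(spread, 3), round(expected_spread, 3))
print(round(share_within_one, 3))
```

Although individual exponential draws are far from bell-shaped, the collection of sample means behaves much like a normal distribution.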

Page 41: Special continuous random variables

Parametric tests of means                             Nonparametric tests of medians
1-sample t-test                                       1-sample Sign, 1-sample Wilcoxon
2-sample t-test                                       Mann-Whitney test
One-Way ANOVA                                         Kruskal-Wallis, Mood’s median test
Factorial DOE with a factor and a blocking variable   Friedman test

Page 42: Special continuous random variables

ADVANTAGES OF

PARAMETRIC TESTS

Advantage 1: Parametric tests can provide trustworthy results with

distributions that are skewed and nonnormal

Many people aren’t aware of this fact, but parametric analyses

can produce reliable results even when your continuous data are

nonnormally distributed. You just have to be sure that your sample

size meets the requirements for each analysis in the table below.

Simulation studies have identified these requirements.

Page 43: Special continuous random variables

Parametric analyses    Sample size requirements for nonnormal data
1-sample t-test        Greater than 20
2-sample t-test        Each group should have more than 15 observations
One-Way ANOVA          • For 2-9 groups, each group should have more than 15 observations
                       • For 10-12 groups, each group should have more than 20 observations

Page 44: Special continuous random variables

ADVANTAGE 2: PARAMETRIC TESTS CAN

PROVIDE TRUSTWORTHY RESULTS WHEN THE

GROUPS HAVE DIFFERENT AMOUNTS OF

VARIABILITY

It’s true that nonparametric tests don’t require data that are

normally distributed. However, nonparametric tests have the

disadvantage of an additional requirement that can be very hard

to satisfy. The groups in a nonparametric analysis typically must

all have the same variability (dispersion). Nonparametric analyses

might not provide accurate results when variability differs

between groups.

Conversely, parametric analyses, like the 2-sample t-test or one-way ANOVA, allow you to analyze groups that have unequal variances. In most statistical software, it’s as easy as checking the correct box! As long as you select the unequal-variance option, you don’t have to worry about groups having different amounts of variability when you use a parametric analysis.
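The text does not say which unequal-variance method it has in mind. One widely used choice for two groups is Welch's t-test, so the sketch below assumes that method. The function name `welch_t` and both groups of measurements are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Unlike the pooled-variance t-test, this does not assume equal variances.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each group.
    v_a = variance(sample_a) / n_a
    v_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(v_a + v_b)
    df = (v_a + v_b) ** 2 / (v_a ** 2 / (n_a - 1) + v_b ** 2 / (n_b - 1))
    return t, df

# Hypothetical measurements: group_b is far more variable than group_a.
group_a = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2]
group_b = [12.5, 8.0, 15.1, 9.4, 13.7, 7.9, 14.2, 11.0]

t_stat, df = welch_t(group_a, group_b)
print(round(t_stat, 3), round(df, 2))
```

The degrees of freedom come out close to those of the more variable group alone, reflecting that most of the uncertainty comes from that group.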

Page 45: Special continuous random variables

ADVANTAGE 3: PARAMETRIC TESTS

HAVE GREATER STATISTICAL POWER

In most cases, parametric tests have more power. If

an effect actually exists, a parametric analysis is more likely to

detect it.