Biostatistics Unit 4 Probability 1


Biostatistics

Unit 4

Probability

1

Probability

Probability theory developed from the study of games of chance like dice and cards.  A process like flipping a coin, rolling a die or drawing a card from a deck is called a probability experiment.  An outcome is a specific result of a single trial of a probability experiment.

2

Probability distributions

• Probability theory is the foundation for statistical inference.  A probability distribution specifies the values that a random variable may take, together with the probability of each value (or range of values).

• There are two categories of random variables.  These are:

–discrete random variables, and

–continuous random variables.

3

Discrete random variable

The probability distribution of a discrete random variable specifies all possible values of the variable along with their respective probabilities.

(continued)

4

Discrete random variable

Examples can be:

• Frequency distribution

• Probability distribution (relative frequency distribution)

• Cumulative frequency

Examples of discrete probability distributions are the binomial distribution and the Poisson distribution.

5

Binomial distribution

A binomial experiment is a probability experiment with the following properties.

1.  Each trial can have only two outcomes which can be considered success or failure.

2.  There must be a fixed number of trials.

3.  The outcomes of each trial must be independent of each other.

4. The probability of success must remain the same in each trial.

6

Binomial distribution

The outcomes of a binomial experiment, together with the probabilities of those outcomes, are called a binomial distribution.
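As a minimal sketch of what such a distribution looks like, the code below lists the probability of each possible number of successes. It uses the standard binomial formula, which the slides do not state. The function name `binomial_pmf` and the coin-flip values (n = 10, p = 0.5) are illustrative assumptions, not from the slides.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials,
    each with the same success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical example: 10 flips of a fair coin.
n, p = 10, 0.5
distribution = [binomial_pmf(k, n, p) for k in range(n + 1)]

for k, prob in enumerate(distribution):
    print(f"P(X = {k:2d}) = {prob:.4f}")

# The probabilities over all possible outcomes add up to 1.
print("total:", sum(distribution))
```

Each of the four properties appears in the function: two outcomes (success or failure), a fixed n, independence (the probabilities multiply), and a constant p.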

7

Poisson distribution

The Poisson distribution is based on the Poisson process.

1.  The occurrences of the events are independent in an interval.

2.  An infinite number of occurrences of the event are possible in the interval.

3.  The probability of a single event in the interval is proportional to the length of the interval.

4.  In an infinitely small portion of the interval, the probability of more than one occurrence of the event is negligible.
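The slides do not give the formula, so the following is only a sketch using the standard Poisson probability function, where `lam` (λ) is the average number of occurrences per interval. The function name `poisson_pmf` and the value λ = 2 are illustrative assumptions.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k occurrences in an interval when the
    average number of occurrences per interval is lam."""
    return exp(-lam) * lam**k / factorial(k)


# Hypothetical example: on average 2 events per interval.
lam = 2.0
for k in range(6):
    print(f"P(X = {k}) = {poisson_pmf(k, lam):.4f}")

# Any count k = 0, 1, 2, ... is possible, but the probabilities
# shrink quickly, so a partial sum is already very close to 1.
print("sum over k = 0..50:", sum(poisson_pmf(k, lam) for k in range(51)))
```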

8

9

Continuous variable

A continuous variable can assume any value within a specified interval.  In general, as the number of observations grows and the class intervals become more numerous and narrower, the frequency polygon begins to resemble a smooth curve.

10

Continuous variable

• A continuous probability distribution is described by a probability density function.

• The total area under the smooth curve is equal to 1, and the relative frequency of occurrence of values between any two points equals the area bounded by the curve, the x-axis, and perpendiculars at the two points.
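A rough numerical check of the area idea is sketched below. It uses the bell-shaped normal density from the following slides purely as a familiar example and approximates areas with the trapezoidal rule. The helper names `normal_pdf` and `area` are illustrative, and the cut-offs at ±8 stand in for the infinite tails.

```python
from math import exp, pi, sqrt


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Height of the normal density curve at x."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))


def area(f, a: float, b: float, steps: int = 10_000) -> float:
    """Trapezoidal approximation of the area under f between a and b."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h


# Total area (tails beyond +/-8 are negligible): about 1.
print(round(area(normal_pdf, -8.0, 8.0), 4))

# Relative frequency of values between -1 and 1: about 0.6827.
print(round(area(normal_pdf, -1.0, 1.0), 4))
```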

11

The normal distribution

• The normal distribution is the most important distribution in biostatistics.  It is frequently called the Gaussian distribution.  

• The two parameters of the normal distribution are the mean (μ) and the standard deviation (σ).

• Its graph is the familiar bell-shaped curve.

12

The normal distribution

13

Properties of a normal distribution

1.  It is symmetrical about μ.

2.  The mean, median and mode are all equal.

3.  The total area under the curve above the x-axis is 1 square unit.  Therefore 50% is to the right of μ and 50% is to the left of μ.

4.  Perpendiculars at μ ± σ contain about 68%; at μ ± 2σ, about 95%; and at μ ± 3σ, about 99.7% of the area under the curve.
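The percentages in property 4 can be checked with Python's standard-library `statistics.NormalDist`. This is a small sketch; the values μ = 100 and σ = 15 are arbitrary illustrative choices, since the same areas result for any μ and σ.

```python
from statistics import NormalDist

# Hypothetical parameters; any mean and standard deviation would do.
mu, sigma = 100.0, 15.0
dist = NormalDist(mu=mu, sigma=sigma)


def area_within(k: float) -> float:
    """Area under the normal curve between mu - k*sigma and mu + k*sigma."""
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    print(f"mu +/- {k} sigma: {area_within(k):.4f}")
# mu +/- 1 sigma: 0.6827
# mu +/- 2 sigma: 0.9545
# mu +/- 3 sigma: 0.9973
```

The area within two standard deviations is closer to 95.45%; exactly 95% corresponds to μ ± 1.96σ.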

14

The normal distribution

15

16

Table of Normal Curve Areas

The Standard Normal Distribution

• A normal distribution is determined by μ and σ.  This creates a family of distributions, one for each pair of values of μ and σ.

• The standard normal distribution has μ = 0 and σ = 1.

17

 Standard z score

• The standard z score is obtained by creating a variable z whose value is

     z = (x - μ) / σ

• Given the values of μ and σ we can convert a value of x to a value of z and find its probability using the table of normal curve areas.

18

 Finding normal curve areas

1.  The Table of Normal Curve Areas gives areas between -∞ and the value of z.

2.  Find the z value in tenths in the column at the left margin and locate its row.  Find the hundredths place in the appropriate column.

19

 Finding normal curve areas

3.  Read the value of the area (P) from the body of the table where the row and column intersect.

Note that P is the probability that z is less than or equal to the tabulated value, that is, the area under the curve to its left.

Values of P are given as a decimal point followed by four places (for example, .0250).  These are proportions; multiply by 100 to express them as percentages.
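The table lookup can be imitated in code. The sketch below assumes the table holds cumulative areas to the left of z (consistent with the worked examples in these slides) and computes them from the error function `math.erf`. The names `phi` and `table_area` are illustrative.

```python
from math import erf, sqrt


def phi(z: float) -> float:
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def table_area(row: float, hundredths: int) -> float:
    """Mimic a table lookup: `row` is z to one decimal place (left margin),
    `hundredths` is the column (0 to 9). Returns P to four places."""
    sign = -1.0 if row < 0 else 1.0
    z = row + sign * hundredths / 100.0
    return round(phi(z), 4)


# Row -1.9, column .06 corresponds to z = -1.96.
print(f"{table_area(-1.9, 6):.4f}")  # 0.0250
# Row 1.9, column .06 corresponds to z = 1.96.
print(f"{table_area(1.9, 6):.4f}")   # 0.9750
```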

20

Finding probabilities

(a) What is the probability that z < -1.96?

(1) Sketch a normal curve

(2) Draw a line for z = -1.96

(3) Find the area in the table

(4) The answer is the area to the left of the line: P(z < -1.96) = .0250

21

22

Finding probabilities

23

Finding probabilities

(b) What is the probability that -1.96 < z < 1.96?

(1) Sketch a normal curve

(2) Draw lines for lower z = -1.96, and upper z = 1.96

(3) Find the area in the table corresponding to each value

(4) The answer is the area between the values.  Subtract lower from upper:

P(-1.96 < z < 1.96) = .9750 - .0250 = .9500

24

25

Finding probabilities

26

Finding probabilities

(c)  What is the probability that z > 1.96?

(1) Sketch a normal curve

(2) Draw a line for z = 1.96

(3) Find the area in the table

(4) The answer is the area to the right of the line.  It is found by subtracting the table value from 1.0000:

P(z > 1.96) = 1.0000 - .9750 = .0250
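The three cases (a), (b) and (c) follow one pattern: left area, difference of two left areas, and one minus a left area. A short sketch using the standard-library `statistics.NormalDist` in place of the printed table follows; the helper names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def p_left(z: float) -> float:
    """P(Z < z): the table value itself."""
    return Z.cdf(z)


def p_between(lower: float, upper: float) -> float:
    """P(lower < Z < upper): subtract the lower table value from the upper."""
    return Z.cdf(upper) - Z.cdf(lower)


def p_right(z: float) -> float:
    """P(Z > z): subtract the table value from 1."""
    return 1.0 - Z.cdf(z)


print(f"(a) P(z < -1.96)        = {p_left(-1.96):.4f}")           # 0.0250
print(f"(b) P(-1.96 < z < 1.96) = {p_between(-1.96, 1.96):.4f}")  # 0.9500
print(f"(c) P(z > 1.96)         = {p_right(1.96):.4f}")           # 0.0250
```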

27

Finding probabilities

28

Applications of the normal distribution

• The normal distribution is used as a model to study many different variables. 

• We can use the normal distribution to answer probability questions about random variables. 

• Some examples of variables that are approximately normally distributed are human height and intelligence test scores.

29

Solving normal distribution application problems

(1) Write the given information

(2) Sketch a normal curve

(3) Convert x to a z score

(4) Find the appropriate value(s) in the table

(5) Complete the answer

30

Example: fingerprint count

Total fingerprint ridge count in humans is approximately normally distributed with mean of 140 and standard deviation of 50.  Find the probability that an individual picked at random will have a ridge count less than 100.  We follow the steps to find the solution.

31

Example: fingerprint count

(1) Write the given information

     μ = 140     σ = 50     x = 100

32

Example: fingerprint count

(2) Sketch a normal curve.

33

Example: fingerprint count

(3) Convert x to a z score.

     z = (x - μ) / σ = (100 - 140) / 50 = -0.8

34

35

Example: fingerprint count

(4) Find the appropriate value(s) in the table

     A value of z = -0.8 gives an area of .2119, which corresponds to the probability P(z < -0.8).

36

Example: fingerprint count

(5) Complete the answer.

The probability that x is less than 100 is .2119.
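The five steps of the fingerprint example can be retraced in a few lines. This is a sketch in which the standard-library `statistics.NormalDist` stands in for the printed table; the variable names are illustrative.

```python
from statistics import NormalDist

# (1) Given information from the example.
mu = 140.0     # mean ridge count
sigma = 50.0   # standard deviation
x = 100.0      # ridge count of interest

# (3) Convert x to a z score.
z = (x - mu) / sigma
print("z =", z)  # z = -0.8

# (4) Area to the left of z under the standard normal curve
#     (this replaces the table lookup).
p = NormalDist().cdf(z)

# (5) Complete the answer.
print(f"P(x < 100) = {p:.4f}")  # P(x < 100) = 0.2119
```

The same result comes from `NormalDist(mu, sigma).cdf(x)` directly, without the explicit conversion to z.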

    

37

Distortions of Normal Curve

• Data may not be normally distributed.

• There may be data that are outliers that distort the mean.  The measure of this is skew.

• Data may be bunched about the mean in a non-normal fashion.  The measure of this is kurtosis.

     38

Normal Distribution Graph-Box Plot

    

39

Skewed Data

• Data may have a positive skew (long tail to the right), or a negative skew (long tail to the left).

40

Positive Skew

41

Negative Skew

42

Kurtosis

• Kurtosis indicates whether data are bunched together or spread out relative to a normal distribution.

• Data that are bunched together give a tall, thin distribution which is not normal. This is called leptokurtic.

• Data that are spread out give a low, flat distribution which is not normal. This is called platykurtic.
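Skew and kurtosis can be computed from a sample. The slides give no formulas, so the sketch below assumes the common moment-based definitions, with 3 subtracted from kurtosis so that a normal distribution scores about 0 (statistical packages differ in their conventions and small-sample corrections). The function names and the data sets are hypothetical.

```python
def central_moment(data, order: int) -> float:
    """Average of (x - mean) raised to the given power."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** order for x in data) / len(data)


def skewness(data) -> float:
    """Positive: long tail to the right. Negative: long tail to the left."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data) -> float:
    """Positive: leptokurtic. Negative: platykurtic. About 0 for normal data."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3.0


# Hypothetical data sets, chosen only to illustrate the signs.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 2, 10]
left_tailed = [-x for x in right_tailed]

print("skew, symmetric:   ", skewness(symmetric))
print("skew, right-tailed:", skewness(right_tailed))
print("skew, left-tailed: ", skewness(left_tailed))
print("excess kurtosis, symmetric:", excess_kurtosis(symmetric))
```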

43

Kurtosis

44

fin

45