
Probability and Statistics


  • GENG200

    Probability and Statistics for Engineers

    Spring 2013

    1

  • Course Information

    Instructor: Dr. Adel Elomri

    Office: E212, corridor 5

    E-mail: [email protected]

    Office Hours: to be fixed later

    TA:

    Lectures: Monday, Wednesday 09:30-10:45

    Course Webpage: http://mybb.qu.edu.qa

    2

  • 3

    Textbook: Applied Statistics and Probability for Engineers, Douglas C. Montgomery and George C. Runger, 5th edition, John Wiley & Sons, 2007.

    ISBN: 978-0-471-74589-1

    Reference: Probability and Statistics for Engineering and the Sciences, Jay L. Devore, 6th edition, John Wiley & Sons, Inc., 2007.

    See library textbooks.

  • Course Objectives

    1. Provide students with statistical methods, both descriptive and analytical, for dealing with the variability in observed data.

    2. Provide students with fundamental concepts of probability and random variables.

    3. Introduce the concepts of statistical inference: hypothesis testing and confidence intervals for parameters.

    4. Emphasize practical engineering-based applications and the use of real-data examples.

    4

  • 5

    Covered Topics (weeks allotted in parentheses):

    Introduction. (1)

    Probability: addition rule, conditional probability, multiplication rule and Bayes' theorem. (1)

    Discrete random variables. Probability mass function. Mean and variance of discrete random variables. (1)

    Probability distribution functions: uniform, binomial, geometric and Poisson distributions. (2)

    Continuous random variables. Probability density functions. (1)

    Normal distribution. Approximation to the binomial and Poisson distributions. Exponential distribution. Uniform distribution. (1)

    Joint probability functions. Multiple discrete and continuous random variables. (1)

    Covariance and correlation. Linear combinations of random variables. Functions of random variables. (2)

    Descriptive statistics: data summary and presentation: stem-and-leaf diagrams, frequency distributions, histograms, box plots. (1)

    Parameter estimation. Properties of estimators. Method of moments. (1)

    Interval estimation. Inference on the mean of a population: variance known or unknown. Inference on the variance of a normal population. (1)

    Hypothesis testing about the mean and proportion: small and large samples. (1)

  • Evaluation Scheme

    Quizzes: There will be announced quizzes during class hours. There will be no make-up for missed quizzes.

    Homework: You will be given some homework assignments, which will be announced in class and posted on the course website on Blackboard. You will have one week to submit each homework. Late submissions are accepted up to 3 days late, with a penalty of 15% per day.

    Examination: There will be one mid-term Exam in addition to the final

    Exam. If you miss one of these exams, you must have a university accepted official written excuse to take a make-up.

    Term Project/Paper: The project details, guidelines and evaluation criteria will be provided in due course.

    6

  • Evaluation

    Homework: 5%
    Quizzes: 10%
    Term Project: 10%
    Midterm Exam: 35%
    Final Exam: 40%

    Some of the exercises and in-class examples will be worked only on the whiteboard. It is the student's responsibility to take notes.

    7

  • Our ground rules

    Please switch off or silence your phones when in class.

    Arrive on time. If you are late, just open the door and have a seat quietly. Late arrivals are not accepted after 20 minutes.

    Attendance will be taken at the beginning of class (within the first 10-15 minutes); after that time, late arrivals are considered absent.

    In general, be courteous to others.

    For more details see the course syllabus.

  • GENG200

    Probability and Statistics for Engineers

    Spring 2013

    9


  • An engineer is someone who solves problems of interest to society by the efficient application of scientific principles by

    Refining existing products

    Designing new products or processes

    1-1 The Engineering Method and

    Statistical Thinking

    12

  • 1-1 The Engineering Method and

    Statistical Thinking

    Figure 1.1 The engineering method

    13

  • Engineering Example An engineer is designing a nylon connector to be used in an automotive engine application. The engineer is considering establishing the design specification on wall thickness at 3/32 inch but is somewhat uncertain about the effect of this decision on the connector pull-off force. If the pull-off force is too low, the connector may fail when it is installed in an engine. Eight prototype units are produced and their pull-off forces measured (in pounds): 12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1.

    1-1 The Engineering Method and

    Statistical Thinking 14

  • The field of statistics deals with the collection, presentation, analysis, and use of data to

    Make decisions

    Solve problems

    Design products and processes

    1-1 The Engineering Method and

    Statistical Thinking 15

  • Statistical techniques are useful for describing and understanding variability.

    By variability, we mean successive observations of a system or phenomenon do not produce exactly the same result.

    Statistics gives us a framework for describing this variability and for learning about potential sources of variability.

    1-1 The Engineering Method and

    Statistical Thinking 16

  • Engineering Example The dot diagram is a very useful plot for displaying a small body of data - say up to about 20 observations. This plot allows us to see easily two features of the data; the location, or the middle, and the scatter or variability.

    1-1 The Engineering Method and

    Statistical Thinking 17

  • Engineering Example The engineer considers an alternate design and eight prototypes are built and pull-off force measured. The dot diagram can be used to compare two sets of data

    Figure 1-3 Dot diagram of pull-off force for two wall thicknesses.

    1-1 The Engineering Method and

    Statistical Thinking 18
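    Below is a minimal Python sketch (not part of the original slides) that summarizes the eight pull-off force measurements from the connector example and draws a dot diagram with matplotlib.

    # Dot diagram and summary statistics for the eight pull-off force
    # measurements (in pounds) from the nylon connector example.
    import statistics
    import matplotlib.pyplot as plt

    force = [12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1]

    print("sample mean:", statistics.mean(force))   # location (the middle): 13.0
    print("sample std: ", statistics.stdev(force))  # scatter (variability)

    # One dot per observation along a single axis.
    plt.plot(force, [0] * len(force), "o")
    plt.yticks([])
    plt.xlabel("Pull-off force (pounds)")
    plt.title("Dot diagram of pull-off force")
    plt.show()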

  • Engineering Example Since pull-off force varies or exhibits variability, it is a random variable.

    A random variable, X, can be modeled by

    X = μ + ε

    where μ is a constant and ε a random disturbance.

    1-1 The Engineering Method and

    Statistical Thinking

    ε captures environmental effects, such as test equipment, differences in material, and others.

    19

  • 1-1 The Engineering Method and

    Statistical Thinking

    Examples: Ohm's law and the ideal gas law.

    These laws help us with designing products and processes.

    Data is not available about the whole population (e.g., all nylon connectors produced over the next two weeks, or all connectors sold to customers). So a sample is taken, measured, and considered to represent the whole population.

    20

  • Three basic methods for collecting data:

    A retrospective study using historical data

    An observational study

    A designed experiment

    21

    1-2 Collecting Engineering Data

  • A retrospective study using historical data: It may involve a lot of data, so extracting only the needed data (for solving a specific problem) is not an easy job. The data may contain relatively little useful information.

    Some information could be missing; other information may not have been recorded at all.

    Accuracy of the collected data could be an issue (outliers). Therefore you have to be careful before making general conclusions.

    22

    1-2 Collecting Engineering Data

  • An observational study

    It solves most of the problems of the retrospective method by collecting the needed data (sometimes more) with accuracy.

    To avoid disturbing the current process, some variations of interest cannot be tested. Therefore, experiments on the current system (or a model of it) will be necessary.

    23

    1-2 Collecting Engineering Data

  • 1-2.4 Design of Experiment

    Example 1: For the nylon connectors, two values of thickness are considered, to test their effect on the connectors' pull-off force.

    Therefore, the engineer is interested in determining if there is any difference between the 3/32-inch and the 1/8-inch connectors.

    Hypothesis testing can be used to answer the question.

    24

  • 1-2.4 Design of Experiment

    Example 2: Acetone-Butyl Alcohol Distillation column.

    The process has three factors

    Reboil Temperature

    Condensate Temperature

    Reflux Rate

    We want to study the effect of these factors on the concentration of the produced acetone.

    Each factor has two levels (values) that can be denoted as low (-1) and high (+1).

    So the number of combinations = 2 × 2 × 2 = 8 (enumerated in the sketch below).

    25
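    As a small illustration (not from the slides), the eight factor-level combinations can be enumerated mechanically in Python:

    # Enumerate the 8 runs of the 2x2x2 design for the distillation column.
    from itertools import product

    factors = ["reboil temperature", "condensate temperature", "reflux rate"]
    levels = (-1, +1)  # low / high coding used in the slides

    runs = list(product(levels, repeat=len(factors)))
    for run in runs:
        print(dict(zip(factors, run)))
    print("number of combinations:", len(runs))  # 2**3 = 8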

  • 1-2.4 Design of Experiment 26

  • 1-2.4 Designed Experiments

    Figure 1-5 The factorial design for the distillation column

    27

  • 1-2.4 Designed Experiments

    A four-factorial experiment for the distillation column

    Fourth factor can be considered: type of distillation column

    Number of combinations = 2^4 = 16

    The general form is 2^k: k factors with 2 levels each.

    28

  • 1-2.5 Observing Processes Over Time

    Whenever data are collected over time it is important to plot the data over time. Phenomena that might affect the system or process often become more visible in a time-oriented plot and the concept of stability can be better judged.

    The dot diagram illustrates variation but does not identify the problem.

    The large variation shown in the dot diagram indicates a lot of variability in concentration, but the chart does not help explain the reason for the variation. 29

  • 1-2.5 Observing Processes Over Time

    A time series plot of concentration provides more information than a dot diagram.

    The plot shows a shift in the process mean level and an estimate of the time of the shift can be obtained.

    30

  • 1-2.5 Observing Processes Over Time

    Deming's funnel experiment.

    Strategy 1: No adjustment

    Strategy 2: Adjust the funnel to compensate for the error (in the opposite direction of the error of the previous trial)

    31

    An experiment to understand the nature of variability in processes and systems over time. W. Edwards Deming: a very influential industrial statistician.

  • 1-2.5 Observing Processes Over Time

    Adjustments applied to random disturbances overcontrol the process and increase the deviations from the target.

    32

  • 1-2.5 Observing Processes Over Time

    Process mean shift is detected at observation number 57, and one adjustment (a decrease of two units) reduces the deviations from target.

    33

  • 1-2.6 Observing Processes Over Time

    A control chart for the chemical process concentration data.

    Some disturbance happened at sample # 20 because all of the following observations are below the center line and 2 of them are below the lower limit.

    34

  • 1-2.6 Observing Processes Over Time

    Enumerative versus analytic study.

    Enumerative Study: Collect data from a process to evaluate the current production (sampling in quality control).

    Analytic Study: Use data from current production to evaluate future production. This requires a stable process (use control charts to verify that).

    35

  • 1-3 Mechanistic and Empirical Models

    A mechanistic model is built from our underlying knowledge of the basic physical mechanism that relates several variables.

    Example: Ohm's Law

    Current = voltage/resistance

    I = E/R

    I = E/R + ε

    Due to uncontrolled factors (e.g., temperature, humidity, variations in voltage, and the measuring unit), the actual measured current can differ from E/R; the disturbance term ε accounts for this. 36

  • 1-3 Mechanistic and Empirical Models

    An empirical model is built from our engineering and

    scientific knowledge of the phenomenon, but is not

    directly developed from our theoretical or first-principles

    understanding of the underlying mechanism.

    37

  • 1-3 Mechanistic and Empirical Models

    Example

    Suppose we are interested in the number-average molecular weight (Mn) of a polymer. We know that Mn is related to the viscosity of the material (V), and it also depends on the amount of catalyst (C) and the temperature (T) in the polymerization reactor when the material is manufactured. The relationship between Mn and these variables is

    Mn = f(V, C, T), say, where the form of the function f is unknown. A first-order approximation is Mn = b0 + b1 V + b2 C + b3 T, where the b's are unknown parameters. These parameters can be estimated using the least squares method.

    38

  • Example:

    39

  • 1-3 Mechanistic and Empirical Models

    In general, this type of empirical model is called a regression model.

    The estimated regression line is given by

    40
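    As an illustration only (the data below are made up, NOT the textbook's wire bond measurements), a regression model y = b0 + b1*x1 + b2*x2 can be estimated by least squares with numpy:

    # Least-squares estimation of an empirical (regression) model.
    import numpy as np

    x1 = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])    # first predictor
    x2 = np.array([50., 120., 200., 295., 400., 550.]) # second predictor
    y = np.array([10.1, 15.8, 22.3, 27.9, 34.2, 41.0]) # response

    X = np.column_stack([np.ones_like(x1), x1, x2])    # design matrix [1, x1, x2]
    b, *_ = np.linalg.lstsq(X, y, rcond=None)          # least-squares estimates
    print("estimated b0, b1, b2:", b)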

  • Three-dimensional plot of the wire and pull strength data.

    41

  • Plot of the predicted values of pull strength from the

    empirical model.

    42

  • 1-4 Probability and Probability Models

    Probability models help quantify the risks involved in statistical inference, that is, risks involved in decisions made every day.

    Probability provides the framework for the study and application of statistics.

    Example:

    a container has 25 items. One of them is defective.

    If a sample of size (n) is taken, what is the chance that the defective part will be detected?

    What is the risk of not detecting it?

    Probability models will quantify this risk.

    43
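    A minimal sketch (not in the slides) quantifying that risk: with 1 defective item among 25 and a sample of size n drawn without replacement, the chance of detection works out to n/25.

    # P(defective appears in the sample) for several sample sizes.
    from math import comb

    N, defectives = 25, 1
    for n in (1, 5, 10, 20):
        p_detect = 1 - comb(N - defectives, n) / comb(N, n)  # equals n/25 here
        print(f"n = {n:2d}: P(detect) = {p_detect:.2f}, "
              f"risk of missing = {1 - p_detect:.2f}")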

  • Probability versus Statistics

    Probability deals with predicting the likelihood of future events, while statistics involves the analysis of the frequency of past events.

    Probability is primarily a theoretical branch of mathematics. Statistics is an applied branch of mathematics.

    In summary, probability theory enables us to find the consequences of a given ideal world, while statistical theory enables us to measure the extent to which our world is ideal.

    44

  • Example: Probability The standard example is flipping a fair coin. Fair means, technically, that the probability of heads on a given flip is 50%, and the probability of tails on a given flip is 50%. This doesn't mean that every other flip will give a head; after all, three heads in a row is no surprise.

    Another example would be flipping an unfair coin, where we know ahead of time that there's a 60% chance of heads on each toss.

    A third example would be rolling a loaded die, where (for example) the chances of rolling 1, 2, 3, 4, 5, or 6 are 25%, 5%, 20%, 20%, 20%, and 10%, respectively.

    45

  • Probability

    1

  • Sample Spaces and Events

    2

  • Random Experiment

    3

    An experiment that can result in different outcomes, even though it is repeated in the same manner every time, is called a random experiment.

  • Random Experiments - Example

    Communication system such as voice communication network.

    The information capacity available (number of external lines) to service customers (answer their calls) is an important design consideration.

    Assuming that each line can carry only one conversation, how many lines should be purchased (too few or too many)?

    To answer this question, we need to develop a model that shows the number of calls and the duration of calls.

    Suppose you know that, on average, a call is received every 5 minutes and lasts 5 minutes. Is that enough information? If you deal with this problem in a deterministic manner, how many lines would be purchased?

    4

  • Random Experiments - Example

    Communication system

    5

    Conclusion: considering the variation in our analysis of communication systems is very important. A deterministic analysis based on average values does not match real-life behavior; a stochastic model is needed.

  • Sample Space

    6

    The set of all possible outcomes of a random experiment is called the sample space of the experiment. The sample space is denoted as S.

  • Sample Space - Example

    7

  • Sample Space - Example

    8

  • Sample Spaces Example 2

    9

    Or S = R+ × R+

  • Sample Spaces Example 2

    10

  • Tree Diagrams

    11

    Sample spaces can also be described graphically with tree diagrams.

    When a sample space can be constructed in several steps or stages, we can represent each of the n1 ways of completing the first step as a branch of a tree.

    Each of the ways of completing the second step can be represented as n2 branches starting from the ends of the original branches, and so forth.

  • Tree Diagrams - Example

    12

  • Tree Diagrams - Example

    13 Tree diagram for three messages.

  • Tree Diagrams

    Example 2:

    Assume we have a bag containing 6 balls (1 white, 2 red, 3 black). A random experiment consists of taking two balls from the bag without replacement.

    1- Use a tree diagram to find the sample space.

    S = {WR, WB, BW, BR, BB, RW, RR, RB}

    2- What will the sample space be if the two balls are selected with replacement?

    S = {WW, WR, WB, BW, BR, BB, RW, RR, RB}

    14
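    A minimal Python sketch (not in the slides) that enumerates both sample spaces programmatically:

    # Sample spaces for drawing two balls (1 white, 2 red, 3 black).
    from itertools import product

    colors = ["W", "R", "B"]
    counts = {"W": 1, "R": 2, "B": 3}

    # With replacement: every ordered pair of colors is possible.
    with_repl = {a + b for a, b in product(colors, repeat=2)}

    # Without replacement: a doubled color needs at least 2 such balls,
    # so WW is impossible (there is only one white ball).
    without_repl = {a + b for a, b in product(colors, repeat=2)
                    if a != b or counts[a] >= 2}

    print(sorted(without_repl))  # 8 outcomes, no 'WW'
    print(sorted(with_repl))     # 9 outcomes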

  • Events

    15

    An event is a subset of the sample space of a random experiment.

  • Representing Events with Sets

    16

  • Representing Events with Sets Venn Diagrams

    17

  • Event Representations - Example

    18

    Consider the sample space S = {yy, yn, ny, nn} in Example 2-2. Identify the following events:

    E1: the event in which at least one part conforms
    E2: the event in which at most one part conforms
    E3: the event in which at least one part does not conform
    E4: the event in which both parts do not conform
    E5: the event in which both parts conform

    Find the following: E1 ∪ E3, E1 ∩ E3, E1 ∩ E1′, E4 ∪ E4′, E4 ∩ E5

  • Event Representations - Example

    19

  • Mutually Exclusive Events

    20

  • Probability

    21

  • Probability

    22

  • Interpretations of Probability

    23

    Used to quantify likelihood or chance

    Used to represent risk or uncertainty in engineering applications

    Can be interpreted as our degree of belief or relative frequency

  • Interpretations of Probability

    24

    Relative frequency of corrupted pulses sent over a

    communications channel.

  • Axioms of Probability

    25

  • Equally Likely Outcomes

    26

  • Definition

    27

  • Example

    28

  • Addition Rules: Probability of Union

    29

  • Example

    30

    Suppose one wafer is selected at random.

    Let H denote the event that the wafer contains high level of contamination.

    Let C denote the event that the wafer is in the center of a sputtering tool.

    Find:

    P(H), P(C), P(H ∩ C), P(H ∪ C)

    The table below lists the history of 940 wafers in a semiconductor process.

  • Example Continued

    31

    Suppose one wafer is selected at random.

    Let H denote the event that the wafer contains high level of contamination.

    Let C denote the event that the wafer is in the centre of a sputtering tool.

    Find:

    P(H), P(C), P(H ∩ C), P(H ∪ C)

    P(H) = 358/940

    P(C) = 626/940

    P(H ∩ C) = probability that the wafer is from the centre of the sputtering tool and contains a high level of contamination = 112/940

    P(H ∪ C) = probability that the wafer is from the centre of the sputtering tool, or contains a high level of contamination, or both = P(H) + P(C) − P(H ∩ C) = 872/940
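    A minimal sketch (not in the slides) checking the addition rule on these counts:

    # 940 wafers: 358 with high contamination (H), 626 from the centre (C),
    # 112 in both.
    n, n_H, n_C, n_HC = 940, 358, 626, 112

    P_H, P_C, P_HC = n_H / n, n_C / n, n_HC / n
    P_H_or_C = P_H + P_C - P_HC            # addition rule
    print(P_H_or_C, 872 / 940)             # both ~0.9277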

  • Addition Rules: Mutually Exclusive Events

    32

  • Example

    33

    Example: semiconductor process with more details.

    1-What is the probability that a wafer was either at the edge or that it contains 4 or more particles?

    2-What is the probability that a wafer contains less than 2 particles, or it is both at the edge and contains more than 4 particles?

  • Addition Rules

    34

    What is the probability that a wafer was either at the edge or that it contains 4 or more particles?

    Let E1 denote the event that a wafer contains 4 or more particles

    Let E2 denote the event that a wafer is at the edge

    Then, the required probability is P(E1 ∪ E2)

    P(E1 ∪ E2) = P(E1) + P(E2) − P(E1 ∩ E2) = (0.05 + 0.10) + 0.28 − (0.01 + 0.03) = 0.39

  • Addition Rules

    35

    What is the probability that a wafer contains less than 2 particles, or it is both at the edge and contains more than 4 particles?

    Let E1 denote the event that a wafer contains less than 2 particles

    Let E2 denote the event that a wafer is both at the edge and contains more than 4 particles

    Then, the required probability is P(E1 ∪ E2)

    P(E1 ∪ E2) = P(E1) + P(E2) − P(E1 ∩ E2) = (0.2 + 0.4) + 0.03 − 0 = 0.63

    E1 and E2 are mutually exclusive; i.e., E1 ∩ E2 = ∅.

  • Addition Rules: Union of Three Events

    36

  • Addition Rules: Generalized Mutually Exclusive Events

    37

  • Addition Rules: Generalized Mutually Exclusive Events

    38

    Venn diagram of four mutually exclusive events

  • Conditional Probability

    To introduce conditional probability, consider an example involving manufactured parts.

    Let D denote the event that a part is defective and let F denote the event that a part has a surface flaw.

    Then, we denote the probability of D given, or assuming, that a part has a surface flaw as P(D|F). This notation is read as the conditional probability of D given F, and it is interpreted as the probability that a part is defective, given that the part has a surface flaw.

    39

  • Conditional Probability - Example

    40

    In a manufacturing process, 10% of parts contain visible surface flaws (F), and 25% of these flawed parts are defective (D). 5% of parts without surface flaws are defective.

    P(D|F) = 25%

    P(D|F′) = 5%

  • Conditional Probability Example

    41

    The table below shows an example of 400 parts classified by surface flaws and whether they are functionally defective.

    P(D|F) = 10/40 = 0.25

    P(D|F′) = 18/360 = 0.05

  • Conditional Probability

    42

    Same Last Example:

    P(D|F) = P(D ∩ F)/P(F) = (10/400)/(40/400) = 10/40

    P(F) = 40/400

    P(F|D) = P(F ∩ D)/P(D) = 10/28

    P(D) = P(D|F)P(F) + P(D|F′)P(F′)

    P(D) = 28/400 = (18/360)(360/400) + (10/40)(40/400)
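    A minimal sketch (not in the slides) reproducing these conditional probabilities from the table's counts:

    # 400 parts: 40 with surface flaws (F), of which 10 are defective (D);
    # 360 without flaws, of which 18 are defective.
    n_F, n_DF = 40, 10
    n_noF, n_DnoF = 360, 18
    n = n_F + n_noF                 # 400 parts in total

    print(n_DF / n_F)               # P(D|F)  = 10/40  = 0.25
    print(n_DnoF / n_noF)           # P(D|F') = 18/360 = 0.05
    print((n_DF + n_DnoF) / n)      # P(D)    = 28/400
    print(n_DF / (n_DF + n_DnoF))   # P(F|D)  = 10/28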

  • Multiplication Rule

    43

    10 parts from tool 1

    40 parts from tool 2

    What is the conditional probability that a part from tool 2 is selected at random (without replacement), given that a part from tool 1 is selected first?

    Let E1 denote the event that the first part is from tool 1, and E2 denote the event that the second part is from tool 2.

    P(E2|E1) = ?

  • Multiplication and Total Probability Rules

    Let E denote the outcome that the first part is from tool 1 and the second part is from tool 2.

    Find P(E).

    P(E2|E1) = 40/49 (from the previous slide)

    P(E1) = 10/50

    P(E) combines the probability of selecting a part from tool 1 with the conditional probability of selecting a part from tool 2 given that a part from tool 1 has already been selected.

    P(E) = P(E1 ∩ E2) = P(E1) P(E2|E1) = (10/50)(40/49) = 8/49

    44

  • Sometimes the probability of an event is given under each of several conditions.

    Example:

    In semiconductor manufacturing, the probability that a chip subjected to high levels of contamination causes a product failure is 0.10.

    The probability that a chip not subjected to high levels of contamination causes a product failure is 0.005.

    In a particular production run, 20% of the chips are subjected to high levels of contamination.

    What is the probability that a product using one of these chips fails?

    Total Probability Rule

    45
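    A minimal sketch (not in the slides) applying the total probability rule to this question:

    # P(F) = P(F|H)P(H) + P(F|H')P(H') for the contamination example.
    P_F_given_H = 0.10       # failure probability under high contamination
    P_F_given_notH = 0.005   # failure probability otherwise
    P_H = 0.20               # fraction of chips with high contamination

    P_F = P_F_given_H * P_H + P_F_given_notH * (1 - P_H)
    print(P_F)               # 0.024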

  • The previous example can be summarized as follows:

    Total Probability Rule

    46

  • Understanding the Total Probability Rule

    47

    Total Probability Rule

    Partitioning an event into two mutually exclusive subsets. Partitioning an event into several mutually exclusive subsets.

  • Total Probability Rule (Two Events)

    48

  • Example:

    Multiplication and Total Probability Rules

    49

  • Total Probability Rule: Multiple Events

    50

    Total Probability Rule (multiple events)

  • Independence

    51

    Definition (two events)

  • Independence

    52

    Definition (multiple events)

  • Bayes Theorem

    53

    Remember what we mentioned about total probability rules that

    sometimes information is given in terms of conditional probability.

    But after a random experiment generates an outcome, we may be

    interested in the probability that a condition was present (e.g. high

    contamination) given an outcome (e.g. a semiconductor failure).

    In other words; you would like to know the chance that the failure is due

    to the high contamination condition.

  • Bayes Theorem

    54

    Thomas Bayes addressed this issue in the 18th century and developed Bayes' theorem.

    Example: recall the example solved on slide 64.

    Now we want to know the probability P(H|F), i.e., a failure has happened and we ask for the chance that the high-contamination condition was present.

  • Bayes Theorem

    55

    P(H|F) = P(F|H) P(H)/P(F)

    P(F) was estimated on the previous slide: P(F) = 0.024

    P(H|F) = (0.10 × 0.20)/0.024 = 0.83

    We can use the total probability rule (multiple events) to obtain the general form of Bayes' theorem on the next slide.

  • Bayes Theorem

    56

    Bayes Theorem

    There are many conditions that could cause the outcome B (E1, E2, …, EK).

    All of these conditions are mutually exclusive. What is the chance that the current outcome was caused by the E1 condition?

  • Example: New Medical Procedure

    Because a new medical procedure has been shown to be effective in the early detection of an illness, a medical screening of the population is proposed.

    The probability that the test correctly identifies someone with the illness as positive is 0.99.

    The probability that the test correctly identifies someone without the illness as negative is 0.95.

    The incidence of the illness in the general population is 0.0001.

    You take the test and the result is positive. What is the probability that you have the illness? 57

  • Example: New Medical Procedure

    Let D denote the event that you have the illness,

    and let S denote the event that the test signals positive.

    The probability requested can be denoted as P(D|S).

    Why is it D|S?

    The test result is already known (either positive or negative), so this is the given. Now we are looking for the chance that it was caused by the condition of being ill or not ill. 58

  • Example: New Medical Procedure

    Based on these definitions, what is the given information:

    1) P(D) = 0.0001

    2) P(D′) = 1 − 0.0001

    3) P(S|D) = 0.99

    4) The probability that the test correctly signals someone without the illness as negative is 0.95. Consequently, the probability of a positive test without the illness is P(S|D′) = 0.05.

    59

  • Example: New Medical Procedure

    From Bayes Theorem:

    60

    P(D|S) = P(S|D) P(D) / [P(S|D) P(D) + P(S|D′) P(D′)]

    = 0.99(0.0001) / [0.99(0.0001) + 0.05(1 − 0.0001)]

    = 0.002 ≈ 1/506
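    A minimal sketch (not in the slides) of the same computation in Python:

    # Bayes' theorem for the screening example.
    P_D = 0.0001           # incidence of the illness
    P_S_given_D = 0.99     # P(positive | ill)
    P_S_given_notD = 0.05  # P(positive | not ill)

    P_S = P_S_given_D * P_D + P_S_given_notD * (1 - P_D)  # total probability
    print(P_S_given_D * P_D / P_S)   # P(D|S) ~ 0.00198, about 1 in 506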

  • Example: New Medical Procedure

    That is, the probability of having the illness given a positive result from the test is only 0.2% (about 1 in 500).

    Surprisingly, even though the test is effective, in the sense that P(S|D) is high and P(S|D′) is low, because the incidence of the illness in the general population is low, the chances are quite small that you actually have the disease even if the test is positive. 61

  • Chapter 3

    1

  • Random Variables

    2

  • Random Variables

    3

  • Introductory Example

    Suppose that a day's production of 850 manufactured parts contains 50 parts that do not conform to customer requirements.

    Two parts are selected at random, without replacement, from the batch.

    Let the random variable X equal the number of nonconforming parts in the sample.

    What are all the possible values of X, and what are the associated probabilities?

    This case is generalized on the next slide. 4

  • 3-1 Discrete Random Variables

    Many physical systems can be modeled by the same or

    similar random experiments and random variables.

    The distribution of the random variables involved in each

    of these common systems can be analyzed, and the results of

    the analysis can be used in different applications and

    examples.

    So instead of using the sample space of the random

    experiment we describe the distribution of a particular

    random variable.

    5

  • 3-1 Discrete Random Variables

    Example: Work Sampling

    A voice communication system for a business contains 48 external lines.

    At a particular time, the system is observed and some of the lines are used.

    X denotes the number of lines in use (0 ≤ X ≤ 48).

  • 3-2 Probability Distributions and

    Probability Mass Functions

    Probability Distribution: The probability distribution of a random variable X is a description of the probabilities associated with the possible values of X. For a discrete random variable, the distribution is often specified by just a list of the possible values along with the probability of each.

    In some cases, it is convenient to express the probability in terms of a formula.

    7

  • 3-2 Probability Distributions and

    Probability Mass Functions

    Example: Bits in error

    There is a chance that a bit transmitted in a digital transmission channel is received in error.

    Let X equal the number of bits in error in the next 4 bits transmitted. The possible values of X are (0,1,2,3,4).

    The probability distribution of X can be specified by the possible

    values along with the probability of each:

    P(X=0) = 0.656 P(X=1) = 0.292 P(X=2) = 0.049

    P(X=3) = 0.004 P(X=4) = 0.0001

    8

  • 3-2 Probability Distributions and

    Probability Mass Functions

    Probability distribution for bits in error. 9

  • 3-2 Probability Distributions and

    Probability Mass Functions

    Loadings at discrete points on a long, thin beam.

    Example: Loading on a long beam. At each discrete values of X, what is the probability of a certain weight.

    10

  • 3-2 Probability Distributions and

    Probability Mass Functions

    Definition

    11

  • Example

    X: a random variable denoting the number of semiconductor

    wafers that need to be analyzed to detect a large particle of

    contamination.

    Probability that a wafer contains a large particle is 1%.

    Determine the probability distribution of X.

    Let p denote a wafer on which a large particle is present

    Let a denote a wafer on which a large particle is absent 12

  • Example

    The sample space is infinite and consists of all sequences of a's that end with p.

    S = {p, ap, aap, aaap, aaaap, and so forth}

    Examples: P(X=1) = P(p) = 0.01

    P(X=2) = P(ap) = 0.99*0.01 = 0.0099

    13

  • Example 3-5 (continued)

    Describing the probabilities associated with X by the formula P(X = x) = 0.99^(x−1)(0.01), for x = 1, 2, 3, …, is the simplest method of describing the distribution in this example.

    14

  • 3-3 Cumulative Distribution Functions

    For the example of bits in error (slide # 9), if we would like to find P(X ≤ 3), we know that it is the union of the mutually exclusive events X=0, X=1, X=2, and X=3.

    Hence P(X ≤ 3) is the summation (or accumulation) of the probabilities of these events:

    P(X ≤ 3) = P(X=0) + P(X=1) + P(X=2) + P(X=3) = 0.9999

    Using the cumulative probabilities is another way to

    represent the probability distribution of a random variable.

    15

  • 3-3 Cumulative Distribution Functions

    Definition

    16

  • Example 3-8

    Suppose that a day's production of 850 manufactured parts contains 50 parts that do not conform to customer requirements.

    Two parts are selected at random, without replacement, from the batch.

    Let the random variable X equal the number of nonconforming parts in the sample.

    What is the probability mass function of X? What is the cumulative distribution function of X?

    17

  • Example 3-8

    X={0, 1, 2}

    18

    probability mass function

    cumulative distribution

  • Example 3-8

    Cumulative distribution function for Example 3-8.

    19
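    A minimal sketch (not in the slides) computing the probability mass and cumulative distribution functions for this example:

    # 850 parts, 50 nonconforming; sample of 2 drawn without replacement.
    from math import comb

    N, K, n = 850, 50, 2
    pmf = {x: comb(K, x) * comb(N - K, n - x) / comb(N, n) for x in range(n + 1)}
    cdf = {x: sum(pmf[k] for k in range(x + 1)) for x in range(n + 1)}
    print(pmf)  # f(0) ~ 0.886, f(1) ~ 0.111, f(2) ~ 0.003
    print(cdf)  # F(2) = 1.0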

  • Probability Distributions and Probability Mass Functions

    20

  • 21

  • 3-4 Mean and Variance of a Discrete

    Random Variable

    Definition

    22

  • 3-4 Mean and Variance of a Discrete

    Random Variable

    A probability distribution can be viewed as a loading with the mean

    equal to the balance point.

    Parts (a) and (b) illustrate equal means, but Part (a) illustrates a larger

    variance.

    23

  • 3-4 Mean and Variance of a Discrete

    Random Variable

    The probability distribution illustrated in Parts (a) and (b) differ even

    though they have equal means and equal variances.

    What does the length of each line represent in the above graphs? 24

  • Example 3-11

    Determine the mean and standard deviation of the number of messages sent per hour.

    The number of messages sent per hour over a computer network has the following distribution:

    25

  • 3-4 Mean and Variance of a Discrete

    Random Variable

    Expected Value of a Function [h(X)] of a Discrete Random Variable

    Where h(X) is any function of the random variable X.

    Example: the expected value of the function (X − μ)² is the variance of the random variable X (see slide # 19).

    26

  • 3-4 Mean and Variance of a Discrete

    Random Variable

    Example: Bits in error (see slide #9)

    Let X equal the number of bits in error in the next 4 bits transmitted. P(X=0) = 0.656; P(X=1) = 0.292; P(X=2) = 0.049; P(X=3) = 0.004; P(X=4) = 0.0001

    What is the expected value of X²? h(X) = X²

    E[h(X)] = 0²(0.656) + 1²(0.292) + 2²(0.049) + 3²(0.004) + 4²(0.0001) = 0.52

    27
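    A minimal sketch (not in the slides) of this computation:

    # E[h(X)] with h(X) = X**2 for the bits-in-error distribution.
    pmf = {0: 0.656, 1: 0.292, 2: 0.049, 3: 0.004, 4: 0.0001}
    print(sum(x**2 * p for x, p in pmf.items()))  # ~0.52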

  • Chapter 3

    1

    Outline Discrete Uniform Distribution Binomial Distribution Geometric Distribution Poisson Distribution

  • 3-5 Discrete Uniform Distribution

    The first digit of a part's serial number is equally likely to be any one of the digits 0 to 9. If a part is selected from a batch, what is the probability that the first digit is 8? The first digit is a discrete random variable with range

    R = {0, 1, 2, …, 9}

    The probability of each value in this range is equal. Hence, f(x) = 1/10 = 0.1 = 10%

    Example of a serial number of a product: 723-975632188

    2

  • 3-5 Discrete Uniform Distribution

    Probability mass function for a discrete uniform random variable.

    3

  • 3-5 Discrete Uniform Distribution

    Definition

    4

  • 3-5 Discrete Uniform Distribution

    Estimating the Mean and Variance

    Example: tossing a fair die.

    Face (B)   Probability (C)   B×C    (B−3.5)² (E)   C×E
    1          1/6               1/6    6.25           6.25/6
    2          1/6               2/6    2.25           2.25/6
    3          1/6               3/6    0.25           0.25/6
    4          1/6               4/6    0.25           0.25/6
    5          1/6               5/6    2.25           2.25/6
    6          1/6               6/6    6.25           6.25/6

    Sum: mean = 3.5, variance = 2.91667

    5

  • 3-5 Discrete Uniform Distribution

    Mean and Variance

    6

  • 3-5 Discrete Uniform Distribution

    Example of work sampling

    A voice communication system for a business contains 48 external lines. At a particular time, the system is observed and some of the lines are in use. X denotes the number of lines in use (0 ≤ X ≤ 48). Assume X is uniformly distributed over the range 0-48.

    E(X) = (48 + 0)/2 = 24

    σ = {[(48 − 0 + 1)² − 1]/12}^0.5 = 14.14 = standard deviation

    Let Y denote the proportion of the 48 lines that are in use at a particular time. Then Y = X/48.

    E(Y) = E(X)/48 = 24/48 = 0.5 = 50%

    V(Y) = V(X)/48² = 0.087

    7
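    A minimal sketch (not in the slides) of these calculations:

    # Discrete uniform on 0..48: number of lines in use.
    a, b = 0, 48
    E_X = (a + b) / 2                      # 24 lines
    V_X = ((b - a + 1) ** 2 - 1) / 12      # 200
    print(E_X, V_X ** 0.5)                 # std dev ~14.14

    # Y = X/48, the proportion of lines in use.
    print(E_X / 48, V_X / 48**2)           # 0.5 and ~0.087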

  • 8

    End of Discrete Uniform Distribution

  • Bernoulli Distribution

    Any example of a random experiment with (only) two possible outcomes?

    Examples:

    Tossing a coin. The gender of an expected baby. The result of a basketball match. Whether a produced part is good or defective.

    Two options:

    success with probability p

    failure with probability 1 − p

    9

  • Bernoulli Distribution

    What if a Bernoulli trial is repeated n times?

    Jacob Bernoulli (1655-1705)

  • 3-6 Binomial Distribution

    Random experiments and random variables

    11

  • 3-6 Binomial Distribution

    Random experiments and random variables

    12

  • 3-6 Binomial Distribution

    13

    Given n Bernoulli trials,

    how many successes and how many failures?

    e.g., the probability of the particular sequence success, failure, failure, success, failure, failure is P(1−P)(1−P)P(1−P)(1−P).

  • 3-6 Binomial Distribution

    Example:

    The chance that a bit transmitted through a digital

    transmission channel is received in error is 0.1.

    Assume that the transmission trials are independent.

    Let X = the number of bits in error in the next 4 bits

    transmitted.

    Determine P(X=3). 14

  • 3-6 Binomial Distribution

    Let E denote a bit in error, and O denote the bit is fine.

    The outcomes can be represented as follows:

    15
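    A minimal sketch (not in the slides) of the binomial computation:

    # P(X = 3) for n = 4 bits, each in error with probability p = 0.1.
    from math import comb

    n, p, x = 4, 0.1, 3
    print(comb(n, x) * p**x * (1 - p) ** (n - x))  # 4 * 0.001 * 0.9 = 0.0036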

  • 3-6 Binomial Distribution

    Definition

    16

  • 3-6 Binomial Distribution

    Mean and Variance

    17

  • 3-6 Binomial Distribution

    Binomial distributions for selected values of n and p.

    For a fixed n, the distribution becomes more symmetric as p increases from 0 to 0.5 or decreases from 1 to 0.5.

    For a fixed p, the distribution becomes more symmetric as n increases. 18

  • 3-6 Binomial Distribution

    Example

    19 P(X = x) = (number of outcomes that result in x errors) × (0.1)^x (0.9)^(16−x)

  • 3-6 Binomial Distribution

    Example (cont.)

    20

  • 3-6 Binomial Distribution

    Example (cont.)

    21

  • 3-6 Binomial Distribution

    Example

    For the number of transmitted bits received in error:

    n = 4, p=0.1,

    So E(X) = 4(0.1) = 0.4

    V(X) = 4(0.1)(0.9) = 0.36

    22

  • End Binomial Distribution

  • 3-7 Geometric Distribution

    24

    How many trials (including the success) are needed to get a success for the first time?

    (1−P) (1−P) (1−P) (1−P) P — success for the first time

    The number of successes is known (= 1); the number of trials is unknown (= the random variable).

  • 3-7 Geometric Distribution

    Example

    25

    1- What is the sample space?
    2- What are the possible values of X?
    3- What is the event associated with X=5?
    4- Calculate the probability of X=5, P(X=5)
    5- Derive the probability mass function of X

  • 3-7 Geometric Distribution

    Example

    26

  • 3-7 Geometric Distribution

    Definition

    27

  • 3-7 Geometric Distribution

    Definition

    28

  • 3-7 Geometric Distribution

    Geometric distributions for selected values of the parameter p.

    The probability decreases as in a geometric series. That is why it is called the geometric distribution.

    29

  • 3-7 Geometric Distribution

    Example X: a random variable denoting the number of semiconductor

    wafers that need to be analyzed to detect a large particle of

    contamination (see slide # 13).

    Probability that a wafer contains a large particle is 1%.

    What is the probability that exactly 125 wafers will be

    analyzed to find the first wafer with a large particle?

    30

  • 3-7 Geometric Distribution

    Example

    Let X denote the number of samples analyzed until a large

    particle is detected.

    Then X is a geometric random variable with p = 0.01.

    The requested probability is:

    31
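    A minimal sketch (not in the slides) of the geometric computation:

    # First wafer with a large particle is the 125th analyzed, p = 0.01.
    p, x = 0.01, 125
    print((1 - p) ** (x - 1) * p)  # ~0.0029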

  • 3-7 Geometric Distribution

    Example

    32

    X: a random variable denoting the number of times a die needs to be thrown to get face number 2 on top.

    What is the probability that exactly 4 trials will be needed to get face number 2 on top for the first time?

    P(X = 14 | X > 10) = ?

  • 3-7 Geometric Distribution

    Lack of Memory Property

    P(X = 106 | X > 100) = P(X = 6)

    33

  • 3-7 Geometric Distribution

    Lack of Memory Property

    34

  • Poisson Distribution

    Example

    Consider the transmission of n bits over a digital communication channel.

    Let the RV equal the number of bits in error.

    When the probability that a bit is in error is constant and the transmissions are independent, X has a binomial distribution.

    Let p denote the probability that a bit is in error.

    Let λ = pn. Then E(X) = pn = λ. 35

  • Poisson Distribution

    Now, suppose that the number of bits transmitted increases and the probability of an error decreases exactly enough that pn remains equal to a constant. That is, n increases and p decreases accordingly, such that E(X) = pn = λ remains constant.

    Then, with some work on the binomial probability, it can be shown that in the limit

    P(X = x) = e^(−λ) λ^x / x!,  x = 0, 1, 2, …

    36

  • Poisson Distribution

    So that:

    Also, because the number of bits transmitted tends to

    infinity, the number of errors can equal any nonnegative

    integer.

    Therefore, the range of X is the integers from zero to

    infinity. 37

  • Poisson Distribution

    Example: Flaws occur at random along the length of a thin copper wire.

    Let X denote the RV that counts the number of flaws in a length of L millimeters of wire, and suppose the average number of flaws in L is λ.

    Partition the length of wire into n subintervals of small length, say 1 micrometer each.

    38

  • Poisson Distribution

    Example

    If the subinterval chosen is small enough, the probability

    that more than one flaw occurs in the subinterval is

    negligible.

    Every subinterval has the same probability of containing

    a flaw, say p.

    Finally, if we assume that the probability that a

    subinterval contains a flaw is independent of other

    subintervals, then:

    39

  • Poisson Distribution

    Example

    E(X) = λ = np

    Then p = λ/n

    With small enough subintervals, n is very large and p is

    very small, the distribution of X can be obtained as in

    the previous example.

    40

  • Definition

    Poisson Distribution

    41

  • Consistent Units

    Poisson Distribution

    42

  • Poisson Distribution

    Example

    Contamination is a problem in the manufacture of optical

    storage disks.

    The number of particles of contamination that occur on an

    optical disk has a Poisson distribution, and the average number

    of particles per centimeter squared of media surface is 0.1.

    The area of a disk under study is 100 squared centimeters.

    Find the probability that 12 particles occur in the area of a disk

    under study.

    43

  • Poisson Distribution

    Let X denote the number of particles in the area of a disk

    under study.

    Because the mean number of particles is 0.1 particles per cm² and the disk area is 100 cm², then:

    E(X) = 100 × 0.1 = 10 particles

    Therefore:

    P(X = 12) = e^(−10) 10¹²/12! ≈ 0.095

    44

  • Poisson Distribution

    The probability that zero particles occur in the area of the disk is:

    P(X = 0) = e^(−10) ≈ 4.54 × 10⁻⁵

    Determine the probability that 12 or fewer particles occur in the area of the disk under study. The probability is:

    P(X ≤ 12) = Σ (x = 0 to 12) e^(−10) 10^x / x!

    You need a computer to evaluate it! 45
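    A minimal sketch (not in the slides) doing that computation:

    # Poisson probabilities for the optical-disk example, lambda = 10.
    from math import exp, factorial

    lam = 0.1 * 100                          # 0.1 particles/cm^2 over 100 cm^2

    def pmf(x):
        return exp(-lam) * lam**x / factorial(x)

    print(pmf(12))                           # ~0.095
    print(pmf(0))                            # e**-10 ~ 4.54e-05
    print(sum(pmf(x) for x in range(13)))    # P(X <= 12) ~ 0.79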

  • Mean and Variance

    Poisson Distribution

    46

  • Poisson Distribution

    47

  • Poisson Distribution

    Suppose that the number of flaws follows a Poisson distribution with a mean of 2.3 flaws per millimeter.

    Determine the probability of exactly 2 flaws in 1 millimeter of wire:

    P(X = 2) = e^(−2.3) 2.3²/2! = 0.265

    Determine the probability of 10 flaws in 5 millimeters of wire:

    E(X) = 5 mm × 2.3 flaws/mm = 11.5 flaws

    P(X = 10) = e^(−11.5) 11.5¹⁰/10! = 0.113

    Example: The copper wire

    48

  • Poisson Distribution

    Determine the probability of at least 1 flaw in 2 millimeters of wire.

    E(X) = 2 mm × 2.3 flaws/mm = 4.6 flaws

    P(X ≥ 1) = 1 − P(X = 0) = 1 − e^(−4.6) = 0.9899

    Example: The copper wire

    49

  • Example 3-118

    The number of failures of a testing instrument from contamination particles on the product is a Poisson random variable with a mean of 0.02 per hour.

    (a) what is the probability that the instrument does not fail in an 8-hour shift?

    (b) what is the probability of at least one failure in a 24-hour day?

    50
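    A minimal sketch (not in the slides) of one way to answer these questions:

    # N_t ~ Poisson(0.02 * t) failures in t hours: P(N_t = 0) = exp(-0.02 * t).
    from math import exp

    rate = 0.02
    print(exp(-rate * 8))       # (a) P(no failure in an 8-hour shift) ~ 0.852
    print(1 - exp(-rate * 24))  # (b) P(at least one failure in 24 hours) ~ 0.381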

  • Example 3 131

    The probability that your call to a service line is answered in less than 30 seconds is 0.75. Assume that your calls are independent.

    If you call 10 times, what is the probability that exactly 9 of your calls are answered within 30 seconds?

    If you call 20 times, what is the probability that at least 16 calls are answered in less than 30 seconds?

    If you call 20 times, what is the mean number of calls that are answered in less than 30 seconds?

    51
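    A minimal sketch (not in the slides) of one way to answer these questions:

    # Calls answered within 30 s: X ~ Binomial(n, p = 0.75).
    from math import comb

    def binom_pmf(n, p, x):
        return comb(n, x) * p**x * (1 - p) ** (n - x)

    print(binom_pmf(10, 0.75, 9))                              # ~0.188
    print(sum(binom_pmf(20, 0.75, x) for x in range(16, 21)))  # P(X >= 16)
    print(20 * 0.75)                                           # mean: 15 calls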

  • 4-1 Continuous Random Variables

    2

    The objective of this chapter is to introduce some continuous random distributions.

  • 4-2 Probability Distributions and

    Probability Density Functions

    Density function of a loading on a long, thin beam.

    Density functions are commonly used to describe physical systems.

    The load over the interval can be found by integrating the density function over the interval.

    You cannot find the load at a discrete point, but you can find the load over an interval of length. How?

  • 4-2 Probability Distributions and

    Probability Density Functions

    Probability determined from the area under f(x).

    Similar to a discrete RV, the probability density function f(x) can be used to describe the probability distribution of a continuous random variable X.

    The probability that the random variable X is between points a and b is the integral of f(x) from a to b.

  • 4-2 Probability Distributions and

    Probability Density Functions

    Definition

  • 4-2 Probability Distributions and

    Probability Density Functions

    Histogram approximates a probability density function.

    Probability of each interval

  • 4-2 Probability Distributions and

    Probability Density Functions

    P(X=x) = 0

  • 4-2 Probability Distributions and

    Probability Density Functions

    Example

    If a part with a diameter larger than 12.60 millimeters is scrapped, what proportion of parts is scrapped?

    A part is scrapped if X > 12.60.

  • 4-2 Probability Distributions and

    Probability Density Functions

    Probability density function for previous example

  • 4-2 Probability Distributions and

    Probability Density Functions

    Example (cont.)
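    A minimal sketch (not in the slides); the slide's density appeared only as a figure, so the form used below, f(x) = 20 e^(−20(x − 12.5)) for x ≥ 12.5, is an ASSUMPTION based on the textbook's version of this drilling example:

    # Proportion of scrapped parts, P(X > 12.60), under the assumed density.
    from math import exp

    def F(x):  # cumulative distribution function of the assumed density
        return 0.0 if x < 12.5 else 1.0 - exp(-20 * (x - 12.5))

    print(1 - F(12.60))  # e**-2 ~ 0.135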

  • 4-3 Cumulative Distribution Functions

    Definition

  • 4-3 Cumulative Distribution Functions

    Example

    For the drilling operation in the previous example, F(x) consists of two expressions:

    And for 12.5 ≤ x:

  • 4-3 Cumulative Distribution Functions

    Example (cont.)

    Therefore,

    The figure below displays a graph of F(x).

    Cumulative distribution function

  • 4-4 Mean and Variance of a Continuous

    Random Variable

    Definition

  • 4-4 Mean and Variance of a Continuous

    Random Variable

    Expected Value of a Function of a Continuous

    Random Variable

  • Exercises in section 4-2, 4-3, 4-4

  • UNIFORM DISTRIBUTION

    NORMAL DISTRIBUTION

    Exponential Distribution

  • 4-5 Continuous Uniform Random

    Variable

    Definition

  • 4-5 Continuous Uniform Random

    Variable

    Continuous uniform probability density function.

  • 4-5 Continuous Uniform Random

    Variable

    Mean and Variance

  • 4-5 Continuous Uniform Random

    Variable

    Example

    What is the probability that a measurement of current is between 5 and 10 mA?

    X Continuous Uniform Random Variable with a range of 0 to 20 mA (i.e., a=0 and b=20).

    f(x) = 1/(20 − 0) = 0.05, so P(5 < X < 10) = (10 − 5)(0.05) = 0.25

  • 4-5 Continuous Uniform Random

    Variable

    Example: The mean and variance formulas can be applied with a = 0 and b = 20. Therefore, E(X) = 10 mA and V(X) = 20²/12 = 33.33 mA².

    Consequently, the standard deviation of X is 5.77 mA.

  • 4-5 Continuous Uniform Random

    Variable

    The cumulative distribution function of a continuous uniform random variable is obtained by integration. If a ≤ x ≤ b, F(x) = (x − a)/(b − a).

    Therefore, the complete description of the cumulative distribution function of a continuous uniform random variable is: F(x) = 0 for x < a; F(x) = (x − a)/(b − a) for a ≤ x < b; F(x) = 1 for b ≤ x.
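    A minimal sketch (not in the slides) for the current example:

    # X uniform on [0, 20] mA.
    a, b = 0.0, 20.0
    print((10 - 5) / (b - a))          # P(5 < X < 10) = 0.25
    print((a + b) / 2)                 # E(X) = 10 mA
    print(((b - a) ** 2 / 12) ** 0.5)  # standard deviation ~5.77 mA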

  • Exercise

    A movie theatre scheduled three sessions for a movie at 5:00pm, 7:00pm and 9:00pm.

    Once the movie starts, the gate will be closed. A visitor will arrive at the movie theatre at a time uniformly distributed between 4:00pm and 9:00pm (i.e., T ~ U[4:00, 9:00]).

    Determine the cumulative distribution function of the arrival time in minutes (hint: set 4:00pm = 0 min).

    Use the cumulative distribution function to determine the following:

    1- The probability that the visitor attends at least one movie session

    2- The probability that the visitor waits more than 20 min for a movie session

    3- The probability that the visitor waits less than 10 min for a movie session

    4- The probability that the visitor attends the second show knowing that he missed the first show

  • Determine the cumulative distribution function of the arrival time in minutes (hint: set 4:00pm = 0 min)

    T ~ U[0, 300], so

    F(t) = 0 for t < 0; F(t) = t/300 for 0 ≤ t < 300; F(t) = 1 for 300 ≤ t.

    1- The probability that the visitor attends at least one movie session

    According to the arrival distribution, the visitor will come at some time between 4pm and 9pm (0 and 300 minutes), which means he will attend one movie session for sure (worst case, he arrives at 9pm and attends the third session):

    P(attending at least one movie session) = P(0 ≤ T ≤ 300) = F(300) − F(0) = 1

    2- The probability that the visitor waits more than 20 min for a movie session

    The sessions start at t = 60, 180 and 300 minutes, so the visitor waits more than 20 minutes if he arrives in [0, 40], [60, 160] or [180, 280]:

    P = P(0 ≤ T ≤ 40) + P(60 ≤ T ≤ 160) + P(180 ≤ T ≤ 280) = (40 + 100 + 100)/300 = 240/300 = 0.8

    3- The probability that the visitor waits less than 10 min for a movie session

    P = P(50 ≤ T ≤ 60) + P(170 ≤ T ≤ 180) + P(290 ≤ T ≤ 300) = (10 + 10 + 10)/300 = 30/300 = 0.1

    4- The probability that the visitor attends the second show knowing that he missed the first show

    Let A be the event of being able to attend the second session (60 < T ≤ 180), and let B be the event of missing the first session (T > 60). Then

    P(A|B) = P(60 < T ≤ 180)/P(T > 60) = [F(180) − F(60)]/[1 − F(60)] = (120/300)/(240/300) = 120/240 = 0.5
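    A minimal simulation sketch (not in the slides) cross-checking these answers:

    # Arrival uniform on [0, 300] minutes; sessions at t = 60, 180, 300.
    import random

    sessions = [60, 180, 300]
    N = 200_000
    arrivals = [random.uniform(0, 300) for _ in range(N)]
    waits = [min(s - t for s in sessions if s >= t) for t in arrivals]

    print(sum(w > 20 for w in waits) / N)   # ~0.8
    print(sum(w < 10 for w in waits) / N)   # ~0.1
    missed_first = [t for t in arrivals if t > 60]
    print(sum(t <= 180 for t in missed_first) / len(missed_first))  # ~0.5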

  • 4-6 Normal Distribution

    Definition

  • 4-6 Normal Distribution

    Normal probability density functions for selected values of the parameters μ and σ².

  • 4-6 Normal Distribution

  • 4-6 Normal Distribution

    Definition : Standard Normal

  • 4-6 Normal Distribution

    Example

    Standard normal probability density function

    Assume that Z is a standard normal random variable.

    Find P(Z ≤ 1.5).

    Appendix Table II provides P(Z ≤ z): read the row for 1.5 and the column for 0.00, since z = 1.50 = 1.5 + 0.00.

    P(Z ≤ 1.5) = 0.9332

    P(Z ≤ 1.93): z = 1.93 = 1.9 + 0.03, so P(Z ≤ 1.93) = 0.9732

    P(Z ≤ 1.938) ≈ P(Z ≤ 1.93) or P(Z ≤ 1.94)


  • 4-6 Normal Distribution

    Example

    Find P(Z ≤ 1.53): z = 1.53 = 1.5 + 0.03.

  • 4-6 Normal Distribution

    Standardizing

  • 4-6 Normal Distribution

    Example

    Let X denote the current in mA; here X is normal with mean 10 and standard deviation 2.

    The requested probability is P(X < 13).

    Transform it to Z = (X − 10)/2.

    So X < 13 corresponds to Z < (13 − 10)/2 = 1.5.

    Hence we look up P(Z < 1.5) in the standard normal table; the table gives P(Z < 1.5) = 0.9332.

  • 4-6 Normal Distribution

    Standardizing a normal random variable.

  • 4-6 Normal Distribution

    To Calculate Probability

  • 4-6 Normal Distribution

    Example (cont.)

    Based on the previous example, what is the probability that the current

    is between 9 and 11 mA?

  • 4-6 Normal Distribution

    Example (cont.)

    Determine the value for which the probability that a current is below it is 0.98.

    So we need x such that P(X ≤ x) = 0.98.

    Determining the value of x to meet a specified probability.

  • 4-6 Normal Distribution

    Example (cont.)

    From Appendix Table II, we can find the value of z that gives a probability of 0.98: z = 2.05.

    2.05 = (x − 10)/2, so x = 14.1 mA.
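    A minimal sketch (not in the slides) reproducing these lookups with scipy instead of Appendix Table II:

    from scipy.stats import norm

    print(norm.cdf(1.5))                   # P(Z <= 1.5) ~ 0.9332
    z = norm.ppf(0.98)                     # z with P(Z <= z) = 0.98, ~2.054
    print(10 + 2 * z)                      # x = mu + sigma*z ~ 14.1 mA

    # Current example: X ~ N(mu = 10, sigma = 2).
    print(norm.cdf(13, loc=10, scale=2))   # P(X < 13)
    print(norm.cdf(11, loc=10, scale=2) - norm.cdf(9, loc=10, scale=2))  # P(9 < X < 11)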

  • 4-8 Exponential Distribution

    Definition

  • 4-8 Exponential Distribution

    Mean and Variance

  • 4-8 Exponential Distribution

    Example

    In a large corporate computer network, user log-ons to the system can be modeled as a Poisson process with a mean of 25 log-ons per hour.

    What is the probability that there are no log-ons in an

    interval of 6 minutes?

  • 4-8 Exponential Distribution

    Let X denote the time in hours from the start of the interval until the first log-on.

    Then X has an exponential distribution with λ = 25 log-ons/hour.

    Required: P(X > 6 minutes)

    λ is given per hour, so we need to express all times in hours:

    6 minutes = 0.1 hour!

    Example (cont.)
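    A minimal sketch (not in the slides) of the computation:

    # X exponential with rate 25 per hour; P(no log-on in 6 min) = P(X > 0.1 h).
    from math import exp
    print(exp(-25 * 0.1))  # ~0.082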

  • 4-8 Exponential Distribution

    Probability for the exponential distribution in Example 4-21.

    Example (cont.)

  • 4-8 Exponential Distribution

    Example (cont.)

  • 4-8 Exponential Distribution

    Example (cont.)

  • 4-8 Exponential Distribution

    Example (cont.)

  • 4-8 Exponential Distribution

    Example 2

    Let X denote the time between detections of a particle with a Geiger counter.

    Assume that X has an exponential distribution with a mean of 1.4 minutes.

    The probability that we detect a particle within 30 seconds of starting the counter is:

    P(X < 0.5 minute) = 1 − e^(−0.5/1.4) = 0.30

  • 4-8 Exponential Distribution

    Example 2 (cont.)

    Now suppose we turn on the Geiger counter and wait 3 minutes without detecting a particle.

    What is the probability that a particle is detected in the next 30 seconds?

    Do you think that the probability will be higher than 0.3?

    No.

    Why?

    This is the nature of the exponential distribution!

    Prove it!

  • 4-8 Exponential Distribution

    This situation can be expressed as the conditional probability P(X < 3.5 | X > 3).

    P(X < 3.5 | X > 3) = P(3 < X < 3.5) / P(X > 3),

    where

    Example 2 (cont.)

  • 4-8 Exponential Distribution

    Example (cont.)

    P(3 < X < 3.5) = 0.035 and P(X > 3) = 0.117.

    Therefore P(X < 3.5 | X > 3) = 0.035/0.117 = 0.30.

    The fact that you waited 3 minutes without a detection does not change the probability of a detection in the next 30 seconds.
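    A minimal sketch (not in the slides) verifying the lack-of-memory numbers:

    # X exponential with mean 1.4 minutes.
    from math import exp

    lam = 1 / 1.4                           # rate per minute
    surv = lambda t: exp(-lam * t)          # P(X > t)

    print((surv(3) - surv(3.5)) / surv(3))  # P(X < 3.5 | X > 3) ~ 0.30
    print(1 - surv(0.5))                    # P(X < 0.5), also ~0.30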

  • 4-8 Exponential Distribution

    Lack of Memory Property

    So we have two distributions with this property: one for discrete RVs (geometric) and one for continuous RVs (exponential).

  • 5-2 TWO OR MORE CONTINUOUS RANDOM VARIABLES

  • + 5-1 Two Discrete Random Variables

    In previous chapters, we studied the probability distributions

    of a single random variable.

    Sometimes it is useful to study more than one random

    variable in a random experiment.

    Example 1: transmitted signals

    X= the number of high quality signals

    Y = the number of low quality signals

  • + 5-1 Two Discrete Random Variables

    Example 2: Injection molded parts

    X = length of dimension of injected part

    Y = length of another dimension of the injected part

    Let the specs for X be 2.95 to 3.05, and for Y 7.60 to 7.80.

    We may be interested in the probability that a part satisfies both specs; that is, P(2.95 ≤ X ≤ 3.05 and 7.60 ≤ Y ≤ 7.80).

    In general, if X and Y are two random variables, the probability distribution that defines their simultaneous behavior is called a joint probability distribution.

  • + 5-1 Two Discrete Random Variables

    Example 5-1

    Calls are made to check the airline schedule at your departure city.

    You monitor the number of bars of signal strength on your cell phone

    and the number of times you have to state the name of your departure

    city before the voice system recognizes the name.

    Let:

    X denote the number of bars of signal strength on your cell phone

    Y denote the number of times you need to state your departure city

  • +Joint probability distribution of X and Y in Example 5-1.

    By specifying the probability of each of the points, we define the range of the random variables (X, Y) to be the set of points (x, y) in two-dimensional space for which the probability that X = x and Y = y is positive.

    The joint probability distribution of the two random variables is sometimes referred to as the bivariate probability distribution or bivariate distribution.

  • + 5-1 Two Discrete Random Variables

    5-1.1 Joint Probability Distributions

  • + 5-1 Two Discrete Random Variables

    5-1.2 Marginal Probability Distributions

    Example 5-2 Find the marginal probability distribution of X in example 5.1:

    P(X=3) = P(X=3, Y=1)+ P(X=3, Y=2)+ P(X=3, Y=3)+ P(X=3, Y=4)

    = 0.25 + 0.2 + 0.05 + 0.05 = 0.55
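
    As a sketch, the same marginal sum can be written as a loop over the joint
    pmf stored as a Python dictionary (only the X = 3 column of Example 5-1 is
    entered below; the other columns are omitted):

        # joint pmf f_XY stored as {(x, y): probability}
        f_xy = {(3, 1): 0.25, (3, 2): 0.20, (3, 3): 0.05, (3, 4): 0.05}

        def marginal_x(f, x):
            # f_X(x) = sum over y of f_XY(x, y)
            return sum(p for (xi, _), p in f.items() if xi == x)

        print(marginal_x(f_xy, 3))   # 0.55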

  • + 5-1 Two Discrete Random Variables

    Example 5-3

    Based on example 5-1:

    P(Y=1| X=3) = P(X=3, Y=1)/P(X=3)

    = fXY(3,1)/fX(3) = 0.25/0.55 = 0.454

    The probability that Y=2 given X=3 is:

    P(Y=2| X=3) = P(X=3, Y=2)/P(X=3)

    = fXY(3,2)/fX(3) = 0.2/0.55 = 0.364

  • + 5-1 Two Discrete Random Variables

    5-1.3 Conditional Probability Distributions

    The conditional pmf of Y given X = x is fY|x(y) = fXY(x,y)/fX(x), for fX(x) > 0.

  • + 5-1 Two Discrete Random Variables

    5-1.3 Conditional Probability Distributions

  • +Example 5-4

    Conditional probability distributions of Y given X, fY|x(y) in Example 5-6.

  • 5-1 Two Discrete Random Variables

    Definition: Conditional Mean and Variance

    E(Y|x) = μ_Y|x = Σ_y y fY|x(y)   and   V(Y|x) = σ²_Y|x = Σ_y (y - μ_Y|x)² fY|x(y)

  • + 5-1 Two Discrete Random Variables

    Example 5-5

    Based on example 5-1, the conditional mean of Y given X=1 is

    obtained from the conditional distribution in figure 5.3:

    E(Y|1) = μ_Y|1 = 1(0.05) + 2(0.1) + 3(0.1) + 4(0.75) = 3.55

    The conditional mean is interpreted as the expected number of times

    the city name is stated given that one bar of signal is present.

    The conditional variance of Y given X=1 is:

    V(Y|1) = σ²_Y|1 = (1-3.55)²(0.05) + (2-3.55)²(0.1) + (3-3.55)²(0.1) + (4-3.55)²(0.75) = 0.748
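
    The same two sums in Python; the dictionary below is the conditional
    distribution f_Y|1(y) used in Example 5-5:

        f_y_given_1 = {1: 0.05, 2: 0.10, 3: 0.10, 4: 0.75}   # f_Y|X=1(y)

        cond_mean = sum(y * p for y, p in f_y_given_1.items())
        cond_var = sum((y - cond_mean) ** 2 * p for y, p in f_y_given_1.items())

        print(cond_mean, cond_var)   # 3.55 and 0.7475 (0.748 rounded)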

  • + 5-1 Two Discrete Random Variables

    5-1.4 Independence

    In some random experiments, knowledge of the value of X does not

    change any of the probabilities associated with the values of Y.

    Example 5-6

    In a plastic molding operation, each part is classified as to whether it

    conforms to color and length specifications.

    Define the random variables X & Y as: X = 1 if the part conforms to the

    color spec (0 otherwise), and Y = 1 if it conforms to the length spec (0 otherwise).

  • + 5-1 Two Discrete Random Variables

    Example 5-6 (cont.) Assume the joint probability

    distribution of X & Y is defined by

    fXY(x,y) in the Figure. The marginal probability

    distributions of X & Y are also

    shown.

    Note that fXY(x,y) = fX(x) fY(y)

  • + 5-1 Two Discrete Random Variables

    Example 5-6 (cont.)

    The conditional probability mass

    function fY|X(y) is shown in Figure.

    Notice that for any x, fY|x(y) = fY(y). That is, knowledge of whether or not the part meets the color specification

    does not change the probability that it meets length specifications.
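
    A quick numerical test of independence is to compare f_XY(x,y) with
    f_X(x)*f_Y(y) in every cell. The joint table below is a hypothetical
    stand-in (the actual numbers of Example 5-6 are in the figure, not
    reproduced here); the check itself is general:

        # hypothetical joint pmf, constructed so that X and Y are independent
        f_xy = {(0, 0): 0.02, (0, 1): 0.08, (1, 0): 0.18, (1, 1): 0.72}

        xs = {x for x, _ in f_xy}
        ys = {y for _, y in f_xy}
        f_x = {x: sum(f_xy[(x, y)] for y in ys) for x in xs}   # marginal of X
        f_y = {y: sum(f_xy[(x, y)] for x in xs) for y in ys}   # marginal of Y

        independent = all(abs(f_xy[(x, y)] - f_x[x] * f_y[y]) < 1e-12
                          for x in xs for y in ys)
        print(independent)   # True: f_XY(x,y) = f_X(x) f_Y(y) in every cell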

  • + 5-1 Two Discrete Random Variables

    5-1.4 Independence

    The discrete random variables X and Y are independent if fXY(x,y) = fX(x) fY(y) for all x and y; equivalently, fY|x(y) = fY(y).

  • + Announcement

    Midterm Exam 2 has been moved from MON 17 December to SAT 22 DEC, 10:00-11:30.


  • +

    Covariance and Correlation

  • + 5-3 Covariance and Correlation

    Definition

    cov(X,Y) = σ_XY = E[(X - μ_X)(Y - μ_Y)] = E(XY) - μ_X μ_Y

    Covariance is a measure of the linear relationship between two random variables.

  • + 5-3 Covariance and Correlation

    Determine the covariance for the following joint

    probability distribution.

  • + 5-3 Covariance and Correlation

  • + 5-3 Covariance and Correlation

    Definition

    If cov(X,Y) > 0, then Y tends to increase as X increases;

    if cov(X,Y) < 0, then Y tends to decrease as X increases;

    if cov(X,Y) = 0, there is no linear relationship between X and Y.

  • + 5-3 Covariance and Correlation

    Definition

  • + 5-3 Covariance and Correlation

    Figure 5-13 Joint probability distributions and the sign of covariance between X and Y.

  • + 5-3 Covariance and Correlation

    If cov(X,Y) > 0, then Y tends to increase as X increases;

    if cov(X,Y) < 0, then Y tends to decrease as X increases.

  • + 5-3 Covariance and Correlation

    Definition

    ρ_XY = corr(X,Y) = cov(X,Y) / (σ_X σ_Y), and -1 ≤ ρ_XY ≤ +1.

  • + 5-3 Covariance and Correlation

    The correlation coefficient is +1 in the case of a perfect positive

    (increasing) linear relationship and -1 in the case of a perfect negative

    (decreasing) linear relationship (anticorrelation), and it takes some value

    between -1 and +1 in all other cases, indicating the degree of linear

    dependence between the variables. As it approaches zero there is

    less of a relationship (closer to uncorrelated). The closer the coefficient is

    to either -1 or +1, the stronger the correlation between the variables.

  • + 5-3 Covariance and Correlation

    Example 5-26

    Joint pmf fXY(x,y) (columns: x = 0,...,3; rows: y = 0,...,3):

         x=0   x=1   x=2   x=3
    y=0  0.2    -     -     -
    y=1   -    0.1   0.1    -
    y=2   -    0.1   0.1    -
    y=3   -     -     -    0.4

  • + 5-3 Covariance and Correlation

    Example 5-26 (continued)
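
    A sketch of the covariance and correlation computation in Python. Note
    that the cell placement in the table above is a reconstruction, so treat
    the specific numbers as illustrative; the computation pattern, using
    cov(X,Y) = E(XY) - E(X)E(Y) and ρ = cov(X,Y)/(σ_X σ_Y), is general:

        # joint pmf as reconstructed above: {(x, y): probability}
        f_xy = {(0, 0): 0.2, (1, 1): 0.1, (1, 2): 0.1,
                (2, 1): 0.1, (2, 2): 0.1, (3, 3): 0.4}

        ex = sum(x * p for (x, y), p in f_xy.items())             # E(X)
        ey = sum(y * p for (x, y), p in f_xy.items())             # E(Y)
        exy = sum(x * y * p for (x, y), p in f_xy.items())        # E(XY)
        vx = sum((x - ex) ** 2 * p for (x, y), p in f_xy.items()) # V(X)
        vy = sum((y - ey) ** 2 * p for (x, y), p in f_xy.items()) # V(Y)

        cov = exy - ex * ey
        rho = cov / (vx ** 0.5 * vy ** 0.5)
        print(cov, rho)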

  • 6-1 Numerical Summaries

    Definition: Sample Mean

    x̄ = (x_1 + x_2 + ... + x_n)/n = (Σ_{i=1}^{n} x_i)/n

  • 6-1 Numerical Summaries

    Example 6-1

  • 6-1 Numerical Summaries

    Fulcrum

    The sample mean as a balance point for a system of weights.

  • 6-1 Numerical Summaries

    Population Mean

    For a finite population with N measurements, the mean is

    μ = (x_1 + x_2 + ... + x_N)/N

    The sample mean is a reasonable estimate of the

    population mean.

  • 6-1 Numerical Summaries

    Definition: Sample Variance

    s² = Σ_{i=1}^{n} (x_i - x̄)² / (n - 1)

    Why (n-1)?

    (1) The x_i tend to be closer to the sample mean x̄ than to the true mean μ. To compensate for that, and to avoid obtaining a variance that is often smaller than the true variance σ², we divide by (n-1).

    (2) The number of degrees of freedom of the sum is (n-1), because the n deviations (x_i - x̄) must sum to zero.

  • 6-1 Numerical Summaries

    Example 6-2

    The table below displays the quantities needed for calculating the sample variance and sample standard deviation for the pull-off force data.

  • 6-1 Numerical Summaries

    Example 6-2

  • 6-1 Numerical Summaries

    Computation of s²

    Shortcut formula: s² = [ Σ x_i² - ( Σ x_i )²/n ] / (n - 1)
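
    Both formulas in Python (a minimal sketch; the eight observations are the
    pull-off force values used in this chapter's examples, entered by hand):

        data = [12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1]

        n = len(data)
        xbar = sum(data) / n                                    # sample mean: 13.0

        s2_def = sum((x - xbar) ** 2 for x in data) / (n - 1)   # definitional formula
        s2_short = (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)

        print(xbar, s2_def, s2_short)   # the two variance formulas agree (about 0.229)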

  • 6-1 Numerical Summaries

    Population Variance

    When the population is finite and consists of N values, we

    may define the population variance as

    σ² = Σ_{i=1}^{N} (x_i - μ)² / N

    The sample variance is a reasonable estimate of the population variance.

  • 6-1 Numerical Summaries

    Definition

  • 6-2 Stem-and-Leaf Diagrams

    A stem-and-leaf diagram is a good way to obtain an informative visual

    display of a data set x1, x2, ..., xn, where each number xi consists of at

    least two digits.

    To construct a stem-and-leaf diagram, use the following steps:

    (1) Divide each number x_i into a stem (one or more leading digits) and a leaf (the remaining digit).

    (2) List the stem values in a vertical column.

    (3) Record the leaf for each observation beside its stem.

    (4) Write the units for the stems and leaves on the display.
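
    A short sketch that applies these steps for two-or-more-digit positive
    integers (stem = all digits but the last, leaf = the last digit):

        from collections import defaultdict

        def stem_and_leaf(data):
            # stem = x // 10 (leading digits), leaf = x % 10 (ones digit)
            stems = defaultdict(list)
            for x in sorted(data):
                stems[x // 10].append(x % 10)
            for stem in sorted(stems):
                leaves = "".join(str(leaf) for leaf in stems[stem])
                print(f"{stem:3d} | {leaves}")

        # a handful of illustrative strength values (psi)
        stem_and_leaf([105, 97, 245, 163, 207, 134, 218, 199, 160, 196])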

  • 6-2 Stem-and-Leaf Diagrams

    Data of compressive strengths, in pounds per square inch, of 80 specimens of a new aluminum-lithium alloy undergoing evaluation

    as a possible material for aircraft structural elements.

  • 6-2 Stem-and-Leaf Diagrams

    Stem-and-leaf diagram for

    the compressive strength

    data in Table 6-2.

    Units:

    Stem: Tens and hundreds

    (psi)

    Leaf: Ones (psi)

  • 6-2 Stem-and-Leaf Diagrams

    The last column is the frequency count of the number of

    leaves associated with each stem.

    Inspection of this display reveals that most of the

    compressive strengths lie between 110 and 200 psi and that

    a central value is somewhere between 150 and 160 psi.

    The stem-and-leaf diagram enables us to determine quickly

    some features of the data that were not immediately

    obvious in the original display in the table.

  • 6-2 Stem-and-Leaf Diagrams

    Example 6-5

    Too few stems, moderate number of stems, or too many?

    With too few stems, the display does not provide much information.

  • 6-2 Stem-and-Leaf Diagrams

    Example 6-5

    Too few stems, moderate number of stems, or too many?

    Splitting each stem into L (leaves 0,1,2,3,4) and U (leaves 5,6,7,8,9)

    provides more information.

  • 6-2 Stem-and-Leaf Diagrams

    Example 6-5

    Too few stems, moderate number of stems, or too many?

    Splitting each stem five ways: z (0,1), t (2,3), f (4,5), s (6,7), e (8,9).

    With too many stems, the display does not tell much about

    the shape of the data.

  • 6-2 Stem-and-Leaf Diagrams

    Ordered stem-and-leaf diagram produced by Minitab software.

    Median: measures the central tendency;

    average of the 40th and 41st ordered values:

    (160+163)/2 = 161.5

    Sample mode: the most frequently occurring value;

    mode = 158

  • Data Features

    The range is a measure of variability that can be easily

    computed from the ordered stem-and-leaf display. It is the

    maximum minus the minimum measurement. From previous

    slide, the range is 245 - 76 = 169.

    6-2 Stem-and-Leaf Diagrams

  • Data Features

    The median is a measure of central tendency that divides the

    data into two equal parts, half below the median and half above.

    If the number of observations is even, the median is halfway

    between the two central values.

    In the previous slide, the 40th and 41st values of strength are 160

    and 163, so the median is (160 + 163)/2 = 161.5. If the number of

    observations is odd, the median is the central value.

    6-2 Data Features: Quartiles

  • Data Features

    When an ordered set of data is divided into four equal parts, the division points are called quartiles.

    The first or lower quartile, q1, is a value that has approximately one-fourth (25%) of the observations below it and approximately 75% of the observations above it.

    The second quartile, q2, has approximately one-half (50%) of the observations below its value; the second quartile is exactly equal to the median.

    The third or upper quartile, q3, has approximately three-fourths (75%) of the observations below its value.

    As in the case of the median, the quartiles may not be unique.

    6-2 Data Features: Quartiles

  • Data Features

    The compressive strength data in Figure 6-6 contains

    n = 80 observations. Minitab software calculates the first and third

    quartiles as the (n + 1)/4 and 3(n + 1)/4 ordered observations and

    interpolates as needed.

    For example, (80 + 1)/4 = 20.25 and 3(80 + 1)/4 = 60.75.

    Therefore, Minitab interpolates between the 20th (143) and 21st (145)

    ordered observation to obtain q1 = 143.50 and between the 60th and 61st

    observation to obtain q3 =181.00.

    6-2 Data Features: Quartiles

  • Data Features

    The interquartile range (IQR) is the difference between

    the upper and lower quartiles (q3 - q1), and it is sometimes

    used as a measure of variability.

    IQR = q3-q1

    6-2 Data Features: Quartiles

  • 6-2 Data Features: Quartiles

    #    Data   Sorted (A->Z)
    1    80     11
    2    95     15
    3    20     20
    4    67     55
    5    93     67
    6    11     75
    7    15     80
    8    55     93
    9    75     95
    10   96     96

    First Quartile (q1)

    (n+1)/4 = (10+1)/4 = 2.75, so q1 is between X(2) and X(3); by interpolation:

    q1 = X(2) + 0.75(X(3) - X(2)) = 15 + 0.75(20 - 15) = 18.75

    (Or, without interpolation: q1 = (X(2) + X(3))/2 = (15 + 20)/2 = 17.5)

    Second Quartile (q2) = median

    (n+1)/2 = (10+1)/2 = 5.5, so q2 is between X(5) and X(6):

    q2 = (X(5) + X(6))/2 = (67 + 75)/2 = 71

    Third Quartile (q3)

    3(n+1)/4 = 3(10+1)/4 = 8.25, so q3 is between X(8) and X(9); by interpolation:

    q3 = X(8) + 0.25(X(9) - X(8)) = 93 + 0.25(95 - 93) = 93.5

    (Or: q3 = (X(8) + X(9))/2 = (93 + 95)/2 = 94)

    IQR = q3 - q1 = 93.5 - 18.75 = 74.75
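
    The interpolation rule above, written as a function (a sketch; note that
    statistical packages differ slightly in their quartile conventions):

        def quartile(data, k):
            # k-th quartile (k = 1, 2, 3) at ordered position (n + 1) * k / 4,
            # with linear interpolation between neighbouring order statistics
            xs = sorted(data)
            pos = (len(xs) + 1) * k / 4
            i = int(pos)              # index of the lower neighbour (1-based)
            frac = pos - i
            if i >= len(xs):
                return xs[-1]
            return xs[i - 1] + frac * (xs[i] - xs[i - 1])

        data = [80, 95, 20, 67, 93, 11, 15, 55, 75, 96]
        print(quartile(data, 1), quartile(data, 2), quartile(data, 3))
        # 18.75 71.0 93.5, matching the worked example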

  • 6-2 Data Features: Quartiles

    First Quartile (q1)

    (n+1)/4 = (11+1)/4 = 3, so q1 = X(3) = 12

    Second Quartile (q2) = median

    (n+1)/2 = (11+1)/2 = 6, so q2 = X(6) = 23

    Third Quartile (q3)

    3(n+1)/4 = 3(11+1)/4 = 9, so q3 = X(9) = 73

    #    Data   Sorted (A->Z)
    1    3      1
    2    74     3
    3    1      12   <- q1
    4    99     13
    5    12     20
    6    73     23   <- q2
    7    40     27
    8    23     40
    9    20     73   <- q3
    10   13     74
    11   27     99

  • 6-4 Box Plots

    The box plot is a graphical display that simultaneously

    describes several important features of a data set, such

    as center, spread, departure from symmetry, and

    identification of observations that lie unusually far from

    the bulk of the data (outliers).

    (Figure: a box plot with the whiskers, an outlier, and an extreme outlier labeled.)

  • 6-4 Box Plots

    Description of a box plot:

    The lower whisker extends to the smallest data point within 1.5(IQR) of the first quartile;

    the upper whisker extends to the largest data point within 1.5(IQR) of the third quartile.

    The box spans the interquartile range, IQR = q3 - q1.

  • Box plot for compressive strength data in Table 6-2.

    6-4 Box Plots

    Example:

    IQR = q3 - q1 = 181 - 143.5 = 37.5

    q3 + 1.5(IQR) = 237.25: points above this fence are outliers

    q1 - 1.5(IQR) = 87.25: points below this fence are outliers

    Points beyond 3(IQR) from the quartiles are extreme outliers.
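
    The fence computation behind this example, assuming the usual 1.5(IQR)
    and 3(IQR) conventions for outliers and extreme outliers:

        q1, q3 = 143.5, 181.0            # quartiles of the compressive strength data
        iqr = q3 - q1                    # 37.5

        lower_fence = q1 - 1.5 * iqr     # 87.25: points below are outliers
        upper_fence = q3 + 1.5 * iqr     # 237.25: points above are outliers
        extreme_low = q1 - 3.0 * iqr     # beyond 3 IQR: extreme outliers
        extreme_high = q3 + 3.0 * iqr

        print(iqr, lower_fence, upper_fence, extreme_low, extreme_high)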

  • 6-4 Box Plots Exp1

    The quartiles of these data were computed above: q1 = 12, q2 = 23, q3 = 73,

    so IQR = q3 - q1 = 73 - 12 = 61.

  • 6-4 Box Plots Exp1

  • Box plots are preferable to stem-and-leaf displays when

    - there is a large amount of data, or

    - three or more groups are to be compared.

    More information:

    http://onlinestatbook.com/2/graphing_distributions/boxplots.html

    6-4 Box Plots

  • 6-4 Box Plots

  • 6-3 Frequency Distributions and Histograms

    A frequency distribution is a more compact summary of data than a stem-and-leaf diagram. To construct a frequency distribution, we must divide the range of the data into intervals, which are usually called class intervals, cells, or bins.

    The number of bins is usually between 5 and 20, and it increases as the number of observations increases. Rule of thumb: number of bins = √n.
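
    A sketch of the rule of thumb and the binning step in plain Python
    (numpy.histogram or any plotting package would do the same job):

        import math

        def frequency_table(data, num_bins=None):
            # default bin count: round(sqrt(n)), the rule of thumb above
            n = len(data)
            k = num_bins or max(1, round(math.sqrt(n)))
            lo, hi = min(data), max(data)
            width = (hi - lo) / k
            counts = [0] * k
            for x in data:
                i = min(int((x - lo) / width), k - 1)   # clamp the maximum into the last bin
                counts[i] += 1
            return [(lo + i * width, lo + (i + 1) * width, c)
                    for i, c in enumerate(counts)]

        # a few illustrative strength values (psi)
        print(frequency_table([76, 87, 97, 105, 121, 134, 143,
                               158, 160, 163, 181, 199, 207, 218, 245]))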

  • Histogram of compressive strength for 80 aluminum-lithium

    alloy specimens.

    It looks like a bell shape!!

    6-3 Frequency Distributions and Histograms

  • A histogram of the compressive strength data from Minitab

    with 17 bins.

    Histograms are most useful when the number of

    observations is large (say, more than 75).

    6-3 Frequency Distributions and Histograms

  • A histogram of the compressive strength data

    from Minitab with nine bins.

    6-3 Frequency Distributions and Histograms

  • A cumulative distribution plot of the compressive

    strength data from Minitab.

    6-3 Frequency Distributions and Histograms

  • Histograms for symmetric and skewed distributions.

    6-3 Frequency Distributions and Histograms

    (x̄ denotes the sample mean and x̃ the sample median.)

    For a positive (right) skew: mode < median < mean;

    the order is reversed for a negative (left) skew.

  • 6-5 Time Sequence Plots

    A time series or time sequence is a data set in which the

    observations are recorded in the order in which they occur.

    A time series plot is a graph in which the vertical axis denotes the

    observed value of the variable (say x) and the horizontal axis

    denotes the time (which could be minutes, days, years, etc.).

    When measurements are plotted as a time series, we

    often see

    trends,

    cycles, or

    other broad features of the data

  • 6-5 Time Sequence Plots

    Company sales by year

    Upward trend

  • 6-5 Time Sequence Plots

    Company sales by quarter

    Cycle

  • 6-5 Time Sequence Plots

    A digidot plot of the compressive strength data

    A combined plot of time series and stem-and-leaf plot. There is no pattern.

  • 6-5 Time Sequence Plots

    A digidot plot of chemical process concentration readings,

    observed hourly.

    Until observation 20, the average was about 85 grams/liter

    The average went down after that point!!

  • 6-6 Probability Plots

    Probability plotting is a graphical method for determining

    whether sample data conform to a hypothesized distribution

    based on a subjective visual examination of the data.

    Probability plotting typically uses special graph paper,

    known as probability paper, that has been designed for the

    hypothesized distribution. Probability paper is widely

    available for the normal, lognormal, Weibull, and various chi-

    square and gamma distributions.

  • 6-6 Probability Plots

    Example

    Ten observations on the effective service life in minutes of

    batteries used in a portable personal computer are as follows:

    176, 191, 214, 220, 205,192, 201, 190, 183, 185.

    It is assumed that battery life is modeled by a normal distribution.

    Use the probability plotting to investigate this assumption.

    First, arrange the observations in ascending order and calculate their

    cumulative frequencies (j - 0.5)/n, here (j - 0.5)/10, as shown below.
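
    A sketch computing the plotting positions; the standardized normal scores
    z_j solve Φ(z_j) = (j - 0.5)/n (statistics.NormalDist is in the Python
    standard library from version 3.8):

        from statistics import NormalDist

        life = [176, 191, 214, 220, 205, 192, 201, 190, 183, 185]

        n = len(life)
        for j, x in enumerate(sorted(life), start=1):
            p = (j - 0.5) / n                 # cumulative frequency
            z = NormalDist().inv_cdf(p)       # standardized normal score
            print(f"{j:2d}  x(j)={x}  (j-0.5)/n={p:.2f}  z={z:6.2f}")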

  • 6-6 Probability Plots

    Example

  • 6-6 Probability Plots

    Normal probability plot for battery life.

  • 6-6 Probability Plots

    A straight line is drawn through the plotted points.

    A good rule of thumb is to draw the line approximately

    between the 25th & 75th percentile points.

    If all the points are covered by an imaginary fat pencil laid along the straight

    line, a normal distribution adequately describes the data.

    Example

  • Normal probability plot obtained from standardized normal scores (on ordinary graph paper).

  • 7-1 Introduction

    The field of statistical inference consists of those methods used to make decisions or to draw conclusions about a population.

    These methods utilize the information contained in a sample from the population in drawing conclusions.

    Statistical inference may be divided into two major areas:

    Parameter estimation

    Hypothesis testing

  • 7-1 Introduction

    Suppose that we want to obtain a point estimation of a population

    parameter.

    The observations are random variables, say X1, X2, , Xn.

    Therefore, any function of the observations, or any statistic, is also a

    random variable.

    For example, the sample mean X, and the sample variance S2 are

    statistics and they are also random variables.

    Since a statistic is RV, it has a probability distribution.

    We call the probability distribution of a statistic a sampling

    distribution.

  • 7-1 Introduction

  • Definition

    7-1 Introduction

    A point estimate of a parameter θ is a single numerical value θ̂ of a statistic Θ̂.

    The statistic Θ̂ is called the point estimator; Θ̂ is the uppercase of θ̂.

  • 7-1 Introduction

  • 7.2 Sampling Distributions and the Central Limit Theorem

    Statistical inference is concerned with making decisions about a population based on the information contained in a random sample from that population.

    Definitions:

  • 7.2 Sampling Distributions and the Central Limit Theorem

    If we are sampling from a population that has an unknown

    probability distribution, the sampling distribution of the

    sample mean will still be approximately normal with

    mean μ and variance σ²/n, if the sample size is large.

    This is one of the most useful theorems in statistics; it

    is called the central limit theorem.

  • 7.2 Sampling Distributions and the Central Limit Theorem

    X̄ is approximately N(μ, σ²/n).

    If the population is continuous, unimodal, and symmetric, n = 5 is enough.

    If n ≥ 30, that will be enough regardless of the shape of the population.

    If n < 30, the approximation is still fine provided the population distribution is not severely non-normal.

  • 7.2 Sampling Distributions and the Central Limit Theorem

    Example 7-1

    An electronic company manufactures resistors that have a mean

    resistance of 100 ohms and a standard deviation of 10 ohms.

    The distribution of resistance is normal

    Find the probability that a random sample of n=25 resistors will have

    an average resistance less than 95 ohms.

    Note that the sampling distribution of X̄ is normal, with mean μ = 100

    ohms and a standard deviation of σ/√n = 10/√25 = 2 ohms.

  • 7.2 Sampling Distributions and the Central Limit Theorem

    Example 7-1

    Therefore, the desired probability, the shaded area shown in the figure,

    can be found by standardizing the point X̄ = 95:

    z = (95 - 100)/2 = -2.5

    and therefore

    P(X̄ < 95) = P(Z < -2.5) = 0.0062

    (Figure: probability for the previous example.)
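
    The same standardization in Python (NormalDist().cdf gives Φ):

        from statistics import NormalDist

        mu, sigma, n = 100, 10, 25
        se = sigma / n ** 0.5        # standard error of the mean: 10/5 = 2 ohms

        z = (95 - mu) / se           # z = -2.5
        p = NormalDist().cdf(z)      # P(Xbar < 95) = P(Z < -2.5)
        print(z, round(p, 4))        # -2.5 0.0062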

  • 7.2 Sampling Distributions and the Central Limit Theorem

    Example 7-2

    Suppose that the random variable X has a continuous uniform

    distribution f(x) = 0.5 for 4 ≤ x ≤ 6, and 0 otherwise.

    Find the distribution of the sample mean of a random sample of size

    n=40.

    The mean and variance of X are μ = (4+6)/2 = 5 and σ² = (6-4)²/12 = 1/3, respectively.

    According to the central limit theorem, X̄ is approximately normally distributed with:

    μ_X̄ = 5 and σ²_X̄ = σ²/n = (1/3)/40 = 1/120

  • 7.2 Sampling Distributions and the Central Limit Theorem

    Example 7-2
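
    A simulation sketch of Example 7-2: draw many samples of size 40 from the
    Uniform(4, 6) population and compare the mean and variance of the sample
    means with 5 and 1/120 (the seed and replication count are arbitrary):

        import random
        import statistics

        random.seed(1)
        means = [statistics.fmean(random.uniform(4, 6) for _ in range(40))
                 for _ in range(10_000)]

        print(statistics.fmean(means))      # close to mu = 5
        print(statistics.variance(means))   # close to (1/3)/40 = 1/120 = 0.00833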

  • 7-3 General Concepts of Point Estimation

    7-3.1 Unbiased Estimators

    An estimate should be close to the true value of the unknown

    parameter.

    Formally, we say that Θ̂ is an unbiased estimator of θ if the expected value of Θ̂ is equal to θ: E(Θ̂) = θ.

    Definition

  • Example

    7-3 General Concepts of Point Estimation

    Suppose we have a random sample of size 2n from a population denoted by X, with E(X) = μ and V(X) = σ².

    Let

    X̄1 = (1/(2n)) (X_1 + X_2 + ... + X_{2n})   and   X̄2 = (1/n) (X_1 + X_2 + ... + X_n)

    be two estimators of μ. Which is an unbiased estimator of μ?

    E(X̄1) = (1/(2n)) [E(X_1) + ... + E(X_{2n})] = (1/(2n)) (2n)μ = μ

  • Example (cont.)

    7-3 General Concepts of Point Estimation

    Similarly, E(X̄2) = (1/n) [E(X_1) + ... + E(X_n)] = (1/n) (n)μ = μ.

    Hence X̄1 and X̄2 are both unbiased estimators of μ.
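
    A simulation check of the result (a sketch; the exponential population
    with μ = 2 is an arbitrary choice). Both estimators average out to μ,
    but the variance line shows why X̄1 is still preferable:

        import random
        import statistics

        random.seed(1)
        mu, n, reps = 2.0, 10, 20_000

        est1, est2 = [], []
        for _ in range(reps):
            sample = [random.expovariate(1 / mu) for _ in range(2 * n)]
            est1.append(sum(sample) / (2 * n))   # Xbar1 uses all 2n observations
            est2.append(sum(sample[:n]) / n)     # Xbar2 uses only the first n

        print(statistics.fmean(est1), statistics.fmean(est2))        # both near 2
        print(statistics.variance(est1), statistics.variance(est2))  # V(Xbar1) < V(Xbar2)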

  • Example

    7-3 General Concepts of Point Estimation