IC 102: Data Analysis and Interpretation Instructor: Guruprasad PJ Dept. Aerospace Engineering Indian Institute of Technology Bombay Powai, Mumbai – 400076 Email: [email protected] Phone no.: 2576 7142

Page 1: IC 102: Data Analysis and Interpretation

IC 102: Data Analysis and Interpretation

Instructor: Guruprasad PJ

Dept. Aerospace Engineering

Indian Institute of Technology Bombay

Powai, Mumbai – 400076

Email: [email protected]

Phone no.: 2576 7142

Page 2: IC 102: Data Analysis and Interpretation

What is Probability?

● “The true logic of this world is in the calculus of probabilities.” - James Clerk Maxwell

● By “chance” or “probability” of a particular outcome of an observation we mean our estimate for the most likely fraction of a number of repeated observations that will yield that particular outcome.

● Key to the definition: an event or an occurrence is a possible outcome of some repeatable observation.

Page 3: IC 102: Data Analysis and Interpretation

Sample Space and Events

● Sample space (S): Set of all possible outcomes of an experiment.

Ex:

1. If the experiment consists of the flipping of a coin, then: S = {H,T}

2. If the experiment consists of rolling a die, then the sample space is: S = {1,2,3,4,5,6}

3. If the experiment consists of measuring the lifetime of a car, then the sample space consists of all nonnegative real numbers: S = [0,∞)

Page 4: IC 102: Data Analysis and Interpretation

Sample Space and Events

● Event(s): Any subset of the sample space is known as an event.

Ex:

1. In Ex 1., if E = {H}, then E is the event that a head appears on the flip of the coin.

2. In Ex 2., if E = {1}, then E is the event that one appears on the roll of the die.

3. In Ex 3., if E = (2,6), then E is the event that the car lasts between two and six years.

Page 5: IC 102: Data Analysis and Interpretation

Sample Space and Events continued...

● Union of two events E and F: E ∪ F

Consists of all outcomes that are either in E or in F or in both E and F.

For example, if E = {H}; F = {T} then E ∪ F = {H,T}

Rolling a die: E = {1,3,5}; F = {1,2,3}, then

E ∪ F = {1,2,3,5}

Page 6: IC 102: Data Analysis and Interpretation

Sample Space and Events continued...

● Intersection of two events E and F: E ∩ F

E ∩ F (also written EF) consists of all outcomes which are in both E and F.

Rolling a die: E = {1,3,5}; F = {1,2,3}, then

E ∩ F = {1,3}

Suppose E = {H} and F = {T}. Then

E ∩ F = Ø, so E and F are mutually exclusive events.

Page 7: IC 102: Data Analysis and Interpretation

Sample Space and Events continued...

● More than two events $E_1, E_2, \ldots$

Union: $\bigcup_{n=1}^{\infty} E_n$ is defined to be the event consisting of all outcomes that are in $E_n$ for at least one value of n = 1, 2, ...

Intersection: $\bigcap_{n=1}^{\infty} E_n$ is defined to be the event consisting of those outcomes that are in all of the events $E_n$, n = 1, 2, ...

● Complement of E: Ec, referred to as the complement of E, consists of all outcomes in the sample space S that are not in E.
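The set operations on events map directly onto Python's built-in `set` type. The following is a minimal sketch using the die-rolling and coin-flipping events from the slides; the variable names are illustrative only.

```python
# Sample space and events for one roll of a die (taken from the slides).
S = {1, 2, 3, 4, 5, 6}
E = {1, 3, 5}
F = {1, 2, 3}

union = E | F            # outcomes in E or in F or in both
intersection = E & F     # outcomes in both E and F
complement_E = S - E     # outcomes in S that are not in E

# Two events are mutually exclusive when their intersection is empty.
heads, tails = {"H"}, {"T"}
mutually_exclusive = heads.isdisjoint(tails)

print(sorted(union), sorted(intersection), sorted(complement_E), mutually_exclusive)
```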

Page 8: IC 102: Data Analysis and Interpretation

Probabilities Defined on Events

● Consider an experiment whose sample space is S. For each event E of the sample space S, we assume that a number P(E) is defined and satisfies the following three conditions:

1. $0 \le P(E) \le 1$

2. $P(S) = 1$

3. For any sequence of events $E_1, E_2, \ldots$ that are mutually exclusive,

$P\left(\bigcup_{n=1}^{\infty} E_n\right) = \sum_{n=1}^{\infty} P(E_n)$

Page 9: IC 102: Data Analysis and Interpretation

Some Examples

● Coin tossing: if we assume that a head is equally likely to appear as a tail, then we would have:

P({H}) = P({T}) = 1/2

If instead we had a biased coin where a head was twice as likely to appear as a tail, then we would have

P({H}) = 2/3, P({T}) = 1/3

Page 10: IC 102: Data Analysis and Interpretation

Some Examples

● Die rolling: Suppose all six numbers were equally likely to appear, then we would have

P({1})=P({2})=P({3})=P({4})=P({5})=P({6})=1/6

Probability of getting an even number is?

P({2,4,6}) = ------?
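When all outcomes are equally likely, the probability of an event is the number of outcomes in the event divided by the size of the sample space. A small sketch of that counting rule, assuming a fair die; the helper name `prob` is hypothetical:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}

def prob(event, sample_space=S):
    """P(event) when every outcome in sample_space is equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

p_even = prob({2, 4, 6})
print(p_even)  # 1/2
```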

Page 11: IC 102: Data Analysis and Interpretation

What is the probability that an event does not occur?

● Show that:

P(Ec) = 1 – P(E)

i.e., the probability that an event does not occur is one minus the probability that it does occur.
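One possible route to this result uses only the second and third conditions on P from the earlier slide, since E and Ec are mutually exclusive and together make up S. The derivation below is a sketch and assumes the `amsmath` package for `align*`:

```latex
\begin{align*}
S &= E \cup E^{c}, \qquad E \cap E^{c} = \emptyset \\
1 = P(S) &= P(E) + P(E^{c}) \\
P(E^{c}) &= 1 - P(E)
\end{align*}
```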

Page 12: IC 102: Data Analysis and Interpretation

Some identities

● Suppose E and F are two events, show that

P(E ∪ F) = P(E) + P(F) – P(EF)

What happens when the two events are mutually exclusive?

Assignment: Obtain a similar expression for P(E ∪ F ∪ G), where E, F and G are events belonging to the sample space S.

Page 13: IC 102: Data Analysis and Interpretation

An Example

● Suppose that we toss two coins, and suppose that we assume that each of the four outcomes in the sample space S = {(H,H), (H,T), (T,H), (T,T)} is equally likely to appear, so each has probability 1/4. Now, let E = {(H,H),(H,T)} be the event that the first coin falls heads and F = {(H,H), (T,H)} the event that the second coin falls heads. What is the probability that either the first or the second coin falls heads?
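The two-coin question can be checked by enumerating the sample space and comparing a direct count of the union with the identity P(E ∪ F) = P(E) + P(F) – P(EF). This is a sketch assuming equally likely outcomes; `prob` is an illustrative helper.

```python
from fractions import Fraction
from itertools import product

# All ordered outcomes of tossing two coins.
S = set(product("HT", repeat=2))

def prob(event):
    """P(event) under equally likely outcomes in S."""
    return Fraction(len(event), len(S))

E = {o for o in S if o[0] == "H"}   # first coin falls heads
F = {o for o in S if o[1] == "H"}   # second coin falls heads

direct = prob(E | F)
by_identity = prob(E) + prob(F) - prob(E & F)
print(direct, by_identity)  # 3/4 3/4
```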

Page 14: IC 102: Data Analysis and Interpretation

Conditional Probability

● Often it is required to find the probability of an event E under the condition that an event F occurs. This probability is called the conditional probability of E given F and is denoted by P(E|F). In this case F serves as the new (reduced) sample space, and that probability is the fraction of P(F) which corresponds to P(E∩F). Thus we have:

P(E|F) = P(E∩F)/P(F)

Page 15: IC 102: Data Analysis and Interpretation

Example: Conditional Probability

● Suppose cards numbered one through ten are placed in a hat, mixed up, and then one of the cards is drawn. If we are told that the number on the drawn card is at least five, then what is the conditional probability that it is ten?

Page 16: IC 102: Data Analysis and Interpretation

Solution to Conditional Probability Example

● Let E denote the event that the number drawn is 10 and let F be the event that it is at least 5. The desired probability is P(E|F).

P(E|F) = P(E∩F)/P(F)

P(EF) = P(E) ... why?

P(E|F) = (1/10)/(6/10) = 1/6

Page 17: IC 102: Data Analysis and Interpretation

Examples continued...

● A family has two children. What is the conditional probability that both are boys given that at least one of them is a boy? Assume that the sample space S is given by S = {(b,b), (b,g), (g,b), (g,g)}, and all outcomes are equally likely.
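One way to work this example is to treat the conditioning event as the reduced sample space and count within it. The sketch below assumes the four outcomes are equally likely, as the slide states; the names are illustrative.

```python
from fractions import Fraction
from itertools import product

# (older child, younger child), each 'b' or 'g'.
S = set(product("bg", repeat=2))

both_boys = {("b", "b")}
at_least_one_boy = {o for o in S if "b" in o}

# P(E|F) = P(EF)/P(F); with equally likely outcomes this is a ratio of counts.
p = Fraction(len(both_boys & at_least_one_boy), len(at_least_one_boy))
print(p)  # 1/3
```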

Page 18: IC 102: Data Analysis and Interpretation

Examples continued...

● Anand can either take a course in computers or in chemistry. If Anand takes the computer course, then he will receive an A grade with probability 1/2; if he takes the chemistry course then he will receive an A grade with probability 1/3. Anand decides to base his decision on the flip of a fair coin. What is the probability that Anand will get an A in chemistry?
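A possible worked solution, writing C for the event that Anand takes chemistry and A for the event that he receives an A (both symbols are introduced here for illustration), rearranges the definition of conditional probability into a product. The sketch assumes the `amsmath` package for `align*`:

```latex
\begin{align*}
P(A \cap C) &= P(C)\, P(A \mid C) \\
            &= \tfrac{1}{2} \cdot \tfrac{1}{3} = \tfrac{1}{6}
\end{align*}
```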

Page 19: IC 102: Data Analysis and Interpretation

Examples continued...

● Suppose an urn contains seven black balls and five white balls. We draw two balls from the urn without replacement. Assuming that each ball in the urn is equally likely to be drawn, what is the probability that both drawn balls are black?
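The urn example can be approached in two ways: with the product rule P(B1 B2) = P(B1) P(B2|B1), or by brute-force enumeration of all ordered draws. The sketch below does both and is only an illustration; the event names B1 and B2 are assumptions not used in the slides.

```python
from fractions import Fraction
from itertools import permutations

# Product rule: first ball black, then second ball black given the first was.
p_product_rule = Fraction(7, 12) * Fraction(6, 11)

# Brute force: label the 12 balls and list every ordered pair of distinct balls.
balls = ["B"] * 7 + ["W"] * 5
draws = list(permutations(range(len(balls)), 2))
both_black = [d for d in draws if balls[d[0]] == "B" and balls[d[1]] == "B"]
p_enumerated = Fraction(len(both_black), len(draws))

print(p_product_rule, p_enumerated)  # 7/22 7/22
```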

Page 20: IC 102: Data Analysis and Interpretation

Independent Events

● Under what condition can two events be said to be independent?

Suppose we toss two fair dice. Let E1 denote the event that the sum of the dice is six and F denote the event that the first die equals four. Are these two events independent?

Let E2 be the event that the sum of the dice equals seven. Is E2 independent of F?

Page 21: IC 102: Data Analysis and Interpretation

Independent Events

● Two events E and F are said to be independent if

P(EF) = P(E)P(F)

Intuitively, events E1, E2, ..., En are independent if knowledge of the occurrence of any of these events has no effect on the probability of any other event.

Page 22: IC 102: Data Analysis and Interpretation

Example...

● Suppose we toss two fair dice. Let E denote the event that the sum of the dice is six and F denote the event that the first die equals four. Is E independent of F?

● Let G be the event that the sum of the dice equals seven. Is G independent of F?
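Both questions can be settled by enumerating the 36 equally likely outcomes and testing the definition P(EF) = P(E)P(F) directly. A minimal sketch; the helper names `prob` and `independent` are hypothetical:

```python
from fractions import Fraction
from itertools import product

# All ordered outcomes (first die, second die) for two fair dice.
S = set(product(range(1, 7), repeat=2))

def prob(event):
    return Fraction(len(event), len(S))

def independent(a, b):
    """True when P(ab) equals P(a)P(b) exactly."""
    return prob(a & b) == prob(a) * prob(b)

E = {o for o in S if sum(o) == 6}   # sum of the dice is six
G = {o for o in S if sum(o) == 7}   # sum of the dice is seven
F = {o for o in S if o[0] == 4}     # first die equals four

print(independent(E, F), independent(G, F))  # False True
```

The difference comes from the fact that a sum of six is impossible when the first die shows six, so knowing the first die changes the chance of E, whereas every value of the first die leaves exactly one way to reach seven.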

Page 23: IC 102: Data Analysis and Interpretation

Bayes' Formula

● There are two different interpretations:

1. Bayesian interpretation – how a subjective degree of belief should rationally change to account for evidence

2. Frequentist interpretation – any given experiment can be considered as one of an infinite sequence of possible repetitions of the same experiment, each capable of producing statistically independent results.

Thomas Bayes (1701 – 1761): mathematician and theologian

Page 24: IC 102: Data Analysis and Interpretation

Bayes' Formula

● Let E and F be two events from a sample space S

E = EF ∪ EFc

EF and EFc are mutually exclusive so,

P(E) = P(EF) + P(EFc)

P(E) = P(E|F)P(F) + P(E|Fc)P(Fc)

P(E) = P(E|F)P(F) + P(E|Fc)(1- P(F))

The above equation states that the probability of E is the weighted average of the conditional probability of E given that F has occurred and the conditional probability of E given that F has not occurred, with weights P(F) and P(Fc).

Page 25: IC 102: Data Analysis and Interpretation

Some Examples...

● Consider two urns. The first contains two white and seven black balls, and the second contains five white and six black balls. We flip a fair coin and then draw a ball from the first urn or the second urn depending on whether the outcome was heads or tails. What is the conditional probability that the outcome of the toss was heads given that a white ball was selected?

● Let W be the event that a white ball is drawn, and let H be the event that the coin comes up heads. Find P(H|W).
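A sketch of the calculation, combining P(H|W) = P(W|H)P(H)/P(W) with the weighted-average formula for P(W) from the previous slide. It assumes heads selects the first urn and tails the second, which is how the problem statement reads.

```python
from fractions import Fraction

p_H = Fraction(1, 2)            # fair coin
p_W_given_H = Fraction(2, 9)    # first urn: 2 white out of 9 balls
p_W_given_T = Fraction(5, 11)   # second urn: 5 white out of 11 balls

# P(W) = P(W|H)P(H) + P(W|Hc)(1 - P(H))
p_W = p_W_given_H * p_H + p_W_given_T * (1 - p_H)

p_H_given_W = p_W_given_H * p_H / p_W
print(p_W, p_H_given_W)  # 67/198 22/67
```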

Page 26: IC 102: Data Analysis and Interpretation

Examples continued...

● In answering a question on a multiple-choice test a student either knows the answer or guesses. Let p be the probability that she knows the answer and (1-p) that she guesses. Assume that a student who guesses at the answer will be correct with probability 1/m, where m is the number of multiple-choice alternatives. What is the conditional probability that a student knew the answer to a question given that she answered it correctly?

● Let C and K denote respectively the event that the student answers the question correctly and the event that she actually knows the answer. Find P(K|C)....
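A possible derivation follows. It assumes that a student who knows the answer always answers correctly, i.e. P(C|K) = 1, which the problem implies but does not state outright, and it uses the `amsmath` package for `align*`:

```latex
\begin{align*}
P(K \mid C) &= \frac{P(C \mid K)\, P(K)}{P(C \mid K)\, P(K) + P(C \mid K^{c})\, P(K^{c})} \\
            &= \frac{p}{p + \frac{1}{m}(1 - p)} \\
            &= \frac{mp}{1 + (m - 1)p}
\end{align*}
```

For instance, with m = 5 and p = 1/2 this gives 5/6.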

Page 27: IC 102: Data Analysis and Interpretation

Examples continued...

● A laboratory blood test is 95% effective in detecting a certain disease when it is, in fact, present. However, the test also yields a “false positive” result for 1% of the healthy persons tested. If 0.5% of the population actually has the disease, what is the probability a person has the disease given that his test result is positive?

What are the two events one can consider?
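One natural choice of events is D (the person has the disease) and a positive test result; these names are assumptions made for this sketch. The calculation below applies the same two-event formula as the urn example, using exact fractions to avoid rounding.

```python
from fractions import Fraction

p_D = Fraction(5, 1000)            # 0.5% of the population has the disease
p_pos_given_D = Fraction(95, 100)  # test detects the disease 95% of the time
p_pos_given_healthy = Fraction(1, 100)  # 1% false positives among the healthy

# P(positive) = P(pos|D)P(D) + P(pos|Dc)(1 - P(D))
p_pos = p_pos_given_D * p_D + p_pos_given_healthy * (1 - p_D)

p_D_given_pos = p_pos_given_D * p_D / p_pos
print(p_D_given_pos, float(p_D_given_pos))
```

The result is 95/294, roughly 0.32: even with a positive test, the person is more likely healthy than not, because the disease is rare and false positives among the large healthy group outnumber true positives.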

Page 28: IC 102: Data Analysis and Interpretation

Back to Bayes' Formula

● Let us generalize the formula above. Suppose that $F_1, F_2, \ldots, F_n$ are mutually exclusive events such that

$\bigcup_{i=1}^{n} F_i = S$

So we can write

$E = \bigcup_{i=1}^{n} E F_i$ (all mutually exclusive events)

Thus we can write:

$P(E) = \sum_{i=1}^{n} P(E F_i)$

$P(E) = \sum_{i=1}^{n} P(E \mid F_i)\, P(F_i)$

Page 29: IC 102: Data Analysis and Interpretation

Bayes' Formula continued...

● Suppose now that E has occurred and we are interested in determining which one of the $F_j$ has also occurred.

$P(F_j \mid E) = \dfrac{P(E F_j)}{P(E)}$

$P(F_j \mid E) = \dfrac{P(E \mid F_j)\, P(F_j)}{\sum_{i=1}^{n} P(E \mid F_i)\, P(F_i)}$
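The general formula can be sketched as a short function that takes the priors P(F_i) and the likelihoods P(E|F_i) and returns every posterior P(F_j|E). The function name `bayes_posteriors` is hypothetical, and the usage lines reuse the two-urn numbers from the earlier example as a check.

```python
from fractions import Fraction

def bayes_posteriors(priors, likelihoods):
    """Return [P(F_j | E)] given priors P(F_i) and likelihoods P(E | F_i).

    Assumes the F_i are mutually exclusive and their union is S,
    so the priors must sum to 1.
    """
    if sum(priors) != 1:
        raise ValueError("priors must sum to 1")
    joint = [l * p for l, p in zip(likelihoods, priors)]  # P(E F_i)
    p_E = sum(joint)                                      # total probability of E
    return [j / p_E for j in joint]

# Two-urn example: F_1 = heads (first urn), F_2 = tails (second urn), E = white ball.
posteriors = bayes_posteriors(
    [Fraction(1, 2), Fraction(1, 2)],
    [Fraction(2, 9), Fraction(5, 11)],
)
print(posteriors)
```

Using `Fraction` inputs keeps the sum-to-one check exact; with floats that check would need a tolerance.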

Page 30: IC 102: Data Analysis and Interpretation

Review examples

● The UP election will be held next month and, by polling a sample of the voting population, we are trying to predict whether UPA, NDA, SP or BSP will prevail. Which of the following methods of selection is likely to yield a representative sample?

● Poll all people of voting age attending an IPL cricket game in Delhi.

● Poll all people of voting age leaving a fancy mall in Lucknow.

● Obtain a copy of the voter registration list, randomly choose 10,000 names, and question them.

● Use the results of a television call-in poll, in which the station asked its listeners to call in and name their choice.

● Choose names from the telephone directory and call these people.

Page 31: IC 102: Data Analysis and Interpretation

Example continued...

● A researcher is trying to discover the average age at death for people in India. To obtain data, the obituary columns of the Times of India are read for 30 days, and the ages at death of people in India are noted. Do you think this approach will lead to a representative sample?

Page 32: IC 102: Data Analysis and Interpretation

Examples continued...

● Represent the following data in a stem and leaf plot and a histogram:

The following are 14 measurements of the tensile strength of sheet steel in kg/mm², recorded in the order obtained and rounded to integer values.

78 81 83 84 86 87 87 89 89 89 89 90 91 99

Identify the absolute frequencies, cumulative absolute frequencies and cumulative relative frequencies.

Also identify the class intervals, class marks and relative class frequencies.
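A minimal sketch of the stem-and-leaf plot and the frequency tables for the tensile-strength data, using only the standard library. It takes the tens digit as the stem, which is an assumption; the histogram itself depends on a choice of class intervals and is not drawn here.

```python
from collections import Counter, defaultdict
from itertools import accumulate

data = [78, 81, 83, 84, 86, 87, 87, 89, 89, 89, 89, 90, 91, 99]

# Stem-and-leaf plot: stem = tens digit, leaf = units digit.
stems = defaultdict(list)
for x in sorted(data):
    stems[x // 10].append(x % 10)
for stem in sorted(stems):
    print(f"{stem} | {' '.join(str(leaf) for leaf in stems[stem])}")

# Absolute, cumulative absolute, and cumulative relative frequencies.
counts = Counter(data)
values = sorted(counts)
absolute = [counts[v] for v in values]
cumulative = list(accumulate(absolute))
cumulative_relative = [c / len(data) for c in cumulative]

print("value  abs  cum  cum_rel")
for v, a, c, r in zip(values, absolute, cumulative, cumulative_relative):
    print(f"{v:5d} {a:4d} {c:4d} {r:8.3f}")
```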

Page 33: IC 102: Data Analysis and Interpretation

Examples continued...

● What is the range of the data set given to you?

● What is the interquartile range?

● Draw a boxplot from the data set.
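The sketch below computes the range and a five-number summary for the tensile-strength data from the previous slide. Quartile conventions differ between textbooks and software; this version assumes the quartiles are the medians of the lower and upper halves of the sorted data, so other conventions may give a slightly different interquartile range.

```python
from statistics import median

data = sorted([78, 81, 83, 84, 86, 87, 87, 89, 89, 89, 89, 90, 91, 99])

data_range = data[-1] - data[0]

# Split into lower and upper halves (the middle value is excluded when n is odd).
n = len(data)
half = n // 2
lower = data[:half]
upper = data[half:] if n % 2 == 0 else data[half + 1:]

q1, q2, q3 = median(lower), median(data), median(upper)
iqr = q3 - q1

# Five-number summary, which is what a boxplot displays.
print(data[0], q1, q2, q3, data[-1])
print("range =", data_range, "IQR =", iqr)
```

A boxplot would draw the box from q1 to q3 with a line at the median q2; whether the whiskers reach the extremes or stop at 1.5 × IQR from the box is again a matter of convention.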