IBS-09-SL RM 501 – Ranjit Goswami 1 Basic Probability

IBS-09-SL RM 501 – Ranjit Goswami

1

Basic Probability


2

Introduction

• Probability is the study of randomness and uncertainty.

• In the early days, probability was associated with games of chance (gambling).


3

Simple Games Involving Probability

Game: A fair die is rolled. If the result is 2, 3, or 4, you win $1; if it is 5, you win $2; but if it is 1 or 6, you lose $3.

Should you play this game?


4

Random Experiment

• a random experiment is a process whose outcome is uncertain.

Examples:

• Tossing a coin once or several times

• Picking a card or cards from a deck

• Measuring temperature of patients

• ...


5

Sample SpaceThe sample space is the set of all possible outcomes.

Simple EventsThe individual outcomes are called simple events.

EventAn event is any collectionof one or more simple events

Events & Sample Spaces


6

Example

Experiment: Toss a coin 3 times.

• Sample space

= {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.

• Examples of events include

• A = {HHH, HHT,HTH, THH}

= {at least two heads}

• B = {HTT, THT,TTH}

= {exactly two tails.}


7

Basic Concepts (from Set Theory)

• The union of two events A and B, A B, is the event consisting of all outcomes that are either in A or in B or in both events.

• The complement of an event A, Ac, is the set of all outcomes in that are not in A.

• The intersection of two events A and B, A B, is the event consisting of all outcomes that are in both events.

• When two events A and B have no outcomes in common, they are said to be mutually exclusive, or disjoint, events.


8

Example

Experiment: toss a coin 10 times and the number of heads is observed.

• Let A = { 0, 2, 4, 6, 8, 10}.

• B = { 1, 3, 5, 7, 9}, C = {0, 1, 2, 3, 4, 5}.

• A B= {0, 1, …, 10} = .

• A B contains no outcomes. So A and B are mutually exclusive.

• Cc = {6, 7, 8, 9, 10}, A C = {0, 2, 4}.


9

Rules

• Commutative Laws: • A B = B A, A B = B A

• Associative Laws: • (A B) C = A (B C )

• (A B) C = A (B C) .

• Distributive Laws:• (A B) C = (A C) (B C)

• (A B) C = (A C) (B C)


10

Venn Diagram

A BA∩B


11

Probability

• A Probability is a number assigned to each subset (events) of a sample space .

• Probability distributions satisfy the following rules:


12

Axioms of Probability

• For any event A, 0 P(A) 1.

• P() =1.

• If A1, A2, … An is a partition of A, then

P(A) = P(A1) + P(A2) + ...+ P(An)

(A1, A2, … An is called a partition of A if A1 A2 … An = A and A1, A2, … An are mutually exclusive.)


13

Properties of Probability

• For any event A, P(Ac) = 1 - P(A).

• If A B, then P(A) P(B).

• For any two events A and B,

P(A B) = P(A) + P(B) - P(A B).

For three events, A, B, and C,

P(ABC) = P(A) + P(B) + P(C) -

P(AB) - P(AC) - P(BC) + P(AB C).


14

Example

• In a certain population, 10% of the people are rich, 5% are famous, and 3% are both rich and famous. A person is randomly selected from this population. What is the chance that the person is

• not rich?

• rich but not famous?

• either rich or famous?


15

Intuitive Development (agrees with axioms)

• Intuitively, the probability of an event a could be defined as:

Where N(a) is the number that event a happens in n trialsWhere N(a) is the number that event a happens in n trials

Here We Go Again: Not So Basic Probability


17

More Formal:

• is the Sample Space:• Contains all possible outcomes of an experiment

• in is a single outcome

• A in is a set of outcomes of interest


18

Independence

• The probability of independent events A, B and C is given by:

P(A,B,C) = P(A)P(B)P(C)

A and B are independent, if knowing that A has happened A and B are independent, if knowing that A has happened does not say anything about B happeningdoes not say anything about B happening


19

Bayes Theorem

• Provides a way to convert a-priori probabilities to a-posteriori probabilities:


20

Conditional Probability

• One of the most useful concepts!

AABB


21

Bayes Theorem

• Provides a way to convert a-priori probabilities to a-posteriori probabilities:


22

Using Partitions:

• If events Ai are mutually exclusive and partition


23

Random Variables

• A (scalar) random variable X is a function that maps the outcome of a random event into real scalar values

X(X())


24

Random Variables Distributions

• Cumulative Probability Distribution (CDF):

• Probability Density Function (PDF):Probability Density Function (PDF):


25

Random Distributions:

• From the two previous equations:


26

Uniform Distribution

• A R.V. X that is uniformly distributed between x1 and x2 has density function:

XX11 XX22


27

Gaussian (Normal) Distribution

• A R.V. X that is normally distributed has density function:


28

Statistical Characterizations

• Expectation (Mean Value, First Moment):

•Second Moment:Second Moment:


29

Statistical Characterizations

• Variance of X:

• Standard Deviation of X:


30

Mean Estimation from Samples

• Given a set of N samples from a distribution, we can estimate the mean of the distribution by:


31

Variance Estimation from Samples

• Given a set of N samples from a distribution, we can estimate the variance of the distribution by:

Pattern Classification

Chapter 1: Introduction to Pattern Recognition (Sections 1.1-1.6)

• Machine Perception

• An Example

• Pattern Recognition Systems

• The Design Cycle

• Learning and Adaptation

• Conclusion


34

Machine Perception

• Build a machine that can recognize patterns:

• Speech recognition

• Fingerprint identification

• OCR (Optical Character Recognition)

• DNA sequence identification


35

An Example

• “Sorting incoming Fish on a conveyor according to species using optical sensing”

Sea bass

Species

Salmon


36

• Problem Analysis

• Set up a camera and take some sample images to extract features

• Length

• Lightness

• Width

• Number and shape of fins

• Position of the mouth, etc…

• This is the set of all suggested features to explore for use in our classifier!


37

• Preprocessing

• Use a segmentation operation to isolate fishes from one another and from the background

• Information from a single fish is sent to a feature extractor whose purpose is to reduce the data by measuring certain features

• The features are passed to a classifier


38


39

• Classification

• Select the length of the fish as a possible feature for discrimination


40


41

The length is a poor feature alone!

Select the lightness as a possible feature.


42


43

• Threshold decision boundary and cost relationship

• Move our decision boundary toward smaller values of lightness in order to minimize the cost (reduce the number of sea bass that are classified salmon!)

Task of decision theory


44

• Adopt the lightness and add the width of the fish

Fish xT = [x1, x2]

Lightness Width


45


46

• We might add other features that are not correlated with the ones we already have. A precaution should be taken not to reduce the performance by adding such “noisy features”

• Ideally, the best decision boundary should be the one which provides an optimal performance such as in the following figure:


47


48

• However, our satisfaction is premature because the central aim of designing a classifier is to correctly classify novel input

Issue of generalization!


49

Documents

IBS-09-SL RM 501 – Ranjit Goswami 1 Basic Probability