Bayesian Epistemology
Stephan Hartmann, London School of Economics
Motivation
Bayesian Epistemology is a quite active research program. It has many attractions, and I hope to convince you of its superiority over reliabilism…
Needless to say, there are different approaches to Bayesian Epistemology. In this lecture and the next, I will present my take on the subject.
Overview
I. Why Bayesianism?
II. Probability and Its Interpretations
III. The Elements of Bayesianism
IV. Success Stories
V. Further Topics and Limitations
VI. Modeling in Science
VII. Bayesian Networks
I. Why Bayesianism?
Two traditions in philosophy of science
1. Normativism
- Examples: Falsificationism, Bayesianism
- Typically motivated on a priori grounds
2. Descriptivism
- Examples: Kuhn, naturalized philosophies of science
- Examine specific case studies that contribute to a better understanding of science
… and their problems
1. Normativism
- challenged by insights from the history of science (e.g. Popper and the stability of normal science)
- often “too far away” from real science
2. Descriptivism
- not clear how generalizable insights or normative standards can be provided
Desiderata for a methodology of science
1. It should be normative and provide a defensible general account of scientific rationality.
2. It should provide a framework to illuminate “intuitively correct judgments in the history of science and explains the incorrectness of those judgments that seem clearly intuitively incorrect (and shed light on ‘grey cases’)” (John Worrall)
Goal
Develop a version of Bayesianism that does the job.
This version of Bayesianism will help us to explain features of the methodology of science by fitting them into a normative framework theory.
Which features are explained:
- specific episodes, or
- more general features?
II. Probability and Its Interpretations
Bayesianism uses probabilities to represent the degree of belief of an agent.
To understand this better, we have to get clearer about
(i) what probabilities are, and
(ii) what probabilities mean.
Note: Probabilities are mathematical objects, which are defined in an axiomatic way. How can these objects relate to something in the world, i.e. outside the mathematical realm?
1. Probability Theory
Let S be a collection of sentences, and let P be a probability function on S. It satisfies the Kolmogorov axioms:
1. P(A) ≥ 0
2. P(A) = 1 if A is true in all models
3. P(A ∨ B) = P(A) + P(B) if A, B are mutually exclusive
Some consequences:
1. P(¬A) = 1 − P(A)
2. P(A) = P(B) if A ⇔ B
3. P(A ∨ B) = P(A) + P(B) − P(A, B); note: P(A, B) := P(A ∧ B)
Conditional Probabilities
Definition: P(A|B) = P(A, B)/P(B) if P(B) ≠ 0
Bayes’ Theorem:
P(B|A) = P(A|B) P(B)/P(A)
= P(A|B) P(B)/[P(A|B) P(B) + P(A|¬B) P(¬B)]
= P(B)/[P(B) + P(¬B) x]
with the likelihood ratio x := P(A|¬B)/P(A|B)
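The likelihood-ratio form above can be sketched in a few lines of Python (the numerical values in the call are illustrative only; A plays the role of the evidence, B of the hypothesis):

```python
def posterior(prior, p_a_given_b, p_a_given_not_b):
    """P(B|A) via the likelihood-ratio form of Bayes' theorem."""
    x = p_a_given_not_b / p_a_given_b  # likelihood ratio
    return prior / (prior + (1 - prior) * x)

posterior(0.01, 0.95, 0.02)  # ≈ 0.32
```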
Conditional Independence
A and B are (unconditionally) independent iff
P(A, B) = P(A) P(B) ⇔ P(A|B) = P(A) ⇔ P(B|A) = P(B)
Example: roll a die; A = observe an even number, B = observe a number ≤ 4.
A is conditionally independent of B given C iff
P(A|B, C) = P(A|C)
Example: A = yellow fingers, B = lung cancer, C = smoking
In symbols: A ⊥ B | C. Note: the relation is symmetrical:
A ⊥ B | C ⇔ B ⊥ A | C
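The die example can be verified by direct enumeration; a quick Python sketch:

```python
from fractions import Fraction

outcomes = range(1, 7)

def prob(event):
    # probability of an event under the uniform die distribution
    return Fraction(sum(1 for o in outcomes if event(o)), 6)

A = lambda o: o % 2 == 0   # even number
B = lambda o: o <= 4       # number <= 4

independent = prob(lambda o: A(o) and B(o)) == prob(A) * prob(B)
```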
Examples
1. What is the probability of throwing either 5 or 6? (One cannot have both; that is why the events are mutually exclusive.) Answer: the probability of throwing 6 plus the probability of throwing 5.
2. The general multiplication rule: P(B ∧ C) = P(B) P(C|B)
What is your chance of meeting someone who is both good looking (B) and intelligent (C)? Answer: the probability of meeting someone who is good looking, P(B), multiplied by the probability that this person is also intelligent, P(C|B).
More examples
3. The probability of having rain tomorrow (C) does not depend on today’s seminar taking place; the two events are independent. Your knowledge of Bayesianism tomorrow (hopefully!) does depend on it. These two events are not independent.
4. What is the probability of the GNP going up next year and the sun shining tomorrow? Answer: just multiply the respective probabilities, since the two events are independent.
5. What is the chance of picking a philosophical or an entertaining book when picking a book randomly from a box on the market? These two events are not mutually exclusive because some philosophical books are entertaining. The answer is: the probability of picking a philosophical book plus the probability of picking an entertaining book minus the probability of picking one that is both.
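Example 5 is an instance of the general addition rule P(A ∨ B) = P(A) + P(B) − P(A, B); with made-up illustrative numbers for the box of books:

```python
from fractions import Fraction

# hypothetical box: probabilities of picking each kind of book
p_phil, p_ent, p_both = Fraction(1, 4), Fraction(1, 3), Fraction(1, 10)

# general addition rule: subtract the overlap so it is not counted twice
p_phil_or_ent = p_phil + p_ent - p_both
```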
2. What are Probabilities?
So far, probabilities are only defined mathematically (through the axioms, i.e. implicitly).
But we want more: we have to specify what probabilities are and how they relate to events in the world.
(a) Classical interpretation (Laplace)
Definition: The probability of an event is the ratio of favourable cases to the number of equally possible cases.
Examples:
• What is the probability of getting 6 when throwing a die? Answer: 1/6.
• What is the probability of getting an even number when throwing a die? Answer: 3/6 = 1/2.
Problems:
The definition rests not only on possible outcomes but on equally possible outcomes. Example: a biased coin (one for which the probability of getting heads is 1/3, for instance). How can we specify what equally possible outcomes are?
Answer: Laplace’s (in)famous principle of indifference,which says that two events are equally possible if we haveno reason to prefer one to the other.
Example: in the case of a perfectly symmetrical coin wehave no reason to prefer one side to the other.
Problem 1: Can we make this principle precise?
Problem 2: It can lead to inconsistencies because it can be applied in different ways to the same situation, yielding incompatible values for the same probability.
[Figure: a function g mapping Final Mark (45, 60, 75) to Income (20,000; 60,000; 100,000)]
Example:
Now pick a student at random.
• What is her final mark? Applying the principle of indifference to the final mark gives p = 0.5 for a mark between 45 and 60 and also p = 0.5 for one between 60 and 75.
• What is her income? Applying the principle of indifference to the income gives p = 0.5 for an income between 20,000 and 60,000 and also p = 0.5 for one between 60,000 and 100,000.
• Given that income and final mark are related by the function g, these two sets of probabilities are incompatible!
(b) Frequency Interpretation
Definition: The probability of an event is the relative frequency with which it occurs.
Example: Toss a coin: H T H H T T H H H T H H T T T T H …
The probability of ‘heads’ in this process is the relative frequency with which ‘heads’ occurs in this sequence; that is, the number of heads divided by the total number of tosses.
Problems
1. How long does the sequence have to be? Answer: Frequencies are defined in the long run. But how long is the long run? What are we to make of infinity?
2. How do we get the actual value of a probability if we need an infinite limit? Does the limit exist at all?
3. Single cases? (E.g. the probability that there is no rain tomorrow)
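The “long run” worry can be made vivid with a short simulation (a sketch, assuming a fair coin and a fixed seed for reproducibility):

```python
import random

rng = random.Random(0)  # fixed seed, so the run is reproducible

def relative_freq(n):
    """Relative frequency of heads in n simulated tosses of a fair coin."""
    heads = sum(rng.random() < 0.5 for _ in range(n))
    return heads / n

short_run = relative_freq(10)       # can easily be far from 0.5
long_run = relative_freq(100_000)   # close to 0.5, but still not a limit
```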
Subjective Probabilities
Unlike the classical and frequency interpretations, which take probabilities to be features of the real world, the subjective interpretation takes them to be degrees of belief of an agent.
Example: My degree of belief that it will rain tomorrow.
Problem: Individual degrees of belief may be such that the axioms of probability are violated.
Example: Many people believe that the probability of getting at least one six when throwing a die three times is .5. This is false (Exercise: show why!).
Answer: Yes, but assume there are people whose beliefs are consistent with the axioms.
Then degrees of belief are indeed an interpretation of the calculus!
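The exercise has a quick arithmetic answer: the probability of no six in three independent throws is (5/6)³, so

```python
from fractions import Fraction

p_no_six = Fraction(5, 6) ** 3          # (5/6)^3 = 125/216
p_at_least_one_six = 1 - p_no_six       # 91/216 ≈ 0.42, not 0.5
```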
Subjective Probabilities (cont’d)
• Whether or not such humans exist does not matter – it is a normative account! (Compare deductive logic!) A set of beliefs that violates the axioms of probability is said to be incoherent.
• Penalties for incoherence: Dutch books! A Dutch book is a set of bets such that, no matter what the outcome, the person loses.
• Example: James believes that the outcome of the next toss of a coin will be heads with probability 2/3 and also believes that it will come up tails with probability 2/3. So James should be willing to bet at odds of 2 to 1 that the coin will come up heads, and similarly for tails. But then, whatever the outcome, James loses 1 EURO.
• Theorem: A person is subject to Dutch books if, and only if, he holds an incoherent set of beliefs.
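James’s predicament can be tabulated explicitly (a sketch of the bets the example describes: stake 2 to win 1, on each side):

```python
def net_payoff(outcome):
    # bet 1: stake 2 on heads to win 1; bet 2: stake 2 on tails to win 1
    heads_bet = 1 if outcome == "heads" else -2
    tails_bet = 1 if outcome == "tails" else -2
    return heads_bet + tails_bet

# whatever happens, James is down 1 EURO
losses = {o: net_payoff(o) for o in ("heads", "tails")}
```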
Problems
• Where do we get the values of the probabilities from? The calculus does not provide us with values (with the exception of trivial cases: logically necessary and contradictory beliefs).
• There need be little connection between personal probabilities and what goes on in the world. What then are these probabilities good for? For instance, James can believe that the next coin toss will be heads with probability 9/10, but that does not make him a good gambler.
III. Textbook Bayesianism
What is Bayesianism?
• Quantitative confirmation theory
• When (and how much) does a piece of evidence E confirm a hypothesis H?
• Typically formulated in terms of probabilities = subjective degrees of belief
• Normative theory (see Dutch books)
• Textbooks: Howson & Urbach: Scientific Reasoning, and Earman: Bayes or Bust.
The mathematical machinery
Confirmation = positive relevance between H and E
• Start with a prior probability P(H)
• Updating rule: Pnew(H) := P(H|E)
• By mathematics (“Bayes’ Theorem”) we get: Pnew(H) = P(E|H) P(H)/P(E)
• E confirms H iff Pnew(H) > P(H)
• E disconfirms H iff Pnew(H) < P(H)
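In code, the positive-relevance criterion looks like this (the probability values in the calls are purely hypothetical):

```python
def confirms(prior_h, p_e_given_h, p_e):
    """E confirms H iff updating on E raises the probability of H."""
    p_new = p_e_given_h * prior_h / p_e  # Bayes' theorem
    return p_new > prior_h

confirms(0.3, 0.9, 0.5)   # True: E is likelier under H than on average
confirms(0.3, 0.2, 0.5)   # False: E speaks against H
```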
Some terminology
Remember: Pnew(H) = P(H) P(E|H) / P(E)
• P(H|E): Posterior probability: the probability of the hypothesis H judged in the light of evidence E.
• P(H): Prior probability: the probability of the hypothesis without taking evidence E into account.
• P(E|H): Likelihood of evidence E: the probability of E under the assumption that H is correct.
• P(E): Expectedness of E: the probability that E would obtain regardless of whether H is correct or not.
Examples of Textbook Bayesianism
Let’s assume that H entails E: P(E|H) = 1. Then Pnew(H) = P(H)/P(E).
(i) It is easy to see that Bayesianism can account for the insight that surprising evidence (i.e. if P(E) is small) confirms better.
(ii) Let’s assume we have different pieces of evidence: P(E) = P(E1, E2,…, En) = P(E1) · P(E2|E1) · P(E3|E2, E1) · … · P(En|En-1,…, E1). Hence, less correlated pieces of evidence confirm better: the variety-of-evidence thesis.
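Point (i) can be checked directly: when H entails E, the posterior P(H)/P(E) grows as P(E) shrinks (the numbers below are hypothetical, chosen so that P(E) ≥ P(H)):

```python
def posterior_when_h_entails_e(prior_h, p_e):
    # with P(E|H) = 1, Bayes' theorem reduces to P(H|E) = P(H)/P(E)
    return prior_h / p_e

surprising = posterior_when_h_entails_e(0.1, 0.2)   # small P(E): posterior 0.5
expected = posterior_when_h_entails_e(0.1, 0.8)     # large P(E): posterior 0.125
```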
We conclude
Textbook Bayesianism explains very general features of science.
However, it turns out to be too general, as it does not take into account de facto constraints of scientific practice (such as dependencies between measurement instruments). Taking them into account might lead to different conclusions.
Moreover, Textbook Bayesianism does not have an account of what a scientific theory is.
Jon Dorling’s version
Jon Dorling aims at reconstructing specific episodes in the history of science and fitting them into the Bayesian apparatus.
To do so, one has to assign specific probability values to the hypotheses and likelihoods in question.
This variant of Textbook Bayesianism is too specific.
Upshot
We need a version of Bayesianism that is not too general (to connect to the practice of science), and not too specific (to gain some philosophical insight).
It would also be nice to have an account that has a somewhat wider scope, i.e. an account that reaches beyond confirmation theory.
IV. (More) Success Stories
• The larger picture seems appealing: evidence confirms hypotheses as it raises their probabilities in a certain way. Moreover, the account provides a quantitative account of confirmation, that is, it affords us with a means to determine the degree to which a hypothesis is confirmed.
1. Bayesianism & hypothetico-deductivism
• Assume 1 > P(H) > 0 and 1 > P(E) > 0, and let E be a logical consequence of H.
• Then P(E|H) = 1, and Bayes’ theorem reduces to P(H|E) = P(H)/P(E).
• Since P(E) < 1, P(H|E) is greater than P(H).
• Hence every non-trivial deductive consequence of H confirms H.
• So the Bayesian account captures the intuition of the H-D account that a hypothesis is confirmed (or corroborated, as Popper preferred to say) by its consequences.
2. Popper’s risky predictions
• Of two deductive consequences of a hypothesis, the more improbable one confirms it more strongly, because
P(H|E1) = P(H)/P(E1) > P(H)/P(E2) = P(H|E2) where P(E1) < P(E2)
• Hence the Bayesian account captures the intuition that an unexpected (or unlikely) piece of evidence confirms the hypothesis more strongly than evidence that was to be expected. And this was Popper’s point that scientists must make risky predictions to test their theory.
An example from 19th century optics
Poisson deduced that the wave theory of light predicts that the shadow cast by a small circular disk illuminated by a narrow beam has a bright spot in the middle. This was considered to be extremely unlikely. So when Arago experimentally confirmed this prediction, the wave theory quickly received widespread acceptance.
3. Evading the raven paradox
• Since there are many more non-ravens in the world than ravens, the prior probability of ‘x is not a raven & x is not black’ is much greater than the probability of ‘x is a raven & x is black’. (This follows from our background knowledge about the world.)
• But we have just seen that the more unlikely a piece of evidence is, the more it confirms the hypothesis.
• Hence black ravens confirm the hypothesis that all ravens are black much more than white shoes do.
• Question: Is this a satisfying solution?
V. Further Topics and Limitations
1. Induction and convergence
• The Bayesian approach to confirmation seems to be able to circumvent Hume’s problem.
• Next-case induction: (H) ‘all observed a are B’ → ‘the next observed a is B’.
• Weak projectability (WP): lim_{n→∞} P(Ba_{n+1} | Ba_1 & Ba_2 & … & Ba_n) = 1.
• Claim: granted that the prior probability of H is not zero, WP follows from Bayesian confirmation theory.
Proof
• Set E = Ba_1 & Ba_2 & … & Ba_{n+1}, and notice that P(E|H) = 1, since E is a logical consequence of H. Then Bayes’ theorem reads
• P(H|E) = P(H) / P(E).
• Iterative expansion of the denominator yields:
• P(H | Ba_1 & Ba_2 & … & Ba_{n+1}) = P(H) / [P(Ba_1) P(Ba_2|Ba_1) … P(Ba_{n+1} | Ba_1 & Ba_2 & … & Ba_n)]
Proof (cont’d)
• Now: if P(Ba_{n+1} | Ba_1 & Ba_2 & … & Ba_n) does not approach 1 as n tends towards infinity, then the right-hand side of the above expression tends towards infinity. But this contradicts the axioms of probability, which stipulate that 0 ≤ P(H | Ba_1 & Ba_2 & … & Ba_{n+1}) ≤ 1. Therefore P(Ba_{n+1} | Ba_1 & Ba_2 & … & Ba_n) must approach 1 as n tends towards infinity.
• Hence lim_{n→∞} P(Ba_{n+1} | Ba_1 & Ba_2 & … & Ba_n) = 1, which is what we need as a reply to Hume.
Question: Does this really solve the problem of induction?
The problem of the priors
• The Bayesian account is quantitative. That is, it is a procedure to calculate new probabilities from old ones. But in order to apply Bayes’ formula we need concrete values for the expectedness of E, the prior probability of H, and the likelihood of E.
• Problem: Where do we get these values from? (‘problem of the priors’)
• An answer to this question depends on how one understands the probabilities involved:
1.) Objective-empirical (relative frequency)
2.) Subjective-empirical (degrees of belief)
1. Frequentism
• At the beginning we don’t know whether a hypothesis is true or not. So how do we get a frequency? The basic idea is that we place a hypothesis in a reference class of similar hypotheses. Then we look at how many of these were successful in the past. The ratio of the successful ones to the total number of hypotheses is the prior probability of the hypothesis.
• Problem: What is the relevant reference class? That is, what does it mean to say that a hypothesis H is similar to another hypothesis? Is it a matter of mathematical form? If so, what about hypotheses that are not formulated in terms of mathematics?
2. Subjectivism
• Idea: Probabilities are degrees of belief. Hence, the priors are a measure of the initial degree of belief in the truth of the hypothesis.
• Problems:
1. Again, how do we determine the initial degree of belief? Just ask scientists what they believe? Who, then, is qualified to have a view on the probability?
2. This is to admit a thoroughly subjective element in a theory of confirmation. Is this a sound procedure?
Replies on behalf of the subjectivist
• Reply 1: Salmon’s ‘tempered personalism’ (Shimony, Salmon) or objective Bayesianism (Jaynes, Williamson)
• Reply 2: ‘Washing out of the priors’: One can show that even if different prior probabilities are assumed, as evidence accumulates the posterior probabilities get closer and closer to each other. That is, the influence of the prior probabilities on the posterior probabilities decreases as more evidence becomes available.
• Question: Is this enough to solve the problem of the priors?
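The “washing out” claim can be illustrated with a toy computation: two agents start from very different priors but update on the same stream of evidence with the same likelihoods (all numbers hypothetical):

```python
def update(prior, p_e_given_h, p_e_given_not_h):
    """One step of Bayesian conditionalisation on a piece of evidence E."""
    num = p_e_given_h * prior
    return num / (num + p_e_given_not_h * (1 - prior))

p1, p2 = 0.9, 0.1    # strongly disagreeing priors
for _ in range(20):  # 20 independent pieces of evidence favouring H
    p1 = update(p1, 0.8, 0.4)
    p2 = update(p2, 0.8, 0.4)
# p1 and p2 are now nearly indistinguishable
```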
3. Zero priors
• Problem: It follows immediately from Bayes’ formula that the posterior probability P(H|E) will always equal zero if either the prior probability P(H) or the likelihood P(E|H) equals zero. How can we know which hypotheses (if any) should be assigned zero probability?
• Example: Swinburne, Grünbaum and the existence of God.
4. The problem of the Old Evidence
• Problem: We have always assumed so far that P(E) < 1, i.e. that there must be some surprise to E. However, if E is already known when the theory is formulated, there is no surprise to E and we have: P(E) = P(E|H) = 1. Bayes’ formula then reads P(H|E) = P(H) and no confirmation takes place.
• This seems to contradict our intuitions about confirmation as well as historical facts. The perihelion of Mercury was old news when Einstein came up with GTR, but it was taken to confirm the theory.
Possible solutions
(1) Bite the bullet and admit that old evidence doesn’t confirm.
(2) P(E) does not equal one because we have to conditionalise on background knowledge:
• Evaluate P(E) not with respect to someone’s actual background knowledge, but with respect to the background knowledge someone would have were E not yet known.
• Consider what P(E) was for someone in the past, at a time when E was indeed not yet known.
Another solution of the OE problem
(3) Admit that P(E) = 1 and propose a new criterion of confirmation: what we are really after is the knowledge that H implies E, and this may not be known even though E itself is. This is a new insight and hence confirms the theory. Then confirmation takes place!
5. Conflict with actual scientific practice
• A quote from Glymour: ‘... probabilistic analyses [of confirmation] remain at too great a distance from the history of scientific practice to be really informative about that practice, and in part they do so exactly because they are probabilistic.’
• In short: no scientist ever gives probabilistic arguments for a theory.
• Question: Can we dismiss Bayesian confirmation theory on the basis that ‘real’ scientists do not reason in this way?
6. The updating rule is unrealistic
• The updating rule is unrealistic, as it supposes that the evidence is certain.
• Way out: Jeffrey conditionalisation
• Other problem: There are no convincing Dutch book arguments for the updating rule.
Outline
I. Why Bayesianism?
II. Probability and Its Interpretations
III. The Elements of Bayesianism
IV. Success Stories
V. Further Topics and Limitations
VI. Modeling in Science
VII. Bayesian Networks
VI. Modeling in Science
The ubiquity of models in science
High school: Bohr model of the atom, model of the pendulum, …
University: Standard Big Bang Model, Standard Model of particle physics, …
Even later: Physicists like Lisa Randall (Harvard) use models to learn about the most fundamental questions of the universe.
We observe a shift from theories to models in science, and this shift is reflected in the work of philosophers of science over the last 25 years.
Theories and models
Scientists often use the words theory and model interchangeably.
Example: the Standard Model of particle physics.
So how can one distinguish between theories and models?
Features of theories
• Examples: Newtonian Mechanics, Quantum Mechanics
• General and universal in scope
• Abstract
• Often difficult to solve (example: QCD)
• No idealizations involved (ideally...)
Features of models
• Examples: the model of a pendulum, the Bohr model of the atom, gas models, the MIT Bag model, ...
• Specific and limited in scope
• Concrete
• Intuitive and visualizable
• Can be solved
• Involve idealizing assumptions
Two kinds of models
Models of a theory
- example: the treatment of the pendulum is a model of Newtonian Mechanics
- the theory acts as a modeling framework (and not much follows from the theory without specifying a model)
Phenomenological models
- example: the MIT Bag model
- there is no theory into which the model can be embedded.
Modeling in philosophy
• So why not construct philosophical models? Some authors do this already; see, e.g., the work of Brian Skyrms on the social contract and Clark Glymour et al.’s work on causal discovery.
• Goal: Show that models are also valuable tools for understanding the methodology of science.
• More specifically: Construct Bayesian Network models.
Glymour’s distinction
Platonic tradition in philosophy: formulate necessary and sufficient conditions for concepts such as knowledge, virtue, and the good. It comes up with general and universal theories, which aim at explaining or accounting for everything... and end up explaining nothing!
Euclidian tradition in philosophy: make idealized assumptions and explore the consequences of these assumptions... just like scientists do it!
Should we go for the Euclidian program in philosophy?
- perhaps not always, but sometimes
- pluralism of methods
VII. Bayesian Networks
1. Probability Theory
Let S be a collection of sentences, and let P be a probability function on S. It satisfies the Kolmogorov axioms:
1. P(A) ≥ 0
2. P(A) = 1 if A is true in all models
3. P(A ∨ B) = P(A) + P(B) if A, B are mutually exclusive
Some consequences:
1. P(¬A) = 1 − P(A)
2. P(A) = P(B) if (in all models) A ⇔ B
3. P(A ∨ B) = P(A) + P(B) − P(A, B); note: P(A, B) := P(A ∧ B)
Conditional Probabilities
Definition: P(A|B) = P(A, B)/P(B) if P(B) ≠ 0
Bayes’ Theorem:
P(B|A) = P(A|B) P(B)/P(A)
= P(A|B) P(B)/[P(A|B) P(B) + P(A|¬B) P(¬B)]
= P(B)/[P(B) + P(¬B) x]
with the likelihood ratio x := P(A|¬B)/P(A|B)
Conditional Independence
A and B are (unconditionally) independent iff
P(A, B) = P(A) P(B) ⇔ P(A|B) = P(A) ⇔ P(B|A) = P(B)
Example: roll a die; A = observe an even number, B = observe a number ≤ 4.
A is conditionally independent of B given C iff
P(A|B, C) = P(A|C)
Example: A = yellow fingers, B = lung cancer, C = smoking
In symbols: A ⊥ B | C. Note: the relation is symmetrical:
A ⊥ B | C ⇔ B ⊥ A | C
The Semi-Graphoid Axioms
The conditional independence relation satisfies the semi-graphoid axioms:
1. Symmetry: (A ⊥ B | C) ⇒ (B ⊥ A | C)
2. Decomposition: (A ⊥ B, D | C) ⇒ (A ⊥ B | C)
3. Weak Union: (A ⊥ B, D | C) ⇒ (A ⊥ B | C, D)
4. Contraction: (A ⊥ B | C) & (A ⊥ D | B, C) ⇒ (A ⊥ B, D | C)
With these axioms, new conditional independencies can be obtained from known independencies.
Joint and Marginal Probability
To specify the joint probability distribution over two binary propositional variables A, B, three probabilities have to be specified (if we do not have any further knowledge):
E.g. P(A, B) = .2, P(A, ¬B) = .1, P(¬A, B) = .6
Note: Σ_{A,B} P(A, B) = 1 ⇒ P(¬A, ¬B) = .1
Marginal probability: P(A) = Σ_B P(A, B)
This allows us to calculate whatever marginal or conditional probability we are interested in from the joint probability:
P(A1,…, Am | Am+1,…, An) = P(A1,…, An)/P(Am+1,…, An) = P(A1,…, An)/Σ_{A1,…,Am} P(A1,…, An)
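Using the numbers on this slide, marginalisation and conditionalisation are one-liners:

```python
# joint distribution over two binary variables, as specified above
joint = {("A", "B"): 0.2, ("A", "¬B"): 0.1, ("¬A", "B"): 0.6, ("¬A", "¬B"): 0.1}

p_A = sum(p for (a, _), p in joint.items() if a == "A")   # marginal P(A) = 0.3
p_B_given_A = joint[("A", "B")] / p_A                     # conditional P(B|A) = 2/3
```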
Representing a Joint Probability Distribution
[Venn diagrams showing P(A), P(¬A), P(A, B), P(¬A, ¬B), etc. as regions]
2. Bayesian Networks
Venn diagrams and the specification of all entries in P(A1,…,An) are not the most efficient ways to represent a joint probability distribution.
There is also a problem of computational complexity: specifying the joint probability distribution over n binary variables requires the specification of 2^n − 1 numbers.
The trick: use information about conditional independencies that hold between (sets of) variables. This will reduce the number of values that have to be specified (and stored in a computer).
An Example from Medicine
T: Patient has tuberculosis
X: Positive X-ray
Given information:
t := P(T) = .01
p := P(X|T) = .95 = 1 − P(¬X|T) = 1 − rate of false negatives
q := P(X|¬T) = .02 = rate of false positives
What is P(T|X)? ⇒ Apply Bayes’ Theorem:
P(T|X) = P(X|T) P(T) / [P(X|T) P(T) + P(X|¬T) P(¬T)] = p t / [p t + q (1 − t)] = t / [t + (1 − t) x] with x := q/p. Numerically, P(T|X) ≈ .32.
A Bayesian Network Representation
[Network: T → X, with P(T) = .01, P(X|T) = .95, P(X|¬T) = .02]
Parlance: “T causes X”, “T directly influences X”
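The slide’s calculation, checked numerically:

```python
t, p, q = 0.01, 0.95, 0.02   # P(T), P(X|T), P(X|¬T) as given on the slide

# Bayes' theorem: even after a positive X-ray, tuberculosis stays fairly unlikely
p_t_given_x = p * t / (p * t + q * (1 - t))   # ≈ 0.32
```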
A More Complicated (= Realistic) Scenario
[Network over the variables Smoking, Visit to …?, Lung Cancer, Bronchitis, Tuberculosis, Dyspnoea, Positive X-Ray, with e.g. P(S) = .23, P(B|S) = .3, P(B|¬S) = .05]
Directed Acyclic Graphs
A directed graph G = (V, E) consists of a finite set of nodes V and an irreflexive binary relation E on V.
A directed acyclic graph (DAG) is a directed graphwhich does not contain cycles.
Some Vocabulary
• Parents of node A: par(A)
• Ancestor
• Child node
• Descendents
• Non-descendents
• Root node
[Example DAG with nodes A, B, C, D, E, F, G, H]
The Parental Markov Condition
The Parental Markov Condition (PMC) requires that a variable is conditionally independent of its non-descendents given its parents.
Definition: A Bayesian Network is a DAG with a probability distribution which respects the PMC.
Three Examples
[Three small networks over A, B, C: a chain (A → B → C), a “common cause” (A ← C → B), and a “collider” (A → C ← B)]
Bayesian Networks at Work
How can one calculate probabilities with Bayesian Networks? Here is the answer:
P(A1,…,An) = Π_i P(Ai | par(Ai))
I.e. the joint probability distribution is determined by the prior probabilities of the root nodes (par(A) = ∅) and the conditional probabilities of all other nodes.
This typically requires the specification of far fewer numbers than 2^n − 1.
An Example
[Network: H → REP ← R, with
P(H) = h, P(R) = r
P(REP|H, R) = 1
P(REP|¬H, R) = 0
P(REP|H, ¬R) = a
P(REP|¬H, ¬R) = a]
P(H, R, REP) = P(H) P(R) P(REP|H, R)
What is P(H|REP)?
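P(H|REP) can be computed by brute-force enumeration over the factorized joint; the parameter values h, r, a below are chosen arbitrarily for illustration:

```python
from itertools import product

h, r, a = 0.5, 0.8, 0.1   # hypothetical values of the network parameters

def p_rep_given(hv, rv):
    # conditional probability table for REP, as on the slide
    if rv:
        return 1.0 if hv else 0.0
    return a

num = den = 0.0
for hv, rv in product([True, False], repeat=2):
    # factorization: P(H, R, REP) = P(H) P(R) P(REP|H, R)
    joint = (h if hv else 1 - h) * (r if rv else 1 - r) * p_rep_given(hv, rv)
    den += joint            # accumulates P(REP)
    if hv:
        num += joint        # accumulates P(H, REP)

p_h_given_rep = num / den
```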
Some More Theory: d-Separation
There are more independencies in a Bayesian Network than the ones accounted for by the Parental Markov Condition.
Is there a systematic way to find all independencies that hold in a given Bayesian Network?
Yes! d-separation.
Let A, B, and C be sets of variables. Then: A ⊥ B | C iff C d-separates A from B. So what is d-separation?
Example 1
[Chain: A → B → C]
PMC ⇒ C ⊥ A | B
But it is also the case that A ⊥ C | B.
This does not follow from PMC.
It can, however, be derived from the symmetry axiom for semi-graphoids.
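That A ⊥ C | B holds in any chain-factorized distribution can also be verified numerically (the parameters below are arbitrary, assumed only for illustration):

```python
from itertools import product

# chain A → B → C with arbitrary (hypothetical) parameters
pA = 0.3
pB = {True: 0.9, False: 0.2}   # P(B=true | A)
pC = {True: 0.7, False: 0.4}   # P(C=true | B)

def joint(a, b, c):
    # factorization along the chain: P(A) P(B|A) P(C|B)
    return ((pA if a else 1 - pA)
            * (pB[a] if b else 1 - pB[a])
            * (pC[b] if c else 1 - pC[b]))

# check P(A, C | B) = P(A|B) P(C|B) for every assignment
ok = True
for a, b, c in product([True, False], repeat=3):
    p_b = sum(joint(x, b, y) for x in (True, False) for y in (True, False))
    lhs = joint(a, b, c) / p_b
    rhs = (sum(joint(a, b, y) for y in (True, False)) / p_b) \
        * (sum(joint(x, b, c) for x in (True, False)) / p_b)
    ok = ok and abs(lhs - rhs) < 1e-12
```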
Example 2
d-Separation
A path p is d-separated (or blocked) by (a set of nodes) Z iff there is a node w on p satisfying either:
1. w has converging arrows (u → w ← v) and neither w nor any of its descendents are in Z, or
2. w does not have converging arrows and w ∈ Z.
If Z blocks every path from X to Y, then Z d-separates X from Y, and X ⊥ Y | Z.
How to Construct a Bayesian Network
1. Specify all relevant variables.
2. Specify all conditional independencies which hold between them.
3. Construct a Bayesian Network which exhibits these independencies.
4. Check other (perhaps unwanted) independencies with the d-separation criterion. Modify the network.
5. Specify the prior probabilities of all root nodes and the conditional probabilities of all other (child) nodes given their parents.
6. Calculate any probability you are interested in.
3. Modeling Partially ReliableInformation Sources
When we receive information from independent and partially reliable sources, what is our degree of confidence that this information is true?
Independence? Partial reliability?
A. Independence
REPi is independent of F1, REP1, …, Fi−1, REPi−1, Fi+1, REPi+1, …, Fn, REPn given Fi.
B. Partial Reliability
To model partially reliable information sources, additional model assumptions have to be made.
We examine two models.
Model I: Fixed Reliability
Paradigm: Medical Testing
[Network: Fi → REPi for i = 1, 2, 3]
P(REPi|Fi) = p
P(REPi|¬Fi) = q
for p > q
Measuring Reliability
Let’s assume that we get positive reports.
p := P(REPi|Fi) (for all i = 1,…, n)
q := P(REPi|¬Fi) with p > q
r := 1 − q/p
[Scale: r = 0 (randomization) … r = 1 (full reliability)]
Model II: Variable Reliability, Fixed Randomization Parameter
Paradigm: Scientific Instruments
[Network: Fi → REPi ← Ri for i = 1, 2, 3]
P(REPi|Fi, Ri) = 1
P(REPi|¬Fi, Ri) = 0
P(REPi|Fi, ¬Ri) = a
P(REPi|¬Fi, ¬Ri) = a
P(Ri) = ρ
Model IIa: Testing One Hypothesis
[Network: H → REPi ← Ri for i = 1, 2, 3; all reports concern the single hypothesis H]
P(REPi|H, Ri) = 1
P(REPi|¬H, Ri) = 0
P(REPi|H, ¬Ri) = a
P(REPi|¬H, ¬Ri) = a
P(Ri) = ρ
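For Model IIa, n positive reports can be aggregated in a few lines: given H (or ¬H) the reports are independent, because the Ri are independent of each other. The parameter values below are illustrative only:

```python
h, a, rho, n = 0.5, 0.2, 0.6, 3   # hypothetical P(H), a, P(Ri), number of reports

# likelihood of one positive report, marginalising over the reliability variable Ri
p_rep_h = rho * 1 + (1 - rho) * a      # P(REPi | H)
p_rep_not_h = rho * 0 + (1 - rho) * a  # P(REPi | ¬H)

# n independent positive reports, combined by Bayes' theorem
num = h * p_rep_h ** n
p_h_given_reports = num / (num + (1 - h) * p_rep_not_h ** n)
# three unanimous, partially reliable reports push P(H) from 0.5 to ≈ 0.998
```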
Taking Stock
• The Parental Markov Condition is part of the definition of a Bayesian Network.
• The d-separation criterion helps us to identify all conditional independencies in a Bayesian Network.
• We constructed two basic models of partially reliable information sources:
- reliability endogenous (medical testing)
- reliability exogenous (scientific instruments)
The Plan for Lecture II
I. Textbook Bayesianism
II. The Variety-of-Evidence Thesis
III. Testimony
IV. Coherence, Unification and Simplicity
V. What is a Scientific Theory?
VI. The Stability of Normal Science
VII. Voting and Group Decision Making