Bayesian Epistemology
Stephan Hartmann, London School of Economics
Motivation
Bayesian Epistemology is a quite active research program. It has many attractions, and I hope to convince you of its superiority over reliabilism…
Needless to say, there are different approaches to Bayesian Epistemology. In this lecture and the next, I will present my take on the subject.
Overview
I. Why Bayesianism?
II. Probability and Its Interpretations
III. The Elements of Bayesianism
IV. Success Stories
V. Further Topics and Limitations
VI. Modeling in Science
VII. Bayesian Networks
I. Why Bayesianism?
Two traditions in philosophy of science
1. Normativism
- Examples: Falsificationism, Bayesianism
- Typically motivated on a priori grounds
2. Descriptivism
- Examples: Kuhn, naturalized philosophies of science
- Examine specific case studies that contribute to a better understanding of science
… and their problems
1. Normativism
- challenged by insights from the history of science (e.g. Popper and the stability of normal science)
- often “too far away” from real science
2. Descriptivism
- not clear how generalizable insights or normative standards can be provided
Desiderata for a methodology of science
1. It should be normative and provide a defensible general account of scientific rationality.
2. It should provide a framework to illuminate “intuitively correct judgments in the history of science and explains the incorrectness of those judgments that seem clearly intuitively incorrect (and shed light on ‘grey cases’)” (John Worrall)
Goal
Develop a version of Bayesianism that does the job.
This version of Bayesianism will help us to explain features of the methodology of science by fitting them into a normative framework theory.
Which features are explained:
- specific episodes, or
- more general features?
II. Probability and Its Interpretations
Bayesianism uses probabilities to represent the degree of belief of an agent.
To understand this better, we have to get clearer about
(i) what probabilities are, and
(ii) what probabilities mean.
Note: Probabilities are mathematical objects, which are defined in an axiomatic way. How can these objects relate to something in the world, i.e. outside the mathematical realm?
1. Probability Theory
Let S be a collection of sentences, and let P be a probability function on S. It satisfies the Kolmogorov axioms:
1. P(A) ≥ 0
2. P(A) = 1 if A is true in all models
3. P(A ∨ B) = P(A) + P(B) if A, B are mutually exclusive
Some consequences:
1. P(¬A) = 1 − P(A)
2. P(A) = P(B) if A ⇔ B
3. P(A ∨ B) = P(A) + P(B) − P(A, B); note: P(A, B) := P(A ∧ B)
Conditional Probabilities
Definition: P(A|B) = P(A, B)/P(B) if P(B) ≠ 0
Bayes’ Theorem:
P(B|A) = P(A|B) P(B)/P(A)
= P(A|B) P(B)/[P(A|B) P(B) + P(A|¬B) P(¬B)]
= P(B)/[P(B) + P(¬B) x]
with the likelihood ratio x := P(A|¬B)/P(A|B)
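The likelihood-ratio form above can be sketched in a few lines of Python (the numerical values in the call are illustrative only; A plays the role of the evidence, B of the hypothesis):

```python
def posterior(prior, p_a_given_b, p_a_given_not_b):
    """P(B|A) via the likelihood-ratio form of Bayes' theorem."""
    x = p_a_given_not_b / p_a_given_b  # likelihood ratio
    return prior / (prior + (1 - prior) * x)

posterior(0.01, 0.95, 0.02)  # ≈ 0.32
```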
Conditional Independence
A and B are (unconditionally) independent iff
P(A, B) = P(A) P(B) ⇔ P(A|B) = P(A) ⇔ P(B|A) = P(B)
Example: roll a die; A = observe an even number, B = observe a number ≤ 4.
A is conditionally independent of B given C iff
P(A|B, C) = P(A|C)
Example: A = yellow fingers, B = lung cancer, C = smoking
In symbols: A ⊥ B | C. Note: the relation is symmetrical:
A ⊥ B | C ⇔ B ⊥ A | C
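The die example can be verified by direct enumeration; a quick Python sketch:

```python
from fractions import Fraction

outcomes = range(1, 7)

def prob(event):
    # probability of an event under the uniform die distribution
    return Fraction(sum(1 for o in outcomes if event(o)), 6)

A = lambda o: o % 2 == 0   # even number
B = lambda o: o <= 4       # number <= 4

independent = prob(lambda o: A(o) and B(o)) == prob(A) * prob(B)
```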
Examples
1. What is the probability of throwing either 5 or 6? (One cannot have both; that is why the events are mutually exclusive.) Answer: the probability of throwing 6 plus the probability of throwing 5.
2. The general multiplication rule: P(B ∧ C) = P(B) P(C|B)
What is your chance of meeting someone who is both good looking (B) and intelligent (C)? Answer: the probability of meeting someone who is good looking, P(B), multiplied by the probability that this person is also intelligent, P(C|B).
More examples
3. The probability of having rain tomorrow (C) does not depend on today’s seminar taking place; the two events are independent. Your knowledge of Bayesianism tomorrow (hopefully!) does depend on it. These two events are not independent.
4. What is the probability of the GNP going up next year and the sun shining tomorrow? Answer: just multiply the respective probabilities, since the two events are independent.
5. What is the chance of picking a philosophical or an entertaining book when picking a book randomly from a box on the market? These two events are not mutually exclusive because some philosophical books are entertaining. The answer is: the probability of picking a philosophical book plus the probability of picking an entertaining book minus the probability of picking one that is both.
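Example 5 is an instance of the general addition rule P(A ∨ B) = P(A) + P(B) − P(A, B); with made-up illustrative numbers for the box of books:

```python
from fractions import Fraction

# hypothetical box: probabilities of picking each kind of book
p_phil, p_ent, p_both = Fraction(1, 4), Fraction(1, 3), Fraction(1, 10)

# general addition rule: subtract the overlap so it is not counted twice
p_phil_or_ent = p_phil + p_ent - p_both
```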
2. What are Probabilities?
So far, probabilities are only defined mathematically (through the axioms, i.e. implicitly).
But we want more: we have to specify what probabilities are and how they relate to events in the world.
(a) Classical interpretation (Laplace)
Definition: The probability of an event is the ratio of favourable cases to the number of equally possible cases.
Examples:
• What is the probability of getting 6 when throwing a die? Answer: 1/6.
• What is the probability of getting an even number when throwing a die? Answer: 3/6 = 1/2.
Problems:
The definition rests not only on possible outcomes but on equally possible outcomes. Example: a biased coin (one for which the probability of getting heads is 1/3, for instance). How can we specify what equally possible outcomes are?
Answer: Laplace’s (in)famous principle of indifference,which says that two events are equally possible if we haveno reason to prefer one to the other.
Example: in the case of a perfectly symmetrical coin wehave no reason to prefer one side to the other.
Problem 1: Can we make this principle precise?
Problem 2: It can lead to inconsistencies because it can be applied in different ways to the same situation, yielding incompatible values for the same probability.
[Figure: a function g mapping Final Mark (45, 60, 75) to Income (20,000; 60,000; 100,000)]
Example:
Now pick a student at random.
• What is her final mark? Applying the principle of indifference to the final mark gives p = 0.5 for a mark between 45 and 60 and also p = 0.5 for one between 60 and 75.
• What is her income? Applying the principle of indifference to the income gives p = 0.5 for an income between 20,000 and 60,000 and also p = 0.5 for one between 60,000 and 100,000.
• Given that income and final mark are related by the function g, these two sets of probabilities are incompatible!
(b) Frequency Interpretation
Definition: The probability of an event is the relative frequency with which it occurs.
Example: Toss a coin: H T H H T T H H H T H H T T T T H …
The probability of ‘heads’ in this process is the relative frequency with which ‘heads’ occurs in this sequence; that is, the number of heads divided by the total number of tosses.
Problems
1. How long does the sequence have to be? Answer: Frequencies are defined in the long run. But how long is the long run? What are we to make of infinity?
2. How do we get the actual value of a probability if we need an infinite limit? Does the limit exist at all?
3. Single cases? (E.g. the probability that there is no rain tomorrow)
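The “long run” worry can be made vivid with a short simulation (a sketch, assuming a fair coin and a fixed seed for reproducibility):

```python
import random

rng = random.Random(0)  # fixed seed, so the run is reproducible

def relative_freq(n):
    """Relative frequency of heads in n simulated tosses of a fair coin."""
    heads = sum(rng.random() < 0.5 for _ in range(n))
    return heads / n

short_run = relative_freq(10)       # can easily be far from 0.5
long_run = relative_freq(100_000)   # close to 0.5, but still not a limit
```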
Subjective Probabilities
Unlike the classical and frequency interpretations, which take probabilities to be features of the real world, the subjective interpretation takes them to be degrees of belief of an agent.
Example: My degree of belief that it will rain tomorrow.
Problem: Individual degrees of belief may be such that the axioms of probability are violated.
Example: Many people believe that the probability of getting at least one six when throwing a die three times is .5. This is false (Exercise: show why!).
Answer: Yes, but assume there are people whose beliefs are consistent with the axioms.
Then degrees of belief are indeed an interpretation of the calculus!
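The exercise has a quick arithmetic answer: the probability of no six in three independent throws is (5/6)³, so

```python
from fractions import Fraction

p_no_six = Fraction(5, 6) ** 3          # (5/6)^3 = 125/216
p_at_least_one_six = 1 - p_no_six       # 91/216 ≈ 0.42, not 0.5
```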
Subjective Probabilities (cont’d)
• Whether or not such humans exist does not matter – it is a normative account! (Compare deductive logic!) A set of beliefs that violates the axioms of probability is said to be incoherent.
• Penalties for incoherence: Dutch books! A Dutch book is a set of bets such that, no matter what the outcome, the person loses.
• Example: James believes that the outcome of the next toss of a coin will be heads with probability 2/3 and also believes that it will come up tails with probability 2/3. So James should be willing to bet at odds of 2 to 1 that the coin will come up heads, and similarly for tails. But then, whatever the outcome, James loses 1 EURO.
• Theorem: A person is subject to Dutch books if, and only if, he holds an incoherent set of beliefs.
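James’s predicament can be tabulated explicitly (a sketch of the bets the example describes: stake 2 to win 1, on each side):

```python
def net_payoff(outcome):
    # bet 1: stake 2 on heads to win 1; bet 2: stake 2 on tails to win 1
    heads_bet = 1 if outcome == "heads" else -2
    tails_bet = 1 if outcome == "tails" else -2
    return heads_bet + tails_bet

# whatever happens, James is down 1 EURO
losses = {o: net_payoff(o) for o in ("heads", "tails")}
```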
Problems
• Where do we get the values of the probabilities from? The calculus does not provide us with values (with the exception of trivial cases: logically necessary and contradictory beliefs).
• There need be little connection between personal probabilities and what goes on in the world. What then are these probabilities good for? For instance, James can believe that the next coin toss will be heads with probability 9/10, but that does not make him a good gambler.
III. Textbook Bayesianism
What is Bayesianism?
• Quantitative confirmation theory
• When (and how much) does a piece of evidence E confirm a hypothesis H?
• Typically formulated in terms of probabilities = subjective degrees of belief
• Normative theory (see Dutch books)
• Textbooks: Howson & Urbach: Scientific Reasoning, and Earman: Bayes or Bust.
The mathematical machinery
Confirmation = positive relevance between H and E
• Start with a prior probability P(H)
• Updating rule: Pnew(H) := P(H|E)
• By mathematics (“Bayes’ Theorem”) we get: Pnew(H) = P(E|H) P(H)/P(E)
• E confirms H iff Pnew(H) > P(H)
• E disconfirms H iff Pnew(H) < P(H)
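In code, the positive-relevance criterion looks like this (the probability values in the calls are purely hypothetical):

```python
def confirms(prior_h, p_e_given_h, p_e):
    """E confirms H iff updating on E raises the probability of H."""
    p_new = p_e_given_h * prior_h / p_e  # Bayes' theorem
    return p_new > prior_h

confirms(0.3, 0.9, 0.5)   # True: E is likelier under H than on average
confirms(0.3, 0.2, 0.5)   # False: E speaks against H
```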
Some terminology
Remember: Pnew(H) = P(H) P(E|H) / P(E)
• P(H|E): Posterior probability: the probability of the hypothesis H judged in the light of evidence E.
• P(H): Prior probability: the probability of the hypothesis without taking evidence E into account.
• P(E|H): Likelihood of evidence E: the probability of E under the assumption that H is correct.
• P(E): Expectedness of E: the probability that E would obtain regardless of whether H is correct or not.
Examples of Textbook Bayesianism
Let’s assume that H entails E: P(E|H) = 1. Then Pnew(H) = P(H)/P(E).
(i) It is easy to see that Bayesianism can account for the insight that surprising evidence (i.e. if P(E) is small) confirms better.
(ii) Let’s assume we have different pieces of evidence: P(E) = P(E1, E2,…, En) = P(E1) · P(E2|E1) · P(E3|E2, E1) · … · P(En|En-1,…, E1). Hence, less correlated pieces of evidence confirm better: the variety-of-evidence thesis.
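Point (i) can be checked directly: when H entails E, the posterior P(H)/P(E) grows as P(E) shrinks (the numbers below are hypothetical, chosen so that P(E) ≥ P(H)):

```python
def posterior_when_h_entails_e(prior_h, p_e):
    # with P(E|H) = 1, Bayes' theorem reduces to P(H|E) = P(H)/P(E)
    return prior_h / p_e

surprising = posterior_when_h_entails_e(0.1, 0.2)   # small P(E): posterior 0.5
expected = posterior_when_h_entails_e(0.1, 0.8)     # large P(E): posterior 0.125
```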
We conclude
Textbook Bayesianism explains very general features of science.
However, it turns out to be too general, as it does not take into account de facto constraints of scientific practice (such as dependencies between measurement instruments). Taking them into account might lead to different conclusions.
Moreover, Textbook Bayesianism does not have an account of what a scientific theory is.
Jon Dorling’s version
Jon Dorling aims at reconstructing specific episodes in the history of science and fitting them into the Bayesian apparatus.
To do so, one has to assign specific probability values to the hypotheses and likelihoods in question.
This variant of Textbook Bayesianism is too specific.
Upshot
We need a version of Bayesianism that is not too general (to connect to the practice of science), and not too specific (to gain some philosophical insight).
It would also be nice to have an account that has a somewhat wider scope, i.e. an account that reaches beyond confirmation theory.
IV. (More) Success Stories
• The larger picture seems appealing: evidence confirms hypotheses as it raises their probabilities in a certain way. Moreover, the account provides a quantitative account of confirmation, that is, it affords us with a means to determine the degree to which a hypothesis is confirmed.
1. Bayesianism & hypothetico-deductivism
• Assume 1 > P(H) > 0 and 1 > P(E) > 0, and let E be a logical consequence of H.
• Then P(E|H) = 1, and Bayes’ theorem reduces to P(H|E) = P(H)/P(E).
• Since P(E) < 1, P(H|E) is greater than P(H).
• Hence every non-trivial deductive consequence of H confirms H.
• So the Bayesian account captures the intuition of the H-D account that a hypothesis is confirmed (or corroborated, as Popper preferred to say) by its consequences.
2. Popper’s risky predictions
• Of two deductive consequences of a hypothesis, the more improbable one confirms it more strongly, because
P(H|E1) = P(H)/P(E1) > P(H)/P(E2) = P(H|E2) where P(E1) < P(E2)
• Hence the Bayesian account captures the intuition that an unexpected (or unlikely) piece of evidence confirms the hypothesis more strongly than evidence that was to be expected. And this was Popper’s point that scientists must make risky predictions to test their theory.
An example from 19th century optics
Poisson deduced that the wave theory of light predicts that the shadow cast by a small circular disk illuminated by a narrow beam has a bright spot in the middle. This was considered to be extremely unlikely. So when Arago experimentally confirmed this prediction, the wave theory quickly received widespread acceptance.
3. Evading the raven paradox
• Since there are many more non-ravens in the world than ravens, the prior probability of ‘x is not a raven & x is not black’ is much greater than the probability of ‘x is a raven & x is black’. (This follows from our background knowledge about the world.)
• But we have just seen that the more unlikely a piece of evidence is, the more it confirms the hypothesis.
• Hence black ravens confirm the hypothesis that all ravens are black much more than white shoes do.
• Question: Is this a satisfying solution?
V. Further Topics and Limitations
1. Induction and convergence
• The Bayesian approach to confirmation seems to be able to circumvent Hume’s problem.
• Next-case induction: (H) ‘all observed a are B’ → ‘the next observed a is B’.
• Weak projectability (WP): lim_{n→∞} P(Ba_{n+1} | Ba_1 & Ba_2 & … & Ba_n) = 1.
• Claim: granted that the prior probability of H is not zero, WP follows from Bayesian confirmation theory.
Proof
• Set E = Ba_1 & Ba_2 & … & Ba_{n+1}, and notice that P(E|H) = 1, since E is a logical consequence of H. Then Bayes’ theorem reads
• P(H|E) = P(H) / P(E).
• Iterative expansion of the denominator yields:
• P(H | Ba_1 & Ba_2 & … & Ba_{n+1}) = P(H) / [P(Ba_1) P(Ba_2|Ba_1) … P(Ba_{n+1} | Ba_1 & Ba_2 & … & Ba_n)]
Proof (cont’d)
• Now: if P(Ba_{n+1} | Ba_1 & Ba_2 & … & Ba_n) does not approach 1 as n tends towards infinity, then the right-hand side of the above expression tends towards infinity. But this contradicts the axioms of probability, which stipulate that 0 ≤ P(H | Ba_1 & Ba_2 & … & Ba_{n+1}) ≤ 1. Therefore P(Ba_{n+1} | Ba_1 & Ba_2 & … & Ba_n) must approach 1 as n tends towards infinity.
• Hence lim_{n→∞} P(Ba_{n+1} | Ba_1 & Ba_2 & … & Ba_n) = 1, which is what we need as a reply to Hume.
Question: Does this really solve the problem of induction?
The problem of the priors
• The Bayesian account is quantitative. That is, it is a procedure to calculate new probabilities from old ones. But in order to apply Bayes’ formula we need concrete values for the expectedness of E, the prior probability of H, and the likelihood of E.
• Problem: Where do we get these values from? (‘problem of the priors’)
• An answer to this question depends on how one understands the probabilities involved:
1.) Objective-empirical (relative frequency)
2.) Subjective-empirical (degrees of belief)
1. Frequentism
• At the beginning we don’t know whether a hypothesis is true or not. So how do we get a frequency? The basic idea is that we place a hypothesis in a reference class of similar hypotheses. Then we look at how many of these were successful in the past. The ratio of the successful ones to the total number of hypotheses is the prior probability of the hypothesis.
• Problem: What is the relevant reference class? That is, what does it mean to say that a hypothesis H is similar to another hypothesis? Is it a matter of mathematical form? If so, what about hypotheses that are not formulated in terms of mathematics?
2. Subjectivism
• Idea: Probabilities are degrees of belief. Hence, the priors are a measure of the initial degree of belief in the truth of the hypothesis.
• Problems:
1. Again, how do we determine the initial degree of belief? Just ask scientists what they believe? Who, then, is qualified to have a view on the probability?
2. This is to admit a thoroughly subjective element in a theory of confirmation. Is this a sound procedure?
Replies on behalf of the subjectivist
• Reply 1: Salmon’s ‘tempered personalism’ (Shimony, Salmon) or objective Bayesianism (Jaynes, Williamson)
• Reply 2: ‘Washing out of the priors’: One can show that even if different prior probabilities are assumed, as evidence accumulates the posterior probabilities get closer and closer to each other. That is, the influence of the prior probabilities on the posterior probabilities decreases as more evidence becomes available.
• Question: Is this enough to solve the problem of the priors?
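The “washing out” claim can be illustrated with a toy computation: two agents start from very different priors but update on the same stream of evidence with the same likelihoods (all numbers hypothetical):

```python
def update(prior, p_e_given_h, p_e_given_not_h):
    """One step of Bayesian conditionalisation on a piece of evidence E."""
    num = p_e_given_h * prior
    return num / (num + p_e_given_not_h * (1 - prior))

p1, p2 = 0.9, 0.1    # strongly disagreeing priors
for _ in range(20):  # 20 independent pieces of evidence favouring H
    p1 = update(p1, 0.8, 0.4)
    p2 = update(p2, 0.8, 0.4)
# p1 and p2 are now nearly indistinguishable
```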
3. Zero priors
• Problem: It follows immediately from Bayes’ formula that the posterior probability P(H|E) will always equal zero if either the prior probability P(H) or the likelihood P(E|H) equals zero. How can we know which hypotheses (if any) should be assigned zero probability?
• Example: Swinburne, Grünbaum and the existence of God.
4. The problem of the Old Evidence
• Problem: We have always assumed so far that P(E) < 1, i.e. that there must be some surprise to E. However, if E is already known when the theory is formulated, there is no surprise to E and we have: P(E) = P(E|H) = 1. Bayes’ formula then reads P(H|E) = P(H) and no confirmation takes place.
• This seems to contradict our intuitions about confirmation as well as historical facts. The perihelion of Mercury was old news when Einstein came up with GTR, but it was taken to confirm the theory.
Possible solutions
(1) Bite the bullet and admit that old evidence doesn’t confirm.
(2) P(E) does not equal one because we have to conditionalise on background knowledge:
• Evaluate P(E) not with respect to someone’s actual background knowledge, but with respect to the background knowledge someone would have were E not yet known.
• Consider what P(E) was for someone in the past, at a time when E was indeed not yet known.
Another solution of the OE problem
(3) Admit that P(E) = 1 and propose a new criterion of confirmation: what we are really after is the knowledge that H implies E, and this may not be known even though E itself is. This is a new insight and hence confirms the theory. Then confirmation takes place!
5. Conflict with actual scientific practice
• A quote from Glymour: ‘... probabilistic analyses [of confirmation] remain at too great a distance from the history of scientific practice to be really informative about that practice, and in part they do so exactly because they are probabilistic.’
• In short: no scientist ever gives probabilistic arguments for a theory.
• Question: Can we dismiss Bayesian confirmation theory on the basis that ‘real’ scientists do not reason in this way?
6. The updating rule is unrealistic
• The updating rule is unrealistic, as it supposes that the evidence is certain.
• Way out: Jeffrey conditionalisation
• Other problem: There are no convincing Dutch book arguments for the updating rule.
Outline
I. Why Bayesianism?
II. Probability and Its Interpretations
III. The Elements of Bayesianism
IV. Success Stories
V. Further Topics and Limitations
VI. Modeling in Science
VII. Bayesian Networks
VI. Modeling in Science
The ubiquity of models in science
High school: Bohr model of the atom, model of the pendulum, …
University: Standard Big Bang Model, Standard Model of particle physics, …
Even later: Physicists like Lisa Randall (Harvard) use models to learn about the most fundamental questions of the universe.
We observe a shift from theories to models in science, and this shift is reflected in the work of philosophers of science over the last 25 years.
Theories and models
Scientists often use the words theory and model interchangeably.
Example: the Standard Model of particle physics.
So how can one distinguish between theories and models?
Features of theories
• Examples: Newtonian Mechanics, Quantum Mechanics
• General and universal in scope
• Abstract
• Often difficult to solve (example: QCD)
• No idealizations involved (ideally...)
Features of models
• Examples: the model of a pendulum, the Bohr model of the atom, gas models, the MIT Bag model, ...
• Specific and limited in scope
• Concrete
• Intuitive and visualizable
• Can be solved
• Involve idealizing assumptions
Two kinds of models
Models of a theory
- example: the treatment of the pendulum is a model of Newtonian Mechanics
- the theory acts as a modeling framework (and not much follows from the theory without specifying a model)
Phenomenological models
- example: the MIT Bag model
- there is no theory into which the model can be embedded.
Modeling in philosophy
• So why not construct philosophical models? Some authors do this already; see, e.g., the work of Brian Skyrms on the social contract and Clark Glymour et al.’s work on causal discovery.
• Goal: Show that models are also valuable tools for understanding the methodology of science.
• More specifically: Construct Bayesian Network models.
Glymour’s distinction
Platonic tradition in philosophy: formulate necessary and sufficient conditions for concepts such as knowledge, virtue, and the good. It comes up with general and universal theories, which aim at explaining or accounting for everything... and end up explaining nothing!
Euclidian tradition in philosophy: make idealized assumptions and explore the consequences of these assumptions... just like scientists do it!
Should we go for the Euclidian program in philosophy?
- perhaps not always, but sometimes
- pluralism of methods
VII. Bayesian Networks
1. Probability Theory
Let S be a collection of sentences, and let P be a probability function on S. It satisfies the Kolmogorov axioms:
1. P(A) ≥ 0
2. P(A) = 1 if A is true in all models
3. P(A ∨ B) = P(A) + P(B) if A, B are mutually exclusive
Some consequences:
1. P(¬A) = 1 − P(A)
2. P(A) = P(B) if (in all models) A ⇔ B
3. P(A ∨ B) = P(A) + P(B) − P(A, B); note: P(A, B) := P(A ∧ B)
Conditional Probabilities
Definition: P(A|B) = P(A, B)/P(B) if P(B) ≠ 0
Bayes’ Theorem:
P(B|A) = P(A|B) P(B)/P(A)
= P(A|B) P(B)/[P(A|B) P(B) + P(A|¬B) P(¬B)]
= P(B)/[P(B) + P(¬B) x]
with the likelihood ratio x := P(A|¬B)/P(A|B)
Conditional Independence
A and B are (unconditionally) independent iff
P(A, B) = P(A) P(B) ⇔ P(A|B) = P(A) ⇔ P(B|A) = P(B)
Example: roll a die; A = observe an even number, B = observe a number ≤ 4.
A is conditionally independent of B given C iff
P(A|B, C) = P(A|C)
Example: A = yellow fingers, B = lung cancer, C = smoking
In symbols: A ⊥ B | C. Note: the relation is symmetrical:
A ⊥ B | C ⇔ B ⊥ A | C
The Semi-Graphoid Axioms
The conditional independence relation satisfies the semi-graphoid axioms:
1. Symmetry: (A ⊥ B | C) ⇒ (B ⊥ A | C)
2. Decomposition: (A ⊥ B, D | C) ⇒ (A ⊥ B | C)
3. Weak Union: (A ⊥ B, D | C) ⇒ (A ⊥ B | C, D)
4. Contraction: (A ⊥ B | C) & (A ⊥ D | B, C) ⇒ (A ⊥ B, D | C)
With these axioms, new conditional independencies can be obtained from known independencies.
Joint and Marginal Probability
To specify the joint probability distribution over two binary propositional variables A, B, three probabilities have to be specified (if we do not have any further knowledge):
E.g. P(A, B) = .2, P(A, ¬B) = .1, P(¬A, B) = .6
Note: Σ_{A,B} P(A, B) = 1 ⇒ P(¬A, ¬B) = .1
Marginal probability: P(A) = Σ_B P(A, B)
This allows us to calculate whatever marginal or conditional probability we are interested in from the joint probability:
P(A1,…, Am | Am+1,…, An) = P(A1,…, An)/P(Am+1,…, An) = P(A1,…, An)/Σ_{A1,…,Am} P(A1,…, An)
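Using the numbers on this slide, marginalisation and conditionalisation are one-liners:

```python
# joint distribution over two binary variables, as specified above
joint = {("A", "B"): 0.2, ("A", "¬B"): 0.1, ("¬A", "B"): 0.6, ("¬A", "¬B"): 0.1}

p_A = sum(p for (a, _), p in joint.items() if a == "A")   # marginal P(A) = 0.3
p_B_given_A = joint[("A", "B")] / p_A                     # conditional P(B|A) = 2/3
```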
Representing a Joint Probability Distribution
[Venn diagrams showing P(A), P(¬A), P(A, B), P(¬A, ¬B), etc. as regions]
2. Bayesian Networks
Venn diagrams and the specification of all entries in P(A1,…,An) are not the most efficient ways to represent a joint probability distribution.
There is also a problem of computational complexity: specifying the joint probability distribution over n binary variables requires the specification of 2^n − 1 numbers.
The trick: use information about conditional independencies that hold between (sets of) variables. This will reduce the number of values that have to be specified (and stored in a computer).
An Example from Medicine
T: Patient has tuberculosis
X: Positive X-ray
Given information:
t := P(T) = .01
p := P(X|T) = .95 = 1 − P(¬X|T) = 1 − rate of false negatives
q := P(X|¬T) = .02 = rate of false positives
What is P(T|X)? ⇒ Apply Bayes’ Theorem:
P(T|X) = P(X|T) P(T) / [P(X|T) P(T) + P(X|¬T) P(¬T)] = p t / [p t + q (1 − t)] = t / [t + (1 − t) x] with x := q/p. Numerically, P(T|X) ≈ .32.
A Bayesian Network Representation
[Network: T → X, with P(T) = .01, P(X|T) = .95, P(X|¬T) = .02]
Parlance: “T causes X”, “T directly influences X”
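The slide’s calculation, checked numerically:

```python
t, p, q = 0.01, 0.95, 0.02   # P(T), P(X|T), P(X|¬T) as given on the slide

# Bayes' theorem: even after a positive X-ray, tuberculosis stays fairly unlikely
p_t_given_x = p * t / (p * t + q * (1 - t))   # ≈ 0.32
```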
A More Complicated (= Realistic) Scenario
[Network over the variables Smoking, Visit to …?, Lung Cancer, Bronchitis, Tuberculosis, Dyspnoea, Positive X-Ray, with e.g. P(S) = .23, P(B|S) = .3, P(B|¬S) = .05]
Directed Acyclic Graphs
A directed graph G = (V, E) consists of a finite set of nodes V and an irreflexive binary relation E on V.
A directed acyclic graph (DAG) is a directed graphwhich does not contain cycles.
Some Vocabulary
• Parents of node A: par(A)
• Ancestor
• Child node
• Descendents
• Non-descendents
• Root node
[Example DAG with nodes A, B, C, D, E, F, G, H]
The Parental Markov Condition
The Parental Markov Condition (PMC) requires that a variable is conditionally independent of its non-descendents given its parents.
Definition: A Bayesian Network is a DAG with a probability distribution which respects the PMC.
Three Examples
[Three small networks over A, B, C: a chain (A → B → C), a “common cause” (A ← C → B), and a “collider” (A → C ← B)]
Bayesian Networks at Work
How can one calculate probabilities with Bayesian Networks? Here is the answer:
P(A1,…,An) = Π_i P(Ai | par(Ai))
I.e. the joint probability distribution is determined by the prior probabilities of the root nodes (par(A) = ∅) and the conditional probabilities of all other nodes.
This typically requires the specification of far fewer numbers than 2^n − 1.
An Example
[Network: H → REP ← R, with
P(H) = h, P(R) = r
P(REP|H, R) = 1
P(REP|¬H, R) = 0
P(REP|H, ¬R) = a
P(REP|¬H, ¬R) = a]
P(H, R, REP) = P(H) P(R) P(REP|H, R)
What is P(H|REP)?
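P(H|REP) can be computed by brute-force enumeration over the factorized joint; the parameter values h, r, a below are chosen arbitrarily for illustration:

```python
from itertools import product

h, r, a = 0.5, 0.8, 0.1   # hypothetical values of the network parameters

def p_rep_given(hv, rv):
    # conditional probability table for REP, as on the slide
    if rv:
        return 1.0 if hv else 0.0
    return a

num = den = 0.0
for hv, rv in product([True, False], repeat=2):
    # factorization: P(H, R, REP) = P(H) P(R) P(REP|H, R)
    joint = (h if hv else 1 - h) * (r if rv else 1 - r) * p_rep_given(hv, rv)
    den += joint            # accumulates P(REP)
    if hv:
        num += joint        # accumulates P(H, REP)

p_h_given_rep = num / den
```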
Some More Theory: d-Separation
There are more independencies in a Bayesian Network than the ones accounted for by the Parental Markov Condition.
Is there a systematic way to find all independencies that hold in a given Bayesian Network?
Yes! d-separation.
Let A, B, and C be sets of variables. Then: A ⊥ B | C iff C d-separates A from B. So what is d-separation?
Example 1
[Chain: A → B → C]
PMC ⇒ C ⊥ A | B
But it is also the case that A ⊥ C | B.
This does not follow from PMC.
It can, however, be derived from the symmetry axiom for semi-graphoids.
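That A ⊥ C | B holds in any chain-factorized distribution can also be verified numerically (the parameters below are arbitrary, assumed only for illustration):

```python
from itertools import product

# chain A → B → C with arbitrary (hypothetical) parameters
pA = 0.3
pB = {True: 0.9, False: 0.2}   # P(B=true | A)
pC = {True: 0.7, False: 0.4}   # P(C=true | B)

def joint(a, b, c):
    # factorization along the chain: P(A) P(B|A) P(C|B)
    return ((pA if a else 1 - pA)
            * (pB[a] if b else 1 - pB[a])
            * (pC[b] if c else 1 - pC[b]))

# check P(A, C | B) = P(A|B) P(C|B) for every assignment
ok = True
for a, b, c in product([True, False], repeat=3):
    p_b = sum(joint(x, b, y) for x in (True, False) for y in (True, False))
    lhs = joint(a, b, c) / p_b
    rhs = (sum(joint(a, b, y) for y in (True, False)) / p_b) \
        * (sum(joint(x, b, c) for x in (True, False)) / p_b)
    ok = ok and abs(lhs - rhs) < 1e-12
```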
Example 2
d-Separation
A path p is d-separated (or blocked) by (a set of nodes) Z iff there is a node w on p satisfying either:
1. w has converging arrows (u → w ← v) and neither w nor any of its descendents are in Z, or
2. w does not have converging arrows and w ∈ Z.
If Z blocks every path from X to Y, then Z d-separates X from Y, and X ⊥ Y | Z.
How to Construct a Bayesian Network
1. Specify all relevant variables.
2. Specify all conditional independencies which hold between them.
3. Construct a Bayesian Network which exhibits these independencies.
4. Check other (perhaps unwanted) independencies with the d-separation criterion. Modify the network.
5. Specify the prior probabilities of all root nodes and the conditional probabilities of all other (child) nodes given their parents.
6. Calculate any probability you are interested in.
3. Modeling Partially ReliableInformation Sources
When we receive information from independent and partially reliable sources, what is our degree of confidence that this information is true?
Independence? Partial reliability?
A. Independence
REPi is independent of F1, REP1, …, Fi−1, REPi−1, Fi+1, REPi+1, …, Fn, REPn given Fi.
B. Partial Reliability
To model partially reliable information sources, additional model assumptions have to be made.
We examine two models.
Model I: Fixed Reliability
Paradigm: Medical Testing
[Network: Fi → REPi for i = 1, 2, 3]
P(REPi|Fi) = p
P(REPi|¬Fi) = q
for p > q
Measuring Reliability
Let’s assume that we get positive reports.
p := P(REPi|Fi) (for all i = 1,…, n)
q := P(REPi|¬Fi) with p > q
r := 1 − q/p
[Scale: r = 0 (randomization) … r = 1 (full reliability)]
Model II: Variable Reliability, Fixed Randomization Parameter
Paradigm: Scientific Instruments
[Network: Fi → REPi ← Ri for i = 1, 2, 3]
P(REPi|Fi, Ri) = 1
P(REPi|¬Fi, Ri) = 0
P(REPi|Fi, ¬Ri) = a
P(REPi|¬Fi, ¬Ri) = a
P(Ri) = ρ
Model IIa: Testing One Hypothesis
[Network: H → REPi ← Ri for i = 1, 2, 3; all reports concern the single hypothesis H]
P(REPi|H, Ri) = 1
P(REPi|¬H, Ri) = 0
P(REPi|H, ¬Ri) = a
P(REPi|¬H, ¬Ri) = a
P(Ri) = ρ
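For Model IIa, n positive reports can be aggregated in a few lines: given H (or ¬H) the reports are independent, because the Ri are independent of each other. The parameter values below are illustrative only:

```python
h, a, rho, n = 0.5, 0.2, 0.6, 3   # hypothetical P(H), a, P(Ri), number of reports

# likelihood of one positive report, marginalising over the reliability variable Ri
p_rep_h = rho * 1 + (1 - rho) * a      # P(REPi | H)
p_rep_not_h = rho * 0 + (1 - rho) * a  # P(REPi | ¬H)

# n independent positive reports, combined by Bayes' theorem
num = h * p_rep_h ** n
p_h_given_reports = num / (num + (1 - h) * p_rep_not_h ** n)
# three unanimous, partially reliable reports push P(H) from 0.5 to ≈ 0.998
```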
Taking Stock
• The Parental Markov Condition is part of the definition of a Bayesian Network.
• The d-separation criterion helps us to identify all conditional independencies in a Bayesian Network.
• We constructed two basic models of partially reliable information sources:
- reliability endogenous (medical testing)
- reliability exogenous (scientific instruments)
The Plan for Lecture II
I. Textbook Bayesianism
II. The Variety-of-Evidence Thesis
III. Testimony
IV. Coherence, Unification and Simplicity
V. What is a Scientific Theory?
VI. The Stability of Normal Science
VII. Voting and Group Decision Making