1
Structure and Uncertainty
Peter Green, University of Bristol, 10 July 2003
3
Statistics and science
“If your experiment needs statistics, you ought to have done a better experiment”
Ernest Rutherford (1871-1937)
5
Graphical models
Modelling
Inference
Mathematics
Algorithms
6
Markov chains
Graphical models
Contingency tables
Spatial statistics
Sufficiency
Regression
Covariance selection
Statistical physics
Genetics
AI
8
1. Modelling
9
Structured systems
A framework for building models, especially probabilistic models, for empirical data
Key idea:
– understand complex system
– through global model
– built from small pieces
  • comprehensible
  • each with only a few variables
  • modular
12
Mendelian inheritance - a natural structured model
[Figure: pedigree of ABO blood groups, showing phenotypes A, O, AB and genotypes AB, AO, OO; portrait of Mendel]
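The pedigree can be read as a structured model in which each child's genotype depends only on its parents' genotypes. The following is a minimal sketch of that single building block, assuming the standard ABO system (alleles A, B, O, with A and B codominant and O recessive) and that each parent transmits one of its two alleles with probability 1/2. The function names are illustrative and do not come from the talk.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def phenotype(genotype):
    """Map an unordered ABO genotype such as 'AO' to its blood group."""
    alleles = set(genotype)
    if alleles == {"A", "B"}:
        return "AB"
    if "A" in alleles:
        return "A"
    if "B" in alleles:
        return "B"
    return "O"

def child_genotype_dist(mother, father):
    """Distribution of a child's genotype given both parents' genotypes.

    Assumption: each parent passes on one of its two alleles with
    probability 1/2, independently of the other parent.
    """
    counts = Counter("".join(sorted(m + f)) for m, f in product(mother, father))
    total = sum(counts.values())
    return {g: Fraction(n, total) for g, n in counts.items()}

def child_phenotype_dist(mother, father):
    """Distribution of a child's blood group given the parents' genotypes."""
    dist = Counter()
    for g, p in child_genotype_dist(mother, father).items():
        dist[phenotype(g)] += p
    return dict(dist)

print(child_genotype_dist("AO", "AO"))
print(child_phenotype_dist("AB", "OO"))
```

A whole pedigree is then a product of such local parent-to-child distributions, one per individual, which is what makes it a natural structured model.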
13
Ion channel model
[Figure: graphical model with nodes: levels & variances, model indicator, transition rates, hidden state, data, binary signal]
Hodgson and Green, Proc Roy Soc Lond A, 1999
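To make the roles of the nodes concrete, here is a much simplified sketch: a two-state, discrete-time hidden Markov chain observed with Gaussian noise, followed by forward filtering to recover the hidden state. This is not the model of the cited paper. The transition probabilities, levels and standard deviations are hypothetical values chosen for illustration, and the model indicator is fixed rather than inferred.

```python
import math
import random

def simulate(n, trans, levels, sds, rng):
    """Simulate hidden states and noisy data from a two-state hidden Markov model."""
    state, states, data = 0, [], []
    for _ in range(n):
        # Switch to the other state with probability trans[state][1 - state].
        if rng.random() < trans[state][1 - state]:
            state = 1 - state
        states.append(state)
        data.append(rng.gauss(levels[state], sds[state]))
    return states, data

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def filter_states(data, trans, levels, sds):
    """Forward filtering: P(hidden state at time t | data up to time t)."""
    belief, out = [0.5, 0.5], []  # assumed uniform prior on the initial state
    for y in data:
        # Predict one step ahead, then weight by the likelihood of y.
        pred = [sum(belief[i] * trans[i][j] for i in range(2)) for j in range(2)]
        post = [pred[j] * normal_pdf(y, levels[j], sds[j]) for j in range(2)]
        total = sum(post)
        belief = [p / total for p in post]
        out.append(belief)
    return out

rng = random.Random(0)
trans = [[0.95, 0.05], [0.10, 0.90]]   # hypothetical transition probabilities
levels, sds = [0.0, 1.0], [0.3, 0.3]   # hypothetical levels and noise (closed, open)
states, data = simulate(200, trans, levels, sds, rng)
filtered = filter_states(data, trans, levels, sds)
decoded = [int(b[1] > 0.5) for b in filtered]   # thresholded binary signal
accuracy = sum(d == s for d, s in zip(decoded, states)) / len(states)
print(f"fraction of time steps decoded correctly: {accuracy:.2f}")
```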
14
[Figure: the same graphical model (levels & variances, model indicator, transition rates, hidden state, data, binary signal), with states O1, O2 and C1, C2, C3 shown]
15
Gene expression using Affymetrix chips
[Figure: Affymetrix array and hybridisation images. Labels: 1.28cm; 20µm; oligonucleotide element; millions of copies of a specific oligonucleotide sequence element; approx. ½ million different complementary oligonucleotides; single stranded, labeled RNA sample; hybridised spot; expressed genes; non-expressed genes. Panels: Image of Hybridised Array; Zoom Image of Hybridised Array.]
Slide courtesy of Affymetrix
16
Gene expression is a hierarchical process
• Substantive question
• Experimental design
• Sample preparation
• Array design & manufacture
• Gene expression matrix
• Probe level data
• Image level data
20
Mapping of rare diseases using Hidden Markov model
G & Richardson, 2002
Larynx cancer in females in France, 1986-1993 (standardised ratios)
Posterior probability of excess risk
22
Probabilistic expert systems
23
2. Mathematics
24
Graphical models
Use ideas from graph theory to
• represent structure of a joint probability distribution
• by encoding conditional independencies
[Figure: graph on nodes A, B, C, D, E, F]
25
Where does the graph come from?
• Genetics
  – pedigree (family connections)
• Lattice systems
  – interaction graph (e.g. nearest neighbours)
• Gaussian case
  – graph determined by non-zeroes in inverse variance matrix
26
Inverse of (co)variance matrix: independent case
    A  B  C  D
A   1  0  0  0
B   0  3  0  0
C   0  0  2  0
D   0  0  0  1
[Figure: graph on nodes A, B, C, D (no links, since every off-diagonal entry is zero)]
27
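Inverse of (co)variance matrix: dependent case
    A  B  C  D
A   3  2  1  0
B   2  4  1  0
C   1  1  2  1
D   0  0  1  2
non-zero entry ⇔ non-zero $\operatorname{cov}(C, A \mid D, B)$
[Figure: graph on nodes A, B, C, D, with links where the off-diagonal entries are non-zero]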
Few links imply few parameters: Occam’s razor
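As a small check of how the graph is read off the matrix, the sketch below takes the dependent-case inverse covariance matrix (rows and columns ordered A, B, C, D), lists the links, and computes two conditional covariances. It relies on a standard Gaussian fact: the conditional covariance matrix of a pair given all other variables is the inverse of the corresponding 2x2 block of the inverse covariance matrix. The function names are illustrative.

```python
NAMES = ["A", "B", "C", "D"]
K = [[3, 2, 1, 0],
     [2, 4, 1, 0],
     [1, 1, 2, 1],
     [0, 0, 1, 2]]

def edges(precision, names):
    """Links of the graph: pairs whose off-diagonal entry is non-zero."""
    n = len(names)
    return [(names[i], names[j])
            for i in range(n) for j in range(i + 1, n)
            if precision[i][j] != 0]

def conditional_cov(precision, i, j):
    """Covariance of variables i and j given all the others (Gaussian case).

    Inverts the 2x2 block [[a, b], [b, d]] of the inverse covariance matrix
    and returns the off-diagonal entry of that inverse.
    """
    a, b, d = precision[i][i], precision[i][j], precision[j][j]
    det = a * d - b * b
    return -b / det

print(edges(K, NAMES))            # four links among A, B, C, D
print(conditional_cov(K, 0, 2))   # A and C given B, D: non-zero
print(conditional_cov(K, 1, 3))   # B and D given A, C: zero
```

With four links the model has 4 + 4 = 8 free parameters (diagonal plus links) rather than the 10 of an unrestricted 4x4 symmetric matrix.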
29
Conditional independence
• X and Z are conditionally independent given Y if, knowing Y, discovering Z tells you nothing more about X: $p(X \mid Y, Z) = p(X \mid Y)$
• Notation: $X \perp\!\!\!\perp Z \mid Y$
[Graph: X - Y - Z]
30
Conditional independence
as seen in data on perinatal mortality vs. ante-natal care…
Does survival depend on ante-natal care?
… what if you know the clinic?
31
Conditional independence
[Graph: ante - clinic - survival]
survival and clinic are dependent
and ante and clinic are dependent
but survival and ante are CI given clinic
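The pattern can be reproduced numerically. The sketch below builds a joint distribution that factorises along the graph ante - clinic - survival and shows that survival appears to depend on ante-natal care until the clinic is known. All probabilities, the clinic labels P and Q, and the function names are invented purely for illustration; they are not the perinatal mortality data referred to in the talk.

```python
from itertools import product

# Hypothetical probabilities, invented purely for illustration.
p_clinic = {"P": 0.5, "Q": 0.5}
p_more_care = {"P": 0.2, "Q": 0.8}    # P(ante = more | clinic)
p_survive = {"P": 0.90, "Q": 0.98}    # P(survival = yes | clinic)

def joint(clinic, more, survived):
    """Joint probability under the graph ante - clinic - survival."""
    pa = p_more_care[clinic] if more else 1 - p_more_care[clinic]
    ps = p_survive[clinic] if survived else 1 - p_survive[clinic]
    return p_clinic[clinic] * pa * ps

def p_survival_given(more, clinic=None):
    """P(survival = yes | ante [, clinic]), obtained by summing the joint."""
    clinics = [clinic] if clinic else list(p_clinic)
    num = sum(joint(c, more, True) for c in clinics)
    den = sum(joint(c, more, s) for c, s in product(clinics, [True, False]))
    return num / den

# Ignoring the clinic, survival appears to depend on ante-natal care...
print(p_survival_given(True), p_survival_given(False))
# ...but within each clinic the amount of care makes no difference.
for c in p_clinic:
    print(c, p_survival_given(True, c), p_survival_given(False, c))
```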
32
[Figure: graph on nodes A, B, C, D, E, F]
Conditional independence provides a mathematical basis for splitting up a large system into smaller components
33
[Figure: the graph split into smaller components; node labels as extracted: B, C, E, D, A, B, F, D, E]
34
3. Inference
36
Bayesian paradigm in structured modelling
• ‘borrowing strength’
• automatically integrates out all sources of uncertainty
• properly accounting for variability at all levels
• including, in principle, uncertainty in model itself
• avoids over-optimistic claims of certainty
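One common way to see ‘borrowing strength’ is shrinkage in a hierarchical model. The sketch below is an assumption-laden illustration, not a model from the talk: group observations y_i ~ N(theta_i, sigma2_i), group effects theta_i ~ N(mu, tau2), a flat prior on mu, and all variances treated as known. Under these assumptions the posterior mean of each theta_i is a weighted average of its own observation and a pooled estimate of mu.

```python
def shrinkage_estimates(y, sigma2, tau2):
    """Posterior means of group effects in a normal hierarchical model.

    Assumed model: y_i ~ N(theta_i, sigma2_i), theta_i ~ N(mu, tau2),
    flat prior on mu, variances known.
    """
    # Marginally y_i ~ N(mu, sigma2_i + tau2), which gives the pooled mean.
    precisions = [1.0 / (s2 + tau2) for s2 in sigma2]
    mu_hat = sum(p * yi for p, yi in zip(precisions, y)) / sum(precisions)
    # Noisier groups get smaller weights and are pulled harder towards mu_hat.
    weights = [tau2 / (tau2 + s2) for s2 in sigma2]
    return mu_hat, [w * yi + (1 - w) * mu_hat for w, yi in zip(weights, y)]

# Hypothetical data: the third group is measured much less precisely.
mu_hat, theta_hat = shrinkage_estimates(
    y=[10.0, 12.0, 20.0], sigma2=[1.0, 1.0, 16.0], tau2=4.0)
print(mu_hat, theta_hat)
```

The imprecisely measured group is pulled most strongly towards the pooled mean, which is the sense in which it borrows strength from the others.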
38
Bayesian structured modelling
• ‘borrowing strength’
• automatically integrates out all sources of uncertainty
• … for example in forensic statistics with DNA probe data…
39
(thanks to J Mortera)
40
42
4. Algorithms
43
Algorithms for probability and likelihood calculations
Exploiting graphical structure:
• Markov chain Monte Carlo
• Probability propagation (Bayes nets)
• Expectation-Maximisation
• Variational methods
44
Markov chain Monte Carlo
• Subgroups of one or more variables updated randomly,
  – maintaining detailed balance with respect to target distribution
• Ensemble converges to equilibrium = target distribution (= Bayesian posterior, e.g.)
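A minimal sketch of these two points, using a random-walk Metropolis sampler on four states arranged in a ring. The target weights are hypothetical, and Metropolis is only one of several update rules that maintain detailed balance. The helper `transition_prob` allows the detailed balance condition, target(i) P(i, j) = target(j) P(j, i), to be checked directly, and the long-run frequencies can be compared with the target.

```python
import random

def transition_prob(weights, i, j):
    """One-step probability of moving from state i to a ring neighbour j != i."""
    return 0.5 * min(1.0, weights[j] / weights[i])

def metropolis_chain(weights, n_steps, rng):
    """Random-walk Metropolis on states 0..k-1 arranged in a ring.

    `weights` are unnormalised target probabilities. A neighbour is proposed
    uniformly at random and accepted with probability min(1, w_new / w_old).
    Returns the fraction of steps spent in each state.
    """
    k = len(weights)
    state, counts = 0, [0] * k
    for _ in range(n_steps):
        proposal = (state + rng.choice([-1, 1])) % k
        if rng.random() < min(1.0, weights[proposal] / weights[state]):
            state = proposal
        counts[state] += 1
    return [c / n_steps for c in counts]

weights = [1.0, 2.0, 3.0, 4.0]                  # hypothetical target weights
target = [w / sum(weights) for w in weights]
freqs = metropolis_chain(weights, 200_000, random.Random(1))
print("target     :", target)
print("frequencies:", [round(f, 3) for f in freqs])
```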
45
Markov chain Monte Carlo
[Figure: graph with one node marked ?]
Updating the node marked ?: need only look at its neighbours
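The locality of the update can be illustrated with a Gibbs sampler for an Ising model on a square lattice, where the full conditional of each site depends only on its nearest neighbours. This is a sketch under stated assumptions: the Ising model, the free boundary, and the value of the interaction parameter `beta` are chosen for illustration and are not taken from the talk.

```python
import math
import random

def gibbs_sweep(spins, beta, rng):
    """One Gibbs sweep over a square lattice of +1/-1 spins (Ising model).

    Assumed target: p(x) proportional to exp(beta * sum of x_i * x_j over
    neighbouring pairs). The full conditional of a site depends only on the
    sum of its nearest neighbours (free boundary), so each update is local.
    """
    n = len(spins)
    for i in range(n):
        for j in range(n):
            s = sum(spins[a][b]
                    for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                    if 0 <= a < n and 0 <= b < n)
            p_up = 1.0 / (1.0 + math.exp(-2.0 * beta * s))
            spins[i][j] = 1 if rng.random() < p_up else -1
    return spins

rng = random.Random(0)
spins = [[rng.choice([-1, 1]) for _ in range(8)] for _ in range(8)]
for _ in range(50):
    gibbs_sweep(spins, beta=0.6, rng=rng)   # beta = 0.6 is a hypothetical value
magnetisation = sum(map(sum, spins)) / 64
print("mean spin after 50 sweeps:", magnetisation)
```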
46
Probability propagation
[Figure: graph on nodes 1 to 7 and its junction tree, with cliques 12, 267, 236, 3456 and separators 2, 26, 36]
form junction tree
47
Message passing in junction tree
[Figure: junction tree with the root marked]
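The collect phase of message passing can be sketched on a junction tree with cliques 12, 267, 236, 3456. Several details here are assumptions made for illustration: the variables are binary, the clique potentials are random positive numbers, clique 3456 is taken as the root, and clique 12 is attached to 267 (it could equally attach to 236, since both contain node 2). The result is compared with brute-force summation over all 2^7 configurations.

```python
from itertools import product
import random

CLIQUES = [(3, 4, 5, 6), (2, 3, 6), (2, 6, 7), (1, 2)]   # root first
PARENT = {1: 0, 2: 1, 3: 2}   # index of each non-root clique's parent (assumed tree)

def make_potential(variables, rng):
    """Random positive potential over binary variables (illustrative values)."""
    return {vals: rng.uniform(0.5, 2.0)
            for vals in product((0, 1), repeat=len(variables))}

def marginalise(variables, table, keep):
    """Sum a table over every variable not in `keep`."""
    idx = [variables.index(v) for v in keep]
    out = {}
    for vals, p in table.items():
        key = tuple(vals[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out

def multiply_in(variables, table, msg_vars, msg):
    """Multiply a message over `msg_vars` into a table over `variables`."""
    idx = [variables.index(v) for v in msg_vars]
    return {vals: p * msg[tuple(vals[i] for i in idx)] for vals, p in table.items()}

def root_marginal(potentials):
    """Collect messages towards the root and return its normalised marginal."""
    tables = [dict(t) for t in potentials]
    for child in sorted(PARENT, reverse=True):          # leaves first
        parent = PARENT[child]
        sep = tuple(v for v in CLIQUES[child] if v in CLIQUES[parent])
        msg = marginalise(CLIQUES[child], tables[child], sep)
        tables[parent] = multiply_in(CLIQUES[parent], tables[parent], sep, msg)
    total = sum(tables[0].values())
    return {vals: p / total for vals, p in tables[0].items()}

def brute_force_root_marginal(potentials):
    """Same marginal by summing the full joint over all 2**7 configurations."""
    out = {}
    for vals in product((0, 1), repeat=7):
        x = dict(zip(range(1, 8), vals))
        p = 1.0
        for clique, table in zip(CLIQUES, potentials):
            p *= table[tuple(x[v] for v in clique)]
        key = tuple(x[v] for v in CLIQUES[0])
        out[key] = out.get(key, 0.0) + p
    total = sum(out.values())
    return {k: v / total for k, v in out.items()}

rng = random.Random(0)
potentials = [make_potential(c, rng) for c in CLIQUES]
fast = root_marginal(potentials)
slow = brute_force_root_marginal(potentials)
print(max(abs(fast[k] - slow[k]) for k in fast))   # difference is rounding error only
```

Each message involves only one clique and one separator, so the cost grows with the size of the largest clique rather than with the total number of variables.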
52
Structured systems’ success stories include...
• Genomics & bioinformatics
  – DNA & protein sequencing, gene mapping, evolutionary genetics
• Spatial statistics
  – image analysis, environmetrics, geographical epidemiology, ecology
• Temporal problems
  – longitudinal data, financial time series, signal processing