
Chapter 3: Maximum-Likelihood and Bayesian Parameter Estimation (part 2)

Outline:
- Bayesian Estimation (BE)
- Bayesian Parameter Estimation: Gaussian Case
- Bayesian Parameter Estimation: General Estimation
- Problems of Dimensionality
- Computational Complexity
- Component Analysis and Discriminants
- Hidden Markov Models

Bayesian Estimation (Bayesian learning to pattern classification problems)

- In MLE, the parameter θ was assumed to be fixed.
- In BE, θ is a random variable.
- The computation of the posterior probabilities P(ωi | x) lies at the heart of Bayesian classification.
- Goal: compute P(ωi | x, D).

Given the sample D, Bayes formula can be written as:

$$P(\omega_i \mid x, \mathcal{D}) = \frac{P(x \mid \omega_i, \mathcal{D})\, P(\omega_i \mid \mathcal{D})}{\sum_{j=1}^{c} P(x \mid \omega_j, \mathcal{D})\, P(\omega_j \mid \mathcal{D})}$$


To demonstrate the preceding equation, use:

$$P(x, \mathcal{D} \mid \omega_i) = P(x \mid \omega_i, \mathcal{D})\, P(\mathcal{D} \mid \omega_i)$$
$$P(x \mid \mathcal{D}) = \sum_{j} P(x, \omega_j \mid \mathcal{D})$$
$$P(\omega_i) = P(\omega_i \mid \mathcal{D}) \quad \text{(the training sample provides this!)}$$

Thus:
$$P(\omega_i \mid x, \mathcal{D}) = \frac{P(x \mid \omega_i, \mathcal{D}_i)\, P(\omega_i)}{\sum_{j=1}^{c} P(x \mid \omega_j, \mathcal{D}_j)\, P(\omega_j)}$$
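To make the rule concrete, here is a minimal sketch (not from the text; the function name, densities, and numbers are all illustrative, assuming NumPy and SciPy) that turns per-class predictive densities P(x | ωj, Dj) and priors P(ωj) into posteriors:

```python
import numpy as np
from scipy.stats import norm

def bayes_posteriors(x, class_densities, priors):
    """P(w_i | x, D): per-class likelihood times prior, normalized
    by the evidence (the sum over all classes)."""
    likelihoods = np.array([p(x) for p in class_densities])
    joint = likelihoods * np.asarray(priors)   # P(x | w_j, D_j) P(w_j)
    return joint / joint.sum()

# Two univariate Gaussian predictive densities (illustrative parameters).
densities = [norm(0.0, 1.0).pdf, norm(2.0, 1.0).pdf]
post = bayes_posteriors(1.2, densities, priors=[0.5, 0.5])
print(post, "-> decide class", post.argmax())
```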


Bayesian Parameter Estimation: Gaussian Case

Goal: estimate μ using the a-posteriori density P(μ | D).

The univariate case: P(μ | D)
- μ is the only unknown parameter (μ0 and σ0 are known!)
$$P(x \mid \mu) \sim N(\mu, \sigma^2)$$
$$P(\mu) \sim N(\mu_0, \sigma_0^2)$$


Reproducing density:
$$P(\mu \mid \mathcal{D}) = \frac{P(\mathcal{D} \mid \mu)\, P(\mu)}{\int P(\mathcal{D} \mid \mu)\, P(\mu)\, d\mu} = \alpha \prod_{k=1}^{n} P(x_k \mid \mu)\, P(\mu) \quad (1)$$
$$P(\mu \mid \mathcal{D}) \sim N(\mu_n, \sigma_n^2) \quad (2)$$

Identifying (1) and (2) yields:
$$\mu_n = \left( \frac{n\sigma_0^2}{n\sigma_0^2 + \sigma^2} \right) \hat{\mu}_n + \frac{\sigma^2}{n\sigma_0^2 + \sigma^2}\, \mu_0
\qquad \text{and} \qquad
\sigma_n^2 = \frac{\sigma_0^2\, \sigma^2}{n\sigma_0^2 + \sigma^2},$$
where $\hat{\mu}_n = \frac{1}{n} \sum_{k=1}^{n} x_k$ is the sample mean.
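A minimal numeric sketch of these update formulas (the function name and test values are illustrative, assuming NumPy):

```python
import numpy as np

def gaussian_posterior_params(x, mu0, sigma0_sq, sigma_sq):
    """Posterior N(mu_n, sigma_n^2) of the mean mu for a univariate Gaussian
    likelihood N(mu, sigma^2) with known variance and prior N(mu0, sigma0^2)."""
    n = len(x)
    mu_hat = np.mean(x)                       # sample mean
    denom = n * sigma0_sq + sigma_sq
    mu_n = (n * sigma0_sq / denom) * mu_hat + (sigma_sq / denom) * mu0
    sigma_n_sq = sigma0_sq * sigma_sq / denom
    return mu_n, sigma_n_sq

# As n grows, mu_n -> sample mean and sigma_n^2 -> 0 (Bayesian learning).
x = np.random.default_rng(0).normal(1.5, 1.0, size=50)
print(gaussian_posterior_params(x, mu0=0.0, sigma0_sq=4.0, sigma_sq=1.0))
```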


The univariate case: P(x | D)
- P(μ | D) has been computed; P(x | D) remains to be computed!
$$P(x \mid \mathcal{D}) = \int P(x \mid \mu)\, P(\mu \mid \mathcal{D})\, d\mu \quad \text{is Gaussian}$$
It provides:
$$P(x \mid \mathcal{D}) \sim N(\mu_n, \sigma^2 + \sigma_n^2)$$
(the desired class-conditional density P(x | Dj, ωj))

Therefore, using P(x | Dj, ωj) together with the priors P(ωj) and Bayes formula, we obtain the Bayesian classification rule:
$$\max_{j}\, P(\omega_j \mid x, \mathcal{D}) \;\Longleftrightarrow\; \max_{j}\, \big[ P(x \mid \omega_j, \mathcal{D}_j)\, P(\omega_j) \big]$$
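Continuing the earlier sketch, the predictive density needs only the posterior parameters: the known noise variance σ² is inflated by the posterior variance σn². The values below are illustrative (assuming SciPy):

```python
from scipy.stats import norm

def predictive_density(mu_n, sigma_n_sq, sigma_sq):
    """P(x | D) ~ N(mu_n, sigma^2 + sigma_n^2): parameter uncertainty
    simply widens the class-conditional Gaussian."""
    return norm(loc=mu_n, scale=(sigma_sq + sigma_n_sq) ** 0.5)

# Usage with posterior parameters like those computed above.
p_x_given_D = predictive_density(mu_n=1.4, sigma_n_sq=0.02, sigma_sq=1.0)
print(p_x_given_D.pdf(1.0))
```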


Bayesian Parameter Estimation: General Theory

P(x | D) computation can be applied to any situation in which the unknown density can be parameterized. The basic assumptions are:
- The form of P(x | θ) is assumed known, but the value of θ is not known exactly.
- Our knowledge about θ is assumed to be contained in a known prior density P(θ).
- The rest of our knowledge about θ is contained in a set D of n random variables x1, x2, ..., xn that follow P(x).


The basic problem is: "compute the posterior density P(θ | D)", then "derive P(x | D)".

Using Bayes formula, we have:
$$P(\theta \mid \mathcal{D}) = \frac{P(\mathcal{D} \mid \theta)\, P(\theta)}{\int P(\mathcal{D} \mid \theta)\, P(\theta)\, d\theta},$$
and by the independence assumption:
$$P(\mathcal{D} \mid \theta) = \prod_{k=1}^{n} P(x_k \mid \theta)$$
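These two formulas translate directly into a numerical recipe once θ is discretized on a grid. The sketch below is illustrative (all names are hypothetical, assuming NumPy), not an algorithm from the text:

```python
import numpy as np

def posterior_on_grid(theta_grid, prior, likelihood, data):
    """P(theta | D) on a grid: prior times the product of per-sample
    likelihoods (independence assumption), normalized to sum to 1."""
    log_post = np.log(prior)
    for x in data:
        log_post += np.log(likelihood(x, theta_grid))
    post = np.exp(log_post - log_post.max())   # stabilize before exponentiating
    return post / post.sum()

def predictive(x, theta_grid, post, likelihood):
    """P(x | D) approximated as sum over theta of P(x | theta) P(theta | D)."""
    return np.sum(likelihood(x, theta_grid) * post)

# Example: unknown mean of a unit-variance Gaussian (illustrative data).
grid = np.linspace(-5.0, 5.0, 1001)
prior = np.ones_like(grid) / grid.size
lik = lambda x, th: np.exp(-0.5 * (x - th) ** 2) / np.sqrt(2 * np.pi)
post = posterior_on_grid(grid, prior, lik, data=[1.1, 0.7, 1.6])
print(predictive(1.0, grid, post, lik))
```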


Problems of Dimensionality
- Problems involving 50 or 100 features (binary valued)
- Classification accuracy depends upon the dimensionality and the amount of training data
- Case of two classes, multivariate normal with the same covariance:

$$P(\text{error}) = \frac{1}{\sqrt{2\pi}} \int_{r/2}^{\infty} e^{-u^2/2}\, du,$$
$$\text{where } r^2 = (\mu_1 - \mu_2)^t\, \Sigma^{-1} (\mu_1 - \mu_2),$$
$$\lim_{r \to \infty} P(\text{error}) = 0$$
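The tail integral has a closed form via the complementary error function, so P(error) can be evaluated directly. A minimal check (values illustrative):

```python
from math import erfc, sqrt

def bayes_error(r):
    """P(error) = (1/sqrt(2*pi)) * integral from r/2 to infinity of
    exp(-u^2/2) du, written as the Gaussian tail 0.5*erfc(r / (2*sqrt(2)))."""
    return 0.5 * erfc(r / (2 * sqrt(2)))

for r in (1.0, 2.0, 4.0):
    print(r, bayes_error(r))   # error shrinks as class separation r grows
```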


If features are independent, then:
$$\Sigma = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_d^2),$$
$$r^2 = \sum_{i=1}^{d} \left( \frac{\mu_{i1} - \mu_{i2}}{\sigma_i} \right)^2$$
- The most useful features are the ones for which the difference between the means is large relative to the standard deviation.
- It has frequently been observed in practice that, beyond a certain point, the inclusion of additional features leads to worse rather than better performance: we have the wrong model!
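A quick worked instance (numbers chosen only for illustration): with d = 2 independent unit-variance features whose mean differences are 2 and 1,
$$r^2 = \left(\frac{2}{1}\right)^2 + \left(\frac{1}{1}\right)^2 = 5, \qquad r = \sqrt{5} \approx 2.24,$$
giving P(error) ≈ 0.13 from the integral above. Each feature with nonzero mean separation adds a positive term to r², so under the assumed model the error can only decrease as such features are added.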


Computational Complexity

- Our design methodology is affected by computational difficulty.
- "big oh" notation: f(x) = O(h(x)), read "big oh of h(x)", if:
$$\exists\, (c_0, x_0) \text{ such that } |f(x)| \le c_0\, |h(x)| \text{ for all } x > x_0$$
(an upper bound: f(x) grows no worse than h(x) for sufficiently large x!)

Example: f(x) = 2 + 3x + 4x^2, g(x) = x^2, so f(x) = O(x^2).
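One admissible choice of constants for this example (chosen here for illustration) is c0 = 9 and x0 = 1:
$$f(x) = 2 + 3x + 4x^2 \le 2x^2 + 3x^2 + 4x^2 = 9x^2 \quad \text{for all } x \ge 1,$$
which certifies f(x) = O(x^2).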


"big oh" is not unique!
f(x) = O(x^2); f(x) = O(x^3); f(x) = O(x^4)

"big theta" notation: f(x) = Θ(h(x)) if:
$$\exists\, (x_0, c_1, c_2) \text{ such that } 0 \le c_1\, h(x) \le f(x) \le c_2\, h(x) \text{ for all } x > x_0$$
f(x) = Θ(x^2) but f(x) ≠ Θ(x^3)


Complexity of the ML Estimation

- Gaussian priors in d dimensions; classifier with n training samples for each of c classes.
- For each category, we have to compute the discriminant function
$$g(x) = -\frac{1}{2}(x - \hat{\mu})^t\, \hat{\Sigma}^{-1} (x - \hat{\mu}) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\hat{\Sigma}| + \ln P(\omega)$$
  Estimating $\hat{\mu}$ costs O(d·n), estimating $\hat{\Sigma}$ costs O(n·d²), and the remaining terms are O(n) or O(1), so the quadratic term dominates.
- Total = O(d²·n); total for c classes = O(c·d²·n) ≃ O(d²·n).
- The cost increases when d and n are large!
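A hedged sketch of where these costs arise (all names and data below are illustrative, assuming NumPy; this is one way to realize g(x), not necessarily the text's):

```python
import numpy as np

def ml_discriminant(X, prior):
    """Build g(x) from ML estimates on n x d training data X.
    Learning cost: O(d*n) for the mean, O(n*d^2) for the covariance."""
    n, d = X.shape
    mu = X.mean(axis=0)                          # O(d*n)
    Sigma = np.cov(X, rowvar=False, bias=True)   # O(n*d^2)
    Sigma_inv = np.linalg.inv(Sigma)             # O(d^3), once per class
    _, logdet = np.linalg.slogdet(Sigma)
    const = -0.5 * d * np.log(2 * np.pi) - 0.5 * logdet + np.log(prior)
    def g(x):                                    # evaluation: O(d^2) per sample
        diff = x - mu
        return -0.5 * diff @ Sigma_inv @ diff + const
    return g

# Usage (illustrative data): pick the class with the largest g(x).
rng = np.random.default_rng(1)
g0 = ml_discriminant(rng.normal(0, 1, (100, 3)), prior=0.5)
g1 = ml_discriminant(rng.normal(2, 1, (100, 3)), prior=0.5)
x = np.array([1.8, 2.1, 1.9])
print("class", int(g1(x) > g0(x)))
```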


Component Analysis and Discriminants
- Combine features in order to reduce the dimension of the feature space.
- Linear combinations are simple to compute and tractable.
- Project high-dimensional data onto a lower-dimensional space.
- Two classical approaches for finding "optimal" linear transformations (a sketch of the first follows below):
  - PCA (Principal Component Analysis): the projection that best represents the data in a least-squares sense.
  - MDA (Multiple Discriminant Analysis): the projection that best separates the data in a least-squares sense.
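A minimal PCA sketch (illustrative, assuming NumPy; eigendecomposition of the scatter matrix is one standard realization, not necessarily the text's):

```python
import numpy as np

def pca_project(X, k):
    """Project n x d data onto the k eigenvectors of the scatter matrix with
    the largest eigenvalues: the least-squares best k-dim representation."""
    Xc = X - X.mean(axis=0)                      # center the data
    scatter = Xc.T @ Xc / len(X)                 # d x d covariance estimate
    eigvals, eigvecs = np.linalg.eigh(scatter)   # eigenvalues ascending
    W = eigvecs[:, ::-1][:, :k]                  # top-k principal directions
    return Xc @ W                                # n x k projected data

# Usage: reduce 5-dimensional data to its 2 principal directions.
X = np.random.default_rng(2).normal(size=(200, 5))
print(pca_project(X, 2).shape)   # (200, 2)
```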


Hidden Markov Models: Markov Chains
- Goal: make a sequence of decisions.
- Processes that unfold in time: states at time t are influenced by the state at time t-1.
- Applications: speech recognition, gesture recognition, parts-of-speech tagging, DNA sequencing, ...
- Any temporal process without memory:
  ω^T = {ω(1), ω(2), ω(3), ..., ω(T)} is a sequence of states; we might have ω^6 = {ω1, ω4, ω2, ω2, ω1, ω4}.
- The system can revisit a state at different steps, and not every state need be visited.


First-order Markov models
- The production of any sequence is described by the transition probabilities:
$$P(\omega_j(t+1) \mid \omega_i(t)) = a_{ij}$$


The model is θ = (a_{ij}, T). For the sequence ω^6 = {ω1, ω4, ω2, ω2, ω1, ω4} above:
$$P(\omega^T \mid \theta) = a_{14}\, a_{42}\, a_{22}\, a_{21}\, a_{14}\; P(\omega(1) = \omega_i)$$

Example: speech recognition ("production of spoken words")
- Production of the word "pattern", represented by phonemes: /p/ /a/ /tt/ /er/ /n/ // (// = silent state)
- Transitions from /p/ to /a/, /a/ to /tt/, /tt/ to /er/, /er/ to /n/, and /n/ to the silent state.
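The chain probability is just the product of the visited transition probabilities. A minimal sketch (the transition matrix and initial distribution are hypothetical, chosen only to make it runnable, assuming NumPy):

```python
import numpy as np

def sequence_probability(seq, A, p_init):
    """P(w^T | theta) = P(w(1)) times the product of a_ij over each
    consecutive state pair (i, j) in a first-order Markov chain."""
    p = p_init[seq[0]]
    for i, j in zip(seq[:-1], seq[1:]):
        p *= A[i, j]                 # transition probability a_ij
    return p

# The slide's sequence {w1, w4, w2, w2, w1, w4}, with 0-indexed states.
A = np.full((4, 4), 0.25)            # hypothetical uniform transitions
p_init = np.full(4, 0.25)            # hypothetical initial distribution
print(sequence_probability([0, 3, 1, 1, 0, 3], A, p_init))
```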
