Name: 李政軒
Introduction to Machine Learning: Parametric Methods
Parametric Estimation
Sample: $\mathcal{X} = \{x^t\}_{t=1}^N$ where $x^t \sim p(x)$
Parametric estimation: assume a form for $p(x|\theta)$ and estimate $\theta$, its sufficient statistics, using $\mathcal{X}$.
Example: $p(x) = \mathcal{N}(\mu, \sigma^2)$ where $\theta = \{\mu, \sigma^2\}$
Maximum Likelihood Estimation
Likelihood of $\theta$ given the sample $\mathcal{X}$: $l(\theta|\mathcal{X}) = p(\mathcal{X}|\theta) = \prod_t p(x^t|\theta)$
Log likelihood: $\mathcal{L}(\theta|\mathcal{X}) = \log l(\theta|\mathcal{X}) = \sum_t \log p(x^t|\theta)$
Maximum likelihood estimator: $\theta^* = \arg\max_\theta \mathcal{L}(\theta|\mathcal{X})$
Examples: Bernoulli/Multinomial
Bernoulli: two states, failure/success, $x \in \{0, 1\}$:
$P(x) = p_0^x (1 - p_0)^{1-x}$
$\mathcal{L}(p_0|\mathcal{X}) = \log \prod_t p_0^{x^t} (1 - p_0)^{1-x^t}$
MLE: $\hat{p}_0 = \sum_t x^t / N$
Multinomial: $K > 2$ states, $x_i \in \{0, 1\}$:
$P(x_1, x_2, \ldots, x_K) = \prod_i p_i^{x_i}$
$\mathcal{L}(p_1, p_2, \ldots, p_K|\mathcal{X}) = \log \prod_t \prod_i p_i^{x_i^t}$
MLE: $\hat{p}_i = \sum_t x_i^t / N$
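Both estimators are just normalized counts. A minimal numpy sketch under invented sample sizes and true parameters (nothing here comes from the slides beyond the two MLE formulas):

```python
import numpy as np

rng = np.random.default_rng(0)

# Bernoulli: N = 100 draws with an assumed true p0 = 0.7.
x = rng.binomial(1, 0.7, size=100)
p0_hat = x.mean()                  # MLE: sum_t x^t / N

# Multinomial: N one-hot draws over K = 3 states with assumed probabilities.
counts = rng.multinomial(1, [0.2, 0.5, 0.3], size=100)  # shape (100, 3)
p_hat = counts.sum(axis=0) / 100   # MLE: sum_t x_i^t / N per state i

print(p0_hat, p_hat)
```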
Gaussian (Normal) Distribution
$p(x) = \mathcal{N}(\mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\sigma} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$
MLE for $\mu$ and $\sigma^2$:
$m = \frac{\sum_t x^t}{N}$
$s^2 = \frac{\sum_t (x^t - m)^2}{N}$
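As a sketch, the two estimates in numpy (the sample below is synthetic; note the MLE of the variance divides by $N$, not $N-1$):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)   # synthetic sample from N(2, 1.5^2)

m = x.sum() / len(x)                  # MLE of the mean
s2 = ((x - m) ** 2).sum() / len(x)    # MLE of the variance (divides by N, not N-1)
print(m, s2)                          # close to 2.0 and 2.25
```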
Bias and Variance
Unknown parameter $\theta$; estimator $d_i = d(\mathcal{X}_i)$ on sample $\mathcal{X}_i$.
Bias: $b_\theta(d) = E[d] - \theta$. Variance: $E[(d - E[d])^2]$.
Mean square error:
$r(d, \theta) = E[(d - \theta)^2] = (E[d] - \theta)^2 + E[(d - E[d])^2] = \text{Bias}^2 + \text{Variance}$
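For instance, the Gaussian variance MLE $s^2$ above is a biased estimator: $E[s^2] = \frac{N-1}{N}\sigma^2$. A small simulation (sample size and trial count are arbitrary choices) makes both the bias and the variance of the estimator visible:

```python
import numpy as np

rng = np.random.default_rng(0)
N, trials = 10, 100_000   # arbitrary choices; true sigma^2 is 1

# Many samples of size N from N(0, 1); compute the variance MLE on each.
X = rng.normal(size=(trials, N))
s2 = ((X - X.mean(axis=1, keepdims=True)) ** 2).mean(axis=1)

print(s2.mean())   # about (N-1)/N = 0.9, so the bias is about -0.1
print(s2.var())    # the estimator's own variance across samples
```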
Bayes’ Estimator
Treat $\theta$ as a random variable with prior $p(\theta)$.
Bayes' rule: $p(\theta|\mathcal{X}) = p(\mathcal{X}|\theta)\,p(\theta)\,/\,p(\mathcal{X})$
Full Bayes: $p(x|\mathcal{X}) = \int p(x|\theta)\,p(\theta|\mathcal{X})\,d\theta$
Maximum a posteriori (MAP): $\theta_{MAP} = \arg\max_\theta p(\theta|\mathcal{X})$
Maximum likelihood (ML): $\theta_{ML} = \arg\max_\theta p(\mathcal{X}|\theta)$
Bayes' estimator: $\theta_{Bayes} = E[\theta|\mathcal{X}] = \int \theta\, p(\theta|\mathcal{X})\,d\theta$
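As one concrete case (not from the slides): with a Gaussian likelihood $x^t \sim \mathcal{N}(\theta, \sigma_0^2)$ and a conjugate Gaussian prior $\theta \sim \mathcal{N}(\mu, \sigma^2)$, the posterior is also Gaussian, so the MAP and Bayes' estimates coincide and shrink the sample mean toward the prior mean:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: x^t ~ N(theta, sigma0^2), prior theta ~ N(mu, sigma^2).
sigma0, mu, sigma = 1.0, 0.0, 0.5
x = rng.normal(2.0, sigma0, size=20)   # synthetic data with true theta = 2
N, m = len(x), x.mean()

theta_ml = m                                         # ML ignores the prior
w = (N / sigma0**2) / (N / sigma0**2 + 1 / sigma**2)
theta_bayes = w * m + (1 - w) * mu                   # posterior mean = MAP here

print(theta_ml, theta_bayes)   # the Bayes' estimate is pulled toward mu = 0
```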
Parametric Classification
iii
iii
CPCxpxg
CPCxpxg
log| log or
|
ii
iii
i
i
ii
CPx
xg
xCxp
log log log
exp|
2
2
2
2
22
2
1
22
1
Given the sample $\mathcal{X} = \{x^t, r^t\}_{t=1}^N$, where $r_i^t = 1$ if $x^t \in C_i$ and $r_i^t = 0$ if $x^t \in C_j,\ j \neq i$.
ML estimates are:
$\hat{P}(C_i) = \frac{\sum_t r_i^t}{N}$, $m_i = \frac{\sum_t x^t r_i^t}{\sum_t r_i^t}$, $s_i^2 = \frac{\sum_t (x^t - m_i)^2 r_i^t}{\sum_t r_i^t}$
The discriminant becomes:
$g_i(x) = -\log \sqrt{2\pi}\,s_i - \frac{(x - m_i)^2}{2 s_i^2} + \log \hat{P}(C_i)$
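A sketch of this classifier in numpy, on an invented two-class 1D dataset (the function and variable names are mine, not from the slides):

```python
import numpy as np

def fit(x, r):
    """ML estimates per class; r holds integer labels 0..K-1 for 1D inputs x."""
    P, m, s2 = [], [], []
    for i in range(r.max() + 1):
        xi = x[r == i]
        P.append(len(xi) / len(x))                 # P̂(C_i)
        m.append(xi.mean())                        # m_i
        s2.append(((xi - xi.mean()) ** 2).mean())  # s_i^2
    return np.array(P), np.array(m), np.array(s2)

def g(x0, P, m, s2):
    """Discriminant values g_i(x0); predict the class with the largest one."""
    return -np.log(np.sqrt(2 * np.pi * s2)) - (x0 - m) ** 2 / (2 * s2) + np.log(P)

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-1, 1, 50), rng.normal(2, 1.5, 50)])
r = np.concatenate([np.zeros(50, int), np.ones(50, int)])
P, m, s2 = fit(x, r)
print(np.argmax(g(0.5, P, m, s2)))   # predicted class at x = 0.5
```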
Parametric Classification
Figure: (a) and (b) for two classes when the input is one-dimensional. Variances are equal and the posteriors intersect at one point, which is the threshold of decision.
Figure: (a) and (b) for two classes when the input is one-dimensional. Variances are unequal and the posteriors intersect at two points. In (c), the expected risks are shown for the two classes and for reject with $\lambda = 0.2$.
Regression
$r = f(x) + \varepsilon$, where $\varepsilon \sim \mathcal{N}(0, \sigma^2)$, so $p(r|x) \sim \mathcal{N}(g(x|\theta), \sigma^2)$
Estimator: $g(x|\theta)$
Log likelihood:
$\mathcal{L}(\theta|\mathcal{X}) = \log \prod_{t=1}^N p(x^t, r^t) = \log \prod_{t=1}^N p(r^t|x^t) + \log \prod_{t=1}^N p(x^t)$
Regression: From LogL to Error
$\mathcal{L}(\theta|\mathcal{X}) = \log \prod_{t=1}^N \frac{1}{\sqrt{2\pi}\sigma} \exp\left[-\frac{[r^t - g(x^t|\theta)]^2}{2\sigma^2}\right]$
$= -N \log(\sqrt{2\pi}\sigma) - \frac{1}{2\sigma^2} \sum_{t=1}^N [r^t - g(x^t|\theta)]^2$
Maximizing the log likelihood is thus equivalent to minimizing the squared error:
$E(\theta|\mathcal{X}) = \frac{1}{2} \sum_{t=1}^N [r^t - g(x^t|\theta)]^2$
Linear Regression
$g(x^t|w_1, w_0) = w_1 x^t + w_0$
Setting the derivatives of the error to zero gives the normal equations:
$\sum_t r^t = N w_0 + w_1 \sum_t x^t$
$\sum_t r^t x^t = w_0 \sum_t x^t + w_1 \sum_t (x^t)^2$
In matrix form, $A\mathbf{w} = \mathbf{y}$:
$A = \begin{bmatrix} N & \sum_t x^t \\ \sum_t x^t & \sum_t (x^t)^2 \end{bmatrix}$, $\mathbf{w} = \begin{bmatrix} w_0 \\ w_1 \end{bmatrix}$, $\mathbf{y} = \begin{bmatrix} \sum_t r^t \\ \sum_t r^t x^t \end{bmatrix}$, so $\mathbf{w} = A^{-1}\mathbf{y}$
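A direct transcription of the normal equations into numpy, on synthetic data (the true line is an assumption for the demo):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 5, 30)
r = 1.5 * x + 0.8 + rng.normal(0, 0.3, 30)   # assumed true line plus noise

A = np.array([[len(x),        x.sum()],
              [x.sum(), (x ** 2).sum()]])
y = np.array([r.sum(), (r * x).sum()])
w0, w1 = np.linalg.solve(A, y)   # solve A w = y rather than inverting A
print(w0, w1)                    # should come out near 0.8 and 1.5
```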
Polynomial Regression
$g(x^t|w_k, \ldots, w_2, w_1, w_0) = w_k (x^t)^k + \cdots + w_2 (x^t)^2 + w_1 x^t + w_0$
$D = \begin{bmatrix} 1 & x^1 & (x^1)^2 & \cdots & (x^1)^k \\ 1 & x^2 & (x^2)^2 & \cdots & (x^2)^k \\ \vdots & & & & \vdots \\ 1 & x^N & (x^N)^2 & \cdots & (x^N)^k \end{bmatrix}$, $\mathbf{r} = \begin{bmatrix} r^1 \\ r^2 \\ \vdots \\ r^N \end{bmatrix}$
$\mathbf{w} = (D^T D)^{-1} D^T \mathbf{r}$
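A sketch of the same computation; `np.linalg.lstsq` solves the least-squares problem without forming $(D^T D)^{-1}$ explicitly, which is numerically safer:

```python
import numpy as np

def poly_fit(x, r, k):
    """Least-squares polynomial fit: w minimizing ||D w - r||^2."""
    D = np.vander(x, k + 1, increasing=True)   # columns: 1, x, x^2, ..., x^k
    w, *_ = np.linalg.lstsq(D, r, rcond=None)
    return w                                   # [w0, w1, ..., wk]

rng = np.random.default_rng(0)
x = rng.uniform(0, 5, 40)
r = 2 * np.sin(1.5 * x) + rng.normal(0, 1, 40)   # synthetic data
print(poly_fit(x, r, k=3))
```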
Other Error Measures
Square error: $E(\theta|\mathcal{X}) = \frac{1}{2} \sum_{t=1}^N [r^t - g(x^t|\theta)]^2$
Relative square error: $E(\theta|\mathcal{X}) = \frac{\sum_{t=1}^N [r^t - g(x^t|\theta)]^2}{\sum_{t=1}^N (r^t - \bar{r})^2}$
Absolute error: $E(\theta|\mathcal{X}) = \sum_t |r^t - g(x^t|\theta)|$
$\epsilon$-sensitive error: $E(\theta|\mathcal{X}) = \sum_t \mathbb{1}\left(|r^t - g(x^t|\theta)| > \epsilon\right) \left(|r^t - g(x^t|\theta)| - \epsilon\right)$
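A small helper computing all four measures on a vector of targets and predictions (epsilon is a free parameter; the value below is arbitrary):

```python
import numpy as np

def errors(r, g, eps=0.1):
    """The four error measures above, for predictions g against targets r."""
    d = r - g
    square   = 0.5 * (d ** 2).sum()
    relative = (d ** 2).sum() / ((r - r.mean()) ** 2).sum()
    absolute = np.abs(d).sum()
    eps_sens = np.where(np.abs(d) > eps, np.abs(d) - eps, 0.0).sum()
    return square, relative, absolute, eps_sens
```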
Bias and Variance
$E[(r - g(x))^2 \mid x] = \underbrace{E[(r - E[r|x])^2 \mid x]}_{\text{noise}} + \underbrace{(E[r|x] - g(x))^2}_{\text{squared error}}$
$E_{\mathcal{X}}[(E[r|x] - g(x))^2 \mid x] = \underbrace{(E[r|x] - E_{\mathcal{X}}[g(x)])^2}_{\text{bias}^2} + \underbrace{E_{\mathcal{X}}[(g(x) - E_{\mathcal{X}}[g(x)])^2]}_{\text{variance}}$
Estimating Bias and Variance
$M$ samples $\mathcal{X}_i = \{x^t_i, r^t_i\}$, $i = 1, \ldots, M$, are used to fit $g_i(x)$, $i = 1, \ldots, M$:
$\bar{g}(x) = \frac{1}{M} \sum_{i=1}^M g_i(x)$
$\text{Bias}^2(g) = \frac{1}{N} \sum_t [\bar{g}(x^t) - f(x^t)]^2$
$\text{Variance}(g) = \frac{1}{NM} \sum_t \sum_i [g_i(x^t) - \bar{g}(x^t)]^2$
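These three formulas translate directly into a simulation; the target function below is the $2\sin(1.5x)$ example used later in the slides, while the sample size and model order are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: 2 * np.sin(1.5 * x)
x = np.linspace(0, 5, 20)            # fixed evaluation points x^t
M, order = 100, 3                    # arbitrary: M cubic-fit models

G = np.empty((M, len(x)))
for i in range(M):                   # each model sees its own noisy sample
    r = f(x) + rng.normal(0, 1, len(x))
    G[i] = np.polyval(np.polyfit(x, r, order), x)

g_bar = G.mean(axis=0)                     # g_bar(x) = (1/M) sum_i g_i(x)
bias2 = ((g_bar - f(x)) ** 2).mean()       # (1/N) sum_t [g_bar - f]^2
variance = ((G - g_bar) ** 2).mean()       # (1/NM) sum_t sum_i [g_i - g_bar]^2
print(bias2, variance)
```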
Bias/Variance Dilemma
Example: $g_i(x) = 2$ has no variance and high bias.
$g_i(x) = \sum_t r^t_i / N$ has lower bias, with variance.
As we increase complexity, bias decreases (a better fit to data) and variance increases (the fit varies more with data).
Bias/Variance Dilemma
Figure: (a) the function $f(x) = 2\sin(1.5x)$ and one noisy ($\mathcal{N}(0,1)$) dataset sampled from it. Five samples are taken, each containing twenty instances. (b), (c), (d) show five polynomial fits $g_i(\cdot)$ of order 1, 3, and 5. In each case, the dotted line is the average of the five fits, $\bar{g}(\cdot)$.
Polynomial Regression
Figure: in the same setting as the previous one, but using one hundred models instead of five: bias, variance, and error for polynomials of order 1 to 5. The best fit is the order with minimum error.
Model Selection
Cross-validation: measure generalization accuracy by testing on data unused during training (a sketch follows this list).
Regularization: penalize complex models: $E' = \text{error on data} + \lambda \cdot \text{model complexity}$
Akaike's information criterion (AIC), Bayesian information criterion (BIC)
Minimum description length (MDL): Kolmogorov complexity, shortest description of data
Structural risk minimization (SRM)
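A minimal cross-validation sketch for picking the polynomial order (the split and data are invented; real use would average over several splits):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 5, 60)
r = 2 * np.sin(1.5 * x) + rng.normal(0, 1, 60)

# Hold out the last third as a validation set.
x_tr, r_tr, x_va, r_va = x[:40], r[:40], x[40:], r[40:]

for k in range(1, 6):
    w = np.polyfit(x_tr, r_tr, k)
    e_va = ((r_va - np.polyval(w, x_va)) ** 2).mean()
    print(k, e_va)   # choose the order at the "elbow" of the validation error
```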
Model Selection
Figure: validation error versus model complexity; the best fit is chosen at the "elbow" of the error curve.
Bayesian Model Selection
$p(\text{model}|\text{data}) = \frac{p(\text{data}|\text{model})\,p(\text{model})}{p(\text{data})}$
Prior on models, $p(\text{model})$.
Regularization, when the prior favors simpler models.
Bayes: MAP of the posterior, $p(\text{model}|\text{data})$.
Average over a number of models with high posterior probability.
Regression example
Coefficients increase in magnitude as order increases:
Order 1: [-0.0769, 0.0016]
Order 2: [0.1682, -0.6657, 0.0080]
Order 3: [0.4238, -2.5778, 3.4675, -0.0002]
Order 4: [-0.1093, 1.4356, -5.5007, 6.0454, -0.0019]
Regularization penalizes large weights:
$E(\mathbf{w}|\mathcal{X}) = \frac{1}{2} \sum_t [r^t - g(x^t|\mathbf{w})]^2 + \frac{\lambda}{2} \sum_i w_i^2$
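A sketch of ridge-style regularized polynomial regression, solving $\mathbf{w} = (D^T D + \lambda I)^{-1} D^T \mathbf{r}$; for simplicity it shrinks $w_0$ along with the other weights, which the formula above does not require:

```python
import numpy as np

def ridge_fit(x, r, k, lam):
    """Regularized least squares: w = (D^T D + lam*I)^{-1} D^T r."""
    D = np.vander(x, k + 1, increasing=True)
    # Note: this also shrinks the intercept w0, a simplification.
    return np.linalg.solve(D.T @ D + lam * np.eye(k + 1), D.T @ r)

rng = np.random.default_rng(0)
x = rng.uniform(0, 5, 40)
r = 2 * np.sin(1.5 * x) + rng.normal(0, 1, 40)
print(ridge_fit(x, r, k=5, lam=1.0))   # larger lam -> smaller-magnitude weights
```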