CHAPTER 4: Parametric Methods
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
Parametric Estimation
X = {x^t}_{t=1}^N where x^t ~ p(x)
Parametric estimation: assume a form for p(x|θ) and estimate θ, its sufficient statistics, using X.
E.g., for N(μ, σ²), θ = {μ, σ²}.
Problem: How can we obtain θ from X?
Assumption: X contains samples of a one-dimensional random variable. Later, in multivariate estimation, each sample in X contains multiple measurements, not only a single one.
Maximum Likelihood Estimation
Density function p with parameters θ is given and x^t ~ p(x|θ)
Likelihood of θ given the sample X: l(θ|X) = p(X|θ) = ∏_t p(x^t|θ)
We look for the θ that "maximizes the likelihood of the sample"!
Log likelihood: L(θ|X) = log l(θ|X) = ∑_t log p(x^t|θ)
Maximum likelihood estimator (MLE): θ* = argmax_θ L(θ|X)
Homework: Sample: 0, 3, 3, 4, 5 and x ~ N(μ, σ²)? Use MLE to find (μ, σ²)! (A sketch appears after the Gaussian slide below.)
Examples: Bernoulli/Multinomial
Bernoulli: two states, failure/success, x ∈ {0, 1}
P(x) = p₀^x (1 − p₀)^(1−x)
L(p₀|X) = log ∏_t p₀^(x^t) (1 − p₀)^(1−x^t)
MLE: p₀ = ∑_t x^t / N
Multinomial: K > 2 states, x_i ∈ {0, 1}
P(x₁, x₂, ..., x_K) = ∏_i p_i^(x_i)
L(p₁, p₂, ..., p_K|X) = log ∏_t ∏_i p_i^(x_i^t)
MLE: p_i = ∑_t x_i^t / N
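A minimal sketch of these two MLEs in Python (numpy assumed; the samples are illustrative): the Bernoulli estimate is just the proportion of successes, and the multinomial estimates are the per-state proportions.

import numpy as np

# Bernoulli MLE: p0 = sum_t x^t / N, the proportion of 1s
x = np.array([1, 0, 1, 1, 0, 1])
p0 = x.mean()

# Multinomial MLE: p_i = sum_t x_i^t / N, with each x^t one-hot over K states
X = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 1, 0],
              [0, 0, 1]])
p = X.mean(axis=0)                 # per-state proportions, sums to 1
print(p0, p)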
Gaussian (Normal) Distribution
p(x) = N(μ, σ²):
p(x) = (1/(√(2π) σ)) exp[−(x − μ)² / (2σ²)]
MLE for μ and σ²:
m = (1/N) ∑_t x^t
s² = (1/N) ∑_t (x^t − m)²
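As a minimal sketch (numpy assumed), the two MLE formulas above applied to the homework sample 0, 3, 3, 4, 5:

import numpy as np

x = np.array([0., 3., 3., 4., 5.])  # the homework sample
m = x.mean()                        # MLE of mu: (1/N) sum_t x^t
s2 = ((x - m) ** 2).mean()          # MLE of sigma^2; divides by N, not N-1
print(m, s2)                        # 3.0 and 2.8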
Bias and Variance
Unknown parameter θ; estimator d_i = d(X_i) on sample X_i
Bias: b_θ(d) = E[d] − θ (error in the model itself)
Variance: E[(d − E[d])²] (variation/randomness of the model across samples)
Mean square error of the estimator d:
r(d, θ) = E[(d − θ)²] = (E[d] − θ)² + E[(d − E[d])²] = Bias² + Variance
(See the simulation sketch below.)
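A small simulation sketch (numpy assumed) of these definitions: the MLE variance estimator s² divides by N and is therefore biased downward, yet its mean square error still splits into bias² + variance.

import numpy as np

rng = np.random.default_rng(0)
sigma2, N, M = 4.0, 10, 100_000          # true variance, sample size, repetitions
X = rng.normal(0.0, np.sqrt(sigma2), (M, N))
d = X.var(axis=1)                        # s^2 for each sample (divides by N)

bias = d.mean() - sigma2                 # approximately -sigma2/N = -0.4
variance = d.var()
mse = ((d - sigma2) ** 2).mean()
print(bias, variance, mse, bias ** 2 + variance)  # mse matches bias^2 + variance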
Bayes' Estimator
Treat θ as a random variable with prior p(θ)
Bayes' rule: p(θ|X) = p(X|θ) p(θ) / p(X)
Maximum a posteriori (MAP): θ_MAP = argmax_θ p(θ|X)
Maximum likelihood (ML): θ_ML = argmax_θ p(X|θ)
Bayes' estimator: θ_Bayes = E[θ|X] = ∫ θ p(θ|X) dθ
Comments: ML just takes the maximum of the likelihood. Compared with ML, MAP additionally considers the prior. The Bayes' estimator averages over all possible values of θ, each weighted by how probable it is given the sample (the posterior p(θ|X)).
For MAP see: http://en.wikipedia.org/wiki/Maximum_a_posteriori_estimation
Bayes' Estimator: Example
x^t ~ N(θ, σ₀²) and prior θ ~ N(μ, σ²)
θ_ML = m
θ_MAP = θ_Bayes = E[θ|X] = [N/σ₀² / (N/σ₀² + 1/σ²)] m + [1/σ² / (N/σ₀² + 1/σ²)] μ
As N (or the prior variance σ²) grows, the estimate converges to m.
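A sketch of this weighted combination (numpy assumed; the sample and the prior parameters are illustrative). As N grows, or as the prior variance σ² grows, the weight on the sample mean m approaches 1:

import numpy as np

x = np.array([2.1, 1.7, 2.4, 2.0])         # sample, x^t ~ N(theta, sigma0^2)
sigma0_sq, mu, sigma_sq = 0.25, 0.0, 1.0   # assumed noise variance and prior N(mu, sigma^2)

N, m = len(x), x.mean()
w = (N / sigma0_sq) / (N / sigma0_sq + 1.0 / sigma_sq)  # weight on the sample mean
theta_bayes = w * m + (1.0 - w) * mu       # posterior mean; also the MAP here
print(m, theta_bayes)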
Parametric Classification
Discriminant: g_i(x) = p(x|C_i) P(C_i), or equivalently g_i(x) = log p(x|C_i) + log P(C_i)
If p(x|C_i) = N(μ_i, σ_i²):
p(x|C_i) = (1/(√(2π) σ_i)) exp[−(x − μ_i)² / (2σ_i²)]
g_i(x) = −(1/2) log 2π − log σ_i − (x − μ_i)² / (2σ_i²) + log P(C_i)
(g_i(x) acts as a kind of unnormalized log posterior for p(C_i|x).)
Given the sample X = {x^t, r^t}_{t=1}^N, where r_i^t = 1 if x^t ∈ C_i and r_i^t = 0 if x^t ∈ C_j, j ≠ i,
ML estimates are:
P̂(C_i) = ∑_t r_i^t / N
m_i = ∑_t x^t r_i^t / ∑_t r_i^t
s_i² = ∑_t (x^t − m_i)² r_i^t / ∑_t r_i^t
Discriminant becomes:
g_i(x) = −(1/2) log 2π − log s_i − (x − m_i)² / (2 s_i²) + log P̂(C_i)
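A minimal sketch of this classifier (numpy assumed; the data are illustrative): estimate P̂(C_i), m_i, s_i² per class, then pick the class with the largest discriminant.

import numpy as np

def fit(x, y, K):
    """Per-class MLE: prior, mean, variance (one-dimensional input)."""
    return [(np.mean(y == i), x[y == i].mean(), x[y == i].var()) for i in range(K)]

def g(x, prior, m, s2):
    """Discriminant -log s_i - (x - m_i)^2 / (2 s_i^2) + log P(C_i), constants dropped."""
    return -0.5 * np.log(s2) - (x - m) ** 2 / (2 * s2) + np.log(prior)

x = np.array([1.0, 1.2, 0.8, 3.1, 2.9, 3.3])
y = np.array([0, 0, 0, 1, 1, 1])
params = fit(x, y, K=2)
print(int(np.argmax([g(2.0, *p) for p in params])))  # classify x = 2.0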
Equal variances
Single boundary at halfway between the means
Variances are different
Two boundaries
Homework!
Regression
Estimator: g(x|θ) for f(x)
r = f(x) + ε, where ε ~ N(0, σ²)
Therefore p(r|x) ~ N(g(x|θ), σ²)
L(θ|X) = log ∏_{t=1}^N p(x^t, r^t)
= log ∏_{t=1}^N p(r^t|x^t) + log ∏_{t=1}^N p(x^t)
Maximizing the probability of the sample again!
Regression: From LogL to Error
L(θ|X) = log ∏_{t=1}^N (1/(√(2π) σ)) exp[−(r^t − g(x^t|θ))² / (2σ²)]
= −N log(√(2π) σ) − (1/(2σ²)) ∑_{t=1}^N (r^t − g(x^t|θ))²
Maximizing the log likelihood is therefore equivalent to minimizing the squared error:
E(θ|X) = (1/2) ∑_{t=1}^N (r^t − g(x^t|θ))²
Skip ahead to the Bias/Variance Dilemma!
Linear Regression
g(x^t|w₁, w₀) = w₁ x^t + w₀
Setting the derivatives of the squared error to zero gives the normal equations:
∑_t r^t = N w₀ + w₁ ∑_t x^t
∑_t r^t x^t = w₀ ∑_t x^t + w₁ ∑_t (x^t)²
In matrix form A w = y:
A = [ N         ∑_t x^t
      ∑_t x^t   ∑_t (x^t)² ],  w = [w₀, w₁]^T,  y = [∑_t r^t, ∑_t r^t x^t]^T
w = A⁻¹ y
Relationship to what we discussed in Topic 2??
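A sketch of solving these normal equations directly (numpy assumed; data illustrative):

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
r = np.array([1.1, 2.9, 5.2, 7.1])
N = len(x)

A = np.array([[N,       x.sum()],
              [x.sum(), (x ** 2).sum()]])
y = np.array([r.sum(), (r * x).sum()])
w0, w1 = np.linalg.solve(A, y)     # w = A^{-1} y
print(w0, w1)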
Polynomial Regression
g(x^t|w_k, ..., w₂, w₁, w₀) = w_k (x^t)^k + ... + w₂ (x^t)² + w₁ x^t + w₀
Collect the inputs into the design matrix D and the outputs into r:
D = [ 1   x¹    (x¹)²   ...  (x¹)^k
      1   x²    (x²)²   ...  (x²)^k
      ...
      1   x^N   (x^N)²  ...  (x^N)^k ],  r = [r¹, r², ..., r^N]^T
w = (D^T D)⁻¹ D^T r
Here we get k+1 equations with k+1 unknowns!
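The same recipe as a sketch for the polynomial case (numpy assumed; data illustrative): build the design matrix D and solve for w, here via least squares rather than an explicit inverse for numerical stability.

import numpy as np

x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
r = np.array([0.1, 0.4, 1.1, 2.2, 4.1])
k = 2                                        # polynomial degree

D = np.vander(x, k + 1, increasing=True)     # columns 1, x, x^2, ..., x^k
w, *_ = np.linalg.lstsq(D, r, rcond=None)    # solves the k+1 normal equations
print(w)                                     # w_0, w_1, ..., w_k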
Other Error Measures
Square error: E(θ|X) = (1/2) ∑_{t=1}^N (r^t − g(x^t|θ))²
Relative square error: E(θ|X) = ∑_{t=1}^N (r^t − g(x^t|θ))² / ∑_{t=1}^N (r^t − r̄)²
Absolute error: E(θ|X) = ∑_t |r^t − g(x^t|θ)|
ε-sensitive error: E(θ|X) = ∑_t 1(|r^t − g(x^t|θ)| > ε) (|r^t − g(x^t|θ)| − ε)
Bias and Variance
Expected squared error at x decomposes as
E[(r − g(x))² | x] = E[(r − E[r|x])² | x] + (E[r|x] − g(x))²
(the first term is noise, the second the squared error of g)
and the squared error itself decomposes over samples X as
E_X[(E[r|x] − g(x))² | x] = (E[r|x] − E_X[g(x)])² + E_X[(g(x) − E_X[g(x)])²]
(bias² + variance)
To be revisited next week!
Estimating Bias and Variance
M samples X_i = {x_i^t, r_i^t}, i = 1, ..., M, are used to fit g_i(x), i = 1, ..., M
ḡ(x) = (1/M) ∑_{i=1}^M g_i(x)
Bias²(g) = (1/N) ∑_t (ḡ(x^t) − f(x^t))²
Variance(g) = (1/(N M)) ∑_t ∑_i (g_i(x^t) − ḡ(x^t))²
Initially skip!
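When revisiting this slide, a small simulation sketch of these estimates (numpy assumed; the target f and noise level are illustrative):

import numpy as np

rng = np.random.default_rng(1)
f = np.sin                                   # known target, for illustration only
x = np.linspace(0.0, np.pi, 20)
M = 200

G = np.empty((M, len(x)))                    # fit M degree-1 polynomials
for i in range(M):
    r = f(x) + rng.normal(0.0, 0.3, len(x))  # fresh noisy sample X_i
    G[i] = np.polyval(np.polyfit(x, r, 1), x)    # g_i(x)

g_bar = G.mean(axis=0)                       # average fit
bias2 = ((g_bar - f(x)) ** 2).mean()         # (1/N) sum_t (gbar - f)^2
variance = ((G - g_bar) ** 2).mean()         # (1/(N M)) sum_t sum_i (g_i - gbar)^2
print(bias2, variance)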
Bias/Variance Dilemma
Example: g_i(x) = 2 has no variance and high bias
g_i(x) = ∑_t r_i^t / N has lower bias but nonzero variance
As we increase complexity, bias decreases (a better fit to the data) and variance increases (the fit varies more with the data)
Bias/Variance dilemma (Geman et al., 1992)
[Figure: target function f, individual fits g_i, and their average ḡ, with bias and variance indicated.]
Already visited as Topic 4!
Polynomial Regression
Best fit “min error”
Model Selection
Cross-validation: Measure generalization accuracy by testing on data unused during training
Regularization: penalize complex models, E' = error on data + λ · model complexity
Akaike’s information criterion (AIC), Bayesian information criterion (BIC)
Minimum description length (MDL): Kolmogorov complexity, shortest description of data
Structural risk minimization (SRM)
Remark: will be discussed in more depth later: Topic 11
Bayesian Model Selection
Prior on models, p(model)
p(model|data) = p(data|model) p(model) / p(data)
Regularization, when the prior favors simpler models
Bayes: MAP of the posterior, p(model|data)
Average over a number of models with high posterior (voting, ensembles: Chapter 15)
CHAPTER 5: Multivariate Methods
Multivariate Data
Multiple measurements (sensors)
d inputs/features/attributes: d-variate
N instances/observations/examples
X = [ X₁¹    X₂¹    ...  X_d¹
      X₁²    X₂²    ...  X_d²
      ...
      X₁^N   X₂^N   ...  X_d^N ]
Multivariate Parameters
Mean: E[x] = μ = [μ₁, ..., μ_d]^T
Covariance: σ_ij ≡ Cov(X_i, X_j)
Correlation: Corr(X_i, X_j) ≡ ρ_ij = σ_ij / (σ_i σ_j)
Σ ≡ Cov(X) = E[(X − μ)(X − μ)^T] =
    [ σ₁²    σ₁₂   ...  σ₁d
      σ₂₁    σ₂²   ...  σ₂d
      ...
      σ_d1   σ_d2  ...  σ_d² ]
Parameter Estimation
Sample mean m: m_i = (1/N) ∑_{t=1}^N x_i^t, i = 1, ..., d
Covariance matrix S: s_ij = (1/N) ∑_{t=1}^N (x_i^t − m_i)(x_j^t − m_j)
Correlation matrix R: r_ij = s_ij / (s_i s_j)
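A sketch of these estimates (numpy assumed; data illustrative). The covariance is computed explicitly here because np.cov divides by N−1 by default, whereas the formula above divides by N:

import numpy as np

X = np.array([[1.0, 2.0],
              [2.0, 3.5],
              [3.0, 5.1],
              [4.0, 6.8]])                   # N = 4 instances, d = 2 features

m = X.mean(axis=0)                           # sample mean
S = (X - m).T @ (X - m) / len(X)             # covariance matrix, dividing by N
s = np.sqrt(np.diag(S))
R = S / np.outer(s, s)                       # correlation matrix
print(m, S, R, sep="\n")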
Multivariate Normal Distribution
x ~ N_d(μ, Σ)
p(x) = (1/((2π)^(d/2) |Σ|^(1/2))) exp[−(1/2)(x − μ)^T Σ⁻¹ (x − μ)]
(x − μ)^T Σ⁻¹ (x − μ) is the Mahalanobis distance between x and μ
http://www.analyzemath.com/Calculators/inverse_matrix_3by3.html
Multivariate Normal Distribution
Mahalanobis distance: (x − μ)^T Σ⁻¹ (x − μ) measures the distance from x to μ in terms of Σ (it normalizes for differences in variances and correlations)
Bivariate case, d = 2:
Σ = [ σ₁²     ρσ₁σ₂
      ρσ₁σ₂   σ₂² ]
p(x₁, x₂) = (1/(2π σ₁ σ₂ √(1 − ρ²))) exp[−(1/(2(1 − ρ²))) (z₁² − 2ρ z₁ z₂ + z₂²)]
where z_i = (x_i − μ_i)/σ_i is the z-score for x_i
Remark: ρ is the correlation between the two variables
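A sketch of the Mahalanobis distance (numpy assumed; μ and Σ illustrative), compared with the squared Euclidean distance it generalizes:

import numpy as np

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])               # correlated bivariate case, rho = 0.8
x = np.array([1.0, 1.0])

d = x - mu
maha = d @ np.linalg.solve(Sigma, d)         # (x - mu)^T Sigma^{-1} (x - mu)
print(maha, d @ d)                           # Mahalanobis vs. squared Euclidean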
Bivariate Normal
Independent Inputs: Naive Bayes
If the x_i are independent, the off-diagonals of Σ are 0, and the Mahalanobis distance reduces to a weighted (by 1/σ_i) Euclidean distance:
p(x) = ∏_{i=1}^d p_i(x_i) = (1/((2π)^(d/2) ∏_i σ_i)) exp[−(1/2) ∑_{i=1}^d ((x_i − μ_i)/σ_i)²]
If the variances are also equal, this reduces to the Euclidean distance
Parametric Classification
If p(x|C_i) ~ N(μ_i, Σ_i):
p(x|C_i) = (1/((2π)^(d/2) |Σ_i|^(1/2))) exp[−(1/2)(x − μ_i)^T Σ_i⁻¹ (x − μ_i)]
Discriminant functions are:
g_i(x) = log p(x|C_i) + log P(C_i)
= −(d/2) log 2π − (1/2) log |Σ_i| − (1/2)(x − μ_i)^T Σ_i⁻¹ (x − μ_i) + log P(C_i)
Estimation of Parameters
P̂(C_i) = ∑_t r_i^t / N
m_i = ∑_t r_i^t x^t / ∑_t r_i^t
S_i = ∑_t r_i^t (x^t − m_i)(x^t − m_i)^T / ∑_t r_i^t
g_i(x) = −(1/2) log |S_i| − (1/2)(x − m_i)^T S_i⁻¹ (x − m_i) + log P̂(C_i)
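A sketch of these estimates and the resulting discriminant (numpy assumed; data illustrative):

import numpy as np

def fit_class(X):
    """MLE mean and covariance (dividing by N) for one class."""
    m = X.mean(axis=0)
    return m, (X - m).T @ (X - m) / len(X)

def g(x, m, S, prior):
    """-(1/2) log|S_i| - (1/2)(x - m_i)^T S_i^{-1} (x - m_i) + log Phat(C_i)"""
    d = x - m
    return -0.5 * np.linalg.slogdet(S)[1] - 0.5 * d @ np.linalg.solve(S, d) + np.log(prior)

X0 = np.array([[0.0, 0.0], [0.5, 0.3], [-0.2, 0.4], [0.1, -0.3]])
X1 = np.array([[2.0, 2.0], [2.4, 1.7], [1.8, 2.5], [2.2, 2.1]])
params, priors = [fit_class(X0), fit_class(X1)], [0.5, 0.5]
x = np.array([1.0, 1.0])
print(int(np.argmax([g(x, m, S, p) for (m, S), p in zip(params, priors)])))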
Different S_i
Quadratic discriminant:
g_i(x) = x^T W_i x + w_i^T x + w_i0
where
W_i = −(1/2) S_i⁻¹
w_i = S_i⁻¹ m_i
w_i0 = −(1/2) m_i^T S_i⁻¹ m_i − (1/2) log |S_i| + log P̂(C_i)
skip
[Figure: class likelihoods, the posterior for C₁, and the discriminant boundary where P(C₁|x) = 0.5.]
Common Covariance Matrix S
Shared common sample covariance S:
S = ∑_i P̂(C_i) S_i
Discriminant reduces to
g_i(x) = −(1/2)(x − m_i)^T S⁻¹ (x − m_i) + log P̂(C_i)
which is a linear discriminant:
g_i(x) = w_i^T x + w_i0
where w_i = S⁻¹ m_i and w_i0 = −(1/2) m_i^T S⁻¹ m_i + log P̂(C_i)
Initially skip!
Common Covariance Matrix S (initially skip!)
Diagonal S
When x_j, j = 1, ..., d, are independent, Σ is diagonal
p(x|C_i) = ∏_j p(x_j|C_i) (Naive Bayes' assumption)
Classify based on weighted Euclidean distance (in s_j units) to the nearest mean:
g_i(x) = −(1/2) ∑_{j=1}^d ((x_j^t − m_ij)/s_j)² + log P̂(C_i)
Likely covered in April!
Diagonal S
variances may be different
Diagonal S, equal variances
Nearest mean classifier: classify based on Euclidean distance to the nearest mean:
g_i(x) = −‖x − m_i‖² / (2s²) + log P̂(C_i) = −(1/(2s²)) ∑_{j=1}^d (x_j^t − m_ij)² + log P̂(C_i)
Each mean can be considered a prototype or template, and this is template matching
Diagonal S, equal variances
Model Selection
As we increase complexity (less restricted S), bias decreases and variance increases
Assume simple models (allow some bias) to control variance (regularization)
Assumption                    Covariance matrix        No. of parameters
Shared, hyperspheric          S_i = S = s²I            1
Shared, axis-aligned          S_i = S, with s_ij = 0   d
Shared, hyperellipsoidal      S_i = S                  d(d+1)/2
Different, hyperellipsoidal   S_i                      K · d(d+1)/2
Discrete Features
Binary features: p_ij ≡ p(x_j = 1|C_i)
If the x_j are independent (Naive Bayes'):
p(x|C_i) = ∏_{j=1}^d p_ij^(x_j) (1 − p_ij)^(1−x_j)
the discriminant is linear:
g_i(x) = log p(x|C_i) + log P(C_i)
= ∑_j [x_j log p_ij + (1 − x_j) log(1 − p_ij)] + log P(C_i)
Estimated parameters: p̂_ij = ∑_t x_j^t r_i^t / ∑_t r_i^t
skip!
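For reference, a minimal sketch of this Bernoulli naive Bayes (numpy assumed; data illustrative): the p̂_ij are per-class feature frequencies, and the discriminant is the linear form above.

import numpy as np

X = np.array([[1, 0, 1],
              [1, 1, 1],
              [0, 0, 1],
              [0, 1, 0]])                    # binary features
y = np.array([0, 0, 1, 1])

priors = np.array([np.mean(y == i) for i in (0, 1)])
P = np.array([X[y == i].mean(axis=0) for i in (0, 1)])  # phat_ij
P = P.clip(1e-9, 1 - 1e-9)                   # guard against log(0)

x = np.array([1, 0, 0])
g = (x * np.log(P) + (1 - x) * np.log(1 - P)).sum(axis=1) + np.log(priors)
print(int(g.argmax()))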
Discrete Features
Multinomial (1-of-n_j) features: x_j ∈ {v₁, v₂, ..., v_nj}
Define indicators z_jk = 1 if x_j = v_k and 0 otherwise
p_ijk ≡ p(z_jk = 1|C_i) = p(x_j = v_k|C_i)
If the x_j are independent:
p(x|C_i) = ∏_{j=1}^d ∏_{k=1}^{n_j} p_ijk^(z_jk)
g_i(x) = ∑_j ∑_k z_jk log p_ijk + log P(C_i)
p̂_ijk = ∑_t z_jk^t r_i^t / ∑_t r_i^t
skip!
Multivariate Regression
Multivariate linear model:
r^t = g(x^t|w₀, w₁, ..., w_d) + ε = w₀ + w₁ x₁^t + w₂ x₂^t + ... + w_d x_d^t + ε
E(w₀, w₁, ..., w_d|X) = (1/2) ∑_t (r^t − w₀ − w₁ x₁^t − ... − w_d x_d^t)²
Multivariate polynomial model: define new higher-order variables
z₁ = x₁, z₂ = x₂, z₃ = x₁², z₄ = x₂², z₅ = x₁x₂
and use the linear model in this new z space (basis functions, kernel trick, SVM: Chapter 10)
skip!