
Linear Models for Classification

Sumeet Agarwal, EEL709

(Most figures from Bishop, PRML)

Approaches to classification

● Discriminant function: Directly assigns each data point x to a particular class Ci

● Model the conditional class distribution p(Ci|x): allows separation of inference and decision

● Generative approach: model class likelihoods, p(x|Ci), and priors, p(Ci); use Bayes' theorem to get posteriors:

p(Ci|x) ∝ p(x|Ci) p(Ci)
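As a minimal numerical sketch of this generative route (the likelihood and prior values below are made-up, purely for illustration), the posterior is obtained by normalising likelihood × prior over the classes:

import numpy as np

# Hypothetical values: class-conditional densities p(x|Ci) evaluated at one
# point x, and class priors p(Ci) -- illustrative numbers only
likelihoods = np.array([0.20, 0.05])   # p(x|C1), p(x|C2)
priors = np.array([0.30, 0.70])        # p(C1), p(C2)

# Bayes' theorem: the posterior is proportional to likelihood times prior
unnormalised = likelihoods * priors
posteriors = unnormalised / unnormalised.sum()
print(posteriors)                      # [0.6316 0.3684] -> assign x to C1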

Linear discriminant functions

y(x) = wTx + w0

Multiple Classes

Problem of ambiguous regions when combining multiple two-class discriminants

Multiple Classes

Consider a single K-class discriminant, comprising K linear functions:

yk(x) = wkTx + wk0

and assign x to class Ck if yk(x) > yj(x) for all j ≠ k.

This implies singly connected and convex decision regions.
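A small sketch of this decision rule in Python (the weight vectors and biases below are arbitrary illustrative values, not taken from the slides):

import numpy as np

# K linear discriminants y_k(x) = w_k^T x + w_k0; row k of W holds w_k
W = np.array([[ 1.0, -0.5],
              [-0.3,  0.8],
              [ 0.2,  0.2]])           # shape (K, D), illustrative values
w0 = np.array([0.1, -0.2, 0.0])        # biases w_k0

def classify(x):
    # Assign x to the class C_k whose discriminant y_k(x) is largest,
    # i.e. y_k(x) > y_j(x) for all j != k
    y = W @ x + w0
    return int(np.argmax(y))

print(classify(np.array([2.0, 1.0])))  # -> 0 (class C1) for this example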

Least squares for classification

Too sensitive to outliers:

Least squares for classification

Problematic due to evidently non-Gaussian distribution of target values:

Fisher's linear discriminant

The linear classification model can be viewed as a 1-D projection of the data, y = wTx; thus we need to find a decision threshold along this projected line. The simplest measure of class separation is the difference of the projected class means:

m2 – m1 = wT(m2 – m1)

(on the left, mk denotes the projected mean of class k; on the right, the original class mean vector). If the classes have non-diagonal covariances, a better idea is to use the Fisher criterion:

J(w) = (m2 – m1)² / (s1² + s2²)

where s1² denotes the variance of class 1 in the 1-D projection.

Maximising J(w) attempts to give a large separation between the projected class means, but also a small variance within each class.
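A sketch of the criterion, together with the standard closed-form Fisher solution w ∝ SW⁻¹(m2 – m1) (that closed form is the textbook result, not derived on these slides; the two-class data below is randomly generated purely for illustration):

import numpy as np

rng = np.random.default_rng(0)
cov = [[2.0, 1.5], [1.5, 2.0]]                            # shared, non-diagonal covariance
X1 = rng.multivariate_normal([0.0, 0.0], cov, size=100)   # class 1 samples
X2 = rng.multivariate_normal([3.0, 1.0], cov, size=100)   # class 2 samples

def fisher_criterion(w, X1, X2):
    # J(w) = (m2 - m1)^2 / (s1^2 + s2^2) on the 1-D projection y = w^T x
    y1, y2 = X1 @ w, X2 @ w
    return (y2.mean() - y1.mean())**2 / (y1.var() + y2.var())

# Textbook Fisher direction: w proportional to S_W^{-1} (m2 - m1),
# with S_W the within-class scatter (sum of the class covariances here)
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S_W = np.cov(X1, rowvar=False) + np.cov(X2, rowvar=False)
w_fisher = np.linalg.solve(S_W, m2 - m1)

w_means = m2 - m1                                         # naive: line joining the means
print(fisher_criterion(w_means, X1, X2))
print(fisher_criterion(w_fisher, X1, X2))                 # at least as large a J(w)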

Fisher's linear discriminant

[Figure: projection onto the line joining the class means vs. onto the Fisher discriminant direction]

The Perceptron

[Figure: perceptron diagram — basis functions Φ1(x), Φ2(x), Φ3(x), Φ4(x), weighted by w1, w2, w3, w4, are summed to give wTΦ(x), which is passed through a step activation function f(·) with outputs –1 and +1, yielding f(wTΦ(x))]

A non-linear transformation in the form of a step function is applied to the weighted sum of the input features. This is inspired by the way neurons appear to function, mimicking the action potential.
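A minimal sketch of that forward computation (the weights and basis-function values below are illustrative numbers only):

import numpy as np

def perceptron_output(w, phi_x):
    # Step activation f(w^T Phi(x)): +1 if the weighted sum is >= 0, else -1
    return 1 if w @ phi_x >= 0 else -1

w = np.array([0.4, -0.2, 0.1, 0.3])        # weights w1..w4
phi_x = np.array([1.0, 2.0, 0.5, -1.0])    # basis functions Phi1(x)..Phi4(x)
print(perceptron_output(w, phi_x))         # -> -1 for these values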

The perceptron criterion

● We'd like a weight vector w such that wTΦ(xi) > 0 for xi ∈ C1 (say, ti = 1) and wTΦ(xi) < 0 for xi ∈ C2 (ti = –1)

● Thus, we want wTΦ(xi)ti > 0 ∀i; those data points for which this is not true will be misclassified

● The perceptron criterion tries to minimise the 'magnitude' of misclassification, i.e., it tries to minimise -wTΦ(xi)ti for all misclassified points (the set of which is denoted by M):

EP(w) = -∑i∈M wTΦ(xi)ti

● Why not just count the number of misclassified points? Because this is a piecewise constant function of w, and thus the gradient is zero at most places, making optimisation hard

Learning by gradient descent

w(τ+1) = w(τ) – η∇EP(w)

= w(τ) + ηΦ(xi)ti

(if xi is misclassified)

We can show that after this update, the error due to xi will be reduced:

-w(τ+1)TΦ(xi)ti = -w(τ)TΦ(xi)ti – (Φ(xi)ti)TΦ(xi)ti

< -w(τ)TΦ(xi)ti

(having set η=1, which can be done without loss of generality)

Perceptron convergence

The perceptron convergence theorem guarantees an exact solution in a finite number of steps for linearly separable data; for non-separable data, however, the algorithm never converges.
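A sketch of the resulting algorithm: cycle through the data, apply the update w ← w + ηΦ(xi)ti to each misclassified point, and stop when a full pass makes no updates; a cap on the number of passes is added because, as noted above, convergence is only guaranteed for separable data (the toy data and the cap are illustrative choices):

import numpy as np

def train_perceptron(Phi, t, eta=1.0, max_epochs=100):
    # Phi: (N, D) matrix of basis-function vectors; t: labels in {+1, -1}
    w = np.zeros(Phi.shape[1])
    for _ in range(max_epochs):
        updated = False
        for phi_i, t_i in zip(Phi, t):
            if (w @ phi_i) * t_i <= 0:        # misclassified (or on the boundary)
                w = w + eta * phi_i * t_i     # perceptron update
                updated = True
        if not updated:                       # no misclassifications left
            break
    return w

# Tiny linearly separable example; the first column is a constant bias feature
Phi = np.array([[1.0,  2.0,  1.0],
                [1.0,  1.5,  2.0],
                [1.0, -1.0, -1.5],
                [1.0, -2.0, -0.5]])
t = np.array([1, 1, -1, -1])
w = train_perceptron(Phi, t)
print(np.sign(Phi @ w))                       # matches t once converged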

Gaussian Discriminant Analysis

● Generative approach, with class-conditional densities (likelihoods) modelled as Gaussians

For the case of two classes, the posterior can be written in terms of the logistic sigmoid:

p(C1|x) = σ(a), where σ(a) = 1 / (1 + exp(–a)) and a = ln [p(x|C1)p(C1) / p(x|C2)p(C2)]

Gaussian Discriminant Analysis

● In the Gaussian case (with a shared covariance Σ), the argument of the sigmoid is a linear function of x: a = wTx + w0, with w = Σ⁻¹(μ1 – μ2) and w0 = –(1/2) μ1TΣ⁻¹μ1 + (1/2) μ2TΣ⁻¹μ2 + ln[p(C1)/p(C2)]

The assumption of equal covariance matrices leads to linear decision boundaries.

Gaussian Discriminant Analysis

Allowing for unequal covariance matrices for different classes leads to quadratic decision boundaries

Parameter estimation for GDA

Maximum Likelihood Estimators

Likelihood (assuming equal covariance matrices):

p(t | π, μ1, μ2, Σ) = ∏n [π N(xn|μ1, Σ)]^tn [(1 – π) N(xn|μ2, Σ)]^(1 – tn)

Maximising this gives π = N1/N, μk equal to the sample mean of class k, and Σ equal to a weighted average of the two class covariance matrices.
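A sketch of these estimators in Python for the two-class, shared-covariance case (priors from class fractions, class sample means, and a weighted average of the class covariances), followed by the resulting sigmoid posterior; the synthetic data is for illustration only:

import numpy as np

rng = np.random.default_rng(1)
X1 = rng.multivariate_normal([0.0, 0.0], np.eye(2), size=150)   # samples from C1
X2 = rng.multivariate_normal([2.0, 1.0], np.eye(2), size=50)    # samples from C2

N1, N2 = len(X1), len(X2)
pi1 = N1 / (N1 + N2)                              # ML prior p(C1); p(C2) = 1 - pi1
mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)       # ML class means

# Shared ML covariance: weighted average of the per-class covariances
S1 = (X1 - mu1).T @ (X1 - mu1) / N1
S2 = (X2 - mu2).T @ (X2 - mu2) / N2
Sigma = (N1 * S1 + N2 * S2) / (N1 + N2)

# Equal covariances give a linear boundary: p(C1|x) = sigmoid(w^T x + w0)
Sigma_inv = np.linalg.inv(Sigma)
w = Sigma_inv @ (mu1 - mu2)
w0 = (-0.5 * mu1 @ Sigma_inv @ mu1 + 0.5 * mu2 @ Sigma_inv @ mu2
      + np.log(pi1 / (1 - pi1)))

def posterior_C1(x):
    return 1.0 / (1.0 + np.exp(-(w @ x + w0)))

print(posterior_C1(np.array([0.0, 0.0])))         # high probability of C1 near its mean
print(posterior_C1(np.array([2.0, 1.0])))         # much lower near the C2 mean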

Logistic Regression

● An example of a probabilistic discriminative model

● Rather than learning P(x|Ci) and P(Ci), attempts to directly learn P(Ci|x)

● Advantages: fewer parameters to learn, and better performance when the assumed form of the class-conditional densities is inaccurate

● We have seen how the class posterior for a two-class setting can be written as a logistic sigmoid acting on a linear function of the feature vector Φ:

p(C1|Φ) = y(Φ) = σ(wTΦ)

● This model is called logistic regression, even though it is a model for classification, not regression!

Parameter learning

● If we let yn = p(C1|Φn) = σ(wTΦn), with targets tn ∈ {0, 1},

then the likelihood function is

p(t|w) = ∏n yn^tn (1 – yn)^(1 – tn)

and we can define a corresponding error, known as the cross-entropy:

E(w) = –ln p(t|w) = –∑n [tn ln yn + (1 – tn) ln(1 – yn)]

Parameter learning

● The derivative of the sigmoid function is given by dσ/da = σ(1 – σ)

● Using this, we can obtain the gradient of the error function with respect to w: ∇E(w) = ∑n (yn – tn)Φn

● Thus the contribution to the gradient from point n is given by the 'error' between model prediction and actual class label (yn – tn) times the basis function vector for that point, Φn

● Could use this for sequential learning by gradient descent, exactly as for least-squares linear regression
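A sketch of such sequential (stochastic) gradient descent, updating w by –η(yn – tn)Φn one point at a time (the learning rate, epoch count, and toy 1-D data are illustrative choices):

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_logistic_regression(Phi, t, eta=0.1, n_epochs=200):
    # Phi: (N, D) design matrix of basis functions; t: targets in {0, 1}
    w = np.zeros(Phi.shape[1])
    for _ in range(n_epochs):
        for phi_n, t_n in zip(Phi, t):
            y_n = sigmoid(w @ phi_n)             # model prediction for this point
            w -= eta * (y_n - t_n) * phi_n       # sequential gradient step
    return w

# Toy 1-D problem with basis functions Phi(x) = (1, x)
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
t = np.array([0, 0, 0, 1, 1, 1])
Phi = np.column_stack([np.ones_like(x), x])
w = train_logistic_regression(Phi, t)
print(sigmoid(Phi @ w).round(2))                 # predicted probabilities rise with x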

Nonlinear basis functions
