Neural Networks
Aarti Singh
Machine Learning 10-701/15-781
Nov 29, 2010
Slides Courtesy: Tom Mitchell
Logistic Regression
Assumes the following functional form for P(Y|X):
  P(Y = 1 | X) = 1 / (1 + exp(−(w_0 + Σ_i w_i X_i)))
Logistic function (or sigmoid) applied to a linear function of the data:
  σ(z) = 1 / (1 + e^(−z))
[Plot: the sigmoid σ(z) as a function of z]
Features can be discrete or continuous!
Logistic Regression is a Linear Classifier!
Assumes the following functional form for P(Y|X):
  P(Y = 1 | X) = 1 / (1 + exp(−(w_0 + Σ_i w_i X_i)))
Decision boundary: predict Y = 1 when P(Y=1|X) ≥ P(Y=0|X), i.e. when w_0 + Σ_i w_i X_i ≥ 0
(Linear Decision Boundary)
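A minimal sketch (not from the slides) of this prediction rule, with illustrative weight names w0 and w:

import numpy as np

def sigmoid(z):
    # logistic / sigmoid function: maps any real z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(x, w0, w):
    # P(Y = 1 | X = x) under the logistic regression model
    return sigmoid(w0 + np.dot(w, x))

def predict_label(x, w0, w):
    # predict Y = 1 exactly when w0 + w.x >= 0 (equivalently P(Y=1|x) >= 0.5):
    # a linear decision boundary in x
    return 1 if (w0 + np.dot(w, x)) >= 0 else 0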
Training Logistic Regression
How to learn the parameters w0, w1, … wd?
Training Data: {(x^l, y^l)}, l = 1, …, n
Maximum (Conditional) Likelihood Estimates:
  w_MCLE = arg max_w Π_l P(y^l | x^l, w) = arg max_w Σ_l ln P(y^l | x^l, w)
Discriminative philosophy – Don’t waste effort learning P(X), focus on P(Y|X) – that’s all that matters for classification!
Optimizing concave/convex function
• Conditional likelihood for Logistic Regression is concave
• Maximum of a concave function = minimum of a convex function
Gradient Ascent (concave)/ Gradient Descent (convex)
Gradient:
  ∇_w l(w) = [∂l(w)/∂w_0, …, ∂l(w)/∂w_d]
Update rule (learning rate η > 0):
  w_i ← w_i + η ∂l(w)/∂w_i
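A minimal gradient-ascent sketch for the conditional log-likelihood (toy code, not from the lecture; eta is the learning rate η):

import numpy as np

def train_logistic_gd(X, y, eta=0.1, n_iters=1000):
    # Gradient ascent on the conditional log-likelihood l(w).
    # X: (n, d) feature matrix; y: (n,) labels in {0, 1}.
    n, d = X.shape
    Xb = np.hstack([np.ones((n, 1)), X])   # prepend a column of 1s for w0
    w = np.zeros(d + 1)
    for _ in range(n_iters):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))  # P(Y=1 | x^l, w) for every example l
        grad = Xb.T @ (y - p)              # dl/dw = sum_l x^l (y^l - P(Y=1|x^l, w))
        w += eta * grad                    # ascend the concave log-likelihood
    return w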
Logistic Regression as a Graph
Sigmoid Unit
[Figure: unit with inputs x_1 … x_d, weights w_0 … w_d, net = w_0 + Σ_i w_i x_i, output o = σ(net)]
Neural Networks to learn f: X → Y
• f can be a non-linear function
• X (vector of) continuous and/or discrete variables
• Y (vector of) continuous and/or discrete variables
• Neural networks represent f by a network of logistic/sigmoid units; we will focus on feedforward networks:
[Figure: feedforward network with input layer X, hidden layer H, output layer Y]
Sigmoid Unit
[Figure: unit with inputs x_1 … x_d, weights w_0 … w_d, net = w_0 + Σ_i w_i x_i, output o = σ(net)]
σ is the activation function (alternatives: linear, threshold); the sigmoid is differentiable.
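A small sketch of a single sigmoid unit as drawn above (illustrative only): the unit computes net = w_0 + Σ_i w_i x_i and outputs o = σ(net); unlike a hard threshold, σ is differentiable, which is what gradient-based training relies on.

import numpy as np

def sigmoid_unit(x, w0, w):
    net = w0 + np.dot(w, x)            # net input: bias plus weighted sum of inputs
    return 1.0 / (1.0 + np.exp(-net))  # sigmoid activation, smooth and differentiable

def sigmoid_deriv_from_output(o):
    return o * (1.0 - o)               # sigma'(net) expressed via the unit's output o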
Forward Propagation for prediction
Sigmoid unit:  o = σ(w_0 + Σ_i w_i x_i)
1-hidden layer, 1-output NN:  y = σ(w_0 + Σ_h w_h o_h),  where each hidden unit computes o_h = σ(w_{0h} + Σ_i w_{ih} x_i)
Prediction – Given a neural network (hidden units and weights), use it to predict the label of a test point
Forward Propagation – Start from the input layer; for each subsequent layer, compute the outputs o_h of its sigmoid units
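A sketch of forward propagation for the 1-hidden-layer, 1-output network described above (assumed weight layout W1, b1 for input→hidden and w2, b2 for hidden→output; not the lecture's code):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, w2, b2):
    # x: (d,) input; W1: (H, d), b1: (H,) input->hidden; w2: (H,), b2: scalar hidden->output
    o_h = sigmoid(W1 @ x + b1)       # hidden-unit outputs o_h
    y_hat = sigmoid(w2 @ o_h + b2)   # output unit applied to the hidden outputs
    return y_hat, o_h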
M(C)LE Training for Neural Networks
• Consider regression problem f: X → Y, for scalar Y:
    y = f(x) + ε,   ε ~ N(0, σ) iid noise,   f deterministic
• Learned neural network: estimate of f
• Let's maximize the conditional data likelihood
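Written out (a standard derivation, included here for reference), maximizing the conditional likelihood under this Gaussian noise model is equivalent to minimizing the sum of squared errors:

$$W_{MCLE} = \arg\max_W \, \ln \prod_l P(y^l \mid x^l, W) = \arg\max_W \, \sum_l -\frac{\big(y^l - f(x^l; W)\big)^2}{2\sigma^2} + \text{const} = \arg\min_W \, \sum_l \big(y^l - f(x^l; W)\big)^2$$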
MAP Training for Neural Networks
• Consider regression problem f: X → Y, for scalar Y:
    y = f(x) + ε,   ε ~ N(0, σ) iid noise,   f deterministic
• Gaussian prior on the weights: P(W) = N(0, σI)
• ln P(W) ∝ −c Σ_i w_i²   (the log-prior penalizes large weights)
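Combining the log-prior with the conditional log-likelihood gives (again a standard rearrangement, for reference) squared error plus a weight-decay penalty:

$$W_{MAP} = \arg\max_W \big[ \ln P(\text{data} \mid W) + \ln P(W) \big] = \arg\min_W \Big[ \sum_l \big(y^l - f(x^l; W)\big)^2 + c \sum_i w_i^2 \Big]$$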
E[w] – Mean Square Error:
  E[w] = ½ Σ_{l ∈ D} (y^l − o^l)²,   using all training data D
For Neural Networks, E[w] is no longer convex in w.
Error Gradient for a Sigmoid Unit
∂E/∂w_i = ∂/∂w_i ½ Σ_l (y^l − o^l)²
        = Σ_l (y^l − o^l) ∂/∂w_i (y^l − o^l)
        = −Σ_l (y^l − o^l) ∂o^l/∂w_i
        = −Σ_l (y^l − o^l) (∂o^l/∂net^l) (∂net^l/∂w_i)
For a sigmoid unit, o^l = σ(net^l) with net^l = w_0 + Σ_i w_i x_i^l, so
  ∂o^l/∂net^l = o^l (1 − o^l)   and   ∂net^l/∂w_i = x_i^l
giving
  ∂E/∂w_i = −Σ_l (y^l − o^l) o^l (1 − o^l) x_i^l
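A small numerical sanity check of this expression (toy code, my own naming), comparing the closed-form gradient against a finite-difference estimate of E[w]:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def E(w, X, y):
    o = sigmoid(X @ w)                        # outputs of a single sigmoid unit
    return 0.5 * np.sum((y - o) ** 2)         # squared error over the training set

def grad_E(w, X, y):
    o = sigmoid(X @ w)
    return -X.T @ ((y - o) * o * (1 - o))     # -sum_l (y^l - o^l) o^l (1 - o^l) x^l

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.integers(0, 2, size=20).astype(float)
w = rng.normal(size=3)
eps = 1e-6
numeric = np.array([(E(w + eps * e, X, y) - E(w - eps * e, X, y)) / (2 * eps)
                    for e in np.eye(3)])
print(np.allclose(numeric, grad_E(w, X, y), atol=1e-6))  # should print True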
Sigmoid Unit
[Figure: sigmoid unit, repeated from above]
Error Gradient for 1-Hidden layer, 1-output neural network
(MLE) – see Notes.pdf for the full derivation
  ∂E/∂w_{hk} = −Σ_l (y_k^l − o_k^l) o_k^l (1 − o_k^l) o_h^l   (hidden → output weights)
  ∂E/∂w_{ih} = −Σ_l [ Σ_k (y_k^l − o_k^l) o_k^l (1 − o_k^l) w_{hk} ] o_h^l (1 − o_h^l) x_i^l   (input → hidden weights)
Using Forward propagation
y_k = target output (label) of output unit k
o_k (o_h) = unit output (obtained by forward propagation)
w_ij = weight from unit i to unit j
Note: if i is an input variable, o_i = x_i
Objective/Error no longer convex in weights
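Putting the pieces together, a minimal backpropagation sketch for the 1-hidden-layer, 1-output network with sigmoid units and squared error (my illustration with assumed parameter names W1, b1, w2, b2, not code from the lecture):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_single(x, y, W1, b1, w2, b2):
    # Gradients of E = 0.5 * (y - o)^2 for one training example,
    # via forward propagation followed by the chain rule.
    o_h = sigmoid(W1 @ x + b1)                    # forward: hidden-unit outputs
    o = sigmoid(w2 @ o_h + b2)                    # forward: network output
    delta_out = (o - y) * o * (1 - o)             # dE/dnet at the output unit
    delta_h = (w2 * delta_out) * o_h * (1 - o_h)  # dE/dnet at each hidden unit
    grads = {
        "w2": delta_out * o_h,        # dE/dw_{h,out} = delta_out * o_h
        "b2": delta_out,
        "W1": np.outer(delta_h, x),   # dE/dw_{i,h} = delta_h * x_i
        "b1": delta_h,
    }
    return grads, o

A gradient-descent step then subtracts eta times each gradient from the corresponding weights; since E[w] is not convex, different initializations can end up in different local minima.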
How to avoid overfitting?
• Regularization – train the neural network by maximizing the M(C)AP objective
• Early stopping (a sketch follows below)
• Regulate # hidden units – prevents overly complex models ≡ dimensionality reduction
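One concrete instance of early stopping (a sketch; train_epoch, validation_error, and get_weights are hypothetical placeholders for a backprop training pass, a held-out-set evaluation, and a weight snapshot):

def train_with_early_stopping(model, train_data, val_data, max_epochs=1000, patience=5):
    # Stop once validation error has not improved for `patience` consecutive epochs.
    # train_epoch, validation_error, get_weights are hypothetical placeholders.
    best_err, best_weights, bad_epochs = float("inf"), None, 0
    for epoch in range(max_epochs):
        train_epoch(model, train_data)            # one gradient-descent/backprop pass
        err = validation_error(model, val_data)   # error on held-out data
        if err < best_err:
            best_err, best_weights, bad_epochs = err, model.get_weights(), 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:            # validation error keeps rising: stop
                break
    return best_weights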
Artificial Neural Networks: Summary
• Actively used to model distributed computation in the brain
• Highly non-linear regression/classification
• Vector-valued inputs and outputs
• Potentially millions of parameters to estimate - overfitting
• Hidden layers learn intermediate representations – how many to use?
• Prediction – Forward propagation
• Gradient descent (Back-propagation), local minima problems
• Mostly obsolete – kernel tricks are more popular, but coming back in a new form as deep belief networks (probabilistic interpretation)