Lecture Slides for
INTRODUCTION TO Machine Learning
ETHEM ALPAYDIN © The MIT Press, 2004
alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)
Slide 3: Likelihood- vs. Discriminant-based Classification
Likelihood-based: assume a model for p(x|C_i) and use Bayes' rule to calculate P(C_i|x); g_i(x) = log P(C_i|x)
Discriminant-based: assume a model for g_i(x|Φ_i); no density estimation
Estimating the boundaries is enough; there is no need to accurately estimate the densities inside the boundaries
Slide 4: Linear Discriminant
Linear discriminant:
g_i(x | w_i, w_i0) = w_i^T x + w_i0 = ∑_{j=1}^{d} w_ij x_j + w_i0
Advantages:
Simple: O(d) space/computation
Knowledge extraction: a weighted sum of attributes; the signs and magnitudes of the weights are interpretable (e.g., credit scoring)
Optimal when the p(x|C_i) are Gaussian with a shared covariance matrix; useful when classes are (almost) linearly separable
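A minimal sketch of evaluating such a linear discriminant as a weighted sum of attributes; the weights, bias, and feature values below are illustrative, not from the slides:

```python
import numpy as np

def linear_discriminant(x, w, w0):
    """g_i(x | w_i, w_i0) = w_i^T x + w_i0: a weighted sum of attributes."""
    return float(np.dot(w, x) + w0)

# Hypothetical credit-scoring flavor: two attributes with positive
# weights, one (e.g., outstanding debt) with a negative weight.
w = np.array([0.8, 0.5, -1.2])
w0 = -0.3
x = np.array([1.0, 2.0, 0.5])
g = linear_discriminant(x, w, w0)  # 0.8 + 1.0 - 0.6 - 0.3 = 0.9
```

The sign of g tells the class; the per-attribute products w_ij x_j show which attributes drove the decision.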
Slide 5: Generalized Linear Model
Quadratic discriminant:
g_i(x | W_i, w_i, w_i0) = x^T W_i x + w_i^T x + w_i0
Higher-order (product) terms:
z_1 = x_1, z_2 = x_2, z_3 = x_1^2, z_4 = x_2^2, z_5 = x_1 x_2
Map from x to z using nonlinear basis functions and use a linear discriminant in z-space:
g_i(x) = ∑_{j=1}^{k} w_ij φ_j(x)
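A sketch of the z-space idea using exactly the product terms above: the map is from the slide, while the weights and test points are illustrative assumptions chosen so that the linear discriminant in z-space is a circle in x-space:

```python
import numpy as np

def to_z(x):
    """Nonlinear basis map for d=2: z = (x1, x2, x1^2, x2^2, x1*x2)."""
    x1, x2 = x
    return np.array([x1, x2, x1**2, x2**2, x1 * x2])

def g(x, w, w0):
    """Linear discriminant in z-space: quadratic in the original x-space."""
    return float(np.dot(w, to_z(x)) + w0)

# With these illustrative weights, g computes x1^2 + x2^2 - 1:
w = np.array([0.0, 0.0, 1.0, 1.0, 0.0])
inside = g((0.5, 0.5), w, -1.0)   # negative: inside the unit circle
outside = g((1.0, 1.0), w, -1.0)  # positive: outside the unit circle
```

A boundary that is nonlinear in x is still linear in z, so all the linear machinery applies unchanged.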
Slide 6: Two Classes
g(x) = g_1(x) − g_2(x)
     = (w_1^T x + w_10) − (w_2^T x + w_20)
     = (w_1 − w_2)^T x + (w_10 − w_20)
     = w^T x + w_0
Choose C_1 if g(x) > 0, C_2 otherwise
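The reduction above can be checked numerically: comparing two linear discriminants is the same as thresholding their difference, which is itself linear. The parameters here are illustrative:

```python
import numpy as np

# Two linear discriminants (illustrative parameters)
w1, w10 = np.array([1.0, 2.0]), 0.5
w2, w20 = np.array([0.5, -1.0]), 1.0

# Their difference is a single linear discriminant: w = w1 - w2, w0 = w10 - w20
w, w0 = w1 - w2, w10 - w20

def choose(x):
    """Choose C1 if g(x) = w^T x + w0 > 0, C2 otherwise."""
    return "C1" if np.dot(w, x) + w0 > 0 else "C2"

for x in [np.array([1.0, 1.0]), np.array([-2.0, 0.0])]:
    g1 = np.dot(w1, x) + w10
    g2 = np.dot(w2, x) + w20
    assert (choose(x) == "C1") == (g1 > g2)  # same decision either way
```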
Slide 7: Geometry (figure)
Slide 8: Multiple Classes
g_i(x | w_i, w_i0) = w_i^T x + w_i0
Classes are linearly separable
Choose C_i if g_i(x) = max_{j=1}^{K} g_j(x)
Slide 9: Pairwise Separation
g_ij(x | w_ij, w_ij0) = w_ij^T x + w_ij0
g_ij(x) > 0 if x ∈ C_i; g_ij(x) ≤ 0 if x ∈ C_j; don't care otherwise
Choose C_i if ∀j ≠ i, g_ij(x) > 0
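A sketch of the pairwise decision rule in 1-D. The class centers are illustrative, and the g_ij are built from midpoints between centers (my assumption for the sketch, not a construction from the slides); the decision rule itself is the one above:

```python
K = 3
centers = [-1.0, 0.0, 1.0]  # hypothetical 1-D class centers

# g_ij(x) = w_ij * x + w_ij0, positive on C_i's side of the i/j midpoint
W = [[centers[i] - centers[j] for j in range(K)] for i in range(K)]
W0 = [[-(centers[i] - centers[j]) * (centers[i] + centers[j]) / 2
       for j in range(K)] for i in range(K)]

def pairwise_choose(x):
    """Choose C_i if g_ij(x) > 0 for all j != i."""
    for i in range(K):
        if all(W[i][j] * x + W0[i][j] > 0 for j in range(K) if j != i):
            return i
    return None  # ambiguous region: possible in general
```

Note the rule can abstain: a point may fail to win all of its pairwise comparisons, which is why the slide has a "don't care" case.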
Slide 10: From Discriminants to Posteriors
When p(x | C_i) ~ N(μ_i, Σ):
g_i(x | w_i, w_i0) = w_i^T x + w_i0
w_i = Σ^{-1} μ_i
w_i0 = −(1/2) μ_i^T Σ^{-1} μ_i + log P(C_i)
With y ≡ P(C_1 | x) and 1 − y = P(C_2 | x):
choose C_1 if y > 0.5, equivalently y/(1 − y) > 1, equivalently log [y/(1 − y)] > 0; choose C_2 otherwise
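A sketch of the formulas above for two Gaussian classes with a shared covariance; the means, covariance, and priors are illustrative:

```python
import numpy as np

# w_i = Sigma^{-1} mu_i, w_i0 = -0.5 mu_i^T Sigma^{-1} mu_i + log P(C_i)
mu1, mu2 = np.array([1.0, 1.0]), np.array([-1.0, -1.0])
Sigma = np.array([[1.0, 0.0], [0.0, 1.0]])   # shared covariance
P1, P2 = 0.5, 0.5                            # priors

Sinv = np.linalg.inv(Sigma)

def g(x, mu, P):
    w = Sinv @ mu
    w0 = -0.5 * mu @ Sinv @ mu + np.log(P)
    return w @ x + w0

x = np.array([0.5, 0.2])
choice = "C1" if g(x, mu1, P1) > g(x, mu2, P2) else "C2"
```

Because the covariance is shared, the quadratic terms cancel in g_1 − g_2 and the boundary is linear.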
Slide 11:
logit P(C_1 | x) = log [P(C_1 | x) / (1 − P(C_1 | x))]
= log [p(x | C_1) / p(x | C_2)] + log [P(C_1) / P(C_2)]
= log { (2π)^{−d/2} |Σ|^{−1/2} exp[−(1/2)(x − μ_1)^T Σ^{-1} (x − μ_1)] }
  − log { (2π)^{−d/2} |Σ|^{−1/2} exp[−(1/2)(x − μ_2)^T Σ^{-1} (x − μ_2)] }
  + log [P(C_1) / P(C_2)]
= w^T x + w_0
where w = Σ^{-1} (μ_1 − μ_2) and w_0 = −(1/2) (μ_1 + μ_2)^T Σ^{-1} (μ_1 − μ_2) + log [P(C_1) / P(C_2)]
The inverse of logit:
log [P(C_1 | x) / (1 − P(C_1 | x))] = w^T x + w_0
P(C_1 | x) = sigmoid(w^T x + w_0) = 1 / (1 + exp[−(w^T x + w_0)])
Slide 12: Sigmoid (Logistic) Function
1. Calculate g(x) = w^T x + w_0 and choose C_1 if g(x) > 0, or
2. Calculate y = sigmoid(w^T x + w_0) and choose C_1 if y > 0.5
Slide 13: Gradient-Descent
E(w | X) is the error with parameters w on sample X
w* = arg min_w E(w | X)
Gradient: ∇_w E = [∂E/∂w_1, ∂E/∂w_2, ..., ∂E/∂w_d]^T
Gradient-descent: starts from random w and updates w iteratively in the negative direction of the gradient
Slide 14: Gradient-Descent
Δw_i = −η ∂E/∂w_i, ∀i
w_i = w_i + Δw_i
(figure: one update step from w^t to w^{t+1} with step size η, moving from E(w^t) down to E(w^{t+1}))
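The update rule above can be sketched as a loop; the error function here is an illustrative quadratic (minimum at (3, −1)), not one from the slides:

```python
import numpy as np

def gradient_descent(grad, w, eta=0.1, steps=200):
    """Repeat: Delta w = -eta * grad E(w); w <- w + Delta w."""
    for _ in range(steps):
        w = w - eta * grad(w)
    return w

# Illustrative error E(w) = (w1 - 3)^2 + (w2 + 1)^2, so grad E = 2(w - [3, -1])
grad = lambda w: 2 * (w - np.array([3.0, -1.0]))
w_star = gradient_descent(grad, np.zeros(2))  # converges to [3, -1]
```

The step size η trades off speed against stability: too large and the iterates overshoot, too small and convergence is slow.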
Slide 15: Logistic Discrimination
Two classes: assume the log likelihood ratio is linear
log [p(x | C_1) / p(x | C_2)] = w^T x + w_0^o
logit P(C_1 | x) = log [P(C_1 | x) / (1 − P(C_1 | x))] = log [p(x | C_1) / p(x | C_2)] + log [P(C_1) / P(C_2)] = w^T x + w_0
where w_0 = w_0^o + log [P(C_1) / P(C_2)]
y = P̂(C_1 | x) = 1 / (1 + exp[−(w^T x + w_0)])
Slide 16: Training: Two Classes
X = {x^t, r^t}_t, with r^t | x^t ~ Bernoulli(y^t)
y^t = P(C_1 | x^t) = 1 / (1 + exp[−(w^T x^t + w_0)])
l(w, w_0 | X) = ∏_t (y^t)^{r^t} (1 − y^t)^{1 − r^t}
E = −log l
E(w, w_0 | X) = −∑_t [ r^t log y^t + (1 − r^t) log (1 − y^t) ]
Slide 17: Training: Gradient-Descent
E(w, w_0 | X) = −∑_t [ r^t log y^t + (1 − r^t) log (1 − y^t) ]
If y = sigmoid(a), then dy/da = y (1 − y)
Δw_j = −η ∂E/∂w_j = η ∑_t [ r^t/y^t − (1 − r^t)/(1 − y^t) ] y^t (1 − y^t) x_j^t = η ∑_t (r^t − y^t) x_j^t,  j = 1, ..., d
Δw_0 = η ∑_t (r^t − y^t)
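The two update rules above can be run as-is on a synthetic sample; the data (two well-separated Gaussian clouds), learning rate, and iteration count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-class sample: C1 around (2, 2), C2 around (-2, -2)
X = np.vstack([rng.normal(2, 1, (50, 2)), rng.normal(-2, 1, (50, 2))])
r = np.concatenate([np.ones(50), np.zeros(50)])

w, w0, eta = np.zeros(2), 0.0, 0.1
for _ in range(500):
    y = 1 / (1 + np.exp(-(X @ w + w0)))  # y^t = sigmoid(w^T x^t + w0)
    w += eta * (r - y) @ X               # Delta w_j = eta * sum_t (r^t - y^t) x_j^t
    w0 += eta * np.sum(r - y)            # Delta w_0 = eta * sum_t (r^t - y^t)

acc = np.mean(((X @ w + w0) > 0) == (r == 1))  # training accuracy
```

Note the same (r^t − y^t) residual drives both updates; only the x_j^t factor differs.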
Slide 19: (figure; labels: 10, 100, 1000)
Slide 20: K > 2 Classes
X = {x^t, r^t}_t, with r^t | x^t ~ Mult_K(1, y^t)
softmax: y_i = P̂(C_i | x) = exp(w_i^T x + w_i0) / ∑_{j=1}^{K} exp(w_j^T x + w_j0),  i = 1, ..., K
l({w_i, w_i0}_i | X) = ∏_t ∏_i (y_i^t)^{r_i^t}
E({w_i, w_i0}_i | X) = −∑_t ∑_i r_i^t log y_i^t
Δw_j = η ∑_t (r_j^t − y_j^t) x^t,  Δw_j0 = η ∑_t (r_j^t − y_j^t)
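A runnable sketch of softmax training with the updates above; the K = 3 sample, learning rate, and the max-subtraction trick for numerical stability are my assumptions, while the softmax and update formulas are from the slide:

```python
import numpy as np

rng = np.random.default_rng(1)
K, d = 3, 2

# Illustrative K=3 sample with well-separated class means
means = np.array([[3.0, 0.0], [-3.0, 3.0], [0.0, -3.0]])
X = np.vstack([rng.normal(m, 1.0, (40, d)) for m in means])
labels = np.repeat(np.arange(K), 40)
R = np.eye(K)[labels]                 # one-hot r_i^t

W, W0, eta = np.zeros((K, d)), np.zeros(K), 0.01
for _ in range(300):
    O = X @ W.T + W0                  # o_i = w_i^T x + w_i0
    O -= O.max(axis=1, keepdims=True) # stability trick (doesn't change softmax)
    Y = np.exp(O) / np.exp(O).sum(axis=1, keepdims=True)  # softmax
    W += eta * (R - Y).T @ X          # Delta w_j  = eta sum_t (r_j^t - y_j^t) x^t
    W0 += eta * (R - Y).sum(axis=0)   # Delta w_j0 = eta sum_t (r_j^t - y_j^t)

acc = np.mean((X @ W.T + W0).argmax(axis=1) == labels)
```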
Slide 22: Example (figure)
Slide 23: Generalizing the Linear Model
Quadratic:
log [p(x | C_i) / p(x | C_K)] = x^T W_i x + w_i^T x + w_i0
Sum of basis functions:
log [p(x | C_i) / p(x | C_K)] = w_i^T φ(x) + w_i0
where the φ(x) are basis functions:
Kernels in SVM
Hidden units in neural networks
Slide 24: Discrimination by Regression
Classes are NOT mutually exclusive and exhaustive
r^t = y^t + ε, where ε ~ N(0, σ^2)
y^t = sigmoid(w^T x^t + w_0) = 1 / (1 + exp[−(w^T x^t + w_0)])
l(w, w_0 | X) = ∏_t (1 / (√(2π) σ)) exp[−(r^t − y^t)^2 / (2σ^2)]
E(w, w_0 | X) = (1/2) ∑_t (r^t − y^t)^2
Δw = η ∑_t (r^t − y^t) y^t (1 − y^t) x^t
Slide 25: Optimal Separating Hyperplane
(Cortes and Vapnik, 1995; Vapnik, 1995)
X = {x^t, r^t}, where r^t = +1 if x^t ∈ C_1 and r^t = −1 if x^t ∈ C_2
Find w and w_0 such that
w^T x^t + w_0 ≥ +1 for r^t = +1
w^T x^t + w_0 ≤ −1 for r^t = −1
which can be rewritten as
r^t (w^T x^t + w_0) ≥ +1
Slide 26: Margin
Distance from the discriminant to the closest instances on either side
Distance of x^t to the hyperplane is |w^T x^t + w_0| / ||w||
We require r^t (w^T x^t + w_0) / ||w|| ≥ ρ, ∀t
For a unique solution, fix ρ ||w|| = 1; then, to maximize the margin:
min (1/2) ||w||^2 subject to r^t (w^T x^t + w_0) ≥ +1, ∀t
Slide 28:
min (1/2) ||w||^2 subject to r^t (w^T x^t + w_0) ≥ +1, ∀t
L_p = (1/2) ||w||^2 − ∑_{t=1}^{N} α^t [ r^t (w^T x^t + w_0) − 1 ]
    = (1/2) ||w||^2 − ∑_t α^t r^t (w^T x^t + w_0) + ∑_t α^t
∂L_p/∂w = 0 ⇒ w = ∑_{t=1}^{N} α^t r^t x^t
∂L_p/∂w_0 = 0 ⇒ ∑_{t=1}^{N} α^t r^t = 0
Slide 29:
L_d = (1/2) w^T w − w^T ∑_t α^t r^t x^t − w_0 ∑_t α^t r^t + ∑_t α^t
    = −(1/2) w^T w + ∑_t α^t
    = −(1/2) ∑_t ∑_s α^t α^s r^t r^s (x^t)^T x^s + ∑_t α^t
subject to ∑_t α^t r^t = 0 and α^t ≥ 0, ∀t
Most α^t are 0 and only a small number have α^t > 0; they are the support vectors
Slide 30: Soft Margin Hyperplane
Not linearly separable:
r^t (w^T x^t + w_0) ≥ 1 − ξ^t
Soft error: ∑_t ξ^t
New primal is
L_p = (1/2) ||w||^2 + C ∑_t ξ^t − ∑_t α^t [ r^t (w^T x^t + w_0) − 1 + ξ^t ] − ∑_t μ^t ξ^t
Slide 31: Kernel Machines
Preprocess input x by basis functions:
z = φ(x), g(z) = w^T z, g(x) = w^T φ(x)
The SVM solution:
w = ∑_t α^t r^t z^t = ∑_t α^t r^t φ(x^t)
g(x) = w^T φ(x) = ∑_t α^t r^t φ(x^t)^T φ(x) = ∑_t α^t r^t K(x^t, x)
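A sketch of the final expression g(x) = ∑_t α^t r^t K(x^t, x); the support vectors, α values, and linear kernel are illustrative, and the bias term w_0 is omitted for brevity:

```python
import numpy as np

def g(x, support_x, alpha, r, K):
    """Kernelized SVM discriminant: g(x) = sum_t alpha^t r^t K(x^t, x).
    (Bias w0 omitted in this sketch; alpha and support_x are illustrative.)"""
    return sum(a * rt * K(xt, x) for a, rt, xt in zip(alpha, r, support_x))

K_lin = lambda u, v: float(np.dot(u, v))  # linear kernel
support_x = [np.array([1.0, 0.0]), np.array([-1.0, 0.0])]
alpha, r = [1.0, 1.0], [+1, -1]
val = g(np.array([2.0, 3.0]), support_x, alpha, r, K_lin)  # 2 - (-2) = 4
```

The point of the kernel trick: g(x) only needs K(x^t, x), never φ explicitly, so φ may map into a very high-dimensional (or infinite-dimensional) space.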
Slide 32: Kernel Functions
Polynomials of degree q: K(x^t, x) = (x^T x^t + 1)^q   (Cherkassky and Mulier, 1998)
For example, for q = 2 and d = 2:
K(x, y) = (x^T y + 1)^2 = (x_1 y_1 + x_2 y_2 + 1)^2
= 1 + 2 x_1 y_1 + 2 x_2 y_2 + 2 x_1 x_2 y_1 y_2 + x_1^2 y_1^2 + x_2^2 y_2^2
which corresponds to the basis functions φ(x) = [1, √2 x_1, √2 x_2, √2 x_1 x_2, x_1^2, x_2^2]^T
Radial-basis functions: K(x^t, x) = exp[ −||x^t − x||^2 / (2σ^2) ]
Sigmoidal functions: K(x^t, x) = tanh(2 x^T x^t + 1)
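The q = 2 identity above can be verified numerically: the kernel value equals the inner product of the explicit feature vectors. The test points are arbitrary:

```python
import numpy as np

def K_poly2(x, y):
    """(x^T y + 1)^2: degree-2 polynomial kernel."""
    return (np.dot(x, y) + 1) ** 2

def phi(x):
    """Explicit feature map matching the q=2, d=2 polynomial kernel."""
    x1, x2 = x
    s = np.sqrt(2)
    return np.array([1, s * x1, s * x2, s * x1 * x2, x1**2, x2**2])

x, y = np.array([0.3, -1.2]), np.array([2.0, 0.5])
lhs = K_poly2(x, y)
rhs = float(np.dot(phi(x), phi(y)))  # equal: the kernel computes phi(x)^T phi(y)
```

The kernel evaluates a 6-dimensional inner product at the cost of a 2-dimensional one; for larger q and d the saving grows combinatorially.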
Slide 33: SVM for Regression
Use a linear model (possibly kernelized): f(x) = w^T x + w_0
Use the ε-sensitive error function:
e_ε(r^t, f(x^t)) = 0 if |r^t − f(x^t)| < ε, and |r^t − f(x^t)| − ε otherwise
min (1/2) ||w||^2 + C ∑_t (ξ_+^t + ξ_−^t) subject to
r^t − (w^T x + w_0) ≤ ε + ξ_+^t
(w^T x + w_0) − r^t ≤ ε + ξ_−^t
ξ_+^t, ξ_−^t ≥ 0
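The ε-sensitive error function above in code; the ε value and sample residuals are illustrative:

```python
def eps_sensitive_error(r, f, eps=0.5):
    """e_eps(r, f) = 0 if |r - f| < eps, else |r - f| - eps."""
    return max(0.0, abs(r - f) - eps)

# Residuals inside the eps-tube cost nothing; outside, cost grows linearly.
assert eps_sensitive_error(1.0, 1.3) == 0.0            # |r - f| = 0.3 < eps
assert abs(eps_sensitive_error(1.0, 2.0) - 0.5) < 1e-12  # 1.0 - eps
```

The flat zero-cost tube is what makes the solution sparse: only points on or outside the tube become support vectors.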