UVA CS 6316/4501 – Fall 2016

Machine Learning

Lecture 2: Algebra and Calculus Review

Dr. Yanjun Qi

University of Virginia, Department of Computer Science

9/1/16

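Minimum requirement test

If
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 7 & 10 \\ 8 & 11 \\ 9 & 12 \end{bmatrix}$$
then C = A – B = ? and C = A + B = ?

If
$$A = \begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}$$
then C = AB = ? and C = BA = ?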

Today:

•  Data Representation for ML systems

•  Review of Linear Algebra and Matrix Calculus

A Typical Machine Learning Pipeline

•  Low-level sensing
•  Pre-processing (e.g. Data Cleaning)
•  Feature Extract
•  Feature Select (Task-relevant)
•  Inference, Prediction, Recognition
•  Label Collection
•  Evaluation
•  Optimization

e.g. SUPERVISED LEARNING

•  Find function to map input space X to output space Y

•  Generalisation: learn function / hypothesis from past data in order to “explain”, “predict”, “model” or “control” new data examples

A Dataset

•  Data / points / instances / examples / samples / records: [rows]
•  Features / attributes / dimensions / independent variables / covariates / predictors / regressors: [columns, except the last]
•  Target / outcome / response / label / dependent variable: special column to be predicted [last column]


Main Types of Columns

•  Continuous: a real number, for example, age or height

•  Discrete: a symbol, like “Good” or “Bad”


e.g. SUPERVISED Classification

•  e.g. Here, target Y is a discrete target variable

•  Training dataset consists of input-output pairs; for a new input x, the task is to predict f(x) = ?


Today:

•  Data Representation for ML systems

•  Review of Linear Algebra and Matrix Calculus


DEFINITIONS - SCALAR

◆ A scalar is a number (denoted with regular type: 1 or 22)

DEFINITIONS - VECTOR

◆ Vector: a single row or column of numbers
  – denoted with bold small letters
  – row vector: $\mathbf{a} = \begin{bmatrix} 1 & 2 & 3 & 4 & 5 \end{bmatrix}$
  – column vector (default): $\mathbf{b} = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \\ 5 \end{bmatrix}$


DEFINITIONS - VECTOR

•  Vector in $\mathbb{R}^n$ is an ordered set of n real numbers.
  – e.g. v = (1,6,3,4) is in $\mathbb{R}^4$
  – A column vector: $\begin{pmatrix} 1 \\ 6 \\ 3 \\ 4 \end{pmatrix}$
  – A row vector: $\begin{pmatrix} 1 & 6 & 3 & 4 \end{pmatrix}$


DEFINITIONS - MATRIX

◆ A matrix is an array of numbers: $\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}$
◆ Denoted with a bold capital letter
◆ All matrices have an order (or dimension): that is, the number of rows × the number of columns. So, A is 2 by 3 or (2 × 3).
◆ A square matrix is a matrix that has the same number of rows and columns (n × n)


DEFINITIONS - MATRIX

•  m-by-n matrix in $\mathbb{R}^{m \times n}$ with m rows and n columns, each entry filled with a (typically) real number

•  e.g. 3 × 3 matrix
$$\begin{pmatrix} 1 & 2 & 8 \\ 4 & 78 & 6 \\ 9 & 3 & 2 \end{pmatrix}$$


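Special matrices

Square matrix examples:

•  diagonal: $\begin{pmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{pmatrix}$

•  upper-triangular: $\begin{pmatrix} a & b & c \\ 0 & d & e \\ 0 & 0 & f \end{pmatrix}$

•  tri-diagonal: $\begin{pmatrix} a & b & 0 & 0 \\ c & d & e & 0 \\ 0 & f & g & h \\ 0 & 0 & i & j \end{pmatrix}$

•  lower-triangular: $\begin{pmatrix} a & 0 & 0 \\ b & c & 0 \\ d & e & f \end{pmatrix}$

•  I (identity matrix): $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$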


Special matrices: Symmetric Matrices

e.g.:


Review of MATRIX OPERATIONS

1)  Transposition
2)  Addition and Subtraction
3)  Multiplication
4)  Norm (of vector)
5)  Matrix Inversion
6)  Matrix Rank
7)  Matrix calculus


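(1) Transpose

Transpose: You can think of it as “flipping” the rows and columns

e.g.
$$\begin{pmatrix} a \\ b \end{pmatrix}^T = \begin{pmatrix} a & b \end{pmatrix}, \qquad \begin{pmatrix} a & b \\ c & d \end{pmatrix}^T = \begin{pmatrix} a & c \\ b & d \end{pmatrix}$$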
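As a minimal sketch of the "flipping" idea, assuming a matrix is stored as a plain Python list of rows (the function name `transpose` is illustrative, not from the lecture):

```python
def transpose(matrix):
    """Return the transpose of a matrix stored as a list of rows."""
    # zip(*matrix) groups the i-th entry of every row, i.e. the i-th column.
    return [list(column) for column in zip(*matrix)]


A = [[1, 2],
     [3, 4],
     [5, 6]]            # 3 x 2

A_T = transpose(A)      # 2 x 3
print(A_T)              # [[1, 3, 5], [2, 4, 6]]
```

Transposing twice returns the original matrix, which is a quick sanity check for any transpose routine.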


(2) Matrix Addition/Subtraction

•  Matrix addition/subtraction – Matrices must be of same size.


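(2) Matrix Addition/Subtraction: An Example

•  If we have
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 7 & 10 \\ 8 & 11 \\ 9 & 12 \end{bmatrix}$$
then we can calculate C = A + B by
$$C = A + B = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} + \begin{bmatrix} 7 & 10 \\ 8 & 11 \\ 9 & 12 \end{bmatrix} = \begin{bmatrix} 8 & 12 \\ 11 & 15 \\ 14 & 18 \end{bmatrix}$$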


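(2) Matrix Addition/Subtraction: An Example

•  Similarly, if we have
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 7 & 10 \\ 8 & 11 \\ 9 & 12 \end{bmatrix}$$
then we can calculate C = A - B by
$$C = A - B = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} - \begin{bmatrix} 7 & 10 \\ 8 & 11 \\ 9 & 12 \end{bmatrix} = \begin{bmatrix} -6 & -8 \\ -5 & -7 \\ -4 & -6 \end{bmatrix}$$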
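The two examples can be reproduced with a short sketch. It assumes matrices are plain lists of rows; the helper name `elementwise` is hypothetical and only serves this illustration.

```python
import operator


def elementwise(A, B, op):
    """Apply op entry by entry; the matrices must be of the same size."""
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("matrices must be of the same size")
    return [[op(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


A = [[1, 2], [3, 4], [5, 6]]
B = [[7, 10], [8, 11], [9, 12]]

C_sum = elementwise(A, B, operator.add)
C_diff = elementwise(A, B, operator.sub)
print(C_sum)    # [[8, 12], [11, 15], [14, 18]]
print(C_diff)   # [[-6, -8], [-5, -7], [-4, -6]]
```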


OPERATION on MATRIX



(3) Products of Matrices

•  We write the multiplication of two matrices A and B as AB
•  This is referred to either as
  – pre-multiplying B by A, or
  – post-multiplying A by B
•  So for matrix multiplication AB, A is referred to as the premultiplier and B is referred to as the postmultiplier


(3) Products of Matrices

$$\underbrace{A}_{m \times n} \; \underbrace{B}_{q \times p} = \underbrace{C}_{m \times p}$$

Condition: n = q

(3) Products of Matrices

•  In order to multiply matrices, they must be conformable (the number of columns in the premultiplier must equal the number of rows in the postmultiplier)
•  Note that
  – an (m × n) × (n × p) = (m × p)
  – an (m × n) × (p × n) = cannot be done (unless n = p)
  – a (1 × n) × (n × 1) = a scalar (1 × 1)


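Products of Matrices

•  If we have A (3 × 3) and B (3 × 2) then
$$AB = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \times \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \\ b_{31} & b_{32} \end{bmatrix} = \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \\ c_{31} & c_{32} \end{bmatrix} = C$$
where
$$\begin{aligned}
c_{11} &= a_{11}b_{11} + a_{12}b_{21} + a_{13}b_{31} \\
c_{12} &= a_{11}b_{12} + a_{12}b_{22} + a_{13}b_{32} \\
c_{21} &= a_{21}b_{11} + a_{22}b_{21} + a_{23}b_{31} \\
c_{22} &= a_{21}b_{12} + a_{22}b_{22} + a_{23}b_{32} \\
c_{31} &= a_{31}b_{11} + a_{32}b_{21} + a_{33}b_{31} \\
c_{32} &= a_{31}b_{12} + a_{32}b_{22} + a_{33}b_{32}
\end{aligned}$$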


Matrix Multiplication: An Example

•  If we have
$$A = \begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}$$
then
$$AB = \begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{bmatrix} \times \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix} = \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \\ c_{31} & c_{32} \end{bmatrix} = \begin{bmatrix} 30 & 66 \\ 36 & 81 \\ 42 & 96 \end{bmatrix}$$
where
$$\begin{aligned}
c_{11} &= a_{11}b_{11} + a_{12}b_{21} + a_{13}b_{31} = 1(1) + 4(2) + 7(3) = 30 \\
c_{12} &= a_{11}b_{12} + a_{12}b_{22} + a_{13}b_{32} = 1(4) + 4(5) + 7(6) = 66 \\
c_{21} &= a_{21}b_{11} + a_{22}b_{21} + a_{23}b_{31} = 2(1) + 5(2) + 8(3) = 36 \\
c_{22} &= a_{21}b_{12} + a_{22}b_{22} + a_{23}b_{32} = 2(4) + 5(5) + 8(6) = 81 \\
c_{31} &= a_{31}b_{11} + a_{32}b_{21} + a_{33}b_{31} = 3(1) + 6(2) + 9(3) = 42 \\
c_{32} &= a_{31}b_{12} + a_{32}b_{22} + a_{33}b_{32} = 3(4) + 6(5) + 9(6) = 96
\end{aligned}$$
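The worked example can be checked with a small sketch of the row-by-column rule. It assumes list-of-rows matrices, and the function name `matmul` is illustrative rather than part of the lecture.

```python
def matmul(A, B):
    """Multiply A (m x n) by B (n x p), giving an m x p matrix."""
    n = len(A[0])
    if n != len(B):
        # Conformability: columns of the premultiplier = rows of the postmultiplier.
        raise ValueError("matrices are not conformable")
    p = len(B[0])
    # c_ij is the sum over k of a_ik * b_kj.
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(A))]


A = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]   # 3 x 3
B = [[1, 4], [2, 5], [3, 6]]            # 3 x 2

C = matmul(A, B)
print(C)   # [[30, 66], [36, 81], [42, 96]]
```

Calling `matmul(B, A)` raises the `ValueError`, because B has 2 columns while A has 3 rows, so BA is not defined for these two matrices.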


Some Properties of Matrix Multiplication

•  Note that
  – Even if conformable, AB does not necessarily equal BA (i.e., matrix multiplication is not commutative)
  – Matrix multiplication can be extended beyond two matrices
  – Matrix multiplication is associative, i.e., A(BC) = (AB)C


Some Properties of Matrix Multiplication

◆ Multiplication and transposition: $(AB)^T = B^T A^T$

◆ Multiplication with Identity Matrix: $AI = IA = A$
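These properties can be spot-checked numerically. The sketch below uses small 2 × 2 matrices chosen only for illustration; the helpers `matmul` and `transpose` are hypothetical names, and a check on one example is of course not a proof.

```python
def matmul(A, B):
    """Row-by-column product of list-of-rows matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(M):
    return [list(col) for col in zip(*M)]


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[1, 1], [0, 1]]
I = [[1, 0], [0, 1]]

print(matmul(A, B))   # [[2, 1], [4, 3]]
print(matmul(B, A))   # [[3, 4], [1, 2]]  -> AB != BA

# Associativity: A(BC) = (AB)C
print(matmul(A, matmul(B, C)) == matmul(matmul(A, B), C))                   # True
# Transposition rule: (AB)^T = B^T A^T
print(transpose(matmul(A, B)) == matmul(transpose(B), transpose(A)))        # True
# Identity: AI = IA = A
print(matmul(A, I) == A and matmul(I, A) == A)                              # True
```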


Special Uses for Matrix Multiplication

•  Products of Scalars & Matrices → Example, if we have
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} \quad \text{and} \quad b = 3.5$$
then we can calculate bA by
$$bA = 3.5 \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} = \begin{bmatrix} 3.5 & 7.0 \\ 10.5 & 14.0 \\ 17.5 & 21.0 \end{bmatrix}$$

Note that bA = Ab if b is a scalar


•  Dot (or Inner) Product of two Vectors
  – Premultiplication of a column vector b by a conformable row vector $a^T$ yields a single value called the dot product or inner product. If
$$a^T = \begin{bmatrix} 3 & 4 & 6 \end{bmatrix} \quad \text{and} \quad b = \begin{bmatrix} 5 \\ 2 \\ 8 \end{bmatrix}$$
then their inner product gives us
$$a^T b = a \cdot b = \begin{bmatrix} 3 & 4 & 6 \end{bmatrix} \begin{bmatrix} 5 \\ 2 \\ 8 \end{bmatrix} = 3(5) + 4(2) + 6(8) = 71 = b^T a$$
which is the sum of products of elements in similar positions for the two vectors
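A minimal sketch of the inner product, assuming vectors are plain Python lists (the function name `dot` is illustrative):

```python
def dot(a, b):
    """Sum of products of elements in matching positions."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


a = [3, 4, 6]
b = [5, 2, 8]

print(dot(a, b))   # 71
print(dot(b, a))   # 71, the inner product is symmetric
```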


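•  Outer Product of two Vectors
  – Postmultiplication of a column vector a by a conformable row vector $b^T$ yields a matrix containing the products of each pair of elements from the two vectors (called the outer product). If
$$a^T = \begin{bmatrix} 3 & 4 & 6 \end{bmatrix} \quad \text{and} \quad b = \begin{bmatrix} 5 \\ 2 \\ 8 \end{bmatrix}$$
then $ab^T$ gives us
$$ab^T = \begin{bmatrix} 3 \\ 4 \\ 6 \end{bmatrix} \begin{bmatrix} 5 & 2 & 8 \end{bmatrix} = \begin{bmatrix} 15 & 6 & 24 \\ 20 & 8 & 32 \\ 30 & 12 & 48 \end{bmatrix}$$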
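The same outer product can be sketched in a few lines, again assuming list-based vectors and an illustrative function name `outer`:

```python
def outer(a, b):
    """Entry (i, j) of the result is a[i] * b[j]."""
    return [[x * y for y in b] for x in a]


a = [3, 4, 6]
b = [5, 2, 8]

M = outer(a, b)
print(M)   # [[15, 6, 24], [20, 8, 32], [30, 12, 48]]
```

Every row of the result is a multiple of b, which is why an outer product of two nonzero vectors always has rank 1.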



•  Outer Product of two Vectors, e.g. a special case:



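•  Sum the Squared Elements of a Vector
  – Premultiply a column vector a by its transpose. If
$$a = \begin{bmatrix} 5 \\ 2 \\ 8 \end{bmatrix}$$
then premultiplication by a row vector $a^T$
$$a^T = \begin{bmatrix} 5 & 2 & 8 \end{bmatrix}$$
will yield the sum of the squared values of elements for a, i.e.
$$a^T a = \begin{bmatrix} 5 & 2 & 8 \end{bmatrix} \begin{bmatrix} 5 \\ 2 \\ 8 \end{bmatrix} = 5^2 + 2^2 + 8^2 = 93$$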



•  Matrix-Vector Products (I)

•  Matrix-Vector Products (II)

•  Matrix-Vector Products (III)

•  Matrix-Vector Products (IV)


MATRIX OPERATIONS



(4) Vector norms

A norm of a vector ||x|| is informally a measure of the “length” of the vector.

–  Common norms: L1, L2 (Euclidean)

–  Linfinity


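Vector Norm (L2, when p = 2)

The p-norm is $\|x\|_p = \left( \sum_i |x_i|^p \right)^{1/p}$; with p = 2 this is the Euclidean norm $\|x\|_2 = \sqrt{\sum_i x_i^2}$.

More General: Norm

•  A norm is any function g() that maps vectors to real numbers and satisfies the following conditions:
  – Non-negativity: g(x) ≥ 0 for all x
  – Definiteness: g(x) = 0 if and only if x = 0
  – Homogeneity: g(tx) = |t| g(x) for any scalar t
  – Triangle inequality: g(x + y) ≤ g(x) + g(y)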
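A small sketch of the common norms (L1, L2, Linfinity) on a list-based vector. The function names are illustrative, and the standard definitions are assumed because the slide formulas did not survive extraction.

```python
import math


def l1_norm(x):
    """Sum of absolute values."""
    return sum(abs(v) for v in x)


def l2_norm(x):
    """Euclidean length: square root of the sum of squares."""
    return math.sqrt(sum(v * v for v in x))


def linf_norm(x):
    """Largest absolute value."""
    return max(abs(v) for v in x)


a = [5, 2, 8]
print(l1_norm(a))     # 15
print(l2_norm(a))     # sqrt(93), about 9.64
print(linf_norm(a))   # 8
```

The squared L2 norm equals the inner product of the vector with itself, so `l2_norm(a) ** 2` is 93 up to floating-point rounding.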


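Orthogonal & Orthonormal

Inner product defined between column vectors x and y, as → $x \cdot y = x^T y = \sum_i x_i y_i$

If $u \cdot v = 0$, $\|u\|_2 \neq 0$, $\|v\|_2 \neq 0$ → u and v are orthogonal

If $u \cdot v = 0$, $\|u\|_2 = 1$, $\|v\|_2 = 1$ → u and v are orthonormal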


Orthogonal matrices

•  A is orthogonal if: $AA^T = A^TA = I$

•  Note that if A is orthogonal, it is easy to find its inverse: $A^{-1} = A^T$


Matrix Norm

•  Definition: Given a vector norm ||x||, the matrix norm defined by the vector norm is given by:
$$\|A\| = \max_{x \neq 0} \frac{\|Ax\|}{\|x\|}$$
•  What does a matrix norm represent?
•  It represents the maximum “stretching” that A does to a vector x -> (Ax).

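Matrix 1-Norm

Theorem A: The matrix norm corresponding to the 1-norm is the maximum absolute column sum:
$$\|A\|_1 = \max_j \sum_{i=1}^{n} |a_{ij}|$$

Proof: From the previous slide, we can have
$$\|A\|_1 = \max_{\|x\|_1 = 1} \|Ax\|_1$$
Also,
$$Ax = x_1 A_1 + x_2 A_2 + \dots + x_n A_n = \sum_{j=1}^{n} x_j A_j$$
where $A_j$ is the j-th column of A.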
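Theorem A translates directly into code. The sketch below assumes a list-of-rows matrix; the names `matrix_one_norm` and `matvec` are hypothetical helpers for this illustration.

```python
def matrix_one_norm(A):
    """Maximum absolute column sum of A."""
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))


def matvec(A, x):
    """Matrix-vector product Ax."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


A = [[1, -2],
     [-3, 4]]

print(matrix_one_norm(A))   # 6, from the second column: |-2| + |4|

# The unit vector that picks out the heaviest column attains the maximum stretch.
e2 = [0, 1]
Ae2 = matvec(A, e2)
print(sum(abs(v) for v in Ae2))   # 6
```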

MATRIX OPERATIONS



(5) Inverse of a Matrix

•  The inverse of a matrix A is commonly denoted by $A^{-1}$ or inv A.
•  The inverse of an n × n matrix A is the matrix $A^{-1}$ such that $AA^{-1} = I = A^{-1}A$
•  The matrix inverse is analogous to a scalar reciprocal
•  A matrix which has an inverse is called nonsingular


(5) Inverse of a Matrix

•  For some n × n matrix A, an inverse matrix $A^{-1}$ may not exist.
•  A matrix which does not have an inverse is singular.
•  An inverse of an n × n matrix A exists iff |A| ≠ 0


THE DETERMINANT OF A MATRIX

◆ The determinant of a matrix A is denoted by |A| (or det(A) or det A).

◆ Determinants exist only for square matrices.

◆ E.g. If $A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$, then $|A| = a_{11}a_{22} - a_{12}a_{21}$



2 x 2

3 x 3

n x n



diagonal matrix:


HOW TO FIND INVERSE MATRICES? An example,

◆ If $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ and |A| ≠ 0, then
$$A^{-1} = \frac{1}{\det(A)} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$
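The 2 × 2 formula can be sketched as follows. The example matrix and the function name `inverse_2x2` are assumptions made for illustration, and integer entries are assumed so that `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction


def inverse_2x2(A):
    """Inverse of [[a, b], [c, d]] via (1/det) * [[d, -b], [-c, a]]."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular: determinant is 0")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


A = [[4, 7],
     [2, 6]]            # det = 4*6 - 7*2 = 10

A_inv = inverse_2x2(A)
print(A_inv)            # entries 3/5, -7/10, -1/5, 2/5

# Check that A * A_inv is the identity.
product = [[sum(A[i][k] * A_inv[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]
print(product == [[1, 0], [0, 1]])   # True
```

A matrix such as `[[1, 2], [2, 4]]` has determinant 0, so the function raises `ValueError` for it, matching the rule that the inverse exists iff |A| ≠ 0.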


Matrix Inverse

•  The inverse $A^{-1}$ of a matrix A has the property: $AA^{-1} = A^{-1}A = I$

•  $A^{-1}$ exists only if det(A) ≠ 0

•  Terminology
  –  Singular matrix: $A^{-1}$ does not exist
  –  Ill-conditioned matrix: A is close to being singular


PROPERTIES OF INVERSE MATRICES

◆ $(AB)^{-1} = B^{-1}A^{-1}$

◆ $(A^T)^{-1} = (A^{-1})^T$

◆ $(A^{-1})^{-1} = A$


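Inverse of special matrix

•  For diagonal matrices: the inverse is the diagonal matrix of reciprocals, $D^{-1} = \mathrm{diag}(1/d_1, \dots, 1/d_n)$, provided every $d_i \neq 0$

•  For orthogonal matrices: $A^{-1} = A^T$
  –  an orthogonal matrix is a square matrix with real entries whose columns and rows are orthogonal unit vectors (i.e., orthonormal vectors)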


Pseudo-inverse

•  The pseudo-inverse A+ of a matrix A (could be non-square, e.g., m x n) is given by:

•  It can be shown that:


MATRIX OPERATIONS



(6) Rank: Linear independence

•  A set of vectors is linearly independent if none of them can be written as a linear combination of the others.

e.g. $x_3 = -2x_1 + x_2$ → NOT linearly independent


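(6) Rank: Linear independence

•  Alternative definition: Vectors $v_1, \dots, v_k$ are linearly independent if $c_1 v_1 + \dots + c_k v_k = 0$ implies $c_1 = \dots = c_k = 0$

$$\begin{pmatrix} | & | & | \\ v_1 & v_2 & v_3 \\ | & | & | \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$$

e.g.
$$\begin{pmatrix} 1 & 0 \\ 2 & 3 \\ 1 & 3 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$$
forces (u, v) = (0, 0), i.e. the columns are linearly independent.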


(6) Rank of a Matrix

•  rank(A) (the rank of an m-by-n matrix A) is
  = the maximal number of linearly independent columns
  = the maximal number of linearly independent rows

•  If A is n by m, then
  –  rank(A) <= min(m, n)
  –  If n = rank(A), then A has full row rank
  –  If m = rank(A), then A has full column rank
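One way to compute the rank is to count pivots during Gaussian elimination. The sketch below is a swapped-in illustration, not a method taken from the slides: it uses exact `Fraction` arithmetic, the function name `matrix_rank` is hypothetical, and the example matrices are chosen only for this demonstration.

```python
from fractions import Fraction


def matrix_rank(matrix):
    """Count the pivots found by Gaussian elimination (exact arithmetic)."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    pivots = 0
    for col in range(n_cols):
        # Find a row at or below the current pivot position with a nonzero entry.
        pivot_row = next((r for r in range(pivots, n_rows) if rows[r][col] != 0), None)
        if pivot_row is None:
            continue
        rows[pivots], rows[pivot_row] = rows[pivot_row], rows[pivots]
        # Eliminate the entries below the pivot.
        for r in range(pivots + 1, n_rows):
            factor = rows[r][col] / rows[pivots][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[pivots])]
        pivots += 1
        if pivots == n_rows:
            break
    return pivots


# Columns (1,2,1) and (0,3,3) are linearly independent: full column rank.
print(matrix_rank([[1, 0], [2, 3], [1, 3]]))                  # 2

# Third column equals -2 * (first column) + (second column): rank drops to 2.
print(matrix_rank([[1, 0, -2], [2, 1, -3], [3, 1, -5]]))      # 2
```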


(6) Rank of a Matrix

•  Equal to the dimension of the largest square sub-matrix of A that has a non-zero determinant.

Example: has rank 3


(6) Rank and singular matrices


MATRIX OPERATIONS



Review: Derivative of a Function

$$\lim_{h \to 0} \frac{f(a+h) - f(a)}{h}$$ is called the derivative of f at a.

We write: $$f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$$

“The derivative of f with respect to x is …”

There are many ways to write the derivative of y = f(x)

→ e.g. define the slope of the curve y = f(x) at the point x

Review: Derivative of a Quadratic Function

[Figure: plots of $y = x^2 - 3$ and of its derivative $y' = 2x$, for x from -3 to 3]

$$y = x^2 - 3$$

$$y' = \lim_{h \to 0} \frac{\left((x+h)^2 - 3\right) - \left(x^2 - 3\right)}{h}$$

$$y' = \lim_{h \to 0} \frac{x^2 + 2xh + h^2 - x^2}{h}$$

$$y' = \lim_{h \to 0} (2x + h)$$

$$y' = 2x$$
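The limit can be watched numerically: for y = x² − 3 the difference quotient equals 2x + h, so it approaches 2x as h shrinks. This is a sketch with illustrative names (`f`, `difference_quotient`) and an arbitrarily chosen evaluation point x = 2.

```python
def f(x):
    return x ** 2 - 3


def difference_quotient(func, x, h):
    """(f(x + h) - f(x)) / h, the quantity whose limit is the derivative."""
    return (func(x + h) - func(x)) / h


x = 2.0
for h in (1.0, 0.1, 0.001, 1e-6):
    # Each value is close to 2x + h = 4 + h, up to floating-point rounding.
    print(h, difference_quotient(f, x, h))
```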


Some important rules for taking derivatives


Review: Definitions of gradient (Matrix_calculus / Scalar-by-matrix)

In principle, gradients are a natural extension of partial derivatives to functions of multiple variables.

→ Denominator layout

Review: Definitions of gradient (Matrix_calculus / Scalar-by-vector)

•  Size of gradient is always the same as the size of x: if $x \in \mathbb{R}^n$, then $\nabla_x f(x) \in \mathbb{R}^n$

→ Denominator layout


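Exercise: a simple example

$$f(w) = w^T x = \begin{bmatrix} w_1 & w_2 & w_3 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = w_1 + 2w_2 + 3w_3$$

$$\frac{\partial f}{\partial w} = \frac{\partial\, w^T x}{\partial w} = x = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$$

→ Denominator layout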
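The result ∂f/∂w = x can be checked with central finite differences. This is a sketch under stated assumptions: the function names and the test point w are arbitrary choices for illustration, and the numerical gradient is only an approximation of the analytic one.

```python
def f(w, x):
    """f(w) = w^T x for list-based vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def numerical_gradient(func, w, h=1e-6):
    """Central-difference estimate of the partial derivative for each w_i."""
    grad = []
    for i in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[i] += h
        w_minus[i] -= h
        grad.append((func(w_plus) - func(w_minus)) / (2 * h))
    return grad


x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 2.0]     # arbitrary point; the gradient of a linear f is the same everywhere

grad = numerical_gradient(lambda v: f(v, x), w)
print(grad)              # close to [1.0, 2.0, 3.0], i.e. x
```

The gradient has one entry per component of w, so it has the same size as w, in line with the denominator layout.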

Even more general Matrix Calculus: Types of Matrix Derivatives

By Thomas Minka. Old and New Matrix Algebra Useful for Statistics


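Review: Hessian Matrix / n == 2 case

Single variate → multivariate

•  1st derivative to gradient,

•  2nd derivative to Hessian

For f(x, y):

$$g = \nabla f = \begin{pmatrix} \dfrac{\partial f}{\partial x} \\[2mm] \dfrac{\partial f}{\partial y} \end{pmatrix}$$

$$H = \begin{pmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x \partial y} \\[2mm] \dfrac{\partial^2 f}{\partial x \partial y} & \dfrac{\partial^2 f}{\partial y^2} \end{pmatrix}$$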
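To make the n == 2 case concrete, the sketch below estimates the Hessian with second-order finite differences. The test function f(x, y) = x²y + y³ is an assumption chosen for illustration (its exact Hessian is [[2y, 2x], [2x, 6y]]), and the name `hessian_2d` is hypothetical.

```python
def f(x, y):
    return x ** 2 * y + y ** 3


def hessian_2d(func, x, y, h=1e-4):
    """Finite-difference estimate of the 2 x 2 Hessian of func at (x, y)."""
    f_xx = (func(x + h, y) - 2 * func(x, y) + func(x - h, y)) / h ** 2
    f_yy = (func(x, y + h) - 2 * func(x, y) + func(x, y - h)) / h ** 2
    f_xy = (func(x + h, y + h) - func(x + h, y - h)
            - func(x - h, y + h) + func(x - h, y - h)) / (4 * h ** 2)
    # The mixed partial appears in both off-diagonal positions.
    return [[f_xx, f_xy],
            [f_xy, f_yy]]


H = hessian_2d(f, 1.0, 2.0)
print(H)   # approximately [[4, 2], [2, 12]]
```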


Today Recap

•  Data Representation
•  Linear Algebra and Matrix Calculus Review
  1)  Transposition
  2)  Addition and Subtraction
  3)  Multiplication
  4)  Norm (of vector)
  5)  Matrix Inversion
  6)  Matrix Rank
  7)  Matrix calculus


Extra:

•  HW1 is released today @ Collab
•  HW1 is due next Sat @ midnight
•  Handout for Lecture 2 has been posted @ http://www.cs.virginia.edu/yanjun/teach/2016f/schedule.html


Extra

•  The following topics are covered by handout, but not by this slide (will be covered …)
  – Trace()
  – Eigenvalue / Eigenvectors
  – Positive definite matrix, Gram matrix
  – Quadratic form
  – Projection (vector on a plane, or on a vector)


References

•  http://www.cs.cmu.edu/~zkolter/course/linalg/index.html
•  Prof. James J. Cochran’s tutorial slides “Matrix Algebra Primer II”
•  http://www.cs.cmu.edu/~aarti/Class/10701/recitation/LinearAlgebra_Matlab_Review.ppt
•  Prof. Alexander Gray’s slides
•  Prof. George Bebis’ slides
•  Prof. Hal Daumé III’s notes

