55
Applied Math for Machine Learning Prof. Kuan-Ting Lai 2021/3/11

Applied Math for Machine Learning - AIoT Lab

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Applied Math for Machine Learning - AIoT Lab

Applied Math for Machine Learning

Prof. Kuan-Ting Lai

2021/3/11

Page 2: Applied Math for Machine Learning - AIoT Lab

Applied Math for Machine Learning

β€’ Linear Algebra

β€’Probability

β€’Calculus

β€’Optimization

Page 3: Applied Math for Machine Learning - AIoT Lab

Linear Algebra

β€’ Scalarβˆ’ real numbers

β€’ Vector (1D)βˆ’ Has a magnitude & a direction

β€’ Matrix (2D)βˆ’ An array of numbers arranges in rows &

columns

β€’ Tensor (>=3D)βˆ’ Multi-dimensional arrays of numbers

Page 4: Applied Math for Machine Learning - AIoT Lab

Real-world examples of Data Tensors

β€’ Timeseries Data – 3D (samples, timesteps, features)

β€’ Images – 4D (samples, height, width, channels)

β€’ Video – 5D (samples, frames, height, width, channels)

4

Page 5: Applied Math for Machine Learning - AIoT Lab

Vector Dimension vs. Tensor Dimension

β€’ The number of data in a vector is also called β€œdimension”

β€’ In deep learning , the dimension of Tensor is also called β€œrank”

β€’ Matrix = 2d array = 2d tensor = rank 2 tensor

https://deeplizard.com/learn/video/AiyK0idr4uM

Page 6: Applied Math for Machine Learning - AIoT Lab

The Matrix

Page 7: Applied Math for Machine Learning - AIoT Lab

Matrix

β€’ Define a matrix with m rows and n columns:

Santanu Pattanayak, ”Pro Deep Learning with TensorFlow,” Apress, 2017

Page 8: Applied Math for Machine Learning - AIoT Lab

Matrix Operations

β€’ Addition and Subtraction

Page 9: Applied Math for Machine Learning - AIoT Lab

Matrix Multiplication

β€’ Two matrices A and B, where

β€’ The columns of A must be equal to the rows of B, i.e. n == p

β€’ A * B = C, where

β€’m

n

p

qq

m

Page 10: Applied Math for Machine Learning - AIoT Lab

Example of Matrix Multiplication (3-1)

https://www.mathsisfun.com/algebra/matrix-multiplying.html

Page 11: Applied Math for Machine Learning - AIoT Lab

Example of Matrix Multiplication (3-2)

https://www.mathsisfun.com/algebra/matrix-multiplying.html

Page 12: Applied Math for Machine Learning - AIoT Lab

Example of Matrix Multiplication (3-3)

https://www.mathsisfun.com/algebra/matrix-multiplying.html

Page 13: Applied Math for Machine Learning - AIoT Lab

Matrix Transpose

https://en.wikipedia.org/wiki/Transpose

Page 14: Applied Math for Machine Learning - AIoT Lab

Dot Product

β€’ Dot product of two vectors become a scalar

β€’ Inner product is a generalization of the dot product

β€’ Notation: 𝑣1 βˆ™ 𝑣2 or 𝑣1𝑇𝑣2

Page 15: Applied Math for Machine Learning - AIoT Lab

Dot Product in a Matrix

Page 16: Applied Math for Machine Learning - AIoT Lab

Outer Product

https://en.wikipedia.org/wiki/Outer_product

Page 17: Applied Math for Machine Learning - AIoT Lab

Linear Independence

β€’ A vector is linearly dependent on other vectors if it can be expressed as the linear combination of other vectors

β€’ A set of vectors 𝑣1, 𝑣2,β‹― , 𝑣𝑛 is linearly independent if π‘Ž1𝑣1 +π‘Ž2𝑣2 +β‹―+ π‘Žπ‘›π‘£π‘› = 0 implies all π‘Žπ‘– = 0, βˆ€π‘– ∈ {1,2,⋯𝑛}

Page 18: Applied Math for Machine Learning - AIoT Lab

Span the Vector Space

β€’ n linearly independent vectors can span n-dimensional space

Page 19: Applied Math for Machine Learning - AIoT Lab

Rank of a Matrix

β€’ Rank is:βˆ’ The number of linearly independent row or column vectors

βˆ’ The dimension of the vector space generated by its columns

β€’ Row rank = Column rank

β€’ Example:

https://en.wikipedia.org/wiki/Rank_(linear_algebra)

Row-echelon

form

Page 20: Applied Math for Machine Learning - AIoT Lab

Identity Matrix Iβ€’ Any vector or matrix multiplied by I remains unchanged

β€’ For a matrix 𝐴, 𝐴𝐼 = 𝐼𝐴 = 𝐴

Page 21: Applied Math for Machine Learning - AIoT Lab

Inverse of a Matrix

β€’ The product of a square matrix 𝐴 and its inverse matrix π΄βˆ’1

produces the identity matrix 𝐼

β€’ π΄π΄βˆ’1 = π΄βˆ’1𝐴 = 𝐼

β€’ Inverse matrix is square, but not all square matrices has inverses

Page 22: Applied Math for Machine Learning - AIoT Lab

Pseudo Inverse

β€’ Non-square matrix and have left-inverse or right-inverse matrix

β€’ Example:

βˆ’ Create a square matrix 𝐴𝑇𝐴

βˆ’ Multiplied both sides by inverse matrix (𝐴𝑇𝐴)βˆ’1

βˆ’ (𝐴𝑇𝐴)βˆ’1𝐴𝑇 is the pseudo inverse function

𝐴π‘₯ = 𝑏, 𝐴 ∈ β„π‘šΓ—π‘›, 𝑏 ∈ ℝ𝑛

𝐴𝑇𝐴π‘₯ = 𝐴𝑇𝑏

π‘₯ = (𝐴𝑇𝐴)βˆ’1𝐴𝑇𝑏

Page 23: Applied Math for Machine Learning - AIoT Lab

Norm

β€’ Norm is a measure of a vector’s magnitude

β€’ 𝑙2 norm

β€’ 𝑙1 norm

β€’ 𝑙𝑝 norm

β€’ π‘™βˆž norm

Page 24: Applied Math for Machine Learning - AIoT Lab

Eigen Vectors

β€’ Eigenvector is a non-zero vector that changed by only a scalar factor Ξ»when linear transform 𝐴 is applied to:

β€’ π‘₯ are Eigenvectors and πœ† are Eigenvalues

β€’ One of the most important concepts in machine learning, ex:βˆ’ Principle Component Analysis (PCA)

βˆ’ Eigenvector centrality

βˆ’ PageRank

βˆ’ …

𝐴π‘₯ = πœ†π‘₯, 𝐴 ∈ ℝ𝑛×𝑛, π‘₯ ∈ ℝ𝑛

Page 25: Applied Math for Machine Learning - AIoT Lab

Example: Shear Mappingβ€’ Horizontal axis is the

Eigenvector

Page 26: Applied Math for Machine Learning - AIoT Lab

Principle Component Analysis (PCA)

β€’ Eigenvector of Covariance Matrix

https://en.wikipedia.org/wiki/Principal_component_analysis

Page 27: Applied Math for Machine Learning - AIoT Lab

NumPy for Linear Algebra

β€’ NumPy is the fundamental package for scientific computing with Python. It contains among other things:βˆ’a powerful N-dimensional array objectβˆ’sophisticated (broadcasting) functionsβˆ’tools for integrating C/C++ and Fortran codeβˆ’useful linear algebra, Fourier transform, and random

number capabilities

Page 28: Applied Math for Machine Learning - AIoT Lab

Create Tensors

Scalars (0D tensors) Vectors (1D tensors) Matrices (2D tensors)

Page 29: Applied Math for Machine Learning - AIoT Lab

Create 3D Tensor

Page 30: Applied Math for Machine Learning - AIoT Lab

Attributes of a Numpy Tensor

β€’ Number of axes (dimensions, rank)βˆ’ x.ndim

β€’ Shapeβˆ’ This is a tuple of integers showing how many data the tensor has along each axis

β€’ Data typeβˆ’ uint8, float32 or float64

Page 31: Applied Math for Machine Learning - AIoT Lab

Numpy Multiplication

Page 32: Applied Math for Machine Learning - AIoT Lab

Unfolding the Manifold

β€’ Tensor operations are complex geometric transformation in high-dimensional spaceβˆ’ Dimension reduction

Page 33: Applied Math for Machine Learning - AIoT Lab

Basics of Probability

Page 34: Applied Math for Machine Learning - AIoT Lab

Three Axioms of Probability

β€’ Given an Event 𝐸 in a sample space 𝑆, S = 𝑖=1ڂ𝑁 𝐸𝑖

β€’ First axiomβˆ’ 𝑃 𝐸 ∈ ℝ, 0 ≀ 𝑃(𝐸) ≀ 1

β€’ Second axiomβˆ’ 𝑃 𝑆 = 1

β€’ Third axiomβˆ’ Additivity, any countable sequence of mutually exclusive events πΈπ‘–βˆ’ 𝑃 𝑖=1Ϊ‚

𝑛 𝐸𝑖 = 𝑃 𝐸1 + 𝑃 𝐸2 +β‹―+ 𝑃 𝐸𝑛 = σ𝑖=1𝑛 𝑃 𝐸𝑖

Page 35: Applied Math for Machine Learning - AIoT Lab

Union, Intersection, and Conditional Probability β€’ 𝑃 𝐴 βˆͺ 𝐡 = 𝑃 𝐴 + 𝑃 𝐡 βˆ’ 𝑃 𝐴 ∩ 𝐡

β€’ 𝑃 𝐴 ∩ 𝐡 is simplified as 𝑃 𝐴𝐡

β€’ Conditional Probability 𝑃 𝐴|𝐡 , the probability of event A given B has occurred

βˆ’ 𝑃 𝐴|𝐡 = 𝑃𝐴𝐡

𝐡

βˆ’ 𝑃 𝐴𝐡 = 𝑃 𝐴|𝐡 𝑃 𝐡 = 𝑃 𝐡|𝐴 𝑃(𝐴)

Page 36: Applied Math for Machine Learning - AIoT Lab

Chain Rule of Probability

β€’ The joint probability can be expressed as chain rule

Page 37: Applied Math for Machine Learning - AIoT Lab

Mutually Exclusive

β€’ 𝑃 𝐴𝐡 = 0

β€’ 𝑃 𝐴 βˆͺ 𝐡 = 𝑃 𝐴 + 𝑃 𝐡

Page 38: Applied Math for Machine Learning - AIoT Lab

Independence of Events

β€’ Two events A and B are said to be independent if the probability of their intersection is equal to the product of their individual probabilitiesβˆ’ 𝑃 𝐴𝐡 = 𝑃 𝐴 𝑃 𝐡

βˆ’ 𝑃 𝐴|𝐡 = 𝑃 𝐴

Page 39: Applied Math for Machine Learning - AIoT Lab

Bayes Rule

𝑃 𝐴|𝐡 =𝑃 𝐡|𝐴 𝑃(𝐴)

𝑃(𝐡)

Proof:

Remember 𝑃 𝐴|𝐡 = 𝑃𝐴𝐡

𝐡

So 𝑃 𝐴𝐡 = 𝑃 𝐴|𝐡 𝑃 𝐡 = 𝑃 𝐡|𝐴 𝑃(𝐴)Then Bayes 𝑃 𝐴|𝐡 = 𝑃 𝐡|𝐴 𝑃(𝐴)/𝑃 𝐡

Page 40: Applied Math for Machine Learning - AIoT Lab

NaΓ―ve Bayes Classifier

Page 41: Applied Math for Machine Learning - AIoT Lab

NaΓ―ve = Assume All Features Independent

Page 42: Applied Math for Machine Learning - AIoT Lab

Normal (Gaussian) Distributionβ€’ One of the most important distributions

β€’ Central limit theoremβˆ’ Averages of samples of observations of random variables independently drawn from

independent distributions converge to the normal distribution

Page 43: Applied Math for Machine Learning - AIoT Lab
Page 44: Applied Math for Machine Learning - AIoT Lab

Differentiation

OR

Page 45: Applied Math for Machine Learning - AIoT Lab

Derivatives of Basic Function 𝑑𝑦

𝑑π‘₯

Page 46: Applied Math for Machine Learning - AIoT Lab

Gradient of a Functionβ€’ Gradient is a multi-variable generalization of the derivative

β€’ Apply partial derivatives

β€’ Example

Page 47: Applied Math for Machine Learning - AIoT Lab

Chain Rule

47

Page 48: Applied Math for Machine Learning - AIoT Lab

Maxima and Minima for Univariate Function

β€’ If 𝑑𝑓(π‘₯)

𝑑π‘₯= 0, it’s a minima or a maxima point, then we study the

second derivative:

βˆ’ If 𝑑2𝑓(π‘₯)

𝑑π‘₯2< 0 => Maxima

βˆ’ If 𝑑2𝑓(π‘₯)

𝑑π‘₯2> 0 => Minima

βˆ’ If 𝑑2𝑓(π‘₯)

𝑑π‘₯2= 0 => Point of reflection

Minima

Page 49: Applied Math for Machine Learning - AIoT Lab

Gradient Descent

Page 50: Applied Math for Machine Learning - AIoT Lab

Gradient Descent along a 2D Surface

Page 51: Applied Math for Machine Learning - AIoT Lab

Avoid Local Minimum using Momentum

Page 52: Applied Math for Machine Learning - AIoT Lab

Optimization

https://en.wikipedia.org/wiki/Optimization_problem

Page 53: Applied Math for Machine Learning - AIoT Lab

Principle Component Analysis (PCA)

β€’ Assumptionsβˆ’ Linearity

βˆ’ Mean and Variance are sufficient statistics

βˆ’ The principal components are orthogonal

Page 54: Applied Math for Machine Learning - AIoT Lab

Principle Component Analysis (PCA)

max. cov 𝐘, 𝐘

𝑠. 𝑏. 𝑑 𝐖T𝐖 = 𝐈

Page 55: Applied Math for Machine Learning - AIoT Lab

References

β€’ Francois Chollet, β€œDeep Learning with Python,” Chapter 2 β€œMathematical Building Blocks of Neural Networks”

β€’ Santanu Pattanayak, ”Pro Deep Learning with TensorFlow,” Apress, 2017

β€’ Machine Learning Cheat Sheet

β€’ https://machinelearningmastery.com/difference-between-a-batch-and-an-epoch/

β€’ https://www.quora.com/What-is-the-difference-between-L1-and-L2-regularization-How-does-it-solve-the-problem-of-overfitting-Which-regularizer-to-use-and-when

β€’ Wikipedia