DS 4400 – Machine Learning and Data Mining I. Alina Oprea, Associate Professor, CCIS, Northeastern University. September 11, 2018.


Page 1:

DS 4400

Alina Oprea

Associate Professor, CCIS

Northeastern University

September 11, 2018

Machine Learning and Data Mining I

Page 2:

Class Outline

• Introduction – 1 week
  – Probability and linear algebra review

• Supervised learning – 5 weeks
  – Linear regression
  – Classification (logistic regression, LDA, kNN, decision trees, random forest, SVM, Naïve Bayes)
  – Model selection, regularization, cross validation

• Neural networks and deep learning – 1.5 weeks
  – Back-propagation, gradient descent
  – NN architectures

• Unsupervised learning – 2.5 weeks
  – Dimensionality reduction (PCA)
  – Clustering (k-means, hierarchical)

• Adversarial ML – 1 week
  – Security of ML at testing and training time

Page 3:

Grading

• Assignments – 20%
  – 4–5 assignments based on material studied in class, including programming exercises
  – Language: R or Python; Jupyter notebooks

• Final project – 25%
  – Select your own project based on a public dataset
  – Submit a short project proposal and milestone
  – Presentation at end of class (10 min) and report

• Exams – 50%
  – Midterm – 25%
  – Final exam – 25%

• Class participation – 5%
  – Participate in class discussion and on Piazza

Page 4:

Supervised Learning

[Figure: supervised learning workflow.
Training (typically on labeled data): Data → Pre-processing (e.g., normalization) → Feature extraction (e.g., selection) → Learning model (classification or regression).
Testing: New, unlabeled data → Learning model → Predictions. For classification the prediction is a label such as Healthy/Sick; for regression it is a number such as a price or risk score.]

Page 5:

[Figure: training searches a hypothesis space to produce a learned model f̂; the model f̂ is then applied to make predictions at testing.]

Page 6:

Review

• ML is a subfield of AI concerned with designing learning algorithms

• Learning tasks are supervised (e.g., classification and regression) or unsupervised (e.g., clustering)

– Supervised learning uses labeled training data

• Learning the “best” model is challenging

– Select hypothesis space and loss function

– Design an algorithm to minimize the loss function (error on the training data)

– Bias-Variance tradeoff

– Need to generalize on new, unseen test data

– Occam’s razor (prefer simplest model with good performance)

Page 7:

Outline

• Probability review

– Random variables

– Expectation, Variance, CDF, PDF

– Example distributions

– Independence and conditional independence

– Bayes’ Theorem

• Linear algebra review

– Matrix, vectors

– Inner products

– Norms

– Distance

Page 8:

Probability review

Page 9:

Discrete Random Variables

Page 10:

Visualizing A

Page 11:

Axioms of Probability

Page 12:

Interpreting the Axioms

Page 13:

Interpreting the Axioms

Page 14:

Interpreting the Axioms

Page 15:

The union bound

• For events A and B: P[A ∪ B] ≤ P[A] + P[B]

By inclusion-exclusion: P[A ∪ B] = P[A] + P[B] – P[A ∩ B]

If A ∩ B = ∅, then P[A ∪ B] = P[A] + P[B]

Example: A1 = { all x in {0,1}^n s.t. lsb2(x) = 11 }; A2 = { all x in {0,1}^n s.t. msb2(x) = 11 }

P[lsb2(x) = 11 or msb2(x) = 11] = P[A1 ∪ A2] ≤ ¼ + ¼ = ½
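A quick way to check this example is to enumerate all bit strings for a small n. The Python snippet below is an illustrative sketch (not part of the slides; the choice n = 8 is an assumption) confirming both the union bound and the exact inclusion-exclusion value.

```python
from itertools import product

n = 8  # assumed small bit-string length so exhaustive enumeration is feasible

strings = list(product([0, 1], repeat=n))
A1 = {s for s in strings if s[-2:] == (1, 1)}   # lsb2(x) = 11
A2 = {s for s in strings if s[:2] == (1, 1)}    # msb2(x) = 11

total = len(strings)
p_a1 = len(A1) / total                  # 1/4
p_a2 = len(A2) / total                  # 1/4
p_inter = len(A1 & A2) / total          # 1/16
p_union = len(A1 | A2) / total          # exact probability of the union

print(p_union)                          # 0.4375 = 1/4 + 1/4 - 1/16
print(p_a1 + p_a2 - p_inter)            # inclusion-exclusion gives the same value
print(p_union <= p_a1 + p_a2)           # the union bound holds: True
```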

[Figure: Venn diagram of events A and B inside the sample space U.]

Page 16:

Negation Theorem

Page 17:

Marginalization

Page 18:

Random Variables (Discrete)

Def: A random variable X is a function X: U ⟶ V.
Def: A discrete random variable takes a finite number of values: |V| is finite.

Example (Bernoulli random variable): X models a coin toss with output 1 (heads) or 0 (tails):
Pr[X = 1] = p, Pr[X = 0] = 1 – p

We write X ⟵ U to denote a uniform (discrete) random variable over U:

for all u ∈ U: Pr[X = u] = 1/|U|

Example: If p = 1/2, then X is a fair (uniform) coin toss.

Probability Mass Function (PMF): p(u) = Pr[X = u]

Page 19:

Example

1. X is the number of heads in a sequence of n coin tosses.

What is the probability P[X = k]?

P[X = k] = (n choose k) · p^k · (1 – p)^(n–k)   (Binomial random variable)

2. X is the sum of two fair dice.

What is the probability P[X = k] for k ∈ {2, …, 12}?

P[X = 2] = 1/36; P[X = 3] = 2/36; P[X = 4] = 3/36

For what k is P[X = k] highest?
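A short Python sketch (illustrative only; the values n = 10, p = 0.5 are assumptions) evaluates the binomial PMF and enumerates the dice-sum distribution, confirming that k = 7 is the most likely sum:

```python
from math import comb
from collections import Counter
from itertools import product

# Binomial PMF: P[X = k] = C(n, k) p^k (1 - p)^(n - k)
def binom_pmf(n, k, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(binom_pmf(10, 3, 0.5))   # P[3 heads in 10 fair tosses] ~= 0.117

# Sum of two fair dice: enumerate all 36 equally likely outcomes
counts = Counter(d1 + d2 for d1, d2 in product(range(1, 7), repeat=2))
pmf = {k: counts[k] / 36 for k in sorted(counts)}

print(pmf[2], pmf[3], pmf[4])  # 1/36, 2/36, 3/36 as on the slide
print(max(pmf, key=pmf.get))   # k = 7 is the most likely sum (6/36)
```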

Page 20:

Example discrete RVs

Page 21:

Multi-Value Random Variable

Page 22:

Multi-Value Random Variable

Page 23:

Continuous Random Variables

• X: U ⟶ V is a continuous RV if it takes an infinite (uncountable) number of values

• The cumulative distribution function (CDF) F: R ⟶ [0,1] for X is defined for every value x by:

F(x) = Pr(X ≤ x)

• The probability density function (PDF) f(x) for X is

f(x) = dF(x)/dx

[Figure: an example PDF and its associated CDF; the CDF is non-decreasing from 0 to 1.]
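To make the relation f(x) = dF(x)/dx concrete, here is a small Python check using an exponential distribution as an assumed example (the distribution and its rate are not from the slides):

```python
import numpy as np

# Assumed example: exponential distribution with rate lam.
lam = 2.0
F = lambda x: 1 - np.exp(-lam * x)       # CDF: F(x) = Pr(X <= x)
f = lambda x: lam * np.exp(-lam * x)     # PDF: f(x) = dF(x)/dx

x = np.linspace(0.01, 3, 50)
h = 1e-6
numeric_derivative = (F(x + h) - F(x - h)) / (2 * h)   # central difference

# The numerical derivative of the CDF matches the PDF closely.
print(np.max(np.abs(numeric_derivative - f(x))))       # prints a value close to 0
```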

Page 24:

Example continuous RV

Page 25:

Example CDFs and PDFs

Page 26:

Expectation and variance

Expectation for a discrete random variable X:

E[g(X)] = Σ_u g(u) · Pr[X = u]

Properties:
• E[a·g(X)] = a · E[g(X)]
• Linearity: E[f(X) + g(X)] = E[f(X)] + E[g(X)]

Variance:

Var(X) = E[(X – E[X])²] = E[X²] – (E[X])²
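A minimal Python sketch (the toy PMF is assumed, not from the slides) that computes expectation and variance directly from a PMF and checks the properties above:

```python
# Expectation and variance of a discrete RV from its PMF.
pmf = {1: 0.2, 2: 0.5, 3: 0.3}   # assumed toy distribution

def E(g, pmf):
    """E[g(X)] = sum over u of g(u) * Pr[X = u]."""
    return sum(g(u) * p for u, p in pmf.items())

mean = E(lambda u: u, pmf)
var = E(lambda u: (u - mean) ** 2, pmf)        # Var(X) = E[(X - E[X])^2]

# E[a g(X)] = a E[g(X)]  and  E[f(X) + g(X)] = E[f(X)] + E[g(X)]
print(E(lambda u: 5 * u, pmf), 5 * mean)
print(E(lambda u: u + u**2, pmf), E(lambda u: u, pmf) + E(lambda u: u**2, pmf))

# The two variance formulas agree.
print(var, E(lambda u: u**2, pmf) - mean**2)
```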

Page 27:

Continuous RV

Expectation for a continuous random variable X:

E[g(X)] = ∫ g(x) · f(x) dx

Variance is defined similarly (with the sum replaced by an integral).

Example: Let X be a uniform RV on [a, b]
• What are the CDF and PDF?
• Compute the expectation and variance of X
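For this example, the closed forms are E[X] = (a + b)/2 and Var(X) = (b – a)²/12. The sketch below (the values a = 2, b = 5 are assumed) checks them with a quick Monte Carlo estimate:

```python
import numpy as np

# For X ~ Uniform[a, b]:
#   PDF f(x) = 1/(b-a) on [a, b],  CDF F(x) = (x-a)/(b-a),
#   E[X] = (a+b)/2,  Var(X) = (b-a)^2 / 12.
a, b = 2.0, 5.0
rng = np.random.default_rng(0)
samples = rng.uniform(a, b, size=1_000_000)

print(samples.mean(), (a + b) / 2)          # ~3.5 vs 3.5
print(samples.var(), (b - a) ** 2 / 12)     # ~0.75 vs 0.75
```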

Page 28:

Conditional Probability

Def: Events A and B are independent if and only if Pr[ A ∩ B ] = Pr[A] ∙ Pr[B]

If 𝐴 and 𝐵 are independent

Pr[A | B] = Pr[A ∩ B] / Pr[B] = Pr[A] · Pr[B] / Pr[B] = Pr[A]
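A small simulation sketch (the two-dice setup is an assumed example) illustrating that conditioning on an independent event does not change the probability:

```python
import numpy as np

# A = "first die is even", B = "second die is 6".
# A and B are independent, so Pr[A | B] = Pr[A].
rng = np.random.default_rng(1)
n = 1_000_000
d1 = rng.integers(1, 7, size=n)
d2 = rng.integers(1, 7, size=n)

A = (d1 % 2 == 0)
B = (d2 == 6)

p_a = A.mean()
p_a_given_b = A[B].mean()          # estimate of Pr[A | B] = Pr[A ∩ B] / Pr[B]
print(p_a, p_a_given_b)            # both close to 0.5
```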

Page 29:

Inference from Conditional Probability

Page 30:

Inference from Conditional Probability

Page 31:

Inference from Conditional Probability

Page 32:

Bayes’ Rule

Page 33:

Linear algebra

Page 34:

Vectors and matrices

• A vector in R^n is an ordered set of n real numbers.

– e.g., v = (1, 6, 3, 4) is in R^4

– It can be written as a column vector [1; 6; 3; 4] or as a row vector (1 6 3 4)

• An m-by-n matrix is an object in R^{m×n} with m rows and n columns, each entry filled with a (typically) real number.
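Since the course uses R or Python, here is an illustrative NumPy sketch of these objects (the variable names and values are my own, not from the slides):

```python
import numpy as np

v = np.array([1, 6, 3, 4])            # a vector in R^4
col = v.reshape(-1, 1)                # the same data as a 4x1 column vector
row = v.reshape(1, -1)                # ... and as a 1x4 row vector
print(col.shape, row.shape)           # (4, 1) (1, 4)

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])       # a 2-by-3 matrix in R^{2x3}
print(A.shape)                        # (2, 3): m rows, n columns
print(A[0, 2])                        # the entry in row 1, column 3 (0-indexed in NumPy)
```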

Page 35:

Norms

Vector norms: A norm of a vector ||x|| is informally a measure of the "length" of the vector.

– Common norms: L1, L2 (Euclidean)

– L∞ (infinity norm)
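A short sketch of the three norms (the example vectors are assumed toy values); it also shows Euclidean distance as the L2 norm of a difference, matching the "Distance" item in the outline:

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])

l1 = np.linalg.norm(x, 1)        # L1 norm: sum of |x_i|              -> 8.0
l2 = np.linalg.norm(x, 2)        # L2 (Euclidean) norm: sqrt(sum x_i^2)
linf = np.linalg.norm(x, np.inf) # L-infinity norm: max |x_i|         -> 4.0
print(l1, l2, linf)

# Euclidean distance between two vectors is the L2 norm of their difference.
y = np.array([0.0, 0.0, 1.0])
print(np.linalg.norm(x - y))
```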

Page 36:

Vector products

• Vector dot (inner) product: u · v = Σ_i u_i v_i (a scalar)

• Vector outer product: u vᵀ (a matrix with entries u_i v_j)

We will use lower case letters for vectors. The elements are referred to by x_i.

If u · v = 0, ||u||₂ ≠ 0, ||v||₂ ≠ 0 → u and v are orthogonal

If u · v = 0, ||u||₂ = 1, ||v||₂ = 1 → u and v are orthonormal
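An illustrative NumPy sketch (toy vectors assumed) of inner and outer products and an orthogonality check:

```python
import numpy as np

u = np.array([1.0, 2.0, 0.0])
v = np.array([-2.0, 1.0, 3.0])

print(np.dot(u, v))        # inner product: sum_i u_i v_i -> 0.0 here
print(np.outer(u, v))      # outer product: 3x3 matrix with entries u_i * v_j

# u . v = 0 with nonzero norms -> u and v are orthogonal
print(np.dot(u, v) == 0 and np.linalg.norm(u) > 0 and np.linalg.norm(v) > 0)

# Normalizing both vectors turns an orthogonal pair into an orthonormal pair.
u_hat, v_hat = u / np.linalg.norm(u), v / np.linalg.norm(v)
print(np.dot(u_hat, v_hat), np.linalg.norm(u_hat), np.linalg.norm(v_hat))
```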

Page 37:

Matrix multiplication

• Matrix product: C = AB, where C_ij = Σ_k A_ik · B_kj

e.g., for 2×2 matrices

A = [a11 a12; a21 a22],  B = [b11 b12; b21 b22]

AB = [a11·b11 + a12·b21   a11·b12 + a12·b22;
      a21·b11 + a22·b21   a21·b12 + a22·b22]

We will use upper case letters for matrices. The elements are referred to by A_{i,j}.
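The same product can be checked numerically; the sketch below (toy numbers, my own, not from the slides) uses NumPy's @ operator:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[5.0, 6.0],
              [7.0, 8.0]])

C = A @ B                        # matrix product: C_ij = sum_k A_ik * B_kj
print(C)                         # [[19. 22.], [43. 50.]]

# Element-wise check of the formula for one entry:
print(A[0, 0] * B[0, 1] + A[0, 1] * B[1, 1])   # equals C[0, 1] = 22
```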

Page 38:

Properties

• Associativity

(AB)C = A(BC)

• Distributivity

A(B + C) = AB + AC

• Commutativity does not hold in general

AB ≠ BA (matrix multiplication is generally not commutative)
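A quick numerical illustration with random matrices (an illustrative sketch; the sizes and seed are assumptions): associativity and distributivity hold, while commutativity fails.

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C = (rng.standard_normal((3, 3)) for _ in range(3))

print(np.allclose((A @ B) @ C, A @ (B @ C)))   # True: (AB)C = A(BC)
print(np.allclose(A @ (B + C), A @ B + A @ C)) # True: A(B + C) = AB + AC
print(np.allclose(A @ B, B @ A))               # False for these matrices: AB != BA
```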

Page 39:

Special matrices

Diagonal:
[a 0 0
 0 b 0
 0 0 c]

Upper-triangular:
[a b c
 0 d e
 0 0 f]

Tri-diagonal:
[a b 0 0
 c d e 0
 0 f g h
 0 0 i j]

Lower-triangular:
[a 0 0
 b c 0
 d e f]

I (Identity matrix):
[1 0 0
 0 1 0
 0 0 1]
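These structures map directly to standard NumPy helpers; a small illustrative sketch (the example matrix M is assumed):

```python
import numpy as np

M = np.arange(1, 10).reshape(3, 3).astype(float)

D = np.diag([1.0, 2.0, 3.0])   # diagonal matrix built from a vector of entries
U = np.triu(M)                 # upper-triangular part of M (zeros below the diagonal)
L = np.tril(M)                 # lower-triangular part of M (zeros above the diagonal)
I = np.eye(3)                  # 3x3 identity matrix
print(D)
print(U)
print(L)

print(np.allclose(I @ M, M))   # multiplying by the identity leaves M unchanged
```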

Page 40:

Matrix transpose

Transpose: You can think of it as
– "flipping" the rows and columns, OR
– "reflecting" the vector/matrix across its main diagonal

e.g.,

(a b)ᵀ = [a; b]

[a b; c d]ᵀ = [a c; b d]

A is a symmetric matrix if A = Aᵀ
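A brief NumPy sketch (toy matrix assumed) showing the transpose and a symmetry check:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
print(A.T)                      # transpose flips rows and columns: [[1. 3.], [2. 4.]]

S = A + A.T                     # A + A^T is always symmetric
print(np.array_equal(S, S.T))   # True: S equals its own transpose
```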

Page 41:

References

Probability

• Review notes from Stanford's machine learning class

• Sam Roweis's probability review

Linear algebra

• Review notes from Stanford's machine learning class

• Sam Roweis's linear algebra review
