Machine/Deep Learning with Theano Softmax classification : Multinomial classification Application & Tips : Learning rate, data preprocessing, overfitting Deep Neural Nets for Everyone

Multinomial classification and application of ML


Page 1: Multinomial classification and application of ML

Machine/Deep Learning with Theano

Softmax classification: Multinomial classification
Application & Tips: Learning rate, data preprocessing, overfitting

Deep Neural Nets for Everyone

Page 2: Multinomial classification and application of ML

Multinomial Classification

Softmax classification

Page 3: Multinomial classification and application of ML

Logistic Regression

$$H_L(X) = WX$$

$$H_L(X) = Z$$

$$g(Z) = \frac{1}{1 + e^{-Z}}$$

$$H_R(X) = g(H_L(X))$$

Pipeline: $X \rightarrow W \rightarrow Z \rightarrow \hat{Y}$

$\hat{Y}$: Prediction (0 ~ 1), $Y$: Real Value (0 or 1)
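The hypothesis above can be sketched in a few lines of NumPy. This is only an illustration: the weights and input are hypothetical values chosen so the arithmetic is easy to follow, and the function names `sigmoid` and `predict` are not from the slides.

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(Z) = 1 / (1 + e^(-Z))."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(w, x):
    """H_R(X) = g(H_L(X)) with H_L(X) = W X."""
    z = np.dot(w, x)   # linear score Z
    return sigmoid(z)  # squashed into the open interval (0, 1)

# Hypothetical weights and input, for illustration only.
w = np.array([0.5, -0.25])
x = np.array([2.0, 4.0])
y_hat = predict(w, x)  # Z = 1.0 - 1.0 = 0.0, so the prediction is 0.5
```

A prediction above 0.5 would be read as class 1 and below 0.5 as class 0.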

Page 4: Multinomial classification and application of ML

Binomial Classification

Is the figure on the left a circle?

yes/no

Page 5: Multinomial classification and application of ML

Binomial Classification

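[Figure: shapes plotted over features $x_1$ and $x_2$ and separated into two classes, circle and polygon. The axis annotations are "tendency of lines" and a second "tendency of ..." label whose first word is cut off in the source. Pipeline: $X \rightarrow W \rightarrow Z \rightarrow Y$]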

Page 6: Multinomial classification and application of ML

Multinomial Classification

[Figure: three candidate classes A, B, C]

Which of A, B, or C is the figure on the left?

Page 7: Multinomial classification and application of ML

Multinomial Classification

[Figure: classes A, B, and C plotted over features $x_1$ and $x_2$]

Page 8: Multinomial classification and application of ML

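Multinomial Classification

[Figure: classes A, B, and C over features $x_1$ and $x_2$. A binary classifier $X \rightarrow W \rightarrow Z \rightarrow Y$ answers "A?"]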

Page 9: Multinomial classification and application of ML

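Multinomial Classification

[Figure: classes A, B, and C over features $x_1$ and $x_2$. A binary classifier $X \rightarrow W \rightarrow Z \rightarrow Y$ answers "B?"]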

Page 10: Multinomial classification and application of ML

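Multinomial Classification

[Figure: classes A, B, and C over features $x_1$ and $x_2$. A binary classifier $X \rightarrow W \rightarrow Z \rightarrow Y$ answers "C?"]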

Page 11: Multinomial classification and application of ML

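Multinomial Classification

[Figure: classes A, B, and C over features $x_1$ and $x_2$. Three binary classifiers, each of the form $X \rightarrow W \rightarrow Z \rightarrow Y$, answer "A?", "B?", and "C?" respectively]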

Page 12: Multinomial classification and application of ML

Multinomial Classification

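$X \rightarrow W \rightarrow Z$

$$\begin{bmatrix} w_1 & w_2 & w_3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} w_1 x_1 + w_2 x_2 + w_3 x_3 \end{bmatrix}$$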

Page 13: Multinomial classification and application of ML

Multinomial Classification

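One classifier ($X \rightarrow W \rightarrow Z$) per class:

$$\begin{bmatrix} w_{A1} & w_{A2} & w_{A3} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} w_{A1} x_1 + w_{A2} x_2 + w_{A3} x_3 \end{bmatrix}$$

$$\begin{bmatrix} w_{B1} & w_{B2} & w_{B3} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} w_{B1} x_1 + w_{B2} x_2 + w_{B3} x_3 \end{bmatrix}$$

$$\begin{bmatrix} w_{C1} & w_{C2} & w_{C3} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} w_{C1} x_1 + w_{C2} x_2 + w_{C3} x_3 \end{bmatrix}$$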

Page 14: Multinomial classification and application of ML

Multinomial Classification

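The three classifiers stacked into a single $X \rightarrow W \rightarrow Z$ with one weight matrix:

$$\begin{bmatrix} w_{A1} & w_{A2} & w_{A3} \\ w_{B1} & w_{B2} & w_{B3} \\ w_{C1} & w_{C2} & w_{C3} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} w_{A1} x_1 + w_{A2} x_2 + w_{A3} x_3 \\ w_{B1} x_1 + w_{B2} x_2 + w_{B3} x_3 \\ w_{C1} x_1 + w_{C2} x_2 + w_{C3} x_3 \end{bmatrix} = \begin{bmatrix} H_A(X) \\ H_B(X) \\ H_C(X) \end{bmatrix}$$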
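As a rough sketch of the stacked form, a single matrix-vector product yields all three class scores at once. The weight and feature values below are hypothetical; only the layout (one row of W per class, one column per feature) follows the slide.

```python
import numpy as np

# Hypothetical weights: one row per class (A, B, C), one column per feature.
W = np.array([[ 1.0, 0.0, 2.0],
              [ 0.0, 1.0, 0.0],
              [-1.0, 0.5, 0.0]])
x = np.array([1.0, 2.0, 3.0])

# One product gives [H_A(X), H_B(X), H_C(X)].
scores = W @ x

# Same result computed as three separate binary classifiers.
per_class = np.array([W[k] @ x for k in range(3)])
```

Both `scores` and `per_class` hold the same three numbers, which is why the three classifiers can be trained as one matrix.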

Page 15: Multinomial classification and application of ML

Multinomial Classification

Example scores for classes A, B, C (how similar is the input to each class?):

[H_A(X), H_B(X), H_C(X)] = [ 1505−0.1]

Page 16: Multinomial classification and application of ML

Multinomial Classification : Softmax Function

Score → Probability

$H_A(X) = Z_A \rightarrow \hat{Y}_A$

$H_B(X) = Z_B \rightarrow \hat{Y}_B$

$H_C(X) = Z_C \rightarrow \hat{Y}_C$
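The score-to-probability step can be sketched as follows, assuming the standard softmax definition, in which each output is $e^{Z_k}$ divided by the sum of $e^{Z_j}$ over all classes. The score values are hypothetical, and subtracting the maximum is a common numerical-stability trick added here, not something shown on the slide.

```python
import numpy as np

def softmax(z):
    """Turn a vector of scores into probabilities that sum to 1."""
    z = np.asarray(z, dtype=float)
    shifted = z - np.max(z)  # stability shift; does not change the result
    e = np.exp(shifted)
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # hypothetical Z_A, Z_B, Z_C
probs = softmax(scores)             # every entry in (0, 1), total 1
```

The largest score always receives the largest probability, so the ranking of the classes is preserved.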

Page 17: Multinomial classification and application of ML

Multinomial Classification

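$X \rightarrow W_A \rightarrow Z_A$

$X \rightarrow W_B \rightarrow Z_B$

$X \rightarrow W_C \rightarrow Z_C$

Scores for A, B, C → softmax → one-hot encoding (find maximum):

$\hat{Y}_A = 0.8 \rightarrow 1.0$

$\hat{Y}_B = 0.15 \rightarrow 0.0$

$\hat{Y}_C = 0.05 \rightarrow 0.0$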
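A minimal sketch of the "find maximum" step, using the probabilities shown on the slide. The helper name `one_hot_argmax` is made up for this example.

```python
import numpy as np

def one_hot_argmax(probs):
    """Set the largest entry to 1.0 and every other entry to 0.0."""
    probs = np.asarray(probs, dtype=float)
    encoded = np.zeros_like(probs)
    encoded[np.argmax(probs)] = 1.0
    return encoded

probs = np.array([0.8, 0.15, 0.05])  # softmax outputs from the slide
label = one_hot_argmax(probs)        # class A wins
```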

Page 18: Multinomial classification and application of ML

Cost Function

Cross Entropy Function

Page 19: Multinomial classification and application of ML

Entropy Function

(Information) Entropy

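$$H(p) = -\sum_x p(x) \log p(x)$$

• A measure of the uncertainty contained in the probability distribution p

• The larger the value, the more chaotic the distribution: no consistent direction or regularity

• The amount of information (in bits) needed to represent p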
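To make the bullets concrete, here is a small sketch that evaluates the entropy formula with a base-2 logarithm so the result is in bits. Treating $0 \log 0$ as 0 is the usual convention and is assumed here; the example distributions are illustrative.

```python
import numpy as np

def entropy_bits(p):
    """H(p) = -sum p(x) log2 p(x), with 0 * log 0 treated as 0."""
    p = np.asarray(p, dtype=float)
    nonzero = p[p > 0]
    return float(-np.sum(nonzero * np.log2(nonzero)))

fair_coin = entropy_bits([0.5, 0.5])  # maximally uncertain two-way choice
certain = entropy_bits([1.0, 0.0])    # no uncertainty at all
```

The fair coin needs one bit per outcome, while the certain outcome needs none.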

Page 20: Multinomial classification and application of ML

Cross Entropy Function

Cross Entropy

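$$H(p, q) = -\sum_x p(x) \log q(x)$$

• A way to compute the amount of information between two probability distributions p and q

• The amount of information (in bits) needed to convert information from p to q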
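A short sketch of the cross-entropy formula, again in bits. The distributions p and q are hypothetical; the point it illustrates is that $H(p, q)$ equals the entropy of p when q matches p and grows as q drifts away from p.

```python
import numpy as np

def cross_entropy_bits(p, q):
    """H(p, q) = -sum p(x) log2 q(x), skipping terms where p(x) = 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(-np.sum(p[mask] * np.log2(q[mask])))

p = [0.5, 0.5]
q = [0.9, 0.1]                   # a poor model of p
h_pp = cross_entropy_bits(p, p)  # equals the entropy of p
h_pq = cross_entropy_bits(p, q)  # larger, because q mismatches p
```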

Page 21: Multinomial classification and application of ML

Cross Entropy Cost Function

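$X \rightarrow W_A \rightarrow Z_A \rightarrow \hat{Y}_A$

$X \rightarrow W_B \rightarrow Z_B \rightarrow \hat{Y}_B$

$X \rightarrow W_C \rightarrow Z_C \rightarrow \hat{Y}_C$

$\hat{Y}$: Prediction (0 ~ 1), $Y$: Real Value (0 or 1)

$$D(\hat{Y}_i, Y_i) = -\sum_i Y_i \log \hat{Y}_i$$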

Page 22: Multinomial classification and application of ML

Cross Entropy Cost Function

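$$\begin{bmatrix} \hat{Y}_A \\ \hat{Y}_B \\ \hat{Y}_C \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} Y_A \\ Y_B \\ Y_C \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$$

$$D(\hat{Y}_i, Y_i) = -\sum_i Y_i \log \hat{Y}_i$$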

Page 23: Multinomial classification and application of ML

Cross Entropy Cost Function

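$$D(\hat{Y}_i, Y_i) = -\sum_i Y_i \log \hat{Y}_i$$

$$\begin{bmatrix} \hat{Y}_A \\ \hat{Y}_B \\ \hat{Y}_C \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} Y_A \\ Y_B \\ Y_C \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$$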
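The two cases above (vectors that agree, and vectors that disagree) can be checked numerically. This is a sketch under one assumption: the first vector is taken to be the prediction and the second the real value. Predictions are clipped to a small `eps` so that $\log 0$ yields a large finite cost instead of infinity; that clipping is a practical convenience, not part of the slide.

```python
import numpy as np

def cross_entropy_cost(y_hat, y, eps=1e-12):
    """D(Y_hat, Y) = -sum Y_i log(Y_hat_i), with predictions clipped to eps."""
    y_hat = np.clip(np.asarray(y_hat, dtype=float), eps, 1.0)
    y = np.asarray(y, dtype=float)
    return float(-np.sum(y * np.log(y_hat)))

# Prediction and real value agree: the cost is zero.
cost_match = cross_entropy_cost([1, 0, 0], [1, 0, 0])

# Prediction puts all its mass on the wrong class: the cost is very large
# (it would be infinite without the clipping).
cost_miss = cross_entropy_cost([1, 0, 0], [0, 1, 0])
```

This is the behaviour wanted from a cost function: no penalty for a correct, confident prediction and a very heavy penalty for a confident mistake.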

Page 24: Multinomial classification and application of ML

Logistic Cost VS Cross Entropy

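In binomial classification, the real data and $H(x)$ can each take only two forms:

$$\begin{bmatrix} 0 \\ 1 \end{bmatrix}, \quad \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$

These vectors can be written as follows, with $H(x), y \in \{0, 1\}$:

$$\begin{bmatrix} H(x) \\ 1 - H(x) \end{bmatrix}, \quad \begin{bmatrix} y \\ 1 - y \end{bmatrix}$$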

Page 25: Multinomial classification and application of ML

Logistic Cost VS Cross Entropy

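Substituting into the cross-entropy cost function:

$$H(H(x), y) = -\begin{bmatrix} y \\ 1 - y \end{bmatrix} \cdot \log \begin{bmatrix} H(x) \\ 1 - H(x) \end{bmatrix} = -y \log H(x) - (1 - y) \log(1 - H(x))$$

The right-hand side is exactly the logistic regression cost.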
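The equivalence between the two-class cross entropy and the logistic cost can also be verified numerically. The values of h and y below are arbitrary test inputs, and both function names are illustrative.

```python
import numpy as np

def logistic_cost(h, y):
    """Logistic regression cost: -y log h - (1 - y) log(1 - h)."""
    return -(y * np.log(h) + (1 - y) * np.log(1 - h))

def cross_entropy_two_class(h, y):
    """Cross entropy between [y, 1 - y] and [h, 1 - h]."""
    target = np.array([y, 1 - y], dtype=float)
    pred = np.array([h, 1 - h], dtype=float)
    return -np.dot(target, np.log(pred))

# Compare both formulas for each label with an arbitrary prediction h = 0.8.
diff_pos = abs(logistic_cost(0.8, 1) - cross_entropy_two_class(0.8, 1))
diff_neg = abs(logistic_cost(0.8, 0) - cross_entropy_two_class(0.8, 0))
```

Both differences are zero up to floating-point error, for either label.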

Page 26: Multinomial classification and application of ML

Cross Entropy Cost Function

$$L = \frac{1}{N} \sum_n D_n(\hat{Y}, Y) = -\frac{1}{N} \sum_n \left( \sum_i Y_i \log \hat{Y}_i \right)$$

The costs of the N training examples, summed and divided by N.
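A sketch of the batch cost: one row per training example, with the per-example cross entropy averaged over the N rows. The predictions and labels are made-up values, and the clipping to `eps` is an added safeguard against $\log 0$.

```python
import numpy as np

def mean_cross_entropy(Y_hat, Y, eps=1e-12):
    """L = (1/N) * sum over examples of -sum_i Y_i log(Y_hat_i)."""
    Y_hat = np.clip(np.asarray(Y_hat, dtype=float), eps, 1.0)
    Y = np.asarray(Y, dtype=float)
    per_example = -np.sum(Y * np.log(Y_hat), axis=1)  # D_n for each row
    return float(per_example.mean())

# Hypothetical softmax outputs and one-hot labels for N = 2 examples.
Y_hat = np.array([[0.8, 0.15, 0.05],
                  [0.1, 0.7, 0.2]])
Y = np.array([[1, 0, 0],
              [0, 1, 0]])
loss = mean_cross_entropy(Y_hat, Y)  # (-log 0.8 - log 0.7) / 2
```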

Page 27: Multinomial classification and application of ML

Application & Tips

Learning Rate

Data Preprocessing

Overshooting

Page 28: Multinomial classification and application of ML

Gradient Descent Function

$$W = W - \alpha \frac{\partial}{\partial W} \mathrm{Cost}(W)$$

$\alpha$: Learning Rate
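The update rule can be illustrated on a toy problem. The quadratic cost $(W - 3)^2$ below is a stand-in chosen because its gradient is easy to write by hand; it is not the cost from the slides.

```python
def cost(w):
    """Toy cost with its minimum at w = 3."""
    return (w - 3.0) ** 2

def grad(w):
    """Derivative of the toy cost with respect to w."""
    return 2.0 * (w - 3.0)

def gradient_descent(w0, alpha, steps):
    """Repeat W = W - alpha * dCost/dW for a fixed number of steps."""
    w = w0
    for _ in range(steps):
        w = w - alpha * grad(w)
    return w

w_final = gradient_descent(w0=0.0, alpha=0.1, steps=100)  # approaches 3
```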

Page 29: Multinomial classification and application of ML

Learning rate : Overshooting

[Figure: cost $L(W)$ plotted against $W$]

Page 30: Multinomial classification and application of ML

Learning rate : Too small

[Figure: cost $L(W)$ plotted against $W$]
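The two failure modes, overshooting and a learning rate that is too small, can be reproduced with the same kind of toy quadratic cost $(W - 3)^2$. The three learning rates are arbitrary values picked to show each regime.

```python
def distance_after(alpha, steps=20, w0=0.0):
    """Run gradient descent on (w - 3)^2 and return |w - 3| at the end."""
    w = w0
    for _ in range(steps):
        w = w - alpha * 2.0 * (w - 3.0)
    return abs(w - 3.0)

too_large = distance_after(alpha=1.1)    # each step overshoots further
too_small = distance_after(alpha=0.001)  # barely moves in 20 steps
reasonable = distance_after(alpha=0.3)   # converges quickly
```

With the large rate, the distance to the minimum grows at every step; with the tiny rate, it shrinks by only a few percent over all 20 steps.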

Page 31: Multinomial classification and application of ML

Data Preprocessing

[Figure: cost $L(W)$ over the weights $w_1$ and $w_2$]

Page 32: Multinomial classification and application of ML

Data Preprocessing

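[Figure: cost contours over the weights $w_1$ and $w_2$]

$$W = W - \alpha \frac{\partial}{\partial W} \mathrm{Cost}(W)$$

When a change affects each weight value differently, it becomes hard to find an appropriate learning rate.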

Page 33: Multinomial classification and application of ML

Data Preprocessing : Standardization

$$w_i' = \frac{w_i - \mu_i}{\sigma_i}$$

$\mu_i$: the mean

$\sigma_i$: the standard deviation
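A minimal sketch of standardization, assuming the formula is applied column by column to the input data, which is the usual meaning of standardization as a preprocessing step. The data values are hypothetical, and the sketch assumes no column is constant (a zero standard deviation would cause a division by zero).

```python
import numpy as np

def standardize(X):
    """Subtract each column's mean and divide by its standard deviation."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)    # per-column mean
    sigma = X.std(axis=0)  # per-column standard deviation
    return (X - mu) / sigma

# Two features on very different scales.
X = np.array([[1.0, 1000.0],
              [2.0, 3000.0],
              [3.0, 2000.0]])
X_std = standardize(X)  # each column now has mean 0 and std 1
```

After this step both features span a comparable range, so a single learning rate suits every weight better.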

Page 34: Multinomial classification and application of ML

Overfitting

• The model becomes excessively optimized for the training data.

• It does not work well on real data.

Page 35: Multinomial classification and application of ML

Overfitting

[Figure: two panels, each plotted over features $x_1$ and $x_2$]

Page 36: Multinomial classification and application of ML

Overfitting

Solution:

• Train with a large amount of training data.

• Reduce the number of features.

• Regularization

Page 37: Multinomial classification and application of ML

Overfitting : Regularization

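$$L = \frac{1}{N} \sum_n D_n(\hat{Y}, Y) + \lambda \sum W^2$$

• Keep the weights from taking overly large values. => This adjusts the cost function so that it does not bend too sharply.

$\lambda$: Regularization Strength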
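The regularized cost can be sketched as the mean cross entropy plus $\lambda$ times the sum of squared weights. All numbers below are hypothetical, the argument name `lam` stands for $\lambda$, and the clipping to `eps` is an added safeguard against $\log 0$.

```python
import numpy as np

def regularized_loss(Y_hat, Y, W, lam, eps=1e-12):
    """Mean cross entropy plus lam * sum(W ** 2)."""
    Y_hat = np.clip(np.asarray(Y_hat, dtype=float), eps, 1.0)
    Y = np.asarray(Y, dtype=float)
    data_loss = -np.sum(Y * np.log(Y_hat), axis=1).mean()
    penalty = lam * np.sum(np.asarray(W, dtype=float) ** 2)
    return float(data_loss + penalty)

Y_hat = np.array([[0.8, 0.15, 0.05]])
Y = np.array([[1, 0, 0]])
W = np.array([[1.0, 2.0],
              [3.0, 4.0]])  # sum of squares is 30

plain = regularized_loss(Y_hat, Y, W, lam=0.0)       # data term only
penalized = regularized_loss(Y_hat, Y, W, lam=0.01)  # adds 0.01 * 30
```

A larger `lam` pushes training toward smaller weights at the expense of fitting the training data less tightly.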

Page 38: Multinomial classification and application of ML

Overfitting : Regularization

$$L = \frac{1}{N} \sum_n D_n(\hat{Y}, Y) + \lambda \sum W^2$$

$\lambda$: Regularization Strength

Page 39: Multinomial classification and application of ML

Application & Tips

Learning and Test data sets

Page 40: Multinomial classification and application of ML

Training, validation and test sets

• The model has already memorized the answers for the training data, so the training data cannot show whether it works well on real data. => Test data is needed!

• Finding an appropriate learning rate and regularization strength for the trained machine requires a validation step. => Validation data is needed!
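One simple way to obtain the three sets is to shuffle the example indices once and cut them into three parts. The 60/20/20 proportions and the helper name `split_indices` are assumptions made for this sketch; the slides do not prescribe a ratio.

```python
import numpy as np

def split_indices(n, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle 0..n-1 and split into train, validation, and test indices."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = order[:n_test]
    val = order[n_test:n_test + n_val]
    train = order[n_test + n_val:]
    return train, val, test

train_idx, val_idx, test_idx = split_indices(100)
```

The validation part is used to choose the learning rate and regularization strength; the test part is touched only once, for the final evaluation.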

Page 41: Multinomial classification and application of ML

Online Learning

[Figure: Data → Model]

• When there is too much data, split it into parts and train on them one after another.

• It is also used when data arrives continuously.
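The idea of training on one chunk at a time can be sketched as follows. To keep the example short it uses a least-squares linear model rather than the softmax classifier from the earlier slides, and the data stream is synthetic; `online_fit`, `stream`, and `true_w` are names invented for this illustration.

```python
import numpy as np

def online_fit(chunks, n_features, alpha=0.1):
    """Update a linear model y ~ X w with one gradient step per chunk."""
    w = np.zeros(n_features)
    for X, y in chunks:
        # Gradient of the mean squared error on this chunk only.
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w = w - alpha * grad
    return w

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])  # weights the synthetic stream is built from

def stream(n_chunks=500, chunk_size=20):
    """Yield small (X, y) chunks, as if data kept arriving over time."""
    for _ in range(n_chunks):
        X = rng.normal(size=(chunk_size, 2))
        yield X, X @ true_w

w = online_fit(stream(), n_features=2)  # never holds the full data set
```

The model keeps its weights between chunks, so it can continue learning whenever new data arrives without retraining from scratch.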