
Deep Learning intro.


2016.01.02.


Outline

Natural Language Processing (NLP)

Representation and Processing

Deep Learning Models


Natural Language Processing


Natural Language Processing (NLP)

[Diagram: NLP overview — language understanding, language generation, and applications]

• Language understanding: word understanding, semantic understanding, intent recognition

• Language generation: question, answer, search, inference, dialogue

• Applications: intelligent robots, information retrieval, machine translation, document summarization

Representation and Processing


Representation in mathematics

[Figure: real-world objects (images) mapped to points in a vector space, e.g. <0.156, 0.421, 0.954, …>]

Image source: https://www.google.com/imghp?hl=ko

Duck vs. rabbit (one image, two interpretations)


Camouflage


Neural Networks in Humans

https://uncyclopedia.kr/wiki/%EB%87%8C

• The brain is a neural network that performs pattern recognition

• It processes input through multiple layers (roughly 10 layers in humans)

• [Figure caption: "I see a lion"]


Neural Network

Vector representation

Pattern of layers

+ Learning


Pattern of layers

Deep learning combines patterns across layers automatically.

Why do we say "deep"?

[Figure: m layers of n units each, fully connected; number of connection links: (n × n) × (m − 1). A quick check follows below.]
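As a sanity check on that count, a minimal Python sketch (the function name and the example sizes are mine, for illustration only):

```python
# Count connection links in a fully connected network with
# m layers of n units each, as on the slide.
def connection_links(n: int, m: int) -> int:
    # Each adjacent pair of layers contributes n x n links,
    # and m layers have m - 1 adjacent pairs.
    return (n * n) * (m - 1)

print(connection_links(n=100, m=10))  # 90000 links
```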


How to use layers?

• Input: a vector

• Output: a real number or a class (vector)

• Vector representation: "one-hot"


Vector representation

[Symbol] Lion

[Text representation] "Lion"

[One-hot representation] <0, 0, 0, 0, 0, 1, 0, 0, 0, 0, …>

[Symbolic representation] <1.45, 75.12, 0.425, 0.953, …>
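A minimal sketch of the one-hot mapping; the vocabulary is hypothetical, with "lion" placed at index 5 to mirror the slide's vector:

```python
import numpy as np

# Hypothetical 10-word vocabulary; "lion" sits at index 5,
# mirroring <0, 0, 0, 0, 0, 1, 0, 0, 0, 0, ...> on the slide.
vocab = ["a", "the", "cat", "dog", "big", "lion", "tiger", "wolf", "mouse", "run"]

def one_hot(word: str) -> np.ndarray:
    vec = np.zeros(len(vocab))
    vec[vocab.index(word)] = 1.0   # a single 1 at the word's index
    return vec

print(one_hot("lion"))  # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
```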


Jung, Deep Learning for Korean NLP


How to map symbols to one-hot vectors

[Symbolic words]  [One-hot]

Lion              <0, 0, 1, 0, 0>

Big cat           <0, 1, 0, 0, 1>

With an AND operation, the two words never match: their vectors share no non-zero dimension (see the sketch below).

∴ We need a symbolic vector representation.
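A minimal sketch of that mismatch, using the slide's two vectors:

```python
import numpy as np

# One-hot (bag-of-words) vectors from the slide.
lion    = np.array([0, 0, 1, 0, 0])
big_cat = np.array([0, 1, 0, 0, 1])

# Element-wise AND (for 0/1 vectors, the dot product does the same):
# no dimension is shared, so the two words never match.
print(np.logical_and(lion, big_cat).any())  # False
print(lion @ big_cat)                       # 0
```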


How to map symbols to one-hot vectors

Lion, Big cat, Tiger, Dog, Wolf, Mouse

[One-hot]

<0, 0, 1, 0, 0>

<0, 1, 0, 0, 1>

∴ [Symbolic representation]

<1.45, 75.12, 0.425, 0.953, …>

<1.78, 61.11, 0.611, 2.011, …>

[Symbolic vectors] (from an NNLM) — compared with cosine similarity, as sketched below
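A minimal sketch of the comparison; the dense vectors are the slide's, truncated to four dimensions (which words they belong to is my assumption):

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # 1.0 for identical directions, 0.0 for orthogonal vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Dense (symbolic) vectors from the slide, e.g. for Lion and Tiger.
lion  = np.array([1.45, 75.12, 0.425, 0.953])
tiger = np.array([1.78, 61.11, 0.611, 2.011])

print(cosine(lion, tiger))  # ~0.999: similar words end up close together

# One-hot vectors of different words always score 0:
print(cosine(np.array([0., 0., 1., 0., 0.]),
             np.array([0., 1., 0., 0., 0.])))  # 0.0
```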


Neural Network Language Model

• Feed-forward NN: a parametric estimator with overall parameter set θ = (C, w)

• One-hot representation: [0 1 0 0 0 0 0 0 0 0]

• Lookup table: word embedding (a row selection, as sketched below)

• Non-linear projection: activation function

• Weight normalization: softmax (length: n)
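The lookup-table step is just a row selection; a minimal sketch (the sizes are toy values I chose):

```python
import numpy as np

rng = np.random.default_rng(0)
V, m = 10, 4                  # vocabulary size, embedding width (toy values)
C = rng.normal(size=(V, m))   # lookup table: one m-dim embedding per word

w = 1                         # word index, as in [0 1 0 0 0 0 0 0 0 0]
one_hot = np.zeros(V)
one_hot[w] = 1.0

# Multiplying the one-hot vector by C selects row w of the table.
assert np.allclose(one_hot @ C, C[w])
```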


Neural Network Language Model

Maximize the log-likelihood:

$L = \max_{\theta} \frac{1}{T} \sum_{t} \log f(w_t, w_{t-1}, \ldots, w_{t-n+1}; \theta)$

Parameters (a forward-pass sketch follows below):

• h: the number of hidden units

• m: the number of features associated with each word

• b: the output biases

• d: the hidden layer biases

• U: the hidden-to-output weights

• W: the input-to-output weights

• H: the input-to-hidden weights

• C: the word features (lookup table)

• θ = (b, d, W, U, H, C)
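A minimal NumPy sketch of the forward pass with these parameters. This parameter list follows Bengio et al.'s (2003) NNLM, whose output is y = b + Wx + U tanh(d + Hx) followed by a softmax; all sizes here are toy values:

```python
import numpy as np

rng = np.random.default_rng(0)
V, m, h, n = 10, 4, 8, 3               # vocab size, features/word, hidden units, n-gram order

C = rng.normal(size=(V, m))            # word features (lookup table)
H = rng.normal(size=(h, (n - 1) * m))  # input-to-hidden weights
d = np.zeros(h)                        # hidden layer biases
U = rng.normal(size=(V, h))            # hidden-to-output weights
W = rng.normal(size=(V, (n - 1) * m))  # input-to-output weights
b = np.zeros(V)                        # output biases

def nnlm_probs(context: list) -> np.ndarray:
    """P(w_t | w_{t-1}, ..., w_{t-n+1}) over the whole vocabulary."""
    x = np.concatenate([C[w] for w in context])  # lookup + concatenate
    y = b + W @ x + U @ np.tanh(d + H @ x)       # non-linear projection
    e = np.exp(y - y.max())
    return e / e.sum()                           # softmax normalization

p = nnlm_probs([3, 7])        # indices of the n - 1 = 2 preceding words
print(p.sum(), p.argmax())    # 1.0 and the most probable next word
```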


NNLM for Korean

Leeck, 딥러닝을 이용한 한국어 의존 구문 분석 (Korean Dependency Parsing Using Deep Learning)


Deep Learning Models


Deep Learning Models

"강대 주변에 스타벅스 위치가 어디야?" ("Where is the Starbucks near Kangwon University?")

• Morphological analysis: 강대/NNG 주변/NNG 에/JX 스타벅스/NNG …

Feed-forward Neural Network (FFNN)

[Figure: an FFNN predicts the label Y for word W_t from a window of surrounding morphemes and POS tags (강대/NNG, 주변/NNG, 에/JX, …); the variants 1-FFNN, 2-FFNN, 3-FFNN differ in depth. A sketch follows below.]
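A minimal sketch of such a window-based FFNN tagger; the sizes and the choice of a 3-word window are my toy assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
m, h, k, win = 4, 8, 5, 3     # embedding width, hidden units, labels, window size

W1 = rng.normal(size=(h, win * m)); b1 = np.zeros(h)   # input-to-hidden
W2 = rng.normal(size=(k, h));       b2 = np.zeros(k)   # hidden-to-output

def ffnn_tag(window_embs: np.ndarray) -> int:
    """Predict a label for the centre word from a window of embeddings."""
    x = window_embs.reshape(-1)     # concatenate the window
    hid = np.tanh(W1 @ x + b1)      # one hidden layer (a "1-FFNN")
    return int((W2 @ hid + b2).argmax())

# e.g. embeddings for the window (강대, 주변, 에):
print(ffnn_tag(rng.normal(size=(win, m))))
```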


Deep Learning Models

"강대 주변에 스타벅스 위치가 어디야?"

• Y_text: [강대 주변에 스타벅스 위치], [어디]

• Y_tags: [B I I I I], [B]

Recurrent Neural Network (RNN)

[Figure: the RNN unfolded over the input words 강대, 주변, …, 스타벅스, 위치, emitting one B/I tag per word. A sketch of the unfolding follows below.]
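A minimal sketch of the unfolded recurrence; the sizes and random weights are toy assumptions (a real tagger would be trained):

```python
import numpy as np

rng = np.random.default_rng(0)
m, h, k = 4, 8, 2                  # input width, hidden units, tags {B, I}
Wx = rng.normal(size=(h, m))       # input-to-hidden
Wh = rng.normal(size=(h, h))       # hidden-to-hidden (the recurrent link)
Wy = rng.normal(size=(k, h))       # hidden-to-output

def rnn_tag(xs: np.ndarray) -> list:
    """Unfold h_t = tanh(Wx x_t + Wh h_{t-1}) over a sentence, tag each word."""
    state, tags = np.zeros(h), []
    for x in xs:                   # one embedding per word: 강대, 주변, ...
        state = np.tanh(Wx @ x + Wh @ state)
        tags.append("BI"[int((Wy @ state).argmax())])
    return tags

print(rnn_tag(rng.normal(size=(5, m))))  # one B/I tag per input word
```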


Deep Learning Models

"강대 주변에 스타벅스 위치가 어디야?"

• Y_text: [강대 주변에 스타벅스 위치], [어디]

• Y_tags: [B I I I I], [B]

Long Short-Term Memory RNN (LSTM-RNN)

• Uses gate matrices (LSTM or GRU); one gated step is sketched below

[Figure: the LSTM-RNN unfolded over 강대, 주변, …, 위치, emitting one B/I tag per word]
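A minimal sketch of one LSTM step with its gates, in the standard formulation; the sizes and weights are toy assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
m, h = 4, 8                           # input width, hidden units
Wg = rng.normal(size=(4 * h, m + h))  # all four gate matrices, stacked
bg = np.zeros(4 * h)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, hprev, cprev):
    """Input, forget, and output gates control what the cell state keeps."""
    z = Wg @ np.concatenate([x, hprev]) + bg
    i, f, o, g = np.split(z, 4)                        # gate pre-activations
    c = sigmoid(f) * cprev + sigmoid(i) * np.tanh(g)   # gated cell update
    return sigmoid(o) * np.tanh(c), c

hstate, cstate = np.zeros(h), np.zeros(h)
for x in rng.normal(size=(5, m)):     # unfold over a 5-word sentence
    hstate, cstate = lstm_step(x, hstate, cstate)
print(hstate.shape)                   # (8,)
```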


Deep Learning Models

"강대 주변에 스타벅스 위치가 어디야?"

• Y_text: [강대 주변에 스타벅스 위치], [어디]

• Y_tags: [B I I I I], [B]

LSTM-RNN CRF

• Uses gate matrices (LSTM or GRU)

• Decoding with Viterbi or beam search (a Viterbi sketch follows below)

[Figure: LSTM-RNN scores feed a CRF output layer; the best B/I tag sequence is found by Viterbi or beam search]
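A minimal Viterbi sketch over per-word tag scores. In the model above the scores would come from the LSTM-RNN and the CRF transition matrix; here they are random toy values:

```python
import numpy as np

def viterbi(emissions: np.ndarray, transitions: np.ndarray) -> list:
    """Best tag path given per-word scores (T x K) and tag-to-tag scores (K x K)."""
    T, K = emissions.shape
    score, back = emissions[0].copy(), np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # cand[prev, cur]: score of taking tag `cur` coming from tag `prev`
        cand = score[:, None] + transitions + emissions[t]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):      # follow the backpointers
        path.append(int(back[t][path[-1]]))
    return path[::-1]

rng = np.random.default_rng(0)
print(viterbi(rng.normal(size=(5, 2)),   # scores for 5 words, tags {B, I}
              rng.normal(size=(2, 2))))  # CRF transition scores
```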


Deep Learning Models

"강대 주변에 스타벅스 위치가 어디야?"

• Y_text: [강대 주변에 스타벅스 위치], [어디]

• Y_tags: [B I I I I], [B]

Bidirectional LSTM-RNN CRF (Bi-LSTM-RNN CRF)

• Uses gate matrices (LSTM or GRU)

• Decoding with Viterbi or beam search

[Figure: a forward pass and a backward pass over 강대, 주변, …, 위치 are combined per word, then decoded into B/I tags. The combination is sketched below.]
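A minimal sketch of the bidirectional combination; plain tanh recurrences stand in for the LSTM cells sketched earlier, and sizes and weights are toy assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
m, h = 4, 8
Wf, Uf = rng.normal(size=(h, m)), rng.normal(size=(h, h))  # forward weights
Wb, Ub = rng.normal(size=(h, m)), rng.normal(size=(h, h))  # backward weights

def run(xs, W, U):
    state, states = np.zeros(h), []
    for x in xs:
        state = np.tanh(W @ x + U @ state)
        states.append(state)
    return states

xs = rng.normal(size=(5, m))             # embeddings: 강대, 주변, ...
fwd = run(xs, Wf, Uf)                    # left-to-right pass
bwd = run(xs[::-1], Wb, Ub)[::-1]        # right-to-left pass, re-aligned
feats = [np.concatenate(p) for p in zip(fwd, bwd)]
print(feats[0].shape)                    # (16,): left and right context per word
```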


Deep Learning Models

Sequence-to-sequence model (a sketch follows below)

• Two different LSTMs: one for the input sentence, one for the output sentence

• Uses a shallow LSTM

• Reverses the input sentence

• Training: decoding & rescoring
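A minimal encoder-decoder sketch with the reversed input. Plain tanh recurrences stand in for the two LSTMs, and the vocabulary, sizes, and greedy decoding (rather than the slide's rescoring) are my toy assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
m, h, V = 4, 8, 10                                         # embedding, hidden, vocab sizes
We, Ue = rng.normal(size=(h, m)), rng.normal(size=(h, h))  # encoder weights
Wd, Ud = rng.normal(size=(h, m)), rng.normal(size=(h, h))  # separate decoder weights
Wy = rng.normal(size=(V, h))                               # hidden-to-vocabulary
E = rng.normal(size=(V, m))                                # output-side embeddings

def encode(xs):
    state = np.zeros(h)
    for x in xs[::-1]:            # reverse the input sentence, as on the slide
        state = np.tanh(We @ x + Ue @ state)
    return state                  # a single vector summarizing the input

def greedy_decode(state, bos=0, eos=9, max_len=10):
    out, w = [], bos
    for _ in range(max_len):
        state = np.tanh(Wd @ E[w] + Ud @ state)  # feed back the previous word
        w = int((Wy @ state).argmax())
        if w == eos:
            break
        out.append(w)
    return out

print(greedy_decode(encode(rng.normal(size=(5, m)))))  # output word indices
```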


Deep Learning Models

Encoder-Decoder Architecture


Deep Learning Models

Pointer Networks

• A deep learning model based on seq2seq and the attention mechanism

• Its outputs are positions (indices) in the input sequence (the pointing step is sketched below)

• X = {A:0, B:1, C:2, D:3, <EOS>:4}

• Y = {3, 2, 0, 4}

[Figure: the encoder reads A B C D <EOS>; the decoder outputs D C A <EOS> by pointing back at input positions]
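A minimal sketch of the pointing step, following the attention scoring u_j = vᵀ tanh(W1 e_j + W2 d) from Vinyals et al.'s pointer networks; the weights and states are toy random values:

```python
import numpy as np

rng = np.random.default_rng(0)
h = 8
W1, W2 = rng.normal(size=(h, h)), rng.normal(size=(h, h))
v = rng.normal(size=h)

def point(enc_states: np.ndarray, dec_state: np.ndarray) -> int:
    """Attention scores over input positions become the output itself."""
    scores = np.array([v @ np.tanh(W1 @ e + W2 @ dec_state) for e in enc_states])
    return int(scores.argmax())       # an index into the input sequence

enc = rng.normal(size=(5, h))         # encoder states for A, B, C, D, <EOS>
print(point(enc, rng.normal(size=h))) # e.g. 3, i.e. the decoder points at "D"
```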

Deep Learning Models

Siamese Neural Network


References

Jung, Deep Learning for Korean NLP

Lee, 딥러닝을 이용한 한국어 의존 구문 분석 (Korean Dependency Parsing Using Deep Learning)

Park, Pointer Networks for Coreference Resolution

Park, Bi-LSTM-RNN CRF for Mention Detection


QA

Thank you.

박천음, 최수길, 박찬민, 최재혁, 홍다솔

sigma α, Kangwon National University

Email: parkce3@gmail.ac.kr
