𝑠𝑖𝑔𝑚𝑎 𝜶
Deep Learning intro.
2016.01.02.
Outline
Natural Language Processing (NLP)
Representation and Processing
Deep Learning Models
Natural Language Processing
Natural Language Processing (NLP)
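Language understanding
• Questions
• Word understanding
• Semantic understanding
• Intent detection
Language generation
• Answers
• Search
• Inference
• Dialogue
Applications
• Intelligent robots
• Information retrieval
• Machine translation
• Document summarization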
Representation and Processing
Representation in mathematics
<0.156, 0.421, 0.954, …>
<0.096, 0.510, 0.991, …>
<0.496, 0.951, 0.321, …>
<0.196, 0.851, 0.119, …>
<…, 0.486, 0.854, …>
<…, 0.751, 0.912, …>
<…, 0.123, 2.554, 5.124, …>
<…, 7.451, 21.45, 8.999>
<…, 1.109, 11.854, 0.456>
Real world → Vector space
Image source: https://www.google.com/imghp?hl=ko
Duck vs. rabbit
Camouflage
Neural Network for Human
https://uncyclopedia.kr/wiki/%EB%87%8C
Neural Network
Pattern recognition
Multiple layers
Human: 10 layers
“I see a lion”
Neural Network
Vector representation
Pattern of layers
+ Learning
Pattern of layers
Deep learning: automatic combination of patterns
Why do we say “deep”?
[Figure: a network of m layers with n units per layer]
Connection links: (n × n) × (m − 1)
Automatic combination
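The connection count above can be checked with a short sketch. It assumes every layer has exactly n units and that each unit is linked to every unit in the next layer; the function name `count_links` is made up for this illustration.

```python
def count_links(n: int, m: int) -> int:
    """Number of links in a fully connected stack of m layers with n units each."""
    if n < 1 or m < 1:
        raise ValueError("n and m must be positive")
    # Each of the (m - 1) gaps between adjacent layers holds n * n links.
    return (n * n) * (m - 1)


if __name__ == "__main__":
    for layers in (2, 5, 10):
        print(layers, count_links(n=100, m=layers))
```

With n fixed, the number of links grows linearly with the number of layers, which is one way to read "deep" on this slide.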
How to use layers?
Input: a vector
Output: a real number or a class (vector)
Vector representation: “one-hot”
Vector representation
[Symbol]
[Text representation] Lion
[One-hot representation] <0, 0, 0, 0, 0, 1, 0, 0, 0, 0, …>
[Symbol representation] <1.45, 75.12, 0.425, 0.953, …>
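A one-hot vector like the one on this slide can be built from a vocabulary index. The sketch below is illustrative only: the ten-word vocabulary and the helper `one_hot` are assumptions, chosen so that "lion" lands at position 5 as in the slide's vector.

```python
# Hypothetical 10-word vocabulary; the slide does not list the actual words.
VOCAB = ["ant", "bear", "cat", "dog", "eagle", "lion", "mouse", "tiger", "wolf", "zebra"]


def one_hot(word: str, vocab: list[str]) -> list[int]:
    """Return a vector with a single 1 at the word's vocabulary index."""
    index = vocab.index(word)  # raises ValueError for unknown words
    return [1 if i == index else 0 for i in range(len(vocab))]


if __name__ == "__main__":
    print(one_hot("lion", VOCAB))
```

The vector length equals the vocabulary size, so a realistic vocabulary gives very long and very sparse vectors.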
Jung, DEEP LEARNING FOR KOREAN NLP
How to map a symbol to a one-hot vector
[Symbolic words] → [One-hot]
Lion: <0, 0, 1, 0, 0>
Big cat: <0, 1, 0, 0, 1>
Under an AND operation, the two expressions do not match
∴ We need a symbolic vector representation
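The mismatch can be seen by applying an element-wise AND to the two vectors from the slide. This is a small sketch that assumes the first vector belongs to "Lion" and the second to "Big cat"; the helper name `elementwise_and` is made up for the example.

```python
def elementwise_and(u: list[int], v: list[int]) -> list[int]:
    """Element-wise AND of two equal-length 0/1 vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return [a & b for a, b in zip(u, v)]


lion = [0, 0, 1, 0, 0]     # "Lion" from the slide
big_cat = [0, 1, 0, 0, 1]  # "Big cat" from the slide

overlap = elementwise_and(lion, big_cat)
# No position is 1 in both vectors, so the two expressions share nothing,
# even though they refer to similar animals.
print(overlap, any(overlap))
```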
How to map a symbol to a one-hot vector
Lion
Big cat
Tiger
Dog
Wolf
Mouse
[One-hot]
<0, 0, 1, 0, 0>
<0, 1, 0, 0, 1>
∴ [Symbolic representation]
[Symbolic vectors] (from NNLM)
<1.45, 75.12, 0.425, 0.953, …>
<1.78, 61.11, 0.611, 2.011, …>
Use cosine similarity
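The difference between the two kinds of vectors shows up when cosine similarity is computed on each pair. The sketch below uses only the components visible on the slide; cutting the dense vectors off after four components is an assumption made for illustration, so the exact value carries no meaning.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Sparse 0/1 vectors from the slide: no shared non-zero position.
sparse_a = [0, 0, 1, 0, 0]
sparse_b = [0, 1, 0, 0, 1]

# Only the first four components of each dense vector appear on the slide.
dense_a = [1.45, 75.12, 0.425, 0.953]
dense_b = [1.78, 61.11, 0.611, 2.011]

print(cosine_similarity(sparse_a, sparse_b))
print(cosine_similarity(dense_a, dense_b))
```

The sparse pair scores exactly zero because the vectors have no overlapping position, while the dense pair scores close to one.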
Neural Network Language Model
Feed-forward NN
Parametric estimator
Overall parameter set $\theta = (C, w)$
One-hot representation
• [0 1 0 0 0 0 0 0 0 0]
Lookup table
• Word embedding
Non-linear projection
• Activation function
Normalized weights
• Softmax (length: $n$)
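The steps listed on this slide can be sketched in a few lines. Everything numeric below is made up: the 3-word lookup table `C`, the output scores, and the helper names `lookup` and `softmax` are assumptions for illustration, and the projection weights are left out to keep the example short.

```python
import math


def lookup(one_hot: list[int], table: list[list[float]]) -> list[float]:
    """Multiply a one-hot row vector by the table; this selects one row."""
    dim = len(table[0])
    return [sum(one_hot[i] * table[i][j] for i in range(len(table))) for j in range(dim)]


def softmax(scores: list[float]) -> list[float]:
    """Normalize scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical lookup table C: 3 words, 2 features per word.
C = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

embedding = lookup([0, 1, 0], C)            # same values as C[1]
hidden = [math.tanh(x) for x in embedding]  # non-linear projection (weights omitted)
probs = softmax([1.0, 2.0, 3.0])            # made-up output scores

print(embedding, hidden, probs)
```

Because the one-hot vector has a single 1, the matrix product reduces to picking a row, which is why the first layer is usually described as a lookup table.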
Neural Network Language Model
Maximize the log-likelihood with respect to $\theta$:
$$L = \max_{\theta} \frac{1}{T} \sum_{t} \log f(w_t, w_{t-1}, \ldots, w_{t-n+1})$$
Parameters
• $h$: the number of hidden units
• $m$: the number of features associated with each word
• $b$: the output biases
• $d$: the hidden layer biases
• $U$: hidden-to-output weights
• $W$: input-to-output weights
• $H$: input-to-hidden weights
• $C$: word features (lookup table)
• $\theta = (b, d, W, U, H, C)$
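To make the roles of the parameters concrete, here is a minimal sketch of the forward pass and the averaged log-likelihood. It assumes the score form y = b + Wx + U tanh(d + Hx) followed by a softmax, which is how feed-forward NNLMs with these parameter names are usually written; the slide itself does not spell this out. The sizes, the random initial values, and the toy word ids are all made up, and no training is performed.

```python
import math
import random


def matvec(matrix, vector):
    """Matrix-vector product for lists of lists."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, vector)) for row in matrix]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def init_params(vocab_size, m, h, context, rng):
    """theta = (b, d, W, U, H, C) with small random values (an assumption)."""
    return {
        "b": [0.0] * vocab_size,                           # output biases
        "d": [0.0] * h,                                    # hidden layer biases
        "W": random_matrix(vocab_size, context * m, rng),  # input-to-output
        "U": random_matrix(vocab_size, h, rng),            # hidden-to-output
        "H": random_matrix(h, context * m, rng),           # input-to-hidden
        "C": random_matrix(vocab_size, m, rng),            # lookup table
    }


def next_word_probs(theta, context_ids):
    """Distribution over the next word given the ids of the previous words."""
    # x concatenates the looked-up feature vectors of the context words.
    x = [value for i in context_ids for value in theta["C"][i]]
    hidden = [math.tanh(d_k + a_k) for d_k, a_k in zip(theta["d"], matvec(theta["H"], x))]
    direct = matvec(theta["W"], x)
    from_hidden = matvec(theta["U"], hidden)
    y = [b_k + w_k + u_k for b_k, w_k, u_k in zip(theta["b"], direct, from_hidden)]
    return softmax(y)


def average_log_likelihood(theta, word_ids, n):
    """(1/T) * sum of log f(w_t, ..., w_{t-n+1}) over positions with a full context."""
    logs = []
    for t in range(n - 1, len(word_ids)):
        probs = next_word_probs(theta, word_ids[t - n + 1:t])
        logs.append(math.log(probs[word_ids[t]]))
    return sum(logs) / len(logs)


rng = random.Random(0)
theta = init_params(vocab_size=6, m=4, h=5, context=2, rng=rng)
corpus = [0, 3, 1, 4, 2, 5, 0, 3]  # toy word ids, not real data
print(average_log_likelihood(theta, corpus, n=3))
```

With untrained weights the model is close to uniform over the six words, so the value sits near log(1/6); training would adjust θ to raise it.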
NNLM for Korean
Leeck, Korean Dependency Parsing Using Deep Learning (딥러닝을 이용한 한국어 의존 구문 분석)
Deep Learning Models
Deep learning Models
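“강대 주변에 스타벅스 위치가 어디야?” (“Where is the Starbucks near Kangwon National University?”)
• 강대/NNG 주변/NNG 에/JX 스타벅스/NNG …
Feed-forward Neural Network (FFNN)
[Figure: an FFNN with weights $W_t$ and output Y tags each token separately: 강대 → NNG, 주변 → NNG, 에 → JX. Panels: 1-FFNN, 2-FFNN, 3-FFNN]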
Deep learning Models
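“강대 주변에 스타벅스 위치가 어디야?” (“Where is the Starbucks near Kangwon National University?”)
• $Y_{\text{text}}$: [강대 주변 에 스타벅스 위치], [어디]
• $Y_{\text{tags}}$: [B I I I I], [B]
Recurrent Neural Network (RNN)
[Figure: an RNN with weights $W_t$ and output Y, unfolded over the tokens: 강대 → B, 주변 → I, 에 → I, 스타벅스 → I, 위치 → I]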
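The relation between the tag sequence and the bracketed text spans can be sketched as a small grouping routine. The slide shows only B and I tags; the "O" tag for tokens outside a span, and the way 가, 야, and ? are split off as separate tokens, are assumptions added here so the whole sentence can be processed.

```python
def group_chunks(tokens: list[str], tags: list[str]) -> list[list[str]]:
    """Group tokens into chunks: B starts a chunk, I extends it, anything else closes it."""
    chunks: list[list[str]] = []
    current: list[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            if current:
                chunks.append(current)
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:
            # "O" (outside) is an assumed label for tokens the slide leaves untagged.
            if current:
                chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks


tokens = ["강대", "주변", "에", "스타벅스", "위치", "가", "어디", "야", "?"]
tags = ["B", "I", "I", "I", "I", "O", "B", "O", "O"]
print(group_chunks(tokens, tags))
```

A tagger such as the RNN on this slide only has to predict one tag per token; the spans follow from the tags.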
Deep learning Models
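“강대 주변에 스타벅스 위치가 어디야?” (“Where is the Starbucks near Kangwon National University?”)
• $Y_{\text{text}}$: [강대 주변 에 스타벅스 위치], [어디]
• $Y_{\text{tags}}$: [B I I I I], [B]
Long Short-Term Memory RNN (LSTM-RNN)
• Uses gate matrices (LSTM or GRU)
[Figure: an LSTM-RNN with weights $W_t$ and output Y, unfolded over the tokens: 강대 → B, 주변 → I, 에 → I, 스타벅스 → I, 위치 → I]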
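The gating idea can be illustrated with a single LSTM step written out for scalar inputs. This is a simplified sketch of the standard LSTM equations, not the configuration used in the talk: real models use weight matrices rather than scalars, and the parameter values below are made up.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell; p maps a gate name to (w_x, w_h, bias)."""
    def pre(name):
        w_x, w_h, bias = p[name]
        return w_x * x + w_h * h_prev + bias

    i = sigmoid(pre("input"))         # how much new information to write
    f = sigmoid(pre("forget"))        # how much of the old cell state to keep
    o = sigmoid(pre("output"))        # how much of the cell state to expose
    g = math.tanh(pre("candidate"))   # candidate cell content
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c


# Made-up parameters shared by all gates, purely for illustration.
params = {name: (0.5, 0.5, 0.0) for name in ("input", "forget", "output", "candidate")}

h, c = 0.0, 0.0
for x in [1.0, -1.0, 0.5]:
    h, c = lstm_step(x, h, c, params)
    print(round(h, 4), round(c, 4))
```

The cell state `c` is updated additively, scaled by the forget gate, which is what lets information persist over longer spans than in a plain RNN.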
Deep learning Models
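“강대 주변에 스타벅스 위치가 어디야?” (“Where is the Starbucks near Kangwon National University?”)
• $Y_{\text{text}}$: [강대 주변 에 스타벅스 위치], [어디]
• $Y_{\text{tags}}$: [B I I I I], [B]
LSTM-RNN CRF
• Uses gate matrices (LSTM or GRU)
[Figure: an LSTM-RNN with weights $W_t$ and output Y, unfolded over the tokens: 강대 → B, 주변 → I, 에 → I, 스타벅스 → I, 위치 → I]
Decoding: Viterbi or beam search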
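Viterbi decoding over per-token scores and tag-to-tag transition scores can be sketched as follows. The emission and transition numbers are invented for the example (in a real LSTM-RNN CRF they would come from the trained network and CRF layer), and the function name `viterbi` and its argument layout are assumptions.

```python
def viterbi(emissions, transitions, tags):
    """Return the highest-scoring tag sequence.

    emissions[t][tag]: score of tag at position t (e.g. from an LSTM).
    transitions[(prev, cur)]: score of moving from prev to cur (CRF-style).
    """
    best = [{tag: emissions[0][tag] for tag in tags}]
    back = []
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[(p, cur)])
            scores[cur] = best[-1][prev] + transitions[(prev, cur)] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    last = max(tags, key=lambda tag: best[-1][tag])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


TAGS = ["B", "I"]
# Made-up scores: continuing a span is favoured; "B" right after "B" is penalized.
transitions = {("B", "B"): -2.0, ("B", "I"): 1.0, ("I", "B"): 0.0, ("I", "I"): 1.0}
emissions = [
    {"B": 2.0, "I": 0.0},  # 강대
    {"B": 0.5, "I": 0.4},  # 주변: the emission alone slightly prefers B
    {"B": 0.0, "I": 1.0},  # 에
    {"B": 0.0, "I": 1.0},  # 스타벅스
    {"B": 0.0, "I": 1.0},  # 위치
]
print(viterbi(emissions, transitions, TAGS))
```

Picking the best tag at each position independently would label 주변 as B with these numbers; scoring whole sequences lets the transition scores correct that, which is the motivation for putting a CRF on top of the LSTM.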
Deep learning Models
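“강대 주변에 스타벅스 위치가 어디야?” (“Where is the Starbucks near Kangwon National University?”)
• $Y_{\text{text}}$: [강대 주변 에 스타벅스 위치], [어디]
• $Y_{\text{tags}}$: [B I I I I], [B]
Bidirectional LSTM-RNN CRF (Bi-LSTM-RNN CRF)
• Uses gate matrices (LSTM or GRU)
• Decoding: Viterbi or beam search
[Figure: forward and backward LSTM-RNN layers over the tokens: 강대 → B, 주변 → I, 에 → I, 스타벅스 → I, 위치 → I]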
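The forward and backward passes can be sketched by running the same recurrence over the sequence in both directions and pairing the states per position. To keep the example short, a plain scalar tanh recurrence stands in for the LSTM; the weights and input features are made up.

```python
import math


def run_rnn(inputs: list[float], w_x: float, w_h: float) -> list[float]:
    """Simple scalar tanh recurrence (a stand-in for an LSTM), one state per step."""
    states, h = [], 0.0
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states


def bidirectional_states(inputs: list[float]) -> list[tuple[float, float]]:
    """Pair each position's forward state with its backward state."""
    forward = run_rnn(inputs, w_x=0.8, w_h=0.5)               # reads left to right
    backward = run_rnn(inputs[::-1], w_x=0.8, w_h=0.5)[::-1]  # reads right to left
    return list(zip(forward, backward))


# Made-up scalar features, one per token of the example sentence.
features = [0.2, -0.4, 0.1, 0.9, -0.3]
for fwd, bwd in bidirectional_states(features):
    print(round(fwd, 3), round(bwd, 3))
```

Each position ends up with one state that summarizes the tokens to its left and one that summarizes the tokens to its right, so the tag decision can use context from both sides.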
Deep learning Models
Sequence-to-sequence model
Two different LSTMs: one for the input sentence and one for the output sentence
Uses a shallow LSTM
Reverses the input sentence
Training: decoding and rescoring
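The input-reversal step can be shown as a small data-preparation sketch. The helper `make_training_pair`, the `<EOS>` marker on the target side, and the toy tokens are assumptions for illustration; the LSTMs themselves are not modelled here.

```python
def make_training_pair(source: list[str], target: list[str], eos: str = "<EOS>"):
    """Reverse the source tokens and append an end marker to the target."""
    encoder_input = source[::-1]     # the encoder reads the sentence back to front
    decoder_target = target + [eos]  # the decoder learns when to stop
    return encoder_input, decoder_target


src = ["A", "B", "C"]       # toy source sentence
tgt = ["W", "X", "Y", "Z"]  # toy target sentence
enc, dec = make_training_pair(src, tgt)
print(enc, dec)
```

After reversal, the first source token sits right next to the start of decoding, which shortens the distance between the beginning of the input and the beginning of the output.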
Deep learning Models
Encoder-Decoder Architecture
Deep learning Models
Pointer Networks
• A deep learning model based on seq2seq and the attention mechanism
• A model whose output sequence consists of positions (indices) in the input sequence
• X = {A:0, B:1, C:2, D:3, <EOS>:4}
• Y = {3, 2, 0, 4}
Encoding: A B C D <EOS>
Decoding: D C A <EOS>
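The X and Y example from the Pointer Networks slide can be traced with a short sketch. `decode_pointers` maps output positions back to input tokens, and `point` shows one decoding step as an argmax over attention scores; both helper names and the attention scores are made up, and no network is trained here.

```python
X = {"A": 0, "B": 1, "C": 2, "D": 3, "<EOS>": 4}  # input tokens and their positions
Y = [3, 2, 0, 4]                                   # output: positions in the input


def decode_pointers(positions, token_to_index):
    """Turn a list of input positions back into the tokens they point at."""
    index_to_token = {index: token for token, index in token_to_index.items()}
    return [index_to_token[p] for p in positions]


def point(attention_scores):
    """One decoding step: pick the input position with the highest attention score."""
    return max(range(len(attention_scores)), key=attention_scores.__getitem__)


print(decode_pointers(Y, X))
print(point([0.1, 0.3, 0.2, 2.5, 0.0]))  # made-up scores over the five input positions
```

Because the output vocabulary is the set of input positions, the model can handle inputs of any length without a fixed output dictionary.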
Deep learning Models
Siamese Neural Network
References
Jung, DEEP LEARNING FOR KOREAN NLP
Lee, Korean Dependency Parsing Using Deep Learning (딥러닝을 이용한 한국어 의존 구문 분석)
Park, Pointer Networks for Coreference Resolution
Park, Bi-LSTM-RNN CRF for Mention Detection
Q&A
Thank you.
박천음, 최수길, 박찬민, 최재혁, 홍다솔
𝑠𝑖𝑔𝑚𝑎 𝜶, Kangwon National University
Email: parkce3@gmail.ac.kr