



Learning to Teach with Dynamic Loss Functions

1Lijun Wu, 2Fei Tian, 2Yingce Xia, 3Yang Fan, 2Tao Qin, 1Jianhuang Lai and 2Tie-Yan Liu

1 Sun Yat-sen University 2 Microsoft Research 3 University of Science and Technology of China

Contact: [email protected], [email protected]


1. Machine Learning

Conventional machine learning training keeps three ingredients fixed:

• Fixed data order
• Fixed model space
• Fixed loss function

2. Loss Function Teaching

• Goal: automatically discover the optimal loss functions for student model training.

3. Teaching Requirements

• Requirements of loss function teaching: the taught loss should be adaptive and dynamic.

• Parameterized loss: $L_\phi(f_\omega(x), y)$, with $\phi$ as its coefficients; concretely, $L_\phi = \sigma\!\left(-\log p(x)^{\top} W \,\vec{y} + b\right)$ with $\phi = \{W, b\}$ (sketched in code below).
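A minimal PyTorch sketch of this parameterized loss, assuming $\sigma$ is the logistic sigmoid, $p(x)$ are the student's softmax probabilities, and $\vec{y}$ is the one-hot label; the function name and batching are illustrative choices, not taken from the paper:

```python
import torch
import torch.nn.functional as F

def parameterized_loss(logits, targets, W, b):
    """Sketch of L_phi = sigma(-log p(x)^T W y_vec + b), averaged over a batch.

    phi = {W, b} are the teacher-provided coefficients. With W = identity,
    b = 0 and sigma replaced by the identity map, the term inside sigma is
    the ordinary cross-entropy.
    """
    num_classes = logits.size(-1)
    log_p = F.log_softmax(logits, dim=-1)             # log p(x), shape (B, C)
    y_vec = F.one_hot(targets, num_classes).float()   # one-hot y, shape (B, C)
    bilinear = -((log_p @ W) * y_vec).sum(dim=-1)     # -log p(x)^T W y_vec, shape (B,)
    return torch.sigmoid(bilinear + b).mean()         # sigma(. + b), batch mean
```

For example, calling it with `W = torch.eye(num_classes)` and `b = torch.tensor(0.)` makes the inner term exactly the per-example cross-entropy.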

• Teacher model: $\phi = \mu_\theta$ (a network sketch follows below).

• Adaptive: $\phi_t = \mu_\theta(s_t)$, i.e. the coefficients are computed from the current training state $s_t$.

• Dynamic: $\phi_t$ changes across training steps $t$.

• Reward: dev-set measure of the student.
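One plausible realization of $\mu_\theta$ is a small network that maps state features $s_t$ (e.g. training progress and recent losses) to the coefficients $\phi_t = \{W_t, b_t\}$; the state features, layer sizes, and activation below are assumptions for illustration, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class Teacher(nn.Module):
    """Sketch of the teacher mu_theta: maps a 1-D training-state vector s_t
    to loss coefficients phi_t = {W_t, b_t} for the parameterized loss."""

    def __init__(self, state_dim, num_classes, hidden_dim=32):
        super().__init__()
        self.num_classes = num_classes
        self.body = nn.Sequential(nn.Linear(state_dim, hidden_dim), nn.Tanh())
        self.to_W = nn.Linear(hidden_dim, num_classes * num_classes)  # flattened W_t
        self.to_b = nn.Linear(hidden_dim, 1)                          # scalar bias b_t

    def forward(self, state):
        h = self.body(state)
        W = self.to_W(h).view(self.num_classes, self.num_classes)
        b = self.to_b(h).squeeze(-1)
        return W, b

# Illustrative usage: s_t could collect quantities such as the fraction of
# training finished and the recent training loss.
teacher = Teacher(state_dim=2, num_classes=10)
W_t, b_t = teacher(torch.tensor([0.3, 1.25]))
```

The teacher's parameters $\theta$ are then updated so that the student trained under $L_{\phi_t}$ scores well on the dev measure, as stated in the reward above.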

4. Challenge & Algorithm

• Gradient-based optimization for the teacher model (toy sketch below)

• Algorithm / structure
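The central challenge is that the dev-set performance depends on the teacher parameters $\theta$ only through the student's optimization trajectory. The following toy, self-contained sketch shows the underlying idea, differentiating a dev loss through one student gradient step into a single scalar teacher coefficient; the linear student, squared losses, and single-step horizon are simplifications for illustration, not the paper's actual algorithm:

```python
import torch

torch.manual_seed(0)
x_tr, y_tr = torch.randn(8, 4), torch.randn(8)     # toy training data
x_dev, y_dev = torch.randn(8, 4), torch.randn(8)   # toy dev data

omega = torch.zeros(4, requires_grad=True)         # student parameters
w_phi = torch.tensor(1.0, requires_grad=True)      # teacher-provided loss coefficient
lr_student, lr_teacher = 0.1, 0.05

for step in range(50):
    # Teacher-shaped training loss (here phi simply scales a squared error).
    train_loss = w_phi * ((x_tr @ omega - y_tr) ** 2).mean()
    grad_omega, = torch.autograd.grad(train_loss, omega, create_graph=True)
    omega_new = omega - lr_student * grad_omega     # one differentiable student update

    # Dev objective; its gradient flows back into w_phi through omega_new.
    dev_loss = ((x_dev @ omega_new - y_dev) ** 2).mean()
    grad_phi, = torch.autograd.grad(dev_loss, w_phi)

    with torch.no_grad():
        w_phi -= lr_teacher * grad_phi              # gradient step on the teacher
        omega.copy_(omega_new)                      # commit the student update
```

In the full method the same idea must span many student steps and the whole coefficient set $\phi = \{W, b\}$, and the dev measure itself may be non-differentiable (e.g. accuracy or BLEU), which is why a dedicated algorithm is needed; see the paper for the actual procedure.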

5. Experiments

• Image Classification Task

• Neural Machine Translation Task

• Student model: $f_\omega: x \to y$, trained by minimizing $L(f_\omega, D_{\mathrm{train}}) = \sum_{(x,y)\in D_{\mathrm{train}}} l(f_\omega(x), y)$; $m(f_\omega(x), y)$ denotes the evaluation measure.

• Teacher model: $\mu_\theta$, trained to maximize the student's dev performance, i.e. $\max_\theta m(f_\omega, D_{\mathrm{dev}})$.
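Taken together, the student and teacher objectives form a bilevel problem. The following LaTeX restates the definitions above compactly; the explicit $\omega^{*}(\theta)$ nesting is an added clarification, not a formula copied from the poster:

```latex
% Bilevel view of loss-function teaching (notation summarized from the
% definitions above; the omega^*(theta) nesting is an added clarification).
\max_{\theta}\; m\!\left(f_{\omega^{*}(\theta)},\, D_{\mathrm{dev}}\right)
\quad \text{s.t.} \quad
\omega^{*}(\theta) \in \arg\min_{\omega} \sum_{(x,y)\in D_{\mathrm{train}}}
    L_{\phi}\!\left(f_{\omega}(x),\, y\right),
\qquad \phi = \mu_{\theta}.
```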