19
Layer-wise Asynchronous Training of Neural Network with Synthetic Gradient Xupeng Tong, Hao Wang, Ning Dong, Griffin Adams

Layer-wise Asynchronous Training of Neural …wcohen/10-605/2016-project-presentations/...Neural Network with Synthetic Gradient Xupeng Tong, Hao Wang, Ning Dong, Griffin Adams CONTENTS

  • Upload
    others

  • View
    12

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Layer-wise Asynchronous Training of Neural …wcohen/10-605/2016-project-presentations/...Neural Network with Synthetic Gradient Xupeng Tong, Hao Wang, Ning Dong, Griffin Adams CONTENTS

Layer-wise Asynchronous Training of Neural Network with Synthetic Gradient

Xupeng Tong, Hao Wang, Ning Dong, Griffin Adams

Page 2: Layer-wise Asynchronous Training of Neural …wcohen/10-605/2016-project-presentations/...Neural Network with Synthetic Gradient Xupeng Tong, Hao Wang, Ning Dong, Griffin Adams CONTENTS

CONTENTS

BACKGROUND

INTRODUCTION

SYNCHRONOUS TRAINING

INSIGHTS

ASYNCHRONOUS TRAINING

Page 3: Layer-wise Asynchronous Training of Neural …wcohen/10-605/2016-project-presentations/...Neural Network with Synthetic Gradient Xupeng Tong, Hao Wang, Ning Dong, Griffin Adams CONTENTS

Part OneBACKGROUND01

Page 4: Layer-wise Asynchronous Training of Neural …wcohen/10-605/2016-project-presentations/...Neural Network with Synthetic Gradient Xupeng Tong, Hao Wang, Ning Dong, Griffin Adams CONTENTS

1BACKGROUND

Page 5: Layer-wise Asynchronous Training of Neural …wcohen/10-605/2016-project-presentations/...Neural Network with Synthetic Gradient Xupeng Tong, Hao Wang, Ning Dong, Griffin Adams CONTENTS

Asynchronous SGD

Downpour SGD

Dean, Jeffrey, et al. "Large scale distributed deep networks." Advances in neural information processing systems. 2012.

Page 6: Layer-wise Asynchronous Training of Neural …wcohen/10-605/2016-project-presentations/...Neural Network with Synthetic Gradient Xupeng Tong, Hao Wang, Ning Dong, Griffin Adams CONTENTS

Part TwoINTRODUCTION02

Page 7: Layer-wise Asynchronous Training of Neural …wcohen/10-605/2016-project-presentations/...Neural Network with Synthetic Gradient Xupeng Tong, Hao Wang, Ning Dong, Griffin Adams CONTENTS

2-1OVERVIEW

Jaderberg, Max, et al. "Decoupled neural interfaces using synthetic gradients." arXiv preprint arXiv:1608.05343 (2016).

Page 8: Layer-wise Asynchronous Training of Neural …wcohen/10-605/2016-project-presentations/...Neural Network with Synthetic Gradient Xupeng Tong, Hao Wang, Ning Dong, Griffin Adams CONTENTS

2-2

Page 9: Layer-wise Asynchronous Training of Neural …wcohen/10-605/2016-project-presentations/...Neural Network with Synthetic Gradient Xupeng Tong, Hao Wang, Ning Dong, Griffin Adams CONTENTS

Part ThreeSYNCHRONOUS TRAINING03

Page 10: Layer-wise Asynchronous Training of Neural …wcohen/10-605/2016-project-presentations/...Neural Network with Synthetic Gradient Xupeng Tong, Hao Wang, Ning Dong, Griffin Adams CONTENTS

3-1

𝐿ℳ =% 𝛿' − 𝛿)' **

'

𝐿ℐ = % ℎ' − ℎ.' **

'

Minimizing the synthetic input/gradient simultaneously with the general loss

𝐿 𝑤0, 𝑏0,⋯ ,𝑤', 𝑏',⋯ ,𝑤4, 𝑏4 = 𝐿(𝑊,𝐵)

Page 11: Layer-wise Asynchronous Training of Neural …wcohen/10-605/2016-project-presentations/...Neural Network with Synthetic Gradient Xupeng Tong, Hao Wang, Ning Dong, Griffin Adams CONTENTS

Part FourASYNCHRONOUS TRAINING04

Page 12: Layer-wise Asynchronous Training of Neural …wcohen/10-605/2016-project-presentations/...Neural Network with Synthetic Gradient Xupeng Tong, Hao Wang, Ning Dong, Griffin Adams CONTENTS

4-1ONEPOSSIBLEARCHITECTURE

Page 13: Layer-wise Asynchronous Training of Neural …wcohen/10-605/2016-project-presentations/...Neural Network with Synthetic Gradient Xupeng Tong, Hao Wang, Ning Dong, Griffin Adams CONTENTS

4-1 INFRASTRUCTURE - BASIC

Page 14: Layer-wise Asynchronous Training of Neural …wcohen/10-605/2016-project-presentations/...Neural Network with Synthetic Gradient Xupeng Tong, Hao Wang, Ning Dong, Griffin Adams CONTENTS

4-2INFRASTRUCTURE- LIGHT

Page 15: Layer-wise Asynchronous Training of Neural …wcohen/10-605/2016-project-presentations/...Neural Network with Synthetic Gradient Xupeng Tong, Hao Wang, Ning Dong, Griffin Adams CONTENTS

4-2SLAVE

Page 16: Layer-wise Asynchronous Training of Neural …wcohen/10-605/2016-project-presentations/...Neural Network with Synthetic Gradient Xupeng Tong, Hao Wang, Ning Dong, Griffin Adams CONTENTS

4-3MASTER

Page 17: Layer-wise Asynchronous Training of Neural …wcohen/10-605/2016-project-presentations/...Neural Network with Synthetic Gradient Xupeng Tong, Hao Wang, Ning Dong, Griffin Adams CONTENTS

Part FiveINSIGHTS05

Page 18: Layer-wise Asynchronous Training of Neural …wcohen/10-605/2016-project-presentations/...Neural Network with Synthetic Gradient Xupeng Tong, Hao Wang, Ning Dong, Griffin Adams CONTENTS

5-1

• Synthetic gradient and synthetic input as a new alternative of batch normalization / Dropput

• M and I auxiliary network introduces noises in the input/gradient of each layer

• No exact update is required!

• The model will learn how to reduce the variance within each batch

• while keeping the flavor of that specific batch.

Page 19: Layer-wise Asynchronous Training of Neural …wcohen/10-605/2016-project-presentations/...Neural Network with Synthetic Gradient Xupeng Tong, Hao Wang, Ning Dong, Griffin Adams CONTENTS

Thank You