
Reciprocity between HPC and Deep Learning

Computer Science Department, North Carolina State University

Xipeng Shen

About Me

• Programming systems and machine learning
• Making computing more intelligent and efficient

Graduated Students

– 8 Ph.D.s
  – 4 assistant professors:
    – Rutgers University (Zheng Zhang, 2012)
    – UC Santa Barbara (Yufei Ding, 2017)
    – UC Riverside (Zhijia Zhao, 2015)
    – Colorado School of Mines (Bo Wu, 2014)
  – 4 in industry: Microsoft, Google, IBM, Qualcomm
– 3 recent master's students: Google, Amazon, Qualcomm

Current Sponsors

Programming systems and intelligent computing for the future

This Talk

Deep Learning ⇄ HPC: Reciprocity

Generalized Strength Reduction for Enabling Algorithmic Optimizations

HPC → Deep Learning

[ICDM'17, PLDI'17, VLDB'15, ICML'15]

Motivation

Computing efficiency is key.

(Source: IBM)

Strength Reduction

• A basic compiler concept
• Traditional: only at the instruction level, e.g., b/2 → b >> 1 (sketched below)

Our goal: generalize it with the triangle inequality.
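A minimal sketch of the traditional, instruction-level rewrite (the b/2 → b >> 1 example above), written in Python for illustration; compilers perform this substitution on machine instructions.

```python
# Instruction-level strength reduction: replace a costlier operation with a
# cheaper equivalent. For a non-negative integer b, b // 2 == b >> 1, so a
# compiler can substitute the division with a bit shift.
def halve_with_division(b: int) -> int:
    return b // 2          # integer division

def halve_with_shift(b: int) -> int:
    return b >> 1          # bit shift: the "reduced-strength" form

assert all(halve_with_division(b) == halve_with_shift(b) for b in range(1 << 12))
```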

Triangle Inequality

[Figure: triangle with vertices ORNL, Raleigh, and Atlanta; side lengths a, b, and d]

|a − b| ≤ d ≤ a + b

Example

[Figure: K-means example. A point X, its assigned center C1, and another center C2 that moves to C2′; with the side lengths a, b, and d, the triangle inequality |a − b| ≤ d ≤ a + b bounds the new distance without computing it.]

#Distance computations: O(K) vs. O(N × K)

TI Optimization vs. Strength Reduction [VLDB'15, ICML'15]

• Connection: replace expensive distance computations with cheaper bounds for comparisons.
• Challenges: minimize the cost of applying TI while avoiding distance calculations, and extend it to other algorithms (see the sketch below).
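As a concrete illustration of replacing distances with bounds, here is a minimal NumPy sketch of one K-means assignment pass that keeps Hamerly-style upper/lower bounds and patches them with the centers' movement via the triangle inequality; it is not the Yinyang K-means algorithm of the ICML'15 paper, and the function name and bookkeeping arrays are illustrative.

```python
import numpy as np

def kmeans_assign_ti(X, centers, assign, upper, lower, shift):
    """One K-means assignment pass that uses triangle-inequality bounds to
    skip distance computations (a simplified, Hamerly-style sketch; not the
    Yinyang K-means algorithm from the ICML'15 paper).

    assign[i] : index of X[i]'s currently assigned center
    upper[i]  : upper bound on d(X[i], centers[assign[i]])
    lower[i]  : lower bound on d(X[i], any other center)
    shift[c]  : how far center c moved since the bounds were last exact
    """
    assign = assign.copy()
    # Moving a center by s changes any point-to-center distance by at most s
    # (triangle inequality), so the bounds can be patched without new distances.
    upper = upper + shift[assign]
    lower = lower - shift.max()

    skipped = 0
    for i in range(len(X)):
        if upper[i] <= lower[i]:
            skipped += 1                 # old center is provably still closest
            continue
        d = np.linalg.norm(X[i] - centers, axis=1)   # fall back to exact distances
        nearest = np.argsort(d)[:2]
        assign[i], upper[i], lower[i] = nearest[0], d[nearest[0]], d[nearest[1]]
    return assign, upper, lower, skipped
```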

[Chart: speedup (X) over classic K-means (K = 1024) on a 16 GB, 8-core Intel Ivy Bridge machine; Yinyang K-means.]

Code link in the ICML'15 paper. Clustering results are identical to the original method's.

[Chart: speedup (X) on K-means over the classic K-means baseline (16 GB, 8-core), comparing the TOP-optimized version and Yinyang K-means.]

[Charts: TOP-optimized vs. manually optimized versions of Knn, Knn-join, K-means, ICP, Nbody, and P2P, comparing speedups (X) and the number of distance calculations; Intel i5-4570 CPU, 8 GB memory.]

Average speedups: 50X vs. 20X. At least 93% of the distance calculations are saved.

Angular Triangle Inequality (ATI)

[Figure: example angles of 120° and 10°]

• Holds in spaces of any dimension (> 1).
• ATI always gives tighter bounds than TI does (detailed proof in the PLDI'17 paper; illustrated below).
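The exact ATI formulation and its proof are in the PLDI'17 paper; the sketch below only illustrates, through the law of cosines, why knowing a range for the enclosed angle (using the 120° and 10° from the figure) yields tighter bounds than the plain triangle inequality. The function names are illustrative.

```python
import math

def ti_bounds(a, b):
    # Plain triangle inequality: |a - b| <= d <= a + b.
    return abs(a - b), a + b

def angle_bounds(a, b, theta_lo, theta_hi):
    # Law of cosines: d = sqrt(a^2 + b^2 - 2ab*cos(theta)), which is monotonically
    # increasing in theta on [0, pi]. Knowing theta lies in [theta_lo, theta_hi]
    # therefore tightens the TI bounds. (Illustration only; the exact ATI
    # formulation is in the PLDI'17 paper.)
    d = lambda t: math.sqrt(a * a + b * b - 2 * a * b * math.cos(t))
    return d(theta_lo), d(theta_hi)

print(ti_bounds(3.0, 4.0))                                          # (1.0, 7.0)
print(angle_bounds(3.0, 4.0, math.radians(10), math.radians(120)))  # ~ (1.17, 6.08): tighter
```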

[Chart: speedup (X) on training (baseline: CuBLAS) on GTX 980 and Titan X GPUs, across the Mnist, f-Mnist, Cal01, Newsgroup, and Micro-norb datasets.]

[Chart: speedup (X) on inference on a tablet (Nexus 7, NVIDIA Tegra 3 T30L).]

DNN for Sparse Format Selection

Deep Learning → HPC

[PACT'17]

Problem

SpMV: the core of many HPC applications.

Sparse matrix storage formats: CSR, COO, DIA, ELL, HYB

• Multi-fold performance differences between formats.
• No one format fits all (see the demonstration below).
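A small demonstration of the format question, assuming SciPy: the same matrix stored in different formats gives numerically identical SpMV results, so the choice is purely a performance question that depends on the sparsity pattern and the machine. (SciPy covers CSR, COO, and DIA; ELL and HYB are GPU-oriented formats, e.g. in cuSPARSE, without SciPy equivalents.)

```python
import numpy as np
import scipy.sparse as sp

# The same banded matrix in three storage formats: SpMV (A @ x) is numerically
# identical, but the best-performing format depends on the nonzero pattern and
# the machine.
n = 4096
A = sp.diags([np.ones(n - 1), 2.0 * np.ones(n), np.ones(n - 1)], offsets=[-1, 0, 1])
x = np.random.rand(n)

spmv = {fmt: getattr(A, "to" + fmt)() @ x for fmt in ("csr", "coo", "dia")}
assert all(np.allclose(y, spmv["csr"]) for y in spmv.values())
```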

Basic Idea

Treat the matrix as an image; use image-recognition methods for format selection.

[Diagram: just as a CNN classifies an image as a cat or a dog, a CNN classifies a sparse matrix into CSR, COO, DIA, ELL, or HYB.]

Special Challenge I

• Input representation: a fixed size is required.
• Image scaling does not work well.

Solution: binary matrix, density matrix, and histogram matrix representations (sketched below).
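A minimal sketch of one of these fixed-size representations, the density matrix, assuming NumPy/SciPy: the matrix is partitioned into a size × size grid of blocks, and each cell records the fraction of nonzeros in its block, so matrices of any shape map to the same input size. The function name and grid size are illustrative; the binary and histogram variants can be built analogously.

```python
import numpy as np
import scipy.sparse as sp

def density_matrix(A: sp.spmatrix, size: int = 64) -> np.ndarray:
    """Fixed-size 'density image' of a sparse matrix: split the matrix into a
    size x size grid of blocks and record the fraction of nonzeros per block.
    (An illustrative reading of the density-matrix representation.)
    """
    A = A.tocoo()
    rows = np.minimum(A.row * size // A.shape[0], size - 1)
    cols = np.minimum(A.col * size // A.shape[1], size - 1)
    img = np.zeros((size, size))
    np.add.at(img, (rows, cols), 1.0)          # count nonzeros per block
    block_cells = (A.shape[0] / size) * (A.shape[1] / size)
    return img / block_cells                   # same output size for any matrix shape
```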

Special Challenge II

• DNN structure design
• Early merging versus late merging (sketched below)
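One plausible reading of this design question, sketched in PyTorch: "early merging" stacks the binary/density/histogram representations as input channels of a single CNN, while "late merging" gives each representation its own convolutional branch and merges the features just before the classifier. The layer sizes, the three-representation input, and the five output formats are assumptions for illustration, not the network from the PACT'17 paper.

```python
import torch
import torch.nn as nn

class EarlyMerge(nn.Module):
    """All representations are merged at the input as channels of one CNN."""
    def __init__(self, reps: int = 3, num_formats: int = 5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(reps, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(8),
            nn.Flatten(), nn.Linear(16 * 8 * 8, num_formats))

    def forward(self, x):                      # x: (batch, reps, H, W)
        return self.net(x)

class LateMerge(nn.Module):
    """Each representation gets its own branch; features merge before the head."""
    def __init__(self, reps: int = 3, num_formats: int = 5):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                          nn.AdaptiveAvgPool2d(8), nn.Flatten())
            for _ in range(reps))
        self.head = nn.Linear(reps * 16 * 8 * 8, num_formats)

    def forward(self, x):                      # x: (batch, reps, H, W)
        feats = [b(x[:, i:i + 1]) for i, b in enumerate(self.branches)]
        return self.head(torch.cat(feats, dim=1))
```

Early merging shares all convolutional work across the representations; late merging spends more parameters so that each representation can learn its own features.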

Special Challenge III

• Architecture sensitivity
• The best format for a matrix differs across machines

Solution: transfer learning (sketched below).
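A minimal, hedged sketch of the transfer-learning step, assuming PyTorch and the illustrative EarlyMerge model above: freeze the feature layers learned from one machine's "best format" labels and retrain only a small classification head on labels collected on the new machine. The helper name and the frozen/retrained split are assumptions, not the paper's exact recipe.

```python
import torch.nn as nn

def adapt_to_new_machine(features: nn.Sequential, in_features: int, num_formats: int = 5):
    """Freeze transferred feature layers and attach a fresh classifier head
    for the target machine (illustrative helper, not the PACT'17 recipe)."""
    for p in features.parameters():
        p.requires_grad = False                        # keep transferred features fixed
    features[-1] = nn.Linear(in_features, num_formats) # new, trainable head
    return features

# e.g., reusing the EarlyMerge sketch above:
# model = adapt_to_new_machine(EarlyMerge().net, in_features=16 * 8 * 8)
```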

Compared to the Prior Decision-Tree-Based Method (PLDI'13)

Accuracy: boosted from 85% to 93% (9,200 matrices; formats: COO, CSR, DIA, ELL).

Benefits from Transfer Learning

[Chart: transferring from Xeon E5-4603 to Radeon A8-7600.]

Other Work: Egeria [SC'17]

A synthesizer of HPC advising tools.

[Pipeline: HPC documents → advising-sentence recognition → collection of advising sentences → relevance calculation against a user query → relevant advising sentences.]

Other Ongoing Efforts

• Emerging memory technology (w/ Intel; NSF award) [Micro'17]
• GPU program optimizations (Google Faculty Award; DOE Career Award; LLNL award) [ASPLOS'11, Micro'14, Micro'15, Micro'17]
• Speeding up DNN ensemble training (ORNL award) [Micro'17]
• Fast hyperparameter tuning for machine learning (ORNL award)

To Learn More

https://research.csc.ncsu.edu/picture/
Or google "Xipeng Shen".

Programming systems and intelligent computing for the future.