Reciprocity between HPC and Deep Learning
Xipeng Shen
Computer Science Department, North Carolina State University
About Me
• Programming Systems and Machine Learning
• Making computing more intelligent and efficient
Graduated
– 8 Ph.D.s
   – 4 assistant professors: Rutgers University (Zheng Zhang, 2012); UC Santa Barbara (Yufei Ding, 2017); UC Riverside (Zhijia Zhao, 2015); Colorado School of Mines (Bo Wu, 2014)
   – 4 in industry: Microsoft, Google, IBM, Qualcomm
– 3 recent master's students: Google, Amazon, Qualcomm
Generalized Strength Reduction for Enabling Algorithmic Optimizations
ICDM’17
Deep Learning, HPC
PLDI’17 VLDB’2015 ICML’15
Strength Reduction
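• A basic compiler concept: replace an expensive operation with a cheaper, equivalent one, e.g., b/2 → b>>1 (valid for unsigned or non-negative integer b).
• Traditional scope: only the instruction level.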
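A minimal Python sketch of the two textbook instruction-level rewrites, for readers new to the term: b/2 → b>>1, and a per-iteration multiplication in a loop replaced by a running addition. The function names and the address-computation loop are hypothetical illustrations, not code from the talk.

```python
def offsets_with_multiply(base, stride, n):
    """Element addresses computed with one multiplication per iteration."""
    return [base + i * stride for i in range(n)]


def offsets_strength_reduced(base, stride, n):
    """Same addresses, with the per-iteration multiply replaced by an addition."""
    out = []
    addr = base
    for _ in range(n):
        out.append(addr)
        addr += stride  # strength reduction: i * stride becomes a running sum
    return out


def halve(b):
    """b/2 -> b>>1 for a non-negative integer b.

    In C this rewrite is only safe for unsigned (or provably non-negative) b:
    signed division truncates toward zero, while right-shifting a negative
    value is implementation-defined (typically rounding toward -infinity).
    """
    assert b >= 0, "the shift form is only used for non-negative b here"
    return b >> 1


print(offsets_strength_reduced(100, 8, 5))
```

Both loop versions produce the same addresses; the reduced one trades n multiplications for n additions, which is exactly the "cheaper equivalent" substitution the slide describes.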
Our Goal: Generalize it with the Triangle Inequality (TI)
• Connection:
   – Replace expensive distance computations with cheaper bounds for comparisons.
• Challenges:
   – Minimize the overhead of applying TI while still avoiding distance calculations.
   – Extend the optimization to other algorithms.
TI Optimization vs. Strength Reduction
VLDB’2015, ICML’15
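To make the "cheaper bounds for comparisons" idea concrete, here is a minimal sketch of one standard TI-based pruning rule for nearest-center search (the rule used in Elkan-style K-means). It is a swapped-in textbook example, not the TOP framework's implementation; the function names are hypothetical, and `math.dist` requires Python 3.8+.

```python
import math


def pairwise(centers):
    """Precompute all center-to-center distances d(c_i, c_j)."""
    return [[math.dist(a, b) for b in centers] for a in centers]


def nearest_center_ti(x, centers, center_dist):
    """Return (index, distance, #distance computations) for x's nearest center.

    Pruning rule, from the triangle inequality:
        d(c_b, c_j) <= d(x, c_b) + d(x, c_j)
        => d(x, c_j) >= d(c_b, c_j) - d(x, c_b)
    so if d(c_b, c_j) >= 2 * d(x, c_b), c_j cannot be closer than c_b and the
    expensive distance d(x, c_j) is skipped.
    """
    best = 0
    best_d = math.dist(x, centers[0])
    computed = 1
    for j in range(1, len(centers)):
        if center_dist[best][j] >= 2 * best_d:
            continue  # a cheap bound comparison replaces a distance computation
        d = math.dist(x, centers[j])
        computed += 1
        if d < best_d:
            best, best_d = j, d
    return best, best_d, computed


centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
print(nearest_center_ti((1.0, 1.0), centers, pairwise(centers)))
```

In the usage line, the point (1, 1) is resolved to center 0 after a single distance computation: both other centers are pruned by the bound. The center-to-center table costs O(k²) to build, which pays off only when it is reused across many query points. Balancing that overhead against the savings is the first challenge listed above.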
[Figure: Speedup (X) of TOP and Yinyang K-Means over classic K-means (baseline); K-Means with K=1024; 16GB, 8-core Intel Ivy Bridge]
Code: link in the ICML’15 paper.
Clustering results are the same as the original method's.
[Figure: Speedups (X) and numbers of distance calculations, manually optimized vs. TOP-optimized versions, for KNN, KNN join, K-means, ICP, N-body, and P2P, with a reference line; Intel i5-4570 CPU, 8GB memory]
Average speedups: 50X vs. 20X; at least 93% of distance calculations are saved.
Angular Triangle Inequality (ATI)
Example angles: 120° and 10°.
ATI holds in spaces of any dimensionality greater than 1, and it always gives tighter bounds than TI does.
(Detailed proof in the PLDI’17 paper.)
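As a rough illustration of how angles can bound an expensive dot product, here is a minimal sketch. It assumes the inequality takes the standard form on angles between vectors, |θ_ab − θ_bc| ≤ θ_ac ≤ θ_ab + θ_bc, with b acting as a shared landmark. The PLDI’17 paper's exact formulation and its tightness comparison with TI are not reproduced here, and all function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def angle(u, v):
    """Angle in [0, pi] between two nonzero vectors."""
    c = dot(u, v) / (norm(u) * norm(v))
    return math.acos(max(-1.0, min(1.0, c)))  # clamp floating-point drift


def dot_bounds_via_landmark(norm_a, norm_c, theta_ab, theta_bc):
    """Lower/upper bounds on a.c from each vector's angle to a landmark b.

    Assumed angular inequality: |theta_ab - theta_bc| <= theta_ac <= theta_ab + theta_bc.
    cos is decreasing on [0, pi], so the largest admissible angle gives the
    lower bound and the smallest gives the upper bound.
    """
    smallest = abs(theta_ab - theta_bc)
    largest = min(math.pi, theta_ab + theta_bc)
    scale = norm_a * norm_c
    return scale * math.cos(largest), scale * math.cos(smallest)


a = (1.0, 0.0)
b = (math.cos(math.pi / 6), math.sin(math.pi / 6))  # 30 degrees
c = (math.cos(math.pi / 3), math.sin(math.pi / 3))  # 60 degrees
print(dot_bounds_via_landmark(norm(a), norm(c), angle(a, b), angle(b, c)), dot(a, c))
```

With a at 0°, b at 30°, and c at 60°, the bounds are approximately (0.5, 1.0) and the true dot product is 0.5. If angles to the landmark are cached, such a bound can sometimes settle a comparison without computing a·c.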
[Figure: Speedup (X) on training over a cuBLAS baseline, on GTX980 and TitanX, for the datasets Mnist, f-Mnist, Cal01, Newsgroup, and Micro-norb]
Problem
SpMV: the core of many HPC applications.
Sparse matrix storage formats: CSR, COO, DIA, ELL, HYB, …
Multi-fold performance differences across formats.
No one format fits all matrices.
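For readers unfamiliar with the formats, here is a minimal sketch of SpMV (y = A·x) in two of them, COO and CSR. It only shows how the layouts differ; the function names are hypothetical, and pure Python says nothing about the hardware-level performance differences the slide refers to.

```python
def dense_to_coo(A):
    """COO: parallel lists of (row, col, value) for every nonzero."""
    rows, cols, vals = [], [], []
    for i, row in enumerate(A):
        for j, v in enumerate(row):
            if v != 0:
                rows.append(i)
                cols.append(j)
                vals.append(v)
    return rows, cols, vals


def coo_to_csr(n_rows, rows, cols, vals):
    """CSR: row_ptr[i]..row_ptr[i+1] delimits row i's entries in col_idx/data."""
    row_ptr = [0] * (n_rows + 1)
    for r in rows:
        row_ptr[r + 1] += 1  # count nonzeros per row
    for i in range(n_rows):
        row_ptr[i + 1] += row_ptr[i]  # prefix sum -> start offset of each row
    col_idx = [0] * len(vals)
    data = [0] * len(vals)
    nxt = row_ptr[:-1]  # slicing copies: next free slot for each row
    for r, c, v in zip(rows, cols, vals):
        col_idx[nxt[r]] = c
        data[nxt[r]] = v
        nxt[r] += 1
    return row_ptr, col_idx, data


def spmv_coo(n_rows, rows, cols, vals, x):
    y = [0] * n_rows
    for r, c, v in zip(rows, cols, vals):
        y[r] += v * x[c]  # scattered updates, one per nonzero
    return y


def spmv_csr(row_ptr, col_idx, data, x):
    y = []
    for i in range(len(row_ptr) - 1):
        s = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += data[k] * x[col_idx[k]]  # one contiguous run per row
        y.append(s)
    return y


A = [[4, 0, 0, 1], [0, 0, 2, 0], [0, 3, 0, 0], [5, 0, 0, 6]]
coo = dense_to_coo(A)
print(spmv_coo(4, *coo, [1, 2, 3, 4]), spmv_csr(*coo_to_csr(4, *coo), [1, 2, 3, 4]))
```

CSR walks each row's nonzeros contiguously and writes each y[i] once. COO stores an explicit row index per nonzero and scatters its updates. Which access pattern wins depends on the matrix's nonzero structure and on the machine, which is why format selection is a prediction problem.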
Basic Idea
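Treat the matrix as an image and use image-recognition methods (CNNs) for format selection.
[Figure: just as a CNN labels an image as "cat" or "dog", a CNN labels a sparse matrix with a storage format: CSR, COO, DIA, ELL, HYB, …]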
Special Challenge I
• Input representation: the CNN requires a fixed input size.
• Image scaling does not work well.
Solution: binary matrix, density matrix, histogram matrix.
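One simple way to obtain fixed-size binary and density inputs from a matrix of any shape is to partition it into a size × size grid of blocks and summarize each block. The sketch below does that under stated assumptions: blocks are formed by proportional index mapping, and density is normalized by the total nonzero count. The paper's exact construction and normalization may differ, the histogram representation is not shown, and the function name is hypothetical.

```python
def block_representations(n_rows, n_cols, rows, cols, size=4):
    """Map a sparse matrix of any shape to fixed size x size inputs.

    rows/cols are the coordinates of the nonzeros. Returns:
      density: fraction of all nonzeros falling in each block (assumed normalization)
      binary:  1 if a block contains any nonzero, else 0
    """
    counts = [[0] * size for _ in range(size)]
    for r, c in zip(rows, cols):
        # Proportional mapping: row r in [0, n_rows) -> block row in [0, size)
        counts[r * size // n_rows][c * size // n_cols] += 1
    total = len(rows) or 1  # avoid dividing by zero for an empty matrix
    density = [[k / total for k in line] for line in counts]
    binary = [[1 if k else 0 for k in line] for line in counts]
    return density, binary


# An 8 x 8 diagonal matrix becomes a 4 x 4 input with 0.25 on the block diagonal.
density, binary = block_representations(8, 8, list(range(8)), list(range(8)))
print(binary)
```

Unlike naive image scaling, every nonzero is counted somewhere, so sparse structure such as a band near the diagonal survives the reduction to a fixed size.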
Special Challenge III
• Architecture sensitivity
• For the same matrix, the best format differs across machines.
Solution: transfer learning.
Compared to the Prior DT-based Method
Accuracy: 85% (PLDI’13 DT-based method) boosted to 93% (9,200 matrices; formats: COO, CSR, DIA, ELL).
Other Work: Egeria [SC’17]
[Figure: HPC documents → advising sentence recognition → collection of advising sentences; user query → relevance calculation → relevant advising sentences]
A synthesizer of HPC advising tools.
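The slide does not describe how Egeria computes relevance. As a hedged illustration of what a "relevance calculation" step between a user query and already-recognized advising sentences could look like, here is a standard bag-of-words cosine-similarity ranker. It is a swapped-in, generic technique, not Egeria's method, and the function names and example sentences are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts; tokens are runs of letters and digits."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    num = sum(a[w] * b[w] for w in a if w in b)
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0


def rank_advice(query, sentences, top_k=3):
    """Return up to top_k sentences with nonzero similarity, most relevant first."""
    q = bag_of_words(query)
    scored = [(cosine(q, bag_of_words(s)), s) for s in sentences]
    scored.sort(key=lambda t: t[0], reverse=True)  # stable: ties keep input order
    return [s for score, s in scored[:top_k] if score > 0]


advice = [
    "Use shared memory to reduce global memory traffic.",
    "Avoid divergent branches within a warp.",
    "Coalesce global memory accesses for better bandwidth.",
]
print(rank_advice("how to reduce global memory traffic", advice))
```

For this query, the shared-memory sentence ranks first, the coalescing sentence second, and the branch-divergence sentence is dropped because it shares no words with the query. A real tool would need stronger text representations than raw word overlap.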
Other Ongoing Efforts
• Emerging memory technology: with Intel; NSF award. [Micro’17]
• GPU program optimizations: Google Faculty Award; DOE Career Award; LLNL award. [ASPLOS’11, Micro’14, Micro’15, Micro’17]
• Speeding up DNN ensemble training: ORNL award. [Micro’17]
• Fast hyperparameter tuning for machine learning: ORNL award.