0 copy Copyright 2017 FUJITSU
Fujitsu Forum 2017
1 copy Copyright 2017 FUJITSU
Designing for intensity parallelism from analytics to AI
Ian Godfrey
Director of the Solutions Business for Fujitsu Systems Europe
Manju Annie Oommen
Global Product Marketing Manager Fujitsu
2 copy Copyright 2017 FUJITSU
Agenda
1 HPC Diversifies
2 What Changed over the years
3 Similarities between HPC and Deep Learning optimization
4 Co-creating solutions
5 Q & A
3 copy Copyright 2017 FUJITSU
HPC Diversifies: Hunger for compute power
Increasing connected devices worldwide
Size of digital universe increasing
Driving more applications
2016: 6.4Bn devices, 10 zettabytes, 1000s of apps
>2020: 28Bn devices, 180 zettabytes, 20K new apps
10 times more data to be generated by 2025
Emergence of High Performance Data Analytics (HPDA)
Fraud and anomaly detection: Identifying harmful or potentially harmful patterns and causes using graph analysis, semantic analysis, or other high-performance analytics techniques in real time
Marketing: Promoting products or services using complex algorithms to discern potential customers' demographics, buying preferences and habits
Business intelligence: Uses HPDA to identify opportunities to advance the market position and competitiveness of businesses by better understanding themselves, their competitors, and the evolving dynamics of the markets they participate in
Other commercial HPDA: An example of such a high-potential workload is the use of HPDA to manage large IT infrastructures, ranging from on-premise data centers to public clouds and Internet-of-Things (IoT) infrastructures, which involves solving complex problems
Existing HPC users
• Intelligence community, FSI
• Data-driven science/engineering (e.g. biology)
• Knowledge discovery
• ML/DL, cognitive, AI
New commercial users
• Fraud/anomaly detection
• Business intelligence
• Affinity marketing
• Personalized medicine
Customer benefits
• Fastest processing/transformation of large-volume data
• Real-time analysis to extract invisible insight from the data
• Accelerated deep-learning technology by GPU computation
HPDA to grow robustly to be a $54Bn market
Source: Information from analysts and various telecommunication firms
4 copy Copyright 2017 FUJITSU
Neural Networks are Old - What changed?
Scale drives deep learning progress
Availability of
• More Data
• Faster Compute/Hardware
• Better Algorithm
Best results are obtained by training a large neural network and/or by feeding in more data
Repetitive Training
History
1943 First electrical model of neural network
1958 Perceptron
1986 Backpropagation
1990s Convolutional Networks (LeCun)
2006 Deep Belief Network (Hinton)
2013/14 Google buys DeepMind
HPC speeding up Deep learning Research
5 copy Copyright 2017 FUJITSU
What does deep learning deal with?
Deep Learning is the machine's perception of:
Images
• Faces
• Self driving
Sound
• Voice search
• Music Gen
• Translation
Text
• CRM
• Search +
• Ads
Time Series
• Health data
• Sensors
• Finance
ARTIFICIAL INTELLIGENCE: A program that can sense, reason, act and adapt
MACHINE LEARNING: Algorithms whose performance improves when exposed to more data over time
DEEP LEARNING: Multi-layered neural networks learn from vast amounts of data
Unsupervised Learning
Supervised Learning
Cluster Analysis, Time Series, Unstructured
Convolutional Neural Network (CNN)
Recurrent Neural Network (RNN)
RNN + Long Short-Term Memory (LSTM)
Reinforcement Learning
6 copy Copyright 2017 FUJITSU
Industry segmentation and use cases
Healthcare
• Pharmaceutical
• Genomics
• Imagery and medical diagnostic
Marketing Automation
• CRM
• Market Classification
• Demand Prediction
• Document Generation
• Enterprise Resource Planning
• Predictive Maintenance/Analysis
• Machine transcription
• Machine translation
Defense and Social Security
• Surveillance and Security
• Cyber security
• Image recognition
• Motion detection
Transport/Logistics
• Autonomous cars
• Motion detection
• Networked car/Co-ordinated traffic
• Commercial Drones
• Optimized route
Consumer/e-commerce/Retail
• Sentiment Analysis
• Classification
• Recommendation engine
• Demand prediction
• Automated consulting
• Search
• Emails
• Personalization
• Smart Assistant
• Chatbots
Others
• Education
• Fintech
• Gaming
• Telco
• Media
Manufacturing/Industrial
7 copy Copyright 2017 FUJITSU
Industry wide presence of Deep Learning
Sector wise
Manufacturing 43%
Distribution 26%
Public Sector 18%
Financial 9%
Social Infra 4%
Application wise
Call center 28%
Knowledge Utilization 20%
Manufacturing 16%
Demand Prediction 13%
Fintech 9%
Maintenance 8%
Healthcare 6%
Source: Based on projects & PoCs in Fujitsu
"Artificial Intelligence is the new Electricity..." (Andrew Ng)
DL is not a vertical market. It is more akin to an algorithm or method of computation, like an FFT.
Intersect360 Research tracks AI (including deep learning, machine learning, cognitive computing, etc.) as part of the hyperscale market
Similar to, but distinct from, HPC
Low precision, intensely parallel, strong affinity to public cloud
Cloud providers and end users are in early stages of investment for their applications
AI may become a pervasive technology that is embedded in non-hyperscale manifestations
8 copy Copyright 2017 FUJITSU
Fujitsu shaping HPC Diversification
9 copy Copyright 2017 FUJITSU
HPC the foundation to accelerating AI technology
& FX100 for simulation and pre-processing technology
Zinrai Deep Learning & DLU for a high-speed learning environment
Digital Annealer for combinatorial optimal solutions
Quantum Computing
Deep Learning
HPC
10 copy Copyright 2017 FUJITSU
Proximity in AI and HPC
HPC
AI/DL
Hyperscale
Supercomputing
Multi-node
11 copy Copyright 2017 FUJITSU
Characterising Performance Computing
Computational scope
Primary focus is performance
Compute-intensive algorithms
Maths solvers
Applications arbitrarily scalable
Is still "HPC" on only a few nodes - there is entry-level HPC
Largest supercomputers are >$100 million
Customer usage
Problem-solving
Data Analysis
Scientific Simulation
Technical Modelling
Virtual Prototyping
Top tier users push boundaries and influence technology throughout industry
12 copy Copyright 2017 FUJITSU
Convolutional Neural Network Breakthrough
Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet classification with deep convolutional neural networks. In: NIPS (2012)
2012: Deep DNN first blood
2013: Network in Network
2014: Deeper Network
"One GPU runs the layer-parts at the top of the figure while the other runs the layer-parts at the bottom. The GPUs communicate only at certain layers. The network's input is 150,528-dimensional, and the number of neurons in the network's remaining layers is given by 253,440-186,624-64,896-64,896-43,264-4096-4096-1000."
Use of 2 GPUs - model parallelism (each GPU runs part of every layer, as the caption describes)
13 copy Copyright 2017 FUJITSU
Neural Network starting point
$$\mathrm{Act}_L[j] = \sigma\Big(\sum_i \mathrm{Act}_{L-1}[i] \times W_L[i][j] + \mathrm{Bias}_L[j]\Big)$$
Inputs: $\mathrm{Act}_{L-1}[1]$, $\mathrm{Act}_{L-1}[2]$, $\mathrm{Act}_{L-1}[3]$
Weights: $W_L[1][1]$, $W_L[2][1]$, $W_L[3][1]$
$\sigma$: Activation function, e.g. tanh, ReLU
Feed-forward network
3 neurons, 1 hidden layer
Fundamental multiply-add structure
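The multiply-add structure of a single layer can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names `layer_forward` and `relu`, and all numeric values, are made up for the example and do not come from the slides.

```python
import math


def relu(z):
    """Rectified linear unit, one of the activation functions named on the slide."""
    return max(0.0, z)


def layer_forward(act_prev, weights, bias, sigma=math.tanh):
    """Compute Act_L[j] = sigma(sum_i Act_{L-1}[i] * W_L[i][j] + Bias_L[j])."""
    act = []
    for j in range(len(bias)):
        z = bias[j]
        for i, a in enumerate(act_prev):
            z += a * weights[i][j]  # the fundamental multiply-add
        act.append(sigma(z))
    return act


# Hypothetical values: three inputs feeding one neuron.
act_prev = [1.0, 2.0, 3.0]
weights = [[0.5], [-1.0], [0.25]]  # W_L[i][j], shape 3 x 1
bias = [1.0]
out = layer_forward(act_prev, weights, bias, sigma=relu)
print(out)  # [0.25]
```

The inner loop over `i` is the part that vectorises: it is a dot product, the same pattern that dominates dense linear algebra.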
14 copy Copyright 2017 FUJITSU
Vectorisation in Linear Algebra
Core intensive code in Linpack benchmark
c     dgefa: row elimination with column indexing
      do 30 j = kp1, n
         t = a(l,j)
         if (l .eq. k) go to 20
            a(l,j) = a(k,j)
            a(k,j) = t
   20    continue
         call daxpy(n-k,t,a(k+1,k),1,a(k+1,j),1)
   30 continue
c     dgesl: back substitution
      do 40 kb = 1, n
         k = n + 1 - kb
         b(k) = b(k)/a(k,k)
         t = -b(k)
         call daxpy(k-1,t,a(1,k),1,b(1),1)
   40 continue
c     daxpy: dy = dy + da*dx with increments incx, incy
      do 10 i = 1,n
         dy(iy) = dy(iy) + da*dx(ix)
         ix = ix + incx
         iy = iy + incy
   10 continue
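For readers less familiar with Fortran, the daxpy kernel that both Linpack loops call can be sketched in Python as follows. This is a simplified illustration, not the reference BLAS routine: it assumes non-negative strides and 0-based indexing, whereas the reference implementation also handles negative increments.

```python
def daxpy(n, da, dx, incx, dy, incy):
    """Update dy <- dy + da * dx in place over n elements.

    Simplified sketch: strides incx and incy are assumed non-negative.
    """
    if n <= 0 or da == 0.0:
        return dy
    ix = 0
    iy = 0
    for _ in range(n):
        dy[iy] += da * dx[ix]  # one multiply-add per element
        ix += incx
        iy += incy
    return dy


x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
daxpy(3, 2.0, x, 1, y, 1)
print(y)  # [12.0, 24.0, 36.0]
```

Each iteration is independent of the others, which is why this loop maps well onto SIMD units.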
Fujitsu K computer
Source: https://www.top500.org/lists/2017/06/
15 copy Copyright 2017 FUJITSU
Network Illustration
Source Nervana
$W_{i \to j}$: 784 × 100
$b_j$: 100
$W_{i \to j}$: 100 × 10
$b_j$: 10
Total parameters
Cost function: $c(\mathrm{output}, \mathrm{truth})$
N = 28 x 28 pixels = 784 input units
N = 100 hidden units (user-defined parameter)
N = 10 output units (one for each digit)
Each unit i encodes the probability of the input image being of the digit i
Fully connected network, convolution not present for now
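The total parameter count is not legible in the extracted slide text, but it follows from the layer shapes listed above. A small sketch of the arithmetic, with `count_parameters` as a hypothetical helper name:

```python
def count_parameters(sizes):
    """Count weights and biases of a fully connected network.

    Each pair of adjacent layers contributes n_in * n_out weights
    plus n_out biases.
    """
    total = 0
    for n_in, n_out in zip(sizes, sizes[1:]):
        total += n_in * n_out + n_out
    return total


layer_sizes = [784, 100, 10]  # input, hidden, output units from the slide
total = count_parameters(layer_sizes)
print(total)  # 79510
```

That is 78,400 + 100 parameters for the hidden layer and 1,000 + 10 for the output layer.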
16 copy Copyright 2017 FUJITSU
CNN Computing Operations
Dense Matrix Multiplies
Recurrent Layers
Convolutions
All-Reduce
Deep Learning ingredients
1 Randomly seed weights
2 Forward-pass
3 Cost
4 Backward-pass
5 Update weights
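The five ingredients can be seen together in a deliberately tiny training loop. The sketch below assumes a single linear neuron, a squared-error cost and plain gradient descent; the model, the data and the name `train` are illustrative choices, not taken from the slides.

```python
import random


def train(samples, lr=0.5, epochs=200, seed=0):
    """Fit y = w * x + b by gradient descent, following the five steps."""
    rng = random.Random(seed)
    # 1. Randomly seed weights
    w = rng.uniform(-1.0, 1.0)
    b = rng.uniform(-1.0, 1.0)
    history = []
    n = len(samples)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        cost = 0.0
        for x, target in samples:
            # 2. Forward pass
            y = w * x + b
            # 3. Cost (half squared error)
            err = y - target
            cost += 0.5 * err * err
            # 4. Backward pass: gradients of the cost w.r.t. w and b
            grad_w += err * x
            grad_b += err
        # 5. Update weights with the averaged gradient
        w -= lr * grad_w / n
        b -= lr * grad_b / n
        history.append(cost / n)
    return w, b, history


# Hypothetical data generated from y = 2x + 1.
samples = [(x, 2.0 * x + 1.0) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
w, b, history = train(samples)
```

In a real network, steps 2 and 4 are dominated by the dense matrix multiplies and convolutions listed above, and step 5 is where an all-reduce is needed when training is distributed.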
17 copy Copyright 2017 FUJITSU
Parallelisation Hierarchy
Vectorisation - Is SIMD parallelism used well?
Scalar tuning - What happens in the pipeline?
Memory - Is cache usage maximised or RAM access streamlined?
Threading - Do cores cooperate efficiently?
Communication - Can coordination in a distributed or heterogeneous system be improved?
18 copy Copyright 2017 FUJITSU
Naïve Nested Loops in CNN Algorithms
Forward Propagation
Backward Propagation Convolution
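The loop nests on this slide did not survive extraction. As a rough idea of what a naïve forward convolution looks like, here is a sketch assuming stride 1, no padding and the cross-correlation form commonly used in deep learning frameworks; the function name, the index order and the example values are assumptions, and the backward pass is not shown.

```python
def conv2d_forward(inp, kernels, bias):
    """Naive forward convolution.

    inp[c][y][x]        : input with C channels
    kernels[k][c][r][s] : K filters of size R x S over C channels
    bias[k]             : one bias per filter
    Returns out[k][y][x] with stride 1 and no padding.
    """
    C, H, W = len(inp), len(inp[0]), len(inp[0][0])
    K, R, S = len(kernels), len(kernels[0][0]), len(kernels[0][0][0])
    out_h, out_w = H - R + 1, W - S + 1
    out = [[[0.0 for _ in range(out_w)] for _ in range(out_h)] for _ in range(K)]
    for k in range(K):                      # output channels
        for y in range(out_h):              # output rows
            for x in range(out_w):          # output columns
                acc = bias[k]
                for c in range(C):          # input channels
                    for r in range(R):      # kernel rows
                        for s in range(S):  # kernel columns
                            acc += inp[c][y + r][x + s] * kernels[k][c][r][s]
                out[k][y][x] = acc
    return out


# Hypothetical 1-channel 3x3 input and a single 2x2 filter.
image = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]]
filters = [[[[1.0, 0.0], [0.0, 1.0]]]]
result = conv2d_forward(image, filters, [0.0])
print(result)  # [[[6.0, 8.0], [12.0, 14.0]]]
```

Six nested loops with a multiply-add at the core leave plenty of room for the vectorisation, cache blocking and threading questions raised in the parallelisation hierarchy.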
19 copy Copyright 2017 FUJITSU
A short word on Tensors
Tensors are systems of components organized by one or more indices that transform according to specific rules under a set of transformations
The number of indices is called the rank of the tensor
Tensor rank 0 is a scalar
Tensor rank 1 is a vector
Tensors are important in many areas of physics (general relativity, electromagnetic theory)
In N-dimensional space a tensor of rank n has N^n components
Transformation rules are independent of choice of reference frame - ideal for expressing universal physical laws
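The component count can be checked by enumerating index tuples: a rank-n tensor in N-dimensional space has one component per combination of n indices, each running over N values, which gives N^n. A small sketch, with `tensor_components` as a hypothetical helper name:

```python
from itertools import product


def tensor_components(N, n):
    """List every index tuple of a rank-n tensor in N-dimensional space."""
    return list(product(range(N), repeat=n))


# Ranks 0 to 3 in 3-dimensional space.
counts = {n: len(tensor_components(3, n)) for n in range(4)}
print(counts)  # {0: 1, 1: 3, 2: 9, 3: 27}
```

Rank 0 yields the single empty index tuple (a scalar), and rank 1 yields three components (a vector), matching the two cases named on the slide.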
20 copy Copyright 2017 FUJITSU
Optimised Functions
Software Libraries
Tensor functions hand-coded for CPUs or GPUs
Intel MKL-DNN
Emergence of dedicated processing units and ISAs
Tensor Arithmetic in hardware
21 copy Copyright 2017 FUJITSU
Multi-threading CNN Training
1 thread
4 threads
16 threads
64 threads
Training on CIFAR-10 with Intel-Caffe, 1000 iterations, Full Solver
Dataset consists of 60,000 32x32 colour images in 10 classes, with 6,000 images per class: 50,000 training images and 10,000 test images
22 copy Copyright 2017 FUJITSU
MPI Parallelism in CFD
Global model decomposed into 8 balanced MPI domains
Halo at interface between domains
Domain surfaces adapted to cell weights
Communicate between processes with MPI primitives: MPI_Send, MPI_Recv, MPI_Wait, MPI_Alltoall, MPI_Allreduce, MPI_Barrier
23 copy Copyright 2017 FUJITSU
MPI in Deep Learning
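Only the title of this slide survives in the extracted text. One common way MPI is used in deep learning, and the reason All-Reduce appears among the computing operations earlier in the deck, is data-parallel training: each rank computes gradients on its own share of the data, and an all-reduce leaves every rank with the same summed gradient. The sketch below simulates that pattern in a single process; it does not call any MPI library, and the function names and numbers are assumptions made for illustration.

```python
def allreduce_sum(per_rank_values):
    """Simulate a sum all-reduce: every rank receives the element-wise total."""
    length = len(per_rank_values[0])
    total = [sum(vals[i] for vals in per_rank_values) for i in range(length)]
    return [list(total) for _ in per_rank_values]


def data_parallel_step(weights, per_rank_grads, lr):
    """Apply one synchronous update on every rank using averaged gradients."""
    n_ranks = len(per_rank_grads)
    reduced = allreduce_sum(per_rank_grads)
    replicas = []
    for rank in range(n_ranks):
        avg = [g / n_ranks for g in reduced[rank]]
        replicas.append([w - lr * g for w, g in zip(weights, avg)])
    return replicas


# Hypothetical gradients computed by 4 ranks on different mini-batches.
weights = [1.0, -2.0]
per_rank_grads = [[0.5, 1.0], [1.5, -1.0], [1.0, 2.0], [1.0, 2.0]]
replicas = data_parallel_step(weights, per_rank_grads, lr=0.5)
print(replicas[0])  # [0.5, -2.5]
```

Because every rank applies the same averaged gradient, the model replicas stay identical after each step, much as halo exchange keeps neighbouring CFD domains consistent.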
24 copy Copyright 2017 FUJITSU
MPI Parallel Performance
25 copy Copyright 2017 FUJITSU
AI evolution driving CPU and GPU releases
[Chart: processors positioned by performance; axis labels: Datacenter, Edge, Cloud; Inference, Training]
Intel® Xeon Phi™ Processor, Knights Mill
Intel® Xeon® Processor, Skylake
Intel® Xeon® Processor + FPGA
Intel® Lake Crest, Deep neural network processor
Intel® Nervana
NVIDIA Tesla P4/P40
NVIDIA Drive PX
Google TPU
NVIDIA Pascal 100
FPGA SoC (Intel/Xilinx)
FUJITSU PRIMERGY CX600
K Computer
26 copy Copyright 2017 FUJITSU
Fujitsu Gateway - Intelligent Application Platform
Cloud Services
Cloud bursting - Gateway
On premise cloud - UNCAI
Artificial Intelligence
Smart City Surveillance
Manufacturing process optimisation
HPC for Data Analytics, based on PRIMERGY with Parallel File System
Reference Architecture
Products and Solutions
CELSIUS
Intel & Mellanox Cluster Interconnect
NVIDIA GPGPU
PRIMERGY RX2540 M4, SKL based
PRIMEFLEX for HPC
Solutions
Products
CX600 M1, KNL/KNM based
Entry ETERNUS storage
Cloud
PRIMERGY RX2530 M4, SKL based
High-end ETERNUS storage
NetApp storage
DDN storage
Workgroup, Departmental, Data Center
Liquid Cooling + immersion cooling, FY2018
CX400 M4, SKL based
CX2550 M4, HPC
CX2570 M4, GPU
CX2580 M4, FPGA
Engineering Cloud
Industry 4.0
MONOZUKURI
27 copy Copyright 2017 FUJITSU
New PRIMEFLEX Options
Reference designs defined for AI Deep Learning frameworks
PRIMEFLEX configuration tool provided for fast definition of a complete solution
PRIMEFLEX for HPC Integrated Solutions incorporate Fujitsu Intelligent Application Platform as the application platform within the software stack
Reference architecture for off-premise
Cloud-bursting capability
28 copy Copyright 2017 FUJITSU
DL/HPC trends
DL opportunity represents 6-7% of Hyperscale Market
Speculative figure, likely 100% y/y growth
DL is not a vertical market
It is more akin to an algorithm or method of computation, like an FFT
AI/DL exists in proximity to HPC
Driven by same architectural objective - performance and scale
Converged math and programming methodologies
Technological cross-fertilization
• Software: compilers, libraries, tools
• Hardware: processors, memory, interconnect
Source: Intersect360 Research, 2016
29 copy Copyright 2017 FUJITSU
Summary
Combine algorithmic expertise on HPC and ML
Fujitsu has the rare capability to combine technologies & provide a fully optimized solution
Shape of a network is subject to skilled programming
Optimise through algorithmic and modelling discoveries
Relationship between depth and result quality remains largely empirical
AI usage is primarily on Cloud today
Customers look for simplified integrated solutions
30 copy Copyright 2017 FUJITSU
31 copy Copyright 2017 FUJITSU
Deep Learning Networks
Image Identity
32 copy Copyright 2017 FUJITSU
Unsupervised Learning
Genome, Market Segmentation, Fraud Detection, Astronomical data analysis, Google News
1 copy Copyright 2017 FUJITSU
Designing for intensity parallelism from analytics to AI
Ian Godfrey
Director of the Solutions Business for Fujitsu Systems Europe
Manju Annie Oommen
Global Product Marketing Manager Fujitsu
2 copy Copyright 2017 FUJITSU
Agenda
HPC Diversifies
1
Co-creating solutions
4
Q amp A
5
Similarities between HPC and Deep
Learning optimization
3
What Changed over the years
2
3 copy Copyright 2017 FUJITSU
HPC Diversifies Hunger for compute power
Increasing connected devices worldwide
Size of digital universeincreasing
Driving more applications
64Bn Devices
10 Zettabytes
1000s of apps
2016
28Bn Devices
180 Zettabytes
20K New apps
gt2020
10 times more data to be generated by 2025 Emergence of High Performance Data Analytics
Fraud and anomaly detectionIdentifying harmful potentially harmful patterns and causes using graphical semantic analysis or other high performance analytics techniques real time
MarketingPromote products or services using complex algorithms to discern potential customers demographics buyingpreferences and habits
Business intelligenceUses HPDA to identify opportunities to advance the market position and competitiveness of businesses by better understanding themselves their competitors and the evolving dynamics of the markets they participate in
Other Commercial HPDAAn example of such a high-potential workload is the use of HPDA to manage large IT infrastructures ranging from on premise data centers to public clouds and Internet-of-Things (IoT) Infrastructures- involves solving complex problems
Existing HPC usersbull Intelligence
community FSIbull Data-driven
scienceengineering (eg biology)
bull Knowledge discovery
bull MLDL cognitive AI
New commercial users
bull Fraudanomaly detection
bull Business intelligence
bull Affinity marketingbull Personalized
medicine
Fastest processingtransformationof large volume data
Real-time analysisto extract invisible insight from the data
Accelerated deep-learning technologyby GPU computation
HPDA to grow robustly to be a $54Bn market
Cust
om
er
be
ne
fits
2
3
1
Source Information from analysts and various tele communication firms
4 copy Copyright 2017 FUJITSU
Neural Networks are Old ndash What changed
Scale drives deep learning progress
Availability of
More Data
Faster ComputeHardware
Better Algorithm
Best results are obtained by training a large neural network orand by feeding in more data
RepetitiveTraining
His
tory
1943 First electrical model of neural network
1958 Perceptron
1986 Backpropogation
1990s Convolutional Networks (LeCun)
2006 Deep Belief Network (Hinton)
201314 Google buys Deep Mind
HPC speeding up Deep learning Research
5 copy Copyright 2017 FUJITSU
What does deep learning deal with
Deep Learning
Dee
p L
earn
ing
is t
he
mac
hin
ersquos
per
cep
tio
n o
f Imagesbull Facesbull Self driving
Soundbull Voice searchbull Music Genbull Translation
Textbull CRMbull Search +bull Ads
Time Seriesbull Health databull Sensorsbull Finance
ARTIFICIAL INTELLIGENCEA program that can sense reasons act and adapt
MACHINE LEARNINGAlgorithms whose performance improve when
exposed to more data over time
DEEP LEARNINGMulti-layered neural networks learn from
vast amounts of data
Unsupervised LearningSupervised Learning
Cluster Analysis Time Series Unstructured
Convolutional Neural Network(CNN)
Recurrent Neural Network(RNN)
RNN+ Long-short term Memory(LSTM)
Reinforcement Learning
6 copy Copyright 2017 FUJITSU
Industry segmentation and use cases
Healthcare
bull Pharmaceuticalbull Genomicsbull Imagery and medical
diagnostic
Marketing Automation
bull CRMbull Market Classificationbull Demand Predictionbull Document Generation
bull Enterprise Resource Planning
bull Predictive MaintenanceAnalysis
bull Machine transcriptionbull Machine translation
Defense and Social Security
bull Surveillance and Security
bull Cyber securitybull Image recognitionbull Motion detection
Consumere-commerceRetail
TransportLogistics
bull Autonomous carsbull Motion detectionbull Networked carCo-
ordinated trafficbull Commercial Dronesbull Optimized route
bull Sentiment Analysisbull Classificationbull Recommendation enginebull Demand predictionbull Automated consulting
bull Search bull Emailsbull Personalizationbull Smart Assistantbull Chatbots
Others
bull Educationbull Fintechbull Gamingbull Telcobull Media
Manufacturing Industrial
7 copy Copyright 2017 FUJITSU
Industry wide presence of Deep Learning
Social Infra4 Financial
9
Public Sector18
Distribution26
Manufacturing43
Sector wise
Call center28
Knowledge Utilization
20
Manufacturing16
Demand Prediction
13
Maintenance 8
Fintech9
Healthcare6
Application wise
Source Based on projects amp PoCs in Fujitsu
Artificial Intelligence is the new ElectricityhellipAndrew Ng
DL is not a vertical market It is more akin to an algorithm or method of computation like an FFT
Intersect360 Research tracks AI (including deep learning machine learning cognitive computing etc) as part of the hyper scale market
Similar to but distinct from HPC
Low precision intensely parallel strong affinity to public cloud
Cloud providers and end users are in early stages of investment for their applications
AI may become a pervasive technology that is embedded in non-hyperscale manifestations
8 copy Copyright 2017 FUJITSU
Fujitsu shaping HPC Diversification
9 copy Copyright 2017 FUJITSU
HPC the foundation to accelerating AI technology
ampFX100
for simulation andpre-processing technology
Zinrai Deep Learning amp DLUfor a high-speed learning environment
Digital Annealerfor combinatorial optimal solutions
Quantum
Computing
Deep
Learning
HPC
10 copy Copyright 2017 FUJITSU
Proximity in AI and HPC
HPC AIDL
HyperscaleSupercomputing
Multi-node
11 copy Copyright 2017 FUJITSU
Characterising Performance Computing
Computational scope Customer usage
Primary focus is performance
Compute-intensive algorithms
Maths solvers
Applications arbitrarily scalable
Is still ldquoHPCrdquo on only a few nodes ndash there is entry-level HPC
Largest supercomputers are gt$100 million
Problem-solving
Data Analysis
Scientific Simulation
Technical Modelling
Virtual Prototyping
Top tier users push boundaries and influence technology throughout industry
12 copy Copyright 2017 FUJITSU
Convolutional Neural Network Breakthrough
Krizhevsky A Sutskever I Hinton G Imagenet classification with deep convolutional neural networks In NIPS (2012)
Deeper Network
in Network
Deep DNN first blood
One GPU runs the layer-parts at the top of the figure while the other runs the layer-parts
at the bottom The GPUs communicate only at certain layers The networkrsquos input is 150528-
dimensional and the number of neurons in the networkrsquos remaining layers is given by 253440ndash
186624ndash64896ndash64896ndash43264ndash4096ndash4096ndash1000
2014 2013 2012
Use of 2 GPUs ndash data parallelism
13 copy Copyright 2017 FUJITSU
Neural Network starting point
119860119888119905 119871 119895 = 120590 119860119888119905 119871 minus 1 119894 119909 119882 119871 119894 119895 + 119861119894119886119904 119871 [119895]
119860119888119905 119871 minus 1 1
119860119888119905 119871 minus 1 2
119860119888119905 119871 minus 1 3
119882 119871 1][1
119882 119871 3][1
119882 119871 2][1 120590
Activation function
eg tanh ReLu
Weight
Feed-forward network
3 neurons 1 hidden layer
Fundamental multiply-add structure
14 copy Copyright 2017 FUJITSU
Vectorisation in Linear Algebra
Core intensive code in Linpack benchmark
do 30 j = kp1 n
t = a(lj)
if (l eq k) go to 20
a(lj) = a(kj)
a(kj) = t
20 continue
call daxpy(n-kta(k+1k)1a(k+1j)1)
30 continue
do 40 kb = 1 n
k = n + 1 - kb
b(k) = b(k)a(kk)
t = -b(k)
call daxpy(k-1ta(1k)1b(1)1)
40 continue
do 10 i = 1n
dy(iy) = dy(iy) + dadx(ix)
ix = ix + incx
iy = iy + incy
10 continue
Fujitsu K computer
Source httpswwwtop500orglists201706
15 copy Copyright 2017 FUJITSU
Network Illustration
Source Nervana
119882119894rarr119895 784 times 100
119887119895 100
119882119894rarr119895 100 times 10
119887119895 10
Total
parameters119888(119900119906119905119901119906119905 119905119903119906119905ℎ)
Cost function
N = 10 output units
(one for each digit)
Each unit i encodes the
probability of the input image
of being of the digit iN = 100 hidden units
(user-defined parameter)
N = 28 x 28 pixels
= 784 input units
Fully connected network
convolution not present for now
16 copy Copyright 2017 FUJITSU
CNN Computing Operations
Dense Matrix Multiplies
Recurrent Layers
Convolutions All-Reduce
Deep Learning ingredients
1 Randomly seed weights
2 Forward-pass
3 Cost
4 Backward-pass
5 Update weights
17 copy Copyright 2017 FUJITSU
Parallelisation Hierarchy
Vectorisation ndash Is SIMD parallelism used well
Scalar tuning ndash What happens in the pipeline
Memory ndash Is cache usage maximised or RAM access streamlined
Threading ndash do cores cooperation efficiently
Communication ndash can coordination in a distributed or
heterogeneous system be improved
18 copy Copyright 2017 FUJITSU
Naiumlve Nested Loops in CNN Algorithms
Forward Propagation
Backward Propagation Convolution
19 copy Copyright 2017 FUJITSU
A short word on Tensors
Tensors are systems of components organized by one or more indices that transform according to specific rules under a set of transformations
The number of indices is called the rank of the tensor
Tensor rank 0 is a scalar
Tensor rank 1 is a vector
Tensors are important in many areas of physics (general relativity electromagnetic theory)
In N-dimensional space a tensor of rank n has Nn components
Transformation rules are independent of choice of reference frame ndash ideal for expressing universal physical laws
20 copy Copyright 2017 FUJITSU
Optimised Functions
Software Libraries
Tensor functions hand-coded for CPUs or GPUs
Intel MKL-DNN
Emergence of dedicated processing units and ISAs
Tensor Arithmetic in hardware
21 copy Copyright 2017 FUJITSU
Multi-threading CNN Training
1 thread
4 threads
16 threads
64 threads
Training on CIFAR-10 with Intel-Caffe 1000 iterations Full Solver
Dataset consists of 60000 32x32 colour images
in 10 classes with 6000 images per class ndash
50000 training images and 10000 test images
22 copy Copyright 2017 FUJITSU
MPI Parallelism in CFD
Global model decomposed into
8 balanced MPI domainsHalo at interface
between domains
Communicate between processes with
MPI primitivesMPI_Send MPI_Recv MPI_Wait
MPI_AllToAll MPI_AllReduce MPI_BarrierDomain surfaces
adapted to cell
weights
23 copy Copyright 2017 FUJITSU
MPI in Deep Learning
24 copy Copyright 2017 FUJITSU
MPI Parallel Performance
25 copy Copyright 2017 FUJITSU
AI evolution driving CPU and GPU releases
Performance
Intelreg Xeon Phitrade Processor
Knights Mill
Intelreg Xeon Processor
Skylake
Lake
Crest
Intelreg Xeonreg Processor + FPGA
Intelreg Lake Crest Deep neural network processor
Da
tace
nte
rEd
ge
Clo
ud
Da
tace
nte
r
Infe
ren
ceTr
ain
ing
Intelreg Nervana
NVIDIA Tesla P4P40
NVIDIA Drive PX
Google TPU
NVIDIA Pascal 100
FPGA SOC(IntelXilinx)
FUJITSU
PRIMERGY CX600
K Computer
26 copy Copyright 2017 FUJITSU
Fujitsu Gateway ndashIntelligent Application Platform
Cloud Services
Cloud bursting ndash
Gateway
On premise cloud ndash
UNCAIArtificial Intelligence
Smart City Surveillance
Manufacturing process
optimisation
HPC for Data Analytics
Based on PRIMERGY
with Parallel File
System
Reference Architecture
Products and Solutions
CELSIUS
Intel amp Mellanox
Cluster InterconnectNVDIA GPGPU
PRIMERGY
RX2540 M4
SKL based
copy FUJITSU LIMITED 201726
PRIMEFLEX for HPC
Solutions
ProductsCX600 M1
KNL KNM
based
Entry ETERNUS
storage Cloud
PRIMERGY
RX2530 M4
SKL based
High-end ETERNUS
storage
NetApp storageDDN storage
Workgroup Data CenterDepartmental
Liquid Cooling
+ immersion cooling
FY2018
CX400 M4
SKL based
CX2550
M4
HPC
CX2570
M4
GPU
CX2580
M4
FPGA
Engineering Cloud
Industry 40
MONOZUKURI
27 copy Copyright 2017 FUJITSU
New PRIMEFLEX Options
Reference designs defined for AI Deep Learning frameworks
PRIMEFLEX configuration tool provided for
fast definition of a complete solution
PRIMEFLEX for HPC Integrated Solutions incorporate Fujitsu Intelligent Application Platform as the application platform within the software stack
Ref arch for off-premise
Cloud-bursting capability
28 copy Copyright 2017 FUJITSU
DLHPC trends
DL opportunity represents 6-7 of Hyperscale Market
Speculative figure likely 100 yy growth
DL is not a vertical market
It is more akin to an algorithm or method of computation like an FFT
AIDL exists in proximity to HPC
Driven by same architectural objective ndash performance and scale
Converged math and programming methodologies
Technological cross-fertilization
bull Software compilers libraries tools
bull Hardware processors memory interconnect
Source Intersect360 Research 2016
29 copy Copyright 2017 FUJITSU
Summary
Combine algorithmic expertise on HPC and MLFujitsu has the rare capability to combine technologies amp provide fully optimized solution
Shape of a network is subject to skilled programming Optimise through algorithmic and modelling discoveries Relationship between depth and result quality remains largely empirical
AI usage is primarily on Cloud todayCustomer looks for simplified integrated solutions
30 copy Copyright 2017 FUJITSU
Fujitsu Sans Light ndash abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ
0123456789 notrdquopound$^amp()_+-=[]rsquo~ltgt| copyuml~iexclcentcurrenyenbrvbarsectumlordflaquoraquonot-
regmacrdegplusmnsup2sup3microparamiddotcedilsup1ordmfrac14frac12frac34iquestAgraveAacuteAcircAtildeAumlAringCcedilEgraveAEligEacuteEcircEumlIgraveIacuteIcircIumlETHNtildeOgraveOacuteOcircOtildeOumltimesOslashUgraveUacuteUcircUumlYacuteTHORNszligagraveaacuteacircatildeaumlaringaeligccedilegraveeacuteecirceumligraveiacuteicirciumlethntildeograveoacuteocircotildeoumldivideoslashugraveuacuteucircuumlyacutethorn
yumlĐıŒœŠšŸŽžƒʼˆˇˉ˙˚˛˜˝-‒ndashmdash―lsquorsquosbquoldquordquobdquodaggerDaggerbullhellippermillsaquorsaquoolinefrasl⁰⁴⁵⁶⁷⁸⁹₀₁₂₃₄₅₆₇₈₉eurotradeΩrarrpart∆prodsumminusradicinfinintasympnelegesdotlozfifl
Fujitsu Sans ndash abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ 0123456789
notrdquopound$^amp()_+-=[]rsquo~ltgt| copyuml~iexclcentcurrenyenbrvbarsectumlordflaquoraquonot-
regmacrdegplusmnsup2sup3microparamiddotcedilsup1ordmfrac14frac12frac34iquestAgraveAacuteAcircAtildeAumlAringCcedilEgraveAEligEacuteEcircEumlIgraveIacuteIcircIumlETHNtildeOgraveOacuteOcircOtildeOumltimesOslashUgraveUacuteUcircUumlYacuteTHORNszligagraveaacuteacircatildeaumlaringaeligccedilegraveeacuteecirceumligraveiacuteicirciumlethntildeograveoacuteocircotildeoumldivideoslashugraveuacuteucircuumlyacute
thornyumlĐıŒœŠšŸŽžƒʼˆˇˉ˙˚˛˜˝-‒ndashmdash―lsquorsquosbquoldquordquobdquodaggerDaggerbullhellippermillsaquorsaquoolinefrasl⁰⁴⁵⁶⁷⁸⁹₀₁₂₃₄₅₆₇₈₉eurotradeΩrarrpart∆prodsumminusradicinfinintasympnelegesdotlozfifl
Fujitsu Sans Medium ndash abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ
0123456789 notrdquopound$^amp()_+-=[]rsquo~ltgt| copyuml~iexclcentcurrenyenbrvbarsectumlordflaquoraquonot-
regmacrdegplusmnsup2sup3microparamiddotcedilsup1ordmfrac14frac12frac34iquestAgraveAacuteAcircAtildeAumlAringCcedilEgraveAEligEacuteEcircEumlIgraveIacuteIcircIumlETHNtildeOgraveOacuteOcircOtildeOumltimesOslashUgraveUacuteUcircUumlYacuteTHORNszligagraveaacuteacircatildeaumlaringaeligccedilegraveeacuteecirceumligraveiacuteicirciumlethntildeograveoacuteocircotildeoumldivideoslashugraveuacuteucirc
uumlyacutethornyumlĐıŒœŠšŸŽžƒʼˆˇˉ˙˚˛˜˝-‒ndashmdash―lsquorsquosbquoldquordquobdquodaggerDaggerbullhellippermillsaquorsaquoolinefrasl⁰⁴⁵⁶⁷⁸⁹₀₁₂₃₄₅₆₇₈₉eurotradeΩrarrpart∆prodsumminusradicinfinintasympnelegesdotlozfifl
31 copy Copyright 2017 FUJITSU
Deep Learning Networks
Image Identity
BACK
32 copy Copyright 2017 FUJITSU
Unsupervised Learning
Genome Market Segmentation Fraud Detection
Astronomical data analysisGoogle NewsBACK
2 copy Copyright 2017 FUJITSU
Agenda
HPC Diversifies
1
Co-creating solutions
4
Q amp A
5
Similarities between HPC and Deep
Learning optimization
3
What Changed over the years
2
HPC Diversifies: Hunger for compute power
Increasing connected devices worldwide, size of the digital universe increasing, driving more applications:
• 2016: 64Bn devices, 10 Zettabytes, 1000s of apps
• >2020: 28Bn devices, 180 Zettabytes, 20K new apps
10 times more data to be generated by 2025
Emergence of High Performance Data Analytics (HPDA)
• Fraud and anomaly detection: Identifying harmful or potentially harmful patterns and causes in real time, using graphical, semantic analysis or other high performance analytics techniques
• Marketing: Promote products or services using complex algorithms to discern potential customers' demographics, buying preferences and habits
• Business intelligence: Uses HPDA to identify opportunities to advance the market position and competitiveness of businesses by better understanding themselves, their competitors, and the evolving dynamics of the markets they participate in
• Other commercial HPDA: An example of such a high-potential workload is the use of HPDA to manage large IT infrastructures, ranging from on-premise data centers to public clouds and Internet-of-Things (IoT) infrastructures, which involves solving complex problems
Existing HPC users
• Intelligence community, FSI
• Data-driven science/engineering (e.g. biology)
• Knowledge discovery
• ML/DL, cognitive, AI
New commercial users
• Fraud/anomaly detection
• Business intelligence
• Affinity marketing
• Personalized medicine
Customer benefits
• Fastest processing/transformation of large volume data
• Real-time analysis to extract invisible insight from the data
• Accelerated deep-learning technology by GPU computation
HPDA to grow robustly to be a $54Bn market
Source: Information from analysts and various telecommunication firms
Neural Networks are Old – What changed?
Scale drives deep learning progress
Availability of:
• More Data
• Faster Compute/Hardware
• Better Algorithm
Best results are obtained by training a large neural network and/or by feeding in more data
Repetitive Training
History
• 1943: First electrical model of neural network
• 1958: Perceptron
• 1986: Backpropagation
• 1990s: Convolutional Networks (LeCun)
• 2006: Deep Belief Network (Hinton)
• 2013/14: Google buys DeepMind
HPC speeding up Deep Learning research
What does deep learning deal with?
Deep Learning is the machine's perception of:
• Images: Faces, Self driving
• Sound: Voice search, Music Gen, Translation
• Text: CRM, Search +, Ads
• Time Series: Health data, Sensors, Finance
ARTIFICIAL INTELLIGENCE: A program that can sense, reason, act and adapt
MACHINE LEARNING: Algorithms whose performance improves when exposed to more data over time
DEEP LEARNING: Multi-layered neural networks learn from vast amounts of data
Unsupervised Learning, Supervised Learning
Cluster Analysis, Time Series, Unstructured
Convolutional Neural Network (CNN)
Recurrent Neural Network (RNN)
RNN + Long Short-Term Memory (LSTM)
Reinforcement Learning
Industry segmentation and use cases
Healthcare
• Pharmaceutical
• Genomics
• Imagery and medical diagnostic
Marketing Automation
• CRM
• Market Classification
• Demand Prediction
• Document Generation
• Enterprise Resource Planning
• Predictive Maintenance/Analysis
• Machine transcription
• Machine translation
Defense and Social Security
• Surveillance and Security
• Cyber security
• Image recognition
• Motion detection
Consumer/e-commerce/Retail
Transport/Logistics
• Autonomous cars
• Motion detection
• Networked car/Co-ordinated traffic
• Commercial Drones
• Optimized route
• Sentiment Analysis
• Classification
• Recommendation engine
• Demand prediction
• Automated consulting
• Search
• Emails
• Personalization
• Smart Assistant
• Chatbots
Others
• Education
• Fintech
• Gaming
• Telco
• Media
Manufacturing/Industrial
Industry wide presence of Deep Learning
Sector wise: Social Infra 4%, Financial 9%, Public Sector 18%, Distribution 26%, Manufacturing 43%
Application wise: Call center 28%, Knowledge Utilization 20%, Manufacturing 16%, Demand Prediction 13%, Maintenance 8%, Fintech 9%, Healthcare 6%
Source: Based on projects & PoCs in Fujitsu
"Artificial Intelligence is the new Electricity…" Andrew Ng
DL is not a vertical market. It is more akin to an algorithm or method of computation, like an FFT
Intersect360 Research tracks AI (including deep learning, machine learning, cognitive computing, etc.) as part of the hyperscale market
• Similar to, but distinct from, HPC
• Low precision, intensely parallel, strong affinity to public cloud
• Cloud providers and end users are in early stages of investment for their applications
• AI may become a pervasive technology that is embedded in non-hyperscale manifestations
Fujitsu shaping HPC Diversification
HPC the foundation to accelerating AI technology
& FX100 for simulation and pre-processing technology
Zinrai Deep Learning & DLU for a high-speed learning environment
Digital Annealer for combinatorial optimal solutions
Quantum Computing, Deep Learning, HPC
Proximity in AI and HPC
HPC, AI/DL
Hyperscale, Supercomputing
Multi-node
Characterising Performance Computing
Computational scope
• Primary focus is performance
• Compute-intensive algorithms
• Maths solvers
• Applications arbitrarily scalable
• Is still "HPC" on only a few nodes – there is entry-level HPC
• Largest supercomputers are >$100 million
Customer usage
• Problem-solving
• Data Analysis
• Scientific Simulation
• Technical Modelling
• Virtual Prototyping
• Top tier users push boundaries and influence technology throughout industry
Convolutional Neural Network Breakthrough
Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet classification with deep convolutional neural networks. In: NIPS (2012)
2012: Deep – DNN first blood
2013: Network in Network
2014: Deeper
"One GPU runs the layer-parts at the top of the figure while the other runs the layer-parts at the bottom. The GPUs communicate only at certain layers. The network's input is 150,528-dimensional, and the number of neurons in the network's remaining layers is given by 253,440–186,624–64,896–64,896–43,264–4096–4096–1000."
Use of 2 GPUs – each GPU holds part of every layer (model parallelism)
Neural Network starting point
$\mathrm{Act}_L[j] = \sigma\left(\sum_i \mathrm{Act}_{L-1}[i] \times W_L[i][j] + \mathrm{Bias}_L[j]\right)$
Inputs: $\mathrm{Act}_{L-1}[1]$, $\mathrm{Act}_{L-1}[2]$, $\mathrm{Act}_{L-1}[3]$
Weights: $W_L[1][1]$, $W_L[2][1]$, $W_L[3][1]$
$\sigma$: activation function, e.g. tanh, ReLU
Feed-forward network: 3 neurons, 1 hidden layer
Fundamental multiply-add structure
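The activation formula on this slide can be sketched in a few lines of plain Python. This is only an illustration, not the presenters' code: the function names `layer_forward` and `relu`, the 0-based indexing, and the numeric values are all assumptions chosen for the example.

```python
import math


def relu(z):
    """Rectified linear unit, one of the activation functions named on the slide."""
    return max(0.0, z)


def layer_forward(act_prev, weights, bias, sigma=math.tanh):
    """Act_L[j] = sigma(sum_i Act_{L-1}[i] * W_L[i][j] + Bias_L[j])."""
    act = []
    for j in range(len(bias)):
        z = bias[j]
        for i, a in enumerate(act_prev):
            z += a * weights[i][j]  # the fundamental multiply-add
        act.append(sigma(z))
    return act


# Hypothetical values: three input activations feeding a single neuron.
act_prev = [1.0, 2.0, 3.0]
weights = [[0.5], [-1.0], [0.25]]  # W_L[i][j], shape 3 x 1
bias = [0.1]

out_relu = layer_forward(act_prev, weights, bias, sigma=relu)
out_tanh = layer_forward(act_prev, weights, bias)
```

The inner loop is a chain of multiply-adds, which is the same arithmetic pattern that vector units and BLAS routines are built to accelerate.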
Vectorisation in Linear Algebra
Core intensive code in Linpack benchmark
c     row elimination with column indexing (LU factorisation)
      do 30 j = kp1, n
         t = a(l,j)
         if (l .eq. k) go to 20
            a(l,j) = a(k,j)
            a(k,j) = t
   20    continue
         call daxpy(n-k,t,a(k+1,k),1,a(k+1,j),1)
   30 continue

c     back substitution
      do 40 kb = 1, n
         k = n + 1 - kb
         b(k) = b(k)/a(k,k)
         t = -b(k)
         call daxpy(k-1,t,a(1,k),1,b(1),1)
   40 continue

c     inner loop of daxpy: dy = dy + da*dx
      do 10 i = 1,n
         dy(iy) = dy(iy) + da*dx(ix)
         ix = ix + incx
         iy = iy + incy
   10 continue
Fujitsu K computer
Source: https://www.top500.org/lists/2017/06/
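To make the link between the Linpack inner loop and the neuron's multiply-add explicit, here is a minimal Python rendering of the `daxpy` loop shown above. It is a sketch under stated assumptions: indices are 0-based, only non-negative increments are handled, and the sample vectors are invented for illustration.

```python
def daxpy(n, da, dx, incx, dy, incy):
    """Update dy in place with dy + da * dx over n elements.

    Mirrors the Fortran loop, but 0-based and assuming incx, incy >= 0.
    """
    ix = 0
    iy = 0
    for _ in range(n):
        dy[iy] = dy[iy] + da * dx[ix]  # one multiply-add per element
        ix += incx
        iy += incy
    return dy


# Hypothetical data: y becomes 2*x + y.
x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
daxpy(3, 2.0, x, 1, y, 1)
```

Each iteration is independent of the others, which is why this loop vectorises well on SIMD hardware.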
Network Illustration
Source: Nervana
N = 28 x 28 pixels = 784 input units
$W_{i \to j}$: 784 × 100, $b_j$: 100
N = 100 hidden units (user-defined parameter)
$W_{i \to j}$: 100 × 10, $b_j$: 10
N = 10 output units (one for each digit). Each unit i encodes the probability of the input image being of the digit i
Total parameters: 784 × 100 + 100 + 100 × 10 + 10 = 79,510
Cost function: $c(\mathrm{output}, \mathrm{truth})$
Fully connected network; convolution not present for now
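The parameter count of this fully connected network follows directly from the layer sizes on the slide: each layer contributes one weight per input-output pair plus one bias per output. The helper name `count_parameters` below is hypothetical; the arithmetic uses only the slide's own numbers.

```python
def count_parameters(sizes):
    """Sum weights (n_in * n_out) and biases (n_out) over consecutive layers."""
    total = 0
    for n_in, n_out in zip(sizes, sizes[1:]):
        total += n_in * n_out + n_out
    return total


# 784 input units, 100 hidden units, 10 output units, as on the slide.
layer_sizes = [784, 100, 10]
total_parameters = count_parameters(layer_sizes)  # 78,500 + 1,010
```

Almost all of the parameters sit in the first weight matrix, which is one reason convolutional layers, with their shared weights, scale better to larger images.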
CNN Computing Operations
• Dense Matrix Multiplies
• Recurrent Layers
• Convolutions
• All-Reduce
Deep Learning ingredients
1. Randomly seed weights
2. Forward-pass
3. Cost
4. Backward-pass
5. Update weights
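The five ingredients listed above can be sketched as a training loop. To stay short, this example fits a single linear neuron with plain gradient descent and a mean-squared-error cost. The model, the learning rate, the epoch count and the data are all assumptions for illustration, not something taken from the slides.

```python
import random


def train(xs, ys, lr=0.05, epochs=1000, seed=0):
    """Fit y = w*x + b by gradient descent, following the five steps."""
    rng = random.Random(seed)
    w = rng.uniform(-1.0, 1.0)  # 1. randomly seed weights
    b = rng.uniform(-1.0, 1.0)
    n = len(xs)
    cost = None
    for _ in range(epochs):
        preds = [w * x + b for x in xs]  # 2. forward pass
        cost = sum((p - y) ** 2 for p, y in zip(preds, ys)) / n  # 3. cost
        # 4. backward pass: gradients of the cost with respect to w and b
        grad_w = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
        grad_b = sum(2 * (p - y) for p, y in zip(preds, ys)) / n
        w -= lr * grad_w  # 5. update weights
        b -= lr * grad_b
    return w, b, cost


# Hypothetical data drawn from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b, cost = train(xs, ys)
```

A deep network repeats the same cycle, with the forward and backward passes dominated by the matrix multiplies and convolutions named on this slide.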
Parallelisation Hierarchy
• Vectorisation – Is SIMD parallelism used well?
• Scalar tuning – What happens in the pipeline?
• Memory – Is cache usage maximised or RAM access streamlined?
• Threading – Do cores cooperate efficiently?
• Communication – Can coordination in a distributed or heterogeneous system be improved?
Naïve Nested Loops in CNN Algorithms
Forward Propagation
Backward Propagation Convolution
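The slide's loop listings did not survive extraction, so the following is a stand-in rather than the original code: a naïve forward convolution written as six nested loops. The tensor layout (`image[C][H][W]`, `kernels[K][C][R][S]`), stride 1, no padding, and the sample data are assumptions made for the sketch.

```python
def conv2d_forward(image, kernels):
    """Naive 'valid' convolution (cross-correlation), stride 1, no padding.

    image:   [C][H][W]      input channels
    kernels: [K][C][R][S]   K filters
    returns: [K][H-R+1][W-S+1]
    """
    C, H, W = len(image), len(image[0]), len(image[0][0])
    K, R, S = len(kernels), len(kernels[0][0]), len(kernels[0][0][0])
    out_h, out_w = H - R + 1, W - S + 1
    out = [[[0.0] * out_w for _ in range(out_h)] for _ in range(K)]
    for k in range(K):                      # output feature maps
        for y in range(out_h):              # output rows
            for x in range(out_w):          # output columns
                acc = 0.0
                for c in range(C):          # input channels
                    for r in range(R):      # kernel rows
                        for s in range(S):  # kernel columns
                            acc += image[c][y + r][x + s] * kernels[k][c][r][s]
                out[k][y][x] = acc
    return out


# Hypothetical 1-channel 3x3 image and a single 2x2 kernel.
image = [[[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0],
          [7.0, 8.0, 9.0]]]
kernels = [[[[1.0, 0.0],
             [0.0, 1.0]]]]
result = conv2d_forward(image, kernels)
```

The innermost statement is again a multiply-add. Optimised libraries reorder and block these loops so that the work maps onto SIMD units, caches and threads.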
A short word on Tensors
Tensors are systems of components, organized by one or more indices, that transform according to specific rules under a set of transformations
The number of indices is called the rank of the tensor
Tensor rank 0 is a scalar
Tensor rank 1 is a vector
Tensors are important in many areas of physics (general relativity, electromagnetic theory)
In N-dimensional space, a tensor of rank n has $N^n$ components
Transformation rules are independent of choice of reference frame – ideal for expressing universal physical laws
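The component count quoted above is easy to check numerically. In this sketch a rank-n tensor in N dimensions is modelled as n levels of nested lists of length N. That only illustrates the index structure, not the transformation rules, and the helper names are hypothetical.

```python
def tensor_components(dim, rank):
    """Number of components of a rank-`rank` tensor in `dim`-dimensional space."""
    return dim ** rank


def zero_tensor(dim, rank):
    """Build a nested-list tensor of zeros with `rank` indices, each of size `dim`."""
    if rank == 0:
        return 0.0  # rank 0: a scalar
    return [zero_tensor(dim, rank - 1) for _ in range(dim)]


def count_leaves(t):
    """Count the scalar entries of a nested-list tensor."""
    if not isinstance(t, list):
        return 1
    return sum(count_leaves(x) for x in t)


scalar_count = tensor_components(3, 0)   # rank 0: a single component
vector_count = tensor_components(3, 1)   # rank 1 in 3 dimensions
rank2_in_4d = tensor_components(4, 2)    # rank 2 in 4-dimensional spacetime
```

In deep learning frameworks the word "tensor" usually means just such a multi-index array, with a possibly different size per index.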
Optimised Functions
Software Libraries
Tensor functions hand-coded for CPUs or GPUs
Intel MKL-DNN
Emergence of dedicated processing units and ISAs
Tensor Arithmetic in hardware
Multi-threading CNN Training
[Figure: results for 1, 4, 16 and 64 threads]
Training on CIFAR-10 with Intel-Caffe, 1000 iterations, Full Solver
Dataset consists of 60,000 32x32 colour images in 10 classes, with 6,000 images per class – 50,000 training images and 10,000 test images
MPI Parallelism in CFD
• Global model decomposed into 8 balanced MPI domains
• Halo at interface between domains
• Domain surfaces adapted to cell weights
• Communicate between processes with MPI primitives: MPI_Send, MPI_Recv, MPI_Wait, MPI_Alltoall, MPI_Allreduce, MPI_Barrier
MPI in Deep Learning
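Only the title of this slide survives, so the following is an assumption about a common pattern rather than a reconstruction of it: in data-parallel training, each worker computes gradients on its own share of the data, and a sum all-reduce leaves every worker with the same averaged gradient. The sketch simulates that result in plain Python without any MPI library; `allreduce_sum` and `data_parallel_step` are hypothetical names.

```python
def allreduce_sum(per_worker_vectors):
    """Simulate a sum all-reduce: every worker receives the element-wise sum."""
    length = len(per_worker_vectors[0])
    total = [0.0] * length
    for vec in per_worker_vectors:
        for i, v in enumerate(vec):
            total[i] += v
    return [list(total) for _ in per_worker_vectors]


def data_parallel_step(weights, per_worker_grads, lr):
    """One synchronous update: average gradients, then update each replica."""
    summed = allreduce_sum(per_worker_grads)
    n_workers = len(per_worker_grads)
    replicas = []
    for worker_sum in summed:
        avg = [g / n_workers for g in worker_sum]
        replicas.append([w - lr * g for w, g in zip(weights, avg)])
    return replicas


# Hypothetical example: two workers with different local gradients.
weights = [1.0, 2.0]
grads = [[0.5, 1.0], [1.5, 3.0]]
replicas = data_parallel_step(weights, grads, lr=0.5)
```

Because every replica applies the same averaged gradient, the model copies stay identical after each step, much as halo exchange keeps neighbouring CFD domains consistent.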
MPI Parallel Performance
AI evolution driving CPU and GPU releases
[Figure: processors positioned by performance, by deployment (Datacenter, Edge, Cloud) and by workload (Inference, Training)]
• Intel® Xeon Phi™ Processor Knights Mill
• Intel® Xeon® Processor Skylake
• Intel® Xeon® Processor + FPGA
• Intel® Lake Crest deep neural network processor
• Intel® Nervana
• NVIDIA Tesla P4/P40
• NVIDIA Drive PX
• Google TPU
• NVIDIA Pascal 100
• FPGA SoC (Intel/Xilinx)
• FUJITSU PRIMERGY CX600
• K Computer
Fujitsu Gateway – Intelligent Application Platform
Cloud Services
Cloud bursting – Gateway
On premise cloud – UNCAI
Artificial Intelligence
Smart City Surveillance
Manufacturing process optimisation
HPC for Data Analytics
Based on PRIMERGY with Parallel File System
Reference Architecture
Products and Solutions
CELSIUS
Intel & Mellanox Cluster Interconnect
NVIDIA GPGPU
PRIMERGY RX2540 M4, SKL based
PRIMEFLEX for HPC
Solutions
Products
CX600 M1, KNL/KNM based
Entry ETERNUS storage
Cloud
PRIMERGY RX2530 M4, SKL based
High-end ETERNUS storage
NetApp storage
DDN storage
Workgroup, Data Center, Departmental
Liquid Cooling + immersion cooling FY2018
CX400 M4, SKL based
CX2550 M4: HPC
CX2570 M4: GPU
CX2580 M4: FPGA
Engineering Cloud
Industry 4.0
MONOZUKURI
New PRIMEFLEX Options
Reference designs defined for AI Deep Learning frameworks
PRIMEFLEX configuration tool provided for
fast definition of a complete solution
PRIMEFLEX for HPC Integrated Solutions incorporate Fujitsu Intelligent Application Platform as the application platform within the software stack
Ref arch for off-premise
Cloud-bursting capability
DL/HPC trends
DL opportunity represents 6-7% of Hyperscale Market
• Speculative figure; likely 100% y/y growth
DL is not a vertical market
• It is more akin to an algorithm or method of computation, like an FFT
AI/DL exists in proximity to HPC
• Driven by same architectural objective – performance and scale
• Converged math and programming methodologies
Technological cross-fertilization
• Software: compilers, libraries, tools
• Hardware: processors, memory, interconnect
Source: Intersect360 Research, 2016
Summary
• Combine algorithmic expertise on HPC and ML: Fujitsu has the rare capability to combine technologies & provide a fully optimized solution
• Shape of a network is subject to skilled programming. Optimise through algorithmic and modelling discoveries. Relationship between depth and result quality remains largely empirical
• AI usage is primarily on Cloud today: Customer looks for simplified, integrated solutions
4 copy Copyright 2017 FUJITSU
Neural Networks are Old ndash What changed
Scale drives deep learning progress
Availability of
More Data
Faster ComputeHardware
Better Algorithm
Best results are obtained by training a large neural network orand by feeding in more data
RepetitiveTraining
His
tory
1943 First electrical model of neural network
1958 Perceptron
1986 Backpropogation
1990s Convolutional Networks (LeCun)
2006 Deep Belief Network (Hinton)
201314 Google buys Deep Mind
HPC speeding up Deep learning Research
5 copy Copyright 2017 FUJITSU
What does deep learning deal with
Deep Learning
Dee
p L
earn
ing
is t
he
mac
hin
ersquos
per
cep
tio
n o
f Imagesbull Facesbull Self driving
Soundbull Voice searchbull Music Genbull Translation
Textbull CRMbull Search +bull Ads
Time Seriesbull Health databull Sensorsbull Finance
ARTIFICIAL INTELLIGENCEA program that can sense reasons act and adapt
MACHINE LEARNINGAlgorithms whose performance improve when
exposed to more data over time
DEEP LEARNINGMulti-layered neural networks learn from
vast amounts of data
Unsupervised LearningSupervised Learning
Cluster Analysis Time Series Unstructured
Convolutional Neural Network(CNN)
Recurrent Neural Network(RNN)
RNN+ Long-short term Memory(LSTM)
Reinforcement Learning
6 © Copyright 2017 FUJITSU
Industry segmentation and use cases
Healthcare
• Pharmaceutical
• Genomics
• Imagery and medical diagnostic
Marketing Automation
• CRM
• Market Classification
• Demand Prediction
• Document Generation
• Enterprise Resource Planning
• Predictive Maintenance/Analysis
• Machine transcription
• Machine translation
Defense and Social Security
• Surveillance and Security
• Cyber security
• Image recognition
• Motion detection
Consumer/e-commerce/Retail
Transport/Logistics
• Autonomous cars
• Motion detection
• Networked car/Co-ordinated traffic
• Commercial Drones
• Optimized route
• Sentiment Analysis
• Classification
• Recommendation engine
• Demand prediction
• Automated consulting
• Search
• Emails
• Personalization
• Smart Assistant
• Chatbots
Others
• Education
• Fintech
• Gaming
• Telco
• Media
Manufacturing/Industrial
7 © Copyright 2017 FUJITSU
Industry wide presence of Deep Learning
Sector wise:
• Social Infra 4%
• Financial 9%
• Public Sector 18%
• Distribution 26%
• Manufacturing 43%
Application wise:
• Call center 28%
• Knowledge Utilization 20%
• Manufacturing 16%
• Demand Prediction 13%
• Maintenance 8%
• Fintech 9%
• Healthcare 6%
Source: Based on projects & PoCs in Fujitsu
“Artificial Intelligence is the new Electricity…” Andrew Ng
DL is not a vertical market. It is more akin to an algorithm or method of computation, like an FFT
Intersect360 Research tracks AI (including deep learning, machine learning, cognitive computing, etc.) as part of the hyperscale market
• Similar to, but distinct from, HPC
• Low precision, intensely parallel, strong affinity to public cloud
• Cloud providers and end users are in early stages of investment for their applications
• AI may become a pervasive technology that is embedded in non-hyperscale manifestations
8 © Copyright 2017 FUJITSU
Fujitsu shaping HPC Diversification
9 © Copyright 2017 FUJITSU
HPC the foundation to accelerating AI technology
& FX100 for simulation and pre-processing technology
Zinrai Deep Learning & DLU for a high-speed learning environment
Digital Annealer for combinatorial optimal solutions
Quantum Computing, Deep Learning, HPC
10 © Copyright 2017 FUJITSU
Proximity in AI and HPC
HPC, AI/DL
Hyperscale, Supercomputing
Multi-node
11 © Copyright 2017 FUJITSU
Characterising Performance Computing
Computational scope
• Primary focus is performance
• Compute-intensive algorithms
• Maths solvers
• Applications arbitrarily scalable
• Is still “HPC” on only a few nodes – there is entry-level HPC
• Largest supercomputers are >$100 million
Customer usage
• Problem-solving
• Data Analysis
• Scientific Simulation
• Technical Modelling
• Virtual Prototyping
• Top tier users push boundaries and influence technology throughout industry
12 © Copyright 2017 FUJITSU
Convolutional Neural Network Breakthrough
Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet classification with deep convolutional neural networks. In: NIPS (2012)
Timeline: 2014 Deeper; 2013 Network in Network; 2012 Deep DNN first blood
“One GPU runs the layer-parts at the top of the figure while the other runs the layer-parts at the bottom. The GPUs communicate only at certain layers. The network's input is 150,528-dimensional, and the number of neurons in the network's remaining layers is given by 253,440–186,624–64,896–64,896–43,264–4096–4096–1000.”
Use of 2 GPUs – each GPU holds half of the kernels in a layer (model parallelism)
13 © Copyright 2017 FUJITSU
Neural Network starting point
$\mathrm{Act}_L[j] = \sigma\left(\sum_i \mathrm{Act}_{L-1}[i] \times W_L[i][j] + \mathrm{Bias}_L[j]\right)$
Inputs: $\mathrm{Act}_{L-1}[1]$, $\mathrm{Act}_{L-1}[2]$, $\mathrm{Act}_{L-1}[3]$
Weights: $W_L[1][1]$, $W_L[2][1]$, $W_L[3][1]$
$\sigma$: activation function, e.g. tanh, ReLU
Feed-forward network
3 neurons, 1 hidden layer
Fundamental multiply-add structure
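The activation formula above can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the talk: the function names `relu` and `layer_forward` and the numeric values are hypothetical, and the sum over the inputs i is assumed from the three-input diagram.

```python
import math

def relu(x):
    """Rectified linear unit: max(0, x)."""
    return x if x > 0.0 else 0.0

def layer_forward(act_prev, weights, bias, activation=math.tanh):
    """Compute Act_L[j] = sigma(sum_i Act_{L-1}[i] * W_L[i][j] + Bias_L[j]).

    act_prev: activations of layer L-1 (length n_in)
    weights:  weights[i][j] connects input i to output neuron j
    bias:     one bias per output neuron (length n_out)
    """
    act = []
    for j in range(len(bias)):
        total = bias[j]
        for i, a in enumerate(act_prev):
            total += a * weights[i][j]  # the fundamental multiply-add
        act.append(activation(total))
    return act

# Hypothetical values: 3 inputs feeding 1 neuron, as in the diagram.
act_prev = [1.0, 2.0, 3.0]
weights = [[0.5], [-1.0], [0.25]]
bias = [0.1]
out_relu = layer_forward(act_prev, weights, bias, activation=relu)
out_tanh = layer_forward(act_prev, weights, bias)
```

The inner loop is a chain of multiply-adds over contiguous data, which is the same shape of work that vector units and BLAS routines are built for.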
14 © Copyright 2017 FUJITSU
Vectorisation in Linear Algebra
Core intensive code in Linpack benchmark
c     row interchange and column elimination (as in LINPACK dgefa)
         do 30 j = kp1, n
            t = a(l,j)
            if (l .eq. k) go to 20
               a(l,j) = a(k,j)
               a(k,j) = t
   20       continue
            call daxpy(n-k,t,a(k+1,k),1,a(k+1,j),1)
   30    continue
c     back substitution (as in LINPACK dgesl)
      do 40 kb = 1, n
         k = n + 1 - kb
         b(k) = b(k)/a(k,k)
         t = -b(k)
         call daxpy(k-1,t,a(1,k),1,b(1),1)
   40 continue
c     inner loop of daxpy: dy = dy + da*dx
      do 10 i = 1,n
         dy(iy) = dy(iy) + da*dx(ix)
         ix = ix + incx
         iy = iy + incy
   10 continue
Fujitsu K computer
Source: https://www.top500.org/lists/2017/06
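To make the role of daxpy concrete, the back-substitution loop (label 40) can be restated in Python. This is a sketch for illustration only: it assumes unit strides, stores the matrix as a list of columns to mimic Fortran's column-major layout, and the names `back_substitute` and `a_cols` are hypothetical.

```python
def daxpy(n, da, dx, x_off, dy, y_off):
    """dy[y_off + i] += da * dx[x_off + i] for i in 0..n-1 (unit stride)."""
    for i in range(n):
        dy[y_off + i] += da * dx[x_off + i]

def back_substitute(a_cols, b):
    """Solve U x = b in place, with U upper triangular.

    a_cols[k] is column k of U, mirroring Fortran's a(1:n, k).
    """
    n = len(b)
    for k in range(n - 1, -1, -1):       # k = n, n-1, ..., 1 in the Fortran
        b[k] = b[k] / a_cols[k][k]
        t = -b[k]
        daxpy(k, t, a_cols[k], 0, b, 0)  # update the k entries above row k
    return b

# U = [[2, 1], [0, 4]] stored by columns, right-hand side b = [4, 8]
x = back_substitute([[2.0, 0.0], [1.0, 4.0]], [4.0, 8.0])
```

Almost all of the arithmetic sits inside `daxpy`, a loop of independent multiply-adds over contiguous memory, which is why this kernel vectorises well.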
15 © Copyright 2017 FUJITSU
Network Illustration
Source: Nervana
Input layer: N = 28 x 28 pixels = 784 input units
Hidden layer: N = 100 hidden units (user-defined parameter); $W_{i \to j}$: 784 × 100, $b_j$: 100
Output layer: N = 10 output units (one for each digit); $W_{i \to j}$: 100 × 10, $b_j$: 10
Each unit i encodes the probability of the input image being of the digit i
Cost function: $c(\mathrm{output}, \mathrm{truth})$
Total parameters
Fully connected network; convolution not present for now
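The parameter total for this network follows directly from the weight and bias shapes. The snippet below is a small worked calculation, assuming each fully connected layer contributes n_in × n_out weights plus n_out biases; the helper name `dense_params` is hypothetical.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus biases (n_out) of one fully connected layer."""
    return n_in * n_out + n_out

layer_sizes = [28 * 28, 100, 10]   # input, hidden, output units
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)
print(per_layer, total)            # [78500, 1010] 79510
```

Under that assumption, the 784-to-100 layer dominates the count, so the hidden-layer width is the main lever on model size.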
16 © Copyright 2017 FUJITSU
CNN Computing Operations
• Dense Matrix Multiplies
• Recurrent Layers
• Convolutions
• All-Reduce
Deep Learning ingredients
1. Randomly seed weights
2. Forward-pass
3. Cost
4. Backward-pass
5. Update weights
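The five ingredients can be seen end to end in a toy training loop. The sketch below is an assumption-laden illustration, not the talk's code: it fits a single linear unit y = w·x + b with a mean-squared-error cost and plain gradient descent, and the function name `train`, the learning rate and the data are all hypothetical.

```python
import random

def train(xs, ys, lr=0.05, epochs=1000, seed=0):
    rng = random.Random(seed)
    # 1. Randomly seed weights
    w = rng.uniform(-0.5, 0.5)
    b = rng.uniform(-0.5, 0.5)
    n = len(xs)
    history = []
    for _ in range(epochs):
        # 2. Forward pass
        preds = [w * x + b for x in xs]
        # 3. Cost (mean squared error)
        cost = sum((p - y) ** 2 for p, y in zip(preds, ys)) / n
        history.append(cost)
        # 4. Backward pass: gradients of the cost w.r.t. w and b
        grad_w = sum(2.0 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
        grad_b = sum(2.0 * (p - y) for p, y in zip(preds, ys)) / n
        # 5. Update weights
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, history

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # generated from y = 2x + 1
w, b, history = train(xs, ys)
```

A deep network repeats the same loop with many layers; steps 2 and 4 then become the dense matrix multiplies and convolutions listed above.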
17 © Copyright 2017 FUJITSU
Parallelisation Hierarchy
• Vectorisation – Is SIMD parallelism used well?
• Scalar tuning – What happens in the pipeline?
• Memory – Is cache usage maximised or RAM access streamlined?
• Threading – Do cores cooperate efficiently?
• Communication – Can coordination in a distributed or heterogeneous system be improved?
18 © Copyright 2017 FUJITSU
Naïve Nested Loops in CNN Algorithms
• Forward Propagation
• Backward Propagation
• Convolution
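As an illustration of what a naïve loop nest for the forward convolution looks like, here is a minimal sketch. It assumes stride 1, no padding and the cross-correlation form common in deep learning frameworks; the function name and the index layout are hypothetical and not taken from the slide.

```python
def conv2d_forward(inp, kernels, bias):
    """Naive 'valid' convolution (cross-correlation), stride 1, no padding.

    inp:     inp[c][y][x]         C input channels of H x W
    kernels: kernels[k][c][i][j]  K filters of C x R x S
    bias:    bias[k]
    returns  out[k][y][x]         K maps of (H-R+1) x (W-S+1)
    """
    C, H, W = len(inp), len(inp[0]), len(inp[0][0])
    K, R, S = len(kernels), len(kernels[0][0]), len(kernels[0][0][0])
    out_h, out_w = H - R + 1, W - S + 1
    out = [[[0.0] * out_w for _ in range(out_h)] for _ in range(K)]
    for k in range(K):                      # output feature maps
        for y in range(out_h):              # output rows
            for x in range(out_w):          # output columns
                acc = float(bias[k])
                for c in range(C):          # input channels
                    for i in range(R):      # kernel rows
                        for j in range(S):  # kernel columns
                            acc += inp[c][y + i][x + j] * kernels[k][c][i][j]
                out[k][y][x] = acc
    return out

image = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]   # 1 channel, 3 x 3
kernel = [[[[1, 0], [0, 1]]]]                 # 1 filter, 1 channel, 2 x 2
out = conv2d_forward(image, kernel, [0.0])
```

Six nested loops with a multiply-add at the centre leave plenty of room for the reordering, blocking and vectorisation that tuned libraries apply.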
19 © Copyright 2017 FUJITSU
A short word on Tensors
• Tensors are systems of components, organized by one or more indices, that transform according to specific rules under a set of transformations
• The number of indices is called the rank of the tensor
• A tensor of rank 0 is a scalar
• A tensor of rank 1 is a vector
• Tensors are important in many areas of physics (general relativity, electromagnetic theory)
• In N-dimensional space, a tensor of rank n has $N^n$ components
• Transformation rules are independent of the choice of reference frame – ideal for expressing universal physical laws
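The component count is easy to check for small cases. The helper below is a hypothetical one-liner that evaluates N to the power n for a few ranks, assuming every index runs over the same N dimensions.

```python
def tensor_components(dim, rank):
    """Number of components of a rank-`rank` tensor in `dim`-dimensional space."""
    return dim ** rank

# Scalar, vector and rank-2 tensor in 3-D space; rank-2 tensor in 4-D spacetime.
counts_3d = [tensor_components(3, n) for n in range(3)]
count_4d_rank2 = tensor_components(4, 2)
print(counts_3d, count_4d_rank2)   # [1, 3, 9] 16
```

The exponential growth with rank is one reason tensor arithmetic benefits from dedicated, highly parallel hardware.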
20 © Copyright 2017 FUJITSU
Optimised Functions
Software Libraries
• Tensor functions hand-coded for CPUs or GPUs
• Intel MKL-DNN
Emergence of dedicated processing units and ISAs
• Tensor Arithmetic in hardware
21 © Copyright 2017 FUJITSU
Multi-threading CNN Training
[Chart: training with 1, 4, 16 and 64 threads]
Training on CIFAR-10 with Intel-Caffe, 1000 iterations, Full Solver
Dataset consists of 60,000 32x32 colour images in 10 classes, with 6,000 images per class – 50,000 training images and 10,000 test images
22 © Copyright 2017 FUJITSU
MPI Parallelism in CFD
• Global model decomposed into 8 balanced MPI domains
• Halo at interface between domains
• Communicate between processes with MPI primitives: MPI_Send, MPI_Recv, MPI_Wait, MPI_Alltoall, MPI_Allreduce, MPI_Barrier
• Domain surfaces adapted to cell weights
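The halo idea can be shown without an MPI installation. The sketch below simulates it in a single Python process, under stated assumptions: a 1-D grid, 8 equal domains, a one-cell halo and a 3-point averaging stencil. In a real MPI code each halo copy would be a send/receive pair between neighbouring ranks; all function names here are hypothetical.

```python
def decompose(cells, n_domains):
    """Split a 1-D list of cells into n_domains equal contiguous chunks."""
    size = len(cells) // n_domains
    return [cells[d * size:(d + 1) * size] for d in range(n_domains)]

def exchange_halos(domains, boundary=0.0):
    """Return each domain padded with one halo cell on each side.

    Each neighbour read below stands in for an MPI send/receive pair.
    """
    padded = []
    for d, local in enumerate(domains):
        left = domains[d - 1][-1] if d > 0 else boundary
        right = domains[d + 1][0] if d < len(domains) - 1 else boundary
        padded.append([left] + local + [right])
    return padded

def smooth(padded):
    """3-point average over the interior cells, reading the halo cells."""
    return [(padded[i - 1] + padded[i] + padded[i + 1]) / 3.0
            for i in range(1, len(padded) - 1)]

cells = [float(i) for i in range(16)]
domains = decompose(cells, 8)
result = [v for p in exchange_halos(domains) for v in smooth(p)]
reference = smooth([0.0] + cells + [0.0])   # same stencil on the global grid
```

Because each domain only needs its neighbours' edge cells, communication grows with the domain surface while computation grows with its volume.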
23 © Copyright 2017 FUJITSU
MPI in Deep Learning
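One common way MPI enters deep learning is synchronous data-parallel training, where every worker computes gradients on its own share of the data and an all-reduce combines them. The sketch below simulates that pattern in a single process; it is an assumption about the general technique, not necessarily the scheme presented in the talk, and the function names and numbers are hypothetical.

```python
def allreduce_sum(per_worker_vectors):
    """Simulate a sum all-reduce: every worker ends with the element-wise sum."""
    total = [sum(vals) for vals in zip(*per_worker_vectors)]
    return [list(total) for _ in per_worker_vectors]

def data_parallel_step(weights, worker_grads, lr):
    """One synchronous SGD step: average gradients across workers, then update."""
    n_workers = len(worker_grads)
    reduced = allreduce_sum(worker_grads)
    new_weights = []
    for rank in range(n_workers):
        avg = [g / n_workers for g in reduced[rank]]
        new_weights.append([w - lr * g for w, g in zip(weights[rank], avg)])
    return new_weights

# Four workers hold identical weights but see different mini-batches.
weights = [[1.0, 2.0] for _ in range(4)]
worker_grads = [[0.5, 1.0], [1.5, -1.0], [1.0, 0.0], [1.0, 4.0]]
updated = data_parallel_step(weights, worker_grads, lr=0.5)
```

All workers end the step with identical weights, which is the property the collective has to guarantee; in an MPI program the simulated reduction would map onto a single `MPI_Allreduce` call.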
24 © Copyright 2017 FUJITSU
MPI Parallel Performance
25 © Copyright 2017 FUJITSU
AI evolution driving CPU and GPU releases
[Figure: processors plotted by performance across Datacenter, Edge and Cloud deployment, for Inference and Training]
• Intel® Xeon Phi™ Processor (Knights Mill)
• Intel® Xeon® Processor (Skylake)
• Intel® Xeon® Processor + FPGA
• Intel® Lake Crest deep neural network processor
• Intel® Nervana
• NVIDIA Tesla P4/P40
• NVIDIA Drive PX
• Google TPU
• NVIDIA Pascal 100
• FPGA SoC (Intel/Xilinx)
• FUJITSU PRIMERGY CX600
• K Computer
26 © Copyright 2017 FUJITSU
Fujitsu Gateway – Intelligent Application Platform
• Cloud Services
• Cloud bursting – Gateway
• On premise cloud – UNCAI
• Artificial Intelligence
• Smart City Surveillance
• Manufacturing process optimisation
• HPC for Data Analytics
Based on PRIMERGY with Parallel File System
Reference Architecture
Products and Solutions
• CELSIUS
• Intel & Mellanox Cluster Interconnect
• NVIDIA GPGPU
• PRIMERGY RX2540 M4 (SKL based)
PRIMEFLEX for HPC
Solutions
Products
• CX600 M1 (KNL, KNM based)
• Entry ETERNUS storage
• Cloud
• PRIMERGY RX2530 M4 (SKL based)
• High-end ETERNUS storage
• NetApp storage
• DDN storage
• Workgroup, Departmental, Data Center
• Liquid Cooling + immersion cooling (FY2018)
• CX400 M4 (SKL based)
• CX2550 M4 (HPC)
• CX2570 M4 (GPU)
• CX2580 M4 (FPGA)
• Engineering Cloud
• Industry 4.0
• MONOZUKURI
27 © Copyright 2017 FUJITSU
New PRIMEFLEX Options
• Reference designs defined for AI Deep Learning frameworks
• PRIMEFLEX configuration tool provided for fast definition of a complete solution
• PRIMEFLEX for HPC Integrated Solutions incorporate Fujitsu Intelligent Application Platform as the application platform within the software stack
• Reference architecture for off-premise cloud-bursting capability
28 © Copyright 2017 FUJITSU
DL/HPC trends
DL opportunity represents 6-7% of Hyperscale Market
• Speculative figure, likely 100% y/y growth
DL is not a vertical market
• It is more akin to an algorithm or method of computation, like an FFT
AI/DL exists in proximity to HPC
• Driven by same architectural objective – performance and scale
• Converged math and programming methodologies
Technological cross-fertilization
• Software: compilers, libraries, tools
• Hardware: processors, memory, interconnect
Source: Intersect360 Research, 2016
29 © Copyright 2017 FUJITSU
Summary
Combine algorithmic expertise on HPC and ML
• Fujitsu has the rare capability to combine technologies & provide a fully optimized solution
Shape of a network is subject to skilled programming
• Optimise through algorithmic and modelling discoveries
• Relationship between depth and result quality remains largely empirical
AI usage is primarily on Cloud today
• Customers look for simplified, integrated solutions
6 copy Copyright 2017 FUJITSU
Industry segmentation and use cases
Healthcare
bull Pharmaceuticalbull Genomicsbull Imagery and medical
diagnostic
Marketing Automation
bull CRMbull Market Classificationbull Demand Predictionbull Document Generation
bull Enterprise Resource Planning
bull Predictive MaintenanceAnalysis
bull Machine transcriptionbull Machine translation
Defense and Social Security
bull Surveillance and Security
bull Cyber securitybull Image recognitionbull Motion detection
Consumere-commerceRetail
TransportLogistics
bull Autonomous carsbull Motion detectionbull Networked carCo-
ordinated trafficbull Commercial Dronesbull Optimized route
bull Sentiment Analysisbull Classificationbull Recommendation enginebull Demand predictionbull Automated consulting
bull Search bull Emailsbull Personalizationbull Smart Assistantbull Chatbots
Others
bull Educationbull Fintechbull Gamingbull Telcobull Media
Manufacturing Industrial
7 copy Copyright 2017 FUJITSU
Industry wide presence of Deep Learning
Social Infra4 Financial
9
Public Sector18
Distribution26
Manufacturing43
Sector wise
Call center28
Knowledge Utilization
20
Manufacturing16
Demand Prediction
13
Maintenance 8
Fintech9
Healthcare6
Application wise
Source Based on projects amp PoCs in Fujitsu
Artificial Intelligence is the new ElectricityhellipAndrew Ng
DL is not a vertical market It is more akin to an algorithm or method of computation like an FFT
Intersect360 Research tracks AI (including deep learning machine learning cognitive computing etc) as part of the hyper scale market
Similar to but distinct from HPC
Low precision intensely parallel strong affinity to public cloud
Cloud providers and end users are in early stages of investment for their applications
AI may become a pervasive technology that is embedded in non-hyperscale manifestations
8 copy Copyright 2017 FUJITSU
Fujitsu shaping HPC Diversification
9 copy Copyright 2017 FUJITSU
HPC the foundation to accelerating AI technology
ampFX100
for simulation andpre-processing technology
Zinrai Deep Learning amp DLUfor a high-speed learning environment
Digital Annealerfor combinatorial optimal solutions
Quantum
Computing
Deep
Learning
HPC
10 copy Copyright 2017 FUJITSU
Proximity in AI and HPC
HPC AIDL
HyperscaleSupercomputing
Multi-node
11 copy Copyright 2017 FUJITSU
Characterising Performance Computing
Computational scope Customer usage
Primary focus is performance
Compute-intensive algorithms
Maths solvers
Applications arbitrarily scalable
Is still ldquoHPCrdquo on only a few nodes ndash there is entry-level HPC
Largest supercomputers are gt$100 million
Problem-solving
Data Analysis
Scientific Simulation
Technical Modelling
Virtual Prototyping
Top tier users push boundaries and influence technology throughout industry
12 copy Copyright 2017 FUJITSU
Convolutional Neural Network Breakthrough
Krizhevsky A Sutskever I Hinton G Imagenet classification with deep convolutional neural networks In NIPS (2012)
Deeper Network
in Network
Deep DNN first blood
One GPU runs the layer-parts at the top of the figure while the other runs the layer-parts
at the bottom The GPUs communicate only at certain layers The networkrsquos input is 150528-
dimensional and the number of neurons in the networkrsquos remaining layers is given by 253440ndash
186624ndash64896ndash64896ndash43264ndash4096ndash4096ndash1000
2014 2013 2012
Use of 2 GPUs ndash data parallelism
13 copy Copyright 2017 FUJITSU
Neural Network starting point
119860119888119905 119871 119895 = 120590 119860119888119905 119871 minus 1 119894 119909 119882 119871 119894 119895 + 119861119894119886119904 119871 [119895]
119860119888119905 119871 minus 1 1
119860119888119905 119871 minus 1 2
119860119888119905 119871 minus 1 3
119882 119871 1][1
119882 119871 3][1
119882 119871 2][1 120590
Activation function
eg tanh ReLu
Weight
Feed-forward network
3 neurons 1 hidden layer
Fundamental multiply-add structure
14 copy Copyright 2017 FUJITSU
Vectorisation in Linear Algebra
Core intensive code in Linpack benchmark
do 30 j = kp1 n
t = a(lj)
if (l eq k) go to 20
a(lj) = a(kj)
a(kj) = t
20 continue
call daxpy(n-kta(k+1k)1a(k+1j)1)
30 continue
do 40 kb = 1 n
k = n + 1 - kb
b(k) = b(k)a(kk)
t = -b(k)
call daxpy(k-1ta(1k)1b(1)1)
40 continue
do 10 i = 1n
dy(iy) = dy(iy) + dadx(ix)
ix = ix + incx
iy = iy + incy
10 continue
Fujitsu K computer
Source httpswwwtop500orglists201706
15 copy Copyright 2017 FUJITSU
Network Illustration
Source Nervana
119882119894rarr119895 784 times 100
119887119895 100
119882119894rarr119895 100 times 10
119887119895 10
Total
parameters119888(119900119906119905119901119906119905 119905119903119906119905ℎ)
Cost function
N = 10 output units
(one for each digit)
Each unit i encodes the
probability of the input image
of being of the digit iN = 100 hidden units
(user-defined parameter)
N = 28 x 28 pixels
= 784 input units
Fully connected network
convolution not present for now
16 copy Copyright 2017 FUJITSU
CNN Computing Operations
Dense Matrix Multiplies
Recurrent Layers
Convolutions All-Reduce
Deep Learning ingredients
1 Randomly seed weights
2 Forward-pass
3 Cost
4 Backward-pass
5 Update weights
17 copy Copyright 2017 FUJITSU
Parallelisation Hierarchy
Vectorisation ndash Is SIMD parallelism used well
Scalar tuning ndash What happens in the pipeline
Memory ndash Is cache usage maximised or RAM access streamlined
Threading ndash do cores cooperation efficiently
Communication ndash can coordination in a distributed or
heterogeneous system be improved
18 copy Copyright 2017 FUJITSU
Naiumlve Nested Loops in CNN Algorithms
Forward Propagation
Backward Propagation Convolution
19 copy Copyright 2017 FUJITSU
A short word on Tensors
Tensors are systems of components organized by one or more indices that transform according to specific rules under a set of transformations
The number of indices is called the rank of the tensor
Tensor rank 0 is a scalar
Tensor rank 1 is a vector
Tensors are important in many areas of physics (general relativity electromagnetic theory)
In N-dimensional space a tensor of rank n has Nn components
Transformation rules are independent of choice of reference frame ndash ideal for expressing universal physical laws
20 copy Copyright 2017 FUJITSU
Optimised Functions
Software Libraries
Tensor functions hand-coded for CPUs or GPUs
Intel MKL-DNN
Emergence of dedicated processing units and ISAs
Tensor Arithmetic in hardware
21 © Copyright 2017 FUJITSU
Multi-threading CNN Training
[Figure: training runs with 1, 4, 16 and 64 threads]
Training on CIFAR-10 with Intel-Caffe, 1000 iterations, Full Solver
Dataset consists of 60,000 32x32 colour images in 10 classes, with 6,000 images per class: 50,000 training images and 10,000 test images
22 © Copyright 2017 FUJITSU
MPI Parallelism in CFD
Global model decomposed into 8 balanced MPI domains
Halo at interface between domains
Domain surfaces adapted to cell weights
Communicate between processes with MPI primitives:
MPI_Send, MPI_Recv, MPI_Wait, MPI_Alltoall, MPI_Allreduce, MPI_Barrier
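The halo idea can be illustrated without an MPI installation. The sketch below simulates, in a single Python process, a 1-D domain split into 8 equal subdomains that each receive one halo cell from every neighbour before applying a 3-point stencil. All function names are hypothetical; in a real MPI code each halo copy would be a send/receive pair between ranks, and the decomposition would be weighted by cell cost rather than equal-sized.

```python
def smooth(u):
    """Reference: 3-point average on interior points, end points held fixed."""
    inner = [(u[i - 1] + u[i] + u[i + 1]) / 3 for i in range(1, len(u) - 1)]
    return [u[0]] + inner + [u[-1]]

def decompose(u, parts):
    """Split u into `parts` equal chunks (len(u) must divide evenly)."""
    size = len(u) // parts
    return [u[r * size:(r + 1) * size] for r in range(parts)]

def exchange_halos(chunks):
    """Attach one halo value per side to every chunk, copied from its neighbours.

    The outermost chunks have no neighbour on one side; their halo is None.
    """
    padded = []
    for r, chunk in enumerate(chunks):
        left = chunks[r - 1][-1] if r > 0 else None
        right = chunks[r + 1][0] if r < len(chunks) - 1 else None
        padded.append((left, chunk, right))
    return padded

def smooth_distributed(u, parts):
    """Apply the stencil chunk by chunk, using halo values at chunk edges."""
    result = []
    for left, chunk, right in exchange_halos(decompose(u, parts)):
        for i, value in enumerate(chunk):
            lo = chunk[i - 1] if i > 0 else left
            hi = chunk[i + 1] if i < len(chunk) - 1 else right
            if lo is None or hi is None:    # global boundary: keep fixed
                result.append(value)
            else:
                result.append((lo + value + hi) / 3)
    return result

u = [float(i * i) for i in range(16)]
serial = smooth(u)
parallel = smooth_distributed(u, 8)         # 8 domains, as on the slide
```

Because each subdomain only needs its neighbours' edge values, communication volume scales with the domain surface while computation scales with its volume, which is why balanced domains with small surfaces matter.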
23 © Copyright 2017 FUJITSU
MPI in Deep Learning
24 © Copyright 2017 FUJITSU
MPI Parallel Performance
25 © Copyright 2017 FUJITSU
AI evolution driving CPU and GPU releases
[Figure: processor landscape; axis labels: Performance; Datacenter, Edge, Cloud; Inference, Training]
Intel® Xeon Phi™ Processor (Knights Mill)
Intel® Xeon® Processor (Skylake)
Intel® Xeon® Processor + FPGA
Intel® Lake Crest deep neural network processor
Intel® Nervana
NVIDIA Tesla P4/P40
NVIDIA Drive PX
Google TPU
NVIDIA Pascal 100
FPGA SoC (Intel/Xilinx)
FUJITSU PRIMERGY CX600
K Computer
26 © Copyright 2017 FUJITSU
Fujitsu Gateway - Intelligent Application Platform
[Diagram: PRIMEFLEX for HPC, from Reference Architecture to Products and Solutions]
Cloud Services: cloud bursting - Gateway; on-premise cloud - UNCAI
Solution areas: Artificial Intelligence; Smart City Surveillance; Manufacturing process optimisation; HPC for Data Analytics (based on PRIMERGY with Parallel File System); Engineering Cloud; Industry 4.0; MONOZUKURI
Servers and workstations:
CELSIUS
PRIMERGY RX2530 M4 (SKL based)
PRIMERGY RX2540 M4 (SKL based)
CX600 M1 (KNL/KNM based)
CX400 M4 (SKL based): CX2550 M4 (HPC), CX2570 M4 (GPU), CX2580 M4 (FPGA)
Interconnect and accelerators: Intel & Mellanox Cluster Interconnect; NVIDIA GPGPU
Storage: Entry ETERNUS storage; High-end ETERNUS storage; NetApp storage; DDN storage; Cloud
Cooling: Liquid Cooling + immersion cooling (FY2018)
Scale: Workgroup, Departmental, Data Center
27 © Copyright 2017 FUJITSU
New PRIMEFLEX Options
Reference designs defined for AI Deep Learning frameworks
PRIMEFLEX configuration tool provided for fast definition of a complete solution
PRIMEFLEX for HPC Integrated Solutions incorporate Fujitsu Intelligent Application Platform as the application platform within the software stack
Reference architecture for off-premise cloud-bursting capability
28 © Copyright 2017 FUJITSU
DL/HPC trends
DL opportunity represents 6-7% of Hyperscale Market
Speculative figure; likely 100% y/y growth
DL is not a vertical market
It is more akin to an algorithm or method of computation, like an FFT
AI/DL exists in proximity to HPC
Driven by same architectural objective: performance and scale
Converged math and programming methodologies
Technological cross-fertilization
• Software: compilers, libraries, tools
• Hardware: processors, memory, interconnect
Source: Intersect360 Research, 2016
29 © Copyright 2017 FUJITSU
Summary
Combine algorithmic expertise on HPC and ML
Fujitsu has the rare capability to combine technologies & provide a fully optimized solution
Shape of a network is subject to skilled programming
Optimise through algorithmic and modelling discoveries
Relationship between depth and result quality remains largely empirical
AI usage is primarily on Cloud today
Customer looks for simplified, integrated solutions
31 © Copyright 2017 FUJITSU
Deep Learning Networks
Image Identity
32 © Copyright 2017 FUJITSU
Unsupervised Learning
Genome; Market Segmentation; Fraud Detection; Astronomical data analysis; Google News
7 © Copyright 2017 FUJITSU
Industry wide presence of Deep Learning
Sector wise: Manufacturing 43%, Distribution 26%, Public Sector 18%, Financial 9%, Social Infra 4%
Application wise: Call center 28%, Knowledge Utilization 20%, Manufacturing 16%, Demand Prediction 13%, Fintech 9%, Maintenance 8%, Healthcare 6%
Source: Based on projects & PoCs in Fujitsu
"Artificial Intelligence is the new Electricity…" Andrew Ng
DL is not a vertical market. It is more akin to an algorithm or method of computation, like an FFT
Intersect360 Research tracks AI (including deep learning, machine learning, cognitive computing, etc.) as part of the hyperscale market
Similar to, but distinct from, HPC
Low precision, intensely parallel, strong affinity to public cloud
Cloud providers and end users are in early stages of investment for their applications
AI may become a pervasive technology that is embedded in non-hyperscale manifestations
8 © Copyright 2017 FUJITSU
Fujitsu shaping HPC Diversification
9 © Copyright 2017 FUJITSU
HPC: the foundation to accelerating AI technology
& FX100 for simulation and pre-processing technology
Zinrai Deep Learning & DLU for a high-speed learning environment
Digital Annealer for combinatorial optimal solutions
[Diagram labels: Quantum Computing, Deep Learning, HPC]
10 © Copyright 2017 FUJITSU
Proximity in AI and HPC
[Diagram labels: HPC, AI/DL, Hyperscale, Supercomputing, Multi-node]
11 © Copyright 2017 FUJITSU
Characterising Performance Computing
Computational scope
Primary focus is performance
Compute-intensive algorithms
Maths solvers
Applications arbitrarily scalable
Is still "HPC" on only a few nodes - there is entry-level HPC
Largest supercomputers are >$100 million
Customer usage
Problem-solving
Data Analysis
Scientific Simulation
Technical Modelling
Virtual Prototyping
Top tier users push boundaries and influence technology throughout industry
12 © Copyright 2017 FUJITSU
Convolutional Neural Network Breakthrough
Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet classification with deep convolutional neural networks. In: NIPS (2012)
[Figure: timeline 2014, 2013, 2012; labels: Deeper Network, in Network, Deep DNN first blood]
"One GPU runs the layer-parts at the top of the figure while the other runs the layer-parts at the bottom. The GPUs communicate only at certain layers. The network's input is 150,528-dimensional, and the number of neurons in the network's remaining layers is given by 253,440–186,624–64,896–64,896–43,264–4096–4096–1000."
Use of 2 GPUs - model parallelism: each GPU runs its own part of the layers, as the quoted figure caption describes
13 © Copyright 2017 FUJITSU
Neural Network starting point
$\mathrm{Act}[L][j] = \sigma\left(\sum_i \mathrm{Act}[L-1][i] \times W[L][i][j] + \mathrm{Bias}[L][j]\right)$
Inputs: $\mathrm{Act}[L-1][1]$, $\mathrm{Act}[L-1][2]$, $\mathrm{Act}[L-1][3]$
Weights: $W[L][1][1]$, $W[L][2][1]$, $W[L][3][1]$
$\sigma$: activation function, e.g. tanh, ReLU
Feed-forward network: 3 neurons, 1 hidden layer
Fundamental multiply-add structure
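The activation formula maps directly onto two nested loops. The sketch below is one possible reading of it, assuming the products are summed over the input index i before the bias is added; the function names and the sample numbers are hypothetical.

```python
import math

def layer_forward(act_prev, weights, bias, sigma=math.tanh):
    """Act[L][j] = sigma(sum_i Act[L-1][i] * W[L][i][j] + Bias[L][j])."""
    out = []
    for j in range(len(bias)):              # one output neuron per bias entry
        z = bias[j]
        for i, a in enumerate(act_prev):    # accumulate over the previous layer
            z += a * weights[i][j]          # the fundamental multiply-add
        out.append(sigma(z))
    return out

def relu(z):
    """Rectified linear unit, an alternative activation function."""
    return max(0.0, z)

act_prev = [1.0, 2.0, 3.0]                  # Act[L-1][1..3]
weights = [[0.5], [-1.0], [0.25]]           # W[L][i][1] for i = 1..3
bias = [0.1]
with_tanh = layer_forward(act_prev, weights, bias)
with_relu = layer_forward(act_prev, weights, bias, sigma=relu)
```

Stacking the inner loop over all output neurons j turns the layer into a matrix-vector product, and over a batch of inputs into a dense matrix multiply.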
14 © Copyright 2017 FUJITSU
Vectorisation in Linear Algebra
Core intensive code in Linpack benchmark
c     row elimination with column indexing (dgefa)
      do 30 j = kp1, n
         t = a(l,j)
         if (l .eq. k) go to 20
            a(l,j) = a(k,j)
            a(k,j) = t
   20    continue
         call daxpy(n-k,t,a(k+1,k),1,a(k+1,j),1)
   30 continue

c     back substitution, solve u*x = y (dgesl)
      do 40 kb = 1, n
         k = n + 1 - kb
         b(k) = b(k)/a(k,k)
         t = -b(k)
         call daxpy(k-1,t,a(1,k),1,b(1),1)
   40 continue

c     constant times a vector plus a vector (daxpy)
      do 10 i = 1,n
         dy(iy) = dy(iy) + da*dx(ix)
         ix = ix + incx
         iy = iy + incy
   10 continue
Fujitsu K computer
Source: https://www.top500.org/lists/2017/06/
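For readers less used to Fortran, the innermost daxpy loop of the excerpt can be restated as a short sketch. It assumes 0-based indexing and positive strides, and it is an illustration rather than the reference BLAS routine.

```python
def daxpy(n, da, dx, incx, dy, incy):
    """dy <- dy + da * dx over n elements, with strides incx and incy (in place)."""
    ix = iy = 0
    for _ in range(n):
        dy[iy] += da * dx[ix]   # one multiply-add per element
        ix += incx
        iy += incy
    return dy

y = [1.0, 1.0, 1.0, 1.0]
daxpy(4, 2.0, [1.0, 2.0, 3.0, 4.0], 1, y, 1)

strided = daxpy(2, 1.0, [1.0, 2.0, 3.0, 4.0], 2, [0.0, 0.0], 1)
```

With unit strides every iteration is independent and touches consecutive memory, which is what allows a compiler to map the loop onto SIMD instructions.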
15 © Copyright 2017 FUJITSU
Network Illustration
Source: Nervana
Input to hidden layer: $W_{i \to j}$: 784 × 100, $b_j$: 100
Hidden to output layer: $W_{i \to j}$: 100 × 10, $b_j$: 10
Total parameters
Cost function: $c(\mathrm{output}, \mathrm{truth})$
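The parameter total follows from the layer sizes by simple arithmetic. A minimal sketch, assuming one weight per connection and one bias per non-input unit; the helper name `dense_params` is hypothetical.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

layer_sizes = [784, 100, 10]    # input, hidden and output units
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)
print(per_layer, total)         # [78500, 1010] 79510
```

Almost all of the parameters sit in the first weight matrix, so the user-defined hidden-layer width dominates both memory use and the cost of the matrix multiply.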
9 copy Copyright 2017 FUJITSU
HPC the foundation to accelerating AI technology
ampFX100
for simulation andpre-processing technology
Zinrai Deep Learning amp DLUfor a high-speed learning environment
Digital Annealerfor combinatorial optimal solutions
Quantum
Computing
Deep
Learning
HPC
10 copy Copyright 2017 FUJITSU
Proximity in AI and HPC
HPC AIDL
HyperscaleSupercomputing
Multi-node
11 copy Copyright 2017 FUJITSU
Characterising Performance Computing
Computational scope Customer usage
Primary focus is performance
Compute-intensive algorithms
Maths solvers
Applications arbitrarily scalable
Is still ldquoHPCrdquo on only a few nodes ndash there is entry-level HPC
Largest supercomputers are gt$100 million
Problem-solving
Data Analysis
Scientific Simulation
Technical Modelling
Virtual Prototyping
Top tier users push boundaries and influence technology throughout industry
12 copy Copyright 2017 FUJITSU
Convolutional Neural Network Breakthrough
Krizhevsky A Sutskever I Hinton G Imagenet classification with deep convolutional neural networks In NIPS (2012)
Deeper Network
in Network
Deep DNN first blood
One GPU runs the layer-parts at the top of the figure while the other runs the layer-parts
at the bottom The GPUs communicate only at certain layers The networkrsquos input is 150528-
dimensional and the number of neurons in the networkrsquos remaining layers is given by 253440ndash
186624ndash64896ndash64896ndash43264ndash4096ndash4096ndash1000
2014 2013 2012
Use of 2 GPUs ndash data parallelism
13 copy Copyright 2017 FUJITSU
Neural Network starting point
119860119888119905 119871 119895 = 120590 119860119888119905 119871 minus 1 119894 119909 119882 119871 119894 119895 + 119861119894119886119904 119871 [119895]
119860119888119905 119871 minus 1 1
119860119888119905 119871 minus 1 2
119860119888119905 119871 minus 1 3
119882 119871 1][1
119882 119871 3][1
119882 119871 2][1 120590
Activation function
eg tanh ReLu
Weight
Feed-forward network
3 neurons 1 hidden layer
Fundamental multiply-add structure
14 copy Copyright 2017 FUJITSU
Vectorisation in Linear Algebra
Core intensive code in Linpack benchmark
do 30 j = kp1 n
t = a(lj)
if (l eq k) go to 20
a(lj) = a(kj)
a(kj) = t
20 continue
call daxpy(n-kta(k+1k)1a(k+1j)1)
30 continue
do 40 kb = 1 n
k = n + 1 - kb
b(k) = b(k)a(kk)
t = -b(k)
call daxpy(k-1ta(1k)1b(1)1)
40 continue
do 10 i = 1n
dy(iy) = dy(iy) + dadx(ix)
ix = ix + incx
iy = iy + incy
10 continue
Fujitsu K computer
Source httpswwwtop500orglists201706
15 copy Copyright 2017 FUJITSU
Network Illustration
Source Nervana
119882119894rarr119895 784 times 100
119887119895 100
119882119894rarr119895 100 times 10
119887119895 10
Total
parameters119888(119900119906119905119901119906119905 119905119903119906119905ℎ)
Cost function
N = 10 output units
(one for each digit)
Each unit i encodes the
probability of the input image
of being of the digit iN = 100 hidden units
(user-defined parameter)
N = 28 x 28 pixels
= 784 input units
Fully connected network
convolution not present for now
16 copy Copyright 2017 FUJITSU
CNN Computing Operations
Dense Matrix Multiplies
Recurrent Layers
Convolutions All-Reduce
Deep Learning ingredients
1 Randomly seed weights
2 Forward-pass
3 Cost
4 Backward-pass
5 Update weights
17 copy Copyright 2017 FUJITSU
Parallelisation Hierarchy
Vectorisation ndash Is SIMD parallelism used well
Scalar tuning ndash What happens in the pipeline
Memory ndash Is cache usage maximised or RAM access streamlined
Threading ndash do cores cooperation efficiently
Communication ndash can coordination in a distributed or
heterogeneous system be improved
18 copy Copyright 2017 FUJITSU
Naiumlve Nested Loops in CNN Algorithms
Forward Propagation
Backward Propagation Convolution
19 copy Copyright 2017 FUJITSU
A short word on Tensors
Tensors are systems of components organized by one or more indices that transform according to specific rules under a set of transformations
The number of indices is called the rank of the tensor
Tensor rank 0 is a scalar
Tensor rank 1 is a vector
Tensors are important in many areas of physics (general relativity electromagnetic theory)
In N-dimensional space a tensor of rank n has Nn components
Transformation rules are independent of choice of reference frame ndash ideal for expressing universal physical laws
20 copy Copyright 2017 FUJITSU
Optimised Functions
Software Libraries
Tensor functions hand-coded for CPUs or GPUs
Intel MKL-DNN
Emergence of dedicated processing units and ISAs
Tensor Arithmetic in hardware
Multi-threading CNN Training
1 thread
4 threads
16 threads
64 threads
Training on CIFAR-10 with Intel-Caffe, 1000 iterations, Full Solver
Dataset consists of 60,000 32x32 colour images in 10 classes with 6,000 images per class: 50,000 training images and 10,000 test images
MPI Parallelism in CFD
Global model decomposed into 8 balanced MPI domains
Halo at interface between domains
Domain surfaces adapted to cell weights
Communicate between processes with MPI primitives: MPI_Send, MPI_Recv, MPI_Wait, MPI_Alltoall, MPI_Allreduce, MPI_Barrier
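The decomposition and halo idea can be sketched without an MPI runtime. The example below assumes a 1D row of cells rather than a CFD mesh, and it replaces each send/receive pair with a plain list read inside one process; all function names are hypothetical and nothing here is the MPI API.

```python
def decompose(cells, parts):
    """Split a 1D list of cell values into `parts` contiguous, near-equal subdomains."""
    base, extra = divmod(len(cells), parts)
    domains, start = [], 0
    for rank in range(parts):
        size = base + (1 if rank < extra else 0)
        domains.append(cells[start:start + size])
        start += size
    return domains

def exchange_halos(domains, boundary=0.0):
    """Return each subdomain padded with one halo cell per side.

    In an MPI code each neighbour read below would be a message between
    two ranks; here it is simulated by indexing the neighbouring list.
    """
    padded = []
    for rank, local in enumerate(domains):
        left = domains[rank - 1][-1] if rank > 0 else boundary
        right = domains[rank + 1][0] if rank < len(domains) - 1 else boundary
        padded.append([left] + local + [right])
    return padded

def smooth(padded):
    """One 3-point averaging step on the interior cells of a padded subdomain."""
    return [(padded[i - 1] + padded[i] + padded[i + 1]) / 3.0
            for i in range(1, len(padded) - 1)]

cells = [float(i) for i in range(32)]
domains = decompose(cells, 8)                       # 8 balanced subdomains
updated = [smooth(p) for p in exchange_halos(domains)]
flat = [value for domain in updated for value in domain]
serial = smooth([0.0] + cells + [0.0])              # same step without decomposition
print(flat == serial)
```

Because every subdomain sees its neighbours' edge values through the halo, the decomposed update reproduces the undecomposed one; only the halo cells need to be communicated each step.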
MPI in Deep Learning
MPI Parallel Performance
AI evolution driving CPU and GPU releases
[Figure: processor landscape; axis and group labels: Performance; Edge, Cloud, Datacenter; Inference, Training]
Intel® Xeon Phi™ Processor (Knights Mill)
Intel® Xeon® Processor (Skylake)
Intel® Xeon® Processor + FPGA
Intel® Lake Crest deep neural network processor (Intel® Nervana)
NVIDIA Tesla P4/P40
NVIDIA Drive PX
NVIDIA Pascal 100
Google TPU
FPGA SoC (Intel/Xilinx)
FUJITSU PRIMERGY CX600
K Computer
Fujitsu Gateway: Intelligent Application Platform
[Figure: solution stack diagram; recoverable labels follow]
Cloud Services: Cloud bursting (Gateway), On premise cloud (UNCAI)
Use cases: Artificial Intelligence, Smart City Surveillance, Manufacturing process optimisation, HPC for Data Analytics, Engineering Cloud, Industry 4.0, MONOZUKURI
Reference Architecture: based on PRIMERGY with Parallel File System
Products and Solutions: PRIMEFLEX for HPC
Scale: Workgroup, Departmental, Data Center, Cloud
Servers: PRIMERGY RX2540 M4 (SKL based), PRIMERGY RX2530 M4 (SKL based), CX600 M1 (KNL/KNM based), CX400 M4 (SKL based), CX2550 M4 (HPC), CX2570 M4 (GPU), CX2580 M4 (FPGA)
Workstations: CELSIUS
Interconnect and accelerators: Intel & Mellanox Cluster Interconnect, NVIDIA GPGPU
Storage: Entry ETERNUS storage, High-end ETERNUS storage, NetApp storage, DDN storage
Cooling: Liquid Cooling, plus immersion cooling (FY2018)
New PRIMEFLEX Options
Reference designs defined for AI Deep Learning frameworks
PRIMEFLEX configuration tool provided for fast definition of a complete solution
PRIMEFLEX for HPC Integrated Solutions incorporate Fujitsu Intelligent Application Platform as the application platform within the software stack
Reference architecture for off-premise cloud-bursting capability
DL/HPC trends
DL opportunity represents 6-7% of Hyperscale Market
Speculative figure, likely 100% y/y growth
DL is not a vertical market
It is more akin to an algorithm or method of computation, like an FFT
AI/DL exists in proximity to HPC
Driven by the same architectural objective: performance and scale
Converged math and programming methodologies
Technological cross-fertilization
• Software: compilers, libraries, tools
• Hardware: processors, memory, interconnect
Source: Intersect360 Research, 2016
Summary
Combine algorithmic expertise on HPC and ML
Fujitsu has the rare capability to combine technologies and provide a fully optimized solution
Shape of a network is subject to skilled programming
Optimise through algorithmic and modelling discoveries
Relationship between depth and result quality remains largely empirical
AI usage is primarily on Cloud today
Customers look for simplified, integrated solutions
Deep Learning Networks
Image Identity
Unsupervised Learning
Genome, Market Segmentation, Fraud Detection, Astronomical data analysis, Google News
Proximity in AI and HPC
[Figure: overlap of HPC and AI/DL; labels: Hyperscale, Supercomputing, Multi-node]
Characterising Performance Computing
Computational scope
Primary focus is performance
Compute-intensive algorithms
Maths solvers
Applications arbitrarily scalable
Is still "HPC" on only a few nodes: there is entry-level HPC
Largest supercomputers are >$100 million
Customer usage
Problem-solving
Data Analysis
Scientific Simulation
Technical Modelling
Virtual Prototyping
Top tier users push boundaries and influence technology throughout industry
Convolutional Neural Network Breakthrough
Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet classification with deep convolutional neural networks. In: NIPS (2012)
Timeline labels: 2012 Deep DNN first blood; 2013 Network in Network; 2014 Deeper Network
One GPU runs the layer-parts at the top of the figure while the other runs the layer-parts at the bottom. The GPUs communicate only at certain layers. The network's input is 150,528-dimensional, and the number of neurons in the network's remaining layers is given by 253,440-186,624-64,896-64,896-43,264-4096-4096-1000.
Use of 2 GPUs: each layer's units are split across the two GPUs (model parallelism)
Neural Network starting point
$\mathrm{Act}[L][j] = \sigma\left(\sum_i \mathrm{Act}[L-1][i] \times W[L][i][j] + \mathrm{Bias}[L][j]\right)$
Inputs: $\mathrm{Act}[L-1][1]$, $\mathrm{Act}[L-1][2]$, $\mathrm{Act}[L-1][3]$
Weights: $W[L][1][1]$, $W[L][2][1]$, $W[L][3][1]$
$\sigma$: activation function, e.g. tanh, ReLU
Feed-forward network
3 neurons, 1 hidden layer
Fundamental multiply-add structure
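The activation formula on this slide can be written out directly. This is a sketch with hypothetical names and made-up numbers; indices are 0-based here, whereas the slide counts from 1.

```python
import math

def relu(z):
    """Rectified linear unit, one of the activation functions named on the slide."""
    return max(0.0, z)

def layer_activation(prev_act, weights, bias, sigma=math.tanh):
    """Act[L][j] = sigma(sum_i Act[L-1][i] * W[L][i][j] + Bias[L][j])."""
    outputs = []
    for j in range(len(bias)):
        z = bias[j]
        for i, a in enumerate(prev_act):
            z += a * weights[i][j]  # the fundamental multiply-add
        outputs.append(sigma(z))
    return outputs

prev = [1.0, 2.0, 3.0]          # Act[L-1], three input neurons
w = [[0.5], [-1.0], [0.25]]     # W[L][i][j] with a single receiving unit j
b = [0.1]                       # Bias[L][j]
print(layer_activation(prev, w, b))        # tanh activation
print(layer_activation(prev, w, b, relu))  # ReLU activation
```

Stacking the inner loop over all units j turns the layer into a matrix-vector product, which is where the link to dense linear algebra comes from.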
Vectorisation in Linear Algebra
Core intensive code in Linpack benchmark
c     row elimination with column indexing (LINPACK dgefa)
      do 30 j = kp1, n
         t = a(l,j)
         if (l .eq. k) go to 20
            a(l,j) = a(k,j)
            a(k,j) = t
   20    continue
         call daxpy(n-k,t,a(k+1,k),1,a(k+1,j),1)
   30 continue
c     back substitution (LINPACK dgesl)
      do 40 kb = 1, n
         k = n + 1 - kb
         b(k) = b(k)/a(k,k)
         t = -b(k)
         call daxpy(k-1,t,a(1,k),1,b(1),1)
   40 continue
c     inner loop of daxpy: y = da*x + y
      do 10 i = 1,n
         dy(iy) = dy(iy) + da*dx(ix)
         ix = ix + incx
         iy = iy + incy
   10 continue
Fujitsu K computer
Source: https://www.top500.org/lists/2017/06/
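For readers less used to Fortran, the daxpy inner loop above can be sketched in Python. This is an illustrative translation, not the BLAS routine: it assumes positive strides and uses 0-based indices.

```python
def daxpy(n, da, dx, incx, dy, incy):
    """Compute y := da * x + y over n elements, stepping by strides incx and incy."""
    ix = iy = 0
    for _ in range(n):
        dy[iy] += da * dx[ix]  # one multiply-add per element
        ix += incx
        iy += incy
    return dy

x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
print(daxpy(3, 2.0, x, 1, y, 1))
```

Each iteration touches a different element and depends on no other iteration, so with unit stride a compiler can map several iterations onto one SIMD instruction. That independence is what makes this loop a natural vectorisation target.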
12 copy Copyright 2017 FUJITSU
Convolutional Neural Network Breakthrough
Krizhevsky A Sutskever I Hinton G Imagenet classification with deep convolutional neural networks In NIPS (2012)
Deeper Network
in Network
Deep DNN first blood
One GPU runs the layer-parts at the top of the figure while the other runs the layer-parts
at the bottom The GPUs communicate only at certain layers The networkrsquos input is 150528-
dimensional and the number of neurons in the networkrsquos remaining layers is given by 253440ndash
186624ndash64896ndash64896ndash43264ndash4096ndash4096ndash1000
2014 2013 2012
Use of 2 GPUs ndash data parallelism
13 copy Copyright 2017 FUJITSU
Neural Network starting point
119860119888119905 119871 119895 = 120590 119860119888119905 119871 minus 1 119894 119909 119882 119871 119894 119895 + 119861119894119886119904 119871 [119895]
119860119888119905 119871 minus 1 1
119860119888119905 119871 minus 1 2
119860119888119905 119871 minus 1 3
119882 119871 1][1
119882 119871 3][1
119882 119871 2][1 120590
Activation function
eg tanh ReLu
Weight
Feed-forward network
3 neurons 1 hidden layer
Fundamental multiply-add structure
14 copy Copyright 2017 FUJITSU
Vectorisation in Linear Algebra
Core intensive code in Linpack benchmark
do 30 j = kp1 n
t = a(lj)
if (l eq k) go to 20
a(lj) = a(kj)
a(kj) = t
20 continue
call daxpy(n-kta(k+1k)1a(k+1j)1)
30 continue
do 40 kb = 1 n
k = n + 1 - kb
b(k) = b(k)a(kk)
t = -b(k)
call daxpy(k-1ta(1k)1b(1)1)
40 continue
do 10 i = 1n
dy(iy) = dy(iy) + dadx(ix)
ix = ix + incx
iy = iy + incy
10 continue
Fujitsu K computer
Source httpswwwtop500orglists201706
15 copy Copyright 2017 FUJITSU
Network Illustration
Source Nervana
119882119894rarr119895 784 times 100
119887119895 100
119882119894rarr119895 100 times 10
119887119895 10
Total
parameters119888(119900119906119905119901119906119905 119905119903119906119905ℎ)
Cost function
N = 10 output units
(one for each digit)
Each unit i encodes the
probability of the input image
of being of the digit iN = 100 hidden units
(user-defined parameter)
N = 28 x 28 pixels
= 784 input units
Fully connected network
convolution not present for now
16 copy Copyright 2017 FUJITSU
CNN Computing Operations
Dense Matrix Multiplies
Recurrent Layers
Convolutions All-Reduce
Deep Learning ingredients
1 Randomly seed weights
2 Forward-pass
3 Cost
4 Backward-pass
5 Update weights
17 copy Copyright 2017 FUJITSU
Parallelisation Hierarchy
Vectorisation ndash Is SIMD parallelism used well
Scalar tuning ndash What happens in the pipeline
Memory ndash Is cache usage maximised or RAM access streamlined
Threading ndash do cores cooperation efficiently
Communication ndash can coordination in a distributed or
heterogeneous system be improved
18 copy Copyright 2017 FUJITSU
Naiumlve Nested Loops in CNN Algorithms
Forward Propagation
Backward Propagation Convolution
19 copy Copyright 2017 FUJITSU
A short word on Tensors
Tensors are systems of components organized by one or more indices that transform according to specific rules under a set of transformations
The number of indices is called the rank of the tensor
Tensor rank 0 is a scalar
Tensor rank 1 is a vector
Tensors are important in many areas of physics (general relativity electromagnetic theory)
In N-dimensional space a tensor of rank n has Nn components
Transformation rules are independent of choice of reference frame ndash ideal for expressing universal physical laws
20 copy Copyright 2017 FUJITSU
Optimised Functions
Software Libraries
Tensor functions hand-coded for CPUs or GPUs
Intel MKL-DNN
Emergence of dedicated processing units and ISAs
Tensor Arithmetic in hardware
21 copy Copyright 2017 FUJITSU
Multi-threading CNN Training
1 thread
4 threads
16 threads
64 threads
Training on CIFAR-10 with Intel-Caffe 1000 iterations Full Solver
Dataset consists of 60000 32x32 colour images
in 10 classes with 6000 images per class ndash
50000 training images and 10000 test images
22 copy Copyright 2017 FUJITSU
MPI Parallelism in CFD
Global model decomposed into
8 balanced MPI domainsHalo at interface
between domains
Communicate between processes with
MPI primitivesMPI_Send MPI_Recv MPI_Wait
MPI_AllToAll MPI_AllReduce MPI_BarrierDomain surfaces
adapted to cell
weights
23 copy Copyright 2017 FUJITSU
MPI in Deep Learning
24 copy Copyright 2017 FUJITSU
MPI Parallel Performance
25 copy Copyright 2017 FUJITSU
AI evolution driving CPU and GPU releases
Performance
Intelreg Xeon Phitrade Processor
Knights Mill
Intelreg Xeon Processor
Skylake
Lake
Crest
Intelreg Xeonreg Processor + FPGA
Intelreg Lake Crest Deep neural network processor
Da
tace
nte
rEd
ge
Clo
ud
Da
tace
nte
r
Infe
ren
ceTr
ain
ing
Intelreg Nervana
NVIDIA Tesla P4P40
NVIDIA Drive PX
Google TPU
NVIDIA Pascal 100
FPGA SOC(IntelXilinx)
FUJITSU
PRIMERGY CX600
K Computer
Fujitsu Gateway - Intelligent Application Platform
Cloud Services: Cloud bursting - Gateway; On-premise cloud - UNCAI
Solutions: Artificial Intelligence; Smart City Surveillance; Manufacturing process optimisation; HPC for Data Analytics; Engineering Cloud; Industry 4.0; MONOZUKURI
PRIMEFLEX for HPC
Reference Architecture: based on PRIMERGY with Parallel File System
Segments: Workgroup, Departmental, Data Center, Cloud
Products:
CELSIUS
PRIMERGY RX2530 M4 (SKL based)
PRIMERGY RX2540 M4 (SKL based)
CX400 M4 (SKL based): CX2550 M4 (HPC), CX2570 M4 (GPU), CX2580 M4 (FPGA)
CX600 M1 (KNL/KNM based)
Intel & Mellanox cluster interconnect
NVIDIA GPGPU
Entry ETERNUS storage; High-end ETERNUS storage; NetApp storage; DDN storage
Liquid cooling + immersion cooling (FY2018)
New PRIMEFLEX Options
Reference designs defined for AI / Deep Learning frameworks
PRIMEFLEX configuration tool provided for fast definition of a complete solution
PRIMEFLEX for HPC Integrated Solutions incorporate Fujitsu Intelligent Application Platform as the application platform within the software stack
Reference architecture for off-premise cloud-bursting capability
DL/HPC trends
DL opportunity represents 6-7% of Hyperscale Market
Speculative figure, likely 100% y/y growth
DL is not a vertical market
It is more akin to an algorithm or method of computation, like an FFT
AI/DL exists in proximity to HPC
Driven by same architectural objective: performance and scale
Converged math and programming methodologies
Technological cross-fertilization
• Software: compilers, libraries, tools
• Hardware: processors, memory, interconnect
Source: Intersect360 Research, 2016
Summary
Combine algorithmic expertise on HPC and ML: Fujitsu has the rare capability to combine technologies and provide a fully optimized solution
Shape of a network is subject to skilled programming. Optimise through algorithmic and modelling discoveries. Relationship between depth and result quality remains largely empirical
AI usage is primarily on Cloud today: customers look for simplified, integrated solutions
Deep Learning Networks
Image Identity
Unsupervised Learning
Genome, Market Segmentation, Fraud Detection, Astronomical data analysis, Google News
Neural Network starting point
$$\mathrm{Act}[L][j] = \sigma\Big(\sum_i \mathrm{Act}[L-1][i] \times W[L][i][j] + \mathrm{Bias}[L][j]\Big)$$
Inputs: Act[L-1][1], Act[L-1][2], Act[L-1][3]
Weights: W[L][1][1], W[L][2][1], W[L][3][1]
σ: activation function, e.g. tanh, ReLU
Feed-forward network
3 neurons, 1 hidden layer
Fundamental multiply-add structure
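The activation rule on this slide is a multiply-add followed by a non-linearity, and can be sketched in a few lines. This is an illustrative sketch only: the function names layer_forward and relu, the 0-based indexing and the list-of-lists weight layout (weights[i][j] standing for W[L][i][j]) are assumptions, not part of the presentation.

```python
import math


def relu(z):
    """Rectified linear unit, one of the activation functions named on the slide."""
    return max(0.0, z)


def layer_forward(act_prev, weights, bias, sigma=math.tanh):
    """Compute Act[L][j] = sigma(sum_i Act[L-1][i] * W[L][i][j] + Bias[L][j]).

    act_prev: activations of layer L-1, one value per input neuron i
    weights:  weights[i][j] connects input neuron i to output neuron j
    bias:     one bias value per output neuron j
    """
    out = []
    for j in range(len(bias)):
        z = bias[j]
        for i, a in enumerate(act_prev):
            z += a * weights[i][j]  # the fundamental multiply-add
        out.append(sigma(z))
    return out


# Three inputs feeding one neuron, as in the slide's diagram.
act = layer_forward([1.0, 2.0, 3.0], [[0.5], [0.25], [-0.5]], [1.0], sigma=relu)
```

The inner loop is a dot product, so a whole layer is a matrix-vector product. That is the operation vector units and tuned linear algebra libraries are built to accelerate.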
Vectorisation in Linear Algebra
Core intensive code in Linpack benchmark
c     dgefa: row elimination with column indexing
      do 30 j = kp1, n
         t = a(l,j)
         if (l .eq. k) go to 20
            a(l,j) = a(k,j)
            a(k,j) = t
   20    continue
         call daxpy(n-k,t,a(k+1,k),1,a(k+1,j),1)
   30 continue
c     dgesl: back substitution, solve u*x = y
      do 40 kb = 1, n
         k = n + 1 - kb
         b(k) = b(k)/a(k,k)
         t = -b(k)
         call daxpy(k-1,t,a(1,k),1,b(1),1)
   40 continue
c     daxpy: dy = dy + da*dx, general increments
      do 10 i = 1, n
         dy(iy) = dy(iy) + da*dx(ix)
         ix = ix + incx
         iy = iy + incy
   10 continue
Fujitsu K computer
Source: https://www.top500.org/lists/2017/06/
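To make the role of daxpy in the Linpack fragments concrete, here is a small Python transcription of the daxpy loop and of the back-substitution loop that calls it. It is a sketch under stated assumptions: 0-based lists instead of Fortran arrays, positive increments only, and a matrix stored as a list of rows, so the column a(1:k-1,k) is copied out before the call. The name back_substitute is hypothetical.

```python
def daxpy(n, da, dx, incx, dy, incy):
    """dy := dy + da * dx over n elements, in place, with strides incx and incy.

    Sketch only: assumes positive increments and 0-based Python lists.
    """
    ix = iy = 0
    for _ in range(n):
        dy[iy] = dy[iy] + da * dx[ix]
        ix += incx
        iy += incy


def back_substitute(a, b):
    """Solve U x = b in place, where U is the upper triangle of a.

    Mirrors the 'do 40' loop: divide by the pivot, then subtract the
    solved unknown's contribution from the rows above it with one daxpy.
    """
    n = len(b)
    for k in range(n - 1, -1, -1):
        b[k] = b[k] / a[k][k]
        t = -b[k]
        column = [a[i][k] for i in range(k)]  # a(1:k-1, k) in the Fortran
        daxpy(k, t, column, 1, b, 1)
    return b


u = [[2.0, 1.0, 1.0],
     [0.0, 3.0, 2.0],
     [0.0, 0.0, 4.0]]
x = back_substitute(u, [7.0, 12.0, 12.0])
```

Every iteration of the daxpy loop is independent of the others, which is what lets a compiler map it onto SIMD instructions when the increments are 1.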
Network Illustration
Source: Nervana
N = 28 x 28 pixels = 784 input units
N = 100 hidden units (user-defined parameter)
N = 10 output units (one for each digit)
Each unit i encodes the probability of the input image being the digit i
Total parameters:
  $W_{i \to j}$: 784 × 100, $b_j$: 100 (input to hidden)
  $W_{i \to j}$: 100 × 10, $b_j$: 10 (hidden to output)
Cost function: $c(\text{output}, \text{truth})$
Fully connected network, convolution not present for now
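The parameter total for this network follows directly from the layer sizes: each fully connected layer has one weight per input-output pair plus one bias per output. A short worked calculation, with dense_params as a hypothetical helper name:

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus biases (n_out) of one fully connected layer."""
    return n_in * n_out + n_out


# Layer sizes from the slide: 784 inputs -> 100 hidden units -> 10 outputs.
layers = [(784, 100), (100, 10)]
per_layer = [dense_params(n_in, n_out) for n_in, n_out in layers]
total = sum(per_layer)

print(per_layer)  # [78500, 1010]
print(total)      # 79510
```

Almost all of the 79,510 parameters sit in the first weight matrix, so the cost of both the forward and backward pass is dominated by that one 784 × 100 matrix product.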
CNN Computing Operations
Dense Matrix Multiplies
Recurrent Layers
Convolutions
All-Reduce
Deep Learning ingredients
1. Randomly seed weights
2. Forward-pass
3. Cost
4. Backward-pass
5. Update weights
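The five ingredients map one-to-one onto the body of a training loop. The sketch below is deliberately tiny and is not the presenters' code: it assumes a single linear neuron, a half squared error cost and plain batch gradient descent, and the names train, samples, lr and epochs are all illustrative.

```python
import random


def train(samples, lr=0.5, epochs=100, seed=0):
    """Fit y = w * x + b to (x, target) pairs by batch gradient descent."""
    rng = random.Random(seed)
    # 1. Randomly seed weights
    w = rng.uniform(-1.0, 1.0)
    b = rng.uniform(-1.0, 1.0)
    history = []
    n = len(samples)
    for _ in range(epochs):
        cost = grad_w = grad_b = 0.0
        for x, target in samples:
            # 2. Forward pass
            y = w * x + b
            # 3. Cost (half squared error)
            err = y - target
            cost += 0.5 * err * err
            # 4. Backward pass: gradients of the cost w.r.t. w and b
            grad_w += err * x
            grad_b += err
        # 5. Update weights
        w -= lr * grad_w / n
        b -= lr * grad_b / n
        history.append(cost / n)
    return w, b, history


# Noise-free samples of y = 2x + 1.
samples = [(x, 2.0 * x + 1.0) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
w, b, history = train(samples)
```

In a distributed setting the per-sample gradient sums in step 4 are what an All-Reduce would combine across workers before step 5.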
Parallelisation Hierarchy
Vectorisation: is SIMD parallelism used well?
Scalar tuning: what happens in the pipeline?
Memory: is cache usage maximised or RAM access streamlined?
Threading: do cores cooperate efficiently?
Communication: can coordination in a distributed or heterogeneous system be improved?
Naïve Nested Loops in CNN Algorithms
Forward Propagation
Backward Propagation Convolution
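The loop nests shown on this slide did not survive extraction, so the following is a stand-in rather than a reconstruction: a minimal sketch of a naive forward convolution, assuming stride 1, no padding, square kernels and nested Python lists. The function name conv2d_forward and the tensor layouts are assumptions.

```python
def conv2d_forward(inp, weights, bias):
    """Direct convolution (cross-correlation) with six nested loops.

    inp:     [C_in][H][W] input feature maps
    weights: [C_out][C_in][K][K] filters
    bias:    [C_out] one bias per output channel
    Returns  [C_out][H-K+1][W-K+1] output feature maps.
    """
    c_in, h, w = len(inp), len(inp[0]), len(inp[0][0])
    c_out, k = len(weights), len(weights[0][0])
    out_h, out_w = h - k + 1, w - k + 1
    out = [[[0.0] * out_w for _ in range(out_h)] for _ in range(c_out)]
    for o in range(c_out):                  # output channel
        for y in range(out_h):              # output row
            for x in range(out_w):          # output column
                acc = bias[o]
                for c in range(c_in):       # input channel
                    for ky in range(k):     # kernel row
                        for kx in range(k):  # kernel column
                            acc += inp[c][y + ky][x + kx] * weights[o][c][ky][kx]
                out[o][y][x] = acc
    return out


image = [[[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0],
          [7.0, 8.0, 9.0]]]
filters = [[[[1.0, 1.0],
             [1.0, 1.0]]]]
result = conv2d_forward(image, filters, [0.0])
```

Every output element is an independent sum of multiply-adds, so each of the three outer loops can be threaded while the inner loops are candidates for vectorisation or for recasting as a dense matrix multiply.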
A short word on Tensors
Tensors are systems of components organized by one or more indices that transform according to specific rules under a set of transformations
The number of indices is called the rank of the tensor
Tensor rank 0 is a scalar
Tensor rank 1 is a vector
Tensors are important in many areas of physics (general relativity, electromagnetic theory)
In N-dimensional space a tensor of rank n has $N^n$ components
Transformation rules are independent of choice of reference frame: ideal for expressing universal physical laws
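The component count is easy to check by building the index structure explicitly. The sketch below only models storage (nested lists with one level per index); it does not model the transformation rules that make an array a tensor in the physical sense. The helper names make_tensor and count_components are hypothetical.

```python
def make_tensor(dim, rank, fill=0.0):
    """Nested lists with `rank` index levels, each running over `dim` values."""
    if rank == 0:
        return fill  # rank 0: a single scalar component
    return [make_tensor(dim, rank - 1, fill) for _ in range(dim)]


def count_components(tensor):
    """Count the scalar entries of a nested-list tensor."""
    if not isinstance(tensor, list):
        return 1
    return sum(count_components(sub) for sub in tensor)


scalar_count = count_components(make_tensor(3, 0))  # 3**0 = 1
vector_count = count_components(make_tensor(3, 1))  # 3**1 = 3
rank2_in_4d = count_components(make_tensor(4, 2))   # 4**2 = 16
rank3_in_3d = count_components(make_tensor(3, 3))   # 3**3 = 27
```

In deep learning frameworks the word is used more loosely for any multi-index array, where the extent along each index may differ, for example the [C_out][C_in][K][K] layout of a convolution filter bank.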
Optimised Functions
Software Libraries
Tensor functions hand-coded for CPUs or GPUs
Intel MKL-DNN
Emergence of dedicated processing units and ISAs
Tensor Arithmetic in hardware
16 copy Copyright 2017 FUJITSU
CNN Computing Operations
Dense Matrix Multiplies
Recurrent Layers
Convolutions All-Reduce
Deep Learning ingredients
1 Randomly seed weights
2 Forward-pass
3 Cost
4 Backward-pass
5 Update weights
17 copy Copyright 2017 FUJITSU
Parallelisation Hierarchy
Vectorisation ndash Is SIMD parallelism used well
Scalar tuning ndash What happens in the pipeline
Memory ndash Is cache usage maximised or RAM access streamlined
Threading ndash do cores cooperation efficiently
Communication ndash can coordination in a distributed or
heterogeneous system be improved
18 copy Copyright 2017 FUJITSU
Naiumlve Nested Loops in CNN Algorithms
Forward Propagation
Backward Propagation Convolution
19 copy Copyright 2017 FUJITSU
A short word on Tensors
Tensors are systems of components organized by one or more indices that transform according to specific rules under a set of transformations
The number of indices is called the rank of the tensor
Tensor rank 0 is a scalar
Tensor rank 1 is a vector
Tensors are important in many areas of physics (general relativity electromagnetic theory)
In N-dimensional space a tensor of rank n has Nn components
Transformation rules are independent of choice of reference frame ndash ideal for expressing universal physical laws
20 copy Copyright 2017 FUJITSU
Optimised Functions
Software Libraries
Tensor functions hand-coded for CPUs or GPUs
Intel MKL-DNN
Emergence of dedicated processing units and ISAs
Tensor Arithmetic in hardware
21 copy Copyright 2017 FUJITSU
Multi-threading CNN Training
1 thread
4 threads
16 threads
64 threads
Training on CIFAR-10 with Intel-Caffe 1000 iterations Full Solver
Dataset consists of 60000 32x32 colour images
in 10 classes with 6000 images per class ndash
50000 training images and 10000 test images
22 copy Copyright 2017 FUJITSU
MPI Parallelism in CFD
Global model decomposed into
8 balanced MPI domainsHalo at interface
between domains
Communicate between processes with
MPI primitivesMPI_Send MPI_Recv MPI_Wait
MPI_AllToAll MPI_AllReduce MPI_BarrierDomain surfaces
adapted to cell
weights
23 copy Copyright 2017 FUJITSU
MPI in Deep Learning
24 copy Copyright 2017 FUJITSU
MPI Parallel Performance
25 copy Copyright 2017 FUJITSU
AI evolution driving CPU and GPU releases
[Chart: processors plotted by performance, grouped by deployment (Edge, Cloud, Datacenter) and workload (Inference, Training)]
Intel: Intel® Xeon Phi™ Processor (Knights Mill), Intel® Xeon® Processor (Skylake), Intel® Xeon® Processor + FPGA, Intel® Lake Crest deep neural network processor, Intel® Nervana
NVIDIA: Tesla P4/P40, Drive PX, Pascal 100
Google TPU
FPGA SoC (Intel/Xilinx)
FUJITSU: PRIMERGY CX600, K Computer
Fujitsu Gateway: Intelligent Application Platform
Cloud Services
Cloud bursting: Gateway
On-premise cloud: UNCAI
Artificial Intelligence
Smart City Surveillance
Manufacturing process optimisation
HPC for Data Analytics
Based on PRIMERGY with Parallel File System
Reference Architecture
Products and Solutions
CELSIUS
Intel & Mellanox Cluster Interconnect
NVIDIA GPGPU
PRIMERGY RX2540 M4 (SKL based)
PRIMEFLEX for HPC
Solutions
Products
CX600 M1 (KNL / KNM based)
Entry ETERNUS storage
Cloud
PRIMERGY RX2530 M4 (SKL based)
High-end ETERNUS storage
NetApp storage
DDN storage
Workgroup, Departmental, Data Center
Liquid Cooling + immersion cooling FY2018
CX400 M4 (SKL based)
CX2550 M4 (HPC)
CX2570 M4 (GPU)
CX2580 M4 (FPGA)
Engineering Cloud
Industry 4.0
MONOZUKURI
New PRIMEFLEX Options
Reference designs defined for AI Deep Learning frameworks
PRIMEFLEX configuration tool provided for fast definition of a complete solution
PRIMEFLEX for HPC Integrated Solutions incorporate the Fujitsu Intelligent Application Platform as the application platform within the software stack
Reference architecture for off-premise
Cloud-bursting capability
DL/HPC trends
DL opportunity represents 6-7% of Hyperscale Market
Speculative figure, likely 100% y/y growth
DL is not a vertical market
It is more akin to an algorithm or method of computation, like an FFT
AI/DL exists in proximity to HPC
Driven by the same architectural objective: performance and scale
Converged math and programming methodologies
Technological cross-fertilization
• Software: compilers, libraries, tools
• Hardware: processors, memory, interconnect
Source: Intersect360 Research, 2016
Summary
Combine algorithmic expertise on HPC and ML
Fujitsu has the rare capability to combine technologies & provide a fully optimized solution
Shape of a network is subject to skilled programming
Optimise through algorithmic and modelling discoveries
Relationship between depth and result quality remains largely empirical
AI usage is primarily on Cloud today
Customers look for simplified, integrated solutions
Deep Learning Networks
Image Identity
Unsupervised Learning
Genome, Market Segmentation, Fraud Detection, Astronomical data analysis, Google News
Parallelisation Hierarchy
Vectorisation: is SIMD parallelism used well?
Scalar tuning: what happens in the pipeline?
Memory: is cache usage maximised or RAM access streamlined?
Threading: do cores cooperate efficiently?
Communication: can coordination in a distributed or heterogeneous system be improved?

Multi-threading CNN Training
Training on CIFAR-10 with Intel-Caffe, 1000 iterations, Full Solver
[Chart series: 1 thread, 4 threads, 16 threads, 64 threads]
Dataset consists of 60,000 32x32 colour images in 10 classes, with 6,000 images per class: 50,000 training images and 10,000 test images
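The dataset figures quoted on this slide can be checked with a few lines of arithmetic. The sketch below also shows one simple way a mini-batch might be divided among the 1, 4, 16 and 64 threads named in the chart. The batch size of 256, the 8-bit channel depth and the even-split strategy are illustrative assumptions, not details taken from the slide or from Intel-Caffe.

```python
# CIFAR-10 figures as quoted on the slide.
NUM_CLASSES = 10
IMAGES_PER_CLASS = 6000
TRAIN_IMAGES = 50000
TEST_IMAGES = 10000
IMAGE_SIDE = 32   # images are 32x32 pixels
CHANNELS = 3      # colour images: red, green, blue

total_images = NUM_CLASSES * IMAGES_PER_CLASS
# Assumption: one byte per channel value (8-bit colour).
bytes_per_image = IMAGE_SIDE * IMAGE_SIDE * CHANNELS


def split_batch(batch_size, num_threads):
    """Split batch_size samples as evenly as possible over num_threads workers."""
    base, extra = divmod(batch_size, num_threads)
    # The first `extra` workers each take one additional sample.
    return [base + 1 if i < extra else base for i in range(num_threads)]


if __name__ == "__main__":
    print("total images:", total_images)
    print("train + test:", TRAIN_IMAGES + TEST_IMAGES)
    print("raw bytes per image:", bytes_per_image)
    for threads in (1, 4, 16, 64):          # thread counts shown in the chart
        shares = split_batch(256, threads)  # 256 is a hypothetical batch size
        print(threads, "threads: largest share is", max(shares), "samples")
```

With an even split, the work per thread shrinks in proportion to the thread count, which is the ideal case that measured scaling is usually compared against.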

MPI Parallelism in CFD
Global model decomposed into 8 balanced MPI domains
Halo at interface between domains
Domain surfaces adapted to cell weights
Communicate between processes with MPI primitives: MPI_Send, MPI_Recv, MPI_Wait, MPI_Alltoall, MPI_Allreduce, MPI_Barrier
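The decomposition and halo ideas on this slide can be illustrated with a small single-process sketch. It splits a 1D field into 8 balanced domains and fills each domain's halo cells from its neighbours. The function names, the 1D layout and the one-cell halo width are assumptions made for illustration; a real CFD code would work on a 3D mesh and perform each copy as a matched MPI_Send/MPI_Recv between ranks.

```python
def decompose(n_cells, n_domains):
    """Return (start, stop) index pairs splitting n_cells into balanced contiguous domains."""
    base, extra = divmod(n_cells, n_domains)
    bounds, start = [], 0
    for rank in range(n_domains):
        size = base + (1 if rank < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def make_local(field, bounds):
    """Give each domain its own cells plus one halo slot per side (None until exchanged)."""
    return [[None] + field[a:b] + [None] for a, b in bounds]


def exchange_halos(local):
    """Copy each domain's edge cells into the neighbouring domain's halo slots.

    In an MPI program each copy would be a send on one rank matched by a
    receive on the neighbouring rank. Every domain must own at least one cell.
    """
    for rank in range(len(local) - 1):
        left, right = local[rank], local[rank + 1]
        left[-1] = right[1]   # right neighbour's first owned cell fills left's right halo
        right[0] = left[-2]   # left neighbour's last owned cell fills right's left halo


if __name__ == "__main__":
    field = list(range(20))        # hypothetical global field of 20 cells
    bounds = decompose(len(field), 8)
    local = make_local(field, bounds)
    exchange_halos(local)
    for rank, cells in enumerate(local):
        print("domain", rank, cells)
```

The outermost halo slots stay empty here because no physical boundary condition is modelled.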

MPI in Deep Learning
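Only the title of this slide survives. One widely used pattern, offered here as an assumption about what the title refers to, is data-parallel training: each process computes gradients on its own share of the data, and the gradients are averaged across processes with an all-reduce (MPI_Allreduce in MPI). The sketch below simulates the ranks inside one Python process; the function names and the sample gradient values are hypothetical.

```python
def allreduce_mean(per_rank_values):
    """Simulate an all-reduce: sum the vectors element-wise, then divide by the rank count.

    per_rank_values holds one gradient vector (a list of floats) per simulated rank.
    Every rank receives its own copy of the same averaged vector.
    """
    n_ranks = len(per_rank_values)
    length = len(per_rank_values[0])
    summed = [sum(vec[i] for vec in per_rank_values) for i in range(length)]
    mean = [s / n_ranks for s in summed]
    return [list(mean) for _ in range(n_ranks)]


def sgd_step(weights, gradient, learning_rate):
    """Apply one plain gradient-descent update."""
    return [w - learning_rate * g for w, g in zip(weights, gradient)]


if __name__ == "__main__":
    # Hypothetical gradients computed by 4 ranks on different data shards.
    grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    averaged = allreduce_mean(grads)
    # Every rank applies the same averaged gradient, so the model copies stay identical.
    new_weights = [sgd_step([0.0, 0.0], g, 0.5) for g in averaged]
    print(new_weights)
```

Because every rank ends up with the same averaged gradient, the replicas remain in step without a central parameter server.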

MPI Parallel Performance

AI evolution driving CPU and GPU releases
[Chart: products placed by Performance; region labels: Datacenter, Edge, Cloud; Inference, Training]
Intel® Xeon Phi™ Processor (Knights Mill)
Intel® Xeon Processor (Skylake)
Intel® Xeon® Processor + FPGA
Intel® Lake Crest deep neural network processor
Intel® Nervana
NVIDIA Tesla P4/P40
NVIDIA Drive PX
Google TPU
NVIDIA Pascal 100
FPGA SoC (Intel/Xilinx)
FUJITSU PRIMERGY CX600
K Computer

Fujitsu Gateway - Intelligent Application Platform
[Diagram labels]
Cloud Services
Cloud bursting - Gateway
On premise cloud - UNCAI
Artificial Intelligence
Smart City Surveillance
Manufacturing process optimisation
HPC for Data Analytics
Based on PRIMERGY with Parallel File System
Engineering Cloud
Industry 4.0
MONOZUKURI
Reference Architecture
PRIMEFLEX for HPC
Products and Solutions
Solutions
Products
CELSIUS
PRIMERGY RX2540 M4 (SKL based)
PRIMERGY RX2530 M4 (SKL based)
CX600 M1 (KNL/KNM based)
CX400 M4 (SKL based)
CX2550 M4 (HPC)
CX2570 M4 (GPU)
CX2580 M4 (FPGA)
Intel & Mellanox Cluster Interconnect
NVIDIA GPGPU
Entry ETERNUS storage
High-end ETERNUS storage
NetApp storage
DDN storage
Cloud
Workgroup, Departmental, Data Center
Liquid Cooling + immersion cooling, FY2018

New PRIMEFLEX Options
Reference designs defined for AI Deep Learning frameworks
PRIMEFLEX configuration tool provided for fast definition of a complete solution
PRIMEFLEX for HPC Integrated Solutions incorporate Fujitsu Intelligent Application Platform as the application platform within the software stack
Reference architecture for off-premise
Cloud-bursting capability

DL/HPC trends
DL opportunity represents 6-7% of Hyperscale Market
Speculative figure: likely 100% y/y growth
DL is not a vertical market
It is more akin to an algorithm or method of computation, like an FFT
AI/DL exists in proximity to HPC
Driven by same architectural objective: performance and scale
Converged math and programming methodologies
Technological cross-fertilization
• Software: compilers, libraries, tools
• Hardware: processors, memory, interconnect
Source: Intersect360 Research, 2016

Summary
Combine algorithmic expertise on HPC and ML
Fujitsu has the rare capability to combine technologies & provide a fully optimized solution
Shape of a network is subject to skilled programming
Optimise through algorithmic and modelling discoveries
Relationship between depth and result quality remains largely empirical
AI usage is primarily on Cloud today
Customer looks for simplified, integrated solutions

Deep Learning Networks
[Figure labels: Image, Identity]

Unsupervised Learning
Genome, Market Segmentation, Fraud Detection, Astronomical data analysis, Google News
26 copy Copyright 2017 FUJITSU
Fujitsu Gateway ndashIntelligent Application Platform
Cloud Services
Cloud bursting ndash
Gateway
On premise cloud ndash
UNCAIArtificial Intelligence
Smart City Surveillance
Manufacturing process
optimisation
HPC for Data Analytics
Based on PRIMERGY
with Parallel File
System
Reference Architecture
Products and Solutions
CELSIUS
Intel amp Mellanox
Cluster InterconnectNVDIA GPGPU
PRIMERGY
RX2540 M4
SKL based
copy FUJITSU LIMITED 201726
PRIMEFLEX for HPC
Solutions
ProductsCX600 M1
KNL KNM
based
Entry ETERNUS
storage Cloud
PRIMERGY
RX2530 M4
SKL based
High-end ETERNUS
storage
NetApp storageDDN storage
Workgroup Data CenterDepartmental
Liquid Cooling
+ immersion cooling
FY2018
CX400 M4
SKL based
CX2550
M4
HPC
CX2570
M4
GPU
CX2580
M4
FPGA
Engineering Cloud
Industry 40
MONOZUKURI
27 copy Copyright 2017 FUJITSU
New PRIMEFLEX Options
Reference designs defined for AI Deep Learning frameworks
PRIMEFLEX configuration tool provided for
fast definition of a complete solution
PRIMEFLEX for HPC Integrated Solutions incorporate Fujitsu Intelligent Application Platform as the application platform within the software stack
Ref arch for off-premise
Cloud-bursting capability
28 copy Copyright 2017 FUJITSU
DLHPC trends
DL opportunity represents 6-7 of Hyperscale Market
Speculative figure likely 100 yy growth
DL is not a vertical market
It is more akin to an algorithm or method of computation like an FFT
AIDL exists in proximity to HPC
Driven by same architectural objective ndash performance and scale
Converged math and programming methodologies
Technological cross-fertilization
bull Software compilers libraries tools
bull Hardware processors memory interconnect
Source Intersect360 Research 2016
29 copy Copyright 2017 FUJITSU
Summary
Combine algorithmic expertise on HPC and MLFujitsu has the rare capability to combine technologies amp provide fully optimized solution
Shape of a network is subject to skilled programming Optimise through algorithmic and modelling discoveries Relationship between depth and result quality remains largely empirical
AI usage is primarily on Cloud todayCustomer looks for simplified integrated solutions
30 copy Copyright 2017 FUJITSU
Fujitsu Sans Light ndash abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ
0123456789 notrdquopound$^amp()_+-=[]rsquo~ltgt| copyuml~iexclcentcurrenyenbrvbarsectumlordflaquoraquonot-
regmacrdegplusmnsup2sup3microparamiddotcedilsup1ordmfrac14frac12frac34iquestAgraveAacuteAcircAtildeAumlAringCcedilEgraveAEligEacuteEcircEumlIgraveIacuteIcircIumlETHNtildeOgraveOacuteOcircOtildeOumltimesOslashUgraveUacuteUcircUumlYacuteTHORNszligagraveaacuteacircatildeaumlaringaeligccedilegraveeacuteecirceumligraveiacuteicirciumlethntildeograveoacuteocircotildeoumldivideoslashugraveuacuteucircuumlyacutethorn
yumlĐıŒœŠšŸŽžƒʼˆˇˉ˙˚˛˜˝-‒ndashmdash―lsquorsquosbquoldquordquobdquodaggerDaggerbullhellippermillsaquorsaquoolinefrasl⁰⁴⁵⁶⁷⁸⁹₀₁₂₃₄₅₆₇₈₉eurotradeΩrarrpart∆prodsumminusradicinfinintasympnelegesdotlozfifl
Fujitsu Sans ndash abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ 0123456789
notrdquopound$^amp()_+-=[]rsquo~ltgt| copyuml~iexclcentcurrenyenbrvbarsectumlordflaquoraquonot-
regmacrdegplusmnsup2sup3microparamiddotcedilsup1ordmfrac14frac12frac34iquestAgraveAacuteAcircAtildeAumlAringCcedilEgraveAEligEacuteEcircEumlIgraveIacuteIcircIumlETHNtildeOgraveOacuteOcircOtildeOumltimesOslashUgraveUacuteUcircUumlYacuteTHORNszligagraveaacuteacircatildeaumlaringaeligccedilegraveeacuteecirceumligraveiacuteicirciumlethntildeograveoacuteocircotildeoumldivideoslashugraveuacuteucircuumlyacute
thornyumlĐıŒœŠšŸŽžƒʼˆˇˉ˙˚˛˜˝-‒ndashmdash―lsquorsquosbquoldquordquobdquodaggerDaggerbullhellippermillsaquorsaquoolinefrasl⁰⁴⁵⁶⁷⁸⁹₀₁₂₃₄₅₆₇₈₉eurotradeΩrarrpart∆prodsumminusradicinfinintasympnelegesdotlozfifl
Fujitsu Sans Medium ndash abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ
0123456789 notrdquopound$^amp()_+-=[]rsquo~ltgt| copyuml~iexclcentcurrenyenbrvbarsectumlordflaquoraquonot-
regmacrdegplusmnsup2sup3microparamiddotcedilsup1ordmfrac14frac12frac34iquestAgraveAacuteAcircAtildeAumlAringCcedilEgraveAEligEacuteEcircEumlIgraveIacuteIcircIumlETHNtildeOgraveOacuteOcircOtildeOumltimesOslashUgraveUacuteUcircUumlYacuteTHORNszligagraveaacuteacircatildeaumlaringaeligccedilegraveeacuteecirceumligraveiacuteicirciumlethntildeograveoacuteocircotildeoumldivideoslashugraveuacuteucirc
uumlyacutethornyumlĐıŒœŠšŸŽžƒʼˆˇˉ˙˚˛˜˝-‒ndashmdash―lsquorsquosbquoldquordquobdquodaggerDaggerbullhellippermillsaquorsaquoolinefrasl⁰⁴⁵⁶⁷⁸⁹₀₁₂₃₄₅₆₇₈₉eurotradeΩrarrpart∆prodsumminusradicinfinintasympnelegesdotlozfifl
31 © Copyright 2017 FUJITSU
Deep Learning Networks
Image Identity
32 © Copyright 2017 FUJITSU
Unsupervised Learning
Genome, Market Segmentation, Fraud Detection, Astronomical data analysis, Google News
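As a rough illustration of the market segmentation example above, here is a minimal sketch of k-means clustering (Lloyd's algorithm), one common unsupervised method. The slide does not name a specific algorithm, so this choice is an assumption, and the customer figures, the `kmeans` helper and its parameters are invented for illustration.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points, k, iterations=20, seed=0):
    """Group 2-D points into k clusters without using any labels."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, clusters


# Hypothetical customers: (annual spend, store visits per month).
customers = [(200, 2), (220, 3), (210, 2), (900, 12), (950, 11), (920, 13)]
centroids, clusters = kmeans(customers, k=2)
for members in clusters:
    print(sorted(members))
```

No segment labels are supplied; the two groups emerge from the data alone, which is what distinguishes unsupervised from supervised learning.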
27 © Copyright 2017 FUJITSU
New PRIMEFLEX Options
Reference designs defined for AI Deep Learning frameworks
PRIMEFLEX configuration tool provided for fast definition of a complete solution
PRIMEFLEX for HPC Integrated Solutions incorporate the Fujitsu Intelligent Application Platform as the application platform within the software stack
Reference architecture for off-premise deployment
Cloud-bursting capability
28 © Copyright 2017 FUJITSU
DL/HPC trends
DL opportunity represents 6-7% of Hyperscale Market
Speculative figure, likely 100% y/y growth
DL is not a vertical market
It is more akin to an algorithm or method of computation, like an FFT
AI/DL exists in proximity to HPC
Driven by same architectural objective: performance and scale
Converged math and programming methodologies
Technological cross-fertilization
• Software: compilers, libraries, tools
• Hardware: processors, memory, interconnect
Source: Intersect360 Research, 2016
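To make the "converged math" point concrete, the sketch below shows that one fully connected neural-network layer reduces to a dense matrix-vector product plus a simple elementwise function, the same kind of kernel that HPC linear-algebra libraries optimise. This is an illustration only, not taken from the slides; the function names, weights and inputs are made up.

```python
def matvec(weights, x):
    """Dense matrix-vector product, the kernel shape HPC libraries tune."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relu(v):
    """Elementwise rectified linear unit."""
    return [max(0.0, a) for a in v]


def dense_layer(weights, bias, x):
    """One fully connected layer: relu(W x + b)."""
    return relu([a + b for a, b in zip(matvec(weights, x), bias)])


# Hypothetical 2x2 weights, bias and input chosen for illustration.
W = [[1.0, -2.0], [0.5, 0.5]]
b = [0.0, 1.0]
x = [3.0, 1.0]
y = dense_layer(W, b, x)
print(y)
```

In practice the product would be delegated to an optimised library rather than written as Python loops; the loops are kept here only to expose the arithmetic.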
29 © Copyright 2017 FUJITSU
Summary
Combine algorithmic expertise on HPC and ML
Fujitsu has the rare capability to combine technologies and provide a fully optimized solution
Shape of a network is subject to skilled programming
Optimise through algorithmic and modelling discoveries. The relationship between depth and result quality remains largely empirical
AI usage is primarily on Cloud today
Customers look for simplified, integrated solutions