How many cores will we need?
Chien-Ping Lu, PhD
Sr. Director, MediaTek Inc.
| how many cores will we need? | December 4, 2013 | Confidential2
a group of hippos is called …
A Crash
a group of crows is called …
A Murder
a group of giraffes is called …
A Tower
From Wikipedia
So, it is not surprising that we use
“A Parade” of elephants
“An Army” of ants
“A Herd” of sheep
From frequency to multicore scaling
[Figure: performance vs. time, moving from serial computing to parallel computing at the power wall (2005); other axis labels: power, frequency]
How many cores will we need?
[Figure: performance vs. time, with labels “Moderate” and “Massive”]
Dark silicon (or dark cores)?
[Figure: performance vs. time, with labels 2x, 4x, 3x and 8x, 4x, 16x, 4x]
Light up the cores
[Figure: power vs. degree of parallelism (number of cores), with a power ceiling and a parallelism wall; labels: big cores, little cores, GPU-style “cores”; example workloads: body tracking, ray tracing]
Redefine the cores to be heterogeneous
Dark silicon: a concern on power
Amdahl’s law: an argument against parallel computing
The elephants: CPU cores
For multiple-instruction-multiple-data (MIMD) execution
[Figure: CPU cores, each with its own front end and ALU; the same color means the same piece of code]
A CPU core runs 1 iteration of the parallel loop
Retrofitted for moderately parallel workloads, and not very efficient for massively parallel workloads
Parallel.For (…)
…
…
…
…Else
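A minimal Python sketch of the Parallel.For pattern above, under stated assumptions: the slide elides the loop body, so `loop_body` here is a hypothetical body with an if/else branch, and `parallel_for` is an illustrative helper, not an API from the talk. Each worker runs whole iterations and follows its own branch independently, which is the MIMD programming model. Python threads only model the structure here; they do not demonstrate real multicore speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def loop_body(i):
    # Hypothetical loop body: each iteration takes its own branch,
    # independently of what the other iterations are doing (MIMD-style).
    if i % 2 == 0:
        return i * i
    else:
        return -i


def parallel_for(n, body, workers=4):
    # Hand one iteration at a time to a pool of workers ("cores").
    # pool.map returns results in iteration order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(body, range(n)))


results = parallel_for(8, loop_body)
print(results)  # [0, -1, 4, -3, 16, -5, 36, -7]
```

Every worker needs its own instruction stream to do this, which is why each CPU core in the figure carries its own front end.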
Army of ants: SIMT cores
For SIMT (single-instruction-multiple-thread) execution
[Figure: three clusters of SIMT cores, each cluster with one front end shared by many ALUs, plus SFU 0 and SFU 1]
Parallel.For (…)
…
…
…
…Else
A SIMT core runs 1 iteration of the parallel loop
A cluster of SIMT cores shares one front end in a SIMD manner
A branch is emulated through divergence
SIMT is the execution model of HSA and implemented in modern GPUs, with MIMD flexibility and SIMD efficiency
Can achieve better power efficiency with more specialized function units given the right workload
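A rough Python sketch of how a branch can be emulated through divergence when one front end drives many lanes. This is an illustration under assumptions, not a description of any particular GPU: `simt_if_else`, its parameters, and the per-lane mask are hypothetical names. The shared front end issues each side of the branch once, and a mask decides which lanes commit results on each pass.

```python
def simt_if_else(values, predicate, then_op, else_op):
    """Emulate one cluster of lanes executing an if/else in lockstep."""
    # Every lane evaluates the branch condition on its own data.
    mask = [predicate(v) for v in values]
    results = list(values)

    # Pass 1: the "then" path is issued once; only lanes with the mask set commit.
    for lane, active in enumerate(mask):
        if active:
            results[lane] = then_op(values[lane])

    # Pass 2: the "else" path is issued once; only the remaining lanes commit.
    for lane, active in enumerate(mask):
        if not active:
            results[lane] = else_op(values[lane])

    # A path that no lane takes can be skipped, so a uniform branch costs one pass.
    passes = int(any(mask)) + int(not all(mask))
    return results, passes


divergent, passes_divergent = simt_if_else(
    [1, 2, 3, 4], lambda v: v % 2 == 0, lambda v: v * 10, lambda v: -v
)
uniform, passes_uniform = simt_if_else(
    [2, 4], lambda v: v % 2 == 0, lambda v: v * 10, lambda v: -v
)
print(divergent, passes_divergent)  # [-1, 20, -3, 40] 2
print(uniform, passes_uniform)      # [20, 40] 1
```

When lanes disagree, both paths are issued and part of the cluster idles on each pass; when they agree, the cluster runs at full SIMD efficiency.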
Properties of massively data-parallel workloads
• Problem size N of the parallel workload can keep growing
• Visible serial workload s can be kept constant
• Communication overhead is proportional to log P (by a factor of r)
• Parallel workload is sped up linearly by P, the number of cores
• "Embarrassingly" parallel, when there is no communication overhead (r=0)
[Figure: execution time on one core, s + N, compared with execution time on P cores, s + r log P + N/P; the difference is the time saved by P cores]
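Revisiting Amdahl's law for trend prediction
$$\text{Speedup} = \frac{s + N}{s + r \log P + N/P}$$
When the problem size grows with the number of cores ($N = P$):
$$\text{Speedup} = \frac{s + P}{s + r \log P + 1}$$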
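The speedup model described on these slides (constant serial work s, parallel work N divided across P cores, and communication overhead r log P) can be sketched in a few lines of Python. The function name `speedup` is illustrative, and the base-2 logarithm is an assumption: the slides do not state the base, and a different base only rescales r.

```python
import math


def speedup(s, n, p, r):
    """Speedup of P cores over one core for serial work s and parallel work n.

    Assumes communication overhead r * log2(p); the log base is an assumption.
    """
    one_core_time = s + n
    p_core_time = s + r * math.log2(p) + n / p
    return one_core_time / p_core_time


# Fixed problem size: the serial part and the overhead cap the speedup.
fixed = speedup(s=1.0, n=1024.0, p=1024, r=1.0)

# Embarrassingly parallel (r = 0) with no serial part: speedup equals P.
ideal = speedup(s=0.0, n=1024.0, p=16, r=0.0)

print(round(fixed, 2), ideal)  # 85.42 16.0
```

With s = 1, r = 1 and N = P = 1024, the model gives a speedup of about 85, far below 1024, because the s + r log P term dominates once N/P shrinks to 1. Letting N keep growing is what keeps additional cores useful.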
MediaTek face beautification
When it comes to beauty, there seems to be no limit
[Figure panels: Before; Skin tone adjustment; Wrinkle removal; Thinner face, bigger eyes]
Graphics keeps moving
Pac-Man, 1980
‒ Recognized by 94% of American consumers
‒ Highest-grossing video game of all time
Mobile 3D graphics
‒ GLBenchmark 2.1 Egypt, 2011
‒ GLBenchmark 2.5 Egypt, 2012
‒ GFXBench 2.7 T-Rex, 2013
‒ GFXBench 3.0 Manhattan, 2013
High-performance computing (HPC) keeps scaling out
HPC from 1993 to 2012
‒ GFLOPS ~ 130,000x
‒ Cores ~ 11,000x
‒ GHz ~ 10x
Higher grid resolution
More time steps
More atoms
Parallel killer apps are just around the corner
[Diagram labels: Moore’s law; higher frequency; more cores; more complex software; bigger problems; data; better user experience]
What bigger problems to solve with bigger data?
How does solving bigger problems lead to better user experience?
Mining bigger data with machine learning
Completing the positive feedback loop
Bigger data-parallel workloads in graphics and HPC
How to distinguish cat photos from dog ones?
ASIRRA: Animal Species Image Recognition for Restricting Access (from Microsoft Research)
Why is it hard?
Source: training set of Kaggle.com Dogs vs. Cats competition
Is there a solution to relate photos of the same dog?
Prancer, a 5-year-old toy poodle, before and after grooming
Mine the solutions from the data
Theory of the differences between dogs and cats?
Machine Learning
Learn from many (12,500) photos labeled as dogs or cats
Dog-Cat classifier
machine learning: prediction with powerful models
More powerful models have more knobs, which need to be determined with a bigger data set
The explosive growth of data has made very powerful models feasible
6th-order polynomial over-fits the 4 samples
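A small Python sketch of the over-fitting point, with one swap stated plainly: instead of the slide's 6th-order polynomial, it uses the cubic that passes exactly through 4 samples, because 4 samples determine a cubic uniquely while a 6th-order fit would need extra assumptions. The sample values and the underlying trend y = x are hypothetical, chosen only for illustration.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate the unique degree-(n-1) polynomial through n samples at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b, a model with only 2 knobs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


# Hypothetical samples: the underlying trend is y = x, plus small noise.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 0.9, 2.2, 2.8]

a, b = fit_line(xs, ys)
x_new, y_true = 5.0, 5.0  # a point outside the training samples

cubic_pred = lagrange_interpolate(xs, ys, x_new)  # 4 knobs, zero training error
line_pred = a * x_new + b                         # 2 knobs, small training error

print(abs(cubic_pred - y_true) > abs(line_pred - y_true))  # True
```

The 4-knob model reproduces every training sample exactly yet predicts far worse at the new point than the 2-knob line. More knobs pay off only when there is enough data to pin them down.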
From data to user experience
[Diagram labels: Cloud; Client; Web-scale Data $(x_n, y_n)$; Machine Learning; Knobs $\{a_i\}$; Model $y = f(x; \{a_i\})$]
Determine $\{a_i\}$ to minimize the error between $y_n$ and $f(x_n; \{a_i\})$
Examples ($x$ to $y$):
‒ dog/cat photos to dog or cat
‒ sensor readings to jogging, walking or climbing
‒ depth images to body motion
Bigger data lead to more powerful models
Powerful models with more knobs lead to better user experience
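As a toy illustration of determining the knobs from labeled samples, the Python sketch below fits a two-knob model by gradient descent on the mean squared error. Everything specific here is an assumption made for the example: the linear form of `model`, the squared-error measure, the step count, the learning rate, and the synthetic samples. The slides only say that the knobs are chosen to minimize the error between the labels and the model output.

```python
def model(x, knobs):
    # Hypothetical model f(x; {a_i}) with two knobs: a0 + a1 * x.
    a0, a1 = knobs
    return a0 + a1 * x


def mean_squared_error(samples, knobs):
    return sum((y - model(x, knobs)) ** 2 for x, y in samples) / len(samples)


def fit(samples, steps=2000, learning_rate=0.05):
    """Adjust the knobs step by step to reduce the mean squared error."""
    a0, a1 = 0.0, 0.0
    n = len(samples)
    for _ in range(steps):
        # Gradient of the mean squared error with respect to each knob.
        grad0 = sum(-2.0 * (y - model(x, (a0, a1))) for x, y in samples) / n
        grad1 = sum(-2.0 * (y - model(x, (a0, a1))) * x for x, y in samples) / n
        a0 -= learning_rate * grad0
        a1 -= learning_rate * grad1
    return a0, a1


# Synthetic labeled samples (x_n, y_n) generated from y = 1 + 2x.
samples = [(float(x), 1.0 + 2.0 * x) for x in range(5)]
knobs = fit(samples)
print(round(knobs[0], 3), round(knobs[1], 3))  # 1.0 2.0
```

Each gradient is a sum over independent samples, so the training step itself is a data-parallel workload that grows with the size of the data set.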
Smart clients in the era of data
[Diagram: Client, Sensing, Connectivity, Cloud, Data Mining, Big Training Set, Powerful Model and User Experience become Smarter Client, Better Sensing, Better Connectivity, Cloud, Bigger Data Mining, Bigger Training Set, More Powerful Model and Better User Experience]
Input data
Local Machine Learning
In the cloud or the clients
Looking forward
The future is here
‒ There are already massively parallel heterogeneous processors
There is no shame in being data-parallel
‒ One of the smartest things achieved in computing is data parallel
Source: Le et al., Building High-level Features Using Large Scale Unsupervised Learning
Carbon footprint of US datacenters is at the same level as the airline industry
Go parallel and go heterogeneous to keep
‒ Mobile devices cool in our palms
‒ Data centers clean for our environment