C++ Accelerated Massive Parallelism in Visual C++ 2012
Kate Gregory
Gregory Consulting
www.gregcons.com/kateblog, @gregcons
DEV334
C++ is the language for performance
If you need speed at runtime, you use C++
Frameworks and libraries can make your code faster
  E.g. PPL: use all the CPU cores
  With little or no change to your logic and code
Experienced C++ developers are productive in C++
  Don’t want to go back to C or a C-like language
  Enjoy the tool support in Visual Studio
Many C++ developers value portability
  Write standard C++, compile with anything
  Use portable libraries, run anywhere
  Even in an all-Microsoft universe, simple deployment is important
demo
Cartoonizer
What is C++ AMP?
Accelerated Massive Parallelism
Run your calculations on one or more accelerators
  Today, GPU is the accelerator you use
  Eventually: other kinds of accelerators
Write your whole application in C++
  Not a “C-like” language or a separate resource you link in
  Use Visual Studio and familiar tools
  Speed up 20x, 50x, or more
Basically a library
  Comes with Visual Studio 2012, included in vcredist
  Spec is open: other platforms/compilers can implement it too
Agenda
Hardware Review
C++ AMP Fundamentals
Tiling
Debugging and Visualizing
Call to Action
CPUs vs GPUs today
CPU
Low memory bandwidth
Higher power consumption
Medium level of parallelism
Deep execution pipelines
Random accesses
Supports general code
Mainstream programming
GPU
High memory bandwidth
Lower power consumption
High level of parallelism
Shallow execution pipelines
Sequential accesses
Supports data-parallel code
Niche programming
images source: AMD
C++ AMP is fundamentally a library
Comes with Visual C++ 2012
#include <amp.h>
Namespace: concurrency
New classes:
  array, array_view
  extent, index
  accelerator, accelerator_view
New function(s): parallel_for_each()
New (use of) keyword: restrict
  Asks compiler to check your code is ok for GPU (DirectX)
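The accelerator class listed above can be used to see which devices the runtime would offer. The following is a minimal sketch, not code from the talk: ListAccelerators is a hypothetical helper name, and the HAS_CPP_AMP guard is an assumption added so that the file still builds (with a placeholder result) on toolchains that do not ship amp.h.

```cpp
#if defined(_MSC_VER) && !defined(__clang__)
#define _SILENCE_AMP_DEPRECATION_WARNINGS  // later Visual C++ releases mark C++ AMP as deprecated
#include <amp.h>
#define HAS_CPP_AMP 1                      // assumption: Visual C++ with the C++ AMP headers installed
#endif
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: one human-readable description per accelerator.
std::vector<std::wstring> ListAccelerators() {
    std::vector<std::wstring> names;
#ifdef HAS_CPP_AMP
    // accelerator::get_all() returns every accelerator visible to the runtime,
    // including emulated ones.
    for (const concurrency::accelerator& acc : concurrency::accelerator::get_all()) {
        std::wstring name = acc.description;
        if (acc.is_emulated) {
            name += L" (emulated)";
        }
        names.push_back(name);
    }
#else
    // Placeholder so the sketch runs where C++ AMP is unavailable.
    names.push_back(L"host CPU (C++ AMP not available on this toolchain)");
#endif
    assert(!names.empty());
    return names;
}
```

Printing each entry with std::wcout is usually enough to check whether a real GPU or only an emulated device was found.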
parallel_for_each
Entry point to the library
Takes number (and shape) of threads needed
Takes function or lambda to be done by each thread
  Must be restrict(amp)
Sends the work to the accelerator
  Scheduling etc. handled there
Returns immediately: no blocking/waiting
void AddArrays(int n, int * pA, int * pB, int * pSum){
for (int i=0; i<n; i++)
{ pSum[i] = pA[i] + pB[i]; }
}
#include <amp.h>
using namespace concurrency;

void AddArrays(int n, int * pA, int * pB, int * pSum)
{
    array_view<int,1> a(n, pA);
    array_view<int,1> b(n, pB);
    array_view<int,1> sum(n, pSum);
    parallel_for_each(
        sum.extent,
        [=](index<1> i) restrict(amp)
        {
            sum[i] = a[i] + b[i];
        }
    );
}
Hello World: Array Addition
Basic Elements of C++ AMP coding
void AddArrays(int n, int * pA, int * pB, int * pSum)
{
    array_view<int,1> a(n, pA);
    array_view<int,1> b(n, pB);
    array_view<int,1> sum(n, pSum);
    parallel_for_each(
        sum.extent,
        [=](index<1> i) restrict(amp)
        {
            sum[i] = a[i] + b[i];
        }
    );
}
array_view variables captured and associated data copied to accelerator (on demand)
restrict(amp): tells the compiler to check that this code conforms to C++ AMP language restrictions
parallel_for_each: execute the lambda on the accelerator once per thread
extent: the number and shape of threads to execute the lambda
index: the thread ID that is running the lambda, used to index into data
array_view: wraps the data to operate on the accelerator
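The annotated elements can be combined into a variant that takes std::vector arguments and makes the data movement explicit. This is a sketch under assumptions rather than the code shown in the talk: AddVectors is a hypothetical name, the inputs are wrapped as array_view<const int,1> so they are treated as read-only, discard_data() hints that the old contents of the output need not be copied to the accelerator, and synchronize() copies the result back before the function returns. The HAS_CPP_AMP guard and the plain loop in the #else branch are additions so the sketch also builds without amp.h.

```cpp
#if defined(_MSC_VER) && !defined(__clang__)
#define _SILENCE_AMP_DEPRECATION_WARNINGS  // later Visual C++ releases mark C++ AMP as deprecated
#include <amp.h>
#define HAS_CPP_AMP 1                      // assumption: Visual C++ with the C++ AMP headers installed
#endif
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: element-wise sum of two equally sized vectors.
std::vector<int> AddVectors(const std::vector<int>& lhs, const std::vector<int>& rhs) {
    assert(lhs.size() == rhs.size());
    std::vector<int> result(lhs.size());
    if (result.empty()) {
        return result;  // avoid handing an empty extent to the runtime
    }
#ifdef HAS_CPP_AMP
    using namespace concurrency;
    const int n = static_cast<int>(result.size());
    array_view<const int, 1> a(n, lhs);   // read-only views over the inputs
    array_view<const int, 1> b(n, rhs);
    array_view<int, 1> sum(n, result);
    sum.discard_data();                   // output is write-only: skip the copy to the accelerator
    parallel_for_each(sum.extent, [=](index<1> i) restrict(amp) {
        sum[i] = a[i] + b[i];
    });
    sum.synchronize();                    // copy the result back into 'result'
#else
    // Fallback for toolchains without C++ AMP: the same work as a serial loop.
    for (std::size_t i = 0; i < result.size(); ++i) {
        result[i] = lhs[i] + rhs[i];
    }
#endif
    return result;
}
```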
extent<N> - size of an N-dim space
index<N> - an N-dimensional point
array_view<T,N>
View on existing data on the CPU or GPU
Dense in least significant dimension
Of element T and rank N
Requires extent
Rectangular
Access anywhere (implicit sync)

vector<int> v(10);
extent<2> e(2,5);
array_view<int,2> a(e, v);
// the above two lines can also be written
// array_view<int,2> a(2,5,v);

index<2> i(1,3);
int o = a[i]; // or a[i] = 16;
// or int o = a(1, 3);
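Because the view is dense in the least significant dimension, a two-dimensional index maps onto the underlying vector in row-major order. The helper below is a plain C++ illustration of that arithmetic; RowMajorOffset is a hypothetical name and not part of the C++ AMP library.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: offset into the underlying storage of element (row, col)
// in a rows x cols view whose last dimension is contiguous (row-major).
std::size_t RowMajorOffset(int rows, int cols, int row, int col) {
    assert(row >= 0 && row < rows);
    assert(col >= 0 && col < cols);
    return static_cast<std::size_t>(row) * static_cast<std::size_t>(cols)
         + static_cast<std::size_t>(col);
}
```

For the 2 x 5 view above, index<2>(1,3) corresponds to RowMajorOffset(2, 5, 1, 3), which is 1 * 5 + 3 = 8, so a[i] refers to v[8].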
demo
Matrix Multiplication
Matrix Multiplication
C00 = A00 * B00 + A01 * B10 + A02 * B20 + A03 * B30
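The formula above says that each output element is the dot product of one row of A with one column of B, which fits the one-thread-per-element model directly. The following is a sketch of that idea, not the demo code from the session: MultiplyMatrices and its parameter names are hypothetical, matrices are assumed to be stored row-major in std::vector<float>, and the HAS_CPP_AMP guard with a serial fallback is an addition so the sketch builds without amp.h.

```cpp
#if defined(_MSC_VER) && !defined(__clang__)
#define _SILENCE_AMP_DEPRECATION_WARNINGS  // later Visual C++ releases mark C++ AMP as deprecated
#include <amp.h>
#define HAS_CPP_AMP 1                      // assumption: Visual C++ with the C++ AMP headers installed
#endif
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: C (M x N) = A (M x W) * B (W x N), all row-major.
std::vector<float> MultiplyMatrices(const std::vector<float>& A,
                                    const std::vector<float>& B,
                                    int M, int W, int N) {
    assert(M > 0 && W > 0 && N > 0);
    assert(A.size() == static_cast<std::size_t>(M) * W);
    assert(B.size() == static_cast<std::size_t>(W) * N);
    std::vector<float> C(static_cast<std::size_t>(M) * N, 0.0f);
#ifdef HAS_CPP_AMP
    using namespace concurrency;
    array_view<const float, 2> a(M, W, A);
    array_view<const float, 2> b(W, N, B);
    array_view<float, 2> c(M, N, C);
    c.discard_data();  // C is write-only, no need to copy it to the accelerator
    // One thread per element of C; idx[0] is the row, idx[1] is the column.
    parallel_for_each(c.extent, [=](index<2> idx) restrict(amp) {
        float sum = 0.0f;
        for (int k = 0; k < W; ++k) {
            sum += a(idx[0], k) * b(k, idx[1]);
        }
        c[idx] = sum;
    });
    c.synchronize();   // copy the result back into C
#else
    // Fallback for toolchains without C++ AMP: same per-element computation, serially.
    for (int row = 0; row < M; ++row) {
        for (int col = 0; col < N; ++col) {
            float sum = 0.0f;
            for (int k = 0; k < W; ++k) {
                sum += A[row * W + k] * B[k * N + col];
            }
            C[row * N + col] = sum;
        }
    }
#endif
    return C;
}
```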
restrict(amp) restrictions
Can only call other restrict(amp) functions
All functions must be inlinable
Only amp-supported types
  int, unsigned int, float, double, bool
  structs & arrays of these types
Pointers and References
  Lambdas cannot capture by reference, nor capture pointers
  References and single-indirection pointers supported only as local variables and function arguments
restrict(amp) restrictions
No
  recursion
  'volatile'
  virtual functions
  pointers to functions
  pointers to member functions
  pointers in structs
  pointers to pointers
  bitfields
No
  goto or labeled statements
  throw, try, catch
  globals or statics
  dynamic_cast or typeid
  asm declarations
  varargs
  unsupported types
    e.g. char, short, long double
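As an illustration of staying inside these restrictions, the sketch below writes two small helpers that use only int and float members, pass a struct by reference as a function argument, and replace recursion with a loop. The names AMP_CALLABLE, Pixel, Brightness and IntPow are hypothetical. The macro expands to restrict(cpu, amp) on Visual C++, so the same functions can be called from host code and from a restrict(amp) lambda, and to nothing elsewhere; that guard is an assumption added so the sketch builds on other toolchains.

```cpp
#if defined(_MSC_VER) && !defined(__clang__)
#define _SILENCE_AMP_DEPRECATION_WARNINGS  // later Visual C++ releases mark C++ AMP as deprecated
#include <amp.h>
#define AMP_CALLABLE restrict(cpu, amp)    // callable from CPU code and from restrict(amp) code
#else
#define AMP_CALLABLE                       // other toolchains: ordinary functions
#endif
#include <cassert>

// Only amp-supported member types (int); no pointers, bitfields, char or short.
struct Pixel {
    int r;
    int g;
    int b;
};

// A reference is used as a function argument, which the restrictions permit.
inline int Brightness(const Pixel& p) AMP_CALLABLE {
    return (p.r + p.g + p.b) / 3;
}

// Recursion is not allowed, so the power is computed with a loop.
inline int IntPow(int base, unsigned int exponent) AMP_CALLABLE {
    int result = 1;
    for (unsigned int i = 0; i < exponent; ++i) {
        result *= base;
    }
    return result;
}
```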
array<T,N>
Multi-dimensional array of rank N with element T
Container whose storage lives on a specific accelerator
Capture by reference [&] in the lambda
Explicit copy
Nearly identical interface to array_view<T,N>

vector<int> v(8 * 12);
extent<2> e(8,12);
accelerator acc = …
array<int,2> a(e, acc.default_view);
copy_async(v.begin(), v.end(), a);

parallel_for_each(e, [&](index<2> idx) restrict(amp)
{
    a[idx] += 1;
});
copy(a, v.begin());
Tiling
Rearrange algorithm to do the calculation in tiles
All threads in a tile share a programmable cache
  tile_static memory
  Access 100x as fast as global memory
  Excellent for algorithms that use each piece of information again and again
Overload of parallel_for_each that takes a tiled extent
Race Conditions in the Cache
Because a tile of threads shares the programmable cache, you must prevent race conditions
Tile barrier can ensure a wait
Typical pattern:
  Each thread does a share of the work to fill the cache
  Then waits until all threads have done that work
  Then uses the cache to calculate a share of the answer
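The fill, wait, use pattern can be sketched with a tiled matrix multiplication. This is an illustration under assumptions, not the demo from the session: MultiplyTiled is a hypothetical name, the tile edge TS = 16 is an arbitrary choice, and all matrix dimensions are assumed to be multiples of TS. On Visual C++ it uses the tiled overload of parallel_for_each with tile_static arrays and t.barrier.wait(). Elsewhere, a serial fallback mirrors the same two phases, with the end of the fill loop standing in for the barrier.

```cpp
#if defined(_MSC_VER) && !defined(__clang__)
#define _SILENCE_AMP_DEPRECATION_WARNINGS  // later Visual C++ releases mark C++ AMP as deprecated
#include <amp.h>
#define HAS_CPP_AMP 1                      // assumption: Visual C++ with the C++ AMP headers installed
#endif
#include <cassert>
#include <cstddef>
#include <vector>

static const int TS = 16;  // tile edge (illustrative): 16 x 16 threads per tile

// Hypothetical helper: C (M x N) = A (M x W) * B (W x N), row-major,
// with M, W and N all multiples of TS.
std::vector<float> MultiplyTiled(const std::vector<float>& A,
                                 const std::vector<float>& B,
                                 int M, int W, int N) {
    assert(M > 0 && W > 0 && N > 0);
    assert(M % TS == 0 && W % TS == 0 && N % TS == 0);
    assert(A.size() == static_cast<std::size_t>(M) * W);
    assert(B.size() == static_cast<std::size_t>(W) * N);
    std::vector<float> C(static_cast<std::size_t>(M) * N, 0.0f);
#ifdef HAS_CPP_AMP
    using namespace concurrency;
    array_view<const float, 2> a(M, W, A);
    array_view<const float, 2> b(W, N, B);
    array_view<float, 2> c(M, N, C);
    c.discard_data();
    parallel_for_each(c.extent.tile<TS, TS>(),
                      [=](tiled_index<TS, TS> t) restrict(amp) {
        const int row = t.local[0];
        const int col = t.local[1];
        tile_static float tileA[TS][TS];  // the programmable cache shared by the tile
        tile_static float tileB[TS][TS];
        float sum = 0.0f;
        for (int i = 0; i < W; i += TS) {
            // 1. Each thread fills its share of the cache.
            tileA[row][col] = a(t.global[0], i + col);
            tileB[row][col] = b(i + row, t.global[1]);
            // 2. Wait until every thread in the tile has done that work.
            t.barrier.wait();
            // 3. Use the cache to calculate a share of the answer.
            for (int k = 0; k < TS; ++k) {
                sum += tileA[row][k] * tileB[k][col];
            }
            // Wait again so no thread overwrites the cache while others still read it.
            t.barrier.wait();
        }
        c[t.global] = sum;
    });
    c.synchronize();
#else
    // Serial model of the same pattern for toolchains without C++ AMP.
    for (int tileRow = 0; tileRow < M; tileRow += TS) {
        for (int tileCol = 0; tileCol < N; tileCol += TS) {
            for (int i = 0; i < W; i += TS) {
                float tileA[TS][TS];
                float tileB[TS][TS];
                // Phase 1: every "thread" (row, col) fills its cache slot.
                for (int row = 0; row < TS; ++row) {
                    for (int col = 0; col < TS; ++col) {
                        tileA[row][col] = A[(tileRow + row) * W + (i + col)];
                        tileB[row][col] = B[(i + row) * N + (tileCol + col)];
                    }
                }
                // Finishing the loop above plays the role of the tile barrier.
                // Phase 2: every "thread" accumulates from the filled cache.
                for (int row = 0; row < TS; ++row) {
                    for (int col = 0; col < TS; ++col) {
                        float sum = 0.0f;
                        for (int k = 0; k < TS; ++k) {
                            sum += tileA[row][k] * tileB[k][col];
                        }
                        C[(tileRow + row) * N + (tileCol + col)] += sum;
                    }
                }
            }
        }
    }
#endif
    return C;
}
```

Leaving out either barrier in the accelerator path would introduce the kind of race described above: a thread could read a cache slot before it is filled, or overwrite one that another thread is still reading.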
Visual Studio 2012
Debugging
  Everything you had before, plus:
    GPU Threads
    Parallel Stacks
    Parallel Watch
Visualizing
Debugging
Can hit either CPU breakpoints or GPU breakpoints
  New UI in VS 2012 to control that
GPU breakpoints: Only on Windows 8 today
Debugging Properties
Values, Call Stacks, etc
GPU Threads Window
Shows progress through the calculation
Parallel Watch
Shows values across multiple threads
And more!
Race Condition Detection
Parallel Stacks
Flagging, Filtering, and Grouping
Freezing and Thawing
Run Tile to Cursor
Concurrency Visualizer
Shows activity on CPU and GPU
Can highlight relative times for specific parts of a calculation
  Or copy times to/from the accelerator
C++ AMP is…
C++
  The language you know
  Excellent productivity
  The language you choose when performance matters
Implemented as (mostly) a library
  Variety of application types
Well supported by Visual Studio 2012
  Debugger
  Concurrency Visualizer
  Everything else you already use
Can be supported by other compilers and platforms
  Open spec
Learn C++ AMP
book http://www.gregcons.com/cppamp/
training http://www.acceleware.com/cpp-amp-training
videos http://channel9.msdn.com/Tags/c++-accelerated-massive-parallelism
articles http://blogs.msdn.com/b/nativeconcurrency/archive/2012/04/05/c-amp-articles-in-msdn-magazine-april-issue.aspx
samples http://blogs.msdn.com/b/nativeconcurrency/archive/2012/01/30/c-amp-sample-projects-for-download.aspx
guides http://blogs.msdn.com/b/nativeconcurrency/archive/2012/04/11/c-amp-for-the-cuda-programmer.aspx
spec http://blogs.msdn.com/b/nativeconcurrency/archive/2012/02/03/c-amp-open-spec-published.aspx
forum http://social.msdn.microsoft.com/Forums/en/parallelcppnative/threads
http://blogs.msdn.com/nativeconcurrency/
Call to Action
Get Visual Studio 2012
Download some samples
Play with debugger and other tools
Try writing a C++ AMP application of your own
  Console (command prompt)
  Windows
  Metro style for Windows 8
Measure your performance and see the difference
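One simple way to measure is a wall-clock timer around the work. The helper below is a minimal sketch in standard C++; ElapsedMilliseconds is a hypothetical name. Since parallel_for_each returns without waiting, the timed callable should probably include the step that brings results back to the CPU (for example a synchronize() call or a copy), otherwise only the launch is measured. It is also reasonable to assume that the first call carries one-time setup cost, so running the work once untimed before measuring may give a fairer number.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: runs 'work' once and returns the elapsed wall-clock time in milliseconds.
template <typename Work>
double ElapsedMilliseconds(Work work) {
    const std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    work();
    const std::chrono::steady_clock::time_point stop = std::chrono::steady_clock::now();
    const double ms = std::chrono::duration<double, std::milli>(stop - start).count();
    assert(ms >= 0.0);  // steady_clock never goes backwards
    return ms;
}
```

Timing the serial loop and the accelerator version on the same input, each averaged over a few runs, gives the comparison the slide asks for.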
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.