40

C++ AMP: Accelerated Massive Parallelism in Visual C++ 11 Kate Gregory Gregory Consulting @gregcons

Embed Size (px)

Citation preview

Page 1: C++ AMP: Accelerated Massive Parallelism in Visual C++ 11 Kate Gregory Gregory Consulting  @gregcons
Page 2: C++ AMP: Accelerated Massive Parallelism in Visual C++ 11 Kate Gregory Gregory Consulting  @gregcons

C++ AMP: Accelerated Massive Parallelism in Visual C++ 11Kate GregoryGregory Consultingwww.gregcons.com/kateblog, @gregcons

Page 3: C++ AMP: Accelerated Massive Parallelism in Visual C++ 11 Kate Gregory Gregory Consulting  @gregcons

C++ is the language for performance

If you need speed at runtime, you use C++Frameworks and libraries can make your code faster

Eg PPL: use all the CPU coresWith little or no change to your logic and code

Experienced C++ developers are productive in C++Don’t want to go back to C or C-like languageEnjoy the tool support in Visual Studio

Many C++ developers value portabilityWrite standard C++, compile with anythingUse portable libraries, run anywhereEven in all-Microsoft universe, simple deployment is important

Page 4: C++ AMP: Accelerated Massive Parallelism in Visual C++ 11 Kate Gregory Gregory Consulting  @gregcons

demo

Cartoonizer

Page 5: C++ AMP: Accelerated Massive Parallelism in Visual C++ 11 Kate Gregory Gregory Consulting  @gregcons

What is C++ AMP?Accelerated Massive Parallelism

Run your calculations on one or more acceleratorsToday, GPU is the accelerator you useEventually: other kinds of accelerators

Write your whole application in C++Not a “C-like” language or a separate resource you link inUse Visual Studio and familiar toolsSpeed up 20x, 50x, or more

Basically a libraryComes with Visual Studio 2012, included in vcredistSpec is open – other platforms/compilers can implement it too

Page 6: C++ AMP: Accelerated Massive Parallelism in Visual C++ 11 Kate Gregory Gregory Consulting  @gregcons

Agenda

Hardware ReviewC++ AMP FundamentalsTilingDebugging and VisualizingCall to Action

Page 7: C++ AMP: Accelerated Massive Parallelism in Visual C++ 11 Kate Gregory Gregory Consulting  @gregcons

CPUs vs GPUs today

CPU

Low memory bandwidthHigher power consumptionMedium level of parallelismDeep execution pipelinesRandom accessesSupports general codeMainstream programming

GPU

High memory bandwidthLower power consumptionHigh level of parallelismShallow execution pipelinesSequential accessesSupports data-parallel codeNiche programming

images source: AMD

Page 8: C++ AMP: Accelerated Massive Parallelism in Visual C++ 11 Kate Gregory Gregory Consulting  @gregcons

C++ AMP is fundamentally a library

Comes with Visual C++ 2012#include <amp.h>Namespace: concurrencyNew classes:

array, array_viewextent, indexaccelerator, accelerator_view

New function(s): parallel_for_each()New (use of) keyword: restrict

Asks compiler to check your code is ok for GPU (DirectX)

Page 9: C++ AMP: Accelerated Massive Parallelism in Visual C++ 11 Kate Gregory Gregory Consulting  @gregcons

parallel_for_each

Entry point to the libraryTakes number (and shape) of threads neededTakes function or lambda to be done by each thread

Must be restrict(amp) Sends the work to the accelerator

Scheduling etc handled thereReturns – no blocking/waiting

Page 10: C++ AMP: Accelerated Massive Parallelism in Visual C++ 11 Kate Gregory Gregory Consulting  @gregcons

void AddArrays(int n, int * pA, int * pB, int * pSum){

for (int i=0; i<n; i++)

{ pSum[i] = pA[i] + pB[i]; }

}

#include <amp.h>using namespace concurrency;

void AddArrays(int n, int * pA, int * pB, int * pSum){ array_view<int,1> a(n, pA); array_view<int,1> b(n, pB); array_view<int,1> sum(n, pSum); parallel_for_each( sum.extent, [=](index<1> i) restrict(amp) { sum[i] = a[i] + b[i]; } );}

Hello World: Array Addition

void AddArrays(int n, int * pA, int * pB, int * pSum){

for (int i=0; i<n; i++)

{ pSum[i] = pA[i] + pB[i]; }

}

Page 11: C++ AMP: Accelerated Massive Parallelism in Visual C++ 11 Kate Gregory Gregory Consulting  @gregcons

Basic Elements of C++ AMP codingvoid AddArrays(int n, int * pA, int * pB, int * pSum){ array_view<int,1> a(n, pA); array_view<int,1> b(n, pB); array_view<int,1> sum(n, pSum); parallel_for_each(

sum.extent, [=](index<1> i) restrict(amp) { sum[i] = a[i] + b[i];

} );}

array_view variables captured and associated data copied to accelerator (on demand)

restrict(amp): tells the compiler to check that this code conforms to C++ AMP language restrictions

parallel_for_each: execute the lambda on the accelerator once per thread

extent: the number and shape of threads to execute the lambda

index: the thread ID that is running the lambda, used to index into data

array_view: wraps the data to operate on the accelerator

Page 12: C++ AMP: Accelerated Massive Parallelism in Visual C++ 11 Kate Gregory Gregory Consulting  @gregcons

extent<N> - size of an N-dim space

Page 13: C++ AMP: Accelerated Massive Parallelism in Visual C++ 11 Kate Gregory Gregory Consulting  @gregcons

index<N> - an N-dimensional point

Page 14: C++ AMP: Accelerated Massive Parallelism in Visual C++ 11 Kate Gregory Gregory Consulting  @gregcons

View on existing data on the CPU or GPUDense in least significant dimensionOf element T and rank NRequires extentRectangularAccess anywhere (implicit sync)

vector<int> v(10);

extent<2> e(2,5); array_view<int,2> a(e, v);

array_view<T,N>

//above two lines can also be written//array_view<int,2> a(2,5,v);

index<2> i(1,3);

int o = a[i]; // or a[i] = 16;//or int o = a(1, 3);

Page 15: C++ AMP: Accelerated Massive Parallelism in Visual C++ 11 Kate Gregory Gregory Consulting  @gregcons

demo

Matrix Multiplication

Page 16: C++ AMP: Accelerated Massive Parallelism in Visual C++ 11 Kate Gregory Gregory Consulting  @gregcons

Matrix Multiplication

C00 = A00 * B00 + A01 * B10 + A02 * B20 + A03 * B30

Page 17: C++ AMP: Accelerated Massive Parallelism in Visual C++ 11 Kate Gregory Gregory Consulting  @gregcons

restrict(amp) restrictions

Can only call other restrict(amp) functionsAll functions must be inlinableOnly amp-supported types

int, unsigned int, float, double, boolstructs & arrays of these types

Pointers and ReferencesLambdas cannot capture by reference, nor capture pointersReferences and single-indirection pointers supported only as local variables and function arguments

Page 18: C++ AMP: Accelerated Massive Parallelism in Visual C++ 11 Kate Gregory Gregory Consulting  @gregcons

restrict(amp) restrictions

No recursion'volatile'virtual functionspointers to functionspointers to member functionspointers in structspointers to pointersbitfields

No goto or labeled statementsthrow, try, catchglobals or staticsdynamic_cast or typeidasm declarationsvarargsunsupported types

e.g. char, short, long double

Page 19: C++ AMP: Accelerated Massive Parallelism in Visual C++ 11 Kate Gregory Gregory Consulting  @gregcons

vector<int> v(8 * 12);extent<2> e(8,12);accelerator acc = …array<int,2> a(e,acc.default_view);copy_async(v.begin(), v.end(), a);

array<T,N>

Multi-dimensional array of rank N with element TContainer whose storage lives on a specific acceleratorCapture by reference [&] in the lambdaExplicit copyNearly identical interface to array_view<T,N>

parallel_for_each(e, [&](index<2> idx) restrict(amp) { a[idx] += 1;});copy(a, v.begin());

Page 20: C++ AMP: Accelerated Massive Parallelism in Visual C++ 11 Kate Gregory Gregory Consulting  @gregcons

Tiling

Rearrange algorithm to do the calculation in tilesEach thread in a tile shares a programmable cache

tile_static memoryAccess 100x as fast as global memoryExcellent for algorithms that use each piece of information again and again

Overload of parallel_for_each that takes a tiled extent

Page 21: C++ AMP: Accelerated Massive Parallelism in Visual C++ 11 Kate Gregory Gregory Consulting  @gregcons

Race Conditions in the Cache

Because a tile of threads shares the programmable cache, you must prevent race conditions

Tile barrier can ensure a waitTypical pattern:

Each thread does a share of the work to fill the cacheThen waits until all threads have done that workThen uses the cache to calculate a share of the answer

Page 22: C++ AMP: Accelerated Massive Parallelism in Visual C++ 11 Kate Gregory Gregory Consulting  @gregcons

Visual Studio 2012

DebuggingEverything you had before, plus: GPU ThreadsParallel StacksParallel Watch

Visualizing

Page 23: C++ AMP: Accelerated Massive Parallelism in Visual C++ 11 Kate Gregory Gregory Consulting  @gregcons

Debugging

Can hit either CPU breakpoints or GPU breakpointsNew UI in VS 2012 to control that

GPU breakpoints: Only on Windows 8 today

Page 24: C++ AMP: Accelerated Massive Parallelism in Visual C++ 11 Kate Gregory Gregory Consulting  @gregcons

Debugging Properties

Page 25: C++ AMP: Accelerated Massive Parallelism in Visual C++ 11 Kate Gregory Gregory Consulting  @gregcons

Values, Call Stacks, etc

Page 26: C++ AMP: Accelerated Massive Parallelism in Visual C++ 11 Kate Gregory Gregory Consulting  @gregcons

GPU Threads Window

Shows progress through the calculation

Page 27: C++ AMP: Accelerated Massive Parallelism in Visual C++ 11 Kate Gregory Gregory Consulting  @gregcons

Parallel Watch

Shows values across multiple threads

Page 28: C++ AMP: Accelerated Massive Parallelism in Visual C++ 11 Kate Gregory Gregory Consulting  @gregcons

And more!

Race Condition Detection Parallel StacksFlagging, Filtering, and GroupingFreezing and ThawingRun Tile to Cursor

Page 29: C++ AMP: Accelerated Massive Parallelism in Visual C++ 11 Kate Gregory Gregory Consulting  @gregcons

Concurrency Visualizer

Shows activity on CPU and GPUCan highlight relative times for specific parts of a calculationOr copy times to/from the accelerator

Page 30: C++ AMP: Accelerated Massive Parallelism in Visual C++ 11 Kate Gregory Gregory Consulting  @gregcons
Page 31: C++ AMP: Accelerated Massive Parallelism in Visual C++ 11 Kate Gregory Gregory Consulting  @gregcons
Page 32: C++ AMP: Accelerated Massive Parallelism in Visual C++ 11 Kate Gregory Gregory Consulting  @gregcons

C++ AMP is…

C++The language you knowExcellent productivity The language you choose when performance matters

Implemented as (mostly) a libraryVariety of application types

Well supported by Visual Studio 2012DebuggerConcurrency VisualizerEverything else you already use

Can be supported by other compilers and platformsOpen spec

Page 33: C++ AMP: Accelerated Massive Parallelism in Visual C++ 11 Kate Gregory Gregory Consulting  @gregcons

Learn C++ AMP

book http://www.gregcons.com/cppamp/

training http://www.acceleware.com/cpp-amp-training

videos http://channel9.msdn.com/Tags/c++-accelerated-massive-parallelism

articles http://blogs.msdn.com/b/nativeconcurrency/archive/2012/04/05/c-amp-articles-in-msdn-magazine-april-issue.aspx

samples http://blogs.msdn.com/b/nativeconcurrency/archive/2012/01/30/c-amp-sample-projects-for-download.aspx

guides http://blogs.msdn.com/b/nativeconcurrency/archive/2012/04/11/c-amp-for-the-cuda-programmer.aspx

spec http://blogs.msdn.com/b/nativeconcurrency/archive/2012/02/03/c-amp-open-spec-published.aspx

forum http://social.msdn.microsoft.com/Forums/en/parallelcppnative/threads

http://blogs.msdn.com/nativeconcurrency/

Page 34: C++ AMP: Accelerated Massive Parallelism in Visual C++ 11 Kate Gregory Gregory Consulting  @gregcons

Call to Action

Get Visual Studio 2012Download some samplesPlay with debugger and other toolsTry writing a C++ AMP application of your own

Console (command prompt)WindowsMetro style for Windows 8

Measure your performance and see the difference

Page 35: C++ AMP: Accelerated Massive Parallelism in Visual C++ 11 Kate Gregory Gregory Consulting  @gregcons

Related Content

Breakout Sessions (session codes and titles)

Hands-on Labs (session codes and titles)

Product Demo Stations (demo station title and location)

Related Certification Exam

Find Me Later At…

Required Slide*delete this box when your slide is finalized

Speakers, please list the Breakout Sessions, Labs, Demo Stations and Certification Exams that relate to your session. Also indicate when they can find you staffing in the TLC.

Page 36: C++ AMP: Accelerated Massive Parallelism in Visual C++ 11 Kate Gregory Gregory Consulting  @gregcons

Track Resources

Resource 1

Resource 2

Resource 3

Resource 4

Required Slide *delete this box when your slide is finalized

Track PMs will supply the content for this slide, which will be inserted during the final scrub.

Page 37: C++ AMP: Accelerated Massive Parallelism in Visual C++ 11 Kate Gregory Gregory Consulting  @gregcons

Resources

Connect. Share. Discuss.

http://northamerica.msteched.com

Learning

Microsoft Certification & Training Resources

www.microsoft.com/learning

TechNet

Resources for IT Professionals

http://microsoft.com/technet

Resources for Developers

http://microsoft.com/msdn

Page 38: C++ AMP: Accelerated Massive Parallelism in Visual C++ 11 Kate Gregory Gregory Consulting  @gregcons

Complete an evaluation on CommNet and enter to win!

Page 39: C++ AMP: Accelerated Massive Parallelism in Visual C++ 11 Kate Gregory Gregory Consulting  @gregcons

MS Tag

Scan the Tagto evaluate thissession now onmyTechEd Mobile

Required Slide *delete this box when your slide is finalized

Your MS Tag will be inserted here during the final scrub.

Page 40: C++ AMP: Accelerated Massive Parallelism in Visual C++ 11 Kate Gregory Gregory Consulting  @gregcons

© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to

be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS

PRESENTATION.