The Manycore Shift: Making Parallel Computing Mainstream
Bart J.F. De Smet
http://blogs.bartdesmet.net/bart
Software Development Engineer
Microsoft Corporation
Session Code: DTL206
Wishful thinking?
Agenda
• The concurrency landscape
• Language headaches
• .NET 4.0 facilities
  • Task Parallel Library
  • PLINQ
  • Coordination Data Structures
  • Asynchronous programming
• Incubation projects
• Summary
Moore’s law
The number of transistors incorporated in a chip will approximately double every 24 months.
Gordon Moore – Intel – 1965
Let’s sell processors
Moore’s law today
It can't continue forever. The nature of exponentials is that you push them out and eventually disaster happens.
Gordon Moore – Intel – 2005
Let’s sell even more processors
Problem statement
• Shared mutable state
  • Needs synchronization primitives
• Locks are problematic
  • Risk of contention
  • Poor discoverability (SyncRoot anyone?)
  • Not composable
  • Difficult to get right (deadlocks, etc.)
• Coarse-grained concurrency
  • Threads well-suited for large units of work
  • Expensive context switching
• Asynchronous programming
Microsoft Parallel Computing Initiative
Applications
Domain libraries
Programming models & languages
Developer Tooling
Runtime, platform, OS, HyperVisor
Hardware
VB C#
F#
Constructing Parallel Applications
Executing fine-grain Parallel Applications
Coordinating system resources/services
Languages: two extremes
• LISP heritage (Haskell, ML): no mutable state; fundamentalist functional programming
• Fortran heritage (C, C++, C#, VB): mutable state
• F#
Mutability
Mutable by default (C# et al): synchronization required

    int x = 5;
    // Share out x
    x++;

Immutable by default (F# et al): no locking required

    let x = 5
    // Share out x
    // Can’t mutate x

Explicit opt-in to mutability:

    let mutable x = 5
    // Share out x
    x <- x + 1
Side-effects will kill you
Elimination of common sub-expressions?

    let now = DateTime.Now in (now, now)
    (DateTime.Now, DateTime.Now)

    static DateTime Now { get; }

• Runtime out of control
• Can’t optimize code
• Types don’t reveal side-effects
• Haskell concept of IO monad

Did you know? LINQ is a monad!
Source: www.cse.chalmers.se
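Why the two expressions above are not interchangeable can be shown with a small C# sketch. The class `SideEffectSketch` and its `Stamp` property are hypothetical names introduced here for illustration; `Stamp` stands in for any getter with a hidden side effect, such as `DateTime.Now`.

```csharp
using System;

static class SideEffectSketch
{
    private static int _reads;

    // Impure property: every read bumps a counter, much like
    // DateTime.Now observes a clock that keeps moving.
    public static int Stamp
    {
        get { return ++_reads; }
    }

    // Reads the property twice, the analogue of (DateTime.Now, DateTime.Now).
    public static Tuple<int, int> ReadTwice()
    {
        return Tuple.Create(Stamp, Stamp);
    }

    // Reads once and reuses the value, the analogue of
    // let now = DateTime.Now in (now, now).
    public static Tuple<int, int> ReadOnce()
    {
        int s = Stamp;
        return Tuple.Create(s, s);
    }
}
```

`ReadTwice` returns two different values while `ReadOnce` returns the same value twice, so a compiler that merged the two reads would change the program's behavior. Nothing in the type `int` (or `DateTime`) tells the compiler that.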
Languages: two roadmaps?
Making C# better
• Add safety nets?
  • Immutability
  • Purity constructs
  • Linear types
• Software Transactional Memory
  • Kamikaze-style of concurrency
• Simplify common patterns

Making Haskell mainstream
• Just right? Too academic?
• Not a smooth upgrade path?
C#
Haskell
Nirvana
Parallel Extensions Architecture
• .NET Program
  • Declarative Queries (PLINQ)
  • Parallel Algorithms (TPL or CDS)
• Compilers: C#, VB, C++, F#, other .NET compilers, all producing IL
• PLINQ Execution Engine
  • Query Analysis
  • Data Partitioning: Chunk, Range, Hash, Striped, Repartitioning
  • Operator Types: Map, Scan, Build, Search, Reduction
  • Merging: Async (pipeline), Synch, Order Preserving, Sorting, ForAll
• Task Parallel Library (TPL)
  • Task APIs
  • Task Parallelism
  • Futures
  • Scheduling
• Coordination Data Structures
  • Thread-safe Collections
  • Synchronization Types
  • Coordination Types
• OS Scheduling Primitives (also UMS in Windows 7 and up)
• Proc 1 … Proc p
Task Parallel Library – Tasks
System.Threading.Tasks
• Task
  • Parent-child relationships
  • Explicit grouping
  • Waiting and cancelation
• Task<T>
  • Tasks that produce values
  • Also known as futures
Parallel
Task 1
Task 2
…
Task N
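A minimal sketch of `Task<T>` used as a future, assuming the .NET 4.0 `Task.Factory.StartNew` API. The class `TaskSketch` and its methods are hypothetical names chosen for this example, not part of the session's demo code.

```csharp
using System;
using System.Threading.Tasks;

static class TaskSketch
{
    // Plain sequential helper: sum of i*i for i in [from, to].
    private static int SumOfSquares(int from, int to)
    {
        int sum = 0;
        for (int i = from; i <= to; i++) sum += i * i;
        return sum;
    }

    // Splits the range in two halves and computes each half in its own task.
    public static int ParallelSumOfSquares(int n)
    {
        int mid = n / 2;
        Task<int> lower = Task.Factory.StartNew(() => SumOfSquares(1, mid));
        Task<int> upper = Task.Factory.StartNew(() => SumOfSquares(mid + 1, n));

        // Waiting is explicit; Result would also block until the value is ready.
        Task.WaitAll(lower, upper);
        return lower.Result + upper.Result;
    }
}
```

Each `Task<int>` is a handle to a value that may not exist yet, which is why the slide calls such tasks futures.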
Work Stealing
Internally, the runtime uses
• Work stealing techniques
• Lock-free concurrent task queues
Work stealing has provably
• Good locality
• Work distribution properties
[Diagram: work queues of processors p1, p2 and p3 holding tasks 1 to 4]
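The idea behind a work-stealing queue can be sketched in a few lines. This is a deliberately simplified, lock-based illustration and not the runtime's actual lock-free implementation; `StealableQueue<T>` and its members are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// One such queue would belong to each worker thread.
sealed class StealableQueue<T>
{
    private readonly LinkedList<T> _items = new LinkedList<T>();
    private readonly object _gate = new object();

    // The owning worker pushes new work at the tail.
    public void Push(T item)
    {
        lock (_gate) { _items.AddLast(item); }
    }

    // The owner also pops from the tail (LIFO): the most recently
    // pushed item is the one most likely to still be in cache.
    public bool TryPopLocal(out T item)
    {
        lock (_gate)
        {
            if (_items.Count == 0) { item = default(T); return false; }
            item = _items.Last.Value;
            _items.RemoveLast();
            return true;
        }
    }

    // An idle worker steals from the head (FIFO), taking the oldest
    // item and staying out of the owner's way at the tail.
    public bool TrySteal(out T item)
    {
        lock (_gate)
        {
            if (_items.Count == 0) { item = default(T); return false; }
            item = _items.First.Value;
            _items.RemoveFirst();
            return true;
        }
    }
}
```

Owner and thief work at opposite ends of the queue, which is what gives the scheme its locality and low contention; a production implementation would avoid the single lock used here.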
Example code to parallelize
void MultiplyMatrices(int size, double[,] m1, double[,] m2, double[,] result)
{
    for (int i = 0; i < size; i++)
    {
        for (int j = 0; j < size; j++)
        {
            result[i, j] = 0;
            for (int k = 0; k < size; k++)
            {
                result[i, j] += m1[i, k] * m2[k, j];
            }
        }
    }
}
Solution today

int N = size;
int P = 2 * Environment.ProcessorCount;
int Chunk = N / P;                                      // size of a work chunk
ManualResetEvent signal = new ManualResetEvent(false);
int counter = P;                                        // counter limits kernel transitions
for (int c = 0; c < P; c++) {                           // for each chunk
    ThreadPool.QueueUserWorkItem(o => {
        int lc = (int)o;
        for (int i = lc * Chunk;                        // process one chunk
             i < (lc + 1 == P ? N : (lc + 1) * Chunk);  // respect upper bound
             i++) {
            // original loop body
            for (int j = 0; j < size; j++) {
                result[i, j] = 0;
                for (int k = 0; k < size; k++) {
                    result[i, j] += m1[i, k] * m2[k, j];
                }
            }
        }
        if (Interlocked.Decrement(ref counter) == 0) {  // efficient interlocked ops
            signal.Set();                               // and kernel transition only when done
        }
    }, c);
}
signal.WaitOne();
• Error Prone
• High Overhead
• Tricks
• Static Work Distribution
• Knowledge of Synchronization Primitives
• Heavy Synchronization
• Lack of Thread Reuse
Solution with Parallel Extensions
void MultiplyMatrices(int size, double[,] m1, double[,] m2, double[,] result)
{
    Parallel.For(0, size, i =>
    {
        for (int j = 0; j < size; j++)
        {
            result[i, j] = 0;
            for (int k = 0; k < size; k++)
            {
                result[i, j] += m1[i, k] * m2[k, j];
            }
        }
    });
}
Structured parallelism
Task Parallel Library – Loops
• Common source of work in programs
• System.Threading.Parallel class
  • Parallelism when iterations are independent
    • Body doesn’t depend on mutable state
    • E.g. static variables, writing to local variables used in subsequent iterations
  • Synchronous
    • All iterations finish, regularly or exceptionally

Sequential:

    for (int i = 0; i < n; i++) work(i);
    …
    foreach (T e in data) work(e);

Parallel:

    Parallel.For(0, n, i => work(i));
    …
    Parallel.ForEach(data, e => work(e));
Why immutability gains attention
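When iterations need to contribute to a shared result, the loop body can keep its running value private and merge it only once per worker. The sketch below assumes the `Parallel.For` overload with thread-local state as it ships in the released .NET 4.0, where the class lives in the `System.Threading.Tasks` namespace; `LoopSketch` and `SumParallel` are hypothetical names.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class LoopSketch
{
    // Sums the integers 0 .. n-1 in parallel.
    public static long SumParallel(int n)
    {
        long total = 0;
        Parallel.For(0, n,
            // localInit: each worker starts with its own private subtotal
            () => 0L,
            // body: touches only the private subtotal, so no locking is needed
            (i, loopState, local) => local + i,
            // localFinally: runs once per worker to merge the subtotal
            local => Interlocked.Add(ref total, local));
        return total;
    }
}
```

Writing `total += i` directly inside the body would be a data race; isolating the mutable state per worker keeps synchronization down to one interlocked addition per worker rather than one per iteration.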
Demo: Task Parallel Library
Bart J.F. De Smet, Software Development Engineer, Microsoft Corporation
Amdahl’s law by example
[Chart: total execution time (0 to 120) against number of processors (1, 2, 4, 8, 16), split into a linear and a non-linear part]
Theoretical maximum speedup determined by amount of linear code
Performance Tips
• Compute intensive and/or large data sets
  • Work done should be at least 1,000s of cycles
• Do not be gratuitous in task creation
  • Lightweight, but still requires object allocation, etc.
• Parallelize only outer loops where possible
  • Unless N is insufficiently large to offer enough parallelism
• Prefer isolation & immutability over synchronization
  • Synchronization == !Scalable
  • Try to avoid shared data
• Have realistic expectations
  • Amdahl’s Law: speedup will be fundamentally limited by the amount of sequential computation
  • Gustafson’s Law: but what if you add more data, thus increasing the parallelizable percentage of the application?
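The two laws can be compared numerically with the standard textbook formulas. In this sketch `parallelFraction` is the share of the work that can run in parallel; the class and method names are hypothetical.

```csharp
using System;

static class SpeedupSketch
{
    // Amdahl: fixed problem size. The sequential share (1 - p) never shrinks,
    // so the speedup can never exceed 1 / (1 - p).
    public static double Amdahl(double parallelFraction, int processors)
    {
        return 1.0 / ((1.0 - parallelFraction) + parallelFraction / processors);
    }

    // Gustafson: the problem grows with the machine. The parallel share
    // scales with the processor count, the sequential share stays constant.
    public static double Gustafson(double parallelFraction, int processors)
    {
        return (1.0 - parallelFraction) + parallelFraction * processors;
    }
}
```

With 90% parallelizable work on 16 processors, `Amdahl(0.9, 16)` gives 6.4 while `Gustafson(0.9, 16)` gives 14.5, which is the gap the last bullet alludes to.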
Parallel LINQ (PLINQ)
• Enable LINQ developers to leverage parallel hardware
  • Fully supports all .NET Standard Query Operators
  • Abstracts away the hard work of using parallelism
    • Partitions and merges data intelligently (classic data parallelism)
• Minimal impact to existing LINQ programming model
  • AsParallel extension method
  • Optional preservation of input ordering (AsOrdered)
• Query syntax enables runtime to auto-parallelize
  • Automatic way to generate more Tasks, like Parallel
  • Graph analysis determines how to do it
  • Very little synchronization internally: highly efficient

var q = from p in people.AsParallel()
        where p.Name == queryInfo.Name &&
              p.State == queryInfo.State &&
              p.Year >= yearStart &&
              p.Year <= yearEnd
        orderby p.Year ascending
        select p;

[Diagram: the query is split over Task 1 … Task N]
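A self-contained sketch of the same idea on plain integers, using the `AsParallel` and `AsOrdered` extension methods named above. `PlinqSketch` and `EvenSquares` are hypothetical names.

```csharp
using System;
using System.Linq;

static class PlinqSketch
{
    // Squares the even numbers of the input in parallel.
    public static int[] EvenSquares(int[] input)
    {
        return input
            .AsParallel()              // opt in to parallel execution
            .AsOrdered()               // keep results in input order
            .Where(x => x % 2 == 0)
            .Select(x => x * x)
            .ToArray();
    }
}
```

Without `AsOrdered`, PLINQ is free to return the results in any order, which is usually cheaper; ordering is opt-in for that reason.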
Demo: PLINQ
Coordination Data Structures
New synchronization primitives (System.Threading)
• Barrier
  • Multi-phased algorithm
  • Tasks signal and wait for phases
• CountdownEvent
  • Has an initial counter value
  • Gets signaled when count reaches zero
• LazyInitializer
  • Lazy initialization routines
  • Reference type variable gets initialized lazily
• SemaphoreSlim
  • Slim brother to Semaphore (which goes to kernel mode)
• SpinLock, SpinWait
  • Loop-based wait (“spinning”)
  • Avoids context switch or kernel mode transition
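As one illustration of these primitives, the sketch below uses `CountdownEvent` to wait until a fixed number of workers have reported in. `CountdownSketch` and `RunWorkers` are hypothetical names; the trivial worker body is a placeholder for real work.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class CountdownSketch
{
    // Starts workerCount tasks and returns how many had finished
    // at the moment the countdown reached zero.
    public static int RunWorkers(int workerCount)
    {
        int finished = 0;
        var countdown = new CountdownEvent(workerCount);
        var workers = new Task[workerCount];

        for (int i = 0; i < workerCount; i++)
        {
            workers[i] = Task.Factory.StartNew(() =>
            {
                Interlocked.Increment(ref finished); // placeholder for real work
                countdown.Signal();                  // decrement the counter
            });
        }

        countdown.Wait();          // unblocks when the count reaches zero
        int observed = finished;

        // Make sure no worker is still inside Signal before disposing.
        Task.WaitAll(workers);
        countdown.Dispose();
        return observed;
    }
}
```

This is the same pattern as the hand-written counter plus `ManualResetEvent` in the "Solution today" slide, packaged as a reusable type.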
Coordination Data Structures
Concurrent collections (System.Collections.Concurrent)
• BlockingCollection<T>
  • Producer/consumer scenarios
  • Blocks when no data is available (consumer)
  • Blocks when no space is available (producer)
• ConcurrentBag<T>, ConcurrentDictionary<TKey, TElement>, ConcurrentQueue<T>, ConcurrentStack<T>
  • Thread-safe and scalable collections
  • As lock-free as possible
• Partitioner<T>
  • Facilities to partition data in chunks
  • E.g. PLINQ partitioning problems
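The producer/consumer behavior of `BlockingCollection<T>` can be sketched as follows. `PipelineSketch`, `ProduceAndConsume` and the capacity of 4 are arbitrary choices made for this example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

static class PipelineSketch
{
    // Produces 1..count on one task and sums the items on the caller's thread.
    public static int ProduceAndConsume(int count)
    {
        // Bounded to 4 items: Add blocks the producer while the buffer is full.
        using (var buffer = new BlockingCollection<int>(4))
        {
            Task producer = Task.Factory.StartNew(() =>
            {
                for (int i = 1; i <= count; i++) buffer.Add(i);
                buffer.CompleteAdding();   // tells the consumer no more items will come
            });

            int sum = 0;
            // Blocks while the buffer is empty; ends after CompleteAdding
            // once the remaining items are drained.
            foreach (int item in buffer.GetConsumingEnumerable())
            {
                sum += item;
            }

            producer.Wait();
            return sum;
        }
    }
}
```

Neither side contains explicit locks or signals; the blocking on "empty" and "full" described in the bullets is all handled inside the collection.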
Demo: Coordination Data Structures
Asynchronous workflows in F#
• Language feature unique to F#
• Based on theory of monads
  • But much more exhaustive compared to LINQ…
  • Overloadable meaning for specific keywords
• Continuation passing style
  • Not: ‘a -> ‘b
  • But: ‘a -> (‘b -> unit) -> unit (the function argument takes the computation result)
  • In C# style: Action<T, Action<R>>
• Core concept: async { /* code */ }
  • Syntactic sugar for keywords inside block
  • E.g. let!, do!, use!
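The difference between the direct signature and the continuation-passing one can be written out in C# using the `Action<T, Action<R>>` shape from the slide. `CpsSketch` and its members are hypothetical names.

```csharp
using System;

static class CpsSketch
{
    // Direct style, 'a -> 'b: the result comes back as the return value.
    public static int Square(int x)
    {
        return x * x;
    }

    // Continuation-passing style, 'a -> ('b -> unit) -> unit:
    // the result is handed to a callback instead of being returned.
    public static void SquareCps(int x, Action<int> continuation)
    {
        continuation(x * x);
    }

    // Chaining two CPS calls nests the callbacks; this nesting is what
    // async { ... } with let! hides behind sequential-looking code.
    public static int FourthPower(int x)
    {
        int result = 0;
        SquareCps(x, squared =>
            SquareCps(squared, fourth => result = fourth));
        return result;
    }
}
```

Here the callbacks run synchronously, so `FourthPower` can return the value directly; in a real asynchronous API the continuation would be invoked later, when the I/O completes.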
Asynchronous workflows in F#
let processAsync i =
    async { use stream = File.OpenRead(sprintf "Image%d.tmp" i)
            let! pixels = stream.AsyncRead(numPixels)
            let pixels' = transform pixels i
            use out = File.OpenWrite(sprintf "Image%d.done" i)
            do! out.AsyncWrite(pixels') }

let processAsyncDemo =
    printfn "async demo..."
    let tasks = [ for i in 1 .. numImages -> processAsync i ]
    Async.RunSynchronously (Async.Parallel tasks) |> ignore   // Run tasks in parallel
    printfn "Done!"

What let! turns into, with the rest of the block as the continuation:

stream.Read(numPixels, pixels ->
    let pixels' = transform pixels i
    use out = File.OpenWrite(sprintf "Image%d.done" i)
    do! out.AsyncWrite(pixels'))
Demo: Asynchronous workflows in F#
Reactive Fx
• First-class events in .NET
• Dualism of IEnumerable<T> interface
  • IObservable<T>
• Pull versus push
  • Pull (active): IEnumerable<T> and foreach
  • Push (passive): raise events and event handlers
• Events based on functions
  • Composition at its best
  • Definition of operators: LINQ to Events
• Realization of the continuation monad
IObservable<T> and IObserver<T>
// Dual of IEnumerable<out T>
public interface IObservable<out T>
{
    IDisposable Subscribe(IObserver<T> observer);
}

// Dual of IEnumerator<out T>
public interface IObserver<in T>
{
    // IEnumerator<T>.MoveNext return value
    void OnCompleted();

    // IEnumerator<T>.MoveNext exceptional return
    void OnError(Exception error);

    // IEnumerator<T>.Current property
    void OnNext(T value);
}

• Way to unsubscribe: the IDisposable returned by Subscribe
• Signaling the last event: OnCompleted
• Virtually two return types: OnCompleted and OnError
• Contra-variance: IObserver<in T>
• Co-variance: IObservable<out T>
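A minimal push-based source and observer can be written directly against these two interfaces, which ship in the `System` namespace of .NET 4.0. `RangeSource` and `ListObserver` are hypothetical names, and the source pushes everything synchronously inside `Subscribe`, which keeps the sketch simple but is not how a real event source would behave.

```csharp
using System;
using System.Collections.Generic;

// Pushes the integers start .. start + count - 1 to whoever subscribes.
sealed class RangeSource : IObservable<int>
{
    private readonly int _start;
    private readonly int _count;

    public RangeSource(int start, int count)
    {
        _start = start;
        _count = count;
    }

    public IDisposable Subscribe(IObserver<int> observer)
    {
        for (int i = 0; i < _count; i++)
        {
            observer.OnNext(_start + i);   // push: the source drives the consumer
        }
        observer.OnCompleted();            // signals the last event
        return new NoOpSubscription();     // nothing left to unsubscribe from
    }

    private sealed class NoOpSubscription : IDisposable
    {
        public void Dispose() { }
    }
}

// Records everything it is told.
sealed class ListObserver : IObserver<int>
{
    public readonly List<int> Values = new List<int>();
    public bool Completed { get; private set; }
    public Exception Error { get; private set; }

    public void OnNext(int value) { Values.Add(value); }
    public void OnError(Exception error) { Error = error; }
    public void OnCompleted() { Completed = true; }
}
```

Compare with `foreach` over an `IEnumerable<int>`: there the consumer calls `MoveNext` to pull the next value, whereas here the consumer only reacts to what the source pushes.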
Demo: ReactiveFx
Visit channel9.msdn.com for info
Axum
• DevLabs project (previously “Maestro”)
• Coordination between components
  • “Disciplined sharing”
• Actor model
  • Agents communicate via messages
  • Channels to exchange data via ports
• Language features (based on C#)
  • Declarative data pipelines and protocols
  • Side-effect-free functions
  • Asynchronous methods
  • Isolated methods
• Also suitable in distributed setting
Channels for message exchange
agent Program : channel Microsoft.Axum.Application
{
    public Program()
    {
        string[] args = receive(PrimaryChannel::CommandLine);
        PrimaryChannel::ExitCode <-- 0;
    }
}
Agents and channels
channel Adder
{
    input int Num1;
    input int Num2;
    output int Sum;
}

agent AdderAgent : channel Adder
{
    public AdderAgent()
    {
        int result = receive(PrimaryChannel::Num1) +
                     receive(PrimaryChannel::Num2);
        PrimaryChannel::Sum <-- result;
    }
}
Send / receive primitives
Protocols
channel Adder
{
    input int Num1;
    input int Num2;
    output int Sum;

    Start: { Num1 -> GotNum1; }
    GotNum1: { Num2 -> GotNum2; }
    GotNum2: { Sum -> End; }
}
State transition diagram
Use of pipelines
agent MainAgent : channel Microsoft.Axum.Application
{
    function int Fibonacci(int n)
    {
        if (n <= 1) return n;
        return Fibonacci(n - 1) + Fibonacci(n - 2);
    }

    int c = 10;

    void ProcessResult(int n)
    {
        Console.WriteLine(n);
        if (--c == 0)
            PrimaryChannel::ExitCode <-- 0;
    }

    public MainAgent()
    {
        var nums = new OrderedInteractionPoint<int>();

        nums ==> Fibonacci ==> ProcessResult;
        for (int i = 0; i < c; i++)
            nums <-- 42 - i;
    }
}
Description of data flow
Mathematical function
Domains
domain Chatroom
{
    private string m_Topic;
    private int m_UserCount;

    reader agent User : channel UserCommunication
    {
        // ...
    }

    writer agent Administrator : channel AdminCommunication
    {
        // ...
    }
}
Unit of sharing between agents
Demo: Axum in a nutshell
Transactional memory
• Another DevLabs project
  • Cutting edge, released 7/28
  • Specialized fork from .NET 4.0 Beta 1
    • CLR modifications required
• First-class transactions on memory
  • As an alternative to locking
  • “Optimistic” concurrency methodology
    • Make modifications
    • Rollback changes on conflict
• Core concept: atomic { /* code */ }
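The "make modifications, roll back on conflict" idea can be illustrated on a single variable with a compare-and-swap retry loop. This is not STM.NET and involves no `atomic` block; it is only a sketch of optimistic concurrency using standard `Interlocked` calls, with `OptimisticSketch` and its methods as hypothetical names.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class OptimisticSketch
{
    // Adds delta to target without taking a lock.
    public static int AddOptimistically(ref int target, int delta)
    {
        while (true)
        {
            int snapshot = Volatile.Read(ref target); // read the current value
            int updated = snapshot + delta;           // work on a private copy

            // Commit only if nobody changed target since the snapshot;
            // otherwise discard the attempt and retry with a fresh snapshot.
            if (Interlocked.CompareExchange(ref target, updated, snapshot) == snapshot)
            {
                return updated;
            }
        }
    }

    // Several workers increment one shared counter concurrently.
    public static int Demo(int workers, int incrementsPerWorker)
    {
        int balance = 0;
        Parallel.For(0, workers, w =>
        {
            for (int i = 0; i < incrementsPerWorker; i++)
            {
                AddOptimistically(ref balance, 1);
            }
        });
        return balance;
    }
}
```

A transactional memory system generalizes this from one variable to arbitrary sets of reads and writes, which is where the instrumentation cost mentioned later in the deck comes from.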
Problems with locks:
• Potential for deadlocks…
• …and more ugliness
• Granularity matters a lot
• Don’t compose well

Subtle difference:

atomic {
    m_x++;
    m_y--;
    throw new MyException();
}

lock (GlobalStmLock) {
    m_x++;
    m_y--;
    throw new MyException();
}
Bank account sample
public static void Transfer(BankAccount from, BankAccount backup,
                            BankAccount to, int amount)
{
    Atomic.Do(() =>
    {
        // Be optimistic, credit the beneficiary first
        to.ModifyBalance(amount);

        // Find the appropriate funds in source accounts
        try
        {
            from.ModifyBalance(-amount);
        }
        catch (OverdraftException)
        {
            backup.ModifyBalance(-amount);
        }
    });
}
The hard truth about STM
• Great features
  • ACID
  • Optimistic concurrency
  • Transparent rollback and re-execute
  • System.Transactions (LTM) and DTC support
• Implementation
  • Instrumentation of shared state access
  • JIT compiler modification
  • No hardware support currently
• Result:
  • 2x to 7x serial slowdown (in alpha prototype)
  • But improved parallel scalability
Demo: STM.NET
Visit msdn.microsoft.com/devlabs
DryadLINQ
• Dryad
  • Infrastructure for cluster computation
  • Concept of job
• DryadLINQ
  • LINQ over Dryad
    • Decomposition of query
    • Distribution over computation nodes
  • Roughly similar to PLINQ
  • A la “map-reduce”
Declarative approach works
DryadLINQ = LINQ + Dryad
[Diagram labels: C# vertex code, query plan (Dryad job), data, collection, results]

Collection<T> collection;
bool IsLegal(Key k);
string Hash(Key);

var results = from c in collection
              where IsLegal(c.key)
              select new { Hash(c.key), c.value };
Demo: DryadLINQ
Visit research.microsoft.com/dryad
Summary
• Parallel programming requires thinking
  • Avoid side-effects
  • Prefer immutability
• Act 1 = Library approach in .NET 4.0
  • Task Parallel Library
  • Parallel LINQ
  • Coordination Data Structures
  • Asynchronous patterns (+ a bit of language sugar)
• Act 2 = Different approaches are lurking
  • Software Transactional Memory
  • Purification of languages
question & answer
Resources
• www.microsoft.com/teched: Sessions On-Demand & Community
• http://microsoft.com/technet: Resources for IT Professionals
• http://microsoft.com/msdn: Resources for Developers
• www.microsoft.com/learning: Microsoft Certification & Training Resources
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS,
IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.