Upload
pksowjanya
View
223
Download
0
Embed Size (px)
Citation preview
7/30/2019 Intel Threading Tools
1/74
Threading for IntelPlatforms:
Process and Tools
Paul Petersen
Sr. Principal EngineerPerformance Analysis and Threading Tools Lab
Software and Solutions Group (SSG)Intel Corporation, Champaign, IL
7/30/2019 Intel Threading Tools
2/74
2Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
INSTRUCTION FETCHINSTRUCTION FETCH
AND PRE-DECODEAND PRE-DECODE
INSTRUCTION QUEUEINSTRUCTION QUEUE
RETIREMENT UNITRETIREMENT UNIT
(REORDER BUFFER)(REORDER BUFFER)
DECODEDECODE
RENAME / ALLOCRENAME / ALLOC
SCHEDULERSSCHEDULERS
EXECUTEEXECUTE
INSTRUCTION FETCHINSTRUCTION FETCH
AND PRE-DECODEAND PRE-DECODE
INSTRUCTION QUEUEINSTRUCTION QUEUE
RETIREMENT UNITRETIREMENT UNIT
(REORDER BUFFER)(REORDER BUFFER)
DECODEDECODE
RENAME / ALLOCRENAME / ALLOC
SCHEDULERSSCHEDULERS
EXECUTEEXECUTE
Multiple cores. Multiple instruction issue.
Multiple, externally visible processors on a singledie where the processors have independent
control-flow, separate internal state and no criticalresource sharing
7/30/2019 Intel Threading Tools
3/74
3Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Many-core array
CMP with 10s-100s low
power cores
Scalar cores
Capable of TFLOPS+
Full System-on-Chip
Servers, workstations,
embedded
Dual core
Symmetric multithreading
Multi-core array
CMP with ~10 cores
Evolution
Large, Scalar cores for
high single-thread
performance
Scalar plus many core for
highly threaded workloads
Evolutionary configurable architecture
ro2015: Evolving Processor Architecture, Intel Developer Forum, Mar 2005ro2015: Evolving Processor Architecture, Intel Developer Forum, Mar 2005
7/30/2019 Intel Threading Tools
4/74
4Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Intel Core Microarchitecture
*Graphics not representative of actual die photo or relative size*Graphics not representative of actual die photo or relative size
ScalableScalableLow PowerLow Power High PerformanceHigh Performance
MeromMerom
ConroeConroe
WoodcrestWoodcrest
65nm65nm
ServerServer
OptimizedOptimized
DesktopDesktop
OptimizedOptimized
MobileMobile
OptimizedOptimized
IntelIntel
WideWideDynamicDynamic
ExecutionExecution
IntelIntel
IntelligentIntelligent
PowerPower
CapabilityCapability
IntelIntelAdvancedAdvanced
Smart CacheSmart Cache
IntelIntel
SmartSmart
MemoryMemory
AccessAccess
IntelIntel
AdvancedAdvanced
Digital MediaDigital Media
BoostBoost
Applications must consider Multi-core Architectures
Applications must consider Multi-core Architectures
7/30/2019 Intel Threading Tools
5/74
5Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
The Importance of Parallelism & Threading
Do Nothing: Benefits Still Visible Operating systems ready for multi-processing/tasking
Background tasks benefit from more compute resources
Larger caches
But lose opportunity to gain magnitudes of performance benefits
Parallelize: Unleash the Potential!
Path to performance Native threads
Threaded libraries
Compiler generated threads
competitive pressures = demand for parallelapplications
competitive pressures = demand for parallelapplications
7/30/2019 Intel Threading Tools
6/74
6Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Vision of the Future
Parallelism for Everyone
Parallelism changes the game
A large percentage of people who provide applications are going tohave to care about parallelism in order to match the capabilities oftheir competitors.
Performance
GHz Era
Time
Multi-core Era
Activ
eISV
Active
ISV
Passiv
eISV
Passiv
eISV
Platfo
rmPo
tential
Platfo
rmPo
tential Growing gap!Growing gap!
Fixed
gap
Fixed
gap
7/30/2019 Intel Threading Tools
7/74
7Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Parallelism occurs at Many Levels
Instruction-level Parallelism (Hardware/Compiler)
Vectorization (Compiler)
Loop-level Parallelism (Open MP)
Object-level Parallelism (TBB)
Thread-level (pthreads, Win32 threads)
Process-level (MPI)
7/30/2019 Intel Threading Tools
8/74
8Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Parallel Programming Tools
Highly optimizing
compilers delivering
scalable solutions
Detect latent
Programming bugs
Tune for performance
and scalability
Visualization of applicationsand the system
ArchitecturalAnalysis
Introduce
Parallelism
Confidence /Correctness
Optimize /Tune
7/30/2019 Intel Threading Tools
9/74
9Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Multi-core Software Support
Use of multiple cores
very important to us and our customers
Three key challenges: Scalability
Correctness
Ease of programming
7/30/2019 Intel Threading Tools
10/74
10Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Multi-core Software SupportHow will we help more developers use parallelism?
Intel Thread Checkerpinpoints latent threading errors
Intel Thread ProfilerIntel VTune Performance
Analyzertwo ways to gain
precise insight into threadedcode application/lock levelor application/system level
Intel Threading Building
BlocksC++ template task programming libraryWe help make it easier
7/30/2019 Intel Threading Tools
11/74
11Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
A Generic Development CycleAnalyze
VTune Performance Analyzer
Design (Introduce Threads)Intel Performance libraries: IPP and MKL
OpenMP* (Intel Compiler)
[Explicit threading (Win32*, Pthreads*)]
Threading Building Blocks (C++ templates)
Debug for correctness
Intel Thread Checker, Message Checker
Intel Debugger
Tune for performanceIntel Compilers
Intel Thread Profiler, Trace Analyzer
VTune Performance Analyzer
*Other names and brands may be claimed as the property of others
E l P i N b G ti
7/30/2019 Intel Threading Tools
12/74
12Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Example: Prime Number Generation
Outer loop
steps throughnumbers thatmay be prime
TestForPrimetests factorsofnumber
#include const long N = 18;long primes[N], number_of_primes = 0;main(){
printf( "Determining primes from 1-%d \n", N );primes[ number_of_primes++ ] = 2; // special case
for (long number = 3; number
7/30/2019 Intel Threading Tools
13/74
13Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Prime Number GenerationCase Study
FindPrimes(start,end)
TestForPrimes(number)
Increment Count
NO
YES
ShowProgress
main
See backup foils for system configuration
7/30/2019 Intel Threading Tools
14/74
14Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
AnalysisWhere to thread?
7/30/2019 Intel Threading Tools
15/74
15Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
AnalysisWhat is the expected speedup?
Scaling(2P)=100/(99.7/2+0.3)= ~1.99X
7/30/2019 Intel Threading Tools
16/74
16Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
AnalysisWhich threading model?
Rapid prototyping with OpenMP*Fork-join parallelism
Parallel Regions
Master
Thread
*Other names and brands may be claimed as the property of others
7/30/2019 Intel Threading Tools
17/74
17Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Design( Introduce Threads)
#pragma omp parallel forfor( int i = start; i
7/30/2019 Intel Threading Tools
18/74
18Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
#pragma omp parallel forfor( int i = start; i
7/30/2019 Intel Threading Tools
19/74
19Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
OpenMP Contents
OpenMP constructs fall into five categories Runtime functions/environment variables
Parallel regions
Worksharing
Data environment Synchronization
OpenMP is fundamentally the same betweenFortran and C/C++
7/30/2019 Intel Threading Tools
20/74
20Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
A Few Syntax Details to GetStarted
Most of the constructs in OpenMP are compilerdirectives or pragmas
For C and C++, the pragmas take the form:#pragma omp construct [clause [clause]]
For Fortran, the directives take one of the forms:C$OMP construct [clause [clause]]
!$OMP construct [clause [clause]]
*$OMP construct [clause [clause]]
Header file or Fortran 90 module#include omp.huse omp_lib
7/30/2019 Intel Threading Tools
21/74
21Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
24 Library Routines
Runtime environment routines:
Modify/check the number of threadsomp_[set|get]_num_threads()
omp_get_thread_num()
omp_get_max_threads()
Are we in a parallel region?omp_in_parallel()
How many processors in the system?
omp_get_num_procs()
Explicit locksomp_[set|unset]_lock()
And many more...
7/30/2019 Intel Threading Tools
22/74
22Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Design (Introduce Threads)
Scaling of 1.4X
7/30/2019 Intel Threading Tools
23/74
23Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Debug for Correctness
7/30/2019 Intel Threading Tools
24/74
24Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Data Race
Time
Thread 1 Thread 2
The updates to number_of_primes are not atomic!
number_of_primes++ number_of_primes++
7/30/2019 Intel Threading Tools
25/74
25Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Data Race
Thread 1 Thread 2
The updates to number_of_primes are not atomic!
t1 = LOAD n_of_p
t1++
STORE n_of_p = t1
t2 = LOAD n_of_p
t2++
STORE n_of_p = t2
Time
7/30/2019 Intel Threading Tools
26/74
26Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Data Race
Thread 1 Thread 2
Lose one increment to number of primes!
1: t1= LOAD n_of_p
2: t2 = LOAD n_of_p3: t1++
4: t2++
5: STORE n_of_p = t2
6: STORE n_of_p = t1
Time
7/30/2019 Intel Threading Tools
27/74
27Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Data Race
Thread 1 Thread 2
Add synchronization to remove data race
CRITICAL {
t1= LOAD n_of_p
t1++
STORE n_of_p = t1
}
CRITICAL {
t2 = LOAD n_of_p
t2++
STORE n_of_p = t2
}
Time
7/30/2019 Intel Threading Tools
28/74
28Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Intel Thread CheckerCreate Correct Threads Faster
Key Features Detects challenging data races and deadlocks
Pinpoints errors to the source code line
Binary Instrumentation; Works on standarddebug builds without recompiling
Recommends modules to instrument by usage(windows* product)
Scriptable interface for test environmentintegration
IA32/EM64T, C/C++/Fortran, GUI CLI & RDC onWindows, RDC & CLI on Linux, Win32threads/pthreads
* Intel and the Intel logo are registered trademarks of Intel Corporation. Other brands and names are the property of their respective owners.
Intel Confidential NDA Required
7/30/2019 Intel Threading Tools
29/74
29Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Debug for Correctness
Intel Thread Checker pinpoints notorious threading bugs like data races, stallsand deadlocks
Intel Thread Checker
VTune Performance Analyzer
+DLLs (Instrumented)
(Instrumented)
7/30/2019 Intel Threading Tools
30/74
30Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
How Does Thread Checker Work?
Thread 1
n_of_p++
Lock(L);
Unlock(L);
Thread 2
n_of_p++;
Lock(L);
Unlock(L);
Time
7/30/2019 Intel Threading Tools
31/74
31Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
How Does Thread Checker Work?
Thread 1
n_of_p++
Lock(L);
Unlock(L);
Thread 2
n_of_p++;
Lock(L);
Unlock(L);
record read(n_of_p)
record lock(L)
record unlock(L)record lock(L)
record unlock(L)
Use binary instrumentation
record write(n_of_p)
record read(n_of_p)record write(n_of_p)
Time
7/30/2019 Intel Threading Tools
32/74
32Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
How Does Thread Checker Work?
Thread 1
n_of_p++
Lock(L);
Unlock(L);
Thread 2
n_of_p++
Lock(L);
Unlock(L);
Analysis reveals thatsegment 1happens
beforesegment 2. So, no data race problem.
Segment 1
Segment 2
Analysis based on [Lamport 1978]
Time
7/30/2019 Intel Threading Tools
33/74
33Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
How Does Thread Checker Work?
Thread 1
n_of_p++
Thread 2
n_of_p++
Segment 1 Segment 2
With no locks, neither segment happens beforethe other. So there is a data race!
Time
7/30/2019 Intel Threading Tools
34/74
34Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Debug for Correctness
7/30/2019 Intel Threading Tools
35/74
35Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Debug for Correctness
7/30/2019 Intel Threading Tools
36/74
36Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Debug for Correctness
#pragma omp parallel forfor( int i = start; i
7/30/2019 Intel Threading Tools
37/74
37Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Debug for Correctness
7/30/2019 Intel Threading Tools
38/74
38Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Intel Thread ProfilerOptimize Threads Faster
Key Features Identifies synchronization objects that
impact performance
Highlights thread workload imbalances
Shows the number of cores utilized
Pinpoints issues to the source code line
Maximizes application time spent in parallelregions
IA32/EM64T, C/C++/Fortran, GUI RDC & CLI
on Windows, RDC & CLI on Linux, Win32threads/pthreads
* Intel and the Intel logo are registered trademarks of Intel Corporation. Other brands and names are the property of their respective owners.
Intel Confidential NDA Required
7/30/2019 Intel Threading Tools
39/74
39Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Tune for Performance- Combination View
7/30/2019 Intel Threading Tools
40/74
40Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Tune for Performance
Threads are not
balanced
7/30/2019 Intel Threading Tools
41/74
41 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Tune for Performance
Threads are not
balanced
Thread 1Thread 2
Reschedule from block to block cyclic
7/30/2019 Intel Threading Tools
42/74
42 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Tune for Performance
Threads are not
balanced
Thread 1
Thread 2
7/30/2019 Intel Threading Tools
43/74
43 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Tune for Performance
Balance the workload for the threads
#pragma omp parallel for schedule (static, 8)for( int i = start; i
7/30/2019 Intel Threading Tools
44/74
44 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Tune for Performance
Balance the workload for the threads
Scaling achieved is 1.77XScaling achieved is 1.77X
7/30/2019 Intel Threading Tools
45/74
45 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Tune For Performance balanced
7/30/2019 Intel Threading Tools
46/74
46 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Tune For Performance Object focus
T F P f Obj t Vi
7/30/2019 Intel Threading Tools
47/74
47 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Tune For Performance Object View
T f P f
7/30/2019 Intel Threading Tools
48/74
48 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Tune for Performance
P f R i it d
7/30/2019 Intel Threading Tools
49/74
49 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
#pragma omp parallel for schedule (static,8)for( int i = start; i
7/30/2019 Intel Threading Tools
50/74
50 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Performance Re-visited
Lets look at the OpenMP locks
#pragma omp parallel for schedule (static,8)for( int i = start; i
7/30/2019 Intel Threading Tools
51/74
51 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Tune for Performance
All locks have been taken care of
Final Scaling 1.86
Tune for Performance
7/30/2019 Intel Threading Tools
52/74
52 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Tune for Performance
I t l Th di B ildi Bl k
7/30/2019 Intel Threading Tools
53/74
53 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Intel Threading Building Blocks
It is a C++ template library
Compatible with other threading packages
Portable across Linux*, Mac OS*, and Windows*
You specify task patterns instead of threads
Library maps your logical tasks onto physical threads,efficiently using cache and balancing load
Targets threading for robust performance
Does well withnested parallelism
Emphasizes scalable, data parallelprogramming
Generic programming enables distribution of broadly-useful high-quality algorithms and data structures.
Intel Threading Building Blocks
7/30/2019 Intel Threading Tools
54/74
54 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Intel Threading Building Blocks(C++ template library)
Low-Level Synchronization Primitivesatomicmutex
spin_mutexqueuing_mutexspin_rw_mutex
queuing_rw_mutex
Generic Parallel Algorithms
parallel_forparallel_reduce
pipelineparallel_sortparallel_whileparallel_scan
Concurrent Containers
concurrent_hash_mapconcurrent_queueconcurrent_vector
Task Scheduler
Timingtick_count
Memory Allocationcache_aligned_allocator
scalable_allocator
Problem
7/30/2019 Intel Threading Tools
55/74
55 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Problem
Gaining performance from multicore requires parallel programming
Even a simple parallel for is tricky for a non-expert to write well withthreads.
Two aspects to parallel programming
Correctness: avoiding race conditions and deadlock
Performance: efficient use of resources
Hardware threads (match parallelism to hardware threads)
Memory space (choose right evaluation order)
Memory bandwidth (reuse cache)
Limitations
7/30/2019 Intel Threading Tools
56/74
56 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Limitations
Intel TBB is not intended for
I/O bound processing
Hard real-time processing
However, it is compatible with other threading packages
It can be used in concert with other threading packages such asnative threads and OpenMP.
Serial Example
7/30/2019 Intel Threading Tools
57/74
57 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Serial Example
static void SerialUpdateVelocity() {
for( int i=1; i
7/30/2019 Intel Threading Tools
58/74
58 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Parallel Version
struct UpdateVelocityBody {
void operator()( constblocked_range& range ) const { int end = range.end();
for( int i= range.begin(); i
7/30/2019 Intel Threading Tools
59/74
59 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Range is Generic
Requirements for parallel_for Range
Library provides blocked_range and blocked_range2d
You can define your own ranges
Puzzle: Write parallel quicksort using parallel_for, without recursion!
R::R (const R&) Copy constructor
R::~R() Destructor
bool R::empty() const True if range is empty
bool R::is_divisible() const True if range can be partitioned
R::R (R& r, Split) Split r into two subranges
How this works on blocked range2d
7/30/2019 Intel Threading Tools
60/74
60 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
tasks available to thieves
How this works onblocked_range2d
Split range...
.. recursively...
...until grainsize.
Two Execution Orders
7/30/2019 Intel Threading Tools
61/74
61 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Two Execution Orders
Depth First
(stack)
Small space
Excellent cache locality
No parallelism
Breadth First
(queue)
Large space
Poor cache locality
Maximum parallelism
Work Dept First; Steal Breadth First
7/30/2019 Intel Threading Tools
62/74
62 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Work Dept First; Steal Breadth First
L1
L2
victim thread
Best choice for theft!big piece of workdata far from victims hot data.
Second best choice.
Summary of Intel Threading Building Blocks
7/30/2019 Intel Threading Tools
63/74
63 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Summary of Intel Threading Building Blocks
It is a library
You specify task patterns, not threads
Targets threading for robust performance
Does well withnested parallelism
Compatible with other threading packages
Emphasizes scalable, data parallelprogrammingGeneric programming enables distribution of broadly-useful high-quality algorithms and data structures.
Intel Software Development Products
7/30/2019 Intel Threading Tools
64/74
64 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Intel Software Development ProductsIntroduction
Intel Software Development Products help you use the power of your
software to unleash the full potential of the hardwarehttp://www.intel.com/software/products
Performance Extract the maximum application performance from Intel processor based systems
Simplifies taking advantage of new capabilities like multi-core and Intel EM64T
Compatibility Compatible with popular development environments including Microsoft VisualStudio on Windows*, GCC on Linux*, XCode on Mac OS*
32-bit and 64-bit processor support with one package
Linux*, Windows*, Mac OS* support
Support
One year of unlimited technical support and upgrades with purchase Get answers from the engineers that know how to develop software on IntelArchitecture
Intel Software Development Products offer
Performance, Compatibility, and Support
Intel Software Development Products offer
Performance, Compatibility, and Support
Intel Premier SupportRegistering for support waseasy, and we value the security
f k i th t I t l i th t
7/30/2019 Intel Threading Tools
65/74
65 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Intel Premier Support
Purchase of Intel Software Development Products includes one yearof unlimited premier support
Intel Premier Support includes:
Primary support for all Intel Software Development Products
Online access to Intel Premier Support Website
Issue submission & tracking
Product updates & related downloads
FAQs, user forums, & other proactive notices
of knowing that Intel is there tohelp, even though we haventneeded it so far.
Rob Hoffmann - Director ofMarketing, NewTek, Inc.
Support Comes Directly fromExperts in Software Development at Intel
Support Comes Directly fromExperts in Software Development at Intel
Learn moreIntel books
7/30/2019 Intel Threading Tools
66/74
66 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Programming with Hyper-Threading TechnologyHow to Write Multithreaded Software forIntel IA-32 Processors
Multi-Core ProgrammingIncreasing Performance throughSoftware Multithreading
The SoftwareOptimization CookbookHigh-Performance Recipes for theIntel Architecture
VTune PerformanceAnalyzer EssentialsMeasurement and Tuning
Techniques for SoftwareDevelopers
Intel IntegratedPerformance PrimitivesHow to Optimize Software
Applications Using Intel IPP
Scientific Computing onItanium-based Systems
S
7/30/2019 Intel Threading Tools
67/74
67 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Summary
Applications must consider Multi-core Architectures
Intel Software Development Products offer
Performance, Compatibility, and Support
Applications take advantage of Multi-core by using multipleexecution threads
competitive pressures = demand for parallel applications
Intel Software Development Products help deliver Multi-coreapplication performance
Intel C++ and Fortran Compilers
Intel Vtune Performance Analyzers
Intel Performance Libraries
Intel Thread Checker and Intel Thread Profiler
Intel Threading Tools offer unique features for developers,considerably reducing the effort to fully exploit multi-corearchitectures
7/30/2019 Intel Threading Tools
68/74
68 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Amdahls Law
7/30/2019 Intel Threading Tools
69/74
69 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
P/n
Describes the upper bound of parallel speedup (scaling)
(1-P)
P
T
serial
(1-P)
n = number of processors
Tparallel = {(1-P) + P/n}Tserial + O
Scaling = Tserial / Tparallel
Amdahl s Law
Amdahls Law
7/30/2019 Intel Threading Tools
70/74
70 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Describes the upper bound of parallel speedup (scaling)
(1-P)
P
T
serial
(1-P)
P/2 n = number of processors
Tparallel = {(1-P) + P/n}Tserial + O
Scaling = Tserial / Tparallel
0.5 +0.5 + 0.250.25
1.0/0.75 =1.0/0.75 = 1.331.33
n = 2n = 2
Amdahl s Law
Amdahls Law
7/30/2019 Intel Threading Tools
71/74
71 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Describes the upper bound of parallel speedup (scaling)
(1-P)
P
T
serial
(1-P)
n = number of processors
Tparallel = {(1-P) + P/n}Tserial + O
Scaling = Tserial / Tparallel
n =n =
P/
0.5 +0.5 + 0.00.0
1.0/0.5 =1.0/0.5 = 2.02.0
Amdahl s Law
More parallel code improves scalingMore parallel code improves scalingMore parallel code improves scalingMore parallel code improves scaling
Amdahls Law
7/30/2019 Intel Threading Tools
72/74
72 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
Amdahl s Law
Runtime = serial + parallel
With perfect scaling:
Runtime = serial + parallel/P
To get 50x speedup on 65processors, serial time must be
less than 0.5%
Emerging class of highly parallel workloads: games, mediaprocessing, personal data search, and more
Ideal performance scaling, varying ser ial
percentage
0
00
00
00
00
00
00
00
0 0 00 00 00 00 00 00 00
processors
speedup
. %000
. %000
. %000
. %000
. %0000
. %0000
Examples capabilities and performance
Intels ThreadingI
Taking advantageof Intels broad
f ft
7/30/2019 Intel Threading Tools
73/74
73 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners
a p es capa es a d pe o a ce
Adobe Premiere Pro 2.0
Professional video editing High Definition playback scales to 4 or more threads
Pixar - RenderMan Pro Server
Production rendering
Scale to 5x speedup on 8 cores
MainConcept H.264/AVC
Advanced Video Encoding
Scale to 3.25x speedup on 4 cores
Thomson and Fraunhofer IIS Mp3 and mp3 surround codec cross platform
1.7x speedup on 2 cores
Threading can bea challenging task,
and timeconsuming as
well. Intel reallyunderstands thisand has greatproducts for
softwaredevelopers.
Bill HenslerVice President of
engineering Video Solutionsfor Adobe
gTools have
accelerated ourdevelopment cycledramatically. The
Intel ThreadingTools are now an
integral part of ourdevelopment
process.
Dana BataliDirector of Pixar
RenderManDevelopment
In ourdevelopment
efforts, we hugelydepend on Intel
tools. We use theIntel compiler, theThreading Tools,and also VTune to
make sure ourcodecs exploit theIntel platform as
best as possible
Markus MonigCEO
MainConcept
range of softwaredevelopment toolsfor threading andtheir extensive
expertise, we wereable to see a 70%performanceincrease when ourmp3 and mp3SURROUND codecsran on Intel
Centrino Duomobile technology.This results in agreater listeningexperience forconsumers.
Rocky Caldwell, General Managermp3 Licensing, Thomson
Customer Testimonials
7/30/2019 Intel Threading Tools
74/74