Intel Threading Tools

Embed Size (px)

Citation preview

  • 7/30/2019 Intel Threading Tools

    1/74

    Threading for IntelPlatforms:

    Process and Tools

    Paul Petersen

    Sr. Principal EngineerPerformance Analysis and Threading Tools Lab

    Software and Solutions Group (SSG)Intel Corporation, Champaign, IL

    [email protected]

  • 7/30/2019 Intel Threading Tools

    2/74

    2Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    INSTRUCTION FETCHINSTRUCTION FETCH

    AND PRE-DECODEAND PRE-DECODE

    INSTRUCTION QUEUEINSTRUCTION QUEUE

    RETIREMENT UNITRETIREMENT UNIT

    (REORDER BUFFER)(REORDER BUFFER)

    DECODEDECODE

    RENAME / ALLOCRENAME / ALLOC

    SCHEDULERSSCHEDULERS

    EXECUTEEXECUTE

    INSTRUCTION FETCHINSTRUCTION FETCH

    AND PRE-DECODEAND PRE-DECODE

    INSTRUCTION QUEUEINSTRUCTION QUEUE

    RETIREMENT UNITRETIREMENT UNIT

    (REORDER BUFFER)(REORDER BUFFER)

    DECODEDECODE

    RENAME / ALLOCRENAME / ALLOC

    SCHEDULERSSCHEDULERS

    EXECUTEEXECUTE

    Multiple cores. Multiple instruction issue.

    Multiple, externally visible processors on a singledie where the processors have independent

    control-flow, separate internal state and no criticalresource sharing

  • 7/30/2019 Intel Threading Tools

    3/74

    3Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Many-core array

    CMP with 10s-100s low

    power cores

    Scalar cores

    Capable of TFLOPS+

    Full System-on-Chip

    Servers, workstations,

    embedded

    Dual core

    Symmetric multithreading

    Multi-core array

    CMP with ~10 cores

    Evolution

    Large, Scalar cores for

    high single-thread

    performance

    Scalar plus many core for

    highly threaded workloads

    Evolutionary configurable architecture

    ro2015: Evolving Processor Architecture, Intel Developer Forum, Mar 2005ro2015: Evolving Processor Architecture, Intel Developer Forum, Mar 2005

  • 7/30/2019 Intel Threading Tools

    4/74

    4Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Intel Core Microarchitecture

    *Graphics not representative of actual die photo or relative size*Graphics not representative of actual die photo or relative size

    ScalableScalableLow PowerLow Power High PerformanceHigh Performance

    MeromMerom

    ConroeConroe

    WoodcrestWoodcrest

    65nm65nm

    ServerServer

    OptimizedOptimized

    DesktopDesktop

    OptimizedOptimized

    MobileMobile

    OptimizedOptimized

    IntelIntel

    WideWideDynamicDynamic

    ExecutionExecution

    IntelIntel

    IntelligentIntelligent

    PowerPower

    CapabilityCapability

    IntelIntelAdvancedAdvanced

    Smart CacheSmart Cache

    IntelIntel

    SmartSmart

    MemoryMemory

    AccessAccess

    IntelIntel

    AdvancedAdvanced

    Digital MediaDigital Media

    BoostBoost

    Applications must consider Multi-core Architectures

    Applications must consider Multi-core Architectures

  • 7/30/2019 Intel Threading Tools

    5/74

    5Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    The Importance of Parallelism & Threading

    Do Nothing: Benefits Still Visible Operating systems ready for multi-processing/tasking

    Background tasks benefit from more compute resources

    Larger caches

    But lose opportunity to gain magnitudes of performance benefits

    Parallelize: Unleash the Potential!

    Path to performance Native threads

    Threaded libraries

    Compiler generated threads

    competitive pressures = demand for parallelapplications

    competitive pressures = demand for parallelapplications

  • 7/30/2019 Intel Threading Tools

    6/74

    6Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Vision of the Future

    Parallelism for Everyone

    Parallelism changes the game

    A large percentage of people who provide applications are going tohave to care about parallelism in order to match the capabilities oftheir competitors.

    Performance

    GHz Era

    Time

    Multi-core Era

    Activ

    eISV

    Active

    ISV

    Passiv

    eISV

    Passiv

    eISV

    Platfo

    rmPo

    tential

    Platfo

    rmPo

    tential Growing gap!Growing gap!

    Fixed

    gap

    Fixed

    gap

  • 7/30/2019 Intel Threading Tools

    7/74

    7Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Parallelism occurs at Many Levels

    Instruction-level Parallelism (Hardware/Compiler)

    Vectorization (Compiler)

    Loop-level Parallelism (Open MP)

    Object-level Parallelism (TBB)

    Thread-level (pthreads, Win32 threads)

    Process-level (MPI)

  • 7/30/2019 Intel Threading Tools

    8/74

    8Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Parallel Programming Tools

    Highly optimizing

    compilers delivering

    scalable solutions

    Detect latent

    Programming bugs

    Tune for performance

    and scalability

    Visualization of applicationsand the system

    ArchitecturalAnalysis

    Introduce

    Parallelism

    Confidence /Correctness

    Optimize /Tune

  • 7/30/2019 Intel Threading Tools

    9/74

    9Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Multi-core Software Support

    Use of multiple cores

    very important to us and our customers

    Three key challenges: Scalability

    Correctness

    Ease of programming

  • 7/30/2019 Intel Threading Tools

    10/74

    10Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Multi-core Software SupportHow will we help more developers use parallelism?

    Intel Thread Checkerpinpoints latent threading errors

    Intel Thread ProfilerIntel VTune Performance

    Analyzertwo ways to gain

    precise insight into threadedcode application/lock levelor application/system level

    Intel Threading Building

    BlocksC++ template task programming libraryWe help make it easier

  • 7/30/2019 Intel Threading Tools

    11/74

    11Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    A Generic Development CycleAnalyze

    VTune Performance Analyzer

    Design (Introduce Threads)Intel Performance libraries: IPP and MKL

    OpenMP* (Intel Compiler)

    [Explicit threading (Win32*, Pthreads*)]

    Threading Building Blocks (C++ templates)

    Debug for correctness

    Intel Thread Checker, Message Checker

    Intel Debugger

    Tune for performanceIntel Compilers

    Intel Thread Profiler, Trace Analyzer

    VTune Performance Analyzer

    *Other names and brands may be claimed as the property of others

    E l P i N b G ti

  • 7/30/2019 Intel Threading Tools

    12/74

    12Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Example: Prime Number Generation

    Outer loop

    steps throughnumbers thatmay be prime

    TestForPrimetests factorsofnumber

    #include const long N = 18;long primes[N], number_of_primes = 0;main(){

    printf( "Determining primes from 1-%d \n", N );primes[ number_of_primes++ ] = 2; // special case

    for (long number = 3; number

  • 7/30/2019 Intel Threading Tools

    13/74

    13Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Prime Number GenerationCase Study

    FindPrimes(start,end)

    TestForPrimes(number)

    Increment Count

    NO

    YES

    ShowProgress

    main

    See backup foils for system configuration

  • 7/30/2019 Intel Threading Tools

    14/74

    14Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    AnalysisWhere to thread?

  • 7/30/2019 Intel Threading Tools

    15/74

    15Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    AnalysisWhat is the expected speedup?

    Scaling(2P)=100/(99.7/2+0.3)= ~1.99X

  • 7/30/2019 Intel Threading Tools

    16/74

    16Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    AnalysisWhich threading model?

    Rapid prototyping with OpenMP*Fork-join parallelism

    Parallel Regions

    Master

    Thread

    *Other names and brands may be claimed as the property of others

  • 7/30/2019 Intel Threading Tools

    17/74

    17Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Design( Introduce Threads)

    #pragma omp parallel forfor( int i = start; i

  • 7/30/2019 Intel Threading Tools

    18/74

    18Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    #pragma omp parallel forfor( int i = start; i

  • 7/30/2019 Intel Threading Tools

    19/74

    19Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    OpenMP Contents

    OpenMP constructs fall into five categories Runtime functions/environment variables

    Parallel regions

    Worksharing

    Data environment Synchronization

    OpenMP is fundamentally the same betweenFortran and C/C++

  • 7/30/2019 Intel Threading Tools

    20/74

    20Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    A Few Syntax Details to GetStarted

    Most of the constructs in OpenMP are compilerdirectives or pragmas

    For C and C++, the pragmas take the form:#pragma omp construct [clause [clause]]

    For Fortran, the directives take one of the forms:C$OMP construct [clause [clause]]

    !$OMP construct [clause [clause]]

    *$OMP construct [clause [clause]]

    Header file or Fortran 90 module#include omp.huse omp_lib

  • 7/30/2019 Intel Threading Tools

    21/74

    21Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    24 Library Routines

    Runtime environment routines:

    Modify/check the number of threadsomp_[set|get]_num_threads()

    omp_get_thread_num()

    omp_get_max_threads()

    Are we in a parallel region?omp_in_parallel()

    How many processors in the system?

    omp_get_num_procs()

    Explicit locksomp_[set|unset]_lock()

    And many more...

  • 7/30/2019 Intel Threading Tools

    22/74

    22Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Design (Introduce Threads)

    Scaling of 1.4X

  • 7/30/2019 Intel Threading Tools

    23/74

    23Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Debug for Correctness

  • 7/30/2019 Intel Threading Tools

    24/74

    24Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Data Race

    Time

    Thread 1 Thread 2

    The updates to number_of_primes are not atomic!

    number_of_primes++ number_of_primes++

  • 7/30/2019 Intel Threading Tools

    25/74

    25Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Data Race

    Thread 1 Thread 2

    The updates to number_of_primes are not atomic!

    t1 = LOAD n_of_p

    t1++

    STORE n_of_p = t1

    t2 = LOAD n_of_p

    t2++

    STORE n_of_p = t2

    Time

  • 7/30/2019 Intel Threading Tools

    26/74

    26Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Data Race

    Thread 1 Thread 2

    Lose one increment to number of primes!

    1: t1= LOAD n_of_p

    2: t2 = LOAD n_of_p3: t1++

    4: t2++

    5: STORE n_of_p = t2

    6: STORE n_of_p = t1

    Time

  • 7/30/2019 Intel Threading Tools

    27/74

    27Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Data Race

    Thread 1 Thread 2

    Add synchronization to remove data race

    CRITICAL {

    t1= LOAD n_of_p

    t1++

    STORE n_of_p = t1

    }

    CRITICAL {

    t2 = LOAD n_of_p

    t2++

    STORE n_of_p = t2

    }

    Time

  • 7/30/2019 Intel Threading Tools

    28/74

    28Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Intel Thread CheckerCreate Correct Threads Faster

    Key Features Detects challenging data races and deadlocks

    Pinpoints errors to the source code line

    Binary Instrumentation; Works on standarddebug builds without recompiling

    Recommends modules to instrument by usage(windows* product)

    Scriptable interface for test environmentintegration

    IA32/EM64T, C/C++/Fortran, GUI CLI & RDC onWindows, RDC & CLI on Linux, Win32threads/pthreads

    * Intel and the Intel logo are registered trademarks of Intel Corporation. Other brands and names are the property of their respective owners.

    Intel Confidential NDA Required

  • 7/30/2019 Intel Threading Tools

    29/74

    29Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Debug for Correctness

    Intel Thread Checker pinpoints notorious threading bugs like data races, stallsand deadlocks

    Intel Thread Checker

    VTune Performance Analyzer

    +DLLs (Instrumented)

    (Instrumented)

  • 7/30/2019 Intel Threading Tools

    30/74

    30Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    How Does Thread Checker Work?

    Thread 1

    n_of_p++

    Lock(L);

    Unlock(L);

    Thread 2

    n_of_p++;

    Lock(L);

    Unlock(L);

    Time

  • 7/30/2019 Intel Threading Tools

    31/74

    31Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    How Does Thread Checker Work?

    Thread 1

    n_of_p++

    Lock(L);

    Unlock(L);

    Thread 2

    n_of_p++;

    Lock(L);

    Unlock(L);

    record read(n_of_p)

    record lock(L)

    record unlock(L)record lock(L)

    record unlock(L)

    Use binary instrumentation

    record write(n_of_p)

    record read(n_of_p)record write(n_of_p)

    Time

  • 7/30/2019 Intel Threading Tools

    32/74

    32Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    How Does Thread Checker Work?

    Thread 1

    n_of_p++

    Lock(L);

    Unlock(L);

    Thread 2

    n_of_p++

    Lock(L);

    Unlock(L);

    Analysis reveals thatsegment 1happens

    beforesegment 2. So, no data race problem.

    Segment 1

    Segment 2

    Analysis based on [Lamport 1978]

    Time

  • 7/30/2019 Intel Threading Tools

    33/74

    33Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    How Does Thread Checker Work?

    Thread 1

    n_of_p++

    Thread 2

    n_of_p++

    Segment 1 Segment 2

    With no locks, neither segment happens beforethe other. So there is a data race!

    Time

  • 7/30/2019 Intel Threading Tools

    34/74

    34Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Debug for Correctness

  • 7/30/2019 Intel Threading Tools

    35/74

    35Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Debug for Correctness

  • 7/30/2019 Intel Threading Tools

    36/74

    36Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Debug for Correctness

    #pragma omp parallel forfor( int i = start; i

  • 7/30/2019 Intel Threading Tools

    37/74

    37Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Debug for Correctness

  • 7/30/2019 Intel Threading Tools

    38/74

    38Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Intel Thread ProfilerOptimize Threads Faster

    Key Features Identifies synchronization objects that

    impact performance

    Highlights thread workload imbalances

    Shows the number of cores utilized

    Pinpoints issues to the source code line

    Maximizes application time spent in parallelregions

    IA32/EM64T, C/C++/Fortran, GUI RDC & CLI

    on Windows, RDC & CLI on Linux, Win32threads/pthreads

    * Intel and the Intel logo are registered trademarks of Intel Corporation. Other brands and names are the property of their respective owners.

    Intel Confidential NDA Required

  • 7/30/2019 Intel Threading Tools

    39/74

    39Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Tune for Performance- Combination View

  • 7/30/2019 Intel Threading Tools

    40/74

    40Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Tune for Performance

    Threads are not

    balanced

  • 7/30/2019 Intel Threading Tools

    41/74

    41 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Tune for Performance

    Threads are not

    balanced

    Thread 1Thread 2

    Reschedule from block to block cyclic

  • 7/30/2019 Intel Threading Tools

    42/74

    42 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Tune for Performance

    Threads are not

    balanced

    Thread 1

    Thread 2

  • 7/30/2019 Intel Threading Tools

    43/74

    43 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Tune for Performance

    Balance the workload for the threads

    #pragma omp parallel for schedule (static, 8)for( int i = start; i

  • 7/30/2019 Intel Threading Tools

    44/74

    44 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Tune for Performance

    Balance the workload for the threads

    Scaling achieved is 1.77XScaling achieved is 1.77X

  • 7/30/2019 Intel Threading Tools

    45/74

    45 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Tune For Performance balanced

  • 7/30/2019 Intel Threading Tools

    46/74

    46 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Tune For Performance Object focus

    T F P f Obj t Vi

  • 7/30/2019 Intel Threading Tools

    47/74

    47 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Tune For Performance Object View

    T f P f

  • 7/30/2019 Intel Threading Tools

    48/74

    48 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Tune for Performance

    P f R i it d

  • 7/30/2019 Intel Threading Tools

    49/74

    49 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    #pragma omp parallel for schedule (static,8)for( int i = start; i

  • 7/30/2019 Intel Threading Tools

    50/74

    50 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Performance Re-visited

    Lets look at the OpenMP locks

    #pragma omp parallel for schedule (static,8)for( int i = start; i

  • 7/30/2019 Intel Threading Tools

    51/74

    51 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Tune for Performance

    All locks have been taken care of

    Final Scaling 1.86

    Tune for Performance

  • 7/30/2019 Intel Threading Tools

    52/74

    52 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Tune for Performance

    I t l Th di B ildi Bl k

  • 7/30/2019 Intel Threading Tools

    53/74

    53 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Intel Threading Building Blocks

    It is a C++ template library

    Compatible with other threading packages

    Portable across Linux*, Mac OS*, and Windows*

    You specify task patterns instead of threads

    Library maps your logical tasks onto physical threads,efficiently using cache and balancing load

    Targets threading for robust performance

    Does well withnested parallelism

    Emphasizes scalable, data parallelprogramming

    Generic programming enables distribution of broadly-useful high-quality algorithms and data structures.

    Intel Threading Building Blocks

  • 7/30/2019 Intel Threading Tools

    54/74

    54 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Intel Threading Building Blocks(C++ template library)

    Low-Level Synchronization Primitivesatomicmutex

    spin_mutexqueuing_mutexspin_rw_mutex

    queuing_rw_mutex

    Generic Parallel Algorithms

    parallel_forparallel_reduce

    pipelineparallel_sortparallel_whileparallel_scan

    Concurrent Containers

    concurrent_hash_mapconcurrent_queueconcurrent_vector

    Task Scheduler

    Timingtick_count

    Memory Allocationcache_aligned_allocator

    scalable_allocator

    Problem

  • 7/30/2019 Intel Threading Tools

    55/74

    55 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Problem

    Gaining performance from multicore requires parallel programming

    Even a simple parallel for is tricky for a non-expert to write well withthreads.

    Two aspects to parallel programming

    Correctness: avoiding race conditions and deadlock

    Performance: efficient use of resources

    Hardware threads (match parallelism to hardware threads)

    Memory space (choose right evaluation order)

    Memory bandwidth (reuse cache)

    Limitations

  • 7/30/2019 Intel Threading Tools

    56/74

    56 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Limitations

    Intel TBB is not intended for

    I/O bound processing

    Hard real-time processing

    However, it is compatible with other threading packages

    It can be used in concert with other threading packages such asnative threads and OpenMP.

    Serial Example

  • 7/30/2019 Intel Threading Tools

    57/74

    57 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Serial Example

    static void SerialUpdateVelocity() {

    for( int i=1; i

  • 7/30/2019 Intel Threading Tools

    58/74

    58 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Parallel Version

    struct UpdateVelocityBody {

    void operator()( constblocked_range& range ) const { int end = range.end();

    for( int i= range.begin(); i

  • 7/30/2019 Intel Threading Tools

    59/74

    59 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Range is Generic

    Requirements for parallel_for Range

    Library provides blocked_range and blocked_range2d

    You can define your own ranges

    Puzzle: Write parallel quicksort using parallel_for, without recursion!

    R::R (const R&) Copy constructor

    R::~R() Destructor

    bool R::empty() const True if range is empty

    bool R::is_divisible() const True if range can be partitioned

    R::R (R& r, Split) Split r into two subranges

    How this works on blocked range2d

  • 7/30/2019 Intel Threading Tools

    60/74

    60 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    tasks available to thieves

    How this works onblocked_range2d

    Split range...

    .. recursively...

    ...until grainsize.

    Two Execution Orders

  • 7/30/2019 Intel Threading Tools

    61/74

    61 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Two Execution Orders

    Depth First

    (stack)

    Small space

    Excellent cache locality

    No parallelism

    Breadth First

    (queue)

    Large space

    Poor cache locality

    Maximum parallelism

    Work Dept First; Steal Breadth First

  • 7/30/2019 Intel Threading Tools

    62/74

    62 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Work Dept First; Steal Breadth First

    L1

    L2

    victim thread

    Best choice for theft!big piece of workdata far from victims hot data.

    Second best choice.

    Summary of Intel Threading Building Blocks

  • 7/30/2019 Intel Threading Tools

    63/74

    63 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Summary of Intel Threading Building Blocks

    It is a library

    You specify task patterns, not threads

    Targets threading for robust performance

    Does well withnested parallelism

    Compatible with other threading packages

    Emphasizes scalable, data parallelprogrammingGeneric programming enables distribution of broadly-useful high-quality algorithms and data structures.

    Intel Software Development Products

  • 7/30/2019 Intel Threading Tools

    64/74

    64 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Intel Software Development ProductsIntroduction

    Intel Software Development Products help you use the power of your

    software to unleash the full potential of the hardwarehttp://www.intel.com/software/products

    Performance Extract the maximum application performance from Intel processor based systems

    Simplifies taking advantage of new capabilities like multi-core and Intel EM64T

    Compatibility Compatible with popular development environments including Microsoft VisualStudio on Windows*, GCC on Linux*, XCode on Mac OS*

    32-bit and 64-bit processor support with one package

    Linux*, Windows*, Mac OS* support

    Support

    One year of unlimited technical support and upgrades with purchase Get answers from the engineers that know how to develop software on IntelArchitecture

    Intel Software Development Products offer

    Performance, Compatibility, and Support

    Intel Software Development Products offer

    Performance, Compatibility, and Support

    Intel Premier SupportRegistering for support waseasy, and we value the security

    f k i th t I t l i th t

  • 7/30/2019 Intel Threading Tools

    65/74

    65 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Intel Premier Support

    Purchase of Intel Software Development Products includes one yearof unlimited premier support

    Intel Premier Support includes:

    Primary support for all Intel Software Development Products

    Online access to Intel Premier Support Website

    Issue submission & tracking

    Product updates & related downloads

    FAQs, user forums, & other proactive notices

    of knowing that Intel is there tohelp, even though we haventneeded it so far.

    Rob Hoffmann - Director ofMarketing, NewTek, Inc.

    Support Comes Directly fromExperts in Software Development at Intel

    Support Comes Directly fromExperts in Software Development at Intel

    Learn moreIntel books

  • 7/30/2019 Intel Threading Tools

    66/74

    66 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Programming with Hyper-Threading TechnologyHow to Write Multithreaded Software forIntel IA-32 Processors

    Multi-Core ProgrammingIncreasing Performance throughSoftware Multithreading

    The SoftwareOptimization CookbookHigh-Performance Recipes for theIntel Architecture

    VTune PerformanceAnalyzer EssentialsMeasurement and Tuning

    Techniques for SoftwareDevelopers

    Intel IntegratedPerformance PrimitivesHow to Optimize Software

    Applications Using Intel IPP

    Scientific Computing onItanium-based Systems

    S

  • 7/30/2019 Intel Threading Tools

    67/74

    67 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Summary

    Applications must consider Multi-core Architectures

    Intel Software Development Products offer

    Performance, Compatibility, and Support

    Applications take advantage of Multi-core by using multipleexecution threads

    competitive pressures = demand for parallel applications

    Intel Software Development Products help deliver Multi-coreapplication performance

    Intel C++ and Fortran Compilers

    Intel Vtune Performance Analyzers

    Intel Performance Libraries

    Intel Thread Checker and Intel Thread Profiler

    Intel Threading Tools offer unique features for developers,considerably reducing the effort to fully exploit multi-corearchitectures

  • 7/30/2019 Intel Threading Tools

    68/74

    68 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Amdahls Law

  • 7/30/2019 Intel Threading Tools

    69/74

    69 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    P/n

    Describes the upper bound of parallel speedup (scaling)

    (1-P)

    P

    T

    serial

    (1-P)

    n = number of processors

    Tparallel = {(1-P) + P/n}Tserial + O

    Scaling = Tserial / Tparallel

    Amdahl s Law

    Amdahls Law

  • 7/30/2019 Intel Threading Tools

    70/74

    70 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Describes the upper bound of parallel speedup (scaling)

    (1-P)

    P

    T

    serial

    (1-P)

    P/2 n = number of processors

    Tparallel = {(1-P) + P/n}Tserial + O

    Scaling = Tserial / Tparallel

    0.5 +0.5 + 0.250.25

    1.0/0.75 =1.0/0.75 = 1.331.33

    n = 2n = 2

    Amdahl s Law

    Amdahls Law

  • 7/30/2019 Intel Threading Tools

    71/74

    71 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Describes the upper bound of parallel speedup (scaling)

    (1-P)

    P

    T

    serial

    (1-P)

    n = number of processors

    Tparallel = {(1-P) + P/n}Tserial + O

    Scaling = Tserial / Tparallel

    n =n =

    P/

    0.5 +0.5 + 0.00.0

    1.0/0.5 =1.0/0.5 = 2.02.0

    Amdahl s Law

    More parallel code improves scalingMore parallel code improves scalingMore parallel code improves scalingMore parallel code improves scaling

    Amdahls Law

  • 7/30/2019 Intel Threading Tools

    72/74

    72 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    Amdahl s Law

    Runtime = serial + parallel

    With perfect scaling:

    Runtime = serial + parallel/P

    To get 50x speedup on 65processors, serial time must be

    less than 0.5%

    Emerging class of highly parallel workloads: games, mediaprocessing, personal data search, and more

    Ideal performance scaling, varying ser ial

    percentage

    0

    00

    00

    00

    00

    00

    00

    00

    0 0 00 00 00 00 00 00 00

    processors

    speedup

    . %000

    . %000

    . %000

    . %000

    . %0000

    . %0000

    Examples capabilities and performance

    Intels ThreadingI

    Taking advantageof Intels broad

    f ft

  • 7/30/2019 Intel Threading Tools

    73/74

    73 Copyright 2006, Intel Corporation. All rights reserved.*Other brands and names are the property of their respective owners

    a p es capa es a d pe o a ce

    Adobe Premiere Pro 2.0

    Professional video editing High Definition playback scales to 4 or more threads

    Pixar - RenderMan Pro Server

    Production rendering

    Scale to 5x speedup on 8 cores

    MainConcept H.264/AVC

    Advanced Video Encoding

    Scale to 3.25x speedup on 4 cores

    Thomson and Fraunhofer IIS Mp3 and mp3 surround codec cross platform

    1.7x speedup on 2 cores

    Threading can bea challenging task,

    and timeconsuming as

    well. Intel reallyunderstands thisand has greatproducts for

    softwaredevelopers.

    Bill HenslerVice President of

    engineering Video Solutionsfor Adobe

    gTools have

    accelerated ourdevelopment cycledramatically. The

    Intel ThreadingTools are now an

    integral part of ourdevelopment

    process.

    Dana BataliDirector of Pixar

    RenderManDevelopment

    In ourdevelopment

    efforts, we hugelydepend on Intel

    tools. We use theIntel compiler, theThreading Tools,and also VTune to

    make sure ourcodecs exploit theIntel platform as

    best as possible

    Markus MonigCEO

    MainConcept

    range of softwaredevelopment toolsfor threading andtheir extensive

    expertise, we wereable to see a 70%performanceincrease when ourmp3 and mp3SURROUND codecsran on Intel

    Centrino Duomobile technology.This results in agreater listeningexperience forconsumers.

    Rocky Caldwell, General Managermp3 Licensing, Thomson

    Customer Testimonials

  • 7/30/2019 Intel Threading Tools

    74/74