Intel® Composer XE for HPC customers
July 2010
Denis Makoshenko, Intel, SSG
Copyright © 2010, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Intel® C++ and Fortran Composer XE
Intel® C++ Composer XE components
• Intel® C++ Compiler XE 12.0
• Intel® Debugger with parallel debugging support (Linux*) 12.0
• Intel® Parallel Debugger Extension (Windows*) 12.0
• Intel® Math Kernel Library (Intel® MKL) 10.3
• Intel® Integrated Performance Primitives (Intel® IPP) 7.0
• Intel® Threading Building Blocks (Intel® TBB) 3.0
Intel® Composer XE for Fortran components
• Intel® Fortran Compiler XE (Linux*, MacOS*) 12.0
• Intel® Visual Fortran Compiler XE (Windows*) 12.0
• Intel® Debugger with parallel debugging support (Linux*) 12.0
• Intel® Parallel Debugger Extension (Windows*) 12.0
• Intel® Math Kernel Library (Intel® MKL) 10.3
Important Notes
• Some of the names used here for features of the future compiler and library products are not yet final
• The feature set and functionality might change (slightly) in the final product version
Intel® Compiler Architecture
• Profiler
• C++ Front End / Fortran Front End
• Disambiguation: types, arrays, pointers, structures, directives
• Interprocedural analysis and optimizations: inlining, constant propagation, whole-program detection, mod/ref, points-to
• Loop optimizations: data dependences, prefetch, vectorizer, unroll/interchange/fusion/distribution, auto-parallelization/OpenMP
• Global scalar optimizations: partial redundancy elimination, dead store elimination, strength reduction, dead code elimination
• Code generation: instruction selection, scheduling, register allocation
Interprocedural Optimization
Extends optimizations across file boundaries
Without IPO: each of file1.c, file2.c, file3.c, and file4.c is compiled and optimized separately.
With IPO: file1.c, file2.c, file3.c, and file4.c are compiled and optimized together as a single unit.
-ip   IPO only between modules of one source file
-ipo  IPO across modules of multiple files / the whole application
Profile-Guided Optimizations (PGO)
• Use execution-time feedback to guide (final) optimization
• Helps I-cache, paging, branch prediction
• Enabled optimizations:
  – Basic block ordering
  – Better register allocation
  – Better decisions on which functions to inline
  – Function ordering
  – Switch-statement optimization
PGO Usage: Three-Step Process

Step 1 – Instrumented compilation:
  icc -prof_gen prog.c
  → Instrumented executable: prog.exe

Step 2 – Instrumented execution:
  prog.exe (on a typical dataset)
  → DYN file containing dynamic info: .dyn

Step 3 – Feedback compilation:
  icc -prof_use prog.c
  → Merged DYN summary file: .dpi
  (Delete old .dyn files unless you want their info included)
Some Generic Features
• Compatibility with standards (ANSI C, ISO C++, ANSI C99, Fortran 95, Fortran 2003)
• Compatibility with leading open-source tools (ICC vs. GCC, IDB vs. GDB, ICL vs. CL, …)
• OpenMP support and automatic parallelization
• Sophisticated optimizations
  – Profile-guided optimization
  – Multi-file interprocedural optimization
• Detailed compilation report generation
• Support for other Intel® tools
A Few General Switches

Functionality                                       Linux* switch
Disable optimization                                -O0
Optimize for speed (no code size increase)          -O1
Optimize for speed (default)                        -O2
High-level optimizer (e.g. loop unrolling)          -O3
Vectorization for x86 (-xSSE2 is the default)       <many options>
Aggressive optimizations (e.g. -ipo, -O3,
  -no-prec-div, -static, -xHost for x86 Linux*)     -fast
Create symbols for debugging                        -g
Generate assembly files                             -S
Optimization report generation                      -opt-report
OpenMP* support                                     -openmp
Automatic parallelization                           -parallel
Optimization Report Options
• -opt-report – generate an optimization report to stderr (or a file)
• -opt-report-file <file> – specify the filename for the generated report
• -opt-report-phase <phase_name> – specify the phase that reports are generated for
• -opt-report-routine <name> – report on routines containing the given name
• -opt-report-help – display the optimization phases available for reporting
• -vec-report<level> – generate a vectorization report
New Features of Intel® Composer XE
– GAP – Guided Automatic Parallelization
– SIMD directives
  • Provide additional information to the compiler to enable vectorization of loops
– Loop profiler
  • The report file contains:
    – Average, minimum, and maximum iteration counts of loops
    – Call counts of routines
    – Self-time and total-time of functions / loops
– Static Security Analyzer
  • A tool based on compiler technology to statically verify programs
– C/C++ specific: vector notation, Cilk, C++0x features
– Fortran specific: F2003 status, CAF, DO CONCURRENT
Memory Reference Disambiguation
Options/Directives Related to Aliasing

• -alias_args[-]
• -ansi-alias[-]
• -fno-alias: no aliasing in the whole program
• -fno-fnalias: no aliasing within single units
• -restrict (C99) and the restrict attribute – enable selective pointer disambiguation

There are many more options – different for Windows* and Linux*, and different for C/C++ and Fortran.
GAP – Guided Automatic Parallelization

Key design ideas:
• Use the compiler infrastructure to help the developer detect what is blocking certain optimizations – in particular vectorization, parallelization, and data transformations – and to change the code accordingly
• Very specific hints to fix the problem
• Not a separate tool, but a feature of the C/C++ and Fortran compilers
• Exploits the multi-year experience built into compiler development, and performance-tuning knowledge from dealing with numerous applications, benchmarks, and compute kernels

It is not:
• An automatic vectorizer or parallelizer – in fact, no code is generated, which accelerates the analysis
GAP – How It Works
Selection of the Most Relevant Switches

Multiple compiler switches activate and fine-tune the guidance analysis.

• Activate messages individually for vectorization, parallelization, data transformations, or all three:
  -guide[=level]
  -guide-vec[=level]
  -guide-par[=level]
  -guide-data-trans[=level]
  The optional argument level=1,2,3,4 controls the extent of the analysis; Intel Composer supports only up to level 3.

• Control the source-code part for which the analysis is done:
  -guide-opts=<arg>
  Sample:
  -guide-opts="convert.c,'funca(int)'"
Vectorization Example
void mul(NetEnv* ne, Vector* rslt,
         Vector* den, Vector* flux1,
         Vector* flux2, Vector* num)
{
  float *r, *d, *n, *s1, *s2;
  int i;
  r = rslt->data;  d = den->data;
  n = num->data;   s1 = flux1->data;
  s2 = flux2->data;
  for (i = 0; i < ne->len; ++i)
    r[i] = s1[i]*s2[i] + n[i]*d[i];
}
GAP messages (simplified):
1. "Use a local variable to hoist the upper bound of the loop at line 29 (variable: ne->len) if the upper bound does not change during execution of the loop."
2. "Use #pragma ivdep to help vectorize the loop at line 29, if the arrays in the loop do not have cross-iteration dependencies: r, s1, s2, n, d."

→ Upon recompilation, the loop will be vectorized.

The compiler guides the user on the source change, on what pragma to insert, and on how to determine whether that pragma is correct for this case.
Data Transformation Example

struct S3 {
  int a;
  int b;            // hot
  double c[100];
  struct S2 *s2_ptr;
  int d;
  int e;
  struct S1 *s1_ptr;
  char *c_p;
  int f;            // hot
};
peel.c(22): remark #30756: (DTRANS) Splitting the structure 'S3' into two parts will improve data locality and is highly recommended. Frequently accessed fields are 'b, f'; performance may improve by putting these fields into one structure and the remaining fields into another structure. Alternatively, performance may also improve by reordering the fields of the structure. Suggested field order:'b, f, s2_ptr, s1_ptr, a, c, d, e, c_p'. [VERIFY] The suggestion is based on the field references in current compilation …
…
for (ii = 0; ii < N; ii++){
sp->b = ii;
sp->f = ii + 1;
sp++;
}
…
Privatization Directives for "pragma parallel"

• Mark variables in loops as 'private' for automatic parallelization, similar to what the OpenMP private clause does
  – Available for Fortran and C/C++
  – Suggested by GAP as advice to add where appropriate
Syntax for C/C++:
  #pragma parallel [ clause [ [,] clause ]… ]
where clause can be one of the following:
  always [ assert ]
  private( var [ :expr ] [, var [ :expr ] ]… )
  lastprivate( var [ :expr ] [, var [ :expr ] ]… )
where var is a variable name.
Note: if <expr> is missing, the semantics correspond to OpenMP 3.0.
Concurrency-Safe Function Attribute
Used by GAP, too.

Windows* syntax:
  __declspec(concurrency_safe[(profitable | cost(cycle-count))])

Linux* syntax:
  __attribute__((concurrency_safe[(profitable | cost(cycle-count))]))

(In Fortran, similar functionality is available via a directive.)

Semantics of __attribute__((concurrency_safe)):
The function has no "unacceptable" side effects when invoked in parallel.

profitable clause: the loops or blocks that contain calls to this function are profitable to parallelize.
Questions?
Intel® Code Coverage Tool
Example of a code coverage summary for a project: the workload applied in this test exercised 34 of 143 blocks, representing 5 of 19 functions in 2 of 3 modules. In the file SAMPLE.C, 4 of 5 functions were exercised.

Clicking on SAMPLE.C produces a listing that highlights the code that was exercised. In this example, the pink-highlighted code was never exercised, the yellow code was run but not exercised by any of the tests set up by the developer, and the beige code was partially covered.