Intel Compilers 9.x on the Intel® Core Duo™ Processor, Windows version
Intel Software College
Copyright © 2006, Intel Corporation. All rights reserved.
Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.
Objectives
At the successful completion of this module, you will be able to:
• Use key compiler optimization switches
• Optimize software for the Intel® Core™ Duo processor architecture
• Enhance performance with vectorization and other techniques
Agenda
Introduction
Compiler Switches
Dual Core
Vectorization
Key to optimizing: Intel® Core™ Duo
Exploiting Architectural Power requires Sophisticated Compilers
Optimal use of
• Registers & functional units
• Dual-Core/Multi-processor
• SSE instructions
• Cache architecture
C++ Compatibility with Microsoft
Source and binary compatible with VC 2003 under /Qvc71.
Source and binary compatible with VC 2005 under /Qvc8.
Microsoft* and Intel OpenMP binaries are not compatible.
• Use one compiler for all modules compiled with OpenMP
For more information, refer to the User’s Guide
Use the Intel C++ Compiler in the Microsoft IDE
Agenda
Introduction
Compiler Switches
• Intel® C++ compiler
Dual Core
Vectorization
General Optimizations
Windows*  Linux*  Mac*   Description
/Od       -O0     -O0    Disables optimizations
/Zi       -g      -g     Creates symbols
/O1       -O1     -O1    Optimizes for binary size: server code
/O2       -O2     -O2    Optimizes for speed (default)
/O3       -O3     -O3    Optimizes for data cache: loopy floating-point code
Multi-pass Optimization: Interprocedural Optimizations (IPO)
ip: Enables interprocedural optimizations for single-file compilation
ipo: Enables interprocedural optimizations across files
• Can inline functions defined in separate files
• Enhances optimization when used in combination with other compiler features
Windows*  Linux*  Mac*
/Qip      -ip     -ip
/Qipo     -ipo    -ipo
Multi-pass Optimization - IPO Usage: Two-Step Process
Pass 1: Compiling (produces virtual object files)
  Windows*  icl -c /Qipo main.c func1.c func2.c
  Linux*    icc -c -ipo main.c func1.c func2.c
  Mac*      icc -c -ipo main.c func1.c func2.c
Pass 2: Linking (produces the executable)
  Windows*  icl /Qipo main.obj func1.obj func2.obj
  Linux*    icc -ipo main.o func1.o func2.o
  Mac*      icc -ipo main.o func1.o func2.o
Profile Guided Optimizations (PGO)
Use execution-time feedback to guide many other compiler optimizations
Helps I-cache, paging, branch-prediction
Enabled optimizations:
• Basic block ordering
• Better register allocation
• Better decision of functions to inline
• Function ordering
• Switch-statement optimization
• Better vectorization decisions
Multi-pass Optimization - PGO: Three-Step Process
Step 1: Instrumented compilation (produces an instrumented executable)
  (Mac*/Linux*) icc -prof_gen[x] prog.c
  (Windows*)    icl -Qprof_gen[x] prog.c
Step 2: Instrumented execution (produces a DYN file containing dynamic info: .dyn)
  Run the program on a typical dataset
Step 3: Feedback compilation (produces a merged DYN summary file: .dpi)
  (Mac*/Linux*) icc -prof_use prog.c
  (Windows*)    icl -Qprof_use prog.c
  Delete old .dyn files if you do not want their info included
Agenda
Introduction
Compiler Switches
Dual Core
• Auto Parallelization
• OpenMP
• Threading Diagnostics
Vectorization
Auto-parallelization
Auto-parallelization: Automatic threading of loops without having to manually insert OpenMP* directives.
• Compiler can identify “easy” candidates for parallelization, but large applications are difficult to analyze.
Windows*          Linux*           Mac*
/Qparallel        -parallel        -parallel
/Qpar_report[n]   -par_report[n]   -par_report[n]
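As a rough illustration of what an "easy" candidate looks like, the sketch below contrasts a loop with independent iterations against one that carries a value from one iteration to the next. The function names are hypothetical, and the first loop is only a candidate: it assumes the arrays do not overlap, and the /Qpar_report output would have to confirm whether a given compiler version actually threads it.

```c
#include <stddef.h>

/* Independent iterations: a[i] depends only on b[i] and c[i]
   (assuming a does not overlap b or c), so the iterations
   could be divided among threads. */
void add_arrays(float *a, const float *b, const float *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}

/* Loop-carried dependence: a[i] needs the a[i - 1] produced by the
   previous iteration, so the iterations cannot simply run concurrently. */
void prefix_sum(float *a, size_t n)
{
    for (size_t i = 1; i < n; i++)
        a[i] = a[i] + a[i - 1];
}
```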
OpenMP* Threading Technology
Pragma-based approach to parallelism
Usage:
• OpenMP switches: -openmp (Linux*/Mac*), /Qopenmp (Windows*)
• OpenMP reports: -openmp-report (Linux*/Mac*), /Qopenmp-report (Windows*)
#pragma omp parallel for
for (i=0;i<MAX;i++)
  A[i]= c*A[i] + B[i];
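The pragma above can be placed in a complete function as in the sketch below. The function name, the value of MAX and the element type are assumptions made for illustration. When the OpenMP switch is not given, the pragma is ignored and the loop runs serially with the same result.

```c
#define MAX 1000  /* assumed array length, for illustration only */

/* Computes A[i] = c * A[i] + B[i] for every element. The loop index
   is made private to each thread by the "parallel for" construct. */
void scale_and_add(float *A, const float *B, float c)
{
    int i;
#pragma omp parallel for
    for (i = 0; i < MAX; i++)
        A[i] = c * A[i] + B[i];
}
```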
OpenMP: Workqueuing Extension Example
Intel Compiler’s Workqueuing extension
• Creates a queue of tasks
Works on:
• Recursive functions
• Linked lists, etc.
#pragma intel omp parallel taskq shared(p)
{
  while (p != NULL) {
    #pragma intel omp task captureprivate(p)
    do_work1(p);
    p = p->next;
  }
}
Parallel Diagnostics
Source Instrumentation for Intel Thread Checker
• Allows thread checker to diagnose threading correctness bugs
• To use -tcheck or /Qtcheck you must have Intel Thread Checker installed
• See the Thread Checker documentation: http://www.intel.com/support/performancetools/sb/CS-009681.htm
Windows*   Linux*    Mac*
/Qtcheck   -tcheck   No support
Agenda
Introduction
Compiler Switches
Dual Core
Vectorization
• SSE & Vectorization
• Vectorization Reports
• Explanations of a few specific vectorization inhibitors
SIMD – SSE, SSE2, SSE3 Support
[Figure: data types packed into a 128-bit register: 16x bytes, 8x words, 4x dwords, 2x qwords, 1x dqword, 4x floats, 2x doubles. Instruction sets shown: MMX*, SSE, SSE2, SSE3]
* MMX actually used the x87 Floating Point Registers - SSE, SSE2, and SSE3 use the new SSE registers
SSE3 Instructions
SIMD FP using AOS format*   HADDPD, HSUBPD, HADDPS, HSUBPS
Thread synchronization      MONITOR, MWAIT
Video encoding              LDDQU
Complex arithmetic          ADDSUBPD, ADDSUBPS, MOVDDUP, MOVSHDUP, MOVSLDUP
FP to integer conversions   FISTTP
* Also benefits Complex and Vectorization
Using SSE3 - Your Task: Convert This…
[Figure: scalar addition in 128-bit registers. Each operation adds a single element pair (A[0] + B[0] = C[0], then A[1] + B[1] = C[1]); the other three lanes of each register are not used]
for (i=0;i<=MAX;i++) c[i]=a[i]+b[i];
… Into This …
[Figure: vectorized addition in 128-bit registers. One operation adds A[3] A[2] A[1] A[0] to B[3] B[2] B[1] B[0] and produces C[3] C[2] C[1] C[0]; all four lanes are used]
for (i=0;i<=MAX;i++) c[i]=a[i]+b[i];
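The transformation pictured above can be written out by hand as a sketch in portable C. This is only a model of what the vectorizer does, not the code it emits: the function name and LANES constant are hypothetical, and float elements are assumed so that four of them fill one 128-bit register. The four statements in the main loop stand for a single packed add.

```c
#include <stddef.h>

#define LANES 4  /* four 32-bit floats fit in one 128-bit register */

void add_strip_mined(float *c, const float *a, const float *b, size_t n)
{
    size_t i = 0;

    /* Main loop: each pass handles LANES consecutive elements, which a
       vectorizing compiler could map to one packed add instruction. */
    for (; i + LANES <= n; i += LANES) {
        c[i + 0] = a[i + 0] + b[i + 0];
        c[i + 1] = a[i + 1] + b[i + 1];
        c[i + 2] = a[i + 2] + b[i + 2];
        c[i + 3] = a[i + 3] + b[i + 3];
    }

    /* Remainder loop: leftover elements when n is not a multiple of LANES. */
    for (; i < n; i++)
        c[i] = a[i] + b[i];
}
```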
Compiler Based Vectorization: Processor Specific
Use W: Windows* /QxW; Linux* -xW; Mac* does not apply
  Generate instructions and optimize for Intel® Pentium® 4 compatible processors, including MMX, SSE and SSE2.
Use P: Windows* /QxP, /QaxP; Linux* -xP, -axP; Mac* vectorization occurs by default
  Generate instructions and optimize for Intel® processors with SSE3 capability, including Core Duo. These processors support SSE3 as well as MMX, SSE and SSE2.
Compiler Based Vectorization Automatic Processor Dispatch – ax[?]
Single executable
• Optimized for Intel® Core Duo processors and generic code that runs on all IA32 processors.
For each target processor it uses:
• Processor-specific instructions
• Vectorization
Low overhead
• Some increase in code size
Why Loops Don’t Vectorize
Independence
• Loop Iterations generally must be independent
Some relevant qualifiers:
• Some dependent loops can be vectorized.
• Most function calls cannot be vectorized.
• Some conditional branches prevent vectorization.
• Loops must be countable.
• Outer loop of nest cannot be vectorized.
• Mixed data types cannot be vectorized.
Why Didn’t My Loop Vectorize?
Windows*        Linux*         Macintosh*
-Qvec_reportn   -vec_reportn   -vec_reportn
Sets the diagnostic level written to stdout:
n=0: No diagnostic information
n=1: (Default) Loops successfully vectorized
n=2: Loops not vectorized – and the reason why not
n=3: Adds dependency Information
n=4: Reports only non-vectorized loops
n=5: Reports only non-vectorized loops and adds dependency info
Why Loops Don’t Vectorize
• “Existence of vector dependence”
• “Nonunit stride used”
• “Mixed Data Types”
• “Unsupported Loop Structure”
• “Contains unvectorizable statement at line XX”
• There are more reasons loops don’t vectorize, but we will discuss the reasons above
“Existence of Vector Dependency”
Usually indicates a real dependency between iterations of the loop, as shown here:
for (i = 1; i < 100; i++) x[i] = A * x[i - 1];
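To make the dependence concrete, the sketch below assumes a loop-carried recurrence of the form x[i] = A * x[i - 1] and evaluates it two ways: strictly in order, and in groups of four where all loads happen before any store, which is how a packed operation behaves. The function names and the LANES constant are hypothetical. The two orders give different results, which is why the compiler has to refuse such a loop.

```c
#define LANES 4

/* Sequential order: iteration i sees the value written by iteration i - 1. */
void recurrence_sequential(float *x, int n, float A)
{
    for (int i = 1; i < n; i++)
        x[i] = A * x[i - 1];
}

/* Vector-like order: each group of LANES loads all of its inputs first
   and stores afterwards, so later lanes read stale values. */
void recurrence_vector_order(float *x, int n, float A)
{
    int i = 1;
    for (; i + LANES <= n; i += LANES) {
        float in[LANES];
        for (int k = 0; k < LANES; k++)
            in[k] = x[i + k - 1];   /* all loads */
        for (int k = 0; k < LANES; k++)
            x[i + k] = A * in[k];   /* then all stores */
    }
    for (; i < n; i++)              /* leftover iterations, in order */
        x[i] = A * x[i - 1];
}
```

Starting from five elements all equal to 1 with A = 2, the sequential version doubles the value at every step, while the vector-order version writes 2 into every element after the first.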
Defining Loop Independence
Iteration Y of a loop is independent of when (or whether) iteration X occurs.
int a[MAX], b[MAX];
for (j=0;j<MAX;j++) {
a[j] = b[j];
}
“Nonunit stride used”
for (I=0;I<=MAX;I++)
  for (J=0;J<=MAX;J++) {
    c[I][J]+=1;                 // Unit stride
    c[J][I]+=1;                 // Non-unit stride
    A[J*J]+=1;                  // Non-unit stride
    A[B[J]]+=1;                 // Non-unit stride
    if (A[MAX-J]==1) last1=J;   // Non-unit stride
  }
End result: loading a vector may take more cycles than executing the operation sequentially.
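The first two access patterns in the loop above can be isolated in a small sketch. The function names and the array size N are assumptions for illustration. Both functions compute the same sum; only the order in which memory is touched differs, because C stores two-dimensional arrays row by row.

```c
#define N 64  /* assumed matrix dimension, for illustration only */

/* Unit stride: the inner loop walks consecutive addresses within a row. */
long sum_row_major(int m[N][N])
{
    long total = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            total += m[i][j];
    return total;
}

/* Non-unit stride: the inner loop jumps N elements between accesses. */
long sum_column_major(int m[N][N])
{
    long total = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            total += m[j][i];
    return total;
}
```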
“Mixed Data Types”
An example:
int howmany_close(double *x, double *y)
{
  int withinborder=0;
  double dist;
  for(int i=0;i<MAX;i++) {
    dist=sqrtf(x[i]*x[i] + y[i]*y[i]);
    if (dist<5) withinborder++;
  }
  return withinborder;
}
Mixed data types are possible, but they complicate things
• e.g., 2 doubles vs. 4 ints per SIMD register
Some operations with specific data types won’t work
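One way to reduce the mismatch, sketched below under stated assumptions, is to keep the floating-point data in float so that four values fill a register, the same lane count as the int counter. The function name and the explicit length parameter n are hypothetical, and the sketch compares squared distances instead of calling a square root function, which is an illustrative substitution rather than part of the original example. Whether a particular compiler version then vectorizes the loop is something the vectorization report would have to confirm.

```c
/* Counts the points that lie closer than 5 units to the origin,
   using float arithmetic only. */
int howmany_close_float(const float *x, const float *y, int n)
{
    int withinborder = 0;
    for (int i = 0; i < n; i++) {
        /* For non-negative distances, dist < 5 is equivalent to
           dist * dist < 25, so no square root is needed. */
        float dist2 = x[i] * x[i] + y[i] * y[i];
        if (dist2 < 25.0f)
            withinborder++;
    }
    return withinborder;
}
```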
“Unsupported Loop Structure”
Example:
struct _xx {
  int data;
  int bound;
};
void doit1(int *a, struct _xx *x) {
  for (int i=0; i<x->bound; i++) a[i] = 0;
}
An unsupported loop structure means the loop is not countable, or the compiler cannot construct a run-time expression for the trip count.
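A common workaround, sketched here on the assumption that the obstacle is possible aliasing between the stores to a[i] and x->bound, is to copy the bound into a local variable before the loop. The function name doit1_hoisted is hypothetical.

```c
struct _xx {
    int data;
    int bound;
};

/* Reading the bound once into a local means a store to a[i] can no
   longer be suspected of changing the loop limit, so the trip count
   is fixed when the loop is entered. */
void doit1_hoisted(int *a, struct _xx *x)
{
    int n = x->bound;
    for (int i = 0; i < n; i++)
        a[i] = 0;
}
```

Note that this changes behavior in the unusual case where a really does overlap x->bound, so it is only appropriate when the caller knows the two are distinct.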
“Contains unvectorizable statement”
for (i=1;i<nx;i++) {
  B[i] = func(A[i]);
}
[Figure: 128-bit registers holding A[3] A[2] A[1] A[0] and B[3] B[2] B[1] B[0]; func is applied to each element separately]
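When the body of the called function is visible and small, the compiler may be able to inline it, which leaves plain arithmetic in the loop. The sketch below assumes a hypothetical body for func and a hypothetical wrapper named apply_func; it illustrates the idea and does not guarantee that a given compiler version vectorizes the result.

```c
/* A small function whose body the compiler can see and inline.
   The arithmetic here is a placeholder chosen for illustration. */
static inline float func(float v)
{
    return 2.0f * v + 1.0f;
}

void apply_func(float *B, const float *A, int nx)
{
    /* Once func is inlined, the loop body contains no call.
       The loop starts at 1 to match the loop shown on the slide. */
    for (int i = 1; i < nx; i++)
        B[i] = func(A[i]);
}
```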
Reference
Web-based and classroom training
• www.intel.com/software/college
White papers and technical notes
• www.intel.com/ids
• www.intel.com/software/products
Product support resources
• www.intel.com/software/products/support
Activity 1 - raytrace2: Initial Compilation
Set up environment and compile with both Microsoft* Visual C++ .NET (MSVC*) and Intel® C++ Compiler (icl)
Activity 2 - raytrace2: O3 Compilation
Use Intel compiler’s High Level Optimizer (-O3) for loop centric codes
Activity 3 - raytrace2: IPO Compilation
Use Intel compiler’s Inter-procedural Optimization (-Qipo)
Activity 4 - raytrace2: PGO Compilation
Use Intel compiler’s Profile-guided Optimization
Activity 5 – raytrace2: Vectorization
Use Intel compiler’s Vectorization optimization (-QxP)
Activity 6 - raytrace2: Putting it all together
Use all previous optimizations in tandem (-O3, -QxP, IPO and PGO)