Intel Compilers 9.x on the Intel® Core Duo™ Processor
Windows version
Intel Software College
Copyright © 2006, Intel Corporation. All rights reserved.
Intel Compilers 9.x on the Intel® Core Duo™ Processor Windows version
Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

Objectives
After completing this module, you will be able to:
• Use the key compiler optimizations
• Optimize software for the architecture
• Improve performance with vectorization and other techniques

Agenda
Introduction
Compiler Options
Dual Core
Vectorization

Key to Optimization: Intel® Core™ Duo
Exploiting the power of the architecture requires sophisticated compilers
Optimal use of:
• Registers and functional units
• Dual-core/multiprocessor
• SSE instructions
• Cache architecture

C++ Compatibility with Microsoft
Source and binary compatible with VC 2003 under /Qvc71
Source and binary compatible with VC 2005 under /Qvc8
Microsoft* and Intel OpenMP binaries are not compatible
• Build every module that is compiled with OpenMP using the same compiler
For more information, refer to the user's guide

Using the Intel C++ Compiler in the Microsoft IDE

Agenda
Introduction
Compiler Options
• Intel® C++ Compiler
Dual Core
Vectorization

General Optimizations
Windows*   Linux*   Mac*   Description
/Od        -O0      -O0    Disables optimizations
/Zi        -g       -g     Generates debug symbols
/O1        -O1      -O1    Optimizes for binary size: server code
/O2        -O2      -O2    Optimizes for speed (default)
/O3        -O3      -O3    Optimizes for the data cache: loop-intensive floating-point code

Multi-pass Optimization: Interprocedural Optimizations (IPO)
ip: Enables interprocedural optimizations within a single file
ipo: Enables interprocedural optimizations across files
Functions to be inlined can live in separate files
Improves optimization when combined with other compiler features
Windows*   Linux*   Mac*
/Qip       -ip      -ip
/Qipo      -ipo     -ipo

Multi-pass Optimization: IPO Usage, a Two-Step Process
Step 1: Compile (produces virtual object files)
Windows*   icl -c /Qipo main.c func1.c func2.c
Linux*     icc -c -ipo main.c func1.c func2.c
Mac*       icc -c -ipo main.c func1.c func2.c
Step 2: Link (produces the executable)
Windows*   icl /Qipo main.obj func1.obj func2.obj
Linux*     icc -ipo main.o func1.o func2.o
Mac*       icc -ipo main.o func1.o func2.o
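
As a rough illustration of the code that benefits from cross-file IPO, the sketch below puts a small helper and its caller in one listing so that it compiles on its own. The names `scale` and `sum_scaled` are hypothetical, and the comments mark where the file boundary would fall in a real project.

```c
/* Hypothetical contents of func1.c: a small helper that is a
   natural inlining candidate. */
double scale(double x, double factor)
{
    return x * factor;
}

/* Hypothetical contents of func2.c: calls the helper inside a loop.
   When the two functions live in separate files, the compiler can
   only inline scale() here if cross-file IPO is enabled. */
double sum_scaled(const double *v, int n, double factor)
{
    double total = 0.0;
    for (int i = 0; i < n; i++)
        total += scale(v[i], factor);
    return total;
}
```

Once the call is inlined, the loop body is plain arithmetic, which also makes it a better candidate for the other optimizations covered in this module.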

Profile-Guided Optimizations (PGO)
Uses the results of a sample run to guide the compiler's other optimizations
Helps with cache behavior, paging, and branch prediction
Optimizations enabled:
• Basic block ordering
• Better register allocation
• Better decisions about which functions to inline
• Function ordering
• Switch-statement optimization
• Better vectorization decisions

Multi-pass Optimization: PGO, a Three-Step Process
Step 1: Instrumented compilation (produces the instrumented executable)
(Mac*/Linux*)   icc -prof_gen[x] prog.c
(Windows*)      icl -Qprof_gen[x] prog.c
Step 2: Instrumented execution (produces a .dyn file containing dynamic profile information)
Run the program with a typical data set
Step 3: Feedback compilation (merges the .dyn files into a .dpi summary file)
(Mac*/Linux*)   icc -prof_use prog.c
(Windows*)      icl -Qprof_use prog.c
Delete old .dyn files if their information should not be included
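
To show what a profile can tell the compiler, here is a minimal sketch with a hypothetical function `count_out_of_range` (not part of the course material). It has a branch whose frequency depends entirely on the input data, which is information only an instrumented run can supply.

```c
/* Counts values in data[0..n-1] that fall outside [lo, hi].
   With typical input almost every value is in range, so the
   out-of-range branch is rarely taken. A profile collected in
   the instrumented run records that fact. */
int count_out_of_range(const int *data, int n, int lo, int hi)
{
    int bad = 0;
    for (int i = 0; i < n; i++) {
        if (data[i] < lo || data[i] > hi)   /* rare path */
            bad++;
    }
    return bad;
}
```

The usefulness of the profile depends on how representative the data set used in Step 2 is, which is why a typical data set is recommended.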

Agenda
Introduction
Compiler Options
Dual Core
• Auto-parallelization
• OpenMP
• Parallelization diagnostics
Vectorization

Auto-parallelization
Automatic parallelization of loops, with no need to insert OpenMP* directives.
• The compiler can "easily" identify candidates for parallelization, but large applications are difficult to analyze.
Windows*          Linux*           Mac*
/Qparallel        -parallel        -parallel
/Qpar_report[n]   -par_report[n]   -par_report[n]

OpenMP* Parallelization Technology
Pragma-based approach to parallelization
Usage:
OpenMP options: -openmp (Linux*/Mac*), /Qopenmp (Windows*)
OpenMP reports: -openmp-report (Linux*/Mac*), /Qopenmp-report (Windows*)
    #pragma omp parallel for
    for (i=0;i<MAX;i++)
        A[i]= c*A[i] + B[i];
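
The fragment above can be wrapped into a complete function as follows. This is only a sketch: the function name `scale_and_add`, the `float` element type, and the value of `MAX` are assumptions made so that the listing compiles on its own. If OpenMP is not enabled, the pragma is ignored and the loop runs serially with the same result.

```c
#define MAX 1000

/* Scales A by c and adds B, element by element. Each iteration
   touches only A[i] and B[i], so iterations are independent and
   the loop can be split across threads. */
void scale_and_add(float *A, const float *B, float c)
{
    int i;
    #pragma omp parallel for
    for (i = 0; i < MAX; i++)
        A[i] = c * A[i] + B[i];
}
```

The loop index of a `parallel for` is private to each thread, so no extra clauses are needed here.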

OpenMP: "Workqueuing" Extension Example
The Intel compiler's "workqueuing" extension
• Creates a queue of tasks
Works on:
• Recursive functions
• Linked lists, etc.
    #pragma intel omp parallel taskq shared(p)
    {
        while (p != NULL) {
            #pragma intel omp task captureprivate(p)
            do_work1(p);
            p = p->next;
        }
    }
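
For comparison, the same linked-list traversal can be sketched with the standard `task` construct that OpenMP 3.0 later introduced. This is not the Intel taskq extension shown above but a swapped-in, portable equivalent. The `struct node` layout and the body of `do_work1` are hypothetical.

```c
#include <stddef.h>

struct node {
    int value;
    struct node *next;
};

/* Hypothetical per-node work: doubles the stored value. */
static void do_work1(struct node *p)
{
    p->value *= 2;
}

/* Walks the list in one thread and spawns one task per node.
   firstprivate(p) gives each task its own copy of the pointer,
   roughly the role captureprivate(p) plays in the taskq form. */
void process_list(struct node *head)
{
    #pragma omp parallel
    {
        #pragma omp single
        {
            struct node *p = head;
            while (p != NULL) {
                #pragma omp task firstprivate(p)
                do_work1(p);
                p = p->next;
            }
        }
    }
}
```

All tasks are guaranteed to finish by the implicit barrier at the end of the parallel region. Without OpenMP enabled, the pragmas are ignored and the list is processed serially.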

Parallel Diagnostics
Source instrumentation for Intel Thread Checker
• Lets Thread Checker diagnose parallelization errors
• Using -tcheck or /Qtcheck requires Intel Thread Checker to be installed
• See the Thread Checker documentation
• http://www.intel.com/support/performancetools/sb/CS-009681.htm
Windows*   Linux*    Mac*
/Qtcheck   -tcheck   Not supported

Agenda
Introduction
Compiler Options
Dual Core
Vectorization
• SSE and vectorization
• Vectorization reports
• Explanations of some specific vectorization inhibitors

SIMD Support: SSE, SSE2, SSE3
[Figure: packed data types (16x bytes, 8x words, 4x dwords, 2x qwords, 1x dqword, 4x floats, 2x doubles), labeled by instruction set: MMX*, SSE, SSE2, SSE3]
* MMX actually uses the x87 floating-point registers; SSE, SSE2, and SSE3 use the new SSE registers

SSE3 Instructions
Use                           Instructions
SIMD FP using AOS* format     HADDPD, HSUBPD, HADDPS, HSUBPS
Thread synchronization        MONITOR, MWAIT
Video encoding                LDDQU
Complex arithmetic            ADDSUBPD, ADDSUBPS, MOVDDUP, MOVSHDUP, MOVSLDUP
FP-to-integer conversions     FISTTP
* AOS (Array of Structures); also benefits complex arithmetic and vectorization

Using SSE3 - Task: Convert This...
[Figure: scalar addition in 128-bit registers. A[0] + B[0] = C[0], then A[1] + B[1] = C[1]; each value occupies one lane and the other three lanes of every register are unused]
    for (i=0;i<=MAX;i++) c[i]=a[i]+b[i];

... Into This ...
[Figure: packed addition in 128-bit registers. A[3] A[2] A[1] A[0] + B[3] B[2] B[1] B[0] = C[3] C[2] C[1] C[0]; all four lanes are in use]
    for (i=0;i<=MAX;i++) c[i]=a[i]+b[i];
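
What the vectorizer generates for this loop can be approximated by hand with SSE intrinsics. The sketch below assumes `float` arrays (four lanes per register, as in the figure) and a hypothetical function name `add_arrays`. It falls back to the scalar loop when SSE is not available at compile time.

```c
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Adds a[i] + b[i] into c[i] for i in [0, n). */
void add_arrays(const float *a, const float *b, float *c, int n)
{
    int i = 0;
#if HAVE_SSE
    /* Four floats per 128-bit register, one packed add per step. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(c + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar loop handles the remainder (or everything without SSE). */
    for (; i < n; i++)
        c[i] = a[i] + b[i];
}
```

The unaligned load and store intrinsics are used so the sketch works for any array address; a compiler that can prove 16-byte alignment may choose the aligned forms instead.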

Compiler-Based Vectorization: Processor-Specific
Usage: W
  Windows*: /QxW   Linux*: -xW   Mac*: Not applicable
  Generates instructions and optimizes for Intel® Pentium® 4 compatible processors, including MMX, SSE, and SSE2.
Usage: P
  Windows*: /QxP, /QaxP   Linux*: -xP, -axP   Mac*: Vectorization occurs by default
  Generates instructions and optimizes for Intel® processors with SSE3 support, including Core Duo. These processors support SSE3 as well as MMX, SSE, and SSE2.

Compiler-Based Vectorization: Automatic Processor Dispatch - ax[?]
Single executable
• Optimized for Intel® Core processors, plus generic code that runs on all IA32 processors.
For each target processor it uses:
• Processor-specific instructions
• Vectorization
Low overhead
• Some increase in code size

Why Loops Don't Vectorize
Independence
• Loop iterations generally must be independent
Some relevant qualifiers:
• Some dependent loops can be vectorized.
• Most function calls cannot be vectorized.
• Some conditional branches prevent vectorization.
• Loops must be countable.
• In nested loops, the outer loops cannot be vectorized.
• Mixed data types cannot be vectorized.
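
The "countable" requirement can be sketched with two hypothetical functions (`scale_all` and `scale_until_zero`, neither taken from the course material). In the first, the trip count is known before the loop starts. In the second, it depends on data read inside the loop.

```c
/* Countable: the trip count n is fixed before the loop starts. */
void scale_all(float *v, int n, float k)
{
    for (int i = 0; i < n; i++)
        v[i] *= k;
}

/* Not countable: the loop ends when a sentinel value is found, so the
   number of iterations depends on data read inside the loop. */
int scale_until_zero(float *v, float k)
{
    int i = 0;
    while (v[i] != 0.0f) {
        v[i] *= k;
        i++;
    }
    return i;   /* number of elements scaled */
}
```

A loop of the second kind cannot be split into fixed-width blocks up front, because the compiler cannot tell how many full blocks there will be.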

Why Wasn't My Loop Vectorized?
Windows*        Linux*         Macintosh*
/Qvec_reportn   -vec_reportn   -vec_reportn
Sets the level of diagnostics written to stdout
n=0: No diagnostic information
n=1: (Default) Reports loops that were successfully vectorized
n=2: Also reports loops that were not vectorized, with the reason
n=3: Adds dependency information
n=4: Reports only non-vectorized loops
n=5: Reports non-vectorized loops and adds dependency information

Why Loops Don't Vectorize
• "Existence of vector dependence"
• "Non-unit stride used"
• "Mixed data types"
• "Unsupported loop structure"
• "Contains unvectorizable statement at line XX"
• Loops fail to vectorize for other reasons as well, but only the ones listed here are discussed

"Existence of Vector Dependence"
Usually indicates a real dependence between loop iterations, as shown here:
    for (i = 0; i < 100; i++)
        x[i] = A * x[i + 1];
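
Why a loop-carried dependence gets in the way can be sketched with a backward-reading recurrence, `x[i] = A * x[i - 1]`, chosen here purely for illustration. The second function is a hypothetical emulation of a naive 4-wide vector step, not compiler output: it performs all loads of a block before any store, so it produces different values than the scalar loop.

```c
#define LEN 9

/* Scalar order: iteration i sees the value iteration i-1 just wrote. */
void recurrence_scalar(float *x, float A)
{
    for (int i = 1; i < LEN; i++)
        x[i] = A * x[i - 1];
}

/* Mimics a naive 4-wide vector step: all four loads of a block happen
   before any of its four stores, so lanes 1..3 read stale values. */
void recurrence_blocked(float *x, float A)
{
    for (int i = 1; i + 4 <= LEN; i += 4) {
        float tmp[4];
        for (int k = 0; k < 4; k++)
            tmp[k] = A * x[i + k - 1];   /* load phase */
        for (int k = 0; k < 4; k++)
            x[i + k] = tmp[k];           /* store phase */
    }
}
```

Starting from an array of ones with A = 2, the scalar version yields successive powers of two, while the blocked version does not. That mismatch is the reason the compiler declines to vectorize such a loop.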

Defining Loop Independence
Iteration Y of a loop is independent of when (or whether) iteration X occurs.
    int a[MAX], b[MAX];
    for (j=0;j<MAX;j++) {
        a[j] = b[j];
    }

"Non-Unit Stride Used"
    for (I=0;I<=MAX;I++)
        for (J=0;J<=MAX;J++) {
            c[I][J]+=1;                  // Unit stride
            c[J][I]+=1;                  // Non-unit stride
            A[J*J]+=1;                   // Non-unit stride
            A[B[J]]+=1;                  // Non-unit stride
            if (A[MAX-J]==1) last1=J;    // Non-unit stride
        }
End result: loading the vector can take more cycles than executing the operation sequentially.
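
The difference between `c[I][J]` and `c[J][I]` comes down to how far apart in memory two consecutive inner-loop accesses are. The sketch below measures that distance for a small array. The array dimensions and the function names are assumptions made for illustration.

```c
#include <stddef.h>

#define ROWS 4
#define COLS 8

/* Number of int elements separating two addresses inside the array. */
static ptrdiff_t elems_between(const int *from, const int *to)
{
    return ((const char *)to - (const char *)from) / (ptrdiff_t)sizeof(int);
}

/* c[I][J] with J advancing: consecutive iterations touch neighbours. */
ptrdiff_t stride_row_walk(int c[ROWS][COLS])
{
    return elems_between(&c[0][0], &c[0][1]);
}

/* c[J][I] with J advancing: consecutive iterations are a full row apart. */
ptrdiff_t stride_column_walk(int c[ROWS][COLS])
{
    return elems_between(&c[0][0], &c[1][0]);
}
```

With unit stride, one vector load fetches several useful elements at once. With a stride of a full row, each element has to be gathered separately, which is where the extra cycles go.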

"Mixed Data Types"
Example:
    int howmany_close(double *x, double *y)
    {
        int withinborder=0;
        double dist;
        for(int i=0;i<MAX;i++) {
            dist=sqrtf(x[i]*x[i] + y[i]*y[i]);
            if (dist<5) withinborder++;
        }
        return withinborder;
    }
Mixed data types are possible, but they complicate things
• Example: 2 doubles vs. 4 integers per SIMD register
Some operations on specific data types will not work
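
One possible way to reduce the mismatch, sketched under assumptions rather than taken from the course material, is to use `float` throughout so that the floating-point values and the `int` counter are both 32 bits wide, giving four of each per 128-bit register. The function name `howmany_close_f` and the value of `MAX` are hypothetical. The sketch also compares squared distances, which drops the square root entirely.

```c
#define MAX 1024

/* Same test as howmany_close, but every floating-point value in the
   loop is a float. Comparing the squared distance against 5*5 gives
   the same answer as comparing the distance against 5. */
int howmany_close_f(const float *x, const float *y)
{
    int withinborder = 0;
    for (int i = 0; i < MAX; i++) {
        float dist2 = x[i] * x[i] + y[i] * y[i];
        if (dist2 < 25.0f)
            withinborder++;
    }
    return withinborder;
}
```

Whether single precision is acceptable depends on the application, so this is a trade-off rather than a drop-in replacement.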

"Unsupported Loop Structure"
Example:
    struct _xx {
        int data;
        int bound; };
    void doit1(int *a, struct _xx *x) {
        for (int i=0; i<x->bound; i++) a[i] = 0;
    }
An unsupported loop structure means that the loop is not countable, or that the compiler for some reason cannot construct a run-time expression for the trip count.
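
A common workaround, shown here as a sketch and not as the course's prescribed fix, is to copy the bound into a local variable before the loop. The function name `doit1_local_bound` is hypothetical. Note that the behavior differs from the original if `a` happens to overlap `x->bound`, since the limit is then read only once.

```c
struct _xx {
    int data;
    int bound;
};

/* Reads the bound once into a local. The store to a[i] can no longer
   be suspected of changing the loop limit, so the trip count is
   known on entry to the loop. */
void doit1_local_bound(int *a, struct _xx *x)
{
    int n = x->bound;
    for (int i = 0; i < n; i++)
        a[i] = 0;
}
```

In the original form, the compiler has to assume that writing through `a` might modify `x->bound`, because both are reached through pointers to `int`.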

"Contains Unvectorizable Statement"
    for (i=1;i<nx;i++) {
        B[i] = func(A[i]); }
[Figure: 128-bit registers holding A[3] A[2] A[1] A[0] and B[3] B[2] B[1] B[0]; func is applied to each element separately]
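
If the called function is small and visible to the compiler, inlining can remove the call and leave plain arithmetic in the loop body. The sketch below uses a hypothetical body for `func` and a hypothetical wrapper `apply_func`. Whether a given compiler then vectorizes the loop is not guaranteed.

```c
/* Hypothetical stand-in for func(): small, no side effects, and
   visible in the same translation unit so it can be inlined. */
static inline float func(float v)
{
    return v * v + 1.0f;
}

void apply_func(const float *A, float *B, int nx)
{
    for (int i = 1; i < nx; i++)
        B[i] = func(A[i]);   /* after inlining: plain arithmetic on A[i] */
}
```

When the function lives in another source file, interprocedural optimization across files is what makes this kind of inlining possible.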

References
Web-based and classroom training
• www.intel.com/software/college
White papers and technical notes
• www.intel.com/ids
• www.intel.com/software/products
Product support resources
• www.intel.com/software/products/support

Activity 1 - raytrace2: Initial Compilation
Set up the environment and compile with both Microsoft* Visual C++ .NET (MSVC*) and the Intel® C++ Compiler (icl)

Activity 2 - raytrace2: O3 Compilation
Use the Intel compiler's high-level optimization (-O3) for loop-centric code

Activity 3 - raytrace2: IPO Compilation
Use the Intel compiler's interprocedural optimization (-Qipo)

Activity 4 - raytrace2: PGO Compilation
Use the Intel compiler's profile-guided optimization (PGO)

Activity 5 - raytrace2: Vectorization
Use the Intel compiler's vectorization optimization (-QxP)

Activity 6 - raytrace2: Using Everything Together
Use all of the previous optimizations (-O3, -QxP, IPO, and PGO)