Sun Tech Days / Sun Studio
Build High Performance Apps on Multicore Systems Using Sun Studio Compilers and Tools
Don Kretsch
Senior Director, Sun Developer Tools
Sun Microsystems
Sun Studio Compilers and Tools
Microprocessor Trends
Where's my 10GHz CPU?
Between 1993 and 1999, the average CPU clock speed increased tenfold; since then, it hasn't even doubled.
The historical approach to performance (increasing clock speed, pipelining, and cache) is being negated by heat, power consumption, and slow memory.
The Clock Race Is Over!
Multi-Core Revolution
Putting transistors to work in a new way
Sun UltraSPARC T2: 8 cores * 1.4GHz (64 threads in a chip)
Intel Clovertown: 4 cores * 2.66GHz
AMD Barcelona: 4 cores * 2.0GHz
(4 threads in a chip)
Every new system is powered by a multi-core chip!
Developers' Needs Have Changed
Performance
• Everyone loves fast
• Take advantage of the latest HW features and performance attributes
Parallelism
• Multi-core is here!
• Sun's Niagara2 leads with 64 threads / 8 cores per chip
(Open) Platforms
• Linux, Solaris
• SPARC and x86/x64
• Equal treatment for all platforms
Productivity
• An IDE is important to speed development
• No dominant vendor/project on Linux
Significant Challenges Remain
Parallelism
• No single parallelism model
• Incredibly hard to parallelize serial apps
• Data races and deadlocks are common
Platforms
• G++ incompatibilities
• Constantly evolving ABI on Linux ...
• No uniformity in Linux platforms
Productivity
• Lack of an advanced toolchain
• New generation uses IDEs, but ...
• Poor satisfaction with C, C++ IDEs on Linux ...
Performance
• Architectures are changing (too?) fast
• Old tricks are no longer sufficient
Sun Studio 12
• Simplify Multi-core Development
• Maximize Application Performance
• Single source for Linux and Solaris, SPARC and x86
• Modern, productive IDE
• Sun Developer Services
Performance. Parallelism. Productivity. Platforms.
Performance
Build Fast Applications
Compilers Deliver World Record Performance
Systems shown: Sun Blade X6250, Sun Blade X6220, Sun Fire X4600
> Best in class SPECint2006: 69% faster than IBM BladeCenter LS21, 28% faster than HP ProLiant BL20p G4
> Best in class SPECOMP M2001: 126% faster than IBM/Power5
> Best in class SPECOMP L2001: 11% faster than HP DL585 G2
> Best in class SPECint_rate2006
> x86 champ on SPECfp_rate2000
> Fastest SPECfp2000 system on the planet (7/2006), beating even IBM Power5+ systems
> World-record count in the past 12 months: 5 in Sun Blade 6000 systems; 10 in Sun Fire X4600; 1 on Sun Fire X4500; 2 each in Sun Fire X2100_M2, X2200, X4100, X4200
Sun Fire X4200, X4100, X2200_M2, X2100_M2
> Sun snatches two world records in the brand-new SPEC CPU2006 benchmark: both SPECint2006 and SPECfp2006
Sun Niagara2 (8 cores/64 threads)
> Best chip score on SPECint_rate2006 and SPECfp_rate2006
Maximize Application Performance
Sun compilers continue the world-record performance tradition
> Set over 25 world records in the past 12 months, with more to come
> World records in EACH category: SPECint2006, SPECfp2006, SPECint_rate2006, SPECfp_rate2006 and SPEC OMP2001
> World records on each architecture from 1 core/1 socket to 128 cores/64 sockets (scaling): UltraSPARC T2 (Niagara2), UltraSPARC IV+, SPARC64 VI systems, Intel/Woodcrest, AMD/Opteron
> Sun SPARC Enterprise M9000 system tops the 1-TeraFLOP barrier
Significant lead over GCC
> 18%-52% on SPARC (SPEC2006)
> 11%-18% on x86/AMD (SPEC2006)
> 70%+ on STREAM
Improved optimized debugging abilities
New compilers make a significant difference over older releases as well as competitors
Runtime Performance Optimizations
Optimized performance for each target system: UltraSPARC, x86, and x64, for maximal system utilization
x86/x64 Optimizations
> P4 and SSE2 instructions in the assembler
> Handling of P4 and SSE2 in inlines
> SSE2 instruction scheduling
> Strength reduction
> Branch prediction
> Induction variable elimination
> Invariant hoisting
> Loop interchange
> Loop unswitching
> Alignment of symbol blocks
> Loop unrolling
> Alignment
> Constant propagation
> Vectorization
UltraSPARC Optimizations
> Binary optimizations to improve cache locality
> Niagara, US-IV+, US-IIIi optimizations
> Modulo scheduling
> Block-scoped optimizations
> Linkopt
> Class hierarchy analysis and optimization
> KPIC optimizations
> New CoolTools for UltraSPARC development: ATS, SPOT, ...
Optimized Math Libraries
> libm, libmvec, libmopt, libmil
> libsunmath
> Maximally optimized advanced math libraries (BLAS, FFT, LAPACK)
> mediaLib, SSE (math) intrinsics
Common Optimizations
> Highly optimized code generation
> Automatic parallelization and vectorization
> High-level loop transformations
> Interprocedural optimizations
> Options to exploit advanced architecture pipelines, caches, chips
> Profile-guided optimizations
> Aggressive inlining and cloning
> Advanced OpenMP support
> More efficient machine resource utilization (throughput)
> Optimization of (-xbuiltin) calls
> Inline template (assembly) code
> Alias-based type disambiguation
> Prefetch support for newer systems
> Linker-scoped variables
Sun invests in compiler performance
Platforms
Unifying Solaris and Linux development
Full Support for Solaris and Linux
Background
> Customers have heterogeneous Solaris/Linux environments
> Sun software portfolio supports Linux
> Sun Studio 9 introduced partial support for Linux: IDE, debugger, profiler
> Sun Studio 12 added compilers and libraries to complete the offering
Key Features
> Complete feature set now available on Linux: C, C++, and Fortran compilers, optimized libraries, tools, etc.
> Stable C++ ABI now available for Linux
> Improved GCC compatibility
> Eases multi-platform development
> Fully enterprise-class support available
Compilers on Linux
Same features, same source, same components, same performance
> C, C++, Fortran language systems
> Standard C++ libraries, libgc, lint, GPC, ...
> Optimized math libraries, including SunPerfLib
> OpenMP 2.5 APIs, TLS, MPI libraries, ...
> Popular G++ and GCC extensions, including asm inlines and __attribute__
> g++ ABI for interoperability
> Linux kernel compiled with Sun compilers
> Express Program: >3000 downloads, >800 active users
> Sun Studio Forum: 600+ messages
Be Smart. Be Compatible.
Compatibility between releases
> Allows developers to upgrade their environment and continue innovating (versus reworking code)
> Leader in C++ ABI compatibility: link with objects produced by earlier versions
Enhanced GCC compatibility
> Eases adoption of Sun Studio for GCC-based developers
> Improved source and binary compatibility
Solaris Binary Compatibility Guarantee
> With Sun Studio software
> Source and binary compatibility
Parallelism:
Developing for a Multi-core Future
Compiler Support for Parallel Apps
[Diagram: an Application built with Sun Studio Developer Tools using the AutoPar, MPI, MT, and OpenMP models, arranged along an Easiest-to-Hardest scale; Solaris facilities: Event Ports, POSIX Threads, Solaris Threads, Atomic Operations, libumem; hardware: UltraSPARC T1/T2, SPARC64 VI, UltraSPARC IV+, Intel/AMD x86/x64]
Autopar: SPECfp 2006 Improvements
[Chart: per-benchmark results for Base Flags + Autopar on bwaves, gamess, milc, zeusmp, gromacs, cactusADM, leslie3d, namd, dealII, soplex, povray, calculix, GemsFDTD, tonto, lbm, wrf, and sphinx3; y-axis from 0 to 27.5]
Woodcrest box: 3.0GHz dual-core, PARALLEL=2
Overall Gain: 16%
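Here is a minimal C sketch of the distinction -xautopar relies on. It assumes a build along the lines of `cc -xO3 -xautopar`, with the thread count supplied through the PARALLEL environment variable as in the benchmark setup above. The function names are hypothetical. The first loop has independent iterations and is a candidate for automatic parallelization. The second carries a dependence from one iteration to the next, so the compiler has to leave it serial.

```c
#include <stddef.h>

/* Independent iterations: each c[i] depends only on a[i] and b[i],
   so the iteration range can be split across threads. */
void vector_add(double *c, const double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Loop-carried dependence: iteration i reads the value written by
   iteration i - 1, so the loop cannot be parallelized as written. */
void prefix_sum(double *x, size_t n)
{
    for (size_t i = 1; i < n; i++)
        x[i] += x[i - 1];
}
```

Whether a given loop is actually parallelized also depends on the optimizer's alias and profitability analysis. The sketch shows the shape of a candidate loop, not a guarantee.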
Automatic Vectorization
Support for Fortran, C and C++ applications
> -xvector=simd exploits special SSE2+ instructions
> Works on data in adjacent memory locations
> Gains are smaller than with -xautopar: SPECfp 2006 gains are 3% overall and in the 1-7% range individually
> Best suited for loop-level SIMD parallelism
for (i = 0; i < 1024; i++) c[i] = a[i] * b[i];
for (i = 0; i < 1024; i += 4) c[i:i+3] = a[i:i+3] * b[i:i+3];   /* after vectorization, in array-section notation */
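The c[i:i+3] loop above can be written out by hand to show what -xvector=simd asks the compiler to do. This sketch assumes single-precision floats, so one 128-bit SSE register holds four elements, matching the step of 4. It uses the standard `<xmmintrin.h>` intrinsics when `__SSE__` is defined and otherwise falls back to the scalar loop. The function names are illustrative, not part of any Sun Studio API.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Scalar form of the slide's loop: c[i] = a[i] * b[i]. */
void mul_scalar(float *c, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] * b[i];
}

/* Hand-vectorized form: one SSE multiply handles four floats, the
   shape c[i:i+3] = a[i:i+3] * b[i:i+3] describes. A scalar tail loop
   handles the elements left over when n is not a multiple of 4. */
void mul_vectorized(float *c, const float *a, const float *b, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* unaligned 4-float loads */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(c + i, _mm_mul_ps(va, vb));
    }
#endif
    for (; i < n; i++)
        c[i] = a[i] * b[i];
}
```

The unaligned load/store forms are used because the sketch does not assume 16-byte-aligned arrays. The compiler's own vectorizer makes that decision itself.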
Tools Support for Parallel Apps: Thread Analyzer
> Detects data races and deadlocks in a multithreaded application
> Points to non-deterministic or incorrect execution
> Bugs are notoriously difficult to detect by examination
> Points out actual and potential deadlock situations
Process:
1. Instrument the code with -xinstrument=datarace
2. Detect runtime conditions with collect -r all [or race, deadlock]
3. Use the graphical analyzer, tha, to identify conflicts and critical regions
> Works with OpenMP, Pthreads, Solaris Threads
> API provided for user-defined synchronization primitives
> Works on Solaris (SPARC, x86/x64) and Linux
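The Thread Analyzer process applies to ordinary Pthreads code. In the hypothetical sketch below, several threads increment a shared counter, and the mutex is what removes the race. Deleting the lock/unlock pair leaves an unsynchronized read-modify-write on shared memory, the kind of access the tool reports. Following the steps on the slide, the assumed workflow is to compile with `-g -xinstrument=datarace`, run under `collect -r race`, and open the resulting experiment in `tha`. `run_counter` is an illustrative name, and a small driver `main` would call it.

```c
#include <pthread.h>

/* Shared state: the counter and the mutex that protects it. */
static long counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long iterations_per_thread = 0;  /* set before threads start */

static void *worker(void *arg)
{
    (void)arg;
    for (long i = 0; i < iterations_per_thread; i++) {
        /* Without this lock, counter++ is an unsynchronized
           read-modify-write on shared memory: a data race. */
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs up to 64 workers and returns the final count, which equals
   (threads successfully created) * iters. */
long run_counter(int nthreads, long iters)
{
    pthread_t tids[64];
    int created = 0;

    if (nthreads > 64)
        nthreads = 64;
    counter = 0;
    iterations_per_thread = iters;
    for (int t = 0; t < nthreads; t++)
        if (pthread_create(&tids[created], NULL, worker, NULL) == 0)
            created++;
    for (int t = 0; t < created; t++)
        pthread_join(tids[t], NULL);
    return counter;
}
```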
Tools Support for Parallel Apps (2)
Multithread-aware debugger
> Browse, select, and view active threads
> Monitor thread entry point, PC, events, LWPs
> POSIX threads and OpenMP code debugging
lock_lint static source code lock analyzer
> Analyzes the use of mutexes and multiple-readers/single-writer locks
> Reports on inconsistent usage of locks that may lead to data races and deadlocks
Performance Analyzer support for all MT models
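lock_lint looks for shared data that is not consistently accessed under the same lock. The hypothetical lookup table below sketches the consistent pattern with a multiple-readers/single-writer lock: every read takes the POSIX read lock and every update takes the write lock. It illustrates the locking discipline only. It does not show lock_lint's own input or output.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

#define TABLE_SIZE 16

/* A small shared table guarded by one reader/writer lock. */
static int table[TABLE_SIZE];
static pthread_rwlock_t table_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Readers may hold the lock concurrently with each other.
   slot must be in [0, TABLE_SIZE). */
int table_get(int slot)
{
    int value;
    pthread_rwlock_rdlock(&table_lock);
    value = table[slot];
    pthread_rwlock_unlock(&table_lock);
    return value;
}

/* A writer excludes all readers and other writers. */
void table_set(int slot, int value)
{
    pthread_rwlock_wrlock(&table_lock);
    table[slot] = value;
    pthread_rwlock_unlock(&table_lock);
}
```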
What Is OpenMP
De facto industry-standard API for writing shared-memory parallel applications in C, C++ and Fortran
Consists of:
> Compiler directives (pragmas)
> Runtime routines (libmtsk)
> Environment variables
Advantages:
> Incremental parallelization of source code
> Small(er) amount of programming effort
> Good performance and scalability
> Portable across a variety of vendor compilers
Sun Studio has consistently led in supporting the latest version (currently v2.5, with work underway for v3.0)
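The small C sketch below touches all three parts of OpenMP: a directive, a runtime routine, and (indirectly) an environment variable. It assumes OpenMP is enabled at compile time; for Sun Studio that is the -xopenmp option. When OpenMP is not enabled, the `#pragma` is ignored and `_OPENMP` is undefined, so the code still compiles and runs serially. The function names are hypothetical.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Sum of a[0..n-1]. The directive splits the iterations among
   threads; reduction(+:sum) gives each thread a private partial sum
   and combines the partial sums at the end of the loop. */
double parallel_sum(const double *a, long n)
{
    double sum = 0.0;
#pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Runtime routine: the number of threads a parallel region may use,
   which can be set with the OMP_NUM_THREADS environment variable. */
int max_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

This is the incremental style the slide describes: the serial loop is unchanged, and one directive adds the parallelism.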
An OpenMP Example
Find the primes up to 3,000,000 (216,816 primes)
Run on Sun Fire 6800, Solaris 9, 24 processors (1.2GHz UltraSPARC III+), with 9.8GB main memory
Model    # Threads   Time (secs)   Change vs. Serial
Serial   N/A         6.636         Base
OpenMP   1           7.210         8.65% drop
OpenMP   2           3.771         1.76x faster
OpenMP   4           1.988         3.34x faster
OpenMP   8           1.090         6.09x faster
OpenMP   16          0.638         10.40x faster
OpenMP   20          0.550         12.06x faster
OpenMP   24          0.931         Saturation drop
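The slide gives only timings, not the program. A hedged reconstruction of that kind of prime search (not the author's actual code) uses a trial-division test inside one parallel loop. `count_primes` and `is_prime` are hypothetical names, and OpenMP is assumed to be enabled at compile time. Without it, the loop simply runs serially.

```c
/* Trial division by odd numbers up to sqrt(n). */
static int is_prime(long n)
{
    if (n < 2)
        return 0;
    if (n % 2 == 0)
        return n == 2;
    for (long d = 3; d * d <= n; d += 2)
        if (n % d == 0)
            return 0;
    return 1;
}

/* Counts the primes in [2, limit]. The iterations are independent,
   so they are shared among threads, and reduction(+:count) combines
   the per-thread counts. */
long count_primes(long limit)
{
    long count = 0;
#pragma omp parallel for reduction(+:count) schedule(dynamic, 1000)
    for (long n = 2; n <= limit; n++)
        count += is_prime(n);
    return count;
}
```

The dynamic schedule suits this uneven workload. Testing a large number costs more than testing a small one, so handing out chunks on demand keeps threads busier than fixed equal slices would. The chunk size of 1000 is an arbitrary assumption.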
Race Conditions: Tough Parallel Issues
for (i = 0; i < n; i++) a[i] = a[i+1] + b[i];
With n = 10, the iterations are split between two threads:
Thread 1:
a[0] = a[1] + b[0];
a[1] = a[2] + b[1];
a[2] = a[3] + b[2];
a[3] = a[4] + b[3];
a[4] = a[5] + b[4];
Thread 2:
a[5] = a[6] + b[5];
a[6] = a[7] + b[6];
a[7] = a[8] + b[7];
a[8] = a[9] + b[8];
a[9] = a[10] + b[9];
Thread 1 executes iterations 0-4; Thread 2 executes iterations 5-9.
a[5] could be written by Thread 2 before it is read by Thread 1.
This is a data race condition.
Sequential execution: results are deterministic.
Parallel execution: results are non-deterministic.
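Here is one way to remove this race, assuming an extra array of n + 1 doubles is acceptable. In the sequential loop a[i] = a[i+1] + b[i], every read sees a value from before the loop. Reading from a snapshot of those old values therefore makes the iterations independent, and they can be split across threads without changing the result. The function names are hypothetical, and the `#pragma` takes effect only when OpenMP is enabled.

```c
#include <stdlib.h>
#include <string.h>

/* Sequential reference: a has n + 1 elements, b has n. */
void shift_add_serial(double *a, const double *b, int n)
{
    for (int i = 0; i < n; i++)
        a[i] = a[i + 1] + b[i];
}

/* Race-free parallel version: iterations read only from a snapshot
   of the old values, so no iteration reads an element that another
   iteration writes. Returns 0 on success, -1 if allocation fails. */
int shift_add_parallel(double *a, const double *b, int n)
{
    if (n < 1)
        return 0;
    double *old = malloc((size_t)(n + 1) * sizeof *old);
    if (old == NULL)
        return -1;
    memcpy(old, a, (size_t)(n + 1) * sizeof *old);
#pragma omp parallel for
    for (int i = 0; i < n; i++)
        a[i] = old[i + 1] + b[i];
    free(old);
    return 0;
}
```

This fixes the real cause of the race (an iteration reading data that another iteration writes) instead of masking a symptom, at the cost of one extra copy of the array.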
Design Practices to Avoid Races
> Adopt a higher design abstraction (OpenMP today, but this area will change in the future)
> Use pass-by-value instead of pass-by-pointer to communicate between threads
> Design data structures to limit global variable usage and restrict access to shared memory
> Analyze a race to decide whether it is a harmful program bug or a benign race
> Understand and fix the real cause of a race condition instead of its symptoms
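The hedged Pthreads sketch below illustrates the second and third practices. Each thread receives its own argument struct, a private copy of its inputs, instead of a pointer to a variable the creating thread keeps changing (such as a loop index). Each thread also writes only its own result field, so the threads share no mutable data. The names and the fixed thread count of 4 are assumptions for illustration.

```c
#include <pthread.h>

#define NTHREADS 4

/* One struct per thread: inputs are set before the thread starts,
   and result is written only by the owning thread. */
struct task {
    long first;   /* first value to sum (inclusive) */
    long last;    /* last value to sum (inclusive) */
    long result;  /* written only by the owning thread */
};

static void *sum_range(void *arg)
{
    struct task *t = arg;
    long s = 0;                 /* local accumulator, not shared */
    for (long v = t->first; v <= t->last; v++)
        s += v;
    t->result = s;
    return NULL;
}

/* Sums 1..n by splitting the range into NTHREADS slices. The main
   thread reads each result only after joining that thread. */
long parallel_range_sum(long n)
{
    pthread_t tids[NTHREADS];
    struct task tasks[NTHREADS];
    int started[NTHREADS];
    long chunk = n / NTHREADS, total = 0;

    for (int i = 0; i < NTHREADS; i++) {
        tasks[i].first = i * chunk + 1;
        tasks[i].last = (i == NTHREADS - 1) ? n : (i + 1) * chunk;
        tasks[i].result = 0;
        started[i] = pthread_create(&tids[i], NULL, sum_range, &tasks[i]) == 0;
        if (!started[i])
            sum_range(&tasks[i]);   /* fall back to running the slice inline */
    }
    for (int i = 0; i < NTHREADS; i++) {
        if (started[i])
            pthread_join(tids[i], NULL);
        total += tasks[i].result;
    }
    return total;
}
```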
Productivity
Build applications faster
Integrated Graphical Environment
> Based on the NetBeans open source IDE
> Debugger and Performance Analyzer GUIs
> Code editor with syntax highlighting and code folding
> Compile error hyperlinks to source code lines
> Wizard for creating makefiles
> GUI layout editor / designer with X-Designer
> Highly configurable
Debugger and Performance Analyzer
World's best debugger: dbx
> Debug optimized, threaded, or OpenMP parallelized code
> Graphical, point-and-click interface
> Rich, programmable event-triggered actions
> Supports C, C++, Fortran and Java
Best observability tool: Performance Analyzer
> Easy-to-use GUI, works with unmodified binaries, low overhead
> Offers performance data at statement, instruction, and routine level
> Compiler commentary on optimizations
> Supports OpenMP, MPI, and Pthreads parallelization
> DataSpace profiling and hardware counter data
> Supports C, C++, Fortran and Java
Faster Builds with Distributed Make
Distributes a build across a number of processes or a number of servers
> Configuration file defines groups, the number of jobs in each group, and each machine in the group
> Same syntax as make (different from gmake)
> Communicates with compilers to maintain up-to-date dependencies (.KEEP_STATE)
Number of jobs dispatched scales with the number of CPUs and nodes
> 3.6x improvement on Solaris 9 (12:22 hours to 3:19 hours) for 4 CPUs
Automatic adjustment of the number of parallel jobs
Sun Grid Engine support and integration
Specialized Tools for Difficult Problems
Runtime Checking (RTC) for memory issues:
> Out-of-bounds access checks, memory leaks, memory usage
Fix and Continue for quick recompile and reload (without restarting the debugging session)
libgc: C/C++ garbage collector for memory allocation and heap management
Secure Lint for checking typical programming errors that impact security (e.g., buffer overflow)
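As a sketch of the buffer-overflow class of error mentioned for Secure Lint, the hypothetical helper below replaces an unchecked `strcpy` into a fixed-size buffer with a bounded copy through standard `snprintf`. It illustrates the coding pattern only, not the tool's own diagnostics.

```c
#include <stdio.h>

/* Unsafe pattern that security-oriented checkers flag:
       char buf[8];
       strcpy(buf, user_input);   (overflows if input is 8+ chars)
   Bounded alternative: snprintf never writes more than size bytes and
   always NUL-terminates when size > 0. Returns 1 if src did not fit
   and was truncated, 0 otherwise. */
int copy_bounded(char *dst, size_t size, const char *src)
{
    int needed = snprintf(dst, size, "%s", src);
    return needed >= 0 && (size_t)needed >= size;
}
```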
Sun Developer Community and Training
Join the Sun Developer Community
SDN membership gives you exclusive benefits:
> Free developer tools
> Discounts for training, support, books, and hardware
> Access to technical content from Sun Tech Days and JavaOne Online
> Participation in forums
http://developers.sun.com
Developer Services
Need help?
> Developer email support available for Solaris Developer Express, Sun Studio, Java, and Java developer tools
Also:
> Sun Developer Service Plans for small to medium size businesses
> Java Multi-Platform support for enterprise developers and deployments
http://developers.sun.com/services
Sun Learning Services
Training on Software, Servers, Storage, and Services
> Solaris 10 Training, Java EE 5 Training
> Top 5 industry-recognized certifications: Solaris System Admin, Network Admin, Security, Java Programmer, Developer
> Certified developers are paid 15%+ more in salary
> Trained employees reduce system downtime by as much as 49%
> Sun Studio web-based course is available NOW! http://www.sun.com/training/catalog/courses/WP-100-S10.xml
http://www.sun.com/training
Performance, Parallelism, Productivity, ... and more
> Popular GCC extensions
> Threaded debugger (dbx), dmake
> Memory leak detection/analysis (RTC)
> Thread profiling, thread analysis
> NetBeans-based IDE
> Binary compatibility over 10 releases
> Tested with 400+ open source apps
> Community, support, training
> SPARC, x86, x64 (AMD64, EM64T)
> Solaris and Linux: same source, components, features
developers.sun.com/sunstudio
New Tools for Multi-core Development
To Do List
1. Visit the Sun Studio portal at http://developers.sun.com/sunstudio
√ Downloads, email forums, support, training, previews of new features, technical articles, etc.
2. Try Sun Studio 12: see how much it improves performance / throughput on the new UltraSPARC, Opteron, and Intel systems, even on Linux boxes(!)
√ Send us your experience; maybe we'll feature you at: http://developers.sun.com/sunstudio/community/heroes.jsp
Thank You!
Don Kretsch
Senior Director, Sun Developer Tools
Sun Microsystems
Sun Studio
Performance Tuning Cookbook
Experiences from Tunathons ...
> Programs are run to understand application performance, instead of focusing on standard benchmarks
> Between 40 and 80 ISV or performance-critical applications are considered for tuning and analysis
> Goals are to speed up the app, identify compiler enhancements, and provide feedback for future system designs
Opportunities range from:
> Simple: find the best option, upgrade to a new compiler
> Easy: simple source change, found by simple analysis
> Moderate: use of several analyzers, rewrites in assembly
> Difficult: complex analysis + tuning
Methodology / Tools Used
Ensure best builds:
> Latest compiler
> Optimization flags
> Profile feedback
> Insert #pragmas
Identify hot spots:
> gprof (function timings)
> tcov (line counts)
> analyzer (many stats)
Check libraries used:
> Optimized math libs
> libsunperf
> mediaLib
> Write special routines?
Get execution stats:
> cputrack (perf counters)
> lockstat (lock contention)
> trapstat (traps)
Study and rewrite source as appropriate
Study and rewrite assembly as appropriate
Changes That Impact App Performance
1) Trading some behavior to get speed
2) Exploiting knowledge of the deployment environment
3) Exploiting knowledge of program characteristics
4) Source code changes
Compiler Options for Performance
> -xO1 through -xO5 (the default is no optimization; -O implies -xO3)
> -fast: easy to use and gives the best performance on most code, but it assumes compile platform = run platform and makes FP arithmetic simplifications
Understand program behavior and assert it to the optimizer:
> -xrestrict: if only restricted pointers are passed to functions
> -xalias_level: if pointers behave in certain ways
> -fsimple: if FP arithmetic can be simplified
Target machine-related:
> -xprefetch, -xprefetch_level
> -xtarget=, -xarch=, -xcache=, -xchip=
> -xvector: converts DO loops into vector instructions/calls
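The -xrestrict option asserts that pointer-valued function parameters do not alias. The C99 `restrict` qualifier makes the same kind of promise for individual parameters directly in the source. The sketch below is a hedged illustration with a hypothetical function name. Check the compiler documentation for exactly which functions -xrestrict covers.

```c
#include <stddef.h>

/* y[i] += a * x[i]. The restrict qualifiers promise that x and y do
   not overlap, so the compiler may keep values in registers, reorder
   loads and stores, and vectorize without runtime alias checks. */
void daxpy_restrict(size_t n, double a,
                    const double *restrict x, double *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

Calling this function with overlapping x and y breaks the promise and is undefined behavior. The same risk applies to asserting -xrestrict for code that does pass aliased pointers.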
Compiler Options for Performance
Advanced compiler options
> -xprofile: profile-feedback enabled optimizations
> -xcrossfile, -xipo: perform crossfile/interprocedural optimizations
> -xautopar: enable automatic parallelization
> -xdepend: perform dependence analysis
Use optimized math libraries
> Sun Performance Library for algebraic functions
> Vectorized math routines (libmvec)
> Inline (libmil) and optimized math (libmopt)
> Value-added math library (libsunmath)
Source Code Changes
Improve usage of data cache, TLB, register windows
> Use VIS instructions (templates) directly (via -xvis)
> Optimize data alignment (also: #pragma align)
> Prevent register window overflow
Create inline assembly templates for performance-critical routines
Loop optimizations that compilers may miss:
> Restructuring for pipelining and prefetching
> Loop splitting/fission
> Loop peeling
> Loop interchange
> Loop unrolling and tiling
> Pragma directed
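Two of the loop optimizations listed above, loop interchange and tiling, are sketched below in C on a row-major N x N array of doubles. N = 512 and TILE = 32 are assumptions (N must be a multiple of TILE here), and the function names are hypothetical. Interchange changes only the traversal order, so both sums visit the same elements; with general floating-point data the different summation order can change rounding slightly. Tiling keeps a transpose's reads and writes within cache-sized blocks.

```c
#include <stddef.h>

#define N 512
#define TILE 32   /* tile edge: a tuning assumption, not a measured value */

/* Column-by-column traversal of a row-major array: consecutive
   accesses are N doubles apart, touching a new cache line each time.
   m is only read. */
double sum_column_order(double m[N][N])
{
    double s = 0.0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            s += m[i][j];
    return s;
}

/* Loop interchange: with the loops swapped, the inner loop walks
   contiguous memory, so each cache line is fully used. */
double sum_row_order(double m[N][N])
{
    double s = 0.0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            s += m[i][j];
    return s;
}

/* Loop tiling: a transpose cannot make both its reads and its writes
   contiguous, so it works on TILE x TILE blocks that stay in cache. */
void transpose_tiled(double dst[N][N], double src[N][N])
{
    for (size_t ii = 0; ii < N; ii += TILE)
        for (size_t jj = 0; jj < N; jj += TILE)
            for (size_t i = ii; i < ii + TILE; i++)
                for (size_t j = jj; j < jj + TILE; j++)
                    dst[j][i] = src[i][j];
}
```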
Gains from Tuning Categories
Tuning Category                          Typical Range of Gain
Source change                            25-100%
Compiler flags                           5-20%
Use of libraries                         25-200%
Assembly coding / tweaking               5-20%
Manual prefetching                       5-30%
TLB thrashing/cache                      20-100%
Using VIS/inlines/micro-vectorization    100-200%