© NVIDIA Corporation 2011
The ‘Super’ Computing Company
From Super Phones to Super Computers
CUDA 4.0
CUDA Toolkit 4.0 Release Candidate
Available to Registered Developers on March 4th
Press Embargo : February 28th – 6am PST (San Francisco)
CUDA 4.0: Application Porting Made Simpler
Rapid Application Porting: Unified Virtual Addressing
Faster Multi-GPU Programming: GPUDirect 2.0
Easier Parallel Programming in C++: Thrust
CUDA 4.0 for Broader Developer Adoption
CUDA 1.0 (2007): Researchers and Early Adopters
CUDA 2.0 (2008): Scientists and HPC Applications
CUDA 3.0 (2009): Application Innovation Leaders
CUDA 4.0 (2011): Broader Developer Adoption
NVIDIA GPUDirect™: Towards Eliminating the CPU Bottleneck
Version 1.0
for applications that communicate over a network
• Direct access to GPU memory for third-party devices
• Eliminates unnecessary system memory copies & CPU overhead
• Supported by Mellanox and QLogic
• Up to 30% improvement in communication performance
Version 2.0
for applications that communicate within a node
• Peer-to-Peer memory access, transfers & synchronization
• MPI implementations natively support GPU data transfers
• Less code, higher programmer productivity
Details @ http://www.nvidia.com/object/software-for-tesla-products.html
Before GPUDirect v2.0: Required Copy into Main Memory
[Diagram: GPU1 and GPU2, each with its own memory (GPU1 Memory, GPU2 Memory), attached over PCI-e to the chipset, which connects to the CPU and System Memory]
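GPUDirect v2.0: Peer-to-Peer Communication
Direct Transfers between GPUs
[Diagram: GPU1 and GPU2, each with its own memory (GPU1 Memory, GPU2 Memory), attached over PCI-e to the chipset, which connects to the CPU and System Memory]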
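The direct GPU-to-GPU transfer described on this slide can be sketched with the CUDA runtime's peer-to-peer calls. This is a minimal illustration, not code from the deck: the helper name `copyBetweenGpus` is hypothetical, and it assumes both buffers were already allocated with `cudaMalloc` on their respective devices.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical helper: copies `bytes` bytes from a buffer on device `srcDev`
// to a buffer on device `dstDev`. Returns the first CUDA error, or cudaSuccess.
cudaError_t copyBetweenGpus(void* dst, int dstDev,
                            const void* src, int srcDev,
                            std::size_t bytes) {
    // Ask whether dstDev is able to address srcDev's memory directly.
    int canAccess = 0;
    cudaError_t err = cudaDeviceCanAccessPeer(&canAccess, dstDev, srcDev);
    if (err != cudaSuccess) return err;

    if (canAccess) {
        // Peer access is enabled from the point of view of the current device.
        err = cudaSetDevice(dstDev);
        if (err != cudaSuccess) return err;
        err = cudaDeviceEnablePeerAccess(srcDev, 0);
        if (err == cudaErrorPeerAccessAlreadyEnabled) {
            cudaGetLastError();  // reset the recorded error; this case is benign
            err = cudaSuccess;
        }
        if (err != cudaSuccess) return err;
    }

    // cudaMemcpyPeer names both devices explicitly. When peer access is not
    // available, the runtime is expected to route the copy through the host.
    return cudaMemcpyPeer(dst, dstDev, src, srcDev, bytes);
}
```

A caller would allocate one buffer per device (`cudaSetDevice` followed by `cudaMalloc`) and then call `copyBetweenGpus(bufOnGpu1, 1, bufOnGpu0, 0, bytes)`.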
Unified Virtual Addressing: Easier to Program with Single Address Space
[Diagram, two panels, each showing the CPU with System Memory and GPU0 and GPU1 with their own memories, connected over PCI-e.
No UVA: Multiple Memory Spaces. System Memory, GPU0 Memory and GPU1 Memory each cover their own 0x0000 to 0xFFFF address range.
UVA: Single Address Space. System Memory, GPU0 Memory and GPU1 Memory share one 0x0000 to 0xFFFF address range.]
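One practical consequence of a single address space is that a copy no longer has to state its direction. The sketch below is an illustration under assumptions: the function name `roundTrip` is hypothetical, and it relies on the device reporting `unifiedAddressing` support (a 64-bit process on a suitable GPU).

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical helper: copies n floats to the current device and back again.
// With UVA the runtime infers the direction of each copy from the pointer
// values, so both calls pass cudaMemcpyDefault.
cudaError_t roundTrip(const float* hostIn, float* hostOut, std::size_t n) {
    int dev = 0;
    cudaError_t err = cudaGetDevice(&dev);
    if (err != cudaSuccess) return err;

    cudaDeviceProp prop;
    err = cudaGetDeviceProperties(&prop, dev);
    if (err != cudaSuccess) return err;
    // cudaMemcpyDefault is only meaningful when the device shares the host's
    // address space; report an error otherwise.
    if (!prop.unifiedAddressing) return cudaErrorNotSupported;

    float* devBuf = nullptr;
    const std::size_t bytes = n * sizeof(float);
    err = cudaMalloc(&devBuf, bytes);
    if (err != cudaSuccess) return err;

    err = cudaMemcpy(devBuf, hostIn, bytes, cudaMemcpyDefault);       // host to device
    if (err == cudaSuccess)
        err = cudaMemcpy(hostOut, devBuf, bytes, cudaMemcpyDefault);  // device to host

    cudaFree(devBuf);
    return err;
}
```

Without UVA, each call would need an explicit `cudaMemcpyHostToDevice` or `cudaMemcpyDeviceToHost` argument, which is the bookkeeping the slide's "Multiple Memory Spaces" panel refers to.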
C++ Templatized Algorithms & Data Structures (Thrust)
Powerful open source C++ parallel algorithms & data structures
Similar to C++ Standard Template Library (STL)
Automatically chooses the fastest code path at compile time
Divides work between GPUs and multi-core CPUs
Parallel sorting @ 5x to 100x faster than STL and TBB
Data Structures
• thrust::device_vector
• thrust::host_vector
• thrust::device_ptr
• Etc.
Algorithms
• thrust::sort
• thrust::reduce
• thrust::exclusive_scan
• Etc.
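The containers and algorithms named on this slide combine in much the same way as their STL counterparts. The following is a small sketch rather than NVIDIA's own sample: the `Summary` struct and `summarize` function are hypothetical names introduced for illustration.

```cuda
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/reduce.h>
#include <thrust/scan.h>
#include <thrust/copy.h>

// Hypothetical result type for this illustration.
struct Summary {
    thrust::host_vector<int> sorted;   // input values in ascending order
    thrust::host_vector<int> offsets;  // exclusive prefix sums of `sorted`
    int total;                         // sum of all values
};

Summary summarize(const thrust::host_vector<int>& input) {
    // Assigning a host_vector to a device_vector copies the data to the GPU.
    thrust::device_vector<int> values = input;

    thrust::sort(values.begin(), values.end());                  // parallel sort
    int total = thrust::reduce(values.begin(), values.end(), 0); // parallel sum

    thrust::device_vector<int> offsets(values.size());
    thrust::exclusive_scan(values.begin(), values.end(), offsets.begin());

    Summary result;
    result.sorted = values;    // device to host copy
    result.offsets = offsets;  // device to host copy
    result.total = total;
    return result;
}
```

For an input of `{3, 1, 2}`, `sorted` becomes `{1, 2, 3}`, `offsets` becomes `{0, 1, 3}` and `total` is 6.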
[Chart: C and C++ marked as the "Parallel Programming Sweet Spot". Source: http://www.tiobe.com]
CUDA 4.0: Highlights
Easier Parallel Application Porting
• Share GPUs across multiple threads
• Single thread access to all GPUs
• No-copy pinning of system memory
• New CUDA C/C++ features
• Thrust templated primitives library
• NPP image/video processing library
• Layered Textures
New & Improved Developer Tools
• Auto Performance Analysis
• C++ Debugging
• GPU Binary Disassembler
• cuda-gdb for MacOS
Faster Multi-GPU Programming
• Unified Virtual Addressing
• NVIDIA GPUDirect™ v2.0
• Peer-to-Peer Access
• Peer-to-Peer Transfers
• GPU-accelerated MPI
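Two of the highlights listed above, no-copy pinning of system memory and single-thread access to all GPUs, can be sketched together. This is an illustrative example under assumptions: `broadcastToAllGpus` is a hypothetical name, and `cudaHostRegisterDefault` is the flag name found in current toolkits (early releases also placed alignment requirements on the registered range).

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical helper: pins an existing host allocation in place, then copies
// it to every visible GPU from the calling thread. Returns the number of GPUs
// that received the data, or -1 if setup failed.
int broadcastToAllGpus(float* hostData, std::size_t n) {
    const std::size_t bytes = n * sizeof(float);

    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return -1;

    // Page-lock the caller's buffer where it already lives, instead of
    // allocating a separate pinned buffer and copying into it.
    if (cudaHostRegister(hostData, bytes, cudaHostRegisterDefault) != cudaSuccess)
        return -1;

    int done = 0;
    for (int dev = 0; dev < count; ++dev) {
        // One host thread switches between devices with cudaSetDevice.
        if (cudaSetDevice(dev) != cudaSuccess) continue;

        float* devBuf = nullptr;
        if (cudaMalloc(&devBuf, bytes) != cudaSuccess) continue;
        if (cudaMemcpy(devBuf, hostData, bytes, cudaMemcpyHostToDevice) == cudaSuccess)
            ++done;
        cudaFree(devBuf);
    }

    cudaHostUnregister(hostData);  // return the pages to normal pageable memory
    return done;
}
```

A real application would keep the device buffers and launch kernels on each GPU; they are freed immediately here only to keep the sketch short.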
GPU Technology Conference 2011
Oct. 11-14 | San Jose, CA
3rd annual GPU Technology Conference
New for 2011:
Co-located with Los Alamos HPC Symposium
300+ Research Scientists from National Labs
2010 highlights
• 280 hours of sessions
• 100+ Research posters
• 42 countries represented
www.gputechconf.com
BACKGROUND SLIDES: CUDA 4.0
NVIDIA CUDA Summary
Libraries
New in CUDA 4.0
• Thrust C++ Library: Templated Performance Primitives
NVIDIA Library Support
• Complete math.h
• Complete BLAS Library (1, 2 and 3)
• Sparse Matrix Math Library
• RNG Library
• FFT Library (1D, 2D and 3D)
• Image Processing Library (NPP)
• Video Processing Library (NPP)
3rd Party Math Libraries
• CULA Tools
• MAGMA
• IMSL
• VSIPL
Tools
New in CUDA 4.0
• Parallel Nsight Pro
NVIDIA Tools Support
• Parallel Nsight 1.0 IDE
• cuda-gdb Debugger with multi-GPU
• CUDA/OpenCL Visual Profiler
• CUDA Memory Checker
• CUDA C SDK
• CUDA Disassembler
CUDA Partner Tools
• Allinea DDT
• RogueWave / TotalView
• Vampir
• Tau
• CAPS HMPP
Platform
New in CUDA 4.0
• GPUDirect 2.0: Fast Path to Data
Hardware Support
• ECC Memory
• Double Precision
• Native 64-bit Architecture
• Concurrent Kernel Execution
• Dual Copy Engines
• Multi-GPU support
• 6GB per GPU supported
Operating System Support
• MS Windows 32/64
• Linux 32/64 support
• Mac OSX support
Cluster Management
• GPUDirect
• Tesla Compute Cluster (TCC)
• Graphics Interoperability
Programming Model
New in CUDA 4.0
• Unified Virtual Addressing
• C++ new/delete
• C++ Virtual Functions
C support
• NVIDIA C Compiler
• CUDA C Parallel Extensions
• Function Pointers
• Recursion
• Atomics
• malloc/free
C++ support
• Classes/Objects
• Class Inheritance
• Polymorphism
• Operator Overloading
• Class Templates
• Function Templates
• Virtual Base Classes
• Namespaces
Fortran, OpenCL
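The programming-model additions named in this summary, C++ new/delete and virtual functions in device code, can be illustrated with a short kernel. This is a sketch under assumptions: the `Shape` and `Square` classes, the kernel and the host helper are hypothetical names, and the code needs a GPU generation that supports device-side heap allocation and virtual dispatch.

```cuda
#include <cuda_runtime.h>

// Hypothetical class hierarchy used only for this illustration.
struct Shape {
    __device__ virtual float area() const = 0;
    __device__ virtual ~Shape() {}
};

struct Square : Shape {
    float side;
    __device__ explicit Square(float s) : side(s) {}
    __device__ float area() const { return side * side; }
};

__global__ void areaKernel(float side, float* out) {
    // The object is created on the device so that its virtual table is valid
    // for device code.
    Shape* shape = new Square(side);
    if (shape == nullptr) {  // device heap exhausted
        *out = -1.0f;
        return;
    }
    *out = shape->area();    // virtual call resolved on the GPU
    delete shape;
}

// Host helper: returns side * side computed on the GPU, or -1.0f on failure.
float squareAreaOnGpu(float side) {
    float result = -1.0f;
    float* devOut = nullptr;
    if (cudaMalloc(&devOut, sizeof(float)) != cudaSuccess) return result;

    areaKernel<<<1, 1>>>(side, devOut);
    // cudaMemcpy waits for the kernel to finish before copying the result.
    if (cudaMemcpy(&result, devOut, sizeof(float), cudaMemcpyDeviceToHost) != cudaSuccess)
        result = -1.0f;

    cudaFree(devOut);
    return result;
}
```

Calling `squareAreaOnGpu(3.0f)` on a supported GPU yields `9.0f`.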
cuda-gdb Now Available for MacOS
Details @ http://developer.nvidia.com/object/cuda-gdb.html
Automated Performance Analysis in Visual Profiler
Summary analysis & hints
Session
Device
Context
Kernel
New UI for kernel analysis
Identify limiting factor
Analyze instruction throughput
Analyze memory throughput
Analyze kernel occupancy
NVIDIA Parallel Nsight™
Professional features now available free of charge!
Key Features
Professional Profiler Standard
Microsoft Visual Studio 2010 support
Single System Debugging
Tesla Compute Cluster
CUDA Toolkit 3.2
CUDA 3rd Party Ecosystem
Tools
Parallel Debuggers
Visual Studio IDE with
Parallel Nsight Pro
Allinea DDT Debugger
TotalView Debugger
Performance Tools
ParaTools VampirTrace
TauCUDA Performance Tools
PAPI
HPC Toolkit
Compute Platform Providers
Cloud Compute
Amazon EC2
Peer 1
OEMs
Dell
HP
IBM
Cluster Tools
Cluster Management
Platform LSF Cluster Manager
Platform Symphony
Bright Cluster Manager
Job Scheduling
Altair PBS
Cluster Resources TORQUE
MPI Libraries
MPI
OpenMPI
QLogic OFED
Compilers
PGI CUDA Fortran
PGI Accelerators
PGI CUDA x86
CAPS HMPP
TidePowerd GPU.net
pyCUDA
NVIDIA CUDA Developer Resources
ENGINES & LIBRARIES
Math Libraries: CUFFT, CUBLAS, CUSPARSE, CURAND
3rd Party Libraries: CULA LAPACK, VSIPL,
NPP Image Libraries: Performance primitives for imaging
App Acceleration Engines: Ray Tracing: OptiX, iRay
Video Libraries: NVCUVID / NVCUVENC
DEVELOPMENT TOOLS
CUDA Toolkit: Complete GPU computing development kit
cuda-gdb: GPU hardware debugging
Visual Profiler: GPU hardware profiler for CUDA C and OpenCL
Parallel Nsight: Integrated development environment for Visual Studio
SDKs AND CODE SAMPLES
GPU Computing SDK: CUDA C/C++, DirectCompute, OpenCL code samples and documentation
Books: CUDA by Example, GPU Gems
Optimization Guides: Best Practices for GPU computing and graphics development
http://developer.nvidia.com
Proven Research Vision
Johns Hopkins University
Nanyang University
Technical University-Czech
CSIRO
SINTEF
HP Labs
ICHEC
Barcelona Supercomputing Center
Clemson University
Fraunhofer SCAI
Karlsruhe Institute Of Technology
World Class Research Leadership and Teaching
University of Cambridge
Harvard University
University of Utah
University of Tennessee
University of Maryland
University of Illinois at Urbana-Champaign
Tsinghua University
Tokyo Institute of Technology
Chinese Academy of Sciences
National Taiwan University
Georgia Institute of Technology
http://research.nvidia.com
GPGPU Education: 350+ Universities
Academic Partnerships / Fellowships
GPU Computing Research & Education
Mass. Gen. Hospital/NE Univ
North Carolina State University
Swinburne University of Tech.
Technische Univ. Munich
UCLA
University of New Mexico
University Of Warsaw-ICM
VSB-Tech University of Ostrava
And more coming shortly.
CUDA Applications Momentum Increasing
Today’s CUDA CAE Solutions
Structural Mechanics
• ANSYS Mechanical
• AFEA
• Abaqus/Standard (beta)
Fluid Dynamics
• AcuSolve
• Moldflow
• Culises (OpenFOAM)
• Particleworks
Electromagnetics
• Nexxim
• EMPro
• CST MS
• XFdtd
• SEMCAD X