A High-Performance Scalable Graphics Architecture
Daniel R. McLachlan
Director, Advanced Graphics Engineering
SGI
Growth in Model Sizes
[Chart: Worldwide Production of Information, in exabytes (axis 0 to 200), 1997 to 2006]
Source: Gartner
Images courtesy of Parametric Technology Corporation, Photodisc, and Magic Earth, LLC
Problems Are Getting Increasingly Complex Over Time
Bumper
Bumper, hood, engine, wheels
Entire car
Crash dummy
Organ damage
E-crash dummy
Images courtesy of EAI; SCI Institute, NLM, Theoretical Biophysics Group of
the Beckman Institute at UIUC; Livermore Software Technology Corporation
The Complexity of the Simple
Images courtesy of Procter & Gamble
Potato Chips
Diapers
[Chart: Bandwidth Specification (normalized), 2000 to 2004. Series: Polygons, Fill Rate, Internal Bus, Network I/C]
Graphics Cards Are Outpacing PC Architecture and Bandwidth
Performance Gap
Graph based on relative scale.
[Chart: Performance over time, labeled "Visualization"]
2003
• Low cost
• Fast simple polygons
• Single screen image quality
1992
• Extreme resolution
• Absolute visual quality
• VAN
• Solving complex problems
• Dense data sets
Visualization Breaks The Cognitive Barrier For Better Decisions
Addressing Real Needs
Images courtesy of Advantage CFD; SCI Institute; NLM; Theoretical Biophysics Group of the Beckman Institute at UIUC; Laboratory for Atmospheres, NASA Goddard Space Flight Center; Donghoon Shin, Art Center College of Design; Nvidia Corporation; ATI Technologies, Inc.; and Nintendo Co., Ltd.
Clusters Graphics
Cluster Comparison
Pros
• Cheap
• Industry standard
• High display list performance
• Good for “embarrassingly parallel” problems
• Can potentially scale to 1000s of processors

Cons
• Cumbersome to program
• High administration costs
• Few applications for visualization
• Difficult to scale for large problems
• Difficult to dynamically load balance
• Lack of software productivity tools
• Often requires data replication
• Reliability
• Limited to 2GB memory space
[Diagram: Traditional Clusters vs. SGI® NUMAflex™. Traditional clusters: separate node + OS units, each with its own memory (mem), joined by a commodity interconnect. SGI® NUMAflex™: nodes joined by a fast NUMAflex™ interconnect with global shared memory.]
What is shared memory?
• All nodes operate on one large shared memory space, instead of each node having its own small memory space

Shared memory is high-performance
• All nodes can access one large memory space efficiently, so complex communication and data passing between nodes aren’t needed
• Big data sets fit entirely in memory; less disk I/O is needed

Shared memory is cost-effective and easy to deploy
• It requires less memory per node, because large problems can be solved in big shared memory
• Simpler programming means lower tuning and maintenance costs
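
The points above can be illustrated with a small single-machine analogy. This is a minimal sketch, not SGI's NUMAflex implementation: it uses Python's standard `multiprocessing.shared_memory` module, and the function name `shared_sum_demo` and the "node A" / "node B" roles are hypothetical. One handle writes a data set into a shared block; a second handle attaches to the same block by name and reads it in place, with no copy and no message passing.

```python
import struct
from multiprocessing import shared_memory


def shared_sum_demo(values):
    """Write doubles through one handle, read them through a second handle.

    Hypothetical illustration: 'owner' plays node A and 'peer' plays node B.
    Both handles map the same memory block, so nothing is sent or copied.
    """
    count = len(values)
    fmt = f"{count}d"  # 'count' native doubles, 8 bytes each
    owner = shared_memory.SharedMemory(create=True, size=count * 8)
    try:
        # Node A stores the data set once, in the shared block.
        struct.pack_into(fmt, owner.buf, 0, *values)

        # Node B attaches to the same block by name and reads it in place.
        peer = shared_memory.SharedMemory(name=owner.name)
        try:
            seen = struct.unpack_from(fmt, peer.buf, 0)
        finally:
            peer.close()
        return sum(seen)
    finally:
        owner.close()
        owner.unlink()  # release the block once every handle is closed
```

On a cluster without shared memory, the same step would typically need an explicit send and receive, or a replicated copy of the data on each node.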
The Benefits of Shared Memory
1-2 CPUs per node (traditional clusters); < 64 CPUs per node (SGI® NUMAflex™)
How SGI® Onyx® Enables the Role
System at a Glance
[Diagram labels: Scalable Graphics I/O; Scalable Disk I/O; Scalable Resolution; Appropriate Delivery; Scalable Rendering; Scalable Data; Compositor; Network; Scalable Graphics; Scalable Compute and Large Memory; SGI Onyx; Large Data Sets; Scalable Interaction]
Moving from a fixed rendering path…
Images courtesy of Pratt and Whitney Canada and Magic Earth, LLC
…to a scalable and programmable rendering path.
Application accelerators
Geometry
Silicon Graphics® Onyx4™ UltimateVision™ Changing the Application Paradigm
Scaling: A Shift in Pipe Paradigm
1. Screen-based decomposition
2. Eye-based decomposition
3. Time-based decomposition
4. Data-based decomposition
Even more powerful in combination
All modes can be used separately or combined in any number of ways
Data courtesy of DaimlerChrysler, Images courtesy of MAK Visible Human public data set
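
The slides name the four decomposition modes without defining them. The sketch below assumes the conventional reading of each (screen regions per pipe, stereo eyes per pipe, alternating frames per pipe, and data subsets per pipe); all function names are hypothetical and nothing here is taken from SGI's software.

```python
def screen_based(width, height, pipes):
    """Screen-based: split the frame into vertical strips, one per pipe.

    Returns (x0, y0, x1, y1) rectangles that together cover the frame.
    """
    edges = [width * i // pipes for i in range(pipes + 1)]
    return [(edges[i], 0, edges[i + 1], height) for i in range(pipes)]


def eye_based(pipes):
    """Eye-based: alternate pipes between the left and right stereo views."""
    return ["left" if i % 2 == 0 else "right" for i in range(pipes)]


def time_based(frame_index, pipes):
    """Time-based: pipes take turns rendering whole frames (round-robin)."""
    return frame_index % pipes


def data_based(item_count, pipes):
    """Data-based: give each pipe a contiguous (start, stop) slice of the data."""
    edges = [item_count * i // pipes for i in range(pipes + 1)]
    return [(edges[i], edges[i + 1]) for i in range(pipes)]
```

Under these assumptions the modes compose naturally: for example, `eye_based(4)` could assign two pipes to each eye, and `screen_based(width, height, 2)` could then split each eye's image between its two pipes.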
Multi-Tier Composition: Composite output of multiple compositors, e.g., first layer does 2D composition, second layer does anti-aliasing
Visual Serving: Composited output sent to workstations for viewing and/or editing
Compositor Flexibility
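
The multi-tier example (a first layer doing 2D composition, a second doing anti-aliasing) can be sketched in software as follows. This is an illustration under stated assumptions, not the hardware compositor's actual algorithm: images are plain lists of rows of grayscale values, 2D composition is taken to mean placing two tiles side by side, and anti-aliasing is taken to mean averaging several slightly offset renderings. The names `compose_tiles` and `average_frames` are hypothetical.

```python
def compose_tiles(left, right):
    """Tier 1 (2D composition): join two tiles of equal height side by side."""
    return [l_row + r_row for l_row, r_row in zip(left, right)]


def average_frames(frames):
    """Tier 2 (anti-aliasing): average same-sized frames pixel by pixel."""
    n = len(frames)
    return [
        [sum(pixels) / n for pixels in zip(*rows)]  # same pixel across frames
        for rows in zip(*frames)                    # same row across frames
    ]


# Two first-tier compositors each assemble a full frame from two tiles...
frame_a = compose_tiles([[0.0], [0.0]], [[1.0], [1.0]])
frame_b = compose_tiles([[1.0], [1.0]], [[1.0], [1.0]])
# ...and a second-tier compositor blends their outputs.
final = average_frames([frame_a, frame_b])
```

The point of the sketch is only the chaining: the output of one compositing stage is a valid input to the next.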
Silicon Graphics® Onyx4™ UltimateVision™ System Architecture
[Diagram labels: 2 Graphics Pipes; CPUs; 8GB RAM; Memory Controller; SGI® NUMA scalability; Standard I/O or 2 Graphics Pipes (optional)]
Silicon Graphics® Onyx4™ UltimateVision™: Solving bigger and more complex problems
• World’s most scalable visualization system
  • Up to 32 GPUs in an SSI architecture
• World-leading computational capability
  • Up to 64 CPUs per node, scalable to 1024 processors
• Solves system bandwidth limitations of PCs and clusters
  • Up to 8 NUMAlink 3 connections to a single shared memory pool
• New-generation programmable graphics architecture
  • OpenGL Shading Language
Conclusion