Workshop on Commodity-Based Visualization Clusters Learning From the Stanford/DOE Visualization Cluster Mike Houston, Greg Humphreys, Randall Frank, Pat Hanrahan

Workshop on Commodity-Based Visualization Clusters

Learning From the Stanford/DOE Visualization Cluster

Mike Houston, Greg Humphreys, Randall Frank, Pat Hanrahan

Outline

Stanford’s current cluster
  – Design decisions
  – Performance evaluation
  – Bottleneck evaluation

Cluster “Landscape”
  – General classification
  – Bottleneck evaluation

Stanford’s next cluster
  – Design goals
  – Research directions

Stanford/DOE Visualization Cluster

The Chromium Cluster

Cluster Configuration (Jan. 2000)

Cluster: 32 graphics nodes + 4 server nodes

Computer: Compaq SP750
  – 2 processors (800 MHz PIII Xeon, 133 MHz FSB)
  – i840 core logic (big issue for vis-clusters)
      • Simultaneous fast graphics and networking
      • Network: 64-bit, 66 MHz PCI
      • Graphics: AGP-4x
  – 256 MB memory
  – 18 GB SCSI 160 disk (+ 3*36 GB on servers)

Graphics (Sept. 2002)
  – 16 NVIDIA GeForce3 w/ DVI (64 MB)
  – 16 NVIDIA GeForce4 Ti 4200 w/ DVI (128 MB)

Network
  – Myrinet 64-bit, 66 MHz (LANai 7)

Graphics Evaluation

NVIDIA GeForce3
  – 25 MTri/s triangle rate observed
  – 680 MPix/s fill rate observed

NVIDIA GeForce4
  – 60 MTri/s triangle rate observed
  – 800 MPix/s fill rate observed

Read Pixels performance
  – 35 MPix/s (140 MB/s) RGBA
  – 22 MPix/s (87 MB/s) Depth

Draw Pixels performance
  – 45 MPix/s (180 MB/s) RGBA
  – 21 MPix/s (85 MB/s) Depth
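
The MPix/s and MB/s figures above are related by the number of bytes per pixel. The following is a minimal sketch of that conversion, assuming 4 bytes per pixel for both RGBA and depth (the slides do not state the pixel formats) and a 1024x768 frame chosen only as an example. The function names are illustrative.

```python
# Assumption: 4 bytes per pixel for RGBA (8 bits per channel) and for depth.
# The slides do not state the formats; the quoted MB/s values are rounded.
BYTES_PER_PIXEL = {"RGBA": 4, "Depth": 4}


def bandwidth_mb_per_s(mpix_per_s, fmt):
    """Convert a pixel rate in MPix/s to a data rate in MB/s."""
    return mpix_per_s * BYTES_PER_PIXEL[fmt]


def frame_time_ms(width, height, mpix_per_s):
    """Milliseconds needed to move one width x height frame at a pixel rate."""
    return (width * height) / (mpix_per_s * 1e6) * 1e3


# Reading back one 1024x768 RGBA frame at the observed 35 MPix/s.
print(bandwidth_mb_per_s(35, "RGBA"), "MB/s")
print(round(frame_time_ms(1024, 768, 35), 1), "ms per frame")
```

At these rates a single full-tile readback takes tens of milliseconds, which is why read/draw pixels shows up as a sort-last bottleneck later in the talk.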

Network Evaluation

Myrinet LANai 7 PCI64A boards
  – Theoretical limit: 160 MB/s
  – 142 MB/s observed peak under Linux
  – ~100 MB/s observed sustained under Linux

ServerNet not chosen
  – Driver support
  – Large switching infrastructure required

Gigabit Ethernet
  – Performance and scalability concerns

Myrinet Issues

Fairness: Clients starved of network resources
  – Implemented credit scheme to minimize congestion

Lack of buffering in switching fabric
  – Causes poor performance in high load conditions
  – Open issue

[Figures: Partitioned Cluster; Unpartitioned Cluster]
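
The slides do not describe how the credit scheme works. As an illustration only, here is a generic sketch of credit-based flow control, where a sender transmits only while it holds credits for free receive buffers. The class and method names are hypothetical and are not taken from WireGL or Chromium.

```python
from collections import deque


class CreditedLink:
    """Sender-side view of one receiver under credit-based flow control.

    Hypothetical sketch: not the scheme actually used on the cluster.
    """

    def __init__(self, credits):
        self.credits = credits   # receive buffers known to be free
        self.pending = deque()   # messages waiting for a credit
        self.delivered = []      # stands in for the network and receiver

    def send(self, msg):
        """Queue a message and transmit as much as the credits allow."""
        self.pending.append(msg)
        self._drain()

    def on_credit_returned(self, n=1):
        """The receiver freed n buffers and reported it back."""
        self.credits += n
        self._drain()

    def _drain(self):
        # Each transmitted message consumes exactly one credit.
        while self.credits > 0 and self.pending:
            self.credits -= 1
            self.delivered.append(self.pending.popleft())


link = CreditedLink(credits=2)
for i in range(5):
    link.send(i)              # only the first two go out immediately
link.on_credit_returned(2)    # two more are released
```

Because a sender can never have more messages in flight than the receiver has buffers, one client cannot flood a server and starve the others.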

i840 Chipset Evaluation

66 MHz 64-bit PCI performance not full speed:
  – 210 MB/s PCI read (40% of theoretical peak)
  – 288 MB/s PCI write (54% of theoretical peak)
  – Combined read/write ~121 MB/s

AGP
  – Fast Writes / Side Band Addressing unstable under Linux
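
The percentages above follow from the theoretical peak of a 64-bit, 66 MHz PCI bus. A small sketch of that arithmetic, assuming a nominal 66.67 MHz clock and one 8-byte transfer per cycle (the slide does not state which peak figure it used):

```python
def pci_peak_mb_per_s(bus_bits=64, clock_mhz=66.67):
    """Theoretical peak: one bus-width transfer per clock cycle."""
    return bus_bits / 8 * clock_mhz


def percent_of_peak(observed_mb_per_s, peak_mb_per_s):
    """Observed bandwidth as a percentage of the theoretical peak."""
    return 100.0 * observed_mb_per_s / peak_mb_per_s


peak = pci_peak_mb_per_s()  # about 533 MB/s
for label, observed in [("read", 210), ("write", 288), ("combined", 121)]:
    print(f"{label}: {percent_of_peak(observed, peak):.1f}% of peak")
```

Under this assumption the read and write figures come out at roughly 39% and 54% of peak, close to the values quoted on the slide.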

Sort-First Performance

Configuration
  – Application runs on client
  – Primitives distributed to servers

Tiled Display
  – 4x3 @ 1024x768
  – Total resolution: 4096x2304, 9 Megapixel

Quake 3
  – 50 fps

Atlantis
  – 450 fps
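
In a sort-first configuration, each primitive is sent to the servers whose tiles it overlaps on screen. The sketch below works through the tile arithmetic for the 4x3 wall above and shows one simple way to map a screen-space bounding box to tiles. It assumes integer pixel coordinates with inclusive bounds; the function names are illustrative and do not reflect the actual WireGL or Chromium implementation.

```python
TILE_W, TILE_H = 1024, 768   # per-tile resolution from the slide
COLS, ROWS = 4, 3            # 4x3 tiled display


def wall_resolution():
    """Total resolution of the tiled display in pixels."""
    return COLS * TILE_W, ROWS * TILE_H


def tiles_for_bbox(x0, y0, x1, y1):
    """Return (col, row) for every tile overlapped by an inclusive bbox."""
    wall_w, wall_h = wall_resolution()
    # A primitive entirely off the wall is sent to no server.
    if x1 < 0 or y1 < 0 or x0 >= wall_w or y0 >= wall_h:
        return []
    c0 = max(0, x0) // TILE_W
    c1 = min(wall_w - 1, x1) // TILE_W
    r0 = max(0, y0) // TILE_H
    r1 = min(wall_h - 1, y1) // TILE_H
    return [(c, r) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]


w, h = wall_resolution()
print(w, h, w * h)                            # total pixel count
print(tiles_for_bbox(1000, 700, 1100, 800))   # box straddling four tiles
```

A primitive that straddles a tile boundary is sent to every tile it touches, so network load grows with overlap as well as with scene size.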

Sort-Last Performance

Configuration
  – Parallel rendering on multiple nodes
  – Composite to final display node

Volume Rendering on 16 nodes
  – 1.57 GVox/s [Humphreys 02]
  – 1.82 GVox/s (tuned) 9/02
  – 256x256x1024 volume (1) rendered twice

(1) Data courtesy of G. A. Johnson, G. P. Cofer, S. L. Gewalt, and L. W. Hedlund from the Duke Center for In Vivo Microscopy (an NIH/NCRR National Resource)
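
The compositing step of a sort-last configuration merges the partial images from all render nodes into one. The slides do not say which compositing operator was used. The sketch below shows a per-pixel depth compare, which suits opaque geometry; volume rendering would more typically blend layers in visibility order. Images are flat Python lists for simplicity, and all names are illustrative.

```python
def depth_composite(color_a, depth_a, color_b, depth_b):
    """Merge two equal-sized images, keeping the nearer fragment per pixel."""
    if not (len(color_a) == len(depth_a) == len(color_b) == len(depth_b)):
        raise ValueError("all buffers must have the same number of pixels")
    out_color, out_depth = [], []
    for ca, da, cb, db in zip(color_a, depth_a, color_b, depth_b):
        if da <= db:   # smaller depth means closer to the viewer
            out_color.append(ca)
            out_depth.append(da)
        else:
            out_color.append(cb)
            out_depth.append(db)
    return out_color, out_depth


def composite_all(layers):
    """Fold (color, depth) pairs from every render node into one image."""
    color, depth = layers[0]
    for next_color, next_depth in layers[1:]:
        color, depth = depth_composite(color, depth, next_color, next_depth)
    return color, depth


# Three nodes, each contributing a 3-pixel image.
layers = [
    (["r", "r", "r"], [0.5, 0.9, 0.2]),
    (["g", "g", "g"], [0.4, 0.1, 0.8]),
    (["b", "b", "b"], [0.6, 0.3, 0.1]),
]
final_color, final_depth = composite_all(layers)
```

Each merge needs both the color and depth buffers of every node, which is why readback and network bandwidth dominate sort-last cost.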

Cluster Accomplishments

Development Platform
  – WireGL
  – Chromium

Cluster configuration replicated

Interactive Performance
  – 256x512x1024 volume @ 15 fps
  – 9 Megapixel Quake3 @ 50 fps
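
The voxel rates quoted for the sort-last test can be related to a frame rate by dividing by the voxels drawn per frame. This is a rough sketch that assumes each frame draws 256x512x1024 voxels (equivalently, a 256x256x1024 volume rendered twice) and that 1 GVox is 10^9 voxels. Both are assumptions, not statements from the slides.

```python
def voxels_per_frame(x=256, y=512, z=1024):
    """Number of voxels drawn per frame for an x * y * z volume."""
    return x * y * z


def implied_fps(gvox_per_s, voxels):
    """Frame rate implied by a voxel throughput given in GVox/s."""
    return gvox_per_s * 1e9 / voxels


n = voxels_per_frame()
for rate in (1.57, 1.82):
    print(f"{rate} GVox/s -> {implied_fps(rate, n):.1f} fps")
```

Under these assumptions the two quoted rates correspond to roughly 12 and 14 frames per second, in the same range as the interactive figure listed above.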

Sources of Bottlenecks

Sort-First
  – Packing speed (processor)
  – Primitive distribution (network and bus)
  – Rendering (processor and graphics chip)

Sort-Last
  – Rendering (graphics chip)
  – Composite (network, bus, and read/draw pixels)

Bottleneck Evaluation – Stanford

Sort-First: Processor and Network
Sort-Last: Network and Read/Draw

[Bar chart: Throughput (axis 0 to 1000) for Read/Draw, Network, Bus, Graphics, and Processor]

The Landscape of Graphics Clusters

Many Options
  – Low End <$2500/node
  – Mid End ~$5000/node
  – High End >$7500/node

Tradeoffs
  – Different bottlenecks
  – Price/Performance
  – Scalability
  – Usage

Evaluation
  – Based on published benchmarks and specs

Cluster Interconnect Options

Many choices
  – GigE
      • ~100 MB/s
  – Myrinet 2000 (http://www.myrinet.com)
      • 245 MB/s
  – SCI/Dolphin (http://www.dolphinics.com)
      • 326 MB/s
  – Quadrics (http://www.quadrics.com)
      • 340 MB/s

Future options
  – 10 GigE
  – Infiniband
  – HyperTransport
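
To put the interconnect bandwidths in rendering terms, the sketch below estimates how long each would take to move one uncompressed 1024x768 RGBA frame. The frame size, 4 bytes per pixel, and 1 MB = 10^6 bytes are assumptions made for illustration. Latency and protocol overhead are ignored.

```python
# Bandwidths in MB/s as listed on the slide.
INTERCONNECT_MB_PER_S = {
    "GigE": 100,
    "Myrinet 2000": 245,
    "SCI/Dolphin": 326,
    "Quadrics": 340,
}


def frame_transfer_ms(bandwidth_mb_per_s, width=1024, height=768, bpp=4):
    """Milliseconds to send one uncompressed frame at the given bandwidth."""
    frame_bytes = width * height * bpp
    return frame_bytes / (bandwidth_mb_per_s * 1e6) * 1e3


for name, bw in INTERCONNECT_MB_PER_S.items():
    print(f"{name}: {frame_transfer_ms(bw):.1f} ms per frame")
```

Even at the fastest rate listed, a full frame costs several milliseconds per hop, so compositing traffic quickly eats into an interactive frame budget.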

Low End

General Definition
  – Single CPU
  – Consumer mainboard
  – Integrated graphics
  – High speed commodity network

Example Node Configuration
  – Nvidia NForce2
  – AMD Athlon 2400+
  – 512 MB DDR
  – GigE and 10/100
  – 1U rack chassis
  – Estimated Price: $1500

Bottleneck Evaluation – Low End

Bus/Network limited

[Bar chart: Throughput (axis 0 to 1000) for Read/Draw, Network, Bus, Graphics, and Processor]

Mid End

General Definition
  – Dual processor
  – “Workstation” mainboard
  – High performance bus
      • 64-bit PCI or PCI-X
  – High speed commodity / low end cluster interconnect
  – High-end consumer graphics board

Example Node Configuration
  – Intel i860
  – Dual Intel P4 Xeon 2.4 GHz
  – 2 GB RDRAM
  – ATI Radeon 9700
  – GigE onboard + Myrinet 2000
  – 2U rack chassis
  – Estimated Price: $4000

Bottleneck Evaluation – Mid End

Sort-First: Network limited
Sort-Last: Read/Draw and Network limited

[Bar chart: Throughput (axis 0 to 1000) for Read/Draw, Network, Bus, Graphics, and Processor]

High End

General Definition
  – Dual or quad processor
  – Cutting edge bus
      • PCI-X, HyperTransport, PCI Enhanced
  – High speed commodity / high end cluster interconnect
  – “Professional” graphics board
  – RAID system

Example Node Configuration
  – ServerWorks GC-WS
  – Dual P4 Xeon 2.6 GHz
  – Nvidia Quadro4 900XGL
  – 4 GB DDR
  – GigE onboard + Infiniband
  – Estimated Price: $7500

Bottleneck Evaluation – High End

Sort-First: Well balanced
Sort-Last: Read/Draw limited

[Bar chart: Throughput (axis 0 to 1000) for Read/Draw, Network, Bus, Graphics, and Processor]

Balanced System is Key

Only as fast as slowest component
  – Spend money where it matters!

[Bar chart: Throughput (axis 0 to 1000) for Network, Bus, Graphics, and Processor, comparing the High End, Mid End, Stanford, and Low End configurations]
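
The "only as fast as slowest component" rule can be written as a one-line model: the sustained throughput of a pipeline is the minimum over its stages. The stage values below are hypothetical placeholders in a common unit, not numbers read from the chart.

```python
def bottleneck(stages):
    """Return (name, throughput) of the slowest stage in the pipeline."""
    name = min(stages, key=stages.get)
    return name, stages[name]


# Hypothetical per-stage throughputs in a common unit (e.g. MB/s).
node = {"processor": 900, "graphics": 800, "bus": 300, "network": 100}
print(bottleneck(node))                       # the network limits the node

faster_gpu = dict(node, graphics=1600)
print(bottleneck(faster_gpu))                 # unchanged: money poorly spent

faster_net = dict(node, network=340)
print(bottleneck(faster_net))                 # the bus is now the limit
```

Upgrading a stage that is not the bottleneck leaves throughput unchanged, while upgrading the bottleneck simply exposes the next slowest stage.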

Goals for Next Cluster

Performance
  – Sort-Last
      • 5 GVox/s
      • 1 GTri/s
  – Sort-First at 4096x2304
      • Quake3 @ >100 fps

Research
  – Remote visualization
  – Time-varying datasets
  – Compositing

What we plan to build

16 node cluster, 1U nodes

Mainboard chipsets
  – Intel Placer
  – ServerWorks GC-WS
  – AMD Hammer

Memory
  – 2-4 GB

Graphics Chip
  – Nvidia NV30
  – ATI R300/350

Interconnect
  – Infiniband, Quadrics

Disk
  – IDE RAID or SCSI

Continuing Chipset Issues

Why do chipsets perform so poorly?
  – “Workstation”
      • Intel i860
          – 215 MB/s read (40% of theoretical)
          – 300 MB/s write (56% of theoretical)
      • AMD 760MPX
          – 300 MB/s read (56% of theoretical)
          – 312 MB/s write (59% of theoretical)
  – “Server”
      • ServerWorks ServerSet III LE
          – 423 MB/s read (79% of theoretical)
          – 486 MB/s write (91% of theoretical)

Why can’t a “server” have an AGP slot?

Performance numbers from http://www.conservativecomputer.com

Ongoing Bottlenecks

Readback performance
  – Will be fixed “soon”
  – Hardware compositing?

Chipset Performance
  – Achieve fraction of theoretical
  – Need faster buses in commodity chipsets

Network Performance
  – Scalability
  – Fast is VERY expensive

Conclusions

What we still need
  – More vendors
  – More chipsets
  – More performance

Graphics Clusters are getting better
  – Chipsets
  – Interconnects
  – Form factor
  – Processing
  – Graphics Chips

Things are really starting to get interesting!