Reconfigurable Computing: HPC Network Aspects



Mitch Sukalski (8961)

David Thompson (8963)

Craig Ulmer (8963), cdulmer@sandia.gov

Pete Dean R&D Seminar, December 11, 2003

FPGAs are promising…

But what’s the catch?

There are three main challenges that must be addressed before FPGAs can be applied to practical scientific computing.

RC Challenge #1: Floating Point

• Most FPGAs are fine grained

• Floating-point units are large

– 32b FP occupies ~1,000 CLBs

– Commercial capacity is improving

• 2000: 6,000 CLBs

• 2003: 40,000 CLBs (max: 220,000)

• Keith Underwood at Sandia/NM

– LDRD: working on high-speed 64b floating-point cores

[Figure: 32b FP unit in a Xilinx V2P7]
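A back-of-envelope reading of the numbers above (an estimate only, not a synthesis result): dividing device capacity by the ~1,000-CLB cost of a 32b FP unit bounds how many units fit on one chip.

```c
/* Rough capacity estimate from the slide's figures: one 32b
 * floating-point unit occupies roughly 1,000 CLBs, so device
 * capacity caps the number of FP units per FPGA. */
int fp32_units_that_fit(int device_clbs)
{
    const int clbs_per_fp32 = 1000;   /* approx. cost from the slide */
    return device_clbs / clbs_per_fp32;
}
```

By this estimate a 2000-era 6,000-CLB part holds only ~6 such units, a 2003-era 40,000-CLB part ~40, and the largest 220,000-CLB part ~220 — which is why dense FP cores matter.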

RC Challenge #2: Design Tools

• Hardware design is non-trivial

– Micromanage computations, clock by clock

– Not appropriate for most scientists

– Need languages and APIs that are easy to use

• Maya Gokhale at LANL

– Streams-C: C-like language for HW design

– Pipelines/unrolls loops

– Schedules access to external memory
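The loop-level transformations such a tool automates can be illustrated in plain C. The hand-unrolled version below is only a sketch of what unrolling/pipelining does to a loop (it is not Streams-C syntax; the function names are ours):

```c
/* Baseline loop: one element per iteration. */
void scale_simple(const int *in, int *out, int n, int k)
{
    for (int i = 0; i < n; i++)
        out[i] = in[i] * k;
}

/* Unrolled by four: in hardware, the four independent multiplies
 * can run in parallel or be pipelined, one result per clock. */
void scale_unrolled4(const int *in, int *out, int n, int k)
{
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        out[i]     = in[i]     * k;
        out[i + 1] = in[i + 1] * k;
        out[i + 2] = in[i + 2] * k;
        out[i + 3] = in[i + 3] * k;
    }
    for (; i < n; i++)          /* remainder elements */
        out[i] = in[i] * k;
}
```

A tool in this vein performs such rewrites automatically and also schedules the external-memory accesses the loop implies, which is the part scientists least want to do by hand.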

RC Challenge #3: High-speed I/O

• FPGAs have large internal computational power

– How do we get data into/out of the FPGA?

– How do we connect to our existing HPC machines?

• Mitch Sukalski, David Thompson, Craig Ulmer

– LDRD: connect FPGAs to high-performance SANs


Outline

• Where we have been: networking FPGAs using external NI cards

• Where we are going: networking FPGAs using internal transceivers

• Project status: early details

Previous Work

Where we’ve been…

Networking Earlier FPGAs

• Previous-generation FPGAs were like blank ASICs

– Configurable logic and pins

• Attach a network card to an FPGA card

– Communication over PCI

• Examples:

– Virginia Tech: Myrinet

– Washington U. in St. Louis: ATM (inline)

– Clemson University: Gigabit Ethernet

– Georgia Tech: Myrinet

[Diagram: a PC with CPU, FPGA card, and NIC attached to the PCI bus]

GRIM Project at Georgia Tech

• Add multimedia devices to cluster

– Message layer connects CPUs, memory, and peripherals

– Myrinet between hosts, PCI within hosts

• Celoxica RC-1000 FPGA card

– Virtex FPGA (1M logic gates)

– Four SRAM banks

– PCI w/ PMC

[Diagram: GRIM cluster — CPUs, FPGAs, a RAID, and Ethernet joined by a control and switching fabric; the RC-1000 card pairs a PCI FPGA with four SRAM banks (SRAM 0–3)]

FPGA Organization

[Diagram: FPGA organization — a Frame wraps the board with incoming and outgoing message queues, a communication library API, a memory API for FPGA card memory holding application data, and a user circuit API serving a circuit canvas that hosts user circuits 1…n]

Lessons Learned

• Frame provides a simple OS

– Isolates users from the board

– Portability

• Dynamically manage resources

– Card memory

– Computational circuits

• PCI bottleneck

– Distance between NI and FPGA

– PCI difficult to work with

[Diagram: fault handling — the host CPU sends the message "Use Circuit F on $C0000000"; Circuit F is not loaded, so a function fault configures it onto the FPGA, and a page fault copies Page C into card SRAM before the circuit runs]
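The fault-driven management shown in that diagram fits in a few lines. The sketch below is a hypothetical reconstruction (the type and function names are ours, not GRIM's): when a message names a circuit or page that is not resident on the card, the frame faults it in before dispatching the work.

```c
#include <stdbool.h>

enum { MAX_CIRCUITS = 8, MAX_PAGES = 16 };

/* Hypothetical frame bookkeeping: which circuits are configured
 * and which memory pages are resident in card SRAM. */
typedef struct {
    bool circuit_loaded[MAX_CIRCUITS];
    bool page_resident[MAX_PAGES];
    int  function_faults;
    int  page_faults;
} frame_state;

/* Handle "use circuit C on page P": fault in whatever is missing,
 * then (in a real frame) dispatch the computation. */
void handle_message(frame_state *fs, int circuit, int page)
{
    if (!fs->circuit_loaded[circuit]) {     /* function fault */
        fs->function_faults++;
        fs->circuit_loaded[circuit] = true; /* (re)configure circuit */
    }
    if (!fs->page_resident[page]) {         /* page fault */
        fs->page_faults++;
        fs->page_resident[page] = true;     /* DMA page into card SRAM */
    }
    /* ...run the circuit on the now-resident resources... */
}
```

Repeated requests for the same circuit and page hit the resident copies and incur no further faults, which is what makes the card's limited SRAM and circuit slots usable as a cache.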

Network Features of Recent FPGAs

Where we’re going…

FPGA Network Improvements

• Recent FPGAs have special built-in cores

– High-speed transceivers, dedicated processors

• Idea: build our NI inside the FPGA

– FPGA becomes a networked compute resource

– Removes the PCI bottleneck

[Diagram: an FPGA with on-chip NI transmit/receive ports and user-defined computational circuits attached directly to a system area network, alongside conventional CPU+NIC hosts]

Xilinx Virtex-II/Pro FPGA

• Up to four PowerPC 405 cores

– Embedded version of the PPC

– 300–400 MHz

• Multiple gigabit transceivers

– Run at 600 Mbps to 3.125 Gbps

– Up to twenty-four transceivers

• Additional cores

– Distributed internal memory

– Arrays of 18b multipliers

– Digital clock multipliers, PLLs

[Figure: Xilinx V2P20]

Multi-Gigabit Transceivers: Rocket I/O

• Flexible, high-speed transceivers

– Can be configured to connect with different physical layers

– InfiniBand, GigE, FC, 10GigE, Aurora

– Note: low-level interface (commas, disparity, clock mismatches)
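One consequence of these physical layers worth noting: InfiniBand, GigE, and FC use 8B/10B line coding, while 10GigE uses 64B/66B, so payload bandwidth is below the raw line rate. A quick sketch of the arithmetic:

```c
/* 8B/10B coding (InfiniBand, GigE, FC): 8 payload bits
 * per 10 line bits. */
double payload_8b10b(double line_gbps)
{
    return line_gbps * 8.0 / 10.0;
}

/* 64B/66B coding (10GigE): 64 payload bits per 66 line bits.
 * Note 10.3125 Gbps x 64/66 = exactly 10 Gbps. */
double payload_64b66b(double line_gbps)
{
    return line_gbps * 64.0 / 66.0;
}
```

So a 3.125 Gbps Rocket I/O lane carries 2.5 Gbps of payload under 8B/10B, and the 10.3125 Gbps rate quoted later for Virtex-II Pro X is exactly the 10GigE line rate for 10 Gbps of payload.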

[Diagram: Rocket I/O transceiver — transmit path from the FPGA fabric through a Tx FIFO, CRC, 8B/10B encoder, and serializer to differential pins (PIN+/PIN−); receive path through a deserializer, clock recovery, CRC check, 8B/10B decoder, and Rx elastic buffer back to the fabric; multiple Rocket I/O blocks share the fabric]
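The "disparity" in the low-level interface above refers to 8B/10B running-disparity bookkeeping, which the link layer must track. Every legal 10-bit symbol carries four, five, or six 1s, and the encoder chooses between two symbol variants to keep the running disparity (RD) at +1 or −1. The sketch below shows only that RD update rule, not a full 8B/10B codec:

```c
/* Count the 1 bits in the low 10 bits of a symbol. */
static int popcount10(unsigned sym)
{
    int ones = 0;
    for (int b = 0; b < 10; b++)
        ones += (sym >> b) & 1;
    return ones;
}

/* New running disparity after transmitting sym (rd is +1 or -1). */
int update_rd(int rd, unsigned sym)
{
    int d = 2 * popcount10(sym) - 10;  /* -2, 0, or +2 for legal symbols */
    if (d > 0) return +1;
    if (d < 0) return -1;
    return rd;                          /* neutral symbol leaves RD alone */
}
```

A six-ones symbol drives RD positive, a four-ones symbol drives it negative, and a balanced five-ones symbol leaves it unchanged; this, plus comma alignment and clock-mismatch compensation, is the kind of detail the raw MGT interface pushes onto the designer.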

Why MGTs are Important

• Direct connection to networks

– Same chip, different network

– Removes PCI from the equation

• Fast connections between FPGAs

– Reduces analog design issues

– Chain FPGAs together

– Reduces pin count

• Update: Virtex-II Pro X

– Now 2.488–10.3125 Gbps

– Chips have either 8 or 20 transceivers

3.125 Gbps over 44” FR4 *

* From Xilinx, http://www.xilinx.com/products/virtex2pro/mgtcharacter.htm

Hard PowerPC Core

• PowerPC 405

– 16KB instruction / 16KB data caches

– Real and virtual memory modes

– GCC is available

• Multiple memory ports for the core

– On-chip memory (OCM)

– Processor Local Bus (PLB)

• User-defined memory map

– Connect memory blocks or cores

– External memory cores available

[Diagram: PowerPC 405 core with I-cache and D-cache, connected to the Processor Local Bus (PLB) and the On-Chip Memory (OCM) interface]
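Software on the PPC405 typically reaches a core placed in the user-defined memory map through plain memory-mapped I/O. A minimal sketch, assuming a hypothetical core with a made-up three-register layout (the base address and registers here are illustrations, not from the slides):

```c
#include <stdint.h>

/* Hypothetical register layout for a user circuit mapped into
 * the PPC405 address space. volatile keeps the compiler from
 * caching or reordering the device accesses. */
typedef struct {
    volatile uint32_t ctrl;    /* write 1 to start the circuit       */
    volatile uint32_t status;  /* bit 0 set when the result is ready */
    volatile uint32_t data;    /* result register                    */
} user_core_regs;

#define USER_CORE_BASE 0xC0000000u  /* hypothetical mapping */

/* Kick off the circuit, busy-wait for completion, read the result. */
uint32_t run_core(user_core_regs *regs)
{
    regs->ctrl = 1;
    while ((regs->status & 1u) == 0)
        ;                       /* poll; a real driver might sleep */
    return regs->data;
}
```

On real hardware one would call `run_core((user_core_regs *)USER_CORE_BASE)` after placing the core at that address in the memory map; the same pattern works over either the OCM or PLB port.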

System on a Chip (SoC)

• Commercial SoC

– Designing with cores

– Customize the system

• New tools

– Rapidly connect cores

– Library of cores & buses

– Saves on wiring legwork

[Figure: Xilinx Platform Studio]

Current Status

• Exploring the V2P

– New architecture, new tools

• Two reference boards

– ML300 (V2P7-6)

– Avnet (V2P20-6)

• Transceiver work

– Raw transmission over fiber

– Working towards IB

http://cdulmer.ran.sandia.gov
