Why FPGAs will win the Accelerator Battle: Building ...rssi.ncsa.illinois.edu/docs/industry/Nallatech_presentation.pdf · FPGA VHDL Code Base 10+ Years » Cell, ClearSpeed, others


Commercial In Confidence. Copyright ©2007, Nallatech.

Why FPGAs will win the Accelerator Battle: Building Computers that Minimize Data Movement

Allan Cantle – President & Founder

www.nallatech.com

Overview

» Commercial Realities For HPC

» A View From Berkeley

» Two Fundamental Computing Architectures

    » Processor Centric – Data is brought to the Processor

    » Data Centric – Processors are built around the Data

» Mixing Processor/Data Centric Architectures

» Data Centric Computing Architectures

» Integrating Processor & Data Centric Architectures

» Summary

Commercial Realities for HPC

[Figure: accelerator technologies plotted by market size in $M, on a scale of 0.1, 1, 10, 100, 1,000 and 10,000 running from unviable to viable. Technologies shown: FPGAs, x86, Cell, Clearspeed, GPUs (as graphics accelerator), GPGPUs. They are grouped into "Code base readily available and portable between architecture generations" and "Code base NOT readily available and NOT portable between architecture generations". The legend distinguishes Single Source Vendor from Multi Source Vendors.]

*Market size based on processor chip revenues only

Commercial Realities - Technology Battleground

[Figure: technologies plotted by Application Data Types (Symbolic, Vector/Streaming, Bit Level) against SWEPT Efficiency (Size, Weight, Energy, Performance, Time). Labels: Processor; FPGA; 1993 to 1998; Growing FPGA Families (Distributed Memory, Multipliers/DSPs, Gb Serial I/O); Cell / Multi Core; Reconfigurable Processors, e.g. Clearspeed & GPUs; Polymorphic.]

Polymorphic: A processor that is capable of morphing itself into the most efficient implementation for any given computing problem

Commercial Realities - Technology Battleground

[Figure: the same chart of Application Data Types (Symbolic, Vector/Streaming, Bit Level) against SWEPT Efficiency (Size, Weight, Energy, Performance, Time). Labels: Processor; FPGA; 1993 to 1998; Growing FPGA Families (Distributed Memory, Multipliers/DSPs, Gb Serial I/O); Reconfigurable Processors, e.g. Clearspeed & GPUs; Cell / Multi-Core.]

» FPGA Technologies building software on an inherently parallel architecture

» Incumbent Technologies building parallel architectures that can still support inherently serial software

» Technologies parallelising Hardware & Software Simultaneously

Commercial Realities - The Changing Face of HPC

» HPC’s traditional markets

    » Scientific Modelling – Largest in CFD / FEA

    » Typically Symmetrical problems

    » Non Real Time (Non Deterministic)

» Growing HPC Markets

    » Triple Play – Voice, Video & Data

    » Internet

    » Centralisation of Application Servers – Back to the Mainframe

    » Virtualisation

    » Deterministic & Real Time – e.g. VOIP

» Some of the new HPC challenges have already been solved in the HPEC world

A View from Berkeley

Two Fundamental Computing Models

» Processor Centric – HPC World

    » Von Neumann & Harvard Architectures

    » Invented when the processor was the bottleneck

    » Hub and Spoke approach

» Data Centric – HPEC World

    » System Level Data Centric

    » Chip Level Data Centric

    » Distributed processing approach

    » Minimise Data Movement

Today’s Processor Centric Architecture (Intel)

[Diagram: block diagram of an Intel platform. Blocks: Quad Core Xeon Processor (x4), FSB, MCH / Northbridge Chipset, SDRAM Memory DIMMs, Southbridge Chipset, PCIe x8 (x3). Bandwidth labels: 21 GB/s, 8.5 GB/s (x2), 2+2 GB/s (x3). Attached I/O: Storage, USB, Graphics, Network, Accelerators, 1394, Wireless.]

» OS Dependent & Non Deterministic

» Leads to High Latency & hence the need for fully decoupled accelerators & I/O
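
The bandwidth labels on this diagram make the cost of data movement easy to estimate. The sketch below is a back-of-the-envelope model, not a measurement. It assumes that 21 GB/s is the memory link, 8.5 GB/s is one FSB, and 2+2 GB/s is one PCIe x8 link (2 GB/s in each direction); that mapping is one reading of the diagram, not something the slide states. The function names and the example workload are hypothetical.

```python
# Link bandwidths in GB/s, taken from the diagram labels.
# Which label belongs to which link is an assumption.
MEMORY_GB_S = 21.0   # assumed: MCH to SDRAM DIMMs
FSB_GB_S = 8.5       # assumed: one front side bus
PCIE_X8_GB_S = 2.0   # assumed: one PCIe x8 link, per direction


def transfer_seconds(gigabytes: float, gb_per_s: float) -> float:
    """Ideal time to move a payload over a link (no latency, no protocol overhead)."""
    return gigabytes / gb_per_s


def offload_seconds(cpu_seconds: float, speedup: float,
                    gb_in: float, gb_out: float) -> float:
    """Time to ship data to a PCIe accelerator, compute there, and ship results back.

    Transfers are modelled as not overlapping with compute, which is pessimistic.
    """
    return (transfer_seconds(gb_in, PCIE_X8_GB_S)
            + cpu_seconds / speedup
            + transfer_seconds(gb_out, PCIE_X8_GB_S))


if __name__ == "__main__":
    links = [("memory", MEMORY_GB_S), ("FSB", FSB_GB_S), ("PCIe x8", PCIE_X8_GB_S)]
    for name, bandwidth in links:
        print(f"1 GB over {name}: {transfer_seconds(1.0, bandwidth) * 1000:.1f} ms")
    # Hypothetical kernel: 1 s on the CPU, 10x faster on the accelerator, 1 GB each way.
    print(f"offloaded total: {offload_seconds(1.0, 10.0, 1.0, 1.0):.2f} s")
```

In this hypothetical case the kernel runs ten times faster on the accelerator, yet the two 1 GB transfers alone take as long as the original CPU run, so the offload is a net loss. That is the data movement argument in miniature.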

Mixing Processor/Data Centric Architectures

[Figure: chart of Application Data Types (Symbolic, Vector/Streaming, Bit Level) against SWEPT Efficiency (Size, Weight, Energy, Performance, Time). Regions: Bit level processor; Symbolic processor; Vector / Streaming processors; Parallelised Bit level processors; Parallelised Vector / Streaming processors; Parallelised Symbolic processors. The legend marks each region as Processor Centric or Data Centric, with shares of 33% and 66%.]

Are Data Centric Architectures the future of Computing?!

Mixing Processor/Data Centric Architectures

[Figure: chart of Application Data Types (Symbolic, Vector/Streaming, Bit Level) against SWEPT Efficiency (Size, Weight, Energy, Performance, Time). Regions: Bit level processor; Vector / Streaming processors; Parallelised Bit level processors; Parallelised Vector / Streaming processors; Parallelised Symbolic processors. Annotations: X86 Code Base 25+ Years; FPGA VHDL Code Base 10+ Years.]

» Cell, ClearSpeed, others have a 1-2 year code base

» Complex Heterogeneous Architectures will provide significant challenges on code portability

    » FPGA code is very portable

    » X86 architectures are where we are inherently porting from

The FPGA can effectively concatenate functions

» Minimises Memory Thrashing

    » Critical Bottleneck in today’s Architectures

    » Requires Data to be ordered

» The Microprocessor almost becomes the Coprocessor

    » When viewed from a “Data Centric Perspective”

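[Diagram: an FPGA sits between the Internet Infrastructure and the uProcessor. Function blocks on the FPGA: Ethernet protocol, Decrypt Data, Un Compress, XML Decode, Function Hot Spot, XML Encode, Compress Data, Encrypt Data, Ethernet protocol. Other blocks: System Memory, uProcessor (Main Algorithmic Function). Link labels: FSB/PCI, FSB.]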
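
The memory traffic saved by concatenating functions can be illustrated with a toy software model. The sketch below is only an analogy for what the FPGA does in hardware. A single-byte XOR, `zlib` and a UTF-8 decode stand in for the Decrypt, Un Compress and XML Decode blocks; all function names are hypothetical; and the count of bytes held as intermediate results is used as a rough proxy for memory traffic.

```python
import zlib

KEY = 0x5A  # hypothetical single-byte XOR key, a stand-in for real decryption


def decrypt(frame: bytes) -> bytes:
    return bytes(b ^ KEY for b in frame)


def uncompress(frame: bytes) -> bytes:
    return zlib.decompress(frame)


def decode(frame: bytes) -> str:
    return frame.decode("utf-8")


STAGES = (decrypt, uncompress, decode)


def staged(frames):
    """Run each stage over the whole batch, buffering every intermediate result."""
    buffered = 0
    batch = list(frames)
    for stage in STAGES:
        batch = [stage(f) for f in batch]
        buffered += sum(len(f) for f in batch)
    return batch, buffered


def concatenated(frames):
    """Chain the stages per frame, so only the final result is buffered."""
    out = []
    for frame in frames:
        for stage in STAGES:
            frame = stage(frame)
        out.append(frame)
    return out, sum(len(f) for f in out)


def make_frame(text: str) -> bytes:
    """Build a test frame: compress, then XOR (XOR is its own inverse)."""
    return decrypt(zlib.compress(text.encode("utf-8")))


if __name__ == "__main__":
    frames = [make_frame("<msg>%d</msg>" % i) for i in range(4)]
    print("staged bytes buffered:      ", staged(frames)[1])
    print("concatenated bytes buffered:", concatenated(frames)[1])
```

Both functions return the same decoded messages. The concatenated version never stores the decrypted or uncompressed intermediates, which is the effect the slide attributes to chaining functions inside the FPGA.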

A Generic Data Centric Computing Architecture

[Diagram: four FPGA Data Centric Processors and four X86 Symbolic Processors, with Internet and Multimedia connections.]

» Main Processor & Co-Processor?! or Peer Processors?

» System Partitioned to minimise comms bandwidth
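
"System Partitioned to minimise comms bandwidth" can be read as a graph-cut problem: place each function on the FPGA side or the x86 side so that the traffic crossing the boundary is as small as possible. The sketch below is a brute-force illustration under invented assumptions. The task names echo the function chain from the previous slide, but the traffic figures (in MB/s) and the pinning of `ethernet` to the FPGA and `main_algorithm` to the x86 are hypothetical, and exhaustive search is only practical for a handful of tasks.

```python
from itertools import combinations

# Hypothetical task graph: edge weights are data rates in MB/s between functions.
TRAFFIC = {
    ("ethernet", "decrypt"): 100,
    ("decrypt", "uncompress"): 100,
    ("uncompress", "xml_decode"): 400,
    ("xml_decode", "main_algorithm"): 20,
}
TASKS = sorted({task for edge in TRAFFIC for task in edge})


def cut_bandwidth(fpga_tasks) -> int:
    """Total traffic on edges that cross the FPGA / x86 boundary."""
    return sum(rate for (a, b), rate in TRAFFIC.items()
               if (a in fpga_tasks) != (b in fpga_tasks))


def best_partition(pinned_fpga: set, pinned_cpu: set):
    """Try every assignment of the unpinned tasks and keep the cheapest cut."""
    free = [t for t in TASKS if t not in pinned_fpga | pinned_cpu]
    best = None
    for size in range(len(free) + 1):
        for extra in combinations(free, size):
            fpga = frozenset(pinned_fpga | set(extra))
            cost = cut_bandwidth(fpga)
            if best is None or cost < best[1]:
                best = (fpga, cost)
    return best


if __name__ == "__main__":
    fpga, cost = best_partition({"ethernet"}, {"main_algorithm"})
    print("on FPGA:", sorted(fpga))
    print("crossing traffic (MB/s):", cost)
```

With these invented figures the cheapest cut keeps the whole protocol chain on the FPGA and hands only the decoded data to the x86, because the boundary then crosses the lowest-rate edge.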

We are entering a Gigabit serial comms world!

» All Processor I/O will be Gigabit serial

» Phy layer will be compatible across memory, system interconnect and I/O

» Software Configurable

» Possibility to choose the architecture that best suits your application.

[Diagrams: three example configurations, labelled Workstation, Database Search and Network processing, each built from Processor(s), Memory, I/O and Storage blocks. Legend: each connection = 4 x 5Gb bidirectional serial links.]
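
The "4 x 5Gb" link bundle in the legend converts to bytes per second as follows. This is a sketch under one stated assumption: the slide does not say which line coding the links use, so the default here is 8b/10b coding (10 bits on the wire for every 8 bits of payload), as used by first-generation PCIe. The function name is hypothetical.

```python
def payload_gbytes_per_s(lanes: int, gbit_per_lane: float,
                         encoding_efficiency: float = 8 / 10) -> float:
    """Usable bandwidth per direction, in GB/s, for a bundle of serial lanes.

    encoding_efficiency defaults to 8b/10b line coding, which is an assumption.
    """
    raw_gbit = lanes * gbit_per_lane
    return raw_gbit * encoding_efficiency / 8  # 8 bits per byte


if __name__ == "__main__":
    print("4 x 5 Gb/s bundle:", payload_gbytes_per_s(4, 5.0), "GB/s per direction")
    print("PCIe x8 at 2.5 Gb/s:", payload_gbytes_per_s(8, 2.5), "GB/s per direction")
```

Under the 8b/10b assumption both bundles carry 20 Gb/s raw and about 2 GB/s of payload in each direction, which is consistent with the 2+2 GB/s figure the deck quotes for PCIe x8.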

X86 Architecture Migration

[Diagram: block diagram of the Intel platform. Blocks: Quad Core Xeon Processor (x4), FSB, MCH / Northbridge Chipset, SDRAM Memory DIMMs, Southbridge Chipset, PCIe x8 (x3). Bandwidth labels: 21 GB/s, 8.5 GB/s (x2), 2+2 GB/s (x3). Attached I/O: Storage, USB, Graphics, Network, Accelerators, 1394, Wireless.]

» Interfaces already Gbit Serial

» Geneseo PCIe extensions resolving Latency, Determinism, small & large packet Bandwidth issues (2010)

Intel & Nallatech – Bringing Processor & Data Centric Architectures together

[Diagram: block diagram of the Intel platform. Blocks: Quad Core Xeon Processor (x4), FSB, MCH / Northbridge Chipset, SDRAM Memory DIMMs, Southbridge Chipset, PCIe x8 (x3). Bandwidth labels: 21 GB/s, 8.5 GB/s (x2), 2+2 GB/s (x3).]

Intel & Nallatech – Bringing Processor & Data Centric Architectures together

[Diagram: the same Intel platform block diagram (Quad Core Xeon Processors, FSB, MCH / Northbridge Chipset, SDRAM Memory DIMMs, Southbridge Chipset, PCIe x8 links; bandwidth labels 21 GB/s, 8.5 GB/s, 2+2 GB/s), now with a Virtex 5 FPGA (LXT110 - LXT330) added and labelled "Up to 5+5 GB/s".]

» Nallatech’s Data Centric FPGA Computing Platform – DIME-II

» Intel’s Processor Centric MP Xeon Platform

Summary

» Only Two Fundamental Computing Approaches

    » Processor Centric – x86 Microprocessor

    » Data Centric – FPGAs

» HPC Can Learn from HPEC

» FPGAs are by far the best all round accelerator

    » Commercial viability

    » Multi Vendor

    » Application flexibility

    » Extremely complementary to Microprocessors

» Nallatech & Intel working together to deliver balanced FPGA/x86 architectures in early Q1/2008

All rights reserved. Nallatech, the Nallatech logo, the triangles device and “The High Performance FPGA Solutions Company” are trademarks of Nallatech Limited. All other trademarks acknowledged.

www.nallatech.com