17
Frank Vahid, 1 Embedding-Based Placement of Processing element Networks on FPGAs for Physical Model Simulation Bailey Miller*, Frank Vahid*, Tony Givargis** *Univ. of California, Riverside **Univ. of California, Irvine Supported by the NSF (CNS1016792, CPS1136146), and Semiconductor Research Corporation (GRC 2143.001) Intern at SpaceX

Embedding-Based Placement of Processing element Networks on FPGAs for Physical Model Simulation

  • Upload
    jereni

  • View
    27

  • Download
    0

Embed Size (px)

DESCRIPTION

Embedding-Based Placement of Processing element Networks on FPGAs for Physical Model Simulation. Intern at SpaceX. Bailey Miller*, Frank Vahid*, Tony Givargis** *Univ. of California, Riverside **Univ. of California, Irvine. - PowerPoint PPT Presentation

Citation preview

Page 1: Embedding-Based Placement of Processing element Networks on FPGAs for Physical Model Simulation

Frank Vahid, 1

Embedding-Based Placement of Processing element Networks on

FPGAs for Physical Model Simulation

Bailey Miller*, Frank Vahid*, Tony Givargis**

*Univ. of California, Riverside**Univ. of California, Irvine

Supported by the NSF (CNS1016792, CPS1136146), and Semiconductor Research Corporation (GRC 2143.001)

Intern at SpaceX

Page 2: Embedding-Based Placement of Processing element Networks on FPGAs for Physical Model Simulation

Bailey Miller, UCR 2

Page 3: Embedding-Based Placement of Processing element Networks on FPGAs for Physical Model Simulation

Frank Vahid, UCR 3

Models of physical world that run in real-time

http://www.nhlbi.nih.gov/

Test cyber-physical systems

Physical mockup

Transducermodels

Environmentmodel

Digital mockup

Page 4: Embedding-Based Placement of Processing element Networks on FPGAs for Physical Model Simulation

Issue: Real-time achieved via inaccuracy

Frank Vahid, UCR 4

Weibel lung complexity4 gen: 32 ODEs6 gen: 128 ODEs8 gen: 512 ODEs10 gen: 2048 ODEs

“2-3 minutes to simulate one breath accurately”

V[1],R[1]

V[2],R[2]V[7],R[7]

Page 5: Embedding-Based Placement of Processing element Networks on FPGAs for Physical Model Simulation

Frank Vahid, UCR 5

0100200300400500600700800900

1000

Weib

el

Neuron

Weib

el + g

as

Hemod

ynam

ic

Weib

el + h

emo

Per

form

ance

(ms)

PC(1)PC(4)GPUHLS / FPGAs

Speedup vs real-time

PC(1): 0.8xPC(4): 3.1xGPU: 1.6xHLS/FPGA: 3.2x

Earlier results14301490 1522

1184

Parallel computations + • Neighbor communication Seem like great match for FPGAs

Page 6: Embedding-Based Placement of Processing element Networks on FPGAs for Physical Model Simulation

Frank Vahid, UCR 6

Network of synchronized PEs on FPGAs

FPGA

Digital mockup

General Processing Element• Iterative ODE solver (Euler/RK4) 0.1 ms / 0.01 ms timestep

PEPE

PE 1 PE: 300 MHz

Page 7: Embedding-Based Placement of Processing element Networks on FPGAs for Physical Model Simulation

Frank Vahid, UCR 7

0100200300400500600700800900

1000

Weib

el

Neuron

Weib

el + g

as

Hemod

ynam

ic

weibel

+ hem

o

Per

form

ance

(ms)

PC(1)PC(4)GPUHLSGeneral PEs

Speedup vs real-timePC(1): 0.8xPC(4): 3.1xGPU: 1.6xHLS: 3.2xGeneral PE: 4.9x

Earlier results1430

1490 1522

1184

Page 8: Embedding-Based Placement of Processing element Networks on FPGAs for Physical Model Simulation

Frank Vahid, UCR 8

Current work Problem: More PEs Lower frequency

FPGA

Inter-PE critical path

FPGA

DSP

INST MEM

DATA MEM

Internal PE critical path

0100200300400

0 200 400 600Freq

uenc

y (M

Hz)

# PEs

Frequency vs. PE network size

11-gen Weibel model, Virtex6 240T FPGA, general PEs

0

500

1000

1500

2000

1 10 20 50 100 200 400

# PEs

kODE

s/se

c

Real ODEs/sec

Lost ODEs/sec due to freq drop

Page 9: Embedding-Based Placement of Processing element Networks on FPGAs for Physical Model Simulation

Earlier approach

Frank Vahid, UCR 9

10K iterations

150K iterations

Convert virtual PEs to physical circuits using

FPGA place-route

1

2

Phase

Maps ODEs to virtual PEs using

simulated annealing

Page 10: Embedding-Based Placement of Processing element Networks on FPGAs for Physical Model Simulation

Frank Vahid, UCR 10

FPGA

This work: Main idea

Graph embedding: Map guest graph to host graph, minim. max wire length

Guest

Host

Virtual PEs

Physical PEs

Use physical model structure to avoid using FPGA placement (Phase 2)

Page 11: Embedding-Based Placement of Processing element Networks on FPGAs for Physical Model Simulation

Frank Vahid, UCR 11

FPGA

12 3

FPGA FPGA

Phase 2 – Map virtual PEs to physical PEs

Embedding algorithm

H-tree embedding Linear embedding Direct map embedding

Guest

Host

[1] Zienicke, P. 1990. Embeddings of Treelike Graphs into 2-Dimensional Meshes. (WG '90).[2] Aleliunas, R., and Rosenberg, A.L. 1982. On Embedding Rectangular Grids in Square Grids. (Computers ‘82).[3] Berman, F., and Snyder, L. 1987. On mapping parallel algorithms into parallel architectures, (PDC, ‘87).

Page 12: Embedding-Based Placement of Processing element Networks on FPGAs for Physical Model Simulation

2D grid of physical PEs

Frank Vahid, UCR 12

EqP1EqV1

EqP2EqV2

EqP3 EqV3

EqP4EqV4

EqP7EqV7

EqP5EqV5

EqP6EqV6

FPGA

Bypass FPGA placement

EqP1EqV1

EqP4EqV4

EqP2EqV2

EqP5EqV5

EqP2EqV2

EqP6EqV6

EqP7EqV7

(Phase 1: May require "graph folding" first to reduce #PEs)

Page 13: Embedding-Based Placement of Processing element Networks on FPGAs for Physical Model Simulation

FPGA

Compare/backup: Simulated annealing

Frank Vahid, UCR 13

Cost function:

C = w1*sum + w2*max + w3*gapsSum = sum of wire distancesMax = max wire length (Euclidean dist.)Gaps = wires across architectural features

FPGA

P1 P1

P2

Neighbor function: Swap PEs based on distance to neighbors

Page 14: Embedding-Based Placement of Processing element Networks on FPGAs for Physical Model Simulation

Results

Frank Vahid, UCR 14

No placement strategy

Simulated annealing placement

Embedding placement

4 generations shown 5 generations shown 5 generations shown

F V
"No placement strategy" shows fewer nodes, misleads viewer into thinking the placement area is smaller. Can you show all 30 nodes for all three strategies?
Page 15: Embedding-Based Placement of Processing element Networks on FPGAs for Physical Model Simulation

Results

Frank Vahid, UCR 15

0

100

200

300

400

Model and #PEs

Circ

uit f

requ

ency

(MH

z)

No placement strategy Simulated Annealing Embedding

Not routable

Strategy # LUTS # BRAM #DSP Equivalent LUTs

None 58362 512 256 306682

SA 58567 512 256 306887

Embed 58569 512 256 306889

No impact on size

2D Neuron model - 256PE – Xilinx Virtex6Strategy Total

power (mW)

Dynamic power (mW)

Static power (mW)

None 15525 8744 6481

SA 16604 10013 6590

Embed 19859 12999 6859

20% more powerUsing Xilinx XPower Analyzer

0100200300400

0 200 400 600Freq

uenc

y (M

Hz)

# PEs

Frequency vs. PE network size

Page 16: Embedding-Based Placement of Processing element Networks on FPGAs for Physical Model Simulation

Results/Conclusions

Frank Vahid, UCR 16http://www.meti.com/

Speedup vs real-time (avg)

PC(1): 0.8xPC(4): 3.1xGPU: 1.6xHLS: 3.2xGeneral PE: 4.9xGrph emb(GPE): 11.2xCustom PE: 6.1xHeterog PE: 34.5x(Grph emb(HPE): 48.5x)

• Graph emb. gives additional speedup– FPGAs: Fastest cost-effective

execution of physical models–

http://www.youtube.com/watch?v=ThUKVhqoA3Q • Future

– Manycore device– Beyond testing CPS

• Implement end-productsSpeedup / dollar

Heterog PE • 3.0x better than

PC(4)• 4.5x better than GPUCPU (I7-950 + Intel X58 board): $480

GPU(GTX460 + I3-540 + H55 board): $380FPGA (Xilinx Virtex6 240T-2 board): $1800

F V
F V
Bailey, Please include data on algorithm runtime and size. We should almost never only report one metric, everything is a tradeoff, gotta be more complete. Please provide numerical summary in conclusion, including speedups over PC (1), PC(4), and GPU. Perhaps show on original table from Slide 5?
Page 17: Embedding-Based Placement of Processing element Networks on FPGAs for Physical Model Simulation

Questions?

Frank Vahid, UCR 17