Transcript

SmartCore system for Dependable Many-core Processor with Multifunction Routers

Shinya Takamaeda†, Shimpei Sato†, Takefumi Miyoshi‡, Kenji Kise†

†Tokyo Institute of Technology, Japan ‡The University of Electro-communications, Japan

10-11-18 ICNC’10 @Hiroshima Regular Paper Hardware Design and Implementation 14:50-15:20

Contents
• Motivation
• Proposal: SmartCore system
• Preliminary Evaluation
• Hardware Implementation on FPGAs
• Related Work
• Conclusion

Many-core Processors appear!
• Intel Single Chip Cloud Computer: 48 cores (x86)
• TILERA TILE-Gx100: 100 cores (MIPS)

Inter-connection for Many-core processors
• NoC (Network on Chip)
   - Data transmission via on-chip routers

[Figure: 16 processing elements (PE), each attached to an on-chip router (R)]

Low Dependability on Many-core
• Process technology scaling provides more transistors
• But it also increases …
   - Soft errors (e.g. bit inversion), caused by cosmic radiation
   - Timing errors, caused by variations in transistor characteristics or wire delay

How to create a reliable Many-core processor?

Assurance of the reliability on each layer
• Layers: Circuit, Micro-architecture, Architecture, Software
• Existing techniques: Razor-FF, Canary-FF, ECC in DRAM Memory, Architectural Core Salvaging, Slip Stream Processor, Lock-step, Check-pointing / Re-execution
• Inter-connection: SmartCore system

We propose the SmartCore system
• SmartCore system = Smart many-core system with redundant cores and multifunction routers
• Key: NoC-based DMR (dual modular redundancy)
   - To detect an error, compare the output packets from the pair
• On-chip router has 3 special functions
   - Copy a packet
   - Change the destination
   - Wait and compare 2 packets

[Figure: six PE/router (R) pairs. Labels: running the same thread (DMR); handling the same packets by packet copying; sharing a packet / comparing 2 packets.]

Base many-core architecture: M-Core [1]
• 2D mesh network connects Nodes
• Each Node memory is independent
• Inter-Node communication
   - DMA via packets using ID
• A packet is a series of flits (Flow Control Units)
   - Only the head flit of a packet contains the destination

[Figure: Many-core processor chip. Operation Node (0,0); Memory Nodes (1,0) to (8,0), connected to off-chip memory modules and switch and to conventional I/O; Comp. Nodes (1,1) to (8,8). Each Node consists of a Core, Node memory, INCC, and Router.]
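The packet format described on this slide can be sketched in Python as follows. This is a minimal illustration, not M-Core's actual encoding: the `Flit` and `Packet` names, the field layout, and the `make_packet` helper are assumptions. The slide states only that a packet is a series of flits and that only the head flit carries the destination.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

NodeID = Tuple[int, int]  # (x, y) position in the 2D mesh


@dataclass(frozen=True)
class Flit:
    """One flow control unit. The field layout is hypothetical."""
    is_head: bool
    dest: Optional[NodeID]  # set only on the head flit
    payload: int            # body data (unused on the head flit here)


@dataclass(frozen=True)
class Packet:
    flits: Tuple[Flit, ...]

    @property
    def dest(self) -> Optional[NodeID]:
        # The destination is read from the head flit only.
        return self.flits[0].dest


def make_packet(dest: NodeID, words: List[int]) -> Packet:
    """Build a packet: one head flit with the destination, then body flits."""
    head = Flit(is_head=True, dest=dest, payload=0)
    body = tuple(Flit(is_head=False, dest=None, payload=w) for w in words)
    return Packet(flits=(head,) + body)


pkt = make_packet(dest=(2, 1), words=[0xDEAD, 0xBEEF])
print(pkt.dest, len(pkt.flits))  # (2, 1) 3
```

Because body flits carry no destination, a router has to remember the routing decision made for the head flit until the packet ends, which is what wormhole switching does.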

DMR on two nodes by using SmartCore
• Executing the same program binary on the pair
   - Master Node and Mirror Node
   - If the generated packets are different, a fault has occurred
• Packet copying on the Router of the Master
   - for the Mirror to use the same data as the Master
• Packet comparison on the Router of the Master
   - If these two differ, then the Router detects an error

[Figure: eight PE/router (R) pairs, Node (1,1) to Node (4,2). A Master Node and Mirror Node pair is logically Node (1,1).]

1. Copying a packet to the Mirror Node
• Router on Master Node copies an incoming packet to the Mirror Node
   - The destination is changed to the Mirror Node's ID
• The original program has several DMA communications
   - Copying ensures that the two Nodes continue executing the same program

[Figure: Master and Mirror Nodes, each with P, INCC, and R (router).]
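A rough Python sketch of the copy-and-redirect step. The tuple-based packet model and the function name `copy_for_mirror` are assumptions for illustration. The slide specifies only that the Master's router copies the incoming packet and changes the destination of the copy to the Mirror Node's ID.

```python
from typing import List, Tuple

NodeID = Tuple[int, int]
# Hypothetical packet model: (destination, list of payload words).
Packet = Tuple[NodeID, List[int]]


def copy_for_mirror(packet: Packet, master_id: NodeID,
                    mirror_id: NodeID) -> List[Packet]:
    """Return the packets the Master's router emits for one incoming packet.

    Packets addressed to the Master are delivered locally and also copied,
    with the destination rewritten to the Mirror's ID. Other packets pass
    through unchanged.
    """
    dest, payload = packet
    if dest != master_id:
        return [packet]  # not for this node: forward as usual
    mirror_copy = (mirror_id, list(payload))  # ID translation on the copy
    return [packet, mirror_copy]


out = copy_for_mirror(((1, 1), [10, 20]), master_id=(1, 1), mirror_id=(2, 1))
print(out)  # [((1, 1), [10, 20]), ((2, 1), [10, 20])]
```

Senders keep addressing the logical Node only. The duplication is invisible to them, which is why the program binary does not need to change.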

2. Wait for a packet from the Mirror Node
3. Compare the contents of two packets
• Router on Master Node waits for a packet from the Master Node and a packet from the Mirror Node
• When the Router on Master receives the head flits from both Nodes, it starts to compare the flits of the 2 packets in order
   - If the contents of the flits differ, an error exists in either the Master Node or the Mirror Node

[Figure: Master and Mirror Nodes, each with P, INCC, and R (router).]
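The wait-and-compare behaviour can be modelled roughly as below. The class name `FlitComparator`, the use of plain integers as flits, and the choice to drop a mismatched flit are assumptions. The slides do not say what the router does after a mismatch; recovery is listed as future work.

```python
from collections import deque
from typing import Deque, List


class FlitComparator:
    """Hypothetical model of wait-and-compare on the Master's router."""

    def __init__(self) -> None:
        self.from_master: Deque[int] = deque()
        self.from_mirror: Deque[int] = deque()
        self.error_detected = False

    def push(self, source: str, flit: int) -> List[int]:
        """Buffer one flit from 'master' or 'mirror'.

        Returns the flits released to the network by this call.
        """
        queue = self.from_master if source == "master" else self.from_mirror
        queue.append(flit)
        released = []
        # Compare in order while a flit is available from both Nodes.
        while self.from_master and self.from_mirror:
            a = self.from_master.popleft()
            b = self.from_mirror.popleft()
            if a != b:
                self.error_detected = True  # fault in Master or Mirror
            else:
                released.append(a)          # one verified flit goes out
        return released


comparator = FlitComparator()
comparator.push("master", 0x11)         # waits: nothing from the Mirror yet
print(comparator.push("mirror", 0x11))  # [17]
comparator.push("master", 0x22)
comparator.push("mirror", 0x23)         # mismatch
print(comparator.error_detected)        # True
```

The time a flit sits in one queue waiting for its partner corresponds to the packet rendezvous time measured in the evaluation.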

Base router
• 5 inputs with input buffers / 5 outputs
• X-Y dimension-order routing
• Wormhole switching, Xon/Xoff flow control
• 1 hop / 1 cycle (single-cycle router), no virtual channels

[Figure: Router with input ports X+, X-, Y+, Y-, and DMAC feeding an XBAR switch, controlled by an Arbiter, with output ports X+, X-, Y+, Y-, and DMAC.]
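X-Y dimension-order routing can be illustrated with a short Python sketch. The port names X+, X-, Y+, and Y- come from the slide; the direction convention (X+ toward larger x), the `"LOCAL"` label for the node's own port, and the function names are assumptions.

```python
from typing import Tuple

NodeID = Tuple[int, int]


def xy_route(current: NodeID, dest: NodeID) -> str:
    """Pick the output port under X-Y dimension-order routing."""
    cx, cy = current
    dx, dy = dest
    if dx != cx:                      # resolve the X dimension first
        return "X+" if dx > cx else "X-"
    if dy != cy:                      # then the Y dimension
        return "Y+" if dy > cy else "Y-"
    return "LOCAL"                    # arrived: deliver to the node itself


def hop_count(src: NodeID, dest: NodeID) -> int:
    """Number of router-to-router hops on a mesh with X-Y routing."""
    return abs(dest[0] - src[0]) + abs(dest[1] - src[1])


print(xy_route((1, 1), (3, 2)))   # X+
print(xy_route((3, 1), (3, 2)))   # Y+
print(hop_count((1, 1), (3, 2)))  # 3
```

With 1 hop per cycle, the hop count is also a lower bound on the head flit's network latency in cycles, ignoring contention.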

Multifunction router for SmartCore system
• Additional buffer for copying for Mirror Node (a)
• ID translator to change the destination (b)
• Flit comparator to verify (c)
• Node type, Master/Mirror Node ID
   - Configured by system software

[Figure: Base router (input/output ports X+, X-, Y+, Y-, INCC; XBAR switch; Arbiter) extended with (a) a copy buffer, (b) ID translation, and (c) a Verify unit, plus registers for node type and master/mirror ID.]
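As a sketch of the state that system software configures, the following Python models a per-router configuration. The names `NodeType`, `RouterConfig`, and `configure_pair`, and the `NORMAL` node type for unpaired Nodes, are hypothetical. The slide lists only a node type and a Master/Mirror Node ID.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional, Tuple

NodeID = Tuple[int, int]


class NodeType(Enum):
    NORMAL = 0   # assumed default for a Node outside any pair
    MASTER = 1
    MIRROR = 2


@dataclass
class RouterConfig:
    """Hypothetical per-router configuration state."""
    node_type: NodeType = NodeType.NORMAL
    partner_id: Optional[NodeID] = None  # Mirror's ID on a Master, and vice versa


def configure_pair(master: NodeID, mirror: NodeID) -> Dict[NodeID, RouterConfig]:
    """What system software might write to set up one DMR pair."""
    return {
        master: RouterConfig(NodeType.MASTER, partner_id=mirror),
        mirror: RouterConfig(NodeType.MIRROR, partner_id=master),
    }


cfg = configure_pair(master=(1, 1), mirror=(2, 1))
print(cfg[(1, 1)].node_type.name, cfg[(1, 1)].partner_id)  # MASTER (2, 1)
```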

Advantages of SmartCore system
• Adaptable to any kind of hardware module that generates packets
   - e.g. Cache, DSP, Processor core
• Because …
   - The error detection mechanism is independent of the Node structure
   - Core-granularity redundant execution / packet-level error detection

Preliminary Evaluation of SmartCore system
• 2 evaluations
   - Performance overhead on DMR
   - Packet rendezvous time
• Environment: SimMc 1.0
   - 64 (8×8) threads on 128 (16×8) Nodes
   - Core: MIPS32 single issue / single cycle processor
   - Router: 1 hop / 1 cycle, no virtual channels, flit size: 4 bytes
   - INCC (Network Interface): up to 1 flit / cycle receive/send from/to router
   - Benchmark: 4 apps from NAS Parallel Benchmarks (cg, ft, is, lu), Size: S

[Figure: Node (X, Y) consisting of Core, Node memory, INCC, and Router.]

3 configurations of thread mapping
(a) Base allocation: 64 threads on 8 × 8 Nodes
(b) Redundant space allocation (Area 2x): threads placed on a 16 × 8 Node area, where the extra Nodes are not working, to see the effect on #hops
(c) Redundant execution with SmartCore system: 16 × 8 Nodes, with each proper thread (Master Node) paired with a redundant thread (Mirror Node), to see the effect of SmartCore

[Figure: Node grids for (a), (b), and (c). Legend: proper thread (Master Node), redundant thread (Mirror Node), not working.]

Evaluation: Performance overhead on DMR
• Only a small slowdown
   - Redundant space (Area 2x): up to 1% slowdown
   - Redundant execution (SmartCore): up to 4% slowdown (in cg of NPB)

Evaluation: Packet rendezvous time
• Cumulative distribution of the number of cycles that the router on the Master Node waits for a packet from the Mirror Node
• Almost all communications complete with only a short rendezvous wait

[Figure: cumulative distribution plots for cg, ft, is, and lu.]
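A cumulative distribution of this kind could be computed from per-packet wait times as sketched below. The function name `rendezvous_cdf` is an assumption, and the sample values are made up for illustration; they are not the measurements shown on the slide.

```python
from collections import Counter
from typing import Dict, Iterable


def rendezvous_cdf(wait_cycles: Iterable[int]) -> Dict[int, float]:
    """Map each observed wait (in cycles) to the fraction of packets
    that waited that long or less."""
    counts = Counter(wait_cycles)
    total = sum(counts.values())
    cdf = {}
    running = 0
    for cycles in sorted(counts):
        running += counts[cycles]
        cdf[cycles] = running / total
    return cdf


# Made-up sample, not measured data.
sample = [0, 0, 0, 1, 1, 2, 5, 40]
print(rendezvous_cdf(sample))
# {0: 0.375, 1: 0.625, 2: 0.75, 5: 0.875, 40: 1.0}
```

A curve that rises steeply near zero cycles means the Master and Mirror stay closely synchronized, so the comparator rarely stalls outgoing traffic.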


Hardware Implementation on FPGAs
• Dependable Many-core processor on FPGA-based prototyping system
   - by using ScalableCore system [8]
      - Connected FPGA boards
      - Variable number of FPGA boards
• 2 execution modes
   - Normal Mode: Standard M-Core
   - SmartCore Mode: The pair executes the same thread

Overview of 15 Nodes ScalableCore system with SmartCore system

[Figure: FPGA boards for Loader (0,1), Path (0,2), Path (0,3), and PhysicalID (1,1) to (4,3), with SD and Power labels. Each Master/Mirror pair of physical Nodes forms one logical Node, LogicalID (1,1) to (2,3).]

SmartCore system detects an artificial fault

[Figure: Program Loader ID (0,1) and Logical IDs (1,1) to (2,3), each with a Master and a Mirror.]
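The relation between logical and physical IDs in the prototype can be sketched as follows. The mapping itself is an assumption inferred from the figure labels (physical IDs (1,1) to (4,3), logical IDs (1,1) to (2,3), Master and Mirror alternating along X); the slides do not state it as a formula, and the function names are hypothetical.

```python
from typing import Tuple

NodeID = Tuple[int, int]


def physical_pair(logical: NodeID) -> Tuple[NodeID, NodeID]:
    """Return (master, mirror) physical IDs for a logical ID.

    Assumes pairs are adjacent along X: Master at odd x, Mirror at even x.
    """
    lx, ly = logical
    return (2 * lx - 1, ly), (2 * lx, ly)


def logical_id(physical: NodeID) -> NodeID:
    """Inverse mapping: which logical Node a physical Node belongs to."""
    px, py = physical
    return ((px + 1) // 2, py)


print(physical_pair((1, 1)))  # ((1, 1), (2, 1))
print(physical_pair((2, 3)))  # ((3, 3), (4, 3))
print(logical_id((4, 2)))     # (2, 2)
```

Under this assumed mapping, the 12 physical compute Nodes of the 15-Node system form 6 logical Nodes.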


Related work
• Slipstream Processor [9, Karthik, ASPLOS2000]
   - Improving ILP and dependability by using two tightly coupled cores
   - 2 threads: proper sequence and shorter sequence
• Loose Lock-stepped system [10, Nidhi, ISCA2007]
   - Dividing cores, cache, and main memory into two groups
   - I/O level error detection
• Lockstep [11, IBM]
   - Redundant execution on synchronized processors
   - I/O level error detection

Conclusion
• We propose the SmartCore system
   - NoC-based DMR by using multifunction routers
   - Multifunction router has 3 special functions
      - Copying a packet
      - Changing the destination of a packet
      - Waiting and comparing the contents of two packets
   - Low performance overhead
   - Hardware implementation on FPGA-based prototyping system
• Future work
   - Recovery after error detection
   - TMR by SmartCore system