SmartCore System for Dependable Many-core Processor with Multifunction Routers (in ICNC'10 Hiroshima)

1. SmartCore system for Dependable Many-core Processor with Multifunction Routers
  • Shinya Takamaeda, Shimpei Sato, Takefumi Miyoshi, Kenji Kise
  • Tokyo Institute of Technology, Japan; The University of Electro-communications, Japan
  • 10-11-18, ICNC'10 @Hiroshima, Regular Paper: Hardware Design and Implementation, 14:50-15:20

2. Contents
  • Motivation
  • Proposal: SmartCore system
  • Preliminary Evaluation
  • Hardware Implementation on FPGAs
  • Related Work
  • Conclusion

3. Contents (same outline as slide 2)

4. Many-core Processors appear!
  • Intel Single Chip Cloud Computer: 48 cores (x86)
  • TILERA TILE-Gx100: 100 cores (MIPS)

5. Inter-connection for Many-core processors
  • NoC (Network on Chip)
  • Data transmission via on-chip routers
  [Figure: processing elements (PE), each attached to a router (R)]

6. Low Dependability on Many-core
  • Process technology scaling provides more transistors
  • But it increases
      • Soft errors (e.g. bit inversion) caused by cosmic radiation
      • Timing errors caused by variations in transistor characteristics or wire delay
  • How to create a reliable Many-core processor?

7. Assurance of the reliability on each layer
  • Layers: Circuit, Micro-architecture, Architecture, Software
  • Techniques shown: Razor-FF, Lock-step, Check-pointing / Re-execution, Inter-connection (SmartCore system), Canary-FF, ECC in DRAM Memory, Architectural Core Salvaging, Slip Stream Processor

8. Contents (same outline as slide 2)

9. We propose the SmartCore system
  • SmartCore system = Smart many-core system with redundant cores and multifunction routers
  • Key: NoC-based DMR
      • To detect an error, compare the output packets from the pair
  • On-chip router has 3 special functions
      • Copy a packet
      • Change the destination
      • Wait and compare 2 packets
  [Figure: PE/router (R) pairs. Labels: Handling the same packets by packet copying; Running the same thread (DMR); Running the single thread (DMR); sharing a packet / comparing 2 packets]

10.
Base many-core architecture: M-Core [1]
  • 2D mesh network connects Nodes
  • Each Node memory is independent
  • Inter-Node communication: DMA via packets using ID
  • A packet is a series of flits (Flow Control Units)
  • Only the head flit of a packet contains the destination
  [Figure: many-core processor chip with Computation Nodes (1,1) to (8,8), Memory Nodes (1,0) to (8,0), Operation Node (0,0), off-chip memory modules and switch, and conventional I/O; each Node contains a Core, Node memory, INCC, and Router]

11. DMR on two nodes by using SmartCore
  • The same program binary is executed on the pair: Master Node and Mirror Node
      • If the generated packets differ, a fault has occurred
  • Packet copying on the Router of the Master, so that the Mirror uses the same data as the Master
  • Packet comparison on the Router of the Master
      • If the two packets differ, the Router detects an error
  [Figure: Nodes (1,1) to (4,1) and (1,2) to (4,2), each a PE with a router (R); a Master Node and a Mirror Node together act logically as Node (1,1)]

12. 1. Copying a packet to the Mirror Node
  • Router on Master Node copies an incoming packet to the Mirror Node
      • The destination is changed to the Mirror Node's ID
  • The original program performs several DMA communications
      • Copying ensures that the two Nodes continue executing the same program
  [Figure: Master and Mirror, each with P, INCC, and router (R)]

13. 2. Wait for a packet from the Mirror Node 3.
Compare the contents of two packets
  • Router on Master Node waits for a packet from Master Node and a packet from Mirror Node
  • When the Router on Master has received the head flits from both Nodes, it starts to compare the 2 packets flit by flit, in order
  • If the contents of the flits differ, an error exists in either the Master Node or the Mirror Node
  [Figure: Master and Mirror, each with P, INCC, and router (R)]

14. Base router
  • 5 inputs with input buffers / 5 outputs
  • X-Y dimension-order routing
  • Wormhole switching, Xon/Xoff flow control
  • 1 hop / 1 cycle, single cycle, no virtual channels
  [Figure: Router with XBAR switch and Arbiter; input and output ports X+, X-, Y+, Y-, and DMAC]

15. Multifunction router for SmartCore system
  • Additional buffer for copying packets to the Mirror Node (a)
  • ID translator to change the destination (b)
  • Flit comparator to verify (c)
  • Node type and Master/Mirror Node ID
      • Configured by system software
  [Figure: Router with XBAR switch and Arbiter; input and output ports X+, X-, Y+, Y-, and INCC; added blocks: node type, master / mirror ID, Verify, ID translation, labeled (a), (b), (c)]

16. Advantages of SmartCore system
  • Adaptable to any kind of hardware module that generates packets (e.g. cache, DSP, processor core)
      • Because the error detection mechanism is independent of the Node structure
  • Core-granularity redundant execution / packet-level error detection

17. Contents (same outline as slide 2)

18.
Preliminary Evaluation of SmartCore system
  • 2 evaluations
      • Performance overhead on DMR
      • Packet rendezvous time
  • Environment: SimMc 1.0
      • 64 (8×8) threads on 128 (16×8) Nodes
      • Core: MIPS32 single-issue / single-cycle processor
      • Router: 1 hop / 1 cycle, no virtual channels, flit size: 4 bytes
      • INCC (Network Interface): up to 1 flit / cycle receive/send from/to router
  • Benchmark: 4 apps from NAS Parallel Benchmarks
      • cg, ft, is, lu; Size: S
  [Figure: Node (X, Y) with Core, Node memory, INCC, and Router]

19. 3 configurations of thread mapping
  • (a) Base allocation: 8 × 8 Nodes
  • (b) Redundant space allocation (Area 2x): 8 × 16 Nodes, to see the effect on #hops
  • (c) Redundant execution with SmartCore system: 8 × 16 Nodes, to see the effect of SmartCore
  [Figure: node grids for (a), (b), (c), threads labeled (1,1) to (8,8). Legend: proper thread (Master Node); redundant thread (Mirror Node); not working]

20. Evaluation: Performance overhead on DMR
  • Only a slight slowdown
      • Redundant space (Area 2x): up to 1% slowdown
      • Redundant execution (SmartCore): up to 4% slowdown (in cg of NPB)

21. Evaluation: Packet rendezvous time
  • Cumulative distribution of the number of cycles that the router on the Master Node waits for a packet from the Mirror Node
  • Almost all communications complete with little rendezvous wait
  [Figure: one panel per benchmark: cg, ft, is, lu]

22. Contents (same outline as slide 2)

23.
Hardware Implementation on FPGAs
  • Dependable Many-core processor on an FPGA-based prototyping system, using the ScalableCore system [8]
      • Connected FPGA boards
      • Variable number of FPGA boards
  • 2 execution modes
      • Normal Mode: standard M-Core
      • SmartCore Mode: the pair executes the same thread
  [Figure: FPGA board array with SD, Loader (0,1), Path (0,2), Path (0,3), Power, Physical IDs (1,1) to (4,3), and Logical IDs (1,1) to (2,3); each Logical ID maps to a Master/Mirror pair]

24. Overview of 15 Nodes ScalableCore system with SmartCore system
  • SmartCore system detects an artificial fault
  [Figure: Program Loader ID (0,1) and Logical IDs (1,1) to (2,3), each a Master/Mirror pair]

25. Contents (same outline as slide 2)

26. Related work
  • Slipstream Processor [9, Karthik, ASPLOS2000]
      • Improving ILP and dependability by using two tightly coupled cores
      • 2 threads: proper sequence and shorter sequence
  • Loose Lock-stepped system [10, Nidhi, ISCA2007]
      • Dividing cores, cache, main memory into two groups
      • I/O level error detection
  • Lockstep [11, IBM]
      • Redundant execution on synchronized processors
      • I/O level error detection

27. Contents (same outline as slide 2)

28.
Conclusion
  • We propose the SmartCore system
      • NoC-based DMR by using multifunction routers
      • Multifunction router has 3 special functions
          • Copying a packet
          • Changing the destination of a packet
          • Waiting and comparing the contents of two packets
      • Low performance overhead
  • Hardware implementation on FPGA-based prototyping system
  • Future work
      • Recovery after error detection
      • TMR by SmartCore system
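
The three router functions summarized above (copy a packet, change its destination, wait and compare two packets) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' hardware design: the names `Packet`, `MasterRouter`, `on_incoming`, `on_outgoing`, and `FaultDetected` are hypothetical, packets are modeled as whole objects rather than flit streams, and it is assumed that packets generated by the Mirror Node somehow reach the Master Node's router for comparison.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple

NodeId = Tuple[int, int]


class FaultDetected(Exception):
    """Raised when the Master and Mirror packets disagree (hypothetical name)."""


@dataclass(frozen=True)
class Packet:
    dest: NodeId            # destination ID, carried only by the head flit
    flits: Tuple[int, ...]  # body flits (payload words)


class MasterRouter:
    """Toy model of the multifunction router on the Master Node."""

    def __init__(self, master_id: NodeId, mirror_id: NodeId) -> None:
        self.master_id = master_id
        self.mirror_id = mirror_id
        self._from_master: List[Packet] = []
        self._from_mirror: List[Packet] = []

    def on_incoming(self, pkt: Packet) -> List[Packet]:
        """Functions 1 and 2: copy the packet and retarget the copy."""
        mirror_copy = replace(pkt, dest=self.mirror_id)
        return [pkt, mirror_copy]

    def on_outgoing(self, pkt: Packet, from_mirror: bool) -> Optional[Packet]:
        """Function 3: wait for both packets, then compare flit by flit."""
        queue = self._from_mirror if from_mirror else self._from_master
        queue.append(pkt)
        if not (self._from_master and self._from_mirror):
            return None  # rendezvous: the partner packet has not arrived yet
        a = self._from_master.pop(0)
        b = self._from_mirror.pop(0)
        if len(a.flits) != len(b.flits):
            raise FaultDetected("packet lengths differ")
        for i, (fa, fb) in enumerate(zip(a.flits, b.flits)):
            if fa != fb:
                raise FaultDetected(f"flit {i} differs")
        return a  # packets match: forward a single copy


if __name__ == "__main__":
    router = MasterRouter(master_id=(1, 1), mirror_id=(2, 1))
    print(router.on_incoming(Packet(dest=(1, 1), flits=(10, 20))))
    print(router.on_outgoing(Packet(dest=(3, 1), flits=(7, 8)), from_mirror=False))
    print(router.on_outgoing(Packet(dest=(3, 1), flits=(7, 8)), from_mirror=True))
```

In this model the time between the first and second `on_outgoing` call for a pair corresponds to the packet rendezvous time measured in the evaluation. The model only detects a mismatch; deciding which Node is faulty and recovering from it are listed as future work in the slides.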