1
Reliability is a big concern in Multi-core Processors as the technology nodes are scaling Inter-Core resource sharing increases reliability in presence of fault (e.g. StageNet Architecture) This leads to high communication delay 3D integration is a promising solution to increase density and performance Introduction Objective Architecture Implementation Conclusion MaPnet: A Three-Dimensional Fabric for Reliable Multi-core Processors Javad Bagherzadeh, Sugandha Gupta, Byounchan Oh University of Michigan, Ann Arbor 3D architectures have performance advantage in many core designs (Figure 6) MAPnet design is implemented on a 4-core architecture with simple cores TSV delay and behavior is simulated by SPICE simulations and used to determine routing delay and processor frequency Table 1. Advantages and disadvantages of each design Results Use StageNet based structure with two functionalities in case of failure Arranging healthy resources in different cores to create one healthy pipeline (Virtual Pipeline) Sharing healthy resources in other cores when they are not being used Use 3D integration to reduce communication delay and hence improve performance over 2D design Model Through-Silicon Vias (TSV) to simulate the 3D design and performance Advantages Disadvantages Regular Design -Easy To Design -Slow -Low Scalability Pipelined Design -Higher clock Frequency -Easy Dynamic Reconfiguration -High Flexibility in Sharing Resources -Better scalability -Difficult to Design -Need extra control units -Area Overhead 2 D - Metal line delay 3 D - TSV delay MAPnet Design 2D Core-based 3D Pipeline 3D Clock (No Fault) 2.5 ns (400 MHz) 2.5 ns (400 MHz) 2.5 ns (400 MHz) Clock (Fault Condition) 7 ns (143 MHz) 3.95 ns (250MHz) 3.85 (260MHz) Footprint Area 1200*1200 sq.us 600*600 sq. us Layout Density 71% 67% Scenario Number of Faults Disabled Resources Scenario 1 2 IF0, EX1 Scenario 2 4 IF0, ID1, EX2, WB3 Scenario 3 10 IF0, ID1, EX0, MEM1, WB0, IF2, ID3, EX2, MEM3, WB2 Both 3D designs show performance improvement over the 2D design The throughput increases by 1.74X On comparing the post-APR frequencies in faulty and healthy cores and considering Table 1, we decided to pipeline both architectures to reduce clock frequency Crossbars in 2D and 3D structures need 1 and 2 pipeline stages respectively. The RTL was synthesized using Synopsys Design Compiler and Cadence Encounter was used to create the layout IPC was measured for each scenario Figures 8 reflect the results for baseline 2D and 3D pipelined structures regarding sharing the same resource and no sharing. Each fault forces the processor to reconfigure and adds extra delays cycles into pipeline structure to use resources in other cores We simulated a single core for different test cases and measured the IPC for extra delay cycles added in pipeline structure which is reflected in Figure 7. Improvement with 2D and 3D StageNet compared 2 baseline 16.3% on average and up to 28.2% improvement in 3D compared to 2D area of our 3D design is under than 25% of the traditional 2D design due to reduced interconnection complexity. Figure 2. Three different designs for MAPnet (a) 2D structure with crossbars (b) Core-based 3D structure (c) Pipeline- based 3D structure (a) (b) (c) Figure 3. Single core architecture with connection to crossbar and pipeline registers Figure 5. Layouts for (a) 2D 4-core processor, (b) Basic 1-core and (c) Conceptual 3D 4-core Design (a) (c) (b) Figure 6. Estimated Clock Period for different number of cores for MAPnet – (a) 2D structure with crossbars (b) Core-based 3D structure (c) Pipeline-based 3D structure Figure 4. Comparison between routing delay for 2D Metal vs 3D TSVs Table 2. Clock Period results after Place and Route (using Encounter) Table 3. shows 3 different fault scenarios for various fault numbers and locations within pipeline stages ID/E X IF/ ID MEM/ WB EX/ MEM Inputs from other cores Outputs to other cores Inputs from other cores Outputs to other cores FETCH EXECUTE DECODE MEMORY WRITEBACK CORE 0 MaPnet: Using inter-core redundancy to salvage the processor in case of faults. 2D design Core-based 3D design Pipeline-based 3D design ID 1 EX 3 MEM 1 WB 2 IF 0 CORE 3 IF 3 IF 2 IF 1 ID 0 ID 2 ID 3 EX 0 EX 1 EX 2 MEM 0 MEM 2 MEM 3 WB 0 WB 1 WB 3 CORE 2 CORE 1 CORE 0 Figure 1. A 5-satge pipeline structure for 4 cores and switches between stages I F 0 I D 0 E X 0 M E M 0 W B 0 I F 1 I D 1 E X 1 M E M 1 W B 1 CROSS - BAR I F 3 I D 3 E X 3 M E M 3 W B 3 I F 2 I D 2 E X 2 M E M 2 W B 2 CORE 0 CORE 1 CORE 3 CORE 2 0.0 0.2 0.4 0.6 0.8 1.0 IPC Single Pipeline Performance 0-delay 1-delay 2-delay 3-delay 4-delay 5-delay 6-delay 7-delay 8-delay Figure 7. Single Pipeline Performance Figure 8. IPC for different fault scenarios 3D reconfigurable pipeline design, named MaPNet. more-reliable operation, higher performance, lower cost, and/or lower power consumption by taking advantage of the redundancy and capabilities available at each layer in the system stack. EECS 578 Correct Operation for Processors and Embedded Systems Fall 2015 0.0 1.0 2.0 3.0 4.0 5.0 IPC Fault Scenario 1 Baseline No sharing (2D) No sharing (3D) Fully sharing (2D) Fully sharing (3D) 0.0 1.0 2.0 3.0 4.0 5.0 Fault Scenario 2 Baseline No sharing (2D) No sharing (3D) Fully sharing (2D) Fully sharing (3D) 0.0 1.0 2.0 3.0 4.0 5.0 IPC Fault Scenario 3 Baseline No sharing (2D) No sharing (3D) Fully sharing (2D) Fully sharing (3D)

Single Pipeline Performance · Single Pipeline Performance 0-delay 1-delay 2-delay 3-delay 4-delay 5-delay 6-delay 7-delay 8-delay Figure 7. Single Pipeline Performance Figure 8

  • Upload
    others

  • View
    24

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Single Pipeline Performance · Single Pipeline Performance 0-delay 1-delay 2-delay 3-delay 4-delay 5-delay 6-delay 7-delay 8-delay Figure 7. Single Pipeline Performance Figure 8

• Reliability is a big concern in Multi-core Processors as the technology nodes are scaling

• Inter-Core resource sharing increases reliability in presence of fault (e.g. StageNet Architecture)

• This leads to high communication delay• 3D integration is a promising solution to increase density and

performance

Introduction

Objective

Architecture

Implementation

Conclusion

MaPnet: A Three-Dimensional Fabric for Reliable Multi-core ProcessorsJavad Bagherzadeh, Sugandha Gupta, Byounchan Oh

University of Michigan, Ann Arbor

• 3D architectures have performance advantage in many core designs (Figure 6)

• MAPnet design is implemented on a 4-core architecture with simple cores

• TSV delay and behavior is simulated by SPICE simulations and used to determine routing delay and processor frequency

Table 1. Advantages and disadvantages of each design

Results

• Use StageNet based structure with two functionalities in case of failure Arranging healthy resources in different cores to create one

healthy pipeline (Virtual Pipeline) Sharing healthy resources in other cores when they are not

being used • Use 3D integration to reduce communication delay and hence

improve performance over 2D design • Model Through-Silicon Vias (TSV) to simulate the 3D design and

performance

Advantages Disadvantages

Regular Design -Easy To Design -Slow

-Low Scalability

Pipelined Design -Higher clock Frequency

-Easy Dynamic

Reconfiguration

-High Flexibility in Sharing

Resources

-Better scalability

-Difficult to Design

-Need extra control units

-Area Overhead

2D-Metal line delay

3D-TSV delay

MAPnet Design 2D Core-based 3D Pipeline 3D

Clock

(No Fault)

2.5 ns

(400 MHz)

2.5 ns

(400 MHz)

2.5 ns

(400 MHz)

Clock

(Fault Condition)

7 ns

(143 MHz)

3.95 ns

(250MHz)

3.85

(260MHz)

Footprint Area 1200*1200 sq.us 600*600 sq. us

Layout Density 71% 67%

Scenario Number of Faults Disabled Resources

Scenario 1 2 IF0, EX1

Scenario 2 4 IF0, ID1, EX2, WB3

Scenario 3 10 IF0, ID1, EX0, MEM1,

WB0, IF2, ID3, EX2,

MEM3, WB2

• Both 3D designs show performance improvement over the 2D design

• The throughput increases by 1.74X• On comparing the post-APR frequencies in

faulty and healthy cores and considering Table 1, we decided to pipeline both architectures to reduce clock frequency

• Crossbars in 2D and 3D structures need 1 and 2 pipeline stages respectively.

• The RTL was synthesized using Synopsys Design Compiler and Cadence Encounter was used to create the layout

• IPC was measured for each scenario

• Figures 8 reflect the results for baseline 2D and 3D pipelined structures regarding sharing the same resource and no sharing.

• Each fault forces the processor to reconfigure and adds extra delays cycles into pipeline structure to use resources in other cores

• We simulated a single core for different test cases and measured the IPC for extra delay cycles added in pipeline structure which is reflected in Figure 7.

• Improvement with 2D and 3D StageNet compared 2 baseline

• 16.3% on average and up to 28.2% improvement in 3D compared to 2D

• area of our 3D design is under than 25% of the traditional 2D design due to reduced interconnection complexity.

Figure 2. Three different designs for MAPnet (a) 2D structure with crossbars (b) Core-based 3D structure (c) Pipeline-based 3D structure

(a) (b) (c)

Figure 3. Single core architecture with connection to crossbar and pipeline registers

Figure 5. Layouts for (a) 2D 4-core processor, (b) Basic 1-core and (c) Conceptual 3D 4-core Design

(a)

(c)

(b)

Figure 6. Estimated Clock Period for different number of cores for MAPnet – (a) 2D structure with crossbars (b) Core-based 3D structure (c) Pipeline-based 3D structure

Figure 4. Comparison between routing delay for 2D Metal vs 3D TSVs

Table 2. Clock Period results after Place and Route (using Encounter)

Table 3. shows 3 different fault scenarios for various fault numbers and locations within pipeline stages

ID/EX

IF/ID

MEM/WB

EX/MEM

Inputs from other cores

Outputs to other cores

Inputs from other coresOutputs to other cores

FETCH EXECUTEDECODE MEMORY WRITEBACK

CORE 0

• MaPnet: Using inter-core redundancy to salvage the processor in case of faults. 2D design Core-based 3D design Pipeline-based 3D design

ID 1

EX 3

MEM 1

WB 2

IF 0

CORE 3

IF 3

IF 2

IF 1

ID 0

ID 2

ID 3

EX 0

EX 1

EX 2

MEM 0

MEM 2

MEM 3

WB 0

WB 1

WB 3

CORE 2

CORE 1

CORE 0

Figure 1. A 5-satge pipeline structure for 4 cores and switches between stages

I

F

0

I

D

0

E

X

0

M

E

M

0

W

B

0

I

F

1

I

D

1

E

X

1

M

E

M

1

W

B

1

CROSS-

BAR

I

F

3

I

D

3

E

X

3

M

E

M

3

W

B

3

I

F

2

I

D

2

E

X

2

M

E

M

2

W

B

2

CORE 0 CORE 1

CORE 3CORE 2

0.0

0.2

0.4

0.6

0.8

1.0

IPC

Single Pipeline Performance0-delay 1-delay 2-delay 3-delay 4-delay 5-delay 6-delay 7-delay 8-delay

Figure 7. Single Pipeline Performance

Figure 8. IPC for different fault scenarios

• 3D reconfigurable pipeline design, named MaPNet.

• more-reliable operation, higher performance, lower cost, and/or lower power consumption by taking advantage of the redundancy and capabilities available at each layer in the system stack.

EECS 578 Correct Operation for Processors and Embedded Systems

Fall 2015

0.0

1.0

2.0

3.0

4.0

5.0

IPC

Fault Scenario 1

Baseline No sharing (2D) No sharing (3D)Fully sharing (2D) Fully sharing (3D)

0.0

1.0

2.0

3.0

4.0

5.0

IPC

Fault Scenario 2

Baseline No sharing (2D) No sharing (3D)Fully sharing (2D) Fully sharing (3D)

0.0

1.0

2.0

3.0

4.0

5.0

IPC

Fault Scenario 3

Baseline No sharing (2D) No sharing (3D)Fully sharing (2D) Fully sharing (3D)