Single Pipeline Performance · Single Pipeline Performance 0-delay 1-delay 2-delay 3-delay 4-delay...

Preview:

Citation preview

• Reliability is a big concern in Multi-core Processors as the technology nodes are scaling

• Inter-Core resource sharing increases reliability in presence of fault (e.g. StageNet Architecture)

• This leads to high communication delay• 3D integration is a promising solution to increase density and

performance

Introduction

Objective

Architecture

Implementation

Conclusion

MaPnet: A Three-Dimensional Fabric for Reliable Multi-core ProcessorsJavad Bagherzadeh, Sugandha Gupta, Byounchan Oh

University of Michigan, Ann Arbor

• 3D architectures have performance advantage in many core designs (Figure 6)

• MAPnet design is implemented on a 4-core architecture with simple cores

• TSV delay and behavior is simulated by SPICE simulations and used to determine routing delay and processor frequency

Table 1. Advantages and disadvantages of each design

Results

• Use StageNet based structure with two functionalities in case of failure Arranging healthy resources in different cores to create one

healthy pipeline (Virtual Pipeline) Sharing healthy resources in other cores when they are not

being used • Use 3D integration to reduce communication delay and hence

improve performance over 2D design • Model Through-Silicon Vias (TSV) to simulate the 3D design and

performance

Advantages Disadvantages

Regular Design -Easy To Design -Slow

-Low Scalability

Pipelined Design -Higher clock Frequency

-Easy Dynamic

Reconfiguration

-High Flexibility in Sharing

Resources

-Better scalability

-Difficult to Design

-Need extra control units

-Area Overhead

2D-Metal line delay

3D-TSV delay

MAPnet Design 2D Core-based 3D Pipeline 3D

Clock

(No Fault)

2.5 ns

(400 MHz)

2.5 ns

(400 MHz)

2.5 ns

(400 MHz)

Clock

(Fault Condition)

7 ns

(143 MHz)

3.95 ns

(250MHz)

3.85

(260MHz)

Footprint Area 1200*1200 sq.us 600*600 sq. us

Layout Density 71% 67%

Scenario Number of Faults Disabled Resources

Scenario 1 2 IF0, EX1

Scenario 2 4 IF0, ID1, EX2, WB3

Scenario 3 10 IF0, ID1, EX0, MEM1,

WB0, IF2, ID3, EX2,

MEM3, WB2

• Both 3D designs show performance improvement over the 2D design

• The throughput increases by 1.74X• On comparing the post-APR frequencies in

faulty and healthy cores and considering Table 1, we decided to pipeline both architectures to reduce clock frequency

• Crossbars in 2D and 3D structures need 1 and 2 pipeline stages respectively.

• The RTL was synthesized using Synopsys Design Compiler and Cadence Encounter was used to create the layout

• IPC was measured for each scenario

• Figures 8 reflect the results for baseline 2D and 3D pipelined structures regarding sharing the same resource and no sharing.

• Each fault forces the processor to reconfigure and adds extra delays cycles into pipeline structure to use resources in other cores

• We simulated a single core for different test cases and measured the IPC for extra delay cycles added in pipeline structure which is reflected in Figure 7.

• Improvement with 2D and 3D StageNet compared 2 baseline

• 16.3% on average and up to 28.2% improvement in 3D compared to 2D

• area of our 3D design is under than 25% of the traditional 2D design due to reduced interconnection complexity.

Figure 2. Three different designs for MAPnet (a) 2D structure with crossbars (b) Core-based 3D structure (c) Pipeline-based 3D structure

(a) (b) (c)

Figure 3. Single core architecture with connection to crossbar and pipeline registers

Figure 5. Layouts for (a) 2D 4-core processor, (b) Basic 1-core and (c) Conceptual 3D 4-core Design

(a)

(c)

(b)

Figure 6. Estimated Clock Period for different number of cores for MAPnet – (a) 2D structure with crossbars (b) Core-based 3D structure (c) Pipeline-based 3D structure

Figure 4. Comparison between routing delay for 2D Metal vs 3D TSVs

Table 2. Clock Period results after Place and Route (using Encounter)

Table 3. shows 3 different fault scenarios for various fault numbers and locations within pipeline stages

ID/EX

IF/ID

MEM/WB

EX/MEM

Inputs from other cores

Outputs to other cores

Inputs from other coresOutputs to other cores

FETCH EXECUTEDECODE MEMORY WRITEBACK

CORE 0

• MaPnet: Using inter-core redundancy to salvage the processor in case of faults. 2D design Core-based 3D design Pipeline-based 3D design

ID 1

EX 3

MEM 1

WB 2

IF 0

CORE 3

IF 3

IF 2

IF 1

ID 0

ID 2

ID 3

EX 0

EX 1

EX 2

MEM 0

MEM 2

MEM 3

WB 0

WB 1

WB 3

CORE 2

CORE 1

CORE 0

Figure 1. A 5-satge pipeline structure for 4 cores and switches between stages

I

F

0

I

D

0

E

X

0

M

E

M

0

W

B

0

I

F

1

I

D

1

E

X

1

M

E

M

1

W

B

1

CROSS-

BAR

I

F

3

I

D

3

E

X

3

M

E

M

3

W

B

3

I

F

2

I

D

2

E

X

2

M

E

M

2

W

B

2

CORE 0 CORE 1

CORE 3CORE 2

0.0

0.2

0.4

0.6

0.8

1.0

IPC

Single Pipeline Performance0-delay 1-delay 2-delay 3-delay 4-delay 5-delay 6-delay 7-delay 8-delay

Figure 7. Single Pipeline Performance

Figure 8. IPC for different fault scenarios

• 3D reconfigurable pipeline design, named MaPNet.

• more-reliable operation, higher performance, lower cost, and/or lower power consumption by taking advantage of the redundancy and capabilities available at each layer in the system stack.

EECS 578 Correct Operation for Processors and Embedded Systems

Fall 2015

0.0

1.0

2.0

3.0

4.0

5.0

IPC

Fault Scenario 1

Baseline No sharing (2D) No sharing (3D)Fully sharing (2D) Fully sharing (3D)

0.0

1.0

2.0

3.0

4.0

5.0

IPC

Fault Scenario 2

Baseline No sharing (2D) No sharing (3D)Fully sharing (2D) Fully sharing (3D)

0.0

1.0

2.0

3.0

4.0

5.0

IPC

Fault Scenario 3

Baseline No sharing (2D) No sharing (3D)Fully sharing (2D) Fully sharing (3D)

Recommended