A Comparison of Five Different Multiprocessor SoC Bus Architectures Kyeong Keol Ryu, Eung Shin and Vincent J. Mooney III School of Electrical and Computer Engineering Georgia Institute of Technology, Atlanta, USA {kkryu, eung, mooney}@ece.gatech.edu

Page 1: A Comparison of Five Different Multiprocessor SoC Bus

A Comparison of Five Different Multiprocessor SoC Bus Architectures

Kyeong Keol Ryu, Eung Shin and Vincent J. Mooney III
School of Electrical and Computer Engineering
Georgia Institute of Technology, Atlanta, USA

{kkryu, eung, mooney}@ece.gatech.edu

Page 2: A Comparison of Five Different Multiprocessor SoC Bus

Outline

• Introduction
• Motivation and Previous Work
• Five Bus Architectures for SoC: BFBA, GBIA, GBIIA, CSBA, and CCBA
• Application Examples: OFDM transmitter and MPEG2 decoder
• Experiment Environment
• Comparison in View of Algorithm and Architecture
• Comparison of Throughput of the Bus Architectures
• Conclusion

Page 3: A Comparison of Five Different Multiprocessor SoC Bus

Introduction

[Figure: four processor blocks, A to D, on a PCB, shown in panels (A) and (B). Labels per block: MPC750, SRAM, SRAM_x, REGISTERS, BI-FIFO_x, and CPU Bus x.]

Page 4: A Comparison of Five Different Multiprocessor SoC Bus

Motivation and Previous Work (I)

CoreConnect (IBM):
• Processor Local Bus (PLB)
• On-chip Peripheral Bus (OPB)

AMBA (ARM):
• Advanced High-performance Bus (AHB)
• Advanced Peripheral Bus (APB)

[Figure: Intellectual Property (IP) blocks IP1, IP2, and IP3 attached to the PLB, and IP1, IP2, and IP3 attached to the AHB]

Page 5: A Comparison of Five Different Multiprocessor SoC Bus

Motivation and Previous Work (II)

Sonics uNetwork
• TDMA arbitration
• IP reuse and integration

Wishbone architecture (Silicore)
• one bus for all
• supports multiple masters

In terms of bus topology, uNetwork and Wishbone are similar to AMBA and CoreConnect.

Page 6: A Comparison of Five Different Multiprocessor SoC Bus

Five Bus Architectures for 4 processor System (I)

• Global Bus I Architecture (GBIA)
• Bi-FIFO Bus Architecture (BFBA)

[Figure: block diagrams of the two architectures, panels (A) and (B). Labels per processor block A to D: MPC750, SRAM, SRAM_x, REGISTERS, BI-FIFO_x, and CPU Bus x.]

Page 7: A Comparison of Five Different Multiprocessor SoC Bus

Five Bus Architectures for 4 processor System (II)

• Global Bus II Architecture (GBIIA)
• Crossbar Switch Bus Architecture (CSBA)
• IBM CoreConnect Bus Architecture (CCBA)

Page 8: A Comparison of Five Different Multiprocessor SoC Bus

Application Examples (I)

OFDM Transmitter
• Block Diagram
• Data Format: 32 guard samples and 128 data samples
• Function Assignment

Reference: D. Kim and G. L. Stüber, "Performance of Multiresolution OFDM on Frequency-selective Fading Channels," IEEE Transactions on Vehicular Technology, vol. 48, no. 5, pp. 1740-1746, September 1999.

[Figure: function assignment. Compute nodes A1 to A4, B1 to B4, C1 to C4, and D1 to D4 on processors Pro_A to Pro_D, plotted against time.]
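
The staggered ordering of the compute nodes in the function-assignment figure can be illustrated with a small Python sketch. It rests on a reading of the figure that the slides do not spell out: the letter is taken to name the processor (A for Pro_A, and so on), the number is taken to be the packet index, and every stage is assumed to take exactly one time step. The function name `pipeline_schedule` is hypothetical.

```python
def pipeline_schedule(stages, num_packets):
    """Build a pipelined schedule as a list of time steps.

    Each time step is a list with one entry per stage: the compute node
    running on that stage, or "--" when the stage is idle.
    Assumption: stage s starts packet p at time step p + s (0-based),
    i.e. every stage takes one time step.
    """
    total_steps = num_packets + len(stages) - 1
    schedule = []
    for t in range(total_steps):
        row = []
        for s, stage in enumerate(stages):
            p = t - s  # packet index this stage would handle at step t
            row.append(f"{stage}{p + 1}" if 0 <= p < num_packets else "--")
        schedule.append(row)
    return schedule


# Four stages (one per processor) and four packets, as in the figure labels.
schedule = pipeline_schedule(["A", "B", "C", "D"], 4)
for t, row in enumerate(schedule):
    print(f"t={t}: " + " ".join(row))
```

Under these assumptions the pipeline needs three steps to fill, after which all four processors are busy at once; real stage times are unlikely to be equal, so this only shows the shape of the schedule.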

Page 9: A Comparison of Five Different Multiprocessor SoC Bus

Application Examples (II)

MPEG2 Decoder
• Video Processing Example
• 16 x 16 pixel resolution, M=1, N=2

[Figure: two function-assignment charts, one for BFBA and GBIA and one for GBIIA, CSBA, and CCBA. Each plots compute nodes labeled SH, I, and P on processors Pro_A to Pro_D against time.]

SH: Sequence header, I: Intra decoding frame, P: Predictive decoding frame

Page 10: A Comparison of Five Different Multiprocessor SoC Bus

Experiment Environment

Co-simulation Environment
• Seamless CVE: a co-simulator from Mentor Graphics
• VCS: a Verilog HDL simulator from Synopsys
• XRAY: a high-level debugger from Mentor Graphics
• PowerPC C cross compiler: GCC
• External clock of PowerPC 750: 83.33 MHz (the internal clock speed can be much faster, e.g., 400 MHz)

Page 11: A Comparison of Five Different Multiprocessor SoC Bus

Comparison in View of Algorithm and Architecture

Algorithm

OFDM Transmitter
• Strong output-data dependency between functions using many local variables
• Many short loops
• Few global variables

MPEG2 Decoder
• Many global variables for header information
• Hierarchical data structure which has a long loop with many nested loops

Architecture

BFBA and GBIA
• No method to access global data
• Fast data transfer between processor blocks

GBIIA, CSBA, and CCBA
• Efficient access of global data

Page 12: A Comparison of Five Different Multiprocessor SoC Bus

Comparison of Throughput of the Bus Architectures (I)

OFDM Transmitter

[Figure: bar chart of throughput in Mbps for BFBA, GBIA, GBIIA, CSBA, and CCBA; vertical axis from 1.02 to 1.14]

Bus Architecture   Exe. Cycles/Packet   Exe. Time/Packet   Throughput
BFBA               378,348              4.5402 ms          1.1277 Mbps
GBIA               403,000              4.8360 ms          1.0588 Mbps
GBIIA              381,061              4.5727 ms          1.1197 Mbps
CSBA               380,199              4.5624 ms          1.1222 Mbps
CCBA               380,686              4.5682 ms          1.1208 Mbps

Reference: 128 data samples and 32 guard samples per packet
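
The relation between the three table columns can be checked with a short Python sketch. It uses the 83.33 MHz external clock from the experiment environment and the 160 samples per packet given above. The sample width of 32 bits is an assumption, chosen only because it reproduces the tabulated throughput to about three decimal places; the slides do not state it. The helper names are hypothetical.

```python
BUS_CLOCK_HZ = 83.33e6              # external PowerPC 750 clock from the slides
SAMPLES_PER_PACKET = 128 + 32       # data samples + guard samples
BITS_PER_SAMPLE = 32                # ASSUMPTION: not stated in the slides
BITS_PER_PACKET = SAMPLES_PER_PACKET * BITS_PER_SAMPLE

# Execution cycles per packet, copied from the OFDM transmitter table.
CYCLES_PER_PACKET = {
    "BFBA": 378_348,
    "GBIA": 403_000,
    "GBIIA": 381_061,
    "CSBA": 380_199,
    "CCBA": 380_686,
}


def exec_time_ms(cycles, clock_hz=BUS_CLOCK_HZ):
    """Execution time per packet in milliseconds."""
    return cycles / clock_hz * 1e3


def throughput_mbps(cycles, bits=BITS_PER_PACKET, clock_hz=BUS_CLOCK_HZ):
    """Throughput in Mbps: bits per packet divided by time per packet."""
    return bits / (cycles / clock_hz) / 1e6


for name, cycles in CYCLES_PER_PACKET.items():
    print(f"{name:6s} {exec_time_ms(cycles):.4f} ms  {throughput_mbps(cycles):.4f} Mbps")
```

Small differences in the last digit against the table are expected, since the clock is quoted to only four significant figures.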

Page 13: A Comparison of Five Different Multiprocessor SoC Bus

Comparison of Throughput of the Bus Architectures (II)

MPEG2 Decoder

[Figure: bar chart of throughput in Mbps for BFBA, GBIA, GBIIA, CSBA, and CCBA; vertical axis from 0 to 0.7]

Bus Architecture   Exe. Cycles/Packet   Exe. Time/Packet   Throughput
BFBA               507,853              6.0942 ms          0.5041 Mbps
GBIA               527,545              6.3305 ms          0.4852 Mbps
GBIIA              377,562              4.5307 ms          0.6780 Mbps
CSBA               377,548              4.5306 ms          0.6781 Mbps
CCBA               378,181              4.5382 ms          0.6769 Mbps

Reference: 128 data samples and 32 guard samples per packet
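
To put the MPEG2 decoder figures in proportion, the following Python sketch expresses each architecture's throughput as a percentage gain over a baseline. Taking BFBA as the baseline is a choice made here for illustration, not one made in the slides, and the function name `relative_gain_percent` is hypothetical.

```python
# Throughput in Mbps, copied from the MPEG2 decoder table.
MPEG2_THROUGHPUT_MBPS = {
    "BFBA": 0.5041,
    "GBIA": 0.4852,
    "GBIIA": 0.6780,
    "CSBA": 0.6781,
    "CCBA": 0.6769,
}


def relative_gain_percent(table, baseline):
    """Percentage throughput gain of every entry over the baseline entry."""
    base = table[baseline]
    return {name: (value / base - 1.0) * 100.0 for name, value in table.items()}


gains = relative_gain_percent(MPEG2_THROUGHPUT_MBPS, "BFBA")  # baseline is an assumption
best = max(MPEG2_THROUGHPUT_MBPS, key=MPEG2_THROUGHPUT_MBPS.get)

for name, gain in gains.items():
    print(f"{name:6s} {gain:+.1f} %")
print("highest throughput:", best)
```

With these numbers the three architectures that give access to global data come out roughly a third faster than BFBA, while GBIA is a few percent slower.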

Page 14: A Comparison of Five Different Multiprocessor SoC Bus

Conclusion

Five bus architectures evaluated
• BFBA, GBIA, GBIIA, CSBA, and CCBA

Two application programs
• OFDM transmitter and MPEG2 decoder

Pipeline or parallel operation improves performance

BFBA best for OFDM
• pipelined applications

CSBA best for MPEG2
• parallel applications

Bus architecture performance heavily dependent on
• distribution of computation load
• algorithm style

Future work: combine the bus architectures with switching logic to maximize performance according to application characteristics