A comparison of DSP Architectures BlackFin ADSP-BFXXX Compute Unit

Based on an ENEL619.23 white paper prepared by Darrell Anklovitch

Overview

• Architecture Overview

• Register Map

• ALU features and sample instructions

• Multiplier features and sample instructions

• Shifter features and sample instructions

References

• ADSP-BF535 Blackfin Processor Hardware Reference, Rev 2, April 2004, Analog Devices. – Section 2

• Blackfin Processor Instruction Set Reference, Rev 2, May 2003, Analog Devices. – Sections 8 ~ 10, 14 & 15

• A number of the figures in this presentation are based on figures found in the ADSP-BF535 Blackfin Processor Hardware Reference.

ADSP-2106x Core Architecture

[Figure: ADSP-2106x core block diagram. Labelled blocks: DAG 1 (8 x 4 x 32), DAG 2 (8 x 4 x 24), program sequencer, cache memory (32 x 48), bus connect, timer, flags, JTAG test & emulation, register file (16 x 40), floating & fixed-point multiplier with fixed-point accumulator, 32-bit barrel shifter, floating-point & fixed-point ALU. Labelled buses: PMA bus (24), DMA bus (32), PMD bus (48), DMD bus (40).]

Register File and COMPUTE Units

• Key issues

– 5 data paths FROM COMPUTE units

– 5 data paths TO COMPUTE units

– Highly parallel operations UNDER THE RIGHT CONDITIONS

BF533 Memory Accesses

Under the right conditions, 4 memory accesses can occur at the same time:

– 64 bit instruction fetch
– 2 x 32 bit data loads
– 32 bit data store

PLUS up to 2 ALU (32 bit) and 2 MAC (16 bit) operations at the same time

PLUS background DMA activity

Compute Unit Architecture

2 Multipliers

2 ALUs

1 set of Video ALUs

Shifter

Register File

Register File

8 x 32 bit OR 16 x 16 bit

2 x 40 bit accumulators

DATA REGISTER SYNTAX:

• R0, R1 etc. refer to 32 bit registers
• R0.L refers to the low 16 bits of the R0 32 bit register
• R0.H refers to the high 16 bits of the R0 register

ACCUMULATOR SYNTAX:

• A0.L => low 16 bits
• A0.H => next 16 bits
• A0.W => least significant 32 bit word
• A0.X => most significant 8 bit extension

SHARC: 16 32-bit data registers, integer and float. There is a pair of SHARC accumulator registers too.
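The register and accumulator views listed above can be sketched in Python as plain bit-field extractions. This is only an illustrative model; the helper names `low_half`, `high_half` and `acc_fields` are hypothetical and are not part of any Analog Devices tool.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF
MASK40 = 0xFFFFFFFFFF


def low_half(reg):
    """Model of Rn.L: the low 16 bits of a 32 bit data register."""
    return reg & MASK16


def high_half(reg):
    """Model of Rn.H: the high 16 bits of a 32 bit data register."""
    return (reg >> 16) & MASK16


def acc_fields(acc):
    """Split a 40 bit accumulator value into its .L, .H, .W and .X views."""
    acc &= MASK40
    return {
        "L": acc & MASK16,          # low 16 bits
        "H": (acc >> 16) & MASK16,  # next 16 bits
        "W": acc & MASK32,          # least significant 32 bit word
        "X": (acc >> 32) & 0xFF,    # most significant 8 bit extension
    }


r0 = 0x12345678
print(hex(low_half(r0)), hex(high_half(r0)))
print({name: hex(value) for name, value in acc_fields(0x7F89ABCDEF).items()})
```

Note that .W overlaps .L and .H: it is the same 32 bits viewed as one word.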

ALU Data Flow

2 x 32 bit paths to dual Multiplier/ALU units

2 x 32 bit paths back to register file

Sample instructions

Blackfin

R0 = R1 + R2;

R0.L = R1.L + R2.H;

R0 = R1 +|- R2;

Means R0.L = R1.L - R2.L in parallel with R0.H = R1.H + R2.H

SHARC

R0 = R1 + R2;

Closest: R0 = R1 + R2, R4 = R1 - R2;

68K

MOVE.L R2, R0
ADD.L R1, R0

MOVE.W R2, R0
ADD.W R1, R0

MOVE.L R2, R0
ASR.L #16, R0
MOVE.L R1, R3
ASR.L #16, R3
ADD.W R3, R0
ASL.L #16, R0
MOVE.W R2, R0
ADD.W R1, R0
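As a rough illustration of what the Blackfin dual 16 bit instruction R0 = R1 +|- R2 computes, the following Python sketch models the two half-word operations. The function name `dual_add_sub` is hypothetical, and the model assumes plain 16 bit wraparound with no saturation.

```python
MASK16 = 0xFFFF


def dual_add_sub(r1, r2):
    """Model of R0 = R1 +|- R2 on 32 bit register values.

    High halves are added, low halves are subtracted, and each 16 bit
    result wraps around independently (saturation is not modelled).
    """
    high = (((r1 >> 16) & MASK16) + ((r2 >> 16) & MASK16)) & MASK16
    low = ((r1 & MASK16) - (r2 & MASK16)) & MASK16
    return (high << 16) | low


# R1.H = 5, R1.L = 9, R2.H = 3, R2.L = 4
print(hex(dual_add_sub(0x00050009, 0x00030004)))
```

The low-half borrow never reaches the high half, which is what separates this from a single 32 bit subtract.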

ALU Features

Can be:

• Single 16 bit ops
• Dual 16 bit ops
• Dual 16 bit cross ops
• Single 32 bit ops

[Figure: for each mode, source registers Rm and Rp and destination register Rn, with bit 31 marked.]

ALU Sample Instructions

[Figure: sample instructions for single, dual and quad 16 bit ops and for single and dual 32 bit ops, with register positions labelled A, B, C and D.]

• A & B registers must stay on the same side of the '|' for both instructions
• For dual and quad 16 bit operations the (CO) option causes the destination registers to cross
• Operator order is important: + must come before -

Figure callouts: "Does not work in parallel" and "Must have this option"
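The effect of the (CO) option on a dual 16 bit operation can be sketched as a swap of the two 16 bit results before they are written to the destination. This is a minimal Python model under that assumption; `pack_result` is a hypothetical helper, not an actual instruction or API.

```python
MASK16 = 0xFFFF


def pack_result(high_result, low_result, cross=False):
    """Pack two 16 bit results into a 32 bit destination value.

    With cross=True (modelling the CO option) the two results swap
    halves before being written.
    """
    if cross:
        high_result, low_result = low_result, high_result
    return ((high_result & MASK16) << 16) | (low_result & MASK16)


# Results of a dual op: high half result 0x0008, low half result 0x0005
print(hex(pack_result(0x0008, 0x0005)))              # normal placement
print(hex(pack_result(0x0008, 0x0005, cross=True)))  # crossed placement
```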

Multiply Data Flow

2 x 32 bit paths to dual Multiplier/ALU units

2 x 32 bit paths back to register file

2 x 40 bit accumulators

The multipliers share the same operand/result buses as the ALUs

Multiply Features

[Figure: each multiplier takes one 16 bit half from each source register; the possible pairings are H x H, H x L, L x H and L x L.]

• Multiplies are signed fractional by default
• Signed fractional multiply result is automatically left shifted 1 bit
• Signed fractional multiply != signed integer multiply
• Rounding is available on fractional number multiplies and, as a special option, on integer number multiplies
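To see why a signed fractional multiply differs from a signed integer multiply, the sketch below models both on 16 bit operands in Python. It assumes the usual 1.15 fractional interpretation of the operands; the function names are hypothetical and hardware saturation cases (such as 0x8000 x 0x8000) are not modelled.

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a two's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def mult_integer(a, b):
    """Signed integer multiply of two 16 bit operands."""
    return to_signed16(a) * to_signed16(b)


def mult_fractional(a, b):
    """Signed fractional multiply: the product is left shifted 1 bit.

    1.15 x 1.15 gives a 2.30 product; the shift turns it into 1.31.
    """
    return (to_signed16(a) * to_signed16(b)) << 1


half = 0x4000  # 0.5 in 1.15 format, 16384 as an integer
print(hex(mult_integer(half, half)))        # integer product
print(hex(mult_fractional(half, half)))     # fractional product
print(mult_fractional(half, half) / 2**31)  # value read as 1.31
```

The same bit patterns give different 32 bit results, which is why the two multiply modes cannot be interchanged.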

Rounding

2 cases:

[Figure: in both cases 0x8000 is added to a 32 bit value and the top 16 bits go to the destination register Rd. In one case the 32 bit value is the 32 bit result of multiplying Rm and Rp; in the other it is an accumulator value.]

Rounding adds 0x8000 to the 32 bit multiplier result or accumulator value before extracting a 16 bit value to the destination register
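The rounding step described above can be sketched in Python as follows. This is a simplified model that only implements the "add 0x8000, keep the top 16 bits" rule stated here; saturation and any other rounding behaviour the hardware may offer are not modelled, and the function names are hypothetical.

```python
def round_to_16(value32):
    """Add 0x8000 to a 32 bit value, then keep its top 16 bits."""
    return ((value32 + 0x8000) >> 16) & 0xFFFF


def truncate_to_16(value32):
    """Keep the top 16 bits without rounding, for comparison."""
    return (value32 >> 16) & 0xFFFF


print(hex(truncate_to_16(0x12348000)))  # low half simply discarded
print(hex(round_to_16(0x12348000)))     # low half >= 0x8000 rounds up
print(hex(round_to_16(0x12347FFF)))     # low half < 0x8000 rounds down
```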

Fractional Multiply

• When extracting a 16 bit fractional value from an accumulator, the high 16 bits are taken
• Where it goes in the destination register depends on which accumulator is being extracted from

Fractional Multiply != Integer Multiply

Integer Multiply

• When extracting a 16 bit integer value from an accumulator, the low 16 bits are taken
• Where the 16 bit value goes in the destination register depends on which accumulator is being extracted from
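The two extraction rules can be sketched in Python as below. The model takes "high 16 bits" to mean the A.H field (bits 31..16) and "low 16 bits" to mean A.L, and it follows the sample instructions in this presentation in sending A1 results to the high half and A0 results to the low half of the destination. The helper names are hypothetical, and saturation and rounding are left out.

```python
def extract16(acc, integer=False):
    """Extract a 16 bit value from an accumulator.

    Fractional mode takes bits 31..16 (A.H); integer mode takes
    bits 15..0 (A.L).
    """
    return acc & 0xFFFF if integer else (acc >> 16) & 0xFFFF


def write_half(reg, value16, acc_index):
    """Place a 16 bit value into a 32 bit register.

    A1 results go to the high half and A0 results to the low half,
    as in R3.H = (A1 ...), R3.L = (A0 ...).
    """
    if acc_index == 1:
        return (reg & 0x0000FFFF) | ((value16 & 0xFFFF) << 16)
    return (reg & 0xFFFF0000) | (value16 & 0xFFFF)


acc = 0x0012345678
print(hex(extract16(acc)))                # fractional extraction
print(hex(extract16(acc, integer=True)))  # integer extraction
print(hex(write_half(0xAAAABBBB, extract16(acc), acc_index=1)))
```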


Multiply Sample Instructions

Multi-issue MAC instruction examples:

A1 += R1.H * R2.L, A0 += R1.L * R2.L;

16 bit extraction from ACC 1 (to R3.H) and from ACC 0 (to R3.L):

R3.H = (A1 += R1.H * R2.L), R3.L = (A0 += R1.L * R2.L);

R3.H = (A1 += R1.H * R2.L), A0 += R1.L * R2.L;

Any combination of .H and .L in the 2 operands is allowed

32 bit extraction:

R3 = (A1 += R1.H * R2.L), R2 = (A0 += R1.L * R2.L);

Where destination registers must be paired as follows: R[1,0], R[3,2], R[5,4] and R[7,6]
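A dual MAC such as A1 += R1.H * R2.L, A0 += R1.L * R2.L can be modelled in Python as two independent multiply-accumulates. The sketch assumes the default signed fractional mode (product left shifted 1 bit) and does not model accumulator overflow or saturation; `dual_mac` is a hypothetical name used only for illustration.

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a two's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def dual_mac(a1, a0, r1, r2):
    """Model of A1 += R1.H * R2.L, A0 += R1.L * R2.L (signed fractional)."""
    r1_high = to_signed16(r1 >> 16)
    r1_low = to_signed16(r1)
    r2_low = to_signed16(r2)
    a1 += (r1_high * r2_low) << 1  # MAC 1 uses R1.H and R2.L
    a0 += (r1_low * r2_low) << 1   # MAC 0 uses R1.L and R2.L
    return a1, a0


# R1.H = 0.5, R1.L = 0.25, R2.L = 0.5 in 1.15 format
a1, a0 = dual_mac(0, 0, 0x40002000, 0x00004000)
print(hex(a1), hex(a0))
print(a1 / 2**31, a0 / 2**31)  # accumulator values read as fractions
```

Both accumulators start from zero here, so the printed fractions are simply the two products.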

Shifter Sample Instructions

• 2-operand register shifts
• 2-operand immediate shifts
• 3-operand register shift
• 3-operand immediate shift
• Arithmetic shift
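The difference between an arithmetic and a logical right shift on a 32 bit value can be sketched in Python as follows. This is a generic model of the two shift types, not a transcription of any particular Blackfin instruction, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value):
    """Interpret the low 32 bits of value as a two's complement number."""
    value &= MASK32
    return value - 0x100000000 if value & 0x80000000 else value


def arithmetic_shift_right(value, count):
    """Shift right and replicate the sign bit into the vacated bits."""
    return (to_signed32(value) >> count) & MASK32


def logical_shift_right(value, count):
    """Shift right and fill the vacated bits with zeros."""
    return (value & MASK32) >> count


print(hex(arithmetic_shift_right(0x80000000, 4)))  # sign bit is copied in
print(hex(logical_shift_right(0x80000000, 4)))     # zeros are shifted in
```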

Parallel Instruction Examples

• In general there are 16 and 32 bit versions of the arithmetic instructions
• Most of the 32 bit instructions can be executed in parallel with 2 x 16 bit memory/index operations
• Exceptions are DIVS, DIVQ and MULTIPLY with 32 bit operands
• || means parallel
• Examples:

– A1=R2.L*R1.L, A0=R2.H*R1.H || R2.H=W[I2++] || [I3++]=R3;
– R2=R2+|+R4, R4=R2-|-R4 || I0+=M0 || R1=[I0];
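The arithmetic part of the second example, R2=R2+|+R4, R4=R2-|-R4, can be sketched in Python as a pair of dual 16 bit operations. The sketch assumes that both operations read the register values from before the instruction executes, so neither result feeds the other, and it uses 16 bit wraparound without saturation. The names `dual16` and `sum_and_difference` are hypothetical.

```python
MASK16 = 0xFFFF


def dual16(a, b, op):
    """Apply op separately to the high halves and the low halves."""
    high = op((a >> 16) & MASK16, (b >> 16) & MASK16) & MASK16
    low = op(a & MASK16, b & MASK16) & MASK16
    return (high << 16) | low


def sum_and_difference(r2, r4):
    """Model of R2 = R2 +|+ R4, R4 = R2 -|- R4 issued as one instruction.

    Both results are computed from the original r2 and r4 (an assumption
    of this sketch) and only then written back.
    """
    new_r2 = dual16(r2, r4, lambda x, y: x + y)
    new_r4 = dual16(r2, r4, lambda x, y: x - y)
    return new_r2, new_r4


r2, r4 = sum_and_difference(0x00050009, 0x00030004)
print(hex(r2), hex(r4))
```

The memory and index operations after the first || are independent of this arithmetic and are not modelled.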
