Laboratory 1. 1
LABORATORY 1. INTRODUCTION TO SOFTWARE DEVELOPMENT TOOLS FOR STARCORE DSP
1.1 CodeWarrior Development Studio for StarCore DSP Architectures
1.1.1 Creating a project
Launch the CodeWarrior IDE
1. Select Start > Programs > Freescale CodeWarrior > CW for StarCore 10.5 CodeWarrior > CodeWarrior IDE — the Workspace Launcher dialog box appears
Figure 1.1. Workspace Launcher Dialog Box
2. If you wish to change the location of your project's Workspace, click Browse to select a new path.
3. Select the required folder or click Make New Folder to create a new folder for storing your projects.
4. Click OK to store the project at the specified location — CodeWarrior launches and displays the Welcome page.
2 SYSTEMS ON CHIP FOR COMMUNICATIONS
Figure 1.2. Welcome Page
Create a new project
5. From the CodeWarrior IDE menu bar, select File > New > StarCore Project.
The Create a StarCore Project page appears. In the Project name field, type Demo.
Figure 1.3. New StarCore Project Wizard
6. Click Next — the Devices page appears. Select the SC3850 option in the Device Family Group. Select the Application option from Project Type.
Figure 1.4. New StarCore Project Wizard
7. Click Next. The Build Settings page appears. Set Memory Type to Huge and Languages to C.
Figure 1.5. New StarCore Project Wizard
8. Click Next. The Launch Configuration page appears. Choose:
Debugger Connection Type: Simulator
Program Download option: Disable memory verification
Launch Configurations options: Create all launch configurations
Figure 1.6. New StarCore Project Wizard
9. Click Next. The Simulators page appears. Choose:
Simulator: ISS, with Remote System Configurations: Default
Figure 1.7. New StarCore Project Wizard
10. Click Next — the Software Analysis page appears. Click Finish.
Figure 1.8. New StarCore Project Wizard
11. The main window of the CodeWarrior Development Studio appears.
Figure 1.9. CodeWarrior Development Studio
The C code for the main.c program is given below:
```c
/*-----------------------------------------------------------------------*
    msc8156_main.c
    StarCore DSP C Using Project Stationery.
    COPYRIGHT (C) : Freescale Semiconductor, Inc., 2008
 *-----------------------------------------------------------------------*/
/*
    Biquad simulation.
*/
#include <prototype.h>
#include <stdio.h>
#include <stdlib.h>

#define DataBlockSize 40   /* size of data block to process */

/* Filter coefficients as Q15 fractional values */
#define a1 -19661
#define a2 6554
#define b1 16384
#define b2 -6554

Word16 DataIn[DataBlockSize] = {
    328, 9830, 8192, -6553, -3276, 3277, 3277, -6553, -9829, 4915,
    8192, -6553, 328, 9830, 4915, -6553, -3276, 3277, 3277, -9829,
    4915, -3276, -9829, 8192, -6553, 328, 9830, -6553, 3277, 3277,
    3277, 328, 9830, 4915, -3276, -9829, 8192, -6553, -6553, 3277,
};
Word16 DataOut[DataBlockSize];

int func1();

int func2()
{
#pragma noinline
    printf("Hello private text\n");
    return 0;
}

int func1()
{
#pragma noinline
    Word16 YNM1 = 0, YNM2 = 0;   /* filter state: w(n-1), w(n-2) */
    Word32 TN, TNP1, YN, YNP1;
    int i;

    for (i = 0; i < DataBlockSize/2; i++) {   /* two samples per iteration */
        TN   = L_deposit_h(DataIn[2*i]);
        TNP1 = L_deposit_h(DataIn[2*i+1]);
        TN   = L_mac(TN, YNM2, a2);            YN   = L_mult(YNM2, b2);
        TN   = L_mac(TN, YNM1, a1);            YN   = L_mac(YN, YNM1, b1);
        YN   = L_add(YN, TN);                  YNM2 = _round(TN);
        TNP1 = L_mac(TNP1, YNM1, a2);          YNP1 = L_mult(YNM1, b2);
        TNP1 = L_mac(TNP1, _round(TN), a1);    YNP1 = L_mac(YNP1, YNM2, b1);
        YNP1 = L_add(YNP1, TNP1);              YNM1 = _round(TNP1);
        DataOut[2*i]   = _round(YN);
        DataOut[2*i+1] = _round(YNP1);
    }
    for (i = 0; i < DataBlockSize; i++)
        printf("Output %d\n", DataOut[i]);
    return(0);
}

int main()
{
    func2();
    return func1();
}
```
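To check the Console output without the simulator, the filter loop of func1() can be compiled on a PC if the intrinsics it uses are replaced by host functions. The sketch below is an assumption-laden model, not the StarCore library: it assumes ETSI-style saturating semantics for L_deposit_h, L_mult, L_mac and L_add, and uses a hypothetical round16() in place of _round(). On the target, prototype.h supplies the real intrinsics.

```c
#include <stdint.h>

/* Host-side stand-ins for the intrinsics used by func1().
   Assumption: ETSI-style saturating semantics. round16() is a
   hypothetical name for the target's _round(). */
typedef int16_t Word16;
typedef int32_t Word32;

#define DataBlockSize 40
#define a1 -19661
#define a2 6554
#define b1 16384
#define b2 -6554

static Word32 L_add(Word32 x, Word32 y)          /* saturating 32-bit add */
{
    int64_t s = (int64_t)x + y;
    if (s > INT32_MAX) return INT32_MAX;
    if (s < INT32_MIN) return INT32_MIN;
    return (Word32)s;
}

static Word32 L_mult(Word16 x, Word16 y)         /* Q15 * Q15 -> Q31 */
{
    if (x == INT16_MIN && y == INT16_MIN) return INT32_MAX;  /* (-1)*(-1) */
    return (Word32)x * y * 2;
}

static Word32 L_mac(Word32 acc, Word16 x, Word16 y) { return L_add(acc, L_mult(x, y)); }
static Word32 L_deposit_h(Word16 x) { return (Word32)x * 65536; }  /* x into high half */

static Word16 round16(Word32 x)                  /* round to nearest, keep high half */
{
    return (Word16)(L_add(x, 0x8000) >> 16);     /* assumes arithmetic >> on negatives */
}

static const Word16 DataIn[DataBlockSize] = {
    328, 9830, 8192, -6553, -3276, 3277, 3277, -6553, -9829, 4915,
    8192, -6553, 328, 9830, 4915, -6553, -3276, 3277, 3277, -9829,
    4915, -3276, -9829, 8192, -6553, 328, 9830, -6553, 3277, 3277,
    3277, 328, 9830, 4915, -3276, -9829, 8192, -6553, -6553, 3277,
};
static Word16 DataOut[DataBlockSize];

/* Same computation as the first loop of func1(), without printing. */
static void biquad_block(void)
{
    Word16 YNM1 = 0, YNM2 = 0;                   /* w(n-1), w(n-2) */
    Word32 TN, TNP1, YN, YNP1;
    for (int i = 0; i < DataBlockSize / 2; i++) {
        TN   = L_deposit_h(DataIn[2*i]);
        TNP1 = L_deposit_h(DataIn[2*i+1]);
        TN   = L_mac(TN, YNM2, a2);              YN   = L_mult(YNM2, b2);
        TN   = L_mac(TN, YNM1, a1);              YN   = L_mac(YN, YNM1, b1);
        YN   = L_add(YN, TN);                    YNM2 = round16(TN);
        TNP1 = L_mac(TNP1, YNM1, a2);            YNP1 = L_mult(YNM1, b2);
        TNP1 = L_mac(TNP1, round16(TN), a1);     YNP1 = L_mac(YNP1, YNM2, b1);
        YNP1 = L_add(YNP1, TNP1);                YNM1 = round16(TNP1);
        DataOut[2*i]   = round16(YN);
        DataOut[2*i+1] = round16(YNP1);
    }
}
```

After calling biquad_block(), the first two outputs of this model are 328 and 9797. If the target intrinsics behave as assumed here, the simulator's Console should start with the same values.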
1.1.2 Simulation of the project
To build the project, select Project > Build All from the CodeWarrior IDE menu bar.
To start debugging the project, perform these steps:
1. Start the debugging session.
1. Click Project > Debug.
The Debug perspective appears and execution halts at the first statement of main().
2. Click the thread in the Debug view.
The program counter icon on the marker bar points to the next statement to be executed.
3. In the Debug view, click Step Over.
The debugger executes the current statement and halts at the next statement.
2. Set a breakpoint and run the program up to it.
1. In the editor area, scroll to an executable statement, for example return func1(); in main().
2. Double-click the marker bar next to the statement.
The breakpoint indicator (blue dot) appears next to the statement.
3. In the Debug view, click Resume.
The debugger executes all statements up to, but not including, the breakpoint statement.
3. Observe how the registers change as the program executes.
Select the Registers tab and double-click General Purpose Registers.
After each Step Over, the registers modified by the executed instructions change value and are highlighted in red.
4. Observe the memory.
Select the Memory tab and observe the arrays stored in memory at hexadecimal addresses. Find the arrays DataIn and DataOut.
5. Control the program
1. In the Debug view, click Step Over.
The debugger executes the breakpoint statement and halts at the next statement.
2. In the Debug view, click Resume.
The program prints its output to the Console view at the bottom of the window.
3. In the Debug view, click Terminate.
The debug session ends.
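For step 4 above, it helps to know which hexadecimal patterns to look for. A Word16 value is stored as its 16-bit two's-complement pattern, so the first elements of DataIn (328, 9830, 8192, -6553) appear as 0x0148, 0x2666, 0x2000 and 0xE667, and element i lies 2*i bytes after the array's start address. The helpers below are hypothetical and assume byte addressing, as on a PC host; the byte order shown inside each word depends on the target's endianness setting.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for reading the Memory view. */
static uint16_t word16_pattern(int16_t v)
{
    return (uint16_t)v;                 /* two's-complement bit pattern */
}

static size_t word16_offset(size_t i)
{
    return i * sizeof(int16_t);         /* byte offset of element i from the array base */
}
```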
1.2 SC3850 core overview
The SC3850 core is one generation of Freescale StarCore DSP cores. The figure below shows the SC3850 block diagram.
Figure 1.xxx. StarCore SC3850 block diagram
There are four parallel arithmetic logic units (ALUs) in the data arithmetic and logic unit (DALU), where most arithmetic and logical operations on data operands are performed. Each ALU can perform two 16 × 16 multiplications per cycle, for a total of 8 multiplications per cycle across the four ALUs, or up to 8 GMAC/s (8 × 10^9 multiply-accumulates per second) at a 1 GHz core frequency. There are two address arithmetic units (AAUs) in the address generation unit (AGU), which performs effective address calculations using the integer arithmetic needed to address data operands in memory. The program control unit (PCU) performs instruction fetch, instruction dispatch, hardware loop control, and exception processing. The resource stall unit (RSU) controls the hardware interlocks.
The figures below show the SC3850 programming model. The DALU contains sixteen 40-bit data registers; the AGU contains sixteen 32-bit address registers, four 32-bit offset registers, and four 32-bit modulo registers. In addition, there are control and configuration registers.
Figure 1.xxx. StarCore SC3850 programming model
The challenge facing DSP programmers is to use all the resources of the SC3850 architecture effectively. Ideally, the code should keep the program and data buses and all six operational units (4 ALUs and 2 AAUs) busy simultaneously.
1.2.1 Data Arithmetic and Logic Unit (DALU)
Data can be represented as signed fractional numbers or as signed/unsigned integers. Data types are 8, 16, 20, and 40 bits wide.
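Fractional numbers:
16 bits: Word16, range $-1$ to $1 - 2^{-15}$ in steps of $2^{-15}$
32 bits: Word32, range $-1$ to $1 - 2^{-31}$ in steps of $2^{-31}$
Integer numbers:
16 bits: short, range $-2^{15}$ to $2^{15} - 1$ in steps of 1
32 bits: long, range $-2^{31}$ to $2^{31} - 1$ in steps of 1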
Bit weights, from bit 0 (LSB) to the sign bit (MSB):
16-bit fractional: $2^{-15}, 2^{-14}, \dots, 2^{-2}, 2^{-1}, -2^{0}$ (binary point just right of the sign bit)
32-bit fractional: $2^{-31}, 2^{-30}, \dots, 2^{-2}, 2^{-1}, -2^{0}$
16-bit integer: $2^{0}, 2^{1}, \dots, 2^{13}, 2^{14}, -2^{15}$ (binary point right of the LSB)
32-bit integer: $2^{0}, 2^{1}, \dots, 2^{29}, 2^{30}, -2^{31}$
DALU calculations are based on 40-bit registers.
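The 40-bit registers leave 8 bits of headroom above a 32-bit (Q31) result, so a running sum may temporarily exceed the 32-bit range without losing information. The sketch below is a host model with hypothetical helpers: int64_t stands in for a 40-bit accumulator, and it is compared with a 32-bit saturating accumulator on a sum whose exact result is 0.

```c
#include <stdint.h>

/* Illustrative host model, not StarCore code. */
#define ACC40_MAX ((INT64_C(1) << 39) - 1)
#define ACC40_MIN (-(INT64_C(1) << 39))

static int64_t mac40(int64_t acc, int16_t x, int16_t y)
{
    return acc + 2 * (int64_t)x * y;          /* Q15*Q15 -> Q31 product, no saturation */
}

static int32_t mac32_sat(int32_t acc, int16_t x, int16_t y)
{
    int64_t s = (int64_t)acc + 2 * (int64_t)x * y;
    if (s > INT32_MAX) return INT32_MAX;      /* clip to the 32-bit range */
    if (s < INT32_MIN) return INT32_MIN;
    return (int32_t)s;
}

/* Two large positive products, then two large negative ones:
   the exact sum is 0, but the running sum exceeds the 32-bit range. */
static const int16_t xs[4] = { 32767, 32767, 32767, 32767 };
static const int16_t ys[4] = { 32767, 32767, -32767, -32767 };

static int64_t sum40(void)
{
    int64_t acc = 0;
    for (int i = 0; i < 4; i++)
        acc = mac40(acc, xs[i], ys[i]);       /* stays within ACC40_MIN..ACC40_MAX */
    return acc;
}

static int32_t sum32(void)
{
    int32_t acc = 0;
    for (int i = 0; i < 4; i++)
        acc = mac32_sat(acc, xs[i], ys[i]);   /* saturates after the second product */
    return acc;
}
```

sum40() returns the exact result 0, while sum32() ends at -2147221509 because the saturation after the second product discards part of the sum.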
The two multipliers of each ALU can be used in various ways:
SIMD2 or dot-product multiplication
Complex multiplication
Extended precision multiplication (16x32, 32x32)
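Table 1.1
| Operation | Precision | Operations per cycle |
|---|---|---|
| Real multiply | 16x16 | 8 |
| Real multiply | 16x32 | 4 |
| Real multiply | 32x32 | 2 |
| Complex multiply | 16x16 | 2 |
| Complex multiply | 16x32 | 1 |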
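A 16x16 complex product needs four real 16x16 multiplications: each of the real and imaginary parts is a two-term dot product, the kind of pair one ALU's two multipliers can compute together. That is consistent with 8 real multiplies per cycle giving 2 complex multiplies per cycle in Table 1.1. The Q15 sketch below is a host model with a hypothetical cq15 type, using ETSI-style saturation and rounding.

```c
#include <stdint.h>

typedef struct { int16_t re, im; } cq15;       /* hypothetical Q15 complex type */

static int32_t sat32(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

static int16_t round_q31(int32_t v)            /* Q31 -> Q15, round to nearest */
{
    return (int16_t)(sat32((int64_t)v + 0x8000) >> 16);  /* arithmetic >> assumed */
}

/* (ar + j*ai)(br + j*bi) = (ar*br - ai*bi) + j(ar*bi + ai*br):
   each part is a two-term sum of 16x16 products. */
static cq15 cmul_q15(cq15 a, cq15 b)
{
    int64_t re = 2 * ((int64_t)a.re * b.re - (int64_t)a.im * b.im);   /* Q31 */
    int64_t im = 2 * ((int64_t)a.re * b.im + (int64_t)a.im * b.re);   /* Q31 */
    cq15 r = { round_q31(sat32(re)), round_q31(sat32(im)) };
    return r;
}
```

For example, (0.5 + 0.5j)(0.5 - 0.5j) = 0.5, so cmul_q15 applied to {16384, 16384} and {16384, -16384} yields {16384, 0}.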
1.2.2 Integer and Fractional Arithmetic
One of the strengths of both the StarCore architecture and the StarCore compiler is the ability to perform both fractional and integer arithmetic. Values stored in memory or registers are interpreted differently depending on the operation performed. For integers, the binary point is considered to be immediately to the right of the LSB. For fractions, the binary point is considered to be immediately to the right of the MSB. Table 1.2 illustrates this for 16-bit data values.
Table 1.2
| Binary representation | Hexadecimal representation | Integer value (decimal) | Fractional value (decimal) |
|---|---|---|---|
| 0100 0000 0000 0000 | 0x4000 | 16384 | 0.5 |
| 0001 0000 0000 0000 | 0x1000 | 4096 | 0.125 |
| 0000 0000 0000 0000 | 0x0000 | 0 | 0.0 |
| 1100 0000 0000 0000 | 0xC000 | -16384 | -0.5 |
| 1111 0000 0000 0000 | 0xF000 | -4096 | -0.125 |
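The two columns of Table 1.2 can be reproduced by reading one 16-bit pattern in two ways: as a two's-complement integer, and as that integer divided by $2^{15}$. The helper names below are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helpers: read one 16-bit pattern two ways. */
static int16_t as_integer(uint16_t bits)
{
    return (int16_t)bits;               /* two's complement; binary point right of the LSB */
}

static double as_fraction(uint16_t bits)
{
    return (int16_t)bits / 32768.0;     /* Q15: binary point right of the MSB (sign bit) */
}
```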
The StarCore compiler implements fractional arithmetic using built-in intrinsic functions based on integer data types. The compiler supports many intrinsic (built-in) functions that map directly to SC100 assembly instructions. Because C does not support fractional types and operations, these intrinsic functions let you implement fractional operations with integer data types. Any fractional values or constants must therefore be defined using their integer equivalents. The integer representation of a fractional value is obtained as follows:
$\text{16-bit integer value} = \text{fractional value} \times 2^{15}$
$\text{32-bit integer value} = \text{fractional value} \times 2^{31}$
The syntax of this group of compiler intrinsics is compatible with the ETSI and ITU reference implementations of bit-exact standards.
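The conversion rules above can be sketched in C as follows. The helpers are hypothetical: they round to the nearest integer and saturate, since +1.0 itself is not representable. The values -0.6, 0.2, 0.5 and -0.2 convert to -19661, 6554, 16384 and -6554, exactly the coefficients a1, a2, b1 and b2 in main.c, which suggests (this is an inference, not stated in the source) that those are the intended fractional coefficients.

```c
#include <stdint.h>

/* Hypothetical helpers: fractional value -> Q15 / Q31 integer.
   Rounds to nearest (ties away from zero) and saturates, because the
   largest representable value is 1 - 2^-15 (Q15) or 1 - 2^-31 (Q31). */
static int16_t frac_to_q15(double f)
{
    double v = f * 32768.0;                    /* fractional value x 2^15 */
    v = (v >= 0.0) ? v + 0.5 : v - 0.5;        /* rounding offset before truncation */
    if (v >= 32767.0) return INT16_MAX;
    if (v <= -32768.0) return INT16_MIN;
    return (int16_t)v;                         /* truncates toward zero */
}

static int32_t frac_to_q31(double f)
{
    double v = f * 2147483648.0;               /* fractional value x 2^31 */
    v = (v >= 0.0) ? v + 0.5 : v - 0.5;
    if (v >= 2147483647.0) return INT32_MAX;
    if (v <= -2147483648.0) return INT32_MIN;
    return (int32_t)v;
}
```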
The example from the figure below illustrates how the instructions are mapped based on the type of the arithmetic required. For integer arithmetic, the compiler generates integer instructions (for example, imac). For fractional arithmetic, it generates fractional instructions (for example, mac). Also, move instructions are generated with correct data alignment.
Figure 1.xxx. Mapping of integer and fractional operations to instructions
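The difference between the integer and fractional instructions can be modeled on a host. This is an illustrative model, not the compiler's output: the integer multiply-accumulate (as for imac) adds the plain product, while the fractional one (as for mac) treats the operands as Q15, shifts the Q30 product left by one to obtain Q31, and saturates as the ETSI L_mac does. The function names are hypothetical, and a 64-bit accumulator keeps the integer model overflow-free.

```c
#include <stdint.h>

/* Illustrative models of integer vs fractional multiply-accumulate. */
static int64_t imac_model(int64_t acc, int16_t x, int16_t y)
{
    return acc + (int64_t)x * y;                     /* integer product, no scaling */
}

static int32_t mac_model(int32_t acc, int16_t x, int16_t y)
{
    int64_t s = (int64_t)acc + 2 * (int64_t)x * y;   /* Q15*Q15 = Q30, <<1 -> Q31 */
    if (s > INT32_MAX) return INT32_MAX;             /* saturate */
    if (s < INT32_MIN) return INT32_MIN;
    return (int32_t)s;
}
```

With both operands equal to 16384, the integer model gives 268435456 (16384²), while the fractional model gives 536870912, which is 0.25 in Q31 (0.5 × 0.5).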
A complete list of the intrinsic functions for single-precision and double-precision fractional arithmetic can be found in the C/C++ Compiler User’s Manual.
Table 1.3
| Intrinsic | Syntax |
|---|---|
| add | Word16 add(Word16 var1, Word16 var2); |
| mac_r | Word16 mac_r(Word32 L_var3, Word16 var1, Word16 var2); |
| msu_r | Word16 msu_r(Word32 L_var3, Word16 var1, Word16 var2); |
| max | Word16 max(Word16 var1, Word16 var2); |
| mult | Word16 mult(Word16 var1, Word16 var2); |
| round | Word16 round(Word32 L_var1); |
| saturate | Word16 saturate(Word32 L_var1); |
| shl | Word16 shl(Word16 var1, Word16 var2); |
| shr | Word16 shr(Word16 var1, Word16 var2); |
| sub | Word16 sub(Word16 var1, Word16 var2); |
| extract_h | Word16 extract_h(Word32 L_var1); |
| extract_l | Word16 extract_l(Word32 L_var1); |
| L_add | Word32 L_add(Word32 L_var1, Word32 L_var2); |
| L_sub | Word32 L_sub(Word32 L_var1, Word32 L_var2); |
| L_mult | Word32 L_mult(Word16 var1, Word16 var2); |
| L_mac | Word32 L_mac(Word32 L_var3, Word16 var1, Word16 var2); |
| L_msu | Word32 L_msu(Word32 L_var3, Word16 var1, Word16 var2); |
The data types used for fractional variables are listed in Table 1.4.
Table 1.4
| Fractional data type | Equivalent C type |
|---|---|
| Word16 | short |
| Word32 | long/int |
The fractional variables that are declared as Word16, Word32 may be initialized with fractional values using WORD16() or WORD32() macros.
For example:
```c
Word16 x = WORD16(0.5);
```
The program above implements a second-order (biquad) IIR filter. Its transfer function can be written as:
$$H(z) = \frac{1 + b_1 z^{-1} + b_2 z^{-2}}{1 - a_1 z^{-1} - a_2 z^{-2}}$$
The filter is implemented in direct (canonic) form 2 and it has the following structure:
Fig. 1.10. IIR biquad filter
The difference equations corresponding to the structure above for the filter are:
$$
\begin{aligned}
w(n) &= x(n) + a_1\, w(n-1) + a_2\, w(n-2) \\
y(n) &= w(n) + b_1\, w(n-1) + b_2\, w(n-2)
\end{aligned}
\tag{1.1}
$$
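Taking the z-transform of the difference equations, with the a-terms added as in func1() (where a1 = -19661 and a2 = 6554 in Q15, i.e. about -0.6 and 0.2), recovers the transfer function. Multiplying its denominator by $z^{2}$ gives the poles, which lie inside the unit circle, so the filter is stable:

$$
\begin{aligned}
W(z) &= X(z) + a_1 z^{-1} W(z) + a_2 z^{-2} W(z) \;\Rightarrow\; W(z) = \frac{X(z)}{1 - a_1 z^{-1} - a_2 z^{-2}} \\
Y(z) &= \left(1 + b_1 z^{-1} + b_2 z^{-2}\right) W(z) \\
H(z) &= \frac{Y(z)}{X(z)} = \frac{1 + b_1 z^{-1} + b_2 z^{-2}}{1 - a_1 z^{-1} - a_2 z^{-2}} \\
z^{2} - a_1 z - a_2 &= z^{2} + 0.6\,z - 0.2 = 0 \;\Rightarrow\; z \approx 0.239,\; z \approx -0.839
\end{aligned}
$$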
EXERCISE:
1. Modify the program to implement the structure below (transposed biquad):
Fig. 1.11. IIR transposed biquad filter
(Diagram labels: input x(n), output y(n), coefficients b1, b2, a1, a2, state variables w1 and w2, and two $z^{-1}$ delay elements.)
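When checking a solution to the exercise, a floating-point reference can help. The sketch below is not the lab's fixed-point solution, which should still use the intrinsics. It implements direct form 2 and the transposed form with hypothetical helper names, using the same sign convention as func1() (the a-terms are added) and $b_0 = 1$. Both forms realize the same transfer function, so their outputs should agree to rounding error.

```c
#include <stddef.h>

/* Floating-point reference models (hypothetical helpers).
   Convention as in func1(): H(z) = (1 + b1 z^-1 + b2 z^-2) / (1 - a1 z^-1 - a2 z^-2). */
typedef struct { double a1, a2, b1, b2; } biquad_coefs;

/* Direct form 2: w(n) = x(n) + a1 w(n-1) + a2 w(n-2); y(n) = w(n) + b1 w(n-1) + b2 w(n-2) */
static void biquad_df2(const biquad_coefs *c, const double *x, double *y, size_t n)
{
    double w1 = 0.0, w2 = 0.0;                  /* w(n-1), w(n-2) */
    for (size_t i = 0; i < n; i++) {
        double w = x[i] + c->a1 * w1 + c->a2 * w2;
        y[i] = w + c->b1 * w1 + c->b2 * w2;
        w2 = w1;
        w1 = w;
    }
}

/* Transposed form: the state variables w1, w2 are fed by both x(n) and y(n). */
static void biquad_tdf2(const biquad_coefs *c, const double *x, double *y, size_t n)
{
    double w1 = 0.0, w2 = 0.0;
    for (size_t i = 0; i < n; i++) {
        double out = x[i] + w1;                 /* b0 = 1 in this lab */
        w1 = c->b1 * x[i] + c->a1 * out + w2;
        w2 = c->b2 * x[i] + c->a2 * out;
        y[i] = out;
    }
}
```

Feeding both functions the same input, with coefficients about -0.6, 0.2, 0.5 and -0.2, and comparing sample by sample gives a quick check that a transposed implementation is wired correctly.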