1
Code Generation Tools Tips & Tricks
For All TI Code Generation Tools
April 2011
2
Online Resources
• Compiler Wiki
– http://processors.wiki.ti.com/index.php?title=Category:Compiler
– Good chance your question is answered here
– Constantly evolving
– All the manuals!
– Downloads
– FAQ
• Compiler Forum
– http://e2e.ti.com/support/development_tools/compiler/f/343.aspx
– Questions and discussion
– Searchable
3
Actually Read the Readme
• This talk only covers the highlights
• Many more features covered in the readme files
• Critical detail as well
• Well worth the 1 hour it takes to read it
• Available with the compiler download
– https://www-a.ti.com/downloads/sds_support/TICodegenerationTools/download.htm
– ARM compiler not there
• Also in root directory of compiler install
– CCSv4 typical path: C:\Program Files\Texas Instruments\ccsv4\tools\compiler\target
– target: c2000, c5400, c5500, c6000, msp430, tms470
4
Agenda
• Recommended Development Flow
• Linker Tips
• EABI
• Intrinsics vs. Assembly
• Types and Alignment
• 16 x 16 → 32 Multiply
• Memory Models
• Diagnostics
• Useful Utilities
• Remaining Tips
5
Recommended Development Flow
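Flowchart: Start → Edit → Compile -g → Debug → Works? → Compile -o → Profile → Goals Met? → Done
(Each decision has a Yes branch that moves forward and a No branch that loops back.)
• Separate use of -g and -o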
6
Optimization vs. Debug
• As one improves the other degrades
• If you combine them together …
• Effects on debugging
– Operations reordered
– Variables eliminated
• Effects on performance
– Instruction scheduling is impaired
• Compare performance -o vs. -o -g
– C6x MP3 decoder
– -o runs 16% faster than -o -g
7
Optimization Levels
Option       Scope
Default      None         (low)
-o0          Statement
-o1          Block
-o2 or -o    Function
-o3          File         (high)
• Optimization is critical!
• Not on by default
• Option is -olevel where level is 0-3
8
If You Must Build with Debug …
• Then also use -mn
• Restores some of the lost performance
• Degrades debug experience, but not too much
• Wiki article:
– http://processors.wiki.ti.com/index.php/Debug_versus_Optimization_Tradeoff
9
Quick Help on Compiler Options
• For command line users
• Run compiler shell with no options
    C:\dir>cl6x | more
    TMS320C6x C/C++ Compiler vX.Y.Z
    …
    -@=filename        Read options from specified file
    -D=NAME[=value]    Pre-define NAME
    -I=dir             Add dir to #include search path
    …
• To search options use -h text
    C:\dir>cl6x -h debug
    -mn                Optimize fully in the presence of …
    --symdebug:coff    Enable full symbolic COFF debugging …
                       object or out file (DEPRECATED).
    -g                 Enable full symbolic DWARF debugging …
• For more detail use -h -option
– More detail not always present
10
Options: Long and Short Forms
• All compiler options have a long form
– Start with two dashes
– Example: --symdebug:dwarf
• Many have a shorter alias
– Example: -g
• Documentation and CCS build dialog emphasize long form
• Table of long and short forms for options used in this presentation is next
11
Options: Long and Short Forms
Short    Long
-g       --symdebug:dwarf
-o       --opt_level
-mn      --optimize_with_debug
-w       --warn_sections
-pdr     --issue_remarks
-pden    --display_error_number
-pdsr    --diag_remark
-pdsw    --diag_warning
-pdse    --diag_error
-pds     --diag_suppress
12
Recommended Build Options
• For C6000
– http://processors.wiki.ti.com/index.php/C6000_Compiler:_Recommended_Compiler_Options
• For C5500
– http://processors.wiki.ti.com/index.php/C5500_Compiler:_Recommended_Compiler_Options
• Many points made, however, apply equally well to all TI compilers
13
C6000 Optimization Hints
• Much can be gained from informing the compiler about additional properties of your code
– Pointers don’t access the same memory locations
– How many times a loop iterates
– Pointers are aligned
• http://processors.wiki.ti.com/index.php/C6000_CGT_Optimization_Lab_-_1
– Shows techniques for giving the compiler that information
– A working example is modified, in steps, to run faster and faster
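The three hints above could be expressed roughly as follows. This is a sketch, not code from the lab: `vec_add` and `ASSUME_ALIGNED8` are hypothetical names, and the use of `restrict`, `_nassert`, and `#pragma MUST_ITERATE` reflects the C6000 compiler as I understand it, so check the exact forms against the compiler manual. Off target the hints compile away.

```c
/* Alignment hint: _nassert is assumed to be the C6000 intrinsic;
   elsewhere the hypothetical ASSUME_ALIGNED8 macro expands to nothing. */
#if defined(_TMS320C6X)
#define ASSUME_ALIGNED8(p) _nassert(((int)(p) & 7) == 0)
#else
#define ASSUME_ALIGNED8(p) ((void)0)
#endif

/* Caller contract (assumed): the arrays do not overlap, each starts on
   an 8-byte boundary, and n is a multiple of 8 that is at least 8. */
void vec_add(short *restrict dst, const short *restrict a,
             const short *restrict b, int n)
{
    int i;

    ASSUME_ALIGNED8(dst);
    ASSUME_ALIGNED8(a);
    ASSUME_ALIGNED8(b);

#if defined(_TMS320C6X)
#pragma MUST_ITERATE(8, , 8)   /* trip count: minimum 8, multiple of 8 */
#endif
    for (i = 0; i < n; i++)
        dst[i] = (short)(a[i] + b[i]);
}
```

Each hint is a promise to the compiler. If a caller breaks the contract in the comment, the optimized code may silently produce wrong results.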
15
Linker Tips
• Global variables are not initialized to 0!
– Except under EABI
• The linker does not know your system memory layout
– Linker command file does that
  • MEMORY and SECTIONS directives
• Use -w
– Warns you when certain suspicious things occur
– Creating a section without a specification
  • Never intentionally occurs in a production build
– Stack not created
– Heap not created
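One way to stay safe with the first tip is to give every global an explicit initializer instead of relying on startup code to zero it. A minimal sketch, with hypothetical names, assuming explicitly initialized globals are set up by the normal C startup sequence under both ABIs:

```c
/* Explicit initializers: do not rely on the startup code zeroing these. */
static unsigned int error_count = 0;
static int          last_error  = 0;

/* Record one error code. */
void record_error(int code)
{
    error_count++;
    last_error = code;
}

unsigned int errors_seen(void)       { return error_count; }
int          most_recent_error(void) { return last_error; }
```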
17
Introduction to EABI
• ABI: Application Binary Interface
– Conventions which allow separately compiled object files and libraries to link into a cohesive executable
• Default for all TI compilers is COFF ABI
• TMS470 v4.4.x and C6000 v7.2.x introduce EABI
– Build Option: --abi=eabi
• Impossible to mix COFF ABI and EABI
• Before using EABI yourself, obtain EABI versions of all your libraries
– Availability is very good, but not yet 100%
• Details on the Wiki
– http://processors.wiki.ti.com/index.php/EABI_Support_in_C6000_Compiler
– http://processors.wiki.ti.com/index.php/EABI_Support_in_ARM_Compiler
19
Intrinsics vs. Assembly
• TI supports limited asm() statements
– Not like GCC asm() statements
– Very little interaction with C environment
– No interaction with registers or local variables
• Intrinsics are preferred
– Act like function calls
– Implemented in one instruction
  • With a few exceptions
• Compiler knows very little about instructions within asm() statements
• Knows everything possible about intrinsics
• Thus, optimization with intrinsics is far more effective
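As an illustration of "acts like a function call", the sketch below wraps a saturating add. It assumes `_sadd` is the C6000 saturating-add intrinsic; `sat_add32` and `sat_sum` are hypothetical names, and the fallback is only a stand-in so the code builds with other compilers.

```c
#include <limits.h>

#if defined(_TMS320C6X)
/* On C6000, _sadd is assumed to be the saturating-add intrinsic. */
#define sat_add32(a, b) _sadd((a), (b))
#else
/* Portable stand-in with the same result, for other compilers. */
static int sat_add32(int a, int b)
{
    if (b > 0 && a > INT_MAX - b) return INT_MAX;
    if (b < 0 && a < INT_MIN - b) return INT_MIN;
    return a + b;
}
#endif

/* Sum n samples, clamping instead of wrapping on overflow. */
int sat_sum(const int *x, int n)
{
    int i, acc = 0;

    for (i = 0; i < n; i++)
        acc = sat_add32(acc, x[i]);
    return acc;
}
```

Because the call is ordinary C, the compiler can still schedule and optimize the loop around it, which it cannot do across an asm() statement.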
21
Type Sizes by CPU (in bits)
Type          C28x™   C55x™   C6000™   MSP430   470
char          16      16      8        8        8
short         16      16      16       16       16
int           16      16      32       16       32
long          32      32      40       32       32
long long     64      40      64       NA       64
float         32      32      32       32       32
double        32      32      64       32       64
long double   64      32      64       32       64
• Shaded sizes different from hosted systems
• C6000 EABI long is 32-bits. More on that later.
22
Type Differences
• Because char is 16-bits on C55x™ and C28x™
– sizeof(int) == sizeof(char) == 1
– Redefines the term byte
– 8-bit wide external streams must be handled carefully
– App note: http://www-s.ti.com/sc/techlit/spra757
• On 470 plain char is unsigned
– Use plain char only for ASCII chars
– Otherwise, use signed char or unsigned char
• Floating point is very slow on CPUs without hardware support
• MSP430: Will add 64-bit long long and 64-bit long double types
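A sketch of careful handling for an 8-bit wide stream: keep one octet per array element and mask explicitly, so the same code works whether char is 8 or 16 bits. `read_u16_be` is a hypothetical helper, and the big-endian octet order is an assumption for the example.

```c
/* Read a big-endian 16-bit value from two stream octets.
   Each element of p holds one octet in its low 8 bits; the mask
   discards any upper bits when char is wider than 8 bits. */
unsigned int read_u16_be(const unsigned char *p)
{
    unsigned int hi = p[0] & 0xFFu;
    unsigned int lo = p[1] & 0xFFu;

    return (hi << 8) | lo;
}
```

Using unsigned char throughout also sidesteps the question of whether plain char is signed on a given target.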
23
Standard Integer Typedefs
• stdint.h defines standard names for exact width integer types
    #if defined(__TMS320C2000__) || defined(_TMS320C5XX) \
     || defined(__TMS320C55X__)
    typedef int int16_t;
    typedef unsigned int uint16_t;
    typedef long int32_t;
    typedef unsigned long uint32_t;
    #elif defined(_TMS320C6X) || defined(__TMS470__)
    typedef signed char int8_t;
    typedef unsigned char uint8_t;
    typedef short int16_t;
    typedef unsigned short uint16_t;
    typedef int int32_t;
    typedef unsigned int uint32_t;
    #elif defined(__MSP430__)
    …
• Other typedefs: minimum width, fastest, etc.
• No need to define your own
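The stdint.h excerpt above shows no 8-bit exact-width type in its first branch, which is where the minimum-width and fastest typedefs help. A small sketch using them; `checksum8` is a hypothetical function, not something from the TI headers.

```c
#include <stdint.h>

/* 8-bit additive checksum over n octets stored one per element.
   uint_least8_t is the smallest type with at least 8 bits, so the
   result is masked to 8 bits explicitly. */
uint_least8_t checksum8(const uint_least8_t *data, int n)
{
    uint_fast16_t sum = 0;
    int i;

    for (i = 0; i < n; i++)
        sum = (sum + (data[i] & 0xFFu)) & 0xFFu;
    return (uint_least8_t)sum;
}
```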
24
C6000 EABI and Size of long
• COFF ABI: long is 40-bits
• EABI: long is 32-bits
– Eases porting of general purpose code to C6000
• Porting from COFF ABI to EABI?
– In COFF change from long to int40_t
  • Unless you are 100% certain 32-bits is enough
– Makes port easy
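A sketch of what that change might look like for an accumulator that needs more than 32 bits. It assumes int40_t comes from stdint.h on C6000; `acc_t` and `sum32` are hypothetical names, and the 64-bit fallback exists only so the example builds with other compilers.

```c
#include <stdint.h>

#if defined(_TMS320C6X)
typedef int40_t acc_t;          /* the 40-bit type named on the slide */
#else
typedef int_least64_t acc_t;    /* stand-in off target, at least 40 bits */
#endif

/* Sum n 32-bit samples without overflowing a 32-bit accumulator. */
acc_t sum32(const int32_t *x, int n)
{
    acc_t acc = 0;
    int i;

    for (i = 0; i < n; i++)
        acc += x[i];
    return acc;
}
```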
25
Alignment & Structures
• Alignment is different from hosted systems
• Misaligned access
– x86: Works but imposes cycle penalty
– TI CPUs: Fails silently
• TI CPUs alignment == size of type (few exceptions)
• Structures are laid out differently between CPUs
– Order of members is guaranteed
– Any member, or the whole struct, may be aligned
– Affects exchanging data with external sources
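Because padding differs between CPUs, one common approach to exchanging data is to write each member out explicitly rather than copying the raw struct. A sketch under that assumption; `struct sample`, `sample_encode`, and the 5-octet big-endian format are all hypothetical.

```c
#include <stdint.h>

struct sample {
    unsigned char channel;
    uint32_t      value;
};

/* Write s as 5 octets: channel, then value most significant first.
   Returns the number of octets written. No padding is transmitted,
   so the format does not depend on the compiler's struct layout. */
int sample_encode(const struct sample *s, unsigned char out[5])
{
    out[0] = (unsigned char)(s->channel & 0xFFu);
    out[1] = (unsigned char)((s->value >> 24) & 0xFFu);
    out[2] = (unsigned char)((s->value >> 16) & 0xFFu);
    out[3] = (unsigned char)((s->value >> 8) & 0xFFu);
    out[4] = (unsigned char)(s->value & 0xFFu);
    return 5;
}
```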
26
Packed Structures
• Support starts in these compilers …
CPU Family   Version   Notes
ARM          4.8.0     Cortex only
C6000        7.2.0     Not C62x, C67x, C67x+
• Syntax is same as GCC
• Details here
– http://processors.wiki.ti.com/index.php/GCC_Extensions_in_TI_Compilers
• Underlying HW must support unaligned access
– Will be relaxed over time
• MSP430 support coming soon
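Since the syntax follows GCC, a packed structure would be written with the GCC attribute. The sketch below uses hypothetical struct names and assumes a target with 8-bit char and a compiler that accepts the attribute.

```c
#include <stdint.h>

/* Packed: members are laid out back to back with no padding. */
struct wire_hdr {
    uint8_t  kind;
    uint32_t length;
} __attribute__((packed));

/* Same members with the default layout, for comparison. */
struct plain_hdr {
    uint8_t  kind;
    uint32_t length;
};

/* Size of each layout, in chars. */
unsigned int packed_size(void) { return (unsigned int)sizeof(struct wire_hdr); }
unsigned int plain_size(void)  { return (unsigned int)sizeof(struct plain_hdr); }
```

Accessing `length` in the packed version is an unaligned access, which is why the hardware requirement in the list above matters.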
28
16 x 16 → 32 Multiply
• Do not write …
    long_var = short_var1 * short_var2;
• Instead write …
    long_var = (long) short_var1 * (long) short_var2;
• Very important when sizeof(int) != sizeof(long)
• Accurately represents operation
• Implemented efficiently
• App note: http://www-s.ti.com/sc/techlit/spra683
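The recommended form can be wrapped in a small helper. In this sketch `mul16x16` is a hypothetical name; the point is that the casts happen before the multiply, so the product is computed in long even where int is only 16 bits.

```c
/* 16 x 16 -> 32 multiply: cast before multiplying so the product is
   computed in long, not in int. */
long mul16x16(short a, short b)
{
    return (long)a * (long)b;
}
```

Without the casts, both operands are promoted only to int, and on a CPU with 16-bit int a product such as 300 * 300 overflows before it is ever widened to long.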
30
Understanding Memory Models
                Large      Small
Memory Range    Full       Partial
Code Size       Larger     Smaller
Speed           Slower     Faster
To enable       See docs   Default
31
Understanding Memory Models
CPU      Extends   Because
C6000™   Code      PC-relative branches
C6000™   Data      DP-offset global variables
C55x™    Data      2 ways to access globals; 2 sizes of address registers
C28x™    Data      2 sizes of address registers
MSP430   Code      Wider ALU
MSP430   Data      Wider ALU
32
Understanding Memory Models
• Details on C6000 Data Memory Models
– http://processors.wiki.ti.com/index.php/C6000_Memory_models
34
Compiler Diagnostics
               Remark   Warning   Error
Severity       Low      Medium    High
Build fails?   No       No        Yes
To enable      -pdr     Default   Default
• Is it okay to ignore remarks?
35
Automatic Error Detection
• Remarks expose common bugs
• This complete program compiles fine
    void main() { printf("hello, world\n"); }
• But doesn’t work
• Compile with -pdr to see remarks
    "hello.c", line 1: remark: function declared implicitly
• Always build with -pdr!
• Take remarks seriously
• Always include required RTS header files
    #include <stdio.h>
• Always prototype user functions
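Both rules in one small sketch: the RTS header supplies the declaration of the library function, and a prototype makes the user function visible before its first use. `scale` and `format_scaled` are hypothetical names.

```c
#include <stdio.h>   /* declares sprintf */

static int scale(int x);   /* prototype visible before first use */

/* Format the scaled value into buf; returns the character count. */
int format_scaled(char *buf, int x)
{
    return sprintf(buf, "%d", scale(x));
}

static int scale(int x)
{
    return 2 * x;
}
```

With both declarations in place, a build with -pdr should have nothing to report as declared implicitly in this file.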
36
Elevate Remark to Error
• Force remark to cause build to fail
• Use 2 separate compilation steps
• First, use -pden -pdr to see remark number
    "hello.c", line 1: remark #225-D: function declared implicitly
• Second, use -pdsenum, where num is the remark number, to elevate the remark
• In this case -pdse225 gives
    "hello.c", line 1: error: function declared implicitly
    1 error detected in the compilation of "hello.c".
    >> Compilation failure
• Can use -pdse225, without -pdr, on all builds
• Projects originated with CCSv4 have -pdsw225 by default
– Warning, not an error
37
Control Diagnostic Levels
• First identify diagnostic id with -pden
Set level to:   Option    #pragma
Remark          -pdsrid   #pragma diag_remark id
Warning         -pdswid   #pragma diag_warning id
Error           -pdseid   #pragma diag_error id
Default         none      #pragma diag_default id
Suppress        -pdsid    #pragma diag_suppress id
• Diagnostics with "-D" appended to id can be suppressed or changed
– All warnings and remarks
– A few errors
• #pragma is alternative to -pdsXXX
• #pragma provides line by line control
39
Useful Utilities
• ofdXX: Dumps out contents of object files and libraries
– Use -x option for XML format
• nmXX: Lists symbol table in .out or .obj file
• stripXX: Strips symbol and debug information from .out file
• demXX: Demangles symbols into their C++ form
• disXX: Disassembler
– Not on some ISAs
• Replace XX with the abbreviation for your ISA, e.g. ofd6x, nm55, etc.
40
Even Cooler Utilities
• Turn the ofdXX XML or linker map file XML into useful information
• Call graph, stack usage, section sizes by type, compare libraries, and much more
• Called the cg_xml package
• Target independent
• Released separately from compiler tools
• Command line executables
– http://processors.wiki.ti.com/index.php/Code_Generation_Tools_XML_Processing_Scripts
• Invoke from within CCSv4
– http://processors.wiki.ti.com/index.php/Code_Generation_Tools_XML_Processing_Scripts_Plug-in_for_CCS
42
Remaining Tips
• C++ code is usually very efficient
• #pragma is compiler specific and does not port
– http://processors.wiki.ti.com/index.php/Pragmas_You_Can_Understand
• C FAQ
– http://www.eskimo.com/~scs/C-faq/faq.html
• C++ FAQ
– http://www.parashift.com/c++-faq-lite
43
Questions
44
Backup Slides
45
Backup Agenda
• More Optimization Hints
• Standardize Handling of Types
• More on Diagnostics
• C Headers in Assembly
• Interrupt Intrinsics
• Predefined Symbols for Version
46
More Optimization Hints
• Optimization may uncover user errors like:
– Uninitialized variables
– Loose adherence to ANSI standard
– Failure to use volatile
• Use volatile on variables modified by:
– Interrupts
– Peripherals
– Other processors
• volatile controls order of access, not timing
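A sketch of the interrupt case: a flag written by an interrupt handler and polled by the main loop. Without volatile, an optimizing compiler is free to read the flag once and never look again. The names are hypothetical, and how the handler is registered is target specific and not shown.

```c
/* Set by the interrupt handler, read by the main loop. */
static volatile int data_ready = 0;

/* Hypothetical interrupt handler. */
void rx_isr(void)
{
    data_ready = 1;
}

/* Returns 1 and clears the flag if the handler has run, else 0. */
int take_data_ready(void)
{
    if (data_ready) {
        data_ready = 0;
        return 1;
    }
    return 0;
}
```

volatile only forces the reads and writes to happen in program order. The test-and-clear here is still not atomic with respect to the handler.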
48
inttypes.h
• Another header file for standardizing types
• Includes stdint.h
• Also defines printf format strings
    #include <inttypes.h>
    …
    printf("%" PRId32 "\n", (int32_t) 1); // portable
    printf("%d\n", (int32_t) 1); // not portable
– Note careful use of commas in first printf
50
Diagnostic Control Example
    int ex(int i)
    {
       switch (i)
       {
          case 10 :
             return val(); /* line 7 */
             break; /* line 9 */
          …
       }
    }

    C:\dir> cl55 -pden -pdr ex.c
    "ex.c", line 7: remark #225-D: function declared implicitly
    "ex.c", line 9: warning #112-D: statement is unreachable
51
Diagnostic Control Example
    #pragma diag_error 225 /* Require explicit function decls */
    int ex(int i)
    {
       switch (i)
       {
          case 10 :
             return val(); /* line 7 */
    #pragma diag_suppress 112 /* suppress msg on break */
             break; /* line 9 */
    #pragma diag_default 112 /* restore msg level */
          …
       }
    }

    C:\dir> cl55 -pden -pdr ex.c
    "ex.c", line 7: error #225-D: function declared implicitly
    1 error detected in the compilation of "ex.c".
    >> Compilation failure
• #pragma is alternative to -pdsXXX
• #pragma provides precise control
52
Verbose Diagnostics -pdv
• For this source line
    extern struct example;
• The diagnostic is
    "ex.c", line 1: warning: a storage class may not be specified here
• To avoid confusion add -pdv
    "ex.c", line 1: warning: a storage class may not be specified here
      extern struct example;
      ^
54
C Headers in Assembly
    .cdecls optional parameters
    %{
        /* C/C++ code here, usually #include’s */
    %}
• Converted constructs (usually found in .h files)
– function/variable declarations (prototypes)
– structs, unions, enumerations
– Non-function-like macros
• NOT converted
– function/variable definitions
– function-like macros
• Each .cdecls region is separate context
• Conversion after pre-processing
• Option to warn on each construct NOT converted
• Includes generated assembly code in listing file
55
How C/C++ is Transformed
hd.h
    #define WANT_ID 1
    #define NAME "Jari\n"
    extern int var;
    extern float cvt(int src);
    struct duo {
        int ifld;
        float ffld;
    };
    enum status {
        OK = 1,
        FAIL = 256
    };

Assembly source line and the resulting listing
            .cdecls C, LIST, "hd.h"
    -----------------------------
            .define "1", WANT_ID
            .define """Jari\n""", NAME
            .global _var
            .global _cvt
    duo     .struct 0, 2
    ifld    .field 16 ; pretty and
            .field 16 ; informative
    ffld    .field 32 ; comments
            .endstruct
    status  .enum
    OK      .emember 1
    FAIL    .emember 256
            .endenum
56
Typical Usage
(hd.h is the header shown on the previous slide.)
            .cdecls C, LIST, "hd.h"

            .data
            .if $defined(WANT_ID)
    id:     .cstring NAME
            .endif

            .bss data, $sizeof(duo)
    data:   .tag duo
            ...

    hope:
            AR1 = #data
            AC0 = *AR1(#(duo.ifld)) << 16
            dbl(*AR1(#(duo.ffld))) = AC0
            AR0 = #id
            T0 = #(status.OK)
            return
58
Interrupt Intrinsics
• For enabling/disabling interrupts
    unsigned int _disable_interrupts(); // 3 cycles on C6000
    unsigned int _enable_interrupts(); // 3 cycles on C6000
    void _restore_interrupts(unsigned int); // 1 cycle on C6000
• Disable and enable return interrupt state before change; use that value when restoring state
• Barriers to optimization
• Usage example …
    unsigned int local;
    local = _disable_interrupts();
    if (sem) sem--; /* atomic test/update of semaphore */
    _restore_interrupts(local);
• Replacement for HWI_disable(), HWI_restore()
– Faster. HWI_disable on C64x takes 16 cycles.
• Cycle counts given are “best case” numbers
– Ignores cache effects, memory latency, etc.
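The usage pattern can be packaged as a function. In this sketch `sem_try_take`, `IRQ_SAVE_DISABLE`, and `IRQ_RESTORE` are hypothetical names; on C6000 the macros map to the intrinsics listed above, and elsewhere they are do-nothing stand-ins so the code still compiles.

```c
#if defined(_TMS320C6X)
#define IRQ_SAVE_DISABLE()  _disable_interrupts()
#define IRQ_RESTORE(s)      _restore_interrupts(s)
#else
/* Stand-ins so the sketch compiles off target; they do nothing. */
#define IRQ_SAVE_DISABLE()  (0u)
#define IRQ_RESTORE(s)      ((void)(s))
#endif

static volatile unsigned int sem = 1;

/* Try to take the semaphore; returns 1 on success, 0 if none is left. */
int sem_try_take(void)
{
    unsigned int state = IRQ_SAVE_DISABLE();
    int taken = 0;

    if (sem) {      /* test and update with interrupts off */
        sem--;
        taken = 1;
    }
    IRQ_RESTORE(state);
    return taken;
}
```

Restoring the saved state, rather than unconditionally enabling, keeps the function safe to call from code that already has interrupts disabled.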
60
Predefined Symbols for Version
• __TI_COMPILER_VERSION__ and __TI_ASSEMBLER_VERSION__
• Returns int corresponding to compiler version
• Version number breaks down:
– Major number (1-2 digits)
– Minor version (3 digits)
– Patch version (3 digits)
– Example: v5.1.0 → 5 001 000 → 5001000
• Workaround compiler bugs
    #if defined(_TMS320C6X) && __TI_COMPILER_VERSION__ == 5001000
    /* workaround C6x compiler bug */
    #endif
• Target independent test for TI compiler
    #if defined(__TI_COMPILER_VERSION__)
    /* does not work with older compilers! */
    #endif
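The breakdown above is plain decimal arithmetic. A sketch that splits such a value into its three parts; `struct ti_version` and `ti_version_split` are hypothetical names, and long is used because the value does not fit in a 16-bit int.

```c
struct ti_version {
    long v_major;
    long v_minor;
    long v_patch;
};

/* Split a value laid out as M mmm ppp (for example 5001000). */
struct ti_version ti_version_split(long v)
{
    struct ti_version r;

    r.v_major = v / 1000000L;
    r.v_minor = (v / 1000L) % 1000L;
    r.v_patch = v % 1000L;
    return r;
}
```

In real code the argument would typically be __TI_COMPILER_VERSION__ itself, guarded by the defined() test shown above.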