Accelerating Your Success™ V10_1_2_0 Avnet Speedway Design Workshop Lecture 2: System Prototyping with the Avnet Spartan-3A DSP FPGA DaVinci Development Kit

Avnet Speedway Design Workshop - pub.ro




Avnet SpeedWay Workshops


Avnet SpeedWay Design Workshop™2

Develop Executable Spec in Simulink

Partition Between DSP and FPGA Co-Processor

Model-Based Design Flow

Design Exploration for Targeting Hardware

Verify Hardware in HW Co-simulation

Implement Stand-Alone Video System


The Problem We Wish to Solve

High-level behavioral models are great for expressing ideas and prototyping quickly. Models also provide an executable specification or reference design that can be used for verification.

However, as we move closer towards implementation of our model, we need to elaborate it with details about the target hardware architecture.


Agenda

• Overview of TI DSP devices and design flow

• Design exploration for targeting hardware

• Overview of Real-Time Workshop Embedded Coder C code generation

• Integrating with the TI DSP design flow



Tools Overview

Diagram: MathWorks tools generate and verify code for the TI and Xilinx tool chains targeting the Avnet kit.

• MathWorks: MATLAB® (Embedded MATLAB, toolboxes); Simulink® (Embedded MATLAB & C blocks, blocksets); Real-Time Workshop Embedded Coder, IDE Link CC, Target TC6
• TI: Code Composer Studio (C & ASM), with software co-simulation
• Xilinx: ISE (HDL), with hardware co-simulation
• Avnet: Spartan-3A DSP DaVinci Development Kit (DaVinci DM6437, Spartan®-3A DSP 3SD1800A)


Which TI device is best for me?

Chart: TI processor families grouped into application processing, low power processing, and video processing: OMAP3503, OMAP3515, OMAP3525, OMAP3530, OMAP-L1, C550x, C640x, C674x, DM335, DM355, “DM3xx Next”, DM643x (DM6437), DM644x, DM647, DM648, DM6467, and “DM64xx Next”.

13 different products and suites of products are shown, including many products for video, such as:

• OMAP35xx (applications processing): highest-performance ARM + graphics; first to market with Cortex-A8; up to 600 MHz ARM Cortex-A8 (~1200 ARM9 MIPS); up to 10 million polygons/second with the graphics accelerator
• DM355: low price for HD, $10 to $15 range depending on volume; MPEG4 HD video, JPEG; ARM9 up to 270 MHz
• DM644x: up to 720p video decode; up to 600 MHz C64x+ DSP + video accelerator performance; four 10-bit video DACs supporting composite, component, or S-Video


TI Video device capabilities

Chart: devices positioned along a 480 / 720 / 1080 resolution axis, with status shown as Production, Sampling, In Development, or Future Device.

• DM644x: H.264, MPEG2, MPEG4, VC1; OSD capable
• DM643x (lower cost, e.g. DM6437): H.264 enc or dec; MPEG2 dec; MPEG4 enc or dec; VC1 dec
• DM647/8 (multi-channel, multi-SD): H.264 BP, MPEG2, MPEG4; multi-video interface; VC1 dec
• DM6467 (HD): H.264 HP, MPEG-4, VC1, MPEG2; multi-SD enc & dec; 1080p 30fps dec, 720p enc or dec
• OMAP3530/OMAP3525 (65nm): MPEG4 720p enc or dec; H.264 MP VGA decode; H.264 BP/VC1/WMV9 D1 enc or dec
• DM355 (90nm): MPEG4 720p enc or dec
• “DM64xx Next” and “DM3xx Next”: future devices


Code Composer Studio


TI Code Composer Studio™ IDE

• Project Manager
• Editor
• Disassembly
• Memory, Registers
• RTA: Execution Graph
• Graphing: Eye Diagram, CPU Load Graph
• Message Log
• Statistics
• Watch Windows


DSP/BIOS Concept Slide

• DSP/BIOS is an RTOS that provides run-time services which developers use to build DSP applications and manage application resources.

• DSP/BIOS provides real-time, run-time kernel services that form the underlying architecture, or infrastructure, of real-time DSP applications.

• The DSP/BIOS kernel integrates tightly with the Code Composer Studio Integrated Development Environment (IDE) to provide the ability to:
  – Select and configure the foundation modules and kernel objects required by the application with the DSP/BIOS Configuration Tool
  – View DSP/BIOS kernel objects with the Code Composer Studio (CCStudio) plug-in utility
  – Support the real-time analysis features in the DSP/BIOS kernel with host-side tooling


DSP/BIOS OS & Tools

Diagram: the host development computer runs Code Composer Studio (configuration, build, debug, visualization) and connects through JTAG emulation and RTDX to the target TMS320 DSP hardware, which runs the instrumented DSP application executable image built from program sources, kernel APIs, and kernel modules.

• Configuration: graphical or script-based OS configuration; easily select only the modules required; static creation of kernel data structures
• Kernel: deterministic, multithreading kernel; preemptive scheduler; debug version builds in instrumentation; scalable to minimal footprint
• Real-time analysis: graphical analysis & debug tools; examine the state of OS objects; real-time capture of execution history, CPU load, & thread performance


Interrupt Handling

• DSP/BIOS includes an interrupt dispatcher that can handle all interrupts coming into the device
  – The dispatcher is highly optimized assembly code that performs operations such as context save/restore and disabling/enabling preemption

• Interrupt handlers can be written in C

• The dispatcher supports multiplexing 64+ device interrupt sources onto the CPU's interrupt inputs
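The dispatcher idea can be illustrated with a small portable C sketch: handlers written in C are registered per device event, and one common dispatch routine calls every handler whose event bit is pending. The 64-event bitmask, the table, and all function names are hypothetical; this is not the DSP/BIOS HWI API, and it omits the context save/restore that the real dispatcher performs in assembly.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_EVENTS 64 /* hypothetical: one pending bit per device interrupt event */

typedef void (*isr_fn)(void *arg);

/* Hypothetical dispatch table: one C handler and argument per event. */
static isr_fn handler_table[NUM_EVENTS];
static void *handler_args[NUM_EVENTS];

/* Register a C-level handler for a device event; returns 0 on success. */
int isr_register(unsigned event, isr_fn fn, void *arg)
{
    if (event >= NUM_EVENTS)
        return -1;
    handler_table[event] = fn;
    handler_args[event] = arg;
    return 0;
}

/* Call the handler of every pending event, lowest event number first.
   Returns the number of handlers invoked. */
int isr_dispatch(uint64_t pending)
{
    int called = 0;
    for (unsigned event = 0; event < NUM_EVENTS; ++event) {
        if (((pending >> event) & 1u) && handler_table[event] != NULL) {
            handler_table[event](handler_args[event]);
            ++called;
        }
    }
    return called;
}

/* Example C handler: counts how many times its event was dispatched. */
void count_event(void *arg)
{
    int *counter = arg;
    ++*counter;
}
```

Because the handlers are ordinary C functions, the only target-specific work left to the real dispatcher is saving state and managing preemption around the call.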


Real-time Analysis

• CPU Load
• Message Logs
• Thread Statistical Information
• Execution Graph (Software Logic Analyzer)


Screenshot: DSP/BIOS configuration database (CDB) showing kernel objects, with graphical configuration and textual configuration views.


Kernel Modules

Module | Description
HWI | Interface from hardware interrupts to the kernel via the dispatcher or macros
SWI | Preemptible thread that uses the program stack but cannot yield
TSK | Independent, preemptible thread of execution that has its own stack and can yield the processor
PRD | Time-triggered SWI
MSGQ | Variable-length transparent message passing
MBX | Mailboxes for synchronized fixed-size data exchange between tasks
LCK | Nestable semaphore with concept of ownership
SEM | Counting semaphore


Kernel Modules

Module | Description
QUE | Atomic linked lists
CLK | Interface to hardware timers
GIO | Extensible I/O with support for asynchronous I/O & synchronous read/write
SIO | Streaming I/O
MEM | Heap manager
BUF | Deterministic fixed-size buffer allocation
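Fixed-size buffer allocation (BUF) is deterministic where a general heap (MEM) may not be, because every allocation and release can be a constant-time pointer swap on a free list. The minimal C pool below sketches that idea; the block size, block count, and function names are hypothetical, not the DSP/BIOS BUF API, and the pool has no locking, so concurrent use from interrupts or multiple threads would need protection.

```c
#include <stddef.h>

#define BLOCK_SIZE 64 /* hypothetical block size in bytes */
#define NUM_BLOCKS 8  /* hypothetical number of blocks in the pool */

/* A free block stores a pointer to the next free block in its own memory. */
typedef union block {
    union block *next;
    unsigned char data[BLOCK_SIZE];
} block_t;

typedef struct {
    block_t storage[NUM_BLOCKS];
    block_t *free_list;
} buf_pool_t;

/* Thread every block onto the free list (storage[0] ends up at the head). */
void pool_init(buf_pool_t *pool)
{
    pool->free_list = NULL;
    for (int i = NUM_BLOCKS - 1; i >= 0; --i) {
        pool->storage[i].next = pool->free_list;
        pool->free_list = &pool->storage[i];
    }
}

/* O(1): pop the head of the free list, or return NULL if the pool is empty. */
void *pool_alloc(buf_pool_t *pool)
{
    block_t *blk = pool->free_list;
    if (blk != NULL)
        pool->free_list = blk->next;
    return blk;
}

/* O(1): push the block back onto the head of the free list. */
void pool_free(buf_pool_t *pool, void *p)
{
    block_t *blk = p;
    blk->next = pool->free_list;
    pool->free_list = blk;
}
```

The cost of either call does not depend on how many blocks are in use, which is the property a real-time scheduler needs.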


Real-time Analysis Modules

Module | Description
LOG | Low-overhead ‘printf’ or event logging to a buffer
STS | Statistics such as number of times called, average execution time, and maximum execution time
HST | Stream data to/from the desktop host computer system
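The low overhead of LOG-style event logging comes from writing small fixed-size records into a circular buffer on the target and leaving the string formatting to the host. A minimal C sketch of that idea follows; the record layout, buffer depth, and names are hypothetical and do not reproduce the DSP/BIOS LOG API.

```c
#include <stddef.h>
#include <stdint.h>

#define LOG_LEN 4 /* hypothetical buffer depth in records (small for illustration) */

/* One raw log record: the format string is stored, not expanded, on the target. */
typedef struct {
    uint32_t seq;    /* sequence number, lets the host detect wraparound */
    const char *fmt; /* pointer only; formatting is deferred to the host */
    int32_t arg0;
    int32_t arg1;
} log_rec_t;

typedef struct {
    log_rec_t rec[LOG_LEN];
    uint32_t next_seq; /* total records ever written */
} event_log_t;

/* Constant-time append; the oldest record is overwritten once the buffer wraps. */
void log_event(event_log_t *log, const char *fmt, int32_t a0, int32_t a1)
{
    log_rec_t *r = &log->rec[log->next_seq % LOG_LEN];
    r->seq = log->next_seq;
    r->fmt = fmt;
    r->arg0 = a0;
    r->arg1 = a1;
    log->next_seq++;
}

/* Return the i-th oldest record still held (0 = oldest), or NULL if out of range. */
const log_rec_t *log_get(const event_log_t *log, uint32_t i)
{
    uint32_t held = log->next_seq < LOG_LEN ? log->next_seq : LOG_LEN;
    if (i >= held)
        return NULL;
    uint32_t first = log->next_seq - held;
    return &log->rec[(first + i) % LOG_LEN];
}
```

Writing a record costs a few stores regardless of the format string, which is why this style of logging can stay enabled in real-time code.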


Agenda

• Overview of TI DSP devices and design flow

• Design exploration for targeting hardware

• Overview of Real-Time Workshop Embedded Coder C code generation

• Integrating with the TI DSP design flow


Design Exploration for targeting hardware

• Convert from floating-point to fixed-point data types
• Model the dataflow for your hardware:
  – Patch / ROI processing for DSP
  – Line buffers for FPGAs
  – Streaming pixel processing for FPGAs
• Model data organization (row major vs column major)
• Partition the algorithm between DSP / FPGA
• Use blocks that can create the code you want:
  – Video and Image Processing Blockset -> C code
  – TI IMGLIB -> ASM code
  – Xilinx System Generator for DSP -> HDL code
  – Custom C / HDL code
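Patch / ROI processing on a DSP usually means copying a small rectangular block out of a large frame into a contiguous buffer in fast on-chip memory (often with DMA) before working on it. A minimal C sketch of that copy follows, assuming an 8-bit grayscale frame stored row by row; the function name and memory layout are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a patch_w x patch_h region whose top-left corner is (x0, y0) out of a
   row-major frame (frame_w pixels per row) into a contiguous patch buffer.
   Each patch row is one contiguous memcpy, the same access pattern a 2-D DMA
   transfer would use. Returns 0 on success, -1 if the patch is out of bounds. */
int extract_patch(const uint8_t *frame, size_t frame_w, size_t frame_h,
                  size_t x0, size_t y0, size_t patch_w, size_t patch_h,
                  uint8_t *patch)
{
    if (x0 + patch_w > frame_w || y0 + patch_h > frame_h)
        return -1;
    for (size_t row = 0; row < patch_h; ++row) {
        const uint8_t *src = frame + (y0 + row) * frame_w + x0;
        memcpy(patch + row * patch_w, src, patch_w);
    }
    return 0;
}
```

Once the patch is contiguous, the inner processing loop touches only fast memory, which is the "efficient data movement" the DSP dataflow aims for.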


Fixed-point design challenges

• Finite word lengths introduce quantization error
  – Overflow (overload distortion): data beyond the range of the fixed-point data type
  – Underflow (granular noise): not enough fractional bits for an exact data match
• Properly scale input, output, and intermediate quantities
• Minimize error propagation of signals and parameters

Bit weights (s = sign bit): s … 32 16 8 4 2 1 1/2 1/4 1/8 1/16 1/32 …

• 7+1 = 8-bit word length and 5 fractional bits: Range = [-4, 3.96875], Step = 1/32
• 7+1 = 8-bit word length and 1 fractional bit: Range = [-64, 63.5], Step = 1/2

FPGAs are inherently fixed-point machines. Floating-point capable cores exist, but they add overhead. We instead choose “budget math” and must contend with the challenges above to arrive at accurate results.
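Both error sources can be reproduced in a few lines of C. The sketch below quantizes a real value to a signed 8-bit word with a chosen number of fractional bits, using round-to-nearest and saturation on overflow; those rounding and overflow modes are assumptions for the example, since fixed-point tools also offer others (for example truncation and wrapping), and the function names are hypothetical.

```c
#include <stdint.h>

/* 2^e computed exactly for small |e| (avoids linking the math library). */
static double pow2(int e)
{
    double r = 1.0;
    for (; e > 0; --e) r *= 2.0;
    for (; e < 0; ++e) r /= 2.0;
    return r;
}

/* Quantize x to a signed 8-bit word with frac_bits fractional bits.
   Rounds to nearest (ties away from zero) and saturates on overflow;
   *overflowed is set to 1 if saturation occurred, else 0. */
int8_t quantize_s8(double x, int frac_bits, int *overflowed)
{
    double scaled = x * pow2(frac_bits);
    *overflowed = 0;
    if (scaled >= INT8_MAX + 0.5) { *overflowed = 1; return INT8_MAX; }
    if (scaled <= INT8_MIN - 0.5) { *overflowed = 1; return INT8_MIN; }
    /* The cast truncates toward zero, so add +/-0.5 first to round. */
    return (int8_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

/* Real value represented by the stored integer q. */
double dequantize_s8(int8_t q, int frac_bits)
{
    return q / pow2(frac_bits);
}
```

With 5 fractional bits the largest representable value is 127/32 = 3.96875, so an input of 5.0 saturates (overflow), while 0.01 is only 0.32 of a step and rounds to 0 (underflow). With 1 fractional bit, 5.0 is exact, but 0.3 is coarsened to 0.5.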


Fixed-point design solutions

• Data type propagation
• Port data type visualization

Example: ufix16_E2 is an unsigned fixed-point number with a 16-bit word and a positive scaling exponent of 2 (scale factor 2^2 = 4).
Range: [0, 262140]   Precision: 4

Binary point scaling representation: u | fix | 16 | E | 2 = Signed | Integer | Word Length | Direction | Scale
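Binary-point scaling means the real value equals the stored integer times 2^E. For ufix16_E2 the stored value is an unsigned 16-bit integer and the scale factor is 2^2 = 4, which is where the range [0, 262140] (65535 × 4) and the precision of 4 come from. The C sketch below decodes and encodes such values; the helper names are hypothetical, and encoding is assumed to truncate and saturate.

```c
#include <stdint.h>

/* Binary-point scaling: real value = stored integer * 2^exponent.
   ufix16_E2 -> unsigned 16-bit word, exponent +2 (scale factor 4). */
#define UFIX16_E2_EXP 2

/* Decode a stored ufix16_E2 integer into its real-world value. */
uint32_t ufix16_e2_to_real(uint16_t stored)
{
    return (uint32_t)stored << UFIX16_E2_EXP;
}

/* Encode a real value, truncating to a multiple of the precision (4)
   and saturating to the representable range [0, 262140]. */
uint16_t ufix16_e2_from_real(uint32_t real)
{
    uint32_t stored = real >> UFIX16_E2_EXP;
    return stored > UINT16_MAX ? UINT16_MAX : (uint16_t)stored;
}
```

Because the exponent is positive, every representable value is an integer multiple of 4: the type trades precision for a range four times larger than a plain uint16.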


Demo: Fixed-Point Workflow

• Modeling fixed-point data types for bit-true simulation
• Autoscaling to determine the optimum fractional settings (scaling) for DSP word lengths


Fixed-point design solutions

Set fixed-point data types for signals and blocks

Full manual control


Fixed-point design solutions

Log Min, Max, and Overflow

Override data types with double precision

View fixed-point log and scaling recommendations
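Once minimum and maximum values have been logged, a scaling recommendation for a signed word can be derived by picking the largest number of fractional bits that still covers the logged range. The C sketch below shows that range-based rule; it is a simplified stand-in, not the Fixed-Point Tool's actual algorithm, which exposes further options, and the function name and search window are hypothetical.

```c
#include <limits.h>

/* 2^e computed exactly for small |e|. */
static double pow2(int e)
{
    double r = 1.0;
    for (; e > 0; --e) r *= 2.0;
    for (; e < 0; ++e) r /= 2.0;
    return r;
}

/* Largest fraction length F such that a signed word of word_len bits with F
   fractional bits, whose range is [-2^(word_len-1), 2^(word_len-1) - 1] * 2^-F,
   covers the logged range [min_val, max_val]. Searches F in [-32, 32] and
   returns INT_MIN if no F in that window fits. */
int best_fraction_length(double min_val, double max_val, int word_len)
{
    double half = pow2(word_len - 1); /* 2^(word_len-1) */
    for (int f = 32; f >= -32; --f) {
        double lsb = pow2(-f);
        if (min_val >= -half * lsb && max_val <= (half - 1.0) * lsb)
            return f;
    }
    return INT_MIN;
}
```

For an 8-bit word, a logged range of [-3.5, 3.9] yields 5 fractional bits and [-60, 63.2] yields 1, matching the two example formats on the fixed-point challenges slide. A negative result means the scale factor is larger than 1.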


Model the dataflow for your hardware

• Serial stream processing vs frame processing
• Patch processing for efficient data movement
• Parallel architecture
• Pipelining
• Row-major vs column-major data organization
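Row-major vs column-major organization matters when data crosses tool boundaries: MATLAB and Simulink store matrices column by column, while C image code and raster video streams typically walk pixels row by row. A minimal C sketch of the two index formulas and a conversion between them follows; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Linear index of pixel (row r, column c) in an H x W image, row-major:
   consecutive elements walk along a row. */
static size_t idx_row_major(size_t r, size_t c, size_t h, size_t w)
{
    (void)h;
    return r * w + c;
}

/* Same pixel in column-major order (MATLAB/Simulink): walk down a column. */
static size_t idx_col_major(size_t r, size_t c, size_t h, size_t w)
{
    (void)w;
    return c * h + r;
}

/* Convert an H x W image from column-major to row-major order. */
void col_to_row_major(const uint8_t *src, uint8_t *dst, size_t h, size_t w)
{
    for (size_t r = 0; r < h; ++r)
        for (size_t c = 0; c < w; ++c)
            dst[idx_row_major(r, c, h, w)] = src[idx_col_major(r, c, h, w)];
}
```

Modeling the hardware's order directly in the model avoids paying for a conversion like this on every frame.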



General Advice for Design Exploration

Targeting Embedded Processors

Typical embedded processors support two data types:

• Base data type: an integer of the processor's specified bit size
• Accumulator data type: an integer twice the size of the base data type supported by the processor

The base data type is supported for basic simulation operations such as addition, subtraction, multiplication, delay, and shift. The accumulator data type is supported only for operations such as addition, subtraction, and delay, not multiplication.
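A single multiply shows why the accumulator is twice the width of the base type: two 16-bit Q15 operands produce a product that needs up to 32 bits. The C sketch below assumes a 16-bit base type and a 32-bit accumulator (common on fixed-point DSPs, but an assumption here) and renormalizes the product back to Q15; the function name is hypothetical.

```c
#include <stdint.h>

typedef int16_t base_t; /* assumed 16-bit base data type (Q15 values) */
typedef int32_t acc_t;  /* assumed 32-bit accumulator data type */

/* Q15 x Q15 multiply: both operands are base type, the full product is Q30
   and needs the accumulator width; shifting right by 15 returns it to Q15.
   The only product that does not fit back in Q15 is (-1) * (-1), so it is
   saturated. */
base_t q15_mul(base_t a, base_t b)
{
    acc_t prod = (acc_t)a * (acc_t)b;  /* exact: |prod| <= 2^30 */
    if (prod == (acc_t)1 << 30)        /* -32768 * -32768 */
        return INT16_MAX;
    return (base_t)(prod >> 15);       /* assumes arithmetic right shift */
}
```

The multiply operands stay in the base type, as the slide requires; only the intermediate product uses the wider type.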


General Advice for Design Exploration

Fixed-Point Rules for Targeting Embedded Processors

The following guidelines apply to data type selection when targeting a fixed-point processor that supports a base data type and an accumulator data type:

• Multiplications must use the base data type.
• Delays should use the base data type. Using the accumulator data type for delays is costly because delay states are stored in memory from one time step to the next. Besides, delays usually feed gains for multiplication.
• Temporary variables can use the accumulator data type. They are stored temporarily in shared and reused memory such as RAM or CPU registers.
• Summations can use the accumulator data type. This reduces the buildup of round-off errors and prevents overflows.
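The four rules can be seen together in a small FIR filter sketch in C, again assuming a 16-bit base type and a 32-bit accumulator: coefficients, inputs, and the delay line stay in the base type, each multiply takes base-type operands, and the running sum lives in an accumulator-type temporary. The filter length, Q15 format, and names are illustrative assumptions.

```c
#include <stdint.h>

#define NTAPS 4

typedef int16_t base_t; /* assumed 16-bit base type */
typedef int32_t acc_t;  /* assumed 32-bit accumulator type */

typedef struct {
    base_t coeff[NTAPS]; /* Q15 gains: multiplication operands use the base type */
    base_t delay[NTAPS]; /* delay line persists between calls: base type */
} fir_t;

/* Process one Q15 sample and return one Q15 output sample.
   Assumes the absolute coefficients sum to at most 32768 (1.0), which keeps
   |acc| <= 2^30 and rules out accumulator overflow without guard bits. */
base_t fir_step(fir_t *f, base_t x)
{
    /* Shift the delay line and insert the newest sample. */
    for (int k = NTAPS - 1; k > 0; --k)
        f->delay[k] = f->delay[k - 1];
    f->delay[0] = x;

    /* Sum of products in the accumulator type (a temporary in registers). */
    acc_t acc = 0;
    for (int k = 0; k < NTAPS; ++k)
        acc += (acc_t)f->coeff[k] * f->delay[k];

    /* Round, convert Q30 back to Q15, and saturate to the base type. */
    acc = (acc + (1 << 14)) >> 15;     /* assumes arithmetic right shift */
    if (acc > INT16_MAX) acc = INT16_MAX;
    if (acc < INT16_MIN) acc = INT16_MIN;
    return (base_t)acc;
}
```

With four taps of 0.25 (8192 in Q15) this is a moving average: feeding a constant 0.5 (16384) produces 4096, 8192, 12288, and then 16384 once the delay line is full.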


Agenda

• Overview of TI DSP devices and design flow

• Design exploration for targeting hardware

• Overview of Real-Time Workshop Embedded Coder C code generation

• Integrating with the TI DSP design flow


Model-Based Design supports both Software and Hardware systems

• Coders
  – Code generation from models
  – Language options
  – Code interfacing, optimization
• Links
  – Verification tool integration
  – Project generation, build, download
  – Co-simulation, SIL/PIL/HIL
• Targets
  – Processor & memory specific optimization
  – Device drivers, board support
  – Schedulers, RTOS integration

Diagram: MATLAB® and Simulink® algorithm and system design feeds code generation (Real-Time Workshop Embedded Coder, IDE Link CC, Target TC6); C / ASM targets MCUs and DSPs, VHDL / Verilog targets FPGAs, and each path is generated and then verified.


Real-Time Workshop® Embedded Coder

• Automatically generates C code from Simulink® models
• Code is ANSI/ISO C compliant, so it can run on any microprocessor or real-time operating system (e.g., DSP/BIOS)
• Concisely partitions multi-rate code for efficient scheduling with or without an RTOS
• Provides commenting capabilities to trace code to models and requirements
• Verifies code by importing it into Simulink for software-in-the-loop testing
• Generates optimized code:
  – Automatic replacement of math functions and operators with target-specific implementations (Target Function Library, TFL)
  – Elimination of unnecessary initialization, termination, logging, and error-handling code
  – Combined output/update functions to reduce code size
  – Removal of floating-point code from integer-only applications
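To make combined output/update functions and multi-rate partitioning concrete, here is a hand-written C sketch in the general shape of Embedded Coder output for a tiny hypothetical two-rate model: a fast-rate gain plus unit delay, and a slow-rate accumulator that runs on every fourth base-rate step. The model, its signals, the rate divider, and all names are assumptions; this is not actual generated code.

```c
#include <stdint.h>

#define SLOW_RATE_DIVIDER 4 /* assumed: slow rate = base rate / 4 */

/* Hypothetical model state: one unit delay (fast rate) and one slow sum. */
typedef struct {
    int32_t unit_delay; /* z^-1 state */
    int32_t slow_sum;   /* state of the slow-rate subsystem */
    uint32_t tick;      /* base-rate step counter used for rate scheduling */
} demo_state_t;

void demo_initialize(demo_state_t *s)
{
    s->unit_delay = 0;
    s->slow_sum = 0;
    s->tick = 0;
}

/* One base-rate step. Output and update are combined in a single function:
   y is computed from the old delay state, then the state is updated in place. */
int32_t demo_step(demo_state_t *s, int32_t u)
{
    int32_t y = 3 * u + s->unit_delay; /* output: gain 3 plus delayed input */
    s->unit_delay = u;                 /* update: delay takes the new input */

    /* The slow-rate partition runs only on every SLOW_RATE_DIVIDER-th step. */
    if (s->tick % SLOW_RATE_DIVIDER == 0)
        s->slow_sum += y;
    s->tick++;
    return y;
}
```

Folding the update into the step function saves a second function call and a second pass over the state each sample, which is the code-size benefit the slide refers to.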


Demo: Generating C-Code


Real-Time Workshop Embedded Coder

Real-Time Workshop Embedded Coder uses target files to translate models into code that runs in a particular environment.

• You can customize ready-to-run targets or use them as-is, including targets:
  – Optimized for floating-point code
  – Optimized for fixed-point code
  – Embedded Target products

You can generate code for any processor by specifying integer word sizes and other required target characteristics, by choosing from a list of targets with predefined settings, or by creating your own custom target.


Real-Time Workshop Embedded Coder

Automatic replacement of math functions and operators with target-specific implementations (TFL)
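For example, a model that adds two 32-bit signals with saturation produces portable C along the lines of the function below. A TFL replacement entry can map an operation like this to a faster target-specific implementation; on TI C6000 DSPs, the compiler intrinsic `_sadd` performs a 32-bit saturating add. The function name is hypothetical, and the mapping itself is only described here, not shown.

```c
#include <stdint.h>

/* Portable 32-bit saturating addition: the kind of operator a target
   function library can replace with a single-instruction intrinsic. */
int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b; /* exact in 64 bits */
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}
```

The portable version needs a widening add and two comparisons; the replacement collapses that into one instruction without changing the model.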


Agenda

• Design exploration for targeting hardware

• Overview of TI DSP architecture and Design Flow

• Overview of Real-Time Workshop Embedded Coder C-Code Generation

• Integrating with the TI DSP Design Flow
  – Embedded IDE Link CC
  – Target Support Package TC6


Connecting to TI Processors

Diagram: MATLAB & Simulink with Real-Time Workshop Embedded Coder generates C & ASM code; Texas Instruments Code Composer Studio compiles, links, downloads, and debugs it; IDE Link CC and Target TC6 connect the two environments to generate and verify.

• Embedded IDE Link CC
• Target Support Package TC6


Embedded IDE Link Demo

1. Automation Interface: Use MATLAB scripts or Simulink models to automate verification and debugging tasks in CCS
2. DSP Implementation: Automatically generate CCS projects using Real-Time Workshop Embedded Coder generated code
3. PIL Verification: Processor-in-the-loop simulation to verify Simulink subsystems executing on the target DSP, with timing controlled by the Simulink model
4. Code Profiling: Real-time code profiling (stack and run time) with a graphical view and a detailed HTML report


DSP Implementation – Project Creation

Screenshot callouts: source files; generated code successfully built within CCS; processor-specific interrupt handler and timer code.

Project generation is the ability to create complete C projects from your Simulink models that can be built in CCS and executed on the DSP. Real-Time Workshop EC generated code can contain processor intrinsics; however, apart from the algorithm code, a typical DSP project also contains other processor-specific files such as the scheduler or real-time operating system, memory map files, and peripheral drivers. Link for CCS, in conjunction with RTW-EC, generates some of these processor-specific, non-algorithmic files that are key to executing the code on the DSP.


Files Added in Project Creation

The following is a summary of all the files generated during project generation.

Created by Real-Time Workshop Embedded Coder:
• C and header files representing the algorithms and systems modeled in Simulink

Created by Link for Code Composer Studio:
• Memory map (CMD files)
• Real-time scheduler and timer code
• Interrupt handling code (with Hardware Interrupt blocks)
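Real-time scheduler and timer code commonly follows a bare-board pattern: a periodic timer interrupt releases one base-rate step, and the scheduler records an overrun if the previous step has not finished when the next tick arrives. The C sketch below simulates that pattern on a host for a single-core target; the flag-based design and all names are assumptions, not the files Link for CCS actually generates.

```c
#include <stdbool.h>

/* Hypothetical scheduler state shared by the timer ISR and the model step. */
typedef struct {
    volatile bool step_running; /* set while a model step is executing */
    volatile unsigned overruns; /* ticks that arrived during a step */
    unsigned steps_done;
} sched_t;

/* Called from the periodic timer interrupt. Returns true if a new step may
   start now, false if the previous step is still running (overrun). */
bool sched_tick(sched_t *s)
{
    if (s->step_running) {
        s->overruns++;
        return false;
    }
    s->step_running = true;
    return true;
}

/* Called when the model step released by the last tick has finished. */
void sched_step_done(sched_t *s)
{
    s->steps_done++;
    s->step_running = false;
}
```

A nonzero overrun count is a direct sign that the step does not fit in the timer period, which is also what the code profiler helps diagnose.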


P-I-L Verification

Diagram: a Simulink test bench (test signals, verifications) drives the algorithm through Link for CCS to a CCS test bench, where a PIL interface wraps the algorithm code running on the target.

The next important functionality of this product combines the project generation and automated verification features.

Consider a Simulink model that has been tested in simulation using test signals, with its output visualized.

The next step is to implement and verify just the algorithm subsystem on the target processor. RTW-EC can generate the algorithm code, and with the Link one can also create the necessary test bench interface, so that the original Simulink model acts as a test harness for the algorithm code.

We already saw this in our first demo, which showed the Lane Detection subsystem running on the DM642 DSP. Let's look at another example, an ANC using an LMS filter; this time we'll go through the steps to see how straightforward the process is.

Q: If the code is compiled and tested in CCS once with optimization set to “Register (-o0)” and again with optimization set to “File (-o3)”, are both resulting output vectors guaranteed to be the same?
A: No. Optimization makes tradeoffs that may affect the results.

Q: If the code is compiled and tested using Microsoft Visual C++ and then using VisualDSP++, are the output vectors guaranteed to be the same?
A: No. In addition to differing optimization settings and schemes, differing accumulator sizes and overflow bits may produce different results for edge cases.

Q: Is a cycle-accurate simulator always cycle accurate?
A: No. Most of the time they are, but many customers call them “cycle approximate” simulators.

Q: Do you need to verify code developed and tested on a PC against the same code executing on the target embedded processor?
A: Yes!
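In a PIL run the comparison itself is simple: the output vector captured from the target is checked against the Simulink reference vector. A minimal C sketch of such a check follows, reporting the first sample whose error exceeds a tolerance and the maximum absolute error; the tolerance and names are assumptions, and a bit-true fixed-point design would normally be compared with a tolerance of zero.

```c
#include <stddef.h>

typedef struct {
    double max_abs_err;  /* largest |target - reference| seen */
    long first_mismatch; /* index of first sample over tolerance, or -1 */
} pil_report_t;

/* Compare the target output against the reference, sample by sample. */
pil_report_t pil_compare(const double *reference, const double *target,
                         size_t n, double tol)
{
    pil_report_t rep = {0.0, -1};
    for (size_t i = 0; i < n; ++i) {
        double err = target[i] - reference[i];
        if (err < 0.0) err = -err;
        if (err > rep.max_abs_err) rep.max_abs_err = err;
        if (err > tol && rep.first_mismatch < 0) rep.first_mismatch = (long)i;
    }
    return rep;
}
```

Reporting the first mismatching index, not just pass/fail, points directly at the sample where a compiler or accumulator difference first shows up.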


Profiling real-time code execution on DSP

Code Profiling

• Uses DSP/BIOS statistics to measure execution time

• Profile Stack Usage

The code profiler in Target Support Package TC6 uses DSP/BIOS statistics objects to measure the execution time of code segments generated by individual subsystems. A code profile report helps you identify segments of generated code that are candidates for off-loading to an FPGA co-processor.

In-depth technical information on code profiling is available at:

http://www.mathworks.com/access/helpdesk/help/toolbox/tic6000/index.html?/access/helpdesk/help/toolbox/tic6000/f8-7016.html
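The statistics a profile report summarizes (number of calls, total, average, and maximum execution time) can be accumulated with very little code. The C sketch below keeps STS-style statistics for one code segment from cycle counts read around each call; the structure and names are hypothetical, and on a real target the counts would come from a hardware timer or cycle counter rather than being passed in by hand.

```c
#include <stdint.h>

/* Hypothetical per-segment statistics, in the spirit of a DSP/BIOS STS object. */
typedef struct {
    uint32_t count; /* number of measurements */
    uint64_t total; /* sum of measured cycles */
    uint32_t max;   /* worst-case cycles */
} seg_stats_t;

/* Record one measurement: elapsed cycles between segment entry and exit. */
void stats_add(seg_stats_t *s, uint32_t start_cycles, uint32_t end_cycles)
{
    /* Unsigned subtraction stays correct across one counter wraparound. */
    uint32_t elapsed = end_cycles - start_cycles;
    s->count++;
    s->total += elapsed;
    if (elapsed > s->max)
        s->max = elapsed;
}

/* Average cycles per call (integer division), or 0 if nothing was recorded. */
uint32_t stats_average(const seg_stats_t *s)
{
    return s->count ? (uint32_t)(s->total / s->count) : 0;
}
```

Keeping the maximum alongside the average matters for real-time work: a segment with a low average but a high worst case can still cause overruns, and either figure can flag it as a candidate for off-loading to the FPGA.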


Target Support Package TC6

• Provides board support peripheral libraries
• Added support for optimized C intrinsics (C-callable ASM libraries)


Lab #2