Embedded DSP Processor Design Application Specific Instruction Set Processors


“Liu: fm-p374123” — 2008/5/6 — 12:00 — page i — #1

Embedded DSP Processor Design

Application Specific Instruction Set Processors

The Morgan Kaufmann Series in Systems on Silicon
Series Editor: Wayne Wolf, Georgia Institute of Technology

The Designer’s Guide to VHDL, Second Edition
Peter J. Ashenden

The System Designer’s Guide to VHDL-AMS
Peter J. Ashenden, Gregory D. Peterson, and Darrell A. Teegarden

Modeling Embedded Systems and SoCs
Axel Jantsch

ASIC and FPGA Verification: A Guide to Component Modeling
Richard Munden

Multiprocessor Systems-on-Chips
Edited by Ahmed Amine Jerraya and Wayne Wolf

Functional Verification
Bruce Wile, John Goss, and Wolfgang Roesner

Customizable and Configurable Embedded Processors
Edited by Paolo Ienne and Rainer Leupers

Networks-on-Chips: Technology and Tools
Edited by Giovanni De Micheli and Luca Benini

VLSI Test Principles & Architectures
Edited by Laung-Terng Wang, Cheng-Wen Wu, and Xiaoqing Wen

Designing SoCs with Configured Processors
Steve Leibson

ESL Design and Verification
Grant Martin, Andrew Piziali, and Brian Bailey

Aspect-Oriented Programming with e
David Robinson

Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation
Edited by Scott Hauck and André DeHon

System-on-Chip Test Architectures
Edited by Laung-Terng Wang, Charles Stroud, and Nur Touba

Verification Techniques for System-Level Design
Masahiro Fujita, Indradeep Ghosh, and Mukul Prasad

VHDL-2008: Just the New Stuff
Peter J. Ashenden and Jim Lewis

On-Chip Communication Architectures: System on Chip Interconnect
Sudeep Pasricha and Nikil Dutt

Embedded DSP Processor Design: Application Specific Instruction Set Processors
Dake Liu

Processor Description Languages: Applications and Methodologies
Edited by Prabhat Mishra and Nikil Dutt

Embedded DSP Processor Design

Application Specific Instruction Set Processors

Dake Liu

AMSTERDAM • BOSTON • HEIDELBERG • LONDON

NEW YORK • OXFORD • PARIS • SAN DIEGO

SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO

Morgan Kaufmann Publishers is an imprint of Elsevier

Morgan Kaufmann Publishers is an imprint of Elsevier.
30 Corporate Drive, Suite 400, Burlington, MA 01803, USA

This book is printed on acid-free paper.
Copyright © 2008 by Elsevier Inc. All rights reserved.

Designations used by companies to distinguish their products are often claimed as trademarks or registered trademarks. In all instances in which Morgan Kaufmann Publishers is aware of a claim, the product names appear in initial capital or all capital letters. All trademarks that appear or are otherwise referred to in this work belong to their respective owners. Neither Morgan Kaufmann Publishers nor the authors and other contributors of this work have any relationship or affiliation with such trademark owners nor do such trademark owners confirm, endorse or approve the contents of this work. Readers, however, should contact the appropriate companies for more information regarding trademarks and any related registrations.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means—electronic, mechanical, photocopying, scanning, or otherwise—without prior written permission of the publisher.

Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford, UK: phone: (+44) 1865 843830, fax: (+44) 1865 853333, E-mail: [email protected]. You may also complete your request online via the Elsevier homepage (http://elsevier.com), by selecting “Support & Contact” then “Copyright and Permission” and then “Obtaining Permissions.”

Library of Congress Cataloging-in-Publication Data

Liu, Dake, 1957-
Embedded DSP processor design: application specific instruction set processors / Dake Liu.
p. cm. – (The Morgan Kaufmann series in systems on silicon)
Includes index.
ISBN 978-0-12-374123-3
1. Embedded computer systems. 2. Signal processing–Digital techniques. 3. Digital integrated circuits. 4. Application-specific integrated circuits. I. Title.
TK7895.E42D35 2008
621.39’16–dc22
2008012910

ISBN: 978-0-12-374123-3

For information on all Morgan Kaufmann publications, visit our website at www.mkp.com or www.books.elsevier.com

Printed in the United States of America
08 09 10 11 12 5 4 3 2 1

To Meiying and Angie


Contents

Preface

List of Trademarks and Product Names

CHAPTER 1 Introduction
1.1 How to Read the Book
1.2 DSP Theory for Hardware Designers
  1.2.1 Review of DSP Theory and Fundamentals
  1.2.2 ADC and Finite-Length Modeling
  1.2.3 Digital Filters
  1.2.4 Transform
  1.2.5 Adaptive Filter and Signal Enhancement
  1.2.6 Random Process and Autocorrelation
1.3 Theory, Applications, and Implementations
1.4 DSP Applications
  1.4.1 Real-Time Concept
  1.4.2 Communication Systems
  1.4.3 Multimedia Signal Processing Systems
  1.4.4 Review on Applications
1.5 DSP Implementations
  1.5.1 DSP Implementation on GPP
  1.5.2 DSP Implementation on GP DSP Processors
  1.5.3 DSP Implementation on ASIP
  1.5.4 DSP Implementation on ASIC
  1.5.5 Trade-off and Decision of Implementations
1.6 Review of Processors and Systems
  1.6.1 DSP Processor Architecture
  1.6.2 DSP Firmware
  1.6.3 Embedded System Overview
  1.6.4 DSP in an Embedded System
  1.6.5 Fundamentals of Embedded Computing
1.7 Design Flow
  1.7.1 Hardware Design Flow in General
  1.7.2 ASIP Hardware Design Flow
  1.7.3 ASIP Design Automation
1.8 Conclusions
Exercises
References


CHAPTER 2 Numerical Representation and Finite-Length DSP
2.1 Fixed-Point Numerical Representation
  2.1.1 An Intuitive Example
  2.1.2 Fixed-Point Numerical Representation
  2.1.3 Fixed-Point Binary Representation
  2.1.4 Integer Binary Representation
  2.1.5 Fractional Binary Representation
  2.1.6 Fixed-Point Operands
  2.1.7 Integer or Fractional
  2.1.8 Other Binary Data Formats
2.2 Data Quality Measure
  2.2.1 Noise, Distortion, Dynamic Range, and Precision
  2.2.2 Quantitative Concept of Dynamic Range and Precision
2.3 Floating-Point Numerical Representation
2.4 Block Floating-Point
2.5 DSP Based on Finite Precision
  2.5.1 The Way of Quantization—Rounding and Truncation
  2.5.2 Overflow Saturation and Guards
  2.5.3 Requirements on Guards
  2.5.4 Execution Order
2.6 Examples of Corner Cases
2.7 Conclusions
Exercises
References

CHAPTER 3 DSP Architectures
3.1 DSP Subsystem Architecture
3.2 Processor Architecture
  3.2.1 Inside a DSP Subsystem
  3.2.2 DSP (Memory Bus) Architecture
  3.2.3 Functional Description at Top Architecture Level
  3.2.4 DSP Architecture Design
3.3 Inside a DSP Core
  3.3.1 The Datapath and Register Bus
  3.3.2 MAC
  3.3.3 ALU
  3.3.4 Register File
  3.3.5 Control Path
  3.3.6 Address Generator (AGU)
3.4 The Difference between GPP and ASIP DSP
  3.4.1 The Difference between Designing a GPP and ASIP DSP
  3.4.2 Comparing DSP Processors to Other Processors
  3.4.3 CISC or RISC
3.5 Advanced DSP Architecture
  3.5.1 DSP with Extreme Specification
  3.5.2 ILP DSP Processors
  3.5.3 Dual MAC and SIMD
  3.5.4 VLIW and Superscalar
  3.5.5 On-Chip Multicore DSP
3.6 Conclusions
Exercises
References

CHAPTER 4 DSP ASIP Design Flow
4.1 Design and Use of ASIP
  4.1.1 What Is ASIP?
  4.1.2 DSP ASIP Design Flow
4.2 Understanding Applications Through Profiling
4.3 Architecture Selection
  4.3.1 General Methodology
  4.3.2 Architectures
  4.3.3 Quantitative Approach
4.4 Designing Instruction Sets
4.5 Designing the Toolchain
4.6 Microarchitecture Design
4.7 Firmware Design
  4.7.1 Real-time Firmware
  4.7.2 Firmware with Finite Precision
  4.7.3 Firmware Design Flow for One Application
  4.7.4 Firmware Design Flow for Multiapplications
4.8 Conclusions
Exercises
References

CHAPTER 5 A Simple DSP Core—The Junior Processor
5.1 Junior—A Simple DSP Processor
5.2 Instruction Set and Operations
  5.2.1 Load/Store Instructions
  5.2.2 Addressing for Data Memory Access
  5.2.3 Instructions for Basic Arithmetic Operations
  5.2.4 Logic and Shift Operations
  5.2.5 Program Flow Control Instructions
5.3 Assembly Coding
5.4 Assembly Benchmarking
  5.4.1 Benchmarking of Block Transfer
  5.4.2 Benchmarking of Single-Sample FIR
  5.4.3 Benchmarking of Frame FIR
  5.4.4 Benchmarking of Single-Sample Biquad IIR
  5.4.5 Benchmarking of 16-bit Division
  5.4.6 Benchmarking of Vector Maximum Tracking
  5.4.7 Benchmarking of 8 × 8 DCT
  5.4.8 Benchmarking of 256-point FFT
  5.4.9 Benchmarking of Windowing
5.5 Discussion of Junior DSP
5.6 Conclusions
Exercises
References

CHAPTER 6 Code Profiling for ASIP Design
6.1 Source Code Profiling
  6.1.1 What Is Source Code Profiling?
  6.1.2 Why Profiling?
  6.1.3 What to Profile
  6.1.4 How to Profile
  6.1.5 The Language to Profile
6.2 Static Profiling
  6.2.1 Dynamic and Static Profiling
  6.2.2 Static Profiling
  6.2.3 Fine-grained Static Profiling
  6.2.4 Coarse-grained Static Profiling
6.3 Dynamic Profiling
  6.3.1 Instrumentation for Coarse-grained Profiling
  6.3.2 Instrumentation for Fine-grained Profiling
  6.3.3 Implement Instrumentation
6.4 Use of Reference Assembly Codes
  6.4.1 Expose Hidden Costs
  6.4.2 Understanding Assembly Codes
6.5 Quality Evaluation of Results
  6.5.1 Evaluating Results of Source Code Profiling
  6.5.2 Using Profiling Results
6.6 Conclusions
Exercises
References

CHAPTER 7 Assembly Instruction Set Design
7.1 Methodology
  7.1.1 Opportunities and Constraints
  7.1.2 Classification of General Instructions
  7.1.3 Design of General RISC Subset Instructions
  7.1.4 Specify CISC Instructions
  7.1.5 For Undergraduates: From Junior to Senior
7.2 Designing RISC Subset Instructions
  7.2.1 Data Access Instructions
  7.2.2 Basic Arithmetic Instructions
  7.2.3 Unsigned ALU Instructions
  7.2.4 Program Flow Control Instructions
7.3 CISC Subset Instructions
  7.3.1 MAC and Multiplication Instructions
  7.3.2 Double-Precision Arithmetic Instructions
  7.3.3 Other CISC Instructions
7.4 Accelerated Extensions
  7.4.1 Challenges
  7.4.2 Methodology
7.5 Instructions for Instruction Level Parallel (ILP) Architecture
  7.5.1 Superscalar
  7.5.2 VLIW Instructions
  7.5.3 SIMD Instructions
7.6 Memory and Register Addressing
  7.6.1 Register Addressing
  7.6.2 Data Memory Addressing
  7.6.3 Hardware Accelerated Memory Addressing
7.7 Coding
  7.7.1 Assembly Encoding
  7.7.2 Machine Code Coding
  7.7.3 Examples
7.8 Conclusions
Exercises
References

CHAPTER 8 Software Development Toolchain
8.1 What Is Toolchain and IDE?
  8.1.1 ASIP User’s View on IDE
  8.1.2 ASIP Designer’s View on IDE
8.2 Code Analysis
  8.2.1 Lexical Analysis
  8.2.2 Syntax Analysis
  8.2.3 Semantic Analysis
8.3 Profiler and WCET Analyzer
8.4 Compiler Overview
  8.4.1 Intermediate Code Generation
  8.4.2 Code Optimization
  8.4.3 Code Generation
  8.4.4 Error Handler
  8.4.5 Compiler Generator and Verification of a Generated Compiler
8.5 Assembler
8.6 Linker
8.7 Simulator and Debugger Basics
  8.7.1 Instruction Set Simulator (ISS)
  8.7.2 Processor Simulator
  8.7.3 Architecture Simulator
8.8 Debugger and GUI
  8.8.1 Debugger
  8.8.2 SW Debugging
  8.8.3 GUI
8.9 Evaluation of Programming Tools
8.10 Conclusions
Exercises
References

CHAPTER 9 Evaluation of an Instruction Set
9.1 Benchmarking
  9.1.1 Benchmarking DSP Kernel Algorithms
  9.1.2 Some Benchmarking Examples
9.2 Instruction Use Profiling
9.3 Coverage Analysis
9.4 Conclusions
References

CHAPTER 10 Design of DSP Microarchitecture
10.1 Introduction to Microarchitecture
  10.1.1 Microarchitecture versus Architecture
  10.1.2 Microarchitecture Design
10.2 Microarchitecture-level Components
  10.2.1 Basic Logic Components
  10.2.2 Arithmetic Components
10.3 Hardware Design Fundamentals
  10.3.1 Function Partitioning
  10.3.2 Function Allocation
  10.3.3 HW Multiplexing
  10.3.4 Scheduling of Hardware Execution
  10.3.5 Modeling and Simulation
10.4 Functional Specification at Microarchitecture Level
  10.4.1 Intermodule Block Diagram
  10.4.2 Microarchitecture Schematic
  10.4.3 Module Functional Flowchart
  10.4.4 Finite State Machine
  10.4.5 Truth Table for Coding and Decoding
10.5 ASIP Microarchitecture Design Flow
  10.5.1 Exposing Microoperations
  10.5.2 Allocation and Partitioning of Microoperations
  10.5.3 Pipeline Scheduling Microoperations
  10.5.4 HW Multiplexing of Microoperations
  10.5.5 Microoperations Integration
10.6 Conclusions
Exercises
References

CHAPTER 11 Design of Register File and Register Bus
11.1 Datapath
11.2 Design of Register Files
  11.2.1 General Register File
  11.2.2 Design of a Simple Register File
  11.2.3 Pipeline around Register File
  11.2.4 Special Registers in a General Register File
11.3 Design of Advanced Register Files
  11.3.1 Register File for Cluster Datapath
  11.3.2 Ultra Large Register File
11.4 Conclusions
Exercises
References

CHAPTER 12 ALU HW Implementation
12.1 Arithmetic and Logic Unit (ALU)
12.2 Design of Arithmetic Unit (AU)
  12.2.1 Implementation Methodology
  12.2.2 Select Kernel Components
  12.2.3 Implementing Simple AU Instructions
  12.2.4 Implementing Special AU Instructions
12.3 Shift and Rotation
  12.3.1 Design a Shifter Using a Shifter Primitive
  12.3.2 Design a Shifter Using Truth Tables
  12.3.3 Logic Operation and Data Manipulation
12.4 ALU Integration
  12.4.1 Preprocessing and Postprocessing
  12.4.2 ALU Integration
12.5 Conclusions
Exercises
References

CHAPTER 13 MAC Hardware Implementation
13.1 Introduction
13.1.1 Review of Convolution
13.1.2 MAC Fundamentals
13.2 MAC Implementation
13.2.1 MAC Instructions
13.2.2 Implementing Multiplications
13.2.3 Implementing MAC Instructions
13.2.4 Implementing Double-Precision Instructions
13.2.5 Accessing ACR Context
13.2.6 Flag Operations and Other Postoperations
13.3 A MAC Design Case
13.4 MAC Integrations
13.4.1 Physical Critical-Path
13.4.2 Pipeline in a MAC
13.5 Dual MAC, Multiple MAC, and VLIW
13.6 Conclusions
Exercises
References

CHAPTER 14 Control Path Design
14.1 Control Paths
14.2 Control Path Organization
14.2.1 Pipeline Consideration
14.2.2 Interrupt Management
14.3 Control Path Hardware Design
14.3.1 Top-level Structure
14.3.2 Design of Program Memory and Peripherals
14.3.3 Loading Code
14.3.4 Instruction Flow Controller
14.3.5 Loop Controller
14.3.6 PC Stack
14.3.7 Senior PC FSM Example
14.4 Instruction Decoder
14.4.1 Control Signal Decoding
14.4.2 Decoding Order
14.4.3 Decoding for Exception, Interrupt, Jump, and Conditional Execution
14.4.4 Issues of Multicycle Execution
14.4.5 VLIW Machine Decoding
14.4.6 Decoding for Superscalar
14.5 Conclusions
Exercises
References

CHAPTER 15 Design of Memory Subsystems
15.1 Memory and Peripherals
15.1.1 Memory Modules
15.1.2 Memory Peripheral Circuits
15.2 Design of Memory Addressing Circuitry
15.2.1 General Addressing Circuit
15.2.2 Modulo Addressing Circuit
15.3 Buses
15.4 Memory Hierarchy
15.4.1 Problems
15.4.2 Memory Hierarchy of DSP Processors
15.5 DMA
15.5.1 DMA Concepts
15.5.2 Configuring a Program for a DMA Task
15.5.3 A SoC View
15.6 Conclusions
Exercises
References

CHAPTER 16 DSP Core Peripherals
16.1 Peripherals
16.2 Design a Peripheral Module
16.2.1 Design of a Common Interface in Peripheral Modules
16.2.2 Protocol Design of Peripheral Modules
16.3 Interrupt Handler
16.3.1 Interrupt Basics
16.3.2 Interrupt Sources
16.3.3 Interrupt Requests
16.3.4 Interrupt Handling Process
16.3.5 A Case Study
16.4 Timers
16.5 Direct Memory Access (DMA)
16.5.1 DMA Basics
16.5.2 Design a Simple DMA
16.5.3 Advanced DMA Controller
16.5.4 DMA Benchmarking
16.6 Serial Ports
16.6.1 Bit Synchronization
16.6.2 Packet Synchronization
16.6.3 Arbitration
16.6.4 Control of a Serial Port
16.7 Parallel Ports
16.8 Conclusions
Exercises
References


CHAPTER 17 Design for DSP Functional Acceleration
17.1 Functional Acceleration
17.1.1 Loosely Connected Accelerator
17.1.2 Tightly Connected Accelerator
17.2 Accelerator Specification
17.2.1 Principle
17.2.2 An Accelerator with One Single Instruction
17.2.3 An Accelerator with Multiple Instructions
17.2.4 An Accelerator as a Slave Processor
17.3 Scalable Processor and Accelerator Interface
17.3.1 Configurability and Extendibility
17.3.2 Extendible Hardware Interface
17.3.3 Extendible Programmer Tools
17.4 Accelerator Design Flow
17.5 Conclusions
Exercises
References

CHAPTER 18 Real-time Fixed-point DSP Firmware
18.1 Firmware (FW)
18.2 Application Modeling Under HW Constraints
18.2.1 Understanding Applications
18.2.2 Understanding Hardware
18.2.3 Algorithm Selection
18.2.4 Language Selection
18.2.5 Real-time Firmware Implementation
18.2.6 Firmware for Fixed-point Data
18.3 Assembly Implementation
18.3.1 General Flow and C-Compiling
18.3.2 Plan and Specify for Assembly Coding
18.3.3 Fixed-point Assembly Kernels
18.3.4 Low Cycle Cost Assembly Coding
18.3.5 Storage Efficient Assembly Kernels
18.3.6 Function Libraries
18.3.7 Optimize Control Codes
18.4 Assembly-level Integration and Release
18.5 Conclusions
References

CHAPTER 19 ASIP Integration and Verification
19.1 Integration
19.1.1 HW Integration of an ASIP Core
19.1.2 Integration of a DSP Subsystem and a DSP Processor
19.1.3 HW Integration of a SoC
19.1.4 Integration of SoC Simulator
19.2 Functional Verification
19.2.1 The Basics
19.2.2 Verification Process
19.2.3 Verification Techniques
19.2.4 Speed-up Verification
19.2.5 Simulation or Emulation
19.2.6 Verification of an ASIP
19.2.7 Writing Testbench
19.3 Conclusions
Exercises
References

CHAPTER 20 Parallel Streaming Signal Processing
20.1 Streaming DSP
20.1.1 Streaming Signals
20.1.2 Parallel Streaming DSP Processors
20.2 Parallel Architecture, Divide and Conquer
20.2.1 Review of Parallel Architectures
20.2.2 Divide and Conquer
20.3 Expose Control Complexities
20.3.1 General Control Handling
20.3.2 Exposing Challenges
20.3.3 SIMT Architecture for Low-level Parallel Applications
20.3.4 Design of Multicore DSP Subsystems
20.4 Streaming Data Manipulations
20.4.1 Data Complexity of Streaming DSP
20.4.2 Data Complexity: Case 1 - Video
20.4.3 Data Complexity: Case 2 - Radio Baseband
20.5 NoC for Parallel Memory Access
20.5.1 Design Methods
20.5.2 Analyses of Parallel Memory Access for NoC Design
20.6 Parallel Memory Architecture
20.6.1 Requirements for Parallel Algorithms
20.6.2 Cache
20.6.3 Ultra-large Register File
20.7 P3RMA for Streaming DSP Processors
20.7.1 Parallel Vector Scratchpad Memories
20.7.2 The Memory Subsystem Hardware
20.7.3 Parallel Programming by Hand
20.7.4 Programming Toolchain for P3RMA
20.8 Conclusions
References

Glossary

Appendix

Index


Preface

In the late 1990s, when I was preparing a course called “Design of Embedded DSP (Digital Signal Processing) Processors” at Linköping University, Sweden, I could not find a textbook describing the fundamentals of embedded processor design. This became my first and prime motivation for writing this book. During my time working in industry, I could not find any suitable and comprehensive reference book either, which led to my second motivation for writing such a book. It has been my belief that this book will be a valuable textbook or reference book for anyone interested in the design of embedded systems in all its aspects, from hardware design to firmware design. Although this book was written mainly for ASIP (application-specific instruction set processor) or ASIC (application-specific integrated circuit) designers, it will also benefit software programmers who want more hardware knowledge, such as DSP application engineers.

While reading this book, you will get opportunities to go through the design of a programmable device for a class of applications, step by step. The material in this book is suitable for teaching senior undergraduate students and graduate students in Electrical Engineering and Computer Engineering. This book can also be used as a reference book for engineers who are designing or want to design application-specific DSP processors, general processors, accelerators, peripheral modules, and even microcontrollers. Embedded system designers (e.g., DSP firmware designers) will also benefit from the knowledge of real-time system design elaborated in this book. Classical CPU designers will benefit from the exposed difference between CPU and ASIP. This book is also a fundamental reference book for researchers.

Fundamental DSP theory and basic digital logic design are addressed in this book as background knowledge. Very basic concepts and methods are used without redundant introduction. Readers without this fundamental knowledge should read relevant books about DSP theory, logic design, and computer architecture before reading this book.

DSP, as opposed to general-purpose computing systems, has been a major technology driven by embedded applications and the semiconductor technologies. This is evident from the growing market of DSP-based products. The increasing need for DSP and DSP processors can be found everywhere in today’s society in areas such as multimedia, wireless communications, Internet terminals, car electronics, robotics, healthcare, environment monitoring and control, education, scientific computing, industrial control, transportation, and defense.

DSP is used widely for various applications such as data enhancement, data compression, pattern recognition, simulation, emulation, and optimization. Signal recovery in advanced digital communications is a good example of data enhancement. Other data enhancement applications include error correction, echo cancellation, and noise suppression. Data compression is another important area in facilities used daily: for example, voice, music, image, video, and Internet data need to be compressed to fit into a limited bandwidth for transmission and storage. If voice were not compressed by a DSP processor in a mobile phone, the cost of a mobile phone call would be 10 to 15 times higher. If video signals were not compressed, DVD players and digital video broadcasting would not be possible.

Pattern recognition techniques are used for voice recognition, language recognition, and image target recognition for healthcare, car driving, and defense. Physical simulation has been used for gaming, training (education), scientific computing, defense, and experiments that are expensive or even impossible to realize in the real world.

The global market share of DSP processors and microcontrollers was more than 95% of the total volume of processors sold in 2006. DSP processors for embedded applications have led to a major shift in the semiconductor industry. The sales of DSP processors have reached 20% of the global semiconductor market since 2002. Taking only the DSP processors in mobile phones as an example, total sales in 2006 were more than US$10 billion.

General-purpose DSP processors (commercial off-the-shelf DSP processors) usually have a high degree of flexibility, a friendly design environment, and sufficient design references. General-purpose DSP processors are preferred when requirements on power, performance, and silicon area are not very critical. When these requirements are strict, embedded DSP processors as ASIP become a necessity. Figure P.1 shows the trend of the different DSP market shares. The figure shows that the future of ASIP DSP is exciting, which is my third motivation for writing this book.

Most DSP applications can be categorized as streaming signal processing, in which the processing speed is higher than the speed of incoming signals. Classic DSP hardware for streaming signal processing was usually implemented on nonprogrammable ASICs to minimize the silicon cost. Recently programmability has become a vital issue because the complexity as well as design costs keep rising. Programmability has been required by industries in order to support multimode or multistandard applications. Thanks to the ongoing progress of modern VLSI (very large scale integrated circuits) technologies, programmable features have been realistic since the 1990s.

FIGURE P.1

Trend of different DSP market shares (FreehandDSP, Sweden).

An ongoing trend is that the architecture of MPUs (microprocessors used as the central processors in personal computers) is converging, whereas the architecture of DSP processors is diverging. One reason is that applications running on DSPs are diverging. However, the functionalities will be relatively fixed when a DSP processor is embedded in a system. Another reason is that the requirements on silicon efficiency, power consumption, and design cost of embedded processors or ASIP are very critical.

General-purpose processor designers think of ultimate performance and ultimate flexibility. The instruction set must be general because the application is unknown, and the programmer’s behavior is unknown. ASIP designers have to think about the application and cost first. Usually the biggest challenge for ASIP designers is the efficiency issue. Based on the carefully specified function coverage, the goal of an ASIP design is to reach the highest performance per unit of silicon area, per unit of power consumption, and per unit of design cost. The flexibility should be sufficient instead of ultimate. The performance target is application-specific instead of the highest possible. This book will expose and analyze the differences between general-purpose processor designers and ASIP designers.

An ASIP is often a SIP (Silicon Intellectual Property, Silicon IP, or IP). More and more SoC (system on a chip) solutions use ASIP IP. Therefore, the focus of this book will be the design of IP cores of ASIP for embedded systems on a chip. Silicon IP has been used as components in silicon chip designs since the mid-1990s. The requirements for quality design of silicon IP became higher after 2000 because silicon IP has been well accepted and widely used. In Figure P.2, the overall design complexity is divided into the system design complexity and the component design complexity. Around the middle to late 1980s, RTL components (for example, multipliers and adders) were optimized as the lowest level components of system designs. RTL components took a certain degree of design complexity from the system design so that the system could be relatively more advanced compared with a system designed at the transistor level. During the mid-1990s, the system design became so advanced and complicated that programmable IP had to be used as the lowest level component to relax the system design complexity.

Because an IP usually is designed by a third party or another design team, the system complexity can be shared. Also, because an IP usually is designed for multiple users, the design cost usually is shared among them; a relatively high IP design cost is therefore acceptable. This is the fourth motivation for writing the book: to show ways to design high quality programmable IP as components for multiple users.

The fourth motivation became even more important when the platform-based design concept was introduced recently. A platform is a partly designed application-specific system that can be adapted to a custom design with minimum cost. The platform-based system design requires the minimum design cost while plugging a programmable IP on the platform and running firmware on it. It means that the design of ASIP must be both silicon-efficient and platform-oriented. The platform-adaptive ASIP design skills offered by this book will thus be even more interesting.

FIGURE P.2

Handling complexity of the design using ASIP IP and platform.

I deeply acknowledge the research and teaching contributions from my PhD students in my Division of Computer Engineering, Department of Electrical Engineering at Linköping University, Sweden. The labs of the course and part of the contents of this book are based on their research work. Di Wu and Johan Eilert read through the book and provided numerous suggestions. Per Karlström managed all MS Word problems and figure formatting using his fantastic Word-VBA programming skills. Andreas Ehliar went through all code examples in the book. Per Karlström, Johan Eilert, Andreas Ehliar, Di Wu, and Master students Vinodh Ravinath and Bobo Svangård implemented the Senior DSP processor core. Research engineer Anders (S) Nilsson made the assembler and instruction set simulator of Senior, the processor used as the example of the book. Acknowledgment also goes to other PhD students: Rizwan Asghar, Dr. Anders Nilsson, Dr. Eric Tell, Dr. Tomas Henriksson, Dr. Daniel Wiklund, Dr. Ulf Nordqvist, and Lic. Mikael Olausson. I thank all Master students who participated in the course “Design of Embedded DSP Processors” from 1999 to 2007.

My sincere thanks go to Freehand DSP AB (Ltd.), Sweden (VIA Tech Sweden after 2002), a leading company developing DSP processors for communications and home electronic applications. Special thanks to my friend and boss, CEO Harald Bergh, who went through several chapters and provided very professional and valuable suggestions. I was the cofounder, CTO, and vice president of Freehand DSP AB (Ltd.), Stockholm, Sweden, from 1999 to 2002; the company was acquired by VIA Technologies in 2002.

I thank Coresonic AB (Ltd.), Linköping, Sweden, a leading DSP core SIP company for programmable radio baseband solutions. I am a cofounder and currently the CTO of this company. I sincerely thank my best friend, Professor Christer Svensson, at the Department of Electrical Engineering (ISY), Linköping University, Sweden, who was my supervisor (1990–1994) during my research toward my doctor of technology degree. Christer is the cofounder and the Chairman of the Board of Coresonic. I also thank cofounders Dr. Eric Tell, Dr. Anders Nilsson, and Daniel Svensson for many useful discussions and encouragements. All staff of Coresonic AB are greatly acknowledged.

I greatly acknowledge the following experts for their insightful discussions: Vodafone chair Professor Gerhard Fettweis, TU Dresden; Professor Christoph Kessler of Linköping University; Professor Lars Svensson of Chalmers University; Professor Viktor Öwall of Lund University; Professor Petru Eles of Linköping University; Dr. Carl-Fredrik Lenderson of Sony Ericsson; Professor Dr. Xiaoning Nie of Infineon Munich; Dr. Franz Dielacher, CSO, Infineon connections Villach; and Infineon fellow Professor Dr. Lajos Gazsi, Düsseldorf.

Finally, my deepest acknowledgment and gratitude go to my dear wife Meiying and my daughter Angie. Without their love, understanding, and support, this book would never have been possible.

Dake Liu

December 2007

Linköping, Sweden



List of Trademarks and Product Names

ADI processors

ARM processors

CEVA DSP processors

Coresonic LeoCore processors

FreehandDSP

Freescale

Infineon Camel processor

Intel Pentium and 8x86 processors

NXP EVP16 baseband processor


OpenRISC of OpenCores

STI Cell (Sony, Toshiba, IBM) processors

TI (Texas Instruments) DSP processors

Xilinx FPGA

Tools and programs include MATLAB and Simulink of MathWorks

Design Compiler of Synopsys

LISA and Processor Designer of CoWare

ZSP of LSI

GCC from GNU