
Embedded DSP Processor Design

Application Specific Instruction Set Processors

Dake Liu

AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO

ELSEVIER

Morgan Kaufmann Publishers is an imprint of Elsevier


Contents

Preface xix

List of Trademarks and Product Names xxv

CHAPTER 1 Introduction 1
  1.1 How to Read the Book 1
  1.2 DSP Theory for Hardware Designers 5
    1.2.1 Review of DSP Theory and Fundamentals 5
    1.2.2 ADC and Finite-Length Modeling 6
    1.2.3 Digital Filters 8
    1.2.4 Transform 10
    1.2.5 Adaptive Filter and Signal Enhancement 12
    1.2.6 Random Process and Autocorrelation 14
  1.3 Theory, Applications, and Implementations 15
  1.4 DSP Applications 17
    1.4.1 Real-Time Concept 17
    1.4.2 Communication Systems 17
    1.4.3 Multimedia Signal Processing Systems 19
    1.4.4 Review on Applications 23
  1.5 DSP Implementations 24
    1.5.1 DSP Implementation on GPP 25
    1.5.2 DSP Implementation on GP DSP Processors 25
    1.5.3 DSP Implementation on ASIP 26
    1.5.4 DSP Implementation on ASIC 26
    1.5.5 Trade-off and Decision of Implementations 28
  1.6 Review of Processors and Systems 29
    1.6.1 DSP Processor Architecture 29
    1.6.2 DSP Firmware 30
    1.6.3 Embedded System Overview 32
    1.6.4 DSP in an Embedded System 34
    1.6.5 Fundamentals of Embedded Computing 35
  1.7 Design Flow 36
    1.7.1 Hardware Design Flow in General 36
    1.7.2 ASIP Hardware Design Flow 38
    1.7.3 ASIP Design Automation 40
  1.8 Conclusions 43
  Exercises 44
  References 45


CHAPTER 2 Numerical Representation and Finite-Length DSP 47
  2.1 Fixed-Point Numerical Representation 47
    2.1.1 An Intuitive Example 48
    2.1.2 Fixed-Point Numerical Representation 50
    2.1.3 Fixed-Point Binary Representation 51
    2.1.4 Integer Binary Representation 52
    2.1.5 Fractional Binary Representation 53
    2.1.6 Fixed-Point Operands 54
    2.1.7 Integer or Fractional 55
    2.1.8 Other Binary Data Formats 63
  2.2 Data Quality Measure 65
    2.2.1 Noise, Distortion, Dynamic Range, and Precision 65
    2.2.2 Quantitative Concept of Dynamic Range and Precision 68
  2.3 Floating-Point Numerical Representation 69
  2.4 Block Floating-Point 73
  2.5 DSP Based on Finite Precision 76
    2.5.1 The Way of Quantization: Rounding and Truncation 76
    2.5.2 Overflow Saturation and Guards 78
    2.5.3 Requirements on Guards 81
    2.5.4 Execution Order 82
  2.6 Examples of Corner Cases 82
  2.7 Conclusions 83
  Exercises 84
  References 85

CHAPTER 3 DSP Architectures 87
  3.1 DSP Subsystem Architecture 87
  3.2 Processor Architecture 88
    3.2.1 Inside a DSP Subsystem 89
    3.2.2 DSP (Memory Bus) Architecture 91
    3.2.3 Functional Description at Top Architecture Level 95
    3.2.4 DSP Architecture Design 97
  3.3 Inside a DSP Core 101
    3.3.1 The Datapath and Register Bus 101
    3.3.2 MAC 101
    3.3.3 ALU 103
    3.3.4 Register File 104
    3.3.5 Control Path 105
    3.3.6 Address Generator (AGU) 108
  3.4 The Difference between GPP and ASIP DSP 109
    3.4.1 The Difference between Designing a GPP and ASIP DSP 109
    3.4.2 Comparing DSP Processors to Other Processors 110
    3.4.3 CISC or RISC 113


  3.5 Advanced DSP Architecture 116
    3.5.1 DSP with Extreme Specification 116
    3.5.2 ILP DSP Processors 120
    3.5.3 Dual MAC and SIMD 122
    3.5.4 VLIW and Superscalar 128
    3.5.5 On-Chip Multicore DSP 145
  3.6 Conclusions 153
  Exercises 154
  References 157

CHAPTER 4 DSP ASIP Design Flow 159
  4.1 Design and Use of ASIP 159
    4.1.1 What Is ASIP? 159
    4.1.2 DSP ASIP Design Flow 160
  4.2 Understanding Applications Through Profiling 162
  4.3 Architecture Selection 163
    4.3.1 General Methodology 163
    4.3.2 Architectures 168
    4.3.3 Quantitative Approach 172
  4.4 Designing Instruction Sets 173
  4.5 Designing the Toolchain 174
  4.6 Microarchitecture Design 178
  4.7 Firmware Design 179
    4.7.1 Real-time Firmware 180
    4.7.2 Firmware with Finite Precision 181
    4.7.3 Firmware Design Flow for One Application 181
    4.7.4 Firmware Design Flow for Multiapplications 183
  4.8 Conclusions 184
  Exercises 184
  References 185

CHAPTER 5 A Simple DSP Core: The Junior Processor 187
  5.1 Junior: A Simple DSP Processor 187
  5.2 Instruction Set and Operations 188
    5.2.1 Load/Store Instructions 188
    5.2.2 Addressing for Data Memory Access 190
    5.2.3 Instructions for Basic Arithmetic Operations 190
    5.2.4 Logic and Shift Operations 191
    5.2.5 Program Flow Control Instructions 192
  5.3 Assembly Coding 194
  5.4 Assembly Benchmarking 197
    5.4.1 Benchmarking of Block Transfer 199
    5.4.2 Benchmarking of Single-Sample FIR 199
    5.4.3 Benchmarking of Frame FIR 201
    5.4.4 Benchmarking of Single-Sample Biquad IIR 204


    5.4.5 Benchmarking of 16-bit Division 205
    5.4.6 Benchmarking of Vector Maximum Tracking 206
    5.4.7 Benchmarking of 8 × 8 DCT 207
    5.4.8 Benchmarking of 256-point FFT 210
    5.4.9 Benchmarking of Windowing 211
  5.5 Discussion of Junior DSP 212
  5.6 Conclusions 214
  Exercises 215
  References 215

CHAPTER 6 Code Profiling for ASIP Design 217
  6.1 Source Code Profiling 217
    6.1.1 What Is Source Code Profiling? 218
    6.1.2 Why Profiling? 220
    6.1.3 What to Profile 221
    6.1.4 How to Profile 224
    6.1.5 The Language to Profile 225
  6.2 Static Profiling 226
    6.2.1 Dynamic and Static Profiling 226
    6.2.2 Static Profiling 226
    6.2.3 Fine-grained Static Profiling 227
    6.2.4 Coarse-grained Static Profiling 229
  6.3 Dynamic Profiling 231
    6.3.1 Instrumentation for Coarse-grained Profiling 231
    6.3.2 Instrumentation for Fine-grained Profiling 231
    6.3.3 Implement Instrumentation 232
  6.4 Use of Reference Assembly Codes 234
    6.4.1 Expose Hidden Costs 234
    6.4.2 Understanding Assembly Codes 235
  6.5 Quality Evaluation of Results 236
    6.5.1 Evaluating Results of Source Code Profiling 236
    6.5.2 Using Profiling Results 236
  6.6 Conclusions 237
  Exercises 237
  References 237

CHAPTER 7 Assembly Instruction Set Design 239
  7.1 Methodology 239
    7.1.1 Opportunities and Constraints 239
    7.1.2 Classification of General Instructions 244
    7.1.3 Design of General RISC Subset Instructions 245
    7.1.4 Specify CISC Instructions 248
    7.1.5 For Undergraduates: From Junior to Senior 249
  7.2 Designing RISC Subset Instructions 250


    7.2.1 Data Access Instructions 250
    7.2.2 Basic Arithmetic Instructions 256
    7.2.3 Unsigned ALU Instructions 264
    7.2.4 Program Flow Control Instructions 265
  7.3 CISC Subset Instructions 271
    7.3.1 MAC and Multiplication Instructions 271
    7.3.2 Double-Precision Arithmetic Instructions 274
    7.3.3 Other CISC Instructions 277
  7.4 Accelerated Extensions 277
    7.4.1 Challenges 277
    7.4.2 Methodology 278
  7.5 Instructions for Instruction Level Parallel (ILP) Architecture 280
    7.5.1 Superscalar 280
    7.5.2 VLIW Instructions 280
    7.5.3 SIMD Instructions 282
  7.6 Memory and Register Addressing 286
    7.6.1 Register Addressing 287
    7.6.2 Data Memory Addressing 290
    7.6.3 Hardware Accelerated Memory Addressing 295
  7.7 Coding 301
    7.7.1 Assembly Encoding 301
    7.7.2 Machine Code Coding 304
    7.7.3 Examples 306
  7.8 Conclusions 309
  Exercises 310
  References 312

CHAPTER 8 Software Development Toolchain 315
  8.1 What Is Toolchain and IDE? 315
    8.1.1 ASIP User's View on IDE 316
    8.1.2 ASIP Designer's View on IDE 317
  8.2 Code Analysis 318
    8.2.1 Lexical Analysis 319
    8.2.2 Syntax Analysis 319
    8.2.3 Semantic Analysis 323
  8.3 Profiler and WCET Analyzer 324
  8.4 Compiler Overview 326
    8.4.1 Intermediate Code Generation 326
    8.4.2 Code Optimization 328
    8.4.3 Code Generation 332
    8.4.4 Error Handler 334
    8.4.5 Compiler Generator and Verification of a Generated Compiler 335
  8.5 Assembler 335


  8.6 Linker 337
  8.7 Simulator and Debugger Basics 339
    8.7.1 Instruction Set Simulator (ISS) 341
    8.7.2 Processor Simulator 349
    8.7.3 Architecture Simulator 350
  8.8 Debugger and GUI 350
    8.8.1 Debugger 350
    8.8.2 SW Debugging 351
    8.8.3 GUI 352
  8.9 Evaluation of Programming Tools 353
  8.10 Conclusions 354
  Exercises 354
  References 355

CHAPTER 9 Evaluation of an Instruction Set 357
  9.1 Benchmarking 357
    9.1.1 Benchmarking DSP Kernel Algorithms 360
    9.1.2 Some Benchmarking Examples 365
  9.2 Instruction Use Profiling 365
  9.3 Coverage Analysis 366
  9.4 Conclusions 366
  References 367

CHAPTER 10 Design of DSP Microarchitecture 369
  10.1 Introduction to Microarchitecture 369
    10.1.1 Microarchitecture versus Architecture 369
    10.1.2 Microarchitecture Design 370
  10.2 Microarchitecture-level Components 370
    10.2.1 Basic Logic Components 371
    10.2.2 Arithmetic Components 373
  10.3 Hardware Design Fundamentals 374
    10.3.1 Function Partitioning 374
    10.3.2 Function Allocation 375
    10.3.3 HW Multiplexing 376
    10.3.4 Scheduling of Hardware Execution 379
    10.3.5 Modeling and Simulation 381
  10.4 Functional Specification at Microarchitecture Level 381
    10.4.1 Intermodule Block Diagram 381
    10.4.2 Microarchitecture Schematic 382
    10.4.3 Module Functional Flowchart 382
    10.4.4 Finite State Machine 387
    10.4.5 Truth Table for Coding and Decoding 389
  10.5 ASIP Microarchitecture Design Flow 390
    10.5.1 Exposing Microoperations 391


    10.5.2 Allocation and Partitioning of Microoperations 391
    10.5.3 Pipeline Scheduling Microoperations 393
    10.5.4 HW Multiplexing of Microoperations 393
    10.5.5 Microoperations Integration 394
  10.6 Conclusions 396
  Exercises 396
  References 397

CHAPTER 11 Design of Register File and Register Bus 399
  11.1 Datapath 399
  11.2 Design of Register Files 400
    11.2.1 General Register File 400
    11.2.2 Design of a Simple Register File 401
    11.2.3 Pipeline around Register File 403
    11.2.4 Special Registers in a General Register File 404
  11.3 Design of Advanced Register Files 406
    11.3.1 Register File for Cluster Datapath 406
    11.3.2 Ultra Large Register File 408
  11.4 Conclusions 410
  Exercises 410
  References 411

CHAPTER 12 ALU HW Implementation 413
  12.1 Arithmetic and Logic Unit (ALU) 413
  12.2 Design of Arithmetic Unit (AU) 415
    12.2.1 Implementation Methodology 415
    12.2.2 Select Kernel Components 416
    12.2.3 Implementing Simple AU Instructions 418
    12.2.4 Implementing Special AU Instructions 423
  12.3 Shift and Rotation 426
    12.3.1 Design a Shifter Using a Shifter Primitive 427
    12.3.2 Design a Shifter Using Truth Tables 430
    12.3.3 Logic Operation and Data Manipulation 430
  12.4 ALU Integration 433
    12.4.1 Preprocessing and Postprocessing 433
    12.4.2 ALU Integration 433
  12.5 Conclusions 434
  Exercises 435
  References 438

CHAPTER 13 MAC Hardware Implementation 439
  13.1 Introduction 439
    13.1.1 Review of Convolution 439
    13.1.2 MAC Fundamentals 440


  13.2 MAC Implementation 442
    13.2.1 MAC Instructions 442
    13.2.2 Implementing Multiplications 442
    13.2.3 Implementing MAC Instructions 446
    13.2.4 Implementing Double-Precision Instructions 449
    13.2.5 Accessing ACR Context 451
    13.2.6 Flag Operations and Other Postoperations 455
  13.3 A MAC Design Case 456
  13.4 MAC Integrations 465
    13.4.1 Physical Critical-Path 465
    13.4.2 Pipeline in a MAC 466
  13.5 Dual MAC, Multiple MAC, and VLIW 468
  13.6 Conclusions 470
  Exercises 471
  References 474

CHAPTER 14 Control Path Design 475
  14.1 Control Paths 475
  14.2 Control Path Organization 476
    14.2.1 Pipeline Consideration 478
    14.2.2 Interrupt Management 483
  14.3 Control Path Hardware Design 486
    14.3.1 Top-level Structure 486
    14.3.2 Design of Program Memory and Peripherals 488
    14.3.3 Loading Code 489
    14.3.4 Instruction Flow Controller 491
    14.3.5 Loop Controller 494
    14.3.6 PC Stack 496
    14.3.7 Senior PC FSM Example 499
  14.4 Instruction Decoder 502
    14.4.1 Control Signal Decoding 503
    14.4.2 Decoding Order 505
    14.4.3 Decoding for Exception, Interrupt, Jump, and Conditional Execution 505
    14.4.4 Issues of Multicycle Execution 506
    14.4.5 VLIW Machine Decoding 508
    14.4.6 Decoding for Superscalar 509
  14.5 Conclusions 510
  Exercises 510
  References 512

CHAPTER 15 Design of Memory Subsystems 513
  15.1 Memory and Peripherals 513
    15.1.1 Memory Modules 513
    15.1.2 Memory Peripheral Circuits 517


  15.2 Design of Memory Addressing Circuitry 524
    15.2.1 General Addressing Circuit 524
    15.2.2 Modulo Addressing Circuit 527
  15.3 Buses 531
  15.4 Memory Hierarchy 532
    15.4.1 Problems 532
    15.4.2 Memory Hierarchy of DSP Processors 533
  15.5 DMA 535
    15.5.1 DMA Concepts 535
    15.5.2 Configuring a Program for a DMA Task 539
    15.5.3 A SoC View 543
  15.6 Conclusions 543
  Exercises 543
  References 545

CHAPTER 16 DSP Core Peripherals 547
  16.1 Peripherals 547
  16.2 Design a Peripheral Module 549
    16.2.1 Design of a Common Interface in Peripheral Modules 550
    16.2.2 Protocol Design of Peripheral Modules 554
  16.3 Interrupt Handler 555
    16.3.1 Interrupt Basics 555
    16.3.2 Interrupt Sources 555
    16.3.3 Interrupt Requests 557
    16.3.4 Interrupt Handling Process 558
    16.3.5 A Case Study 561
  16.4 Timers 567
  16.5 Direct Memory Access (DMA) 570
    16.5.1 DMA Basics 570
    16.5.2 Design a Simple DMA 573
    16.5.3 Advanced DMA Controller 581
    16.5.4 DMA Benchmarking 589
  16.6 Serial Ports 589
    16.6.1 Bit Synchronization 589
    16.6.2 Packet Synchronization 592
    16.6.3 Arbitration 593
    16.6.4 Control of a Serial Port 594
  16.7 Parallel Ports 594
  16.8 Conclusions 594
  Exercises 595
  References 596


CHAPTER 17 Design for DSP Functional Acceleration 597
  17.1 Functional Acceleration 597
    17.1.1 Loosely Connected Accelerator 598
    17.1.2 Tightly Connected Accelerator 599
  17.2 Accelerator Specification 601
    17.2.1 Principle 601
    17.2.2 An Accelerator with One Single Instruction 601
    17.2.3 An Accelerator with Multiple Instructions 602
    17.2.4 An Accelerator as a Slave Processor 603
  17.3 Scalable Processor and Accelerator Interface 604
    17.3.1 Configurability and Extendibility 604
    17.3.2 Extendible Hardware Interface 608
    17.3.3 Extendible Programmer Tools 611
  17.4 Accelerator Design Flow 616
  17.5 Conclusions 616
  Exercises 617
  References 618

CHAPTER 18 Real-time Fixed-point DSP Firmware 619
  18.1 Firmware (FW) 619
  18.2 Application Modeling Under HW Constraints 620
    18.2.1 Understanding Applications 620
    18.2.2 Understanding Hardware 624
    18.2.3 Algorithm Selection 626
    18.2.4 Language Selection 633
    18.2.5 Real-time Firmware Implementation 635
    18.2.6 Firmware for Fixed-point Data 638
  18.3 Assembly Implementation 646
    18.3.1 General Flow and C-Compiling 646
    18.3.2 Plan and Specify for Assembly Coding 647
    18.3.3 Fixed-point Assembly Kernels 648
    18.3.4 Low Cycle Cost Assembly Coding 649
    18.3.5 Storage Efficient Assembly Kernels 652
    18.3.6 Function Libraries 656
    18.3.7 Optimize Control Codes 658
  18.4 Assembly-level Integration and Release 659
  18.5 Conclusions 661
  References 661

CHAPTER 19 ASIP Integration and Verification 663
  19.1 Integration 663
    19.1.1 HW Integration of an ASIP Core 665
    19.1.2 Integration of a DSP Subsystem and a DSP Processor 668


    19.1.3 HW Integration of a SoC 675
    19.1.4 Integration of SoC Simulator 685
  19.2 Functional Verification 686
    19.2.1 The Basics 686
    19.2.2 Verification Process 689
    19.2.3 Verification Techniques 691
    19.2.4 Speed-up Verification 697
    19.2.5 Simulation or Emulation 699
    19.2.6 Verification of an ASIP 700
    19.2.7 Writing Testbench 700
  19.3 Conclusions 701
  Exercises 703
  References 703

CHAPTER 20 Parallel Streaming Signal Processing 705
  20.1 Streaming DSP 705
    20.1.1 Streaming Signals 705
    20.1.2 Parallel Streaming DSP Processors 705
  20.2 Parallel Architecture, Divide and Conquer 707
    20.2.1 Review of Parallel Architectures 707
    20.2.2 Divide and Conquer 710
  20.3 Expose Control Complexities 712
    20.3.1 General Control Handling 712
    20.3.2 Exposing Challenges 713
    20.3.3 SIMT Architecture for Low-level Parallel Applications 716
    20.3.4 Design of Multicore DSP Subsystems 721
  20.4 Streaming Data Manipulations 726
    20.4.1 Data Complexity of Streaming DSP 726
    20.4.2 Data Complexity: Case 1, Video 726
    20.4.3 Data Complexity: Case 2, Radio Baseband 732
  20.5 NoC for Parallel Memory Access 735
    20.5.1 Design Methods 735
    20.5.2 Analyses of Parallel Memory Access for NoC Design 736
  20.6 Parallel Memory Architecture 739
    20.6.1 Requirements for Parallel Algorithms 739
    20.6.2 Cache 740
    20.6.3 Ultra-large Register File 743
  20.7 P3RMA for Streaming DSP Processors 744
    20.7.1 Parallel Vector Scratchpad Memories 745
    20.7.2 The Memory Subsystem Hardware 747


    20.7.3 Parallel Programming by Hand 748
    20.7.4 Programming Toolchain for P3RMA 754
  20.8 Conclusions 757
  References 758

Glossary 761

Appendix 769

Index 771