Cell Broadband Engine Programming Handbook Including the PowerXCell 8i Processor

Version 1.12

April 3, 2009

    Copyright and Disclaimer

    Copyright International Business Machines Corporation, Sony Computer Entertainment Inc., Toshiba Corporation 2006, 2009.

    All Rights Reserved

    Printed in the United States of America April 2009

    IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml

    Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom.

    Intel is a registered trademark of Intel Corporation or its subsidiaries in the United States and other countries.

    Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

    Linux is a trademark of Linus Torvalds in the United States, other countries, or both.

    UNIX is a registered trademark of The Open Group in the United States and other countries.

    Other company, product, and service names may be trademarks or service marks of others.

    All information contained in this document is subject to change without notice. The products described in this document are NOT intended for use in applications such as implantation, life support, or other hazardous uses where malfunction could result in death, bodily injury, or catastrophic property damage. The information contained in this document does not affect or change IBM product specifications or warranties. Nothing in this document shall operate as an express or implied license or indemnity under the intellectual property rights of IBM or third parties. All information contained in this document was obtained in specific environments, and is presented as an illustration. The results obtained in other operating environments may vary.

    THE INFORMATION CONTAINED IN THIS DOCUMENT IS PROVIDED ON AN "AS IS" BASIS. In no event will IBM be liable for damages arising directly or indirectly from any use of the information contained in this document.

    IBM Systems and Technology Group
    2070 Route 52, Bldg. 330
    Hopewell Junction, NY 12533-6351

    The IBM home page can be found at ibm.com.

    The IBM semiconductor solutions home page can be found at ibm.com/chips.

    Contents

    List of Figures .... 19

    List of Tables .... 23

    Preface .... 29
    Related Publications .... 29
    Conventions and Notation .... 30
    Referencing Registers, Fields, and Bit Ranges .... 31
    Terminology .... 32
    Reserved Regions of Memory and Registers .... 32

    Revision Log .... 33

    1. Overview of CBEA Processors .... 45
    1.1 Background .... 46

    1.1.1 Motivation .... 46
    1.1.2 Power, Memory, and Frequency .... 48
    1.1.3 Scope of this Handbook .... 48

    1.2 Hardware Environment .... 49
    1.2.1 The Processor Elements .... 49
    1.2.2 Element Interconnect Bus .... 50
    1.2.3 Memory Interface Controller .... 50
    1.2.4 Cell Broadband Engine Interface Unit .... 51

    1.3 Programming Environment .... 52
    1.3.1 Instruction Sets .... 52
    1.3.2 Storage Domains and Interfaces .... 52
    1.3.3 Byte Ordering and Bit Numbering .... 54
    1.3.4 Runtime Environment .... 55

    2. PowerPC Processor Element .... 57
    2.1 PowerPC Processor Unit .... 58
    2.2 PowerPC Processor Storage Subsystem .... 60
    2.3 PPE Registers .... 60
    2.4 PowerPC Instructions .... 63

    2.4.1 Data Types .... 63
    2.4.2 Addressing Modes .... 63
    2.4.3 Instructions .... 64

    2.5 Vector/SIMD Multimedia Extension Instructions .... 65
    2.5.1 SIMD Vectorization .... 65
    2.5.2 Data Types .... 67
    2.5.3 Addressing Modes .... 67
    2.5.4 Instruction Types .... 68
    2.5.5 Instructions .... 68
    2.5.6 Graphics Rounding Mode .... 68

    2.6 Vector/SIMD Multimedia Extension C/C++ Language Intrinsics .... 68
    2.6.1 Vector Data Types .... 69
    2.6.2 Vector Literals .... 69
    2.6.3 Intrinsics .... 69

    3. Synergistic Processor Elements .... 71
    3.1 Synergistic Processor Unit .... 71

    3.1.1 Local Storage .... 72
    3.1.2 Register File .... 75
    3.1.3 Execution Units .... 76
    3.1.4 Floating-Point Support .... 76

    3.2 Memory Flow Controller .... 78
    3.2.1 Channels .... 80
    3.2.2 Mailboxes and Signalling .... 80
    3.2.3 MFC Commands and Command Queues .... 80
    3.2.4 Direct Memory Access Controller .... 81
    3.2.5 Synergistic Memory Management Unit .... 82

    3.3 SPU Instruction Set .... 82
    3.3.1 Data Types .... 82
    3.3.2 Instructions .... 83

    3.4 SPU C/C++ Language Intrinsics .... 83
    3.4.1 Vector Data Types .... 84
    3.4.2 Vector Literals .... 84
    3.4.3 Intrinsics .... 84

    3.5 SPE Isolation Mode .... 84

    4. Virtual Storage Environment .... 87
    4.1 Introduction .... 87
    4.2 PPE Memory Management .... 88

    4.2.1 Memory Management Unit .... 89
    4.2.2 Address-Translation Sequence .... 90
    4.2.3 Enabling Address Translation .... 91
    4.2.4 Effective-to-Real-Address Translation .... 91
    4.2.5 Segmentation .... 93
    4.2.6 Paging .... 95
    4.2.7 Translation Lookaside Buffer .... 100
    4.2.8 Real Addressing Mode .... 108
    4.2.9 Effective Addresses in 32-Bit Mode .... 111

    4.3 SPE Memory Management .... 111
    4.3.1 Synergistic Memory Management Unit .... 111
    4.3.2 Enabling Address Translation .... 112
    4.3.3 Segmentation .... 113
    4.3.4 Paging .... 116
    4.3.5 Translation Lookaside Buffer .... 116
    4.3.6 Real Addressing Mode .... 125
    4.3.7 Exception Handling and Storage Protection .... 126

    5. Memory Map .... 129
    5.1 Introduction .... 129

    5.1.1 Configuration-Ring Initialization .... 131
    5.1.2 Allocated Regions of Memory .... 131
    5.1.3 Reserved Regions of Memory .... 134
    5.1.4 The Guarded Attribute .... 134

    5.2 PPE Memory Map .... 134
    5.2.1 PPE Memory-Mapped Registers .... 134
    5.2.2 Predefined Real-Address Locations .... 135

    5.3 SPE Memory Map .... 135
    5.3.1 SPE Local-Storage Memory Map .... 136
    5.3.2 SPE Memory-Mapped Registers .... 137

    5.4 BEI Memory-Mapped Registers .... 138
    5.4.1 I/O .... 139

    6. Cache Management .... 141
    6.1 PPE Caches .... 141

    6.1.1 Configuration .... 142
    6.1.2 Overview of PPE Cache .... 142
    6.1.3 L1 Caches .... 144
    6.1.4 Branch History Table and Link Stack .... 149
    6.1.5 L2 Cache .... 149
    6.1.6 Instructions for Managing the L1 and L2 Caches .... 154
    6.1.7 Effective-to-Real-Address Translation Arrays .... 157
    6.1.8 Translation Lookaside Buffer .... 157
    6.1.9 Instruction-Prefetch Queue Management .... 158
    6.1.10 Load Subunit Management .... 158

    6.2 SPE Caches .... 158
    6.2.1 Translation Lookaside Buffer .... 159
    6.2.2 Atomic Unit and Cache .... 159

    6.3 Replacement Management Tables .... 162
    6.3.1 PPE TLB Replacement Management Table .... 162
    6.3.2 PPE L2 Replacement Management Table .... 165
    6.3.3 SPE TLB Replacement Management Table .... 166

    6.4 I/O Address-Translation Caches .... 167

    7. I/O Architecture .... 169
    7.1 Overview .... 169

    7.1.1 I/O Interfaces .... 169
    7.1.2 System Configurations .... 170
    7.1.3 I/O Addressing .... 172

    7.2 Data and Access Types .... 173
    7.2.1 Data Lengths and Alignments .... 173
    7.2.2 Atomic Accesses .... 174

    7.3 Registers and Data Structures .... 174
    7.3.1 IOCmd Configuration Register .... 174
    7.3.2 I/O Segment Table Origin Register .... 174
    7.3.3 I/O Segment Table .... 177
    7.3.4 I/O Page Table .... 179

    7.3.5 IOC Base Address Registers .... 182
    7.3.6 I/O Exception Status Register .... 184

    7.4 Inbound I/O Address Translation .... 184
    7.4.1 Translation Overview .... 184
    7.4.2 Translation Steps .... 186

    7.5 I/O Exceptions .... 188
    7.5.1 I/O Exception Causes .... 188
    7.5.2 I/O Exception Status Register .... 189
    7.5.3 I/O Exception Mask Register .... 189
    7.5.4 I/O-Exception Response .... 189

    7.6 I/O Address-Translation Caches .... 189
    7.6.1 IOST Cache .... 189
    7.6.2 IOPT Cache .... 191

    7.7 I/O Storage Model .... 196
    7.7.1 Memory Coherence .... 196
    7.7.2 Storage-Access Ordering .... 197
    7.7.3 I/O Accesses to Other I/O Units through an IOIF .... 202
    7.7.4 Examples .... 202

    8. Resource Allocation Management .... 209
    8.1 Introduction .... 209
    8.2 Requesters .... 212

    8.2.1 PPE and SPEs .... 212
    8.2.2 I/O .... 212

    8.3 Managed Resources .... 213
    8.4 Tokens .... 214

    8.4.1 Tokens Required for Single-CBEA-Processor Systems .... 214
    8.4.2 Operations Requiring No Token .... 218
    8.4.3 Tokens Required for Multi-CBEA-Processor Systems .... 219

    8.5 Token Manager .... 219
    8.5.1 Request Tracking .... 219
    8.5.2 Token Granting .... 220
    8.5.3 Unallocated RAG .... 221
    8.5.4 High-Priority Token Requests .... 222
    8.5.5 Memory Tokens .... 222
    8.5.6 I/O Tokens .... 226
    8.5.7 Unused Tokens .... 226
    8.5.8 Memory Banks, IOIF Allocation Rates, and Unused Tokens .... 226
    8.5.9 Token Request and Grant Example .... 227
    8.5.10 Allocation Percentages .... 231
    8.5.11 Efficient Determination of TKM Priority Register Values .... 232
    8.5.12 Feedback from Resources to Token Manager .... 234

    8.6 Configuration of PPE, SPEs, MIC, and IOC .... 235
    8.6.1 Configuration Register Summary .... 235
    8.6.2 SPE Address-Range Checking .... 237

    8.7 Changing Resource-Management Registers with MMIO Stores .... 239
    8.7.1 Changes to the RAID .... 239
    8.7.2 Changing a Requester's Token-Request Enable .... 240
    8.7.3 Changing a Requester's Address Map .... 241

    8.7.4 Changing a Requester's Use of Multiple Tokens per Access .... 242
    8.7.5 Changing Feedback to the TKM .... 242
    8.7.6 Changing TKM Registers .... 242

    8.8 Latency Between Token Requests and Token Grants .... 243
    8.9 Hypervisor Interfaces .... 243

    9. PPE Interrupts .... 245
    9.1 Introduction .... 245
    9.2 Summary of Interrupt Architecture .... 246
    9.3 Interrupt Registers .... 250
    9.4 Interrupt Handling .... 251
    9.5 Interrupt Vectors and Definitions .... 252

    9.5.1 System Reset Interrupt (Selectable or x'00..00000100') .... 254
    9.5.2 Machine Check Interrupt (x'00..00000200') .... 255
    9.5.3 Data Storage Interrupt (x'00..00000300') .... 257
    9.5.4 Data Segment Interrupt (x'00..00000380') .... 258
    9.5.5 Instruction Storage Interrupt (x'00..00000400') .... 259
    9.5.6 Instruction Segment Interrupt (x'00..00000480') .... 260
    9.5.7 External Interrupt (x'00..00000500') .... 260
    9.5.8 Alignment Interrupt (x'00..00000600') .... 261
    9.5.9 Program Interrupt (x'00..00000700') .... 262
    9.5.10 Floating-Point Unavailable Interrupt (x'00..00000800') .... 263
    9.5.11 Decrementer Interrupt (x'00..00000900') .... 263
    9.5.12 Hypervisor Decrementer Interrupt (x'00..00000980') .... 264
    9.5.13 System Call Interrupt (x'00..00000C00') .... 264
    9.5.14 Trace Interrupt (x'00..00000D00') .... 265
    9.5.15 VXU Unavailable Interrupt (x'00..00000F20') .... 266
    9.5.16 System Error Interrupt (x'00..00001200') .... 266
    9.5.17 Maintenance Interrupt (x'00..00001600') .... 267
    9.5.18 Thermal Management Interrupt (x'00..00001800') .... 269

    9.6 Direct External Interrupts .... 271
    9.6.1 Interrupt Presentation .... 271
    9.6.2 IIC Interrupt Registers .... 272
    9.6.3 SPU and MFC Interrupts .... 277
    9.6.4 Other External Interrupts .... 278

    9.7 Mediated External Interrupts ... 283
    9.8 SPU and MFC Interrupts Routed to the PPE ... 284

    9.8.1 Interrupt Types and Classes ... 284
    9.8.2 Interrupt Registers ... 286
    9.8.3 Interrupt Definitions ... 290
    9.8.4 Handling SPU and MFC Interrupts ... 292

    9.9 Thread Targets for Interrupts ... 294
    9.10 Interrupt Priorities ... 295
    9.11 Interrupt Latencies ... 296
    9.12 Machine State Register Settings Due to Interrupts ... 297
    9.13 Interrupts and Hypervisor ... 298
    9.14 Interrupts and Multithreading ... 298
    9.15 Checkstop ... 298

  • Programming Handbook

    Cell Broadband Engine

    Contents, Page 8 of 876

    Version 1.12, April 3, 2009

    9.16 Use of an External Interrupt Controller ... 299
    9.17 Relationship Between CBEA Processor and PowerPC Interrupts ... 299

    10. PPE Multithreading ... 301
    10.1 Multithreading Guidelines ... 301
    10.2 Thread Resources ... 303

    10.2.1 Registers ... 303
    10.2.2 Arrays, Queues, and Other Structures ... 304
    10.2.3 Pipeline Sharing and Support for Multithreading ... 305

    10.3 Thread States ... 307
    10.3.1 Privilege States ... 307
    10.3.2 Suspended or Enabled State ... 308
    10.3.3 Blocked or Stalled State ... 308

    10.4 Thread Control and Status Registers ... 308
    10.4.1 Machine State Register (MSR) ... 309
    10.4.2 Hardware Implementation Register 0 (HID0) ... 310
    10.4.3 Logical Partition Control Register (LPCR) ... 311
    10.4.4 Control Register (CTRL) ... 312
    10.4.5 Thread Status Register Local and Remote (TSRL and TSRR) ... 313
    10.4.6 Thread Switch Control Register (TSCR) ... 314
    10.4.7 Thread Switch Time-Out Register (TTR) ... 315

    10.5 Thread Priority ... 315
    10.5.1 Thread-Priority Combinations ... 315
    10.5.2 Choosing Useful Thread Priorities ... 316
    10.5.3 Examples of Priority Combinations on Instruction Scheduling ... 318

    10.6 Thread Control and Configuration ... 321
    10.6.1 Resuming and Suspending Threads ... 321
    10.6.2 Setting the Instruction-Dispatch Policy: Thread Priority and Temporary Stalling ... 321
    10.6.3 Preventing Starvation: Forward-Progress Monitoring ... 323
    10.6.4 Multithreading Operating-State Switch ... 324

    10.7 Pipeline Events and Instruction Dispatch ... 324
    10.7.1 Instruction-Dispatch Rules ... 324
    10.7.2 Pipeline Events that Stall Instruction Dispatch ... 325

    10.8 Suspending and Resuming Threads ... 327
    10.8.1 Suspending a Thread ... 327
    10.8.2 Resuming a Thread ... 327
    10.8.3 Exception and Interrupt Interactions With a Suspended Thread ... 329
    10.8.4 Thread Targets and Behavior for Interrupts ... 330

    11. Logical Partitions and a Hypervisor ... 333
    11.1 Introduction ... 333

    11.1.1 The Hypervisor and the Operating Systems ... 334
    11.1.2 Partitioning Resources ... 334
    11.1.3 An Example Flowchart ... 335

    11.2 PPE Logical-Partitioning Facilities ... 337
    11.2.1 Enabling Hypervisor State ... 337
    11.2.2 Hypervisor-State Registers ... 337
    11.2.3 Real Memory Access Control ... 338
    11.2.4 Controlling Interrupts and Environment ... 344

    11.3 SPE Logical-Partitioning Facilities ... 347
    11.3.1 Access Privilege ... 347
    11.3.2 Memory-Management Facilities ... 347
    11.3.3 Controlling Interrupts ... 350
    11.3.4 Other SPE Management Facilities ... 350

    11.4 I/O Address Translation ... 352
    11.4.1 IOC Memory Management Units ... 352
    11.4.2 I/O Segment and Page Tables ... 352

    11.5 Resource Allocation Management ... 353
    11.5.1 Combining Logical Partitions with Resource Allocation ... 353
    11.5.2 Resource Allocation Groups and the Token Manager ... 353

    11.6 Power Management ... 354
    11.6.1 Entering Low-Power States ... 354
    11.6.2 Thread State Suspension and Resumption ... 354

    11.7 Fault Isolation ... 355
    11.8 Code Sample ... 355

    11.8.1 Error Codes ... 355
    11.8.2 C Functions for PowerPC 64-bit ELF Hypervisor Call ... 356

    12. SPE Context Switching ... 359
    12.1 Introduction ... 359
    12.2 Data Structures ... 360

    12.2.1 Local Storage Context Save Area ... 360
    12.2.2 Context Save Area ... 360

    12.3 Overview of SPE Context-Switch Sequence ... 360
    12.3.1 Save SPE Context ... 362
    12.3.2 Restore SPE Context ... 362

    12.4 Implementation Considerations ... 364
    12.4.1 Locking ... 364
    12.4.2 Watchdog Timers ... 364
    12.4.3 Waiting for Events ... 364
    12.4.4 PPE's SPU Channel Access Facility ... 364
    12.4.5 SPE Interrupts ... 364
    12.4.6 Suspending the MFC DMA Queue ... 365
    12.4.7 SPE Context-Save Sequence and Context-Restore Sequence Code ... 365
    12.4.8 SPE Parameter Passing ... 365
    12.4.9 Storage for SPE Context-Save Sequence and Context-Restore Sequence Code ... 365
    12.4.10 Harvesting an SPE ... 366
    12.4.11 Scheduling ... 366
    12.4.12 Light-Weight SPE Context Save ... 366

    12.5 Detailed Steps for SPE Context Switch ... 367
    12.5.1 Context-Save Sequence ... 367
    12.5.2 Context-Restore Sequence ... 373

    12.6 Considerations for Hypervisors ... 381

    13. Time Base and Decrementers ... 383
    13.1 Introduction ... 383

    13.2 Time-Base Facility ... 383
    13.2.1 Clock Domains ... 383
    13.2.2 Time-Base Registers ... 384
    13.2.3 Time-Base Frequency ... 385
    13.2.4 Time-Base Sync Mode Controls ... 386
    13.2.5 Reading and Writing the TB Register ... 390
    13.2.6 Computing Time-of-Day ... 391

    13.3 Decrementers ... 391
    13.3.1 PPE Decrementers ... 391
    13.3.2 SPE Decrementers ... 393
    13.3.3 Using an SPU Decrementer to Monitor SPU Code Performance ... 393

    14. Objects, Executables, and SPE Loading ... 399
    14.1 Introduction ... 399
    14.2 ELF Overview and Extensions ... 400

    14.2.1 Overview ... 400
    14.2.2 SPE-ELF Extensions ... 401

    14.3 Runtime Initializations and Requirements ... 403
    14.3.1 PPE Initial Machine State ... 403
    14.3.2 SPE Initial Machine State for Linux ... 407

    14.4 Linker Requirements ... 409
    14.4.1 SPE Linker Requirements ... 409
    14.4.2 PPE Linker Requirements ... 410

    14.5 The CESOF Format ... 410
    14.5.1 CESOF Overview ... 411
    14.5.2 CESOF Use Convention of ELF ... 411
    14.5.3 Embedding an SPE-ELF Executable in a PPE-ELF Object: The .spu.elf Section ... 412
    14.5.4 The spe_program_handle Data Structure ... 413
    14.5.5 The TOE: Accessing Symbol Values Defined in EA Space ... 415
    14.5.6 Future Software Tool Chain Enhancements for CESOF ... 419

    14.6 SPE Runtime Loader ... 420
    14.6.1 Runtime Loader Overview ... 420
    14.6.2 SPE Runtime Loader Requirements ... 421
    14.6.3 Example SPE Runtime Loader Framework Definition ... 423

    14.7 SPE Execution Environment ... 429
    14.7.1 Signal Types for the SPE Stop-and-Signal Instruction ... 429

    15. Power and Thermal Management ... 431
    15.1 Power Management ... 431

    15.1.1 Slow State ... 432
    15.1.2 PPE Pause (0) State ... 433
    15.1.3 SPU Pause State ... 434
    15.1.4 MFC Pause State ... 434

    15.2 Thermal Management ... 434
    15.2.1 Thermal-Management Operation ... 435
    15.2.2 Configuration-Ring Settings ... 437
    15.2.3 Thermal Registers ... 437
    15.2.4 Thermal Sensor Status Registers ... 437

    15.2.5 Thermal Sensor Interrupt Registers ... 438
    15.2.6 Dynamic Thermal-Management Registers ... 440

    16. Performance Monitoring ... 445
    16.1 How It Works ... 446
    16.2 Events (Signals) ... 446
    16.3 Performance Counters ... 446
    16.4 Trace Array ... 447

    17. SPE Channel and Related MMIO Interface ... 449
    17.1 Introduction ... 449

    17.1.1 An SPE's Use of its Own Channels ... 449
    17.1.2 Access to Channel Functions by the PPE and other SPEs ... 450
    17.1.3 Channel Characteristics ... 450
    17.1.4 Channel Summary ... 451
    17.1.5 Channel Instructions ... 454
    17.1.6 Channel Capacity and Blocking ... 455

    17.2 SPU Event-Management Channels ... 455
    17.3 SPU Signal-Notification Channels ... 456
    17.4 SPU Decrementer ... 456

    17.4.1 SPU Write Decrementer Channel ... 456
    17.4.2 SPU Read Decrementer Channel ... 457

    17.5 MFC Write Multisource Synchronization Request Channel ... 457
    17.6 SPU Read Machine Status Channel ... 458
    17.7 SPU Write State Save-and-Restore Channel ... 458
    17.8 SPU Read State Save-and-Restore Channel ... 459
    17.9 MFC Command Parameter Channels ... 459

    17.9.1 MFC Local Storage Address Channel ... 461
    17.9.2 MFC Effective Address High Channel ... 462
    17.9.3 MFC Effective Address Low or List Address Channel ... 462
    17.9.4 MFC Transfer Size or List Size Channel ... 463
    17.9.5 MFC Command Tag Identification Channel ... 464
    17.9.6 MFC Class ID and MFC Command Opcode Channel ... 465

    17.10 MFC Tag-Group Management Channels ... 465
    17.10.1 MFC Write Tag-Group Query Mask Channel ... 466
    17.10.2 MFC Read Tag-Group Query Mask Channel ... 466
    17.10.3 MFC Write Tag Status Update Request Channel ... 466
    17.10.4 MFC Read Tag-Group Status Channel ... 468
    17.10.5 MFC Read List Stall-and-Notify Tag Status Channel ... 468
    17.10.6 MFC Write List Stall-and-Notify Tag Acknowledgment Channel ... 469

    17.11 MFC Read Atomic Command Status Channel ... 470
    17.12 SPU Mailbox Channels ... 471

    18. SPE Events ... 473
    18.1 Introduction ... 473
    18.2 Events and Event-Management Channels ... 474

    18.2.1 Event Conditions and Bit Definitions for Event-Management Channels ... 474
    18.2.2 Pending Event Register (Internal, SPE-Hidden) ... 476

    18.2.3 SPU Read Event Status ... 476
    18.2.4 SPU Write Event Mask ... 477
    18.2.5 SPU Write Event Acknowledgment ... 477
    18.2.6 SPU Read Event Mask ... 478

    18.3 SPU Interrupt Facility ... 478
    18.4 Interrupt Address Save-and-Restore Channels ... 479

    18.4.1 SPU Read State Save-and-Restore ... 479
    18.4.2 SPU Write State Save-and-Restore ... 479
    18.4.3 Nested Interrupts Using SPU Write State Save-and-Restore ... 480

    18.5 Event-Handling Protocols ... 480
    18.5.1 Synchronous Event Handling Using Polling or Stalling ... 480
    18.5.2 Asynchronous Event Handling Using Interrupts ... 481
    18.5.3 Protecting Critical Sections from Interruption ... 482

    18.6 Event-Specific Handling Guidelines ... 483
    18.6.1 Protocol with Multiple Events Enabled ... 483
    18.6.2 Procedure for Handling the Multisource Synchronization Event ... 485
    18.6.3 Procedure for Handling the Privileged Attention Event ... 486
    18.6.4 Procedure for Handling the Lock-Line Reservation Lost Event ... 487
    18.6.5 Procedure for Handling the Signal-Notification 1 Available Event ... 488
    18.6.6 Procedure for Handling the Signal-Notification 2 Available Event ... 489
    18.6.7 Procedure for Handling the SPU Write Outbound Mailbox Available Event ... 490
    18.6.8 Procedure for Handling the SPU Write Outbound Interrupt Mailbox Available Event ... 491
    18.6.9 Procedure for Handling the SPU Decrementer Event ... 491
    18.6.10 Procedure for Handling the SPU Read Inbound Mailbox Available Event ... 493
    18.6.11 Procedure for Handling the MFC SPU Command Queue Available Event ... 494
    18.6.12 Procedure for Handling the DMA List Command Stall-and-Notify Event ... 494
    18.6.13 Procedure for Handling the Tag-Group Status Update Event ... 496

    18.7 Developing a Basic Interrupt Handler ... 497
    18.7.1 Basic Interrupt Protocol Features and Design ... 497
    18.7.2 FLIH Design ... 498
    18.7.3 SLIH Design and Registering SLIH Functions ... 500
    18.7.4 Example Application Code ... 502

    18.8 Nested Interrupt Handling ... 503
    18.8.1 Nested Handler Design ... 504
    18.8.2 FLIH Design for Nested Interrupts ... 504

    18.9 Using a Dedicated Interrupt Stack ... 506
    18.10 Sample Applications ... 508

    18.10.1 SPU Decrementer Event ... 508
    18.10.2 Tag-Group Status Update Event ... 509
    18.10.3 DMA List Command Stall-and-Notify Event ... 510
    18.10.4 MFC SPU Command Queue Available Event ... 512
    18.10.5 SPU Read Inbound Mailbox Available Event ... 513
    18.10.6 SPU Signal-Notification Available Event ... 513
    18.10.7 Lock-Line Reservation Lost Event ... 513
    18.10.8 Privileged Attention Event ... 514

    19. DMA Transfers and Interprocessor Communication .......... 515
    19.1 Introduction .......... 515

  • Programming Handbook

    Cell Broadband Engine

    Version 1.12 April 3, 2009

    Contents Page 13 of 876

    19.2 MFC Commands .......... 516
    19.2.1 DMA Commands .......... 518
    19.2.2 DMA List Commands .......... 520
    19.2.3 Synchronization Commands .......... 520
    19.2.4 Atomic Update Commands .......... 520
    19.2.5 Command Modifiers .......... 521
    19.2.6 Tag Groups .......... 521
    19.2.7 MFC Command Issue .......... 523
    19.2.8 Replacement Class ID and Transfer Class ID .......... 523
    19.2.9 DMA-Command Completion .......... 525

    19.3 PPE-Initiated DMA Transfers .......... 525
    19.3.1 MFC Command Issue .......... 525
    19.3.2 MFC Command-Queue Control Registers .......... 527
    19.3.3 DMA-Command Issue Status and Errors .......... 527

    19.4 SPE-Initiated DMA Transfers .......... 531
    19.4.1 MFC Command Issue .......... 532
    19.4.2 MFC Command-Queue Monitoring Channels .......... 533
    19.4.3 DMA Command Issue Status and Errors .......... 534
    19.4.4 DMA List Command Example .......... 538

    19.5 Performance Guidelines for MFC Commands .......... 541
    19.6 Mailboxes .......... 541

    19.6.1 Reading and Writing Mailboxes .......... 542
    19.6.2 Mailbox Blocking .......... 543
    19.6.3 Dealing with Anticipated Messages .......... 543
    19.6.4 Uses of Mailboxes .......... 544
    19.6.5 SPU Outbound Mailboxes .......... 544
    19.6.6 SPU Inbound Mailbox .......... 549

    19.7 Signal Notification .......... 553
    19.7.1 SPU Signalling Channels .......... 553
    19.7.2 Uses of Signaling .......... 554
    19.7.3 Mode Configuration .......... 555
    19.7.4 SPU Signal Notification 1 Channel .......... 555
    19.7.5 SPU Signal Notification 2 Channel .......... 555
    19.7.6 Sending Signals .......... 555
    19.7.7 Receiving Signals .......... 559
    19.7.8 Differences Between Mailboxes and Signal Notification .......... 561

    20. Shared-Storage Synchronization .......... 563
    20.1 Shared-Storage Ordering .......... 563

    20.1.1 Storage Model .......... 563
    20.1.2 PPE Ordering Instructions .......... 565
    20.1.3 SPU Ordering Instructions .......... 569
    20.1.4 MFC Ordering Mechanisms .......... 572
    20.1.5 MFC Multisource Synchronization Facility .......... 578
    20.1.6 Scenarios for Using Ordering Mechanisms .......... 585

    20.2 PPE Atomic Synchronization .......... 586
    20.2.1 Atomic Synchronization Instructions .......... 586
    20.2.2 PPE Synchronization Primitives .......... 588

    20.3 SPE Atomic Synchronization .......... 591
    20.3.1 MFC Commands for Atomic Updates .......... 591
    20.3.2 The MFC Read Atomic Command Status Channel .......... 593
    20.3.3 Avoiding Livelocks .......... 593
    20.3.4 Synchronization Primitives .......... 595

    21. Parallel Programming .......... 603
    21.1 Challenges .......... 603
    21.2 Patterns of Parallel Programming .......... 603

    21.2.1 Terminology .......... 604
    21.2.2 Finding Parallelism .......... 605
    21.2.3 Strategies for Parallel Programming .......... 606

    21.3 Steps for Parallelizing a Program .......... 608
    21.3.1 Step 1: Understand the Problem .......... 608
    21.3.2 Step 2: Choose Programming Tools and Technology .......... 608
    21.3.3 Step 3: Develop High-Level Parallelization Strategy .......... 609
    21.3.4 Step 4: Develop Low-Level Parallelization Strategy .......... 609
    21.3.5 Step 5: Design Data Structures for Efficient Processing .......... 609
    21.3.6 Step 6: Iterate and Refine .......... 610
    21.3.7 Step 7: Fine-Tune .......... 610

    21.4 Levels of Parallelism in the CBEA Processors .......... 611
    21.4.1 SIMD Parallelization .......... 612
    21.4.2 Superscalar Parallelization .......... 612
    21.4.3 Hardware Multithreading .......... 612
    21.4.4 Multiple Execution Units .......... 612
    21.4.5 Multiple CBEA Processors .......... 613

    21.5 Tools for Parallelization .......... 614
    21.5.1 Language Extensions: Intrinsics and Directives .......... 614
    21.5.2 Compiler Support for Single Shared-Memory Abstraction .......... 615
    21.5.3 OpenMP Directives .......... 615
    21.5.4 Compiler-Controlled Software Cache .......... 617
    21.5.5 Compiler and Runtime Support for Code Partitioning .......... 620
    21.5.6 Thread Library .......... 621

    22. SIMD Programming .......... 623
    22.1 SIMD Basics .......... 623

    22.1.1 Converting Scalar Data to SIMD Data .......... 624
    22.1.2 Approaching SIMD Coding Methodically .......... 627
    22.1.3 Coding for Effective Auto-SIMDization .......... 639

    22.2 Auto-SIMDizing Compilers .......... 641
    22.2.1 Challenges .......... 642
    22.2.2 Examples of Invalid and Valid SIMDization .......... 644

    22.3 SIMDization Framework for a Compiler .......... 648
    22.3.1 Phase 1: Basic-Block Aggregation .......... 650
    22.3.2 Phase 2: Short-Loop Aggregation .......... 650
    22.3.3 Phase 3: Loop-Level Aggregation .......... 651
    22.3.4 Phase 4: Alignment Devirtualization .......... 652
    22.3.5 Phase 5: Length Devirtualization .......... 657
    22.3.6 Phase 6: SIMD Code Generation and Instruction Scheduling .......... 658

    22.3.7 SIMDization Example: Multiple Sources of SIMD Parallelism .......... 659
    22.3.8 SIMDization Example: Multiple Data Lengths .......... 662
    22.3.9 Vector Operations and Mixed-Mode SIMDization .......... 667

    22.4 Other Compiler Optimizations .......... 668
    22.4.1 OpenMP .......... 668
    22.4.2 Subword Data Types .......... 668
    22.4.3 Backend Scheduling for SPEs .......... 669
    22.4.4 Interacting with Typical Optimizations .......... 670

    23. Vector/SIMD Multimedia Extension and SPU Programming .......... 671
    23.1 Architectural Differences .......... 671

    23.1.1 Registers .......... 672
    23.1.2 Data Types .......... 673
    23.1.3 Instruction-Set Differences .......... 674

    23.2 Porting SIMD Code from the PPE to the SPEs .......... 676
    23.2.1 Code-Mapping Considerations .......... 676
    23.2.2 Simple Macro Translation .......... 677
    23.2.3 Full Functional Mapping .......... 680
    23.2.4 Code-Portability Typedefs .......... 681
    23.2.5 Compiler-Target Definition .......... 681

    24. SPE Programming Tips .......... 683
    24.1 DMA Transfers .......... 684

    24.1.1 Initiating DMA Transfers from SPEs .......... 684
    24.1.2 Overlapping DMA Transfers and Computation .......... 684
    24.1.3 DMA Transfers and LS Accesses .......... 689
    24.1.4 Using DMA List Transfers .......... 690

    24.2 SPU Pipelines and Dual-Issue Rules .......... 690
    24.3 Eliminating and Predicting Branches .......... 691

    24.3.1 Function-Inlining and Loop-Unrolling .......... 692
    24.3.2 Predication Using Select-Bits Instruction .......... 692
    24.3.3 Branch Hints .......... 693
    24.3.4 Program-Based Branch Prediction .......... 697
    24.3.5 Profile or Linguistic Branch-Prediction .......... 699
    24.3.6 Software Branch-Target Address Cache .......... 700
    24.3.7 Using Control Flow to Record Branch History .......... 700

    24.4 Loop Unrolling and Pipelining .......... 701
    24.5 Offset Pointers .......... 704
    24.6 Transformations and Table Lookups .......... 704

    24.6.1 The Shuffle-Bytes Instruction .......... 704
    24.6.2 Fast SIMD 8-Bit Table Lookups .......... 705

    24.7 Integer Multiplies .......... 708
    24.8 Scalar Code .......... 708

    24.8.1 Scalar Loads and Stores .......... 708
    24.8.2 Promoting Scalar Data Types to Vector Data Types .......... 710

    24.9 Unaligned Loads .......... 710

    Appendix A. PPE Instruction Set and Intrinsics .......... 715
    A.1 PowerPC Instruction Set .......... 715

    A.1.1 Data Types .......... 715
    A.1.2 PPE Instructions .......... 715
    A.1.3 Microcoded Instructions .......... 725

    A.2 PowerPC Extensions in the PPE .......... 732
    A.2.1 New PowerPC Instructions .......... 732
    A.2.2 Implementation-Dependent Interpretation of PowerPC Instructions .......... 735
    A.2.3 Optional PowerPC Instructions Implemented .......... 738
    A.2.4 PowerPC Instructions Not Implemented .......... 739
    A.2.5 Endian Support .......... 739

    A.3 Vector/SIMD Multimedia Extension Instructions .......... 740
    A.3.1 Data Types .......... 740
    A.3.2 Vector/SIMD Multimedia Extension Instructions .......... 740
    A.3.3 Graphics Rounding Mode .......... 744

    A.4 C/C++ Language Extensions (Intrinsics) for Vector/SIMD Multimedia Extensions .......... 746
    A.4.1 Vector Data Types .......... 746
    A.4.2 Vector Literals .......... 747
    A.4.3 Intrinsics .......... 748

    A.5 Issue Rules .......... 752
    A.6 Pipeline Stages .......... 754

    A.6.1 Instruction-Unit Pipeline .......... 754
    A.6.2 Vector/Scalar Unit Issue Queue .......... 756
    A.6.3 Stall and Flush Points .......... 757

    A.7 Compiler Optimizations .......... 759
    A.7.1 Instruction Arrangement .......... 759
    A.7.2 Avoiding Slow Instructions and Processor Modes .......... 759
    A.7.3 Avoiding Dependency Stalls and Flushes .......... 760
    A.7.4 General Recommendations .......... 762

    Appendix B. SPU Instruction Set and Intrinsics .......... 763
    B.1 SPU Instruction Set .......... 763

    B.1.1 Data Types .......... 763
    B.1.2 Instructions .......... 763
    B.1.3 Fetch and Issue Rules .......... 771
    B.1.4 Inline Prefetch and Instruction Runout .......... 775

    B.2 C/C++ Language Extensions (Intrinsics) for SPU Instructions .......... 776
    B.2.1 Vector Data Types .......... 776
    B.2.2 Vector Literals .......... 778
    B.2.3 Intrinsics .......... 779
    B.2.4 Inline Assembly .......... 783
    B.2.5 Compiler Directives .......... 783

    Appendix C. Performance Monitor Signals .................................................