  • Intel® Pentium® 4 Processor Optimization

    Reference Manual

    Copyright © 1999-2001 Intel Corporation All Rights Reserved Issued in U.S.A. Order Number: 248966

    World Wide Web: http://developer.intel.com



    Information in this document is provided in connection with Intel products. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted by this document. Except as provided in Intel’s Terms and Conditions of Sale for such products, Intel assumes no liability whatsoever, and Intel disclaims any express or implied warranty, relating to sale and/or use of Intel products including liability or warranties relating to fitness for a particular purpose, merchantability, or infringement of any patent, copyright or other intellectual property right. Intel products are not intended for use in medical, life saving, or life sustaining applications.

    This Intel Pentium 4 Processor Optimization Reference Manual as well as the software described in it is furnished under license and may only be used or copied in accordance with the terms of the license. The information in this manual is furnished for informational use only, is subject to change without notice, and should not be construed as a commitment by Intel Corporation. Intel Corporation assumes no responsibility or liability for any errors or inaccuracies that may appear in this document or any software that may be provided in association with this document.

    Except as permitted by such license, no part of this document may be reproduced, stored in a retrieval system, or transmitted in any form or by any means without the express written consent of Intel Corporation.

    Intel may make changes to specifications and product descriptions at any time, without notice.

    Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them.

    The Pentium 4 processor may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

    * Third-party brands and names are the property of their respective owners.

    Copyright © Intel Corporation 1999-2001.

  • Contents

    Introduction
      About This Manual
      Related Documentation
      Notational Conventions

    Chapter 1 Intel® Pentium® 4 Processor Overview
      SIMD Technology and Streaming SIMD Extensions 2
        Summary of SIMD Technologies
          MMX Technology
          Streaming SIMD Extensions
          Streaming SIMD Extensions 2
      Intel® NetBurst™ Micro-architecture
        The Design Considerations of the Intel NetBurst Micro-architecture
        Overview of the Intel NetBurst Micro-architecture Pipeline
          The Front End
          The Out-of-order Core
          Retirement
        Front End Pipeline Detail
          Prefetching
          Decoder
          Execution Trace Cache
          Branch Prediction
          Branch Hints
        Execution Core Detail


          Instruction Latency and Throughput
          Execution Units and Issue Ports
          Caches
          Data Prefetch
          Loads and Stores
          Store Forwarding

    Chapter 2 General Optimization Guidelines
      Tuning to Achieve Optimum Performance
      Tuning to Prevent Known Coding Pitfalls
      General Practices and Coding Guidelines
        Use Available Performance Tools
        Optimize Performance Across Processor Generations
        Optimize Branch Predictability
        Optimize Memory Access
        Optimize Floating-point Performance
        Optimize Instruction Selection
        Optimize Instruction Scheduling
        Enable Vectorization
      Coding Rules, Suggestions and Tuning Hints
      Performance Tools
        Intel® C++ Compiler
        General Compiler Recommendations
        VTune™ Performance Analyzer
      Processor Generations Perspective
        The CPUID Dispatch Strategy and Compatible Code Strategy
      Branch Prediction
        Eliminating Branches
        Spin-Wait and Idle Loops
        Static Prediction
        Branch Hints
        Inlining, Calls and Returns
        Branch Type Selection


        Loop Unrolling
        Compiler Support for Branch Prediction
      Memory Accesses
        Alignment
        Store Forwarding
          Store-forwarding Restriction on Size and Alignment
          Store-forwarding Restriction on Data Availability
        Data Layout Optimizations
        Stack Alignment
        Aliasing Cases
        Mixing Code and Data
        Write Combining
        Locality Enhancement
        Prefetching
          Hardware Instruction Fetching
          Software and Hardware Cache Line Fetching
        Cacheability instructions
        Code
      Improving the Performance of Floating-point Applications
        Guidelines for Optimizing Floating-point Code
        Floating-point Modes and Exceptions
