julius-payne
Apr 19, 2023 (1)
CSC2510 - Computer Organization
Lecture 6: A Historical Perspective of Pentium IA-32
IA-32 Intel Architecture
IA-32 processors
• 386 & 486 processors
• Pentium processors
• P6 family processors (Pentium Pro, Pentium II, Pentium III) : based on the P6 family microarchitecture
• Pentium 4 processors, Intel Xeon processors, Pentium D processors, Pentium processor Extreme Editions : based on the Intel NetBurst microarchitecture
IA-32 Intel Architecture
• A Brief history of the IA-32 Architecture
• Coming from … 16-bit processors
• 8086 processors
− 16-bit registers, 16-bit external data bus
− 20-bit addressing → 1 MByte address space
• 8088 processors : 8-bit external data bus
• 8086/8088 introduced ‘segmentation’ to the IA-32 architecture: four 16-bit segment registers point to memory segments of up to 64 Kbytes
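The segment and offset arithmetic behind 20-bit addressing can be sketched as follows. This is a minimal illustration, assuming the usual 8086 rule that the 16-bit segment value is multiplied by 16 and added to a 16-bit offset; the function name `real_mode_address` is hypothetical and only for this example.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """Return the 20-bit physical address for a 16-bit segment:offset pair."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # Segment is shifted left by 4 bits (multiplied by 16), the offset is added,
    # and the sum is truncated to the 20 address lines of the 8086.
    return ((segment << 4) + offset) & 0xFFFFF


# 1234:0010 maps to physical address 0x12350
print(hex(real_mode_address(0x1234, 0x0010)))
```

Because the offset is 16 bits wide, one segment register can reach at most 64 Kbytes without being reloaded.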
Internal architecture of 8086
Intel 8085 architecture : 8-bit data, 16-bit address
Intel 286 processor (1982)
Provides two programming modes
1) Real mode
• functions exactly the same as the 8086
• uses only the 20 least significant address lines (max. 1 MB)
• faster than the 8086 due to redesign and a higher clock
2) Protected mode
• 16 new instructions are added
• supports a multi-program environment by giving each program a predetermined amount of memory (up to 16 MB of physical memory in total)
• programs no longer have physical addresses, but are addressed by a segment selector
• several programs can be loaded into memory at the same time, but are protected from each other
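As a rough sketch of what "addressed by a segment selector" means, the snippet below splits a 16-bit protected-mode selector into its three fields: a 13-bit descriptor index, a table indicator bit (GDT or LDT), and a 2-bit requested privilege level. The names `Selector` and `decode_selector` are hypothetical and chosen only for this illustration.

```python
from typing import NamedTuple


class Selector(NamedTuple):
    index: int   # bits 15..3: entry number in the descriptor table
    table: str   # bit 2: "GDT" (0) or "LDT" (1)
    rpl: int     # bits 1..0: requested privilege level


def decode_selector(selector: int) -> Selector:
    """Split a 16-bit protected-mode segment selector into its fields."""
    if not 0 <= selector <= 0xFFFF:
        raise ValueError("selector must fit in 16 bits")
    index = selector >> 3
    table = "LDT" if selector & 0b100 else "GDT"
    rpl = selector & 0b11
    return Selector(index, table, rpl)


print(decode_selector(0x0023))  # index 4 in the GDT, privilege level 3
```

The selector only names a descriptor; the base address and limit of the segment live in the descriptor table, which is what lets the processor keep programs apart.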
The 8086 and 80286 microprocessors.
John Uffenbeck, The 80x86 Family: Design, Programming, and Interfacing, 3e
Copyright ©2002 by Pearson Education, Inc., Upper Saddle River, New Jersey 07458
All rights reserved.
Intel 386 processor (1985)
• First 32-bit processor in the IA-32 architecture family
• 32-bit registers used both for holding operands and addressing
• 32-bit address bus that supports up to 4 Gbytes of physical memory
• Segmented-memory model and flat memory model
• Paging (fixed 4-Kbyte page) for virtual memory management
• 386SX (16-bit external data bus), 386DX (32-bit external data bus); floating point is provided by a separate 387 coprocessor
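With fixed 4-Kbyte pages, a 32-bit linear address splits into a 10-bit page-directory index, a 10-bit page-table index, and a 12-bit offset within the page. The sketch below shows only that bit slicing; the function name `split_linear_address` is hypothetical, and the actual table lookups are left out.

```python
def split_linear_address(linear: int) -> tuple[int, int, int]:
    """Return (directory index, table index, page offset) for a 32-bit address."""
    if not 0 <= linear <= 0xFFFFFFFF:
        raise ValueError("linear address must fit in 32 bits")
    directory = (linear >> 22) & 0x3FF  # top 10 bits select a page-directory entry
    table = (linear >> 12) & 0x3FF      # next 10 bits select a page-table entry
    offset = linear & 0xFFF             # low 12 bits address a byte in the 4-Kbyte page
    return directory, table, offset


print(split_linear_address(0x00403025))  # (1, 3, 37), i.e. offset 0x025
```

1024 directory entries × 1024 table entries × 4 Kbytes per page covers the full 4-Gbyte address space.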
Internal architecture of 80386
Internal registers of 80386
Intel 486 processor (1989)
• Added more parallel execution by using a five-stage pipeline
• 8-Kbyte on-chip first-level cache
• Integrated x87 FPU
• Power saving and system management capabilities
Intel Pentium processor (1993)
• Added a second execution pipeline to achieve superscalar performance (u & v pipelines executing two instructions per clock)
• Split on-chip caches (8-KByte code cache and 8-KByte data cache)
• Data cache uses MESI (coherence) protocol
• Branch prediction with an on-chip branch table
• Internal data path : 128, 256 bits
• External data bus : 64 bits
• Enhanced by MMX technology that uses the SIMD execution model
FIGURE 3-28 Processor model for the Pentium. The BIU supplies instructions to the CPU via two pipelines called the u and v pipes. In addition, two separate 8K data and code caches are provided.
The U and V Pipes
U and V pipes : dual five-stage pipelines
• Prefetcher and queue units provide paired instructions for the U and V pipes
• U pipe : executes all Pentium instructions
• V pipe : executes only simple integer instructions (data is already in the CPU registers); sorting of instructions is performed by the prefetcher
Two pipelines and two ALUs → Pentium executes two instructions simultaneously (in one clock cycle).
Condition : two instructions are simple and do not depend on each other – no data dependency.
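The pairing condition above can be sketched as a small check. This is a simplified model, not the Pentium's full pairing rules: it assumes each instruction is described by one destination register, a tuple of source registers, and a "simple" flag, and the names `Instr` and `can_pair` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    text: str            # assembly text, for display only
    dest: str            # register written by the instruction
    srcs: tuple          # registers read by the instruction
    simple: bool = True  # True for simple integer instructions


def can_pair(u: Instr, v: Instr) -> bool:
    """Return True if u (U pipe) and v (V pipe) may issue in the same clock."""
    if not (u.simple and v.simple):
        return False
    reads_result = u.dest in v.srcs    # v needs the value u is still producing
    same_target = u.dest == v.dest     # both write the same register
    return not (reads_result or same_target)


mov_a = Instr("mov eax, 1", "eax", ())
mov_b = Instr("mov ebx, 2", "ebx", ())
add_c = Instr("add ecx, eax", "ecx", ("ecx", "eax"))

print(can_pair(mov_a, mov_b))  # True: independent simple instructions
print(can_pair(mov_a, add_c))  # False: add reads eax written by mov
```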
Superpipelined vs. Superscalar
Superpipelining : divide the instruction execution pipeline into smaller stages.
[ex] 5-stage pipeline (80486, Pentium), 12-stage pipeline (P6 processors)
Superscalar : execute two or more instructions per clock cycle by using multiple execution units (including ALUs).
[ex] Pentium executes two instructions simultaneously = 2-way superscalar
Pentium II, III & Celeron : 3-way superscalar
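To compare the two ideas numerically, the sketch below uses the textbook ideal-pipeline formula: with no stalls, n instructions on a k-stage, w-way machine need k + ceil(n / w) - 1 clock cycles. The function name `ideal_cycles` is hypothetical, and real processors fall short of these figures because of dependencies and branch mispredictions.

```python
import math


def ideal_cycles(instructions: int, stages: int, width: int = 1) -> int:
    """Clock cycles for a stall-free pipeline of the given depth and issue width."""
    if instructions <= 0 or stages <= 0 or width <= 0:
        raise ValueError("all arguments must be positive")
    # The first group needs `stages` cycles to drain; each further group adds one.
    return stages + math.ceil(instructions / width) - 1


print(ideal_cycles(100, stages=5, width=1))   # 104, single five-stage pipeline
print(ideal_cycles(100, stages=5, width=2))   # 54, two-way superscalar
print(ideal_cycles(100, stages=12, width=3))  # 45, deeper three-way design
```

Superpipelining does not reduce the cycle count by itself; its gain comes from the shorter clock period that smaller stages allow.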
MMX (Multimedia Extension) : provides 2 architectural enhancements over non-MMX Pentium
① 57 instructions are added for multimedia (audio, video, and graphic data) applications.
② SIMD (Single-Instruction stream Multiple-Data stream) allows the same operation to be performed on multiple data items. Because many multimedia applications require large blocks of data to be manipulated, SIMD provides a significant performance enhancement.
For general applications, performance improves by 10~20%. For multimedia applications, it improves by nearly 70%.
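As an illustration of the SIMD idea, the sketch below emulates a packed byte add in the style of the MMX PADDB instruction: a 64-bit value is treated as eight independent 8-bit lanes, and each lane wraps around on overflow. It is a plain Python emulation for study, and the function name `packed_add_bytes` is hypothetical.

```python
def packed_add_bytes(a: int, b: int) -> int:
    """Add two 64-bit values as eight independent 8-bit lanes (wraparound)."""
    result = 0
    for lane in range(8):
        shift = lane * 8
        lane_sum = ((a >> shift) & 0xFF) + ((b >> shift) & 0xFF)
        # Keep only the low 8 bits so a carry never spills into the next lane.
        result |= (lane_sum & 0xFF) << shift
    return result


pixels = 0x01020304050607FF
brighten = 0x0101010101010101
print(hex(packed_add_bytes(pixels, brighten)))  # 0x203040506070800
```

In hardware all eight lane additions happen in one instruction, which is where the gain on large blocks of audio or image data comes from.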
SIMD Execution Model
P6 family processors (1995-1999)
• Intel Pentium Pro processor
– Three-way superscalar : decode, dispatch, and complete execution (retire) of three instructions per clock cycle on average
– Introduced dynamic execution (micro-data flow analysis, out-of-order execution, superior branch prediction, and speculative execution) in a superscalar implementation
– Enhanced by caches : two on-chip 8-Kbyte 1st-level caches and a 256-Kbyte 2nd-level cache in the same package (two chips in one package)
– 36 address lines → max. 64 GB memory
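The relation between address lines and maximum memory is simply 2 raised to the number of lines. The short sketch below reproduces the figures quoted in this lecture; the helper names `address_space` and `format_size` are hypothetical.

```python
def address_space(address_lines: int) -> int:
    """Number of distinct byte addresses reachable with the given address lines."""
    return 2 ** address_lines


def format_size(num_bytes: int) -> str:
    """Render an exact power-of-two byte count using binary units."""
    units = ["bytes", "KB", "MB", "GB", "TB"]
    value, unit = num_bytes, 0
    while value >= 1024 and value % 1024 == 0 and unit < len(units) - 1:
        value //= 1024
        unit += 1
    return f"{value} {units[unit]}"


for name, lines in [("8086", 20), ("286", 24), ("386", 32), ("Pentium Pro", 36)]:
    print(f"{name}: {lines} address lines, {format_size(address_space(lines))}")
```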
FIGURE 1-14 The Pentium Pro is two chips in one. The larger die is the processor, the smaller a 256K L2 cache. (Courtesy of Intel Corporation.)
Dynamic Execution : a new approach to processing S/W instructions that reduces idle processor time
• Multiple Branch Prediction : Pentium Pro can look as far as 30 instructions ahead to anticipate conditional branches → reduces waste of pipeline clocks
• Data Flow Analysis : looks at upcoming S/W instructions to find the optimal sequence of processing
• Speculative Execution : allows instructions to be executed in a different order from that in which they entered the processor = “out-of-order execution”. The results of these instructions are stored as speculative results until their final states can be determined
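The data flow idea can be sketched with a toy scheduler: each instruction may start as soon as the registers it reads are available, regardless of program order. This is a heavily simplified model and not the Pentium Pro's actual mechanism; it assumes unlimited execution units, considers only read-after-write dependencies, and uses the hypothetical name `dataflow_schedule` and made-up latencies.

```python
def dataflow_schedule(program):
    """Return the earliest start cycle of each (dest, srcs, latency) instruction."""
    ready_at = {}  # register name -> cycle at which its latest value is available
    starts = []
    for dest, srcs, latency in program:
        # An instruction starts once every source register has been produced.
        start = max((ready_at.get(reg, 0) for reg in srcs), default=0)
        starts.append(start)
        ready_at[dest] = start + latency
    return starts


program = [
    ("r1", ("r0",), 3),       # slow load into r1 (assumed 3 cycles)
    ("r2", ("r1",), 1),       # must wait for r1
    ("r3", ("r4",), 1),       # independent of the load
    ("r5", ("r2", "r3"), 1),  # needs both earlier results
]
print(dataflow_schedule(program))  # [0, 3, 0, 4]
```

The third instruction starts at cycle 0, ahead of the second one that precedes it in program order, which is the out-of-order behaviour described above.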
P6 family processors (cont’d)
• Pentium II processor
– Added Intel MMX technology
– Processor core is packaged in the single edge contact cartridge (SECC)
– 1st-level (L1) caches are enlarged (16 Kbytes each)
– 2nd-level (L2) cache sizes of 256 KB, 512 KB, and 1 MB are supported
– A half-clock speed backside bus connects the 2nd-level cache and the processor
– Multiple low-power states such as AutoHALT, Stop-Grant, Sleep, and Deep Sleep are supported to conserve power when idle
P6 family processors (cont’d)
• Pentium II Xeon processor
– Includes 4-way and 8-way scalability and a 2 Mbyte 2nd-level cache running on a full-clock speed backside bus
• Intel Celeron processor
– Focused on the PC market
– Pentium II without L2 cache
– Uses the Slot 1 connector without the plastic cover (called a “naked CPU”)
Celeron Board
P6 family processors (cont’d)
• Celeron A : Includes 128 KB L2 cache on the same die as the processor
– Drawback : 66 MHz bus cycle
– 370-pin PGA package (called Socket 370)
P6 family processors (cont’d)
• Pentium III processor
– Introduced Streaming SIMD Extensions (SSE) : expands the SIMD execution model by providing a new set of 128-bit registers and the ability to perform SIMD operations on packed single-precision floating-point values
• Pentium III Xeon processor
– Enhanced by a full-speed, on-die Advanced Transfer Cache
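A 128-bit SSE register holds four packed single-precision values, and an instruction such as ADDPS adds all four lanes at once. The sketch below emulates that behaviour in plain Python, rounding through `struct` to single precision; the helper names `to_float32` and `packed_add_single` are hypothetical.

```python
import struct


def to_float32(value: float) -> float:
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def packed_add_single(a, b):
    """Add two 4-lane vectors of single-precision floats, lane by lane."""
    if len(a) != 4 or len(b) != 4:
        raise ValueError("a 128-bit register holds exactly four 32-bit floats")
    return tuple(to_float32(to_float32(x) + to_float32(y)) for x, y in zip(a, b))


print(packed_add_single((1.0, 2.0, 3.0, 4.0), (0.5, 0.5, 0.5, 0.5)))
# (1.5, 2.5, 3.5, 4.5)
```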
Pentium III with integrated L2 cache (more than 22 million transistors)
2.1.7 Pentium 4 Processor Family (2000-2005)
• Based on Intel NetBurst microarchitecture
• Introduced Streaming SIMD Extensions 2 (SSE2)
• Pentium 4 processor 3.40 GHz supports Hyper-Threading Technology and Streaming SIMD Extensions 3 (SSE3)
• Pentium 4 Processor Extreme Edition supports Intel Extended Memory 64 Technology and Hyper-Threading Technology
• Pentium 4 Processor 6xx series supports Intel Extended Memory 64 Technology
Streaming SIMD Extensions 2 (SSE2)
• Horizontal Data Movement in ADDSUBPD
2.1.8 Intel Xeon Processor (2001-2005)
• Based on Intel NetBurst microarchitecture
• As a family, this group of IA-32 processors is designed for use in multiprocessor server systems and high-performance workstations
• Intel Xeon processor MP supports Hyper-Threading Technology
• 64-bit Intel Xeon processor 3.60 GHz with 800 MHz System Bus introduced Intel Extended Memory 64 Technology
2.1.9 Intel Pentium M Processor (2003-2005)
• Low-power mobile processor family
• Designed for extending battery life and seamless integration
• Its extended microarchitecture includes:
– Support for Dynamic Execution
– Low-power core with copper interconnect
– On-die, primary 32-KB instruction cache and 32-KB write-back data cache, and second-level 2 MB cache with Advanced Transfer Cache Architecture
– Advanced Branch Prediction and Data Prefetch Logic
– Support for MMX technology, Streaming SIMD Extensions (SSE), and the SSE2 instruction set
Intel Pentium Processor Extreme Edition (2005)
• Introduced dual-core technology that provides advanced H/W multi-threading support
• Based on Intel NetBurst microarchitecture
• Supports SSE, SSE2, SSE3, Hyper-Threading Technology, and Intel Extended Memory 64 Technology
The Processor War