Jun 14, 2022 (1) CSC2510 - Computer Organization Lecture 6: A Historical Perspective of Pentium IA-32


Apr 19, 2023 (1)

CSC2510 - Computer Organization

Lecture 6: A Historical Perspective of Pentium IA-32

IA-32 Intel Architecture


IA-32 processors

• 386 & 486 processors

• Pentium processors

• P6 family processors (Pentium Pro, Pentium II, Pentium III): based on the P6 family microarchitecture

• Pentium 4 processors, Intel Xeon processors, Pentium D processors, Pentium processor Extreme Editions: based on the Intel NetBurst microarchitecture


IA-32 Intel Architecture

• A Brief history of the IA-32 Architecture

• Coming from 16-bit processors

• 8086 processors

− 16-bit registers, 16-bit external data bus

− 20-bit addressing → 1 MByte address space

• 8088 processors: 8-bit external data bus

• 8086/8088 introduced ‘segmentation’ to the IA-32 architecture: four 16-bit segment registers point to memory segments of up to 64 Kbytes each
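The way a 16-bit segment register and a 16-bit offset combine into a 20-bit address can be sketched as follows. This is a minimal illustration, not taken from the slides: the function name `real_mode_address` is hypothetical, and the shift-by-4 rule is the standard 8086 real-mode address calculation.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """Return the 20-bit physical address for a 16-bit segment:offset pair."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    # Shift the segment left by 4 bits (multiply by 16), add the offset,
    # and keep only the 20 address bits available on the 8086.
    return ((segment << 4) + offset) & 0xFFFFF


# Segment 0x1234 with offset 0x0010 lands at physical address 0x12350.
print(hex(real_mode_address(0x1234, 0x0010)))
```

Because the offset is only 16 bits wide, one segment register can reach at most 64 Kbytes without being reloaded.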


Internal architecture of 8086


Intel 8085 architecture : 8-bit data, 16-bit address


Intel 286 processor (1982)

Provides two programming modes

1) Real mode

• functions exactly the same as the 8086

• uses only the 20 least significant address lines (max. 1 MB)

• faster than the 8086 due to the redesign and a higher clock

2) Protected mode

• 16 new instructions are added

• supports a multi-program environment by giving each program a predetermined amount of memory (up to 16 MB of physical memory is addressable)

• programs no longer have physical addresses, but are addressed by a segment selector

• several programs can be loaded into memory at the same time, but are protected from each other
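To make the idea of a segment selector more concrete, here is a small sketch of how a 16-bit protected-mode selector splits into its fields. The slides do not give this layout; the field positions (13-bit index, table indicator bit, 2-bit requested privilege level) are the standard x86 selector format, and the names `Selector` and `decode_selector` are hypothetical.

```python
from typing import NamedTuple


class Selector(NamedTuple):
    index: int  # entry number in the descriptor table (13 bits)
    ti: int     # table indicator: 0 = global table, 1 = local table
    rpl: int    # requested privilege level (0..3)


def decode_selector(selector: int) -> Selector:
    """Split a 16-bit segment selector into index, TI and RPL fields."""
    if not 0 <= selector <= 0xFFFF:
        raise ValueError("selector must be a 16-bit value")
    return Selector(index=selector >> 3,
                    ti=(selector >> 2) & 1,
                    rpl=selector & 0b11)


# 0x001B selects entry 3 of the global table at privilege level 3.
print(decode_selector(0x001B))
```

The selector only names a table entry; the descriptor stored in that entry supplies the base address and limit, which is how programs are kept away from each other's memory.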


The 8086 and 80286 microprocessors.

John Uffenbeck, The 80x86 Family: Design, Programming, and Interfacing, 3e

Copyright ©2002 by Pearson Education, Inc., Upper Saddle River, New Jersey 07458

All rights reserved.


Intel 386 processor (1985)

• First 32-bit processor in the IA-32 architecture family

• 32-bit registers used both for holding operands and addressing

• 32-bit address bus that supports up to 4 Gbytes of physical memory

• Segmented-memory model and flat memory model

• Paging (fixed 4-Kbyte page) for virtual memory management

• 386CX, 386DX (the x87 FPU is a separate 387 coprocessor, not on-chip)
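The fixed 4-Kbyte page size determines how a 32-bit linear address is divided during paging. The sketch below assumes the classic two-level 386 scheme (10-bit page-directory index, 10-bit page-table index, 12-bit offset), which the slides do not spell out; `split_linear_address` is a hypothetical helper name.

```python
PAGE_SIZE = 4096  # fixed 4-Kbyte pages, so the page offset needs 12 bits


def split_linear_address(addr: int) -> tuple[int, int, int]:
    """Return (directory index, table index, page offset) for a 32-bit address."""
    if not 0 <= addr <= 0xFFFFFFFF:
        raise ValueError("address must fit in 32 bits")
    directory = addr >> 22            # top 10 bits
    table = (addr >> 12) & 0x3FF      # middle 10 bits
    offset = addr & (PAGE_SIZE - 1)   # low 12 bits
    return directory, table, offset


# 0x00403025 -> directory entry 1, table entry 3, offset 0x025 (37)
print(split_linear_address(0x00403025))
```

Only the upper 20 bits are translated; the 12-bit offset passes through unchanged, which is why the page size and the offset width are tied together.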


Internal architecture of 80386


Internal registers of 80386


Intel 486 processor (1989)

• Added more parallel execution by using five-stage pipeline

• 8-Kbyte on-chip first-level cache

• Integrated x87 FPU

• Power saving and system management capabilities


Intel Pentium processor (1993)

• Added a second execution pipeline to achieve superscalar performance (u & v pipelines executing two instructions per clock)

• Split on-chip caches (8-KByte code cache and 8-KByte data cache)

• Data cache uses MESI (coherence) protocol

• Branch prediction with an on-chip branch table

• Internal data path: 128, 256 bits

• External data bus: 64 bits

• Enhanced by MMX technology that uses the SIMD execution model


FIGURE 3-28 Processor model for the Pentium. The BIU supplies instructions to the CPU via two pipelines called the u and v pipes. In addition, two separate 8K data and code caches are provided.



The U and V Pipes

• U and V pipes: dual five-stage pipelines

• Prefetcher and queue units provide paired instructions for the U and V pipes

• U pipe: executes all Pentium instructions

• V pipe: executes only simple integer instructions (data is already in the CPU registers); sorting of instructions is performed by the prefetcher

Two pipelines and two ALUs → the Pentium executes two instructions simultaneously (in one clock cycle).

Condition: the two instructions are simple and do not depend on each other (no data dependency).
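The pairing condition can be sketched as a simple check. This is a deliberately simplified model, not the Pentium's full pairing rules: the `Instr` record and `can_pair` function are hypothetical, and only the two conditions stated above (both instructions simple, no data dependency) are modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    text: str             # assembly text, for display only
    dest: str             # register written by the instruction
    sources: tuple        # registers read by the instruction
    simple: bool = True   # True for simple integer instructions


def can_pair(u: Instr, v: Instr) -> bool:
    """Return True if u (U pipe) and v (V pipe) may issue in the same clock."""
    if not (u.simple and v.simple):
        return False
    reads_result = u.dest in v.sources   # v needs the value u is producing
    same_target = u.dest == v.dest       # both write the same register
    return not (reads_result or same_target)


mov_a = Instr("mov eax, 1", "eax", ())
mov_b = Instr("mov ebx, 2", "ebx", ())
add_a = Instr("add eax, ebx", "eax", ("eax", "ebx"))
use_a = Instr("mov ecx, eax", "ecx", ("eax",))

print(can_pair(mov_a, mov_b))  # independent, so they can pair
print(can_pair(add_a, use_a))  # second reads eax written by the first
```

When the check fails, the second instruction simply waits and issues alone in the U pipe on the next clock.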

Page 17: 7-Aug-15 (1) CSC2510 - Computer Organization Lecture 6: A Historical Perspective of Pentium IA-32

Superpipelined vs. Superscalar

Superpipelining: divides the instruction execution pipeline into smaller stages.

[ex] 5-stage pipeline (80486, Pentium) → 12-stage pipeline (P6 processors)

Superscalar: executes two or more instructions per clock cycle by using multiple execution units (including ALUs).

[ex] Pentium executes two instructions simultaneously = 2-way superscalar

Pentium II, III & Celeron : 3-way superscalar
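The difference between the two techniques shows up in an idealized cycle count. The sketch below assumes a perfect pipeline with no stalls, hazards, or branch mispredictions, which real processors never achieve; `ideal_cycles` is a hypothetical helper, not something from the lecture.

```python
import math


def ideal_cycles(n_instructions: int, stages: int, width: int = 1) -> int:
    """Clock cycles to finish n instructions on an ideal, stall-free pipeline.

    stages: pipeline depth; width: instructions issued per clock (superscalar).
    """
    if n_instructions <= 0:
        return 0
    # The first group needs `stages` cycles to drain; each further group
    # of `width` instructions completes one cycle later.
    return stages + math.ceil(n_instructions / width) - 1


print(ideal_cycles(100, stages=5))             # 5-stage scalar pipeline
print(ideal_cycles(100, stages=5, width=2))    # 2-way superscalar, 5 stages
print(ideal_cycles(100, stages=12, width=3))   # 3-way superscalar, 12 stages
```

In this model a deeper pipeline does not reduce the cycle count by itself; its benefit is that shorter stages allow a faster clock. A wider issue width, in contrast, directly cuts the number of cycles.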

Page 18: 7-Aug-15 (1) CSC2510 - Computer Organization Lecture 6: A Historical Perspective of Pentium IA-32

MMX (Multimedia Extension): provides 2 architectural enhancements over the non-MMX Pentium

① 57 instructions are added for multimedia (audio, video, and graphic data) applications.

② SIMD (Single-Instruction stream Multiple-Data stream) allows the same operation to be performed on multiple data items. Because many multimedia applications require large blocks of data to be manipulated, SIMD provides a significant performance enhancement.

For general applications, performance improves by 10 to 20%. For multimedia applications, it improves by nearly 70%.
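A rough software model of the SIMD idea: one operation applied to several packed data items at once. The sketch assumes 64-bit MMX registers holding eight 8-bit lanes with wraparound arithmetic (the behaviour of a packed byte add); the function name `packed_add_bytes` is hypothetical, and the loop only emulates what the hardware does in a single instruction.

```python
def packed_add_bytes(a: int, b: int) -> int:
    """Add eight 8-bit lanes of two 64-bit values, wrapping within each lane."""
    result = 0
    for lane in range(8):
        shift = 8 * lane
        x = (a >> shift) & 0xFF
        y = (b >> shift) & 0xFF
        # Mask to 8 bits so a carry never spills into the neighbouring lane.
        result |= ((x + y) & 0xFF) << shift
    return result


a = 0x01020304050607FF
b = 0x0101010101010101
print(hex(packed_add_bytes(a, b)))  # lowest lane wraps from 0xFF to 0x00
```

A plain 64-bit integer add would let the carry from the lowest byte ripple into the next one; keeping the lanes independent is what makes the operation "packed".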

Page 19: 7-Aug-15 (1) CSC2510 - Computer Organization Lecture 6: A Historical Perspective of Pentium IA-32

SIMD Execution Model

Page 20: 7-Aug-15 (1) CSC2510 - Computer Organization Lecture 6: A Historical Perspective of Pentium IA-32

P6 family processors (1995-1999)

• Intel Pentium Pro processor

– Three-way superscalar: decode, dispatch, and complete execution (retire) of three instructions per clock cycle on average

– Introduced dynamic execution (micro-data flow analysis, out-of-order execution, superior branch prediction, and speculative execution) in a superscalar implementation

– Enhanced by caches: two on-chip 8-Kbyte 1st-level caches and a 256-Kbyte 2nd-level cache in the same package (two chips in one package)

– 36 address lines → max. 64 GB of memory
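The memory limits quoted throughout these slides follow from the number of address lines: n lines can select 2^n bytes. A small sketch (helper names `addressable_bytes` and `format_size` are hypothetical; the 24-line entry for the 286 is inferred from its 16 MB limit rather than stated in the slides):

```python
UNITS = ["B", "KB", "MB", "GB", "TB"]


def addressable_bytes(address_lines: int) -> int:
    """Number of distinct byte addresses reachable with the given lines."""
    return 1 << address_lines


def format_size(n: int) -> str:
    """Render an exact power-of-two byte count with a binary unit."""
    unit = 0
    while n >= 1024 and n % 1024 == 0 and unit < len(UNITS) - 1:
        n //= 1024
        unit += 1
    return f"{n} {UNITS[unit]}"


for name, lines in [("8086", 20), ("286", 24), ("386", 32), ("Pentium Pro", 36)]:
    print(f"{name}: {lines} lines -> {format_size(addressable_bytes(lines))}")
```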

Page 21: 7-Aug-15 (1) CSC2510 - Computer Organization Lecture 6: A Historical Perspective of Pentium IA-32

FIGURE 1-14 The Pentium Pro is two chips in one. The larger die is the processor, the smaller a 256K L2 cache. (Courtesy of Intel Corporation.)


Page 22: 7-Aug-15 (1) CSC2510 - Computer Organization Lecture 6: A Historical Perspective of Pentium IA-32

Dynamic Execution: a new approach to processing S/W instructions that reduces idle processor time

• Multiple Branch Prediction: the Pentium Pro can look as far as 30 instructions ahead to anticipate conditional branches → reduces the waste of pipeline clocks

• Data Flow Analysis: looks at upcoming S/W instructions to find the optimal sequence of processing

• Speculative Execution: allows instructions to be executed in a different order from that in which they entered the processor = “out-of-order execution”. The results of these instructions are stored as speculative results until their final states can be determined

Page 23: 7-Aug-15 (1) CSC2510 - Computer Organization Lecture 6: A Historical Perspective of Pentium IA-32

P6 family processors (cont’d)

• Pentium II processor

– Added Intel MMX technology

– Processor core is packaged in the single edge contact cartridge (SECC)

– 1st-level (L1) caches are enlarged (16 Kbytes each)

– 2nd-level (L2) cache sizes of 256 KB, 512 KB, and 1 MB are supported

– A half-clock-speed backside bus connects the 2nd-level cache and the processor

– Multiple low-power states such as AutoHALT, Stop-Grant, Sleep, and Deep Sleep are supported to conserve power when idle

Page 24: 7-Aug-15 (1) CSC2510 - Computer Organization Lecture 6: A Historical Perspective of Pentium IA-32

P6 family processors (cont’d)

• Pentium II Xeon processor

– Includes 4-way and 8-way multiprocessor scalability and a 2-Mbyte 2nd-level cache running on a full-clock-speed backside bus

• Intel Celeron processor

– Focused on the PC market

– Pentium II without L2 cache

– Uses the Slot 1 connector without the plastic cover, called a “naked CPU”

Page 25: 7-Aug-15 (1) CSC2510 - Computer Organization Lecture 6: A Historical Perspective of Pentium IA-32


Celeron Board

Page 26: 7-Aug-15 (1) CSC2510 - Computer Organization Lecture 6: A Historical Perspective of Pentium IA-32

P6 family processors (cont’d)

• Celeron A: includes a 128-KB L2 cache on the same die as the processor

– Drawback: 66 MHz bus cycle

– 370-pin PGA package (called Socket 370)

Page 27: 7-Aug-15 (1) CSC2510 - Computer Organization Lecture 6: A Historical Perspective of Pentium IA-32

P6 family processors (cont’d)

• Pentium III processor

– Introduced Streaming SIMD Extensions (SSE): expands the SIMD execution model by providing a new set of 128-bit registers and the ability to perform SIMD operations on packed single-precision floating-point values

• Pentium III Xeon processor

– Enhanced with a full-speed, on-die Advanced Transfer Cache
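The SSE bullet above describes SIMD operations on packed single-precision values in 128-bit registers. As a software sketch of that idea, the code below treats a 128-bit register as four 32-bit floats and adds two such registers lane by lane. The names `to_f32` and `packed_add_single` are hypothetical, and Python's `struct` module is used only to round values to single precision.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def packed_add_single(a: list[float], b: list[float]) -> list[float]:
    """Lane-wise add of two 4-element single-precision vectors."""
    if len(a) != 4 or len(b) != 4:
        raise ValueError("a 128-bit register holds exactly four 32-bit floats")
    return [to_f32(to_f32(x) + to_f32(y)) for x, y in zip(a, b)]


v = packed_add_single([1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5])
print(v)
# Four single-precision values occupy 16 bytes, i.e. 128 bits.
print(len(struct.pack("<4f", *v)) * 8)
```

Compared with MMX, which reuses the 64-bit x87 registers for integer data, this model reflects the wider registers and the floating-point lanes that SSE adds.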

Page 28: 7-Aug-15 (1) CSC2510 - Computer Organization Lecture 6: A Historical Perspective of Pentium IA-32


Pentium III with integrated L2 cache (more than 22 million transistors)

Page 29: 7-Aug-15 (1) CSC2510 - Computer Organization Lecture 6: A Historical Perspective of Pentium IA-32

2.1.7 Pentium 4 Processor Family (2000-2005)

• Based on Intel NetBurst microarchitecture

• Introduced Streaming SIMD Extensions 2 (SSE2)

• Pentium 4 processor 3.40 GHz supports Hyper-Threading Technology and Streaming SIMD Extensions 3 (SSE3)

• Pentium 4 Processor Extreme Edition supports Intel Extended Memory 64 Technology and Hyper-Threading Technology

• Pentium 4 Processor 6xx series supports Intel Extended Memory 64 Technology

Page 30: 7-Aug-15 (1) CSC2510 - Computer Organization Lecture 6: A Historical Perspective of Pentium IA-32

Streaming SIMD Extensions 2 (SSE2)

Page 31: 7-Aug-15 (1) CSC2510 - Computer Organization Lecture 6: A Historical Perspective of Pentium IA-32
Page 32: 7-Aug-15 (1) CSC2510 - Computer Organization Lecture 6: A Historical Perspective of Pentium IA-32

• Horizontal Data Movement in ADDSUBPD

Page 33: 7-Aug-15 (1) CSC2510 - Computer Organization Lecture 6: A Historical Perspective of Pentium IA-32

2.1.8 Intel Xeon Processor (2001-2005)

• Based on Intel NetBurst microarchitecture

• As a family, this group of IA-32 processors is designed for use in multiprocessor server systems and high-performance workstations

• Intel Xeon processor MP supports Hyper-Threading Technology

• 64-bit Intel Xeon processor 3.60 GHz with 800 MHz System Bus introduced Intel Extended Memory 64 Technology

Page 34: 7-Aug-15 (1) CSC2510 - Computer Organization Lecture 6: A Historical Perspective of Pentium IA-32

2.1.9 Intel Pentium M Processor (2003-2005)

• Low-power mobile processor family

• Designed for extending battery life and seamless integration

• Its extended microarchitecture includes:

– Support for Dynamic Execution

– Low-power core with copper interconnect

– On-die, primary 32-KB instruction cache and 32-KB write-back data cache, and second-level 2 MB cache with Advanced Transfer Cache Architecture

– Advanced Branch Prediction and Data Prefetch Logic

– Support for MMX technology, Streaming SIMD instructions (SSE), and the SSE2 instruction set

Page 35: 7-Aug-15 (1) CSC2510 - Computer Organization Lecture 6: A Historical Perspective of Pentium IA-32

Intel Pentium Processor Extreme Edition (2005)

• Introduced dual-core technology that provides advanced H/W multi-threading support

• Based on Intel NetBurst microarchitecture

• Supports SSE, SSE2, SSE3, Hyper-Threading Technology, and Intel Extended Memory 64 Technology

Page 36: 7-Aug-15 (1) CSC2510 - Computer Organization Lecture 6: A Historical Perspective of Pentium IA-32

The Processor War

Apr 19, 2023 (36)