Introduction and Course Outline
Ajit Pal, Professor, Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, INDIA-721302
High Performance Computer Architecture

Lec 1 Introduction


Page 1: Lec 1 Introduction

Introduction and Course Outline

Ajit Pal

Professor
Department of Computer Science and Engineering
Indian Institute of Technology Kharagpur
INDIA-721302

High Performance Computer Architecture

Page 2: Lec 1 Introduction

Outline

Historical Background

Five generations of computers

Elements of modern computers

Instruction Set Architecture

Instruction Set Processor

Moore’s Law

Parallelism at different levels

Objective of the course

Course Outline

Page 3: Lec 1 Introduction

Historical Background

Two major stages of development
■ Mechanical; prior to 1945
■ Electronic; after 1945

Mechanical
■ Abacus; dates back to 500 BC
■ Mechanical adder/subtractor by Blaise Pascal in France (1642)
■ Difference Engine by Charles Babbage for polynomial evaluation in England (1827)
■ Binary mechanical computer by Konrad Zuse in Germany (1941)
■ Electromechanical decimal computer by Howard Aiken (1944): the Harvard Mark I, built by IBM

Page 4: Lec 1 Introduction

Five Generations of Electronic Computers

First Generation (1945-54)
■ Used vacuum tubes and relay memories
■ Single-user system using machine/assembly language
■ ENIAC, Princeton IAS, IBM 701

Second Generation (1955-64)
■ Used transistors, diodes, magnetic ferrite cores
■ HLL with compilers, batch processing
■ IBM 7090, CDC 1604, Univac LARC

Third Generation (1965-74)
■ Used integrated circuits (SSI and MSI)
■ Multiprogramming and time-sharing OS
■ IBM 360/370, CDC 6600, TI-ASC, PDP-8

Page 5: Lec 1 Introduction

Five Generations of Electronic Computers

Fourth Generation (1975-90)
■ Used LSI and VLSI circuits
■ Multiprocessor OS, HLL, parallel processing
■ IBM 3090, VAX 9000, Cray X-MP

Fifth Generation (1991-present)
■ Used ULSI/VHSIC circuits
■ Massively parallel processing, heterogeneous processing
■ Intel Paragon, Fujitsu VPP500, Cray-MPP

Page 6: Lec 1 Introduction

Ajit Pal, IIT Kharagpur

Elements of Modern Computers

A modern computer is an integrated system consisting of:
■ Machine hardware (processor, etc.)
■ System software
■ Application programs

The system architecture is represented by three nested circles.
The functionality of a processor is characterized by its Instruction Set.
All the programs that run on a processor are encoded in that instruction set.
The predefined instruction set is called the Instruction Set Architecture (ISA).

[Figure: three nested circles labeled Hardware, System Software, and Application Software]

Page 7: Lec 1 Introduction

Instruction Set Processor Design

ISA serves as an interface between the hardware and software.
In terms of processor design methodology, an ISA can be considered as the specification of a design.
The specification is the behavioral description of ‘what does it do?’
The Synthesis step attempts to find an implementation based on the specification.
The processor is the implementation of the design, giving ‘How is it constructed?’. It is also referred to as micro-architecture.
A realization of an implementation, a specific physical embodiment of a design (chip), is done using VLSI technology.

Page 8: Lec 1 Introduction

Architecture Versus Organization

What is the difference between computer architecture and computer organization?

Architecture:
■ Also known as Instruction Set Architecture (ISA)
■ Programmer's view of a processor: instruction set, registers, addressing modes, etc.

Organization:
■ High-level design: how many caches? How many arithmetic and logic units? What type of pipelining, control design, etc.
■ Sometimes known as micro-architecture

Page 9: Lec 1 Introduction

Computer Architecture

The structure of a computer that a machine language programmer must understand:
■ To be able to write a correct program for that machine.

A family of computers of the same architecture should be able to run the same program.
■ Thus, the notion of architecture leads to “binary compatibility.”
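The split between an architecture (the specification) and its implementations can be sketched in a few lines of Python. The three-instruction ISA and both interpreter functions below are invented for illustration and are not part of the lecture. The sketch only shows that two differently organized implementations of the same instruction set run the same program and reach the same register state, which is the idea behind binary compatibility.

```python
# Toy ISA, invented for this sketch (an assumption, not from the lecture).
# Each instruction is a tuple:
#   ("LOADI", rd, imm)     rd <- imm
#   ("ADD",   rd, rs, rt)  rd <- rs + rt
#   ("MUL",   rd, rs, rt)  rd <- rs * rt
PROGRAM = [
    ("LOADI", 0, 6),
    ("LOADI", 1, 7),
    ("MUL", 2, 0, 1),  # r2 = r0 * r1
    ("ADD", 2, 2, 0),  # r2 = r2 + r0
]


def run_if_chain(program, num_regs=4):
    """Implementation A: decodes each instruction with an if/elif chain."""
    regs = [0] * num_regs
    for instr in program:
        op = instr[0]
        if op == "LOADI":
            _, rd, imm = instr
            regs[rd] = imm
        elif op == "ADD":
            _, rd, rs, rt = instr
            regs[rd] = regs[rs] + regs[rt]
        elif op == "MUL":
            _, rd, rs, rt = instr
            regs[rd] = regs[rs] * regs[rt]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return regs


def run_dispatch_table(program, num_regs=4):
    """Implementation B: same ISA, different internal organization."""
    regs = [0] * num_regs
    alu_ops = {"ADD": lambda a, b: a + b, "MUL": lambda a, b: a * b}
    for op, rd, *operands in program:
        if op == "LOADI":
            regs[rd] = operands[0]
        elif op in alu_ops:
            rs, rt = operands
            regs[rd] = alu_ops[op](regs[rs], regs[rt])
        else:
            raise ValueError(f"unknown opcode: {op}")
    return regs


if __name__ == "__main__":
    # Both implementations must agree on the architectural state.
    print(run_if_chain(PROGRAM))
    print(run_dispatch_table(PROGRAM))
```

The program never needs to know which implementation runs it; only the instruction set is visible to it.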

Page 10: Lec 1 Introduction

Moore’s Law

■ Computer performance has been increasing phenomenally over the last five decades.
■ This trend is captured by Moore’s Law:
● Transistors per square inch roughly double every eighteen months.
■ Moore’s law is not exactly a law:
● but it has held good for nearly 50 years.

Page 11: Lec 1 Introduction

Moore’s Law

Gordon Moore (co-founder of Intel) predicted in 1965 that the transistor density of minimum-cost semiconductor chips would keep doubling at a regular interval. His 1965 estimate was a doubling roughly every year; the figure popularly quoted today is roughly every 18 months.

Transistor density is correlated to processing speed.

“Cramming More Components onto Integrated Circuits” in the April 19, 1965 issue of Electronics magazine

Page 12: Lec 1 Introduction

Moore’s Law

Page 13: Lec 1 Introduction

Interpreting Moore’s Law

Moore's law is not just about the density of transistors that can be achieved on a chip:
■ but about the density of transistors at which the cost per transistor is the lowest.

As more transistors are made on a chip:
■ the cost to make each transistor reduces,
■ but the chance that the chip will not work due to a defect rises.

Moore observed in 1965 that there is a transistor density or complexity:
■ at which "a minimum cost" is achieved.
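The trade-off between cheaper transistors and falling yield can be illustrated with a toy model. Everything numeric here is an assumption made for the sketch, not Moore's data: the chip cost is taken as fixed, and the fraction of working chips is modelled as exp(-n / n0), where n is the transistor count and n0 is a hypothetical defect scale. The cost per working transistor is then chip_cost / (n * yield), which has a minimum at n = n0.

```python
import math


def cost_per_working_transistor(n, chip_cost=1.0, n0=1000.0):
    """Toy model (assumed): fixed chip cost, yield = exp(-n / n0)."""
    yield_fraction = math.exp(-n / n0)
    return chip_cost / (n * yield_fraction)


def cheapest_density(candidates, chip_cost=1.0, n0=1000.0):
    """Return the candidate transistor count with the lowest cost."""
    return min(
        candidates,
        key=lambda n: cost_per_working_transistor(n, chip_cost, n0),
    )


if __name__ == "__main__":
    counts = range(100, 5001, 100)
    best = cheapest_density(counts)
    print("minimum-cost transistor count in this toy model:", best)
```

Below the minimum, adding transistors spreads the chip cost over more devices; above it, the falling yield dominates and the cost per working transistor rises again.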

Page 14: Lec 1 Introduction

Improving Processor Performance

Initial computer performance improvements came from the use of:
■ Innovative manufacturing techniques
■ Advancement of VLSI technology

Improvements due to innovations in manufacturing technologies have slowed down since the 1980s:
■ Smaller feature size gives rise to increased resistance
■ Larger power dissipation

(Aside: What is the power consumption of an Intel Pentium processor? Roughly 100 watts idle)

Page 15: Lec 1 Introduction

Current Chip Manufacturing Process

A decade ago, chips were built using a 500 nm (0.5 micron) process.
In 1971, a 10 micron process was used.
Most PC processors are currently fabricated on a 65 nm or smaller process.

Intel in January 2007 demonstrated a working 45 nm chip:
■ Intel started mass-producing in late 2007.
■ Compare: the diameter of an atom is of the order of 0.1 nm.
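The process figures above can be put side by side with a little arithmetic. The sketch assumes, as an idealization, that transistor density scales with the inverse square of the feature size; real processes deviate from this. The function names are illustrative.

```python
ATOM_DIAMETER_NM = 0.1  # order-of-magnitude figure quoted on the slide


def linear_shrink(old_nm, new_nm):
    """How many times smaller the new feature size is."""
    return old_nm / new_nm


def ideal_density_gain(old_nm, new_nm):
    """Idealized density gain, assuming density ~ 1 / feature_size**2."""
    return linear_shrink(old_nm, new_nm) ** 2


def atoms_across(feature_nm):
    """Approximate number of atom diameters spanning one feature."""
    return feature_nm / ATOM_DIAMETER_NM


if __name__ == "__main__":
    # 10 micron (1971) is 10,000 nm; compare with the 45 nm process.
    print(f"linear shrink: x{linear_shrink(10_000, 45):.0f}")
    print(f"ideal density gain: x{ideal_density_gain(10_000, 45):,.0f}")
    print(f"atoms across a 45 nm feature: about {atoms_across(45):.0f}")
```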

Page 16: Lec 1 Introduction

Amazing Decades of (Micro)processor Evolution

• Processor performance:
  • Twice as fast after every 2 years (roughly)
• Memory capacity:
  • Twice as much after every 18 months (roughly)
• Mead and Conway:
  • Described a method of creating hardware designs by writing software (HDL)

                     1970-1980    1980-1990    1990-2000      2000-2010
Transistor Count     2K-100K      100K-1M      1M-100M        100M-2B
Clock Frequency      0.1-3 MHz    3-30 MHz     30 MHz-1 GHz   1-15 GHz
Instructions/Cycle   0.1          0.1-0.9      0.9-1.9        1.9-2.9
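One common way to read the last two rows together is that instruction throughput is the product of clock frequency and instructions per cycle. The sketch below applies that product to the upper end of each decade's range. Pairing the two upper ends is an assumption made for illustration, since the table does not say that a single processor reached both.

```python
# Upper end of each range in the table: (clock frequency in Hz, IPC).
DECADE_PEAKS = {
    "1970-1980": (3e6, 0.1),
    "1980-1990": (30e6, 0.9),
    "1990-2000": (1e9, 1.9),
    "2000-2010": (15e9, 2.9),
}


def instructions_per_second(clock_hz, ipc):
    """Throughput = clock frequency x instructions per cycle."""
    return clock_hz * ipc


if __name__ == "__main__":
    for decade, (clock_hz, ipc) in DECADE_PEAKS.items():
        mips = instructions_per_second(clock_hz, ipc) / 1e6
        print(f"{decade}: about {mips:,.1f} million instructions/s")
```

The product makes it visible that both rows contribute: the clock row grows by orders of magnitude, while the instructions-per-cycle row adds a further factor from architectural techniques.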

Page 17: Lec 1 Introduction

Improving Processor Performance

In later years, performance improvement came from:
■ Exploitation of some form of parallelism
■ Instruction level parallelism (ILP). Examples: pipelining, dynamic instruction scheduling, out-of-order execution, superscalar architecture, VLIW architecture, etc.
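As a rough illustration of why pipelining helps, the sketch below compares cycle counts for an ideal k-stage pipeline with an unpipelined processor that spends k cycles on every instruction. It assumes one instruction enters the pipeline per cycle and ignores all hazards and stalls, so real speedups are lower. The function names are illustrative.

```python
def unpipelined_cycles(num_instructions, num_stages):
    """Each instruction takes all stages before the next one starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions, num_stages):
    """Ideal pipeline: fill it once, then finish one instruction per cycle."""
    if num_instructions < 1 or num_stages < 1:
        raise ValueError("need at least one instruction and one stage")
    return num_stages + (num_instructions - 1)


def ideal_speedup(num_instructions, num_stages):
    """Ratio of unpipelined to ideal pipelined cycle counts."""
    return (unpipelined_cycles(num_instructions, num_stages)
            / pipelined_cycles(num_instructions, num_stages))


if __name__ == "__main__":
    for n in (1, 10, 100, 10_000):
        print(f"{n} instructions, 5 stages: speedup {ideal_speedup(n, 5):.2f}")
```

For long instruction streams the ideal speedup approaches the number of stages; for a single instruction there is no gain at all.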

Page 18: Lec 1 Introduction

Thread-Level Parallelism

Thread-level (medium grained): different threads of a process are executed in parallel on a single processor or multiple processors.
■ Simultaneous multithreading (SMT) is a technique for improving the overall efficiency of superscalar CPUs with hardware multithreading
■ Software multithreading on multiple processors (cores)
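Software multithreading can be sketched with Python's standard `concurrent.futures` module: several threads of one process each sum a slice of a list that they all share. The function `parallel_sum` and its chunking scheme are illustrative choices. Note that in standard CPython the global interpreter lock generally keeps CPU-bound threads from running truly in parallel, so this shows the programming model rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_sum(data, num_threads=4):
    """Sum `data` using several threads that share the same list."""
    # Ceiling division so every element lands in exactly one chunk.
    chunk = (len(data) + num_threads - 1) // num_threads

    def partial_sum(index):
        # Each thread reads its own slice of the shared list.
        return sum(data[index * chunk:(index + 1) * chunk])

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return sum(pool.map(partial_sum, range(num_threads)))


if __name__ == "__main__":
    numbers = list(range(1, 1001))
    print(parallel_sum(numbers))
```

All threads see the same `data` object because they belong to one process and share its address space.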

Page 19: Lec 1 Introduction

Symmetric Multiprocessors (SMPs)

SMPs are a popular shared memory multiprocessor architecture:
■ Processors share memory and I/O
■ Bus based: access time for all memory locations is equal (“Symmetric MP”)

[Figure: four processors (P), each with its own cache, connected by a shared bus to main memory and the I/O system]

Page 20: Lec 1 Introduction

Process-Level Parallelism

Process-level (coarse grained): different processes can be executed in parallel on multiple processors (cores).
■ Symmetric multiprocessors (SMP)
■ Distributed memory multiprocessors (DSM)

Page 21: Lec 1 Introduction

UMA Versus NUMA Computers

[Figure: two block diagrams. UMA model (symmetric multiprocessors): processors P1, P2, ..., Pn, each with a cache, share a single main memory over a bus. NUMA model (distributed memory multiprocessors): processors P1, P2, ..., Pn, each with a cache and its own main memory, connected by a network.]
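The difference between the two models can be sketched as a latency function. The nanosecond values below are made-up placeholders, not measurements. In the UMA sketch every processor sees the same access time to any memory location; in the NUMA sketch access to a processor's own memory module is faster than access to a remote module over the network.

```python
def uma_latency(cpu, module, bus_ns=100):
    """UMA: same (assumed) latency for every processor/memory pair."""
    return bus_ns


def numa_latency(cpu, module, local_ns=80, remote_ns=300):
    """NUMA: a processor's own module is faster than a remote one."""
    return local_ns if cpu == module else remote_ns


def average_latency(latency_fn, num_nodes):
    """Average over all processor/module pairs, assuming uniform access."""
    total = sum(
        latency_fn(cpu, module)
        for cpu in range(num_nodes)
        for module in range(num_nodes)
    )
    return total / (num_nodes * num_nodes)


if __name__ == "__main__":
    print("UMA average (ns):", average_latency(uma_latency, 4))
    print("NUMA average (ns):", average_latency(numa_latency, 4))
    print("NUMA local vs remote (ns):", numa_latency(0, 0), numa_latency(0, 3))
```

In the NUMA case the average depends on how often accesses hit local memory, which is why data placement matters on such machines.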

Page 22: Lec 1 Introduction

Course Objectives

Modern processors such as Intel Pentium, AMD Athlon, etc. use:
■ Many architectural and organizational innovations not covered in a first-level course.
■ Innovations in memory, bus, and storage designs as well.
■ Multiprocessors and clusters

In this light, the objective of this course is to:
■ Study the architectural and organizational innovations used in modern computers.

Page 23: Lec 1 Introduction

Course Outline: Module-I

Review of Basic Organization and Architectural Techniques
■ RISC processors
■ Characteristics of RISC processors
■ RISC vs. CISC
■ Classification of Instruction Set Architectures
■ Review of performance measurements
■ Basic parallel processing techniques: instruction level, thread level and process level
■ Classification of parallel architectures

Page 24: Lec 1 Introduction

Course Outline: Module-II

Instruction Level Parallelism
■ Basic concepts of pipelining
■ Arithmetic pipelines
■ Instruction pipelines
■ Hazards in pipelined processors: structural, data, and control hazards
■ Overview of hazard resolution techniques
■ Dynamic instruction scheduling
■ Branch prediction techniques
■ Instruction-level parallelism using software approaches
■ Superscalar techniques
■ Speculative execution
■ Review of modern processors

Page 25: Lec 1 Introduction

Course Outline: Module-III

Memory Hierarchies
■ Basic concept of hierarchical memory organization
■ Main memories
■ Cache memory design and implementation
■ Virtual memory design and implementation
■ Secondary memory technology
■ RAID

Page 26: Lec 1 Introduction

Course Outline: Module-IV

Thread Level Parallelism
■ Centralized vs. distributed shared memory
■ Interconnection topologies
■ Multiprocessor architecture
■ Symmetric multiprocessors
■ Cache coherence problem
■ Memory consistency
■ Multicore architecture
■ Review of modern multiprocessors

Page 27: Lec 1 Introduction

Course Outline: Module-V

Process Level Parallelism
■ Distributed memory computers
■ Cluster computing
■ Grid computing
■ Cloud computing

Page 28: Lec 1 Introduction

Thanks!