June 2005 Computer Architecture, Background and Motivation Slide 1
1.1 Signals, Logic Operators, and Gates
Figure 1.1 Some basic elements of digital logic circuits, with operator signs used in this book highlighted.
Name  Operator sign and alternate(s)  Arithmetic expression  Output is 1 iff:
AND   x ∧ y (also xy or x · y)        xy                     Both inputs are 1s
OR    x ∨ y (also x + y)              x + y – xy             At least one input is 1
NOT   x′ (also x̄ or ¬x)               1 – x                  Input is 0
XOR   x ⊕ y                           x + y – 2xy            Inputs are not equal
(Graphical symbols omitted in this text rendering.)
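The operator definitions and the arithmetic expressions of Fig. 1.1 can be checked with a short Python sketch (function names like `AND` and `OR` are my own, not the book's notation):

```python
# Basic logic operators on single-bit values (0 or 1).
def AND(x, y): return x & y          # 1 iff both inputs are 1
def OR(x, y):  return x | y          # 1 iff at least one input is 1
def NOT(x):    return 1 - x          # 1 iff input is 0 (arithmetic form from Fig. 1.1)
def XOR(x, y): return x + y - 2*x*y  # 1 iff inputs are not equal

# The arithmetic expressions agree with the logical definitions on all inputs:
for x in (0, 1):
    for y in (0, 1):
        assert AND(x, y) == x * y         # AND: xy
        assert OR(x, y) == x + y - x*y    # OR:  x + y - xy
```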
Variations in Gate Symbols
Figure 1.2 Gates with more than two inputs and/or with inverted signals at input or output.
Gates shown: OR, NOR, NAND, AND, XNOR.
Gates as Control Elements
Figure 1.3 An AND gate and a tristate buffer act as controlled switches or valves. An inverting buffer is logically the same as a NOT gate.
With enable/pass signal e and data input x:
(a) AND gate for controlled transfer: data out = x when e = 1, and 0 when e = 0.
(b) Tristate buffer: data out = x when e = 1, and “high impedance” when e = 0.
(c) Model for AND switch: passes x when e = 1, outputs 0 when e = 0.
(d) Model for tristate buffer: passes x when e = 1, passes no data when e = 0.
Control/Data Signals and Signal Bundles
Figure 1.5 Arrays of logic gates represented by a single gate symbol.
(a) 8 NOR gates: two 8-bit input bundles, one 8-bit output (bus widths marked /8).
(b) 32 AND gates: 32-bit bundles, with a common Enable control.
(c) k XOR gates: k-bit bundles, with a common Compl (complement) control.
Table 1.2 Laws (basic identities) of Boolean algebra.
Name of law   OR version                         AND version
Identity      x ∨ 0 = x                          x ∧ 1 = x
One/Zero      x ∨ 1 = 1                          x ∧ 0 = 0
Idempotent    x ∨ x = x                          x ∧ x = x
Inverse       x ∨ x′ = 1                         x ∧ x′ = 0
Commutative   x ∨ y = y ∨ x                      x ∧ y = y ∧ x
Associative   (x ∨ y) ∨ z = x ∨ (y ∨ z)          (x ∧ y) ∧ z = x ∧ (y ∧ z)
Distributive  x ∨ (y ∧ z) = (x ∨ y) ∧ (x ∨ z)    x ∧ (y ∨ z) = (x ∧ y) ∨ (x ∧ z)
DeMorgan’s    (x ∨ y)′ = x′ ∧ y′                 (x ∧ y)′ = x′ ∨ y′
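With only two values per variable, every law in Table 1.2 can be verified by exhaustive enumeration; this sketch (my own, not from the slides) checks the distributive and DeMorgan's laws over all eight assignments:

```python
# Exhaustively verify two of the Boolean identities from Table 1.2.
from itertools import product

def NOT(x): return 1 - x

for x, y, z in product((0, 1), repeat=3):
    # Distributive: x OR (y AND z) = (x OR y) AND (x OR z)
    assert (x | (y & z)) == ((x | y) & (x | z))
    # DeMorgan's: (x OR y)' = x' AND y'  and  (x AND y)' = x' OR y'
    assert NOT(x | y) == (NOT(x) & NOT(y))
    assert NOT(x & y) == (NOT(x) | NOT(y))
```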
Manipulating Logic Expressions
1.3 Designing Gate Networks
AND-OR, NAND-NAND, OR-AND, NOR-NOR
Logic optimization: cost, speed, power dissipation
(a) AND-OR circuit, (b) intermediate circuit, and (c) NAND-NAND equivalent, all computing the same function of inputs x, y, z.
Figure 1.6 A two-level AND-OR circuit and two equivalent circuits.
Key identity (DeMorgan’s): x ∨ y = (x′ ∧ y′)′
1.4 Useful Combinational Parts
High-level building blocks
Much like prefab parts used in building a house
Arithmetic components will be covered in Part III (adders, multipliers, ALUs)
Here we cover three useful parts: multiplexers, decoders/demultiplexers, encoders
Multiplexers
Figure 1.9 Multiplexer (mux), or selector, allows one of several inputs to be selected and routed to output depending on the binary value of a
set of selection or address signals provided to it.
(a) 2-to-1 mux: z = x0 when y = 0, x1 when y = 1.
(b) Switch view of the 2-to-1 mux.
(c) Mux symbol.
(d) Mux array: a 32-bit-wide 2-to-1 mux (bus widths /32).
(e) 4-to-1 mux with enable e: select signals y1 y0 choose among inputs x0–x3.
(f) 4-to-1 mux design built from three 2-to-1 muxes.
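Behaviorally, a mux simply routes the input whose index matches the select value. The sketch below (my own naming, assuming the three-mux decomposition of the 4-to-1 design in Fig. 1.9) mirrors the gate forms:

```python
def mux2(x0, x1, y):
    """2-to-1 mux: z = x0 when y = 0, x1 when y = 1 (gate form x0*y' + x1*y)."""
    return (x0 & (1 - y)) | (x1 & y)

def mux4(xs, y1, y0, e=1):
    """4-to-1 mux with enable e, built of three 2-to-1 muxes."""
    if not e:
        return 0
    low = mux2(xs[0], xs[1], y0)    # y0 picks within the pair x0/x1
    high = mux2(xs[2], xs[3], y0)   # y0 picks within the pair x2/x3
    return mux2(low, high, y1)      # y1 picks between the two pairs

assert mux4([0, 1, 0, 1], y1=0, y0=1) == 1   # address 01 selects x1
```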
Decoders/Demultiplexers
Figure 1.10 A decoder allows the selection of one of 2^a options using an a-bit address as input. A demultiplexer (demux) is a decoder that only selects an output if its enable signal is asserted.
(a) 2-to-4 decoder: address inputs y1 y0, one-hot outputs x0–x3.
(b) Decoder symbol.
(c) Demultiplexer, or decoder with “enable” input e.
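A decoder's behavior is easy to state in code: assert the single output whose index matches the address. A sketch (hypothetical names, not from the book) of the 2-to-4 case, with the enable turning it into a demux:

```python
def decoder(y1, y0, e=1):
    """2-to-4 decoder: returns one-hot [x0, x1, x2, x3] for address y1 y0.
    With enable e it acts as a demultiplexer: e = 0 deasserts all outputs."""
    addr = 2 * y1 + y0
    return [e if i == addr else 0 for i in range(4)]

assert decoder(1, 0) == [0, 0, 1, 0]        # address 10 asserts x2
assert decoder(1, 0, e=0) == [0, 0, 0, 0]   # disabled demux selects nothing
```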
Encoders
Figure 1.11 A 2^a-to-a encoder outputs an a-bit binary number equal to the index of the single 1 among its 2^a inputs.
(a) 4-to-2 encoder: inputs x0–x3, outputs y1 y0.
(b) Encoder symbol.
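The encoder inverts the decoder: given a one-hot input vector, it emits the index in binary. A minimal sketch (assuming, as Fig. 1.11 does, that exactly one input is 1):

```python
def encoder(xs):
    """4-to-2 encoder: returns (y1, y0), the binary index of the single 1
    among inputs x0..x3.  Assumes exactly one input is 1."""
    idx = xs.index(1)
    return idx >> 1, idx & 1   # y1 = high address bit, y0 = low address bit

assert encoder([0, 0, 1, 0]) == (1, 0)   # the 1 sits at index 2 = binary 10
```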
1.5 Programmable Combinational Parts
Programmable ROM (PROM)
Programmable array logic (PAL)
Programmable logic array (PLA)
A programmable combinational part can do the job of many gates or gate networks
Programmed by cutting existing connections (fuses) or establishing new connections (antifuses)
PROMs
Figure 1.12 Programmable connections and their use in a PROM.
(a) Programmable OR gates: inputs w, x, y, z feed the OR lines through programmable connections.
(b) Logic equivalent of part a.
(c) Programmable read-only memory (PROM): a decoder on the inputs drives a programmable OR array on the outputs.
PALs and PLAs
Figure 1.13 Programmable combinational logic: general structure and two classes known as PAL and PLA devices. Not shown is PROM with
fixed AND array (a decoder) and programmable OR array.
(a) General programmable combinational logic: inputs enter an AND array (AND plane) whose product terms feed an OR array (OR plane) producing the outputs.
(b) PAL: programmable AND array, fixed OR array (8-input ANDs).
(c) PLA: programmable AND and OR arrays (6-input ANDs, 4-input ORs).
1.6 Timing and Circuit Considerations
Gate delay : a fraction of, to a few, nanoseconds
Wire delay, previously negligible, is now important (electronic signals travel about 15 cm per ns)
Circuit simulation to verify function and timing
Changes in gate/circuit output, triggered by changes in its inputs, are not instantaneous
Glitching
Figure 1.14 Timing diagram for a circuit that exhibits glitching.
Waveforms: x = 0, y, z; a = x ∨ y; f = a ∨ z; each gate contributes a delay of 2 time units, producing a transient glitch in f.
Using the PAL in Fig. 1.13b to implement f = x ∨ y ∨ z
CMOS Transmission Gates
Figure 1.15 A CMOS transmission gate and its use in building
a 2-to-1 mux.
(a) CMOS transmission gate (TG): circuit built of a P transistor and an N transistor in parallel, and its symbol.
(b) Two-input mux built of two transmission gates: z = x0 when y = 0, x1 when y = 1.
2 Digital Circuits with Memory
Second of two chapters containing a review of digital design:
• Combinational (memoryless) circuits in Chapter 1
• Sequential circuits (with memory) in Chapter 2
Topics in This Chapter
2.1 Latches, Flip-Flops, and Registers
2.2 Finite-State Machines
2.3 Designing Sequential Circuits
2.4 Useful Sequential Parts
2.5 Programmable Sequential Parts
2.6 Clocks and Timing of Events
2.1 Latches, Flip-Flops, and Registers
Figure 2.1 Latches, flip-flops, and registers.
(a) SR latch: inputs R and S, outputs Q and Q′.
(b) D latch: inputs D and C.
(c) Master-slave D flip-flop: two D latches clocked on opposite phases of C.
(d) D flip-flop symbol.
(e) k-bit register: a bank of D flip-flops sharing the clock C, with k-bit input and output.
Latches vs Flip-Flops
Figure 2.2 Operations of D latch and negative-edge-triggered D flip-flop.
Waveforms for D, C, the D latch output Q, and the D-FF output Q, with setup and hold times marked around each sampling point.
Reading and Modifying FFs in the Same Cycle
Figure 2.3 Register-to-register operation with edge-triggered flip-flops.
Two k-bit registers of D flip-flops, with a computation module (combinational logic) between them; both share the clock, and the result must settle within one clock period despite the propagation delay through the module.
2.4 Useful Sequential Parts
High-level building blocks
Much like prefab closets used in building a house
Other memory components will be covered in Chapter 17 (SRAM details, DRAM, Flash)
Here we cover three useful parts: shift register, register file (SRAM basics), counter
Shift Register
Figure 2.8 Register with single-bit left shift and parallel load capabilities. For logical left shift, serial data in line is connected to 0.
Inputs: k-bit parallel data in, serial data in, and Shift/Load controls feeding a mux ahead of the flip-flops. For a shift, the k – 1 LSBs move up one position and serial data in enters the LSB (connect it to 0 for a logical left shift). Outputs: k-bit parallel data out and serial data out (the MSB).
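The register's next-state behavior can be sketched as a pure function over a k-bit integer (the single-function framing and names are mine, not the book's):

```python
def shift_register_step(state, k, load=0, shift=0, parallel_in=0, serial_in=0):
    """One clock edge of the k-bit register in Fig. 2.8.
    Load takes parallel_in; Shift moves the k-1 LSBs up one position and
    brings serial_in into the LSB (serial_in = 0 gives a logical left shift)."""
    mask = (1 << k) - 1
    if load:
        return parallel_in & mask
    if shift:
        return ((state << 1) | serial_in) & mask
    return state

s = shift_register_step(0, k=4, load=1, parallel_in=0b0110)   # load 0110
s = shift_register_step(s, k=4, shift=1)                      # logical left shift
assert s == 0b1100
```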
Register File and FIFO
Figure 2.9 Register file with random access and FIFO.
(a) Register file with random access: an h-bit write address is decoded to select one of 2^h k-bit registers (each a bank of D flip-flops); write enable gates the k-bit write data in. Two read ports, each with an h-bit read address, a read enable, and a k-bit read data output produced by muxes.
(b) Graphic symbol for the register file, with the same ports.
(c) FIFO symbol: k-bit Input with Push, k-bit Output with Pop, and Full/Empty status flags.
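A behavioral sketch of the register file of Fig. 2.9a (a Python class of my own devising; a plain list stands in for the decoder, the registers, and the read muxes):

```python
class RegisterFile:
    """2^h registers of k bits: one write port, two read ports (Fig. 2.9a)."""
    def __init__(self, h, k):
        self.k = k
        self.regs = [0] * (1 << h)          # 2^h registers, all cleared
    def write(self, addr, data, enable=1):
        if enable:                           # write enable gates the update
            self.regs[addr] = data & ((1 << self.k) - 1)
    def read(self, addr0, addr1):
        return self.regs[addr0], self.regs[addr1]   # two read ports

rf = RegisterFile(h=3, k=8)   # 8 registers, 8 bits each
rf.write(5, 0xAB)
assert rf.read(5, 0) == (0xAB, 0)
```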
SRAM
Figure 2.10 SRAM memory is simply a large, single-port register file.
(a) SRAM block diagram: address input, g-bit data in and data out, plus write enable, output enable, and chip select controls.
(b) SRAM read mechanism: a row decoder, driven by part of the h-bit address, selects one row of a square or almost-square memory matrix into a row buffer; a column mux, driven by the remaining address bits, then selects g bits for data out.
Binary Counter
Figure 2.11 Synchronous binary counter with initialization capability.
A count register feeds an incrementer (x → x + 1, with carry-in 1 and a carry-out); a mux selects the register’s next value from 0 (Init), the load Input (Load), or the incremented count (Incr).
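The counter's next-state logic fits in a small function (a behavioral sketch; names are mine, not the book's):

```python
def counter_step(count, k, init=0, load=0, incr=0, load_input=0):
    """One clock edge of the counter in Fig. 2.11: the mux picks 0 (Init),
    the load Input (Load), or the incremented count (Incr); a k-bit counter
    wraps at 2^k, which corresponds to the incrementer's carry-out."""
    if init:
        return 0
    if load:
        return load_input & ((1 << k) - 1)
    if incr:
        return (count + 1) % (1 << k)
    return count

c = counter_step(0, k=4, init=1)
for _ in range(17):                  # 17 increments of a 4-bit counter
    c = counter_step(c, k=4, incr=1)
assert c == 1                        # wrapped once: 17 mod 16 = 1
```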
2.5 Programmable Sequential Parts
Programmable array logic (PAL)
Field-programmable gate array (FPGA)
Both types contain macrocells and interconnects
A programmable sequential part contains gates and memory elements
Programmed by cutting existing connections (fuses) or establishing new connections (antifuses)
PAL and FPGA
Figure 2.12 Examples of programmable sequential logic.
(a) Portion of PAL with storable output (b) Generic structure of an FPGA
(a) Portion of a PAL with storable output: 8-input ANDs feed the output logic and a D flip-flop; muxes select either the combinational output or the stored (flip-flop) output.
(b) Generic structure of an FPGA: an array of configurable logic blocks (CLBs) surrounded by I/O blocks, joined by programmable connections.
2.6 Clocks and Timing of Events
Clock is a periodic signal: clock rate = clock frequency
The inverse of clock rate is the clock period: 1 GHz ↔ 1 ns
Constraint: Clock period ≥ t_prop + t_comb + t_setup + t_skew
Figure 2.13 Determining the required length of the clock period.
Two flip-flops FF1 and FF2, clocked by Clock1 and Clock2, with combinational logic (and other inputs) between them; the clock period must be wide enough to accommodate worst-case delays from the moment FF1 begins to change until that change is observed at FF2.
Synchronization
Figure 2.14 Synchronizers are used to prevent timing problems
arising from untimely changes in asynchronous signals.
(a) Simple synchronizer: one D flip-flop samples the asynch input to produce the synch version.
(b) Two-FF synchronizer: FF1 followed by FF2, both clocked, for greater reliability.
(c) Input and output waveforms for an asynch input and its synch version.
Level-Sensitive Operation
Figure 2.15 Two-phase clocking with nonoverlapping clock signals.
Latches clocked by φ1 and φ2, two clocks with nonoverlapping highs, alternate with combinational logic 1 and combinational logic 2 (plus other inputs); one clock period spans both phases.
What Is (Computer) Architecture?
Figure 3.2 Like a building architect, whose place at the engineering/arts and goals/means interfaces is seen in this diagram, a
computer architect reconciles many conflicting or competing demands.
The architect stands at two interfaces: between goals and means, and between arts and engineering. On the goals side: the client’s taste (mood, style, . . .) and the client’s requirements (function, cost, . . .). On the means side: the world of arts (aesthetics, trends, . . .) and construction technology (material, codes, . . .).
3.2 Computer Systems and Their Parts
Figure 3.3 The space of computer systems, with what we normally mean by the word “computer” highlighted.
Computers divide into analog and digital; digital into fixed-function and stored-program; stored-program into electronic and nonelectronic; electronic into general-purpose and special-purpose; special-purpose into number crunchers and data manipulators.
Price/Performance Pyramid
Figure 3.4 Classifying computers by computational
power and price range.
From the top of the pyramid down: Super ($Millions), Mainframe ($100s Ks), Server ($10s Ks), Workstation ($1000s), Personal ($100s), Embedded ($10s).
Differences in scale, not in substance
3.3 Generations of Progress
Table 3.2 The 5 generations of digital computers, and their ancestors.
Generation (begun) | Processor technology  | Memory innovations | I/O devices introduced        | Dominant look & feel
0 (1600s)          | (Electro-)mechanical  | Wheel, card        | Lever, dial, punched card     | Factory equipment
1 (1950s)          | Vacuum tube           | Magnetic drum      | Paper tape, magnetic tape     | Hall-size cabinet
2 (1960s)          | Transistor            | Magnetic core      | Drum, printer, text terminal  | Room-size mainframe
3 (1970s)          | SSI/MSI               | RAM/ROM chip       | Disk, keyboard, video monitor | Desk-size mini
4 (1980s)          | LSI/VLSI              | SRAM/DRAM          | Network, CD, mouse, sound     | Desktop/laptop micro
5 (1990s)          | ULSI/GSI/WSI, SOC     | SDRAM, flash       | Sensor/actuator, point/click  | Invisible, embedded
Figure 3.10 Trends in processor performance and DRAM memory chip capacity (Moore’s law).
Moore’s Law
Log-scale plot vs calendar year (1980–2010). Processor performance climbs from kIPS toward TIPS through the 68000, 80286, 80386, 80486, 68040, Pentium, Pentium II, and R10000, growing roughly 1.6×/yr (2×/18 mos, 10×/5 yrs). Memory chip capacity climbs from 64 kb through 256 kb, 1 Mb, 4 Mb, 16 Mb, 64 Mb, and 256 Mb toward 1 Gb, growing roughly 4×/3 yrs.
Pitfalls of Computer Technology Forecasting
“DOS addresses only 1 MB of RAM because we cannot imagine any applications needing more.” Microsoft, 1980
“640K ought to be enough for anybody.” Bill Gates, 1981
“Computers in the future may weigh no more than 1.5 tons.” Popular Mechanics
“I think there is a world market for maybe five computers.” Thomas Watson, IBM Chairman, 1943
“There is no reason anyone would want a computer in their home.” Ken Olsen, DEC founder, 1977
“The 32-bit machine would be an overkill for a personal computer.” Sol Libes, ByteLines
Figure 3.14 Models and abstractions in programming.
High- vs Low-Level Programming
From most abstract to least abstract:
• Very high-level language objectives or tasks, e.g., “Swap v[i] and v[i+1]” (one task = many statements)
• High-level language statements, reached via compiler or interpreter (one statement = several instructions):
  temp=v[i]; v[i]=v[i+1]; v[i+1]=temp
• Assembly language instructions, mnemonic (compiled from the statements; mostly one-to-one with machine language):
  add $2,$5,$5   add $2,$2,$2   add $2,$4,$2   lw $15,0($2)
  lw $16,4($2)   sw $16,0($2)   sw $15,4($2)   jr $31
• Machine language instructions, binary (hex), produced by the assembler:
  00a51020 00421020 00821020 8c620000 8cf20004 acf20000 ac620004 03e00008
Higher levels are more abstract and machine-independent: easier to write, read, debug, or maintain. Lower levels are more concrete, machine-specific, and error-prone: harder to write, read, debug, or maintain.
4 Computer Performance
Performance is key in design decisions; also cost and power
• It has been a driving force for innovation
• Isn’t quite the same as speed (higher clock rate)
Topics in This Chapter
4.1 Cost, Performance, and Cost/Performance
4.2 Defining Computer Performance
4.3 Performance Enhancement and Amdahl’s Law
4.4 Performance Measurement vs Modeling
4.5 Reporting Computer Performance
4.6 The Quest for Higher Performance
4.1 Cost, Performance, and Cost/Performance
Table 4.1 Key characteristics of six passenger aircraft: all figures are approximate; some relate to a specific model/configuration of the aircraft or are averages of cited range of values.
Aircraft     | Passengers | Range (km) | Speed (km/h) | Price ($M)
Airbus A310  | 250        | 8,300      | 895          | 120
Boeing 747   | 470        | 6,700      | 980          | 200
Boeing 767   | 250        | 12,300     | 885          | 120
Boeing 777   | 375        | 7,450      | 980          | 180
Concorde     | 130        | 6,400      | 2,200        | 350
DC-8-50      | 145        | 14,000     | 875          | 80
The Vanishing Computer Cost
Log-scale plot of computer cost ($1 to $1 G) vs calendar year (1960–2020), trending steadily downward.
Cost/Performance
Figure 4.1 Performance improvement as a function of cost.
Three regimes of performance vs cost: superlinear (economy of scale), linear (ideal?), and sublinear (diminishing returns).
4.2 Defining Computer Performance
Figure 4.2 Pipeline analogy shows that imbalance between processing power and I/O capabilities leads to a performance bottleneck.
A three-stage pipeline (Input → Processing → Output): a CPU-bound task is limited by the processing stage, an I/O-bound task by the input/output stages.
Different Views of Performance
Performance from the viewpoint of a passenger: Speed
Note, however, that flight time is but one part of total travel time. Also, if the travel distance exceeds the range of a faster plane, a slower plane may be better due to not needing a refueling stop
Performance from the viewpoint of an airline: Throughput
Measured in passenger-km per hour (relevant if ticket price were proportional to distance traveled, which in reality it is not)
Airbus A310: 250 × 895 = 0.224 M passenger-km/hr
Boeing 747: 470 × 980 = 0.461 M passenger-km/hr
Boeing 767: 250 × 885 = 0.221 M passenger-km/hr
Boeing 777: 375 × 980 = 0.368 M passenger-km/hr
Concorde: 130 × 2,200 = 0.286 M passenger-km/hr
DC-8-50: 145 × 875 = 0.127 M passenger-km/hr
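The throughput figures follow directly from Table 4.1; a quick check in Python (data transcribed from the table):

```python
# Throughput = passengers * speed, reported in M passenger-km/hr.
aircraft = {"Airbus A310": (250, 895), "Boeing 747": (470, 980),
            "Boeing 767": (250, 885), "Boeing 777": (375, 980),
            "Concorde": (130, 2200), "DC-8-50": (145, 875)}
for name, (passengers, speed) in aircraft.items():
    print(f"{name}: {passengers * speed / 1e6:.3f} M passenger-km/hr")
# The Boeing 747 comes out highest, at 0.461 M passenger-km/hr.
```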
Performance from the viewpoint of FAA: Safety
Cost Effectiveness: Cost/Performance
Table 4.1 Key characteristics of six passenger aircraft: all figures are approximate; some relate to a specific model/configuration of the aircraft or are averages of cited range of values.

Aircraft  | Passengers | Range (km) | Speed (km/h) | Price ($M) | Throughput (M P-km/hr) | Cost / Performance
A310      | 250        | 8,300      | 895          | 120        | 0.224                  | 536
B 747     | 470        | 6,700      | 980          | 200        | 0.461                  | 434
B 767     | 250        | 12,300     | 885          | 120        | 0.221                  | 543
B 777     | 375        | 7,450      | 980          | 180        | 0.368                  | 489
Concorde  | 130        | 6,400      | 2,200        | 350        | 0.286                  | 1224
DC-8-50   | 145        | 14,000     | 875          | 80         | 0.127                  | 630

Larger throughput values are better; smaller cost/performance values are better.
Concepts of Performance and Speedup
Performance = 1 / Execution time, simplified to
Performance = 1 / CPU execution time
Speedup of M1 over M2 = (Performance of M1) / (Performance of M2)
= (Execution time of M2) / (Execution time of M1)
Terminology: M1 is x times as fast as M2 (e.g., 1.5 times as fast)
M1 is 100(x – 1)% faster than M2 (e.g., 50% faster)
CPU time = Instructions × (Cycles per instruction) × (Secs per cycle)
= Instructions × CPI / (Clock rate)
Instruction count, CPI, and clock rate are not completely independent, so improving one by a given factor may not lead to overall execution time improvement by the same factor.
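The CPU time formula above can be packaged as a one-line helper; the numbers in the example are illustrative values of my own, not from the slides:

```python
def cpu_time(instructions, cpi, clock_rate_hz):
    """CPU time = Instructions * CPI / clock rate, in seconds."""
    return instructions * cpi / clock_rate_hz

# Hypothetical example: 1 billion instructions, CPI 1.4, 1 GHz clock.
t = cpu_time(1e9, 1.4, 1e9)
assert abs(t - 1.4) < 1e-9   # 1.4 seconds
```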
Faster Clock ≠ Shorter Running Time
Figure 4.3 Faster steps do not necessarily mean shorter travel time.
Staircase analogy: one path reaches the solution in 4 steps, another in 20 steps (1 GHz vs 2 GHz); taking steps faster does not guarantee arriving sooner.
4.3 Performance Enhancement: Amdahl’s Law
Figure 4.4 Amdahl’s law: speedup achieved if a fraction f of a task is unaffected and the remaining 1 – f part runs p times as fast.
Plot of speedup s (0–50) vs enhancement factor p (0–50), with curves for f = 0, 0.01, 0.02, 0.05, and 0.1.
s = 1 / [f + (1 – f)/p] ≤ min(p, 1/f)
f = fraction unaffected; p = speedup of the rest
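Amdahl's formula and its bound can be checked directly in Python (a sketch; the function name is mine):

```python
def amdahl_speedup(f, p):
    """s = 1 / [f + (1 - f)/p], where f is the fraction unaffected
    and p is the speedup of the remaining 1 - f part."""
    return 1 / (f + (1 - f) / p)

# Bound check: s <= min(p, 1/f).  With f = 0.1, no p can push speedup past 10.
for p in (2, 10, 1e6):
    assert amdahl_speedup(0.1, p) <= min(p, 1 / 0.1) + 1e-9

# Example 4.1a below: adder is 30% of time, made twice as fast => f = 0.7, p = 2.
assert round(amdahl_speedup(0.7, 2), 2) == 1.18
```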
Example 4.1
Amdahl’s Law Used in Design
A processor spends 30% of its time on flp addition, 25% on flp mult, and 10% on flp division. Evaluate the following enhancements, each costing the same to implement:
a. Redesign of the flp adder to make it twice as fast.
b. Redesign of the flp multiplier to make it three times as fast.
c. Redesign of the flp divider to make it 10 times as fast.
Solution
a. Adder redesign speedup = 1 / [0.7 + 0.3 / 2] = 1.18
b. Multiplier redesign speedup = 1 / [0.75 + 0.25 / 3] = 1.20
c. Divider redesign speedup = 1 / [0.9 + 0.1 / 10] = 1.10
What if both the adder and the multiplier are redesigned?
4.4 Performance Measurement vs. Modeling
Figure 4.5 Running times of six programs on three machines.
Bar chart: execution times of six programs (A–F) on Machine 1, Machine 2, and Machine 3.
MIPS Rating Can Be Misleading
Example 4.5
Two compilers produce machine code for a program on a machine with two classes of instructions. Here are the instruction counts:
Class | CPI | Compiler 1 | Compiler 2
A     | 1   | 600M       | 400M
B     | 2   | 400M       | 400M
a. What are the run times of the two programs with a 1 GHz clock?
b. Which compiler produces faster code, and by what factor?
c. Which compiler’s output runs at a higher MIPS rate?
Solution
a. Running time 1 = (600M × 1 + 400M × 2) / 10^9 = 1.4 s; running time 2 = (400M × 1 + 400M × 2) / 10^9 = 1.2 s
b. Compiler 2’s output runs 1.4 / 1.2 = 1.17 times as fast
c. MIPS rating 1 (CPI = 1.4): 1000 / 1.4 = 714; MIPS rating 2 (CPI = 1.5): 1000 / 1.5 = 667
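Example 4.5 can be reproduced in a few lines (a sketch; the variable names are mine):

```python
# Reproduce Example 4.5: two instruction classes with CPI 1 and 2, 1 GHz clock.
clock_hz = 1e9
mixes = {"Compiler 1": {1: 600e6, 2: 400e6},   # {CPI: instruction count}
         "Compiler 2": {1: 400e6, 2: 400e6}}
results = {}
for name, mix in mixes.items():
    instr = sum(mix.values())
    cycles = sum(cpi * n for cpi, n in mix.items())
    time_s = cycles / clock_hz
    results[name] = (time_s, cycles / instr, instr / time_s / 1e6)  # time, CPI, MIPS

# Compiler 2 is faster (1.2 s vs 1.4 s), yet Compiler 1 has the higher MIPS rating.
assert results["Compiler 1"][0] > results["Compiler 2"][0]
assert results["Compiler 1"][2] > results["Compiler 2"][2]
```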
4.5 Reporting Computer Performance
Table 4.4 Measured or estimated execution times for three programs.
              | Time on machine X | Time on machine Y | Speedup of Y over X
Program A     | 20                | 200               | 0.1
Program B     | 1000              | 100               | 10.0
Program C     | 1500              | 150               | 10.0
All 3 prog’s  | 2520              | 450               | 5.6
Analogy: If a car is driven to a city 100 km away at 100 km/hr and returns at 50 km/hr, the average speed is not (100 + 50) / 2 but is obtained from the fact that it travels 200 km in 3 hours.
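The car analogy is the harmonic mean in disguise: total distance divided by total time, not the average of the two speeds.

```python
# Round trip: 100 km out at 100 km/hr, 100 km back at 50 km/hr.
distance = 100                               # km each way
total_time = distance / 100 + distance / 50  # 1 h out + 2 h back = 3 h
avg_speed = 2 * distance / total_time        # 200 km in 3 h
assert abs(avg_speed - 200 / 3) < 1e-9       # about 66.7 km/hr, not 75
```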
Comparing the Overall Performance
Table 4.4 Measured or estimated execution times for three programs.

                | Time on machine X | Time on machine Y | Speedup of Y over X | Speedup of X over Y
Program A       | 20                | 200               | 0.1                 | 10
Program B       | 1000              | 100               | 10.0                | 0.1
Program C       | 1500              | 150               | 10.0                | 0.1
Arithmetic mean |                   |                   | 6.7                 | 3.4
Geometric mean  |                   |                   | 2.15                | 0.46

Geometric mean does not yield a measure of overall speedup, but provides an indicator that at least moves in the right direction
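The arithmetic and geometric means in the table can be recomputed directly (a sketch):

```python
# Arithmetic vs geometric mean of the per-program speedups from Table 4.4.
speedups_y_over_x = [0.1, 10.0, 10.0]
am = sum(speedups_y_over_x) / 3
gm = (speedups_y_over_x[0] * speedups_y_over_x[1] * speedups_y_over_x[2]) ** (1 / 3)
assert round(am, 1) == 6.7 and round(gm, 2) == 2.15

# Unlike the arithmetic mean, the geometric mean is consistent under inversion:
gm_x_over_y = (10 * 0.1 * 0.1) ** (1 / 3)
assert abs(gm * gm_x_over_y - 1.0) < 1e-9   # GM(Y/X) * GM(X/Y) = 1
```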
4.6 The Quest for Higher Performance
State of available computing power ca. the early 2000s:
Gigaflops on the desktop
Teraflops in the supercomputer center
Petaflops on the drawing board

Note on terminology (see Table 3.1)
Prefixes for large units: Kilo = 10^3, Mega = 10^6, Giga = 10^9, Tera = 10^12, Peta = 10^15
For memory: K = 2^10 = 1024, M = 2^20, G = 2^30, T = 2^40, P = 2^50
Prefixes for small units: micro = 10^–6, nano = 10^–9, pico = 10^–12, femto = 10^–15
Supercomputers
Figure 4.7 Exponential growth of supercomputer performance.
Log-scale plot of supercomputer performance (MFLOPS to PFLOPS) vs calendar year (1980–2010): vector supercomputers (Cray X-MP, Y-MP) give way to massively parallel processors (CM-2, CM-5, then $30M and $240M MPPs).
The Most Powerful Computers
Figure 4.8 Milestones in the DOE’s Accelerated Strategic Computing Initiative (ASCI) program with extrapolation up to the PFLOPS level.
Log-scale plot of performance (1–1000 TFLOPS) vs calendar year (1995–2010), each machine passing through Plan, Develop, and Use phases: ASCI Red (1+ TFLOPS, 0.5 TB), ASCI Blue (3+ TFLOPS, 1.5 TB), ASCI White (10+ TFLOPS, 5 TB), ASCI Q (30+ TFLOPS, 10 TB), ASCI Purple (100+ TFLOPS, 20 TB).
Performance is Important, But It Isn’t Everything
Figure 25.1 Trend in energy consumption per MIPS of computational power in general-purpose processors and DSPs.
Log-scale plot (kIPS to TIPS) vs calendar year (1980–2010): absolute processor performance keeps rising, while GP processor performance per Watt and DSP performance per Watt are shown for comparison.