Part 2

Part 2. Tanenbaum, Structured Computer Organization, Fifth Edition, (c) 2006 Pearson Education, Inc. All rights reserved. 0-13-148521-0 A five-level memory

A five-level memory hierarchy.

Note cost vs. size.

1. All instructions are directly executed by hardware.

2. Maximize the rate at which instructions are issued.

3. Instructions should be easy to decode.

4. Only loads and stores should reference memory.

5. Provide many registers.

1. All instructions are directly executed by hardware.

Eliminate the microcode interpreter

2. Maximize the rate at which instructions are issued.

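If you can issue 500 million instructions per second, you have a 500-MIPS machine, no matter how long each instruction takes to complete.

Parallelism is the key to achieving this.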

3. Instructions should be easy to decode.

Made possible by regular, fixed-length instructions with a small number of fields.

Fewer instructions are better.

Fewer instruction formats are better.

4. Only loads and stores should reference memory.

Memory access takes a long time.

Most instructions should use registers.

Use separate operations for load and store; they can then be done in parallel with other instructions.

5. Provide many registers.

At least 32!

It is time-consuming to save registers to memory temporarily and reload them later.

Ways to increase speed:

a. increase the clock speed

b. parallelism types:

1. processor/core level

2. instruction level

Fetching instructions from memory is slow.

So use a Prefetch Buffer = set of registers (memory) containing instructions to be executed.

Fetch and execution can now be done in parallel!

A five-stage pipeline. The state of each stage as a function of time. Nine clock cycles are illustrated.
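
The figure did not survive extraction, so here is a minimal sketch of how such a timing table could be generated. It assumes an idealized pipeline with no stalls, in which instruction i enters stage S1 in clock cycle i and advances one stage per cycle. The names `pipeline_table` and `format_table` are hypothetical and are not taken from the slides.

```python
def pipeline_table(num_stages=5, num_cycles=9):
    """Return rows[s][c]: the instruction number in stage s+1 during cycle c+1.

    Assumes an ideal pipeline (no stalls): instruction i enters stage 1
    in cycle i and advances one stage per cycle. None means the stage is idle.
    """
    rows = []
    for stage in range(1, num_stages + 1):
        row = []
        for cycle in range(1, num_cycles + 1):
            instr = cycle - stage + 1  # instruction occupying this stage now
            row.append(instr if instr >= 1 else None)
        rows.append(row)
    return rows


def format_table(rows):
    """Render the table as text: one line per stage, one column per cycle."""
    lines = []
    for s, row in enumerate(rows, start=1):
        cells = " ".join(f"{i:>2}" if i is not None else " ." for i in row)
        lines.append(f"S{s}: {cells}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_table(pipeline_table()))
```

Under these assumptions all five stages are busy from cycle 5 onward, and from then on one instruction completes in every cycle.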

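Latency = time to execute one instruction from start to finish

Bandwidth = number of instructions completed per second, typically measured in MIPS (millions of instructions per second)

Cycle time = time to move through 1 stage of the pipeline = one clock cycle (clock period) = 1 / clock rate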
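
These definitions can be tied together in one relation. This is a sketch under stated assumptions: the symbols n (number of pipeline stages) and T (cycle time in nsec) are introduced here for illustration, and the bandwidth figure assumes the pipeline is kept full, so that one instruction completes every cycle.

```latex
\[
\text{latency} = n \cdot T \ \text{nsec},
\qquad
\text{bandwidth} = \frac{1\ \text{instruction}}{T \times 10^{-9}\ \text{sec}} = \frac{1000}{T}\ \text{MIPS}
\]
```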

Problem: Let the cycle time = 3 nsec/stage, and let the execution of each instruction require 6 stages (steps).

a. What is the bandwidth in MIPS for a machine without any pipeline (i.e., without any instruction-level parallelism)?

b. What is the bandwidth in MIPS for a machine with a pipeline?

Answer to (a):

\[
\frac{6\ \text{stages}}{1\ \text{instruction}} \times \frac{3 \times 10^{-9}\ \text{seconds}}{1\ \text{stage}} = \frac{18 \times 10^{-9}\ \text{seconds}}{1\ \text{instruction}}
\]

1 inst / (18 × 10^-9 sec) ≈ 56 MIPS

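Answer to (b): with the pipeline full, one instruction completes every cycle, so the time per instruction is 3 × 10^-9 sec/inst.

1 inst / (3 × 10^-9 sec) ≈ 333 MIPS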
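
The arithmetic for both parts of the problem can be checked with a short sketch. It assumes an ideal machine: without a pipeline, one instruction finishes every 6 cycles; with a full pipeline, one finishes every cycle. The function names `bandwidth_mips` and `latency_ns` are hypothetical helpers, not from the slides.

```python
def bandwidth_mips(cycle_time_ns, stages, pipelined):
    """Instruction throughput in MIPS for an ideal machine.

    Without a pipeline, one instruction finishes every `stages` cycles;
    with a full pipeline, one instruction finishes every cycle.
    """
    ns_per_instruction = cycle_time_ns if pipelined else stages * cycle_time_ns
    # 1 instruction per nsec equals 1000 MIPS, so MIPS = 1000 / (nsec per instruction).
    return 1000.0 / ns_per_instruction


def latency_ns(cycle_time_ns, stages):
    """Time for one instruction to pass through every stage."""
    return stages * cycle_time_ns


if __name__ == "__main__":
    print(round(bandwidth_mips(3, 6, pipelined=False)))  # part (a)
    print(round(bandwidth_mips(3, 6, pipelined=True)))   # part (b)
    print(latency_ns(3, 6))                              # identical in both cases
```

Note that pipelining raises the bandwidth by a factor of 6 (the number of stages), while the latency of an individual instruction stays at 18 nsec.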

Dual five-stage pipelines with a common instruction fetch unit.

The instruction fetch unit fetches pairs of instructions and puts each one into its own pipeline.

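Note: Since two instructions can be executed at the same time (stage S4), they must not conflict over resource usage (e.g., registers), and neither may depend on the result of the other.

How can we ensure this? Either (1) extra hardware detects and eliminates conflicts during execution, or (2) the compiler guarantees that paired instructions are compatible.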
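
Whether the check is done in hardware or by the compiler, it amounts to comparing the registers that the two instructions read and write. The following is a minimal sketch of that idea, assuming a hypothetical three-register instruction format (`dest = src1 op src2`). The `Instr` class, the `can_pair` function, and the register names are illustrative and do not describe any real processor's pairing rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical three-register instruction: dest = src1 op src2."""
    op: str
    dest: str
    src1: str
    src2: str


def can_pair(first: Instr, second: Instr) -> bool:
    """Return True if the two instructions may execute side by side.

    Only register conflicts are checked; conflicts over functional
    units or memory are ignored in this sketch.
    """
    first_srcs = {first.src1, first.src2}
    second_srcs = {second.src1, second.src2}
    if first.dest in second_srcs:   # second needs the result of first
        return False
    if second.dest in first_srcs:   # second would overwrite an operand of first
        return False
    if first.dest == second.dest:   # both write the same register
        return False
    return True


if __name__ == "__main__":
    add = Instr("add", "R3", "R1", "R2")
    independent = Instr("sub", "R6", "R4", "R5")
    dependent = Instr("sub", "R4", "R3", "R5")  # reads R3, which add writes
    print(can_pair(add, independent))
    print(can_pair(add, dependent))
```

A compiler could apply a test like this when ordering instructions, while hardware would perform the equivalent comparison at issue time.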

386 – no pipeline

486 – one pipeline

First-generation Pentium – two 5-stage pipelines:

1. u pipeline – can execute any instruction

2. v pipeline – limited; only integer instructions or FXCH

P4 – 20 stages

“The later "Prescott" and "Cedar Mill" Pentium 4 cores (and their Pentium D derivatives) had a 31-stage pipeline, the longest in mainstream consumer computing.” - http://en.wikipedia.org/wiki/Instruction_pipeline

Nehalem (16 pipeline stages), Enhanced Core, and Sandy Bridge microarchitecture (next few slides; see http://www.intel.com/content/dam/doc/manual/64-ia-32-architectures-optimization-manual.pdf)

A superscalar processor with five functional units.

S3 issues an instruction every clock cycle.

S4 may require more than 1 clock cycle.