A bit about computer architecture CS 147, Fall Semester 2007 Robert Correll

Overview

RISC microprocessor design
Diagnostic testing
Software development
Microprocessor features
System-on-Chip (SoC)

RISC microprocessor design

12 members on the team:
o Design Manager (1)
o ASIC Design Engineers (9)
o Diagnostics Manager (1)
o Software Engineer (1)

Culture:
o High-tech (Verilog)
o Very quiet

Embedded 32-bit Microprocessor Earns Editor's Choice Award

Microprocessor Report Names IDT’s RC32364 Best Embedded Processor for Price/Performance (Volume 12, Number 7, June 1, 1998)

Embedded processor-based applications
o Low-end routers and switches
o Cellular base stations
o Consumer multimedia game systems

Device Overview

MIPS-II RISC architecture with enhancements

o Scalar 5-stage pipeline minimizes branch and load delays

o DSP engine capable of one multiply-accumulate instruction every 2 clock cycles
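As a rough illustration of the multiply-accumulate rate quoted above, the following sketch computes a dot product and a lower bound on the cycles spent in the multiply-accumulate unit. It assumes one multiply-accumulate every 2 clock cycles and ignores loads, loop overhead, and stalls. The function name and sample data are hypothetical.

```python
def dot_product_with_cycle_estimate(xs, ys, cycles_per_mac=2):
    """Return (dot product, estimated multiply-accumulate cycles).

    cycles_per_mac=2 reflects the "one multiply-accumulate every 2 clock
    cycles" figure. Loads, branches, and stalls are ignored (assumption).
    """
    if len(xs) != len(ys):
        raise ValueError("vectors must have the same length")
    acc = 0
    for x, y in zip(xs, ys):
        acc += x * y  # one multiply-accumulate per element pair
    return acc, cycles_per_mac * len(xs)


result, cycles = dot_product_with_cycle_estimate([1, 2, 3, 4], [5, 6, 7, 8])
print(result, cycles)
```

A 4-element dot product therefore needs at least 8 cycles under these assumptions; a real kernel would need more once operand loads are counted.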

Device Overview (continued)

Enhanced instruction set architecture
o MIPS-IV compatible conditional move instructions
o MIPS-IV superset PREF (prefetch) instruction
o Fast multiplier with atomic multiply-add, multiply-sub
o Count leading zero/one instructions
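The behavior of the instruction-set enhancements listed above can be sketched in a few lines. This is a minimal model of the generic semantics (count leading zeros/ones, conditional move, signed multiply-add into a 64-bit accumulator), not the RC32364 encodings; the function names are chosen for readability and are assumptions.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def clz(x):
    """Count leading zero bits of a 32-bit word (32 when x is 0)."""
    return 32 - (x & MASK32).bit_length()


def clo(x):
    """Count leading one bits of a 32-bit word."""
    return clz(~x & MASK32)


def movn(rd, rs, rt):
    """Conditional move: the new rd is rs if rt is nonzero, else rd is kept."""
    return rs if rt != 0 else rd


def movz(rd, rs, rt):
    """Conditional move: the new rd is rs if rt is zero, else rd is kept."""
    return rs if rt == 0 else rd


def to_signed32(x):
    """Interpret the low 32 bits of x as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def multiply_add(acc, rs, rt):
    """Add the signed 32x32 product to a 64-bit accumulator, wrapping at 64 bits."""
    return (acc + to_signed32(rs) * to_signed32(rt)) & MASK64


print(clz(1), clo(0xF0000000), multiply_add(10, 3, 0xFFFFFFFF))
```

Conditional moves let short if/else sequences avoid a branch, and count-leading-zeros is a common building block for normalization and priority encoding.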

Device Overview (continued)

Large, efficient on-chip caches
o Separate 8KB Instruction cache and 2KB Data cache
o 2-way set associative
o Write-back and write-through support on a per page basis
o Optional cache locking, with per line resolution, to facilitate deterministic response
o Simultaneous instruction and data fetch in each clock cycle, achieves over 1 GB/sec bandwidth
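To make the cache parameters concrete, the sketch below derives the number of sets for an 8KB and a 2KB 2-way set-associative cache and splits an address into tag, set index, and byte offset. The line size is not stated on the slide, so the 16-byte value here is purely an assumption for illustration, as are the function names.

```python
def cache_geometry(size_bytes, ways, line_bytes):
    """Return (number of sets, offset bits, index bits) for a set-associative cache."""
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1  # log2, valid for powers of two
    index_bits = sets.bit_length() - 1
    return sets, offset_bits, index_bits


def split_address(addr, offset_bits, index_bits):
    """Split a physical address into (tag, set index, byte offset)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


LINE_BYTES = 16  # assumption for illustration; not given on the slide

icache = cache_geometry(8 * 1024, ways=2, line_bytes=LINE_BYTES)
dcache = cache_geometry(2 * 1024, ways=2, line_bytes=LINE_BYTES)
print(icache, dcache)
print(split_address(0x1234, icache[1], icache[2]))
```

With a different line size the set counts change, but the split into tag, index, and offset works the same way.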

Device Overview (continued)

Flexible MMU with 32-page TLB
o Variable page size
o Enhanced write algorithm support
o Variable number of locked entries
o No performance penalty for address translation

Device Overview (continued)

Flexible bus interface allows simple, low-cost designs

o Bus interface runs at a fraction of pipeline rate
o Programmable port-width interface (8-, 16-, 32-bit memory and I/O regions)
o Programmable bus turnaround (BTA) times
o Supports single datum or burst transactions
o Selectable system byte-ordering
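The effect of port width and byte ordering can be illustrated by splitting one 32-bit word into the transfers a narrower port would need. This is a simplified sketch: it lists bytes in ascending address order and does not model the actual bus protocol, byte lanes, or timing. The function name is hypothetical.

```python
import struct


def bus_beats(value, port_bits, big_endian=True):
    """Split a 32-bit word into per-transfer chunks for an 8-, 16-, or 32-bit port.

    Each chunk is shown as hex, with bytes in ascending address order.
    """
    if port_bits not in (8, 16, 32):
        raise ValueError("port width must be 8, 16, or 32 bits")
    raw = struct.pack(">I" if big_endian else "<I", value & 0xFFFFFFFF)
    step = port_bits // 8
    return [raw[i:i + step].hex() for i in range(0, 4, step)]


print(bus_beats(0x11223344, 8))
print(bus_beats(0x11223344, 16))
print(bus_beats(0x11223344, 8, big_endian=False))
```

An 8-bit region needs four transfers per word where a 32-bit region needs one, which is the trade-off behind pairing low-cost narrow devices with a programmable port width.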

RC32364 Block Diagram

Diagnostic Testing

Began with 300 tests and behavior model
Downloaded 10 to 40 new tests per day
One test per directory
Build each test
Run each test on an RTL model
Debug and track failures
Finished with more than 3,000 tests

Software Development

Test Release System
o Automated regression process
o Distributed jobs based upon cycle counts
o Provided customized history reports

Accumulated load per signal utility
Test vectors
Many other value-added scripts
Diagnostic tests
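One way to distribute regression jobs based upon cycle counts is a greedy longest-first assignment: sort tests by their recorded cycle count and always hand the next test to the least-loaded host. The sketch below is an assumption about how such a scheduler could work, not a description of the original Test Release System; the test names and cycle counts are made up.

```python
import heapq


def distribute_by_cycle_count(tests, num_hosts):
    """Assign tests to hosts so total cycle counts stay roughly balanced.

    tests maps a test name to its cycle count from a previous run.
    Returns {host index: (total cycles, [test names])}.
    """
    # Heap of (current load, host index, assigned tests); host index breaks ties.
    heap = [(0, host, []) for host in range(num_hosts)]
    heapq.heapify(heap)
    # Longest tests first, name as a tie-breaker for a deterministic result.
    for name, cycles in sorted(tests.items(), key=lambda kv: (-kv[1], kv[0])):
        load, host, assigned = heapq.heappop(heap)
        assigned.append(name)
        heapq.heappush(heap, (load + cycles, host, assigned))
    return {host: (load, assigned) for load, host, assigned in heap}


example = {"cache_lock": 900, "tlb_refill": 400, "ll_sc": 300, "madd": 200, "clz": 100}
print(distribute_by_cycle_count(example, num_hosts=2))
```

Balancing on cycle counts rather than test counts keeps one long-running simulation from holding up the whole regression.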

CPU Instruction Set

Load Link / Store Conditional Opcodes

    li    $9, 1
    sw    $9, 0($6)
    .word 0xc0850000        # opcode for: ll $5, 0($4)
    bne   $5, $0, Fail      # verify sem = 0
    li    $5, 2
    li    $9, 2
    sw    $9, 0($6)
    .word 0xe0850000        # opcode for: sc $5, 0($4)
    bne   $5, $8, Fail      # verify sc indicates success
    li    $8, 2
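The load-linked/store-conditional pair exercised by the fragment above can be modeled with a toy memory that tracks a single reservation. This is a deliberately simplified sketch: in hardware the link can be broken by events such as another agent writing the location or an exception return, whereas this model breaks it only on a store to the linked address. Class and method names are hypothetical.

```python
class LinkedMemory:
    """Toy word-addressed memory with a single load-linked reservation."""

    def __init__(self):
        self.words = {}
        self.link_addr = None  # address reserved by the last ll, if any

    def store(self, addr, value):
        """Ordinary store; a store to the linked address breaks the link."""
        self.words[addr] = value
        if addr == self.link_addr:
            self.link_addr = None

    def load_linked(self, addr):
        """ll: load a word and remember its address."""
        self.link_addr = addr
        return self.words.get(addr, 0)

    def store_conditional(self, addr, value):
        """sc: store only if the link is intact; return 1 on success, 0 on failure."""
        if self.link_addr != addr:
            return 0
        self.words[addr] = value
        self.link_addr = None
        return 1


mem = LinkedMemory()
mem.store(0x100, 0)                      # semaphore starts at 0
old = mem.load_linked(0x100)
mem.store(0x200, 2)                      # unrelated store leaves the link intact
print(old, mem.store_conditional(0x100, 2))
```

Software typically retries the ll/sc sequence in a loop until the store conditional reports success.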

CPU Pipeline Architecture

CPU Pipeline Stages

1I - Instruction Fetch, Phase one
o Instruction address translation begins

2I - Instruction Fetch, Phase two
o Instruction cache fetch begins
o Instruction address translation continues

CPU Pipeline Stages (continued)

1R - Register Fetch, Phase one
o The instruction cache fetch finishes.
o The instruction cache tag is checked against the physical page frame number obtained from the address translation.

CPU Pipeline Stages (continued)

2R - Register Fetch, Phase two
o The instruction decoder decodes the instruction.
o Any required operands are fetched from the register file.
o A decision is made to either issue or slip (for an interlock condition).
o For a branch, the branch address is calculated.

CPU Pipeline Stages (continued)

1A - Execution, Phase one
o Any result from the A or D stages is bypassed.
o The arithmetic logic unit (ALU) starts the integer arithmetic, logical or shift operation.
o The ALU calculates the data virtual address for load and store instructions.
o The ALU determines whether the branch condition is true.

CPU Pipeline Stages (continued)

2A - Execution, Phase two
o The integer arithmetic, logical or shift operation will complete.
o A data cache access will start.
o Store data is shifted to the specified byte position(s).
o The data virtual to physical address translation will start.

CPU Pipeline Stages (continued)

1D - Data Fetch, Phase one
o The data cache access will continue.
o The data address translation completes.

2D - Data Fetch, Phase two
o The data cache access will finish and the data is then shifted down and extended.
o The data cache tag is checked against the physical address for any data cache access.

CPU Pipeline Stages (continued)

1W - Write Back, Phase one
o The processor uses this phase internally to resolve all exceptions in preparation for the register file write.

2W - Write Back, Phase two
o For register-to-register and load instructions, the result is written back to the register file.
o Branch instructions perform no operation during this stage.
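The overlap between instructions in the five stages described above (I, R, A, D, W) can be visualized with a small text diagram. The sketch assumes one instruction enters the pipeline per cycle with no stalls or slips and does not show the two phases within each stage; the instruction names are only labels.

```python
STAGES = ["I", "R", "A", "D", "W"]


def pipeline_diagram(instructions):
    """Return text rows showing which stage each instruction occupies per cycle.

    Assumes one instruction issues per cycle with no stalls or slips.
    """
    total_cycles = len(instructions) + len(STAGES) - 1
    rows = []
    for i, name in enumerate(instructions):
        cells = ["."] * total_cycles
        for s, stage in enumerate(STAGES):
            cells[i + s] = stage  # instruction i enters stage s in cycle i + s
        rows.append(f"{name:<6}" + " ".join(cells))
    return rows


for row in pipeline_diagram(["lw", "add", "bne"]):
    print(row)
```

Each column is one clock cycle; reading down a column shows that up to five instructions are in flight at once.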

Activities during each ALU pipeline stage...

...for load, store, and branch instructions.

Stall Conditions

Detected after the R pipe-stage. The processor will resolve the condition.
o Detect cache miss
o Start moving dirty cache line data to write buffer
o Get first doubleword into cache and restart pipeline
o Load remainder of cache line into cache

Slip Conditions

Slipped instructions are retried on subsequent cycles

o Detect cache miss
o Get entire cache line into cache
o Continue pipeline
o Inserted NOP instructions

Memory Management Unit (MMU)

Generates translation lookaside buffer (TLB) exceptions such as:
o TLB refill
o TLB invalid
o TLB modified

Offers the following advantages:
o Variable page size
o Enhanced Write Algorithm support
o Mapping of a larger portion of the virtual address space
o Variable number of locked entries
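The three TLB exceptions and the variable page size can be illustrated with a small lookup model. This is a minimal sketch with hypothetical field names: each entry maps a single page, and details such as address space identifiers, global bits, and paired even/odd pages are left out.

```python
from dataclasses import dataclass


class TLBRefill(Exception):
    """No TLB entry matches the virtual address."""


class TLBInvalid(Exception):
    """An entry matches but is not marked valid."""


class TLBModified(Exception):
    """A store hit an entry that is not marked writable (dirty)."""


@dataclass
class TLBEntry:
    vbase: int       # virtual base address, aligned to page_size
    pbase: int       # physical base address, aligned to page_size
    page_size: int   # bytes, a power of two; may differ per entry
    valid: bool = True
    dirty: bool = True


def translate(tlb, vaddr, is_store=False):
    """Translate vaddr using a list of TLBEntry objects, or raise a TLB exception."""
    for entry in tlb:
        offset_mask = entry.page_size - 1
        if (vaddr & ~offset_mask) == entry.vbase:
            if not entry.valid:
                raise TLBInvalid(hex(vaddr))
            if is_store and not entry.dirty:
                raise TLBModified(hex(vaddr))
            return entry.pbase | (vaddr & offset_mask)
    raise TLBRefill(hex(vaddr))


tlb = [
    TLBEntry(vbase=0x00400000, pbase=0x01000000, page_size=4096),
    TLBEntry(vbase=0x10000000, pbase=0x02000000, page_size=1 << 20, dirty=False),
]
print(hex(translate(tlb, 0x00400123)), hex(translate(tlb, 0x10012345)))
```

A larger page size lets a single entry cover more of the virtual address space, which is how a small TLB can map a larger region.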

32-bit Virtual Address Translation

TLB Register Format

TLB Register Field Descriptions

MMU Register Descriptions

Range of wired and random entries

User Mode Address Space

Kernel Mode Address Space

CPU Exception Processing

Begins when the processor receives and detects exceptions such as:

o address translation errors
o arithmetic overflows
o I/O interrupts
o system calls

Processor suspends normal instruction sequence and enters Kernel mode

CPU Exception Processing (continued)

The processor then disables interrupts and forces execution of a software handler, which is located at a fixed address.

The handler may save processor context:
o program counter contents
o current operating mode (User or Kernel mode)
o interrupt status (enabled or disabled)
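The exception entry sequence described above can be sketched as a small state model: record the context, enter Kernel mode, disable interrupts, and jump to the handler. The handler address below is a placeholder assumption, and the class and function names are hypothetical; the real processor keeps this state in its coprocessor 0 registers rather than in a dictionary.

```python
from dataclasses import dataclass

HANDLER_ADDRESS = 0x80000180  # placeholder fixed vector; an assumption


@dataclass
class CPUState:
    pc: int
    kernel_mode: bool = False
    interrupts_enabled: bool = True


def take_exception(cpu, cause):
    """Model exception entry and return the context a handler might save."""
    saved = {
        "cause": cause,
        "pc": cpu.pc,
        "kernel_mode": cpu.kernel_mode,
        "interrupts_enabled": cpu.interrupts_enabled,
    }
    cpu.kernel_mode = True          # enter Kernel mode
    cpu.interrupts_enabled = False  # disable interrupts
    cpu.pc = HANDLER_ADDRESS        # force execution of the handler
    return saved


def return_from_exception(cpu, saved):
    """Restore the saved context (branch delay slots are not modeled)."""
    cpu.pc = saved["pc"]
    cpu.kernel_mode = saved["kernel_mode"]
    cpu.interrupts_enabled = saved["interrupts_enabled"]


cpu = CPUState(pc=0x00401000)
context = take_exception(cpu, "arithmetic overflow")
print(hex(cpu.pc), cpu.kernel_mode, cpu.interrupts_enabled)
return_from_exception(cpu, context)
print(hex(cpu.pc), cpu.kernel_mode, cpu.interrupts_enabled)
```

Saving the program counter, operating mode, and interrupt status is what allows the handler to resume the interrupted program afterwards.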

Exception Processing Registers...

Basic CP0 Registers

Exception Priority

Cache Organization, Operation, and Coherency

Primary I-Cache Line Format

Primary D-Cache Line Format

Conceptual Primary Cache Lookup Seq.

Primary Cache Data and Tag Organization

Primary Cache States

Clocking, Reset, and Initialization Interfaces

Timing Illustration of MasterClock-to-PClock Multiply by 2

EJTAG (In-circuit Emulator) Interface

EJTAG Block Diagram

System-on-Chip (SoC)

SoC (continued)

SoC (continued)

Summary

RISC microprocessor design
Diagnostic testing
Software development
Microprocessor features
System-on-Chip (SoC)

References

IDT™ 79RC32364 RISController™ Advanced Architecture, 32-bit Embedded Microprocessor, User’s Reference Manual, 1999, http://www.idt.com/products/files/10750/79RC32364_MA_38374.pdf?CFID=1729583&CFTOKEN=95787432

IDT™ Interprise™ 79RC32351 Integrated Communications Processor Data Sheet, 2004 http://www.idt.com/products/files/10702/RC32351_DS_23066.pdf?CFID=1729583&CFTOKEN=95787432

References (continued)

IDT™ Interprise™ 79RC32365 Integrated Communications Processors User Reference Manual, 2004, http://www.idt.com/products/files/10712/79RC32365_MA_12022.pdf?CFID=1729583&CFTOKEN=95787432

IDT™ Interprise™ 79RC32435 Integrated Communications Processor Data Sheet, 2006, http://www.idt.com/products/files/571508/32435_ds.pdf?CFID=1729583&CFTOKEN=95787432
