BL Eloadas1 2prez

  • 8/13/2019 BL Eloadas1 2prez

    1

    Performance, cost, and energy-efficiency analysis of embedded processor architectures

    2

    Architecture topics

    Instruction Set Architecture

    Pipelining, hazard resolution, superscalar execution, scheduling, prediction, speculation,

    vector processing, VLIW, DSP, reconfiguration

    Addressing, protection mechanisms, exception handling

    L1 Cache

    L2 Cache

    DRAM

    Disks, WORM, tape

    Coherence, bandwidth, latency

    Emerging technologies, interleaving

    Bus protocols

    RAID

    VLSI

    Input/Output and Storage

    Memory Hierarchy

    Pipelining and Instruction-Level Parallelism

    3

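    Architecture topics

    [Figure: multiprocessor organization, with processor (P) and memory (M) nodes attached through a switch (S) to an interconnection network.]

    Topologies, routing, bandwidth, latency, reliability

    Network interfaces

    Shared memory, message passing, data parallelism

    Processor-Memory-Switch

    Multiprocessors

    Networks and interconnections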

    4

    The secret of successful architecture design: measurement and evaluation

    Design

    Analysis

    Architecture design is an iterative process: searching the space of possible designs, with analysis at every level of the embedded system

    Creativity

    Good ideas

    Mediocre ideas

    Bad ideas

    Cost/performance analysis

    5

    Design methodology

    Simulate new designs

    Technology trends

    Identify bottlenecks in existing systems

    Benchmarks

    Workloads

    Implement next-generation systems

    Implementation complexity

    Analysis

    Design

    Implementation

    6

    Measurement tools

    Hardware: cost, delay, resources, performance estimation

    Benchmarks, traces (execution tracing)

    Simulation (at many levels): ISA, RTL, gate, circuit

    Queuing theory

    Rules of thumb

    Fundamental laws/principles

    7

    Performance, cost, energy

    8

    Metric 1: Performance

    Time to run the task: execution time, response time, latency

    Tasks per day, hour, week, sec, ns: throughput, bandwidth

    Plane        Speed      DC to Paris   Passengers   Throughput (passenger-miles/hour)
    Boeing 747   610 mph    6.5 hours     470          286,700
    Concorde     1350 mph   3 hours       132          178,200
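    The two views of performance in the table can be checked with a few lines of Python. This is a minimal sketch: the numbers come from the table, while the dictionary layout and function names are illustrative choices, not part of the lecture.

```python
# Figures are copied from the table above; the dictionary layout is illustrative.
PLANES = {
    "Boeing 747": {"speed_mph": 610, "hours_dc_paris": 6.5, "passengers": 470},
    "Concorde": {"speed_mph": 1350, "hours_dc_paris": 3.0, "passengers": 132},
}


def throughput(plane):
    """Passenger-miles per hour: passengers carried times cruising speed."""
    return plane["passengers"] * plane["speed_mph"]


def ratio(larger, smaller):
    """How many times `larger` exceeds `smaller`."""
    return larger / smaller


b747 = PLANES["Boeing 747"]
concorde = PLANES["Concorde"]

# Latency view: the Concorde completes a single trip sooner.
latency_gain = ratio(b747["hours_dc_paris"], concorde["hours_dc_paris"])
# Throughput view: the 747 moves more passenger-miles per hour.
throughput_gain = ratio(throughput(b747), throughput(concorde))

print(f"Concorde is {latency_gain:.2f}x faster per trip")
print(f"Boeing 747 has {throughput_gain:.2f}x the throughput")
```

    Which plane is "faster" therefore depends on whether latency or throughput is the metric of interest.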

    11

    Example: Calculating CPI

    Base Machine (Reg / Reg), typical instruction mix

    Op       Freq   CPI_i   CPI_i × F_i   (% Time)
    ALU      50%    1       0.5           (33%)
    Load     20%    2       0.4           (27%)
    Store    10%    2       0.2           (13%)
    Branch   20%    2       0.4           (27%)
    Total CPI               1.5
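    The CPI table can be reproduced as a weighted sum. The sketch below uses the frequencies and per-class cycle counts from the table; the function names and the `MIX` dictionary are illustrative.

```python
# (frequency, cycles per instruction) for each class, taken from the table above.
MIX = {
    "ALU": (0.50, 1),
    "Load": (0.20, 2),
    "Store": (0.10, 2),
    "Branch": (0.20, 2),
}


def average_cpi(mix):
    """Average CPI = sum over classes of frequency * CPI of that class."""
    return sum(freq * cpi for freq, cpi in mix.values())


def time_share(mix):
    """Fraction of execution time spent in each instruction class."""
    total = average_cpi(mix)
    return {op: freq * cpi / total for op, (freq, cpi) in mix.items()}


print(f"CPI = {average_cpi(MIX):.2f}")
for op, share in time_share(MIX).items():
    print(f"{op:<7}{share:.0%}")
```

    Note that ALU operations are half of all instructions but only about a third of the execution time, because the other classes take two cycles each.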

    12

    How to Summarize Performance

    Arithmetic mean (weighted arithmetic mean) tracks execution time: $\frac{1}{n}\sum_i T_i$ or $\sum_i W_i T_i$ (weights $W_i$ summing to 1)

    Harmonic mean (weighted harmonic mean) of rates (e.g., MFLOPS) tracks execution time: $n / \sum_i (1/R_i)$ or $1 / \sum_i (W_i / R_i)$ (weights $W_i$ summing to 1)

    Normalized execution time is handy for scaling performance (e.g., X times faster than SPARCstation 10)

    The arithmetic mean of normalized times is affected by the choice of reference machine

    Use the geometric mean for comparison: $\left(\prod_i T_i\right)^{1/n}$

    Independent of the chosen reference machine, but not a good metric for total execution time
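    A short sketch of the three means follows. The timing numbers are hypothetical and chosen only to make the effect visible; they are not measurements from the lecture.

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def harmonic_mean(rates):
    return len(rates) / sum(1.0 / r for r in rates)


def geometric_mean(values):
    return math.prod(values) ** (1.0 / len(values))


# Hypothetical execution times (seconds) of two programs on machines A and B.
times_a = [1.0, 1000.0]
times_b = [10.0, 100.0]

# Normalize each machine to the other one.
b_relative_to_a = [b / a for a, b in zip(times_a, times_b)]  # [10.0, 0.1]
a_relative_to_b = [a / b for a, b in zip(times_a, times_b)]  # [0.1, 10.0]

# The arithmetic mean of normalized times makes each machine look slower
# than the other, depending on which one is the reference.
print(arithmetic_mean(b_relative_to_a), arithmetic_mean(a_relative_to_b))

# The geometric mean gives a consistent answer regardless of the reference.
print(geometric_mean(b_relative_to_a), geometric_mean(a_relative_to_b))

# The harmonic mean of rates tracks total time when each program does equal work.
work = 100.0                      # hypothetical operations per program
times = [1.0, 4.0]
rates = [work / t for t in times]
print(harmonic_mean(rates), (work * len(times)) / sum(times))
```

    In this example the arithmetic mean of normalized times is 5.05 in both directions, which is the inconsistency the slide warns about.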

    17

    Instruction Set Architecture (ISA)

    instruction set

    software

    hardware

    18

    Evolution of Instruction Sets

    Major advances in computer architecture are typically associated with landmark instruction set designs

    Ex: Stack vs GPR (System 360)

    Design decisions must take into account:

    technology

    machine organization

    programming languages

    compiler technology

    operating systems

    applications

    And they in turn influence these

    19

    A "Typical" RISC

    32-bit fixed format instruction (3 formats I,R,J)

    32 32-bit GPRs (R0 contains zero, double-precision values take a register pair)

    3-address, reg-reg arithmetic instruction

    Single address mode for load/store: base + displacement, no indirection

    Simple branch conditions (based on register values)

    Delayed branch

    see: SPARC, MIPS, HP PA-RISC, DEC Alpha, IBM PowerPC, CDC 6600, CDC 7600, Cray-1, Cray-2, Cray-3

    20

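    Example: MIPS (DLX)

    Instruction formats (bit 31 is the most significant bit):

    Register-Register:    Op [31:26] | Rs1 [25:21] | Rs2 [20:16] | Rd [15:11] | Opx [10:0]
    Register-Immediate:   Op [31:26] | Rs1 [25:21] | Rd [20:16] | immediate [15:0]
    Branch:               Op [31:26] | Rs1 [25:21] | Rs2/Opx [20:16] | immediate [15:0]
    Jump / Call:          Op [31:26] | target [25:0]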
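    The field boundaries above translate directly into shift-and-mask operations. The following is a minimal sketch of a field decoder; the function names are illustrative, and the opcode value 8 in the example is made up rather than taken from a real opcode table.

```python
def field(word, high, low):
    """Extract bits high..low (inclusive); bit 0 is the least significant bit."""
    return (word >> low) & ((1 << (high - low + 1)) - 1)


def sign_extend(value, width):
    """Interpret a `width`-bit unsigned value as a two's complement number."""
    sign_bit = 1 << (width - 1)
    return (value ^ sign_bit) - sign_bit


def decode_register_register(word):
    return {
        "op": field(word, 31, 26),
        "rs1": field(word, 25, 21),
        "rs2": field(word, 20, 16),
        "rd": field(word, 15, 11),
        "opx": field(word, 10, 0),
    }


def decode_register_immediate(word):
    return {
        "op": field(word, 31, 26),
        "rs1": field(word, 25, 21),
        "rd": field(word, 20, 16),
        "immediate": sign_extend(field(word, 15, 0), 16),
    }


def decode_jump(word):
    return {"op": field(word, 31, 26), "target": field(word, 25, 0)}


# Hypothetical encoding: opcode 8, Rs1 = 2, Rd = 1, immediate = -4.
word = (8 << 26) | (2 << 21) | (1 << 16) | (-4 & 0xFFFF)
print(decode_register_immediate(word))
```

    Because every format keeps the opcode and Rs1 in the same position, the decoder can read registers before it knows which format the instruction uses.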

    21

    Pipelining Lessons

    Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload

    Pipeline rate is limited by the slowest pipeline stage

    Multiple tasks operate simultaneously

    Potential speedup = number of pipe stages

    Unbalanced lengths of pipe stages reduce speedup

    Time to fill the pipeline and time to drain it reduce speedup

    [Figure: pipeline timeline. Tasks A, B, C, D in task order against time from 6 PM to 9 PM; interval lengths 30, 40, 40, 40, 40, 20 minutes.]
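    The lessons above can be illustrated with a small timing model. This sketch assumes three stages of 30, 40, and 20 minutes, which is an inference from the interval lengths in the figure (30 + 4 × 40 + 20), and it assumes tasks can wait between stages; neither assumption is stated explicitly in the lecture.

```python
def pipelined_finish_time(stage_times, n_tasks):
    """Finish time of the last task when tasks flow through the stages in order.

    A task enters a stage once it has left the previous stage and the stage
    has finished the previous task (tasks may wait between stages).
    """
    stage_free = [0] * len(stage_times)  # time at which each stage is next free
    for _ in range(n_tasks):
        ready = 0  # time at which this task is ready for its next stage
        for s, duration in enumerate(stage_times):
            start = max(ready, stage_free[s])
            ready = start + duration
            stage_free[s] = ready
    return stage_free[-1]


def sequential_finish_time(stage_times, n_tasks):
    """Finish time when each task runs to completion before the next starts."""
    return n_tasks * sum(stage_times)


STAGES = [30, 40, 20]  # minutes; assumed from the figure
TASKS = 4              # tasks A, B, C, D

pipelined = pipelined_finish_time(STAGES, TASKS)
sequential = sequential_finish_time(STAGES, TASKS)
print(pipelined, sequential, round(sequential / pipelined, 2))
```

    The speedup stays well below the stage count of 3 because the 40-minute stage sets the rate and because of fill and drain time. A single task still takes 90 minutes either way.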

    22

    5 Steps of DLX Datapath

    [Figure: DLX datapath in five steps: Instruction Fetch; Instr. Decode / Reg. Fetch; Execute / Addr. Calc; Memory Access; Write Back. Labeled blocks: instruction memory, Reg File, Sign Extend, ALU, Zero? test, Data Memory, LMD, adder (+4), MUXes. Labeled signals: Next PC, Next SEQ PC, Address, Inst, RS1, RS2, RD, Imm, WB Data.]

    31

    Data Hazard Even with Forwarding

    [Figure: pipeline diagram, time in clock cycles against instruction order, with stages Ifetch, Reg, ALU, DMem, Reg for the sequence below. A bubble appears in the rows of sub, and, and or, the instructions that follow the load.]

    lw r1, 0(r2)
    sub r4,r1,r6
    and r6,r1,r7
    or r8,r1,r9

    32

    Software Scheduling to Avoid Load Hazards

    Try producing fast code for

    a = b + c;
    d = e - f;

    assuming a, b, c, d, e, and f are in memory.

    Slow code:

    LW Rb,b
    LW Rc,c
    ADD Ra,Rb,Rc
    SW a,Ra
    LW Re,e
    LW Rf,f
    SUB Rd,Re,Rf
    SW d,Rd

    Fast code:

    LW Rb,b
    LW Rc,c
    LW Re,e
    ADD Ra,Rb,Rc
    LW Rf,f
    SW a,Ra
    SUB Rd,Re,Rf
    SW d,Rd
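    The benefit of the reordering can be counted with a simple model. The sketch below assumes a one-cycle load delay with full forwarding, so an instruction stalls only when it reads the register loaded by the instruction immediately before it. The tuple encoding of instructions is an illustrative choice.

```python
def load_use_stalls(program):
    """Count one-cycle stalls caused by using a loaded value in the next instruction.

    Each instruction is (opcode, destination register or None, source registers).
    """
    stalls = 0
    for previous, current in zip(program, program[1:]):
        prev_op, prev_dest, _ = previous
        _, _, sources = current
        if prev_op == "LW" and prev_dest in sources:
            stalls += 1
    return stalls


SLOW = [
    ("LW", "Rb", ()),
    ("LW", "Rc", ()),
    ("ADD", "Ra", ("Rb", "Rc")),
    ("SW", None, ("Ra",)),
    ("LW", "Re", ()),
    ("LW", "Rf", ()),
    ("SUB", "Rd", ("Re", "Rf")),
    ("SW", None, ("Rd",)),
]

FAST = [
    ("LW", "Rb", ()),
    ("LW", "Rc", ()),
    ("LW", "Re", ()),
    ("ADD", "Ra", ("Rb", "Rc")),
    ("LW", "Rf", ()),
    ("SW", None, ("Ra",)),
    ("SUB", "Rd", ("Re", "Rf")),
    ("SW", None, ("Rd",)),
]

print("slow code stalls:", load_use_stalls(SLOW))
print("fast code stalls:", load_use_stalls(FAST))
```

    Under this model the slow sequence stalls twice (ADD after LW Rc, SUB after LW Rf), while the fast sequence has an independent instruction after every load.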

    33

    Control Hazard on Branches: Three Stage Stall

    10: beq r1,r3,36
    14: and r2,r3,r5
    18: or r6,r1,r7
    22: add r8,r1,r9
    36: xor r10,r1,r11

    [Figure: pipeline diagram showing each instruction passing through the stages Ifetch, Reg, ALU, DMem, Reg.]

    34

    Branch Stall Impact

    If CPI = 1, 30% of instructions are branches, and each branch stalls 3 cycles => new CPI = 1 + 0.30 × 3 = 1.9!

    Two-part solution:

    Determine whether the branch is taken or not sooner, AND

    Compute the taken-branch address earlier

    DLX branch tests whether a register = 0 or ≠ 0

    DLX solution:

    Move the zero test to the ID/RF stage

    Add an adder to calculate the new PC in the ID/RF stage

    1 clock cycle penalty for a branch versus 3

    37

    Delayed Branch

    Where to get instructions to fill the branch delay slot?

    From before the branch instruction

    From the target address: only valuable when the branch is taken

    From fall-through: only valuable when the branch is not taken

    Cancelling branches allow more slots to be filled

    Compiler effectiveness for a single branch delay slot:

    Fills about 60% of branch delay slots

    About 80% of instructions executed in branch delay slots are useful in computation

    About 50% (60% x 80%) of slots usefully filled

    Delayed branch downside: 7-8 stage pipelines and multiple instructions issued per clock (superscalar)

    38

    Evaluating Branch Alternatives

    Scheduling scheme    Branch penalty   CPI    Speedup v. unpipelined   Speedup v. stall
    Stall pipeline       3                1.42   3.5                      1.0
    Predict taken        1                1.14   4.4                      1.26
    Predict not taken    1                1.09   4.5                      1.29
    Delayed branch       0.5              1.07   4.6                      1.31

    Conditional and unconditional branches = 14% of instructions; 65% of them change the PC

    $\text{Pipeline speedup} = \frac{\text{Pipeline depth}}{1 + \text{Branch frequency} \times \text{Branch penalty}}$
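    The CPI column follows from the speedup formula. The sketch below recomputes it under two assumptions that the table does not state outright: the pipeline depth is 5 (the five-stage DLX pipeline), and for "predict not taken" the one-cycle penalty is paid only by the 65% of branches that change the PC.

```python
def cpi_with_branches(branch_freq, penalty_cycles, base_cpi=1.0):
    """Effective CPI = base CPI + branch frequency * average branch penalty."""
    return base_cpi + branch_freq * penalty_cycles


def pipeline_speedup(depth, branch_freq, penalty_cycles):
    """Speedup over an unpipelined machine with the same cycle time."""
    return depth / (1.0 + branch_freq * penalty_cycles)


BRANCH_FREQ = 0.14     # conditional and unconditional branches
TAKEN_FRACTION = 0.65  # branches that change the PC
DEPTH = 5              # assumption: five-stage DLX pipeline

# Average penalty in cycles per branch for each scheme.
PENALTIES = {
    "Stall pipeline": 3.0,
    "Predict taken": 1.0,
    "Predict not taken": 1.0 * TAKEN_FRACTION,
    "Delayed branch": 0.5,
}

cpis = {name: cpi_with_branches(BRANCH_FREQ, p) for name, p in PENALTIES.items()}
for name, cpi in cpis.items():
    speedup = pipeline_speedup(DEPTH, BRANCH_FREQ, PENALTIES[name])
    print(f"{name:<18} CPI {cpi:.2f}  speedup v. unpipelined {speedup:.2f}")

# The earlier example: 30% branches, 3-cycle stall.
print(f"{cpi_with_branches(0.30, 3):.1f}")
```

    The computed speedups may differ from the table in the last digit, since the table values appear to be truncated rather than rounded.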

    39

    Summary 2

    Just overlap tasks; easy if tasks are independent

    Speedup ≤ pipeline depth; if ideal CPI is 1, then:

    $\text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall CPI}} \times \frac{\text{Cycle time}_{\text{unpipelined}}}{\text{Cycle time}_{\text{pipelined}}}$

    Hazards limit performance on computers:

    Structural: need more HW resources

    Data (RAW, WAR, WAW): need forwarding, compiler scheduling

    Control: delayed branch, prediction

    40

    PowerPC Architecture

    41

    Introduction

    o PowerPC (Performance Optimization With Enhanced RISC - Performance Computing) is a RISC architecture created by the Apple-IBM-Motorola (AIM) alliance in 1991.

    o The original idea for the PowerPC architecture came from IBM's POWER architecture (introduced in the RISC System/6000), and PowerPC retains a high level of compatibility with it.

    o The intention was to build a high-performance, superscalar, low-cost processor.

    42

    History

    o The history of the PowerPC began with IBM's 801 prototype chip, built on the RISC ideas of John Cocke (IBM Watson Research Lab) in the late 1970s (with further refinements developed by David Patterson).

    o 801-based cores were used in a number of IBM embedded products, eventually becoming the 16-register ROMP processor (Research Office Products Division Micro Processor, a 10 MHz RISC microprocessor designed by IBM in the early 1980s) used in the IBM RT (a computer workstation by IBM).

    o The RT had disappointing performance, and IBM started a project to build the fastest processor on the market. The result was the POWER architecture, introduced with the RISC System/6000 in early 1990.

    43

    History: POWER architecture

    The POWER architecture incorporated many of the RISC characteristics:

    fixed-length instructions,

    register-to-register architecture,

    simple addressing modes,

    large general register file,

    three-operand instruction format.

    Additionally, it has other features more characteristic of more complex ISAs.

    44

    Power Architecture

    o Designed to be superscalar: instructions are dispatched across three independent units (branch, fixed-point arithmetic, and floating-point). This allows out-of-order execution.

    o Compound instructions: a load or store can update the base register with the newly calculated effective address, eliminating the extra add instructions otherwise needed to increment the index for array traversals.

    o Does not implement delayed branches: instead, the POWER architecture uses a branch target buffer and the now well-known branch folding technique.

    o Branching technique: the POWER architecture has eight condition registers that are set by compare instructions. One additional bit in the opcode of each instruction signaled that the instruction should be executed only under certain conditions, a form of predicated execution.

    45

    Shortfalls

    o The original POWER microprocessor, one of the first superscalar RISC implementations, was a high-performance, multi-chip design.

    o IBM soon realized that it would need a single-chip microprocessor to scale its RS/6000 line from low-end to high-end machines.

    o Work began on a single-chip POWER microprocessor, called the RSC (RISC Single Chip). In early 1991 IBM realized that the design could potentially become a high-volume microprocessor used across the industry.

    46

    PowerPC Architecture

    o In order to maintain RS/6000 software compatibility, the PowerPC adapted the POWER architecture, and many enhancements were added to provide a low-cost, single-chip, superscalar, multiprocessor-capable, 64-bit processor.

    Several bit/field instructions that use three source operands were eliminated to avoid the need for extra register ports.

    Complex string instructions were left out, consistent with the RISC philosophy.

    Instructions whose operation depended on the value of a source operand were eliminated.

    Precision shifts, integer multiplies, and divide-with-remainder instructions were omitted.

    Support for operation in both big-endian and little-endian modes

    Single- and double-precision floating-point arithmetic

    64-bit architecture, backward compatible with 32-bit

    47

    PowerPC family

    o PowerPC 601: medium-sized, medium-performance processor

    includes a more sophisticated branch unit

    capable of dispatching three instructions per cycle, out of order

    up to 8 instructions per cycle can be fetched directly into an eight-entry instruction queue (IQ), where they are decoded before being dispatched to the execution core

    Branch folding:

    The instruction queue is used for detecting and dealing with branches. The branch unit scans the bottom four entries of the queue, identifying branch instructions and determining what type they are (conditional, unconditional).

    In cases where the branch unit has enough information to resolve the branch immediately (an unconditional branch, or a conditional branch whose condition depends on information that is already in the condition register), the branch instruction is simply deleted from the instruction queue and replaced with the instruction located at the branch target.

    o PowerPC 603: smaller die size than the 601

    smaller cache

    capable of dispatching three instructions per cycle, out of order

    48

    Current Status

    PowerPC e200 - 32-bit Power Architecture microprocessor - speeds ranging up to 600 MHz - ideal for embedded applications.

    PowerPC e300 - similar to the e200, with speeds up to 667 MHz.

    PowerPC e600 - speeds up to 2 GHz - ideal for high-performance routing and telecommunications applications.

    POWER5 - IBM dual-core microprocessor

    POWER6 - IBM dual-core microprocessor - a notable difference from POWER5 is that the POWER6 executes instructions in order instead of out of order

    PowerPC G3 - used in Apple Macintosh computers such as the PowerBook G3, the multicolored iMacs, iBooks, and several desktops, including both the Beige and the Blue and White Power Macintosh G3s.

    PowerPC G4 - a designation used by Apple Computer for a fourth generation of 32-bit PowerPC microprocessors.

    PowerPC G5 - 64-bit Power Architecture processors

    Xenon - based on IBM's PowerPC ISA - Xbox 360 game console.

    Broadway - based on IBM's PowerPC ISA - Nintendo Wii gaming console

    Blue Gene/L - dual core PowerPC 440, 700 MHz, 2004

    Blue Gene/P - quad core PowerPC 450, 850 MHz, 2007

    51

    PowerPC Registers

    PowerPC's application-level registers are broken into three categories: general-purpose, floating-point, and special-purpose registers.

    o General-purpose registers (GPRs) - r0 to r31

    flat scheme of 32 general-purpose registers

    source and destination for all integer operations

    address source for all load/store operations

    they also provide access to SPRs

    All GPRs are available for use, with one exception: in certain instructions, GPR0 simply means the value 0, and no lookup is done for GPR0's contents.

    o Some of these registers have special tasks assigned to them:

    r0        Volatile register which may be modified during function linkage
    r1        Stack frame pointer, always valid
    r2        System-reserved register
    r3-r4     Volatile registers used for parameter passing and return values
    r5-r10    Volatile registers used for parameter passing
    r11-r12   Volatile registers which may be modified during function linkage
    r13       Small data area pointer register
    r14-r30   Registers used for local variables
    r31       Used for local variables or "environment pointers"

    52

    Floating point registers

    o Floating-point registers (FPRs) - fr0 to fr31

    32 floating-point registers with 64-bit precision

    source and destination operands of all floating-point operations

    can contain 32-bit and 64-bit signed and unsigned integer values, as well as single-precision and double-precision floating-point values

    FPRs also provide access to the FPSCR (Floating-Point Status and Control Register)

    The FPSCR captures status and exceptions resulting from floating-point operations, and also provides control bits for enabling specific exception types.

    Instructions that load and store double-precision floating-point numbers transfer 64 bits of data without conversion.

    Instructions that load single-precision floating-point numbers from memory convert them to double-precision format before storing them in the register.

    f0        Volatile register
    f1        Volatile register used for parameter passing and return values
    f2-f8     Volatile registers used for parameter passing
    f9-f13    Volatile registers
    f14-f31   Registers used for local variables

    53

    Special-purpose registers (SPRs)

    The Fixed-Point Exception Register (XER) - used for indicating conditions for integer operations, such as carries and overflows.

    The Floating-Point Status and Control Register (FPSCR) - 32-bit register used to store the status and control of floating-point operations.

    The Count Register (CTR) - used to hold a loop count that can be decremented during the execution of branch instructions.

    The Condition Register (CR) - 32-bit register grouped into eight fields, where each field is 4 bits that signify the result of an instruction's operation: Less Than (LT), Greater Than (GT), Equal (EQ), and Summary Overflow (SO).

    The Link Register (LR) - contains the address to return to at the end of a function call.
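    The layout of the Condition Register can be made concrete with a small decoder. This sketch assumes the standard PowerPC arrangement, in which field CR0 occupies the four most significant bits and each field holds LT, GT, EQ, SO from most to least significant bit. The function names are illustrative.

```python
CR_FLAGS = ("LT", "GT", "EQ", "SO")  # order within a field, most significant first


def cr_field(cr, n):
    """Return the 4-bit field CRn of a 32-bit CR value (CR0 is the top nibble)."""
    if not 0 <= n <= 7:
        raise ValueError("CR field index must be 0..7")
    shift = 28 - 4 * n
    return (cr >> shift) & 0xF


def cr_flags(cr, n):
    """Decode field CRn into named boolean flags."""
    bits = cr_field(cr, n)
    return {name: bool(bits & (0b1000 >> i)) for i, name in enumerate(CR_FLAGS)}


# Example: a CR value in which CR0 records "equal" and CR7 records "less than".
cr = 0x20000008
print(cr_flags(cr, 0))
print(cr_flags(cr, 7))
```

    A conditional branch then only needs to name one of these 32 bits, which is what the BI field of the branch instruction does.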

    54

    Data Types

    It can use either little-endian or big-endian byte ordering.

    Fixed-point data types include:

    o Unsigned byte, 8 bits
    o Unsigned halfword, 16 bits
    o Signed halfword, 16 bits
    o Unsigned word, 32 bits
    o Signed word, 32 bits
    o Unsigned doubleword, 64 bits
    o Byte strings, from 0 to 128 bytes in length

    2's complement is used for negative values

    Floating-point data formats:

    single-precision, 32 bits long (23-bit fraction + 8-bit exponent + 1 sign bit)

    double-precision, 64 bits long (52-bit fraction + 11-bit exponent + 1 sign bit)

    Characters are stored using 8-bit ASCII codes

    55

    Instruction types

    56

    Instruction Format

    All instruction encodings are 32 bits in length.

    Bit numbering for PowerPC is the opposite of most other definitions: bit 0 is the most significant bit, and bit 31 is the least significant bit.

    Instructions are first decoded by the upper 6 bits, a field called the primary opcode. The remaining 26 bits contain fields for operand specifiers, immediate operands, and extended opcodes, and these may be reserved bits or fields.

    Common instruction formats:

    Format    Fields (bit ranges; bit 0 is the most significant)
    D-form    opcd (0-5), tgt/src (6-10), src/tgt (11-15), immediate (16-31)
    X-form    opcd (0-5), tgt/src (6-10), src/tgt (11-15), src (16-20), extended opcd (21-30)
    A-form    opcd (0-5), tgt/src (6-10), src/tgt (11-15), src (16-20), src (21-25), extended opcd (26-30), Rc (31)
    BD-form   opcd (0-5), BO (6-10), BI (11-15), BD (16-29), AA (30), LK (31)
    I-form    opcd (0-5), LI (6-29), AA (30), LK (31)
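    Because bit 0 is the most significant bit, field extraction needs a slightly different shift than on most architectures. The sketch below decodes the D-form and I-form. The helper names are illustrative, and the two example words are the encodings of `addi r3,r1,8` and `bl` with a displacement of -4 as I understand them; treat them as assumptions rather than values from the lecture.

```python
def ppc_bits(word, first, last):
    """Extract bits first..last (inclusive) in PowerPC numbering (bit 0 = MSB)."""
    width = last - first + 1
    return (word >> (31 - last)) & ((1 << width) - 1)


def sign_extend(value, width):
    """Interpret a `width`-bit unsigned value as a two's complement number."""
    sign_bit = 1 << (width - 1)
    return (value ^ sign_bit) - sign_bit


def decode_d_form(word):
    return {
        "opcd": ppc_bits(word, 0, 5),
        "rt": ppc_bits(word, 6, 10),
        "ra": ppc_bits(word, 11, 15),
        "imm": sign_extend(ppc_bits(word, 16, 31), 16),
    }


def decode_i_form(word):
    li = ppc_bits(word, 6, 29)
    return {
        "opcd": ppc_bits(word, 0, 5),
        # LI is a word offset: append two zero bits, then sign-extend 26 bits.
        "displacement": sign_extend(li << 2, 26),
        "aa": ppc_bits(word, 30, 30),
        "lk": ppc_bits(word, 31, 31),
    }


print(decode_d_form(0x38610008))  # assumed encoding of addi r3,r1,8
print(decode_i_form(0x4BFFFFFD))  # assumed encoding of bl with displacement -4
```

    The primary opcode is always `ppc_bits(word, 0, 5)`, so a decoder can dispatch on it first and pick the format-specific decoder afterwards.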

    57

    Instruction format

    D-form - provides up to two registers as source operands, one immediate source, and up to two registers as target operands. Some variations of this instruction format use portions of the target and source register operand specifiers as immediate fields or as extended opcodes.

    X-form - provides up to two registers as source operands and up to two target operands. Some variations of this instruction format use portions of the target and source operand specifiers as immediate fields or as extended opcodes.

    A-form - provides up to three registers as source operands and one target operand. Some variations of this instruction format use portions of the target and source operand specifiers as immediate fields or as extended opcodes.

    BD-form - conditional branch instruction. The BO field specifies the type of condition; the BI field specifies which CR bit is used as the condition; the BD field is used as the branch displacement. The AA bit specifies whether the branch is absolute or relative. The LK bit specifies whether the address of the next sequential instruction is saved in the Link Register as a return address for a subroutine call.

    I-form - used by the unconditional branch instruction. Being unconditional, the BO and BI fields of the BD format are exchanged for additional branch displacement to form the LI instruction field. This instruction format also supports the AA and LK bits in the same fashion as the BD format.

    Simplified PowerPC instruction set: http://pds.twi.tudelft.nl/vakken/in1200/labcourse/instruction-set/

    58

    PowerPC Addressing Modes

    Load/store architecture

    Indirect: the instruction includes a 16-bit displacement to be added to a base register (may be a GP register)

    Can replace the base register content with the new address

    Indirect indexed: the instruction references a base register and an index register (both may be GP registers)

    EA is the sum of their contents

    Branch address (target address calculation)

    Absolute: TA = actual address

    Relative: TA = current instruction address + displacement (for the I-form, the 24-bit LI field with two zero bits appended, sign-extended)

    Indirect:

    Link Register: TA = (LR)

    Count Register: TA = (CTR)

    Arithmetic

    Operands in registers or part of the instruction

    Floating point is register only
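    The two load/store addressing modes reduce to simple additions. The sketch below models a 32-bit register file as a Python list; the function names are illustrative, and treating base register 0 as the value 0 follows the GPR0 rule mentioned on the register slide. Rejecting the update form with base register 0 is an assumption of this sketch.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Interpret a 16-bit displacement field as a two's complement number."""
    return (value ^ 0x8000) - 0x8000


def base_value(regs, base):
    """In address calculations, base register 0 means the value 0."""
    return 0 if base == 0 else regs[base]


def ea_indirect(regs, base, displacement, update=False):
    """EA = (base register) + sign-extended 16-bit displacement.

    With update=True the base register is replaced with the new address.
    """
    if update and base == 0:
        raise ValueError("update form needs a real base register")  # sketch assumption
    ea = (base_value(regs, base) + sign_extend16(displacement)) & MASK32
    if update:
        regs[base] = ea
    return ea


def ea_indexed(regs, base, index):
    """EA = (base register) + (index register)."""
    return (base_value(regs, base) + regs[index]) & MASK32


regs = [0] * 32
regs[4] = 0x1000
regs[5] = 0x0020

print(hex(ea_indirect(regs, 4, 0x0008)))               # base + 8
print(hex(ea_indexed(regs, 4, 5)))                     # base + index
print(hex(ea_indirect(regs, 4, 0xFFFC, update=True)))  # base - 4, written back
print(hex(regs[4]))
```

    The update form is what lets a loop walk an array with a single load per element and no separate add instruction.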

    61

    PowerPC G4e Pipeline Stages

    Stages 1 and 2 - Instruction Fetch:

    These two stages are both dedicated primarily to grabbing an instruction from the L1 cache.

    The G4e can fetch four instructions per clock cycle from the L1 cache and send them on to the next stage.

    Stage 3 - Decode/Dispatch:

    Once an instruction has been fetched, it goes into a 12-entry instruction queue to be decoded.

    The G4e's decoder can dispatch up to three instructions per clock cycle to the next stage.

    62

    PowerPC G4e Pipeline Stages

    Stage 4 - Issue:

    Dispatched instructions wait in one of three issue queues.

    The first queue is the Floating-Point Issue Queue (FIQ), which holds floating-point (FP) instructions that are waiting to be executed.

    The second is the Vector Issue Queue (VIQ), which holds vector operations.

    The third queue is the General Instruction Queue (GIQ), which holds everything else.

    Once the instruction leaves its issue queue, it goes to the execution engine to be executed.

    63

    PowerPC G4e Pipeline Stages

    Stage 5 - Execute:

    Instructions can pass out of order from their issue queues into their respective functional units and be executed.

    Stages 6 and 7 - Complete and Write-Back:

    In these two stages, the instructions are put back into the order in which they came into the processor, and their results are written back to memory.

    64

    Design principles

    Simplicity favors regularity

    Standard 32-bit instruction format for all instructions

    fixed-length instructions

    register-to-register architecture

    three-operand instruction format

    Smaller is faster

    Three categories of registers, but each handles specific instructions, so access time is presumably faster

    Make the common case fast

    Integer and floating-point instructions

    Good design demands good compromises

    To align with RISC principles, many instructions that required three source operands were eliminated

    Many complex instructions were curtailed to conform to RISC principles, but this is compensated by a large number of mnemonics that increase the number of instructions.

    65

    Pros and Cons

    Instruction set

    200 machine instructions

    More complex than most RISC machines

    e.g. floating-point multiply-and-add instructions that take three input operands

    e.g. load and store instructions may automatically update the index register to contain the just-computed target address

    Pipelined execution

    More sophisticated than SPARC

    Input and output

    Two different modes

    Direct-store segment: maps the virtual address space to an external address space

    Normal virtual memory access

    Permits a range of implementations, from low-cost controllers through high-performance processors.