Arun Hariharan (N.M.S.U)


MOTIVATION

Need for high-speed computing and architecture

More complex compilers (Java)

Large Database Systems

Distributed Computing on the Internet

Competition from other manufacturers

SOLUTION

Instruction Level Parallelism (ILP) in general-purpose Microprocessors

Wide floating-point exponents

Register Stack Engine

Hardware exception deferral

Control speculation

Register rotation

Large register files

Data speculation

Predication

Parallel semantics


GOALS OF ARCHITECTURE

Overcome performance limiters:

Branches

Memory Latency

Sequential Program Model

Long Architectural Life:

Large Register File

Fully Interlocked Architecture: not tied to any particular implementation

No Fixed Issue Width: e.g., the length of an instruction group is not fixed


REGISTER RESOURCES

• 128 65-bit General Registers: 64 data bits + 1 “NaT” (Not a Thing) bit each (1 KB of data)

• 128 82-bit Floating Point Registers

• Space for up to 128 64-bit special-purpose application registers (1 KB)

• Eight 64-bit branch registers for function call linkage and return

• 64 one-bit predicate registers
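The “1 KB” figures follow from simple arithmetic on the counts and widths above. A minimal sketch that tallies the listed register state; `REGISTER_FILES`, `total_bits` and `data_bytes` are hypothetical helper names, and the application-register count is the architectural upper bound, not necessarily what every implementation provides.

```python
# Register files listed on the slide: (name, count, bits per register).
# This is only arithmetic on the slide's figures; it does not model a real chip.
REGISTER_FILES = [
    ("general (64 data bits + 1 NaT bit)", 128, 65),
    ("floating-point", 128, 82),
    ("application (upper bound)", 128, 64),
    ("branch", 8, 64),
    ("predicate", 64, 1),
]


def total_bits(files):
    """Sum count * width over all register files."""
    return sum(count * width for _, count, width in files)


def data_bytes(count, data_bits):
    """Bytes held in the data part of a register file."""
    return count * data_bits // 8


if __name__ == "__main__":
    for name, count, width in REGISTER_FILES:
        print(f"{name:36} {count:4} x {width:2} bits = {count * width:6} bits")
    print("general-register data:", data_bytes(128, 64), "bytes")  # 1 KB
    print("all listed state:", total_bits(REGISTER_FILES), "bits")
```

128 registers of 64 data bits give 1024 bytes, which is where both “(1 KB)” annotations come from; the NaT bits sit on top of that.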


INSTRUCTION ENCODING

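Key Words: long life, instruction bundle

Each instruction is 41 bits (most significant field first):

Op code (14 bits) | Reg 3 (7 bits) | Reg 2 (7 bits) | Reg 1 (7 bits) | Predicate (6 bits) = 41 bits

Three instructions share a 128-bit bundle with a 5-bit field:

Instruction 2 (41 bits) | Instruction 1 (41 bits) | Instruction 0 (41 bits) | Template (5 bits) = 128 bits

The 5-bit field is called the template:

• Helps to decode and route instructions

• Marks the end of an instruction group (a “stop”)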
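To make the bundle arithmetic concrete (5 + 3 × 41 = 128), here is a minimal sketch that packs fields into a 41-bit instruction and three instructions plus a template into one 128-bit integer. It assumes a layout with the qualifying predicate in the low 6 bits, then three 7-bit register fields, then 14 opcode bits, and the template in the low 5 bits of the bundle; treating the 14 opcode bits as one field is a simplification, and all function names are hypothetical.

```python
# Assumed field layout (low bit, width) of one 41-bit instruction. Lumping
# bits 27..40 into a single 14-bit "opcode" is a simplification: real formats
# split them into a major opcode plus extension fields.
FIELDS = {"qp": (0, 6), "r1": (6, 7), "r2": (13, 7), "r3": (20, 7), "opcode": (27, 14)}
SLOT_BITS = 41
TEMPLATE_BITS = 5


def encode_instruction(**values):
    """Pack named fields into a 41-bit integer."""
    word = 0
    for name, (low, width) in FIELDS.items():
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << low
    return word


def decode_instruction(word):
    """Split a 41-bit integer back into its named fields."""
    return {name: (word >> low) & ((1 << width) - 1) for name, (low, width) in FIELDS.items()}


def pack_bundle(template, slots):
    """Template in bits 0..4, then slots 0, 1, 2: 5 + 3 * 41 = 128 bits."""
    if len(slots) != 3:
        raise ValueError("a bundle holds exactly three instructions")
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template must fit in 5 bits")
    bundle = template
    for i, slot in enumerate(slots):
        if not 0 <= slot < (1 << SLOT_BITS):
            raise ValueError("each instruction must fit in 41 bits")
        bundle |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def unpack_bundle(bundle):
    """Recover (template, [slot0, slot1, slot2]) from a 128-bit bundle."""
    mask = (1 << SLOT_BITS) - 1
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slots = [(bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & mask for i in range(3)]
    return template, slots


if __name__ == "__main__":
    insn = encode_instruction(qp=1, r1=32, r2=33, r3=127, opcode=0x123)
    bundle = pack_bundle(0x10, [insn, 0, 0])  # template value chosen arbitrarily
    print(unpack_bundle(bundle)[1][0] == insn, decode_instruction(insn))
```

The 7-bit register fields are what let a single instruction name any of the 128 registers, and the 6-bit predicate field selects one of the 64 predicates.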


DISTRIBUTING RESPONSIBILITY

Shift a lot of the complexity to the compiler

ILP

Out-of-Order Execution

Control Flow Parallelism

Influencing Dynamic Events: the processor takes hints from the compiler about branch prediction, instruction/data caching, and prefetching.


ILP – Instruction Level Parallelism

• Sequential, in-order execution alone does not expose maximum parallelism

• Out-of-order execution: it is the compiler's task to create instruction groups so that all instructions in an instruction group can be safely executed in parallel

Key Word

• Basic Block
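Forming instruction groups can be sketched as a greedy pass over straight-line code (one basic block). This assumes a simplified rule: an instruction that reads (RAW) or rewrites (WAW) a register already written in the current group starts a new group, where the compiler would place a stop (`;;`); write-after-read within a group is allowed. The real IA-64 rules have further cases (e.g. for predicates and branches) that this sketch ignores, and `Insn`/`form_groups` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    text: str
    dests: tuple = ()
    srcs: tuple = ()


def form_groups(insns):
    """Greedily split straight-line code into instruction groups.

    Simplified rule: an instruction that reads (RAW) or writes (WAW) a
    register already written earlier in the current group must start a new
    group, i.e. the compiler places a stop before it. Reading a register
    that a later instruction in the group overwrites (WAR) is allowed.
    """
    groups, current, written = [], [], set()
    for insn in insns:
        if current and any(r in written for r in insn.srcs + insn.dests):
            groups.append(current)
            current, written = [], set()
        current.append(insn)
        written.update(insn.dests)
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    code = [
        Insn("add r14 = r32, r33", ("r14",), ("r32", "r33")),
        Insn("mov r32 = r36", ("r32",), ("r36",)),             # WAR on r32: same group
        Insn("add r15 = r14, r34", ("r15",), ("r14", "r34")),  # RAW on r14: stop
        Insn("st8 [r35] = r15", (), ("r35", "r15")),           # RAW on r15: stop
    ]
    for group in form_groups(code):
        print("  ".join(i.text for i in group), ";;")
```

Every instruction inside one printed group is independent of the others, so hardware of any width may issue them together.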


CONTROL FLOW PARALLELISM

Traditional execution

• Compare a and 0

• Check flag if true

• Store flag value for further computation

• Compare b <= 5

• Check flag if true

• Store flag value for further computation

• … (the same steps for each remaining condition)

• Check whether any of the compares set the flag

• Move 8 to r3

In IA-64:

• Initialize p1 to false

• Set up each compare condition's prerequisites

• Compare in parallel

• Branch
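The IA-64 sequence can be modeled in a few lines, assuming the two conditions on the slide are `a == 0` and `b <= 5`. The key property of an “or”-type parallel compare is that it only ever sets its predicate to true, so several compares can target the same predicate in one instruction group without ordering between them. This is a behavioral sketch in Python, not IA-64 code; `or_compare` and the other names are hypothetical.

```python
def traditional(a, b, r3):
    """One compare and one conditional branch per condition."""
    if a == 0:
        return 8
    if b <= 5:
        return 8
    return r3


def or_compare(p, condition):
    """Model of an 'or'-type compare: it writes the predicate only to set it
    to true, so several of them may target the same predicate in one group."""
    return True if condition else p


def parallel_compare(a, b, r3):
    p1 = False                  # initialize p1 to false
    # These compares do not depend on each other's result and could issue in
    # the same instruction group; Python simply evaluates them in order.
    p1 = or_compare(p1, a == 0)
    p1 = or_compare(p1, b <= 5)
    return 8 if p1 else r3      # a single branch (or predicated move) on p1
```

Because the order of the `or_compare` calls cannot change the result, the hardware is free to evaluate them simultaneously, and the chain of branches collapses into one.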


FINDING AND CREATING PARALLELISM

BRANCHES LIMIT ILP:

Sequential, no-predict: normal bank teller

Sequential, predict: fill out slip in advance (predict whether deposit or withdrawal)

Predicated Execution: fill out both slips, throw away whichever is wrong
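The “fill out both slips” idea is if-conversion. A minimal sketch, assuming one compare produces two complementary predicates (one per slip): both results are computed unconditionally and only the one guarded by a true predicate is kept. The Python `if` statements stand in for predicated moves, and the function names are hypothetical.

```python
def teller_branch(balance, amount, is_deposit):
    """Sequential: decide first, then do only one kind of work."""
    if is_deposit:
        return balance + amount
    return balance - amount


def teller_predicated(balance, amount, is_deposit):
    # One compare yields two complementary predicates.
    p_deposit, p_withdraw = is_deposit, not is_deposit
    # "Fill out both slips": both results are computed unconditionally.
    deposit_slip = balance + amount
    withdrawal_slip = balance - amount
    # Only the work guarded by a true predicate commits; the other is thrown
    # away. (These Python 'if's stand in for predicated moves.)
    result = balance
    if p_deposit:
        result = deposit_slip
    if p_withdraw:
        result = withdrawal_slip
    return result
```

The predicated version does a little redundant work, but there is no branch left to mispredict.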


FINDING AND CREATING PARALLELISM (cont..)

Scheduling and Speculation

Moving basic blocks ahead of barriers: it is the compiler's task, not the processor's, to find a possible route and schedule it.

Use of basic blocks (a basic block is a straight-line sequence of instructions with a single entry point and a single exit point)

Best possible route: the most frequently predicted flow of the program (speculation); not all instructions are executed

Compilers have a bird's-eye view of the program, unlike the processor.
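A compiler finds basic blocks with the classic “leaders” rule: the first instruction, every branch target, and every instruction right after a branch each start a new block. A minimal sketch over a toy instruction list; the `(label, op, target)` tuple format and the mnemonics are illustrative assumptions, with `br` and `br.cond` treated as the only branches.

```python
def basic_blocks(code):
    """Split a list of (label, op, target) tuples into basic blocks.

    Leaders: the first instruction, every branch target, and every
    instruction that follows a branch. Each block runs from one leader up
    to (but not including) the next.
    """
    label_index = {label: i for i, (label, _, _) in enumerate(code) if label}
    leaders = {0} if code else set()
    for i, (_, op, target) in enumerate(code):
        if op in ("br", "br.cond"):
            if target in label_index:
                leaders.add(label_index[target])
            if i + 1 < len(code):
                leaders.add(i + 1)
    starts = sorted(leaders)
    return [code[s:e] for s, e in zip(starts, starts[1:] + [len(code)])]


if __name__ == "__main__":
    program = [
        (None, "cmp", None),
        (None, "br.cond", "ELSE"),
        (None, "add", None),
        (None, "br", "JOIN"),
        ("ELSE", "sub", None),
        ("JOIN", "st", None),
    ]
    for block in basic_blocks(program):
        print([op for _, op, _ in block])
```

These blocks are the units a compiler moves ahead of branches when it schedules along the most likely route.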


CONTROL SPECULATION

Removing branches (branches are expensive)

Not all branches can be removed

Moving basic blocks (for example, hoisting a load above a branch) can cause exceptions


Key Word

• Fix-up Code
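Control speculation can be sketched by modeling each register as a `(value, nat)` pair, tying back to the 64 + 1 NaT bit general registers. A speculative load (in the spirit of `ld.s`) sets the NaT bit instead of faulting, NaT propagates through arithmetic, and a check (in the spirit of `chk.s`) sends execution to fix-up code only if the value is actually needed. The memory dictionary, the use of `KeyError` as a fault, and all function names are hypothetical simplifications.

```python
# Hypothetical memory: address -> value. A missing address stands for a fault.
MEMORY = {0x1000: 42}


def load(addr):
    """Ordinary load: a bad address faults immediately (KeyError here)."""
    return MEMORY[addr]


def speculative_load(addr):
    """ld.s-style load: on a fault, return a value with its NaT bit set
    instead of raising. Registers are modeled as (value, nat) pairs."""
    if addr in MEMORY:
        return MEMORY[addr], False
    return 0, True


def add(x, y):
    """NaT propagates: the result is NaT if either source is NaT."""
    return x[0] + y[0], x[1] or y[1]


def use_if_valid(ptr_valid, addr):
    # The load and its use were hoisted above the branch by the compiler.
    r1 = speculative_load(addr)
    r2 = add(r1, (1, False))
    if not ptr_valid:            # original branch: the result is not needed,
        return None              # so the deferred fault is silently dropped
    if r2[1]:                    # chk.s-style check: NaT set, run fix-up code
        r2 = (load(addr) + 1, False)   # re-execute non-speculatively
    return r2[0]
```

The fault is raised only on the path that really uses the loaded value; on the other path it costs nothing.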


DATA SPECULATION

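ALAT: Advanced Load Address Table (records the address of each advanced load; a later store to the same address invalidates the entry, so the check detects the conflict and fix-up code reloads the value)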

Key Word

• Fix-up Code
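The ALAT mechanism can be sketched as a table keyed by target register. An advanced load (like `ld.a`) records its address; every store removes entries with a matching address; the later check (like `ld.c` or `chk.a`) finds out whether the early load is still valid and, if not, the fix-up path reloads. Real hardware matches partial addresses and access sizes and has limited capacity; this toy `ALAT` class and its method names are hypothetical.

```python
class ALAT:
    """Toy Advanced Load Address Table: target register -> load address."""

    def __init__(self):
        self.entries = {}

    def advanced_load(self, memory, reg, addr):
        """ld.a-style: load early and remember (register, address)."""
        self.entries[reg] = addr
        return memory[addr]

    def store(self, memory, addr, value):
        """Every store invalidates entries whose address matches."""
        memory[addr] = value
        for reg in [r for r, a in self.entries.items() if a == addr]:
            del self.entries[reg]

    def check(self, reg):
        """ld.c/chk.a-style: True if the early load is still valid."""
        return reg in self.entries


def load_hoisted_above_store(memory, load_addr, store_addr):
    alat = ALAT()
    r8 = alat.advanced_load(memory, "r8", load_addr)  # load moved above the store
    alat.store(memory, store_addr, 99)                # store that may alias
    if not alat.check("r8"):                          # fix-up code: reload
        r8 = memory[load_addr]
    return r8
```

When the store does not alias the load, the early value is used as-is; when it does, the check catches it and the program still sees the stored value.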


REGISTER MODEL

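• 128 64-bit general registers, of which 32 (r0 to r31) are static and shared by all procedures (like RISC)

• The other 96 (r32 to r127) are stacked registers that the compiler can allocate to each procedure

• Effectively unlimited register use is possible, since older frames are spilled to memory in the background by the RSE (Register Stack Engine)

• “alloc” specifies the number of local and output registers (output registers carry parameters to calls)

• Each procedure's stacked registers are renamed so that its frame starts at r32 (up to r127)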
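The renaming of stacked registers can be sketched as a base offset into a flat array of physical registers. In this toy model a frame is `(base, sol, sof)`: `sol` counts inputs plus locals and `sof` adds the outputs. On a call, the callee's frame starts at the caller's first output register, so the caller's output `r36` is the callee's `r32` and arguments pass without copying. The `RegisterStack` class and its methods are hypothetical; running out of physical registers is left to the RSE and not handled here.

```python
class RegisterStack:
    """Toy model of stacked-register renaming (r32 and up).

    Each frame is (base, sol, sof): base index into the physical stacked
    registers, size of locals (inputs + locals), and size of frame
    (locals + outputs).
    """

    def __init__(self, physical=96):
        self.phys = [0] * physical
        self.frames = [(0, 0, 0)]   # frame of the current procedure is last

    def alloc(self, locals_, outputs):
        """Resize the current frame, like alloc with i+l locals and o outputs."""
        base, _, _ = self.frames[-1]
        self.frames[-1] = (base, locals_, locals_ + outputs)

    def _index(self, r):
        base, _, sof = self.frames[-1]
        if not 32 <= r < 32 + sof:
            raise IndexError(f"r{r} is outside the current frame")
        return base + (r - 32)      # every frame is renamed to start at r32

    def write(self, r, value):
        self.phys[self._index(r)] = value

    def read(self, r):
        return self.phys[self._index(r)]

    def call(self):
        """Callee's r32 maps onto the caller's first output register."""
        base, sol, sof = self.frames[-1]
        self.frames.append((base + sol, 0, sof - sol))

    def ret(self):
        self.frames.pop()
```

For example, a caller that does `alloc(locals_=4, outputs=2)` and writes an argument to `r36` will, after `call()`, see that value in the callee's `r32`.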


RSE (Register Stack Engine)

Automatically saves and restores stacked registers without software intervention (can work synchronously)

• Provides the illusion of an unlimited number of physical registers by backing the stacked registers with a stack in memory

• Overflow: alloc needs more registers than are available, so older frames are spilled to memory

• Underflow: a return needs to restore a frame that was saved in memory

The RSE may be designed to use otherwise unused memory bandwidth to perform register spill and fill operations in the background (asynchronously, speculatively loading and storing data)
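Overflow and underflow can be illustrated with a toy engine that tracks only frame sizes. When a new frame does not fit in the physical stacked registers, the oldest resident frames are spilled to a backing store; when a return finds the caller's frame gone, it is filled back. This sketch assumes the synchronous, on-demand behavior only (the background mode is not modeled) and ignores the overlap of caller outputs and callee inputs; `ToyRSE` and its fields are hypothetical.

```python
class ToyRSE:
    """Toy Register Stack Engine that tracks frame sizes only."""

    def __init__(self, physical=96):
        self.physical = physical
        self.resident = []   # frames held in registers, oldest first
        self.backing = []    # frames spilled to memory, oldest first
        self.spills = 0      # registers written to memory
        self.fills = 0       # registers read back from memory

    def alloc(self, size):
        """Overflow: spill the oldest resident frames until the new one fits."""
        if size > self.physical:
            raise ValueError("frame larger than the physical register stack")
        while sum(self.resident) + size > self.physical:
            oldest = self.resident.pop(0)
            self.backing.append(oldest)
            self.spills += oldest
        self.resident.append(size)

    def ret(self):
        """Underflow: if the caller's frame was spilled, fill it back."""
        self.resident.pop()
        if not self.resident and self.backing:
            frame = self.backing.pop()   # newest spilled frame is the caller
            self.resident.append(frame)
            self.fills += frame
```

With 96 physical registers, four nested calls that each allocate 40 registers force two spills; returning from all of them fills the same 80 registers back.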


SOFTWARE PIPELINE

Time complexity is calculated by ____ ?

Answer: big-O notation, e.g., O(n). It is typically used to count the time spent in loops, because loops take most of the execution time.

Can we implement loops in parallel? Answer: yes, if we resolve some problems:

• Managing the loop count

• Handling the renaming of registers for the pipeline

• Finishing the work in progress when the loop ends

• Starting the pipeline when the loop is entered

• Unrolling to expose cross-iteration parallelism

IA-64 solution: special architectural support

• Loop count register (LC)

• Epilog count register (EC)

• Register rename base (rrb)
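How LC and EC start, run, and drain a pipelined loop can be simulated for a 3-stage loop computing `y = 2*x + 1` (load, compute, store). Assuming `br.ctop`-style rules: while LC > 0 the branch decrements LC and starts a new iteration; then, while EC > 1, it decrements EC and lets the pipeline drain; stage predicates rotate on every taken branch. With LC = N - 1 and EC = number of stages S, the kernel runs N + S - 1 times. This is a behavioral sketch with hypothetical names; register rotation is imitated by running stages last-to-first.

```python
def pipelined_loop(xs):
    """Simulate a 3-stage software-pipelined loop computing y = 2*x + 1.

    Stage 0 loads x, stage 1 computes, stage 2 stores y. pred[s] plays the
    role of the rotating stage predicate for stage s; the loop-closing
    branch follows br.ctop-style rules on LC and EC.
    Returns (results, number of kernel iterations executed).
    """
    if not xs:
        return [], 0
    stages = 3
    lc, ec = len(xs) - 1, stages          # LC = trip count - 1, EC = stage count
    pred = [True] + [False] * (stages - 1)
    loaded = computed = None              # values handed from stage to stage
    i, out, kernel_runs = 0, [], 0
    while True:
        kernel_runs += 1
        # One kernel iteration: each active stage works on a different source
        # iteration. Running stages last-to-first lets each read what its
        # predecessor produced in the previous kernel iteration, which is the
        # job register rotation does in hardware.
        if pred[2]:
            out.append(computed)
        if pred[1]:
            computed = 2 * loaded + 1
        if pred[0]:
            loaded = xs[i]
            i += 1
        # Loop-closing branch, modeled on br.ctop:
        if lc > 0:                  # prolog/kernel: start another iteration
            lc -= 1
            new_stage0 = True
        elif ec > 1:                # epilog: drain the pipeline
            ec -= 1
            new_stage0 = False
        else:                       # last epilog step: fall through
            return out, kernel_runs
        pred = [new_stage0] + pred[:-1]   # rotate predicates (hardware also decrements rrb)
```

For four inputs the kernel executes 4 + 3 - 1 = 6 times: two prolog-like iterations while the pipeline fills, then steady state, then two epilog iterations that only finish work already in flight.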


SUMMARY

• Synergy

• ILP by compiler and hardware

• Data and Control Speculation

• Multi-chip and multi-processing

• EPIC: Explicitly Parallel Instruction Computing


• “RISC architectures claim to match many of the features of IA-64 with similar sounding instructions. However, just like a tank formed by bolting weapons and armor to an old truck, the benefits are limited to specific conditions, but fall short in the heat of battle.”

• Existing RISC architectures that use ‘cmoves’ and similar instructions may remove branches, but at the cost of adding so many instructions that the benefits are nearly outweighed by the code-bloat (hardly worth the trade-off). The reason why ILP works with IA-64 is the use of completely new architectural constructs such as predicates that are not available to any existing RISC architecture.

• Traditional RISC architectures can use a ‘non-faulting load’ to avoid costly error handling when loading data ahead of time which may not be valid. But if you want to turn off the errors, why have errors in the first place? Traditional RISC architectures face one of two alternatives: add extra error-checking code which, once again, cancels out the performance benefit of speculative execution; or ‘work without a net,’ risking disastrous undetected errors due to turning off the error messages. IA-64 gets around both problems by offering a novel architectural approach to dealing with errors when loading data.

RISC vs. IA-64: whitepaper by Intel & HP (1999)


Benchmark comparison


BACKWARD COMPATIBILITY

Intel promises compatibility with 32-bit (IA-32) software.

It should be possible to run software in real mode (16 bits), protected mode (32 bits) and virtual-8086 mode (16 bits).


Questions?

REFERENCES

1. Ricardo Zelenovsky and Alexandre Mendonca, “Intel 64-bit Architecture”, 2001.

2. Bruce Jacob, “The IA-64 Architecture”, University of Maryland (College Park).

3. HP & Intel, “IA-64 Architecture Innovations” (whitepaper), 1999.

4. Carole Dulong et al., “An overview of Intel IA-64 Compiler”.

5. M. F. Guest, “Intel’s Itanium IA-64 Processor: Overview and Initial Experience”, CLRC Daresbury Laboratory.