Transmeta’s Crusoe Architecture Umran A. Khan Microprocessors


Transmeta’s Crusoe Architecture

Umran A. Khan

Microprocessors


Generations of Crusoe Processors
- Original architecture: TM3120, TM5400
- Later versions: TM5600-TM5800
- The architecture is more or less the same, but improved:
  - Faster clock rate (up to 800 MHz now)
  - Smaller core (0.13 micron process)
  - Special instructions for the OS it is emulating
  - Lower power consumption
  - Wider range of applications (from internet appliances to high-density servers)
- We will look at the TM5400 here


Instruction Set

- Uses a VLIW (Very Long Instruction Word) instruction format/engine
- An instruction word, also called a molecule, is a packet either 64 or 128 bits long
- Each molecule holds up to four individual operations called atoms
- The atoms in a molecule execute in parallel (up to 4 operations per clock)
- These operations must be independent of one another
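The independence requirement can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Atom` record, the `independent` rule (no atom reads or overwrites a register that another atom in the same molecule writes) and the helper names are hypothetical, and the sketch ignores which execution unit each slot feeds.

```python
from dataclasses import dataclass

MAX_ATOMS = 4  # per the slides: up to four operations per molecule


@dataclass(frozen=True)
class Atom:
    op: str       # mnemonic, e.g. "ld" or "add"
    dest: str     # register written
    srcs: tuple   # registers read


def independent(a: Atom, b: Atom) -> bool:
    """Assumed rule: neither atom reads or overwrites the other's destination."""
    return a.dest != b.dest and a.dest not in b.srcs and b.dest not in a.srcs


def can_pack(atoms) -> bool:
    """A group fits in one molecule if it is small enough and pairwise independent."""
    if len(atoms) > MAX_ATOMS:
        return False
    return all(
        independent(a, b)
        for i, a in enumerate(atoms)
        for b in atoms[i + 1:]
    )


load = Atom("ld", "r30", ("esp",))
sub = Atom("sub.c", "ecx", ("ecx",))
add = Atom("add", "eax", ("eax", "r30"))

print(can_pack([load, sub]))  # load and sub touch different registers
print(can_pack([load, add]))  # add reads r30, which load writes
```

The first pair can share a molecule; the second cannot, because the add needs the result of the load.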


Four Execution Units

- FPU (Floating Point Unit)
  - Has a 10-stage floating-point pipeline
  - Uses the conventional x86 80-bit register format
  - 32 FP registers
- 2 integer ALUs (Arithmetic-Logic Units)
  - Have a 7-stage integer pipeline
  - 64 32-bit registers dedicated to them
- LSU (Load/Store Unit)
- Branch Unit


Sample Instruction

128-bit instruction

Atom:            FADD | ADD            | LD               | BRCC
Execution unit:  FPU  | Integer ALU #0 | LSU (Load/Store) | BU (Branch)

Figure copied from reference#1


Introduction to Code Morphing

- Code Morphing software is a clever translation layer that dynamically recompiles an x86 program into the processor's native VLIW instruction format
- It is stored in the BIOS ROM and runs in main memory
- An entire group of instructions is translated at once and then put into the translation cache
- Basically, an emulation mechanism
- In principle it could target architectures other than x86 (related examples: the Linux-oriented TM3120 and FX!32 on Alpha), but the TM5400 is known for its x86 compatibility
- Great potential!


Crusoe Translation Layers

(from the outermost layer to the innermost)
x86 Applications
Operating System
x86 BIOS
Code Morphing Layer
CPU Core


Traditional x86 Architecture

- IA-32 instructions are translated by the CPU into simpler, more uniform RISC-like operations (each instruction is translated individually)
- The translation is complicated
- It has dedicated hardware for:
  - x86 instruction translation
  - Branch prediction
  - Register renaming
  - Instruction reordering


Transmeta’s Simplified Core

- A lot of the processor functionality is implemented in software
- Its hardware is made up of the execution units, the instruction decode unit and, of course, the cache
- However, the work of the remaining dedicated hardware (on the previous slide) is done in software
- Advantages
  - The CPU takes less die space
  - Less power demanding
  - Less expensive to produce and upgrade


Hardware vs. Software
- Implementing hardware functions in software comes at a cost
  - Software is slower than hardware
  - But how much slower?
- It is not so easy
  - Reordering instructions, renaming registers, predicting branches on the fly, etc., using the same hardware that is used for addition, instruction execution, etc., adds complications
- Do the benefits outweigh the costs? According to Transmeta, they do!


Execution, Decoding and Scheduling

- In x86, instructions are translated individually
- An instruction's binary is fetched and decoded into n operations
- These operations are reordered and fed to the execution units (i.e. FPU, ALU, etc.) in parallel
- The original sequence is then reconstructed: the results of out-of-order execution have to be put back in program order, and the instruction is retranslated whenever it runs again (complicated and costly)


Execution, Decoding and Scheduling (Continued)

- In Crusoe, a group of instructions is translated at once
- Instructions are translated once and placed into the translation cache
- If the same code is run again, the processor can grab it from the translation cache
- Instructions can be reordered by the scheduler by looking at the generated code
- Thus, the number of instructions executed can be minimized
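The translate-once, reuse-later idea can be sketched as a dictionary keyed by block address. This is a minimal illustration, not Transmeta's implementation: `TranslationCache`, `toy_translate` and the address `0x1000` are hypothetical names and values chosen for the example.

```python
class TranslationCache:
    """Maps the address of an x86 block to its already translated native code."""

    def __init__(self, translate):
        self._translate = translate  # hypothetical translator: x86 block -> native code
        self._cache = {}
        self.misses = 0              # how many times translation was actually needed

    def lookup(self, address, x86_block):
        if address not in self._cache:
            self.misses += 1
            self._cache[address] = self._translate(x86_block)
        return self._cache[address]


def toy_translate(x86_block):
    # Stand-in for the real translator: it only tags each instruction.
    return tuple(f"native({insn})" for insn in x86_block)


cache = TranslationCache(toy_translate)
block = ("addl %eax, (%esp)", "subl %ecx, 5")

first = cache.lookup(0x1000, block)   # translated and stored
second = cache.lookup(0x1000, block)  # served from the cache
print(cache.misses)
```

Running the same block twice costs one translation; the second lookup returns the stored result.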


Caching and Optimization

- The translation cache is used more efficiently
  - A translation can be optimized further each time it is executed
  - However, it will probably require more than one pass for it to be truly optimized
  - Optimization is done in steps
  - Sections of code usually don't get optimized if they occur only once
  - Code is recompiled quickly to keep the processor and the program running
- Uses common optimizations done by an ordinary compiler
  - The optimizer is basically a simple compiler


Optimization Strategies
- The Code Morphing software has many ways to gather feedback about a running program
- "Instrument Translation"
  - Special code is used to collect information about the block that is going to be executed
  - This information is later used for optimization and translation
- Branch prediction, path speculation and the reordering of loads and stores are done by the Code Morphing layer, with some (alias) hardware support and some condition code
- Filtering
  - Determines how much effort should be spent on translating and optimizing a piece of code
- Execution modes
  - Interpretation, translation with or without optimization
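Filtering can be pictured as a per-block execution counter that selects one of the execution modes. The sketch below is an assumption-laden illustration: the thresholds `TRANSLATE_AFTER` and `OPTIMIZE_AFTER` are invented numbers, and the `Filter` class is a hypothetical name, not part of the Code Morphing software.

```python
from collections import Counter

# Invented thresholds, for illustration only.
TRANSLATE_AFTER = 3    # runs before a block is worth translating
OPTIMIZE_AFTER = 10    # runs before a block is worth optimizing


def choose_mode(run_count: int) -> str:
    """Spend more translation effort on blocks that run more often."""
    if run_count < TRANSLATE_AFTER:
        return "interpret"
    if run_count < OPTIMIZE_AFTER:
        return "translate"
    return "translate+optimize"


class Filter:
    """Counts how often each block runs and picks an execution mode for it."""

    def __init__(self):
        self.counts = Counter()

    def on_execute(self, block_address) -> str:
        self.counts[block_address] += 1
        return choose_mode(self.counts[block_address])


f = Filter()
modes = [f.on_execute(0x40) for _ in range(10)]
print(modes[0], modes[2], modes[9])
```

Code that runs once or twice is only interpreted, while a hot loop eventually earns an optimized translation.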


Translation Example

x86 code
  addl %eax, (%esp)
  addl %ebx, (%esp)
  movl %esi, (%ebp)
  subl %ecx, 5

FRONTEND
  ld %r30, [%esp]
  add.c %eax, %eax, %r30
  ld %r31, [%esp]
  add.c %ebx, %ebx, %r31
  ld %esi, [%ebp]
  sub.c %ecx, %ecx, 5

OPTIMIZER
  ld %r30, [%esp]
  add %eax, %eax, %r30
  add %ebx, %ebx, %r30
  ld %esi, [%ebp]
  sub.c %ecx, %ecx, 5

SCHEDULER
  ld %r30, [%esp]; sub.c %ecx, %ecx, 5
  ld %esi, [%ebp]; add %eax, %eax, %r30; add %ebx, %ebx, %r30

KEY
  x86:    addl - load and add
          subl - load and sub
          movl - load
  Native: ld - load
          add.c - add with condition codes set
          sub.c - sub with condition codes set

Example from reference#2
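The step from the FRONTEND listing to the OPTIMIZER listing can be reproduced with two small passes: removing the second, redundant load from [%esp], and dropping condition-code updates that are overwritten before anything reads them. The Python sketch below is a toy under stated assumptions, not Transmeta's optimizer: the `Insn` record and both pass names are hypothetical, temporaries (`%r...`) are assumed to be written only once, the block contains no stores and no flag-reading instructions, and the flags are assumed to be needed after the block ends.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Insn:
    op: str                   # "ld", "add" or "sub"
    dest: str                 # register written, e.g. "%eax"
    srcs: tuple               # operands read, e.g. ("%eax", "%r30") or ("[%esp]",)
    sets_flags: bool = False  # True for the ".c" forms

    def __str__(self):
        suffix = ".c" if self.sets_flags else ""
        return f"{self.op}{suffix} {self.dest}, {', '.join(self.srcs)}"


def is_temp(reg: str) -> bool:
    # Assumption: %r... names are translator temporaries, invisible to x86 code.
    return reg.startswith("%r")


def eliminate_redundant_loads(block):
    available = {}  # address -> temporary that already holds the loaded value
    alias = {}      # eliminated temporary -> temporary to use instead
    out = []
    for insn in block:
        srcs = tuple(alias.get(s, s) for s in insn.srcs)
        if insn.op == "ld" and is_temp(insn.dest) and srcs[0] in available:
            alias[insn.dest] = available[srcs[0]]  # reuse the earlier load
            continue
        # Overwriting a register invalidates loads addressed through it.
        available = {a: r for a, r in available.items()
                     if a.strip("[]") != insn.dest}
        if insn.op == "ld" and is_temp(insn.dest):
            available[srcs[0]] = insn.dest
        out.append(replace(insn, srcs=srcs))
    return out


def drop_dead_flags(block):
    out = []
    flags_live = True  # conservatively assume the flags are read after the block
    for insn in reversed(block):
        if insn.sets_flags:
            if not flags_live:
                insn = replace(insn, sets_flags=False)  # a later insn overwrites them
            flags_live = False
        out.append(insn)
    out.reverse()
    return out


frontend = [
    Insn("ld", "%r30", ("[%esp]",)),
    Insn("add", "%eax", ("%eax", "%r30"), sets_flags=True),
    Insn("ld", "%r31", ("[%esp]",)),
    Insn("add", "%ebx", ("%ebx", "%r31"), sets_flags=True),
    Insn("ld", "%esi", ("[%ebp]",)),
    Insn("sub", "%ecx", ("%ecx", "5"), sets_flags=True),
]

optimized = drop_dead_flags(eliminate_redundant_loads(frontend))
for insn in optimized:
    print(insn)
```

Under these assumptions the printed block matches the OPTIMIZER listing: one load from [%esp] feeds both adds, and only the final sub.c still sets the condition codes.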


Power Management
- Typical power-saving approaches
  - Switching off the processor
    - Having duty cycles
    - Causes glitches
  - Changing the clock rate by suspending to and restarting from RAM
- Crusoe power-saving approaches
  - LongRun power management (next slide)
  - Integrated the north bridge of the chipset and the RAM controllers onto the CPU core
    - Can also integrate video and sound cards
    - Saves power in the overall system


Longrun Power Management

- A feature of the Code Morphing software layer, driven by detecting CPU load
  - Can adjust the clock frequency on the fly
  - Can dynamically change the CPU voltage
- It can reduce power consumption by almost 30% when the CPU clock rate is lowered by 10%, because the voltage can be lowered along with the clock
  - Power scales with frequency x voltage^2: 100% x (1 - (0.9 x 0.9^2)) = 27.1%
- Fewer heat problems
  - No need for extra fans, which take up more power and space
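The power figure on this slide is easiest to check with the usual approximation that dynamic power is proportional to clock frequency times voltage squared. The sketch below assumes that the voltage is lowered by the same 10% as the clock; that scaling factor and the function names are assumptions made for the illustration.

```python
def relative_power(freq_scale: float, voltage_scale: float = 1.0) -> float:
    """Power relative to full speed, assuming P is proportional to f * V**2."""
    return freq_scale * voltage_scale ** 2


def savings_percent(freq_scale: float, voltage_scale: float = 1.0) -> float:
    """Percentage of power saved compared with running at full speed."""
    return 100.0 * (1.0 - relative_power(freq_scale, voltage_scale))


# Clock lowered by 10%, voltage unchanged (conventional throttling).
clock_only = savings_percent(0.9)

# Clock and voltage both lowered by 10% (assumed LongRun-style scaling).
clock_and_voltage = savings_percent(0.9, 0.9)

print(f"clock only: {clock_only:.1f}%")
print(f"clock and voltage: {clock_and_voltage:.1f}%")
```

Lowering only the clock saves about 10%, while lowering the voltage as well saves about 27%, which is where a figure of almost 30% comes from.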


Conclusion
Advantages
- Low-power technology
  - Low cost
  - Longer battery life
  - Great for the mobile user, embedded systems and even high-density servers
  - Smaller and lighter computers
- Code Morphing technology
  - Can, in principle, emulate any target architecture
    - Compatibility
  - Uses special optimization techniques for target operating systems
  - Easier software debugging (see reference #1)
  - Cheaper and simplified upgrades


Conclusion (Continued)
Disadvantages
- An emulation cannot be faster than the real thing
  - Code translation requires extra cycles
  - Code Morphing software runs in main memory and takes up memory bandwidth
  - Heavy coding
- Inherits some of the same problems as other VLIW processors
  - Needs clever compilers for parallelism
  - Too much fixup code (for speculation, predictions, rollbacks, etc.)
- Technology seems to be geared mostly toward mobile users
  - For desktops (power users) and servers, performance outweighs power consumption
  - Higher performance comes at the price of higher power consumption


Final Thoughts
- Transmeta reported net revenue of only $4.1 million for the first quarter of 2002
  - No significant share of the mobile market
- Even though Transmeta has a clever technology, the clock speeds of AMD and Intel have overshadowed its impact, just as happened to Multiflow (their clock speeds are about 1.0 GHz faster than the Crusoe's)
- AMD and Intel have also developed their own power-efficient mobile processors (mobile Athlon XP with AMD PowerNow!™ technology and mobile Pentium 4 with Intel® SpeedStep® technology)


Stay Tuned for the Next Exciting Episode

VS.

"AMD, I am your father!" "Not any more!!!"


References

http://www.hardwareanalysis.com/content/editorials/article/1237.4/

http://www.transmeta.com/pdf/white_papers/paper_aklaiber_19jan00.pdf

http://www.arstechnica.com/cpu/1q00/crusoe/crusoe-1.html

http://www.erc.msstate.edu/~reese/EE8063/html/transmeta/transmeta.pdf