AVR_RISC

Embed Size (px)

Citation preview

  • 8/2/2019 AVR_RISC

    1/3

    AVREnhanced RISC Microcontrollers

    Alf-Egil BogenVegard Wollan

    ATMEL Corporation

    ATMEL Development Center, Trondheim, Norway

    High level languages (HLLs) are rapidlybecoming the standard programmingmethodology for embedded microcontrollers(MCUs), even for smaller 8-bit devices. The Clanguage is probably the most widely used HLLin MCUs, but will in most applications give anincreased code size compared to assemblyprogramming. ATMEL identified the need of anarchitecture developed specially for the Clanguage in order to reduce this overhead to aminimum. The result is the ATMEL AVR MCU,that in addition to the optimized code size, is atrue single cycle RISC (Reduced InstructionSet Computer) machine with 32 generalpurpose registers (accumulators) running 4-12times faster than currently used MCUs.

    1. Introduction

    The initial AVR product offering is three 8-bitbase-line devices with enhanced 16-bithardware support. Atmels low-power non-volatile memory technology is used forprogram code and data. The on-chip programFlash and data EEPROM are in-systemprogrammable. The three first AVR MCUshave 1K, 2K and 8K bytes program Flashorganized as 16-bit wide instruction words.

    The Atmel AVR Enhanced RISCMicrocontrollers offer an architecture conceptfor high performance and low-powerconsumption simultaneously. A full range ofAVR MCUs - from base-line to top end -feature a RISC architecture and instruction setoptimized for efficient code density with built-insupport for high-level languages.

    Please refer to [1] for more details.

    2. Enhanced RISC

    Many existing RISC architectures requirelarger code size to perform a given task withthe traditional CISC (Complex Instruction SeComputer) architectures. RISC MCUs areoften chosen where a high speed is needed

    The reduced instruction set will be fast, bureduced in complexity.

    The AVRis designed to be a RISC MCU with alarger number of instructions to reduce thecode size and to increase the speed furtherCiscy-like instructions are introduced withouletting the RISC performance and low poweconsumption features suffer. This first majoenhancement was made after thoroughanalysis of several architectures and large

    amounts of application code. Still, the regulaAVRRISC architecture enables cost effectiveimplementations.

    The second enhancement is achieved bytuning the architecture for optimizing codegeneration for the C language. This was donewith a large application oriented benchmarksuite, where the code was pseudo-compiled fothe different enhancement alternatives in thearchitecture. Special tuning of the differen

    addressing modes was important, need tohave instead of nice to have.

    Many MCU architectures have only a smalnumber of general registers or workingregisters (accumulators) - typically 1-8registers. This is a major drawback for the Ccompiler design, where a lot of data moving isnecessary. The AVRhas 32 general-purposeworking registers, that the C compiler fullyutilizes to achieve the highest code density.

  • 8/2/2019 AVR_RISC

    2/3

    3. True Single Cycle Instructions

    With true single cycle instructions the internalclock is identical to the oscillator clock. There isno internal divider to produce the differentclock phases!

    Most of the micros in the 8 - 16-bit market aredividing the clock with a ratio of 1:4 to 1:12,which is a bottleneck for the speed. For a giventask the AVRwill run 4 to 12 times faster, orthe power consumption can be reduced by afactor 4-12 with the same clock frequency. In aCMOS technology, the power consumption ofdigital logic is proportional to the frequency.Figure 1 shows the extreme increase of MIPS(Million Instructions Per Second) with truesingle cycle (1:1 ratio) compared to a clock

    division ratio of 1:4 and 1:12.

    Figure 1: MIPS/Power Consumption

    4. Designed for the C language

    The C language is the most used HLL in theworld today for MCUs. Since most MCUarchitectures are developed with assemblyprogramming in mind the support for typical Cinstructions are poor. ATMELs goal was to

    develop an architecture that was efficient bothfor the C language and assembly. With many Cexperts from C compiler suppliers in the designteam, a very code efficient 8-bit micro with 16-bit support is developed.

    When programming in C, one general rule is touse variables defined within a routine (local),instead of using global variables known in thewhole program. Local variables will only

    allocate RAM memory when executing thespecific routine while global variables wiloccupy RAM all the time. To handle locavariables fast and code efficient, a lot ogeneral-purpose registers are needed. TheAVR has 32 general-purpose registers, all othem in the Arithmetic Logic Unit (ALU) pathallowing true single cycle instructions. Figure 2

    shows how efficient many registers can becompared to traditional CISC architectures withone accumulator.

    Figure 2: Efficient code in the AVR

    Three pairs of the 32 registers can be used as

    16-bit pointers allowing indirect jumps and callsas well as many data memory accessingmodes directly related to the C language. Inaddition to that the traditional stack for returnaddress is available through the instruction setMost MCU architectures have only 1-2accumulators and 1-2 pointers.

    Since pointers are very frequently used in Cthe operations on the pointers are veryimportant due to speed and code size. The

    AVRhas addressing modes that directly preincrements or post-decrements the differenpointers when used to access data memory.

    In addition to that, table lookup or stackoperations can effectively be performed byusing displacement (relative to the currenpointer value) as shown in Figure 3. Thedisplacement range is 0-64 bytes and detailedanalysis shows that the range is sufficient fo

    Function: A = ((A .and. 84h) + (B.eor.C).or.80h

    AVRcode CISC codeEOR B,C MOV ACC,C

    ANDI A,#84h EOR ACC,B

    ADD A,B MOV TMP,ACC

    ORI B,#80h MOV ACC,A

    AND ACC,#84hADD ACC,TMP

    OR ACC,#80h

    MOV A,ACC

    8 bytes 12-16 bytes

    4 clocks 48-96 clocks

  • 8/2/2019 AVR_RISC

    3/3

    most lookups in structures and tables. Onevery important note is that this fits into a singleword instruction!

    To be able to change the pointers more thanby using the pre-decrement or post-incrementmodes addition and subtraction between a 16-bit pointer and a constant are implemented inone cycle and a single word. The threepointers can also be used as eight 8-bit orthree 16-bit general purpose working registers.

    In general it is important to use 8-bit numbersin an 8-bit MCU since all data memories are 8bit wide. However, when using C, largernumber like integers (16 bit), long (32 bit) andfloat (32 bits floating point) are frequently used.Traditionally, computing on such numbersgenerates very large code, but since the AVRis designed to handle it, the AVR codegenerated is extremely small.

    Example 1 shows an example where a smallpart of a routine written in C is compiled to seehow it is translated to assembly code.

    Example 1:void routine(void)

    {

    long n1, n2;

    int n3;...

    if (n1 != n2) n3 +=5;

    ...

    }

    n1 and n2 are both 32-bits numbers thatrequire 4 bytes each. n3 is an integer thatrequires two bytes. All three variables aredefined as local variables within the routine.

    This program example will in AVR assemblycode be extremely compact. n1 = R3-R0, n2 =R7-R4 and n3 = R17:R16.:...

    CP R0,R4 ; n1-n2 (byte 0)

    CPC R1,R5 ; n1-n2-C (byte 1)

    CPC R2,R6 ; n1-n2-C (byte 2)

    CPC R3,R7 ; n1-n2-C (byte 3)

    BREQ EQUAL ; Branch if equalSUBI R16,LOW(0xfffb) ;n3+5 low byte

    SBCI R17,HIGH(0xfffb);n3+C high byte

    EQUAL:...

    In most MCUs the comparison from example 1needs many more instructions since they missCompare with Carry (CPC) and zero flagpropagation. Zero flag propagation has asimilar function as Carry propagation, to adjusthe higher bytes in a number. The Zeropropagation is implemented on compare andsubtract instructions. A Compare instruction isgenerally a subtraction without storing of theresult. The CPC is added to supporcomparisons of larger numbers than 8 bit.

    Example 1 does also show how a SubtracImmediate (SUBI) and a Subtract Immediatewith Carry (SBCI) can be used instead oaddition (ADDI and ADCI) to compute n3 + 5

    by subtracting the 2s complement of 5. SinceADDI and SUBI are complementaryinstructions only one pair is implemented toleave decoding space for other importaninstructions.

    All the discussed features are implemented asa result of code benchmark and input from Ccompiler experts. The result is a very fasmicrocontroller with RISC performance andCISC code density.

    [1] AVR Enhanced RISC microcontrollerdata book, May 1996. Atmel Corporation

    Figure 3: SRAM Direct with Displacement