1. Course overview
2. Intro to PICOBLAZE, C and Number systems and Boolean Algebra
3. Course overview with microprocessor MU0 (I)
4. Course overview with microprocessor MU0 (II)
5. Verilog HDL
6. Digital system components using schematics and Verilog
7. Combinational logic standard forms. Karnaugh maps
8. Combinational circuits and configurable logic devices
9. Simple Sequential circuits, flip flops
10. Sequential circuits, counters, registers, memories
11. Non-ideal effects in digital circuits
12. Finite State Machines
13. Design of FSMs
14. Design of FSMs
15. Datapaths
16. An introduction to Processor Design
17. The ARM Architecture
18. ARM Assembly Language Programming
19. Programming in C ...
3213: Digital Systems & Microprocessors: L#14_15
Follows Steve Furber, 'ARM System-on-Chip Architecture' lecture notes
ARM history
• 1983: developed by Acorn Computers
– To replace the 6502 in BBC computers
– 4-man VLSI design team
– Its simplicity comes from the inexperience of the team
– Matches the needs of a generalized SoC: reasonable power, performance and die size
• 1990: ARM (Advanced RISC Machines) founded, owned by Acorn, Apple and VLSI Technology
3213: Digital Systems & Microprocessors: L#22_23
ARM Ltd
Designs and licenses ARM cores but does not fabricate them
Why ARM?
• One of the most licensed and thus most widespread processor cores in the world
– Used in PDAs, cell phones, multimedia players, handheld game consoles, digital TVs and cameras
– ARM7: GBA, iPod
– ARM9: NDS, PSP, Sony Ericsson, BenQ
– ARM11: Apple iPhone, Nokia N93, N800
– 75% of 32-bit embedded processors
– Cortex: beagleboard open hardware
• Used especially in portable devices due to its low power consumption and reasonable performance
3213: Digital Systems & Microprocessors: L#16-17
http://beagleboard.org/
http://alwaysinnovating.com
ARM LPC2368 Dev. Board in HLABs (Futurlec)
• Includes NXP LPC2368 ARM Microcontroller with 512 kB Internal Flash Program Memory
• Operating Speed up to 72 MHz
• Direct In-Circuit Programming via RS232 Connection for Easy Program Updates
• Up to 25 I/O points with easy to connect standard headers
• Full Speed USB 2.0 Port
• Ethernet LAN 10/100Mb Connection for full networking
• 58 kB Data RAM
• 6 Channels 10-Bit A/D
• 1 Channel 10-Bit DAC
• 2 Channels standard CAN network
• Real Time Clock with Battery Back-Up
• SD Card Connector for Data Storage and Transfer
• JTAG Connector for Program Download and Debug
• LCD Connection with Contrast Adjustment
• Load and Reset Button
• On-Board 3.3V Regulator
• Ideal as an Interchangeable Controller for Real-Time Systems
Computers
• All modern general-purpose computers employ the principles of a stored-program digital computer (dates to the 1940s)
• First implemented in the SSEM ('Baby'), June 1948, University of Manchester
The Small-Scale Experimental Machine, known as SSEM, or the "Baby", was designed and built at The University of Manchester, and made its first successful run of a program on June 21st 1948. It was the first machine that had all the components now classically regarded as characteristic of the basic computer. Most importantly it was the first computer that could store not only data but any (short!) user program in electronic memory and process it at electronic speed.
• Most advances in computing are due to electronics... but ..
– Computer Architecture = user view: instruction set, visible registers, memory management...
– Computer Organisation = user-invisible: pipeline structure, transparent cache, ...
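The state in a stored-program digital computer
[Figure: a processor (holding registers) sends an address to the memory and exchanges instructions and data with it; the memory holds instructions and data at addresses 00..00 (hex) to FF..FF (hex).]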
MU0 – A simple microprocessor
• A simple form of processor can be built with
– Program counter (PC)
– Accumulator or working register (ACC)
– Arithmetic logic unit (ALU)
– Instruction register (IR)
– Instruction decode and control logic that employs the above components to achieve the desired results from each instruction
MU0 is a 16-bit machine with a 12-bit address space. Instructions are 16 bits long, with a 4-bit opcode and a 12-bit address field.
The datapath: all the components that carry (buses), store (registers) or process (ALU, mux) bits in parallel form the datapath. Use the RTL description (for us, that is the datapath).
The control logic: everything else, such as decode and control; use the FSM approach.
MU0 – Datapath design
Need a guiding principle to limit design possibilities, usually based on the clock constraint in microprocessors
• Each instruction takes a number of clock cycles equal to the number of memory accesses it must make
• We assume an instruction starts when it appears in the instruction register. There are generally two steps to execute an instruction
1) Access the memory operand and perform the desired operation
2) Fetch the next instruction to be executed
The processor must start at a known location; we can do this with a reset.
MU0 control logic
The MU0 instruction format
| opcode (4 bits) | S (12 bits) |
12-bit address space => 4096 individually addressable 16-bit memory locations (8 kbytes of memory)
4-bit opcode => 16 possible instructions; MU0 uses only 8, leaving room for extensions (good practice)
The MU0 instruction set
Instruction   Opcode   Effect
LDA S         0000     ACC := mem16[S]
STO S         0001     mem16[S] := ACC
ADD S         0010     ACC := ACC + mem16[S]
SUB S         0011     ACC := ACC - mem16[S]
JMP S         0100     PC := S
JGE S         0101     if ACC >= 0 PC := S
JNE S         0110     if ACC != 0 PC := S
STP           0111     stop
The first four make two memory accesses (instruction fetch plus operand access) and will need two clock cycles
The last four could execute in one cycle
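The instruction set above can be sketched as a small interpreter in C. This is only an illustrative software model, not the hardware design: the names `mu0_t`, `mu0_step` and `mu0_run` are assumptions introduced here, the `max_steps` guard exists only so the sketch always terminates, and treating the eight unused opcodes as a stop is also an assumption.

```c
#include <stdint.h>

#define MU0_MEM_WORDS 4096           /* 12-bit address space */

/* Hypothetical model of the MU0 programmer-visible state. */
typedef struct {
    uint16_t mem[MU0_MEM_WORDS];
    uint16_t acc;                    /* accumulator */
    uint16_t pc;                     /* program counter (12 bits used) */
    int      stopped;                /* set once STP has executed */
} mu0_t;

/* Fetch, decode and execute one instruction. */
static void mu0_step(mu0_t *m)
{
    uint16_t ir     = m->mem[m->pc & 0x0FFF];  /* fetch */
    uint16_t opcode = ir >> 12;                /* top 4 bits */
    uint16_t s      = ir & 0x0FFF;             /* low 12 bits */

    m->pc = (m->pc + 1) & 0x0FFF;              /* default next instruction */

    switch (opcode) {
    case 0x0: m->acc = m->mem[s];                      break; /* LDA */
    case 0x1: m->mem[s] = m->acc;                      break; /* STO */
    case 0x2: m->acc = (uint16_t)(m->acc + m->mem[s]); break; /* ADD */
    case 0x3: m->acc = (uint16_t)(m->acc - m->mem[s]); break; /* SUB */
    case 0x4: m->pc = s;                               break; /* JMP */
    case 0x5: if ((m->acc & 0x8000) == 0) m->pc = s;   break; /* JGE: sign bit clear */
    case 0x6: if (m->acc != 0) m->pc = s;              break; /* JNE */
    default:  m->stopped = 1;                          break; /* STP (unused opcodes too: assumption) */
    }
}

/* Run until STP or until max_steps instructions have executed. */
static unsigned mu0_run(mu0_t *m, unsigned max_steps)
{
    unsigned n = 0;
    while (!m->stopped && n < max_steps) {
        mu0_step(m);
        n++;
    }
    return n;
}
```

To use it, zero-initialise a `mu0_t`, copy the program words into `mem` starting at address 0, and call `mu0_run`. The model executes one instruction per call and does not reproduce the one-or-two-cycle timing discussed above.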
The MU0 control
Next we need to determine exactly what control signals (logic levels) are needed to make the datapath execute the correct function for each opcode.
We assume that all registers change state on the rising edge of the clock (cf. Furber, who uses the negative edge; probably because the external SRAM used for the actual memory is posedge triggered?).
For the registers, the control signals allow or prevent transitions at the clock edge.
There are also feedback signals from the datapath to the control FSM: the opcode bits, and signals from the accumulator indicating whether its contents are zero or negative, which control the respective conditional jump instructions.
All we need to do is develop a two-state FSM to generate the control signals.
Since there are just two states and lots of control outputs, use a next-state (NS) table and forget the state-diagram (SD) approach.
FSMs have no memory of outputs
[Figure: two-state FSM diagram. State 1 and State 2 each list their outputs; the transitions between and around the states are labelled with inputs.]
Memory comes from memory
Here is how – a 16 bit register...
module vreg16( clk, q, d, en, rs );
  output [15:0] q;
  input  [15:0] d;
  input         clk;
  input         en;   // enable: the register only changes when en = 1
  input         rs;   // synchronous reset, effective only when en = 1
  reg    [15:0] q;

  always @(posedge clk) begin
    if (en && ~rs)     q <= d;      // load
    else if (en && rs) q <= 16'h0;  // clear
    else               q <= q;      // hold
  end
endmodule // vreg16
Dealing with output "don't cares"
From Furber
The "don't cares" in the FSM output set have a different meaning.
As the control state must generate these signals, we must eventually define what they will be (not X's).
Note that Eno is always defined; this is because it is essential to know whether the total register is to change or not on any given clock edge. If it is changing (Eno = 1) then R/W and Clr control what it is doing; however, if it is not changing (Eno = 0) then it doesn't matter what value is presented to the register, because it will ignore it.
Allowing latitude at this stage gives more freedom in the logic reduction, simpler equations and thus smaller (and faster) circuits.
[Figure: memory/register block with control inputs Enable (Eno), Clear (Clr) and Read/Write (R/W), an input data bus, an output data bus and an address bus.]
MU0 datapath example
[Figure: MU0 datapath example. The PC, IR, ACC and ALU connect to the memory through the address bus and the data bus, under the direction of the control logic.]
MU0 register transfer level organization
[Figure: MU0 register transfer level organization. The datapath (PC, IR, ACC, ALU with inputs A and B, a mux with inputs 0 and 1, memory) annotated with its control signals IRce, PCce, ACCce, ACCoe, ALUfs, Asel, Bsel, MEMrq and RnW, and with the opcode, ACCz and ACC[15] signals fed back to the control logic.]
MU0 control logic
             |           Inputs                | Outputs
Instruction  | Opcode Reset Ex/ft ACCz ACC15   | Asel Bsel ACCce PCce IRce ACCoe ALUfs MEMrq RnW Ex/ft
Reset        | xxxx   1     x     x    x       | 0    0    1     1    1    0     =0    1     1   0
LDA S        | 0000   0     0     x    x       | 1    1    1     0    0    0     =B    1     1   1
             | 0000   0     1     x    x       | 0    0    0     1    1    0     B+1   1     1   0
STO S        | 0001   0     0     x    x       | 1    x    0     0    0    1     x     1     0   1
             | 0001   0     1     x    x       | 0    0    0     1    1    0     B+1   1     1   0
ADD S        | 0010   0     0     x    x       | 1    1    1     0    0    0     A+B   1     1   1
             | 0010   0     1     x    x       | 0    0    0     1    1    0     B+1   1     1   0
SUB S        | 0011   0     0     x    x       | 1    1    1     0    0    0     A-B   1     1   1
             | 0011   0     1     x    x       | 0    0    0     1    1    0     B+1   1     1   0
JMP S        | 0100   0     x     x    x       | 1    0    0     1    1    0     B+1   1     1   0
JGE S        | 0101   0     x     x    0       | 1    0    0     1    1    0     B+1   1     1   0
             | 0101   0     x     x    1       | 0    0    0     1    1    0     B+1   1     1   0
JNE S        | 0110   0     x     0    x       | 1    0    0     1    1    0     B+1   1     1   0
             | 0110   0     x     1    x       | 0    0    0     1    1    0     B+1   1     1   0
STOP         | 0111   0     x     x    x       | 1    x    0     0    0    0     x     0     1   0
MU0 machine language program (prog.lst)
0006
3007
1006
0006
5000
7000
0004
0001
MU0 Function
MU0 – Extensions
• MU0 is a simple processor but not useful as a compiler target.
• Some extensions seem appropriate
– Extending the address space
– Adding more addressing modes
– Allowing the PC to be saved in order to support a subroutine mechanism
– Adding more registers, supporting interrupts, etc...
– More peripherals: watchdog timer, ..
• Overall, MU0's instruction set is not a good place to start, so let us redesign.
Where to start?
• Let us start with the core of the microprocessor functionality
- The instruction
• Let us look at a basic ADD, for example
– Some bits to differentiate it from other instructions
– Some bits to specify the operand addresses
– Some bits to specify where the result should be placed (the destination)
– Some bits to specify the address of the next instruction
A 4-address instruction format
| function (f bits) | op 1 addr. (n bits) | op 2 addr. (n bits) | dest. addr. (n bits) | next_i addr. (n bits) |
→ Assembly language instruction format might look like
ADD d, s1, s2, next_i ; d := s1 + s2
Requires (4n + f) bits
A 3-address instruction format
| function (f bits) | op 1 addr. (n bits) | op 2 addr. (n bits) | dest. addr. (n bits) |
One way to reduce the number of bits required for each instruction is to make the address of the next instruction implicit.
We assume that the next instruction is at PC + sizeof(instruction). (Note that in MU0 the default next instruction was at PC+1, but if we generalise, an instruction may occupy more than one address.)
ADD d, s1, s2 ; d := s1 + s2
A 2-address instruction format
| function (f bits) | op 1 addr. (n bits) | dest. addr. (n bits) |
A further saving can be made by making the destination register the same as one of the source registers
ADD d, s1 ; d := d + s1
A 1-address (accumulator) instruction format
| function (f bits) | op 1 addr. (n bits) |
If the destination register is implicit then it is often called the accumulator
ADD s1 ; accumulator := accumulator + s1
A 0-address instruction format
| function (f bits) |
Finally, all registers may be made implicit by introducing an evaluation stack
ADD ; top_of_stack := top_of_stack + next_on_stack
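To make the 0-address idea concrete, here is a minimal sketch in C of an evaluation stack on which ADD needs no operand fields at all. The names `eval_stack_t`, `stack_push` and `stack_add`, and the depth of 16, are assumptions made for illustration; they do not describe any particular processor.

```c
#include <stdint.h>

#define STACK_DEPTH 16

/* Hypothetical evaluation stack for a 0-address machine. */
typedef struct {
    int32_t slot[STACK_DEPTH];
    int     top;                 /* number of values currently on the stack */
} eval_stack_t;

/* PUSH is the only operation here that carries an operand. */
static int stack_push(eval_stack_t *s, int32_t value)
{
    if (s->top >= STACK_DEPTH) return -1;   /* overflow */
    s->slot[s->top++] = value;
    return 0;
}

/* ADD: pop two operands and push their sum; no addresses are encoded. */
static int stack_add(eval_stack_t *s)
{
    if (s->top < 2) return -1;              /* underflow */
    int32_t b = s->slot[--s->top];
    int32_t a = s->slot[--s->top];
    s->slot[s->top++] = a + b;
    return 0;
}
```

The trade-off is visible in the code: the ADD itself is tiny, but every operand must first be moved onto the stack by a separate instruction.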
Examples of n-address use
All of the above have been used in processor instruction sets apart from the 4-address form which, although it is used internally in some microcode designs, is unnecessarily expensive
The Inmos transputer uses a 0-address evaluation stack architecture
The MU0 example in the previous section is a 1-address architecture
The Thumb instruction set, used for high code density in ARM micros, is predominantly of the 2-address form
The standard ARM instruction set uses a 3-address architecture
Addresses
An address in the MU0 architecture is the absolute address of the memory location which contains the operand
The addresses in the three-address ARM instruction format are register specifiers
In general, the term 3-address architecture refers to an instruction set in which the two sources and the destination can be specified independently
Instruction Types
Data processing instructions such as add, subtract, multiply
Data movement instructions that copy data from one place to another, such as move (mv)
Control flow instructions that switch execution from one place in the program to another, possibly depending on data values
Special instructions that change the processor's state, such as entering a privileged mode to carry out an operating system function
Note that many instructions fit into more than one class
Addressing modes
1. Immediate addressing: the operand of an instruction is present in the instruction itself
Example: 1 + 2 is written ADD 0x01 0x02
2. Absolute addressing: the instruction contains the absolute memory address of the operand
Example: ADD 0x0007 0x0008, where memory location 0x0007 holds 0x01 and location 0x0008 holds 0x02
3. Indirect addressing: the instruction contains the binary address of a memory location that contains the binary address of a value
Example: ADD 0x01 0x0007, where 0x01 is immediate and 0x0007 is indirect: location 0x0007 holds the address 0x0008, and location 0x0008 holds the operand 0x02
4. Register addressing: the instruction contains the register number of a register that contains the operand
Example: ADD 0x01 regnum, where 0x01 is immediate and register regnum holds the operand 0x02
5. Register indirect addressing: the instruction contains the register number of a register which contains the binary address of the operand in memory
Example: ADD 0x01 regnum, where 0x01 is immediate, register regnum holds the address 0x0008, and memory location 0x0008 holds the operand 0x02
6. Base plus offset addressing: the instruction contains a register number (the base) and an offset; the operand is at the address formed by adding the offset to the contents of the base register
Actual example: the parallel port registers
// Read from the status port (BASE+1)
printf("status: %d\n", inb(BASEPORT + 1));
// Set the data signals (D0-7) of the port to all low (0)
outb(0, BASEPORT);
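The six addressing modes listed above can be compared side by side in a short C sketch. Everything here is illustrative: `machine_t`, `addr_mode_t` and `fetch_operand` are hypothetical names, the memory and register file are deliberately tiny, and indices are wrapped with `%` only to keep the sketch safe to run.

```c
#include <stdint.h>

#define MEM_WORDS 16
#define NUM_REGS  4

/* Hypothetical machine state used only for this illustration. */
typedef struct {
    uint16_t mem[MEM_WORDS];
    uint16_t reg[NUM_REGS];
} machine_t;

typedef enum {
    MODE_IMMEDIATE,
    MODE_ABSOLUTE,
    MODE_INDIRECT,
    MODE_REGISTER,
    MODE_REGISTER_INDIRECT,
    MODE_BASE_OFFSET
} addr_mode_t;

/* Return the operand selected by `mode`.
 * `field` is what the instruction holds: a literal, a memory address or a
 * register number. `offset` is used only by base plus offset addressing. */
static uint16_t fetch_operand(const machine_t *m, addr_mode_t mode,
                              uint16_t field, uint16_t offset)
{
    switch (mode) {
    case MODE_IMMEDIATE:
        return field;                                   /* operand is in the instruction */
    case MODE_ABSOLUTE:
        return m->mem[field % MEM_WORDS];               /* field is the operand's address */
    case MODE_INDIRECT:
        return m->mem[m->mem[field % MEM_WORDS] % MEM_WORDS]; /* address of the address */
    case MODE_REGISTER:
        return m->reg[field % NUM_REGS];                /* register holds the operand */
    case MODE_REGISTER_INDIRECT:
        return m->mem[m->reg[field % NUM_REGS] % MEM_WORDS];  /* register holds the address */
    case MODE_BASE_OFFSET:
        return m->mem[(m->reg[field % NUM_REGS] + offset) % MEM_WORDS]; /* base + offset */
    }
    return 0;
}
```

With location 7 holding 8 and location 8 holding 2, as in the slide examples, absolute addressing of 8 and indirect addressing through 7 both yield the operand 2.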
Control Flow Instructions
A control flow instruction is used to modify the PC directly
The simplest such instructions are called 'branches' or 'jumps'
Since most of these cover only a short range in memory, the target is normally encoded relative to the PC
B LABEL
…
LABEL …
The assembler works out the displacement which must be added to the PC in order to force the PC to point to LABEL
The maximum range of the branch is determined by the number of bits allocated to the displacement.
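The assembler's displacement calculation and the range limit can be sketched in a few lines of C. The function names are hypothetical, and which PC value the displacement is added to (the branch's own address, or an address further on) depends on the processor, so `pc` below is simply whatever value the hardware will add to. The cast from unsigned to signed assumes the usual two's-complement behaviour.

```c
#include <stdint.h>

/* Displacement the assembler would store for a PC-relative branch. */
static int32_t branch_displacement(uint32_t pc, uint32_t target)
{
    return (int32_t)(target - pc);   /* negative for backward branches */
}

/* Can `disp` be encoded in a two's-complement field of `bits` bits?
 * `bits` is assumed to be between 1 and 31. */
static int displacement_fits(int32_t disp, unsigned bits)
{
    int32_t max = (int32_t)((1u << (bits - 1)) - 1u);  /* e.g. +127 for 8 bits */
    int32_t min = -max - 1;                            /* e.g. -128 for 8 bits */
    return disp >= min && disp <= max;
}
```

An 8-bit displacement field therefore reaches from -128 to +127 units either side of the PC; each extra bit doubles the range.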
Conditional branches and condition code registers
Some processors allow the values of general purpose registers to control whether a program will branch or not (MU0)
Branch if a particular register is zero / non-zero / negative etc
Branch if the contents of two specified registers are equal or not
Some processors have special purpose registers (condition code registers or flag registers) that store flags recording the result of particular instructions, for example whether the result was negative or whether a carry occurred
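As an illustration of what a flag register records, the following C sketch derives four common flags (negative, zero, carry, overflow) from a 32-bit addition. The names `flags_t` and `add_with_flags` are assumptions for this example, and the flag set is the usual N, Z, C, V convention rather than that of any one processor.

```c
#include <stdint.h>

/* Hypothetical flag record; each member is 0 or 1. */
typedef struct {
    unsigned n;   /* result negative (top bit set) */
    unsigned z;   /* result zero */
    unsigned c;   /* unsigned carry out of bit 31 */
    unsigned v;   /* signed overflow */
} flags_t;

/* Add two 32-bit values and record the flags the addition produces. */
static uint32_t add_with_flags(uint32_t a, uint32_t b, flags_t *f)
{
    uint32_t r = a + b;              /* wraps modulo 2^32 */

    f->n = (r >> 31) & 1u;
    f->z = (r == 0);
    f->c = (r < a);                  /* wrapped around, so a carry occurred */
    /* Overflow: operands share a sign and the result's sign differs. */
    f->v = ((~(a ^ b) & (a ^ r)) >> 31) & 1u;
    return r;
}
```

A later conditional branch can then test these stored flags instead of re-examining the registers.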
Subroutines
Sometimes a branch goes to a subroutine via a call; the subroutine executes and then returns on termination
Since the same subroutine may be called from many places, a record of the calling address in the calling program must be kept
The calling program could compute a suitable address and place it in a suitable memory location accessible to the subroutine
The return address could be pushed onto a stack
The return address could be placed in a register
The RET command in the subroutine places the return address in the PC
A CALL/RETURN in MU0
To call a subroutine a stack is needed to store the contents of the PC register
[Figures: CALL/RETURN timing; Call fetch; Call execute; RET fetch; RET execute.]
System Calls and Exceptions
Some instructions, such as those directly referring to commands in the operating system of a microprocessor, are privileged
System calls pass through protection barriers in a controlled way
E.g. the inb and outb in the parallel port program
Exceptions are operations performed by a microprocessor to handle an error condition in a controlled way
Continue our Architecture Rethink
Why not ask what a computer does with its time?
→ Typical dynamic instruction usage
Instruction type Dynamic usage
Data movement 43%
Control flow 23%
Arithmetic operations 15%
Comparisons 13%
Logical operations 5%
Other 1%
Pipelined instruction execution
1. Fetch the instruction from memory (fetch)
2. Decode to see what it does (dec)
3. Access any operands from the register bank (reg)
4. Combine the operands to form the result or a memory address (ALU)
5. Access memory for a data operand (mem)
6. Write the result back to the register bank (res)
instruction
1  fetch dec   reg   ALU   mem   res
2        fetch dec   reg   ALU   mem   res
3              fetch dec   reg   ALU   mem   res
                                          time →
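Read-after-write pipeline hazard
instruction
1  fetch dec   reg   ALU   mem   res
2        fetch dec   stall stall stall reg   ALU   mem   res
                                          time →
(Instruction 2 cannot read its operand until instruction 1 has written its result.)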
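The timing in the two diagrams above can be checked with a small cycle-count sketch in C. It assumes an idealised pipeline that issues one instruction per cycle and loses one cycle per stall; the function names are hypothetical.

```c
/* Cycles to complete `n` instructions on a pipeline with `stages` stages,
 * issuing one instruction per cycle, plus `stall_cycles` lost to hazards. */
static unsigned long pipeline_cycles(unsigned long n, unsigned stages,
                                     unsigned long stall_cycles)
{
    if (n == 0) return 0;
    /* The first instruction needs `stages` cycles; each later one
     * completes one cycle after its predecessor unless it stalls. */
    return stages + (n - 1) + stall_cycles;
}

/* Cycles for the same work with no overlap between instructions. */
static unsigned long unpipelined_cycles(unsigned long n, unsigned stages)
{
    return n * stages;
}
```

For the three-instruction, six-stage diagram this gives 8 cycles against 18 without pipelining; as drawn, with three stall cycles, the two-instruction hazard case takes 10 cycles instead of 7.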
Branches cause even more problems for pipelining
The fetch step of the following instruction is affected by the branch target computation
Unfortunately, subsequent fetches will be taking place while the branch is being decoded and before it has been recognised as a branch
If, for example, the branch target is calculated at the ALU stage, then three fetches will have occurred before the branch target is available
Solutions:
Extra hardware to allow the branch target to be calculated earlier
Speculative calculation of the branch target during the dec cycle, i.e. before we know it is a branch!
Pipelined branch behaviour
instruction
1 (branch)         fetch dec   reg   ALU   mem   res
2                        fetch dec   reg   ALU   mem   res
3                              fetch dec   reg   ALU   mem   res
4                                    fetch dec   reg   ALU   mem   res
5 (branch target)                          fetch dec   reg   ALU   mem   res
                                                                      time →
The Reduced Instruction Set Computer (RISC) versus the Complex Instruction Set Computer (CISC)
RISC features:
• Hard-wired instruction decode logic, like MU0. CISC processors used large microcode ROMs to decode their instructions (still in use...)
• Pipelined execution. CISC processors usually were not pipelined
• Single clock cycle execution. CISC processors typically took many clock cycles to execute a single instruction (it was thought economical to design computers with a plentiful high-level instruction set, to narrow the gap to high-level languages such as C and to make up for the lack of pipelining)
Complex Instruction Set Computer (CISC: x86, 68000)
• Wired logic → microcode control
– Temptingly easy extensibility
• Performance tuning
– HW implementation of some high-level functions
• Marketing
– Add successful instructions of competitors
– "New feature" hype
– Compatibility: only extensions are possible
CISC Problems
• Performance tuning unsuccessful
– Rarely used high-level instructions
– Sometimes slower than the equivalent sequence of simple instructions
• High complexity
– Pipelining bottlenecks → lower clock rates
– Interrupt handling can complicate things even more
• Marketing
– Prolonged design time and frequent microcode errors hurt competitiveness
Reduced Instruction Set Computer (RISC: ARM, MIPS, PPC)
• Low complexity
– Generally results in overall speedup
– Less error-prone implementation by hardwired logic or simple microcode
• VLSI implementation advantages
– Fewer transistors
– Extra space: more registers, cache
• Marketing
– Reduced design time, fewer errors and more options increase competitiveness
RISC Compiler Issues
• The compilers themselves
– Computationally more complex
– More portable
• The compiler writer
– Fewer instructions → probably an easier job
– Simpler instructions → probably fewer bugs
– Can reuse optimisation techniques
RISC vs CISC?
CISC
Effectively realizes one particular High Level Language Computer System in HW: recurring HW development costs when change is needed
RISC
Allows effective realisation of any High Level Language Computer System in SW: recurring SW development costs when change is needed
MU0 is a RISC and so is easy to learn.
ARM is RISC: the flagship RISC computer
RISC vs CISC - google the seminal (non-technical) article
ARM Architecture
- Originally Acorn RISC Machine
- Later Advanced RISC Machines Limited
Acorn Computers Limited of Cambridge originally developed ARM from 1983 to 1985, based on a one-year student project at Stanford/UC Berkeley which showed that a more cost-effective, high-performance design could compete with CISC machines (viz. the PDP-11 and VAX from DEC)
ARM's visible registers
Usable in user mode: r0 to r12, r13, r14, r15 (PC), CPSR
System modes only (banked registers that replace the user-mode ones):
fiq mode: r8_fiq to r14_fiq, SPSR_fiq
svc mode: r13_svc, r14_svc, SPSR_svc
abort mode: r13_abt, r14_abt, SPSR_abt
irq mode: r13_irq, r14_irq, SPSR_irq
undefined mode: r13_und, r14_und, SPSR_und
ARM CPSR format
bits 31-28: N Z C V
bits 27-8: unused
bit 7: I
bit 6: F
bit 5: T
bits 4-0: mode
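The CPSR layout above maps directly onto shifts and masks. The following C sketch unpacks a 32-bit CPSR value into its fields; `cpsr_fields_t` and `cpsr_decode` are hypothetical names used only for this illustration.

```c
#include <stdint.h>

/* Fields of the CPSR, following the bit positions given above. */
typedef struct {
    unsigned n, z, c, v;   /* condition flags, bits 31..28 */
    unsigned i, f, t;      /* I, F and T bits: bits 7, 6 and 5 */
    unsigned mode;         /* mode field, bits 4..0 */
} cpsr_fields_t;

/* Unpack a raw CPSR value into separate fields. */
static cpsr_fields_t cpsr_decode(uint32_t cpsr)
{
    cpsr_fields_t out;
    out.n    = (cpsr >> 31) & 1u;
    out.z    = (cpsr >> 30) & 1u;
    out.c    = (cpsr >> 29) & 1u;
    out.v    = (cpsr >> 28) & 1u;
    out.i    = (cpsr >> 7) & 1u;
    out.f    = (cpsr >> 6) & 1u;
    out.t    = (cpsr >> 5) & 1u;
    out.mode = cpsr & 0x1Fu;
    return out;
}
```

For example, the value 0x600000D3 decodes to Z and C set, I and F set, T clear, and a mode field of 0x13.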
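ARM memory organization
[Figure: ARM memory organization. Memory is drawn as a column of 32-bit rows (bit 31 to bit 0), four byte addresses per row, for addresses 0 to 23. Marked items are byte0, byte1, byte2, byte3 and byte6; half-word4, half-word12 and half-word14; word8 and word16. Each item is named by the byte address at which it starts.]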
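The structure of the ARM cross-development toolkit
[Figure: C source goes through the C compiler and asm source through the assembler, both producing .aof object files; the linker combines these with the C libraries and object libraries into an .axf image; the ARMsd debugger runs the image either on the ARMulator system model or on a development board.]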
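ARM shift operations
[Figure: each operation is drawn on a 32-bit word, bit 31 to bit 0.]
LSL #5: bits move left; 00000 fills the five low bits
LSR #5: bits move right; 00000 fills the five high bits
ASR #5, negative operand: bits move right; 11111 (copies of the sign bit 1) fills the five high bits
ASR #5, positive operand: bits move right; 00000 (copies of the sign bit 0) fills the five high bits
ROR #5: bits move right; the five bits leaving the low end re-enter at the high end
RRX: rotate right by one bit through the carry flag C; the old C enters bit 31 and bit 0 becomes the new C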
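The shift operations listed above can be modelled with plain C bit manipulation. This is a sketch under stated assumptions: the function names are hypothetical, shift amounts are assumed to be between 1 and 31 (the special encodings a real instruction set gives to amounts of 0 or 32 are not modelled), and ASR is built explicitly rather than relying on how a compiler shifts signed values.

```c
#include <stdint.h>

/* Shift amounts n are assumed to be in the range 1..31. */

static uint32_t lsl(uint32_t x, unsigned n) { return x << n; }  /* zeros enter at the bottom */
static uint32_t lsr(uint32_t x, unsigned n) { return x >> n; }  /* zeros enter at the top */

/* Arithmetic shift right: copies of the sign bit enter at the top. */
static uint32_t asr(uint32_t x, unsigned n)
{
    uint32_t shifted = x >> n;
    if (x & 0x80000000u)
        shifted |= ~(0xFFFFFFFFu >> n);   /* set the n vacated high bits */
    return shifted;
}

/* Rotate right: bits leaving the bottom re-enter at the top. */
static uint32_t ror(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32 - n));
}

/* Rotate right extended: a one-bit rotate through the carry flag. */
static uint32_t rrx(uint32_t x, unsigned carry_in, unsigned *carry_out)
{
    *carry_out = x & 1u;                                  /* bit 0 becomes the new carry */
    return (x >> 1) | ((uint32_t)(carry_in & 1u) << 31);  /* old carry enters bit 31 */
}
```

The only difference between `lsr` and `asr` is what fills the vacated high bits, which is why the figure shows ASR twice, once for each sign.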
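Multiple register transfer addressing modes
[Figure: four memory maps spanning addresses 1000 (hex) to 1018 (hex), one per instruction, with r9 = 100c (hex) before the transfer and r9' the value after it.]
STMIA r9!, {r0,r1,r5}: r0 at 100c, r1 at 1010, r5 at 1014; r9' = 1018
STMIB r9!, {r0,r1,r5}: r0 at 1010, r1 at 1014, r5 at 1018; r9' = 1018
STMDA r9!, {r0,r1,r5}: r0 at 1004, r1 at 1008, r5 at 100c; r9' = 1000
STMDB r9!, {r0,r1,r5}: r0 at 1000, r1 at 1004, r5 at 1008; r9' = 1000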
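The four store-multiple variants differ only in where the block starts relative to the base register and in which direction the base moves. A minimal C sketch of that address arithmetic follows; `stm_mode_t`, `stm_plan_t` and `stm_plan` are hypothetical names, registers are assumed to be 4 bytes wide, and the sketch relies on the ARM rule that the lowest-numbered register always goes to the lowest address.

```c
#include <stdint.h>

typedef enum { STM_IA, STM_IB, STM_DA, STM_DB } stm_mode_t;

typedef struct {
    uint32_t lowest;     /* address where the lowest-numbered register is stored */
    uint32_t new_base;   /* base register after write-back (the "!") */
} stm_plan_t;

/* Work out the block placement for storing `nregs` registers from `base`.
 * The remaining registers follow at lowest + 4, lowest + 8, ... */
static stm_plan_t stm_plan(stm_mode_t mode, uint32_t base, unsigned nregs)
{
    uint32_t bytes = 4u * nregs;
    stm_plan_t p;

    switch (mode) {
    case STM_IA:  /* increment after: the first store is at the base itself */
        p.lowest = base;              p.new_base = base + bytes; break;
    case STM_IB:  /* increment before: the first store is one word above */
        p.lowest = base + 4u;         p.new_base = base + bytes; break;
    case STM_DA:  /* decrement after: the block ends at the base */
        p.lowest = base - bytes + 4u; p.new_base = base - bytes; break;
    default:      /* STM_DB, decrement before: the block ends one word below */
        p.lowest = base - bytes;      p.new_base = base - bytes; break;
    }
    return p;
}
```

With a base of 0x100c and three registers, the incrementing forms leave the base at 0x1018 and the decrementing forms leave it at 0x1000, matching r9' in the figure.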
The mapping between the stack and block copy views of the load and store multiple instructions
                     Ascending                 Descending
                     Full         Empty        Full         Empty
Increment  Before    STMIB/STMFA                            LDMIB/LDMED
           After                  STMIA/STMEA  LDMIA/LDMFD
Decrement  Before                 LDMDB/LDMEA  STMDB/STMFD
           After     LDMDA/LDMFA                            STMDA/STMED
Branch conditions
Branch     Interpretation     Normal uses
B, BAL     Unconditional      Always take this branch
           Always             Always take this branch
BEQ        Equal              Comparison equal or zero result
BNE        Not equal          Comparison not equal or non-zero result
BPL        Plus               Result positive or zero
BMI        Minus              Result minus or negative
BCC, BLO   Carry clear        Arithmetic operation did not give carry-out
           Lower              Unsigned comparison gave lower
BCS, BHS   Carry set          Arithmetic operation gave carry-out
           Higher or same     Unsigned comparison gave higher or same
BVC        Overflow clear     Signed integer operation; no overflow occurred
BVS        Overflow set       Signed integer operation; overflow occurred
BGT        Greater than       Signed integer comparison gave greater than
BGE        Greater or equal   Signed integer comparison gave greater or equal
BLT        Less than          Signed integer comparison gave less than
BLE        Less or equal      Signed integer comparison gave less than or equal
BHI        Higher             Unsigned comparison gave higher
BLS        Lower or same      Unsigned comparison gave lower or same
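Each branch condition in the table corresponds to a test on the N, Z, C and V flags. The table itself gives only the interpretations; the flag expressions in the C sketch below are the standard ARM definitions, added here for illustration, and the names `flags_t`, `cond_t` and `branch_taken` are hypothetical.

```c
/* Condition flags; each member is 0 or 1. */
typedef struct { unsigned n, z, c, v; } flags_t;

typedef enum {
    COND_EQ, COND_NE, COND_CS, COND_CC, COND_MI, COND_PL, COND_VS, COND_VC,
    COND_HI, COND_LS, COND_GE, COND_LT, COND_GT, COND_LE, COND_AL
} cond_t;

/* Return non-zero if a branch with condition `cond` would be taken. */
static int branch_taken(cond_t cond, flags_t f)
{
    switch (cond) {
    case COND_EQ: return f.z;                  /* BEQ */
    case COND_NE: return !f.z;                 /* BNE */
    case COND_CS: return f.c;                  /* BCS / BHS */
    case COND_CC: return !f.c;                 /* BCC / BLO */
    case COND_MI: return f.n;                  /* BMI */
    case COND_PL: return !f.n;                 /* BPL */
    case COND_VS: return f.v;                  /* BVS */
    case COND_VC: return !f.v;                 /* BVC */
    case COND_HI: return f.c && !f.z;          /* BHI */
    case COND_LS: return !f.c || f.z;          /* BLS */
    case COND_GE: return f.n == f.v;           /* BGE */
    case COND_LT: return f.n != f.v;           /* BLT */
    case COND_GT: return !f.z && f.n == f.v;   /* BGT */
    case COND_LE: return f.z || f.n != f.v;    /* BLE */
    case COND_AL: return 1;                    /* B / BAL */
    }
    return 0;
}
```

Comparing 3 with 5, for instance, sets N and clears Z, C and V, so BLT, BLO and BNE are taken while BGE and BHI are not.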
Naming ARM
• ARMxyzTDMIEJFS
– x: series
– y: MMU
– z: cache
– T: Thumb
– D: debugger
– M: Multiplier
– I: EmbeddedICE (in-circuit debug hardware)
– E: Enhanced
– J: Jazelle
– F: Floating-point
– S: Synthesizable
Popular ARM architecture
• ARM7TDMI
– 3 pipeline stages
– One of the most used ARM versions (for low-end systems)
• ARM9TDMI
– Compatible with ARM7
– 5 pipeline stages
– Separate instruction and data cache
• ARM11
ARM architecture
• Load/store architecture
• A large array of uniform registers
• Fixed-length 32-bit instructions
• 3-address instructions
Processor modes
ARM architecture
• 37 registers
– 1 program counter
– 1 current program status register
– 5 saved program status registers
– 30 general purpose registers
Registers
• Only 16 registers are visible to a specific mode. A mode can access
– A particular set of r0-r12
– r13 (sp, stack pointer)
– r14 (lr, link register)
– r15 (pc, program counter)
– Current program status register (cpsr)
Register organization
General-purpose registers
[Figure: a 32-bit register, bits 31 to 0, divided at bits 24/23, 16/15 and 8/7; it can hold an 8-bit byte, a 16-bit half word or a 32-bit word.]
• 6 data types (signed/unsigned)
• All ARM operations are 32-bit. Shorter data types are only supported by data transfer operations.
Program counter
• Stores the address of the instruction to be executed
• All instructions are 32 bits wide and word-aligned
• Thus, the last two bits of pc are undefined.
Program status register (CPSR)
• N: negative
• Z: zero
• C: carry/borrow
• V: overflow
• I: IRQ disable
• F: FIQ disable
• T: state bit
• mode bits
Summary
• Load/store architecture
• Most instructions are RISCy and operate in a single cycle. Some multi-register operations take longer.
• All instructions can be executed conditionally.
3213: Digital Systems & Microprocessors: L#18
Linux usage on Intel (4 level Von Neumann)