Microprocessors
Frame Pointers and the use of the -fomit-frame-pointer switch
Feb 25th, 2002

General Outline

Usually a function uses a frame pointer to address its local variables and parameters.

In some limited circumstances it is possible to avoid using the frame pointer and to use the stack pointer instead.

The -fomit-frame-pointer switch of gcc enables this optimization. This set of slides describes its effect.

-fomit-frame-pointer

Consider this example:

    int isqrt (int);    /* assumed: integer square root, defined elsewhere */

    int q (int a, int b) {
        int c;
        int d;

        c = a + 4;
        d = isqrt (b);
        return c + d;
    }

Calling the function

The caller does something like:

    push second-arg (b)
    push first-arg (a)
    call q
    add esp, 8          ; discard the two arguments

Stack at function entry

Stack contents (top of memory first):

    argument b
    argument a
    return point        <- ESP
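To make this layout concrete, here is a minimal C sketch of a toy 32-bit stack. The Machine type, the push and rd helpers, caller_pushes and STACK_WORDS are hypothetical names for this illustration only: memory is an array of 4-byte words indexed by byte address, and the stack grows downward as on the x86. It replays the caller's two pushes and the CALL, after which the return point is at [ESP], A at [ESP+4] and B at [ESP+8].

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a 32-bit x86 stack (illustration only): byte addresses,
   4-byte slots, growing downward. */
#define STACK_WORDS 16

typedef struct {
    uint32_t mem[STACK_WORDS]; /* mem[i] holds the word at byte address 4*i */
    uint32_t esp;              /* byte address of the top of the stack */
} Machine;

/* PUSH: move ESP down one word, then store the value there. */
static void push(Machine *m, uint32_t v) {
    assert(m->esp >= 4);       /* stay inside the modelled stack */
    m->esp -= 4;
    m->mem[m->esp / 4] = v;
}

/* Read the word at a byte address, like an [ESP+n] operand. */
static uint32_t rd(const Machine *m, uint32_t addr) {
    return m->mem[addr / 4];
}

/* What the caller of q(a, b) does, up to and including the CALL. */
static void caller_pushes(Machine *m, uint32_t a, uint32_t b, uint32_t ret_point) {
    push(m, b);          /* push second-arg (b)             */
    push(m, a);          /* push first-arg (a)              */
    push(m, ret_point);  /* call q: pushes the return point */
}
```

For example, after caller_pushes(&m, 10, 49, 0xBEEF) the word at m.esp + 4 is 10 (A). When q returns, RET pops the return point and the caller's add esp, 8 discards the two arguments.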

Code of q itself

The prolog:

    push ebp            ; save the caller's frame pointer
    mov ebp, esp        ; EBP now points at this frame
    sub esp, 8          ; reserve 8 bytes for c and d

Stack after the prolog

Immediately after the sub of esp:

    second argument (b)
    first argument (a)
    return point
    old EBP             <- EBP
    value of c
    value of d          <- ESP

Addressing using Frame Pointer

The local variables and arguments are addressed using fixed offsets from the frame pointer (ESP is not referenced):

    A is at [EBP+8]
    B is at [EBP+12]
    C is at [EBP-4]
    D is at [EBP-8]
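A sketch in a toy stack model shows why these offsets stay fixed. The Machine type and the push, rd and prolog_fp helpers are hypothetical, not a real API. It runs the three-instruction prolog on top of the caller's pushes; once EBP is set, further pushes move ESP but leave every EBP-relative offset unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Toy 32-bit stack model (illustration only): byte addresses, 4-byte slots. */
#define STACK_WORDS 16

typedef struct {
    uint32_t mem[STACK_WORDS]; /* mem[i] holds the word at byte address 4*i */
    uint32_t esp, ebp;
} Machine;

static void push(Machine *m, uint32_t v) {
    assert(m->esp >= 4);
    m->esp -= 4;
    m->mem[m->esp / 4] = v;
}

static uint32_t rd(const Machine *m, uint32_t addr) { return m->mem[addr / 4]; }

/* The frame-pointer prolog of q. */
static void prolog_fp(Machine *m) {
    push(m, m->ebp);   /* push ebp     : save the caller's frame pointer */
    m->ebp = m->esp;   /* mov ebp, esp : EBP now marks this frame         */
    assert(m->esp >= 8);
    m->esp -= 8;       /* sub esp, 8   : room for c and d                 */
}
```

After the prolog, [EBP] holds the old EBP, [EBP+4] the return point, [EBP+8] A and [EBP+12] B, while C and D occupy [EBP-4] and [EBP-8].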

Code for q

Code after the prolog:

    MOV  EAX, [EBP+8]   ; A
    ADD  EAX, 4
    MOV  [EBP-4], EAX   ; C
    PUSH [EBP+12]       ; B
    CALL ISQRT
    ADD  ESP, 4
    MOV  [EBP-8], EAX   ; D
    MOV  EAX, [EBP-4]   ; C
    ADD  EAX, [EBP-8]   ; D

Optimizing use of ESP

We don't really need to readjust ESP after a CALL, as long as we do not leave junk on the stack permanently.

The epilog will clean up the entire frame anyway.

Let's use this to improve the code.

Code with ESP optimization

Code after the prolog:

    MOV  EAX, [EBP+8]   ; A
    ADD  EAX, 4
    MOV  [EBP-4], EAX   ; C
    PUSH [EBP+12]       ; B
    CALL ISQRT
    MOV  [EBP-8], EAX   ; D
    MOV  EAX, [EBP-4]   ; C
    ADD  EAX, [EBP-8]   ; D

We omitted the ADD ESP, 4 after the CALL; it is not needed.

Epilog

Clean up and return:

    MOV ESP, EBP        ; discard c, d and anything still pushed
    POP EBP             ; restore the caller's frame pointer
    RET

Or:

    LEAVE               ; same as MOV ESP, EBP followed by POP EBP
    RET
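The claim that the epilog cleans up the entire frame, including the word left behind by the omitted ADD ESP, 4, can be checked in a toy stack model. This is a sketch under stated assumptions: Machine, push, pop, rd, wr and q_fp are hypothetical helpers, and call_isqrt is a made-up stand-in for ISQRT that reads its argument at [ESP+4] and returns an integer square root in EAX. MOV ESP, EBP resets ESP no matter how much junk is still on the stack.

```c
#include <assert.h>
#include <stdint.h>

/* Toy 32-bit stack model (illustration only): byte addresses, 4-byte slots. */
#define STACK_WORDS 64

typedef struct {
    uint32_t mem[STACK_WORDS]; /* mem[i] holds the word at byte address 4*i */
    uint32_t esp, ebp, eax;
} Machine;

static void push(Machine *m, uint32_t v) {
    assert(m->esp >= 4);
    m->esp -= 4;
    m->mem[m->esp / 4] = v;
}
static uint32_t pop(Machine *m) {
    uint32_t v = m->mem[m->esp / 4];
    m->esp += 4;
    return v;
}
static uint32_t rd(const Machine *m, uint32_t addr) { return m->mem[addr / 4]; }
static void wr(Machine *m, uint32_t addr, uint32_t v) { m->mem[addr / 4] = v; }

/* Hypothetical stand-in for CALL ISQRT: the callee reads its argument at
   [ESP+4] and leaves the integer square root in EAX. */
static void call_isqrt(Machine *m) {
    push(m, 0xCA11);                           /* CALL pushes a return point */
    uint32_t x = rd(m, m->esp + 4), r = 0;
    while ((uint64_t)(r + 1) * (r + 1) <= x)
        r++;
    m->eax = r;
    pop(m);                                    /* RET pops it again */
}

/* q with a frame pointer and no ADD ESP,4 after the call. */
static void q_fp(Machine *m) {
    push(m, m->ebp);                           /* push ebp                 */
    m->ebp = m->esp;                           /* mov ebp, esp             */
    m->esp -= 8;                               /* sub esp, 8               */
    m->eax = rd(m, m->ebp + 8) + 4;            /* MOV EAX,[EBP+8]; ADD EAX,4 */
    wr(m, m->ebp - 4, m->eax);                 /* MOV [EBP-4],EAX   ; C    */
    push(m, rd(m, m->ebp + 12));               /* PUSH [EBP+12]     ; B    */
    call_isqrt(m);                             /* CALL ISQRT, B left behind */
    wr(m, m->ebp - 8, m->eax);                 /* MOV [EBP-8],EAX   ; D    */
    m->eax = rd(m, m->ebp - 4) + rd(m, m->ebp - 8);  /* C + D in EAX       */
    m->esp = m->ebp;                           /* MOV ESP,EBP: drops c, d and the junk */
    m->ebp = pop(m);                           /* POP EBP                  */
    pop(m);                                    /* RET                      */
}
```

With A = 10 and B = 49 the result in EAX is 14 + 7 = 21, EBP is restored, and only the caller's two arguments remain for its add esp, 8.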

-fomit-frame-pointer

Now we will look at the effect of omitting the frame pointer on the same example; that is, we will compile it with the -fomit-frame-pointer switch set.

    int q (int a, int b) {
        int c;
        int d;

        c = a + 4;
        d = isqrt (b);
        return c + d;
    }

Calling the function

The caller does something like:

    push second-arg (b)
    push first-arg (a)
    call q
    add esp, 8

This is exactly the same as before: the switch affects only the called function, not the caller.

Stack at function entry

Stack contents (top of memory first):

    argument b
    argument a
    return point        <- ESP

This is the same as before.

Code of q itself

The prolog:

    sub esp, 8

That's quite different: we have saved two instructions (push ebp and mov ebp, esp) by neither saving nor setting the frame pointer.

Stack after the prolog

Immediately after the sub of esp:

    second argument (b)
    first argument (a)
    return point
    value of c
    value of d          <- ESP

Addressing using Stack Pointer

The local variables and arguments are addressed using fixed offsets from the stack pointer:

    A is at [ESP+12]
    B is at [ESP+16]
    C is at [ESP+4]
    D is at [ESP]
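A toy stack model can confirm these ESP offsets. The Machine type and the push, rd and prolog_nofp helpers are hypothetical, for illustration only. Here the whole prolog is the SUB ESP, 8, and EBP is never touched.

```c
#include <assert.h>
#include <stdint.h>

/* Toy 32-bit stack model (illustration only): byte addresses, 4-byte slots. */
#define STACK_WORDS 16

typedef struct {
    uint32_t mem[STACK_WORDS]; /* mem[i] holds the word at byte address 4*i */
    uint32_t esp, ebp;
} Machine;

static void push(Machine *m, uint32_t v) {
    assert(m->esp >= 4);
    m->esp -= 4;
    m->mem[m->esp / 4] = v;
}

static uint32_t rd(const Machine *m, uint32_t addr) { return m->mem[addr / 4]; }

/* The whole prolog under -fomit-frame-pointer: EBP is neither saved nor set. */
static void prolog_nofp(Machine *m) {
    assert(m->esp >= 8);
    m->esp -= 8;   /* sub esp, 8 : room for c and d */
}
```

After prolog_nofp, the return point is at [ESP+8], A at [ESP+12] and B at [ESP+16], and the caller's EBP is still in the register.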

Code for q

Code after the prolog:

    MOV  EAX, [ESP+12]  ; A
    ADD  EAX, 4
    MOV  [ESP+4], EAX   ; C
    PUSH [ESP+16]       ; B
    CALL ISQRT
    ADD  ESP, 4
    MOV  [ESP], EAX     ; D
    MOV  EAX, [ESP+4]   ; C
    ADD  EAX, [ESP]     ; D

Epilog for -fomit-frame-pointer

We must remove the 8 bytes of local variables from the stack, so that ESP points at the return point for the RET instruction:

    ADD ESP, 8
    RET
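Putting the ESP-relative body and the ADD ESP, 8 epilog together gives a sketch of the whole function in a toy stack model. Machine, push, pop, rd, wr and q_nofp are hypothetical helpers, and call_isqrt is a made-up stand-in for ISQRT that reads its argument at [ESP+4] and returns an integer square root in EAX. In the PUSH line, rd() is evaluated before push() moves ESP, so the source address uses ESP as it was before the PUSH; the slides' PUSH [ESP+16] relies on the same behavior.

```c
#include <assert.h>
#include <stdint.h>

/* Toy 32-bit stack model (illustration only): byte addresses, 4-byte slots. */
#define STACK_WORDS 64

typedef struct {
    uint32_t mem[STACK_WORDS]; /* mem[i] holds the word at byte address 4*i */
    uint32_t esp, ebp, eax;
} Machine;

static void push(Machine *m, uint32_t v) {
    assert(m->esp >= 4);
    m->esp -= 4;
    m->mem[m->esp / 4] = v;
}
static uint32_t pop(Machine *m) {
    uint32_t v = m->mem[m->esp / 4];
    m->esp += 4;
    return v;
}
static uint32_t rd(const Machine *m, uint32_t addr) { return m->mem[addr / 4]; }
static void wr(Machine *m, uint32_t addr, uint32_t v) { m->mem[addr / 4] = v; }

/* Hypothetical stand-in for CALL ISQRT: argument at [ESP+4], result in EAX. */
static void call_isqrt(Machine *m) {
    push(m, 0xCA11);                           /* CALL pushes a return point */
    uint32_t x = rd(m, m->esp + 4), r = 0;
    while ((uint64_t)(r + 1) * (r + 1) <= x)
        r++;
    m->eax = r;
    pop(m);                                    /* RET pops it again */
}

/* q compiled with -fomit-frame-pointer: every access is ESP-relative. */
static void q_nofp(Machine *m) {
    m->esp -= 8;                                  /* sub esp, 8                  */
    m->eax = rd(m, m->esp + 12) + 4;              /* MOV EAX,[ESP+12]; ADD EAX,4 */
    wr(m, m->esp + 4, m->eax);                    /* MOV [ESP+4],EAX   ; C       */
    push(m, rd(m, m->esp + 16));                  /* PUSH [ESP+16]     ; B       */
    call_isqrt(m);                                /* CALL ISQRT                  */
    m->esp += 4;                                  /* ADD ESP,4                   */
    wr(m, m->esp, m->eax);                        /* MOV [ESP],EAX     ; D       */
    m->eax = rd(m, m->esp + 4) + rd(m, m->esp);   /* C + D in EAX                */
    m->esp += 8;                                  /* ADD ESP,8                   */
    pop(m);                                       /* RET                         */
}
```

With A = 10 and B = 49 the result is again 21, EBP is never touched, and after RET only the caller's two arguments remain.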

Why not always use ESP?

Problems with debugging:
  • The debugger relies on hopping back through frames using the saved frame pointers (which form a linked list of frames) to produce backtraces, etc.

If code causes ESP to move, there are difficulties:
  • Pushing of parameters
  • Dynamic arrays
  • Use of alloca
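The backtrace a debugger performs can be sketched as a walk down the linked list of saved frame pointers. In this hypothetical toy model (Machine, push, rd, enter and walk_frames are illustrative names only), each frame holds the saved EBP at [EBP] and the return point at [EBP+4]; the outermost saved EBP is assumed to be 0 to end the list. Code compiled with -fomit-frame-pointer does not maintain this chain in EBP, which is the debugging problem described above.

```c
#include <assert.h>
#include <stdint.h>

/* Toy 32-bit stack model (illustration only): byte addresses, 4-byte slots. */
#define STACK_WORDS 32

typedef struct {
    uint32_t mem[STACK_WORDS]; /* mem[i] holds the word at byte address 4*i */
    uint32_t esp, ebp;
} Machine;

static void push(Machine *m, uint32_t v) {
    assert(m->esp >= 4);
    m->esp -= 4;
    m->mem[m->esp / 4] = v;
}

static uint32_t rd(const Machine *m, uint32_t addr) { return m->mem[addr / 4]; }

/* CALL into a function that keeps a frame pointer, then run its prolog. */
static void enter(Machine *m, uint32_t ret_point, uint32_t locals_bytes) {
    push(m, ret_point);          /* CALL pushes the return point */
    push(m, m->ebp);             /* push ebp                     */
    m->ebp = m->esp;             /* mov ebp, esp                 */
    assert(m->esp >= locals_bytes);
    m->esp -= locals_bytes;      /* sub esp, locals_bytes        */
}

/* Debugger-style backtrace: follow the chain of saved EBPs, collecting the
   return point stored just above each one. Assumes the outermost frame's
   saved EBP is 0. Returns the number of frames found (at most max). */
static int walk_frames(const Machine *m, uint32_t *out, int max) {
    int n = 0;
    for (uint32_t fp = m->ebp; fp != 0 && n < max; fp = rd(m, fp))
        out[n++] = rd(m, fp + 4);
    return n;
}
```

After three nested enter calls with return points 0x100, 0x200 and 0x300, the walk reports 0x300, 0x200, 0x100: innermost frame first, as in a debugger's backtrace.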

Pushing Parameters

Pushing parameters modifies ESP.

Sometimes this is no problem, as in our example here, since we undo the modification immediately after the call.

But suppose we had called FUNC(B,B). We could not do:

    PUSH [ESP+16]
    PUSH [ESP+16]

since ESP is moved by the first PUSH, so the second PUSH would fetch A instead of B.

More on ESP handling

Once again,

    PUSH [ESP+16]
    PUSH [ESP+16]

would not work, but we can keep track of the fact that ESP has moved and do:

    PUSH [ESP+16]       ; Push B
    PUSH [ESP+20]       ; Push B again

And that works fine.
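The offset bookkeeping can be checked in a toy model of the omit-frame-pointer frame. Machine, push, rd and push_b_twice are hypothetical names for this sketch. With track_esp off, the second PUSH reuses [ESP+16] and picks up A; with it on, the second push uses [ESP+20] and gets B again.

```c
#include <assert.h>
#include <stdint.h>

/* Toy 32-bit stack model (illustration only): byte addresses, 4-byte slots. */
#define STACK_WORDS 16

typedef struct {
    uint32_t mem[STACK_WORDS]; /* mem[i] holds the word at byte address 4*i */
    uint32_t esp;
} Machine;

static void push(Machine *m, uint32_t v) {
    assert(m->esp >= 4);
    m->esp -= 4;
    m->mem[m->esp / 4] = v;
}

static uint32_t rd(const Machine *m, uint32_t addr) { return m->mem[addr / 4]; }

/* Push B twice for FUNC(B, B) from q's omit-frame-pointer frame, where B
   starts at [ESP+16]. Each rd() runs before its push() moves ESP. */
static void push_b_twice(Machine *m, int track_esp) {
    push(m, rd(m, m->esp + 16));                     /* PUSH [ESP+16] ; B */
    push(m, rd(m, m->esp + (track_esp ? 20 : 16)));  /* [ESP+20] if ESP is tracked */
}
```

With A = 10 and B = 49, the naive version leaves 49 and then 10 on the stack; the tracked version leaves 49 twice.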

More on ESP optimization

When using the frame pointer, we were able to optimize away the ADD ESP, 4 after the call.

Can we still do that? Yes, but we have to keep track of the fact that there is an extra word on the stack, so ESP is 4 bytes "off".

Code with ESP optimization

Code after the prolog:

    MOV  EAX, [ESP+12]  ; A
    ADD  EAX, 4
    MOV  [ESP+4], EAX   ; C
    PUSH [ESP+16]       ; B
    CALL ISQRT
    MOV  [ESP+4], EAX   ; D
    MOV  EAX, [ESP+8]   ; C
    ADD  EAX, [ESP+4]   ; D

The last three references had to be modified.

Epilog for Optimized code

We also have to modify the epilog in this case, since there are now 12 bytes on the stack at exit: 8 from the local variables and 4 from the push we did.

The epilog becomes:

    ADD ESP, 12
    RET

But no instructions were added: the epilog only uses a larger constant, so we really did save the ADD ESP, 4 after the call.
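A compiler omitting the frame pointer has to do exactly this bookkeeping. Here is a minimal sketch, assuming a hypothetical FrameState record (an illustration, not gcc's actual internal data structure): slot offsets are recorded relative to ESP just after the prolog, and the bytes pushed since then are added to every displacement and to the epilog's ADD ESP.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical compiler-side record for a function compiled without a frame
   pointer (illustration only). Slot offsets are stored relative to ESP just
   after the prolog: for q, D is 0, C is 4, A is 12 and B is 16. */
typedef struct {
    uint32_t locals_bytes;  /* reserved by the prolog's SUB ESP */
    uint32_t pushed_bytes;  /* pushed since the prolog and not yet popped */
} FrameState;

/* Record a PUSH (n = 4). */
static void note_push(FrameState *f, uint32_t n) { f->pushed_bytes += n; }

/* Record an ADD ESP,n that discards previously pushed words. */
static void note_pop(FrameState *f, uint32_t n) {
    assert(f->pushed_bytes >= n);  /* must not pop into the locals */
    f->pushed_bytes -= n;
}

/* Displacement to use in an [ESP+disp] operand for a slot right now. */
static uint32_t esp_disp(const FrameState *f, uint32_t prolog_offset) {
    return prolog_offset + f->pushed_bytes;
}

/* Immediate for the epilog's ADD ESP,n just before RET. */
static uint32_t epilog_adjust(const FrameState *f) {
    return f->locals_bytes + f->pushed_bytes;
}
```

Starting from FrameState f = {8, 0}, one note_push(&f, 4) turns D's displacement into 4 and C's into 8, and the epilog constant into 12, matching the code above; a note_pop(&f, 4) for an ADD ESP, 4 after the call would bring the epilog back to ADD ESP, 8.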

Other cases of ESP moving

  • Dynamic arrays allocated on the local stack, whose size is not known at compile time
  • Explicit calls to alloca

How alloca works:
  • Subtract the given value from ESP
  • Return the new ESP value as a pointer to the new area

These cases are fatal for ESP-relative addressing: ESP moves by an amount known only at run time, so the compiler cannot compute fixed offsets from it. We MUST use a frame pointer in these cases.
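Why these cases are fatal can be seen in a toy stack model. Machine, push, rd and alloca_model are hypothetical names, and alloca_model rounds the size up to a multiple of 4, an assumption of the model. After an alloca of n bytes, A is still at [EBP+8], but its distance from ESP becomes 16 + n, which the compiler cannot know in advance.

```c
#include <assert.h>
#include <stdint.h>

/* Toy 32-bit stack model (illustration only): byte addresses, 4-byte slots. */
#define STACK_WORDS 16

typedef struct {
    uint32_t mem[STACK_WORDS]; /* mem[i] holds the word at byte address 4*i */
    uint32_t esp, ebp;
} Machine;

static void push(Machine *m, uint32_t v) {
    assert(m->esp >= 4);
    m->esp -= 4;
    m->mem[m->esp / 4] = v;
}

static uint32_t rd(const Machine *m, uint32_t addr) { return m->mem[addr / 4]; }

/* Model of alloca(n): move ESP down by a run-time amount and return the new
   ESP as the address of the area. Rounding n up to a multiple of 4 is an
   assumption of this model. */
static uint32_t alloca_model(Machine *m, uint32_t n) {
    uint32_t bytes = (n + 3u) & ~3u;
    assert(m->esp >= bytes);
    m->esp -= bytes;
    return m->esp;
}
```

Running the frame-pointer prolog and then alloca_model with n = 0, 4, 8 or 12 leaves A at [EBP+8] every time, while the ESP-relative offset of A changes with n.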

Even better, More optimization

Let's recall our example:

    int q (int a, int b) {
        int c;
        int d;

        c = a + 4;
        d = isqrt (b);
        return c + d;
    }

We can rewrite this to avoid using the local variables c and d completely, and the compiler can do the same thing.

Optimized Version

With some optimization, we can write:

    int q (int a, int b) {
        return isqrt (b) + a + 4;
    }

We are not suggesting that users have to rewrite their code this way; we want the compiler to do it automatically.

Optimizations We Used

Commutative optimization: A + B = B + A

Associative optimization: A + (B + C) = (A + B) + C

Together they turn c + d, that is (a + 4) + isqrt (b), into isqrt (b) + a + 4.

For integer operands, these transformations are certainly valid (well, see the fine point on the next slide).

Floating-point is another matter!
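A two-function C check shows why reassociation is unsafe for floating point (sum_left and sum_right are illustrative names). With A = 1e20, B = -1e20 and C = 1.0, (A + B) + C is 1.0, but in A + (B + C) the 1.0 is lost when it is added to -1e20, so the result is 0.0.

```c
#include <assert.h>

/* (A + B) + C, grouped as written. */
static double sum_left(double a, double b, double c) { return (a + b) + c; }

/* A + (B + C): the reassociated grouping. */
static double sum_right(double a, double b, double c) { return a + (b + c); }
```

The two groupings are mathematically equal, yet sum_left(1e20, -1e20, 1.0) and sum_right(1e20, -1e20, 1.0) differ, so a compiler cannot freely regroup floating-point additions the way it can for integers.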

A fine Point

The transformation of (A + B) + C to A + (B + C) works fine in 2's complement integer arithmetic, where additions simply wrap around and no overflow is trapped; that is the code the compiler will generate.

But strictly at the C source level, B + C might overflow (signed overflow is undefined in C), so as a source-level rewrite this transformation is not technically correct.

But we are really talking about compiler optimizations, which work on the generated machine arithmetic, so this does not matter.
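A sketch of the machine-level argument: unsigned 32-bit arithmetic in C wraps modulo 2^32, just like the x86 ADD instruction, so it can stand in for the generated code (add32 and sum3_machine are hypothetical helpers). With A = -1, B = INT32_MAX and C = 1, the source-level B + C would overflow, yet both groupings produce the same 32-bit result, INT32_MAX.

```c
#include <assert.h>
#include <stdint.h>

/* The machine's 32-bit ADD: wraps modulo 2^32 and never traps. C's unsigned
   arithmetic has exactly this behavior, so uint32_t models the register. */
static uint32_t add32(uint32_t x, uint32_t y) { return (uint32_t)(x + y); }

/* Evaluate both groupings of A + B + C with machine adds, check that they
   agree, and return the 32-bit result as raw bits. */
static uint32_t sum3_machine(int32_t a, int32_t b, int32_t c) {
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b, uc = (uint32_t)c;
    uint32_t left = add32(add32(ua, ub), uc);    /* (A + B) + C */
    uint32_t right = add32(ua, add32(ub, uc));   /* A + (B + C) */
    assert(left == right);                       /* regrouping never changes the bits */
    return left;
}
```

The same holds when the overflowing pair is A + B instead, for example sum3_machine(INT32_MAX, 1, -1).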

The optimized code

Still omitting the frame pointer, we now have the following code for the optimized function.

The prolog

(this slide intentionally blank)

No prolog code is necessary; we can use the stack exactly as it came to us:

    second argument (b)
    first argument (a)
    return point        <- ESP

And address the parameters off the unchanged ESP:

    A is at [ESP+4]
    B is at [ESP+8]

The body of the function

Code after the (empty) prolog:

    PUSH [ESP+8]        ; B
    CALL ISQRT
    ADD  EAX, [ESP+8]   ; A
    ADD  EAX, 4

Note that the reference to A was adjusted from [ESP+4] to [ESP+8] to account for the extra 4 bytes pushed onto the stack before the call to ISQRT.

The epilog

We pushed 4 extra bytes onto the stack, so we need to pop them off:

    ADD ESP, 4
    RET

And that's it: only 6 instructions in all. Removing the frame pointer really helped here, since it saved 3 instructions and two memory references.
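The six-instruction version can be replayed in a toy stack model to confirm that the adjusted [ESP+8] really reaches A and that ADD ESP, 4 leaves the stack as the caller expects. Machine, push, pop, rd and q_leaf are hypothetical helpers of this sketch, and call_isqrt is a made-up stand-in for ISQRT that reads its argument at [ESP+4] and returns an integer square root in EAX.

```c
#include <assert.h>
#include <stdint.h>

/* Toy 32-bit stack model (illustration only): byte addresses, 4-byte slots. */
#define STACK_WORDS 64

typedef struct {
    uint32_t mem[STACK_WORDS]; /* mem[i] holds the word at byte address 4*i */
    uint32_t esp, ebp, eax;
} Machine;

static void push(Machine *m, uint32_t v) {
    assert(m->esp >= 4);
    m->esp -= 4;
    m->mem[m->esp / 4] = v;
}
static uint32_t pop(Machine *m) {
    uint32_t v = m->mem[m->esp / 4];
    m->esp += 4;
    return v;
}
static uint32_t rd(const Machine *m, uint32_t addr) { return m->mem[addr / 4]; }

/* Hypothetical stand-in for CALL ISQRT: argument at [ESP+4], result in EAX. */
static void call_isqrt(Machine *m) {
    push(m, 0xCA11);                           /* CALL pushes a return point */
    uint32_t x = rd(m, m->esp + 4), r = 0;
    while ((uint64_t)(r + 1) * (r + 1) <= x)
        r++;
    m->eax = r;
    pop(m);                                    /* RET pops it again */
}

/* The optimized q: no prolog, six instructions in all. */
static void q_leaf(Machine *m) {
    push(m, rd(m, m->esp + 8));   /* PUSH [ESP+8]    ; B                    */
    call_isqrt(m);                /* CALL ISQRT                             */
    m->eax += rd(m, m->esp + 8);  /* ADD EAX,[ESP+8] ; A, 4 bytes further   */
    m->eax += 4;                  /* ADD EAX,4                              */
    m->esp += 4;                  /* ADD ESP,4       ; drop the pushed B    */
    pop(m);                       /* RET                                    */
}
```

With A = 10 and B = 49, EAX ends up as 7 + 10 + 4 = 21, EBP is untouched, and only the caller's two arguments remain on the stack.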

Other advantages of omitting FP

If we omit the frame pointer, then we have an extra register.

For the x86, going from 6 to 7 available registers (of the 8 general registers, ESP is always the stack pointer) can make a real difference.

Of course, we have to save and restore EBP to use it freely, since the caller expects EBP to be preserved.

But that may well be worthwhile in a long function: anything that keeps things in registers and saves memory references is a GOOD THING!

Summary

Now you know what this gcc switch does.

But more importantly, if you understand what it does, you understand all about frame pointers and the addressing of data in local frames.