Deferred segment-loading

An exercise on implementing the concept of ‘load-on-demand’

The ‘do-it-later’ philosophy

• Modern operating systems often follow a policy of deferring work whenever possible

• The advantage of adopting this practice is most evident in those cases where it turns out that the work was not needed after all

• Example: Many programs contain lots of code and data for diagnosing errors – but it’s not needed if no errors actually occur

Avoiding wasted effort

• Thus it will be more efficient if an OS does not always take time to load those portions of a program (such as its error-diagnostics and error-recovery routines) which may be unnecessary in the majority of situations

• But of course the OS needs to be ready to take a ‘timeout’ for loading those routines when and if the need becomes apparent

Another example

• In a multitasking environment, many tasks are taking turns at executing instructions

• The CPU typically performs task-switching several times every second – and must do a ‘save’ of the outgoing task’s context, and a ‘load’ of the incoming task’s context, any time it switches from one task to the next

• We ask: can any of this work be deferred?

The NPX registers

• Only a few tasks typically make any use of the Pentium’s ‘floating-point’ registers, so it’s wasteful to do a ‘save-and-reload’ for these registers with every task-switch

• The TS-bit (bit #3 in Control Register 0) is designed to assist an OS in implementing a policy of ‘lazy’ context-switching for the set of registers used in floating-point work

Example: effect of TS=1

• Each time the CPU performs a task-switch it automatically sets the TS-bit to 1 (only an OS can execute a ‘clts’ to reset TS=0)

• When any task tries to execute any of the NPX instructions (to do some arithmetic with values in the floating-point registers), an exception 7 fault will occur if the TS-bit hasn’t been cleared since a task-switch

The fault-7 exception-handler

• The work involved in saving the contents of the floating-point registers being used by a no-longer-active task, and reloading those registers with values that the active task expects to work on, can be deferred to the fault-handler for exception-7

• Then it can clear the TS-bit (with ‘clts’) and ‘retry’ the instruction that caused this ‘fault’
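
The lazy policy described above can be modeled in ordinary C. This is only a sketch of the bookkeeping, not kernel code: the names `task_switch`, `fault7_handler`, `fpu_store`, `fpu_load`, and `fpu_reload_count` are hypothetical, a plain array stands in for the floating-point registers, and a `bool` stands in for the TS-bit.

```c
#include <stdbool.h>
#include <stddef.h>

#define NTASKS 3
#define NREGS  8

struct task { double fpu_image[NREGS]; };  /* saved floating-point registers */

static struct task tasks[NTASKS];
static double fpu_regs[NREGS];   /* stands in for the hardware registers  */
static bool   ts_flag = false;   /* models the TS-bit in CR0              */
static int    current = 0;       /* index of the task now running         */
static int    fpu_owner = -1;    /* task whose values are in fpu_regs     */
static int    reload_count = 0;  /* number of save-and-reload operations  */

/* Model of a task-switch: TS gets set, and no floating-point work is done. */
void task_switch(int next) { current = next; ts_flag = true; }

/* Model of the exception-7 handler: do the deferred work, then 'clts'. */
static void fault7_handler(void)
{
    if (fpu_owner != current) {
        if (fpu_owner >= 0)                       /* save the old owner's values */
            for (size_t i = 0; i < NREGS; i++)
                tasks[fpu_owner].fpu_image[i] = fpu_regs[i];
        for (size_t i = 0; i < NREGS; i++)        /* reload the current task's   */
            fpu_regs[i] = tasks[current].fpu_image[i];
        fpu_owner = current;
        reload_count++;
    }
    ts_flag = false;                              /* plays the role of 'clts'    */
}

/* Models of NPX instructions: each one 'faults' first if TS is still set. */
void fpu_store(int reg, double value)
{
    if (ts_flag) fault7_handler();
    fpu_regs[reg] = value;
}

double fpu_load(int reg)
{
    if (ts_flag) fault7_handler();
    return fpu_regs[reg];
}

int fpu_reload_count(void) { return reload_count; }
```

In this model any number of task-switches cost nothing in floating-point work until some task actually touches a register, and a task that regains the CPU while its values are still loaded pays nothing either.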

The ‘fork()’ system-call

• In a UNIX/Linux operating system, the way any new task gets created is by a call to the kernel’s ‘fork()’ service-function

• This function is supposed to ‘duplicate’ the entire program-environment of the calling task (i.e., code, data, stack and heap, plus the kernel’s process-control data-structure)

• But much of this work is often wasted!

The ‘fork-and-exec’ scenario

• In practice, the most common reason for a program to ‘fork()’ a child-process is so the child-task can launch a separate program:

if ( fork() == 0 ) execl( "newprog", newargs, 0 );

• In these cases the ‘duplicated’ code, data, and heap are not relevant to the new task, and so they will simply get discarded!
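
A slightly fuller version of the fork-and-exec idiom might look like the sketch below. The helper name `run_program` is hypothetical, and `execvp()` is used in place of `execl()` only so that the arguments can be passed as an array; both are standard POSIX calls.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Launch 'file' as a child process and wait for it to finish.
 * Returns the child's exit status, or -1 if anything went wrong. */
int run_program(const char *file, char *const argv[])
{
    pid_t pid = fork();              /* duplicate the calling task          */
    if (pid < 0)
        return -1;                   /* fork failed                         */

    if (pid == 0) {                  /* child: the copy is discarded here   */
        execvp(file, argv);          /* replace it with the new program     */
        _exit(127);                  /* reached only if execvp failed       */
    }

    int status;                      /* parent: wait for the child          */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child runs only two statements of the parent's program before its duplicated image is thrown away, which is why copying everything at ‘fork()’ time is so often wasted effort.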

‘loading-on-demand’

• An OS can avoid all the wasted effort of duplicating a parent-task’s resources (its code, data, heap, etc.) by implementing “only upon demand” loading as a policy

• For an OS that uses the CPU’s memory-segmentation capabilities, an ‘on demand’ policy can be implemented by using the Pentium ‘Segment-Not-Present’ exception

How it works

• Segments remain ‘uninitialized’ until they are actually accessed by an application

• Segment-descriptors are initially marked as ‘Not Present’ (i.e., their P-bit is zero)

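• When any instruction attempts to load a selector for such a segment into a segment-register (for example a far call or jump into it, or a ‘mov’ into DS or ES), the CPU responds by generating exception-11: “Segment-Not-Present” (a not-present stack-segment raises exception-12 instead)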

An ‘error-code’ is pushed

• Besides pushing the memory-address of the faulting instruction onto the exception-handler’s stack, the CPU also pushes an ‘error-code’ to indicate which descriptor was not yet marked as being ‘Present’

• The handler can then ‘load’ that segment with the proper information and adjust its descriptor’s P-bit, then retry the instruction
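
The fault-load-retry cycle can be imitated in plain C. This is a toy model under stated assumptions: `struct segment`, `access_segment`, and `segment_fault_count` are hypothetical names, a `bool` stands in for the descriptor's P-bit, and an ordinary function call stands in for the exception.

```c
#include <stdbool.h>
#include <string.h>

struct segment {
    bool present;        /* models the descriptor's P-bit          */
    char contents[32];   /* models the memory the segment covers   */
};

static int fault_count = 0;

/* Plays the role of the exception-11 handler: load, then mark Present. */
static void segment_not_present_handler(struct segment *seg)
{
    strcpy(seg->contents, "Hello from the loaded segment");
    seg->present = true;
    fault_count++;
}

/* Plays the role of an instruction that references the segment. */
const char *access_segment(struct segment *seg)
{
    if (!seg->present)                       /* the 'fault' ...              */
        segment_not_present_handler(seg);
    return seg->contents;                    /* ... followed by the 'retry'  */
}

int segment_fault_count(void) { return fault_count; }
```

Only the first access pays the loading cost; later accesses find the segment already marked Present.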

Error-Code Format

bits 31..16   reserved
bits 15..3    table-index
bit 2         TI
bit 1         IDT
bit 0         EXT

Legend:
EXT = An external event caused the exception (1=yes, 0=no)
IDT = table-index refers to Interrupt Descriptor Table (1=yes, 0=no)
TI = The Table Indicator flag, used when IDT=0 (1=LDT, 0=GDT)

This same error-code format is used with exceptions 0x0B, 0x0C, and 0x0D
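
As an illustration, the fields of this error-code could be unpacked in C as sketched below. The struct and function names are hypothetical; the bit positions follow the format above, with TI=0 selecting the GDT and TI=1 selecting the LDT, as in an ordinary segment-selector.

```c
#include <stdint.h>

struct selector_error {
    unsigned ext;     /* bit 0: external event caused the exception      */
    unsigned idt;     /* bit 1: table-index refers to the IDT            */
    unsigned ti;      /* bit 2: 0 = GDT, 1 = LDT (used only if idt == 0) */
    unsigned index;   /* bits 15..3: index into the indicated table      */
};

/* Split an exception error-code into its four fields. */
struct selector_error decode_error_code(uint32_t code)
{
    struct selector_error e;
    e.ext   = code & 1u;
    e.idt   = (code >> 1) & 1u;
    e.ti    = (code >> 2) & 1u;
    e.index = (code >> 3) & 0x1FFFu;   /* thirteen index bits */
    return e;
}
```

For example, an error-code of 0x0018 decodes to table-index 3 in the GDT, with EXT and IDT both clear.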

Our ‘simulation’ demo

• We can illustrate the ‘just-in-time’ idea by writing a program that performs a ‘far’ call to an ‘uninitialized’ region of memory:

lcall $sel_CS, $draw_message

• The code-segment descriptor (referenced here by the selector-value ‘sel_CS’) will be initially marked ‘Not-Present’ (so this ‘lcall’ instruction will trigger an exception-11)

Our ‘fault-handler’

• Our Interrupt-Service-Routine for fault-11 will do two things:

    • Initialize the memory-region with code and data
    • Mark the code-segment’s descriptor as ‘Present’

• It will carefully preserve the CPU registers, so that it can ‘retry’ the faulting instruction

Where is the ‘error-code’?

Layout of our fault-handler’s stack (because we used a 286 interrupt-gate); each entry is 16 bits wide:

+6   FLAGS
+4   CS
+2   IP
+0   error-code   <-- SS:SP

The Pentium provides a special pair of instructions that procedures can use to address any parameter-values that reside on the stack: ‘enter’ and ‘leave’

Code using ‘enter’ and ‘leave’

isrNPF: # Our fault-handler for exception-0x0B
        enter $0, $0                      # setup stackframe access
        call initialize_the_high_arena
        call mark_segment_as_ready
        leave                             # discard the frame access
        add $2, %sp                       # discard the error-code
        iret                              # 'retry' the faulting instruction

What does ‘enter’ do?

• The effect of the single instruction

enter $0, $0

is equivalent to this instruction-sequence:

push %bp
mov %sp, %bp

How the stack is changed

Layout of our fault-handler’s stack BEFORE executing ‘enter’ (each entry is 16 bits wide):

+6   FLAGS
+4   CS
+2   IP
+0   error-code   <-- SS:SP

Layout of our fault-handler’s stack AFTER executing ‘enter’ (each entry is 16 bits wide):

+8   FLAGS
+6   CS
+4   IP
+2   error-code
+0   old-BP       <-- SS:BP (and SS:SP, since ‘enter $0, $0’ reserves no local storage)

NOTE: Any memory-references that use indirect addressing via register BP will use the SS segment-register by default (not the DS segment-register)

for example: testw $0x0007, 2(%bp)
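
For readers who think more easily in C, the frame that BP addresses after ‘enter’ can be pictured as a struct. This is only a sketch: the names `fault_frame` and `frame_error_bits` are hypothetical, and it assumes the compiler inserts no padding between five 16-bit fields (true for common compilers, and checkable with `offsetof`).

```c
#include <stddef.h>
#include <stdint.h>

/* The words reachable from SS:BP after 'enter $0, $0' (286 interrupt-gate). */
struct fault_frame {
    uint16_t old_bp;       /* 0(%bp): saved by 'enter'        */
    uint16_t error_code;   /* 2(%bp): pushed by the CPU       */
    uint16_t ip;           /* 4(%bp): faulting instruction    */
    uint16_t cs;           /* 6(%bp)                          */
    uint16_t flags;        /* 8(%bp)                          */
};

/* Same test as 'testw $0x0007, 2(%bp)': isolate the EXT, IDT and TI bits. */
uint16_t frame_error_bits(const struct fault_frame *frame)
{
    return frame->error_code & 0x0007;
}
```

A result of zero from `frame_error_bits` means the remaining bits of the error-code are an index into the GDT.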

What does ‘leave’ do?

• The effect of the single instruction

leave

is equivalent to this instruction-sequence:

mov %bp, %sp
pop %bp

How the stack is changed

Layout of our fault-handler’s stack BEFORE executing ‘leave’ (each entry is 16 bits wide):

+8   FLAGS
+6   CS
+4   IP
+2   error-code
+0   old-BP                 <-- SS:BP
     … other pushed words   <-- SS:SP

Layout of our fault-handler’s stack AFTER executing ‘leave’ (each entry is 16 bits wide):

+6   FLAGS
+4   CS
+2   IP
+0   error-code   <-- SS:SP

So the effect of ‘leave’ is to undo the effect of ‘enter’

Our demo’s memory-layout

0x00030000   ARENA #3 (not used by this demo)
0x00020000   ARENA #2 (where our demo expects drawing code will reside)
0x00010000   ARENA #1 (where the loader puts our program code and data)
0x00007C00   BOOT_LOCN
0x00000000

Copy contents of ARENA #1 to ARENA #2

Efficient copying

• We use the Pentium’s ‘rep movsw’ instruction to perform memory-to-memory copying operations

• The segment-selector for the segment we copy from (it must be ‘readable’) goes into register DS, and the segment-selector for the segment we copy to (it must be ‘writable’) goes into ES

• The number of words we will copy should match the size of our code-segment (which is 64KB, i.e., 0x8000 words)

• The Direction-Flag should be cleared (DF=0)

Example assembly code

cld                  # use 'forward' string-copying

mov $sel_ds, %si     # selector for arena at 0x10000
mov %si, %ds         # goes in segment-register DS
xor %si, %si         # start copying from offset zero

mov $sel_DS, %di     # selector for arena at 0x20000
mov %di, %es         # goes in segment-register ES
xor %di, %di         # start copying to offset zero

mov $0x8000, %cx     # number of words to be copied
rep movsw            # perform the arena-copying
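
What ‘rep movsw’ accomplishes with DF=0 can be stated in C as a simple forward word-by-word loop. This is a sketch for comparison only; the names `copy_arena` and `ARENA_WORDS` are hypothetical, and pointers take the place of the DS:SI and ES:DI register pairs.

```c
#include <stddef.h>
#include <stdint.h>

#define ARENA_WORDS 0x8000u   /* 64KB expressed as a count of 16-bit words */

/* Forward copy of 'words' 16-bit words, like 'cld' followed by 'rep movsw':
 * src plays the role of DS:SI, dst of ES:DI, and words of CX. */
void copy_arena(uint16_t *dst, const uint16_t *src, size_t words)
{
    while (words--)
        *dst++ = *src++;      /* one 'movsw': copy a word, advance both */
}
```

Passing `ARENA_WORDS` as the count copies exactly one 64KB arena, matching the value loaded into CX above.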

Segment-Descriptor Format

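Upper doubleword (bits 63..32):

bits 63..56   Base[31..24]
bit 55        G
bit 54        D
bit 53        RSV
bit 52        AVL
bits 51..48   Limit[19..16]
bit 47        P
bits 46..45   DPL
bit 44        S
bit 43        X
bit 42        C/D
bit 41        R/W
bit 40        A
bits 39..32   Base[23..16]

Lower doubleword (bits 31..0):

bits 31..16   Base[15..0]
bits 15..0    Limit[15..0]

The segment-descriptor’s ‘Present’ bit is bit-number 47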
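
Treating a descriptor as one 64-bit value, the ‘Present’ bit can be tested and changed as sketched below. The helper names are hypothetical, and the sample value mentioned afterward is an illustrative descriptor, not one taken from the demo.

```c
#include <stdbool.h>
#include <stdint.h>

#define DESC_PRESENT (UINT64_C(1) << 47)   /* the P-bit is bit-number 47 */

/* True if the descriptor's P-bit is set. */
bool descriptor_is_present(uint64_t desc)
{
    return (desc & DESC_PRESENT) != 0;
}

/* Return a copy of the descriptor with its P-bit set. */
uint64_t descriptor_mark_present(uint64_t desc)
{
    return desc | DESC_PRESENT;
}

/* Return a copy of the descriptor with its P-bit cleared. */
uint64_t descriptor_mark_not_present(uint64_t desc)
{
    return desc & ~DESC_PRESENT;
}
```

For instance, a not-present 16-bit code-segment with base 0x20000 and limit 0xFFFF would be 0x00001A020000FFFF, and marking it Present yields 0x00009A020000FFFF.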

In-class exercise

• To get some practical ‘hands on’ experience with implementing the demand-loading concept we suggest the following exercise:

Modify our ‘notready.s’ demo so that it uses a 32-bit Interrupt-Gate for its Segment-Not-Present entry in the Interrupt Descriptor Table (this will affect the layout of the fault-handler’s stack)

• You may need to abandon use of the ‘enter’ and ‘leave’ instructions unless you also use a 32-bit data-segment descriptor for your stack-segment
