Uploaded by amos-guzman
Deferred segment-loading
An exercise on implementing the concept of ‘load-on-demand’
The ‘do-it-later’ philosophy
• Modern operating systems often follow a policy of deferring work whenever possible
• The advantage of adopting this practice is most evident in those cases where it turns out that the work was not needed after all
• Example: Many programs contain lots of code and data for diagnosing errors, but none of it is needed if no errors actually occur
Avoiding wasted effort
• Thus it will be more efficient if an OS does not always take time to load those portions of a program (such as its error-diagnostics and error-recovery routines) which may be unnecessary in the majority of situations
• But of course the OS needs to be ready to take a ‘timeout’ for loading those routines when and if the need becomes apparent
Another example
• In a multitasking environment, many tasks are taking turns at executing instructions
• The CPU typically performs task-switching several times every second – and must do a ‘save’ of the outgoing task’s context, and a ‘load’ of the incoming task’s context, any time it switches from one task to the next
• We ask: can any of this work be deferred?
The NPX registers
• Only a few tasks typically make any use of the Pentium’s ‘floating-point’ registers, so it’s wasteful to do a ‘save-and-reload’ for these registers with every task-switch
• The TS-bit (bit #3 in Control Register 0) is designed to assist an OS in implementing a policy of ‘lazy’ context-switching for the set of registers used in floating-point work
Example: effect of TS=1
• Each time the CPU performs a hardware task-switch, it automatically sets the TS-bit to 1 (only code running at privilege-level 0, i.e. the OS, can execute ‘clts’ to reset TS=0)
• When any task tries to execute an NPX instruction (to do some arithmetic with values in the floating-point registers), an exception-7 (‘Device Not Available’) fault will occur if the TS-bit hasn’t been cleared since the last task-switch
The fault-7 exception-handler
• The work involved in saving the contents of the floating-point registers being used by a no-longer-active task, and reloading those registers with values that the active task expects to work on, can be deferred to the fault-handler for exception-7
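• The handler must clear the TS-bit (with ‘clts’) before doing that work, because the instructions that save and reload the floating-point registers would themselves fault while TS=1; on return, the CPU ‘retries’ the instruction that caused this ‘fault’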
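To see how much work the lazy policy saves, here is a small user-space simulation in C. It is a sketch, not kernel code. All the names (‘struct cpu’, ‘task_switch’, ‘fpu_add’, ‘fpu_read’, ‘fault7_handler’) are hypothetical, and a single int stands in for the whole set of floating-point registers. The modelled handler clears TS first. It then swaps register contents only when the NPX currently holds another task’s values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one int stands in for the NPX register file. */
struct task {
    int saved_fpu;              /* this task's saved NPX state */
};

struct cpu {
    bool ts;                    /* models the TS-bit (CR0 bit 3) */
    int fpu;                    /* the live NPX registers */
    struct task *current;       /* task now running */
    struct task *fpu_owner;     /* task whose values are in the NPX */
    int swaps;                  /* save-and-reload operations performed */
};

/* A task-switch only sets TS; the NPX registers are left untouched. */
void task_switch(struct cpu *c, struct task *next) {
    c->current = next;
    c->ts = true;
}

/* Models the exception-7 handler. TS is cleared first, because the real
   save/reload instructions would themselves fault while TS=1. */
void fault7_handler(struct cpu *c) {
    assert(c->current != NULL);
    c->ts = false;                              /* 'clts' */
    if (c->fpu_owner != c->current) {
        if (c->fpu_owner != NULL)
            c->fpu_owner->saved_fpu = c->fpu;   /* save outgoing state */
        c->fpu = c->current->saved_fpu;         /* reload incoming state */
        c->fpu_owner = c->current;
        c->swaps++;
    }
}

/* Models an NPX instruction that adds x to the live register. */
void fpu_add(struct cpu *c, int x) {
    if (c->ts)
        fault7_handler(c);      /* fault, then 'retry' the instruction */
    c->fpu += x;
}

/* Models an NPX instruction that reads the live register. */
int fpu_read(struct cpu *c) {
    if (c->ts)
        fault7_handler(c);
    return c->fpu;
}
```

A task that executes no NPX instruction between two task-switches causes no save or reload at all. Switching back to the task that already owns the NPX registers costs only the fault itself. Only a switch to a different NPX user costs a save-and-reload pair.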
The ‘fork()’ system-call
• In a UNIX/Linux operating system, any new task gets created by a call to the kernel’s ‘fork()’ service-function
• This function is supposed to ‘duplicate’ the entire program-environment of the calling task (i.e., its code, data, stack and heap, plus the kernel’s process-control data-structure)
• But much of this work is often wasted!
The ‘fork-and-exec’ scenario
• In practice, the most common reason for a program to ‘fork()’ a child-process is so the child-task can launch a separate program:
if ( fork() == 0 ) execl( "newprog", newargs, (char *) 0 );
• In these cases the ‘duplicated’ code, data, and heap are not relevant to the new task, and so they will simply get discarded!
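Here is a minimal sketch of the fork-and-exec pattern. It assumes a POSIX system where /bin/sh exists, and the helper name ‘fork_and_exec’ is hypothetical. Everything that fork() duplicated for the child is thrown away as soon as execl() succeeds. That is exactly the wasted effort described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs 'cmd' through /bin/sh in a child process,
   using the fork-and-exec pattern, and returns the child's exit status
   (or -1 on failure). */
int fork_and_exec(const char *cmd) {
    assert(cmd != NULL);
    pid_t pid = fork();          /* duplicates the calling process */
    if (pid < 0)
        return -1;               /* fork() failed */
    if (pid == 0) {
        /* Child: discard the duplicated image by loading a new program.
           The argument list must end with a (char *) null pointer. */
        execl("/bin/sh", "sh", "-c", cmd, (char *) NULL);
        _exit(127);              /* reached only if execl() fails */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)    /* parent waits for the child */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, fork_and_exec("exit 3") returns 3 in the parent.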
‘loading-on-demand’
• An OS can avoid all the wasted effort of duplicating a parent-task’s resources (its code, data, heap, etc.) by implementing “only upon demand” loading as a policy
• For an OS that uses the CPU’s memory-segmentation capabilities, an ‘on demand’ policy can be implemented by using the Pentium ‘Segment-Not-Present’ exception
How it works
• Segments remain ‘uninitialized’ until they are actually accessed by an application
• Segment-descriptors are initially marked as ‘Not Present’ (i.e., their P-bit is zero)
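• When an instruction tries to load a segment-register with a selector for such a descriptor (as a far call or jump does for CS, or a ‘mov’ into DS, ES, FS or GS does), the CPU responds by generating exception-11: “Segment-Not-Present” (a not-present stack-segment loaded into SS triggers exception-12 instead)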
An ‘error-code’ is pushed
• Besides pushing the memory-address of the faulting instruction onto the exception-handler’s stack, the CPU also pushes an ‘error-code’ to indicate which descriptor was not yet marked as being ‘Present’
• The handler can then ‘load’ that segment with the proper information, set its descriptor’s P-bit to 1, discard the error-code, and retry the instruction
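The steps above can be modelled in a few lines of C. This is a simulation, not the demo’s code. The table, ‘read_byte’ and ‘not_present_handler’ are hypothetical, and the array index plays the role of the error-code’s table-index. ‘Loading’ a segment here just fills it with a pattern. The point is that only the segment actually touched gets loaded, and only once.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NSEGS   4
#define SEGSIZE 16

/* Hypothetical model of a descriptor table: only the P-bit and each
   segment's contents matter here. */
struct segment {
    bool present;                   /* models the descriptor's P-bit */
    unsigned char data[SEGSIZE];
};

struct segment table[NSEGS];        /* all start 'Not Present' */
int faults;                         /* count of simulated exception-11s */

/* Models the Segment-Not-Present handler; 'index' plays the role of
   the error-code's table-index field. */
void not_present_handler(int index) {
    faults++;
    memset(table[index].data, 0x40 + index, SEGSIZE);   /* 'load' it */
    table[index].present = true;                       /* set P=1 */
}

/* Models an instruction that reads one byte from segment 'index'. */
unsigned char read_byte(int index, int offset) {
    assert(index >= 0 && index < NSEGS);
    assert(offset >= 0 && offset < SEGSIZE);
    while (!table[index].present)   /* fault, then retry the access */
        not_present_handler(index);
    return table[index].data[offset];
}
```

The second access to the same segment finds P=1 and proceeds without a fault. A segment that is never touched is never loaded.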
Error-Code Format
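 31            16 15                  3   2     1     0
+----------------+---------------------+-----+-----+-----+
|    reserved    |     table-index     | TI  | IDT | EXT |
+----------------+---------------------+-----+-----+-----+
Legend: EXT = An external event caused the exception (1=yes, 0=no); IDT = table-index refers to the Interrupt Descriptor Table (1=yes, 0=no); TI = The Table Indicator flag, used when IDT=0 (0=GDT, 1=LDT)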
This same error-code format is used with exceptions 0x0B, 0x0C, and 0x0D
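Decoding the error-code is simple bit-masking. In the C sketch below, the struct and function names are hypothetical. ‘decode_error_code’ splits the code into its fields. ‘selector_of’ rebuilds a selector (with RPL=0) from a GDT or LDT index. A handler could use that selector to find the descriptor whose P-bit it must set.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of the selector error-code (hypothetical struct name). */
struct selector_error {
    unsigned ext;      /* bit 0: external event caused the exception */
    unsigned idt;      /* bit 1: index refers to the IDT */
    unsigned ti;       /* bit 2: 0 = GDT, 1 = LDT (used when idt == 0) */
    unsigned index;    /* bits 15..3: table-index of the descriptor */
};

struct selector_error decode_error_code(uint32_t code) {
    struct selector_error e;
    e.ext   = code & 1;
    e.idt   = (code >> 1) & 1;
    e.ti    = (code >> 2) & 1;
    e.index = (code >> 3) & 0x1FFF;   /* 13-bit index; bits 31..16 ignored */
    return e;
}

/* Rebuilds a segment-selector (RPL = 0) for a GDT/LDT descriptor. */
uint16_t selector_of(struct selector_error e) {
    assert(!e.idt);                   /* an IDT index is not a selector */
    return (uint16_t)((e.index << 3) | (e.ti << 2));
}
```

For instance, an error-code of 0x0034 decodes to table-index 6 with TI=1 (an LDT descriptor). An error-code of 0x005A has IDT=1 and index 11, i.e. IDT entry 0x0B.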
Our ‘simulation’ demo
• We can illustrate the ‘just-in-time’ idea by writing a program that performs a ‘far’ call to an ‘uninitialized’ region of memory:
lcall $sel_CS, $draw_message
• The code-segment descriptor (referenced here by the selector-value ‘sel_CS’) will be initially marked ‘Not-Present’ (so this ‘lcall’ instruction will trigger an exception-11)
Our ‘fault-handler’
• Our Interrupt-Service-Routine for fault-11 will do two things:
  • Initialize the memory-region with code and data
  • Mark the code-segment’s descriptor as ‘Present’
• It will carefully preserve the CPU registers, so that it can ‘retry’ the faulting instruction
Where is the ‘error-code’?
          16-bits
       +------------+
  +6   |   FLAGS    |
  +4   |     CS     |
  +2   |     IP     |
  +0   | error-code |  <-- SS:SP
       +------------+
Layout of our fault-handler’s stack (because we used a 286 interrupt-gate)
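In 16-bit code the SP register cannot serve as a base-register in a memory-operand, so procedures address their stack-resident values through BP instead. The Pentium provides a special pair of instructions for setting up and discarding such BP-based access: ‘enter’ and ‘leave’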
Code using ‘enter’ and ‘leave’
isrNPF:                                 # our fault-handler for exception 0x0B
        enter   $0, $0                  # set up stack-frame access
        call    initialize_the_high_arena   # put code and data in place
        call    mark_segment_as_ready       # mark the descriptor 'Present'
        leave                           # discard the frame access
        add     $2, %sp                 # discard the error-code
        iret                            # 'retry' the faulting instruction
What does ‘enter’ do?
• The effect of the single instruction
enter $0, $0
is equivalent to this instruction-sequence:
push %bp
mov %sp, %bp
How the stack is changed
          16-bits
       +------------+
  +6   |   FLAGS    |
  +4   |     CS     |
  +2   |     IP     |
  +0   | error-code |  <-- SS:SP
       +------------+
Layout of our fault-handler’s stack BEFORE executing ‘enter’

          16-bits
       +------------+
  +8   |   FLAGS    |
  +6   |     CS     |
  +4   |     IP     |
  +2   | error-code |
  +0   |   old-BP   |  <-- SS:SP = SS:BP
       +------------+
Layout of our fault-handler’s stack AFTER executing ‘enter’
NOTE: Any memory-references that use indirect addressing via register BP will use the SS segment-register by default (not the DS segment-register)
for example: testw $0x0007, 2(%bp) tests the EXT, IDT and TI bits of the error-code at SS:BP+2
What does ‘leave’ do?
• The effect of the single instruction
leave
is equivalent to this instruction-sequence:
mov %bp, %sp
pop %bp
How the stack is changed
          16-bits
       +------------+
  +8   |   FLAGS    |
  +6   |     CS     |
  +4   |     IP     |
  +2   | error-code |
  +0   |   old-BP   |  <-- SS:BP
       |   other    |
       |   pushed   |
       |   words    |  <-- SS:SP
       +------------+
Layout of our fault-handler’s stack BEFORE executing ‘leave’

          16-bits
       +------------+
  +6   |   FLAGS    |
  +4   |     CS     |
  +2   |     IP     |
  +0   | error-code |  <-- SS:SP
       +------------+
Layout of our fault-handler’s stack AFTER executing ‘leave’
So the effect of ‘leave’ is to undo the effect of ‘enter’
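The stack diagrams can be checked with a small C model of a 16-bit stack-segment. This is a simulation under stated assumptions. ‘struct stack16’, ‘push16’, ‘pop16’, ‘load16’, ‘enter_0_0’ and ‘leave_frame’ are hypothetical names. The stack is modelled as a 64KB byte array holding little-endian words, and SP is decremented by 2 before each push, as on the x86.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 16-bit stack-segment: 64KB of bytes holding
   little-endian words, addressed by 16-bit offsets that wrap. */
struct stack16 {
    uint8_t mem[0x10000];
    uint16_t sp, bp;
};

uint16_t load16(const struct stack16 *s, uint16_t off) {
    return (uint16_t)(s->mem[off] | (s->mem[(uint16_t)(off + 1)] << 8));
}

void push16(struct stack16 *s, uint16_t v) {
    s->sp -= 2;                               /* stack grows downward */
    s->mem[s->sp] = (uint8_t)(v & 0xFF);
    s->mem[(uint16_t)(s->sp + 1)] = (uint8_t)(v >> 8);
}

uint16_t pop16(struct stack16 *s) {
    uint16_t v = load16(s, s->sp);
    s->sp += 2;
    return v;
}

/* 'enter $0, $0' is equivalent to: push %bp ; mov %sp, %bp */
void enter_0_0(struct stack16 *s) {
    push16(s, s->bp);
    s->bp = s->sp;
}

/* 'leave' is equivalent to: mov %bp, %sp ; pop %bp */
void leave_frame(struct stack16 *s) {
    s->sp = s->bp;
    s->bp = pop16(s);
}
```

Suppose FLAGS, CS, IP and an error-code have been pushed. After ‘enter_0_0’, the old BP sits at 0(BP) and the error-code at 2(BP). ‘leave_frame’ then restores SP and BP to their values before ‘enter’, even if other words were pushed in between.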
Our demo’s memory-layout
  0x00030000   ARENA #3 (not used by this demo)
  0x00020000   ARENA #2 (where our demo expects drawing code will reside)
  0x00010000   ARENA #1 (where the loader puts our program code and data)
  0x00007C00   BOOT_LOCN
  0x00000000
Copy contents of ARENA #1 to ARENA #2
Efficient copying
• We use the Pentium’s ‘rep movsw’ instruction to perform memory-to-memory copying operations
• The segment-selector for the segment we copy from (it must be ‘readable’) goes into register DS, and the segment-selector for the segment we copy to (it must be ‘writable’) goes into ES
• The number of words we will copy should match the size of our code-segment (64KB, which is 0x8000 words)
• The Direction-Flag should be cleared (DF=0)
Example assembly code
        cld                     # use 'forward' string-copying
        mov     $sel_ds, %si    # selector for arena at 0x10000
        mov     %si, %ds        # goes in segment-register DS
        xor     %si, %si        # start copying from offset zero
        mov     $sel_DS, %di    # selector for arena at 0x20000
        mov     %di, %es        # goes in segment-register ES
        xor     %di, %di        # start copying to offset zero
        mov     $0x8000, %cx    # number of words to be copied
        rep     movsw           # perform the arena-copying
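The semantics of this copy loop can be modelled in C. The sketch uses a hypothetical ‘struct strregs’ and ‘rep_movsw’ and follows the 16-bit rules. Each step moves one word, adjusts SI and DI by 2 in the direction chosen by DF, and decrements CX until it reaches zero. It does not model segment-limit or access-rights checks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register model for the string-copy instructions. */
struct strregs {
    uint16_t si, di, cx;
    int df;                 /* Direction Flag: 0 after 'cld' (forward) */
};

/* Emulates 'rep movsw' with 16-bit addressing: copies CX words from
   src[SI] to dst[DI]. Both arrays must be 64KB (0x10000 bytes), since
   16-bit offsets wrap around at the end of a segment. */
void rep_movsw(struct strregs *r, const uint8_t *src, uint8_t *dst) {
    int step = r->df ? -2 : 2;
    while (r->cx != 0) {
        dst[r->di] = src[r->si];
        dst[(uint16_t)(r->di + 1)] = src[(uint16_t)(r->si + 1)];
        r->si = (uint16_t)(r->si + step);
        r->di = (uint16_t)(r->di + step);
        r->cx--;
    }
}
```

With CX=0x8000, the loop moves 0x8000 words of 2 bytes each, i.e. exactly 64KB. SI and DI wrap back to offset zero when it finishes.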
Segment-Descriptor Format
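  bits 63..56   Base[31..24]
  bit  55       G     (granularity: limit counted in 4KB units when set)
  bit  54       D     (default size: 32-bit when set)
  bit  53       RSV   (reserved)
  bit  52       AVL   (available for use by software)
  bits 51..48   Limit[19..16]
  bit  47       P     (Present)
  bits 46..45   DPL   (Descriptor Privilege Level)
  bit  44       S     (1 = code/data segment, 0 = system descriptor)
  bit  43       X     (1 = executable, i.e. a code-segment)
  bit  42       C/D   (conforming code / expand-down data)
  bit  41       R/W   (readable code / writable data)
  bit  40       A     (accessed)
  bits 39..32   Base[23..16]
  bits 31..16   Base[15..0]
  bits 15..0    Limit[15..0]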
The segment-descriptor’s ‘Present’ bit is bit-number 47
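Here is a C sketch of how such a descriptor can be packed, and how its P-bit can be tested, set, and cleared. The helper names are hypothetical. The example values are illustrative choices matching ARENA #2, not values taken from the demo’s source: base 0x00020000, limit 0xFFFF, and access-byte 0x9A, which describes a present, ring-0, readable code-segment.

```c
#include <assert.h>
#include <stdint.h>

#define DESC_P_BIT 47   /* the 'Present' bit */

/* Packs a segment-descriptor. 'access' holds bits 47..40 (P, DPL, S,
   X, C/D, R/W, A); 'flags' holds bits 55..52 (G, D, RSV, AVL). */
uint64_t make_descriptor(uint32_t base, uint32_t limit,
                         uint8_t access, uint8_t flags) {
    assert(limit <= 0xFFFFF && flags <= 0xF);   /* 20-bit limit, 4 flags */
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFF);               /* bits 15..0  */
    d |= (uint64_t)(base & 0xFFFF) << 16;          /* bits 31..16 */
    d |= (uint64_t)((base >> 16) & 0xFF) << 32;    /* bits 39..32 */
    d |= (uint64_t)access << 40;                   /* bits 47..40 */
    d |= (uint64_t)((limit >> 16) & 0xF) << 48;    /* bits 51..48 */
    d |= (uint64_t)flags << 52;                    /* bits 55..52 */
    d |= (uint64_t)((base >> 24) & 0xFF) << 56;    /* bits 63..56 */
    return d;
}

int is_present(uint64_t d)            { return (int)((d >> DESC_P_BIT) & 1); }
uint64_t mark_present(uint64_t d)     { return d | (1ULL << DESC_P_BIT); }
uint64_t mark_not_present(uint64_t d) { return d & ~(1ULL << DESC_P_BIT); }
```

Bit 47 is bit 7 of the descriptor’s byte at offset 5. An assembly-language handler could therefore set it by OR-ing 0x80 into that byte.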
In-class exercise
• To get some practical ‘hands on’ experience with implementing the demand-loading concept we suggest the following exercise:
Modify our ‘notready.s’ demo so that it uses a 32-bit Interrupt-Gate for its Segment-Not-Present entry in the Interrupt Descriptor Table (this will affect the layout of the fault-handler’s stack)
• You may need to abandon use of the ‘enter’ and ‘leave’ instructions unless you also use a 32-bit data-segment descriptor for your stack-segment
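For the exercise, the two frame layouts can be written down as C structs and compared. The struct names are hypothetical. The sketch assumes no privilege-level change occurs, so the CPU does not push SS and ESP. Through a 32-bit gate, every item occupies 32 bits, including the error-code and CS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stack at SS:ESP on entry to a handler reached through a 32-bit
   interrupt-gate, assuming no privilege-level change (so the CPU does
   not push SS and ESP). Every item occupies 32 bits. */
struct fault_frame32 {
    uint32_t error_code;   /* +0  */
    uint32_t eip;          /* +4  */
    uint32_t cs;           /* +8  (only the low 16 bits are meaningful) */
    uint32_t eflags;       /* +12 */
};

/* For comparison: the 286 (16-bit) interrupt-gate used by the demo. */
struct fault_frame16 {
    uint16_t error_code;   /* +0 */
    uint16_t ip;           /* +2 */
    uint16_t cs;           /* +4 */
    uint16_t flags;        /* +6 */
};

_Static_assert(sizeof(struct fault_frame32) == 16, "four 32-bit items");
_Static_assert(sizeof(struct fault_frame16) == 8, "four 16-bit items");
```

With a 32-bit gate, the error-code is therefore 4 bytes long rather than 2. A handler would discard it with something like ‘add $4, %esp’ and return with a 32-bit ‘iretl’.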