Upload
damon-conley
View
219
Download
0
Embed Size (px)
Citation preview
L6: Malloc LabWriting a Dynamic Storage Allocator
October 30, 2006
L6: Malloc LabWriting a Dynamic Storage Allocator
October 30, 2006TopicsTopics
Memory Allocator (Heap) L6: Malloc Lab
RemindersReminders L6: Malloc Lab Due Nov 10, 2006
Section A (Donnie H Kim) recitation8.ppt (some slides from lecture notes)
15-213“The course that gives CMU its Zip!”
– 2 – 15-213, F’06
L6: Malloc Lab L6: Malloc Lab
Things that matter in this labThings that matter in this lab:: Performance goal
Maximizing throughputMaximizing memory utilization
Implementation Issues (Design Space)Free Block OrganizationPlacement PolicySplittingCoalescing
And some advice
– 3 – 15-213, F’06
Some sort of useful backgroundsSome sort of useful backgrounds
– 4 – 15-213, F’06
So what is memory allocation?So what is memory allocation?
kernel virtual memory
Memory mapped region forshared libraries
run-time heap (via malloc)
program text (.text)
initialized data (.data)
uninitialized data (.bss)
stack
0
%esp
memory invisible to user code
the “brk” ptr
Allocators requestadditional heap memoryfrom the operating system using the sbrk function.
– 5 – 15-213, F’06
Malloc PackageMalloc Package
#include <stdlib.h>#include <stdlib.h>
void *malloc(size_t size)void *malloc(size_t size) If successful:
Returns a pointer to a memory block of at least size bytes, (typically) aligned to 8-byte boundary.
If size == 0, returns NULL If unsuccessful: returns NULL (0) and sets errno.
void free(void *p)void free(void *p) Returns the block pointed at by p to pool of available memory p must come from a previous call to malloc or realloc.
void *realloc(void *p, size_t size)void *realloc(void *p, size_t size) Changes size of block p and returns pointer to new block. Contents of new block unchanged up to min of old and new size.
– 6 – 15-213, F’06
Allocation ExamplesAllocation Examplesp1 = malloc(4)
p2 = malloc(5)
p3 = malloc(6)
free(p2)
p4 = malloc(2)
– 7 – 15-213, F’06
Performance goalsPerformance goals
Maximizing throughput (Temporal)Maximizing throughput (Temporal) Defined as the number of requests that it completes per unit
time
Maximizing Memory Utilization (Spatial)Maximizing Memory Utilization (Spatial) Defined as the ratio of the requested memory size and the
actual memory size used
There is a tension between maximizing throughput and utilization! Find an appropriate balance between two goals!
Keep this in mind, we will come back to these issues Keep this in mind, we will come back to these issues
– 8 – 15-213, F’06
Implementation IssuesImplementation Issues Free Block OrganizationFree Block Organization
How do we keep track of the free blocks? How do we know how much memory to free just given a pointer?
Placement PolicyPlacement Policy How do we choose an appropriate free block?
SplittingSplitting What do we do with the extra space when allocating a structure that
is smaller than the free block it is placed in?
CoalescingCoalescing How do we reinsert freed block?
p1 = malloc(1)
p0
free(p0)
– 9 – 15-213, F’06
Implementation Issues 1: Free Block Organization
Implementation Issues 1: Free Block Organization
Identifying which block is free or allocatedIdentifying which block is free or allocated Available design choices of how to manage free blocks
Implicit List
Explicit List
Segregated List
Header, Footer organization storing information about the block (size, allocated, freed)
– 10 – 15-213, F’06
Keeping Track of Free BlocksKeeping Track of Free Blocks
Method 1Method 1: : Implicit listImplicit list using lengths -- links all blocks using lengths -- links all blocks
Method 2Method 2: : Explicit listExplicit list among the free blocks using among the free blocks using pointers within the free blockspointers within the free blocks
Method 3Method 3: : Segregated free listSegregated free list Different free lists for different size classes
Method 4Method 4: Blocks sorted by size: Blocks sorted by size Can use a balanced tree (e.g. Red-Black tree) with pointers
within each free block, and the length used as a key
5 4 26
5 4 26
– 11 – 15-213, F’06
Free Block Organization Free Block Organization
Free Block with headerFree Block with header
size
1 word
Format ofallocated andfree blocks
payload
a = 1: allocated block a = 0: free block
size: block size
payload: application data(allocated blocks only)
a
optionalpadding
– 12 – 15-213, F’06
Free Block Organization Free Block Organization
Free Block with Header and FooterFree Block with Header and Footer
size
Format ofallocated andfree blocks
payload andpadding
a = 1: allocated block a = 0: free block
size: total block size
payload: application data(allocated blocks only)
a
size aBoundary tag (footer)
Header
– 13 – 15-213, F’06
Implementation Issues 2: Placement Policy
Implementation Issues 2: Placement Policy
““Placement Policy” choicesPlacement Policy” choices First Fit
Search free list from the beginning and chose the first free block
Next FitStarts search where the previous search has left off
Best FitExamine every free block to find the best free block
– 14 – 15-213, F’06
Implementation Issues 3: Splitting
Implementation Issues 3: Splitting
““Splitting” Design choicesSplitting” Design choices Using the entire free block
Simple, fast Introduces internal fragmentation (good placement policy might
reduce this)
SplittingSplit free block into two parts, when second part can be used
for other requests (reduces internal fragmentation)
p1 = malloc(1)
– 15 – 15-213, F’06
Implementation Issues 4: Coalescing
Implementation Issues 4: Coalescing
False FragmentationsFalse Fragmentations Free block chopped into small, unusable free blocks
Coalesce adjacent free blocks to get bigger free block
Coalescing - Policy decision of when to perform coalescingCoalescing - Policy decision of when to perform coalescing Immediate coalescing
Merging any adjacent blocks each time a block is freed
Deferred coalescing Merging free blocks some time later
» Ex) when allocation request fails.
Trying “Bidirectional Immediate Coalescing” proposed by Donald Knuth would be good enough for this lab
– 16 – 15-213, F’06
Performance goalsPerformance goals
Maximizing throughput (Temporal)Maximizing throughput (Temporal) Defined as the number of requests that it completes per unit
time
Maximizing Memory Utilization (Spatial)Maximizing Memory Utilization (Spatial) Defined as the ratio of the requested memory size and the
actual memory size used
There is a tension between maximizing throughput and utilization! Find an appropriate balance between two goals!
– 17 – 15-213, F’06
Performance goal (1) - ThroughputPerformance goal (1) - Throughput
Throughput is mostly determined by time consumed to search Throughput is mostly determined by time consumed to search free blockfree block
How you keep track of your free block affects search timeHow you keep track of your free block affects search time Naïve allocator
Never frees block, just extend the heap when you need a new block : throughput is extremely fast, but…?
Implicit Free List The allocator can indirectly traverse the entire set of free blocks by
traversing all of the blocks in the heap, definitely slow.
Explicit Free List The allocator can directly traverse entire set of free blocks by
traversing all of the free blocks in the heap
Segregated Free List The allocator can directly traverse a particular free list to find an
appropriate free block
– 18 – 15-213, F’06
Performance goal (2) – Memory Utilization
Performance goal (2) – Memory Utilization
Poor memory utilization caused by Poor memory utilization caused by fragmentationfragmentation Comes in two forms: internal and external fragmentation Internal Fragmentation
Based on previous requestsCauses
» Allocator impose minimal size of block (depending on allocator’s choice of block format)
» Satisfying alignment requirements
External FragmenatationBased on future requestsAggregate free memory is enough, but no single free block is
large enough to handle the request
– 19 – 15-213, F’06
Internal FragmentationInternal Fragmentation
Internal fragmentationInternal fragmentation For some block, internal fragmentation is the difference
between the block size and the payload size.
Caused by overhead of maintaining heap data structures, padding for alignment purposes, or explicit policy decisions (e.g., not to split the block).
Depends only on the pattern of previous requests, and thus is easy to measure.
payloadInternal fragmentation
block
Internal fragmentation
– 20 – 15-213, F’06
External FragmentationExternal Fragmentation
p1 = malloc(4)
p2 = malloc(5)
p3 = malloc(6)
free(p2)
p4 = malloc(6)oops!
Occurs when there is enough aggregate heap memory, but no singlefree block is large enough
External fragmentation depends on the pattern of future requests, andthus is difficult to measure.
– 21 – 15-213, F’06
The Malloc LabThe Malloc Lab
– 22 – 15-213, F’06
AssumptionsAssumptions
Assumptions made in Assumptions made in Malloc LabMalloc Lab Standard C library malloc always returns payload pointer
that is aligned to 8 bytes, so should yours 64-bit Architecture
pointers are 8 bytes long!size_t is now 8 bytes (unsigned long)
But the requested size will be less than 4 bytesYou may use 4 byte headers and footers and get away
Allocated block(4 words)
Free block(2 words)
Free word
Allocated word
– 23 – 15-213, F’06
Porting to 64-bit Machine Porting to 64-bit Machine
Porting the code in your CS:APP text book to 64-bitPorting the code in your CS:APP text book to 64-bit sizeof(long) == 4 // 32-bit sizeof(long) == 8 // 64-bit The only significant difference is in the definitions of the GET
and PUT macros. Changes (To keep our 32-bit header and footers)
#define GET(p) (*(size_t *)(p)) // 32 bits » #define GET(p) (*(unsigned int *)(p)) // 64 bits
#define PUT(p, val) (*(size_t *)(p) = (val)) // 32 bits» #define PUT(p, val) (*(unsigned int *)(p) = (val)) // 64 bits
if ((long)(bp = mem_sbrk(size)) < 0) » if ((int)(bp = mem_sbrk(size)) < 0)
– 24 – 15-213, F’06
Using MACROS – why?Using MACROS – why?#include <stdio.h>
#define GET8(p) (*(unsigned long *)(p))
#define PUT8(p, val) (*(unsigned long *)(p) = (unsigned long)(val))
void test(void *p, void *pval){
unsigned long *newpval;
/* Reading and writing pointers the hard way */
*(unsigned long *)p = (unsigned long) pval;
newpval = (unsigned long *)(*(unsigned long *)p);
printf("pval=%p newpval=%p\n", pval, newpval);
/* Reading and writing pointers the easy way */
PUT8(p, pval);
newpval = (unsigned long *) GET8(p);
printf("pval=%p newpval=%p\n", pval, newpval);
}
int main() {
char *pval = (char *)0x99;
char buf[128];
test(&buf[0], pval);
return 0;
}
– 25 – 15-213, F’06
Approach Advice Approach Advice 1. Start with the implicit list implementation in your text book, and
understand every details of it
2. When you finish your implicit list, start thinking about your heap checker The more time you spend on this, the more time you will save later
3. Go on and start implementing explicit list with several placement policies Modulate, and save each of your placement policy for comparison
4. When you finish your explicit list, you would like to add more checks in your heap checker, do this right away.
5. Now when you feel your explicit list is robust, move on to the segregated free list. We are looking for a good segregated free list implementation. You can
go further by trying other schemes such as balanced trees, but a solid segregated free list implementation is good enough for a full credit
6. You can also try some tweaks on the given trace files
– 26 – 15-213, F’06
Heap Checker (10 pts)Heap Checker (10 pts)
Basic Checks Guidelines (5/10 pts)Basic Checks Guidelines (5/10 pts) Check Heap (while working on implicit list)
1. Check epilogue and prologue blocks
2. Block’s address alignment (8 bytes)
3. Heap boundaries
4. Check your blocks’ header and footer Size (minimum size , alignment) prev/next allocate/free bit consistency (explicit list) header and footer matching each other
5. Check your coalescing
» All blocks are coalesced correctly (no two consecutive free blocks in the heap)
– 27 – 15-213, F’06
Heap Checker (10 pts)Heap Checker (10 pts)
Free List Checks Guidelines (5/10 pts)Free List Checks Guidelines (5/10 pts) Check Free List (while working on explicit free list)
1. All next/prev pointers are consistent (If A’s next pointer points to B, B’s prev pointer should point to A)
2. All free list pointers points between mem_heap_lo() and mem_heap_high()
3. Count free blocks by iterating every block, and traversing free list by pointers, see if they match
4. Recommended to add more as you wish
Check Segregated Free List (segregated free list) All blocks in each list bucket fall within bucket size range Be creative
– 28 – 15-213, F’06
Style (10 pts)Style (10 pts)
It will be some of the most difficult and sophisticated code you have
written so far in your career.
Thing we are looking for:
Explain your high level design at front of your code (2 pts)
Each function should be prepared by a header comment (2 pts)
Comment properly inside each functions (2 pts)
Decompose into functions and use as few global variables as possible (2 pts)
Use macros, inline functions, C preprocessors wisely (2 pts)
Please try to write a clean code that is readable and self-explaining! For you For your Teaching Staff And for world peace
– 29 – 15-213, F’06
Debugging Techniques Debugging Techniques
Guidelines for DebuggingGuidelines for Debugging Intensively testing your code even though it seems to work
is a good programming practice, try to learn the process from this lab
You can print out all the information and monitor itDo this when you just startedWhen the trace file is small
You can also print out error messages only when something is wrong
Printing and monitoring becomes painful when trace files are huge
Just print errors
– 30 – 15-213, F’06
Debugging Tips Debugging Tips
Guidelines for using mdriver’s optionsGuidelines for using mdriver’s options Use ./mdriver –c <file> option to run a particular trace file
just once, which only checks correctness ./mdriver runs your allocator multiple times to estimate the
throughput of your allocator by using k-best measurement scheme (if you are interested, refer to ch 9 and mdriver source code)
Use ./mdriver –v <level> option to set verbosity level It is sometimes useful to have layers of debugging depthCan also use #define, #ifdef, #if
Make sure to turn all checking routines off completely when measuring performance – it does affect performance
– 31 – 15-213, F’06
More Hints?More Hints?
Going further (beyond solid segregated list)Going further (beyond solid segregated list) Before trying this, make sure your allocator is doing what
you intended, using heap/free list checkers If you think you have implemented a solid segregated free
list, try focus on trace files that gives you less performance results
– 32 – 15-213, F’06
More Hints?More Hints?
Some possible tackle pointsSome possible tackle points In malloc(), you have to adjust the requested size to meet
alignment requirements or minimum block size requirements It turns out that how you adjust size affects the performance of
some trace filesAnd sometimes it is better to force your allocator to avoid
splitting the free block by using larger block than the request size
» It will obviously increase internal fragmentation, but can also increase throughput by avoiding repeated splitting and coalescing
How large will you extend your heap, when you have to extend your heap?
How do you classify each free list?
– 33 – 15-213, F’06
Questions ?Questions ?