Digital UNIX Internals II 4 - 1 Buffer Caches
Buffer Caches
Chapter Four
File System I/O Using a Cache
[Figure: user process buffers, the kernel buffer (the in-memory cache of on-disk data), and the two access paths between them: read/write and mmap.]
Process Reading One Byte
[Figure: a user process issues read(..., 1); the kernel satisfies the request from a buffer.]
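The two access paths (read/write and mmap) can be compared from user space with a minimal sketch. It uses only standard POSIX calls and is not Digital UNIX kernel code; the function names byte_via_read() and byte_via_mmap() are made up for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Fetch one byte with a read-style call: the kernel copies the byte
 * from its cache into the caller's one-byte buffer.
 * Returns 0 on success, -1 on error or end of file. */
int byte_via_read(const char *path, off_t off, unsigned char *out)
{
    assert(path != NULL && out != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = pread(fd, out, 1, off);
    close(fd);
    return n == 1 ? 0 : -1;
}

/* Fetch the same byte through mmap(): the page holding the byte is
 * mapped into the process and the byte is read through the mapping.
 * Returns 0 on success, -1 on error or if off is outside the file. */
int byte_via_mmap(const char *path, off_t off, unsigned char *out)
{
    assert(path != NULL && out != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0 || off < 0 || off >= st.st_size) {
        close(fd);
        return -1;
    }
    long pagesz = sysconf(_SC_PAGESIZE);
    off_t page_start = off - (off % pagesz);  /* mmap offsets must be page aligned */
    void *p = mmap(NULL, (size_t)pagesz, PROT_READ, MAP_SHARED, fd, page_start);
    close(fd);                                /* the mapping survives the close */
    if (p == MAP_FAILED)
        return -1;
    *out = ((const unsigned char *)p)[off - page_start];
    munmap(p, (size_t)pagesz);
    return 0;
}
```

Even though only one byte is requested, both paths work on whole cached blocks or pages underneath; the difference is whether the data is copied into a user buffer or mapped into the process.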
File System Caches and I/O
• Read-ahead
– When a file system notices a file being read sequentially, it can order the physical read of the next block(s) before the application actually requests them.
• Write-behind
– Data blocks do not have to be immediately written to disk. File systems can cluster together writes to contiguous disk blocks to improve performance.
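One way to picture the read-ahead decision is a small per-file counter of consecutive sequential reads. The sketch below is an illustration only: struct ra_state, ra_on_read() and both constants are hypothetical and are not taken from the Digital UNIX sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-file read-ahead state; not a Digital UNIX structure. */
struct ra_state {
    long next_expected;  /* block number that would continue the current run */
    int  run_length;     /* consecutive sequential reads seen so far */
};

#define RA_THRESHOLD  2  /* assumed: sequential reads needed before prefetching */
#define RA_MAX_BLOCKS 8  /* assumed: cap on the number of blocks to prefetch */

/* Record a read of block blk and return how many of the following
 * blocks should be prefetched (0 means no read-ahead). */
int ra_on_read(struct ra_state *ra, long blk)
{
    assert(ra != NULL);
    if (blk == ra->next_expected)
        ra->run_length++;
    else
        ra->run_length = 1;          /* a non-sequential read restarts the run */
    ra->next_expected = blk + 1;

    if (ra->run_length < RA_THRESHOLD)
        return 0;
    /* Grow the prefetch window with the length of the run, up to the cap. */
    return ra->run_length > RA_MAX_BLOCKS ? RA_MAX_BLOCKS : ra->run_length;
}
```

Starting from a state of { -1, 0 }, reads of blocks 10, 11, 12 return 0, 2, 3, and a jump to block 50 returns 0 again because the run is reset.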
File System Caches in Digital UNIX
• (Traditional BSD UNIX) Buffer Cache
– From BSD
– Fixed pool of physical memory
• Unified Buffer Cache
– Similar to SunOS and SVR4
– Flexible pool of physical memory
– Supports memory mapping
Example: UFS uses both
[Figure: two vnodes, one with v_type = VDIR and one with v_type = VREG, each showing v_object, v_cleanblkhd and v_dirtyblkhd. The figure also shows buf structures (traditional buffer cache) and a vm_object of type vm_vp_object with fields vo_vp, vo_cleanpl, vo_cleanwpl, vo_dirtywpl and ob_memq, linked to vm_page structures (UBC).]
Traditional Buffer Cache
• Pool of Memory
– Allocated at boot time
– Shared with no other subsystem or allocator
• Buffer Structures
– Links into access hash chain, LRU and same vnode lists
– Device containing buffer
– Pointer to vnode
– Logical block in vnode
– Pointer to routine called when I/O is done
• Linked lists of Buffers
– Hash chain bucket, LOCKED, LRU, AGE and EMPTY lists
struct buf
Fields:
b_flags
b_forw, b_back
av_forw, av_back
b_blockf, b_blockb
b_bufsize
b_bcount
b_dev
b_error
b_un
b_lblkno, b_blkno
b_resid
b_proc
b_hash_chain
b_iodone()
b_pagelist
b_vp, b_rvp
b_rcred, b_wcred
b_dirtyoff, b_dirtyend
b_iocomplete
b_lock
driver fields
[Figure: pointers out of struct buf lead to other buf structures on the hash list, the queue and the vnode buffer list; to the Buffer itself; to the head of the hash list; and to proc, vm_page, vnode and ucred (credentials) structures.]
Buffer Cache Lists
[Figure: bufhash, an array of bufhd chain heads, each heading a chain of buf structures; the four free lists, each also a chain of buf structures; and the buffer memory pages the buf structures point to.]
bfreelist[0] LOCKED
bfreelist[1] LRU
bfreelist[2] AGE
bfreelist[3] EMPTY
To Find a Buffer
1. Calculate hash index using disk block number (b_blkno) and vnode (b_vp) (see BUFHASH macro in /sys/include/sys/buf.h).
2. Index into the hash list.
3. Follow hash pointer to buf structure in queue.
4. Identify the correct buf structure using vnode and block numbers.
5. If no match, follow hash pointer (b_forw) to next buf structure in queue.
6. If you get to the end of the list (wraps back to beginning) without finding the buf structure, it does not exist; allocate a new one from the free list.
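The six steps above can be sketched as a lookup over circular hash chains. This is a simplified illustration, not the kernel code: struct buf is reduced to the fields the search needs, the chain heads are plain struct buf sentinels instead of struct bufhd, and buf_hash() is a stand-in for the real BUFHASH macro, whose definition is not reproduced here. NBUFHASH is an assumed table size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBUFHASH 64               /* assumed table size (power of two) */

struct vnode;                     /* opaque in this sketch */

/* Reduced stand-in for struct buf: only what the lookup needs. */
struct buf {
    struct buf   *b_forw;         /* next buf on the hash chain */
    struct buf   *b_back;         /* previous buf on the hash chain */
    struct vnode *b_vp;           /* vnode the block belongs to */
    long          b_blkno;        /* disk block number */
};

/* One sentinel per chain; an empty chain points back at its sentinel. */
struct buf bufhash[NBUFHASH];

void bufhash_init(void)
{
    for (int i = 0; i < NBUFHASH; i++)
        bufhash[i].b_forw = bufhash[i].b_back = &bufhash[i];
}

/* Step 1: hash index from vnode and block number (stand-in for BUFHASH). */
unsigned buf_hash(const struct vnode *vp, long blkno)
{
    uintptr_t h = (uintptr_t)vp / sizeof(void *) + (uintptr_t)blkno;
    return (unsigned)(h & (NBUFHASH - 1));
}

/* Link a buf at the front of its hash chain. */
void bufhash_insert(struct buf *bp)
{
    assert(bp != NULL);
    struct buf *head = &bufhash[buf_hash(bp->b_vp, bp->b_blkno)];
    bp->b_forw = head->b_forw;
    bp->b_back = head;
    head->b_forw->b_back = bp;
    head->b_forw = bp;
}

/* Steps 2 to 6: walk the chain until it wraps back to its head. */
struct buf *buf_lookup(const struct vnode *vp, long blkno)
{
    struct buf *head = &bufhash[buf_hash(vp, blkno)];
    for (struct buf *bp = head->b_forw; bp != head; bp = bp->b_forw)
        if (bp->b_vp == vp && bp->b_blkno == blkno)
            return bp;            /* step 4: vnode and block number match */
    return NULL;                  /* step 6: caller allocates from the free list */
}
```

A NULL return corresponds to step 6: the block is not cached, so the caller would take a buffer from one of the free lists.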
Getting a Buffer
[Figure: call flow among bread(), getblk(), getnewbuf(), allocbuf() and VOP_STRATEGY().]
UBC - Unified Buffer Cache (1)
• Motivation
– File Systems and Virtual Memory (Process Management) compete for physical memory.
– UBC unifies previously separate pools of physical memory.
– Available memory can be used by File Systems (UBC) or VM on a first-come, first-served basis.
– VM can memory map a file using the same memory object as UBC.
• Utilizes memory from the available pool
– vm_page_queue_free
– vm_page_array
Unified Buffer Cache (2)
• Uses memory objects of type OT_UBC
– includes a pointer to a vnode
– associates cached pages with a specific file
– accessed by
  • a file system looking for cached data
  • memory management on pagefault for an mmap'd file
• Utilizes lists:
– vm_page_buckets to find vm_pages belonging to an object
– ubc_lru to time order when pages were cached
UBC Memory Object (OT_UBC)
struct vm_ubc_object
Fields:
ob_ref_count
ob_res_count
ob_size
ob_resident_pages
ob_flags
ob_memq
<lock>
ob_ops = u_anon_oop
ob_type
vu_cleanpl
vu_cleanwpl
vu_dirtywpl
vu_ops
vu_vfp
vu_wirecnt
vu_object
vu_nsequential
vu_loffset
vu_stamp
vu_seglock
vu_seglist
vu_pshared
vu_freelists
[Figure: the object points to vm_object_ops, vm_page and vfs_ubcops structures.]
UBC LRU Page Queue
• Least recently used list of UBC pages
– One per memory affinity domain
  • vm_mads[N].md_ubc.ubc_lru
• Each is a struct vm_page
– vm_page -> vm_ubc_object -> vnode
• For each vnode's VM object,
– clean page list
– clean wired page list
– dirty page list
– dirty wired page list
UBC Routines (1)
Routine                  Function
ubc_object_allocate()    Allocates a vm_ubc_object if the vnode is a regular type and one has not already been allocated.
ubc_object_free()        Frees the vm_ubc_object when the vnode is about to be reused.
ubc_page_lookup()        Looks up the page at the specified offset and specified vm_vp_object.
ubc_incore()             Looks for resident pages in the specified range.
ubc_page_alloc()         Allocates a page or returns a found page in the page hash list.
ubc_page_release()       Releases a page to the UBC LRU list or system memory if possible.
UBC Routines (2)
Routine                  Function
ubc_lookup()             Performs a hash search lookup on the page at the specified offset. If found, removes the page from the ubc_lru list and holds it.
ubc_page_dirty()         Transitions a page from the vnode's clean page list to its dirty page list.
ubc_msync()              Calls for mmap to free all clean pages and writes all dirty pages.
ubc_invalidate()         Invalidates some (or all) resident pages for a vnode.
ubc_flush_dirty()        Starts I/O on all dirty pages for a vnode. Does not wait for I/O completion if flag B_ASYNC is used.
UBC Routines (3)
Routine                  Function
ubc_dirty_kluster()      Creates a list of sorted pages for a vnode. Assumes pages are scheduled for writing.
ubc_bufalloc()           Allocates a buf structure.
ubc_sync_iodone()        Waits for synchronous I/O transfer to complete, then frees buf and pages.
ubc_async_iodone_lwc()   Called as LWC when asynchronous I/O transfer completes.
File System and VM Routines
[Figure: layered call paths from system calls and from mmap page faults down to I/O.]
System Call: read(), write()
VFS: VOP_READ, VOP_WRITE
File System: ufs_read(), ufs_write(), uiomove()
UBC: Resident Page Management (ufs_getpage() returns a VM page)
mmap: Page Fault Handler
I/O
Finding a UBC page from a file system
VOP_READ(vnode, ...)
ufs_read(vnode, ...)
ufs_getpage(vnode, ...)
ufs_getapage(vnode,...)
ubc_lookup(vnode, ...)
vm_page_lookup(mem_obj, ..)
Limiting UBC
• ubc_dirty_thread
– Calls ubc_memory_flushdirty
  • Launders excessive dirty pages via calls to FSOP_PUTPAGE()
• vm_pageout thread (pageout daemon)
– Runs vm_pageout_loop()
– When the number of free pages is low and UBC has borrowed too many pages,
  • UBC pages are reclaimed off ubc_lru
• If there are no free pages, vm_page_alloc() may also take pages from ubc_lru.
ubc_memory_purge() Flow
1. Start.
2. Get a ubc_lru page.
3. Referenced bit on? If yes, turn it off, move the page to the tail of ubc_lru, and go to step 2.
4. Dirty? If yes, move the page from the vm_vp_object dirty list to the clean list, write the page out (VOP_PUTPAGE()) asynchronously, and go to step 2.
5. Free the page.
6. Freed enough? If no, go to step 2. If yes, stop.
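The purge flow is a second-chance scan of the LRU queue, which can be sketched in a few lines. Everything below is a simplified model: struct page, struct lru and ubc_purge_sketch() are hypothetical names, the referenced bit is a plain flag instead of a pmap query, and the asynchronous write is only recorded in a flag. What the real code does with a page while its write is in flight is not shown on the slide; the sketch simply leaves it off the queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced page descriptor; the real vm_page has many more fields. */
struct page {
    struct page *next;
    bool referenced;      /* stands in for the hardware referenced bit */
    bool dirty;           /* page is on the object's dirty list */
    bool freed;           /* set when the sketch "frees" the page */
    bool write_started;   /* set when an asynchronous write was issued */
};

/* Simple queue: head is the least recently used page. */
struct lru {
    struct page *head, *tail;
};

struct page *lru_pop(struct lru *q)
{
    struct page *pg = q->head;
    if (pg != NULL) {
        q->head = pg->next;
        if (q->head == NULL)
            q->tail = NULL;
        pg->next = NULL;
    }
    return pg;
}

void lru_push_tail(struct lru *q, struct page *pg)
{
    pg->next = NULL;
    if (q->tail != NULL)
        q->tail->next = pg;
    else
        q->head = pg;
    q->tail = pg;
}

/* Scan from the LRU end until `want` pages are freed or the queue is empty.
 * Returns the number of pages freed. */
int ubc_purge_sketch(struct lru *q, int want)
{
    assert(q != NULL && want >= 0);
    int freed = 0;
    while (freed < want) {
        struct page *pg = lru_pop(q);
        if (pg == NULL)
            break;                       /* nothing left to scan */
        if (pg->referenced) {
            pg->referenced = false;      /* second chance: clear the bit */
            lru_push_tail(q, pg);
        } else if (pg->dirty) {
            pg->dirty = false;           /* dirty list -> clean list */
            pg->write_started = true;    /* stands in for an async VOP_PUTPAGE() */
        } else {
            pg->freed = true;            /* clean and unreferenced: free it */
            freed++;
        }
    }
    return freed;
}
```

The loop always terminates: a referenced page is requeued once with its bit cleared, so on its next visit it is either written or freed.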
Limiting the Amount of Dirty Data in UBC
• UBC limits the percentage of its cached data that is modified
– improves performance by spreading out the I/O load
– minimizes loss of data if the system crashes
• Managed by a separate kernel daemon thread
ubc_dirty_thread_loop() Flow
1. Start.
2. Sleep on timer.
3. Too many dirty pages? If no, go to step 2.
4. Get a ubc_lru page.
5. Dirty? If no, go to step 4.
6. Remove the page from ubc_lru, move it from the vm_vp_object dirty list to the clean list, and write it out (VOP_PUTPAGE()) asynchronously.
7. Too many dirty pages? If yes, go to step 4. If no, go to step 2.
UBC Parameters and Thresholds (1)
Field              Description
ubc_pages          Count of UBC pages.
ubc_minpages       Smallest number of pages UBC will shrink to. ubc_minpages = (vm_managed_pages * ubc_minpercent)/100, where ubc_minpercent is tunable (default = 10).
ubc_maxpages       Upper limit of size of UBC. ubc_maxpages = (vm_managed_pages * ubc_maxpercent)/100, where ubc_maxpercent is tunable (default = 100).
ubc_lru_count      Number of pages on the UBC LRU queue.
ubc_dirty_limit    Determines if UBC should flush and free dirty pages. ubc_dirty_limit = MAX(ubc_min_dirtypages, ((vm_tune_value(ubcdirtypercent) * ubc_pages)/100)), where ubcdirtypercent is tunable (default = 10).
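The three formulas can be checked with a short worked calculation. This is a sketch only: struct ubc_limits and ubc_compute_limits() are hypothetical names, the tunables are passed in as plain arguments instead of going through vm_tune_value(), and the value used for ubc_min_dirtypages in the example is an assumption because the slide does not give one.

```c
#include <assert.h>

#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical container for the derived thresholds. */
struct ubc_limits {
    long minpages;      /* ubc_minpages */
    long maxpages;      /* ubc_maxpages */
    long dirty_limit;   /* ubc_dirty_limit */
};

/* Apply the formulas from the table using integer arithmetic. */
struct ubc_limits ubc_compute_limits(long vm_managed_pages, long ubc_pages,
                                     int ubc_minpercent, int ubc_maxpercent,
                                     int ubcdirtypercent, long ubc_min_dirtypages)
{
    assert(vm_managed_pages >= 0 && ubc_pages >= 0);
    struct ubc_limits l;
    l.minpages    = (vm_managed_pages * ubc_minpercent) / 100;
    l.maxpages    = (vm_managed_pages * ubc_maxpercent) / 100;
    l.dirty_limit = MAX(ubc_min_dirtypages, (ubcdirtypercent * ubc_pages) / 100);
    return l;
}
```

With 32768 managed pages, 10000 UBC pages, the defaults from the table (10, 100, 10) and an assumed ubc_min_dirtypages of 256, this gives ubc_minpages = 3276, ubc_maxpages = 32768 and ubc_dirty_limit = 1000. The integer division truncates, which is why ubc_minpages is 3276 and not 3277.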
UBC Parameters and Thresholds (2)
Field                        Description
ubc_dirty_pages              Number of UBC pages currently dirty; tracked by the system.
ubc_borrowlimit              Number of pages UBC can have. If ubc_pages > ubc_borrowlimit, then UBC is asked to free pages. ubc_borrowlimit = (ubc_borrowpercent * vm_managed_pages)/100, where ubc_borrowpercent is 10 by default.
vm_perf.vpf_ubchit           Rate of UBC pages transitioning to the tail of the UBC LRU list because pmap_is_referenced returned TRUE.
vm_perf.vpf_ubcalloc         Rate of UBC page allocation.
vm_perf.vpf_ubcpagepushes    Rate of pages being evicted from the UBC because of memory reclamation activity.
vm_free_count                Current count of free pages.
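The borrow limit ties these counters to the reclaim decision described earlier for the pageout daemon: UBC pages are reclaimed when free memory is low and UBC holds more than ubc_borrowlimit pages. A minimal sketch of that test follows; the function names and the free_low_water parameter are hypothetical, since the slides do not say what threshold defines "low".

```c
#include <assert.h>
#include <stdbool.h>

/* ubc_borrowlimit = (ubc_borrowpercent * vm_managed_pages)/100 */
long ubc_borrowlimit_sketch(long vm_managed_pages, int ubc_borrowpercent)
{
    assert(ubc_borrowpercent >= 0 && ubc_borrowpercent <= 100);
    return (ubc_borrowpercent * vm_managed_pages) / 100;
}

/* True when free memory is low AND UBC is over its borrow limit.
 * free_low_water is an assumed threshold for "low". */
bool ubc_should_give_back(long ubc_pages, long borrowlimit,
                          long vm_free_count, long free_low_water)
{
    return vm_free_count < free_low_water && ubc_pages > borrowlimit;
}
```

With 32768 managed pages and the default of 10 percent, the limit is 3276 pages. A UBC holding 5000 pages is asked to give pages back only while the free count is below the low-water mark.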
Source Reference (1 of 4)
Buf Cache
• kernel/sys/buf.h
– definition of struct buf
• kernel/vfs/vfs_bio.c
– bfreelist[], bufhash and buf routines (bread() etc.)
Source Reference (2 of 4)
UBC
• kernel/vm/vm_page.h
– definitions of vm_page, vm_page_array
• kernel/vm/vm_resident.c
– definition of vm_page_bucket hashing array
• kernel/vfs/vfs_ubc.c
– definition of ubc lru list
• kernel/vm/vm_ubc.h
– definition of vm_ubc_object
• kernel/vfs/vfs_ubc.c
– implementation of ubc interface routines
Source Reference (3 of 4)
Reading Data From a UBC Cached UFS File
• kernel/ufs/ufs_vnops.c
– ufs_read(), ufs_getpage(), ufs_getapage()
• kernel/vfs/vfs_ubc.c
– ubc_lookup()
• kernel/vm/vm_resident.c
– vm_page_lookup()
Source Reference (4 of 4)
Pagefaulting on a UBC MMAPed Page
• kernel/arch/alpha/locore.s
– XentMM
• kernel/arch/alpha/trap.c
– trap()
• kernel/vm/vm_fault.c
– vm_fault()
• kernel/vm/vm_umap.c
– u_map_fault()
• kernel/vm/u_mape_vp.c
– u_vp_fault()