Windows memory manager internals

Windows Memory/Cache Manager Internals

Sisimon Soman

Locality Theory

• If page/cluster n is accessed, there is a high probability that blocks near n will be accessed next.

• All memory-based computing systems rely on this principle.

• Windows has registry keys that configure how many blocks/pages to prefetch.

• Application-specific memory managers, such as those in databases and multimedia workloads, do application-aware prefetching.

Virtual Memory Manager (VMM)

• Apps feel that memory is infinite; the VMM does the magic.

• Multiple apps run concurrently without interfering with each other’s data.

• Each app feels that the entire resource is its own.

• Protects OS memory from apps.

• Advanced apps may need to share memory, so the VMM provides an easy way to share it.

VMM Continued..

• The VMM reserves a certain amount of memory for the kernel.

• On a 32-bit box: 2 GB for the kernel and 2 GB for user apps.

• A specific area in kernel memory, called Hyper Space, is reserved to store process-specific data such as PDEs and PTEs.

Segmentation and Paging

• The x86 processor supports both segmentation and paging.

• Paging can be enabled or disabled, but segmentation is always enabled.

• Windows uses paging.

• Since segmentation cannot be disabled, Windows sets up segments that cover the entire memory (also called ‘flat segments’).

Paging

• Divide the entire physical memory into equal-size pages (4K on x86 platforms). These are called ‘page frames’, and the list that tracks them is called the ‘page frame database’ (PF DB).

• On x86, a page frame is identified by a 20-bit physical page number (the remaining 12 bits address the offset within the page).

• The PF DB also contains flags such as read/write in progress, shared page, etc.

VMM Continued..

• The upper 2 GB of kernel space is common to all processes.

• What does that mean? Half of the page directory entries (PDEs) are common to all processes!

• Experiment: look at the PDEs of two processes and verify that half of them are the same.

Physical to Virtual address translation

• Address translation is needed in both directions: when the VMM writes a PF to the pagefile, it must update the proper PDE/PTE to state that the page is on disk.

• Done by:
– The Memory Management Unit (MMU) of the processor.
– The VMM, which helps the MMU.

• The VMM keeps the PDE/PTE info and passes it to the MMU during a process context switch.

• The MMU translates virtual addresses to physical addresses.

Translation Lookaside Buffer (TLB)

• Address translation is a costly operation.

• It happens frequently: whenever virtual memory is touched.

• The TLB keeps a list of the most frequent address translations.

• The list is tagged by process ID.

• The TLB is a generic OS concept; its implementation is architecture dependent.

• Before doing the address translation, the MMU searches the TLB for the PF.
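The lookup-before-translate idea can be sketched in portable C. This is only a toy model, not how any real MMU is built: the direct-mapped layout, the 16-entry size, and the names `tlb`, `tlb_lookup`, and `tlb_fill` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 16u   /* assumed size, for illustration only */
#define PAGE_SHIFT  12u   /* 4K pages */

typedef struct {
    bool     valid;
    uint32_t asid;  /* process (address-space) tag */
    uint32_t vpn;   /* virtual page number: va >> 12 */
    uint32_t pfn;   /* cached physical page frame number */
} tlb_entry;

typedef struct {
    tlb_entry slot[TLB_ENTRIES];
    unsigned  hits, misses;
} tlb;

/* Return true and fill *pfn on a hit; count a miss otherwise. */
bool tlb_lookup(tlb *t, uint32_t asid, uint32_t va, uint32_t *pfn)
{
    uint32_t vpn = va >> PAGE_SHIFT;
    tlb_entry *e = &t->slot[vpn % TLB_ENTRIES];
    if (e->valid && e->asid == asid && e->vpn == vpn) {
        *pfn = e->pfn;
        t->hits++;
        return true;
    }
    t->misses++;
    return false;
}

/* After a full page-table walk, remember the result (overwrites the slot). */
void tlb_fill(tlb *t, uint32_t asid, uint32_t va, uint32_t pfn)
{
    uint32_t vpn = va >> PAGE_SHIFT;
    tlb_entry *e = &t->slot[vpn % TLB_ENTRIES];
    e->valid = true;
    e->asid  = asid;
    e->vpn   = vpn;
    e->pfn   = pfn;
}
```

A caller would try `tlb_lookup` first and fall back to the page-table walk plus `tlb_fill` only on a miss. Because entries carry a process tag, a translation cached for one process is never returned to another.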

Address Translation

• In an x86 32-bit address, the 10 most significant bits select a PDE in the page directory, which points to a page table. Thus a process’s page directory has 1024 entries.

• The next 10 bits select a PTE in that page table, which gives the starting address of the PF. Thus each page table has 1024 entries.

• The remaining 12 bits address the location within the PF. Thus the page size is 4K.
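The 10/10/12 split can be written out as a few shifts and masks. The sketch below assumes classic 32-bit x86 paging with 4K pages; the names `va_parts`, `split_va`, and `make_pa` are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT   12u      /* low 12 bits: offset in the page */
#define TABLE_SHIFT  22u      /* top 10 bits start at bit 22 */
#define INDEX_MASK   0x3FFu   /* 10 bits -> 1024 entries */
#define OFFSET_MASK  0xFFFu   /* 12 bits -> 4096 bytes */

typedef struct {
    uint32_t pd_index;  /* which PDE in the page directory */
    uint32_t pt_index;  /* which PTE in the selected page table */
    uint32_t offset;    /* byte offset inside the 4K page */
} va_parts;

/* Break a 32-bit virtual address into its three fields. */
va_parts split_va(uint32_t va)
{
    va_parts p;
    p.pd_index = (va >> TABLE_SHIFT) & INDEX_MASK;
    p.pt_index = (va >> PAGE_SHIFT) & INDEX_MASK;
    p.offset   = va & OFFSET_MASK;
    return p;
}

/* Combine the 20-bit page frame number found in the PTE with the offset. */
uint32_t make_pa(uint32_t pfn, uint32_t offset)
{
    return (pfn << PAGE_SHIFT) | (offset & OFFSET_MASK);
}
```

For the address 0x80123456, `split_va` yields directory index 512 (the first entry of the kernel half), table index 0x123, and offset 0x456.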

What is a Zero Page

• Page frames are not specific to apps.

• If App1 writes sensitive data to PF1, and the VMM later pushes the page to the page file and attaches PF1 to App2, App2 can see that sensitive info.

• That would be a big security flaw, so the VMM keeps a zero page list.

• Pages cannot be cleaned at the time memory is freed; that would be a performance problem.

• The VMM has a dedicated thread that wakes up when the system is low on memory, picks page frames from the free PF list, cleans them, and pushes them to the zero page list.

• The VMM allocates memory from the zero page list.
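The free list / zero list split can be illustrated with a small simulation. This is a sketch under stated assumptions, not the Windows implementation: the pool size, the `frame_pool` type, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE  4096u
#define NUM_FRAMES 8u     /* assumed pool size */

typedef struct {
    unsigned char frame[NUM_FRAMES][PAGE_SIZE]; /* simulated physical memory */
    int    free_list[NUM_FRAMES];               /* freed, old contents intact */
    size_t free_count;
    int    zero_list[NUM_FRAMES];               /* cleaned, safe to hand out */
    size_t zero_count;
} frame_pool;

/* Freeing is cheap: the frame is queued without being cleaned. */
void pool_free(frame_pool *p, int pfn)
{
    if (p->free_count < NUM_FRAMES)
        p->free_list[p->free_count++] = pfn;
}

/* Work of the zeroing thread: clean one free frame and move it over.
   Returns 1 if a frame was processed, 0 if the free list was empty. */
int pool_zero_one(frame_pool *p)
{
    if (p->free_count == 0)
        return 0;
    int pfn = p->free_list[--p->free_count];
    memset(p->frame[pfn], 0, PAGE_SIZE);
    p->zero_list[p->zero_count++] = pfn;
    return 1;
}

/* Allocation only hands out frames from the zero list; -1 if none. */
int pool_alloc(frame_pool *p)
{
    if (p->zero_count == 0)
        return -1;
    return p->zero_list[--p->zero_count];
}
```

Because `pool_alloc` never looks at the free list, a frame that still holds another app's data cannot be handed out, while the cost of clearing it is moved off the free path.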

Arbitrary Thread Context

• The top layer of the driver stack gets the request (IRP) in the context of the requesting process.

• A middle- or lower-layer driver MAY get the request in any thread context (e.g., during I/O completion): whichever thread is currently running.

• The address in the IRP is valid only with the PDEs/PTEs of the original process context.

Arbitrary Thread Context continued..

• How to solve the issue?

• Note that half of the PDEs (the kernel area) are common to all processes.

• If the buffer is somehow mapped into kernel memory (the upper half of the PDEs), it is accessible from every process context.

Mapping buffer to Kernel space

• Allocate kernel pool from the calling process context and copy the user buffer into this kernel space.

• Memory Descriptor List (MDL): the most commonly used mechanism to make the data available in kernel space.

Memory Descriptor List (MDL)

//
// I/O system definitions.
//
// Define a Memory Descriptor List (MDL)
//
// An MDL describes pages in a virtual buffer in terms of physical pages.  The
// pages associated with the buffer are described in an array that is allocated
// just after the MDL header structure itself.
//
// One simply calculates the base of the array by adding one to the base
// MDL pointer:
//
//      Pages = (PPFN_NUMBER) (Mdl + 1);
//
// Notice that while in the context of the subject thread, the base virtual
// address of a buffer mapped by an MDL may be referenced using the following:
//
//      Mdl->StartVa | Mdl->ByteOffset
//

typedef struct _MDL {
    struct _MDL *Next;
    CSHORT Size;
    CSHORT MdlFlags;
    struct _EPROCESS *Process;
    PVOID MappedSystemVa;
    PVOID StartVa;
    ULONG ByteCount;
    ULONG ByteOffset;
} MDL, *PMDL;

MDL Continued..

#define MmGetSystemAddressForMdlSafe(MDL, PRIORITY)                    \
    (((MDL)->MdlFlags & (MDL_MAPPED_TO_SYSTEM_VA |                     \
                         MDL_SOURCE_IS_NONPAGED_POOL)) ?               \
                             ((MDL)->MappedSystemVa) :                 \
                             (MmMapLockedPagesSpecifyCache((MDL),      \
                                                           KernelMode, \
                                                           MmCached,   \
                                                           NULL,       \
                                                           FALSE,      \
                                                           (PRIORITY))))

#define MmGetMdlVirtualAddress(Mdl)                                     \
    ((PVOID) ((PCHAR) ((Mdl)->StartVa) + (Mdl)->ByteOffset))
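The layout described in the header comment (a fixed header followed immediately by the page array) can be tried out in user mode with a stand-in structure. Everything below is a hypothetical sketch: `sketch_mdl`, `span_pages`, `mdl_create`, `mdl_pages`, and `mdl_virtual_address` are invented names that only mirror the fields and macros shown above; they are not WDK APIs, and the page array is left zeroed rather than filled with real frame numbers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SIZE 4096u

typedef uintptr_t pfn_number;      /* stand-in for PFN_NUMBER */

typedef struct sketch_mdl {        /* stand-in for MDL, same field order */
    struct sketch_mdl *next;
    short    size;
    short    flags;
    void    *process;
    void    *mapped_system_va;
    void    *start_va;             /* page-aligned base of the buffer */
    uint32_t byte_count;
    uint32_t byte_offset;          /* offset of the buffer in its first page */
} sketch_mdl;

/* Pages needed to cover byte_count bytes starting byte_offset into a page. */
size_t span_pages(uint32_t byte_offset, uint32_t byte_count)
{
    return ((size_t)byte_offset + byte_count + PAGE_SIZE - 1) / PAGE_SIZE;
}

/* Allocate the header and the trailing page array as one block. */
sketch_mdl *mdl_create(void *va, uint32_t length)
{
    uintptr_t addr   = (uintptr_t)va;
    uint32_t  off    = (uint32_t)(addr & (PAGE_SIZE - 1));
    size_t    npages = span_pages(off, length);
    sketch_mdl *m = calloc(1, sizeof *m + npages * sizeof(pfn_number));
    if (m == NULL)
        return NULL;
    m->size        = (short)(sizeof *m + npages * sizeof(pfn_number));
    m->start_va    = (void *)(addr - off);
    m->byte_offset = off;
    m->byte_count  = length;
    return m;
}

/* Same idea as: Pages = (PPFN_NUMBER) (Mdl + 1); */
pfn_number *mdl_pages(sketch_mdl *m)
{
    return (pfn_number *)(m + 1);
}

/* Same idea as MmGetMdlVirtualAddress: StartVa plus ByteOffset. */
void *mdl_virtual_address(const sketch_mdl *m)
{
    return (char *)m->start_va + m->byte_offset;
}
```

A buffer that starts 0xFF0 bytes into a page and is 0x20 bytes long crosses a page boundary, so `span_pages(0xFF0, 0x20)` reports two pages and the array after the header gets two slots.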

Standby list

• To reclaim pages from a process, the VMM first moves them to the standby list.

• The VMM keeps them there for a predefined number of ticks.

• If the process refers to the same page, the VMM removes it from the standby list and assigns it back to the process.

• The VMM frees pages from the standby list after the timeout expires.

• Pages in the standby list are not free, but they do not belong to a process either.

• The VMM keeps min and max values for the free and standby page counts. If a count goes out of its limits, the appropriate events are signaled and the lists are adjusted.
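The standby life cycle described above can be modeled as a small state machine. This is a simplified sketch: the three-tick timeout and the names `page`, `trim`, `touch`, and `sweep` are assumptions for illustration, and the min/max thresholds are left out.

```c
#include <assert.h>
#include <stdbool.h>

#define STANDBY_TICKS 3   /* assumed timeout, in ticks */

typedef enum { PAGE_ACTIVE, PAGE_STANDBY, PAGE_FREE } page_state;

typedef struct {
    page_state state;
    int owner;       /* process that last owned the page */
    int expires_at;  /* tick at which a standby page is freed */
} page;

/* Reclaim: the page leaves the process but is not freed yet. */
void trim(page *p, int now)
{
    p->state      = PAGE_STANDBY;
    p->expires_at = now + STANDBY_TICKS;
}

/* The owner refers to the page again. A standby page goes straight
   back to the process; a freed page is gone, so return false. */
bool touch(page *p, int pid)
{
    if (p->owner != pid)
        return false;
    if (p->state == PAGE_STANDBY)
        p->state = PAGE_ACTIVE;
    return p->state == PAGE_ACTIVE;
}

/* Periodic pass: free standby pages whose timeout has expired.
   Returns the number of pages freed. */
int sweep(page *pages, int n, int now)
{
    int freed = 0;
    for (int i = 0; i < n; i++) {
        if (pages[i].state == PAGE_STANDBY && now >= pages[i].expires_at) {
            pages[i].state = PAGE_FREE;
            freed++;
        }
    }
    return freed;
}
```

A page trimmed at tick 10 can be reclaimed by its owner through tick 12; a `sweep` at tick 13 or later frees it.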

Miscellaneous VMM Terms

• ZwAllocateVirtualMemory – allocates process-specific memory in the lower 2 GB

• Paged Pool

• Non Paged Pool

• Copy on write (COW)

Part 2

Cache Manager

Cache Manager concepts

• If disk heads moved at the speed of supersonic jets, a Cache Manager would not be required.

• Disk access is the main bottleneck that reduces system performance. CPUs and memory keep getting faster, but disks are still in the stone age.

• This is a common concept in operating systems; Unix flavors call it the ‘buffer cache’.

What Cache Manager does

• Keeps a system-wide cache of frequently used secondary storage blocks.

• Facilitates read-ahead and write-back to improve overall system performance.

• With write-back, the Cache Manager combines multiple write requests and issues a single write request to improve performance. There is a risk associated with write-back: data that has not yet been written to disk is lost if the system crashes or loses power.
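Combining write requests can be sketched as sorting the pending writes by offset and merging the ones that touch or overlap. This is a minimal illustration of the idea, not the Cache Manager's actual algorithm; `write_req` and `coalesce` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    unsigned long offset;  /* byte offset on disk */
    unsigned long length;  /* bytes to write */
} write_req;

/* qsort comparator: ascending by offset. */
static int by_offset(const void *a, const void *b)
{
    const write_req *x = a, *y = b;
    return (x->offset > y->offset) - (x->offset < y->offset);
}

/* Sort pending writes in place and merge adjacent or overlapping ones.
   Returns the number of requests left at the front of the array. */
size_t coalesce(write_req *w, size_t n)
{
    if (n == 0)
        return 0;
    qsort(w, n, sizeof *w, by_offset);
    size_t out = 0;
    for (size_t i = 1; i < n; i++) {
        unsigned long end = w[out].offset + w[out].length;
        if (w[i].offset <= end) {
            unsigned long new_end = w[i].offset + w[i].length;
            if (new_end > end)
                w[out].length = new_end - w[out].offset;
        } else {
            w[++out] = w[i];
        }
    }
    return out + 1;
}
```

Three back-to-back 4K writes at offsets 0, 4096, and 8192 collapse into a single 12K request, while a write far away at offset 20480 stays separate.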

How Cache Manager works

• The Cache Manager implements caching using memory mapping.

• The concept is similar to an app using a memory-mapped file.

• CreateFile(…dwFlagsAndAttributes ,..)

• dwFlagsAndAttributes == FILE_FLAG_NO_BUFFERING means “I don’t want the Cache Manager.”

How Cache Manager works..

• The Cache Manager reserves an area in the upper 2 GB (x86 platform) system area.

• The number of pages reserved by the Cache Manager adjusts according to the system’s memory requirements.

• If the system has lots of I/O-intensive tasks, it dynamically increases the cache size.

• If the system is low on memory, it reduces the buffer cache size.

How cached read operation works

Diagram components: User Space, Kernel Space, File System, Cache Manager, VMM, Disk stack (SCSI/Fibre Channel)

• (1) Cached Read

• (2) Get the pages from CM

• (3) Do memory mapping

• (4) Page fault

• (5) Get the blocks from disk
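The read path, steps (1) through (5), can be imitated with a toy model in which the "mapped view" is an array of pages that get filled from a simulated disk the first time they are touched. The types and names (`cached_file`, `page_in`, `cached_read`) and the four-page file size are assumptions for illustration; a real page fault is raised by hardware, not by an explicit residency check.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define PAGE_SIZE  4096u
#define FILE_PAGES 4u     /* assumed file size: 4 pages */

typedef struct {
    unsigned char disk[FILE_PAGES][PAGE_SIZE];   /* simulated file on disk */
    unsigned char cache[FILE_PAGES][PAGE_SIZE];  /* mapped view of the file */
    bool     resident[FILE_PAGES];               /* is the page in memory? */
    unsigned disk_reads;                         /* trips to the disk stack */
} cached_file;

/* Steps (4)-(5): on first touch, bring the page in from disk. */
static void page_in(cached_file *f, unsigned page)
{
    memcpy(f->cache[page], f->disk[page], PAGE_SIZE);
    f->resident[page] = true;
    f->disk_reads++;
}

/* Steps (1)-(3): satisfy the read by copying from the mapped view.
   Returns the number of bytes copied into buf. */
size_t cached_read(cached_file *f, size_t offset, void *buf, size_t len)
{
    size_t total = (size_t)FILE_PAGES * PAGE_SIZE;
    size_t done  = 0;
    if (offset >= total)
        return 0;
    if (len > total - offset)
        len = total - offset;
    while (done < len) {
        unsigned page    = (unsigned)((offset + done) / PAGE_SIZE);
        size_t   in_page = (offset + done) % PAGE_SIZE;
        size_t   chunk   = PAGE_SIZE - in_page;
        if (chunk > len - done)
            chunk = len - done;
        if (!f->resident[page])
            page_in(f, page);
        memcpy((unsigned char *)buf + done, &f->cache[page][in_page], chunk);
        done += chunk;
    }
    return done;
}
```

Reading the same range twice increments `disk_reads` only once, which is the whole point of the cache: the second read is served from memory.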

How cached write operation works

Diagram components: User Space, Kernel Space, File System, Cache Manager, VMM, Modified Page Writer Thread of VMM, Disk stack (SCSI/Fibre Channel)

• (1) Cached Write

• (2) Copy pages to CM

• (3) Do memory mapping, copy data to VMM pages

• (4) Write to disk later (Modified Page Writer Thread of VMM)

• (5) Write the blocks to disk

Questions?
