Memory management in Linux kernel

DESCRIPTION

Main topics covered in the seminar: tasks and components of the memory management subsystem; hardware capabilities of the x86_64 platform; how physical and virtual memory are described in the kernel; the memory management subsystem API; reclaiming previously allocated memory; monitoring tools; Memory Cgroups; Compaction (defragmentation of physical memory).

Memory management tasks

• Physical memory allocator
• Physical memory management
• Virtual memory allocator
• PTE management
• Memory allocator for kernel needs

Memory management subsystem

• >100K lines
• Buddy allocator
• Page replacement (“LRU” reclaim model)
• PTE management
• Slab/slob/slub kernel allocator
• Pagecache/writeback/readahead/swap
• Cgroup memory controller
• Compaction
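The buddy allocator mentioned above manages free blocks of 2^order pages, and a block's "buddy" is found by flipping one bit of its page frame number. The following is a minimal sketch of that arithmetic only, not the kernel's code; the function names are hypothetical.

```python
# Toy buddy arithmetic: blocks of 2**order pages, identified by the page
# frame number (pfn) of their first page. Function names are hypothetical.

def buddy_of(pfn, order):
    """Return the pfn of the buddy block of the same order."""
    return pfn ^ (1 << order)


def merged_start(pfn, order):
    """Return the first pfn of the order+1 block formed by a block and its buddy."""
    return pfn & ~(1 << order)


if __name__ == "__main__":
    # An order-2 block (4 pages) starting at pfn 8 pairs with the block at pfn 12.
    print(buddy_of(8, 2), merged_start(12, 2))
```

When both buddies are free they can be merged into one block of the next order, which is how larger contiguous blocks are rebuilt.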

Hardware

• x86_64
• Paging (MMU, TLB, ...)
• 4KB, 2MB and 1GB pages
• NUMA
• 4-level page tables
• Hardware referenced bit
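To make the 4-level layout concrete, here is a small sketch that splits a 48-bit x86_64 virtual address into four 9-bit table indices and a 12-bit page offset (the 4KB page case). The function names are hypothetical; the level names follow the usual kernel terminology (PGD, PUD, PMD, PTE).

```python
# Toy decomposition of an x86_64 virtual address under 4-level paging
# with 4KB pages: four 9-bit table indices plus a 12-bit page offset.
PAGE_SHIFT = 12   # 4KB page
INDEX_BITS = 9    # 512 entries per table
INDEX_MASK = (1 << INDEX_BITS) - 1


def split_vaddr(vaddr):
    """Return (pgd, pud, pmd, pte, offset) for the low 48 bits of vaddr."""
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    pte = (vaddr >> PAGE_SHIFT) & INDEX_MASK
    pmd = (vaddr >> (PAGE_SHIFT + INDEX_BITS)) & INDEX_MASK
    pud = (vaddr >> (PAGE_SHIFT + 2 * INDEX_BITS)) & INDEX_MASK
    pgd = (vaddr >> (PAGE_SHIFT + 3 * INDEX_BITS)) & INDEX_MASK
    return pgd, pud, pmd, pte, offset


def join_vaddr(pgd, pud, pmd, pte, offset):
    """Inverse of split_vaddr for addresses below 2**48."""
    index = ((pgd << INDEX_BITS | pud) << INDEX_BITS | pmd) << INDEX_BITS | pte
    return (index << PAGE_SHIFT) | offset


if __name__ == "__main__":
    print(split_vaddr(0x00400000))
```

The larger page sizes fall out of the same layout: a mapping that stops at the PMD level covers 2^21 bytes (2MB), and one that stops at the PUD level covers 2^30 bytes (1GB).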

Physical memory description

• Node (pg_data_t)
• Zone (struct zone)
• Page (struct page)

$ cat /proc/zoneinfo | grep Node
Node 0, zone DMA
Node 0, zone DMA32
Node 0, zone Normal
Node 1, zone Normal
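The same node/zone listing can be extracted programmatically. This is a minimal sketch that parses only the "Node N, zone NAME" header lines; the sample text repeats the output shown on the slide, and the function name is hypothetical.

```python
import re

# Header lines as shown on the slide; a real /proc/zoneinfo has many more
# lines per zone, which this sketch simply skips.
SAMPLE = """Node 0, zone DMA
Node 0, zone DMA32
Node 0, zone Normal
Node 1, zone Normal
"""


def list_zones(text):
    """Return a list of (node_id, zone_name) pairs found in zoneinfo text."""
    zones = []
    for line in text.splitlines():
        match = re.match(r"Node\s+(\d+),\s+zone\s+(\S+)", line)
        if match:
            zones.append((int(match.group(1)), match.group(2)))
    return zones


if __name__ == "__main__":
    for node, zone in list_zones(SAMPLE):
        print(node, zone)
```

On a Linux machine the function can be fed the contents of /proc/zoneinfo instead of the sample string.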

Virtual memory description

• Address space (struct mm_struct)
• VM area (struct vm_area_struct)

$ cat /proc/self/maps
00400000-0040c000 r-xp 00000000 08:03 2359718 /usr/bin/cat
0060b000-0060c000 r--p 0000b000 08:03 2359718 /usr/bin/cat
0060c000-0060d000 rw-p 0000c000 08:03 2359718 /usr/bin/cat
011a7000-011c8000 rw-p 00000000 00:00 0 [heap]
7f4d072e5000-7f4d0d80e000 r--p 00000000 08:03 2369473 /usr/lib/locale/locale-archive
7f4d0d80e000-7f4d0d9c2000 r-xp 00000000 08:03 2366682 /usr/lib64/libc-2.18.so
7f4d0d9c2000-7f4d0dbc2000 ---p 001b4000 08:03 2366682 /usr/lib64/libc-2.18.so
7f4d0dbc2000-7f4d0dbc6000 r--p 001b4000 08:03 2366682 /usr/lib64/libc-2.18.so
...
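Each line of the maps output describes one VM area. A minimal sketch of a parser for that line format follows; the Vma tuple and the function name are hypothetical, chosen here for illustration.

```python
from collections import namedtuple

# One entry per VM area; field names are illustrative, not kernel names.
Vma = namedtuple("Vma", "start end perms offset dev inode path")


def parse_maps_line(line):
    """Parse one line of /proc/<pid>/maps into a Vma tuple."""
    fields = line.split(None, 5)
    addr, perms, offset, dev, inode = fields[:5]
    # Anonymous areas have no path column at all.
    path = fields[5].strip() if len(fields) > 5 else ""
    start, end = (int(part, 16) for part in addr.split("-"))
    return Vma(start, end, perms, int(offset, 16), dev, int(inode), path)


if __name__ == "__main__":
    vma = parse_maps_line("00400000-0040c000 r-xp 00000000 08:03 2359718 /usr/bin/cat")
    print(vma.path, (vma.end - vma.start) // 4096, "pages")
```

The text mapping of /usr/bin/cat from the slide spans 0xc000 bytes, that is 12 pages of 4KB.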

File mappings

• File mappings (struct address_space)

• Radix tree with all resident pages
• Pagecache
• Major/minor pagefault
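The major/minor fault counters of a process can be read from userspace with getrusage(). The sketch below assumes a Unix system where Python's resource module is available; the helper names are hypothetical. Touching fresh anonymous memory normally produces minor faults, while major faults require reading from disk.

```python
import resource


def fault_counts():
    """Return (minor_faults, major_faults) accumulated by this process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_minflt, usage.ru_majflt


def touch_anonymous(nbytes, page_size=4096):
    """Allocate a buffer and write one byte per page so every page is touched."""
    buf = bytearray(nbytes)
    for off in range(0, nbytes, page_size):
        buf[off] = 1
    return buf


if __name__ == "__main__":
    before = fault_counts()
    touch_anonymous(16 * 1024 * 1024)
    after = fault_counts()
    print("minor:", after[0] - before[0], "major:", after[1] - before[1])
```

The exact numbers depend on the allocator, transparent huge pages, and the kernel version, so no particular output is claimed here.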

Kernel API

• __get_free_page()
• kmalloc()/kfree()
• vmalloc()
• ...

Userspace API

• pagefault
• mmap()/munmap()
• brk()
• mlock()/munlock()
• fadvise(), madvise()
• ...
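A few of these calls can be exercised from Python's mmap module, which wraps mmap(), munmap() and, on Python 3.8+ on supporting platforms, madvise(). This is a sketch under those assumptions; the function name is hypothetical and the madvise hint is skipped where unavailable.

```python
import mmap


def anon_mapping_roundtrip(npages=4):
    """Create an anonymous mapping, write to it, read it back, unmap it."""
    length = npages * mmap.PAGESIZE
    mapping = mmap.mmap(-1, length)          # fd -1 requests anonymous memory
    try:
        mapping[0:5] = b"hello"              # first write faults the page in
        # Optional access-pattern hint; guarded because it is not available
        # on every platform or Python version.
        if hasattr(mapping, "madvise") and hasattr(mmap, "MADV_SEQUENTIAL"):
            mapping.madvise(mmap.MADV_SEQUENTIAL)
        return bytes(mapping[0:5])
    finally:
        mapping.close()                      # releases the mapping (munmap)


if __name__ == "__main__":
    print(anon_mapping_roundtrip())
```

Physical pages are only allocated when the mapping is first touched, which is where the "pagefault" entry in the list comes in.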

Memory reclaim

• Normal/direct reclaim (free pool)
• Per-node kswapd
• Working set
• Memory pressure
• File memory vs anonymous memory
• Swap
• OOM

“LRU” model

• 5 doubly linked lists: inactive file, active file, inactive anon, active anon, unevictable

• Referenced flag (PG_referenced) in the struct page flags

List transition rules

• mark_page_accessed():
  – unreferenced -> referenced
  – inactive && referenced -> active

• shrink_inactive_list():
  – if (ptes referenced)
    • anonymous -> active
    • referenced -> active
    • (ptes referenced > 1) -> active (3.2)
    • (vm_flags & VM_EXEC) -> active (3.2)
    • set referenced
    • rotate
  – else
    • reclaim

• shrink_active_list():
  – if referenced
    • file & VM_EXEC -> rotate
  – -> inactive
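The mark_page_accessed() rules above amount to a two-touch promotion scheme. Below is a toy model of just those two rules; the Page class is hypothetical, and clearing the referenced flag on promotion is an assumption of this sketch that the slide does not state.

```python
class Page:
    """Toy page with only the two flags the rules above talk about."""

    def __init__(self, name):
        self.name = name
        self.referenced = False
        self.active = False


def mark_page_accessed(page):
    """Apply the two transition rules listed for mark_page_accessed()."""
    if not page.active and page.referenced:
        # inactive && referenced -> active
        page.active = True
        page.referenced = False  # assumption: promotion consumes the flag
    elif not page.referenced:
        # unreferenced -> referenced
        page.referenced = True


if __name__ == "__main__":
    page = Page("file-page")
    for touch in range(1, 4):
        mark_page_accessed(page)
        print(touch, "active" if page.active else "inactive", page.referenced)
```

A page touched once stays on the inactive list with the referenced flag set; only a second touch moves it to the active list, which keeps single-use pages from displacing the working set.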

Memory pressure balancing

• nr_pages_to_scan = nr_pages / 2^priority

• priority = [12..0]: 1/4096, 1/2048, 1/1024, ...

• swappiness
• active > inactive
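The scan-size formula above can be worked through numerically. This sketch only evaluates nr_pages / 2^priority for each priority from 12 down to 0; the function names are hypothetical and the list size used in the example is arbitrary.

```python
DEF_PRIORITY = 12  # starting priority, as in the [12..0] range above


def pages_to_scan(nr_pages, priority):
    """Number of pages to scan on a list of nr_pages at a given priority."""
    return nr_pages >> priority  # integer form of nr_pages / 2**priority


def scan_schedule(nr_pages):
    """Return [(priority, pages_to_scan)] from the lowest pressure to the highest."""
    return [(p, pages_to_scan(nr_pages, p)) for p in range(DEF_PRIORITY, -1, -1)]


if __name__ == "__main__":
    # Example: a list of 2**20 pages (4GB worth of 4KB pages).
    for priority, count in scan_schedule(1 << 20):
        print(priority, count)
```

Each step down in priority doubles the portion of the list that is scanned, so reclaim starts with 1/4096 of the list and reaches the whole list at priority 0.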

Yasearch-specific problems & solutions

• Working set > 1/2 available memory

• Memory thrashing
• promote_mapped_pages
• file_inactive_ratio

Monitoring & tools

• top
• vmtouch
• /proc/vmstat
• /proc/buddyinfo
• /proc/slabinfo
• perf top
• oom-message in dmesg
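As an example of reading one of these files, /proc/buddyinfo lists, per zone, the number of free blocks of each order (column k counts blocks of 2^k pages). The sketch below parses one such line and totals the free pages; the sample numbers are made up for illustration and the function names are hypothetical.

```python
# Illustrative line in /proc/buddyinfo format; the counts are invented.
SAMPLE_LINE = "Node 0, zone      DMA      1      1      1      0      2      1      1      0      1      1      3"


def parse_buddyinfo_line(line):
    """Return (node_id, zone_name, [free block count per order])."""
    head, rest = line.split("zone", 1)
    node = int(head.replace("Node", "").strip(" ,"))
    fields = rest.split()
    return node, fields[0], [int(value) for value in fields[1:]]


def free_pages(counts):
    """Total free pages: a block of order k holds 2**k pages."""
    return sum(count << order for order, count in enumerate(counts))


if __name__ == "__main__":
    node, zone, counts = parse_buddyinfo_line(SAMPLE_LINE)
    print(node, zone, free_pages(counts))
```

A zone with plenty of free pages but zeros in the high-order columns is fragmented, which is the situation compaction is meant to address.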

Demonstration

Cgroups

• Each cgroup has its own LRU lists
• No common LRU (since 3.3)!
• Common free pool(s)
• Common kswapd thread(s)
• Global reclaim vs target reclaim

Memory controller

• memory.limit_in_bytes
• memory.soft_limit_in_bytes (will be deprecated)
• memory.use_hierarchy
• ...

Monitoring

• memory.usage_in_bytes
• memory.max_usage_in_bytes
• memory.stat
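memory.stat is a plain "name value" file, one counter per line, so it is easy to load into a dictionary. The sketch below parses text in that format; the sample values are invented, and the function name is hypothetical. The location of the file depends on where the memory controller is mounted, so no path is assumed here.

```python
# Illustrative memory.stat content; the numbers are invented.
SAMPLE_STAT = """cache 1048576
rss 4194304
mapped_file 524288
inactive_anon 0
active_anon 4194304
inactive_file 786432
active_file 262144
unevictable 0
"""


def parse_memory_stat(text):
    """Return a dict mapping counter name to integer value."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


if __name__ == "__main__":
    stats = parse_memory_stat(SAMPLE_STAT)
    print(stats["inactive_file"] + stats["active_file"], "bytes on file LRU lists")
```

The per-list counters mirror the LRU lists that each cgroup maintains, which makes this file a convenient view of how a group's memory is split between file and anonymous pages.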

Accounting

• Each page belongs to one cgroup
• First accessed - owner
• memory.move_charge_at_immigrate

Yasearch-specific problems & solutions

• memory.low_limit_in_bytes
• First accessed – owner? mlock()? low_limit?
• memory.recharge_on_pgfault

Compaction

• Migration of physical pages toward the top of the zone
• https://lwn.net/Articles/368869
• Broken in 3.3-3.7
• Replacement for lumpy reclaim
• Use perf top for problem diagnostics
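The idea of migrating pages toward the top of the zone can be illustrated with a toy two-scanner model: one scanner walks up from the bottom looking for movable pages, the other walks down from the top looking for free slots. This is a simplified sketch, not the kernel algorithm; the zone encoding ("M" movable, "U" unmovable, None free) and the function names are assumptions made for the example.

```python
def compact(zone):
    """Move movable pages into free slots near the top; return the new layout."""
    zone = list(zone)
    migrate = 0              # scans upward for movable pages
    free = len(zone) - 1     # scans downward for free slots
    while migrate < free:
        if zone[migrate] != "M":
            migrate += 1     # free or unmovable: nothing to migrate here
        elif zone[free] is not None:
            free -= 1        # slot occupied: keep looking for a free one
        else:
            zone[free], zone[migrate] = "M", None
            migrate += 1
            free -= 1
    return zone


def longest_free_run(zone):
    """Length of the longest run of contiguous free pages."""
    best = run = 0
    for page in zone:
        run = run + 1 if page is None else 0
        best = max(best, run)
    return best


if __name__ == "__main__":
    layout = ["M", None, "M", None, "U", None, "M", None]
    print(longest_free_run(layout), longest_free_run(compact(layout)))
```

In this example the four scattered free pages end up as one contiguous run at the bottom of the zone, which is what makes a higher-order allocation possible afterwards.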

Thank you for your attention!