August 2018
Refreshing our knowledge
HugePages: Why, what and how
© The Pythian Group Inc., 2018
What's up with HugePages?
Jose Rodriguez, Project Engineer at Pythian
● 10+ years of experience, mainly with Oracle but also SQL Server and others such as DB2 LUW and PostgreSQL
● RAC and HA with Data Guard (DG) and GoldenGate (GG) on Solaris, Linux and Windows
Other areas of expertise, i.e., things I like doing:
● Scripting and automation (lazy DBA)
● Machine Learning
● GoldenGate replication
● Cloud-related stuff (who doesn't do that nowadays, eh?)
About Pythian
Pythian’s 400+ IT professionals help companies adopt and manage disruptive data technologies to better compete
EXPERIENCED: 11,800 systems currently managed by Pythian
GLOBAL: 400+ Pythian experts in 35 countries
EXPERTS: 2 millennia of experience gathered and shared over 19 years
What are you taking away (so you can leave now if you already have it)
Agenda
Why do we care?
What are HugePages?
How to implement?
What can happen - Case Studies
● It is 2019, yet HugePages still seem not to be well understood or broadly implemented
● More memory -> more problems
● Systems with 1 TB of RAM or more are common nowadays
● Problems caused by the lack of HugePages are not always easy to spot
Why do we care?
What are HugePages? Behind the scenes
Virtual to physical memory mapping
[Figure: user process virtual pages (0x00, 0x01, ..., 0x23, 0x24) mapped to main memory pages (e.g. 42, 157, 245)]
Memory allocations are tracked in PageTables
virtual   real
0x00      42
0x01      175
0x02      176
0x03      177
0x04      178
...       ...
(page size: 4 KiB)
Virtual to physical memory mapping
[Figure: user process -> PageTable (virtual 0x00 -> real 42, 0x01 -> 175, 0x02 -> 176, 0x03 -> 177, 0x04 -> 178, ...) -> physical memory]
● To allocate 100 GiB there will be 26,214,400 memory pages of 4 KiB each
● An OS would typically group and map them hierarchically in frames
  ■ i.e. contiguous space can be mapped more efficiently
● Each PageTable Entry (PTE) is around 8 bytes on 64-bit systems
  ■ Physical address + flags (the virtual address is implied by the entry's position in the table)
● PageTables are also stored in memory. Their size would be 200 MiB in our example
● For shared memory segments (e.g. SGA), each process has its own copy of the PageTable
● A regular single instance may have 1,000 sessions; at 200 MiB each (if every session touches the whole SGA), that is close to 200 GiB (about 195 GiB) of RAM just to track RAM
What's up with PageTables?
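The arithmetic on this slide can be reproduced with a few lines of Python. This is a rough sketch under the slide's own assumptions: 8-byte entries, 4 KiB pages, one entry per page, and every session touching the whole 100 GiB SGA; upper page-table levels are ignored.

```python
# Rough last-level page-table overhead for a shared segment mapped with 4 KiB pages.
# Assumptions (taken from the slide, not measured): 8-byte PTEs, one PTE per page,
# every session maps the whole segment, upper page-table levels ignored.
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

PTE_BYTES = 8  # approximate size of one page table entry on a 64-bit system


def page_table_bytes(segment_bytes: int, page_bytes: int) -> int:
    """Bytes of PTEs one process needs to map the whole segment."""
    pages = -(-segment_bytes // page_bytes)  # ceiling division
    return pages * PTE_BYTES


sga = 100 * GIB
pages = sga // (4 * KIB)
per_process = page_table_bytes(sga, 4 * KIB)
sessions = 1000

print(f"{pages:,} pages of 4 KiB")                         # 26,214,400 pages of 4 KiB
print(f"{per_process / MIB:.0f} MiB of PTEs per process")  # 200 MiB of PTEs per process
print(f"{sessions * per_process / GIB:.1f} GiB for {sessions} sessions")  # 195.3 GiB
```

In practice the kernel only builds entries for pages a process actually touches, so real numbers are usually lower; this is the worst case the slide describes.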
HugePages to the Rescue!
virtual   real
0x00      1
0x01      ...
0x02      ...
0x03      ...
0x04      ...
...       ...
[Figure: one 2048 KiB HugePage replaces 512 regular 4 KiB pages]
PageTable reduced 512 times, to only ~400 KiB
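A quick sketch of the same comparison with 2 MiB pages, again counting one 8-byte entry per page as the slide does. On x86-64 Linux a 2 MiB page is actually mapped one level higher (by a PMD entry), so treat "entries" here as the slide's simplified model.

```python
# Compare page-table size for a 100 GiB SGA with 4 KiB vs 2 MiB (HugePages) pages.
# Simplified model from the slides: one 8-byte entry per mapped page.
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB
PTE_BYTES = 8

sga = 100 * GIB


def entry_bytes(segment: int, page: int) -> int:
    """Total entry bytes needed to map `segment` bytes with `page`-sized pages."""
    return -(-segment // page) * PTE_BYTES  # ceiling division


small = entry_bytes(sga, 4 * KIB)
huge = entry_bytes(sga, 2 * MIB)

print(f"4 KiB pages: {small // MIB} MiB")  # 200 MiB
print(f"2 MiB pages: {huge // KIB} KiB")   # 400 KiB
print(f"reduction: {small // huge}x")      # 512x
```

The factor of 512 is simply 2048 KiB / 4 KiB, which is why it does not depend on the SGA size.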
● Allocate only as many HugePages as needed: memory reserved for HugePages cannot be used for regular allocations
● HugePages cannot be swapped out
● Oracle Automatic Memory Management (AMM) is incompatible with HugePages
● Transparent HugePages (THP) do not go well with Oracle; they are disabled by default in UEK2 and later
● Platforms other than Linux x64 offer an even wider choice of large page sizes, up to 1 GiB
● In extreme cases, an SGA that is TiBs in size may lead to slow instance startup. PRE_PAGE_SGA may help here
● AMM is not allowed in 12.2 if RAM > 4 GiB, so HugePages should be used there
HugePages additional facts
● Do we really need/want HugePages for ASM?
● ASM uses AMM by default, so it is not HugePages-compatible out of the box. /dev/shm is important here.
● We don't for "regular" ASM instances. The documentation and best practices say this clearly, although this may change in future releases.
● Highly recommended for Exadata. MOS notes 2062068.1 and 2111010.1 clearly indicate that ASMM should be enabled and HugePages available for ASM.
HugePages and ASM
● /dev/shm is sized automatically at 50% of total RAM; this is only an upper limit, since tmpfs consumes memory only for what is actually stored in it
● Oracle AMM uses /dev/shm to "store" shared memory pages
● We may be tempted to reduce the size of /dev/shm to leave more room for HugePages. There is no need
● HugePages and AMM are incompatible
HugePages and /dev/shm
● Does Oracle use HugePages for PGA?
● No, it doesn't (currently)
  ■ Neither the docs nor MOS state this explicitly
  ■ Tests show that Oracle does not allocate HugePages for it
● HugePages would be counterintuitive for small memory allocations
● This may change in the future (e.g. DWH or DSS sort areas)
HugePages and PGA
HugePages on the Cloud
● Supported on AWS RDS since July 2017, but not enabled by default. HugePages can only be enabled on certain instance types.
● No official documentation for Azure, but a recent test showed that we can set up HugePages in a Linux VM running on Azure.
● Google Cloud Platform supports HugePages.
● Oracle Cloud: officially supported for Exadata Cloud Service. OCI allows it but not by default. The Classic platform has it enabled by default.
Let's do it!
● Script provided in MOS: "Oracle Linux: Shell Script to Calculate Values Recommended Linux HugePages / HugeTLB Configuration (Doc ID 401749.1)"
● Or use the following formula:
SGA size (MiB) / 2 (MiB) + 42
How many HugePages do I need?
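The rule of thumb above translates directly into code. This minimal sketch assumes the default 2 MiB Hugepagesize and rounds up for SGA sizes that are not a multiple of 2 MiB (the rounding is our addition); the MOS script remains the more precise option because it measures the shared memory segments actually in use.

```python
import math

HUGEPAGE_MIB = 2  # default Hugepagesize on Linux x86-64 (2048 kB)
EXTRA_PAGES = 42  # headroom from the slide's rule of thumb


def hugepages_needed(sga_mib: float, hugepage_mib: int = HUGEPAGE_MIB,
                     extra: int = EXTRA_PAGES) -> int:
    """Rule-of-thumb value for vm.nr_hugepages given an SGA size in MiB."""
    return math.ceil(sga_mib / hugepage_mib) + extra


print(hugepages_needed(1540))  # 812
```

For the 1540 MiB SGA in the test system's alert log, 770 pages are expected, so the formula suggests 812.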
● May need extra work on VMs
● Disable AMM
● Set use_large_pages=only
● Disable THP
● Set the memlock user limit
● Set vm.nr_hugepages
● Set vm.hugetlb_shm_group as required (SUSE)
● Reboot the OS (not always required)
● Restart the Oracle instance
● Use TuneD profiles on RHEL 7 and above
Implementation steps
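Several of these steps boil down to a few configuration lines. The sketch below uses a hypothetical helper, `render_settings`, that only builds text and writes nothing to disk. The OS user name `oracle`, the +42 headroom and sizing memlock to the whole HugePages pool are assumptions to review; memlock in limits.conf is expressed in KiB.

```python
import math


def render_settings(sga_mib: int, os_user: str = "oracle",
                    hugepage_mib: int = 2, extra_pages: int = 42) -> dict:
    """Hypothetical helper: build config snippets for a given SGA size.

    Nothing is written to disk; review the text before applying it.
    """
    nr_hugepages = math.ceil(sga_mib / hugepage_mib) + extra_pages
    # limits.conf expresses memlock in KiB; allow locking the whole HugePages pool.
    memlock_kib = nr_hugepages * hugepage_mib * 1024
    return {
        "sysctl": f"vm.nr_hugepages = {nr_hugepages}\n",
        "limits": (f"{os_user} soft memlock {memlock_kib}\n"
                   f"{os_user} hard memlock {memlock_kib}\n"),
        "sql": "ALTER SYSTEM SET use_large_pages=ONLY SCOPE=SPFILE;\n",
    }


for name, text in render_settings(1540).items():
    print(f"--- {name}")
    print(text, end="")
```

Disabling AMM and THP and restarting the OS or instance are left out on purpose: they depend on the platform and on how the instance is currently configured.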
Success!
2018-08-20T12:43:18.163509+00:00
Dump of system resources acquired for SHARED GLOBAL AREA (SGA)
2018-08-20T12:43:18.163653+00:00
Per process system memlock (soft) limit = 2048M
2018-08-20T12:43:18.163821+00:00
Expected per process system memlock (soft) limit to lock SHARED GLOBAL AREA (SGA) into memory: 1540M
2018-08-20T12:43:18.163952+00:00
Available system pagesizes: 4K, 2048K
2018-08-20T12:43:18.164143+00:00
Supported system pagesize(s):
2018-08-20T12:43:18.164220+00:00
  PAGESIZE  AVAILABLE_PAGES  EXPECTED_PAGES  ALLOCATED_PAGES  ERROR(s)
2018-08-20T12:43:18.164382+00:00
     2048K             1056             770              770  NONE

[oracle@HPtesting ~]$ grep ^HugePages /proc/meminfo
HugePages_Total:    1056
HugePages_Free:      287
HugePages_Rsvd:        1
HugePages_Surp:        0
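To cross-check the alert log against /proc/meminfo, note that reserved pages (HugePages_Rsvd) are still counted in HugePages_Free until they are first touched. Pages committed to shared memory are therefore Total - Free + Rsvd. A small parsing sketch, fed with the output above:

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style 'Key:  value [kB]' lines into {key: int}."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep or not rest.split():
            continue
        values[key.strip()] = int(rest.split()[0])
    return values


def hugepages_committed(info: dict) -> int:
    # Reserved-but-untouched pages are still included in HugePages_Free.
    return info["HugePages_Total"] - info["HugePages_Free"] + info["HugePages_Rsvd"]


sample = """\
HugePages_Total:    1056
HugePages_Free:      287
HugePages_Rsvd:        1
HugePages_Surp:        0
"""

info = parse_meminfo(sample)
print(hugepages_committed(info))  # 770
```

The result matches ALLOCATED_PAGES (770) in the alert log. On a live system, pass `open("/proc/meminfo").read()` instead of the sample.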
LargePages (a.k.a. HugePages on Windows)
● Available since Oracle 10.1
● Enabled by adding an entry to the registry, ideally per SID rather than globally
● Again, only used for the SGA
● Not counted in the "Working Set", so memory usage metrics become somewhat misleading
● In older versions, startup times may be slow, with a high impact on server performance
● Oriented to DWH-type databases
Case Studies: Lack of HugePages causing trouble
RAC node eviction
● One node of a 2-node cluster was evicted
● Logs show a response timeout just before the eviction
● We found no other errors or evidence
● sar to the rescue!
RAC node eviction – “sar -r”
05:20:01 AM  kbmemfree  kbmemused  %memused  kbbuffers  kbcached  kbcommit  %commit  kbactive  kbinact  kbdirty
05:30:01 AM     361136   65389512     99.45        932  24477352  41041528    29.48  29509228  1880236      564
05:40:01 AM     354896   65395752     99.46      95164  24434320  41039432    29.48  29504552  1902320      552
05:50:01 AM     382940   65367708     99.42      87912  24420284  41021636    29.47  29474908  1902904      496
06:00:01 AM     385016   65365632     99.41      52432  24414712  41053708    29.49  29477412  1878860      488
06:10:01 AM     386796   65363852     99.41        596  24416944  41046880    29.48  29412032  1909420      628
06:20:02 AM     376484   65374164     99.43        596  24546212  41069336    29.50  29603108  2107020      460
06:30:01 AM     335176   65415472     99.49        596  24893684  41094396    29.52  29676840  2078424      648
06:40:05 AM     334152   65416496     99.49        596  24554064  41222332    29.61  29453168  2061660        0
06:50:03 AM     349392   65401256     99.47        596  22963852  41360864    29.71  28031816  1851900       72
07:00:10 AM     342752   65407896     99.48        596  21190320  41768848    30.00  26854676  1723480        0
07:10:04 AM     341756   65408892     99.48        596  20787592  41769828    30.00  26706944  1765980       12
Average:        414530   65336118     99.37      19907  24589646  41094908    29.52  29439910  1903200     2315
07:16:28 AM LINUX RESTART
RAC node eviction – “sar -B”
05:20:01 AM   pgpgin/s  pgpgout/s    fault/s  majflt/s   pgfree/s   pgscank/s  pgscand/s  pgsteal/s  %vmeff
05:30:01 AM    7257.49      91.49    8704.61      1.28    6571.59       44.62       0.00      30.63   68.65
05:40:01 AM    2021.61    2486.25  141607.78      5.77   60734.72      451.59      38.26     386.08   78.82
05:50:01 AM    6980.26      71.57    7241.62      0.56    6380.14       35.86       7.72      38.22   87.71
06:00:01 AM    7262.73      67.56    8717.63      1.10    6549.42       47.03       1.18      40.58   84.18
06:10:01 AM    1759.35     379.66   15556.00      4.59    7320.75      185.60       2.75     143.16   76.01
06:20:02 AM   63309.67    3624.60   34222.39    267.30   50019.66    42307.14     982.44   13754.06   31.77
06:30:01 AM  115962.81    2730.86   30665.11    373.74   86055.51   843180.51   16021.66   26924.73    3.13
06:40:05 AM   83609.10    1331.45   20393.23    235.71   62484.15  1104330.76   25458.10   20843.61    1.84
06:50:03 AM  158193.69    4252.68   27395.53    375.73  111261.98  1848753.28   61689.42   38619.69    2.02
07:00:10 AM   98699.51    4257.23   23771.15    292.99   70708.84   590100.80   12354.06   23573.29    3.91
07:10:04 AM  125777.83    2409.66   23413.33    415.65   91748.06   952301.48   24672.30   31401.32    3.21
Average:      20671.81    1108.76   15287.47     51.49   18890.14   126225.66    3483.02    4065.45    3.13
07:16:28 AM LINUX RESTART
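The collapse of %vmeff is the key signal here. The sar man page describes %vmeff as pgsteal / pgscan, i.e. how many of the scanned pages could actually be reclaimed. This sketch recomputes it from two rows of the table above. Returning 0.0 when nothing was scanned is an assumption made for this sketch.

```python
def vmeff(pgsteal: float, pgscank: float, pgscand: float) -> float:
    """sar -B %vmeff: pages reclaimed / pages scanned, as a percentage."""
    scanned = pgscank + pgscand
    # Assumption for this sketch: report 0.0 when no pages were scanned.
    return 0.0 if scanned == 0 else 100.0 * pgsteal / scanned


print(f"{vmeff(30.63, 44.62, 0.00):.2f}")            # 68.65 (05:30:01 AM)
print(f"{vmeff(26924.73, 843180.51, 16021.66):.2f}")  # 3.13  (06:30:01 AM)
```

At 06:30 the kernel scanned roughly 860,000 pages per second but reclaimed only about 3% of them. That is the kind of memory pressure that can stall a node long enough for it to be evicted.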
RAC node eviction - “cat /proc/meminfo” (after incident)
$ cat /proc/meminfo
MemTotal:       65918584 kB
MemFree:         1583912 kB
MemAvailable:   20034320 kB
Buffers:          416208 kB
Cached:         41349928 kB
SwapTotal:      73469948 kB
SwapFree:       73334068 kB
KernelStack:       23392 kB
PageTables:     12495120 kB
AnonHugePages:   1478656 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
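A quick way to quantify this snapshot is to express PageTables as a share of MemTotal and check the HugePages and THP counters. A minimal sketch over an excerpt of the output above; `meminfo_kb` is a hypothetical helper name.

```python
def meminfo_kb(text: str, key: str) -> int:
    """Return the numeric value of one /proc/meminfo field (kB or a page count)."""
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name.strip() == key:
            return int(rest.split()[0])
    raise KeyError(key)


# Excerpt of the post-incident snapshot shown above.
snapshot = """\
MemTotal:       65918584 kB
PageTables:     12495120 kB
AnonHugePages:   1478656 kB
HugePages_Total:       0
"""

total_kb = meminfo_kb(snapshot, "MemTotal")
page_tables_kb = meminfo_kb(snapshot, "PageTables")
pct = 100 * page_tables_kb / total_kb

print(f"PageTables: {page_tables_kb / 1024**2:.1f} GiB ({pct:.1f}% of RAM)")
if meminfo_kb(snapshot, "HugePages_Total") == 0:
    print("No HugePages configured")
if meminfo_kb(snapshot, "AnonHugePages") > 0:
    print("Transparent HugePages are backing anonymous memory")
```

Almost a fifth of the server's RAM was spent on page tables, with no HugePages configured and THP active.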
Unexpected Swapping
● Lots of swapping warnings in the alert log
● Small SGA (2 GB)
● Rarely used database
● vm.swappiness had not been reviewed (probably still at the default of 60)
WARNING: Heavy swapping observed on system in last 5 mins.
pct of memory swapped in [0.27%] pct of memory swapped out [2.22%].
Please make sure there is no memory pressure and the SGA and PGA are configured correctly. Look at DBRM trace file for more details.
● Once a month, the load average goes above 400
● System unusable but no crash
CPU stealing
Yet again - sar is the star
[oracle@oramstr01 oracle]$ sar -f sa07 -s 14:30:00 -e 18:30:00 -u
Linux 2.6.32-573.8.1.el6.x86_64 (oramstr01.testing.com)  12/07/2016  _x86_64_  (80 CPU)

02:30:01 PM  CPU  %user  %nice  %system  %iowait  %steal  %idle
02:40:01 PM  all  38.31   0.00     3.35     3.02    0.00  55.32
02:50:01 PM  all  34.02   0.00     3.15     2.63    0.00  60.19
03:00:01 PM  all  34.20   0.00     3.20     1.68    0.00  60.92
03:10:01 PM  all  40.79   0.00     3.81     2.96    0.00  52.44
03:20:01 PM  all  37.33   0.00     3.43     2.40    0.00  56.83
03:30:04 PM  all  40.72   0.00     6.12     2.62    0.00  50.54
03:40:01 PM  all  10.08   0.00    88.36     0.30    0.00   1.26
03:50:02 PM  all   8.66   0.00    90.82     0.05    0.00   0.47
04:00:03 PM  all  31.66   0.00    68.27     0.02    0.00   0.04
04:10:03 PM  all  45.84   0.00    49.19     0.90    0.00   4.07
04:20:01 PM  all  40.68   0.00    54.30     0.97    0.00   4.05
04:30:01 PM  all  37.81   0.00    43.22     1.13    0.00  17.84
04:40:02 PM  all  15.68   0.00    84.18     0.05    0.00   0.09
04:50:02 PM  all  12.76   0.00    87.23     0.00    0.00   0.01
05:00:03 PM  all  11.84   0.00    88.14     0.00    0.00   0.01
05:10:01 PM  all  18.56   0.00    62.74     0.71    0.00  17.99
05:20:01 PM  all  15.84   0.00     1.73     1.17    0.00  81.27
05:30:01 PM  all  19.22   0.00     1.75     0.71    0.00  78.33
05:40:01 PM  all  25.51   0.00     2.02     1.15    0.00  71.32
05:50:01 PM  all  23.78   0.00     1.85     1.05    0.00  73.32
06:00:01 PM  all  20.15   0.00     1.66     0.88    0.00  77.30
06:10:01 PM  all  21.14   0.00     2.68     2.93    0.00  73.25
06:20:01 PM  all  18.94   0.00     2.33     2.26    0.00  76.46
Average:     all  26.23   0.00    32.81     1.29    0.00  39.67
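When a sar file covers many hours, it helps to filter for the intervals where the kernel is eating the CPU. This sketch picks the "all" rows of `sar -u` output whose %system exceeds a threshold. The 50% threshold is an arbitrary choice, and the parser assumes the 12-hour "HH:MM:SS AM/PM" timestamps shown above; a 24-hour locale would produce one field fewer.

```python
def high_system_intervals(sar_text: str, threshold: float = 50.0) -> list:
    """Return (time, %system) for 'all'-CPU rows of `sar -u` output above threshold."""
    hits = []
    for line in sar_text.splitlines():
        fields = line.split()
        # Data rows look like: 03:40:01 PM all 10.08 0.00 88.36 0.30 0.00 1.26
        if len(fields) != 9 or fields[2] != "all":
            continue  # skips the header, the Average line and blank lines
        system_pct = float(fields[5])
        if system_pct > threshold:
            hits.append((f"{fields[0]} {fields[1]}", system_pct))
    return hits


sample = """\
02:30:01 PM  CPU  %user  %nice  %system  %iowait  %steal  %idle
03:30:04 PM  all  40.72   0.00     6.12     2.62    0.00  50.54
03:40:01 PM  all  10.08   0.00    88.36     0.30    0.00   1.26
03:50:02 PM  all   8.66   0.00    90.82     0.05    0.00   0.47
04:00:03 PM  all  31.66   0.00    68.27     0.02    0.00   0.04
04:10:03 PM  all  45.84   0.00    49.19     0.90    0.00   4.07
Average:     all  26.23   0.00    32.81     1.29    0.00  39.67
"""

for when, pct in high_system_intervals(sample):
    print(when, pct)
```

Note that %steal stays at 0.00 throughout. The CPU is not being taken by a hypervisor; it is being consumed in kernel (%system) time.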
[oracle@oramstr01 ~]$ date
Wed Dec 14 10:23:22 EST 2016
[oracle@oramstr01 ~]$ ps -ef | grep -c oracleccxp
2717
[oracle@oramstr01 ~]$ grep PageTable /proc/meminfo
PageTables:     461002976 kB
Yes, that is 440 GiB of PageTables!
Sessions and PageTable memory
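Dividing the PageTables figure by the process count gives a rough per-process cost. This is only an approximation: `grep -c` also counts the grep process itself, and PageTables covers every process on the host, not just Oracle sessions.

```python
# Figures copied from the session above.
page_tables_kb = 461_002_976  # PageTables from /proc/meminfo
oracle_processes = 2_717      # ps -ef | grep -c oracleccxp

total_gib = page_tables_kb / 1024**2
per_process_mib = page_tables_kb / oracle_processes / 1024

print(f"PageTables total: {total_gib:.0f} GiB")           # 440 GiB
print(f"Average per process: {per_process_mib:.0f} MiB")  # 166 MiB
```

About 166 MiB per process is in the same range as the 200 MiB per session estimated earlier for a 100 GiB SGA mapped with 4 KiB pages.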
Yet again - sar is the star
[oracle@oramstr01 oracle]$ sar -r -f sa07 -s 14:30:00 -e 17:30:00
Linux 2.6.32-573.8.1.el6.x86_64 (oramstr01.testing.com)  12/07/2016  _x86_64_  (80 CPU)

02:30:01 PM  kbmemfree   kbmemused  %memused  kbbuffers   kbcached   kbcommit  %commit
02:40:01 PM   57063440  1001654588     94.61    1610536  412910108  311506624    26.72
02:50:01 PM   57747440  1000970588     94.55    1610564  412911112  306685964    26.31
03:00:01 PM   46592180  1012125848     95.60    1610596  412914608  308191992    26.44
03:10:01 PM   31680840  1027037188     97.01    1610608  412917688  310993660    26.68
03:20:01 PM   16457972  1042260056     98.45    1610628  412920472  309580976    26.56
03:30:04 PM    1739692  1056978336     99.84    1610628  411393436  317613764    27.25
03:40:01 PM    5066928  1053651100     99.52    1538352  395198928  324298580    27.82
03:50:02 PM   28196104  1030521924     97.34    1342292  381568100  324394208    27.83
04:00:03 PM   11313156  1047404872     98.93    1359396  378901468  326693864    28.03
04:10:03 PM   80061500   978656528     92.44    1359488  375162128  321167148    27.55
04:20:01 PM   64494004   994224024     93.91    1359508  375163768  322061964    27.63
04:30:01 PM  108230896   950487132     89.78    1359532  375166776  313685004    26.91
04:40:02 PM  135833716   922884312     87.17    1359548  375168248  318691876    27.34
04:50:02 PM  192323488   866394540     81.83    1359556  375169736  315572568    27.07
05:00:03 PM  235108136   823609892     77.79    1359648  375172216  312304460    26.79
05:10:01 PM  360281464   698436564     65.97    1359724  375173424  295083536    25.32
05:20:01 PM  357150032   701567996     66.27    1359748  375175952  296449248    25.43
Summary
● HugePages are usually good to have
● How to implement them
● Know where to look
● /proc/meminfo
  ■ HugePages
  ■ PageTables
● Remember the power of sar/OSWatcher Black Box (OSwBB)
● Following best practices prevents issues
References
● Oracle 11g internals part 1: Automatic Memory Management, by Tanel Poder
● Oracle SGA memory allocation on startup, by Frits Hoogland
● Oracle Linux: Shell Script to Calculate Values Recommended Linux HugePages / HugeTLB Configuration (Doc ID 401749.1)
● Oracle Exadata Initialization Parameters and Diskgroup Attributes Best Practices (Doc ID 2062068.1)
● 12.2 Grid Infrastructure and Database Upgrade steps for Exadata Database Machine running 11.2.0.3 and later on Oracle Linux (Doc ID 2111010.1)
● ASM & Shared Pool (ORA-4031) (Doc ID 437924.1)
Q&A
Ask now or reach out later, but don't keep your questions to yourself
THANK YOU
Hope you enjoyed it