AIX Performance: Configuration & Tuning for Oracle

Vijay Adik, ATS - Oracle Solutions Team

IBM Advanced Technical Support - Americas

April 17, 2009 © 2009 IBM Corporation


Legal information

The information in this presentation is provided by IBM on an "AS IS" basis without any warranty, guarantee or assurance of any kind. IBM also does not provide any warranty, guarantee or assurance that the information in this paper is free from errors or omissions. Information is believed to be accurate as of the date of publication. You should check with the appropriate vendor to obtain current product information.

Any proposed use of claims in this presentation outside of the United States must be reviewed by local IBM country counsel prior to such use.

IBM, RS6000, System p, AIX, AIX 5L, GPFS, and Enterprise Storage Server (ESS) are trademarks or registered trademarks of the International Business Machines Corporation.

Oracle, Oracle9i and Oracle10g are trademarks or registered trademarks of Oracle Corporation.

All other products or company names are used for identification purposes only, and may be trademarks of their respective owners.

Agenda

• AIX Configuration Best Practices for Oracle
  – Memory
  – CPU
  – I/O
  – Network
  – Miscellaneous

AIX Configuration Best Practices for Oracle

• The suggestions presented here are considered to be basic configuration “starting points” for general Oracle workloads
• Your workloads may vary
• Ongoing performance monitoring and tuning is recommended to ensure that the configuration is optimal for the particular workload characteristics

Performance Overview – Tuning Methodology

[Figure: the CPU, Memory, I/O and Network subsystems, one of which is the predominant bottleneck]

• Understand the external view of system performance
  The external view of system performance is the observable event that causes someone to say the system is performing poorly: typically (1) end-user response time, (2) application (or task) response time, or (3) throughput. System metrics should not be used to judge improvement.
• Performance only improves when the predominant bottleneck is fixed
  Fixing a secondary bottleneck will not improve performance and typically results in further overloading the already overloaded predominant bottleneck.
• Monitor performance after a change: tuning is an iterative process
  Monitoring is required after making a change for two reasons: (1) fixing the predominant bottleneck typically uncovers another bottleneck, and (2) not all changes yield positive results. If possible, have a “repeatable” test so that each change can be accurately evaluated.

• End-user response time is the elapsed time between when a user submits a request and receives a response.
• Application response time is the elapsed time required for one or more jobs to complete. Historically, these jobs have been called batch jobs.
• Throughput is the amount of work that can be accomplished per unit time. This metric is typically expressed in transactions per minute.

Iterative Tuning Process

• Stress the system (i.e., tune at peak workload)
• Monitor sub-systems
• Identify the predominant bottleneck
• Tune the bottleneck
• Repeat

Performance Monitoring and Tuning Tools

Status commands
  – CPU: vmstat, topas, iostat, ps, mpstat, lparstat, sar, time/timex, emstat/alstat
  – Memory: vmstat, topas, ps, lsps, ipcs
  – I/O subsystem: vmstat, topas, iostat, lvmstat, lsps, lsattr/lsdev, lspv/lsvg/lslv
  – Network: netstat, topas, atmstat, entstat, tokstat, fddistat, nfsstat, ifconfig
  – Processes & threads: ps, pstat, topas, emstat/alstat

Monitor commands
  – CPU: netpmon
  – Memory: svmon, netpmon, filemon
  – I/O subsystem: fileplace, filemon
  – Network: netpmon, tcpdump
  – Processes & threads: svmon, truss, kdb, dbx, gprof, fuser, prof

Trace level commands
  – CPU: tprof, curt, splat, trace, trcrpt
  – Memory: trace, trcrpt
  – I/O subsystem: trace, trcrpt
  – Network: iptrace, ipreport, trace, trcrpt
  – Processes & threads: truss, pprof, curt, splat, trace, trcrpt

Tuning tools
  – CPU: schedo, fdpr, bindprocessor, bindintcpu, nice/renice, setpri
  – Memory: vmo, rmss, fdpr, chps/mkps
  – I/O subsystem: ioo, lvmo, chdev, migratepv, chlv, reorgvg
  – Network: no, nfso, chdev, ifconfig


AIX Memory Management Overview

• The role of the Virtual Memory Manager (VMM) is to let programs address more memory locations than are actually available in physical memory.
• On AIX this is accomplished using segments that are partitioned into fixed-size units called “pages”.
  – A segment is 256 MB
  – The default page size is 4 KB
  – POWER4+ and POWER5 can define large pages, which are 16 MB
• The 32-bit or 64-bit address translates into a 52-bit or 80-bit virtual address
  – 32-bit system: the upper 4 bits select a segment register that contains a 24-bit segment ID; the remaining 28 bits are the offset.
    • 24-bit segment ID + 28-bit offset = 52-bit VA
  – 64-bit system: the upper 36 bits select a segment register that contains a 52-bit segment ID; the remaining 28 bits are the offset.
    • 52-bit segment ID + 28-bit offset = 80-bit VA
• The VMM maintains a list of free frames that can be used to hold pages that need to be brought into memory.
  – The VMM replenishes the free list by removing some of the current pages from real memory (i.e., stealing memory).
  – The process of moving data between memory and disk is called “paging”.
• The VMM uses a page replacement algorithm (implemented in the lrud kernel threads) to select pages that will be removed from memory.

Virtual Memory Space – 64 Bits

[Figure: a 64-bit address is split into 36 bits that select a segment register and a 28-bit offset within the segment. Segment IDs start at 0; the segments shown include the Kernel Segment, Page Space Disk Map and Kernel Heap.]

• Each segment register contains a 52-bit segment ID
• 52-bit segment ID + 28-bit offset = 80-bit virtual address
• Virtual memory: 1 trillion terabytes, or 1 yottabyte
• Each segment is 256 MB: the 28-bit offset accesses a specific location in the segment (2^28 = 256M)
• A segment is divided into 4096-byte chunks called pages; each segment can have a maximum of 65536 pages
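The address arithmetic above can be checked with a few lines of Python. This is only a worked calculation; the constant and function names are illustrative and are not AIX interfaces.

```python
# Worked check of the segment/page arithmetic described on the slides.
# All names are illustrative; none of them are AIX APIs.
SEGMENT_OFFSET_BITS = 28   # offset within a segment
PAGE_SIZE_BYTES = 4096     # default 4 KB page


def virtual_address_bits(segment_id_bits: int) -> int:
    """Virtual address width = segment ID bits + 28-bit offset."""
    return segment_id_bits + SEGMENT_OFFSET_BITS


def register_select_bits(address_bits: int) -> int:
    """Address bits left over to select a segment register."""
    return address_bits - SEGMENT_OFFSET_BITS


segment_size_bytes = 2 ** SEGMENT_OFFSET_BITS
pages_per_segment = segment_size_bytes // PAGE_SIZE_BYTES

if __name__ == "__main__":
    print("segment size (MB):", segment_size_bytes // 2 ** 20)
    print("4 KB pages per segment:", pages_per_segment)
    print("32-bit: select bits", register_select_bits(32),
          "-> VA bits", virtual_address_bits(24))
    print("64-bit: select bits", register_select_bits(64),
          "-> VA bits", virtual_address_bits(52))
```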

Memory Tuning Overview

Memory tunables by area:

• Virtual Memory (General): minfree, maxfree, lru_file_repage, lru_poll_interval
• Large Pages (Pinned Memory): v_pinshm, lgpg_regions, lgpg_size
• JFS: maxperm, strict_maxperm
• Enhanced JFS (JFS2): maxclient, strict_maxclient

NAME               CUR   DEF   BOOT  MIN  MAX    UNIT          TYPE
--------------------------------------------------------------------------------
lru_file_repage    1     1     1     0    1      boolean       D
lru_poll_interval  0     0     0     0    60000  milliseconds  D
maxclient%         80    80    80    1    100    % memory      D
maxfree            1088  1088  1088  8    200K   4KB pages     D
maxperm%           80    80    80    1    100    % memory      D
minfree            960   960   960   8    200K   4KB pages     D
strict_maxclient   1     1     1     0    1      boolean       D
strict_maxperm     0     0     0     0    1      boolean       D
minperm%           20    20    20    1    100    % memory      D

vmo -p -o <parameter name>=<new value>

The -p flag updates /etc/tunables/nextboot

Virtual Memory Manager (VMM) Tuning

• The AIX “vmo” command displays and/or updates several parameters that influence the way AIX manages physical memory
  – The “-a” option displays current parameter settings
    • vmo -a
  – The “-o” option is used to change parameter values
    • vmo -o minfree=1440
  – The “-p” option is used to make changes persist across a reboot
    • vmo -p -o minfree=1440

On AIX 5.3, a number of the default “vmo” settings are not optimized for database workloads and should be modified for Oracle environments

Kernel Parameter Tuning – AIX 6.1

• AIX 6.1 is configured by default to be ‘correct’ for most workloads.
• Many tunables are classified as ‘Restricted’:
  – Only change if AIX Support says so
  – Parameters will not be displayed unless the ‘-F’ option is used for commands like vmo, no, ioo, etc.
• When migrating from AIX 5.3 to 6.1, parameter override settings in AIX 5.3 will be transferred to the AIX 6.1 environment

General Memory Tuning

[Chart: Memory Use 9/20/2007. Process% and FScache% as a percentage of memory (0 to 100), sampled from 08:02 to 10:14]

• Two primary categories of memory pages: computational and file system
• AIX will always try to utilize all of the physical memory available (subject to vmo parameter settings)
  – What is not required to support current computational page demand will tend to be used for filesystem cache
  – Raw devices and filesystems mounted (or individual files opened) in DIO/CIO mode do not use filesystem cache

VMM Tuning

Definitions:
• LRUD = VMM page stealing process (LRU daemon), 1 per memory pool
• numperm, numclient = used fs buffer cache pages, seen in ‘vmstat -v’
• minperm = target minimum number of pages for fs buffer cache
• maxperm, maxclient = target maximum number of pages for fs buffer cache

Parameters:
• MINPERM% = target min % real memory for fs buffer cache
• MAXPERM%, MAXCLIENT% = target max % real memory for fs buffer cache
• MINFREE = target minimum number of free memory pages
• MAXFREE = target maximum number of free memory pages

When does LRUD start?
• When total free pages (in a given memory pool) < MINFREE
• When (maxclient pages - numclient) < MINFREE

When does LRUD stop?
• When total free pages > MAXFREE
• When (maxclient pages - numclient) > MAXFREE
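The start/stop rules above form a hysteresis band between MINFREE and MAXFREE. The sketch below models that band for a single "headroom" value, which can be either the free page count of a memory pool or (maxclient pages - numclient). It is a simplified illustration, not the kernel's implementation, and the function name is hypothetical.

```python
def lrud_next_state(running: bool, headroom_pages: int,
                    minfree: int, maxfree: int) -> bool:
    """Return whether page stealing is active after observing headroom.

    headroom_pages is either the free pages in a memory pool or
    (maxclient pages - numclient); both follow the same rule on the slide.
    """
    if not running and headroom_pages < minfree:
        return True    # dropped below MINFREE: start stealing
    if running and headroom_pages > maxfree:
        return False   # recovered above MAXFREE: stop stealing
    return running     # inside the band: keep the current state


if __name__ == "__main__":
    state = False
    for free in (1200, 950, 1000, 1080, 1100):
        state = lrud_next_state(state, free, minfree=960, maxfree=1088)
        print(free, "stealing" if state else "idle")
```

Between the two thresholds the state does not change, which is why MAXFREE must be set higher than MINFREE.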

VMM Tuning starting points (AIX 5.2 ML4 or later)

• LRU_FILE_REPAGE=0 (default for 5.3 and 6.1)
  Tells LRUD to page out file pages (filesystem buffer cache) rather than computational pages when numperm > minperm
• LRU_POLL_INTERVAL=10 (default for 5.3 and 6.1)
  Indicates the time period (in milliseconds) after which LRUD pauses and interrupts can be serviced. A value of “0” means no preemption.
• MINPERM%=3 (default for 6.1)
• MAXPERM%, MAXCLIENT%=90* (default for 6.1)
• STRICT_MAXPERM=0* (default for 5.3 and 6.1)
• STRICT_MAXCLIENT=1 (default for 5.3 and 6.1)

* In AIX 5.2 environments with large physical memory, set MAXPERM%, MAXCLIENT% = (2400 / Phys Memory (GB)) and STRICT_MAXPERM=1
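As a worked example of the AIX 5.2 footnote, the sketch below evaluates 2400 / physical memory (GB). The function name is hypothetical, and the result is left unrounded because the slide does not say how to round it.

```python
def maxperm_pct_aix52(phys_mem_gb: float) -> float:
    """MAXPERM% / MAXCLIENT% starting point for large-memory AIX 5.2 systems."""
    if phys_mem_gb <= 0:
        raise ValueError("physical memory must be positive")
    return 2400 / phys_mem_gb


if __name__ == "__main__":
    for gb in (64, 96, 128):
        print(gb, "GB ->", maxperm_pct_aix52(gb), "%")
```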

Virtual Memory Management (VMM) Thresholds

[Chart: physical memory (0% to 100%) over time, showing numperm%, comp% and Free% against the maxperm%, minperm%, maxfree and minfree thresholds]

• Start stealing pages when free memory is below minfree
• Stop stealing pages when free memory is above maxfree
• When numperm% > maxperm%, steal only file system pages
• When minperm% < numperm% < maxperm%, steal file system or computational pages, depending on repage rate
• When numperm% < minperm%, steal both file system and computational pages
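The three numperm% regions above can be written as a small decision function. This is a sketch of the rules as stated on the slide; the function name and return strings are made up for illustration, and the behaviour exactly at the boundaries is an assumption because the slide only uses strict inequalities.

```python
def steal_candidates(numperm_pct: float, minperm_pct: float,
                     maxperm_pct: float) -> str:
    """Which page types LRUD may steal for a given numperm%."""
    if numperm_pct > maxperm_pct:
        return "file system pages only"
    if numperm_pct < minperm_pct:
        return "file system and computational pages"
    # Between the thresholds (boundaries included here by assumption).
    return "file system or computational pages, depending on repage rate"


if __name__ == "__main__":
    for numperm in (95, 40, 2):
        print(numperm, "->", steal_candidates(numperm, 3, 90))
```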

VMM Page Stealing Thresholds

• Minfree/maxfree values are per memory pool in AIX 5.3 and 6.1
  – Total system minfree = minfree * # of memory pools
  – Total system maxfree = maxfree * # of memory pools
• Most workloads do not require minfree/maxfree changes in AIX 5.3 or 6.1
• minfree
  – AIX 5.3/6.1: minfree = max(960, 120 x # logical CPUs / # mem pools)
  – AIX 5.2: minfree = 120 x # logical CPUs
  – Consider increasing if the vmstat “fre” column frequently approaches zero or if “vmstat -s” shows significant “free frame waits”
• maxfree
  – AIX 5.3/6.1: maxfree = max(1088, minfree + (MAX(maxpgahead, j2_maxPageReadAhead) * # logical CPUs) / # mem pools)
  – AIX 5.2: maxfree = minfree + (MAX(maxpgahead, j2_maxPageReadAhead) * # logical CPUs)

Example:
• For a 6-way LPAR with SMT enabled (12 logical CPUs), maxpgahead=8 and j2_maxPageReadAhead=8:
  – minfree = 1440 = 120 x 6 x 2 (system-wide; divided across 4 memory pools this is 360 per pool, below the 960 floor)
  – maxfree = 1536 = 1440 + (max(8,8) x 6 x 2)
• vmo -p -o minfree=1440 -o maxfree=1536
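The formulas above can be evaluated with a short sketch. The function names are hypothetical. Two assumptions are made: integer division is used, and in the 5.3/6.1 maxfree formula only the read-ahead term is divided by the number of memory pools, which is a literal reading of the parentheses on the slide.

```python
def minfree_aix52(logical_cpus: int) -> int:
    return 120 * logical_cpus


def maxfree_aix52(minfree: int, maxpgahead: int,
                  j2_max_page_read_ahead: int, logical_cpus: int) -> int:
    return minfree + max(maxpgahead, j2_max_page_read_ahead) * logical_cpus


def minfree_aix53(logical_cpus: int, mem_pools: int) -> int:
    # Per memory pool, with a floor of 960.
    return max(960, 120 * logical_cpus // mem_pools)


def maxfree_aix53(minfree: int, maxpgahead: int, j2_max_page_read_ahead: int,
                  logical_cpus: int, mem_pools: int) -> int:
    # Per memory pool, with a floor of 1088.
    read_ahead = max(maxpgahead, j2_max_page_read_ahead) * logical_cpus
    return max(1088, minfree + read_ahead // mem_pools)


if __name__ == "__main__":
    cpus = 6 * 2  # 6-way LPAR with SMT enabled
    mf52 = minfree_aix52(cpus)
    print("AIX 5.2:", mf52, maxfree_aix52(mf52, 8, 8, cpus))
    mf53 = minfree_aix53(cpus, mem_pools=4)
    print("AIX 5.3/6.1, 4 pools:", mf53, maxfree_aix53(mf53, 8, 8, cpus, 4))
```

For the 12 logical CPU example, the AIX 5.2 formulas give 1440 and 1536, the values passed to vmo above. The 5.3/6.1 formulas with 4 memory pools fall back to the 960 and 1088 floors, consistent with the statement that most workloads need no change.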

Page Steal Method

• Historically, AIX maintained a single LRU list which contains both computational and filesystem pages.
  – In environments with lots of computational pages that you want to keep in memory, LRUD may have to spend a lot of time scanning the LRU list to find an eligible filesystem page to steal
• AIX 6.1 introduced the ability to maintain separate LRU lists for computational vs. filesystem pages.
  – Also backported to AIX 5.3
• New page_steal_method parameter
  – Enabled (1) by default in 6.1, disabled (0) by default in 5.3
  – Requires a reboot to change
  – Recommended for Oracle DB environments (both AIX 5.3 and 6.1)

Understanding Memory Pools

[Figure: p590 / p595 MCM Architecture]

• Memory cards are associated with every Multi Chip Module (MCM), Dual Core Module (DCM) or Quad Core Module (QCM) in the server
  – The Hypervisor assigns physical CPUs to a dedicated CPU LPAR (or shared processor pool) from one or more MCMs, DCMs or QCMs
  – For a given LPAR, there will normally be at least 1 memory pool for each MCM, DCM or QCM that has contributed processors to that LPAR or shared processor pool
• By default, memory for a process is allocated from memory associated with the processor that caused the page fault.
• Memory pool configuration is influenced by the vmo parameter “memory_affinity”
  – memory_affinity=1 means configure memory pools based on physical hardware configuration (DEFAULT)
  – memory_affinity=0 means configure roughly uniform memory pools from any physical location
• The number of memory pools can be seen with ‘vmstat -v | grep pools’
• Pool size can only be seen using kdb
• LRUD operates per memory pool

Memory Affinity

• Not generally a benefit unless processes are bound to a particular processor
• It can exacerbate any page replacement algorithm issues (e.g. system paging or excessive LRUD scanning activity) if memory pool sizes are unbalanced
• If there are paging or LRUD related issues, try basic vmo parameter or Oracle SGA/PGA tuning first
• If issues remain, use ‘kdb’ to check if memory pool sizes are unbalanced (NB_PAGES = pages in pool, NUMFRB = free pages; values are hexadecimal):

KDB(1)> memp *
                 VMP MEMP NB_PAGES  FRAMESETS  NUMFRB
memp_frs+010000  00  000  00B1F9F4  000 001    00B073DE
memp_frs+010780  00  003  00001BBC  006 007    00000000
memp_frs+010280  01  001  00221C80  002 003    0021C3CB
memp_frs+010500  02  002  00221C80  004 005    0021CDDE

• If the pool sizes are not balanced, consider disabling Memory Affinity:
  # vmo -r -o memory_affinity=0 (requires a reboot)
  – IY73792 required for 5300-01 and 5300-02
  – Code changes in 5.3 TL5/TL6 solved most memory affinity issues
  – memory_affinity is also a “Restricted” tunable in AIX 6.1
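To judge whether the pools in the kdb listing are balanced, the hexadecimal NB_PAGES values can be converted to decimal page counts. The sketch below hard-codes the four rows shown above; the dictionary layout and names are illustrative only, and the ratio is just one possible way to express imbalance.

```python
# NB_PAGES and NUMFRB per memory pool (MEMP), copied from the kdb output.
pools = {
    "000": ("00B1F9F4", "00B073DE"),
    "003": ("00001BBC", "00000000"),
    "001": ("00221C80", "0021C3CB"),
    "002": ("00221C80", "0021CDDE"),
}


def to_pages(hex_field: str) -> int:
    """kdb prints page counts in hexadecimal."""
    return int(hex_field, 16)


sizes = {memp: to_pages(nb_pages) for memp, (nb_pages, _) in pools.items()}
free = {memp: to_pages(numfrb) for memp, (_, numfrb) in pools.items()}

if __name__ == "__main__":
    largest = max(sizes.values())
    for memp in sorted(sizes):
        print(f"pool {memp}: {sizes[memp]:>10} pages, "
              f"{free[memp]:>10} free, "
              f"{sizes[memp] / largest:6.1%} of largest pool")
```

In this listing pool 000 holds about five times as many pages as pools 001 and 002, and pool 003 is tiny with no free frames, which is the kind of imbalance the slide describes.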

AIX 5.3/6.1 Multiple Page Size Support

• AIX 5.3 5300-04 introduces two new page sizes, 64K and 16G, alongside the existing 16M large pages:
  – 64K
  – 16M (large pages) (available since POWER4+)
  – 16G (huge pages)
    • Requires p5+ hardware
    • Requires p5 System Release 240, Service Level 202 microcode
    • 16MB support requires Version 5 Release 2 of the Hardware Management Console (HMC) machine code
• User/application must request the preferred page size
  – The 64K page size is very promising, since 64K pages do not need to be configured/reserved in advance or pinned
    • export LDR_CNTRL=DATAPSIZE=64K@TEXTPSIZE=64K@STACKPSIZE=64K oracle* to use the 64K page size for stack, data & text
  – Will require Oracle to explicitly request the page size (10.2.0.4 & up)
  – If the preferred size is not available, the largest available smaller size will be used
    • Current Oracle versions will end up using 64KB pages even if the SGA is not pinned
• Refer: http://www-03.ibm.com/systems/resources/systems_p_os_aix_whitepapers_multiple_page.pdf

Large Page Support (optional)

Pinning shared memory
• AIX parameters
  – vmo -p -o v_pinshm=1
  – Leave maxpin% at the default of 80% unless the SGA exceeds 77% of real memory
    • vmo -p -o maxpin%=[(SGA size * 100 / total mem) + 3]
• Oracle parameters
  – LOCK_SGA = TRUE

Enabling Large Page Support
• vmo -r -o lgpg_size=16777216 -o lgpg_regions=(SGA size / 16 MB)

Allowing Oracle to use Large Pages
• chuser capabilities=CAP_BYPASS_RAC_VMM,CAP_PROPAGATE oracle

Using Monitoring Tools
• svmon -G
• svmon -P

Reference: Oracle MetaLink note 372157.1

Note: Pinning the SGA is not recommended as long as the VMM, SGA and PGA have been configured properly.

Determining SGA size

SGA Memory Summary for DB: test01 Instance: test01 Snaps: 1046 -1047

SGA regions                    Size in Bytes
------------------------------ ----------------
Database Buffers                 16,928,210,944
Fixed Size                              768,448
Redo Buffers                          2,371,584
Variable Size                     1,241,513,984
                               ----------------
sum                              18,172,864,960

lgpg_regions = 18,172,864,960 / 16,777,216 = 1084 (rounded up)
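The same calculation, including the round-up to a whole number of 16 MB regions, can be sketched as follows. The byte counts are taken from the SGA summary above; the function and variable names are illustrative.

```python
LARGE_PAGE_BYTES = 16 * 1024 * 1024  # lgpg_size = 16777216

# SGA components in bytes, from the summary above.
sga_components = {
    "Database Buffers": 16_928_210_944,
    "Fixed Size": 768_448,
    "Redo Buffers": 2_371_584,
    "Variable Size": 1_241_513_984,
}


def lgpg_regions(sga_bytes: int, page_bytes: int = LARGE_PAGE_BYTES) -> int:
    """Number of large pages needed to hold the SGA, rounded up."""
    return -(-sga_bytes // page_bytes)  # ceiling division on integers


sga_total = sum(sga_components.values())

if __name__ == "__main__":
    print("SGA bytes:", sga_total)
    print("lgpg_regions:", lgpg_regions(sga_total))
```

Integer ceiling division avoids floating-point rounding when the SGA is close to an exact multiple of the page size.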

AIX Paging Space

Allocate paging space:
• Configure Server/LPAR with enough physical memory to satisfy memory requirements
• With AIX demand paging, paging space does not have to be large
• Provides a safety net to prevent system crashes when memory is overcommitted
• Generally, keep on internal drives or high-performing SAN storage

Monitor paging activity:
• vmstat -s
• sar -r
• nmon

Resolve paging issues:
• Reduce file system cache size (MAXPERM, MAXCLIENT)
• Reduce Oracle SGA or PGA (9i or later) size
• Add physical memory

Do not overcommit real memory!

Tuning and Improving System Performance

• Adjust the VMM tuning parameters
  – Key parameters listed on word document
• Implement VMM related mount options
  – DIO / CIO
  – Release-behind on read and/or write
• Reduce application memory requirements
• Memory model
  – %Computational < 70%: Large Memory Model. Goal is to adjust tuning parameters to prevent paging
    • Multiple memory pools
    • Page space smaller than memory
    • Must tune VMM key parameters
  – %Computational > 70%: Small Memory Model. Goal is to make paging as efficient as possible
    • Add multiple page spaces on different spindles
    • Make all page spaces the same size to ensure round-robin scheduling
    • PS = 1.5 x computational requirements
    • Turn off DEFPS
    • Memory Load Control
• Add additional memory


CPU Considerations

Oracle parameters based on the # of CPUs
  – DB_WRITER_PROCESSES
  – Degree of parallelism
    • user level
    • table level
    • query level
    • PARALLEL_MAX_SERVERS or PARALLEL_AUTOMATIC_TUNING (CPU_COUNT * PARALLEL_THREADS_PER_CPU)
  – CPU_COUNT
  – FAST_START_PARALLEL_ROLLBACK: should be using UNDO instead
  – CBO: execution plan may be affected; check explain plan

lparstat command

# lparstat -i
Node Name                          : erpcc8
Partition Name                     : -
Partition Number                   : -
Type                               : Dedicated
Mode                               : Capped
Entitled Capacity                  : 4.00
Partition Group-ID                 : -
Shared Pool ID                     : -
Online Virtual CPUs                : 4
Maximum Virtual CPUs               : 4
Minimum Virtual CPUs               : 1
Online Memory                      : 8192 MB
Maximum Memory                     : 9216 MB
Minimum Memory                     : 128 MB
Variable Capacity Weight           : -
Minimum Capacity                   : 1.00
Maximum Capacity                   : 4.00
Capacity Increment                 : 1.00
Maximum Physical CPUs in system    : 4
Active Physical CPUs in system     : 4
Active CPUs in Pool                : -
Unallocated Capacity               : -
Physical CPU Percentage            : 100.00%
Unallocated Weight                 : -

CPU Considerations

• Use SMT with AIX 5.3/POWER5 (or later) environments
• Micro-partitioning guidelines
  – Virtual CPUs <= physical processors in shared pool
  – CAPPED
    • Virtual CPUs should be the nearest integer >= capping limit
  – UNCAPPED
    • Virtual CPUs should be set to the max peak demand requirement
    • Entitlement >= Virtual CPUs / 3
• DLPAR considerations
  – Oracle 9i
    • Oracle CPU_COUNT does not recognize a change in # CPUs
    • AIX scheduler can still use the added CPUs
  – Oracle 10g
    • Oracle CPU_COUNT recognizes a change in # CPUs
    • Max CPU_COUNT limited to 3x CPU_COUNT at instance startup
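The micro-partitioning and DLPAR guidelines above are simple enough to express as checks. The sketch below is one possible encoding; all function names are hypothetical and the messages are illustrative.

```python
import math


def capped_virtual_cpus(capping_limit: float) -> int:
    """Capped LPAR: nearest integer >= the capping limit."""
    return math.ceil(capping_limit)


def min_entitlement(virtual_cpus: int) -> float:
    """Uncapped LPAR: entitlement should be at least virtual CPUs / 3."""
    return virtual_cpus / 3


def max_cpu_count_10g(cpu_count_at_startup: int) -> int:
    """Oracle 10g: CPU_COUNT can grow to 3x its value at instance startup."""
    return 3 * cpu_count_at_startup


def check_uncapped(virtual_cpus: int, entitlement: float,
                   pool_processors: int) -> list:
    """Return the guidelines an uncapped configuration violates."""
    problems = []
    if virtual_cpus > pool_processors:
        problems.append("virtual CPUs exceed physical processors in pool")
    if entitlement < min_entitlement(virtual_cpus):
        problems.append("entitlement below virtual CPUs / 3")
    return problems


if __name__ == "__main__":
    print(capped_virtual_cpus(2.5))
    print(check_uncapped(virtual_cpus=6, entitlement=1.5, pool_processors=4))
```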

CPU: Compatibility Matrix

[Table: DLPAR, SMT and Micro-Partition support (columns) for AIX 5.2, AIX 5.3, AIX 6.1, Oracle 9i, Oracle 10g and Oracle 11g (rows); cell values not shown]

Note: Oracle RAC 10.2.0.3 on VIOS 1.3.1.1 & AIX 5.3 TL07 and higher are certified


The AIX IO stack

Layers, from top to bottom, with notes on each:

• Application
  – Application memory area caches data to avoid IO
• Logical file system (JFS, JFS2, NFS, Other); raw LVs and raw disks enter the stack below the file system layers
  – NFS caches file attributes
  – NFS has a cached filesystem for NFS clients
  – JFS and JFS2 cache use extra system RAM
  – JFS uses persistent pages for cache
  – JFS2 uses client pages for cache
• VMM
• LVM (LVM device drivers)
• Multi-path IO driver (optional)
• Disk device drivers
• Adapter device drivers
  – Queues exist for both adapters and disks
  – Adapter device drivers use DMA for IO
• Disk subsystem (optional)
  – Disk subsystems have read and write cache (write cache; read cache or memory area used for IO)
• Disk
  – Disks have memory to store commands/data

IOs can be coalesced (good) or split up (bad) as they go through the IO stack

AIX Filesystems Mount options

• Journaled File System (JFS)
  Better for lots of small file creates & deletes
  – Buffer caching (default) provides sequential read-ahead, cached writes, etc.
  – Direct I/O (DIO) mount/open option: no caching on reads
• Enhanced JFS (JFS2)
  Better for large files/filesystems
  – Buffer caching (default) provides sequential read-ahead, cached writes, etc.
  – Direct I/O (DIO) mount/open option: no caching on reads
  – Concurrent I/O (CIO) mount/open option: DIO, with write serialization disabled
    • Use for Oracle .dbf, control files and online redo logs only!
• GPFS
  Clustered filesystem: the IBM filesystem for RAC
  – Non-cached, non-blocking I/Os (similar to JFS2 CIO) for all Oracle files

GPFS and JFS2 with CIO offer performance similar to raw devices

AIX Filesystems Mount options (Cont’d)

• Direct IO (DIO): introduced in AIX 4.3
  – Data is transferred directly from the disk to the application buffer, bypassing the file buffer cache and hence avoiding double caching (filesystem cache + Oracle SGA).
  – Emulates a raw-device implementation.
  – To mount a filesystem in DIO:
    $ mount -o dio /data
• Concurrent IO (CIO): introduced with JFS2 in AIX 5.2 ML1
  – Implicit use of DIO.
  – No inode locking: multiple threads can perform reads and writes on the same file at the same time.
  – Performance achieved using CIO is comparable to raw devices.
  – To mount a filesystem in CIO:
    $ mount -o cio /data

[Chart: bench throughput over run duration; higher tps indicates better performance]

CIO/DIO Implementation Advice

Oracle bin and shared lib.
  – Standard mount options: mount -o rw (cached by AIX)
  – Optimized mount options: mount -o rw (cached by AIX)

Oracle Datafiles
  – Standard mount options: mount -o rw (cached by Oracle and by AIX)
  – Optimized mount options: mount -o cio *(1) (cached by Oracle)

Oracle Redolog
  – Standard mount options: mount -o rw (cached by Oracle and by AIX)
  – Optimized mount options: mount -o cio (jfs2 + agblksize=512) (cached by Oracle)

Oracle Archivelog
  – Standard mount options: mount -o rw (cached by AIX)
  – Optimized mount options: mount -o rbrw (Use JFS2 write-behind … but are not kept in AIX Cache.)

Oracle Control files
  – Standard mount options: mount -o rw (cached by AIX)
  – Optimized mount options: mount -o rw (cached by AIX)

*(1): to avoid demoted IO: jfs2 agblksize = Oracle DB block size / n

CIO Demotion and Filesystem Block Size

Data Base Files (DBF)
• If db_block_size = 2048, set agblksize=2048
• If db_block_size >= 4096, set agblksize=4096

Online redolog files & control files
• Set agblksize=512 and use CIO or DIO

Mount filesystems with the “noatime” option
• AIX/Linux records when files were created and last modified as well as when they were last accessed. Updating the last-access time may lead to significant I/O performance problems on frequently accessed, frequently changing files such as the contents of the /var/spool directory.
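The agblksize rules on this slide can be captured in a small lookup. The function name and the file-kind labels are hypothetical; block sizes the slide does not cover raise an error rather than guessing.

```python
def agblksize_for(file_kind: str, db_block_size: int = 0) -> int:
    """JFS2 agblksize suggested by the slide for a given kind of Oracle file.

    file_kind: "datafile", "redo" or "control" (labels are illustrative).
    db_block_size: Oracle db_block_size in bytes, needed for datafiles.
    """
    if file_kind in ("redo", "control"):
        return 512
    if file_kind == "datafile":
        if db_block_size >= 4096:
            return 4096
        if db_block_size == 2048:
            return 2048
        raise ValueError("no rule on the slide for this db_block_size")
    raise ValueError(f"unknown file kind: {file_kind!r}")


if __name__ == "__main__":
    print(agblksize_for("datafile", 8192))
    print(agblksize_for("redo"))
```

Keeping agblksize no larger than the size of the I/Os Oracle issues is what avoids CIO demotion, which is why redo logs and control files get their own 512-byte filesystems.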

I/O Tuning (ioo)

READ-AHEAD (only applicable to JFS/JFS2 with caching enabled)

• MINPGAHEAD (JFS) or j2_minPageReadAhead (JFS2)
  – Default: 2
  – Starting value: MAX(2, DB_BLOCK_SIZE / 4096)
• MAXPGAHEAD (JFS) or j2_maxPageReadAhead (JFS2)
  – Default: 8 (JFS), 128 (JFS2)
  – Set equal to (or a multiple of) the size of the largest Oracle I/O request
    • DB_BLOCK_SIZE * DB_FILE_MULTIBLOCK_READ_COUNT

Number of buffer structures per filesystem:

• NUMFSBUFS
  – Default: 196, starting value: 568
• j2_nBufferPerPagerDevice (replaced by j2_dynamicBufferPreallocation)
  – Default: 512, starting value: 2048

Monitor with “vmstat -v”
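A sketch of the read-ahead starting values follows. It assumes the read-ahead tunables are expressed in 4 KB pages, so the largest Oracle I/O request (in bytes) is divided by 4096; that unit conversion is an assumption, as are the function names.

```python
PAGE_BYTES = 4096  # read-ahead tunables are assumed to count 4 KB pages


def min_read_ahead(db_block_size: int) -> int:
    """Starting value for minpgahead / j2_minPageReadAhead."""
    return max(2, db_block_size // PAGE_BYTES)


def max_read_ahead(db_block_size: int, multiblock_read_count: int) -> int:
    """Largest Oracle I/O request expressed in 4 KB pages."""
    largest_io_bytes = db_block_size * multiblock_read_count
    return max(1, largest_io_bytes // PAGE_BYTES)


if __name__ == "__main__":
    # Example: 8 KB blocks with DB_FILE_MULTIBLOCK_READ_COUNT = 16
    print(min_read_ahead(8192), max_read_ahead(8192, 16))
```

With 8 KB blocks and a multiblock read count of 16, the largest request is 128 KB, or 32 pages; the slide allows this value or a multiple of it.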

Data Layout for Optimal I/O Performance

Stripe and mirror everything (SAME) approach:
• Goal is to balance I/O activity across all disks, loops, adapters, etc.
• Avoid/eliminate I/O hotspots
• Manual file-by-file data placement is time consuming, resource intensive and iterative

Use RAID-5 or RAID-10 to create striped LUNs (hdisks)

Create AIX Volume Group(s) (VG) w/ LUNs from multiple arrays, striping on the front end as well for maximum distribution
• Physical Partition Spreading (mklv -e x), or
• Large-grained LVM striping (>= 1 MB stripe size)

http://www-1.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP100319

Data Layout cont’d…

Stripe using Logical Volume (LV) or Physical Partition (PP) striping

• LV striping
  – Oracle recommends a stripe width that is a multiple of
    • db_block_size * db_file_multiblock_read_count
    • Usually around 1 MB
  – Valid LV stripe sizes:
    • AIX 5.2: 4k, 8k, 16k, 32k, 64k, 128k, 256k, 512k, 1 MB
    • AIX 5.3: AIX 5.2 stripe sizes + 2M, 4M, 16M, 32M, 64M, 128M
  – Use AIX Logical Volume 0 offset (9i Release 2 or later)
    • Use Scalable Volume Groups (VGs), or use “mklv -T O” with Big VGs
    • Requires AIX APAR IY36656 and Oracle patch (bug 2620053)
• PP striping
  – Use minimum Physical Partition (PP) size (mklv -t, -s parms)
    • Spread AIX Logical Volume (LV) PPs across multiple hdisks in VG (mklv -e x)

Tuning and Improving System Performance

• Adjust the key ioo tuning parameters
• Adjust device specific tuning parameters
• Other I/O tuning options
  – DIO / CIO
  – Release-behind on read and/or write
  – IO pacing
  – Write behind
• Improve the data layout
• Add additional hardware resources

Other I/O Stack Tuning Options (PBUFS)

• Consider these tunables when vmstat -v shows increasing “pending disk I/Os blocked with no pbuf” values

lvmo:
• max_vg_pbuf_count = maximum number of pbufs that may be used for the VG.
• pv_pbuf_count = the # of pbufs that are added when a PV is added to the VG.

ioo:
• pv_min_pbuf = the minimum # of pbufs per PV used by LVM

Other I/O Stack Tuning Options (Device Level)

lsattr/chdev:
• num_cmd_elems = maximum number of outstanding I/Os for an adapter.
• queue_depth = the maximum # of outstanding I/Os for an hdisk. Recommended/supported maximum is storage subsystem dependent.
• max_xfer_size = the maximum allowable I/O transfer size (default is 0x100000, or 1 MB). Maximum supported value is storage subsystem dependent. Increasing the value (to at least 0x200000) will also increase the DMA size from 16 MB to 256 MB.
• dyntrk = when set to yes (recommended), allows for immediate re-routing of I/O requests to an alternative path when a device ID (N_PORT_ID) change has been detected.
• fc_err_recov = when set to “fast_fail” (recommended), if the driver receives an RSCN notification from the switch, the driver will check to see if the device is still on the fabric and will flush back outstanding I/Os if the device is no longer found.

Asynchronous I/O for filesystem environments

AIX parameters
• minservers = minimum # of AIO server processes (system wide)
  – AIX 5.3 default = 1 (system wide), 6.1 default = 3 (per CPU)
• maxservers = maximum # of AIO server processes
  – AIX 5.3 default = 10 (per CPU), 6.1 default = 30 (per CPU)
• maxreqs = maximum # of concurrent AIO requests
  – AIX 5.3 default = 4096, 6.1 default = 65536
• “enable” at system restart (always enabled for 6.1)
• Typical 5.3 settings: minservers=100, maxservers=200, maxreqs=65536
  – Raw devices or ASM environments use fastpath AIO
    • The above parameters do not apply
    • Run lsattr -El aio0 and look for the value "fastpath", which should be enabled
  – CIO uses fastpath AIO in AIX 6.1
  – For CIO fastpath AIO in 5.3 TL5+, set fsfastpath=1
    • Not persistent across reboot

Oracle parameters
• disk_asynch_io = TRUE
• filesystemio_options = {ASYNCH | SETALL}
• db_writer_processes (leave at default)

IO : Asynchronous IO (AIO)

• Allows multiple requests to be sent without having to wait until the disk subsystem has completed the physical IO.

• Use of asynchronous IO is strongly advised whatever the type of file system and mount option implemented (JFS, JFS2, CIO, DIO).

• Posix vs Legacy

Since AIX 5L V5.3, two types of AIO are available: Legacy and Posix. For the moment, the Oracle code uses the Legacy AIO servers.

[Figure: AIO request flow (steps 1-4) between Application, aioQ, aioservers, and Disk]

IO : Asynchronous IO (AIO) fastpath

• FS with CIO/DIO and AIX 5.3 TL5+:

– Activate fsfast_path (comparable to fast_path but for FS + CIO/DIO)

– AIX 5L: add the following line to /etc/inittab: aioo:2:once:aioo -o fsfast_path=1

– AIX 6.1: ioo -p -o aio_fsfastpath=1 (default setting)

• Raw Devices / ASM:

– Check the AIO configuration with: lsattr -El aio0

– Enable asynchronous IO fast_path:

AIX 5L: chdev -a fastpath=enable -l aio0 (default since AIX 5.3)

AIX 6.1: ioo -p -o aio_fastpath=1 (default setting)

[Figure: with fast_path, IO flows from Application through the AIX Kernel to Disk (steps 1-3)]

With fast_path, IOs are queued directly from the application into the LVM layer without any “aioservers kproc” operation.

• Better performance compared to non-fast_path

• No need to tune the min and max aioservers

• No aioservers processes, so “ps -k | grep aio | wc -l” is not relevant; use “iostat -A” instead

Asynchronous I/O for filesystem environments…

Monitor Oracle usage:

• Watch alert log and *.trc files in BDUMP directory for warning message:

“Warning: lio_listio returned EAGAIN”

• If warning messages are found, increase maxreqs and/or maxservers

Monitor from AIX:

• “pstat -a | grep aios”

• Use “-A” option for NMON

• iostat -Aq (new in AIX 5.3)
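
The Oracle-side check can be scripted along these lines. This is a minimal sketch: the function name count_aio_eagain is hypothetical, and it assumes the alert log is named alert*.log and sits in the same directory as the *.trc files.

```shell
# Count occurrences of the AIO EAGAIN warning in the alert log and
# trace files of one directory.
# $1 = BDUMP directory (path supplied by the caller)
count_aio_eagain() {
    cat "$1"/alert*.log "$1"/*.trc 2>/dev/null |
        grep -c "lio_listio returned EAGAIN"
}

# Example call, with the directory path as a placeholder:
#   count_aio_eagain /path/to/bdump
```

A non-zero count is the cue to revisit maxreqs and/or maxservers as noted above.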

GPFS I/O Related Tunables

• Refer to Metalink note 302806.1

Async I/O:

• Oracle parameter filesystemio_options is ignored

• Set Oracle parameter disk_asynch_io=TRUE

• prefetchthreads = exactly what the name says

– Usually set prefetchthreads=64 (the default)

• worker1threads = GPFS asynch I/O

– Set worker1threads = 550 - prefetchthreads

• Set aio maxservers = (worker1threads / #cpus) + 10

Other settings:

• GPFS block size is configurable; most will use 512KB-1MB

• Pagepool: GPFS fs buffer cache, not used for RAC but may be for binaries. Default=64M

mmchconfig pagepool=100M
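
The two sizing formulas above can be worked through with a small helper. The function names are hypothetical, and rounding the division down to a whole number is an assumption, since the slide does not say how to round.

```shell
# worker1threads = 550 - prefetchthreads
# $1 = prefetchthreads
gpfs_worker1threads() {
    echo $(( 550 - $1 ))
}

# aio maxservers = (worker1threads / #cpus) + 10
# Integer division rounds down (an assumption; the slide does not specify).
# $1 = worker1threads, $2 = number of CPUs
gpfs_aio_maxservers() {
    echo $(( $1 / $2 + 10 ))
}
```

With the default prefetchthreads=64 this gives worker1threads=486, and on an 8-CPU system an aio maxservers value of 70.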

I/O Pacing

• I/O pacing parameters can be used to prevent large I/O streams from monopolizing CPUs

– System backups (mksysb)

– DB backups (RMAN, Netbackup)

– Software patch updates

• When Oracle Clusterware is used, use the AIX 6.1 defaults:

– chdev -l sys0 -a maxpout=8193 -a minpout=4096 (AIX 6.1 defaults)

– nfso -o nfs_iopace_pages=1024 (AIX 6.1 defaults)

– On Oracle Clusterware, set: crsctl set css diagwait 13 -force

• This delays the OPROCD reboot time to 10 secs from 0.5 secs during node eviction/reboot, just enough to write the log/trace files for future diagnosis. Metalink note 559365.1

ASM configurations

AIX parameters

– Async I/O needs to be enabled, but default values may be used

ASM instance parameters

– ASM_POWER_LIMIT=1

Makes ASM rebalancing a low-priority operation. May be changed dynamically. It is common to set this value to 0, then increase to a higher value during maintenance windows.

– PROCESSES = 25 + 15n, where n = # of instances using ASM

DB instance parameters

– disk_asynch_io=TRUE

– filesystemio_options=ASYNCH

– Increase Processes by 16

– Increase Large_Pool by 600k

– Increase Shared_Pool by [(1M per 100GB of usable space) + 2M]
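
The sizing rules above reduce to simple arithmetic, sketched below. The function names are hypothetical, and rounding partial 100 GB units up is an assumption; the slide gives only the "1M per 100GB" rule.

```shell
# ASM instance: PROCESSES = 25 + 15n
# $1 = n, the number of database instances using ASM
asm_processes() {
    echo $(( 25 + 15 * $1 ))
}

# DB instance: Shared_Pool increase in MB = (1M per 100GB usable) + 2M
# Partial 100 GB units are rounded up (an assumption).
# $1 = usable ASM space in GB
asm_shared_pool_increase_mb() {
    echo $(( ($1 + 99) / 100 + 2 ))
}
```

For example, three instances sharing ASM give PROCESSES=70 for the ASM instance, and 500 GB of usable space suggests growing Shared_Pool by 7 MB.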

Agenda

• AIX Configuration Best Practices for Oracle

– Memory

– CPU

– I/O

– Network

– Miscellaneous

Network Options (no) Parameters

– Set sb_max >= 1 MB (1048576)

– Set tcp_sendspace >= 262144

– Set tcp_recvspace >= 262144

– Set rfc1323=1

If use_isno=1, check to see if settings have been overridden at the network interface level:

$ no -a | grep isno
use_isno=1

$ lsattr -E -l en0 -H
attribute value description
rfc1323 N/A
tcp_nodelay N/A
tcp_sendspace N/A
tcp_recvspace N/A
tcp_mssdflt N/A

Additional Network (no) Parameters for RAC:

• Set udp_sendspace = db_block_size * db_file_multiblock_read_count (not less than 65536)

• Set udp_recvspace = 10 * udp_sendspace

– Must be < sb_max

• Increase if buffer overflows occur

• ipqmaxlen=512 for GPFS environments

• Use Jumbo Frames if supported at the switch layer

Examples:

• no -a | grep udp_sendspace

• no -p -o udp_sendspace=65536

• netstat -s | grep "socket buffer overflows"
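
The UDP sizing rules can be checked with a few lines of shell arithmetic. This is a sketch only; the function names are hypothetical and the input values in the note below are example figures, not recommendations.

```shell
# udp_sendspace = db_block_size * db_file_multiblock_read_count,
# but never less than 65536.
# $1 = db_block_size in bytes, $2 = db_file_multiblock_read_count
rac_udp_sendspace() {
    rac_size=$(( $1 * $2 ))
    if [ "$rac_size" -lt 65536 ]; then
        rac_size=65536
    fi
    echo "$rac_size"
}

# udp_recvspace = 10 * udp_sendspace
# $1 = udp_sendspace
rac_udp_recvspace() {
    echo $(( 10 * $1 ))
}

# Succeeds only when udp_recvspace is strictly below sb_max.
# $1 = udp_recvspace, $2 = sb_max
fits_sb_max() {
    [ "$1" -lt "$2" ]
}
```

With an 8192-byte block size and a multiblock read count of 16, udp_sendspace comes to 131072 and udp_recvspace to 1310720, which does not fit under an sb_max of 1048576, so sb_max would have to be raised first.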

Agenda

• AIX Configuration Best Practices for Oracle

– Memory

– I/O

– Network

– Miscellaneous

Miscellaneous parameters

• User Limits (smit chuser)

– Soft FILE size = -1 (Unlimited)

– Soft CPU time = -1 (Unlimited)

– Soft DATA segment = -1 (Unlimited)

– Soft STACK size = -1 (Unlimited)

– /etc/security/limits

• Maximum number of PROCESSES allowed per user (smit chgsys)

– maxuproc >= 2048

• Environment variables:

– AIXTHREAD_SCOPE=S

– LDR_CNTRL=DATAPSIZE=64K@TEXTPSIZE=64K@STACKPSIZE=64K oracle*

Q&A

The following are trademarks of the International Business Machines Corporation in the United States and/or other countries. For a complete list of IBM Trademarks, see www.ibm.com/legal/copytrade.shtml: AS/400, DBE, e-business logo, ESCO, eServer, FICON, IBM, IBM Logo, iSeries, MVS, OS/390, pSeries, RS/6000, S/30, VM/ESA, VSE/ESA, Websphere, xSeries, z/OS, zSeries, z/VM

The following are trademarks or registered trademarks of other companies

Lotus, Notes, and Domino are trademarks or registered trademarks of Lotus Development Corporation.

Java and all Java-related trademarks and logos are trademarks of Sun Microsystems, Inc., in the United States and other countries.

LINUX is a registered trademark of Linus Torvalds.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Microsoft, Windows and Windows NT are registered trademarks of Microsoft Corporation.

SET and Secure Electronic Transaction are trademarks owned by SET Secure Electronic Transaction LLC.

Intel is a registered trademark of Intel Corporation.

* All other products may be trademarks or registered trademarks of their respective companies.

NOTES:

Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here.

IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.

All customer examples cited or described in this presentation are presented as illustrations of the manner in which some customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics will vary depending on individual customer configurations and conditions.

This publication was produced in the United States. IBM may not offer the products, services or features discussed in this document in other countries, and the information may be subject to change without notice. Consult your local IBM business contact for information on the product or services available in your area.

All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.

Information about non-IBM products is obtained from the manufacturers of those products or their published announcements. IBM has not tested those products and cannot confirm the performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

Prices subject to change without notice. Contact your IBM representative or Business Partner for the most current pricing in your geography.

References in this document to IBM products or services do not imply that IBM intends to make them available in every country.

Any proposed use of claims in this presentation outside of the United States must be reviewed by local IBM country counsel prior to such use.

The information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.

Trademarks