
Disk IO Tuning in AIX 6.1


IBM Power Systems Technical University, October 18-22, 2010, Las Vegas, NV
IBM Systems Group

Disk IO Tuning in AIX 6.1
Session ID: PE23
Author: Dan Braden [email protected]
Presenter: Steve Nasypany
AIX Advanced Technical Skills

2010 IBM Corporation


Agenda
  • The importance of IO tuning
  • Disk basics and performance overview
  • AIX IO stack
  • Data layout
  • Characterizing application IO
  • Disk performance monitoring tools
  • Testing disk subsystem performance
  • Tuning


Why is disk IO tuning important?
  • Moore's law: processors double in price/performance every 18 months
  • Disk growth: disk densities are doubling every 12 months, and customers are doubling storage capacities every 12-18 months
  • Actuator and rotational speed are increasing relatively slowly
  • Network bandwidth is doubling every 6 months

  Approximate CPU cycle time:     0.0000000005 seconds
  Approximate memory access time: 0.000000270 seconds
  Approximate disk access time:   0.010000000 seconds

  • Memory access takes 540 CPU cycles
  • Disk access takes 20 million CPU cycles, or 37,037 memory accesses
  • System bottlenecks are being pushed to the disk
  • Disk subsystems are using cache to improve IO service times
  • Customers now spend more on storage than on servers
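The cycle counts above follow directly from the quoted access times; a minimal sketch of the arithmetic (the values are the approximations from this slide):

```shell
# Approximate times from the slide, in seconds
cpu_cycle=0.0000000005     # ~0.5 ns per CPU cycle
mem_access=0.000000270     # ~270 ns per memory access
disk_access=0.010000000    # ~10 ms per disk access

# One memory access costs 270 ns / 0.5 ns = 540 CPU cycles
awk -v m="$mem_access" -v c="$cpu_cycle" \
    'BEGIN { printf "memory access = %d CPU cycles\n", m / c }'

# One disk access costs 10 ms / 0.5 ns = 20,000,000 CPU cycles,
# or 10 ms / 270 ns = ~37,037 memory accesses
awk -v d="$disk_access" -v c="$cpu_cycle" \
    'BEGIN { printf "disk access = %d CPU cycles\n", d / c }'
awk -v d="$disk_access" -v m="$mem_access" \
    'BEGIN { printf "disk access = %d memory accesses\n", d / m }'
```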

Why is disk IO tuning important?

[Chart: Seagate 15k RPM/3.5" drive specifications across generations:
  Capacity (GB):           73 -> 450   (about +35%)
  Max Sustained DR (MB/s): 75 -> 171   (about +15%)
  Read Seek (ms):          3.6]

Disk IO service time is not improving compared to processors


Performance metrics
  • Disk metrics: MB/s and IOPS, with a reasonable service time
  • Application metrics: response time, batch job run time
  • System metrics: CPU, memory and IO
  • Size for your peak workloads
  • Size based on maximum sustainable thruputs
  • Bandwidth and thruput sometimes mean the same thing, sometimes not
  • For tuning, it's good to have a short running job that's representative of your workload

Performance metrics
  • Use a relevant metric for testing, tied to business costs, benefits or requirements: batch job run time, maximum or sufficient application transactions/second, query run time
  • Metrics that typically are not so relevant: application transaction time if less than a few seconds
  • Metrics indicating bottlenecks (CPU, memory, network, disk) matter if the application metric goal isn't met
  • Be aware of IO from other systems affecting disk performance on shared disk
  • If benchmarking two systems, be sure the disk performance is apples to apples and you're not really comparing disk subsystem performance


Disk performance
  • ZBR (zoned bit recording) geometry: outer tracks hold more data
  • Interface types: ATA, SATA, SCSI, FC, SAS
  • IO service times are predominantly seek + rotational latency + queueing time

Disk performance: when do you have a disk bottleneck?
  • Random workloads: reads average > 15 ms, or, with write cache, writes average > 2.5 ms
  • Sequential workloads: two sequential IO streams on one disk means you need more thruput

[Chart: IO service time (ms) vs. IOPS for a 15,000 RPM disk: service time stays low at low IOPS, then climbs steeply as the disk approaches its maximum IOPS.]

How to improve disk performance
  • Reduce the number of IOs
    • Bigger caches: application, file system, disk subsystem
    • Use caches more efficiently
    • No file system logging
    • No access time updates
  • Improve average IO service times
    • Better data layout
    • Reduce locking for IOs
    • Buffer/queue tuning
    • Use SSDs or RAM disk
    • Faster disks/interfaces, more disks
    • Short stroke the disks and use the outer edge
    • Smooth the IOs out over time
  • Reduce the overhead to handle IOs
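Two of the cache-efficiency items above ("no file system logging", "no access time updates") map to JFS2 options; a sketch, assuming a hypothetical JFS2 filesystem /app (running without a log sacrifices crash consistency, so it suits scratch or rebuildable data only):

```shell
# Hypothetical JFS2 filesystem /app, for illustration only.

# Stop access-time updates so cached reads don't generate inode writes:
chfs -a options=noatime /app

# Remount with JFS2 logging disabled; removes log IO, but the
# filesystem may need recreating after a crash:
umount /app
mount -o log=NULL /app
```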

What is %iowait?
  • A misleading indicator of disk performance
  • A type of CPU idle: the percent of time the CPU is idle and waiting on an IO so it can do some more work
  • High %iowait does not necessarily indicate a disk bottleneck
    • Your application could be IO intensive, e.g. a backup
    • You can make %iowait go to 0 by adding CPU-intensive jobs
  • Low %iowait does not necessarily mean you don't have a disk bottleneck
    • The CPUs can be busy while IOs are taking unreasonably long times
  • If disk IO service times are good, you aren't getting the performance you need, and you have significant %iowait, consider using SSDs or RAM disk
    • This can improve performance by potentially reducing %iowait to 0


Solid State Disk (SSD)
  • High performance electronic disk
  • From 14,000 to 27,000 IOPS possible for a single SSD; SSD IO bandwidth varies across Power and disk subsystems
  • Typically small (69-177 GB) and expensive compared to HDDs
  • Read or write IOs typically < 1 ms
    • About the same IO service time as writes to disk subsystem cache
    • About 5-15X faster than reads from disk
  • Positioned for high access density (IOPS/GB) random read data
  • Implementation involves finding the best data to place on the SSDs
  • SSDs can save disk costs by reducing the number of spindles needed, when high access density data exists
  • A mix of SSDs and HDDs is often best
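The access-density rule above can be made concrete; a sketch with hypothetical LUN measurements (the IOPS and GB figures are invented for illustration):

```shell
# Access density = IOPS / GB. Data with high access density is the
# best SSD candidate. Hypothetical measurements for two 400 GB LUNs:
for lun in "hot_lun 6000 400" "cold_lun 200 400"; do
  set -- $lun
  awk -v name="$1" -v iops="$2" -v gb="$3" \
      'BEGIN { printf "%s: %.1f IOPS/GB\n", name, iops / gb }'
done
# prints:
# hot_lun: 15.0 IOPS/GB
# cold_lun: 0.5 IOPS/GB
```

Here hot_lun is the kind of data worth moving to SSD, while cold_lun's spindles are limited by capacity, not IOPS.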


SSD vs. HDD performance

[Chart: SSDs offer roughly 33X to 125X more IOPS than HDDs; HDD IO service time is typically 5X to 40X slower than SSD.]

Access time is drive-to-drive, ignoring any caching by the SAS controller

RAM disk
  • Use system RAM to create a virtual disk
  • Data is lost in the event of a reboot or system crash
  • IOs complete with RAM latencies
  • For file systems, it takes away from file system cache: taking from one pocket and putting it into another
  • A raw disk or file system only, no LVM support

# mkramdisk 16M
/dev/rramdisk0
# mkfs -V jfs2 /dev/ramdisk0
mkfs: destroy /dev/ramdisk0 (yes)? y
File system created successfully.
16176 kilobytes total disk space.
Device /dev/ramdisk0:
  Standard empty filesystem
  Size: 32352 512-byte (DEVBLKSIZE) blocks
# mkdir /ramdiskfs
# mount -V jfs2 -o log=NULL /dev/ramdisk0 /ramdiskfs
# df -m /ramdiskfs
Filesystem    MB blocks      Free %Used    Iused %Iused Mounted on
/dev/ramdisk0     16.00     15.67    3%        4     1% /ramdiskfs
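To reverse the example above and return the RAM to the system, a sketch (it assumes the mount point and device names used in the transcript):

```shell
# Unmount the file system and destroy the RAM disk; the 16 MB of
# pinned RAM is returned to the system. All data on it is lost.
umount /ramdiskfs
rmramdisk ramdisk0
```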


The AIX IO stack

Layers, top to bottom:
  • Application: the application memory area caches data to avoid IO
  • Logical file system (NFS, JFS, JFS2), or raw disks and raw LVs
    • NFS caches file attributes and has a cached filesystem for NFS clients
    • JFS and JFS2 caches use extra system RAM: JFS uses persistent pages for cache, JFS2 uses client pages
  • VMM
  • LVM (LVM device drivers)
  • Multi-path IO driver (optional)
  • Disk device drivers: queues exist for both adapters and disks
  • Adapter device drivers: use DMA for IO
  • Disk subsystem (optional): has read and write cache
  • Disk: has memory to store commands/data, a read cache or memory area used for IO, and write cache

IOs can be coalesced (good) or split up (bad) as they go thru the IO stack: IOs adjacent in a file/LV/disk can be coalesced, while IOs greater than the maximum supported IO size will be split up.

Synchronous vs asynchronous IOs
  • The definition depends on the frame of reference
  • Programmer/application view
    • When an application issues a synchronous IO, it waits until the IO is complete
    • Asynchronous IOs are handed off to the kernel via the AIO facilities in AIX and the application continues; when a group of asynchronous IOs completes, a signal is sent to the application
    • This allows IO and processing to run simultaneously
  • File system IO
    • Synchronous write IOs to a file system must get to disk
    • Asynchronous IOs only need to get to file system cache
  • GLVM or disk subsystem mirroring
    • Synchronous mirroring requires that writes to both mirrors complete before returning an acknowledgement to the application
    • Asynchronous mirroring returns an acknowledgement when the write completes at the local storage; writes to remote storage are done in the same order as locally

Data layout
  • Data layout affects IO performance more than any tunable IO parameter
  • Good data layout avoids dealing with disk hot spots, an ongoing management issue and cost
  • Data layout must be planned in advance; changes are often painful
  • iostat and filemon can show unbalanced IO
  • Best practice: evenly balance IOs across all physical disks
  • Random IO best practice:
    • Spread IOs evenly across all physical disks
    • For disk subsystems: create RAID arrays of equal size and RAID level, create VGs with one LUN from every array, and spread all LVs across all PVs in the VG
    • The SVC can do this automatically, and the XIV does

Random IO data layout

[Diagram: a disk subsystem with five RAID arrays (1-5), each presented as one LUN or logical disk and seen by AIX as a PV (hdisk1-hdisk5) in volume group datavg.]

# mklv -y lv1 -e x datavg ... hdisk1 hdisk2 hdisk5 ...
# mklv -y lv2 -e x datavg ... hdisk3 hdisk1 ... hdisk4 ...

Use a random order for the hdisks for each LV
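A fuller form of the mklv commands above, with the elided arguments replaced by hypothetical values (VG datavg, 100 LPs per LV) for illustration:

```shell
# -e x sets inter-physical-volume allocation to maximum, spreading
# the LV's logical partitions round-robin across the listed disks.
# Listing the disks in a different random order for each LV keeps
# any one disk from holding the first partitions of every LV.
mklv -y lv1 -e x datavg 100 hdisk1 hdisk4 hdisk2 hdisk5 hdisk3
mklv -y lv2 -e x datavg 100 hdisk3 hdisk1 hdisk5 hdisk2 hdisk4
```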


Data layout for sequential IO
  • Many factors affect sequential thruput: RAID setup, number of threads, IO size, reads vs. writes
  • Create RAID arrays with data stripes a power of 2: RAID 5 arrays of 5 or 9 disks, RAID 10 arrays of 2, 4, 8, or 16 disks
  • Do application IOs equal to, or a multiple of, a full stripe on the RAID array, or use multiple threads to submit many IOs
  • N disk RAID 5 arrays can handle no more than N-1 sequential IO streams before the IO becomes randomized
  • N disk RAID 10 arrays can do N sequential read IO streams and N/2 sequential write IO streams before the IO becomes randomized
  • Sometimes smaller strip sizes (around 64 KB) perform better
  • Test your setup if the bandwidth needed is high
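The full-stripe rule above works out as follows; a sketch assuming a hypothetical 64 KB strip size per disk:

```shell
# Full-stripe IO size = strip size * number of data disks.
# RAID 5 with N disks has N-1 data disks (one strip per stripe
# holds parity); RAID 10 with N disks has N/2 (the rest hold
# mirror copies).
strip_kb=64
for arr in "RAID5/9-disk 8" "RAID10/8-disk 4"; do
  set -- $arr
  echo "$1: full-stripe IO = $(( strip_kb * $2 )) KB"
done
# prints:
# RAID5/9-disk: full-stripe IO = 512 KB
# RAID10/8-disk: full-stripe IO = 256 KB
```

So an application writing 512 KB IOs aligned to the stripe lets a 9-disk RAID 5 array compute parity without reading old data back in.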


Data layout
  • Best practice for VGs and LVs: use Big or Scalable VGs
  • Both support having no LVCB header on LVs (only important for raw LVs); LVCBs can lead to issues with IOs split across physical disks
  • Big VGs require the mklv -T O option to eliminate the LVCB
  • Scalable VGs have no LVCB
  • Only Sc
