Slides from Episode 4 of the DB2 pureScale webcast series with the IBM lab team.
DB2 pureScale Performance
Steve ([email protected]), Oct 19, 2010
Copyright IBM 2010
Agenda
• DB2 pureScale technology review
• RDMA and low-latency interconnect
• Monitoring and tuning bufferpools in pureScale
• Architectural features for top performance
• Performance metrics
DB2 pureScale: Technology Review
[Architecture diagram: clients see a single database view and can connect to any member; each member runs the DB2 engine with its own log; all members share storage access over the cluster interconnect; integrated cluster services (CS) run on every host; a primary and a secondary Cluster Caching Facility serve the cluster.]
• DB2 engine runs on several host computers
  – The members co-operate with each other to provide coherent access to the database from any member
• Data sharing architecture
  – Shared access to the database
  – Members write to their own logs
  – Logs are accessible from another host (used during recovery)
• Cluster Caching Facility (CF) technology from STG
  – Efficient global locking and buffer management
  – Synchronous duplexing to the secondary ensures availability
• Low-latency, high-speed interconnect
  – Special optimizations provide significant advantages on RDMA-capable interconnects like InfiniBand
• Clients connect anywhere, see a single database
  – Clients connect into any member
  – Automatic load balancing and client reroute may change the underlying physical member to which a client is connected
• Integrated cluster services
  – Failure detection, recovery automation, cluster file system
  – In partnership with STG (GPFS, RSCT) and Tivoli (SA MP)
• Leverages IBM's System z Sysplex experience and know-how
DB2 pureScale and low-latency interconnect
• InfiniBand & uDAPL provide the low-latency RDMA infrastructure exploited by pureScale
• pureScale currently uses DDR and QDR IB adapters according to platform
  – Peak throughput of about 2-4 M messages per second
  – Provide message latencies in the 10s of microseconds or even lower
• The InfiniBand development roadmap indicates continued increases in bit rates
[Figure: InfiniBand roadmap from www.infinibandta.org]
Two-level page buffering – data consistency & improved performance
• The local bufferpool (LBP) caches both read-only and updated pages for that member
• The shared GBP contains references to every page in all LBPs across the cluster
  – References ensure consistency across members – who's interested in which pages, in case the pages are updated
• The GBP also contains copies of all updated pages from the LBPs
  – Sent from the LBP at transaction commit time
  – Stored in the GBP & available to members on demand
  – A 30 µs page read request over InfiniBand from the GBP can be more than 100x faster than reading from disk
• Statistics are kept for tuning
  – Found in LBP vs. found in GBP vs. read from disk
  – Useful in tuning GBP / LBP sizes
[Diagram: members M1-M3 and the CF; a modified page is sent to the GBP at commit and read back by another member in about 30 µs, vs. roughly 5000 µs for a disk read – expensive disk reads from M1 and M2 are not required, since they can get the modified page from the CF.]
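Taken at face value, those latencies put a GBP page read (about 30 µs) roughly 165x ahead of a disk read (about 5000 µs), consistent with the "more than 100x" figure above.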
pureScale bufferpool monitoring and tuning
• Familiar DB2 hit ratio calculations are useful with pureScale
  – HR = (logical reads – physical reads) / logical reads
    e.g. (pool_data_l_reads – pool_data_p_reads) / pool_data_l_reads
  – As usual, physical reads come from disk, logical reads from the bufferpool (in pureScale, this means either the LBP or the GBP)
    e.g., pool_data_l_reads = pool_data_lbp_pages_found + pool_data_gbp_l_reads
• New metrics in pureScale support breaking this down by LBP & GBP amounts
  – pool_data_lbp_pages_found = logical data reads satisfied by the LBP
    • i.e., we needed a page, and it was present & valid in the LBP
  – pool_data_gbp_l_reads = logical data reads attempted at the GBP
    • i.e., either not present or not valid in the LBP, so we needed to go to the GBP
  – pool_data_gbp_p_reads = physical data reads due to the page not being present in either the LBP or GBP
    • Essentially the same as non-pureScale pool_data_p_reads
  – pool_data_gbp_invalid_pages = number of GBP data read attempts due to an LBP page being present but marked invalid
    • An indicator of the rate of GBP updates & their impact on the LBP
(Of course, there are index equivalents too.)
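To make the breakdown concrete, here is a minimal sketch (not IBM tooling) that splits logical data reads by where they were satisfied, using the monitor element names above; the values are invented for illustration and would normally come from a monitoring interface such as MON_GET_BUFFERPOOL.

```python
# Hypothetical monitor element values; the numbers are made up for illustration.
metrics = {
    "pool_data_l_reads": 1_000_000,         # all logical data reads
    "pool_data_lbp_pages_found": 850_000,   # satisfied by the local bufferpool
    "pool_data_gbp_l_reads": 150_000,       # attempted at the group bufferpool
    "pool_data_gbp_p_reads": 30_000,        # missed the GBP too, read from disk
    "pool_data_gbp_invalid_pages": 40_000,  # GBP reads caused by invalidated LBP pages
}

# Identity from the slide: l_reads = lbp_pages_found + gbp_l_reads
assert (metrics["pool_data_lbp_pages_found"]
        + metrics["pool_data_gbp_l_reads"]) == metrics["pool_data_l_reads"]

l_reads = metrics["pool_data_l_reads"]
gbp_hits = metrics["pool_data_gbp_l_reads"] - metrics["pool_data_gbp_p_reads"]
print(f"satisfied by LBP : {metrics['pool_data_lbp_pages_found'] / l_reads:.1%}")
print(f"satisfied by GBP : {gbp_hits / l_reads:.1%}")
print(f"read from disk   : {metrics['pool_data_gbp_p_reads'] / l_reads:.1%}")
print(f"GBP reads due to invalidated LBP pages: "
      f"{metrics['pool_data_gbp_invalid_pages'] / metrics['pool_data_gbp_l_reads']:.1%}")
```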
pureScale bufferpool monitoring
• Overall (and non-pureScale) hit ratio
  – (pool_data_l_reads – pool_data_p_reads) / pool_data_l_reads
  – Great values: 95% for index, 90% for data
  – Good values: 80-90% for index, 75-85% for data
• LBP hit ratio
  – (pool_data_lbp_pages_found / pool_data_l_reads) * 100%
  – Generally lower than the overall hit ratio, since it excludes GBP hits
  – Factors which may affect it, other than LBP size
    • Increases with a greater portion of read activity in the system (decreasing probability that LBP copies of the page have been invalidated)
    • May decrease with cluster size (increasing probability that another member has invalidated the LBP page)
• GBP hit ratio
  – (pool_data_gbp_l_reads – pool_data_gbp_p_reads) / pool_data_gbp_l_reads
  – A hit here is a read of a previously modified page, so hit ratios are typically quite low
    • An overall (LBP+GBP) H/R in the high 90's can correspond to a GBP H/R in the low 80's
  – Factors which may affect it, other than GBP size
    • Decreases with a greater portion of read activity
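As a self-contained illustration of how the three ratios relate, the following sketch uses invented metric values chosen to reproduce the pattern described above (an overall H/R in the high 90's alongside a GBP H/R in the low 80's):

```python
# Hypothetical data-page monitor element values, for illustration only.
pool_data_l_reads = 1_000_000
pool_data_lbp_pages_found = 850_000
pool_data_gbp_l_reads = 150_000            # = l_reads - lbp_pages_found
pool_data_gbp_p_reads = 30_000             # GBP misses that went to disk
pool_data_p_reads = pool_data_gbp_p_reads  # physical reads ultimately come from disk

overall_hr = (pool_data_l_reads - pool_data_p_reads) / pool_data_l_reads
lbp_hr = pool_data_lbp_pages_found / pool_data_l_reads
gbp_hr = (pool_data_gbp_l_reads - pool_data_gbp_p_reads) / pool_data_gbp_l_reads

print(f"overall data H/R: {overall_hr:.1%}")   # 97.0% - 'great' is 90%+ for data
print(f"LBP data H/R    : {lbp_hr:.1%}")       # 85.0% - lower, since GBP hits are excluded
print(f"GBP data H/R    : {gbp_hr:.1%}")       # 80.0% - hits are previously modified pages
```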
pureScale bufferpool tuning
Step 1: typical rule of thumb for GBP size = 35-40% of Σ( all members' LBP sizes ) – see the sizing sketch after this list
  – e.g. 4 members with an LBP size of 1M pages each -> GBP size of 1.4 to 1.6M pages
  – NB: don't forget, GBP page size is always 4 KB, no matter what the LBP page size is
  – If your workload is very read-heavy (e.g. 90% read), the initial GBP allocation could be in the 20-30% range
  – For 2-member clusters, you may want to start with 40-50% of total LBP, vs. 35-40%
Step 2: monitor the overall BP hit ratio as usual, with pool_data_l_reads, pool_data_p_reads, etc.
  – Meets your goals? If yes, then done!
Step 3: check the LBP H/R with pool_data_lbp_pages_found / pool_data_l_reads
  – Great values: 90% for index, 85% for data
  – Good values: 70-80% for index, 65-80% for data
  – Increasing LBP size can help increase the LBP H/R
  – NB: for each 16 extra LBP pages, the GBP needs 1 extra page for registrations
Step 4: check the GBP H/R with pool_data_gbp_l_reads, pool_data_gbp_p_reads, etc.
  – Great values: 90% for index, 80% for data
  – Good values: 65-80% for index, 60-75% for data
  – pool_data_l_reads > 10 x pool_data_gbp_l_reads means low GBP dependence – tuning GBP size in this case may be less valuable
  – pool_data_gbp_invalid_pages > 25% of pool_data_gbp_l_reads means the GBP is really helping out, and could benefit from extra pages
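A minimal sizing sketch for Step 1, under stated assumptions: the cluster configuration is hypothetical, and converting through kilobytes when the LBP page size differs from 4 KB is one reading of the "GBP page size is always 4 KB" note rather than an official formula.

```python
# Step 1 rule of thumb: initial GBP size ~= 35-40% of the sum of all members'
# LBP sizes, expressed in 4 KB GBP pages. Hypothetical configuration below.

def initial_gbp_pages(member_lbp_pages, lbp_page_size_kb=4, fraction=0.40):
    """Starting GBP size, in 4 KB pages, for the given per-member LBP sizes."""
    total_lbp_kb = sum(member_lbp_pages) * lbp_page_size_kb
    return int(total_lbp_kb * fraction / 4)   # GBP page size is always 4 KB

# Slide example: 4 members, 1M-page LBPs (4 KB pages), 40% -> 1.6M GBP pages
members = [1_000_000] * 4
print(f"{initial_gbp_pages(members):,} GBP pages")                 # 1,600,000

# Step 3 note: each 16 extra LBP pages need 1 extra GBP page for registrations
extra_lbp_pages = 160_000
print(f"{extra_lbp_pages // 16:,} extra GBP registration pages")   # 10,000
```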
pureScale architectural features for optimum performance
• Page lock negotiation – or: Psst! Hey buddy, can you pass me that page?
  – pureScale page locks are physical locks, indicating which member currently 'owns' the page. Picture the following:
    • Member A: acquires a page P and modifies a row on it, and continues with its transaction. 'A' holds an exclusive page lock on page P until 'A' commits
    • Member B: wants to modify a different row on the same page P. What now?
  – 'B' doesn't have to wait until 'A' commits & releases the page lock
    • The CF will negotiate the page back from 'A' in the middle of 'A's transaction, on 'B's behalf
    • Provides far better concurrency & performance than needing to wait for the page lock until the holder commits
[Diagram: members A and B, each with a log and a copy of page P; the CF's global lock manager (GLM) entry for page P shows holders A and B as the page is passed from A to B.]
pureScale architectural features for optimum performance
• Table append cache and index page cache
  – What happens in the case of rapid inserts into a single table by multiple members? Or rapid index updates? Will it cause the insert page to 'thrash' back & forth between the members, each time one has a new row?
  – No: each member sets aside an extent for insertion into the table to eliminate contention & page thrashing. Similarly for indexes with the page cache
• Lock avoidance
  – pureScale exploits cursor stability (CS) locking semantics to avoid taking locks in many common cases
  – Reduces pathlength and saves trips to the CF
  – Transparent & always on
Notes on storage configuration for performance
• GPFS best practices
  – Automatically configured by the db2cluster command
    • Blocksize >= 1 MB (vs. the 64 KB default) provides noticeably improved performance
    • Direct (unbuffered) IO for both logs & tablespace containers
    • SCSI-3 P/R on AIX enables faster disk takeover on member failure
  – Separate paths for logs & tablespaces are recommended
• Dominant storage performance factor for pureScale: fast log writes
  – Always important in OLTP
  – Extra important in pureScale due to log flushes driven by page reclaims
  – Separate filesystems and separate devices, from each other & from tablespaces
  – Ideally comfortably under 1 ms
  – Possibly even SSDs to keep write latencies as low as possible
12 Member Scalability Example
• Moderately heavy transaction processing workload modeling a warehouse & ordering process
  – Write transaction rate 20%
  – Typical read/write ratio of many OLTP workloads
• No cluster awareness in the app
  – No affinity
  – No partitioning
  – No routing of transactions to members
• Configuration
  – Twelve 8-core p550 members, 64 GB, 5 GHz
  – IBM 20 Gb/s IB HCAs + 7874-024 IB switch
  – Duplexed PowerHA pureScale across 2 additional 8-core p550s, 64 GB, 5 GHz
  – DS8300 storage, 576 15K disks, two 4 Gb FC switches
[Topology diagram: clients (2-way x345) connect over 1 Gb Ethernet; p550 members and the p550 Cluster Caching Facility communicate over the 20 Gb IB pureScale interconnect via a 7874-024 switch; DS8300 storage sits behind two 4 Gb FC switches.]
12 Member Scalability Example - Results
[Chart: throughput relative to 1 member vs. # of members – 1.98x @ 2 members, 3.9x @ 4 members, 7.6x @ 8 members, 10.4x @ 12 members.]
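Expressed as speedup divided by member count, these points correspond to per-member scaling efficiencies of roughly 99% at 2 members, 98% at 4, 95% at 8, and 87% at 12.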
DB2 pureScale Architecture Scalability
• How far will it scale?
• Take a web-commerce-type workload
  – Read mostly but not read only – about 90/10
• Don't make the application cluster aware
  – No routing of transactions to members
  – Demonstrate transparent application scaling
• Scale out to the 128-member limit and measure scalability
The 128-member result
[Chart callouts, by cluster size:]
  2, 4 and 8 members : over 95% scalability
  16 members         : over 95% scalability
  32 members         : over 95% scalability
  64 members         : 95% scalability
  88 members         : 90% scalability
  112 members        : 89% scalability
  128 members        : 84% scalability
Summary
• Performance & scalability are two top goals of pureScale
  – Many architectural features were designed solely to drive the best possible performance
• Monitoring and tuning for pureScale extend existing DB2 interfaces and practices
  – e.g., techniques for optimizing GBP/LBP configuration build on steps already familiar to DB2 DBAs
• The pureScale architecture exploits leading-edge low-latency interconnects and RDMA to achieve excellent performance & scalability
  – The initial 12- and 128-member proof points are strong evidence of a successful first release, with even better things to come!
Questions