© 2009 IBM Corporation, December 17, 2009

Introducing DB2 pureScale

Information Management


Agenda

• What is pureScale
  – Where does it come from
  – Test results
• Deeper Dive: Key Design Points
  – Overall DB2 pureScale architecture
  – Scalability
  – Availability
    • Automatic workload balancing & client routing
• Configuration and Monitoring
  – Cluster configuration and operational status
  – Monitoring data
• Questions


DB2 pureScale

• Unlimited Capacity
  – Buy only what you need, add capacity as your needs grow
• Application Transparency
  – Avoid the risk and cost of application changes
• Continuous Availability
  – Deliver uninterrupted access to your data with consistent performance

Reduce the risk and cost of meeting changing business demands


DB2 pureScale – Takeaway Themes

• It's Easy
  – To install
  – To administer
  – To grow
  – To run applications
  – Automated recovery
• It Performs
  – Linear scalability
  – Availability with no database freeze
  – Efficient centralized locking and caching
• It Meets On-Demand Requirements
  – Adding more horsepower
  – Pricing: pay for what you use when you use it

Reduce the risk and cost of meeting changing business demands


DB2 pureScale Architecture

Global lock and memory manager technology

Automatic workload balancing

Shared Data

InfiniBand network

Cluster of DB2 members

Integrated Clustering Services


DB2 pureScale is Easy to Deploy

Single installation for all components

Monitoring integrated into Optim tools

Single installation for fixpaks and updates

Simple commands to add and remove members


Unlimited Capacity and Easy Scalability

• DB2 pureScale has been designed to grow to whatever capacity your business requires
• Flexible licensing designed for minimizing costs of peak times
• Only pay for additional capacity when you use it, even if for only a single day
• Add members without changing applications and without administrative complexity

Architecture validation on 100+ nodes has been run by IBM

Need more? Just deploy another server, and then turn off DB2 when you're done.

Issue: All year, except for two days, the system requires 3 servers of capacity.

Solution: Use DB2 pureScale and add another server for those two days, and only pay software license fees for the days you use it.


Proof of DB2 pureScale Architecture Scalability

• How far will it scale?
• Take a web commerce type workload
  – Read mostly but not read only
• Don't make the application cluster aware
  – No routing of transactions to members
  – Demonstrate transparent application scaling


The Result

Members       Scalability
2, 4 and 8    Over 95%
16            Over 95%
32            Over 95%
64            95%
88            90%
112           89%
128           84%
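To put these percentages in perspective, the sketch below converts each figure into an equivalent number of ideally scaling members. It assumes a definition the slide does not spell out: that "scalability" is measured throughput divided by N times the throughput of a single member. The name `effective_members` is introduced here for illustration only, and "over 95%" is treated as a 0.95 lower bound.

```python
# Scalability figures from the slide; "over 95%" is taken as a 0.95 lower bound.
RESULTS = {2: 0.95, 4: 0.95, 8: 0.95, 16: 0.95, 32: 0.95,
           64: 0.95, 88: 0.90, 112: 0.89, 128: 0.84}


def effective_members(members: int, scalability: float) -> float:
    """Throughput in multiples of one member, under the assumed definition."""
    return members * scalability


if __name__ == "__main__":
    for n, s in RESULTS.items():
        print(f"{n:>3} members at {s:.0%}: ~{effective_members(n, s):.1f}x one member")
```

Under that assumption, absolute capacity keeps growing with every step even as the percentage declines: 128 members at 84% still deliver more than 112 members at 89%.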


Dive Deeper into a 12 Member Cluster

• Looking at a more challenging workload with more updates
  – 1 update transaction for every 4 read transactions
  – Typical read/write ratio of many OLTP workloads
• No cluster awareness in the application
  – No routing of transactions to members
  – Demonstrate transparent application scaling
• Redundant system
  – 14 8-core p550s including duplexed PowerHA pureScale™
• Scalability remains above 90%


Application Transparency

• Take advantage of extra capacity instantly
  – No need to modify your application code
  – No need to tune your database infrastructure

Developers don't even need to know more nodes are being added

Administrators can add capacity without re-tuning or re-testing


Continuous Availability

Online Recovery
  – A key DB2 pureScale design point is to maximize availability during failure recovery processing
  – When a database member fails, only data in-flight on the failed member remains locked
    • In-flight = data modifications that are part of active transactions on the member at the time it failed
  – Redistribute workload to surviving nodes immediately

Stealth Maintenance
  – Allow DBAs to apply maintenance without negotiating an outage window

[Chart: % of data available (axis marks at 50 and 100) versus time (~seconds) after an unplanned database member failure. Only data with in-flight updates is locked during recovery.]


DB2 pureScale: Technology Overview

• Clients connect anywhere, see single database
  – Clients connect into any member
  – Automatic load balancing and client reroute may change the underlying physical member to which a client is connected
• DB2 engine runs on several host computers
  – Co-operate with each other to provide coherent access to the database from any member
• Integrated cluster services
  – Failure detection, recovery automation, cluster file system
  – In partnership with STG and Tivoli
• Low latency, high speed interconnect
  – Special optimizations provide significant advantages on RDMA-capable interconnects (e.g. InfiniBand)
• PowerHA pureScale technology
  – Efficient global locking and buffer management
  – Synchronous duplexing to secondary ensures availability
• Data sharing architecture
  – Shared access to database
  – Members write to their own logs on shared disk
  – Parallel file system (GPFS bundled)
  – Logs accessible from another host (used during recovery)
  – SCSI-3 Persistent Reserve recommended

[Diagram: clients see a single database view across four members. Each member runs cluster services (CS) and writes its own log. The members connect over the cluster interconnect to the primary and secondary PowerHA pureScale servers, and all have shared storage access to the shared database.]


What Happens in DB2 pureScale to Read a Page

• Agent on Member 1 wants to read page 501
  1. db2agent checks local buffer pool: page not found
  2. db2agent performs a Read And Register (RaR) RDMA call directly into PowerHA pureScale memory
     – No context switching, no kernel calls
     – Synchronous request to PowerHA pureScale
  3. If PowerHA pureScale has the page, it returns the page to the member. Here, PowerHA pureScale replies that it does not have the page (again via RDMA)
  4. db2agent reads the page from disk

Much more scalable, does not require locality of data

[Diagram: db2agent on Member 1 with its buffer pool, and PowerHA pureScale with the group buffer pool and a registration (REG) for page 501. Arrows numbered 1 to 4 match the steps above.]
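The four read steps can be sketched as a small in-memory model. This is an illustration under stated assumptions, not DB2 code: the classes `CF` and `Member`, the method `read_and_register`, and the dictionary standing in for disk are all hypothetical, and an ordinary method call stands in for the RDMA request.

```python
class CF:
    """Toy stand-in for the PowerHA pureScale server (hypothetical model)."""

    def __init__(self):
        self.gbp = {}         # group buffer pool: page_id -> page contents
        self.registered = {}  # page_id -> names of members holding a copy

    def read_and_register(self, member_name, page_id):
        # Record the member's interest in the page, then return the page
        # if the group buffer pool has it (None otherwise).
        self.registered.setdefault(page_id, set()).add(member_name)
        return self.gbp.get(page_id)


class Member:
    def __init__(self, name, cf, disk):
        self.name, self.cf, self.disk = name, cf, disk
        self.buffer_pool = {}  # local buffer pool: page_id -> page contents

    def read_page(self, page_id):
        # Step 1: check the local buffer pool.
        if page_id in self.buffer_pool:
            return self.buffer_pool[page_id], "local"
        # Steps 2 and 3: Read And Register against the CF.
        page, source = self.cf.read_and_register(self.name, page_id), "gbp"
        if page is None:
            # Step 4: the CF does not have the page, so read it from disk.
            page, source = self.disk[page_id], "disk"
        self.buffer_pool[page_id] = page
        return page, source


if __name__ == "__main__":
    cf, disk = CF(), {501: "page 501 contents"}
    member1 = Member("member1", cf, disk)
    print(member1.read_page(501))  # first read falls through to disk
    print(member1.read_page(501))  # second read is served locally
```

Note that the registration happens even when the CF does not have the page; that is what later lets the CF tell the member its copy has gone stale.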


DB2 pureScale - Member 1 Updates a Row

1. Agent makes a Set Lock State (SLS) RDMA call to PowerHA pureScale for an X-lock on the row and a P-lock to indicate the page will be updated
   – Prevents other members from making byte changes to the page at the exact same time as this member
   – SLS call takes as little as 15 microseconds end to end
2. PowerHA pureScale responds via RDMA with grant of lock request
3. Page updated

• At this point Member 1 does not need to do anything else
  – P-lock is only released in a lazy fashion
  – If another member wants it, they can have it, but otherwise Member 1 keeps it until commit

[Diagram: Member 1 and Member 2 each have a db2agent and a buffer pool holding page 501. PowerHA pureScale holds the GBP, the page registrations (REG) and the GLM, which records the X and P locks. Arrows numbered 1 to 3 match the steps above.]


DB2 pureScale - Member 1 Commits Their Update

1. Agent makes a Write And Register Multiple (WARM) RDMA call to PowerHA pureScale for the pages it has updated
2. PowerHA pureScale pulls all the pages that have been updated directly from the memory address of Member 1 into its global buffer pool
   – P-locks released if not already released (as are X-locks for the row)
3. PowerHA pureScale invalidates the page in all other members that have read this page by directly updating a bit in the other members' buffer pool directory
   – Before a member accesses this changed page again, it must get the current copy from the global buffer pool
   – SILENT INVALIDATION: no CPU cycles
   – No interrupt or other message processing
   – Increasingly important as cluster grows

[Diagram: Member 1 and Member 2 each have a db2agent and a buffer pool holding page 501. PowerHA pureScale holds the GBP copy of page 501, the page registrations (REG) and the GLM with the X and P locks. Arrows numbered 1 to 3 match the steps above.]
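The update and commit flow can be sketched as a toy model. Everything here is an assumption made for illustration: the classes `CF` and `Member` and their methods are hypothetical, plain method calls stand in for RDMA, and lock negotiation is simplified so that a conflicting request is refused rather than handed over lazily as the slides describe.

```python
class CF:
    """Toy PowerHA pureScale server: GBP, registrations and lock table."""

    def __init__(self):
        self.gbp = {}         # page_id -> {row: value}
        self.registered = {}  # page_id -> set of Member objects
        self.p_locks = {}     # page_id -> Member holding the P-lock
        self.x_locks = {}     # (page_id, row) -> Member holding the X-lock

    def register(self, member, page_id):
        self.registered.setdefault(page_id, set()).add(member)

    def set_lock_state(self, member, page_id, row):
        # Grant both locks only if no other member holds them (simplified).
        if self.x_locks.get((page_id, row)) not in (None, member):
            return False
        if self.p_locks.get(page_id) not in (None, member):
            return False
        self.x_locks[(page_id, row)] = member
        self.p_locks[page_id] = member
        return True

    def write_and_register(self, member, page_id):
        # Pull the updated page from the member's memory into the GBP.
        self.gbp[page_id] = dict(member.buffer_pool[page_id])
        # Release the committing member's locks on this page.
        if self.p_locks.get(page_id) is member:
            del self.p_locks[page_id]
        self.x_locks = {k: v for k, v in self.x_locks.items()
                        if not (k[0] == page_id and v is member)}
        # "Silent invalidation": flip the validity bit in every other
        # registered member's directory without calling into that member.
        for other in self.registered.get(page_id, set()):
            if other is not member:
                other.valid[page_id] = False
        self.register(member, page_id)


class Member:
    def __init__(self, name, cf):
        self.name, self.cf = name, cf
        self.buffer_pool = {}  # page_id -> {row: value}
        self.valid = {}        # page_id -> validity bit

    def load(self, page_id, rows):
        self.buffer_pool[page_id] = dict(rows)
        self.valid[page_id] = True
        self.cf.register(self, page_id)

    def update(self, page_id, row, value):
        if not self.cf.set_lock_state(self, page_id, row):
            raise RuntimeError("lock not granted")
        self.buffer_pool[page_id][row] = value

    def commit(self, page_id):
        self.cf.write_and_register(self, page_id)

    def read(self, page_id, row):
        # A stale copy must be refreshed from the GBP before use.
        if not self.valid.get(page_id, False):
            self.buffer_pool[page_id] = dict(self.cf.gbp[page_id])
            self.valid[page_id] = True
        return self.buffer_pool[page_id][row]
```

In this model, Member 2 does no work at the moment Member 1 commits; it only notices the cleared bit the next time it touches the page, which mirrors the "no CPU cycles" point on the slide.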


DB2 pureScale - Two Members Update Same Page

Page 1001 of table T1, as first read by Member 1 and Member 2 (RaR 1001 on each):

C1  C2          C3      C4
1   Eaton       109351  NE
2   Zikopoulos  450321  SW
3   Sachedina   849291  NW
4   Huras       398122  SW

Member 1: UPDATE T1 SET C3 = 111000 WHERE C1 = 1
  – SLS X_row1 P_1001 (X-lock on row 1, P-lock on page 1001)
  – Row 1 on Member 1 becomes: 1 Eaton 111000 NE

Member 2: UPDATE T1 SET C4 = 'SE' WHERE C1 = 4
  – SLS X_row4 P_1001 (X-lock on row 4, P-lock on page 1001)
  – PowerHA pureScale: release P-lock and pull page, so the CF copy of page 1001 now carries row 1 = 1 Eaton 111000 NE
  – The out-of-date copy of page 1001 is marked Valid = N
  – Row 4 becomes: 4 Huras 398122 SE


Summary on Achieving Efficient Scaling

• Deep RDMA exploitation over low latency fabric
  – Enables round-trip response time as low as 10-15 microseconds (lock request)
• Silent Invalidation
  – Informing members of page updates requires no CPU cycles on those members
  – No interrupt or other message processing required
  – Increasingly important as cluster grows
• Hot pages available without disk I/O from GBP memory
  – RDMA and dedicated threads enable read page operations in ~10s of microseconds


Availability Goals

• HA automation built in “out of the box”
  – Built-in failover, recovery and fail-back
• Unplanned events (e.g. member and PowerHA pureScale server failures)
  – Instantaneous online recovery
  – All data that is not being updated is fully available during recovery
• Planned events (e.g. member maintenance): “Stealth” Maintenance
  – No errors
  – No data availability loss


Member SW Failure: “Member Restart on Home Host”

• kill -9 erroneously issued to a member
• DB2 Cluster Services automatically detects member's death
  – Informs other members & PowerHA pureScale servers
  – Initiates automated member restart on same (“home”) host
  – Member restart is like a database crash recovery in a single system database, but is much faster
    • Redo limited to in-flight transactions
    • Benefits from page cache in CF
• In the meantime, client connections are transparently re-routed to healthy members
  – Based on least load (by default), or
  – Pre-designated failover member
• Other members remain fully available throughout: “Online Failover”
  – Primary retains update locks held by member at the time of failure
  – Other members can continue to read and update data not locked for write access by failed member
• Member restart completes
  – Retained locks released and all data fully available
  – Transaction work routed to the restarted member

Automatic; Ultra Fast; Online

[Diagram: clients see a single database view over four DB2 members, each with cluster services (CS) and its own log on shared data. The primary and secondary servers hold updated pages and global locks. One member is hit by kill -9 and recovers using its log records and pages from the CF.]


Member HW Failure: “Member Restart on Guest Host (aka Restart Light)”

• Power cord tripped over accidentally
• DB2 Cluster Services loses heartbeat and declares member down
  – Informs other members & PowerHA pureScale servers
  – Fences member from logs and data
  – Initiates automated member restart on another (“guest”) host
    • Using reduced, and pre-allocated memory model
  – Member restart is like a database crash recovery in a single system database, but is much faster
    • Redo limited to in-flight transactions
    • Benefits from page cache in PowerHA pureScale
• In the meantime, client connections are automatically re-routed to healthy members
  – Based on least load (by default), or
  – Pre-designated failover member
• Other members remain fully available throughout: “Online Failover”
  – Primary retains update locks held by member at the time of failure
  – Other members can continue to read and update data not locked for write access by failed member
• Member restart completes
  – Retained locks released and all data fully available

Automatic; Ultra Fast; Online

[Diagram: clients see a single database view over the DB2 members, each with cluster services (CS) and its own log on shared data. The primary and secondary servers hold updated pages and global locks. The failed host is fenced, and its member restarts on a guest host using its log records and pages from the CF.]


Member Failback

• Power restored and system re-booted
• DB2 Cluster Services automatically detects system availability
  – Checks all pre-reqs (heartbeat, adapters, file system access, etc.)
  – Informs other members and PowerHA pureScale servers
  – Removes fence
  – Brings up member on home host
• Client connections automatically re-routed back to member

[Diagram: clients see a single database view over the DB2 members, each with cluster services (CS) and its own log on shared data. The primary and secondary servers hold updated pages and global locks. The member moves from the guest host back to its home host.]


Primary PowerHA pureScale Failure

• Power cord tripped over accidentally
• DB2 Cluster Services loses heartbeat and declares primary down
  – Informs members and secondary
  – PowerHA pureScale service momentarily blocked
  – All other database activity proceeds normally
    • E.g. accessing pages in bufferpool, existing locks, sorting, aggregation, etc.
• Members send missing data to secondary
  – E.g. read locks and page registrations
• Secondary becomes primary
  – PowerHA pureScale service continues where it left off
  – No errors are returned to DB2 members
    • Suspend lock timeouts

Automatic; Ultra Fast; Online

[Diagram: clients see a single database view over four DB2 members, each with cluster services (CS) and its own log on shared data. The primary server, holding updated pages and global locks, fails, and the secondary takes over as primary.]


Automatic Workload Balancing & Client Routing

• Run-time load information used to automatically balance load across the cluster (as in System z sysplex)
  – Load information of all members kept on each member
  – Piggy-backed to clients regularly
  – Used to route next connection (or optionally next transaction) to least loaded member
  – Routing occurs automatically (transparent to application)
• Failover
  – Load of failed member evenly distributed to surviving members automatically
• Fallback
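The routing rule can be sketched in a few lines. This is a minimal illustration, assuming the client holds a map of member name to a load figure; the function `pick_member`, the load scale and the member names are hypothetical and do not reflect the actual client driver logic.

```python
def pick_member(loads, failed=frozenset()):
    """Return the least loaded member that is not marked as failed.

    loads  -- mapping of member name to a load figure (lower is better)
    failed -- members to skip, e.g. those declared down
    """
    candidates = {m: load for m, load in loads.items() if m not in failed}
    if not candidates:
        raise RuntimeError("no healthy members available")
    return min(candidates, key=candidates.get)


if __name__ == "__main__":
    loads = {"member1": 0.7, "member2": 0.2, "member3": 0.5}
    print(pick_member(loads))                      # least loaded member
    print(pick_member(loads, failed={"member2"}))  # next best after a failure
```

Applying the same rule to each new connection after a failure is one simple way the failed member's load ends up spread over the survivors.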


Optional Affinity-based Routing

[Diagram: App Servers Group A, Group B, Group C and Group D, each with its own preferred member order.]

db2dsdriver.cfg file excerpt:

<affinity_list>
  <list name="list1" serverorder="member1,member2,member3,member4"></list>
  <list name="list2" serverorder="member2,member3,member4,member1"></list>
  <list name="list3" serverorder="member3,member4,member1,member2"></list>
  <list name="list4" serverorder="member4,member1,member2,member3"></list>
</affinity_list>
<client_affinity_defined>
  <client name="groupA" hostname="appsrv1.ibm.com" listname="list1"></client>
  <client name="groupB" hostname="appsrv2.ibm.com" listname="list2"></client>
  <client name="groupC" hostname="appsrv3.ibm.com" listname="list3"></client>
  <client name="groupD" hostname="appsrv4.ibm.com" listname="list4"></client>
</client_affinity_defined>
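To show how such an excerpt maps a client host to a member order, here is a small Python sketch that parses a shortened copy of it. The element and attribute names are copied from the slide and may differ in a shipping db2dsdriver.cfg; the wrapping `<root>` element and the function `server_order` are assumptions added so the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the slide excerpt, wrapped in a hypothetical <root>.
EXCERPT = """
<root>
  <affinity_list>
    <list name="list1" serverorder="member1,member2,member3,member4"></list>
    <list name="list2" serverorder="member2,member3,member4,member1"></list>
  </affinity_list>
  <client_affinity_defined>
    <client name="groupA" hostname="appsrv1.ibm.com" listname="list1"></client>
    <client name="groupB" hostname="appsrv2.ibm.com" listname="list2"></client>
  </client_affinity_defined>
</root>
"""


def server_order(xml_text, hostname):
    """Return the ordered member list for a client host, or None if unlisted."""
    root = ET.fromstring(xml_text.strip())
    lists = {item.get("name"): item.get("serverorder").split(",")
             for item in root.iter("list")}
    for client in root.iter("client"):
        if client.get("hostname") == hostname:
            return lists[client.get("listname")]
    return None


if __name__ == "__main__":
    print(server_order(EXCERPT, "appsrv2.ibm.com"))
```

Each group's list starts at a different member and then wraps around, so every application server group has a distinct first choice and a defined failover order.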


“Stealth” Maintenance: Example

1. Ensure automatic load balancing (default) or automatic routing is enabled
2. db2stop member 3 quiesce <timeout>
3. db2stop instance on host <hostname>
   -- also --
   db2cluster -cm -enter -maintenance
4. Perform desired maintenance (e.g. install AIX PTF)
5. db2cluster -cm -exit -maintenance
   -- also --
   db2start instance on host <hostname>
6. db2start member 3

[Diagram: single database view over four DB2 members, each with its own log.]


Low Cost of Management

• Over-arching goal: systems management comparable to a single system
  – Fully automated install
    • Including instance creation with multiple members and PowerHA pureScale servers using the GUI
  – SQL and commands to view the state of the cluster
  – High degree of automated recovery from unplanned events
  – Simple procedures for planned events
    • Grow and shrink the cluster by adding and dropping members and PowerHA pureScale servers
    • Add and remove disk space from a database
    • Hardware and software maintenance activities


Simple Installation

• Initial installation
  – Complete pre-requisite work: AIX installed, hosts on the network, access to shared disks enabled.
  – Copies the DB2 pureScale image to the Install Initiating Host.
  – Installs the code on the specified hosts using a response file.
  – Creates the instance, members, and primary and secondary PowerHA pureScale servers as directed.
  – Adds members, primary and secondary PowerHA pureScale servers, hosts, HCA cards, etc. to the domain resources.
  – Creates the cluster file system and sets up each member's access to it.

Add a member

1. Complete pre-requisite work
   – AIX installed, on the network, access to shared disks
2. Add the member
   db2iupdt -add -m <MemHostName:MemIBHostName> InstName
3. DB2 does all tasks to add the member to the cluster
   – Copies the image and response file to host6
   – Runs install
   – Adds M4 to the resources for the instance
   – Sets up access to the cluster file system for M4

You can also:
  – Drop member
  – Add / drop CF

Note: extending and shrinking the instance is an offline task in the initial release

[Diagram: the install initiating host (host0, Member 0) copies the DB2 pureScale (SD) image locally and runs the install on host1 to host3 (Members 1 to 3), host4 (primary) and host5 (secondary). To add Member 4, the image and response file are copied with scp to host6 and installed there. Every host runs cluster services (CS).]


Instance and Host Status

db2nodes.cfg

0 host0 0 - MEMBER
1 host1 0 - MEMBER
2 host2 0 - MEMBER
3 host3 0 - MEMBER
4 host4 0 - CF
5 host5 0 - CF

> db2start
08/24/2008 00:52:59 0 0 SQL1063N DB2START processing was successful.
08/24/2008 00:53:00 1 0 SQL1063N DB2START processing was successful.
08/24/2008 00:53:01 2 0 SQL1063N DB2START processing was successful.
08/24/2008 00:53:01 3 0 SQL1063N DB2START processing was successful.
SQL1063N DB2START processing was successful.

> db2instance -list

Instance status:

ID  TYPE    STATE    HOME_HOST  CURRENT_HOST  ALERT
0   MEMBER  STARTED  host0      host0         NO
1   MEMBER  STARTED  host1      host1         NO
2   MEMBER  STARTED  host2      host2         NO
3   MEMBER  STARTED  host3      host3         NO
4   CF      PRIMARY  host4      host4         NO
5   CF      PEER     host5      host5         NO

Host status:

HOST_NAME  STATE   INSTANCE_STOPPED  ALERT
host0      ACTIVE  NO                NO
host1      ACTIVE  NO                NO
host2      ACTIVE  NO                NO
host3      ACTIVE  NO                NO
host4      ACTIVE  NO                NO
host5      ACTIVE  NO                NO

[Diagram: clients see a single database view over four DB2 members on host0 to host3, with CFs on host4 and host5 and shared data.]
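A status listing like this lends itself to a simple scripted health check. The sketch below parses the instance table exactly as it appears on the slide and flags rows that need attention; it assumes whitespace-separated columns in that layout, and the functions `parse_instance_list` and `needs_attention` are hypothetical helpers, not part of DB2.

```python
# Instance table as shown on the slide (assumed layout).
SAMPLE = """\
ID TYPE STATE HOME_HOST CURRENT_HOST ALERT
0 MEMBER STARTED host0 host0 NO
1 MEMBER STARTED host1 host1 NO
2 MEMBER STARTED host2 host2 NO
3 MEMBER STARTED host3 host3 NO
4 CF PRIMARY host4 host4 NO
5 CF PEER host5 host5 NO
"""


def parse_instance_list(text):
    """Turn the whitespace-separated table into a list of dicts."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


def needs_attention(rows):
    """Rows with an alert raised, or running away from their home host."""
    return [r for r in rows
            if r["ALERT"] != "NO" or r["HOME_HOST"] != r["CURRENT_HOST"]]


if __name__ == "__main__":
    rows = parse_instance_list(SAMPLE)
    print(len(rows), "entries,", len(needs_attention(rows)), "need attention")
```

Comparing HOME_HOST with CURRENT_HOST is a convenient way to spot a member that has been restarted on a guest host, as in the hardware failure scenario earlier in the deck.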


Operational Monitoring

• New monitoring views and SQL functions
    • SYSIBMADM.DB2_CF
    • SYSIBMADM.DB2_MEMBER
  – Global locking and global bufferpool statistics
  – Drill down into other PowerHA pureScale internal statistics
  – Cluster communications time
  – Cross-member page access statistics
• Drill down per member or get global view
  – Available from any member
• Event monitors “always available” mode
  – DB2 pureScale chooses initial member automatically
  – Fails over automatically if member fails
• Various new monitoring elements
  – Example, GBP tuning related elements (partial list):
    • DATA_GBP_L_READS
    • DATA_GBP_P_READS
    • INDEX_GBP_L_READS
    • INDEX_GBP_P_READS

[Diagram: clients see a single database view over four DB2 members on host0 to host3, with CFs on host4 and host5 and shared data.]
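One common use of the GBP elements is a group buffer pool hit ratio. The slide gives no formula, so the sketch below is an assumption: it treats the L_READS counter as the number of page requests made to the GBP and the P_READS counter as the number of those that had to go to disk. The function name `gbp_hit_ratio` is introduced here for illustration.

```python
def gbp_hit_ratio(l_reads, p_reads):
    """Fraction of GBP page requests satisfied without a disk read.

    l_reads -- e.g. DATA_GBP_L_READS (assumed: requests made to the GBP)
    p_reads -- e.g. DATA_GBP_P_READS (assumed: requests that went to disk)
    Returns None when there were no requests at all.
    """
    if l_reads == 0:
        return None
    return (l_reads - p_reads) / l_reads


if __name__ == "__main__":
    ratio = gbp_hit_ratio(1000, 50)
    print(f"data GBP hit ratio: {ratio:.1%}")
```

The same calculation applies to the INDEX_GBP_L_READS and INDEX_GBP_P_READS pair.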


DB2 pureScale & Optim Tooling Solutions

• Consistent and integrated approach to offering tools for DB2 pureScale
  – Optim tools offered for enterprise and distributed versions of DB2 fully aware of DB2 pureScale environments
• Database Administration
  – Ability to perform common administration tasks across members and PowerHA pureScale servers
  – Integrated navigation through shared data instances
• System Monitoring
  – Seamless view of status and statistics across all instances
  – Includes information on locking, connection, storage, memory, etc.
• Application Development
  – Full support for developing Java, C, and .NET applications against a pureScale environment

[Screenshot callouts:]
  – Launch desired Administration Task Assistant
  – Select which member to quiesce before taking it offline
  – Select quiesce options that define how and when the action should occur
  – View, modify, or execute the commands to complete a task


Questions?