Upload
markus-michalewicz
View
1.828
Download
1
Tags:
Embed Size (px)
DESCRIPTION
Cloud Consolidation with Oracle (RAC) – How much is too much?
Citation preview
5/14/2012
1
1 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
2 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Cloud Consolidation with Oracle (RAC)
– How much is too much? [email protected]
Senior Principal Product Manager
Oracle RAC, Oracle America
Technical Director, Oracle RAC
Development, Oracle America
5/14/2012
2
3 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, and timing of any features
or functionality described for Oracle’s products remains at the sole discretion of Oracle.
4 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Agenda • Database Cloud Architectures
• General Considerations
• Database Consolidation
– Memory Management
– CPU(_COUNT) Management
• A word on Instance Caging
– Database Resource Management
– Summary Database Consolidation
• Oracle RAC-specific Considerations for Consolidation
– Per Server Limits
– Real Time (RT) Processes
– Cluster Limits
• Summary and Q&A
5/14/2012
3
5 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
OS
ERP DW CRM
DB
OS
DB
DB
Database Cloud
Database
Database Cloud
OS
ERP DW CRM
OS
DB
Schema Server
Infrastructure Cloud
Hypervisor
CRM DW ERP
OS
DB
OS
DB
OS
DB
Hypervisor
Database Cloud Architectures Common building blocks are shared server and storage pools
6 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
General Considerations
5/14/2012
4
7 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Oracle (RAC) Database Consolidation Registered vs. Running
• Registered databases and instances could
potentially start and run at the same time.
• Oracle’s Quality of Service Management
or scripts can be used to model policies
to run certain databases only at certain
times; e.g. geographic region over time
• Assume the cluster is PST based:
• EMEA based DBs run 10:00pm – 8am PST
• APAC based DBs run 6:00pm – 3am PST
• USA based DBs run 8:00am – 6pm PST
Registered:
• n DB instances
are defined to
run on a machine
(potentially)
Running:
• Registered databases
and instances are
(concurrently) running
(active workload)
vs.
8 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Database Cloud
OS
ERP DW CRM
OS
DB
Schema
Database Consolidation General Considerations
• Simplest of all consolidation cases:
• One database Instance per Server
• When consolidating more than one
database on a server, consider the
server capacity with any DB added.
• More details: Best Practices for Database Consolidation in Private Clouds http://www.oracle.com/technetwork/database/focus-
areas/database-cloud/database-cons-best-practices-1561461.pdf
• Exadata Database Machine-specific: http://www.oracle.com/technetwork/database/features/
availability/exadata-consolidation-522500.pdf
5/14/2012
5
9 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
General Considerations Consider Workload Characteristics during Capacity Planning
Utiliz
atio
n
Workload A
Time
Utiliz
atio
n
Workload B
Time
New Workloads
OR
Time
Utiliz
atio
n
Peak
Average
The smaller this
gap, the better.
Existing Workload
10 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
General Considerations Consider Workload Characteristics during Capacity Planning
+
Resulting Workload
Time
Utiliz
atio
n Peak
Average
Poor match:
Gap increases
(antagonistic)
Utiliz
atio
n
Workload B
Time
Time
Utiliz
atio
n
Peak
Average
The smaller this
gap, the better.
Existing Workload
5/14/2012
6
11 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
General Considerations Consider Workload Characteristics during Capacity Planning
Utiliz
atio
n
Workload A
Time
+
Resulting Workload
Time
Utiliz
atio
n
Peak
Average
Good match:
Gap decreases
(complimentary)
Time
Utiliz
atio
n
Peak
Average
The smaller this
gap, the better.
Existing Workload
12 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
General Considerations Consider Workload Characteristics during Downtime
OS
DBA
OS OS
ERP CRM DW HR
DBB
DBA
OS
Time
Utiliz
atio
n
Peak
Average
Utilization Normal Operation
5/14/2012
7
13 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
General Considerations Consider Workload Characteristics during Downtime
OS
DBA
OS OS
ERP CRM DW HR
DBB
DBA
OS Average
Time
Utiliz
atio
n
Peak
During Downtime
Time
Utiliz
atio
n
Peak
Average
Utilization Normal Operation
14 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
• Standardization of software and
hardware simplifies planning
• Standardized hardware means a
predictable behavior should demand
increase and additional hardware needs
to be added (horizontal scaling approach)
• Using “application profiling” (template
based deployment) based on current
system(s) and performance baselines
allows for a predictable deployment of
new applications on the same system
using existing profiles.
The Benefits of Standardization Easier deployment and better predictability
Small OLTP
Medium OLTP
Large OLTP
New
Application A
Then deploy
No
de
s
5/14/2012
8
15 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Database Consolidation
16 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Database Consolidation Starting block: One Database instance per server
• Components to consider:
• Memory
• CPU
• I/O
• Processes
• Network
DB
OS
5/14/2012
9
17 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Database Consolidation One instance per server as the basis
• An Oracle database by default assumes that there is
only one database instance running on the server:
• Instance parameters are based on this assumption
• Consolidation changes that premise
• Main resources used:
• Memory
• CPU
• I/O
• Resources “regulated by default”:
• Memory
• SGA / PGA Targets
• CPU
DB
OS
18 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Database Consolidation Recommendation 1: Manage Memory carefully (and dynamically)
• Avoid memory starvation and swapping
as it has negative impact on the system.
• Do not oversubscribe memory resources
• Define memory settings carefully – rule of thumb:
• For the OS in general:
• Shared Memory identifiers and segments
• Use Hugepages, if possible – details:
• MOS notes 361323.1 and 401749.1
• For OLTP applications:
• SUM (sga_target + pga_aggregated_target)
<= 80% of physically available memory per DB server
• For DW / BI applications:
• SUM (sga_target + 3* pga_aggregated_target)
<= 80% of physically available memory per DB server
20%
80% DB
OS
5/14/2012
10
19 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Database Consolidation Recommendation 2: Use CPU_COUNT to “cage instances”
• CPU usage should be regulated.
• The OS scheduler schedules CPU as
requested by each individual instance.
• The OS scheduler does not know about the
priority of the various instances on the server.
• Use CPU_COUNT or ideally Instance Caging
• Instance Caging is configured in just 2 steps:
• 1. Set “cpu_count” parameter
• Max. number of CPUs the instance can use at any time
• 2. Set “resource_manager_plan” parameter
• Enables CPU Resource Manager
• E.g. out-of-box plan “DEFAULT_PLAN”
OS
DB
A
DB
B
20 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Database Consolidation Using CPU_COUNT as a “central knob”
• CPU_COUNT regulates CPU usage and
dependent resources (to a certain degree):
• Parallelism (PQ operations)
• Processes
• Load Calculation
• Processes and PQ
operations should be
considered explicitly.
Data Structures
Concurrency
Parallelism
Processes
Memory Allocation
Load Calculation
OS
fx(CPU_COUNT)
16 20%
80%
DB
A
DB
B
5/14/2012
11
21 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
A Word on Instance Caging
22 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Database Consolidation Instance Caging: Partitioning Approach
• Provides maximum isolation
• For performance-critical databases
• If one database instance is idle,
its CPU allocation is unused
• The rule of thumb for the partitioning
approach is to set a general limit: • SUM (CPU_COUNT) < 75% x Total CPUs
0
4
8
12
16
20
24
28
32
Instance A: 4 CPUs
Instance B: 4 CPUs
Instance C: 4 CPUs
Instance D: 4 CPUs
OS 16
16 CPUs
DB
D
DB
C
DB
B
DB
A
5/14/2012
12
23 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Database Consolidation Instance Caging: Over-Provisioning Approach
• Best used for non-critical databases
that are typically well-behaved
• Contention for CPU if database
instances are sufficiently loaded
• Typically not enough contention
to destabilize OS or DB instances
• Best approach if the goal
is fully utilized CPUs 0
4
8
12
16
20
24
28
32
Instance A: 4 CPUs
Instance B: 4 CPUs
Instance C: 6 CPUs
Instance D: 8 CPUs
OS 16
16 CPUs
DB
A
DB
B
DB
C
DB
D
24 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Database Consolidation Instance Caging – under the covers
0
4
8
12
16
20
24
28
32
Instance A: 4 CPUs
Instance B: 4 CPUs
Instance C: 4 CPUs
Instance D: 4 CPUs
0
4
8
12
16
20
24
28
32
Instance A: 4 CPUs
Instance B: 4 CPUs
Instance C: 6 CPUs
Instance D: 8 CPUs
• If cpu_count is set to 4 on a 16 CPU server
• All foreground processes make progress
• But only 4 foregrounds are running at any time
• Most backgrounds are not managed
• Critical and use very little CPU
• MMON, Job Scheduler slaves are managed
• No CPU affinity
• Not meant for hard-partitioning or licensing
• All CPUs may be used
• CPU utilization averaged across all CPUs ≤ 25%
• More information:
http://www.oracle.com/technetwork/database/focus-areas/performance/instance-caging-wp-166854.pdf
Partitioning Approach Over-Provisioning
Approach
5/14/2012
13
25 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Database Consolidation Over-Provisioning Approach – It’s still hardware that’s the limit
• Best used for non-critical
databases that are typically
well-behaved – examples:
• Complimentary workload
• Systems with little contention
for CPU if database instances
are sufficiently loaded.
• Do not use,
• If the load is significant and
of longer duration, as system
stability can get impacted.
• For highly critical systems OS 16
16 CPUs
DB
A D
BB
DB
C D
BD
0
4
8
12
16
20
24
28
32
Instance A: 4 CPUs
Instance B: 4 CPUs
Instance C: 6 CPUs
Instance D: 8 CPUs
0
20
40
60
80
100
120
140
t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12
CPU Util.
Average
26 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Database Consolidation How much over-provisioning is OK?
• As CPU_COUNT does not consider the
“quality of a CPU” an absolute maximum
is hard to determine / depends on the system.
• The general rule of thumb is:
• SUM (CPU_COUNT) <= up to 2x Total CPUs
• Consider different system types:
OS 16
16 CPUs
DB
A
DB
B
DB
C
DB
D
0
4
8
12
16
20
24
28
32
Instance A: 4 CPUs
Instance B: 4 CPUs
Instance C: 6 CPUs
Instance D: 8 CPUs
Threaded Core based Engineered
“Total CPUs” are based on
# of threads # of cores # of threads
Max. over-provisioning factor
1.5 (HT*) 1.0 (hHT**)
2.0 3.0 (DBM) 2.0 (ODA)
(HT*): ratio 1:2; (hHT**): ratio 1:n, with n >2
5/14/2012
14
27 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Database Resource Management
28 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Database Consolidation Recommendation 3: Use DB Resource Manager
3 steps to use Resource Manager:
1. Group sessions with similar
performance objectives into
Consumer Groups
2. Allocate resources to consumer
groups using Resource Plans
3. Enable Resource Plan
5/14/2012
15
29 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Summary Database Consolidation
30 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Database Consolidation Summary
1. Define memory settings carefully:
• For OLTP applications:
• SUM (sga_target + pga_aggregated_target)
<= 80% of physically available memory per DB server
• For DW / BI applications:
• SUM (sga_target + 3* pga_aggregated_target)
<= 80% of physically available memory per DB server
2. Use CPU_COUNT or ideally Instance Caging
• The general rule of thumb is:
• SUM (CPU_COUNT) <= up to 2 x Total CPUs
• Consider different system types.
3. These are the most crucial per server limits.
0
4
8
12
16
20
24
28
32
Instance A: 4 CPUs
Instance B: 4 CPUs
Instance C: 6 CPUs
Instance D: 8 CPUs
DB
A
OS
DB
B
20%
80%
5/14/2012
16
31 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Oracle RAC-specific Considerations for Consolidation
32 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Oracle RAC based Consolidation Server limits are mostly reached before cluster limits apply
• Most customers will experience “per
server limits” before “cluster limits” apply.
• Oracle RAC introduces a few more
processes (potential limits) to consider.
• Oracle RAC DBs use LMS Real
Time (RT) processes per instance.
• LMS RT processes need to
be considered in particular
OS 16
DB
A1
D
BB
1
DB
C1
OS 16
DB
A2
D
BB
2
DB
C2
Clusterware Clusterware
Per
Serv
er
Lim
its
Cluster Limits
5/14/2012
17
33 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Real Time (RT) Processes
34 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Oracle RAC based Consolidation Considerations for Real Time (RT) Processes in general
• A Real Time process can only
run on one CPU (core) at a time.
• The usage of the CPU is typically short.
• The general rule of thumb is:
• The aggregated number of
RT processes per server should
not exceed the number of cores per server
• One Oracle RAC instance has typically at
least one RT process (LMS) per default
• An Oracle ASM instance has one RT process
• Oracle Clusterware uses various RT processes
OS 16 OS 16
Clusterware Clusterware
DB
A1
D
BB
1
DB
C1
DB
A2
D
BB
2
DB
C2
5/14/2012
18
35 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Oracle RAC based Consolidation Background for LMS Real Time (RT) Process recommendation
• The number of LMS RT processes per
instance is determined by a function on
CPU_COUNT.
• In order to guarantee optimized
performance and reliability, the
general rule of thumb for RAC is:
• The aggregated number of LMS
RT processes per server should
not exceed [cores per server]-1
• See MOS note: 558185.1 for details
• This leaves one CPU free for additional
RT processes to be assigned as needed,
as LMS RT can stay on a CPU for a moment OS 16 OS 16
DB
A1
D
BB
1
DB
C1
DB
A2
D
BB
2
DB
C2
Clusterware Clusterware
36 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Oracle RAC based Consolidation Automatic adjustment of LMS process priority in 11.2.0.3
• With 11.2.0.3 the number of LMS RT
processes are monitored and adjusted
according to the number of cores on
the node periodically.
• The goal is to keep
RT LMSs per server <= # cores per server
• For details, see MOS note 1392248.1 –
Auto-Adjustment of LMS Process Priority
in Oracle RAC with 11.2.0.3 and later
• This excludes any ASM instance running
on the system as well as any pre-11.2.0.3
database instance
• See also MOS note 1439551.1 –
Oracle (RAC) Database Consolidation Guidelines
for Environments using mixed Database Versions
OS 4 OS 4
DB
A1
D
BB
1
DB
C1
D
BD
1
DB
A2
DB
B2
DB
C2
DB
D2
DBF1
DBE1
DBF2
DBE2
Per
Serv
er
Lim
its
5/14/2012
19
37 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Oracle RAC based Consolidation Recommendation 4: Over-provision only based on CPU_COUNT
• Do not over-provision the number
of LMS RT processes on one server.
• Limit the number of RT LMSs by:
• Using CPU_COUNT
• Directly reducing the number of LMS
RT processes (gcs_server_processes)
• Downgrading additional LMS RT
processes to time share (TS).
• In 11.2.0.3 and later, the RT to CPU
rule will be enforced by automatically
downgrading subsequently started
LMS RT processes to TS.
OS OS
DB
A1
D
BB
1
DB
C1
D
BD
1
DB
A2
D
BB
2
DB
C2
D
BD
2
DBF1
DBE1
DBF2
DBE2
Per
Serv
er
Lim
its
0
4
8
12
16
20
24
28
32
Instance A: 4 CPUs
Instance B: 4 CPUs
Instance C: 6 CPUs
Instance D: 8 CPUs
48 .
.
.
.
.
.
38 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Cluster Limits
5/14/2012
20
39 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Oracle RAC based Consolidation Cluster Limits – Starting DBs are currently the main concern
• In most cases, per server limits will be
reached before cluster limits are reached.
• Cluster limits apply to:
• Registered resources (databases)
• Starting databases in the cluster – reason:
• Starting databases need to register
with the cluster (Oracle Clusterware).
• Currently and for example, 100
starting Oracle RAC databases
on a 4 node cluster are supported.
• The Number assumes each Oracle RAC
database uses an instance on each node.
OS
DB
A1
D
BB
1
DB
C1
OS
DB
A2
D
BB
2
DB
C2
Clusterware Clusterware
Cluster Limits
DB
D1
DB
D2
Starting:
• Registered databases
and instances start
• Default is “starting
at the same time”.
40 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Summary
5/14/2012
21
41 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
(Oracle RAC) Database Consolidation Summary
=
• For a successful database consolidation,
consider the following as rules of thumb:
1. General considerations for capacity planning
2. Manage Memory carefully (and dynamically)
3. Use CPU_COUNT
4. Use DB Resource Manager
5. Over-provision only based on CPU_COUNT
Time
Utiliz
atio
n
Peak
Average
Existing Workload
Data Structures
Concurrency
Parallelism
Processes
Memory Allocation
Load Calculation
+ 20%
80% DB
OS
42 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.