Evolving the Enterprise’s Database Infrastructure “Move to the Grid”

Agenda

• Problems
• Introduction to Oracle 10g features
• Demonstrate impact on the Enterprise
• Propose Phase I Project
  – Consolidation
  – Scalable Grid Architecture

Top 7 Problems for DBAs

• Growth in the number and size of databases is not matched by staffing levels

• Root causes of performance bottlenecks are not obvious or easily diagnosed

• After a session ends, statistics and troubleshooting information are not always available

• Databases are shoehorned onto servers without consideration of correct layout, leading to I/O bottlenecks

Top 7 Problems for DBAs

• Impossible to manually monitor and tune all databases

• Managing storage correctly is very time-consuming

• Database tuning is part experience, part science, part art and part intuition.

Top Problems for Sysadmins

• Many different servers, different architectures

• High number of databases per single node – complex to schedule maintenance windows

• Grey area between DBA and sysadmin responsibilities

New in 10g

• The vision for the grid
• 10g not a regular database upgrade
• RAC enhancements
• Backup strategy
• ASM (Automatic Storage Management)
• ADDM & Advisors
• DataGuard

Problems Solved in 10g for DBAs

• Some tedious and time-consuming DBA tasks are now managed by Oracle

• Oracle will identify root causes of performance issues and rank the effectiveness of fixing them

• Oracle stores statistics about every session in its repository

• ASM will rebalance storage to relieve hot spots, making it easier to host many databases on a server

Problems Solved in 10g for DBAs

• Out-of-the-box metrics and alerts in 10g will allow the DBAs to be more proactive

• ASM will allow Oracle to manage storage, reducing this very time-consuming task

• Oracle 10g provides advisors for tuning

The vision for the Grid

• The “g” in 10g
• Grid is not RAC, RAC is not Grid
• Treat all computing resources like a utility in all layers of the product stack
• Clustered application servers (iAS cluster)
• Clustered database (RAC)
• Automatic Storage Management (ASM) for provisioning storage

The vision for the Grid

• Scalability – Easily add more resources
• Management, monitoring and provisioning with “Grid Control”
• Virtualization of resources – Applications are not tied to specific hardware but rather see one large pool of resources

10g NOT a regular database upgrade

• Big learning curve
• Changes at all levels of the hardware stack
• Good opportunity to define job responsibilities in relation to the hardware stack

The grid hardware stack

• Application servers (ISR/NCS depending on application)
• Databases (DBA Team / ISR)
• Load balancers/Interconnects/Network Infrastructure (NCS)
• Servers (NCS Sysadmins)
• Storage Architect (NCS)
• Cluster (Sysadmins/Storage Architects)
• Firewall appliances (NCS)
• Backups (DBA / NBU Admins)

RAC Enhancements

• FAN – Fast Application Notification
• Smarter load balancing across nodes
  – Can now mix different classes of servers in your cluster; this gives the ability to leverage existing hardware
  – Before grid, some servers were almost always idle and some were never idle; grid makes the best use of resources
• Assign % of CPU usage to a Service
• Better management of workload
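
The point about mixing server classes can be illustrated with a small sketch. This is not Oracle's load-balancing algorithm; the node names and capacity numbers are hypothetical, and the rule used here (send each new session to the node with the lowest load relative to its capacity) is only one simple way to keep a large and a small server equally busy.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str          # hypothetical node name
    capacity: int      # relative capacity, e.g. number of CPUs
    sessions: int = 0  # sessions currently placed on this node

    @property
    def utilization(self) -> float:
        return self.sessions / self.capacity


def assign_sessions(nodes, n_sessions):
    """Place each new session on the node with the lowest utilization.

    Ties are broken by node name so the result is deterministic.
    """
    for _ in range(n_sessions):
        target = min(nodes, key=lambda node: (node.utilization, node.name))
        target.sessions += 1
    return {node.name: node.sessions for node in nodes}


if __name__ == "__main__":
    cluster = [Node("big", capacity=8), Node("small", capacity=2)]
    print(assign_sessions(cluster, 10))
```

With these assumed capacities, the larger node ends up with four times as many sessions as the smaller one, so neither sits idle while the other is saturated.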

Backup Philosophy in 10g

• Backups go to disk, not tape
• Flashback logs
  – Supports flashback database and recovery through resetlogs
• Flash recovery area
  – On disk
  – Holds one full backup
  – Holds all incrementals
  – Archive & flashback logs
  – Backed up and managed by RMAN
  – Flash recovery area backed up to tape
  – Best practice: Use ASM for this area
  – Shared by all instances on server
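
Because the flash recovery area holds a full backup, the incrementals, and the archive and flashback logs, its size has to be planned. The arithmetic below is a rough rule-of-thumb sketch, not an Oracle formula; the function name, the headroom factor and all figures are assumptions for illustration.

```python
def fra_size_gb(full_backup_gb, daily_incremental_gb, daily_archive_gb,
                daily_flashback_gb, retention_days, headroom=0.2):
    """Estimate flash recovery area size in GB.

    Assumes one full backup plus `retention_days` worth of incrementals,
    archive logs and flashback logs, with a safety margin on top.
    """
    daily_gb = daily_incremental_gb + daily_archive_gb + daily_flashback_gb
    base_gb = full_backup_gb + retention_days * daily_gb
    return base_gb * (1 + headroom)


if __name__ == "__main__":
    # Hypothetical database: 100 GB full backup, 20 GB of changes per day,
    # one week kept on disk before the area is swept to tape.
    print(fra_size_gb(100, 5, 10, 5, retention_days=7))
```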

Backup philosophy in 10g

• Benefits
  – Most backup failures today are due to NBU, at a rate of 5 or 6 per day. Each one requires operations to resubmit the backup and DBA time to follow up.
  – Backup time is now 4-6 hours (for MCGP)
  – Lots of time is spent waiting on tape
  – Recovery from tape is slow; new features help minimize downtime
  – All files needed for recovery are in the same location
  – Having this on ASM minimizes the work of maintaining archivelog free space (avoids database hangs)

Automatic Storage Management

• Oracle’s “Smart” Filesystem
• DBAs only have to deal with a few diskgroups rather than trying to fit datafiles on fixed-size mountpoints.
• Raw partitions have always been recommended for performance but, before ASM, were very difficult to manage
• ASM can stripe and mirror your storage (optional)
• ASM can rebalance to avoid hot spots
• Managing storage is very time-consuming to do right; ASM does the tedious tasks for you.
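
Striping and rebalancing can be pictured with a toy model. The sketch below is not how ASM is implemented; the disk names are hypothetical, placement is simple round-robin, and the rebalance assumes disks are only added, never dropped. It shows the idea that adding a disk moves only a share of the extents rather than forcing a full re-layout.

```python
from collections import Counter


def stripe(n_extents, disks):
    """Round-robin placement: extent i goes to disk i mod len(disks)."""
    return [disks[i % len(disks)] for i in range(n_extents)]


def rebalance(placement, disks):
    """Even out extents across `disks`, moving only from over-full disks.

    Assumes every disk already used in `placement` is still in `disks`.
    Returns the new placement and the number of extents moved.
    """
    placement = list(placement)
    counts = {d: 0 for d in disks}
    for d in placement:
        counts[d] += 1
    base, extra = divmod(len(placement), len(disks))
    # Target share per disk; the first `extra` disks hold one more extent.
    target = {d: base + (1 if i < extra else 0) for i, d in enumerate(disks)}
    moves = 0
    for i, d in enumerate(placement):
        if counts[d] > target[d]:
            dest = next(x for x in disks if counts[x] < target[x])
            counts[d] -= 1
            counts[dest] += 1
            placement[i] = dest
            moves += 1
    return placement, moves


if __name__ == "__main__":
    layout = stripe(12, ["d1", "d2", "d3"])
    new_layout, moved = rebalance(layout, ["d1", "d2", "d3", "d4"])
    print(Counter(new_layout), "extents moved:", moved)
```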

ADDM & Advisors

• Oracle has internalized metric collection in 10g
• ADDM runs and looks for problems
• ADDM will recommend the use of advisors to further investigate the problem
• Will help the DBA (and developer) by providing tuning advice.

DataGuard

• What is redo?
• RAC = Instance availability
• DataGuard = Database availability
• Logical and Physical standby
• Protect database vs. Provide service
• All enterprise systems should have DataGuard
• Imagine losing an hour of committed transactions in Banner or Vista
• Time to rebuild an enterprise system?
• Uses for DWH

Phase I Project scope

• Bring in required infrastructure
• Consolidate
  – Tempest/Squall replaced with scalable grid technology
  – Migrate DORACs/ORACs into this architecture

Phase I Project scope

• Current Grid Control implementation is not highly available
  – Migrate Grid Control repository database to RAC
  – Cluster application server, Norad2
  – Leverage virtualization

Required Infrastructure (Grid Control)

• Have been using Grid Control for the past two years, since it was in beta
• Not optional in 10g*
• Has helped us to develop standards and be proactive
• Upgrade to release 2 in progress
• Release 2 improves on provisioning and RAC management
• Will be used by developers as well as DBAs when we go to 10g
• In release 2, Oracle has partnered with third parties to deploy agents on non-Oracle software and appliances, including SQL Server, WebLogic and F5 load balancers

Losing Grid Control

• No monitoring and alerts for databases
• No GUI to manage 10g databases
• Loss of tools for programmers and DBAs
• Scheduled DBA jobs would not run

Required Infrastructure (OID)

Oracle Internet Directory
• ONAMES is deprecated in 10g. ONAMES is a central naming service used to translate a name to a connect string and is needed for connectivity.
• Bridge from Oracle products to Active Directory for single sign-on and authentication
• Could have many other uses to manage and simplify security in Oracle products (needs more research)
• Should be highly available, or we risk users not being able to connect to databases
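
What a central naming service does, and why losing it stops users from connecting, can be shown in a few lines. This is only an illustration: a Python dictionary stands in for the directory (OID itself is queried over LDAP), and the alias, host and service names are hypothetical. The connect descriptor follows the usual Oracle Net syntax.

```python
# Hypothetical directory contents: alias -> connection details.
DIRECTORY = {
    "proddb": {
        "host": "dbhost1.example.com",
        "port": 1521,
        "service": "proddb.example.com",
    },
}


def connect_descriptor(host, port, service):
    """Build an Oracle Net connect descriptor string."""
    return (
        f"(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port}))"
        f"(CONNECT_DATA=(SERVICE_NAME={service})))"
    )


def resolve(alias, directory=DIRECTORY):
    """Translate a short name to a connect string, as a naming service would."""
    try:
        entry = directory[alias.lower()]
    except KeyError:
        # No entry (or no reachable directory) means the client cannot connect.
        raise LookupError(f"could not resolve alias {alias!r}") from None
    return connect_descriptor(**entry)


if __name__ == "__main__":
    print(resolve("PRODDB"))
```

Every client depends on this lookup succeeding, which is why the directory needs to be at least as available as the databases it points to.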

Required Infrastructure (OID)

Establish a two-node OID. Objectives:
• Replace ONAMES and shared TNSNAMES files as a standard naming method
• Clean up all names and investigate the use of global_names
• Replace infra1.portal.mcgill.ca for managing authentication (migrate the asdb instance on infra1 to RAC, solely for Portal metadata)

Infrastructure (worth investigating)

WebCache
• Part of Oracle application server install
• Used by Portal (but not currently installed in HA config)
  – Should be made highly available
• Should have a better understanding of how it works
• Can it benefit more than just the portal? (Improve Registration?)
• Investigate “TimesTen” data cache

Consolidation (Tempest/Squall)

• Tempest and Squall are servers funded by NCS as per a Tony Masi initiative to consolidate disparate databases from across campus.

• Tempest is a test server containing 12 databases.

• Squall is a production server containing 20 databases

• Databases serve mostly E-business group’s clients, ICS (HEAT) and ARR (Scheduling)

• Ongoing demand for new databases
• Difficult to estimate capacity and resource needs
• Not scalable and not highly available
• Best candidate for new architecture

Consolidation (Tempest/Squall)

• Set up a 10g test grid to replace Tempest
• Set up a 10g production grid to replace Squall
• Migrate to the 10g grid any applications on Tempest/Squall for which 10g is supported, as well as all McGill-developed applications currently residing on Tempest/Squall
• Migrate NCS databases
• Production grid will provide a location for any 10g database that needs to be highly available (Grid Control repository, Portal repository)
• Project should include a consultant from Oracle to review the plan, discuss best practices and guide the initial setup of the test environment
• Good learning experience before restructuring large enterprise systems (Vista, Banner)

Risks of non-action

• Not a Tony Masi “Top 5” project, but if we do not accomplish Phase I and gain the needed knowledge, we will not meet next year’s objectives (i.e. Vista upgrade, Banner upgrade)
• Staff resources continue to be stressed
• Advantages of new best practices for RMAN and backups of flash recovery area
• Development of methodology for migrating to cost-based optimizer
• Learning best practices for ASM on Hitachi SAN
• Benefiting from new features in OEM (monitoring, tuning and provisioning)
• New failover and load balancing features on RAC (FAN – Fast Application Notification)
• Setup and configuration of 10g RAC

Key Skills to Develop

• Best practice to migrate 9i RAC to 10g RAC
• Correct use of WebCache
• Understand implications of global_names=true
• Get developers up to speed on writing good code and performance tuning, as well as trained on using new 10g tools
• Oracle Internet Directory

Summary

• Big learning curve
• Need to move forward or future projects will be in jeopardy of failure
• All levels of hardware stack are implicated