Disaster Recovery Solution for Financial Systems
Oracle iSD: 최태상 / 허은
Maximum Availability Architecture
Building high availability with Oracle9i Real Application Clusters and Data Guard
Agenda
Overview of emergencies
Oracle disaster recovery solutions
RAC and Data Guard: Maximum Availability Architecture
Demo: Data Guard
Overview of Emergencies
Disaster: a state in which an unexpected event leaves a company unable to deliver its business services for a period of time (Business Interruption, Emergency, Crisis)
Emergency Response: responding quickly while keeping data loss to a minimum (Crisis Management)
Disaster Recovery (DR): recovering data so that the business can resume (Mainframe Recovery, Distributed Recovery)
Business Continuity Planning (BCP): the preparation done in advance to keep the business running, covering the impact of business interruption and the formalization of recovery strategies (Business Resumption, Contingency Planning, Business Continuity Management)
The top concern of companies in the Internet era: system reliability and availability
High Availability (HA) is a top priority for major US organizations (IDC eWorld Survey 2001)
Downtime can cost between $2,500 and $10,000 per minute (Standish Group 2001)
Even 99.9% data availability can cost a company nearly $5m a year (The Standish Group 2001)
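The two Standish Group figures can be cross-checked with a little arithmetic. The sketch below assumes a 365-day year of continuous (24x7) operation and applies the quoted per-minute cost range to the downtime that 99.9% availability allows. The function name is illustrative, and the source does not say how the $5m figure was actually derived.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: 24x7 operation, non-leap year


def annual_downtime_cost(availability_pct, cost_per_minute):
    """Yearly cost of the downtime permitted by a given availability level."""
    downtime_minutes = MINUTES_PER_YEAR * (100 - availability_pct) / 100
    return downtime_minutes * cost_per_minute


# Per-minute costs taken from the range quoted on the slide
low = annual_downtime_cost(99.9, 2_500)
high = annual_downtime_cost(99.9, 10_000)
print(f"99.9% availability: ${low:,.0f} to ${high:,.0f} per year")
```

At the upper end of the range this comes to a little over $5 million a year, which is in line with the figure quoted above.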
Business Interruption: Cost Perspective
Business                           Average Hourly Impact
Retail Brokerage                   $6.5 million
Credit Card Sales Authorization    $2.6 million
Pay-per-View                       $150,000
Home Shopping Channels             $113,000
Catalog Sales                      $90,000
Airline Reservation Centers        $90,000
Tele-Ticket Sales                  $69,000
Package Shipping Service           $28,000
ATM Fees                           $15,000
Availability Levels
Percentage Availability    Downtime Per Year (7x24x365)
                           Days    Hours    Minutes
95%                        18      6        0
99%                        3       15       36
99.9%                      0       8        46
99.99%                     0       0        53
99.999%                    0       0        5
99.9999%                   0       0        1
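The downtime figures in the table follow directly from the availability percentage. As a minimal sketch, assuming a 365-day year of 7x24 operation and rounding to the nearest minute (the function name is illustrative), the conversion looks like this:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 7x24x365 operation


def downtime_per_year(availability_pct):
    """Return the allowed yearly downtime as (days, hours, minutes)."""
    total_minutes = round(MINUTES_PER_YEAR * (100 - availability_pct) / 100)
    days, remainder = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(remainder, 60)
    return days, hours, minutes


for pct in (95, 99, 99.9, 99.99, 99.999, 99.9999):
    d, h, m = downtime_per_year(pct)
    print(f"{pct}%: {d} days, {h} hours, {m} minutes")
```

Each additional "nine" cuts the permitted downtime by a factor of ten, which is why the step from 99.9% to 99.999% moves the budget from hours to minutes.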
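Goals of High Availability
Maximize MTTF (Mean Time To Failure): provide a reliable 24x7 business operating environment
Minimize MTTR (Mean Time To Recover): keep the impact on the business to a minimum
Keep the system accessible through every kind of failure: data corruption, user error, viruses, and so on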
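A common way to relate the two metrics is the steady-state availability formula, availability = MTTF / (MTTF + MTTR). The slide does not state this formula; the sketch below uses it with made-up numbers to show why both goals matter.

```python
def availability(mttf_hours, mttr_hours):
    """Steady-state availability: the fraction of time the system is up."""
    return mttf_hours / (mttf_hours + mttr_hours)


# Illustrative numbers only; none of these come from the source.
baseline = availability(mttf_hours=1_000, mttr_hours=10)
longer_mttf = availability(mttf_hours=2_000, mttr_hours=10)
shorter_mttr = availability(mttf_hours=1_000, mttr_hours=1)

for label, value in [("baseline", baseline),
                     ("MTTF doubled", longer_mttf),
                     ("MTTR cut to 1 hour", shorter_mttr)]:
    print(f"{label}: {value:.4%}")
```

With these example numbers, shortening recovery time improves availability more than doubling the time between failures, which is one reason recovery speed gets so much attention in the rest of the deck.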
Availability and Uptime
Availability: 99.82% (16 hours of unplanned downtime); ignores planned downtime
Uptime: 24 x 6.67 (8 hours of planned downtime); ignores unplanned downtime
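The slide does not say which period each figure is measured over. The numbers are reproduced if the 16 hours of unplanned downtime are counted per 365-day year and the 8 hours of planned downtime per week; both periods are assumptions in the sketch below, as are the function names.

```python
HOURS_PER_YEAR = 365 * 24
HOURS_PER_WEEK = 7 * 24


def availability_pct(unplanned_downtime_hours_per_year):
    """Availability counts only unplanned downtime (assumed per year)."""
    return 100 * (1 - unplanned_downtime_hours_per_year / HOURS_PER_YEAR)


def uptime_days_per_week(planned_downtime_hours_per_week):
    """Uptime counts only planned downtime (assumed per week)."""
    return (HOURS_PER_WEEK - planned_downtime_hours_per_week) / 24


print(f"Availability: {availability_pct(16):.2f}%")
print(f"Uptime: 24 x {uptime_days_per_week(8):.2f}")
```

The two metrics describe the same system from different angles, so a service can report high availability while still being offline for scheduled maintenance every week.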
Event Timeline: Day 1: Morning of Tues, Sept 11, 2001
Impacting Events and Company Actions (Corporate Response Team at Hopewell, NJ)
8:48 AM EST   Hijacked airliner crashes into WTC, North Tower
9:03 AM EST   Hijacked airliner crashes into WTC, South Tower
9:15 AM EST   Merrill Lynch NYC buildings evacuated; CRT members notified by e-mail and paging
9:25 AM EST   CRT conference call line made available as firm’s single point of contact
9:38 AM EST   Hijacked airliner crashes into Pentagon, Wash DC
9:47 AM EST   TIS Command Center established at
9:59 AM EST   WTC South Tower collapses
10:00 AM EST  ML organizations report post-evacuation updates to CRT; IPCG Executives contacted by group’s BCP Team
10:05 AM EST  Security Operations Center established
10:28 AM EST  National Sales reports closing of east coast branches in hi-rises
10:29 AM EST  WTC North Tower collapses
10:29 AM EST  Employee messages posted on voicemail and 800-MER-HELP; m/f datacenters confirmed operational; PWS begins setup at 450 Lexington Ave.
10:40 AM EST  Hijacked airliner crashes to ground in Somerset County, PA
11:40 AM EST  New York City orders evacuation of 570 Washington Street (reopens 1:00 PM)
11:45 AM EST  ML Canada reported the evacuation of all buildings; Xxx Street evacuated
Types of Emergencies
Unplanned Outages
Human Errors: Drop Tables, Administrator Errors
Data Failures and Disasters: Terror, Data Corruption, Flood, Fire, Earthquakes
System Failures: Power Outages, System Crashes
Oracle Disaster Recovery Solutions
Unplanned Outages
System Failures: Real Application Clusters. Continuous availability for all applications
Data Failures and Disasters: Real Application Clusters Guard. Zero data loss
Human Errors: Data Guard and Flashback Query. Enable users to correct their mistakes
Planned Outages
System Maintenance: Data Guard and Dynamic Reconfiguration. Capacity on demand without interruption
Data Maintenance: Online Redefinition. Adapt to changes online
Oracle9i Real Application Clusters: Availability and Architectural Benefits
What is Oracle9i Real Application Clusters?
[Diagram: Instance X (Node 1), Instance Y (Node 2), and Instance Z (Node 3) joined by a high-speed interconnect and sharing one set of database files on shared storage]
Architecture of Real Application Clusters
[Diagram labels: Users; Network; Centralized Management Console; Clustered Database Servers; Shared Cache; Low Latency Interconnect (i.e. VIA or Proprietary); High Speed Switch or Interconnect; Hub or Switch Fabric; Storage Area Network; Mirrored Disk Subsystem; No Single Point Of Failure]
Benefits of RAC (Real Application Clusters): speedup, scalability, and maximum availability
[Diagram: Instance X (Node 1), Instance Y (Node 2), Instance Z (Node 3)]
Larger data volumes
Higher throughput rate
Higher reliability and availability
Lower response times
Greater user population
Greater complexity
Non user tasks
Less management overhead
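Advantages of Real Application Clusters
Efficient message exchange between nodes
Data stays accessible unless every node fails
Recovery cost grows with the number of system failures, not with the number of nodes
Clustering-based optimizer
Workload load balancing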
Oracle9i Real Application Clusters: Optimal Configuration
Optimal RAC Configuration
[Diagram: Instance 1 and Instance 2 attached to one RAC Database]
Local disk of each instance: Oracle Software, Archive Logs
RAC Database: Data Files, Control Files, Redo Logs, SPFILE
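Data Guard: Configuration and Availability
What is Data Guard?
It automatically creates, manages, and operates copies of the primary database
When the primary DB is unavailable (disaster, maintenance), the standby DB serves the data and runs the programs needed to resume business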
Data Guard Architecture
[Diagram labels: Primary Site with Clients, Primary Database, and Broker Agent; Standby Site with Clients, Standby Database, and Broker Agent; Data Guard Broker; Data Changes flowing from the primary to the standby]
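How does it work?
The primary DB's redo logs are shipped to the standby DB
The standby DB is kept continuously in sync with the contents of the primary DB
While the primary DB is active, the standby DB runs in recovery mode or is opened in Read-only / Read-Write mode
The standby DB can take over as the primary DB when needed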
Data Guard Configuration
Managed as a single configuration
Primary and standby databases can be single-instance Oracle or Real Application Clusters
Up to nine standby databases supported in a single configuration
[Diagram: Primary Database at the Primary Site; Standby Databases at Standby Site A and Standby Site B]
Data Guard Configuration Details (Physical/Logical)
[Diagram labels: Primary Database; Transactions; Online Redo Logs; LGWR (Synchronous/Asynchronous); ARCH (Synchronous); Archived Redo Logs; FAL; Oracle Net (Affirm/NoAffirm); RFS; Standby Redo Logs; ARCH; Archived Redo Logs; MRP/LSP; Transform Redo to SQL for SQL Apply; Standby Database; Backup / Reports]
Data Guard: Redo Apply
[Diagram labels: Primary Database; Sync or Async Redo Shipping; Network; Optional Delay; Redo Apply; Physical Standby Database; Backup; Data Guard Broker]
Physical Standby Database is a block-for-block copy of the primary database
Uses the database recovery functionality to apply changes
Can be opened in read-only mode for reporting/queries
Can also perform backups, offloading the production database
Data Guard: SQL Apply
[Diagram labels: Primary Database; Sync or Async Redo Shipping; Network; Optional Delay; Transform Redo to SQL and Apply; Logical Standby Database; Additional Indexes & Materialized Views; Continuously Open for Reports; Data Guard Broker]
Logical Standby Database is an open, independent, active database
Contains the same logical information (rows) as the production database
Physical organization and structure can be very different
Can host multiple schemas
Can be queried for reports while logs are being applied via SQL
Can create additional indexes and materialized views for better query performance
Protection from Human Error and Data Corruption
Applying the changes received from the primary can be delayed at the standby, which leaves time to detect user errors before they reach the standby
The apply process also revalidates the log records to prevent application of any log corruption
[Diagram: Production Database at the Primary Site; Standby Database at the Standby Site; Optional Delayed Apply]
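The delayed-apply idea can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the class, its methods, and the abstract time units are hypothetical, and it does not represent how Data Guard is implemented or operated.

```python
from collections import deque


class DelayedApplyStandby:
    """Toy standby that holds received changes for `delay` time units."""

    def __init__(self, delay):
        self.delay = delay
        self.pending = deque()  # (received_at, change), in arrival order
        self.applied = []

    def receive(self, received_at, change):
        """Store a change shipped from the primary without applying it."""
        self.pending.append((received_at, change))

    def apply_due(self, now):
        """Apply every change that has waited at least `delay`."""
        while self.pending and now - self.pending[0][0] >= self.delay:
            self.applied.append(self.pending.popleft()[1])

    def discard_pending(self):
        """Drop unapplied changes once an operator has spotted a mistake."""
        dropped = [change for _, change in self.pending]
        self.pending.clear()
        return dropped


standby = DelayedApplyStandby(delay=30)
standby.receive(0, "INSERT order 1")
standby.receive(20, "DROP TABLE orders")  # user error on the primary
standby.apply_due(now=35)                 # only the first change is due
dropped = standby.discard_pending()       # error caught inside the window
print(standby.applied, dropped)
```

The longer the delay, the more time there is to notice a mistake, at the price of a standby that lags further behind the primary.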
Switchover & Failover
Primary and standby role transitions
Switchover
– Planned role reversal
– No database reinstantiation required
Failover
– Unplanned failure (e.g. disasters) of primary
– Primary database must be reinstantiated
Initiated using SQL or GUI interface
Data Guard automates the processes involved, including full support for RAC
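The difference between the two role transitions can be captured in a small state model. The sketch below is purely illustrative: the `Database` class, the role names, and both functions are hypothetical and are not Data Guard commands or APIs.

```python
from dataclasses import dataclass


@dataclass
class Database:
    name: str
    role: str  # "primary", "standby", or "unavailable"
    needs_reinstantiation: bool = False


def switchover(primary, standby):
    """Planned role reversal: both databases stay usable in new roles."""
    primary.role, standby.role = "standby", "primary"


def failover(failed_primary, standby):
    """Unplanned loss of the primary: the standby takes over."""
    standby.role = "primary"
    failed_primary.role = "unavailable"
    failed_primary.needs_reinstantiation = True  # rebuild before reuse


site_a = Database("site_a", "primary")
site_b = Database("site_b", "standby")

switchover(site_a, site_b)   # planned maintenance on site_a
failover(site_b, site_a)     # later, site_b is lost in a disaster
print(site_a, site_b)
```

After the switchover neither database needs to be rebuilt, whereas after the failover the lost primary is flagged for reinstantiation, matching the two bullet lists above.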
Flexible Data Protection Modes
Protection Mode         Risk of Data Loss                               Redo Shipment
Maximum Protection      Zero; double failure protection                 Synchronous redo shipping to 2 sites
Maximum Availability    Zero; single failure protection                 Synchronous redo shipping
Maximum Performance     Minimal data loss, usually 0 to a few seconds   Asynchronous redo shipping
Balance cost, availability, performance, and transaction protection
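Maximum Availability Architecture (MAA)
Overview
Maximum Availability Architecture
A reliable system architecture built on Oracle's disaster recovery solutions
A 3-tier configuration that maximizes MTTF while minimizing MTTR
Built from RAC and Data Guard
Can be customized to meet customer requirements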
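Maximum Availability Architecture: Components
Key components of MAA
• Redundant middle tier and applications
• Redundant network infrastructure
• Redundant storage infrastructure
• Real Application Clusters (RAC) for recovery from system and instance failures
• Data Guard (DG) for recovery from human error and data file failures
• Well-organized operational processes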
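Real Application Clusters Guard
Data Guard and Real Application Clusters complement each other
Building the disaster recovery system on RAC improves availability
Data Guard protects against disasters and data loss
RAC does not protect against data corruption or human error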
Maximum Availability Architecture
Recovery Times in the Maximum Availability Architecture
Outages                           Solution Sets                   MTTR
Hardware and Software Patches     RAC or Data Guard               <= 30 minutes
Human Errors and Data Failures    Data Guard                      <= 30 minutes
Host and Instance Failures        RAC                             <= 1-5 minutes
Site Disasters                    Data Guard on Secondary Site    <= 30 minutes
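Performance Considerations When Building a Disaster Recovery System
Build the primary and standby databases to the same specification
– The standby may have to carry the production workload
Effect of network speed on overall response time
– Network latency will increase response time
  Remote write = network round trip time + local write I/O time
– Bandwidth > max redo generation rate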
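The two sizing rules above translate directly into a quick check. The sketch below uses hypothetical function names and made-up example figures; real values would have to be measured on the target network and workload.

```python
def remote_write_ms(network_round_trip_ms, local_write_io_ms):
    """Remote write = network round trip time + local write I/O time."""
    return network_round_trip_ms + local_write_io_ms


def bandwidth_is_sufficient(bandwidth_mb_per_s, max_redo_rate_mb_per_s):
    """The link must carry more than the peak redo generation rate."""
    return bandwidth_mb_per_s > max_redo_rate_mb_per_s


# Illustrative numbers only; none of these come from the source.
latency = remote_write_ms(network_round_trip_ms=10, local_write_io_ms=4)
ok = bandwidth_is_sufficient(bandwidth_mb_per_s=12, max_redo_rate_mb_per_s=8)
print(f"remote write: {latency} ms, bandwidth sufficient: {ok}")
```

With synchronous shipping every commit waits for the remote write, so the round trip time is added to the response time seen by the application.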
Data Guard vs. Remote Mirroring: Efficiency
Advantages of Oracle Data Guard
– Better performance: only changes are propagated to a standby database, not every I/O
– Better resilience: changes are logically validated
– Better error tolerance: application of changes can be delayed to back out mistakes
– Better ROI: the standby allows read-only access
Advantages of remote mirroring
– Useful for non-database files
Data Guard vs. Remote Mirroring: Performance
If mirroring is used for a database, then the database files, the online logs, and the archive logs must all be mirrored, resulting in much worse performance
Compared with a standby database, remote mirroring uses 7x the network bandwidth and 27x the network I/Os
Demo Environment
Demo Scenario
Start the primary and standby DBs and check the environment
Gap Detection and Resolution (after a network failure)
Delayed Apply of Redo Information (after a human error)
Creating a configuration with DGM (Data Guard Manager)
Force Logging on the primary database (when NOLOGGING statements are run)
Graceful Switchover (for a planned system shutdown)
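The gap detection step in the scenario can be pictured as finding holes in the sequence of logs the standby has received. The sketch below is a simplified illustration with a hypothetical function and made-up sequence numbers; it does not show how Data Guard detects or resolves gaps internally.

```python
def find_gaps(received_sequences):
    """Return the missing log sequence numbers between lowest and highest."""
    received = set(received_sequences)
    if not received:
        return []
    return [seq for seq in range(min(received), max(received) + 1)
            if seq not in received]


# Sequence numbers that reached the standby; 103 and 104 were lost
# while the network was down (illustrative values).
received = [101, 102, 105, 106]
missing = find_gaps(received)
print(f"logs to fetch again from the primary: {missing}")
```

Resolving the gap then amounts to fetching the missing logs from the primary and applying them in order before continuing with later ones.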
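Overall Considerations When Building a Disaster Recovery System
What does data loss or a system outage cost?
What are the options for replicating critical data to the recovery system?
Is a minimal amount of data loss acceptable?
Are there performance factors that need to be considered?
Can the disaster recovery system be built without changing the underlying infrastructure?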
Questions & Answers