High Availability and Fault-Tolerance in Real-Time Databases
Jan Lindström
University of Helsinki
Department of Computer Science
Overview
The causes of downtime
Availability solutions
CASE 1: Clustra
CASE 2: TelORB
CASE 3: RODAIN
The Causes of Downtime
Planned downtime
• Hardware expansion
• Database software upgrades
• Operating system upgrades
Unplanned downtime
• Hardware failure
• OS failure
• Database software bugs
• Power failure
• Disaster
• Human error
Traditional Availability Solutions
Replication
Failover
Primary restart
CASE 1: Clustra
Developed for telephony applications such as mobility management and intelligent networks.
Relational database with location and replication transparency.
Real-time data is locked in main memory, and the API provides precompiled transactions.
NOT a real-time database!
Clustra hardware architecture
Data distribution and replication
How Clustra Handles Failures
Real-time failover: Hot-standby data is up to date, so failover occurs in milliseconds.
Automatic restart and takeback: Restart of the failed node and takeback of operations are automatic and, again, transparent to users and operators.
Self-repair: If a node fails completely, data is copied from the complementary node to a standby. This is also automatic and transparent.
Limited failure effects
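The failover and self-repair behaviour described above can be illustrated with a minimal sketch. It is not Clustra's implementation: the `Node` and `ReplicatedFragment` classes and all method names are hypothetical, and writes are assumed to reach both replicas before they are acknowledged.

```python
class Node:
    """Hypothetical database node holding one replica in memory."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True


class ReplicatedFragment:
    """One data fragment kept on a primary node and a hot-standby node."""

    def __init__(self, primary, standby):
        self.primary = primary
        self.standby = standby

    def write(self, key, value):
        # Assumption: every write is applied to both replicas, so the
        # hot standby is always up to date.
        for node in (self.primary, self.standby):
            if node is not None and node.alive:
                node.data[key] = value

    def read(self, key):
        return self.primary.data[key]

    def fail_primary(self):
        # Failover: the standby already holds current data, so it is
        # promoted directly without replaying anything.
        self.primary.alive = False
        self.primary, self.standby = self.standby, None

    def self_repair(self, spare):
        # Self-repair: copy the data from the surviving (complementary)
        # node to a spare node, which becomes the new standby.
        spare.data = dict(self.primary.data)
        self.standby = spare
```

Because the standby needs no catch-up step at promotion time, the failover itself is only a role change, which is what makes a very short failover plausible.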
How Clustra Handles Upgrades
Hardware, operating system, and database software upgrades without ever going down.
• Process called “rolling upgrade”
– I.e., the required changes are performed node by node.
– Each upgraded node catches up to the status of its complementary node.
– When this is completed, the operation is performed on the next node.
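The rolling-upgrade steps above can be sketched as follows. This is an illustration under stated assumptions, not Clustra's procedure: the `Node` dataclass, the `rolling_upgrade` function, and the representation of nodes as (node, complementary node) pairs are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node with a software version and an in-memory replica."""
    name: str
    version: str
    online: bool = True
    data: dict = field(default_factory=dict)


def rolling_upgrade(pairs, new_version):
    """Upgrade every node, one at a time; return the upgrade order.

    pairs: list of (node, complementary_node) tuples.
    """
    upgraded = []
    for a, b in pairs:
        for node, partner in ((a, b), (b, a)):
            if not partner.online:
                raise RuntimeError(
                    f"{partner.name} must stay online while {node.name} is upgraded"
                )
            node.online = False             # only this node leaves service
            node.version = new_version      # perform the required change
            node.data = dict(partner.data)  # catch up to the complementary node
            node.online = True              # back in service before the next node
            upgraded.append(node.name)
    return upgraded
```

At every point in the loop at most one node is offline, and its complementary node is required to be online, so the data stays reachable throughout the upgrade.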
CASE 2: TelORB
Characteristics
• Very high availability (HA), robustness implemented in SW
• (Soft) real time
• Scalability by using loosely coupled processors
Openness
• Hardware: Intel/Pentium
• Language: C++, Java
• Interoperability: CORBA/IIOP, TCP/IP, Java RMI
• 3rd party SW: Java
TelORB Availability
Real-time object-oriented DBMS supporting
• Distributed transactions
• ACID properties expected from a DBMS
• Data replication (providing redundancy)
• Network redundancy
Software configuration control
• Automatic restart, on the working processors, of processes that originally executed on a faulty processor
• Self healing
In-service upgrade of software with no disturbance to operation
Hot replacement of faulty processors
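As a rough illustration of the automatic restart described above, the sketch below moves the processes of a faulty processor onto the least-loaded working ones. The placement policy, the function name `restart_on_working`, and the dictionary representation are assumptions made for this example; the slides do not say how TelORB chooses the target processor.

```python
def restart_on_working(placement, processors, failed):
    """Return a new process-to-processor placement without `failed`.

    placement:  dict mapping process name -> processor name
    processors: list of all processor names
    failed:     name of the faulty processor
    """
    working = sorted(p for p in processors if p != failed)
    if not working:
        raise RuntimeError("no working processor left")

    # Current number of processes on each working processor.
    load = {p: 0 for p in working}
    for cpu in placement.values():
        if cpu in load:
            load[cpu] += 1

    new_placement = dict(placement)
    for proc in sorted(p for p, cpu in placement.items() if cpu == failed):
        # Assumed policy: restart on the least-loaded working processor.
        target = min(working, key=lambda p: load[p])
        new_placement[proc] = target
        load[target] += 1
    return new_placement
```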
Automatic Reconfiguration
[Figure: automatic reconfiguration; surviving label: "reloading"]
Software upgrade
Smooth software upgrade when the old and new versions of the same process can coexist
Possibility for the application to arrange for state transfer between the old and new static process (unless important states are already stored in the database)
Partitioning: Types and Data
[Figure: partitioning diagram; surviving labels: A, B and 17 to 22]
Advantages
Standard interfaces through CORBA
Standard languages: C++, Java
Based on commercial hardware
(Soft) real-time OS
Fault tolerance implemented in software
Fully scalable architecture
Includes powerful middleware: a database management system and functions for software management
Fully compatible simulated environment for development on Unix/Linux/NT workstations
CASE 3: RODAIN
Real-Time Object-Oriented Database Architecture for Intelligent Networks
Real-time main-memory database system
Runs on a real-time OS: Chorus/ClassiX (and Linux)
RODAIN Cluster
RODAIN Database Node
[Figure: a Database Primary Unit and a Database Mirror Unit with a shared disk. Each unit contains a User Request Interpreter Subsystem, a Distributed Database Subsystem, a Fault-Tolerance and Recovery Subsystem, a Watchdog Subsystem, and an Object-Oriented Database Management Subsystem.]
RODAIN Database Node II
ORD Architecture
[Figure: components labeled TRP, FTRS, DDS, ORD, OCC, Data, Index]
Fault-Tolerance
Based on logs and mirroring
Logs are sent to the Mirror
Mirror stores the logs on disk in SSS
Mirror maintains a copy of the main-memory database
Mirror makes disk copies of its database image
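The log-and-mirror scheme above can be sketched in a few lines. This is a simplified illustration, not RODAIN code: the `Primary` and `Mirror` classes, the `(lsn, key, value)` record format, and the use of in-memory lists and dictionaries to stand in for the disk in SSS are all assumptions.

```python
class Mirror:
    """Hypothetical mirror unit: stores logs and keeps a database copy."""

    def __init__(self):
        self.memory_db = {}   # mirror's copy of the main-memory database
        self.disk_log = []    # stands in for the log stored on disk in SSS
        self.disk_image = {}  # stands in for the database image on disk
        self.image_lsn = 0    # last log sequence number covered by the image

    def receive(self, record):
        lsn, key, value = record
        self.disk_log.append(record)  # store the log record
        self.memory_db[key] = value   # keep the in-memory copy current

    def checkpoint(self):
        # Make a disk copy of the mirror's current database image.
        self.disk_image = dict(self.memory_db)
        self.image_lsn = self.disk_log[-1][0] if self.disk_log else 0


class Primary:
    """Hypothetical primary unit: applies updates and ships logs."""

    def __init__(self, mirror):
        self.memory_db = {}
        self.mirror = mirror
        self.next_lsn = 1

    def commit(self, key, value):
        record = (self.next_lsn, key, value)
        self.next_lsn += 1
        self.memory_db[key] = value
        self.mirror.receive(record)   # logs are sent to the Mirror
        return record
```

In this arrangement the primary never writes to disk itself on the normal path; disk writes for both logs and images happen on the mirror side.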
Recovery
Based on role switching
When Primary fails
• Mirror brings its MMDB up to date
• Mirror starts acting as the new Primary
• Active transactions are restarted or lost
When Mirror fails
• Primary stores logs directly to SSS
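A minimal sketch of the role switch described above, under stated assumptions: the `Unit` class, its attribute names, and the `(key, value)` log records are hypothetical, and plain lists stand in for the network link to the peer and for SSS.

```python
class Unit:
    """Hypothetical database unit that can act as primary or mirror."""

    def __init__(self, role):
        self.role = role        # "primary" or "mirror"
        self.memory_db = {}
        self.unapplied = []     # log records received but not yet applied
        self.peer_alive = True
        self.shipped = []       # records sent to the peer unit
        self.sss_log = []       # records written directly to SSS

    def take_over(self, active_txns):
        """Called on the mirror when the primary fails."""
        for key, value in self.unapplied:  # bring the MMDB up to date
            self.memory_db[key] = value
        self.unapplied = []
        self.role = "primary"              # start acting as the new primary
        self.peer_alive = False            # the old primary is gone
        # Uncommitted work is not recovered here; the caller decides
        # whether to restart these transactions or drop them.
        return list(active_txns)

    def commit(self, key, value):
        if self.role != "primary":
            raise RuntimeError("only the primary commits transactions")
        self.memory_db[key] = value
        # With no live mirror, logs go directly to SSS.
        target = self.shipped if self.peer_alive else self.sss_log
        target.append((key, value))
```

The same fallback in `commit` covers both cases on the slide: a primary whose mirror has failed, and a former mirror that has just taken over and has no peer yet.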
Recovery II
During recovery the failed node
• always starts as a mirror node
• loads the most recent database image from disks in SSS
• applies the log tail to the loaded image
• receives the logs from the primary node
• continues as a normal mirror node
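The recovery steps above can be sketched as one function. This is an illustration only: the function name `recover_as_mirror`, the `(lsn, key, value)` record format, and the use of log sequence numbers to find the log tail are assumptions, not details given in the slides.

```python
def recover_as_mirror(disk_image, image_lsn, disk_log, incoming):
    """Rebuild a failed node's database; return (role, database, last_lsn).

    disk_image: most recent database image stored on disk
    image_lsn:  last log sequence number already reflected in the image
    disk_log:   log records stored on disk
    incoming:   log records received from the primary during recovery
    """
    role = "mirror"              # a recovering node always starts as a mirror
    db = dict(disk_image)        # load the most recent database image
    last_lsn = image_lsn

    def apply(records):
        nonlocal last_lsn
        for lsn, key, value in records:
            if lsn > last_lsn:   # skip records the database already reflects
                db[key] = value
                last_lsn = lsn

    apply(disk_log)              # apply the log tail to the loaded image
    apply(incoming)              # then the logs received from the primary
    return role, db, last_lsn
```

Skipping records at or below the last applied sequence number makes the two log sources safe to overlap, which matters if the primary resends records that were already on disk.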
Further reading
Bratsberg, Humborstad: Online Scaling in a Highly Available Database, Proceedings of the 27th VLDB Conference, Rome, Italy, pp. 451-460, 2001.
Clustra Database: Technical Overview, http://www.clustra.com
Björnerstedt, Ketoja, Sintorn, Sköld: Replication between Geographically Separated Clusters - An Asynchronous Scalable Replication Mechanism for Very High Availability, Proceedings of the International Workshop on Databases in Telecommunications II, LNCS vol. 2209, pp. 102-115, 2001.
Lindström, Niklander, Porkka, Raatikainen: A Distributed Real-Time Main-Memory Database for Telecommunications, Proceedings of the International Workshop on Databases in Telecommunications, LNCS vol. 1819, pp. 158-173, 2000.