High Availability and Fault-Tolerance in Real-Time Databases
Jan Lindström
University of Helsinki, Department of Computer Science

Overview

The causes of downtime
Availability solutions
CASE 1: Clustra
CASE 2: TelORB
CASE 3: RODAIN

The Causes of Downtime

Planned downtime
• Hardware expansion
• Database software upgrades
• Operating system upgrades

Unplanned downtime
• Hardware failure
• OS failure
• Database software bugs
• Power failure
• Disaster
• Human error

Traditional Availability Solutions

Replication
Failover
Primary restart

CASE 1: Clustra

Developed for telephony applications such as mobility management and intelligent networks.

Relational database with location and replication transparency.

Real-time data is locked in main memory, and the API provides precompiled transactions.

NOT a real-time database!

Clustra hardware architecture

[Figure not reproduced]

Data distribution and replication

[Figure not reproduced]

How Clustra Handles Failures

Real-time failover: Hot-standby data is up to date, so failover occurs in milliseconds.

Automatic restart and takeback: Restart of the failed node and takeback of operations is automatic, and again transparent to users and operators.

Self-repair: If a node fails completely, data is copied from the complementary node to a standby. This is also automatic and transparent.

Limited failure effects
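The failover and self-repair behaviour described above can be illustrated with a minimal sketch. It is not Clustra's implementation: the `Replica` and `HotStandbyPair` classes, their methods, and the synchronous double write are assumptions chosen only to show why an up-to-date standby can take over without data loss.

```python
class Replica:
    """Hypothetical database node holding a full copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True


class HotStandbyPair:
    """Hypothetical pair: each write is applied to both nodes (assumption)."""

    def __init__(self, primary, standby):
        self.primary = primary
        self.standby = standby

    def failover(self):
        # Promote the standby; its data is already up to date.
        if self.standby is None or not self.standby.alive:
            raise RuntimeError("no live replica left")
        self.primary, self.standby = self.standby, None

    def write(self, key, value):
        if not self.primary.alive:
            self.failover()
        self.primary.data[key] = value
        if self.standby is not None and self.standby.alive:
            self.standby.data[key] = value

    def read(self, key):
        if not self.primary.alive:
            self.failover()
        return self.primary.data[key]

    def repair(self, spare):
        # Self-repair: copy the surviving node's data to a spare node.
        spare.data = dict(self.primary.data)
        self.standby = spare
```

After a failover the pair runs without a standby until `repair` is called with a spare node, which mirrors the self-repair step on the slide.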

How Clustra Handles Upgrades

Hardware, operating system, and database software upgrades without ever going down.
• Process called “rolling upgrade”
  – I.e. the required changes are performed node by node.
  – Each upgraded node catches up to the state of its complementary node.
  – When this is completed, the operation is performed on the next node.
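The rolling upgrade steps above can be sketched as a loop over the nodes. This is an illustration under stated assumptions, not Clustra's procedure: nodes are plain dictionaries with hypothetical `version`, `online`, and `data` fields, and "catching up" is modelled as copying the data of a peer that stayed online.

```python
def rolling_upgrade(nodes, new_version):
    """Upgrade nodes one at a time so that a peer is always online."""
    for node in nodes:
        peers = [n for n in nodes if n is not node]
        online_peers = [p for p in peers if p["online"]]
        if not online_peers:
            raise RuntimeError("cannot take the last online node down")

        # Take this node out of service and apply the change.
        node["online"] = False
        node["version"] = new_version

        # Catch up to the state of a complementary node that kept serving.
        node["data"] = dict(online_peers[0]["data"])

        # Only now does the loop move on to the next node.
        node["online"] = True
    return nodes
```

At every point in the loop at least one node is online, which is what allows the service as a whole to stay up during the upgrade.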

CASE 2: TelORB

Characteristics
• Very high availability (HA), robustness implemented in SW
• (Soft) real time
• Scalability by using loosely coupled processors

Openness
• Hardware: Intel/Pentium
• Language: C++, Java
• Interoperability: CORBA/IIOP, TCP/IP, Java RMI
• 3rd-party SW: Java

TelORB Availability

Real-time object-oriented DBMS supporting
• Distributed transactions
• ACID properties expected from a DBMS
• Data replication (providing redundancy)

Network redundancy

Software configuration control

Automatic restart, on the working processors, of processes that originally executed on a faulty processor

Self-healing

In-service upgrade of software with no disturbance to operation

Hot replacement of faulty processors
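The automatic restart of processes on working processors can be pictured with a small placement sketch. The `reassign` function and the "least loaded processor first" rule are assumptions made for illustration; the slides do not say how TelORB chooses the target processor.

```python
def reassign(placement, failed):
    """Move the processes of a failed processor to the surviving ones.

    placement maps a processor name to the list of processes it runs.
    Returns a new mapping that no longer contains the failed processor.
    """
    survivors = {p: list(procs) for p, procs in placement.items() if p != failed}
    if not survivors:
        raise RuntimeError("no working processor left")

    for proc in placement.get(failed, []):
        # Assumed policy: pick the processor with the fewest processes,
        # breaking ties by name so the result is deterministic.
        target = min(survivors, key=lambda p: (len(survivors[p]), p))
        survivors[target].append(proc)
    return survivors
```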

Automatic Reconfiguration

[Figure not reproduced; surviving label: reloading]

Software upgrade

Smooth software upgrade when the old and new versions of the same process can coexist

Possibility for the application to arrange for state transfer between the old and new static processes (unless the important states are already stored in the database)
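State transfer between an old and a new version of a process can be sketched as an export/import handshake. The classes `CounterV1` and `CounterV2` and the method names `export_state` and `import_state` are hypothetical; they are not TelORB APIs and only show the idea that the application decides which state is carried over.

```python
class CounterV1:
    """Old version of a hypothetical process that counts handled requests."""

    def __init__(self):
        self.count = 0

    def handle(self):
        self.count += 1

    def export_state(self):
        return {"count": self.count}


class CounterV2:
    """New version with an extra field that the old version did not have."""

    def __init__(self):
        self.count = 0
        self.errors = 0

    def handle(self):
        self.count += 1

    def import_state(self, state):
        self.count = state["count"]
        # Fields unknown to the old version fall back to a default.
        self.errors = state.get("errors", 0)


def upgrade(old):
    """Start the new version and hand it the old version's state."""
    new = CounterV2()
    new.import_state(old.export_state())
    return new
```

If the important state were already stored in the database, the new version could read it from there and no such handshake would be needed.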

Partitioning: Types and Data

[Figure not reproduced; surviving labels: A, B and 17 to 22]

Advantages

Standard interfaces through CORBA

Standard languages: C++, Java

Based on commercial hardware

(Soft) real-time OS

Fault tolerance implemented in software

Fully scalable architecture

Includes powerful middleware: a database management system and functions for software management

Fully compatible simulated environment for development on Unix/Linux/NT workstations

CASE 3: RODAIN

Real-Time Object-Oriented Database Architecture for Intelligent Networks

Real-time main-memory database system

Runs on a real-time OS: Chorus/ClassiX (and Linux)

RODAIN Cluster

RODAIN Database Node

[Figures not reproduced; surviving labels: Distributed Database Subsystem, User Request Interpreter Subsystem, Watchdog Subsystem, Fault-Tolerance and Recovery Subsystem, Object-Oriented Database Management Subsystem, Database Primary Unit, Database Mirror Unit, shared disk]

RODAIN Database Node II

ORD Architecture

[Figure not reproduced; surviving labels: TRP, FTRS, DDS, ORD, OCC, Data, Index]

Fault-Tolerance

Based on logs and mirroring
Logs are sent to the Mirror
Mirror stores the logs on disk in the SSS
Mirror maintains a copy of the main-memory database
Mirror makes disk copies of its database image
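The division of work between Primary and Mirror listed above can be sketched in a few lines. This is a simplified model, not RODAIN code: the class and method names are hypothetical, log records are plain `(key, value)` pairs, and Python lists and dictionaries stand in for the disk log and the disk image in the SSS.

```python
class Mirror:
    """Hypothetical Mirror node: logs to disk and keeps its own MMDB copy."""

    def __init__(self):
        self.mmdb = {}         # copy of the main-memory database
        self.disk_log = []     # stand-in for the log stored on disk in the SSS
        self.disk_image = {}   # stand-in for the database image on disk

    def receive(self, record):
        # Store the log record, then apply it to the in-memory copy.
        self.disk_log.append(record)
        key, value = record
        self.mmdb[key] = value

    def checkpoint(self):
        # Write a disk copy of the current database image.
        self.disk_image = dict(self.mmdb)


class Primary:
    """Hypothetical Primary node: updates its MMDB and ships the log."""

    def __init__(self, mirror):
        self.mmdb = {}
        self.mirror = mirror

    def commit(self, key, value):
        self.mmdb[key] = value
        self.mirror.receive((key, value))
```

In this model the Primary never touches the disk itself; all disk work is done by the Mirror.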

Recovery

Based on role switching

When the Primary fails
• Mirror brings its MMDB up to date
• Mirror starts acting as the new Primary
• Active transactions are restarted or lost

When the Mirror fails
• Primary stores logs directly to the SSS
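The two failure cases above can be sketched as follows. The `DbNode` and `Cluster` classes are hypothetical, the `pending` list models log records the Mirror has received but not yet applied, and a plain list stands in for the log on disk in the SSS. Restarting active transactions is left out of the sketch.

```python
class DbNode:
    """Hypothetical database node with a role and a main-memory database."""

    def __init__(self, role):
        self.role = role
        self.mmdb = {}
        self.pending = []  # log records received but not yet applied


class Cluster:
    def __init__(self):
        self.primary = DbNode("primary")
        self.mirror = DbNode("mirror")
        self.sss_log = []  # stand-in for the log on disk in the SSS

    def commit(self, key, value):
        self.primary.mmdb[key] = value
        if self.mirror is not None:
            self.mirror.pending.append((key, value))
        else:
            # No Mirror: the Primary stores logs directly to the SSS.
            self.sss_log.append((key, value))

    def primary_failed(self):
        if self.mirror is None:
            raise RuntimeError("no Mirror available to take over")
        node = self.mirror
        # The Mirror brings its MMDB up to date ...
        for key, value in node.pending:
            node.mmdb[key] = value
        node.pending.clear()
        # ... and starts acting as the new Primary.
        node.role = "primary"
        self.primary, self.mirror = node, None

    def mirror_failed(self):
        self.mirror = None
```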

Recovery II

During recovery the failed node
• always starts as a mirror node
• loads the most recent database image from disks in the SSS
• applies the log tail to the loaded image
• receives the logs from the primary node
• continues as a normal mirror node
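The recovery sequence above can be written as one function. This is a sketch under simplifying assumptions: the function name and its parameters are hypothetical, log records are `(key, value)` pairs, and the logs received from the primary node are passed in as a list rather than streamed.

```python
def recover_as_mirror(disk_image, log_tail, logs_from_primary):
    """Rebuild a failed node so that it can continue as a mirror node."""
    # Load the most recent database image from disk.
    mmdb = dict(disk_image)

    # Apply the log tail written after that image was taken.
    for key, value in log_tail:
        mmdb[key] = value

    # Apply the logs received from the primary node while recovering.
    for key, value in logs_from_primary:
        mmdb[key] = value

    # The recovered node always starts in the mirror role.
    return {"role": "mirror", "mmdb": mmdb}
```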

Further reading

Bratsberg, Humborstad: Online Scaling in a Highly Available Database, Proceedings of the 27th VLDB Conference, Rome, Italy, pp. 451-460, 2001.

Clustra Database: Technical Overview, http://www.clustra.com

Björnerstedt, Ketoja, Sintorn, Sköld: Replication between Geographically Separated Clusters - An Asynchronous Scalable Replication Mechanism for Very High Availability, Proceedings of the International Workshop on Databases in Telecommunications II, LNCS vol. 2209, pp. 102-115, 2001.

Lindström, Niklander, Porkka, Raatikainen: A Distributed Real-Time Main-Memory Database for Telecommunications, Proceedings of the International Workshop on Databases in Telecommunications, LNCS vol. 1819, pp. 158-173, 2000.
