High Availability for Highly Reliable Systems Mike Friesenegger SUSE Wednesday, February 6, 2013 Session: 12367



High Availability for Highly Reliable Systems

Mike Friesenegger, SUSE

Wednesday, February 6, 2013
Session: 12367


Agenda

• What is a high availability (HA) cluster?
• What is required to build an HA cluster using SLES?
• Let's build a two-node cluster...

Challenge

• Faults will occur

– Hardware crash, flood, fire, power outage, earthquake?

• Can you afford a service outage or worse, loss of data?

– You might afford a five second blip, but can you afford a longer outage?

• How much does downtime cost?

Murphy's Law is Universal

Can you afford low availability systems?
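One way to approach the downtime-cost question is to convert an availability percentage into hours of outage per year and multiply by an hourly cost. The sketch below does only that arithmetic. The function names and the hourly cost figure are hypothetical placeholders, not figures from this session.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def annual_downtime_hours(availability_percent: float) -> float:
    """Hours per year a service is unavailable at the given availability."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100")
    return (1.0 - availability_percent / 100.0) * HOURS_PER_YEAR


def annual_downtime_cost(availability_percent: float, cost_per_hour: float) -> float:
    """Yearly outage cost, assuming a flat (hypothetical) cost per hour."""
    return annual_downtime_hours(availability_percent) * cost_per_hour


if __name__ == "__main__":
    # cost_per_hour is an illustrative assumption; substitute your own estimate.
    for pct in (99.0, 99.9, 99.99):
        hours = annual_downtime_hours(pct)
        cost = annual_downtime_cost(pct, cost_per_hour=10_000.0)
        print(f"{pct}% available: {hours:.2f} h/year down, cost {cost:,.0f}")
```

At 99.9% availability this works out to roughly 8.76 hours of outage per year, which is a useful baseline when deciding whether a cluster is worth the investment.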


What is a high availability cluster?


A Simple HA Cluster

[Diagram: Node 1, Node 2 (MASTER), and Node 3, with a heartbeat link and a SAN]

Resources Running in the Cluster

[Diagram: Node 1, Node 2 (MASTER), and Node 3, with a heartbeat link and a SAN]

Migrating a Resource

[Diagram: Node 1, Node 2 (MASTER), and Node 3, with a heartbeat link and a SAN]

Node Failure in the Cluster

[Diagram: Node 1, Node 2 (MASTER), and Node 3, with a heartbeat link and a SAN]

STONITH the Failed Node Out of the Cluster

[Diagram: Node 1, Node 2 (MASTER), and Node 3, with a heartbeat link and a SAN]

Resources Brought Up on Other Nodes in the Cluster

[Diagram: Node 1, Node 2, and Node 3 (now MASTER), with a heartbeat link and a SAN]
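The sequence in the preceding slides (node failure, STONITH, resources restarted elsewhere) can be sketched as a toy model. This is an illustration of the ordering only: the `Node` class, the `fail_over` function, and the least-loaded placement rule are assumptions made for this example and do not reflect how Pacemaker actually places resources.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    online: bool = True
    fenced: bool = False
    resources: list[str] = field(default_factory=list)


def fail_over(nodes: list[Node], failed_name: str) -> dict[str, str]:
    """Fence the failed node, then move its resources to surviving nodes.

    Returns a mapping of resource name -> name of the node now running it.
    """
    failed = next(n for n in nodes if n.name == failed_name)
    # Step 1: fence (STONITH) first, so the failed node cannot touch shared storage.
    failed.online = False
    failed.fenced = True
    survivors = [n for n in nodes if n.online]
    if not survivors:
        raise RuntimeError("no surviving node to host resources")
    # Step 2: only after fencing, restart each resource on the least-loaded
    # survivor (a placement rule assumed here purely for illustration).
    moved: dict[str, str] = {}
    orphaned, failed.resources = failed.resources, []
    for res in orphaned:
        target = min(survivors, key=lambda n: len(n.resources))
        target.resources.append(res)
        moved[res] = target.name
    return moved


if __name__ == "__main__":
    cluster = [
        Node("node1", resources=["ip"]),
        Node("node2", resources=["db", "web"]),
        Node("node3"),
    ]
    print(fail_over(cluster, "node2"))
```

The point of the ordering is that resources are recovered only once the failed node is known to be out of the cluster, so two nodes never write to the same storage at once.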


What is required to build an HA cluster using SUSE Linux Enterprise Server?

SUSE Linux Enterprise High Availability Extension

● Most modern and complete open source solution for implementing highly available Linux clusters
● A suite of robust open source technologies that is:
  – Affordable
  – Integrated
  – Virtualization aware
● Used with SUSE Linux Enterprise Server, it helps you:
  – Maintain business continuity
  – Protect data integrity
  – Reduce unplanned downtime for your mission-critical Linux workloads

Linux High Availability Stack

The stack includes:
  – resource-agents – manage and monitor availability of services
  – stonith – I/O fencing support (snIPL, simple network IPL for System z)
  – corosync and OpenAIS – cluster infrastructure
  – Pacemaker – cluster resource manager
  – CRM GUI – graphical interface for editing cluster resources and dependencies
  – hawk – Web console for cluster monitoring and administration
  – CLI – improved command line for interacting with the CIB: editing, preparing multiple changes and committing them once, syntax validation, etc.

Cluster Diagram

[Diagram: cluster software stack. Labels: Kernel (one per node); Corosync + openAIS; Pacemaker; DLM; cLVM2 + OCFS2; resources VM1, VM2, LAMP, Apache, IP, ext3; Network Links; Clients; Storage]

Key Use Cases

● Achieve high availability of mission-critical services
● Active/active services
  – OCFS2, Databases, Samba File Servers
● Active/passive service fail-over
  – Traditional databases, SAP setups, most regular services
● Private Cloud
  – HA, automation and orchestration for managed VMs
● High availability across guests
  – Build HA on top of a non-HA cloud
● Remote clustering
  – Local (GA), Metro (SP1), and Geographical (SP2) area clusters

From Local to GEO

● Local cluster
  – Negligible network latency
  – Typically synchronous concurrent storage access
● Metro area (stretched) cluster
  – Network latency < 15 ms (~20 miles)
  – Unified / redundant network between sites
  – Usually some form of replication at the storage level
● Geo clustering
  – High network latency, limited bandwidth
  – Asynchronous storage replication

Local Cluster

[Diagram: two nodes, each labeled SLES and SLE HA, and Clients]

Metro (Stretched) Cluster

[Diagram: four nodes, each labeled SLES and SLE HA, and Clients]

GEO Cluster

[Diagram: Site A and Site B, each running boothd, with cluster nodes labeled Node 1, Node 2, Node 7, and Node 8; Site C (Arbitrator) also running boothd]

Geo Clustering for SUSE Linux Enterprise High Availability Extension

● Cluster fail-over between different data center locations
  – Provide disaster resilience in case of site failure
  – Each site is a self-contained, autonomous cluster
  – Support both manual and automatic switch-/fail-over
● Extends Metro Cluster capabilities
  – No distance limit between data centers
  – No unified storage / network needed
● Storage replicated as active / passive
  – Leverage Distributed Replicated Block Device (DRBD)
  – Can integrate third-party solutions
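With three boothd instances (two sites plus an arbitrator), automatic fail-over between sites can be thought of as a majority decision: a site should run the replicated services only while it can reach more than half of the booth members, itself included. Booth calls this permission a ticket. The sketch below checks only that eligibility condition. It is a heavy simplification under stated assumptions: the function and member names are hypothetical, and it deliberately omits the consensus step a real implementation needs so that only one site holds the permission at any time.

```python
def has_majority(reachable: set[str], members: set[str]) -> bool:
    """True if the reachable booth instances form a strict majority."""
    return len(reachable & members) * 2 > len(members)


def site_may_run_services(
    site: str,
    links_up: set[frozenset[str]],
    members: set[str],
) -> bool:
    """A site is eligible only if it plus the members it can reach is a majority."""
    reachable = {site} | {
        other
        for other in members
        if other != site and frozenset({site, other}) in links_up
    }
    return has_majority(reachable, members)


if __name__ == "__main__":
    # Hypothetical member names for two sites and an arbitrator.
    members = {"site_a", "site_b", "arbitrator"}
    # Site B is cut off: only the link between site A and the arbitrator is up.
    links_up = {frozenset({"site_a", "arbitrator"})}
    for site in ("site_a", "site_b"):
        print(site, site_may_run_services(site, links_up, members))
```

In this example the isolated site loses eligibility while the other site, together with the arbitrator, keeps a majority. This is why the arbitrator is placed at a third location.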


Getting the SLE High Availability Extension

● SUSE® Linux Enterprise High Availability Extension
  – Sold as annual support subscriptions – 1- or 3-year
  – Inherits the support level of the underlying SUSE Linux Enterprise Server subscription
  – Only charged for x86 and x86_64 – included for free with Itanium, IBM POWER and IBM System z subscriptions
● Geo Clustering for SUSE Linux Enterprise High Availability Extension
  – Sold as annual support subscriptions – 1- or 3-year
  – Additional option for the SUSE Linux Enterprise High Availability Extension
    ● Each system participating in the Geo Cluster needs a subscription for Geo Clustering, the High Availability Extension and SUSE Linux Enterprise Server
  – Inherits the support level of the underlying SUSE Linux Enterprise Server subscription

Let's build a two-node cluster...

Session Evaluation

12367 - High Availability for Highly Reliable Systems

Corporate Headquarters
Maxfeldstrasse 5
90409 Nuremberg
Germany

+49 911 740 53 0 (Worldwide)
www.suse.com

Join us on:
www.opensuse.org