SAP HANA Disaster Recovery with SUSE High Availability Extension
Cleber Paiva de Souza / Gabriel Cavalcante
{cleber,gabriel}@ssys.com.br
S-SYS Systems and Solutions
S-SYS and SUSE
• S-SYS officially founded in January 2014
• SUSE partner since the beginning
• Formed by professionals with experience in SUSE products, Linux in general, training, and software development
• Works together with SUSE engineers in pre-sales and project delivery
Fujitsu Brazil Case: SAP HANA Appliances
• Fujitsu offers PRIMERGY RX600 S6 and PRIMEQUEST machines with SAP HANA.
• SAP HANA HA studies took place at the Fujitsu Platform Solution Center (PSC).
• Integration between the S-SYS and Fujitsu teams.
• Knowledge transfer allowed Fujitsu to deliver SAP HANA integrated with SUSE High Availability.
Fujitsu RX600 hardware specification
• Up to 4x Intel® Xeon® E7 family processors
• Expandable to 1 TB of DDR3 RAM with mirroring support
• Robust I/O design and 10 PCI Express slots
• 8x hard drive bays supporting up to 8 TB of local storage
• Integrated Remote Management Controller (iRMC) providing advanced management features
• 4U rack server form factor
Fujitsu RX600 actual hardware specification
• 4x Intel® Xeon® E7-4870 2.4 GHz (10 cores × 2 threads × 4 sockets = 80 logical cores)
• 1 TB of RAM
• 2x Fusion-io PCI cards (1.2 TB in RAID 1)
• 8x 900 GB SAS 10,000 RPM drives in RAID 5
• 6x 10GE network interfaces
• 6x 1GE network interfaces (4 onboard and 2 PCI)
Fujitsu PRIMEQUEST Hardware Specification
Highlights of the new generation
• 8-socket server with up to 4 independent HW partitions and flexible I/O, based on the latest Intel® Xeon® E7-x800 v3
• Maximum performance from the new Intel Haswell-EX processor generation with up to 18 cores
• Increased memory capacity and performance: 192x DDR4 DIMM slots at 1866 MHz
• System self-repair with flexible-IO and reserved-SB functionality
• 12 Gbps RAID controller with 1/2 GB cache
• All parts redundant and/or hot swappable
• Enhanced ServerView management
• Improved Enterprise RAS feature set
Product facts
• Up to 8x Intel Xeon E7-x800 v3 (Haswell-EX)
• Up to 12 TB RAM (using 192 x 64 GB)
• Up to 24 x 2.5" HDDs/SSDs
• Up to 16 internal PCIe slots; additional 48 PCIe hot-plug slots in 4 x external PCI boxes
• Up to 8 x 10GbE internal with 4 x IOUF
Considerations
• We are Linux experts, not SAP experts.
• SLES for SAP = SLES 11 + HA Extension + SAP support + SAPHanaSR.
• All tests were done on SLES for SAP 11 SP3.
• Two-node clusters only (scale-up / single-box replication).
• AUTOMATED_REGISTER="false"
• By default, SAP HANA instances are not started during boot; the cluster takes care of the services.
• Synchronous system replication.
• SAP HANA SPS 08 revision 85.
Definitions
Parameter         Value
Cluster node 1    hana01
Cluster node 2    hana02
SID               HDB
Instance number   00
User key          slehaloc

User       Password
hdbadm     P@ssword1
sapadm     P@ssword1
SYSTEM     P@ssword1
slehaloc   Password1
Problem
• The guide "SAP HANA System Replication on SLES for SAP Applications" from June 2014 did not provide configuration for IPMI as a STONITH resource.
• The SAPHanaSR Hawk template does not provide IPMI configuration.
Procedures for setup
1) Install SLES for SAP
2) Configure network interfaces
3) Configure NTP and timezone
4) Setup disk layout
5) Check hostnames and IP addresses
6) Install SAP HANA database
7) Setup HANA
8) Configure SLES HA Extension
9) Testing takeover
10) Stress test
Install SLES for SAP
• Install SUSE as usual:
– Select the pattern SAP HANA Server Base.
– SLES for SAP installs minimal network services.
• Register at SUSE Customer Center (SCC) and apply updates to prevent well-known problems and bugs.
– Update size: ~500 MB
• SAPHanaSR is available only on SLES for SAP:
– Provides the SAPHanaTopology and SAPHana resource agents
– Provides Hawk wizard templates
Configure network interfaces
• Define how your network communication will work.
• Define interfaces for user access, heartbeat, data replication, SAP remote support, STONITH/IPMI, etc.
Network throughput and redundancy
• Use bonding for aggregation or redundancy (802.3ad, balance-rr, active-backup, etc.).
• Make use of 10GE network interfaces and 56 Gb/s InfiniBand.
• High availability requires redundant paths through network switches, fibre switches, InfiniBand switches, etc.
• Monitor your environment.
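As an illustration, an active-backup bond on SLES 11 could be defined with an ifcfg sketch like the one below; the IP address and slave interface names are assumptions, not from the deck:

```
# /etc/sysconfig/network/ifcfg-bond0 (sketch; IP and slaves are assumptions)
STARTMODE='auto'
BOOTPROTO='static'
IPADDR='192.168.100.10/24'
BONDING_MASTER='yes'
BONDING_MODULE_OPTS='mode=active-backup miimon=100'
BONDING_SLAVE0='eth0'
BONDING_SLAVE1='eth1'
```

For the heartbeat rings, one bond per ring (on separate switches) gives the redundant paths mentioned above.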
Configure NTP and timezone
• All nodes must be in time sync.
• The cluster could fail if clocks are skewed.
• Use the same timezone on all nodes, or SAP could misbehave.
• Tracing events across logs becomes hard otherwise.
Data throughput and redundancy
• Put /hana/log on Fusion-io for performance:
– 7,200 RPM SATA ≈ 100 IOPS
– 15,000 RPM SAS ≈ 200 IOPS
– SSD disks ≈ 20,000 IOPS
– Fusion-io ≈ 140,000 IOPS
• Put /hana/data and /hana/shared on some RAID layout (1, 5, 6, etc.).
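As a sketch, the mount layout described above might appear in /etc/fstab like this; the device names and filesystem choice are assumptions for illustration:

```
# /etc/fstab excerpt (sketch; device names are assumptions)
# /hana/log on the Fusion-io RAID 1, data/shared on rotating-disk RAID
/dev/md0   /hana/log     xfs   defaults,noatime   0 0
/dev/md1   /hana/data    xfs   defaults,noatime   0 0
/dev/md2   /hana/shared  xfs   defaults           0 0
```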
Check hostnames and IP addresses
• The hostname should be defined before starting the SAP HANA installation.
– SAP HANA stores this information in sapstart service profiles.
– Altering the hostname after installation requires changes in files such as /usr/sap/<SID>/HDB<instance_number>/<hostname>/sapprofile.ini
• Check /etc/hosts consistency.
– All nodes must know every other node's IP-to-hostname mapping.
• Assign a virtual IP and hostname to the master node in the cluster.
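The /etc/hosts consistency check can be scripted. This is a minimal sketch, not part of the original deck: the node names come from the Definitions slide, while the IP addresses and the demo file are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: check that an /etc/hosts-style file maps each cluster node
# name to its expected IP. IPs are illustrative assumptions.

# Demo input standing in for /etc/hosts on one node:
cat > /tmp/hosts.demo <<'EOF'
127.0.0.1   localhost
192.168.1.1 hana01
192.168.1.2 hana02
EOF

check_entry() {
    # $1 = hosts file, $2 = expected IP, $3 = hostname
    if grep -Eq "^[[:space:]]*$2[[:space:]].*$3" "$1"; then
        echo "OK: $3 -> $2"
    else
        echo "MISSING: $3 should map to $2"
    fi
}

check_entry /tmp/hosts.demo 192.168.1.1 hana01
check_entry /tmp/hosts.demo 192.168.1.2 hana02
```

Run the same check on every node so a single stale /etc/hosts does not break replication or the cluster heartbeat.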
Setup HANA (part I)
• Create a user for data synchronization on all nodes:
# export PATH="$PATH:/usr/sap/HDB/HDB00/exe"
# hdbsql -u system -i 00 'CREATE USER slehasync PASSWORD Password1'
# hdbsql -u system -i 00 'GRANT DATA ADMIN TO slehasync'
# hdbsql -u system -i 00 'ALTER USER slehasync DISABLE PASSWORD LIFETIME'
• Store the user key and password on all nodes:
# hdbuserstore SET slehaloc localhost:30015 slehasync Password1
Setup HANA (part II)
• Verify the user key on all nodes:
# hdbuserstore list
DATA FILE : /root/.hdb/hana01/SSFS_HDB.DAT
KEY SLEHALOC
ENV : localhost:30015
USER: slehasync
• Verify that the query works without asking for a password on all nodes:
# hdbsql -U slehaloc "select * from dummy"
DUMMY
"X"
1 row selected (overall time 2733 usec; server time 115 usec)
Setup HANA (part III)
• Define the primary node with user hdbadm:
hana01:/usr/sap/HDB/HDB00> hdbnsutil -sr_enable --name=SITE001
checking for active nameserver ...
nameserver is active, proceeding ...
successfully enabled system as system replication source site
done.
• Verify the node state with user hdbadm:
hana01:/usr/sap/HDB/HDB00> hdbnsutil -sr_state
checking for active or inactive nameserver ...
System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~
mode: primary
site id: 1
site name: SITE001
Host Mappings:
~~~~~~~~~~~~~~
done.
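Cluster and monitoring scripts often need the `mode:` value from `hdbnsutil -sr_state`. A minimal parsing sketch, run here against a captured sample (on a real node the input would be the live `hdbnsutil -sr_state` output as the <sid>adm user):

```shell
#!/bin/sh
# Sketch: extract the system replication mode from sr_state output.

SAMPLE='checking for active or inactive nameserver ...
System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~
mode: primary
site id: 1
site name: SITE001
done.'

sr_mode() {
    # Prints the value after "mode: " (e.g. primary, sync, none)
    printf '%s\n' "$1" | sed -n 's/^mode: //p'
}

sr_mode "$SAMPLE"
```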
Setup HANA (part IV)
• Take a first backup:
hana01:~ # hdbsql -u system -i 00 "BACKUP DATA USING FILE ('backup')"
Password:
0 rows affected (overall time 46.986124 sec; server time 46.984819 sec)
• Verify replication status:
hana01:~ # hdbsql -U slehaloc 'select distinct REPLICATION_STATUS from SYS.M_SERVICE_REPLICATION'
REPLICATION_STATUS
0 rows selected (overall time 1701 usec; server time 401 usec)
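Once the secondary is registered, the same query becomes the basis for a health check. A sketch that parses a captured sample of the hdbsql output (the "ACTIVE" value is the status SAP HANA reports for an in-sync service; the sample itself is illustrative):

```shell
#!/bin/sh
# Sketch of a replication health check. On a live system the input
# would come from:
#   hdbsql -U slehaloc "select distinct REPLICATION_STATUS from SYS.M_SERVICE_REPLICATION"

SAMPLE='REPLICATION_STATUS
"ACTIVE"
1 row selected'

check_replication() {
    # Reports healthy when an ACTIVE status appears in the query output.
    if printf '%s\n' "$1" | grep -q '"ACTIVE"'; then
        echo "replication: healthy"
    else
        echo "replication: NOT healthy"
    fi
}

check_replication "$SAMPLE"
```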
Setup HANA (part V)
• Define the secondary node with user hdbadm:
hana02:/usr/sap/HDB/HDB00> hdbnsutil -sr_register --remoteHost=hana01 --remoteInstance=00 --mode=sync --name=SITE002
adding site ...
checking for inactive nameserver ...
nameserver hana02:30001 not responding.
collecting information ...
updating local ini files ...
done.
• Check secondary node status with user hdbadm:
hana02:/usr/sap/HDB/HDB00> hdbnsutil -sr_state
checking for active or inactive nameserver ...
System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~
mode: sync
site id: 2
site name: SITE002
active primary site: 1
Setup HANA (part VI)
• Check primary node status with user hdbadm:
hana01:/usr/sap/HDB/HDB00> hdbnsutil -sr_state
checking for active or inactive nameserver ...
System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~
mode: primary
site id: 1
site name: SITE001
Host Mappings:
~~~~~~~~~~~~~~
hana01 -> [SITE001] hana01
hana01 -> [SITE002] hana02
done.
Configure SLES HA Extension
• Install the pattern "High Availability".
• Install the package SAPHanaSR.
• Run sleha-init on the first node.
• Change /etc/corosync/corosync.conf if necessary:
– udp (multicast) vs. udpu (unicast).
– Enable a redundant channel and rrp mode.
– Enable security auth.
• Run sleha-join on the second node.
• Keep STONITH disabled during configuration.
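For reference, a corosync.conf sketch enabling unicast transport, two redundant rings, and security auth might look like the fragment below. The ring networks and member addresses are assumptions; the exact syntax should be checked against the corosync version shipped with SLES 11.

```
# /etc/corosync/corosync.conf excerpt (sketch; addresses are assumptions)
totem {
    version: 2
    secauth: on               # enable security auth
    rrp_mode: active          # redundant ring protocol across both rings
    transport: udpu           # unicast instead of multicast
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.100.0
        mcastport: 5405
        member {
            memberaddr: 192.168.100.1
        }
        member {
            memberaddr: 192.168.100.2
        }
    }
    interface {
        ringnumber: 1
        bindnetaddr: 192.168.101.0
        mcastport: 5407
        member {
            memberaddr: 192.168.101.1
        }
        member {
            memberaddr: 192.168.101.2
        }
    }
}
```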
HA Configuration
• Default / global properties
property $id="cib-bootstrap-options" \
no-quorum-policy="ignore" \
stonith-action="poweroff"
rsc_defaults $id="rsc-options" \
resource-stickiness="1000" \
migration-threshold=3 \
failure-timeout=60
op_defaults $id="op-options" \
timeout="600"
HA Configuration
• SAPHanaTopology:
primitive rsc_SAPHanaTopology_HDB_HDB00 ocf:suse:SAPHanaTopology \
params SID="HDB" InstanceNumber="00" \
op monitor interval="10" timeout="600" \
op start interval="0" timeout="600" \
op stop interval="0" timeout="300"
clone cln_SAPHanaTopology_HDB_HDB00 rsc_SAPHanaTopology_HDB_HDB00 \
meta is-managed="true" clone-node-max="1" interleave="true"
HA Configuration
• SAPHana:
primitive rsc_SAPHana_HDB_HDB00 ocf:suse:SAPHana \
params SID="HDB" InstanceNumber="00" PREFER_SITE_TAKEOVER="yes" AUTOMATED_REGISTER="true" DUPLICATE_PRIMARY_TIMEOUT="7200" \
op start interval="0" timeout="3600" \
op stop interval="0" timeout="3600" \
op promote interval="0" timeout="3600" \
op monitor interval="60" role="Master" timeout="700" \
op monitor interval="61" role="Slave" timeout="700" \
meta target-role="Started"
ms msl_SAPHana_HDB_HDB00 rsc_SAPHana_HDB_HDB00 \
meta clone-max="2" clone-node-max="1" interleave="true"
order ord_SAPHana_HDB_HDB00 2000: cln_SAPHanaTopology_HDB_HDB00 msl_SAPHana_HDB_HDB00
HA Configuration
• Virtual IP:
primitive rsc_ip_HDB_HDB00 ocf:heartbeat:IPaddr2 \
params ip="10.30.1.1" iflabel="0" \
op start interval="0" timeout="20" \
op stop interval="0" timeout="20" \
op monitor interval="10" timeout="20"
colocation col_saphana_ip_HDB_HDB00 2000: rsc_ip_HDB_HDB00:Started msl_SAPHana_HDB_HDB00:Master
HA Configuration
• STONITH IPMI:
primitive stonith_ipmi_hana01 stonith:external/ipmi \
params hostname="hana01" ipaddr="172.16.1.1" userid="admin" passwd="admin" \
op monitor enabled="true" interval="300" start-delay="5" timeout="20"
location stonith_ipmi_hana01_not_on_hana01 stonith_ipmi_hana01 -inf: hana01
primitive stonith_ipmi_hana02 stonith:external/ipmi \
params hostname="hana02" ipaddr="172.16.1.2" userid="admin" passwd="admin" \
op monitor enabled="true" interval="300" start-delay="5" timeout="20"
location stonith_ipmi_hana02_not_on_hana02 stonith_ipmi_hana02 -inf: hana02
Hawk template
• Created a custom Hawk template including IPMI as STONITH. Available at http://www.ssys.com.br/susecon/tut20056/hawk-template.tar.gz
Manual takeover
• The secondary becomes primary with user hdbadm:
hana02:/usr/sap/HDB/HDB00> hdbnsutil -sr_takeover
checking local nameserver ...
done.
• Verify new state with user hdbadm:
hana02:/usr/sap/HDB/HDB00> hdbnsutil -sr_state
checking for active or inactive nameserver ...
System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~
mode: primary
site id: 2
site name: SITE002
Host Mappings:
~~~~~~~~~~~~~~
hana02 -> [SITE001] hana01
hana02 -> [SITE002] hana02
done.
Cluster takeover
• Set AUTOMATED_REGISTER="true".
• Pay attention to STONITH: prefer shutdown (poweroff) instead of reboot.
• Pay attention to timeouts (start, stop, migration, etc.).
Stress test
• Detect problems during stress.
• Most failures are due to timeouts set too low.
• HanaStress (https://github.com/Centiq/HanaStress):

hanastress.py -v --host localhost -i 00 -u SYSTEM -p P@ssword1 -g anarchy --tables 100 --rows 100000 --threads 10

(This creates 100 tables with 100,000 rows each, using 10 threads.)
Cleanup after stress test
• Remove database fragmentation:
– ALTER SYSTEM RECLAIM DATAVOLUME 120 DEFRAGMENT
– ALTER SYSTEM RECLAIM LOG
• Force flushing log data to disk:
– ALTER SYSTEM SAVEPOINT
References
• https://www.suse.com/docrep/documents/wvhlogf37z/sap_hana_system_replication_on_sles_for_sap_applications.pdf
• http://scn.sap.com/docs/DOC-60318
• http://scn.sap.com/docs/DOC-60374
• http://scn.sap.com/docs/DOC-60368
+49 911 740 53 0 (Worldwide)
www.suse.com

Corporate Headquarters
Maxfeldstrasse 5
90409 Nuremberg
Germany

Join us on: www.opensuse.org
Unpublished Work of SUSE LLC. All Rights Reserved.
This work is an unpublished work and contains confidential, proprietary, and trade secret information of SUSE LLC.
Access to this work is restricted to SUSE employees who have a need to know to perform tasks within the scope of
their assignments. No part of this work may be practiced, performed, copied, distributed, revised, modified, translated,
abridged, condensed, expanded, collected, or adapted without the prior written consent of SUSE.
Any use or exploitation of this work without authorization could subject the perpetrator to criminal and civil liability.
General Disclaimer
This document is not to be construed as a promise by any participating company to develop, deliver, or market a
product. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making
purchasing decisions. SUSE makes no representations or warranties with respect to the contents of this document,
and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose.
The development, release, and timing of features or functionality described for SUSE products remains at the sole
discretion of SUSE. Further, SUSE reserves the right to revise this document and to make changes to its content, at
any time, without obligation to notify any person or entity of such revisions or changes. All SUSE marks referenced in
this presentation are trademarks or registered trademarks of Novell, Inc. in the United States and other countries. All
third-party trademarks are the property of their respective owners.