
Best Practice Guide
June 29, 2011

Applicable to:
SANsymphony-V R8 PSP1 or greater
SANsymphony 7.0 PSP 4 or greater
SANmelody 3.0 PSP 4 or greater

Overview
This document is intended solely as an aid for installing and configuring Storage Virtualization solutions with DataCore™ software products. It does not supersede technical knowledge taught in DataCore training courses, nor professional skills in working with SAN and storage environments. The installation and configuration described in this document made use of third-party software and hardware; DataCore does not make any warranties or guarantees concerning such third-party software and hardware. This document does not provide a warranty for any DataCore software, equipment, or service, nor does it imply product availability. DataCore is not responsible for the use of this document and does not guarantee the results of its use. DataCore does not warrant or guarantee that anyone will be able to recreate or achieve the results described in this document.

Cumulative Change Summary

July 1, 2010 – Changed the Memory section to be more general, describing what amount of cache can be used with different Windows versions; updated the hardware section regarding PCI buses; added iSCSI settings for Windows 2008 servers; added FAQ numbers to several sections to point to more specific explanations instead of general ones; changed the application server section to point to the Technical Bulletins on the DataCore support site; added information about running two different failover applications on the same Windows application server; added FAQ for DataCore Manuals and Administration Guides. Added a note for larger-than-2TB storage LUN support on SANsymphony 7.0.3 and later and SANmelody 3.0.3 and later; added a note for the AIM source buffer.

October 22, 2010 – Updated Windows Operating System Settings, adding User-Mode Crash Dump settings. Updated Network Configuration, adding the dedicated network link requirement. Updated information on the use of Redundant Mirror Paths with FC mirror ports and not for iSCSI mirror paths. Added the use of Long I/O metrics to section 6, Pool Performance Aspects. Updated section 7, Synchronous Mirroring over Long Distances, with an example. Edited section 8, Snapshot Best Practices, about the reasons for a 1 MB SAU size for the Snapshot Destination Pool.

February 2, 2011 – Added SANsymphony-V and its new terminology. Removed references to SANmelody 2.x and SANsymphony 6.x.

February 22, 2011 – Put in a link for the Pre-requisites, since otherwise they would go out of date.

April 08, 2011 – Entered the Network Connection paragraph in the General Hardware recommendations section.

June 29, 2011 – Added the recommendation to not disable the DataCore iSCSI driver/adapter for those channels which won't be used for iSCSI, in the iSCSI (LAN) Cabling and Port Settings section.

COPYRIGHT

Copyright © 2011 by DataCore Software Corporation. All rights reserved.

DataCore, the DataCore logo, SANsymphony, and SANmelody are trademarks of DataCore Software Corporation. Other DataCore product or service names or logos referenced herein are trademarks of DataCore Software Corporation. All other products, services and company names mentioned herein may be trademarks of their respective owners.

ALTHOUGH THE MATERIAL PRESENTED IN THIS DOCUMENT IS BELIEVED TO BE ACCURATE, IT IS PROVIDED “AS IS” AND USERS MUST TAKE ALL RESPONSIBILITY FOR THE USE OR APPLICATION OF THE PRODUCTS DESCRIBED AND THE INFORMATION CONTAINED IN THIS DOCUMENT. NEITHER DATACORE NOR ITS SUPPLIERS MAKE ANY EXPRESS OR IMPLIED REPRESENTATION, WARRANTY OR ENDORSEMENT REGARDING, AND SHALL HAVE NO LIABILITY FOR, THE USE OR APPLICATION OF ANY DATACORE OR THIRD PARTY PRODUCTS OR THE OTHER INFORMATION REFERRED TO IN THIS DOCUMENT. ALL SUCH WARRANTIES (INCLUDING ANY IMPLIED WARRANTIES OF MERCHANTABILITY, NON-INFRINGEMENT, FITNESS FOR A PARTICULAR PURPOSE AND AGAINST HIDDEN DEFECTS) AND LIABILITY ARE HEREBY DISCLAIMED TO THE FULLEST EXTENT PERMITTED BY LAW.

No part of this document may be copied, reproduced, translated or reduced to any electronic medium or machine-readable form without the prior written consent of DataCore Software Corporation.


Table of Contents

1 – General Outline
2 – High Level Design
3 – DataCore Storage Server Hardware Design Guidelines
    Hardware & Software Requirements
    General Hardware recommendations
    Determine performance requirements
    Number of Fibre Channel ports
    Number of iSCSI ports
    Amount and Speed of CPUs
    Amount of Memory
    RAM Sizing
4 – DataCore Storage Server Operating System Configuration
    General Operating System Notes
    Windows Operating System Settings
    Network Configuration
    Securing a DataCore Storage Server
    Backup and restore a DataCore Storage Server
5 – SAN Design Guide
    Avoiding Single Points of Failure (SPOF)
    Fibre Channel Cabling and Port Settings
    Zoning Considerations (Fibre Channel Switch Configuration)
    iSCSI (LAN) Cabling and Port Settings
6 – Pool performance aspects
    Understanding DataCore thin provisioning Technology
    Disk Types
    Amount of Disks
    RAID Layout
    General Storage/Disk Pool notes
    Long I/O Metrics (For SANmelody 3.x or SANsymphony 7.x only)
7 – Synchronous Mirror Best Practices
    Match Size and Performance Characteristics
    Synchronous Mirroring over Long Distances
8 – Snapshot Best Practices
    Snapshot Destination Pool Considerations
    Snapshot Performance Considerations
9 – AIM/Replication Best Practices
    Relocate page files to non-replicated Virtual volumes/vdisks
    Link Throughput
    Buffer Location and Size
    Backup Applications use timestamps
    Use Snapshots to Access AIM/Replication Destination Volumes
10 – Continuous Data Protection (CDP)
11 – Application Server/Host Best Practices


1 – General Outline

Intention of this document
This document was created to be an aid for those who install and configure DataCore Software Storage Virtualization solutions. It is essentially a collection of insights which have proved beneficial over time. It documents good storage and storage network design, as well as good software configurations, that optimize utility (stability), performance, and availability. The Best Practice Guide is intended for use by trained personnel. We assume that standard terms such as virtual volume/vdisk, pool, etc. are understood. We also assume that common tasks such as creating pools and virtual volumes/vdisks, connecting Application Servers/Hosts and mapping/serving volumes to Application Servers/Hosts are also understood.

Best Practices are flexible
Each customer environment is unique, which makes giving general advice somewhat difficult. The recommendations given in this document are guidelines – not rules. Even following all recommendations in this guide does not guarantee a perfect result in every regard, due to dependencies on individual factors. However, following the guidelines should provide a stable, well-performing and secure system.

Guidelines do not supersede technical training or professional skills
This document does not supersede DataCore technical training courses provided by DataCore Software or a DataCore Authorized Training Partner. Attending one or more of the following training courses is mandatory for any installer of SANmelody or SANsymphony high availability (HA) environments:

MEL205 - SANmelody Implementation & Administration
SYM205 - SANsymphony Implementation and Management
SYM301 - Advanced Practices Course
SYMV8P - SANsymphony-V R8 Administration

Also necessary is a valid DataCore Certified Implementation Engineer (DCIE) certification. In addition, professional skills in dealing with storage devices, RAID technology, Storage Area Network (SAN) infrastructures, and the Fibre Channel and/or iSCSI protocols are necessary. If you do not fulfill the above-mentioned requirements and/or have any difficulties understanding terms or procedures described in this document, please contact your DataCore sales representative or the DataCore training department for information on obtaining a DCIE certification and the required skills.


2 – High Level Design

Theory of Operation
First of all, prior to any system planning or implementation being performed, a Theory of Operation should be developed. The Theory of Operation is a description of how a system should work and what goals must be achieved. Operation, especially in terms of availability or safety, is always an end-to-end process from the user to the data or service. Storage or storage services cannot be considered separately, since they are not isolated instances but integrated pieces of IT infrastructure. Several books have been written about this topic, which this document does not attempt to supersede. The following list provides some aspects of a safe IT environment which are too often overlooked.

Keep it simple
Avoid complexity at any level. Sophisticated environments may be smart in many aspects – however, if they are error-prone, unstable or complex to support, a simpler approach is often more beneficial.

Avoid single points of failure
Any dependency on a possible single point of failure should be avoided. Some dependencies are often disregarded but can impact a whole environment.

Separation is key
Keep storage components away from the public network. Limit the users who can access (or are aware of the existence of) a central device.

Distribute devices across separate racks, rooms, buildings and sites. Create separated hazard zones to isolate disruptive impacts.

Consider redundant power sources. Avoid connecting redundant devices to the same power circuit. Use a UPS-protected power supply and connect every device to it. For example, a UPS backup does not help much if the UPS fails to notify the Application Server/Host to shut down because a management LAN switch was considered 'minor' and therefore not connected to the UPS-backed power circuit.

Regard failsafe networks (such as LAN, WAN and SAN infrastructure) – a highly available IT system may quickly become worthless if it can no longer be accessed due to a network outage.

Do not forget environmental components (air conditioning, physical location, etc.). A single failed non-redundant air conditioner may bring down all the redundant systems located in the same datacenter. Separated rooms on the same floor may be affected by a pipe burst at the same time. Datacenters in the same building may be affected by a fire if a coffee machine ignites a curtain somewhere else in the building.

Control access
DataCore Storage Servers should be accessed by qualified (trained and skilled) personnel only. Make sure that everyone understands the difference between a 'normal' server and a DataCore Storage Server as explained in this document.

Monitoring and notification
Highly available systems typically recover automatically from errors, or keep the service alive even if half of the environment fails. However, those conditions must be recognized and fixed as soon as possible by the responsible personnel to avoid future problems.

Knowledge and documentation
Document the environment well; keep the documentation up to date and available. Establish 'shared knowledge' – make sure that at least two people are familiar with a particular entity at any time.


3 – DataCore Storage Server Hardware Design Guidelines

Hardware & Software Requirements

For minimum hardware and software requirements, as well as supported hardware and software (Fibre Channel HBAs, switches, 3rd-party software, etc.), check the Qualified Lists available on the DataCore Support Web site: http://datacore.custhelp.com/app/answers/detail/a_id/283
If you will be running the DataCore Storage Server in a Virtual Machine, please check FAQ #1155: http://datacore.custhelp.com/app/answers/detail/a_id/1155

General Hardware recommendations

DataCore Storage Servers play a key role for performance and availability in a storage environment, since all traffic between Application Servers/Hosts and disk devices flows through these appliances. Therefore the hardware platform hosting DataCore SANmelody or SANsymphony should be a machine of adequate quality. An industry-standard server from any of the major brands is normally a good choice. Important points:

- The boot drive (C:\) should be two hard disks in a RAID 1 configuration
- Use separate RAID controllers for the boot drive (C:\) and for the RAID sets to be used for virtual volumes/vdisks or AIM/Replication buffers
- Equip the server with redundant power supplies
- Cover hardware components with an appropriate service contract to ensure quick repair in case of failure
- Provide a network connection that does not rely on DNS, so that inter-node communication can always take place
- DataCore Storage Servers should be protected against sudden power outages (UPS)

PCI bus RAID controllers, Fibre Channel HBAs and iSCSI NICs can generate a lot of traffic which needs to be transported over the PCI buses of the DataCore Storage Server. When selecting the hardware platform for the DataCore Storage Server make sure that the PCI bus system can handle the expected workload. PCI-Express (PCIe) is a serial bus architecture and best used with other serial protocols like Fibre Channel, SAS/SATA and iSCSI. For a DataCore Storage Server, adequate server hardware with appropriate/independent PCIe buses should be chosen rather than workstation or blade server hardware which is typically not designed for heavy backplane IO.
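As a rough sanity check of PCIe slot bandwidth (the per-lane figures are nominal PCIe specifications; the HBA is an assumed example): a PCIe 1.x lane carries about 250 MB/s per direction and a PCIe 2.0 lane about 500 MB/s per direction. A dual-port 8 Gb/s FC HBA can move up to roughly 2 × 800 MB/s = 1,600 MB/s per direction, so it needs at least a PCIe 1.x x8 slot (8 × 250 MB/s = 2,000 MB/s) or a PCIe 2.0 x4 slot (4 × 500 MB/s = 2,000 MB/s); a PCIe 1.x x4 slot (1,000 MB/s) would bottleneck the card.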

RAID controllers
RAID controllers or SAN storage controllers which are used to control physical disk drives used for DataCore virtual volumes/vdisks have an essential requirement: they must be capable of delivering high performance. Bear in mind that those controllers typically do not host disks for one server, but need to handle a heavy load of IO requests from multiple Application Servers/Hosts. Here a rule of thumb applies: low-end RAID controllers deliver low-end performance. For example, an integrated onboard RAID controller that comes free with a server box may be sufficient to control two disks in a RAID 1 configuration for the boot drive. RAID controllers which are capable of controlling numerous disks and delivering adequate performance typically have their own CPU (RAID accelerator chip), battery-protected cache, multiple channels/ports, etc.


Fibre Channel HBAs / iSCSI interfaces
Fibre Channel HBAs and network adapters used for iSCSI are available as single-port, dual-port and quad-port cards. From a performance standpoint, there is often not much of a difference whether two single-port cards or one dual-port card is used. From an availability standpoint there is a difference: if a dual-port or quad-port card fails, most likely all of its ports are affected. To minimize the risk of multiple channel/port failures, a larger number of cards is preferred.

Network Connection
The network connection between DataCore Servers is critical for inter-node communication. User interface updates also require that the DataCore Servers can communicate with each other. Name resolution is needed for proper communication, so if DNS is down or has stale information, this can result in delayed responses. It is recommended that this link be dedicated and not shared with iSCSI, AIM/Replication or other server management activities.


Determine performance requirements

A DataCore Storage Server's hardware design is key to superior performance. Before the hardware is set up, learn what the requirements are. In a Storage Area Network (SAN) without DataCore involved, all IOs between the Application Servers/Hosts and the disks are transported by the SAN infrastructure. This is known as the total throughput and is measured in IO operations per second (IO/s) and Megabytes per second (MB/s). The IO/s value indicates how many IO operations are processed per second, while MB/s specifies the amount of data being transported. Both values correlate with each other, but must be considered separately (see the worked example after this list). For example, a database application which creates many small IOs may generate a high number of IO/s but little MB/s, due to the fact that the IOs are small. A media streaming application may generate massive MB/s with far fewer IO/s.

As a rule, there is no such thing as a sustained throughput – workload may vary greatly over time. User behavior influences workload; in particular, application tasks like backup jobs, archive runs, data mining, invoicing, etc. may generate load peaks.

In a DataCore environment, all IOs go through the SANmelody, SANsymphony or SANsymphony-V Storage Servers. In order to ensure an appropriate hardware design, the performance requirements must be known. Good sources for performance analysis are management applications, SAN switch logs, performance monitoring, etc. If those values are unknown and cannot be measured, they may be estimated. However, a hardware design based on assumptions may turn out to be insufficient and may need to be adjusted.

When designing a DataCore Storage Server, three points are crucial:
- Number of Fibre Channel (FC) / iSCSI ports
- Number of CPUs
- Amount of memory
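As referenced above, the relationship between IO/s and MB/s is simple arithmetic: MB/s equals IO/s multiplied by the average IO size. The numbers below are illustrative assumptions, not measurements:

Database-style workload:  20,000 IO/s × 8 KB  ≈ 160 MB/s
Streaming-style workload:  2,000 IO/s × 256 KB ≈ 512 MB/s

The streaming workload moves more than three times the data with a tenth of the IO operations, which is why both metrics must be sized for separately.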

Those requirements can be calculated as follows.

(Diagram: SAN throughput in IO/s and MB/s, flowing between the Application Servers/Hosts and the disks/storage arrays)


Number of Fibre Channel ports

Table 1 – Maximum HBA performance (according to vendor specification)

Port Speed   IO/s      MB/s half-duplex   MB/s full-duplex
2 Gb/s       100,000   200                400
4 Gb/s       150,000   400                800
8 Gb/s       200,000   800                1,600

The table above (Table 1) shows the maximum values that Fibre Channel ports can achieve. These are specifications from manufacturer information; they do not reflect the effective amount of user data which can be transported. In "real life" scenarios, Fibre Channel ports often cannot be utilized beyond about 66% of their maximum IO/s due to network protocol overhead. Table 2 shows more realistic values.

Table 2 – Realistic values*

Port Speed   IO/s      MB/s half-duplex   MB/s full-duplex
2 Gb/s       65,000    180                360
4 Gb/s       100,000   360                720
8 Gb/s       130,000   720                1,440

* These values are theoretical averages. In practice, the real performance depends on many factors and may be higher or lower. However, from a practical standpoint these values can be used for calculations. On the basis of these values, the number of Fibre Channel lines needed to carry the load between Application Servers/Hosts and disks can be determined. The number of lines must be sufficient to satisfy load peaks and should leave some room for future growth – do not choose hardware that is only minimally sufficient.

Symmetric design
The best practice design rule is to use separate physical channels/ports for "frontend" traffic (to Application Servers/Hosts), "backend" traffic (to storage/disks) and mirror traffic (to other DataCore Storage Servers). It is basically possible to share channels/ports; however, a clean design employs dedicated ports.

A rule of thumb applies: use an equal count of HBAs for frontend, backend and mirror channels/ports. If the number of lines was calculated as 2, a total of 6 Fibre Channel ports is needed:
- 2 dedicated ports for frontend traffic
- 2 dedicated ports for backend traffic
- 2 dedicated ports for mirror traffic
If equipped with 4 Gb/s FC HBAs, this DataCore Storage Server should be able to transport 200,000 IO/s or 720 MB/s (2 frontend ports × 100,000 IO/s and 2 × 360 MB/s, per Table 2).
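Working the same rule forward from measured requirements (the peak figures here are assumptions for illustration): suppose monitoring shows peaks of 180,000 IO/s and 600 MB/s. With 4 Gb/s ports (Table 2: 100,000 IO/s and 360 MB/s half-duplex per port), the frontend needs max(180,000 / 100,000; 600 / 360) ≈ 2 ports, and the symmetric design rule then calls for 2 backend and 2 mirror ports as well. Since this leaves little headroom at peak, a growth-oriented design would step up to 8 Gb/s ports or add one more port per role.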

(Diagram: DataCore Storage Server with dedicated frontend, backend and mirror channels/ports)


Number of iSCSI ports

The number of required network interface cards (NICs) for iSCSI traffic is more difficult to determine than the number of Fibre Channel ports. iSCSI performance typically has higher dependencies on external factors (quality of iSCSI initiators, network infrastructure, CPU power, etc.). iSCSI is often combined with other interconnect technologies, as shown in the diagram below. The following table shows average iSCSI performance values. Note that these values are just indications; by its nature, iSCSI throughput depends heavily on many factors.

Port Speed   IO/s     MB/s
1 Gb/s       12,000   80
10 Gb/s      80,000   530

CPU considerations for iSCSI
iSCSI traffic generates more CPU load than Fibre Channel, due to the fact that most of the iSCSI protocol overhead (encapsulation of SCSI commands in IP packets) is handled by the DataCore Storage Server's CPUs. Typically, iSCSI initiator ports consume more CPU cycles than iSCSI target ports. There is no universal rule of thumb for calculating CPU load for iSCSI traffic, as this depends on the operating system as well as on the CPU architecture and the number of NIC cards/ports. At a minimum, plan the following:
- For every 2 iSCSI target ports (frontend ports), add one additional CPU (core).
- For every 2 iSCSI initiator ports (mirror/backend), add one additional CPU (core).
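A quick worked example of this rule (the port counts are assumed): a DataCore Storage Server with 4 iSCSI frontend (target) ports and 2 iSCSI mirror (initiator) ports needs 4/2 = 2 additional cores for the targets and 2/2 = 1 additional core for the initiators – 3 CPU cores on top of the baseline sizing.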

(Diagram: DataCore Storage Server combining an iSCSI frontend, a SAS/SATA backend and a Fibre Channel mirror link)


Amount and Speed of CPUs

Amount of CPUs or CPU cores
Depending on how many IO/s must be processed by the DataCore Storage Server, the appropriate number of CPUs must be selected. Modern server CPUs are able to execute millions of commands per second. CPU power is often underestimated, and as a result DataCore Storage Servers are frequently "overpowered" in terms of CPU. Provided the PCI architecture offers enough independent PCI buses, one CPU can handle one FC-HBA triplet (frontend, backend and mirror channel/port), or two dual-port FC HBAs. Following this rule, a DataCore Storage Server with 6 Fibre Channel ports needs 2 CPUs. Or, the other way around, a quad-core is adequate to serve 12 Fibre Channel ports (4 frontend, 4 backend, 4 mirror channels/ports). This rule assumes the use of the latest available CPU technology.

iSCSI note: Please be aware that iSCSI traffic causes much higher CPU load than Fibre Channel, because the encapsulation of SCSI commands in IP packets (and vice versa) is performed by the CPU. A DataCore Storage Server which handles heavy iSCSI traffic may have a significantly higher demand for CPU power.

Speed of CPU
The faster a CPU runs, the more commands it can execute per second. However, practical experience has shown that clock speed is a secondary factor in terms of IO performance; two slower CPUs are preferred over one fast one. This does not mean that clock speed does not matter. It means that within a CPU family the clock speed difference is minor, for example 3.0 GHz compared to 3.2 GHz for the same CPU type.

Type of CPU
Basically any x86 or x64 CPU is adequate; DataCore Software has not recognized any significant performance difference between CPUs from different vendors with similar architecture and clock speed. However, we recommend the use of "server class" CPUs instead of CPUs which are intended for consumer workstations. Intel Itanium processors are not supported in DataCore Storage Servers.


Amount of Memory

A portion of memory (RAM) of a DataCore Storage Server is allocated to DataCore Cache and is used for Read/Write caching of IO operations between Application Server/Hosts and the physical disks in the storage backend. Cache is the “workhorse” and should not be undersized. Cache that is oversized can drive up the cost of hardware and delivers no real benefit. So it is important to size the amount of RAM used in the DataCore Storage Server properly. Maximum RAM used for DataCore Cache:

SANsymphony-V R8              1 TB
SANmelody 3.0 (64-bit OS)     1 TB
SANsymphony 7.0 (64-bit OS)   1 TB
SANmelody 3.0 (32-bit OS)     20 GB
SANsymphony 7.0 (32-bit OS)   20 GB

The Windows operating system of a DataCore Storage Server does not need much system memory; 2 GB is usually sufficient. DataCore Cache is set to use about 80% of the total amount of physical RAM. This value can be adjusted if necessary, but should not be unless advised by DataCore Technical Support or published in Technical Bulletin 7a/b. On a 32-bit OS DataCore Storage Server with a large amount of RAM, it is recommended to monitor Page Table Entry (PTE) usage and non-paged pool memory usage, so as not to risk an unresponsive DataCore Storage Server or even a crash. See Technical Bulletin 7a/b for more information. When more than 20 GB of memory is installed on a 32-bit OS, SANmelody/SANsymphony will not allocate more than 20 GB; the remaining memory stays available to the Windows operating system but cannot be allocated to DataCore Cache. SANmelody 3.0, SANsymphony 7.0 and SANsymphony-V, when running on Windows 2008 64-bit operating systems, automatically adjust the cache usage to the optimal value. Please refer to Technical Bulletin 7a/b for more detailed information.
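For illustration: with 32 GB of physical RAM installed, roughly 80% – about 25–26 GB – would be allocated to DataCore Cache, leaving approximately 6 GB for the Windows operating system and other processes. These figures are an assumption based on the default 80% ratio; the exact allocation is determined by the software.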

RAM Sizing

An exact calculation of RAM requires very detailed analysis and only satisfies a specific point in time, which is likely to be out of date very quickly. Hence it is always better to oversize and have more RAM than to undersize and have too little RAM in a DataCore Storage Server.


4 – DataCore Storage Server Operating System Configuration

General Operating System Notes

Remember that a DataCore Storage Server is not an "ordinary" Windows server. It is a Storage Virtualization Appliance – common rules or company policies do not apply to these machines.

Selecting the right Windows operating system edition for SANsymphony-V R8:
- Up to 32 GB of physical memory: Windows 2008 Server R2 Standard Edition (64-bit)
- 32 GB – 2 TB of physical memory: Windows 2008 Server R2 Enterprise or Datacenter Edition (64-bit)

Selecting the right Windows operating system edition for SANmelody 3.0.x and SANsymphony 7.0.x:
- Up to 32 GB of physical memory: Windows 2008 Server Standard Edition (64-bit)
- 32 GB – 2 TB of physical memory: Windows 2008 Server Enterprise Edition (64-bit)

Note: 32-bit versions of Windows 2008 Server support up to 4 GB (Standard Edition) or 64 GB (Enterprise Edition).

Microsoft Windows (non-OEM) versions vs. Microsoft OEM Windows versions
DataCore recommends using Microsoft Windows non-OEM versions with DataCore Storage Servers. OEM versions are not recommended because they often contain modified system settings, 3rd-party drivers, monitoring software, tools, utilities, etc. which may interfere with SANsymphony or SANmelody. If you wish to use vendor software (such as hardware monitoring agents), please install the necessary drivers only and make sure that DataCore system functions are not affected (perform the Functional Test Plan found in FAQ #1301).

Language version
The recommended language version for Windows operating systems used for DataCore Storage Servers is English. Basically every language is supported; however, DataCore software products are primarily developed and tested on English platforms. More importantly, an English version helps ensure quick and reliable help in case of malfunction, since English is the common language used in Technical Support. For support reasons, we strongly recommend using English Windows operating system versions only.

3rd party software
A DataCore Storage Server acts as a Storage Virtualization Appliance. It controls IO operations between Application Servers/Hosts and disks – nothing else. A DataCore Storage Server should not be used for hosting backup software, network services (DHCP, DNS, etc.), browsing the internet, downloading software, or performing any other functions that are unrelated to DataCore software. For this reason, a DataCore Storage Server does not need any 3rd-party software installed, except for:

- virus scanning application
- event notification tool/agent
- UPS communication software/agent

See section "Securing a DataCore Storage Server" in this document for more information. Please refer to FAQ #1335: Best Practices: Using 3rd party Diagnostic Tools http://datacore.custhelp.com/app/answers/detail/a_id/1335/kw/1335


Windows Operating System Settings

Naming of a DataCore Storage Server (for SANmelody 3.x and SANsymphony 7.x and earlier only)
Refer to FAQ #9 and FAQ #624 for naming conventions before installing DataCore Software, and remember FAQ #559 before you try to change the name of a DataCore Storage Server which already has DataCore Software installed.
FAQ #9: http://datacore.custhelp.com/app/answers/detail/a_id/9
FAQ #624: http://datacore.custhelp.com/app/answers/detail/a_id/624
FAQ #559: http://datacore.custhelp.com/app/answers/detail/a_id/559
For SANsymphony-V R8, refer to the Online Help system at http://www.datacore.com/SSV-Webhelp/ under the Rules for Naming SAN Components section.

Disable automatic updates
DataCore qualifies the latest Microsoft Service Pack (SP) upon general release. Only full Microsoft Windows Service Packs are qualified; DataCore Software does not qualify individual security updates, pre-Service Packs or hot fixes. See Support FAQ #839 for more details: http://datacore.custhelp.com/app/answers/detail/a_id/839

Pagefile size and memory dump settings
On a DataCore Storage Server, the page file (pagefile.sys) is not used by SANmelody/SANsymphony. However, it needs to be configured with an appropriate size to record debugging information in case of an unexpected stop of the DataCore Storage Server (memory dump). As a best practice, configure "Complete memory dump" in the Windows Startup and Recovery settings and match the size of the page file to the installed RAM in the DataCore Storage Server. If this is not possible for any reason, make sure that at least "Kernel memory dump" is configured and the pagefile is set to a minimum size of 2 GB. For more information regarding memory dump file options for Windows, please see Microsoft KB article #254649: http://support.microsoft.com/kb/254649/EN-US/

Save and send User-Mode dumps
On Windows 2008 DataCore Storage Servers, if the DataCore GUI crashes, a Windows "Problem Reports and Solutions" window may open, prompting you to send information to Microsoft. Please do so, as DataCore can analyze the User-Mode dumps sent to Microsoft. Also configure Windows 2008 to save these User-Mode dumps on each DataCore Storage Server: open regedit and create the registry key HKLM\Software\Microsoft\Windows\Windows Error Reporting\LocalDumps, then close regedit; no reboot is required (a command-line sketch follows at the end of this section). Now, if the DataCore UI crashes, a user-mode dump will be saved on the DataCore Storage Server and can be sent to DataCore Support for analysis. For more information about User-Mode dumps see http://msdn.microsoft.com/en-us/library/bb787181(VS.85).aspx

Time synchronization
The system time does not affect any functionality of SANmelody/SANsymphony. However, for troubleshooting reasons (such as log comparison) it is helpful if the system times on all DataCore Storage Servers in a Partnership/Server Group or Region and on all Application Servers/Hosts are synchronized. Use the NET TIME \\TIMESRV /SET /YES command to synchronize system times. See Microsoft KB article #120944, "Using NET TIME for all Workstations and Servers," for more details: http://support.microsoft.com/kb/120944

All other settings
Leave all other settings at their default values. Do not try to optimize Windows – the default settings are sufficient.
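As referenced above under "Save and send User-Mode dumps," the LocalDumps key can also be created from an elevated command prompt instead of regedit. A minimal sketch; the DumpFolder value and the C:\Dumps path are optional assumptions, not DataCore requirements:

rem Create the key that enables local user-mode dumps (Windows default dump settings)
reg add "HKLM\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps"
rem Optionally choose where the dumps are written (example path)
reg add "HKLM\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps" /v DumpFolder /t REG_EXPAND_SZ /d "C:\Dumps"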


Network Configuration

Keep the DataCore Storage Server away from the public network
A DataCore Storage Server is a Storage Virtualization Appliance and must not be seen or accessed by any user besides the storage administrators. DataCore recommends connecting the DataCore Storage Servers to a dedicated management LAN or VLAN, not to the public (user) network.

No domain membership, use a dedicated workgroup
Do not make a DataCore Storage Server a member of a Windows domain*. There is no reason for domain membership, as it applies policies, restrictions, user rights, etc. to the machine which may interfere with SANmelody or SANsymphony. Place all DataCore Storage Servers in a Partnership/Server Group or Region in a dedicated Windows workgroup, such as "DATACORE." Do not leave them in the default workgroup "WORKGROUP." This forces one of the DataCore Storage Servers to be the workgroup's Master Browser and ensures quick responses.

*Note: SANharmony™ users (applicable only to SANmelody 3.x or SANsymphony 7.x) may consider DataCore Storage Server domain membership to gain the required credentials for NAS services. In this case, ensure that domain policies do not conflict with SANmelody 3.x or SANsymphony 7.x functionality, such as user rights, enforced update installation, service communication, etc.

Use static IP addresses
DataCore Storage Servers should use fixed IP addresses. Do not assign IP addresses dynamically via a DHCP server.

Name resolution
Both SANmelody and SANsymphony communicate with their peers by hostname. For simplicity, DataCore recommends not registering the DataCore Storage Servers on a DNS server, but using the HOSTS and LMHOSTS files for static name resolution instead. Enter the hostnames and IP addresses of all DataCore Storage Servers within a Partnership/Server Group or Region in the HOSTS file located in C:\WINDOWS\system32\drivers\etc (an example follows below). SANsymphony-V R8 will also need to have port 3793 open on the firewall. IPv6 must be disabled on the link used for name resolution.

Dedicated Network Link
It is recommended that DataCore Storage Servers in the same Partnership/Server Group or Region communicate with each other over the LAN and have a dedicated LAN connection that is not used for iSCSI traffic. See the following diagram as an example of how a network connection can be set up:
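As referenced under "Name resolution," the HOSTS file entries for a two-node Partnership/Server Group might look as follows (the hostnames and addresses are placeholders, not recommendations):

# C:\WINDOWS\system32\drivers\etc\hosts on both DataCore Storage Servers
10.0.10.11    SDS01
10.0.10.12    SDS02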

(Diagram: example network setup with a dedicated inter-node LAN link, separate from the iSCSI and public networks)


Securing a DataCore Storage Server

Protect against viruses
The local Windows operating system should be protected against virus attacks by an anti-virus application. If AIM/Replication is used, exclude the AIM/Replication source and destination buffers from being scanned. See FAQ #708, "Keeping DataCore Storage Servers free from Viruses," for more details: http://datacore.custhelp.com/app/answers/detail/a_id/708

Firewalls
DataCore Storage Servers should always be located behind at least one firewall. If the enterprise has multiple levels of trusted networks, then the DataCore Storage Server should be placed on the network with the highest level of security. If communication between DataCore Storage Servers needs to cross a firewall, or if there is a reason to activate firewall software (such as the Windows firewall), then certain ports must be opened: port 3260 is needed for iSCSI traffic and port 3793 for inter-node communication (see the sketch below).

Protect against power loss
If DataCore Storage Servers from the same Partnership/Server Group or Region are located in the same room or data center, they should never be connected to the same power circuit. In case of power loss to one DataCore Storage Server, the partner DataCore Storage Server can take over IO handling for mirrored volumes and switch off write caching for the affected mirrors to prevent data loss or corruption. If two DataCore Storage Servers lose power simultaneously, this security mechanism will fail and cache content will be lost.
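For the ports mentioned above, a minimal sketch using the built-in Windows 2008 firewall (the rule names are arbitrary examples):

rem Allow iSCSI traffic to DataCore iSCSI targets
netsh advfirewall firewall add rule name="DataCore iSCSI" dir=in action=allow protocol=TCP localport=3260
rem Allow DataCore inter-node communication
netsh advfirewall firewall add rule name="DataCore inter-node" dir=in action=allow protocol=TCP localport=3793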

DataCore Storage Servers should be connected to battery-backed power lines to prevent unexpected power loss.

SANsymphony-V is UPS compliant, provided that the Windows operating system is configured properly. SANmelody 3.x and SANsymphony 7.x include a command-line utility to react to UPS events: dcsupsevent.exe. Install a UPS communication agent (if available) on the DataCore Storage Servers and configure it to execute the appropriate dcsupsevent commands on power failure events; see the table below.

UPS status notification / Command(s) to execute / Description

ON BATTERY
  dcsupsevent -writethru
  Switches off write caching for all virtual volumes/vdisks on this DataCore Storage Server.

BATTERY RECHARGED
  dcsupsevent -writeback
  Switches write caching back on once the normal UPS condition is restored.

LOW BATTERY
  dcsupsevent -stop
  shutdown /s /c "Powerloss"
  Stops DataCore services, then shuts down Windows and creates an Event Log entry "Powerloss."

See FAQ #705 for more details: http://datacore.custhelp.com/app/answers/detail/a_id/705
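As an illustration, the LOW BATTERY actions could be wrapped in a small batch file for the UPS agent to execute (the file name is an assumption; the commands are taken from the table above):

rem lowbattery.cmd - executed by the UPS agent on a LOW BATTERY event
rem Stop the DataCore services first
dcsupsevent -stop
rem Then shut down Windows, recording "Powerloss" as the reason in the Event Log
shutdown /s /c "Powerloss"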


Set up event log monitoring alerts
SANsymphony-V R8 comes with a configurable System Health monitoring tool. Please refer to the SANsymphony-V online help system for more information: http://www.datacore.com/SSV-Webhelp/
SANsymphony 7.x and SANmelody 3.x write all events (information, warnings, and errors) to the System and/or Application log of the DataCore Storage Server's Windows OS. This allows easy integration with any monitoring application. All DataCore events entered in the Application and System logs come from an Event Source with the prefix Dcs (see below). Install the agent of your monitoring application (Microsoft Systems Manager, HP OpenView, What's Up Gold, Nagios, EventSentry, etc.) on the DataCore Storage Server. The agent should be configured so that it scans the:
- Application Log
- System Log

for events with Source "Dcs*" and Type "Error" or "Warning," and reports them to the monitoring management application or to the administrator's or help desk's email address. If you do not use any kind of management or event log monitoring software in your company, please refer to DataCore Support FAQ #1174, "How to send an e-mail alert for DataCore Storage Server events," for further information about how email notification can be achieved alternatively: http://datacore.custhelp.com/app/answers/detail/a_id/1174
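Where no monitoring agent is available, recent DataCore events can also be checked ad hoc from the command line. A sketch using wevtutil; note that the source name DcsPool is a hypothetical example, since event XPath filters require exact provider names, so the query would be repeated per Dcs source present on the system:

rem Show the last 20 Error (Level=2) or Warning (Level=3) events from one Dcs source
wevtutil qe Application /q:"*[System[Provider[@Name='DcsPool'] and (Level=2 or Level=3)]]" /f:text /c:20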

(Table: DataCore event sources beginning with Dcs… in SANmelody 3.x)


Backup and restore a DataCore Storage Server

General: It is not recommended to install a backup agent or backup software on a DataCore Storage Server and run full or incremental backups from time to time, for the following reasons:

1. Backup software may try to get exclusive access to a file during the backup process. If the backup software tries to lock a DataCore system or configuration file, it may cause a malfunction or crash.

2. Restoring a DataCore Storage Server from a full backup (on tape or disk) may be a long process: install the Windows operating system, install service packs and hot fixes, install the backup agent, connect to the backup server or tape drive, begin the restore and wait for it to complete. There are smarter and faster ways to restore a DataCore Storage Server.

The DataCore Storage Server itself does not hold much configuration information. The majority of the configuration information, such as pool configuration and volume/vdisk information, is stored in the metadata headers on the physical disks in the backend. On SANmelody 3.x and SANsymphony 7.x, this information is stored in two configuration files, sanmapper.dat and sanmappergen.txt, located in C:\Program Files\DataCore Software\SANmelody or SANsymphony\SANmapper on a 32-bit OS and C:\Program Files (x86)\DataCore Software\SANmelody or SANsymphony\SANmapper on a 64-bit OS. On SANsymphony-V this information is stored in the xconfig.xml file located in C:\Program Files\DataCore\SANsymphony. These files are always identical on every DataCore Storage Server within a Partnership/Server Group or Region and are updated on each change to the configuration.

Emergency Repair Disk (ERD) – SANmelody 3.x and SANsymphony 7.x only
SANmelody 3.x and SANsymphony 7.x support creating an ERD. All essential configuration data can be stored on a disk (memory stick, DVD or network share). Update the ERD after each configuration change. Please refer to the SANmelody or SANsymphony Help system for more information.

Backup Configuration – SANsymphony-V only
If a hardware failure (e.g., a disk failure) or software failure should occur, your SANsymphony-V configuration will be simple to recreate. Configuration files can be preserved from the SANsymphony-V Management Console or by using a Windows PowerShell™ script file provided in the SANsymphony file folder. Configuration files are restored by running the Windows PowerShell script file. See the SANsymphony-V Help system for more information.

Boot Partition Image
It might be helpful to create an image file of the boot partition (C:\) before carrying out major maintenance tasks, such as installing a new Windows Service Pack. Make sure that all DataCore services are stopped before creating the image file. This image conserves the status of all mirrors in the registry at that specific point in time and cannot be used at a later point. Restore/use this image only if your immediate attempt to update fails. It is not suitable for use in any other situation or at a later time, since the actual mirror states will not be preserved. Loss of data may result if this backup is reapplied improperly.


Restoring a DataCore Storage Server
The following steps are described for information only. Please do not attempt to restore a DataCore Storage Server in a production environment without consultation or assistance from DataCore Technical Support.

In order to restore a DataCore Storage Server, install the Windows operating system and SANmelody or SANsymphony from scratch, then rejoin the Partnership/Server Group or Region. In HA environments, the configuration files are automatically copied to the freshly installed machine and the environment is restored. Installing the Windows operating system and SANmelody, SANsymphony or SANsymphony-V can be sped up significantly if an image of the C:\ partition is available.


5 – SAN Design Guide

Avoiding Single Points of Failure (SPOF)

Pooling and sharing resources is a good idea in many regards. However, centralizing carries a risk too: if one instance fails, several services might be affected. For this reason, modern IT infrastructures – not only storage environments – provide various levels of fault tolerance and redundancy. In a true High Availability (HA) environment, the outage of any single device must not impede the availability of the whole system, though some constraints may still apply; for example, performance may be temporarily degraded in the event of a failure and during the process of recovery. To ensure this, any "Single Point of Failure" (SPOF) must be strictly eliminated from the design. High Availability is an end-to-end process, from the user to the data. If only some links in the user-data chain are highly available, it cannot be considered a true HA environment. In this document we pay attention only to what we can "see" from our storage perspective. However, many more factors should be considered, such as application availability (e.g., clustering), network availability, power supply, climate control, and so on.

The logical diagram below shows an HA storage environment – everything is doubled. Notice that there are two HBA/NIC cards in the Application Server/Host connected to two independent fabrics, and two DataCore Storage Servers, each controlling separate storage; all data is mirrored and the links between all devices are redundant. For any single failure, there is always a second path to reach the physical storage location. In addition to redundancy, separating components limits the effect of environmental impact: place components in different locations (racks, rooms, buildings), connect devices to different power circuits, and ensure that Application Servers/Hosts, network infrastructure, and air conditioning units are also redundant.

Please also see FAQ #1278, "Best Practice: Mapping Multi-Pathed Mirrors to Application Servers for the highest availability": http://datacore.custhelp.com/app/answers/detail/a_id/1278/kw/1278

(Diagram: HA storage environment – Application Server/Host with two HBAs/NICs, two independent fabrics, two DataCore Storage Servers and mirrored storage/data)


Fibre Channel Cabling and Port Settings

In order to ensure stable and manageable operations, some basic rules for Fibre Channel SAN setup and cabling should be followed. Fibre Channel SANs are typically highly customized to the particular customer environment, which makes it challenging to issue generally valid recommendations. The cabling layout in particular will have to be adjusted to satisfy various needs.

Fibre Channel frontend ports (to Application Servers/Hosts)
First consider the frontend ports of the environment. Pay attention to the following points:

- Connect all frontend ports to the same fabric or switch (as shown in the diagrams below).
- Set all frontend ports to "SCSI Mode: Target only."
- Activate the "Disable Target While Stopped" option on all frontend ports.

NOTE: SANsymphony-V will set the SCSI mode and the Disable Target While Stopped setting automatically when the port role of Front-end port only is selected.

Some scenarios require a different type of cabling layout, such as for long-distance ISLs or to meet operating system vendor recommendations. Please refer to the Technical Bulletins available on the DataCore Technical Support Web site and check with your Application Server/Host operating system vendor for other requirements. By default, all FC ports are set to target & initiator mode. Frontend ports should operate in target-only SCSI mode, because some operating systems get "confused" if they see a port operating in both modes simultaneously. The "Disable Target While Stopped" option switches the port off if the local DataCore Storage Server services are stopped; some operating systems require this behavior, and it speeds up failover if a DataCore Storage Server is shut down.

(Diagrams: frontend cabling examples, including inter-switch links (ISLs))


Fibre Channel mirror ports
DataCore Storage Server mirror ports can be connected either directly (point-to-point) or through switches. Both solutions have pros and cons, and their importance must be decided case by case for your environment. In general, at least two physically separated Fibre Channel mirror channels/ports should be used to avoid single points of failure.

* Note: Prior to SANmelody 3.0 PSP2: if mirror channels/ports traverse switches/fabrics, please make sure that the target–initiator relationships are configured correctly. Set one mirror channel/port per DataCore Storage Server to "Target only" mode and the other channel/port to "Initiator only" mode. Connect the mirror target port to the same switch/fabric as the frontend ports, and connect the initiator port to the opposite switch/fabric (as shown in the diagram above). For SANmelody 3.0 PSP2 and greater and SANsymphony, it is recommended to have 2 dedicated and independent mirror paths between DataCore Storage Servers in each direction, to leave the FC mirror port SCSI mode settings at their default values (Target/Initiator mode) and to turn on Redundant Mirror Paths. Each mirrored volume/vdisk will then have a total of 4 mirror paths, with 2 redundant mirror paths in each direction.

Direct (point-to-point) connection
PRO:
- Simplicity (less configuration, no switch involved)
- Mirrors do not break if a switch goes down (such as for firmware upgrades)
CON:
- Some HBAs have issues with point-to-point connections; please check the Qualified HBA List
- Distance limitation of cables
- Not feasible with more than two DataCore Storage Servers (SANsymphony 7.x only)

Switched connection
PRO:
- No issues with point-to-point connections
- Allows configurations with more than two DataCore Storage Servers (SANsymphony 7.x only)
- Longer distances possible (such as with stretched fabrics)
CON:
- Switch outage (such as for firmware upgrades) may cause mirror recoveries
- Mirror channel/port direction (initiator-to-target relationship) must be configured exactly as shown in the diagram above


Zoning Considerations (Fibre Channel Switch Configuration)

Why zone?
Zoning is a fabric-based service in a Fibre Channel SAN that groups host and storage nodes that need to communicate. Zoning allows nodes to communicate with each other only if they are members of the same zone; nodes can be members of multiple zones. Zoning not only prevents a host from unauthorized access to storage assets, it also stops undesired host-to-host communication and fabric-wide Registered State Change Notification (RSCN) disruptions. RSCNs are issued by the fabric nameserver and notify end devices of events in the fabric, such as a storage node going offline. Zoning isolates these notifications to the nodes that require the update. This is important for non-disruptive IO operations, because RSCNs have the potential to disrupt storage traffic.

Zoning approach
There are multiple ways to group SAN hosts and storage nodes in a zoning configuration. Hosts rarely need to interact directly with each other, and storage ports never initiate SAN traffic by their nature as targets. The recommended grouping method for zoning is "Single Initiator Zoning" (SIZ), as shown in the diagram below. With SIZ, each zone has a single HBA and one or more storage ports. If the HBA has both disk and tape storage devices, then two zones should be created: one zone with the HBA and the disk devices, and a second zone with the HBA and the tape devices (see FAQ #645). SIZ is optimal because it prevents any host-to-host interaction and limits RSCNs to the zones that need the information within the RSCN. On initiators, GTPRLO has to be turned off.

(Diagram: Single Initiator Zoning – each zone on the SAN switch contains one initiator and one or more targets)


Soft or hard zoning
There are two types of zoning available in a fabric: soft zoning (WWN- or alias-based) and hard zoning (port-based). Soft zoning is the practice of identifying and grouping end nodes by their World Wide Name (WWN) or their respective alias name. The WWNs or aliases are entered in the fabric nameserver database, which is synchronized and distributed across the fabric to each switch. Therefore, no matter what port an HBA is plugged into, it queries the nameserver to inquire about the devices in its zone. With hard zoning, nodes are identified and grouped by the switch port number they are connected to. Switch ports in the same zone can communicate with each other regardless of "who" is connected to those ports; the fabric nameserver is not involved in this zoning mechanism.

Soft zoning implies a slightly higher administrative effort when setting up the fabric compared to hard zoning. However, it has significant advantages concerning management and solving connectivity issues. Because soft zoning is handled by the nameserver, a node can easily be moved to another switch/port within the fabric. Conversely, a node cannot be plugged into a wrong port by accident – it simply does not matter where a node is plugged in; it always sees the correct zone members. Furthermore, soft zoning provides valuable information for solving fabric issues, especially if people who are not familiar with the fabric setup (like DataCore Technical Support staff) are involved. For instance, it is easy to understand that alias SQL_SERVER can communicate with (is in the same zone as) alias DATACORE_TARGET. Troubleshooting in a hard-zoned fabric tends to be more cumbersome and requires precise (and up-to-date) documentation of the physical connections between hosts and fabric (SAN diagram, cabling scheme and so on).

Naming conventions
Naming conventions are very important for simplifying zoning configuration management. User-friendly alias names ensure that zone members can be understood at a glance and that configuration errors are minimized. Good practice for host aliases is to reference the hostname plus the particular HBA (number, slot, WWN). In the case of DataCore Storage Servers, it may also be helpful to point out the channel/port role (frontend, backend, mirror). For storage arrays with multiple controllers and/or ports, that information is also useful to include in the alias name. Following are some examples of alias names:

- Hostname + HBA port, e.g. SAPPROD_HBA0
- DataCore Server name + function + last 4 digits of the WWN, e.g. SDS01_Frontend_2E5C
- Storage array + controller + port #, e.g. XYR5412_Ctr1_Port0

Zone names should reflect their members. Following the rule that there should be just one initiator member per zone, the initiator alias is unique and is a good name for the zone too, optionally supplemented by the targets this initiator can access. For example, a zone that connects the DataCore Storage Server's backend port to the storage array XYR5412 with two FC ports:

Zone name: SDS01_Backend-XYR5412, containing the members:
- SDS01_Backend_2E5D (initiator)
- XYR5412_CtrA_Port0 (target)
- XYR5412_CtrB_Port1 (target)


iSCSI (LAN) Cabling and Port Settings

In general, the same rules as for Fibre Channel environments apply to iSCSI environments. Since iSCSI uses the IP protocol and Ethernet infrastructure, these additional general rules apply:

- iSCSI traffic should be strictly separated from the public (user) network. Use separate hardware (LAN switches) for iSCSI storage traffic, or at least different virtual LANs (VLANs).
- Use separate NICs for iSCSI and for inter-storage server and management communication.
- Deactivate every other protocol or service except for "Internet Protocol (TCP/IP)" on NICs used for iSCSI. See the help guide for detailed instructions.
- Port 3260 needs to be open on the firewall to allow iSCSI traffic.
- On DataCore Storage Servers as well as Application Servers/Hosts, disable these settings as they can cause disconnects and performance problems:
  o Disable AutoTuning on W2K8: netsh interface tcp set global autotuninglevel=disabled
  o Disable RSS: netsh interface tcp set global rss=disabled
  o Disable TOE: netsh int tcp set global chimney=disabled

For all DataCore Server products, do not disable any DataCore Software iSCSI Adapters under Server Manager, Diagnostics, Device Manager, or DataCore Fibre-Channel Adapters. If some DataCore Software iSCSI Adapters are disabled, rebooting the DataCore Server can leave the wrong channels unavailable after the reboot, which can break iSCSI connections to DataCore Software iSCSI targets. To prevent unwanted iSCSI use of DataCore Software iSCSI Adapters:

- Rename the Channel Alias in SANsymphony and SANmelody, and the Server Ports in SANsymphony-V, so it is clear that these DataCore Software iSCSI Adapters are not to be used for iSCSI.
- Other possibilities are as follows: in SANsymphony and SANmelody, turn on Challenge-Handshake Authentication Protocol (CHAP) on the channel by using iSCSI Manager. Right-click the channel, select Properties, Authentication, and in Authentication Method select CHAP, then OK. In SANsymphony-V, remove the Front End (FE) and Mirror (MR) port roles from the Server Port to prevent use or session login by an iSCSI initiator.

iSCSI frontend ports (to Application Servers/Hosts)

First consider the frontend ports of the environment. Pay attention to the following points:

- Connect all frontend ports to the same switch or VLAN (as shown in the diagrams below).
- Activate the "Disable Target While Stopped" option on all frontend ports.

NOTE: SANsymphony-V sets the Disable Target While Stopped setting automatically when the port role of Front-end port only is selected.


Some scenarios require a different type of cabling / iSCSI connection layout, for example with Wide Area Network (WAN) links or due to special operating system vendor recommendations. Please refer to the Technical Bulletins available on the DataCore Technical Support Web site and check with your Application Server/Host operating system vendor for other requirements. By default, all iSCSI ports are set to Target mode. Do not install Microsoft iSCSI Initiator software unless you intend to use iSCSI mirror links.


iSCSI mirror ports

DataCore Storage Server mirror ports can be connected either directly (crossover cable) or through switches. Both solutions have pros and cons, and their weight must be decided case by case for your environment. In general, at least two physically separated iSCSI mirror channels/ports should be used to avoid single points of failure. In order to use iSCSI for mirroring, the Microsoft iSCSI Initiator software must be installed on the DataCore Storage Servers, but do not install the Microsoft MPIO option. Do not configure NIC teaming or bonding when using the MS iSCSI Initiator, as outlined in its release notes. Redundant mirror paths are not supported on iSCSI mirror paths. See FAQ #1002 Best Practice: Do Not Check "Enable multi-path" when installing Microsoft's iSCSI Initiator Software: http://datacore.custhelp.com/app/answers/detail/a_id/1002/ Please refer to the DataCore Help System for more information. Note: There is one dedicated iSCSI mirror path between DataCore Storage Servers in each direction.

Direct connection (crossover cable):
PRO:
- Simplicity (less configuration, no switch involved)
- Mirrors do not break if a switch goes down (such as for firmware upgrades)
CON:
- Distance limitation of cables
- Not feasible with more than two storage servers (SANsymphony only)

Connection through switches:
PRO:
- Allows configurations with more than two storage servers (SANsymphony only)
- Longer distances possible (such as for WAN links)
CON:
- Switch outage (such as for firmware upgrades) may cause mirror recoveries

[Diagrams: direct and switched iSCSI mirror cabling between two DataCore Storage Servers, each with two dedicated mirror links on separate IP subnets]


6 – Pool Performance Aspects

Disk performance is one of the most important characteristics in a storage environment. For designing and implementing appropriate solutions, it is crucial to understand the difference between a "classic" storage array and a DataCore thin provisioned pool (disk storage pool). In a classic environment, Application Servers/Hosts have exclusive access to a disk or RAID set. The RAID set handles the IO traffic of this particular server or application and should be optimized accordingly. Within a DataCore thin provisioned pool, physical disks or RAID sets are typically shared among multiple Application Servers/Hosts, so the IO pattern those disks experience may look very different.

Understanding DataCore Thin Provisioning Technology

In a DataCore thin provisioned pool, physical disks are put together into a group (a pool of storage disks) from which several virtual volumes/vdisks are created and mapped/served to multiple Application Servers/Hosts. The DataCore thin provisioned pool provisions storage allocation units on the physical disks as required, and the data content of the virtual volumes/vdisks is typically distributed equally across all physical disks within a pool. Several factors make a significant difference in performance characteristics, such as:

- Disk type (Fibre Channel, SAS, SATA)
- Number of disks
- Rotation speed
- RAID level
- LUN configuration

[Diagram: virtual volumes/vdisks created from a disk pool and served to Application Servers/Hosts]
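To make the allocate-on-demand behavior concrete, here is a toy model in Python. It is a sketch only, not DataCore's actual algorithm: the round-robin placement, the 128 MB allocation unit size and all names are illustrative assumptions.

    from collections import defaultdict

    class ThinPool:
        """Toy model of a thin provisioned pool (illustrative only)."""
        def __init__(self, disks, sau_mb=128):          # SAU size is an assumed value
            self.disks = list(disks)                    # physical disks / RAID-set LUNs
            self.sau_mb = sau_mb
            self.next_disk = 0
            self.allocated = defaultdict(set)           # vdisk -> allocated SAU indexes
            self.used = defaultdict(int)                # disk -> number of SAUs in use

        def write(self, vdisk, sau_index):
            # The first write to an unallocated SAU triggers allocation;
            # round-robin keeps the data spread evenly across all pool disks.
            if sau_index not in self.allocated[vdisk]:
                disk = self.disks[self.next_disk % len(self.disks)]
                self.next_disk += 1
                self.allocated[vdisk].add(sau_index)
                self.used[disk] += 1

    pool = ThinPool(["LUN0", "LUN1", "LUN2"])
    for i in range(9):                                  # nine first-writes to one vdisk
        pool.write("SQL01", i)
    print(dict(pool.used))                              # {'LUN0': 3, 'LUN1': 3, 'LUN2': 3}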


Disk Types

Different disk technologies have significant performance differences. Disk performance is primarily represented by three key factors:

- Average seek time – the time the read/write head needs to physically move to the correct place.
- IOs per second – the number of operations a disk can handle per second.
- MBs per second – the amount of data a disk can transfer per second.

Typical average disk performance values

Disk type     | Avg. seek time                      | Avg. IOs per second                     | Avg. MB per second                | Application
Fibre Channel | 3.5 ms (15K RPM); 3.8 ms (10K RPM)  | 180 IOPS (15K RPM); 150 IOPS (10K RPM)  | 20 MB (15K RPM); 15 MB (10K RPM)  | Performance-intensive, high availability, random access
SAS 2.5"      | 3.2 ms (15K RPM); 3.5 ms (10K RPM)  | 200 IOPS (15K RPM); 160 IOPS (10K RPM)  | 23 MB (15K RPM); 16 MB (10K RPM)  | Performance-intensive, high availability, random access
SATA          | 9.5 ms (7.2K RPM); 12 ms (5.4K RPM) | 75 IOPS (7.2K RPM); 60 IOPS (5.4K RPM)  | Varies; see paragraph below       | Capacity-intensive, sequential access

Compared to drive manufacturers' data sheets, these values may seem low. Performance values published by drive and storage array vendors are often misleading and generally represent the best-case scenario. In an environment where physical disks are shared among several Application Servers/Hosts, the access pattern is typically highly random, which causes a lot of repositioning of the actuators (the disks' read/write head assemblies). For this reason, the average "real world" performance is often much lower than the benchmark maximum measured in the lab. Fibre Channel and SAS disks have comparable technologies and performance characteristics. For instance, 2.5" SAS disks typically perform slightly better because the average seek time is shorter on a smaller platter. FC and SAS disks can respond quickly to random IO requests and are the first choice for performance-intensive and highly random IO patterns, like database and email server applications. SATA disks use different technology inside the box compared to FC and SAS disks. They have a less expensive mechanical design, which results in slower rotation speeds and higher seek times for positioning the read/write heads. SATA disks can perform very well (up to 90% of FC/SAS disk performance) if they are accessed by sequential read/write traffic with large IO sizes. If SATA disks need to respond to a large number of small, random IO requests, they may deliver poor response times. SATA drives are the first choice for capacity-hungry applications with low performance requirements or mainly sequential IO, like archive systems, media streaming or backup to disk. Typical disk type usage (RAID levels are discussed later):

Disk type | RAID level  | Storage tier | Typical applications
FC & SAS  | RAID 1 / 10 | Tier 1       | Heavily used databases, email, ERP systems, etc.
FC & SAS  | RAID 5      | Tier 2       | File services, less heavily loaded database applications
SATA      | RAID 5 / 6  | Tier 3       | Archive, media storage (x-ray, video), backup to disk


Number of Disks

An old rule of thumb says that the more spindles a disk array contains, the more performance it delivers. This rule still applies. A RAID controller or storage controller in a storage array can distribute the incoming IO requests to all disks in a RAID set. The disks act independently from each other, so the performance of the RAID set is the sum of all disks grouped in the set. The following example shows two RAID 5 sets of the same capacity: one built with a few large, relatively slow SATA disks and one built with many small, relatively fast SAS disks.

- 3 x 750 GB SATA drives: one SATA disk delivers 75 IOPS; RAID 5; total capacity: 2 TB; total IOPS: 225
- 15 x 146 GB SAS drives: one SAS disk delivers 200 IOPS; RAID 5; total capacity: 2 TB; total IOPS: 3000

The calculated performance of the RAID set with the small SAS disks is roughly 13 times higher than that of the set with large SATA disks. When designing the appropriate disk layout of a storage solution, the number of disks used is therefore a significant factor: not just the requested capacity, but also the overall performance requirements define the number of disks needed.
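The rule of thumb above can be expressed as a one-line estimate. This Python sketch simply sums the per-disk IOPS of a RAID set; parity and write penalties are ignored, as in the example above.

    def raid_set_iops(disk_count, iops_per_disk):
        # Aggregate IOPS estimate: disks act independently, so the set's
        # performance is roughly the sum of its members (penalties ignored).
        return disk_count * iops_per_disk

    print(raid_set_iops(3, 75))    # 3 x 750 GB SATA drives -> 225 IOPS
    print(raid_set_iops(15, 200))  # 15 x 146 GB SAS drives -> 3000 IOPS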

RAID Layout

Physical disks connected to a RAID controller can be grouped into RAID sets and carved into LUNs presented to the host in several ways. Understanding the relationship between physical disk/LUN grouping and the DataCore pool technology is crucial to getting the best performance results.

Spindle counts in a RAID set

As shown above, the more disks a RAID set contains, the higher its performance typically is. In order to get good performance out of a RAID set, a certain number of spindles is necessary. On the other hand, as the number of physical spindles increases, the rebuild time to recover from a disk failure also increases. Of course, this is highly dependent on the chosen RAID level and the disk type. A balance must be found between performance and rebuild time, and it is difficult to make general recommendations.

Total LUN counts

Each LUN is seen by the operating system as one disk and has one IO queue. The IO queue is the "pipe" that transports IOs between host and disk. The more IO queues (disks) the host sees, the more IOs can be transported in parallel. This is another reason for having more, smaller RAID sets instead of one large one.

Quantity of partitions carved from a RAID set

Limit the number of LUNs carved from a RAID set to one (or to as few as possible). The DataCore pool algorithm distributes data and IOs across all disks within a pool. If those "disks" are in reality LUNs residing on the same RAID set, this may lead to disk thrashing as the actuators are incessantly repositioned from one partition to another on the physical disks. This is especially the case with disks that have high seek times (time to position the head) and may result in poor performance. See also FAQ #1376: Best Practice: Optimal stripe size of a Storage Array attached to a DataCore Storage Server? http://datacore.custhelp.com/app/answers/detail/a_id/1376



The following three examples demonstrate the relationship between the number of disks, RAID sets, LUNs and the DataCore pool technology.

Example 1 – 15 disks in 1 RAID set, 1 LUN exported to the pool:

Pro:
- Good performance due to striping across 15 physical spindles
- Small storage loss for RAID overhead (depending on chosen RAID level)
Con:
- DataCore has just a single I/O queue to the disks, which may result in congestion
- If one physical disk fails, the whole LUN is affected by the RAID rebuild
- RAID rebuild may take a long time on large RAID sets and degrade performance

Cons outweigh pros. The configuration is OK, but there might be a better solution.

Example 2 – 15 disks in 1 RAID set, 3 LUNs created from the RAID set and exported to the pool:

Pro:
- Small storage loss for RAID overhead (depending on chosen RAID level)
Con:
- If one physical disk fails, all three LUNs are affected by the RAID rebuild
- RAID rebuild may take a long time on large RAID sets and degrades performance
- The DataCore pool concept of distributing allocated blocks conflicts with the RAID layout – it creates additional seek and rotational latency on the physical disks, thus degrading performance

Avoid this type of configuration due to its many disadvantages and the probability of poor performance.


Example 3 – 15 disks in 3 RAID sets, 1 LUN for each RAID set exported to the pool:

Pro:
- DataCore has three I/O queues to the disks
- The distribution algorithm spreads the load across LUNs and increases performance
- A failed physical disk affects just one LUN
- RAID rebuild is quicker with fewer disks
Con:
- Greater storage loss for RAID overhead (depending on RAID level)

Pros outweigh cons. Good balance between performance/availability and costs. Recommended configuration in this type of situation.

Example 4 – No RAID level, "just a bunch of disks" (JBOD):

If JBODs are used in a pool, a physical disk failure will most likely impact every volume in the pool. DataCore Storage Server mirroring of all volumes is mandatory to prevent outage and data loss. Depending on the amount of data that must be recovered from the mirror partner, long recovery times may result.

Pro:
- Full use of capacity
- Many IO queues to the disks
- The pool spreads the load across spindles
Con:
- One failed disk will most likely affect all volumes
- Disk failure may cause a long recovery time
- Bad blocks on disks are not recognized


RAID 0 – Striping

RAID 0 sets typically have very high performance and the highest risk potential. DataCore Storage Server mirroring of all volumes is mandatory to prevent outage and data loss. A single failed disk destroys the whole affected LUN/pool and causes all data to be recovered from the mirror partner side.

Pro:
- Full use of capacity
- Highest write performance
- Highest read performance
Con:
- Highest risk potential
- DataCore has just a single I/O queue to the disks, which may result in congestion
- One failed disk destroys all data in the pool
- Disk failure may cause a long recovery time

RAID 1 – Mirroring

Pools containing multiple RAID 1 sets typically offer the highest security level and high performance. The DataCore pool algorithm distributes the load across all LUNs. Disk failures affect just one LUN and usually recover very quickly. Pools with multiple RAID 1 sets are recommended for non-mirrored volumes and for applications which cause lots of small random IOs (like database and email applications). Pools which contain numerous volumes accessed by many Application Servers/Hosts may experience highly random IO too.

Pro:
- High security level
- High sequential read/write performance
- High random read/write performance
- DataCore spreads the load across all RAID 1 sets
- A failed disk doesn't significantly affect performance
- Quick recovery from disk failure
Con:
- 50% capacity loss


RAID 10 (respectively RAID 01) – Striping across Mirrors

RAID 10 sets do not have significant advantages compared to multiple RAID 1 sets in a pool. Only with some specific access patterns (e.g. heavy sequential reads) may the Application Server/Host benefit from the underlying block-oriented striping of the RAID set. Generally, multiple RAID 1 sets provide the same result plus more advantages (such as more IO queues) and are usually preferred over RAID 10 configurations.

Pro:
- High security level
- High sequential read/write performance
- Highest random read/write performance
Con:
- 50% capacity loss
- Fewer IO queues compared to multiple RAID 1 sets

RAID 5 – Striping with Parity

RAID 5 sets perform very well with highly sequential access patterns and with random reads. Due to the need to recalculate and update parity information, heavy random writes may suffer from low performance. Disk failures cause a significant performance decrease during the rebuild. RAID 5 sets are good for applications with mainly sequential IO or highly random reads, like file servers and moderately loaded databases. Creating multiple smaller RAID 5 sets is preferred over creating fewer large ones.

Pro:
- Moderate capacity loss
- High sequential read/write performance
- High random read performance
Con:
- Low random write performance
- Moderate security level
- Failed disk / rebuild impacts performance


General Storage/Disk Pool notes

Same type of disk, speed and size per pool

Physical disks or RAID sets used in one pool should be equal in regard to their capacity, disk technology and performance characteristics. The DataCore pool algorithm distributes the allocated data blocks equally across all disks in a pool, so it makes little sense to have slower and faster or larger and smaller "spots" (disks) within one pool. If a pool's capacity is to be expanded and the original disk type/capacity etc. is no longer available, it may be worthwhile to create a new pool and migrate the virtual volumes/vdisks (replace the volumes).

Different pools for different purposes

Match pool performance and capacity characteristics with application needs. For example, a pool containing a high count of smaller-capacity spindles typically has significantly higher IO performance than one with fewer disks of high capacity. If performance/capacity requirements vary between applications (as is typically the case), create multiple pools with different characteristics. Virtual volumes/vdisks created out of a pool can later be migrated to another pool if requirements change over time. The following table shows examples of different pool characteristics*:

Pool     | Disk / RAID         | Tier level                | Purpose, applications
"GOLD"   | FC disks / RAID 1   | Tier 1 (High Performance) | ERP systems, heavily loaded database applications, email systems with many users
"SILVER" | SAS disks / RAID 5  | Tier 2 (Economy Storage)  | File & print services, test and development systems, less loaded databases
"BRONZE" | SATA disks / RAID 5 | Tier 3 (High Capacity)    | Media storage, archive applications, backup-to-disk

*Note: These are examples. They do not mean that a pool of RAID 5 sets is unsuitable for email applications or that a pool with SATA disks is incapable of serving a file server. The best RAID layout depends on the effective performance requirements of the particular environment.

Pools for Snapshot destination volumes

Pools dedicated solely to the creation of Snapshot destination volumes may have special requirements regarding the storage allocation unit size. Please see the section "Snapshot Best Practices" of this document for more information.


Long I/O Metrics (For SANmelody 3.x or SANsymphony 7.x only)

DataCore recommends that you be proactive and set up each DataCore Storage Server for "Long I/O metrics" at installation time or afterwards, even when there may be no performance problems. Long I/O metrics monitors for I/O latency above certain thresholds across all storage and mirror disks; when triggered, it posts symonlog #2031 event messages and starts an MS Performance Monitor counter log which collects DataCore Performance Objects on that DataCore Storage Server. To configure it, follow the instructions in the topic "Long I/O metrics" in the DataCore Storage Server help guide. DataCore recommends that you then periodically monitor for symonlog #2031 messages in the system event logs on Windows 2003 and Windows 2008 (prior to R2), and in Server Manager | Diagnostics | Event Viewer | Applications and Services Logs/Microsoft/Windows/Diagnosis-PLA/Operational on Windows 2008 R2. Also periodically check whether the MS Performance Monitor counter log has been started. If there are messages and the log has started, wait about one hour after the log starts, then manually stop the counter log and send DataCore Support the resulting counter log in *.blg format along with new support bundles in a Severity 3 incident.


7 – Synchronous Mirror Best Practices

Match Size and Performance Characteristics

The two volumes (primary/preferred and secondary/alternate) of a mirrored virtual volume/vdisk should be of the same size, and they should have the same performance characteristics. In a synchronous mirror relationship, the slowest member determines the resulting performance. In DataCore environments where both volumes have independent cache instances, disk performance differences can be compensated to a certain degree by caching IOs. However, in some scenarios the true disk performance comes into effect, for instance after a disk failure. In this case, caching is switched off to avoid any loss of data kept in cache, and IO requests are rerouted directly to the secondary volume (see diagram below). If the secondary volume has significantly lower performance characteristics than the primary volume, the Application Server/Host will experience notably higher response times from its disk. The secondary disk will not only carry the entire IO load from the Application Server/Host; it will also serve the resynchronization traffic when the primary disk comes up again. For this reason, both sides of a mirror should be of the same "quality." On the other hand, there is nothing wrong with using disks of different performance – as long as the possible results are acceptable in the particular environment.

[Diagram: Application Server/Host writing to a mirrored virtual disk; preferred disk SAS 15K RPM and alternate disk SATA 7200 RPM, each behind a DataCore Server with its own cache]

When one disk goes down, write caching is disabled for the affected virtual disk and a log is started. The DataCore Server then sends the IO to the partner, which also operates in write-through mode and waits for a return. The write IO is acknowledged to the Application Server/Host only after the data is written to the physical disk.


Synchronous Mirroring over Long Distances

There is no set rule concerning the distance between the primary/preferred and secondary/alternate volumes of a synchronous mirror. The limit is determined by the acceptable latency of the application using the mirrored volume/vdisk. Read IOs are always processed locally (primary volume), while write IOs need to be transmitted to the remote DataCore Storage Server (secondary volume). Write IOs to a stretched mirror therefore experience additional latency: the time an IO needs to travel to the remote side, plus the same amount of time for the acknowledgement to return.

In environments where a direct connection (dark fiber) between sites is used, latency is normally not a problem, for example:

- Dark fiber links can be stretched up to 10 km (with 1300 nm lasers) or 35 km (with 1550 nm lasers). A dark fiber link adds a latency of around 5 microseconds per km.
  o A microsecond (µs) is equal to one millionth of a second, or one thousandth of a millisecond (ms).
- A typical SCSI transaction must traverse the link 8 times, or four round trips.
- This means a dark fiber link of 35 km adds a latency of 5 µs * 35 km * 8 (trips) = 1400 microseconds (µs) = 1.4 milliseconds (ms), which is negligible for most applications but could affect time-sensitive transactional Application Servers/Hosts such as databases, which can issue many small I/Os per second (a small calculator for this estimate is sketched below).

However, there are other considerations that can affect the reliability and latency of a link and must be taken into account:

- Degradation of the signal along the cable; this tends to increase with the length of the link and reduces reliability unless extra hardware corrects the degradation.
- If the link (HBAs, FC switches, link hardware) does not have enough FC buffer credits, latency can increase. The faster the FC speed (1 Gbps, 2 Gbps, 4 Gbps, 8 Gbps, etc.) and the longer the link, the more FC buffer credits you will need to fully utilize the link.

DataCore recommends that you talk to your HBA, FC switch, link hardware and cable provider if you have concerns about any of the above bullet points.
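The dark fiber arithmetic above is easy to generalize. This Python sketch reproduces the worked example; the 5 µs/km and 8-traversals figures come from the text above, and other link types would need their own measured values.

    def link_latency_ms(distance_km, us_per_km=5, link_traversals=8):
        # Added latency for one SCSI transaction over a dark fiber link:
        # per-km propagation delay times the number of link traversals.
        return distance_km * us_per_km * link_traversals / 1000.0

    print(link_latency_ms(35))   # 35 km -> 1.4 ms, as in the example above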


Long distance iSCSI connections and inter-site links which use routers for tunneling Fibre Channel through IP (FCIP, iFCP, FCoE) or other WAN infrastructures may significantly increase latency. In those cases, the real link latency must be measured and its impact on the response time of the particular application considered.

In general, link distances between synchronous mirror members should neither exceed 35 km nor traverse WAN connections. If your environment demands this, please contact DataCore Technical Support prior to setup.


8 – Snapshot Best Practices

Snapshot Destination Pool Considerations

Use dynamic pools for Snapshot destination volumes (SANsymphony-V only uses dynamic pools)

Snapshot source and destination volumes must be of the same size. A snapshot destination normally contains much less data than the source volume (just the changed blocks). For best efficiency, we recommend using volumes from a dynamic pool for snapshot destination volumes. This ensures that disk space is allocated for the volume only as required.

Use a small allocation unit size for Snapshot destination pools

With snapshots, the first change to a source volume block causes the migration of the corresponding chunk of the original source volume's blocks to the destination. For this reason, given heavy use of snapshot relationships with many small write IOs (typical for email applications), it is advisable to create the snapshot pool with a small allocation unit size. The small allocation unit size results in better capacity utilization of the snapshot pool. The appropriate unit size depends on the size of the write IOs to the source; the minimum unit size is 1 MB.

Snapshot Performance Considerations

Put snapshot destination virtual volumes/vdisks on fast disks

Snapshot copy-on-first-write process: for every incoming write IO to an unmodified chunk, the original chunk needs to be relocated to the snapshot destination volume before the Application Server/Host receives the write acknowledgement. The faster the disk behind the destination volume, the quicker this can be accomplished. If the disks behind the snapshot destination volumes are significantly slower than the source volumes, this may impact the overall performance of the production virtual volume (source).

Use the secondary/alternate volume of a mirrored virtual volume/vdisk as the snapshot source

Snapshots can be created from either the primary/preferred or the secondary/alternate volume of a mirrored virtual volume/vdisk. The secondary volume of a mirror has a lower workload than the primary volume, since in normal operation the secondary volume only receives write requests, whereas the primary volume receives both write and read IO requests. The secondary volume of a mirror is therefore more suitable to carry the additional load (copy-on-first-write) of an active snapshot relationship.

Limit the number of snapshots per virtual volume

Every active snapshot relationship adds a certain amount of additional load to the source disk. Even though SANmelody and SANsymphony set no limit on snapshots per source volume, a high number of active snapshots should be avoided. While a couple of snapshots may not noticeably influence performance, numerous snapshots can slow down the source significantly. In addition, there is rarely a valid reason for having many snapshots per source disk. Snapshot technology is sometimes mistaken for a replacement for backup or continuous data protection. In such cases, consider other solutions, such as DataCore Traveller CPR or Continuous Data Protection (CDP).


Quiesce the Application Server/Host and flush caches before taking a snapshot

A snapshot is a virtual image of a given point in time. In order to ensure valid data in the snapshot image, it is crucial that the snapshot source volume is in an idle and consistent state at the moment the snapshot is enabled. The following five steps must be performed in order:

1. Quiesce the application / stop IO to the disk – ensures that no more write changes occur during the activation (enable) of the snapshot.
2. Flush the application cache and/or operating system cache – ensures that all data is written to the disk and no data remains in cache instances.
3. Enable/create the snapshot relationship – the snapshot source disk is in a consistent state.
4. Resume normal operation of the application – after the snapshot is enabled, normal operation of the application can be resumed.
5. Serve or mount the snapshot volume to another server (optional) – the snapshot volume can now be mapped to an Application Server/Host. NOTE: If the snapshot volume is already mapped, it must be unmounted and remounted to force the file system to import the changed state. Some operating systems need to be rebooted in order to see the changes.

The steps above are usually automated and can be achieved with simple scripts (a minimal outline is sketched below). For more information on how to use the DataCore Scripting CLI, issue remote commands via the DataCore Remote Command Service, use Volume Shadow Copy Services (VSS) and so forth, see the SANmelody or SANsymphony Help system and the DataCore training course manuals.
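As an illustration only, here is a minimal Python outline of the five steps. Every command name in it is a hypothetical placeholder – substitute your application's own quiesce/flush mechanism and the snapshot commands documented in the Help system; nothing here is DataCore's actual CLI syntax.

    import subprocess

    def run(cmd):
        # Run an external command and raise an error if it fails.
        subprocess.run(cmd, shell=True, check=True)

    def take_consistent_snapshot():
        run("app_quiesce.cmd")      # 1. quiesce application / stop IO (placeholder)
        run("app_flush_cache.cmd")  # 2. flush application/OS caches (placeholder)
        run("enable_snapshot.cmd")  # 3. enable/create the snapshot relationship (placeholder)
        run("app_resume.cmd")       # 4. resume normal application operation (placeholder)
        run("mount_snapshot.cmd")   # 5. optionally serve/mount the snapshot (placeholder)

    if __name__ == "__main__":
        take_consistent_snapshot()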


9 – AIM/Replication Best Practices

Relocate page files to non-replicated Virtual volumes/vdisks

Pagefiles (like Windows pagefile.sys or VM swap files) are continuously altered by the OS. If a pagefile resides on a virtual disk that is replicated, all of these changes are transmitted too and may cause high replication traffic. The pagefile content is, however, irrelevant on the AIM/Replication destination side. Especially in larger deployments, it is recommended to relocate page and swap files to virtual volumes/vdisks which are not replicated, to avoid unnecessary traffic.

Link Throughput

The "pipe" which connects the source with the remote site must be large enough to transport all the changes (write IOs) that happen to the source volume. In other words, if 10 GB of data changed during the day, 10 GB must be replicated to the remote site, and the inter-site link must be capable of transporting that data. Calculation value: a dedicated link of 1 Mb/s can carry about 300 MB/hr, or 7.2 GB/day.
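The calculation value above can be generalized with a short sketch. Note that a raw 1 Mb/s link carries 450 MB/hr before overhead; the ~0.67 efficiency factor below is an assumption inferred from the guide's 300 MB/hr figure, so adjust it to your measured link efficiency.

    def replication_capacity(link_mbps, efficiency=0.67):
        # 1 Mb/s = 125 KB/s = 450 MB/hr raw; apply protocol-overhead efficiency.
        mb_per_hr = link_mbps * 450 * efficiency
        gb_per_day = mb_per_hr * 24 / 1000
        return mb_per_hr, gb_per_day

    print(replication_capacity(1))   # -> (301.5, 7.236): about 300 MB/hr, 7.2 GB/day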

Buffer Location and Size

The disks where source and destination buffers reside should have RAID protection to survive disk failures. The buffers should also be on dedicated, well-performing RAID sets locally attached to the DataCore Storage Server (not virtual disks), and should not be shared with other LUNs – especially not with the DataCore Storage Server OS partition or with LUNs to be used for volumes or pools. Note: The AIM/Replication buffer and the OS should not come from the same RAID set! Use of the same SCSI controller for the buffer and the OS can also negatively impact performance.

Source buffer

The source buffer stores, for all replicated volumes, all IOs which have not yet been transmitted to the remote site. The size of the source buffer should be considered after determining the maximum allowable IP link downtime. Replication is asynchronous only in the sense that the destination virtual disk can be out of sync and contain older data than the source at any point in time – and not in any other sense. Application Server/Host I/O to source virtual disks can be degraded if the source buffer has relatively high latency, so it is best practice to put the source buffer on very fast storage with low latency. Size the buffer after considering the possibility of IP link downtime between the source and destination servers. The appropriate size of a buffer is determined by multiplying the amount of data expected to be transferred daily by the maximum allowable IP link downtime. For example, suppose your IP link goes down over a weekend: if the amount of data changed is 20 GB/day and the IP link downtime could go uncorrected for two days, create a buffer that is at least 40 GB. It is better to up-size the buffer to allow for unforeseen increases in data transfers or miscalculations; if your buffer is 100 GB, then changes for several days can be safely stored. A general rule of thumb applies: use a fast local RAID 1 of 100 GB for the buffer and expand if needed (see the sizing sketch below).

Destination buffer (not needed with SANsymphony-V)

The destination buffer stores the IOs which have not yet been destaged to the destination disk. Generally, the destination buffer can be much smaller than the source buffer, because the destination disk is not supposed to be offline for a long time. However, the destination buffer should also be a locally attached, dedicated, fast RAID set. It is best practice to set all destination volumes to 'write-through'. See FAQ #1016: Best Practice: AIM Destination Virtual volumes

Exclude AIM/Replication buffers from anti-virus applications

If the DataCore Storage Server has an anti-virus application installed, exclude the source or destination buffer directory from being scanned. Replication files contain SCSI commands and raw data – nothing a virus scanner would detect. Scanning these files will slow down replication processing and transmission unnecessarily. See FAQ #1063: Best Practice: AIM/Replication and Virus Scanning Software
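The sizing rule above reduces to a small formula. In this Python sketch, the 2x headroom factor is an illustrative assumption (the guide only advises up-sizing), and the 100 GB floor is the guide's rule of thumb.

    def source_buffer_gb(daily_change_gb, max_link_downtime_days, headroom=2.0):
        # Minimum = daily change rate x worst-case link downtime; apply headroom
        # for unforeseen growth, and never go below the 100 GB rule of thumb.
        minimum = daily_change_gb * max_link_downtime_days
        return max(minimum * headroom, 100.0)

    print(source_buffer_gb(20, 2))   # 20 GB/day, 2-day outage -> 100 GB buffer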


Backup Applications use timestamps

Backup solutions that back up files and reset the archive bit cause many data changes, because they touch every file being backed up. A full backup, for instance, may cause a lot of data changes, and accordingly a large amount of data needs to be replicated to the remote site. (Note: This behavior is not specific to DataCore; it applies to every asynchronous replication solution.) In order to avoid a high data change rate due to archive-bit-based backups, timestamp-based backups should be used instead. Backups relying on timestamps typically do not touch or change the files being backed up. Today almost all major backup applications are capable of timestamp-based backups, so this issue can easily be eliminated.

Use Snapshots to Access AIM/Replication Destination Volumes

During normal AIM/Replication operation, the destination volume is continuously accessed and updated by the source service. For this reason, it cannot be accessed by an Application Server/Host without stopping and breaking the AIM/Replication set. If the AIM/Replication destination volume needs to be accessed, such as for testing or backup purposes, a snapshot should be taken and mapped to the Application Server/Host. To ensure a consistent state in the snapshot, some rules apply, as discussed in the Snapshot chapter of this document. SANmelody and SANsymphony are capable of issuing remote snapshot commands. For more details, see the SANmelody, SANsymphony or SANsymphony-V Help system and the DataCore training course manuals. Be aware of FAQ #808: Best Practice: Using Snapshot Source and Destination Volumes with Dynamic Disks on the same Application Server/Host http://datacore.custhelp.com/app/answers/detail/a_id/808


10 – Continuous Data Protection (CDP)

Data protection requires adequate resources (memory, CPU, disk capacity) and should not be enabled on DataCore Servers with limited free resources. Use dedicated pools for data-protected virtual disks. Disk pools used for data-protected virtual disks and history logs should have sufficient free space at all times; configure disk pool thresholds and email notification via tasks so that you are notified when disk pool free space reaches the attention threshold. Enabling data protection for a virtual disk may decrease I/O performance, so use it with caution and to protect mission-critical data only. The default history log size (5% of the virtual disk size, with a minimum size of 8 GB) may not be adequate for all virtual disks. Set the history log size according to the I/O load and retention time requirements. Once set, the retention period can be monitored and the history log size increased if necessary. The current actual retention period for the history log is provided in the Virtual Disk Details > Info tab (see Retention period).
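For orientation, the default history log sizing rule stated above is easy to compute. A minimal Python sketch:

    def default_history_log_gb(virtual_disk_gb):
        # Default history log: 5% of the virtual disk size, minimum 8 GB.
        return max(0.05 * virtual_disk_gb, 8.0)

    print(default_history_log_gb(100))    # 100 GB virtual disk -> 8 GB (minimum applies)
    print(default_history_log_gb(2000))   # 2 TB virtual disk   -> 100 GB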

When copying large amounts of data at one time to newly created virtual disks, enable data protection after copying the data to avoid a significant I/O load.

After an event that requires restoration of data, I/O to the affected virtual disk should be immediately suspended and then rollbacks should be created. In this manner, older data changes will stop being destaged and rollbacks will not expire. Keep I/O suspended until virtual disk recovery is complete. Rollbacks should only be created for the purpose of finding a consistent condition prior to a disastrous event and restoring the virtual disk data using the best rollback. Delete rollbacks if they are no longer needed.


11 – Application Server/Host Best Practices

Special instructions apply to some Application Server/Host operating systems. Please refer to the Technical Bulletins on the DataCore Technical Support Web site: http://datacore.custhelp.com/app/answers/detail/a_id/578

Some Application Servers/Hosts have special virtual volume name requirements; please refer to the Technical Bulletins on the DataCore Technical Support Web site as well as FAQ #836 (applicable only to SANmelody 3.x and SANsymphony 7.x and earlier): http://datacore.custhelp.com/app/answers/detail/a_id/836

On Windows Application Servers/Hosts, you cannot use any other failover software together with DataCore MPIO, as both use parts of Microsoft MPIO. You may not be able to install DataCore MPIO if another third-party failover software is installed, or volumes coming from SANmelody or SANsymphony may not be detected by DataCore MPIO on this Application Server/Host. Please refer to the DataCore MPIO Release Notes and the Help guide at FAQ #1380 DataCore Manuals and Administration Guides: http://datacore.custhelp.com/app/answers/detail/a_id/1380/kw/1380

If you use Exchange or database applications, refer to FAQ #1248: http://datacore.custhelp.com/app/answers/detail/a_id/1248


This document is published by:

DataCore Software Corporation
Worldwide Headquarters
Corporate Park
6300 NW 5th Way
Ft. Lauderdale, FL 33309
United States of America
Telephone: +1 (954) 377-6000
Fax: +1 (954) 938-7953
Internet: http://www.datacore.com
Email: [email protected]

Notice: This document is for informational purposes only and does not set forth any warranty, expressed or implied, concerning any equipment, software feature, or service offered or to be offered by DataCore Software. DataCore Software reserves the right to make changes to this document at any time, without notice, and assumes no responsibility for its use. This informational document describes features that may not be currently available. Contact a DataCore Software sales office for information on feature and product availability.