IBM Flex System: A Solid Foundation for Microsoft Exchange Server 2010

Performance metrics and reference architecture for 30,000 Exchange users on IBM Flex System


Contents

Overview
Introduction
Test Configuration
Solution Validation Methodology
Validation Results
Reference Architecture

Overview

IBM® builds, tests, and publishes reference configurations and performance metrics to provide clients with guidelines for sizing their Microsoft® Exchange Server 2010 environments. This document highlights the IBM Flex System™ x240 Compute Node and the IBM Flex System V7000 Storage Node and shows how they can be used as the foundation for your Exchange 2010 infrastructure.

To demonstrate the performance of the x240 Compute Node and Flex V7000 Storage Node, 9,000 mail users are hosted on the x240 with the mailbox databases residing on the Flex V7000. Multiple tests are run to validate both the storage and the server at that workload. The performance metrics are then used to design a highly available reference architecture for a fictitious organization with 30,000 employees.

IBM has tested and is releasing this configuration, which is built using these key components:

IBM Flex System V7000 Storage Node
IBM Flex System x240 Compute Node
Microsoft Windows® Server 2008 R2 SP1
Microsoft Exchange Server 2010 SP1

Introduction

This document provides performance characteristics and a reference architecture for an Exchange 2010 mail environment hosted on an IBM Flex System x240 Compute Node and IBM Flex System V7000 Storage Node.

Microsoft Exchange Server 2010

Exchange Server 2010 software gives you the flexibility to tailor your deployment to your unique needs and provides a simplified way to help keep e-mail continuously available for your users.

This flexibility and simplified availability come from innovations in the core platform on which Exchange is built. These innovations deliver numerous advances in performance, scalability, and reliability, while lowering the total cost of ownership compared to an Exchange Server 2007 environment.

A new, unified approach to high availability and disaster recovery helps achieve improved levels of reliability by reducing the complexity and cost of delivering business continuity. With new features such as Database Availability Groups and online mailbox moves, you can more easily and confidently implement mailbox resiliency with database-level replication and failover, all with familiar Exchange management tools.

Administrative advances in Exchange Server 2010 can help you save time and lower operational costs by reducing the burden on your IT staff. A new role-based security model, self-service capabilities, and the web-based Exchange Control Panel allow you to delegate common or specialized tasks to your users without giving them full administrative rights or increasing help-desk call volume.

For more information

For more information about Microsoft Exchange Server 2010, visit the following URL:

microsoft.com/exchange/2010/en/us/default.aspx

IBM PureFlex System

To meet today’s complex and ever-changing business demands, you need a solid foundation of server, storage, networking, and software resources. Furthermore, it needs to be simple to deploy and able to quickly and automatically adapt to changing conditions. You also need access to, and the ability to take advantage of, broad expertise and proven guidelines in systems management, applications, hardware maintenance, and more.

IBM PureFlex System is a comprehensive infrastructure system that provides an expert integrated computing system. It combines servers, enterprise storage, networking, virtualization, and management into a single structure. Its built-in expertise enables organizations to manage and deploy integrated patterns of virtual and hardware resources through unified management. These systems are ideally suited for customers who want a system that delivers the simplicity of an integrated solution while retaining the ability to tune middleware and the runtime environment.

IBM PureFlex System uses workload placement based on virtual machine compatibility and resource availability. Using built-in virtualization across servers, storage, and networking, the infrastructure system enables automated scaling of resources and true workload mobility.

IBM PureFlex System has undergone significant testing and validation so that it can mitigate IT complexity without compromising the flexibility to tune systems to the tasks businesses demand. By providing both flexibility and simplicity, IBM PureFlex System can provide extraordinary levels of IT control, efficiency, and operating agility. This combination enables businesses to rapidly deploy IT services at a reduced cost. Moreover, the system is built on decades of expertise. This expertise enables deep integration and central management of the comprehensive, open-choice infrastructure system, and it dramatically cuts down on the skills and training required for managing and deploying the system. The streamlined management console makes the system easy to use and provides a single point of control over your physical and virtual resources, for a vastly simplified management experience.

IBM PureFlex System combines advanced IBM hardware and software along with patterns of expertise, and integrates them into three optimized configurations that are simple to acquire and deploy, so you get fast time to value.

IBM PureFlex System has the following configurations:

IBM PureFlex System Express, which is designed for small and medium businesses and is the most affordable entry point for PureFlex System.

IBM PureFlex System Standard, which is optimized for application servers with supporting storage and networking, and is designed to support your key ISV solutions.

IBM PureFlex System Enterprise, which is optimized for transactional and database systems. It has built-in redundancy for highly reliable and resilient operation to support your most critical workloads.

Figure 1: Front and rear view of the IBM PureFlex System Enterprise Chassis

IBM offers the easy-to-manage PureFlex System with the IBM Flex System V7000 Storage Node to tackle the most complex environments.

IBM Flex System V7000 Storage Node

IBM Flex System V7000 Storage Node is an integrated component of the PureFlex System and is designed to be easy to use and to enable rapid deployment. The Flex V7000 Storage Node supports extraordinary performance and flexibility through built-in solid-state drive (SSD) optimization and thin provisioning technologies. With non-disruptive migration of data from existing storage, you also get simplified implementation, minimizing disruption to users. In addition, advanced storage features such as automated tiering, storage virtualization, clustering, replication, and multi-protocol support are designed to help you improve the efficiency of your storage. As part of your Flex or PureFlex System, the Flex V7000 can become part of your highly efficient, highly capable, next-generation information infrastructure.

Highlights

A single user interface to manage and virtualize internal and third-party storage, which can improve storage utilization

Built-in tiering and advanced replication functions designed to improve performance and availability without constant administration

A single user interface that simplifies storage administration, allowing your experts to focus on innovation


Figure 2: IBM Flex System V7000 Storage Node

Flex V7000 system details

Flex V7000 enclosures support up to twenty-four 2.5-inch drives or up to twelve 3.5-inch drives. Control enclosures contain drives, redundant dual-active intelligent controllers, and dual power supplies, batteries, and cooling components. Expansion enclosures contain drives, switches, power supplies, and cooling components. You can attach up to nine expansion enclosures to a control enclosure, supporting up to 240 drives.

Key system characteristics are:

Internal storage capacity: up to 36 TB of physical storage per enclosure

Disk drives: SAS disk drives, near-line SAS disk drives, and solid-state drives can be mixed in an enclosure to give you extraordinary flexibility

Cache memory: 16 GB cache memory (8 GB per controller) as a base feature, designed to improve performance and availability

Ports per control enclosure: eight 8 Gbps Fibre Channel host ports, four 1 Gbps and optionally four 10 Gbps iSCSI host ports

Ports per File Module: two 1 Gbps and two 10 Gbps Ethernet ports for server attachment and management, two 8 Gbps FC ports for attachment to Flex V7000 control enclosures
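As an illustrative cross-check of these limits, the short Python sketch below reproduces the drive-count and capacity figures quoted above. The 3 TB near-line SAS drive size is an assumption chosen to match the 36 TB figure; this paper does not specify the drive model behind that number.

# Hypothetical cross-check of the Flex V7000 limits quoted above.
# The 3 TB drive size is an assumption, not a figure from this paper.
DRIVES_PER_ENCLOSURE_25IN = 24
DRIVES_PER_ENCLOSURE_35IN = 12
MAX_EXPANSION_ENCLOSURES = 9

# Twelve 3.5-inch bays at an assumed 3 TB per near-line SAS drive
# reproduce the "36 TB per enclosure" figure.
per_enclosure_tb = DRIVES_PER_ENCLOSURE_35IN * 3

# One control enclosure plus nine expansions, all with 2.5-inch bays,
# reproduce the 240-drive maximum.
max_drives = (1 + MAX_EXPANSION_ENCLOSURES) * DRIVES_PER_ENCLOSURE_25IN

print(per_enclosure_tb, max_drives)  # 36 240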

IBM Flex System x240 Compute Node

IBM Flex System x240 Compute Node, an element of the IBM PureFlex System, provides outstanding performance for your mission-critical applications. Its energy-efficient design supports up to 16 processor cores and 768 GB of memory capacity in a package that is easy to service and manage. With outstanding computing power per watt and the latest Intel® Xeon® processors, you can reduce costs while maintaining speed and availability.

Highlights

Optimized for virtualization, performance, and highly scalable networking

Embedded IBM Virtual Fabric allows I/O flexibility

Designed for simplified deployment and management

To meet today’s complex and ever-changing business demands, the x240 compute node is optimized for virtualization, performance, and highly scalable I/O, and is designed to run a wide variety of workloads. The Flex System x240 is available in either your PureFlex System or IBM Flex System solution.

Figure 3: IBM Flex System x240 Compute Node

For more information

For more information about IBM PureFlex System, visit the following URL:

ibm.com/systems/pureflex/flex_overview.html

Test Configuration

The test configuration described in this section is designed to demonstrate server performance for a preconfigured Exchange 2010 user population running a specific workload. It is not designed as an Exchange “solution” because it does not include high-availability features. If a production environment were implemented as described in this section, the server itself would be a single point of failure. For a valid Exchange solution based on the tests performed and illustrated in this section, see the Reference Architecture section below.

To demonstrate the performance characteristics of the x240 compute node and the Flex V7000 storage node, the test configuration is designed to support 9,000 Exchange mailboxes. The Client Access Server (CAS) and Hub Transport Server (HUB) roles are deployed in a 1:1 ratio with the mailbox server in a multi-role assignment (that is, the CAS role and the Transport role are installed on the same physical server, with one CAS/HUB server per mailbox server). For testing purposes, the CAS/HUB server is deployed virtually on physical hardware separate from the mailbox server.

This test configuration uses two domain controllers in a single Active Directory forest.

Exchange Load Generator 2010 (LoadGen) is used to generate user load for the server performance evaluation testing (see details below). Three LoadGen clients are required to generate sufficient load to simulate 9,000 mailboxes.

Exchange Server Jetstress 2010 is used to run stress tests against the Flex V7000 storage node.

I/O Connectivity

Figure 4 shows the internal I/O links between the compute nodes in the Flex System Enterprise Chassis and the four I/O modules in the rear of the chassis.

Each of these individual I/O links can be wired for 1 Gb or 10 Gb Ethernet, or for 8 or 16 Gbps Fibre Channel. You can enable any number of these links. The application-specific integrated circuit (ASIC) type on the I/O expansion adapter dictates the number of links that can be enabled. Some ASICs are two-port and some are four-port. With a two-port ASIC, one port can go to each of the two switches, providing high availability.

Figure 4: Internal connections to the I/O modules

Figure 5 illustrates the 8 Gbps Fibre Channel internal connections between the x240 compute node and the Flex V7000 storage node. A single dual-port HBA provides the 8 Gbps internal connections to the storage.

Figure 5: Internal 8 Gbps connections between the server and storage


Figure 6 illustrates the test environment’s design and mail flow. Traffic originates from the LoadGen client. The network switch routes traffic to the ADX load balancer, which assigns the traffic to the CAS server (in a production environment, the ADX would route traffic to a load-balanced CAS server that is part of a CAS array). Traffic routes back through the network switch to the CAS server, which then routes the message to the mailbox server.

Figure 6: Mail flow

Server Configuration

The x240 compute node is equipped with two Intel Xeon E5-2670 2.6 GHz 8-core processors and 192 GB of memory. Hyperthreading is enabled.

Storage Configuration

The underlying storage design consists of multiple hard disk types, combined into logical groups called MDisks, which are then used to create storage pools. The storage pools are in turn divided into volumes, which are assigned to host systems.

An MDisk (managed disk) is a component of a storage pool comprised of a group of identical hard disks forming a RAID array of internal storage. Figure 7 lists the disks used to build the MDisks used in this test. MDisk0 is comprised of eight 300 GB 15k SAS hard drives, MDisk1 of two 400 GB SSDs, and MDisk2 of eight 900 GB 10k SAS hard drives.

A storage pool is a collection of storage capacity that provides the capacity requirements for a volume. One or more MDisks make up a storage pool.

A volume is a discrete unit of storage on disk, tape, or other data recording medium that supports some form of identifier and parameter list, such as a volume label or input/output control. By default, all volumes that you create are striped across all available MDisks in one storage pool.
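To make the MDisk, pool, and volume relationships concrete, here is a minimal Python sketch of the hierarchy using the disk groups from Figure 7. The pool grouping shown is an illustrative assumption, not the tested configuration, and capacities are raw, before RAID overhead.

# Illustrative model of the MDisk -> storage pool -> volume hierarchy.
# Disk counts and sizes come from Figure 7; the pool grouping below is
# an assumption for illustration only.
mdisks = {
    "MDisk0": {"disks": 8, "size_gb": 300, "type": "SAS 15k"},
    "MDisk1": {"disks": 2, "size_gb": 400, "type": "SSD"},
    "MDisk2": {"disks": 8, "size_gb": 900, "type": "SAS 10k"},
}

def raw_capacity_gb(mdisk):
    """Raw capacity of an MDisk before RAID overhead."""
    return mdisk["disks"] * mdisk["size_gb"]

# A storage pool aggregates one or more MDisks; volumes are carved from
# the pool and, by default, striped across all of its MDisks.
pool = ["MDisk0", "MDisk2"]  # hypothetical pool of the two SAS MDisks
pool_gb = sum(raw_capacity_gb(mdisks[m]) for m in pool)
print(f"pool raw capacity: {pool_gb} GB")  # 2400 + 7200 = 9600 GB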

Figure 7: MDisks comprised of internal disks

Figure 8 illustrates the storage pools (and the MDisks that make up each particular pool), as well as the logical volumes created from each of the storage pools.


Figure 8: Storage pool design

The mailbox server supports 12 mailbox databases along with their 12 sets of log files. Three of the volumes shown in Figure 8 are assigned to accommodate the mailbox databases, and three smaller volumes are assigned to accommodate the log files.

Figure 9 illustrates the mailbox database and log distribution across the volumes.

Figure 9: Volumes assigned to the x240 compute node

Solution Validation Methodology

The testing required two phases: the storage performance evaluation phase and the server performance evaluation phase. Microsoft provides two tools to evaluate these aspects of an Exchange environment: Exchange Server Jetstress 2010 (Jetstress) for testing the performance of the storage system, and Microsoft Exchange Load Generator (LoadGen) for testing server performance.

Storage Validation

Storage performance is critical in any type of Exchange deployment. A poorly performing storage subsystem will result in high transaction latency, which will affect the end-user experience. It is important to correctly validate storage sizing and configuration when deploying Exchange in any real-world scenario.

To facilitate the validation of Exchange storage sizing and configuration, Microsoft provides a utility called Exchange Server Jetstress 2010. Jetstress simulates an Exchange I/O workload at the database level by interacting directly with the Extensible Storage Engine (ESE), the database technology that Exchange uses to store messaging data on the mailbox server role. Jetstress can simulate a target profile of user count and per-user IOPS, and validate that the storage subsystem is capable of maintaining an acceptable level of performance with the target profile. Test duration is adjustable and can be set to an extended period of time to validate storage subsystem reliability.


Testing storage systems using Jetstress focuses primarily on database read latency, log write latency, processor time, and the number of transition pages repurposed per second (an indicator of ESE cache pressure). The Jetstress utility returns a Pass or Fail grade, which depends on the storage performance.

Test Execution and Data Collection

To ensure that the storage can handle the load generated by 9,000 users, Jetstress is installed and run on the mailbox server. The Jetstress instance running on the mailbox server simulates load for 9,000 mailboxes.

Although Jetstress is installed and run from the mailbox server, it does not test the performance characteristics of the server itself; Jetstress is designed to test the performance characteristics of the storage system only.

After the test completes, Jetstress generates a report with a Pass or Fail grade for the test.

Server Validation

For validating server performance, Microsoft provides a utility called Exchange Load Generator. LoadGen is a pre-deployment validation and stress-testing tool that introduces various types of workloads into a test (non-production) Exchange messaging system.

LoadGen simulates the delivery of multiple MAPI client messaging requests to an Exchange mailbox server. To simulate the delivery of these messaging requests, LoadGen is installed and run from client computers that have network connectivity to the Exchange test environment. These tests send multiple messaging requests to the Exchange mailbox server, which causes a mail-based performance load on the environment.

After the tests are complete, you can use the results to assist with:

Verifying the overall deployment plan

Identifying bottlenecks on the server

Validating Exchange settings and server configurations

LoadGen Profile

The LoadGen profile information in the table below was used to validate the test environment.

LoadGen Configuration            Value
Messages Sent/Received Per Day   100
Average Message Size             75 KB
Mailbox Size                     250 MB

Testing for Peak Load

When validating your server design, it is important to test the solution under the anticipated peak workload rather than the average workload. Most companies experience peak workload in the morning when employees arrive and check e-mail; the workload then tapers off throughout the remainder of the day. Based on a number of data sets from Microsoft IT and other clients, peak load is generally equal to two times the average workload over the remainder of the work day.

LoadGen uses a task profile that defines the number of times each task will occur for an average user within a simulated day. The total number of tasks that need to run during a simulated day is calculated as the number of users multiplied by the sum of task counts in the configured task profile. LoadGen then determines the rate at which it should run tasks for the configured set of users by dividing the total number of tasks to run in the simulated day by the simulated-day length. For example, if LoadGen needs to run 1,000,000 tasks in a simulated day, and a simulated day is equal to 8 hours (28,800 seconds), LoadGen must run 1,000,000 ÷ 28,800 = 34.72 tasks per second to meet the required workload definition. By default, LoadGen spreads the tasks evenly throughout the simulated work day.

To ensure that the Exchange solution is capable of sustaining the workload generated during the peak average, modify the LoadGen settings to generate a constant amount of load at the peak average level, rather than spreading the workload over the entire simulated work day. To increase the amount of load to the desired peak average, divide the default simulated-day length (8 hours) by the peak-to-average ratio (2) and use the result as the new simulated-day length.
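The arithmetic above can be summarized in a few lines of Python. The 1,000,000-task figure is this paper’s own example, and halving the day length applies the peak-to-average ratio of 2:

# Worked LoadGen task-rate arithmetic from the example above.
total_tasks = 1_000_000
average_day_seconds = 8 * 3600               # default simulated day: 28,800 s
peak_day_seconds = average_day_seconds // 2  # divided by peak-to-average ratio 2

print(round(total_tasks / average_day_seconds, 2))  # 34.72 tasks per second
print(round(total_tasks / peak_day_seconds, 2))     # 69.44 tasks per second at peak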


To change the simulated-day length, modify the following element in the LoadGenConfig.xml file to reflect a 4-hour simulated day. Change:

<SimulatedDayLengthDuration>P0Y0M0DT8H0M0S</SimulatedDayLengthDuration>

to:

<SimulatedDayLengthDuration>P0Y0M0DT4H0M0S</SimulatedDayLengthDuration>
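For scripted test setups, the same edit could be made programmatically. The sketch below uses Python’s standard-library ElementTree; the element name comes from the snippet above, while the file location and the rest of the file’s structure are assumptions.

# Hypothetical sketch: halve the simulated day in LoadGenConfig.xml.
# Assumes the element sits somewhere below the document root.
import xml.etree.ElementTree as ET

tree = ET.parse("LoadGenConfig.xml")
elem = tree.getroot().find(".//SimulatedDayLengthDuration")
if elem is not None:
    elem.text = "P0Y0M0DT4H0M0S"  # ISO 8601 duration: 4 hours
    tree.write("LoadGenConfig.xml")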

Test Execution and Data Collection

Testing large Exchange environments requires a staged start of the LoadGen clients to prevent RPC requests from queuing up. When the RPC requests in queue exceed 1.5 times the number of simulated users, LoadGen initiates test shutdown. Therefore, a 4-hour ramp-up period is required to stage the LoadGen client startup, resulting in a 16-hour test duration. Performance data is collected for the full 16-hour duration of all five test runs. Performance summary data is then taken from an 8-hour stable period. Figure 10 shows the collected performance data for the entire test, with the stable period highlighted in gray.

Figure 10: Sample Perfmon data showing the highlighted stable period

Content Indexing

Content indexing was disabled during the test period due to the added time between test runs required to completely index the databases. A production environment with content indexing enabled could experience an additional load of up to 20%.

Test Validation Criteria

To verify server performance, Microsoft Performance Monitor (Perfmon) was used to record the data points described in this section.

Mailbox Server

The following table lists the performance counters collected on the mailbox server, as well as the target values.

Perfmon Counter                                               Target Value
MSExchangeIS Mailbox(_Total)\Messages Queued For Submission   < 50 at all times
MSExchangeIS Mailbox(_Total)\Messages Delivered/sec           Scales linearly with number of mailboxes
MSExchangeIS\RPC Requests                                     < 70 at all times
MSExchangeIS\RPC Averaged Latency                             Average < 10 msec
Processor(_Total)\% Processor Time                            < 80%

MSExchangeIS Mailbox(_Total)\Messages Queued For Submission

The Messages Queued For Submission counter shows the current number of submitted messages not yet processed by the transport layer. This value should be below 50 at all times.

MSExchangeIS Mailbox(_Total)\Messages Delivered/sec

The Messages Delivered/sec counter shows the rate at which messages are delivered to all recipients, indicating the current message delivery rate to the store. This is not so much a performance metric of the mailbox server as an indication that the LoadGen servers are generating the appropriate level of load. This number should scale in parallel with the number of simulated mailboxes.

MSExchangeIS\RPC Requests

The RPC Requests counter indicates the overall number of RPC requests currently executing within the information store process. This should be below 70 at all times.

MSExchangeIS\RPC Averaged Latency

RPC Averaged Latency indicates the RPC latency in milliseconds (msec), averaged over all operations in the last 1,024 packets. This should not be higher than 10 msec on average.


For information about how clients are affected when overall server RPC averaged latencies increase, visit the following URL:

technet.microsoft.com/en-us/library/dd297964.aspx

Processor(_Total)\% Processor Time

For physical Exchange deployments, use the Processor(_Total)\% Processor Time counter and verify that it stays below 80 percent on average.
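As an illustration of how these targets might be checked against collected data, the following Python sketch compares recorded (average, maximum) pairs with the table’s thresholds. The counter names come from the table above; the sample numbers are invented placeholders, not test results.

# Illustrative threshold check against the validation targets above.
# Each target is (which statistic to check, limit).
TARGETS = {
    r"MSExchangeIS Mailbox(_Total)\Messages Queued For Submission": ("max", 50),
    r"MSExchangeIS\RPC Requests": ("max", 70),
    r"MSExchangeIS\RPC Averaged Latency": ("avg", 10),
    r"Processor(_Total)\% Processor Time": ("avg", 80),
}

# Hypothetical (average, maximum) pairs standing in for Perfmon output.
observed = {
    r"MSExchangeIS Mailbox(_Total)\Messages Queued For Submission": (3.1, 12.0),
    r"MSExchangeIS\RPC Requests": (5.2, 18.0),
    r"MSExchangeIS\RPC Averaged Latency": (1.48, 6.0),
    r"Processor(_Total)\% Processor Time": (18.0, 24.0),
}

for counter, (kind, limit) in TARGETS.items():
    avg, peak = observed[counter]
    value = peak if kind == "max" else avg
    verdict = "PASS" if value < limit else "FAIL"
    print(f"{verdict}  {counter}  ({kind} {value} vs target < {limit})")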

For more information

For more information about Exchange 2010 performance monitoring, visit the following URL:

technet.microsoft.com/en-us/library/dd335215.aspx

Validation Results

This section lists the results of the Jetstress and LoadGen tests run against the Flex V7000 storage node and the x240 compute node.

Storage Results

The storage passed rigorous testing to establish a baseline that would conclusively isolate and identify any potential bottleneck as a valid server performance-related issue.

Figure 11 shows a report generated by Jetstress. The 9,000-mailbox test passed with low latencies.

Figure 11: Jetstress results

Figure 12 shows the data points collected during the Jetstress test run. The second column shows the averaged latency for I/O database reads across all databases; to pass, this number must be below 20 msec. The highest result for this test, 17.451 msec, is an outlier; the remaining database instances performed well below the response time of instance 3.

The third column shows the averaged latency for I/O log writes across all databases; to pass, this number must be below 10 msec. The highest result for this test is 0.647 msec.


Figure 12: Transactional I/O performance

The Jetstress test results show that at 9,000 mailboxes, the storage performs exceedingly well and has headroom remaining for additional I/O.

Server Results

This section describes the performance results for the x240 compute node hosting 9,000 mailbox users.

Figure 13 below shows the test results for the x240 compute node. The first column lists the performance counters and the expected target values, the second column lists the average recorded value for each counter, and the third column lists the maximum recorded value. The test values of most interest are accentuated in boldface italics.

Figure 13: Test results for the x240 compute node

The x240 handled the load exceedingly well. The maximum Messages Queued value remains well below the recommended maximum of 50. The maximum RPC Requests value is also well below the recommended maximum of 70. The RPC Averaged Latency is 1.48 msec, well below the recommended maximum average of 10 msec.

The last row in Figure 13 shows the processor load on the x240 compute node. Even under peak load, the processor does not exceed 24% utilization.

The results of this test demonstrate that the x240 compute node is quite capable of handling this Exchange 2010 workload on this particular hardware configuration.


Reference Architecture

This section describes a highly available Exchange 2010 reference architecture that is based on the test results above.

Customer Profile

The example used for this reference architecture is a fictitious organization with 30,000 employees. The employee population is split evenly between two regions, and each region has a datacenter capable of housing server and storage hardware.

The company has determined that the average number of e-mails sent and received per day for each user is approximately 100, with an average e-mail size of 75 KB. Each user will be assigned a 500 MB mailbox.

High availability

If an organization has multiple datacenters, the Exchange infrastructure can be deployed in one site or distributed across two or more sites. Typically, the service level agreement currently in place will determine the degree of high availability and the placement of the Exchange infrastructure.

In this example, the organization has two datacenters with a user population that is evenly distributed between the two. The organization has determined that site resiliency is required; therefore, the Exchange 2010 design will be based on a multiple-site deployment with site resiliency.

Backups

Exchange 2010 includes several new features that provide native data protection and that, when implemented correctly, can eliminate the need for traditional backups. Traditionally, backups are used for disaster recovery, recovery of accidentally deleted items, long-term data storage, and point-in-time database recovery. Each of these scenarios is addressed by new features in Exchange 2010 such as high availability database copies in a database availability group, recoverable items folders, archiving, multiple-mailbox search, message retention, and lagged database copies.

In this example, the organization has decided to forgo traditional backups in favor of an Exchange 2010 native data protection strategy.

Number of database copies

Before determining the number of database copies needed, it is important to understand the two types of database copies.

High availability database copy – This type of database copy has a log replay time of zero seconds. When a change is made in the active database copy, the change is immediately replicated to the passive database copies.

Lagged database copy – This type of database copy has a pre-configured delay built into the log replay time. When a change is made in the active database copy, the logs are copied to the server hosting the lagged database copy but are not immediately replayed. This provides point-in-time protection, which can be used to recover from logical corruption of a database (logical corruption occurs when data has been added, deleted, or manipulated in a way the user or administrator did not expect).

Log replay time for lagged database copies

IBM recommends using a replay lag time of 72 hours. This gives administrators time to detect logical corruption that occurred at the start of a weekend.

Another factor to consider when choosing the number of database copies is serviceability of the hardware. If only one high availability database copy is present at each site, the administrator is required to switch over to a database copy hosted at a secondary datacenter every time a server needs to be powered off for servicing. To prevent this, maintaining a second database copy at the same geographic location as the active database copy is a valid option for preserving hardware serviceability and reducing administrative overhead.

Microsoft recommends having a minimum of three high availability database copies before removing traditional forms of backup. Because our example organization chose to forgo traditional forms of backup, it requires at least three copies of each database.


In addition to the three high availability database copies, the organization has chosen to add a fourth, lagged database copy to protect against logical corruption.

Database availability groups

With Exchange 2010, the former data protection methods in Exchange 2007 (Local Continuous Replication, Single Copy Clusters, Cluster Continuous Replication, and Standby Continuous Replication) have evolved into the Database Availability Group (DAG). The DAG is the new building block for highly available and/or disaster-recoverable solutions.

A DAG is a group of up to 16 mailbox servers that host a set of replicated databases and provide automatic database-level recovery from failures that affect individual servers or databases.

Microsoft recommends minimizing the number of DAGs deployed for administrative simplicity. However, multiple DAGs are required in certain circumstances:

You deploy more than 16 mailbox servers

You have active mailbox users in multiple sites (active/active site configuration)

You require separate DAG-level administrative boundaries

You have mailbox servers in separate domains (a DAG is domain bound)

In our example, the organization is deploying an active/active site configuration; therefore, it requires at least two DAGs.

Mailbox Servers and Database Distribution

Given the decisions above, we can determine the number of mailbox servers and the mailbox database distribution.

The organization needs at least four servers per DAG to support the three highly available database copies and one lagged copy (a server can host both lagged database copies and high availability copies simultaneously).

Figure 14 below illustrates the mailbox database distribution amongst the required physical servers for one of the two DAGs. The second DAG is the mirror image of the first, with database copies one and two at Site B, and the third copy and the lagged copy at Site A.

Figure 14: Database distribution amongst servers (per DAG)

This design enables the organization to withstand up to two server failures without loss of data. For example, if Server 2 fails, the passive copies (number 2) for each database hosted by Server 2 will activate on Server 1. If Server 1 then fails, the third database copy hosted at the secondary site could be activated.

With two servers at each site hosting active mailboxes (15,000 users per site), the entire population of 30,000 users is divided equally amongst the four active servers (two servers per DAG), resulting in 7,500 users per server at normal run time (no failed servers). The test results above show that the x240 compute node consumes roughly 24% of its processing power to handle the workload generated by 9,000 users. With a single server failure, one server would be required to handle the workload generated by 15,000 users. The additional load of 6,000 users beyond the tested 9,000 is well within the remaining processing capacity of the x240 compute node.
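The distribution and failover arithmetic can be worked through explicitly; all inputs below come from this section and the 9,000-user test above:

# User-distribution and failover arithmetic for the reference architecture.
total_users = 30_000
users_per_site = total_users // 2     # two sites, evenly split: 15,000
active_servers_per_site = 2           # two servers host active copies per site
normal_users_per_server = users_per_site // active_servers_per_site  # 7,500

# If one active server at a site fails, the surviving server takes all
# 15,000 of that site's users; the validated test point was 9,000 users
# at roughly 24% processor utilization.
failover_users = users_per_site
tested_users = 9_000
print(normal_users_per_server, failover_users, failover_users - tested_users)
# 7500 15000 6000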

Client Access Servers and Transport Servers


To ease deployments, the Client Access Server (CAS) role and the Hub Transport Server (HUB) role are often installed together on a single physical server, separate from the mailbox servers. The CAS/HUB servers are then deployed in a 1:1 ratio with mailbox servers (that is, one CAS/HUB server per mailbox server).

This organization requires four mailbox servers per DAG, for a total of eight mailbox servers. Therefore, eight additional servers, installed with both the CAS role and the HUB role, are required to handle the workload generated by 30,000 users.

Storage Sizing

For sizing the storage required by Exchange 2010, Microsoft has created the Mailbox Server Role Requirements Calculator. To download the calculator and get information on its use, see the following URL:

blogs.technet.com/b/exchange/archive/2009/11/09/3408737.aspx

To correctly estimate the number of disks required, and to align with the testing performed above, a few variables are configured in the Mailbox Server Role Requirements Calculator:

Disk Type – 900 GB 10k 2.5" SAS

RAID 1/0 Parity Grouping – 4+4 (to more closely simulate the 8-disk MDisk groups in a RAID 10 configuration)

Override RAID configuration – Yes; configured for RAID 10 on the database and log LUNs

After customization is complete, the calculator determines that 216 disks are required at each site to host the mailbox databases and logs and to provide a restore LUN for each server.
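As a rough, hypothetical sanity check of the calculator’s output (not a substitute for it), the sketch below compares usable RAID 10 capacity per site against the raw mailbox data each site must hold. Formatted-capacity loss, database overhead, log space, and the restore LUN are deliberately ignored, so the usable figure is deliberately generous.

# Rough sanity check of the 216-disk-per-site result (not the calculator).
disks_per_site = 216
disk_tb = 0.9                              # 900 GB drives
usable_tb = disks_per_site * disk_tb / 2   # RAID 10 mirroring halves capacity

# Each site stores four database copies covering 15,000 users apiece:
# copies 1 and 2 of its own DAG, plus copy 3 and the lagged copy of the
# other DAG, at 500 MB (0.0005 TB) per mailbox.
mailbox_data_tb = 4 * 15_000 * 0.0005
print(f"usable ~{usable_tb:.1f} TB, mailbox data ~{mailbox_data_tb:.1f} TB")
# usable ~97.2 TB, mailbox data ~30.0 TB (headroom for overhead and growth)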

The Final Configuration

Figure 15 summarizes the end result of the sizing efforts.

The two sites are labeled Site A and Site B. Each site has 15,000 local Exchange users.

Two DAGs span the sites and are labeled DAG-1 and DAG-2 in the diagram. Site A is the primary site for DAG-1 and Site B is the primary site for DAG-2.

Primary sites

Primary site refers to the active copies of the mailbox databases being geographically co-located with the 15,000 Exchange users of that site.

Two network connections are required for the mailbox servers: one network for MAPI traffic and one network for database replication. These networks are labeled MAPI and Replication in the diagram.

Each of the DAGs has four mailbox servers. At the primary site (where the users are co-located with their Exchange mailboxes), two mailbox servers host the active database copy (Copy-1 in the diagram) and the first passive copy (Copy-2 in the diagram). At the secondary site, two mailbox servers host the third, passive copy (Copy-3 in the diagram) and the lagged database copy (Lag Copy in the diagram). The second DAG is a mirror of the first: the two mailbox servers hosting the active copy and the first passive copy are located at Site B, while the two servers hosting the third passive copy and the lagged database copy are located at Site A.

Figure 15: The final configuration


In addition to the mailbox servers, each DAG has four CAS/HUB servers: two at the DAG’s primary site and two at the secondary site.

Each site has a global catalog server to provide redundancy at the domain controller level.

Hardware Required

With redundant SAN switches, network switches, and power modules, the IBM Flex System Enterprise Chassis provides the high availability and fault tolerance necessary for an enterprise-class Exchange environment.

Take advantage of redundant power supplies

IBM recommends that multiple circuits be used to power the Flex System Enterprise Chassis so that, in the case of a tripped breaker, the chassis does not become a single point of failure.

Figure 16 illustrates the hardware required at each site to support the organization’s 30,000-user population.

Each site requires four x240 compute nodes to host the mailbox role, four x240 compute nodes to host the CAS/HUB role, and, if not already present, an additional x240 compute node for a global catalog server. Each mailbox server should have at least 128 GB of memory installed, and each CAS/HUB server should have at least 32 GB of memory installed. Each x240 compute node should have two Intel Xeon E5-2670 2.6 GHz 8-core processors.

To host the database files, each site’s Flex System Enterprise Chassis requires a Flex System V7000 storage node fully populated with 900 GB 10k SAS hard disk drives. In addition, eight fully populated (with the same drive type) V7000 or Flex V7000 expansion drawers are required.
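A quick cross-check, assuming the 24 drive bays per enclosure described earlier, shows that this drawer count lines up with the calculator’s result:

# One control enclosure plus eight expansion drawers, 24 drives each.
enclosures = 1 + 8
assert enclosures * 24 == 216  # matches the calculator's per-site disk count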

Finally, IBM recommends using a hardware-based (rather than software-based) network load balancer, such as the Brocade ADX 1000 series shown in the figure below.

Figure 16: Hardware required at each site

Conclusion

The x240 compute node and Flex V7000 storage node performed well throughout the test durations. These tests demonstrate the capability of the x240 and Flex V7000 in supporting 9,000 Exchange 2010 mailboxes on a single x240 compute node. Although the tests are not a true-to-life deployment scenario, the results can be used to build highly available Exchange architectures, as shown in the Reference Architecture section.

With high availability and fault tolerance built into the platform, IBM Flex System is a solid foundation for an enterprise Exchange environment.


About the author

Roland G. Mueller works at the IBM Center for Microsoft Technologies in Kirkland, Washington, just five miles from the Microsoft main campus. He has a second office in Building 35 on the Microsoft main campus in Redmond, Washington, to facilitate close collaboration with Microsoft.

Roland has been an IBM employee since 2002 and has specialized in a number of different technologies, including virtualization, bare-metal server deployment, and Exchange Server infrastructure sizing, design, and performance testing.

[email protected]


© Copyright IBM Corporation 2013

IBM Systems and Technology Group
Dept. U2SA
3039 Cornwallis Road
Research Triangle Park, NC 27709

Produced in the United States of America
April 2013
All Rights Reserved

IBM, the IBM logo and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the web at “Copyright and trademark information” at ibm.com/legal/copytrade.shtml. Other company, product and service names may be trademarks or service marks of others.

References in this publication to IBM products and services do not imply that IBM intends to make them available in all countries in which IBM operates.