DEMYSTIFYING SOFTWARE-DEFINED DATACENTRE
Anuj Sharma
Advisor, Solutions Architecture
Dell EMC
[email protected]
Knowledge Sharing Article © 2017 Dell Inc. or its subsidiaries.
2017 Dell EMC Proven Professional Knowledge Sharing 2
Table of Contents
Abstract .......................................................................................................................... 3
Software Defined Datacentre ........................................................................................ 4
Challenges Overcome by SDDC ...................................................................................... 9
Transformation to SDDC ................................................................................................ 9
Best Practices ............................................................................................................... 11
Reasons to Choose Buy vs Build .................................................................................. 16
Figures
Figure 1 Traditional Data Center Layers ........................................................................ 4
Figure 2 Converged Infrastructure ................................................................................. 5
Figure 3 CI CAGR ............................................................................................................ 5
Figure 4 4-Node Hyper Converged Appliance ............................................................... 6
Figure 5 HCI CAGR .......................................................................................................... 7
Figure 6 Blocks Racks Appliances ................................................................................... 8
Figure 7 SDDC Topology ............................................................................................... 16
Disclaimer: The views, processes or methodologies published in this article are those of the author. They do not necessarily reflect Dell EMC’s views, processes or methodologies.
Abstract
IT organizations supporting business are experiencing a paradigm shift as they move from second-platform to third-platform infrastructure. The shift is driven largely by:
 Employees based at locations across the globe, such as remote offices, home offices, and client sites.
 Devices used by employees to perform their daily job responsibilities, i.e. smartphones, tablets, laptops, virtual desktops and thin clients.
 Stringent time-to-market timelines for new products and updates.
 Support for multiple application development platforms.
 Exponential structured and unstructured data growth.
 On-demand infrastructure requirements.
IT organizations have to support the business in overcoming the above challenges and
therefore have become one of the most critical units in an enterprise. In today’s
economic climate, IT organizations have to optimize existing infrastructure if they are
to deliver results with reduced budgets. The Software-Defined Datacentre (SDDC) has
become the buzzword for IT organizations to address these challenges. This
Knowledge Sharing article examines:
 What a Software Defined Datacentre is
 How a Software Defined Datacentre overcomes business challenges
 How to build a solid foundation for a Software Defined Datacentre
 Approaches to transform your existing datacentre into a Software Defined Datacentre
 A Software Defined Datacentre design
 Industry-leading infrastructure technologies to support the software
 Best practices to follow during the design and implementation phases
Software Defined Datacentre
SDDC is a hot topic: IT organizations continuously debate it, and IT vendors conduct workshops and seminars on it. But what is SDDC? Let's first look at how traditional datacentres are built.
Figure 1 Traditional Datacentre Layers
 Different hardware for storage, compute and networking.
 Different teams managing storage, networking and compute hardware.
 Network, storage and compute can be scaled independently.
Pain points of a traditional datacentre
 Lack of coordination between departments.
 Complex troubleshooting procedures, as coordination among different teams is required, which leads to longer resolution times.
 Interoperability issues between different datacentre components, leading to availability and performance issues.
 Managing multiple vendor support contracts.
 The IT team spends most of its time merely keeping the lights on, i.e. managing day-to-day operations, leaving limited time to plan for the future.
 Tedious budget planning due to the various streams and vendors involved in IT.
To address these pain points, IT vendors devised their own solutions. Referred to as
Converged Infrastructure, this approach to datacenter management seeks to
minimize compatibility issues between servers, storage systems and network devices
while also reducing costs for cabling, cooling, power and floor space. Converged
Systems are pre-engineered and tested systems.
Figure 2 Converged Infrastructure
It’s the entire datacentre compute, network, hypervisor and storage stack tested,
engineered and delivered together as one system. Thus, converged infrastructure
addresses all the pain points of a traditional datacenter architecture. This is why the
converged infrastructure market is one of the fastest growing in the IT industry. Major
cloud service providers, telcos, and banks are building their datacenters with
Converged infrastructure as the foundation.
Server virtualization technologies such as VMware vSphere and Network Virtualization
Technologies such as NSX are being extensively deployed on Converged
Infrastructures for better interoperability and performance. Converged Infrastructure
vendors ship pre-engineered and tested systems with factory installed server and
network virtualization technologies.
Figure 3 shows Converged Infrastructure growth tracker with almost 9% compound
annual growth rate (CAGR).
Figure 3 CI CAGR
Alongside the evolution of datacentre hardware, datacentre software was taking the lead in evolution. It all started with server virtualization, with VMware taking the market by storm; today it is the market leader in server virtualization. Later, EMC acquired ScaleIO for storage virtualization, followed by VMware introducing VSAN for storage virtualization and then NSX for network virtualization.
The motivating factor behind all these was to address the infrastructure, economic,
agility, scalability, and elasticity challenges enterprises faced. Virtualization of all the
components addresses these challenges. This evolution complemented converged
infrastructure’s architecture with server virtualization and network virtualization
technologies. Converged infrastructure now is the first choice for many enterprises to
run tier 1 application workloads. Enterprises can take advantage of virtualization along
with other converged infrastructure benefits.
Some workloads are not considered the best candidates for Converged Infrastructure due to monetary and other reasons. Also, there are a lot of SME organizations that don't require all the benefits that a Converged Infrastructure offers and still follow a build-it-yourself approach to save on costs. To address this market segment and these workload
needs, IT vendors developed hyper converged racks and appliances. Hyper Converged
Infrastructure (HCI) utilizes storage, network and server virtualization technologies
deployed on commodity off-the-shelf hardware. This approach focuses on maximum
use of software and minimum use of hardware, enabling organizations to start with a
small datacentre footprint and scale over time.
Figure 4 4-Node Hyper Converged Appliance (pooled storage, pooled CPU/memory, hypervisor virtualization and network virtualization)
For example, Figure 4 depicts a 4-Node Hyper Converged appliance using:
 ScaleIO for storage virtualization, so that all four nodes contribute local storage to a common storage pool.
 VMware vSphere for server virtualization, to pool local CPU and memory resources into common memory and CPU pools.
 VMware NSX for network virtualization. All four nodes communicate with each other using an NSX virtual network overlay on physical network switches. The network switches are used as the physical transport layer between nodes to carry NSX VXLAN packets. All other network-related tasks are taken care of by NSX as it creates a virtual universal transport layer across the nodes.
Clearly software is the core of a Hyper Converged Appliance. This is the reason HCI
appliances are synonymous with Software Defined Datacenter. Converged
infrastructures are also a crucial component in a Software Defined Datacenter.
For me, an ideal software defined datacenter comprises Converged Infrastructure Blocks and Hyper Converged Appliances/Racks, as the two complement each other in catering to various customer workloads.
Figure 5 HCI CAGR
Its 105% CAGR clearly shows that HCI appliances have been welcomed by the market.
Following the same approach as with CI, vendors such as Dell EMC engineered and designed rack-scale systems based on hyperconverged architecture. A common question now surfaces: what to use, and when? Let's try to address this.
Figure 6 Blocks Racks Appliances
As shown in Figure 6, it all depends upon design preference and workload use case.
Blocks
Design focus is to use industry-proven hardware for servicing Tier 1 and Tier 2 application workloads, where storage, network and compute can be scaled independently of each other, with the ability to have various configuration options per workload.
Racks
Design focus is to be flexible for servicing Tier 2 and Tier 3 applications, with the system architecture defined by the SDS layer and commodity off-the-shelf components. Racks are designed and engineered as an entire rack based on hyper converged architecture. All internode communication happens through a network hardware layer engineered, designed and dedicated for the rack components.
Appliances
Design focus is to be simple and start small, with the liberty to scale, for serving Tier 2 and Tier 3 applications. This again uses SDS and commodity off-the-shelf components. Nodes communicate with each other using the customer's network hardware layer.
Challenges Overcome by SDDC
Evolution of SDDC has enabled organizations to overcome the following challenges:
No Boundaries
A Software Defined Datacenter frees organizations from limitations of
hardware or geographical boundaries.
Infrastructure/Platform requirements of a Business Unit in Canada can be
served by resources available in Singapore.
Agility
Datacentres have now become agile, flexible, and robust.
Pooling
Organization Datacenters no longer operate as independent silos.
Organization DataCenters across the globe contribute to the global resource
pool, providing economies of scale.
On-Demand
IT Organizations now directly contribute to the success of the Enterprise by meeting on-demand application/platform requirements in today's dynamic business environment.
Choice
IT Organizations now have the ability to support multiple workloads, datasets,
and platforms across the board.
Efficiency
IT Organizations have become more efficient with optimal use of resources
across the board.
IT organizations are major contributors to a successful Enterprise today.
Transformation to SDDC
Let us now discuss the approach to transforming and building. It is of the utmost importance that the decision to choose SDDC components is made after in-depth analysis and careful thought.
Hardware is as critical as software in a SDDC. One should be wise in choosing the
Hardware components for SDDC.
Server Vendor
o Choose a Server Vendor that has been in the server industry for a
long time and has worked extensively with Network, Storage and
Server Virtualization Software vendors.
o This is very important as server hardware complements the
software in terms of compatibility for optimal performance.
o As per the above two points, it's critical that you choose a Server Vendor that complements your Network and Server Virtualization vendors.
o As the server is an important component of a SDDC, choose a server vendor and model with published results in terms of reliability and supportability.
Network Vendors
o The importance of the network vendor is often overlooked, as it's felt that the server and software are most important.
o But the network path is critical, as all nodes contribute to the storage, network and compute pools, so it is vital to choose a network vendor and hardware that have been designed and engineered with the scalability of network virtualization in mind.
Choosing the correct Storage, Server and Network Virtualization technologies.
The same principle that we discussed for choosing hardware applies to software
as well. Make sure the Software technologies selected complement the hardware
chosen. This means that hardware and software vendors have published results of
compatibility and performance.
Confirm there is enough evidence that hardware and software vendors have
worked extensively with each other on testing and have a roadmap ahead in terms
of relationship and supporting each other’s technologies.
Server, Network and Storage Virtualization technologies integration is at the core of a SDDC. Make sure that the chosen technologies have enough documented use cases of being integrated together, and that roadmaps are available from the respective vendors for future supportability and development. As this technology space is emerging and very dynamic, it becomes increasingly important that every technology complements the others as they evolve and that vendors have plans in place to support each other. Lastly, don't forget that these choices should
complement the reasons you are transforming, i.e. need for scalability, elasticity,
on demand, etc.
Non-Disruptive Maintenance Procedures
Once the environment is in production, downtime is highly undesirable. It's important that the chosen technologies allow maintenance operations such as upgrades and node additions to be performed with no or minimal downtime.
Elasticity
This is an important characteristic of a SDDC. Make sure to evaluate the scalability parameters of the Server, Network, and Storage Virtualization technologies chosen, for example, maximum number of datastores, maximum datastore size, datastore snapshot options, server virtualization maximum memory/CPU support, network virtualization features like micro-segmentation, etc.
Customer Support
Ensure that the chosen vendors have a proven support structure in place to support you in the event of issues in the production environment. For example, a hardware vendor should be able to provide a replacement within minimal timelines in case of failure and have a local field support structure in place.
Best Practices
Below are best practices for the various layers and the topology of a SDDC.
Server
 Verify that the server firmware installed is as per the hypervisor compatibility documents.
 Make sure the BIOS firmware is as per the hypervisor compatibility documents.
 Go through the documentation and make sure any BIOS-specific settings are applied carefully, such as HT, VT, NUMA, SD card mirroring, server power management, integrated RAID controller mode, etc.
 The boot sequence should be configured as per the documentation, for example, network boot followed by the local hypervisor boot device such as SATADOM or flash.
Hypervisor
 The hypervisor installed should be as per the compatibility matrix.
 The hypervisor should be patched with the latest updates.
 Nodes connect to the physical switches, so appropriate teaming policies should be selected in conjunction with the settings on the switch. For example, for ESXi we recommend IP Hash as the teaming policy, but this requires EtherChannel on the switches. This is very important; otherwise, inconsistent settings across nodes can lead to service disruption.
 Make sure that compatible RAID controller drivers are installed on all nodes, as they are a critical performance and availability factor.
 NTP should be configured across the hypervisor nodes for time sync.
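For ESXi, the teaming policy can be applied consistently from the command line. The sketch below is illustrative only; the vSwitch and uplink names are assumptions, and iphash should only be used when the upstream physical switch ports form an EtherChannel:

```shell
# Hedged sketch: set the IP Hash load-balancing (teaming) policy on a
# standard vSwitch from the ESXi shell. "vSwitch0", "vmnic0" and
# "vmnic1" are placeholders; substitute your own names.
esxcli network vswitch standard policy failover set \
    --vswitch-name=vSwitch0 \
    --load-balancing=iphash \
    --active-uplinks=vmnic0,vmnic1

# Verify the applied policy
esxcli network vswitch standard policy failover get --vswitch-name=vSwitch0
```

Running the same two commands on every node helps avoid the inconsistent per-node settings mentioned above.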
Storage Virtualization
 It is important that the correct RAID controller drivers are installed.
 Storage virtualization technologies like VSAN and ScaleIO have recommended settings for different kinds of nodes: all-flash nodes have some specific settings and hybrid nodes have different ones. Ensure that the settings are applied as per the documentation.
 It's important that storage traffic has dedicated network interfaces. This implies that dedicated NIC cards should be configured for storage traffic, with redundancy factored in. Also, an MTU size of 9000 is recommended for the storage network; I have personally seen a dramatic increase in performance as soon as the MTU size is changed to 9000.
 If the network supports MTU 9000, ensure that it is configured across the entire data path, i.e. from the hypervisor through the virtual switches to the network switches.
 Ensure the desired failure protection is selected, i.e. how many node or disk failures the cluster can sustain.
 There are other kernel-level parameters that need to be tuned for optimal performance. This varies from kernel to kernel and from one storage virtualization layer to another. For reference, below are some of the important parameters that need to be tuned on ScaleIO systems for optimum performance. ScaleIO systems have two types of nodes: SDS and SDC. SDS nodes contribute their storage to a common pool, whereas SDC nodes access storage from that common pool. I stress this point because the performance of the system depends on how these parameters are tuned.
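On the hypervisor side, the jumbo-frame change can be sketched as follows. The vSwitch and VMkernel interface names are assumptions for illustration, and the physical switch ports must be configured for jumbo frames first:

```shell
# Hedged sketch: enable MTU 9000 end to end on an ESXi host.
# "vSwitch1" and "vmk1" are placeholders for the storage vSwitch and
# the storage VMkernel interface; frames will be dropped unless the
# physical switch ports are also set to MTU 9000 or higher.
esxcli network vswitch standard set --vswitch-name=vSwitch1 --mtu=9000
esxcli network ip interface set --interface-name=vmk1 --mtu=9000

# Verify with a don't-fragment ping from the host (28 bytes of IP/ICMP
# headers leave 8972 bytes of payload at MTU 9000)
vmkping -d -s 8972 <STORAGE_TARGET_IP>
```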
Tuning ESXi SDS nodes
o Change the Maximum Transmission Unit (MTU) setting to 9,000 on the vSwitches and on the SVM (the VM that is installed on each ESXi host as part of the installation).
Tuning ESXi SDC Nodes
o After the SDC is installed, type the following esxcli command:

esxcli system module parameters set -m scini -p "netConSchedThrd=4 mapTgtSockets=4 netSockRcvBufSize=4194304 netSockSndBufSize=4194304"

Note that issuing this command causes ESX to delete any other existing parameters. Therefore, the SDC GUID and MDM IP address should be provided as part of the same command. For example:

esxcli system module parameters set -m scini -p "netConSchedThrd=4 mapTgtSockets=4 netSockRcvBufSize=4194304 netSockSndBufSize=4194304 IoctlIniGuidStr=12345678-90AB-CDEF-1234-567890ABCDEF IoctlMdmIPStr=192.168.144.128"
o To increase the per-device queue length (which ESX can lower to 32 by default), type the following esxcli command:

esxcli storage core device set -d <DEVICE_ID> -O <QUEUE_LENGTH>

where <QUEUE_LENGTH> can be a number in the range 32-256 (default = 32). For example:

esxcli storage core device set -d eui.16bb852c56d3b93e3888003b00000000 -O 256
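If many ScaleIO devices need the higher queue length, the per-device command can be looped. The sketch below is an assumption-laden illustration; it filters on the "eui." identifier that ScaleIO volumes typically carry, so verify the filter against your own device list first:

```shell
# Hedged sketch: apply a queue length of 256 to every ScaleIO device on
# an ESXi host. The grep filter assumes ScaleIO volumes appear with an
# "eui." identifier in the device list; adjust for your environment.
for DEV in $(esxcli storage core device list | grep -o '^eui\.[0-9a-f]\+'); do
  esxcli storage core device set -d "$DEV" -O 256
done
```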
Tuning RedHat SDS Nodes
o Perform the following steps for all NICs in the ScaleIO system:

Note: Prior to activating MTU settings at the logical level, you must set jumbo frames (MTU 9000/9216) on the physical switch ports that are connected to the server. Failure to do so may lead to network disconnects and packet drops. Refer to your vendor's guidelines on how to configure jumbo frame support.

Confirm jumbo frames are enabled. To test, type:

ping -M do -s 8972 <DESTINATION_IP_ADDRESS>

o For a persistent configuration, change the txqueuelen parameter to 10,000 by adding the following line to the rc.local file:

ip link set dev <NIC_NAME> txqueuelen 10000
o To modify the I/O scheduler of the devices, add the following to the rc.local file on each server, for each SDS device:

echo noop > /sys/block/sd*/queue/scheduler

For example:

echo noop > /sys/block/sda/queue/scheduler

o It is recommended to change the kernel tunables by copying the content of /opt/emc/scaleio/sds/cfg/emc.conf into /etc/sysctl.conf (while leaving /opt/emc/scaleio/sds/cfg/scaleio.conf as is). Also make rc.local executable by typing:

chmod +x /etc/rc.d/rc.local
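The 8972-byte payload used in the jumbo-frame ping test above is not arbitrary: it is the 9000-byte MTU minus the 20-byte IPv4 header and the 8-byte ICMP header. A quick sketch of the arithmetic:

```shell
# Why "ping -M do -s 8972" proves MTU 9000 end to end:
MTU=9000
IP_HEADER=20      # IPv4 header, no options
ICMP_HEADER=8     # ICMP echo header
PAYLOAD=$((MTU - IP_HEADER - ICMP_HEADER))
echo "$PAYLOAD"   # 8972: the largest payload that fits unfragmented
```

If this ping fails with "message too long", some hop in the data path is still at the default MTU of 1500.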
Tuning RedHat SDC Nodes
o Edit the file /etc/init.d/scini on each SDC node by adding the following parameters to the /sbin/insmod $DRV_BIN line. Adjust the parameter values to the needs of your workload.
netConSchedThrd=8 netSockSndBufSize=4194304
netSockRcvBufSize=4194304 mapTgtSockets=4
After the editing, the line should look similar to this:
/sbin/insmod $DRV_BIN netConSchedThrd=8
netSockSndBufSize=4194304 netSockRcvBufSize=4194304
mapTgtSockets=4
o Restart the service by typing the following command:
systemctl restart scini
o It is recommended to change the kernel tunables by copying the content of /opt/emc/scaleio/sdc/cfg/emc.conf into /etc/sysctl.conf (while leaving /opt/emc/scaleio/sdc/cfg/scaleio.conf as is).
o For all NICs in the ScaleIO system, perform the same MTU configuration steps as performed for the SDS nodes.
o Restart the SDC Node.
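The edit to /etc/init.d/scini described above can be scripted. The sketch below assumes the stock insmod line has the exact form shown and demonstrates the change on a scratch copy of the file:

```shell
# Hedged sketch: append the SDC tuning parameters to the insmod line.
# Shown on a temporary copy; point SCINI at /etc/init.d/scini on a real
# node (and keep a backup) only after validating.
SCINI=$(mktemp)
printf '%s\n' '/sbin/insmod $DRV_BIN' > "$SCINI"

# "&" in the replacement replays the matched text, so the tuning
# parameters land immediately after $DRV_BIN
sed -i 's|^/sbin/insmod \$DRV_BIN|& netConSchedThrd=8 netSockSndBufSize=4194304 netSockRcvBufSize=4194304 mapTgtSockets=4|' "$SCINI"

cat "$SCINI"
```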
Network Virtualization
o Network Virtualization technologies have different MTU
requirements. Make sure MTU is set correctly. For example, 1600 is
the minimum MTU size for NSX.
o Also, make sure that the recommended compute and memory resources are provided to the Network Virtualization VMs as per the design guides.
o Network Virtualization components should be deployed with redundancy and availability in mind. For example, NSX Edge ECMP can be configured in Active/Active mode; it should be used for large deployments, provides stateless failover with multiple peerings (one per Edge) to the external network, and failover can take between 3 and 10 seconds. NSX 6.1 or higher is required for this feature.
We have seen many instances of service disruption where the above practices were not followed when building a SDDC, so it's important to follow the design and implementation best-practice guidelines for each layer.
Network Topology and Management Workload Segregation
o We can deploy a spine-leaf topology where all nodes connect to redundant top-of-rack switches, and the top-of-rack switches connect to the spine switches, as shown below, for redundancy and scalability.
o We can have a dedicated management cluster where all management VMs reside. For example, in a vCenter environment, the vCenter Server, PSCs, NSX Manager and NSX Controllers reside on a dedicated cluster, the NSX Edges reside on another dedicated cluster, and the production clusters are separate.
o Apart from this, we can have management switches at the top of each rack for connecting the out-of-band management interfaces of the equipment, such as BMC and iDRAC ports, management ports, etc.
Figure 7 SDDC Topology
I hope that the above section gives you a good idea of the factors that should be considered when transforming and building a SDDC.
Reasons to Choose Buy vs Build
In the previous section we discussed some of the design factors as well as best practices for building a SDDC. By now we are all aware that building a SDDC requires a considerable amount of planning, design, and implementation effort. Vendors such as Dell EMC have designed and engineered SDDC stacks which provide multiple benefits to an organization. This will likely compel organizations to buy pre-engineered, pre-designed integrated SDDC stacks rather than build their own. Benefits of these stacks include:
Ready to go from Day 1
These stacks are ready for deploying production workloads from the first week of delivery, as all SDDC stacks come pre-deployed and configured as per customer requirements.
Pre-engineered and designed
Stacks are pre-engineered and designed for scalability, availability and
reliability with all parameter tuning, design, and implementation best
practices followed across the storage, network, and compute virtualization
stack.
Single Point of Support
The customer doesn't need to manage multiple vendors, as the whole stack is supported by a single vendor, leading to faster problem resolution times.
Optimal use of Customer Team
The customer team can focus on more important tasks rather than on keeping the lights on.
Lifecycle Management
This is another important aspect, as the product upgrade lifecycle is managed for the whole stack by the vendor. This avoids the service disruption that customers might face in build scenarios, where the upgrade of one component leads to incompatibility with other components.
Time to market is considerably reduced
Project delivery timelines are considerably reduced with SDDC stacks. In build scenarios, SDDC infrastructure can require months to be ready for production workloads due to the long evaluation, procurement, design, and implementation process, whereas SDDC stacks come ready for deploying workloads from the first week of delivery.
Dedicated teams working on SDDC stacks with well-defined future roadmap
Vendors such as Dell EMC have dedicated engineering teams working every day on evolving these stacks. For instance, with VxRack SDDC, VxRack FLEX, and VxRail, engineering teams from VMware and Dell EMC work together as one team to deliver a best-of-breed integrated SDDC stack, with a well-defined roadmap for the product line in terms of features.
Dell EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice. THE INFORMATION IN THIS PUBLICATION IS PROVIDED "AS IS." DELL EMC MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Use, copying and distribution of any Dell EMC software described in this publication requires an applicable software license. Dell, EMC and other trademarks are trademarks of Dell Inc. or its subsidiaries.