Creating a Windows 2003 Load Balanced Cluster Using VMWare Server

Creating a Windows 2003 Load Balanced Cluster using VMWare Server (used to be GSX but is now the free version)

Summary:

We wanted to take advantage of the larger RAM capacity of Windows 2003 x64 (the 64-bit version of Windows) to allow for many user sessions in a Terminal Server environment. Unfortunately, the vendor did not yet support a 64-bit OS. We had already purchased 3 servers, so we decided to run two x86 (32-bit) Windows 2003 Terminal Server VMs per machine. We then load balanced these virtual machines across the separate hosts so that end users would always get the most out of each machine. These systems needed to be as highly available as possible, and we thought we had covered all our bases. I thought I would document our process to help others, since I had quite a bit of trouble getting everything running in the first place.

THE MOST IMPORTANT step we learned: since we have a CISCO network, there MUST be a static ARP entry for the clustered IP's MAC address on the CISCO switch. The cluster must run in "Multicast" mode, and the way CISCO implements multicasting requires this static entry; otherwise, packets to and from the cluster will be dropped when traversing a CISCO gateway. The ARP entry only needs to be added on the gateway of the subnet where the clustered nodes reside, so if there is a redundant CISCO gateway, both switches need the ARP entry. This may be true for other network types as well, but it is definitely true for CISCO networks.
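
For reference, in multicast mode the cluster's MAC address is derived from the cluster IP: it is 03-BF followed by the four octets of the cluster IP in hex, so a cluster IP of 10.0.0.75 maps to 03-BF-0A-00-00-4B (NLB Manager will show you the exact MAC for your cluster). A minimal sketch of the static ARP entry on a Cisco IOS gateway follows; the IP and MAC come from our example addressing below, so substitute your own values:

    ! Global configuration on the gateway router/switch
    ! Maps the cluster IP to its NLB multicast MAC so the gateway will forward traffic for it
    arp 10.0.0.75 03bf.0a00.004b ARPA

If the gateway is redundant, add the same entry on both gateways.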

Equipment:

3 Dell 2950s

Each server has 4 NICs (2 on-board Broadcom NICs and 2 single-port Intel NICs, all purchased from Dell)

Each Server has 16GB of RAM

Each server has 4x146GB 2.5 inch SAS drives

Host setup:

Each server was set up the same way; I changed the names to protect the innocent :-). Each system was loaded with Windows 2003 x64 Standard. These systems all had local storage, so we simply made a RAID5 virtual disk out of 3 drives, with the 4th as a hot spare. We then made a C: drive for the OS (about 12GB in size) and used the rest of the space as a big D: drive where the virtual machines would sit. We also put a 24GB page file on the D: drive. We loaded VMWare Server 1.0.4-56528 (though I'm sure current versions work similarly) on the D: drive and pointed our default guest directory to a folder called D:\Virtual Machines.

Once the OS was loaded, we set up the network cards. Here is where some special considerations are needed. The VMWare documentation says that each guest will need a live NIC plus a NIC dedicated to load balancing, and that the load balanced NIC should be mapped to a separate physical NIC (either host-only or external). With those points in mind, we teamed the 2 on-board NICs and treated each of the Intel NICs as a separate attached NIC. Each of the Broadcom NICs was then plugged into one of 2 edge switches in the rack (for a primary and secondary connection), and each Intel NIC was plugged into the primary or secondary switch as well (one Intel NIC to each).

All of the IP addresses for these machines must be on the SAME SUBNET for this to work. That means a lot of IPs in this situation. I give the example addresses below with labels, and then I will describe what we did:

10.0.0.50 = Full IP for VMHOST01

10.0.0.51 = 1st support IP for VMHOST01

10.0.0.52 = 2nd support IP for VMHOST01

10.0.0.53 = Full IP for VMHOST02

10.0.0.54 = 1st support IP for VMHOST02

10.0.0.55 = 2nd support IP for VMHOST02

10.0.0.56 = Full IP for VMHOST03

10.0.0.57 = 1st support IP for VMHOST03

10.0.0.58 = 2nd support IP for VMHOST03

10.0.0.60 = Virtual System Full IP for VMGUEST01 (on VMHOST01), actual network NIC

10.0.0.61 = Load Balanced NIC IP for VMGUEST01 (on VMHOST01)

10.0.0.62 = Virtual System Full IP for VMGUEST02 (on VMHOST01), actual network NIC

10.0.0.63 = Load Balanced NIC IP for VMGUEST02 (on VMHOST01)

10.0.0.64 = Virtual System Full IP for VMGUEST03 (on VMHOST02), actual network NIC

10.0.0.65 = Load Balanced NIC IP for VMGUEST03 (on VMHOST02)

10.0.0.66 = Virtual System Full IP for VMGUEST04 (on VMHOST02), actual network NIC

10.0.0.67 = Load Balanced NIC IP for VMGUEST04 (on VMHOST02)

10.0.0.68 = Virtual System Full IP for VMGUEST05 (on VMHOST03), actual network NIC

10.0.0.69 = Load Balanced NIC IP for VMGUEST05 (on VMHOST03)

10.0.0.70 = Virtual System Full IP for VMGUEST06 (on VMHOST03), actual network NIC

10.0.0.71 = Load Balanced NIC IP for VMGUEST06 (on VMHOST03)

10.0.0.75 = Shared (load balanced) IP address for cluster

ourclustername.ourdomainname.net = our unique DNS name for the cluster

As you can see, the same pattern repeats on a per-machine basis. On the first machine (VMHOST01), we assigned the full IP of 10.0.0.50 to the team created from the 2 on-board Broadcom NICs. This team was given the gateway, the subnet mask, and the DNS server entries, was set to register its IP with DNS, and even had the WINS entries. We next assigned the 1st support IP of 10.0.0.51 to the first Intel NIC. We did NOT add a gateway, only the IP and subnet mask. This NIC also had NO DNS entries, did NOT register with DNS, had NO WINS entries, and we disabled NetBIOS over TCP/IP on the WINS tab as well; that secondary information is only needed by the first, live IP. The second Intel NIC was configured similarly to the first Intel NIC with the 2nd support IP of 10.0.0.52 and its subnet mask, with all other settings disabled just like the first Intel NIC.
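
For anyone who prefers the command line over the GUI, the same host addressing can be applied with netsh. This is only a sketch of the idea; the connection names ("Team", "Intel1", "Intel2"), the gateway 10.0.0.1, the DNS server 10.0.0.10, and the /24 mask are placeholders I made up for illustration, not values from our environment:

    rem Full IP on the teamed adapter, with gateway and DNS registration
    netsh interface ip set address name="Team" source=static addr=10.0.0.50 mask=255.255.255.0 gateway=10.0.0.1 gwmetric=1
    netsh interface ip set dns name="Team" source=static addr=10.0.0.10 register=primary
    rem Support IPs on the Intel NICs: address and mask only, no gateway
    netsh interface ip set address name="Intel1" source=static addr=10.0.0.51 mask=255.255.255.0
    netsh interface ip set address name="Intel2" source=static addr=10.0.0.52 mask=255.255.255.0
    rem No DNS servers and no DNS registration on the support NICs
    netsh interface ip set dns name="Intel1" source=static addr=none register=none
    netsh interface ip set dns name="Intel2" source=static addr=none register=none

Disabling NetBIOS over TCP/IP on the support NICs still has to be done on the WINS tab of each adapter's TCP/IP properties.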

We also had to make some adjustments to the "Advanced Settings" (in the Network Connections folder, under the "Advanced" menu) on the "Adapters and Bindings" tab. We made sure that the "Teamed" virtual adapter was listed at the top of the "Connections" order, followed by each of the team members, and finally the Intel NICs at the bottom. Just an FYI: we back up our VMWare machines by backing up open files on the host each night; if there is ever a problem, we simply restore the whole virtual machine. These binding changes help the backup see the host correctly. Backing up only the host also saves us on backup licenses, since our backup vendor counts a virtual machine as a full client if we load an agent on it. We repeated the OS and IP settings across the other 2 host machines (VMHOST02 and VMHOST03).

VMWare Software setup:

Next in line was to set up the VMWare software and the settings on the virtual machines. First, we made sure the VMWare network settings would route each virtual NIC's traffic out of the specific physical NIC we wanted. For what I'm calling the "Virtual System Full IP", I wanted to make sure that on each guest system this IP was mapped to the teamed NIC created on the host. In VMWare, we went to the "Host" menu, then "Virtual Network Settings". On the "Host Virtual Network Mapping" tab, we changed VMnet0 to map to the teamed adapter (for the Broadcoms it was called the "BASP Virtual Adapter"). We kept the other defaults, but we set VMnet2 to the 1st Intel NIC and VMnet3 to the 2nd Intel NIC. Now we could start building the guest machines.

Guest Machine setup:

We built a base VM and we set the following parameters (a rough .vmx sketch of the result follows the list):

Memory to max (3.6GB)

1x20GB hard drive (set to grow as needed)

Only 1 NIC for now (after cloning we will add a second virtual NIC); settings left at default

2 processors

Set for 2003 Standard

Other parameters set after cloning
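
For reference only, those choices end up as entries in the guest's .vmx configuration file, roughly like the sketch below. This is from memory of VMware Server 1.x, with a made-up display name, so treat the key names and values as illustrative and compare against a .vmx that your own New Virtual Machine wizard generates rather than typing them in by hand:

    # Illustrative excerpt from the base guest's .vmx file (made-up display name)
    displayName = "VMGUESTBASE"
    # Windows Server 2003 Standard (32-bit) guest, ~3.6GB of RAM, 2 virtual CPUs
    guestOS = "winnetstandard"
    memsize = "3600"
    numvcpus = "2"
    # Single NIC for now; a second is added after cloning
    ethernet0.present = "TRUE"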

Next we loaded Windows 2003 x86 (32-bit) Standard as VMGUEST01 and applied all patches, virus scanning, etc. We then shut it down and copied it 6 times (this way we keep the base VM in case there are any problems and can use it to build more guests later).

To create VMGUEST01 and each subsequent VM, we needed some extra tweaking before each was added to the domain, Terminal Services was loaded, and network load balancing was set up. We needed to make each system unique and not just a copy. We could have run sysprep, but that is tedious because you have to re-run all the initial setup parameters. Instead we ran the "NewSID" utility from www.sysinternals.com (since bought by Microsoft; the direct link to the utility is http://technet.microsoft.com/en-us/sysinternals/bb897418.aspx). We gave it a random SID and a new name on each system and let it reboot. Next we had to take care of the network adapters.

When the system came back up, we went into Device Manager and uninstalled the virtual NIC. Unfortunately, the "NewSID" utility does not assign unique GUIDs to the NICs, so if we did not take this step, every cloned machine would appear to be using the same NIC, which would cause problems later when setting up network load balancing in Windows. After uninstalling the virtual NIC in Device Manager, we shut down the VM.

Next, we went into the properties of each VM and removed the existing virtual Ethernet adapter; it has to be removed and re-added so the guest will think that "new hardware" has been added. Then we added one back, making sure that its network connection was "Custom" and used VMnet0 (remember, VMnet0 should be mapped to the teamed NIC of the host). After that, we added a second virtual Ethernet adapter to the virtual system, changing its "Network Connection" to use VMnet2 (meaning traffic from this second adapter will be bridged out of the first Intel NIC that was set up earlier). There were 2 guest machines on this host, so we needed to do something similar with the second machine. When the second guest was shut down, we removed and re-added its first virtual Ethernet adapter; this first adapter also needed its network connection customized to point to VMnet0. The second guest also needed a second adapter; however, this one needed to be customized to point to VMnet3 (traffic from this adapter would then be bridged out of the second, non-teamed Intel NIC in the host). We made sure we ran NewSID on the second guest machine after all that as well. These steps ensured that a "new" adapter would be installed in each guest (assigning a unique GUID in the OS) and allowed traffic to flow correctly in the Network Load Balanced cluster. I emphasize this because it messed us up as we tried to get things working.
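
After those changes, the first guest's .vmx file should reflect the custom VMnet assignments, roughly like the sketch below. Again, the key names are recalled from VMware Server 1.x on a Windows host, so verify them against a .vmx generated by your own install:

    # First adapter: custom connection bridged to VMnet0 (the host's teamed NIC)
    ethernet0.present = "TRUE"
    ethernet0.connectionType = "custom"
    ethernet0.vnet = "VMnet0"
    # Second adapter: custom connection bridged to VMnet2 (the first Intel NIC);
    # the second guest on this host points its second adapter at VMnet3 instead
    ethernet1.present = "TRUE"
    ethernet1.connectionType = "custom"
    ethernet1.vnet = "VMnet2"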

We made the same changes on the rest of the guests across the other 2 host machines and turned on all the guest machines. Now we could work on the VMGUESTs, get their IPs set up, and enable network load balancing.

Enabling network load balancing for Terminal Services across all 6 virtual machines:

First, we assigned the IPs to the guest machine. In our example, we assigned 10.0.0.60 to the first adapter (which we renamed "1 Actual network" to help us keep things straight). This adapter got the full gamut of settings, including IP, subnet mask, gateway, DNS servers, DNS registration, and WINS server settings. The second NIC on this guest (which we renamed "2 LBN") got the IP address of 10.0.0.61 and the subnet mask but nothing else. We also turned off DNS registration and disabled NetBIOS over TCP/IP on this second NIC. We assigned the IPs as outlined across the guests and made sure all IPs were pingable (each host's full IP, each of its 2 Intel NIC IPs, and each of the 2 IPs on each guest).

Since this is a Terminal Server load balanced cluster, we had to do all the setup needed for Terminal Services on each GUEST machine. We added them to the domain, loaded Terminal Services on each guest, and pointed them to the licensing server in the domain. We then tested each one to be sure Terminal Services was working fine.

Now we could set up the Network Load Balancing cluster for Terminal Services across the 6 VMGUEST machines. We logged on locally to the first guest machine (VMGUEST01) and started the Network Load Balancing Manager from the Administrative Tools folder. We right-clicked the "Network Load Balancing Clusters" icon and chose "New Cluster". In the dialog, we added the IP for the cluster (10.0.0.75 in our example), the subnet mask, the "Full Internet Name" (ourclustername.ourdomainname.net), and MOST IMPORTANTLY chose "Multicast" as the cluster operation mode. Because of all the bridging and switching used with VMWare, Unicast just didn't seem to work right. We clicked Next, leaving the next screen at its defaults (this is where additional cluster IPs can be added; we didn't need any).

We clicked Next again and came to the port rules. Since this was a Terminal Server cluster, we only needed a rule for port 3389 (FYI, all of this can be adapted for port 80 or 443 for web services as well, or it can be left to all ports if you have an app that uses a wide range). We removed the default rule and added a new rule. In the "Add" dialog, we left "All" checked, changed the port range to go from 3389 to 3389, changed Protocols to TCP, and set the filtering mode to "Multiple host" with "Single" affinity. About that last setting: we kept the affinity at "Single" so that if a user got disconnected from their Terminal session, they could re-connect to the same session on the same host node; the same setting is useful for websites that need to maintain persistent connections. If that's not important, you can change the affinity to "None". We clicked OK and then Next when done.

Next, we added the hosts that would be part of the cluster. We typed the name of the first guest machine (VMGUEST01) and clicked Connect. There should be 2 NICs listed (in our example, "1 Actual network" and "2 LBN"). If there are not 2 NICs, it means that to Windows one of the NICs has the same GUID as another, which will break the NLB configuration. In that case, the NIC needs to be uninstalled in the guest machine, the guest shut down, the virtual Ethernet adapter removed from the VMWare settings and re-added in VMWare, and the guest restarted; the TCP/IP settings can then be re-added and 2 NICs should appear. From the 2 NICs, we selected/highlighted the load balanced NIC ("2 LBN" in our example) and clicked Next.
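
A quick way to check for the duplicate-GUID problem is with getmac, which is built into Windows 2003. Run it inside each guest and compare the Transport Name lines; each adapter's transport name ends in that adapter's GUID, so if two cloned guests show the same GUID you have hit the problem described above (this is just a convenient check, not something the NLB setup requires):

    getmac /v /fo list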

In The "Host Parameters" dialog, we assigned a unique "Priority" number (this would be different for each guest). Also, we made sure it was pulling in the correct IP information that was already assigned to the "2 LBN" NIC on the VMGUEST01. All the rest was default and click finish. The app went through the motions and added the 1st node to the cluster. When it was green in the manager, the next nodes were added simply by right-clicking on the new cluster name and choosing "Add host to cluster" and going through some of the same motions for each successive node.

Afterwards, per-node tweaking can be done in the port rule parameters. We left ours at the default (Equal) since the nodes were all essentially the same hardware. But if the port rule parameters on an individual host are opened, the "Load weight" can be adjusted; a node with a higher weight receives a proportionally larger share of new connections than nodes with lower weights.

Hope this helps. Sorry for being long winded and for changing grammatical tense and person from time to time. I just like detail in what I am doing, so I wanted to give the same here.

Brian Young

Sr. Server Analyst

[email protected]