Deploying FlexPod infrastructures, best practices from the field
Ramses Smeyers, Technical Leader Services
BRKVIR-2260
A collection of hints and tips gathered while deploying, managing and troubleshooting FlexPod deployments for over 5 years
Customer deployments
TAC troubleshooting cases
Customer / Partner sessions
What is this session about?
What is a FlexPod?
Deployment gotchas
Performance analysis
Upgrade the stack
UCS-Director automation
Agenda
What is a FlexPod?
FlexPod is an integrated computing, networking, and storage solution developed by Cisco and NetApp. Its configurations and workloads are published as Cisco Validated Designs. FlexPod is categorized by established and emerging client needs.
Validated technologies from industry leaders in computing, storage, networking, and server virtualization
A single platform built from unified computing, fabric, and storage technologies, with popular and trusted software virtualization
Integrated components that help enable you to centrally manage all your infrastructure pools
An open design management framework that integrates with your existing third-party infrastructure management solutions
http://www.cisco.com/en/US/netsol/ns741/networking_solutions_program_home.html
CVD
Architecture - Nexus
Architecture - ACI
Deployment gotchas
MTU 9000
ALUA
FCoE
Management / production network overlap
Hypervisor load-balancing
ACI Dynamic discovery
MTU 9000
Problem? End-to-end MTU 9000 is not working
MTU needs to be defined at:
VMware
Nexus 1000V (if NFS traffic is handled by Nexus 1000V)
UCS
Nexus 5000
NetApp
Use the correct ping on VMware
~ # vmkping -I vmk1 -s 9900 42.33.80.6
PING 42.33.80.6 (42.33.80.6): 9900 data bytes
9908 bytes from 42.33.80.6: icmp_seq=0 ttl=255 time=0.355 ms
~ # vmkping -I vmk1 -s 9900 42.33.80.6 -d
PING 42.33.80.6 (42.33.80.6): 9900 data bytes
sendto() failed (Message too long)
MTU 9000
With -d (do not fragment), payload sizes above 8972 bytes will not work: 9000-byte MTU minus 20 bytes IP header and 8 bytes ICMP header = 8972
VMware
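A quick end-to-end check from the ESXi shell (a sketch; vmk1 and 42.33.80.6 are reused from the output above): first confirm the VMkernel interface itself reports MTU 9000, then ping with the largest payload that fits in a 9000-byte frame without fragmenting.
~ # esxcfg-vmknic -l
~ # vmkping -I vmk1 -d -s 8972 42.33.80.6
If the 8972-byte ping fails with -d while smaller sizes succeed, some hop in the path is still at MTU 1500.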
Nexus 1000V
Global:
bdsol-vc01vsm-01# show running-config | inc jumbo
bdsol-vc01vsm-01# show running-config all | inc jumbo
system jumbomtu 9000
Port-profile:
bdsol-vc01vsm-01(config)# port-profile type ethernet Uplink
bdsol-vc01vsm-01(config-port-prof)# system mtu 9000
Check:
bdsol-vc01vsm-01# show interface ethernet 3/3
Ethernet3/3 is up
Hardware: Ethernet, address: 0050.5652.0a1f (bia 0050.5652.0a1f)
Port-Profile is Uplink
MTU 1500 bytes
Encapsulation ARPA
Define MTU on:
QoS System Class
vNIC template
QoS Policies
UCS
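On UCS the MTU lives in the QoS System Class and in the QoS policy referenced by the vNIC (template). A rough UCS Manager CLI sketch for the system class, assuming NFS rides the Gold class as in this deck; the scope names follow the UCSM CLI configuration guide and should be verified against your release:
UCS-A# scope eth-server
UCS-A /eth-server # scope qos
UCS-A /eth-server/qos # scope eth-classified gold
UCS-A /eth-server/qos/eth-classified # set mtu 9000
UCS-A /eth-server/qos/eth-classified* # commit-buffer
The vNIC (template) must reference a QoS policy using that class and must itself be set to MTU 9000, otherwise the adapter keeps dropping jumbo frames.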
Nexus 5000 / Nexus 7000
Nexus 5000:
policy-map type network-qos jumbo
class type network-qos class-fcoe
pause no-drop
mtu 2158
class type network-qos class-default
mtu 9216
multicast-optimize
system qos
service-policy type network-qos jumbo
Nexus 7000:
system jumbomtu 9216
interface Ethernet1/23
mtu 9216
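To verify the jumbo MTU is actually active (a sketch reusing Ethernet1/23 from above): on the Nexus 5000 the MTU is part of the queuing configuration, on the Nexus 7000 it shows on the interface itself.
Nexus 5000: show queuing interface ethernet 1/23 (the configured MTU is shown per class / qos-group)
Nexus 7000: show interface ethernet 1/23 | include MTU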
NetApp
bdsol-3240-01-B> rdfile /etc/rc
hostname bdsol-3240-01-B
ifgrp create lacp dvif -b ip e1a e1b
vlan create dvif 3380
ifconfig e0M `hostname`-e0M flowcontrol full netmask 255.255.255.128 mtusize 1500
ifconfig e0M inet6 `hostname`-e0M prefixlen 64
ifconfig dvif-3380 `hostname`-dvif-3380 netmask 255.255.255.0 partner dvif-3380 mtusize 9000 trusted wins up
route add default 10.48.43.100 1
routed on
options dns.enable on
options nis.enable off
savecore
bdsol-3240-01-B> ifconfig dvif-3380
dvif-3380: flags=0x2b4e863 mtu 9000 dad_attempts 2
inet 42.33.80.2 netmask 0xffffff00 broadcast 42.33.80.255
inet6 fe80::a0:98ff:fe36:1838 prefixlen 64 scopeid 0xd autoconf
partner dvif-3380 (not in use)
ether 02:a0:98:36:18:38 (Enabled interface groups)
Traffic flow (NFS is on Gold)
VMware -> UCS (Gold, CoS 4) -> 5K -> NetApp
NetApp -> 5K -> UCS (Best Effort) -> vNIC: drop (no CoS on traffic)
Solution:
Re-mark traffic on the Nexus 5000 interfaces facing NetApp to CoS 4
ACL to match NFS traffic, re-mark to CoS 4 (see the sketch below)
Watch out for return traffic
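A minimal Nexus 5000 marking sketch for that return traffic, assuming the NFS subnet 42.33.80.0/24 from the earlier examples, qos-group 4 as the group tied to the Gold / CoS 4 class, and a hypothetical NetApp-facing port Ethernet1/5; class names and qos-group numbers have to match your existing QoS policies:
ip access-list NFS-TRAFFIC
  permit ip 42.33.80.0/24 any
class-map type qos match-all NFS-CLASS
  match access-group name NFS-TRAFFIC
policy-map type qos NFS-MARKING
  class NFS-CLASS
    set qos-group 4
interface Ethernet1/5
  service-policy type qos input NFS-MARKING
The network-qos / queuing classes matching qos-group 4 must map it to CoS 4 so the frames arrive at UCS in the Gold class.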
ALUA
ALUA?
ALUA (Asymmetric Logical Unit Access) lets the array report which paths are optimized (direct to the owning controller) and which are non-optimized, so the host prefers the optimized ones.
Verify ALUA on VMware
~ # esxcli storage nmp device list
naa.60a98000443175414c2b4376422d594b
Device Display Name: NETAPP Fibre Channel Disk
(naa.60a98000443175414c2b4376422d594b)
Storage Array Type: VMW_SATP_ALUA
Storage Array Type Device Config: {implicit_support=on;explicit_support=off;
explicit_allow=on;alua_followover=on;{TPG_id=2,TPG_state=AO}{TPG_id=3,TP
G_state=ANO}}
Path Selection Policy: VMW_PSP_RR
Path Selection Policy Device Config:
{policy=rr,iops=1000,bytes=10485760,useANO=0;lastPathIndex=3:
NumIOsPending=0,numBytesPending=0}
Path Selection Policy Device Custom Config:
Working Paths: vmhba2:C0:T0:L0, vmhba1:C0:T0:L0
Is Local SAS Device: false
Is Boot USB Device: false
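If a LUN shows up with a different path selection policy, it can be switched to round robin per device, or round robin can be made the default for all ALUA-claimed devices (a sketch reusing the naa ID above):
~ # esxcli storage nmp device set --device naa.60a98000443175414c2b4376422d594b --psp VMW_PSP_RR
~ # esxcli storage nmp satp set --satp VMW_SATP_ALUA --default-psp VMW_PSP_RR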
Verify ALUA on NetApp
bdsol-3220-01-A> igroup show -v
bdsol-esxi-23 (FCP):
OS Type: vmware
Member: 20:00:00:25:b5:52:0a:0e (logged in on: 0d, vtic)
Member: 20:00:00:25:b5:52:0b:0e (logged in on: 0c, vtic)
UUID: 6c2a1c9a-f539-11e2-8cbf-123478563412
ALUA: Yes
Report SCSI Name in Inquiry Descriptor: Yes
igroup show -v
igroup set <igroup-name> alua yes
igroup show -v
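A concrete run against the igroup shown above (7-Mode; the ESXi host typically has to rediscover its paths, e.g. rescan or reboot, before VMW_SATP_ALUA claims the device):
bdsol-3220-01-A> igroup set bdsol-esxi-23 alua yes
bdsol-3220-01-A> igroup show -v bdsol-esxi-23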
FCoE
Nexus 5000
interface vfc103
bind interface port-channel103
switchport trunk allowed vsan 70
no shutdown
interface port-channel103
description bdsol-6248-03-A
switchport mode trunk
switchport trunk allowed vlan 970
interface Ethernet1/3
description bdsol-6248-03-A
switchport mode trunk
switchport trunk allowed vlan 970
spanning-tree port type edge trunk
spanning-tree bpdufilter enable
channel-group 103 mode active
Default policy (show running-config all)
policy-map type network-qos fcoe-default-nq-policy
class type network-qos class-fcoe
pause no-drop
mtu 2158
class type network-qos class-default
multicast-optimize
system qos
service-policy type fcoe-default-nq-policy
cisco.com documented MTU 9000 policy
policy-map type network-qos jumbo
class type network-qos class-default
mtu 9216
multicast-optimize
system qos
service-policy type network-qos jumbo
QoS policy has no FCoE class defined: applying this policy removes the no-drop FCoE class and breaks FCoE
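Before and after applying a custom jumbo policy, check that class-fcoe survived and that the fabric login toward the FI is still present (a sketch against the vfc / VSAN used above):
show policy-map type network-qos jumbo (class-fcoe with pause no-drop and mtu 2158 must still be there)
show interface vfc103
show flogi database vsan 70
The working jumbo policy is the one shown earlier in the Nexus 5000 / Nexus 7000 section: it keeps class-fcoe next to the jumbo class-default.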
UCS
Management / production network overlap
Problem: the customer is unable to reach the UCS management interface (and NetApp and Nexus 5000 management)
Situation:
Topology: NetApp - Nexus 5000 - UCS
Mgmt topology: a 3750 connected to NetApp mgmt, Nexus 5000 mgmt0 and UCS management, and also connected to the Nexus 5000 via a port-channel / vPC
Management VLAN is also used for server / VM traffic
Cause:
VM / server traffic overloaded the management VLAN on the 3750 through broadcast flooding
Hypervisor load-balancing
Hypervisor load-balancing UCS-B
Each Fabric Interconnect has a port-channel towards the Nexus 5000 vPC pair
Fabric Interconnects are connected to each other for clustering only; no data traffic crosses that link
The hypervisor running on a blade has two independent connections, so switch-dependent teaming protocols (EtherChannel / LACP) cannot be used
Using IP-hash algorithms will cause MAC flaps on the UCS FIs and N5Ks; use route based on originating virtual port ID instead (see the sketch below)
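For a standard vSwitch on a UCS-B blade, the safe teaming policy can be set from the ESXi shell (a sketch; vSwitch0 is an assumed name, and on a DVS or Nexus 1000V the equivalent lives in vCenter / the port-profile):
~ # esxcli network vswitch standard policy failover set --vswitch-name=vSwitch0 --load-balancing=portid
~ # esxcli network vswitch standard policy failover get --vswitch-name=vSwitch0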
ACI - Port-group traffic distribution settings
By default, port-groups instantiated on the DVS use route based on originating virtual port ID. This works well with all types of servers.
If you attach a LACP policy to your vSwitch Policies, you will set the traffic distribution to IP
hash! Be careful with UCS-B series as this is not supported (no vPC between the FIs).
See also 'Setting up an Access Policy with Override Policy for a Blade Server Using the GUI': http://www.cisco.com/c/en/us/td/docs/switches/datacenter/aci/apic/sw/1-x/getting-started/b_APIC_Getting_Started_Guide/b_APIC_Getting_Started_Guide_chapter_010.html#d19263e464a1635
ACI Dynamic discovery
Prerequisites & Supported Config
For dynamic EPG download, only one L2 hop between the host/ESX and the iLeaf is supported
Management address *must* be configured on the L2 / blade switch
CDP or LLDP must be configured to advertise the Mgmt-TLV on the blade switch (see the check after this list)
UCS FI needs to be on version 2.2(1c) or later
If multiple L2 hops are needed, use EPGs with static binding (node + port selectors)
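A quick check from the leaf's NX-OS-style CLI (a sketch; leaf101 and ethernet 1/1 are illustrative): the FI / blade switch should appear as an LLDP or CDP neighbor and its management address should be present in the TLVs.
leaf101# show lldp neighbors
leaf101# show lldp neighbors interface ethernet 1/1 detail
leaf101# show cdp neighbors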
ESX Discovery in Fabric (for dynamic download of policies to the leaf)
1. Leaf sends LLDP* to ESX (includes the leaf port name)
2. ESX sends the parsed LLDP information to vCenter
3. APIC receives the LLDP information from vCenter
4. APIC downloads the policy for VMs behind that ESX host to the leaf node (when using immediate)
*Can use CDP instead of LLDP
An LLDP-enabled VIC consumes LLDP; on C-Series, disable LLDP on the VIC so the hypervisor receives the TLVs