7/26/2019 Nexus Troubleshooting
BRKCRS-3145
Troubleshooting the Cisco Nexus 5000 / 2000
Series Switches
2011 Cisco and/or its affiliates. All rights reserved. Cisco PublicBRKCRS-3145 2
Objectives
Be able to quickly isolate problematic nodes in the datacenter
Become familiar with troubleshooting in NX-OS
Understand Nexus 5000 and Nexus 2000 platform details
Gain comfort using Nexus 5000 and Nexus 2000 day to day
Problem Isolation
A problem well stated is a problem half solved
Source: Charles F. Kettering, Engineer and Inventor
Troubleshooting Tool #1
A current, accurate diagram
Physical ports
Logical ports
Spanning-tree root and blocked ports
Helpful to use standard formats
.jpg, .bmp, .pdf
If you cannot describe how your network should be operating, time may be wasted
[Figure: sample topology diagram showing N7k-1/N7k-2 (vPC domain 100, peer-link Po100, vPC peer-keepalive, RSTP root), N5k-1 through N5k-4 (vPC domains 101 and 102, peer-links Po101 and Po102, vPC po1 and Po2), and N5k-5 with an STP-blocked port]
Which show tech?
As of 5.0(3), there are 68
N5k-1# show tech-support ?
aaa Display aaa information
aclmgr ACL commands
adjmgr Display Adjmgr information
arp Display ARP information
ascii-cfg Show ascii-cfg information for technical support personnel
assoc_mgr Gather detailed information for assoc_mgr troubleshooting
bcm-usd Gather detailed information for BCM USD troubleshooting
bootvar Gather detailed information for bootvar troubleshooting
brief Display the switch summary
btcm Gather detailed information for BTCM component
callhome Callhome troubleshooting information
cdp Gather information for CDP trouble shooting
...
session-mgr Gather information for troubleshooting session manager
snmp Gather info related to snmp
sockets Display sockets status and configuration
spm Service Policy Manager
stp Gather detailed information for STP troubleshooting
sysmgr Gather detailed information for sysmgr troubleshooting
time-optimized Gather tech-support faster, requires more memory & disk space
track Show track tech-support information
vdc Gather detailed information for VDC troubleshooting
vpc Gather detailed information for VPC troubleshooting
vtp Gather detailed information for vtp troubleshooting
xml Gather information for xml trouble shooting
Log your output
Redirect and Append
N5k-1# show clock >bootflash:debug-file.txt
N5k-1# show mac address-table >>bootflash:debug-file.txt
N5k-1# show running-config | count >>bootflash:debug-file.txt
N5k-1# show file bootflash:debug-file.txt
Mon Apr 4 02:39:41 UTC 2011
When to call TAC
Most efficient if you have the following:
A description of the problem observed, with evidence / clues, along with time and scope
A current network diagram
All parties involved in the problem
show tech is not necessary, but if you must make drastic changes such as reloading or replacing hardware, grab this first
Any targeted outputs, especially around the time of the event in question
You think you have found a bug, but a quick search of defects or release notes on cisco.com may be faster
NX-OS Operation Tips
CLI list and grep
N5k-3# show cli list | grep switchport
show system default switchport san
show interface switchport
show interface switchport
ctrl-c terminates output
N5k-3# show tech-support
---- show tech-support ----
ctrl-c
N5k-3#
NX-OS File Structure
volatile: filesystem is virtual, use as scratch if needed
Obviously volatile, will not survive a reload
log: filesystem is in root /
N5k-1# debug logfile CiscoLive_debugs
N5k-1# show debug
Output forwarded to file CiscoLive_debugs (size: 4194304 bytes)
Debug level is set to Minor(1)
N5k-1# dir log:
0 Apr 04 01:14:01 2011 CiscoLive_debugs
31 Mar 11 11:38:35 2011 dmesg
0 Mar 11 11:38:57 2011 libfipf.4365
79101 Apr 04 00:34:02 2011 messages
6670 Apr 04 00:06:01 2011 startupdebug
N5k-1# copy log:CiscoLive_debugs tftp:
Enter vrf: management
Enter hostname for the tftp server: 10.91.42.134
Trying to connect to tftp server......
Connection to Server Established.
|
TFTP put operation was successful
N5k-1# clear debug-logfile CiscoLive_debugs
-OR-
N5k-1# undebug all
Troubleshooting Nexus 5000 / 2000
Problem Isolation
Platform Overview
NX-OS Operation
FSM
MTS
Crashes
Nexus 5000
Nexus 2000
Platform Overview and troubleshooting
Redundancy operation and troubleshooting
NX-OS MTS
recv queue should not grow old
SAP 0 is an invalid identifier and causes 300 messages to queue, and growing.
Observed impact is various show commands timing out, such as show log and show run
N5k-1# show system internal mts buffers details
Node/Sap/queue Age(ms) SrcNode SrcSAP DstNode DstSAP OPC MsgId MsgSize
sup/32/recv 319672424 0x101 25330 0x101 0 7662 1221952768 192
sup/32/recv 319669986 0x101 25336 0x101 32 188 1221953842 328
sup/32/recv 319609082 0x101 25344 0x101 0 7663 1221971222 2452
...
sup/32/recv 227324 0x101 32550 0x101 32 188 1301415915 328
sup/32/recv 165509 0x101 32560 0x101 0 7663 1301432732 2452
sup/32/recv 101893 0x101 32565 0x101 0 7662 1301448663 192
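The check "recv queue should not grow old" can be automated against captured output. The following is a minimal sketch in Python, assuming the nine-column layout shown above; the one-hour age threshold and the helper names are illustrative assumptions, not values from NX-OS.

```python
from typing import List, NamedTuple


class MtsMessage(NamedTuple):
    queue: str    # Node/Sap/queue column, e.g. "sup/32/recv"
    age_ms: int   # Age(ms) column
    src_sap: int  # SrcSAP column
    dst_sap: int  # DstSAP column
    opcode: int   # OPC column


def parse_mts_buffers(text: str) -> List[MtsMessage]:
    """Parse data rows of 'show system internal mts buffers details'."""
    messages = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 9 columns and a numeric age; this skips the
        # header row, the "..." elision, and the command line itself.
        if len(fields) != 9 or not fields[1].isdigit():
            continue
        messages.append(
            MtsMessage(fields[0], int(fields[1]), int(fields[3]),
                       int(fields[5]), int(fields[6]))
        )
    return messages


def stale_messages(messages: List[MtsMessage],
                   max_age_ms: int = 3_600_000) -> List[MtsMessage]:
    """Return messages older than max_age_ms (hypothetical threshold)."""
    return [m for m in messages if m.age_ms > max_age_ms]


sample = """\
Node/Sap/queue Age(ms) SrcNode SrcSAP DstNode DstSAP OPC MsgId MsgSize
sup/32/recv 319672424 0x101 25330 0x101 0 7662 1221952768 192
sup/32/recv 227324 0x101 32550 0x101 32 188 1301415915 328
sup/32/recv 101893 0x101 32565 0x101 0 7662 1301448663 192
"""

for msg in stale_messages(parse_mts_buffers(sample)):
    print(msg.queue, msg.age_ms, "dst SAP", msg.dst_sap)
```

With the sample rows, only the message aged 319672424 ms (roughly 89 hours) exceeds the assumed threshold, and it is one addressed to SAP 0.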
NX-OS MTS
MTS messages have been addressed to SAP 0 due to a bug.
Reload was needed to clear this scenario
N5k-1# sh system internal mts sup sap 0 description
Not implemented
N5k-1# sh system internal mts sup sap 32 description
Syslog Sup Node Cfg
N5k-1# show system internal sysmgr service name syslogd
Service "syslogd" ("syslogd", 75):
UUID = 0x21, PID = 3924, SAP = 32
State: SRV_STATE_HANDSHAKED (entered at time Sat May 15 05:01:20 2010).
Restart count: 1
Time of last restart: Sat May 15 05:01:20 2010.
The service never crashed since the last reboot.
Tag = N/A
Plugin ID: 0
NX-OS Crashes
NX-OS attempts to create a core file with information helpful to aid in finding and fixing the problem
stack trace
memory contents
Some processes in NX-OS are able to be restarted in a stateful manner.
Nexus 5000 is a single-supervisor platform; critical processes require a system restart upon a crash.
A syslog message is sent just before crash and system restart
2010 Sep 10 16:19:27.411 N5k-1 %$ VDC-1 %$ %SYSMGR-2-SERVICE_CRASHED: Service "fwm" (PID 2723) hasn't caught signal 6 (core will be saved).
NX-OS Crashes
show process log
View status of all processes, including if a core was created
N5k-1# show process log
Process PID Normal-exit Stack Core Log-create-time
--------------- ------ ----------- ----- ----- ---------------
eth_port_channel 2743 N Y N Wed Mar 17 17:20:57 2010
eth_port_channel 2761 N Y N Tue Aug 3 19:14:58 2010
fwm 2703 N Y N Fri Oct 8 19:24:12 2010
...
N5k-1# show process log pid 2703
======================================================
Service: fwm
Description: Forwarding manager Daemon
Started at Thu Oct 7 14:51:51 2010 (151707 us)
Stopped at Fri Oct 8 19:24:12 2010 (203577 us)
Uptime: 1 days 4 hours 32 minutes 21 seconds
Start type: SRV_OPTION_RESTART_STATELESS (23)
Death reason: SYSMGR_DEATH_REASON_FAILURE_SIGNAL (2)
...
Hardware overview
To talk about forwarding errors and troubleshooting, drops are usually part of this discussion
We have to know a basic hardware layout in order to know where to look for problems
The following hardware overview is a preview of BRKARC-3452 Cisco Nexus 5000/5500 and 2000 Switch Architecture
Nexus 5000 Hardware Overview
Data Plane Elements
Nexus 5000 is a distributed forwarding architecture
Unified Port Controller (UPC) ASIC interconnected by a single stage Unified Crossbar Fabric (UCF)
Unified Port Controllers provide distributed packet forwarding capabilities
All port to port traffic passes through the UCF (Fabric)
Four switch ports managed by each UPC
14 UPC in Nexus 5020
7 UPC in Nexus 5010
[Figure: groups of SFP ports attached to Unified Port Controllers, all interconnected through the Unified Crossbar Fabric]
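The UPC counts translate directly into the number of switch ports the data plane can serve. A small sketch of that arithmetic in Python, using only the figures on the slide; the function and dictionary names are illustrative.

```python
# Figures from the slide: four switch ports per UPC,
# 14 UPCs in the Nexus 5020 and 7 in the Nexus 5010.
PORTS_PER_UPC = 4
UPC_COUNT = {"Nexus 5020": 14, "Nexus 5010": 7}


def upc_attached_ports(model: str) -> int:
    """Switch ports served by all UPCs on the given model."""
    return UPC_COUNT[model] * PORTS_PER_UPC


for model in UPC_COUNT:
    print(model, upc_attached_ports(model))
```

This gives 56 ports for the Nexus 5020 and 28 for the Nexus 5010. It says nothing about which front-panel port lands on which UPC; that mapping has to be read from the switch itself.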
Nexus 5000/5500 Hardware Overview
Control Plane Elements
In-band traffic is identified by the UPC and punted to the CPU via two dedicated UPC interfaces, 5/0 and 5/1, which are in turn connected to eth3 and eth4 interfaces in the CPU complex
Eth3 handles Rx and Tx of low priority control pkts
IGMP, CDP, TCP/UDP/IP/ARP (for management purpose only)
Eth4 handles Rx and Tx of high priority control pkts
STP, LACP, DCBX, FC and FCoE control frames (FC packets come to Switch CPU as FCoE packets)
There is a built-in control-plane policer to limit the amount of traffic punted to CPU
[Figure: CPU, South Bridge, NICs (eth3, eth4, mgmt0), and Unified Port Controller]
Nexus 5000 Hardware Overview
Control Plane Elements
Monitoring of in-band traffic via NX-OS built-in ethanalyzer (sniffer)
Eth3 is equivalent to inbound-low
Eth4 is equivalent to inbound-hi
[Figure: CPU (Intel LV Xeon, 1.66 GHz), South Bridge, NIC (eth3, eth4), and Unified Port Controller]
N5k-2# ethanalyzer local sniff-interface ?
inbound-hi Inbound(high priority) interface
inbound-low Inbound(low priority) interface
mgmt Management interface
CLI view of in-band control plane data
N5k-2# sh hardware internal cpu-mac inband counters
eth3 Link encap:Ethernet HWaddr 00:0D:EC:B2:0C:83
UP BROADCAST RUNNING PROMISC ALLMULTI MULTICAST MTU:2200 Metric:1
RX packets:3 errors:0 dropped:0 overruns:0 frame:0
TX packets:630 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:252 (252.0 b) TX bytes:213773 (208.7 KiB)
Base address:0x6020 Memory:fa4a0000-fa4c0000
eth4 Link encap:Ethernet HWaddr 00:0D:EC:B2:0C:84
UP BROADCAST RUNNING PROMISC ALLMULTI MULTICAST MTU:2200 Metric:1
RX packets:85379 errors:0 dropped:0 overruns:0 frame:0
TX packets:92039 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:33960760 (32.3 MiB) TX bytes:25825826 (24.6 MiB)
Base address:0x6000 Memory:fa440000-fa460000
Nexus 5000 Forwarding
cut-through vs. store and forward
Store and forward switching is still utilized when the ingress data rate is slower than the egress data rate.
Cut-through switching is utilized to achieve low latency through the switch fabric.
Bits are serialized in from the ingress port until enough of the packet header has been received to perform a forwarding and policy lookup
Once a lookup decision has been made and the fabric has granted access to the egress port, bits are forwarded through the fabric
Egress port performs any header rewrite (e.g. CoS marking) and MAC begins serialization of bits out the egress port
A drop cannot happen on ingress due to any switching logic or even a CRC error. Only faulty hardware or connections can cause a drop on ingress.
Discards can occur on ingress due to queuing configuration and traffic patterns.
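The latency difference between the two modes comes down to how many bytes must be serialized in before forwarding can start. A rough worked example in Python; the 64-byte header figure is an assumption for illustration, not the amount the UPC actually needs for its lookup.

```python
def serialization_delay_us(frame_bytes: int, link_gbps: float) -> float:
    """Time in microseconds to clock frame_bytes onto a link of link_gbps."""
    return frame_bytes * 8 / (link_gbps * 1000)


# Assumed number of bytes needed before a lookup can be made (illustrative).
HEADER_BYTES_FOR_LOOKUP = 64

# Store and forward waits for the whole 1500-byte frame at 10 Gbps.
print(serialization_delay_us(1500, 10))                     # whole frame
# Cut-through only waits for the header portion.
print(serialization_delay_us(HEADER_BYTES_FOR_LOOKUP, 10))  # header only
# 1 Gbps ingress: the frame takes ten times longer to arrive than a
# 10 Gbps egress port needs to send it.
print(serialization_delay_us(1500, 1))
```

The last line shows why store and forward is still used when ingress is slower than egress: a frame that takes 12 µs to arrive at 1 Gbps would be sent in 1.2 µs at 10 Gbps, so an egress port that started early would run out of bits mid-frame.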
Finding the source of CRC errors
CRC errors are introduced in 3 ways:
Bad physical connection
copper, fiber, transceiver, phy
stomping due to intentionally originated errors
Received bad CRC stomped from neighboring cut-through switch.
Start by finding any RX CRC counters.
If none, then this switch is responsible for originating
Use interrupt counters to find the reason and port, if intentional
Log in to next switch upstream of CRC counters, check for RX CRC there.
Use the above logic to determine if this switch is originating any errors.
Finally, inspect optics/pluggables, fiber/cables and troubleshoot as a Layer 1 issue. Change cable and port to find where the problem follows.
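The upstream walk described above can be expressed as a short loop. This is a sketch in Python under stated assumptions: the RX CRC counts are collected by hand from each switch along the path, and the function name and sample data are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def find_crc_origin(path: Sequence[Tuple[str, int]]) -> Optional[str]:
    """Walk upstream from where CRC errors were first seen.

    path holds (switch, rx_crc_count) pairs, ordered from the switch
    that reported the errors toward the traffic source. The first
    switch with no RX CRC errors on its receiving port is the one
    originating (or stomping) the bad frames.
    """
    for switch, rx_crc in path:
        if rx_crc == 0:
            return switch
    # Every switch checked so far received the errors, so the origin
    # is further upstream or the problem is Layer 1.
    return None


# Illustrative data: N7k-1 sees one RX CRC, N5k-1 upstream sees none.
path = [("N7k-1", 1), ("N5k-1", 0)]
print(find_crc_origin(path))
```

Once the originating switch is identified, its interrupt counters give the reason and port; if no switch on the path is clean, the remaining step is the Layer 1 inspection of optics and cabling.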
Finding the source of CRC errors
Scenario #1: Physical Issue
[Figure: topology with N7k-1 (e1/11, e1/12), N5k-1 and N5k-2 (e1/7, e1/1, e1/3, e1/4, e1/5), VLAN 7 and VLAN 8]
N5k-2# show hardware internal gatos asic 0 counters interrupt
Gatos 0 interrupt statistics:
Interrupt name |Count |ThresRch|ThresCnt|Ivls
-----------------------------------------------+--------+--------+--------+----
gat_fw2_INT_eg_pkt_err_cb_bm_eof_err |1 |0 |0 |0
gat_fw2_INT_eg_pkt_err_eth_crc_stomp |1 |0 |0 |0
gat_fw2_INT_eg_pkt_err_e802_3_len_err |1 |0 |0 |0
Front Panel Internal
e1/1 7:2
e1/5 7:1
e1/3 0:2
Interrupt counters increment upon transmit of errored frame
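Because interrupt counters are kept per ASIC, the front-panel port has to be translated before choosing which ASIC to query. A minimal sketch in Python, reading the "Internal" column as ASIC:port (an interpretation consistent with asic 0 being queried for e1/3 above); the mapping is copied from this slide for N5k-2 and will differ on other switches.

```python
# Front-panel to internal (ASIC, port) mapping as shown on the slide.
FRONT_PANEL_TO_INTERNAL = {
    "e1/1": (7, 2),
    "e1/5": (7, 1),
    "e1/3": (0, 2),
}


def interrupt_counter_command(front_panel_port: str) -> str:
    """Build the interrupt-counter command for the ASIC behind a port."""
    asic, _internal_port = FRONT_PANEL_TO_INTERNAL[front_panel_port]
    return f"show hardware internal gatos asic {asic} counters interrupt"


print(interrupt_counter_command("e1/3"))
print(interrupt_counter_command("e1/1"))
```

Ports that share an ASIC, such as e1/1 and e1/5 here, produce the same command, so one capture covers both.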
Finding the source of CRC errors
Observations, scenario #2
[Figure: topology with N7k-1 (e1/11, e1/12), N5k-1 and N5k-2 (e1/7, e1/1, e1/3, e1/4, e1/5), VLAN 7 and VLAN 8]
N7k-1# show interface e1/11
RX
4 unicast packets 0 multicast packets 0 broadcast packets
4 input packets 5672 bytes
0 jumbo packets 0 storm suppression packets
0 runts 0 giants 1 CRC 0 no buffer
1 input error 0 short frame 0 overrun 0 underrun 0 ignored
Finding the source of CRC errors
Scenario #2: MTU Exceeded
[Figure: topology with N7k-1 (e1/11, e1/12), N5k-1 and N5k-2 (e1/7, e1/1, e1/3, e1/4, e1/5), VLAN 7 and VLAN 8; 4000B frame transmitted]
N5k-1# show hardware internal gatos port e1/1 counters rx
RX_PKT_SIZE_IS_1519_TO_2047 | 0
RX_PKT_SIZE_IS_2048_TO_4095 | 1
RX_PKT_SIZE_IS_4095_TO_8191 | 0
RX_PKT_SIZE_IS_8192_TO_9216 | 0
RX_PKT_SIZE_GT_9216 | 0
Front Panel Internal
e1/1 7:2
Hardware counters keep track of size ranges.
Finding the source of CRC errors
Scenario #2: MTU Exceeded
[Figure: topology with N7k-1 (e1/11, e1/12), N5k-1 and N5k-2 (e1/7, e1/1, e1/3, e1/4, e1/5), VLAN 7 and VLAN 8]
N5k-1# show hardware internal gatos asic 0 counters interrupt
Gatos 0 interrupt statistics:
Interrupt name |Count |ThresRch|ThresCnt|Ivls
-----------------------------------------------+--------+--------+--------+----
gat_fw1_INT_eg_pkt_err_cb_bm_eof_err |1 |0 |1 |0
gat_fw1_INT_eg_pkt_err_eth_crc_stomp |1 |0 |1 |0
gat_fw1_INT_eg_pkt_err_ip_pyld_len_err |1 |0 |1 |0
gat_mm1_INT_rlp_tx_pkt_crc_err |1 |0 |1 |0
Front Panel Internal
e1/1 7:2
e1/7 0:1
Leaving the egress interface, the CRC has been stomped and other interrupts have fired.
Note the egress interface will aggregate all frames from various source interfaces. Adding up counters can be tricky.
Ethanalyzer and CPU
Use to aid in identifying external causes of high CPU utilization
N5k-1# show system resources
Load average: 1 minute: 0.95 5 minutes: 1.54 15 minutes: 1.46
Processes : 281 total, 4 running
CPU states : 26.7% user, 26.7% kernel, 46.5% idle
Memory usage: 2073408K total, 1412172K used, 661236K free
N5k-1# show process cpu sort | exclude 0.0
PID Runtime(ms) Invoked uSecs 1Sec Process
----- ----------- -------- ----- ------ -----------
4230 398 5011881 0 22.0% snmpd
4204 1467 84869127 0 20.2% gatosusd
4226 433 5601856 0 5.5% statsclient
4264 1380 391510 3 3.7% ethpm
4302 254 103 2468 1.8% netstack
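When this output is logged to a file over time, the busiest processes can be pulled out with a few lines of script. A sketch in Python, assuming the six-column layout of `show process cpu sort` shown on this slide; the 10% cut-off is an arbitrary illustrative threshold.

```python
from typing import List, Tuple


def parse_cpu_sort(text: str) -> List[Tuple[str, float]]:
    """Return (process, one_second_cpu_percent) pairs from the output."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows: PID, Runtime(ms), Invoked, uSecs, 1Sec, Process.
        if len(fields) == 6 and fields[0].isdigit() and fields[4].endswith("%"):
            rows.append((fields[5], float(fields[4].rstrip("%"))))
    return rows


sample = """\
PID Runtime(ms) Invoked uSecs 1Sec Process
----- ----------- -------- ----- ------ -----------
4230 398 5011881 0 22.0% snmpd
4204 1467 84869127 0 20.2% gatosusd
4226 433 5601856 0 5.5% statsclient
4264 1380 391510 3 3.7% ethpm
4302 254 103 2468 1.8% netstack
"""

busy = [name for name, pct in parse_cpu_sort(sample) if pct >= 10.0]
print(busy)
```

Here snmpd and gatosusd stand out, which points toward externally driven load (for example SNMP polling) as something to confirm with ethanalyzer.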
Ethanalyzer and CPU
Observed spike in CPU (per second)
N5k-1# show process cpu history
1 1
754669098990899966777977656766876775178734455655456466545645
006186077990796258300801881187120477641015900150830621684070
100 ### ### ## #
90 ########### #
80 ########### # # # #
70 # ##################### ##### ## ###
60 # ################################# ### ## # ### #
50 #################################### ### ###################
40 #################################### ### ###################
30 #################################### #######################
20 ############################################################
10 ############################################################
0....5....1....1....2....2....3....3....4....4....5....5....
0 5 0 5 0 5 0 5 0 5
CPU% per second (last 60 seconds)
# = average CPU%
Nexus 5000/5500 Queuing
Nexus 5000/5500 utilize ingress queuing
Ingress queuing is helpful for data flows where many ports talk to few, the load is spread across the sources
Simple flowcontrol mechanism can be implemented
end-to-end flowcontrol is necessary for FCoE
Ingress queuing is implemented by Virtual Output Queuing (VOQ)
VOQ prevents head of line blocking
One egress interface can be congested, but ingress buffer still accepts frame into other queues
8 class-based unicast VOQ per egress interface on every ingress interface
8 class-based multicast VOQ per ingress interface
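The VOQ figures above imply a queue count that grows with the number of egress interfaces. A short arithmetic sketch in Python; the 56-interface value is an assumption for illustration (14 UPCs times four ports), not a statement about any particular configuration.

```python
CLASSES = 8  # class-based queues, per the slide


def voqs_per_ingress(egress_interfaces: int) -> tuple:
    """Return (unicast, multicast) VOQ counts on one ingress interface."""
    unicast = CLASSES * egress_interfaces  # 8 per egress interface
    multicast = CLASSES                    # 8 per ingress interface
    return unicast, multicast


unicast, multicast = voqs_per_ingress(56)  # 56 is an assumed port count
print(unicast, multicast)
```

With 56 egress interfaces, each ingress interface holds 448 unicast VOQs plus 8 multicast VOQs, which is why congestion toward one egress interface does not block frames queued for the others.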
Nexus 5000/5500 Queuing
Ingress queuing implication on troubleshooting:
Drops occur at INGRESS!
You must think about where the flow originates on the switch to determine where you would like to look for drops.
Nexus 5000/5500 Queuing
Scenario
[Figure: Server A, N5k-1 (e1/1, e1/5), trunk, N5k-2 (e1/5, e1/3), Server B]
N5k-1# show platform fwm info asic-errors 7
Printing non zero Gatos error registers:
N5k-1# show hardware internal gatos asic 7 counters interrupt
Gatos 7 interrupt statistics:
Interrupt name |Count |ThresRch|ThresCnt|Ivls
Front Panel Internal
e1/1 7:2
e1/5 7:1
These outputs are also clean
Move on to the egress interface e1/5
In this case, e1/5 is on the same ASIC, so we have already gathered the output needed
Nexus 5000/5500 Queuing
Scenario
[Figure: Server A, N5k-1 (e1/1, e1/5), trunk, N5k-2 (e1/5, e1/3), Server B]
N5k-1# show platform fwm info pif e1/5 | grep stats
Eth1/5 pd: tx stats: bytes 476497477 frames 0 discard 0 drop 0
Eth1/5 pd: rx stats: bytes 232322392 frames 0 discard 0 drop 0
Eth1/5 pd fcoe: tx stats: bytes 0 frames 0 discard 0 drop 0
Eth1/5 pd fcoe: rx stats: bytes 0 frames 0 discard 0 drop 0
Front Panel Internal
e1/1 7:2
e1/5 7:1
These outputs are clean
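Deciding whether the pif statistics are "clean" means checking the discard and drop fields on every line. A sketch in Python that does this for captured output, assuming the line format shown on this slide; the function names are illustrative.

```python
import re
from typing import Dict, List

PIF_STATS_RE = re.compile(
    r"^(\S+) pd( fcoe)?: (tx|rx) stats: "
    r"bytes (\d+) frames (\d+) discard (\d+) drop (\d+)$"
)


def parse_pif_stats(text: str) -> List[Dict[str, object]]:
    """Parse 'show platform fwm info pif <port> | grep stats' lines."""
    stats = []
    for line in text.splitlines():
        m = PIF_STATS_RE.match(line.strip())
        if m:
            stats.append({
                "port": m.group(1),
                "fcoe": m.group(2) is not None,
                "direction": m.group(3),
                "discard": int(m.group(6)),
                "drop": int(m.group(7)),
            })
    return stats


def lines_with_loss(stats: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Keep only entries where discard or drop is non-zero."""
    return [s for s in stats if s["discard"] or s["drop"]]


sample = """\
Eth1/5 pd: tx stats: bytes 476497477 frames 0 discard 0 drop 0
Eth1/5 pd: rx stats: bytes 232322392 frames 0 discard 0 drop 0
Eth1/5 pd fcoe: tx stats: bytes 0 frames 0 discard 0 drop 0
Eth1/5 pd fcoe: rx stats: bytes 0 frames 0 discard 0 drop 0
"""

print(lines_with_loss(parse_pif_stats(sample)))
```

For the output above, the result is an empty list, matching the slide's conclusion that e1/5 is clean.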
Nexus 5000/5500 Queuing
Scenario
[Figure: Server A, N5k-1 (e1/1, e1/5), trunk, N5k-2 (e1/5, e1/3), Server B]
N5k-1# show hardware internal gatos asic 7 counters interrupt
...
gat_lu_lkup1_INT_func_lo_drop_src_vlan_mbr|74 |...
Interrupt counters will agree that a given error has fired from the hardware
The number is in hex, and we do not record every interrupt due to the rate at which interrupts can hit the CPU. Generally this number will be somewhat less than the show platform fwm info pif number
Front Panel Internal
e1/1 7:2
e1/5 7:1
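Since the interrupt count is displayed in hex, it needs converting before it is compared with the decimal pif counters. A one-line check in Python using the value from the output above:

```python
# The Count column of the interrupt output is hexadecimal.
interrupt_count_hex = "74"
interrupt_count = int(interrupt_count_hex, 16)
print(interrupt_count)  # 116
```

So the src_vlan_mbr drop interrupt was recorded 116 times, and the matching pif drop counter would be expected to be at least that large.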
Spanning-tree
Checking all trees
N5k-1# show spanning-tree internal event-history all
-------------------- All the active STPs -----------
VDC01 VLAN0001
0) Transition at 848207 usecs after Thu Jan 13 05:05:54 2005
Root: 0000.0000.0000.0000 Cost: 0 Age: 0 Root Port: none Port: none
[STP_TREE_EV_UP]
1) Transition at 367168 usecs after Thu Jan 13 05:05:57 2005
Root: 8001.000d.ecd6.02fc Cost: 0 Age: 0 Root Port: none Port: Ethernet1/15
[STP_TREE_EV_UPDATE_TOPO_RCVD_SUP_BPDU]
2) Transition at 373395 usecs after Thu Jan 13 05:05:57 2005
Root: 2063.00d0.0362.4c00 Cost: 2 Age: 1 Root Port: Ethernet1/15 Port: none
[STP_TREE_EV_MULTI_FLUSH_LOCAL]
3) Transition at 434563 usecs after Thu Jan 13 05:06:00 2005
Root: 2063.00d0.0362.4c00 Cost: 2 Age: 1 Root Port: Ethernet1/15 Port: Ethernet1/15
[STP_TREE_EV_MULTI_FLUSH_RCVD]
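On a busy switch the event history is long, and the interesting entries are the ones where the root bridge ID changes. A sketch in Python that extracts those transitions from captured text, assuming each event carries a "Root: <bridge-id>" field as in the output above; the function name is illustrative.

```python
import re
from typing import List, Tuple

# Matches "Root: <id>" but not "Root Port: ...", which has no colon
# directly after "Root".
ROOT_RE = re.compile(r"Root:\s*(\S+)")


def root_changes(text: str) -> List[Tuple[str, str]]:
    """Return (old_root, new_root) pairs for each change of root ID."""
    roots = ROOT_RE.findall(text)
    return [(old, new) for old, new in zip(roots, roots[1:]) if old != new]


sample = """\
Root: 0000.0000.0000.0000 Cost: 0 Age: 0 Root Port: none Port: none
Root: 8001.000d.ecd6.02fc Cost: 0 Age: 0 Root Port: none Port: Ethernet1/15
Root: 2063.00d0.0362.4c00 Cost: 2 Age: 1 Root Port: Ethernet1/15 Port: none
Root: 2063.00d0.0362.4c00 Cost: 2 Age: 1 Root Port: Ethernet1/15 Port: Ethernet1/15
"""

for old, new in root_changes(sample):
    print(old, "->", new)
```

For VLAN0001 above this reports two changes: from the initial all-zero ID to 8001.000d.ecd6.02fc, then to 2063.00d0.0362.4c00 learned via Ethernet1/15.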
Troubleshooting Nexus 5000 / 2000
Problem Isolation
Platform Overview and troubleshooting
NX-OS Operation
Crashes
Nexus 5000
Nexus 2000
Management
Queuing and forwarding
Logs
FEX Drops
Network interface drops can be seen from N5k show queuing interface as of 5.0(3)N1(1)
Best to attach to FEX to get detailed logs
Similar to Cat 6k or Nexus 7k linecard commands
Important to check here as FEX also have crash logs, have their own CPU, and are responsible for communicating link state and offloading some protocols like CDP.
N5k-1# attach fex 100
Attaching to FEX 100 ...
To exit type 'exit', to abort type '$.'
fex-100#
FEX Drops
2148
fex-100# dbgexec rw
rw> show ints
ASIC: 0:
+-------+--------------------------+--------------+-----------+-----------+-----------+
| ASIC | Interrupt Bit Field | Count1 | Thresh1 | Count2 | Thresh2 |
| Port | | | | | |
+-------+--------------------------+--------------+-----------+-----------+-----------+
| 0-NI1 | not_synced_lane_3 | 1 | 0 | 0 | 1 |
| 0-NI1 | not_synced_lane_2 | 1 | 0 | 0 | 1 |
| 0-NI1 | not_synced_lane_0 | 1 | 0 | 0 | 1 |
| 0-NI1 | synced_lane_3 | 1 | 0 | 0 | 1 |
| 0-NI1 | synced_lane_2 | 1 | 0 | 0 | 1 |
| 0-NI1 | synced_lane_1 | 1 | 0 | 0 | 1 |
| 0-NI1 | synced_lane_0 | 1 | 0 | 0 | 1 |
| 0-NI1 | loc_fault | 1 | 0 | 0 | 1 |
| 0-NI1 | not_aligned | 1 | 0 | 0 | 1 |
| 0-NI1 | aligned | 1 | 0 | 0 | 1 |
+-------+--------------------------+--------------+-----------+-----------+-----------+
This output is clean: there are no wo_cr counters.
* shows non-zero counters.
wo_cr indicates the buffer is without credit
FEX Drops
2148
rw> drops hi
Dropped packet counters for 0-HI0:
red_hix_cnt_rx_allow_vntag_drop : 0
red_hix_cnt_rx_echannel_drop : 0
red_hix_cnt_rx_fwd_drop : 0
red_hix_cnt_rx_mc_drop : 0
red_hix_cnt_rx_runt_pkt_drop : 0
red_hix_cnt_rx_src_vif_out_of_range_drop: 0
red_hix_cnt_tx_lb_drop : 11892
0-SS0 DDROP counters:
OQ0: Class0: 0 Class1: 0 Class2: 0 Class3: 0
OQ1: Class0: 0 Class1: 0 Class2: 0 Class3: 0
OQ2: Class0: 0 Class1: 0 Class2: 0 Class3: 0
OQ3: Class0: 0 Class1: 0 Class2: 0 Class3: 0
OQ4: Class0: 0 Class1: 0 Class2: 0 Class3: 0
0-SS0 ECC1: 0 ECC2: 0
0-SS0 wo_cr: 0 no cells: 0 mtu_vio: 0
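Most of these counters are zero, so it helps to filter a capture down to the ones that are not. A sketch in Python, assuming the simple "name : value" layout of the per-counter lines above; multi-value lines such as the OQ and SS0 rows are deliberately skipped.

```python
import re
from typing import Dict

# A single counter name, a colon, and one integer on the whole line.
COUNTER_RE = re.compile(r"^(\w+)\s*:\s*(\d+)$")


def nonzero_drop_counters(text: str) -> Dict[str, int]:
    """Return the single-value counters from 'drops hi' that are non-zero."""
    counters = {}
    for line in text.splitlines():
        m = COUNTER_RE.match(line.strip())
        if m and int(m.group(2)) != 0:
            counters[m.group(1)] = int(m.group(2))
    return counters


sample = """\
red_hix_cnt_rx_fwd_drop : 0
red_hix_cnt_rx_src_vif_out_of_range_drop: 0
red_hix_cnt_tx_lb_drop : 11892
OQ0: Class0: 0 Class1: 0 Class2: 0 Class3: 0
0-SS0 wo_cr: 0 no cells: 0 mtu_vio: 0
"""

print(nonzero_drop_counters(sample))
```

For this capture the only counter left is red_hix_cnt_tx_lb_drop at 11892, which is the one to investigate.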
FEX Drops
2248
satctrl/qosctrl> show asic 0 0
SS Statistics:
SS No Credit* No Cells MTU Error OQ Discard Free Cells
---+-----------+-----------+-----------+-----------+----------
0 0 0 0 0 10213
1 0 0 0 0 10213
...
Dropped packets per CoS due to OQ head-drop, OQ is per 8 port group:
OQ CoS 0 CoS 1 CoS 2 CoS 3 CoS 4 CoS 5 CoS 6 CoS 7
----+----------+----------+----------+----------+----------+----------+----------+-----------
NR0 0 0 0 0 0 0 0 0
NR1 0 0 0 0 0 0 0 0
NR2 0 0 0 0 0 0 0 0
NR3 0 0 0 0 0 0 0 0
NR4 0 0 0 0 0 0 0 0
NR5 0 0 0 0 0 0 0 0
----+----------+----------+----------+----------+----------+----------+----------+-----------
HR0 0 0 0 0 0 0 0 0
HR1 0 0 0 0 0 0 0 0
HR2 0 0 0 0 0 0 0 0
HR3 0 0 0 0 0 0 0 0
HR4 0 0 0 0 0 0 0 0
HR5 0 0 0 0 0 0 0 0
2248
fex130# dbgexec prt
prt> drops
PRT_SS_CNT_TAIL_DROP8 : 2 SS0
prt> show rmon 0 ni
+----------------------+----------------------+-----------------+----------------------+----------------------+-----------------+
| TX | Current | Diff | RX | Current| Diff |
+----------------------+----------------------+-----------------+----------------------+----------------------+-----------------+
| TX_PKT_LT64 | 0| 0| RX_PKT_LT64 |0| 0|
| TX_PKT_64 | 5| 1| RX_PKT_64 |8| 0|
| TX_PKT_65 | 2062219| 264039| RX_PKT_65 |4073560| 521532|
| TX_PKT_128 | 2149866| 274780| RX_PKT_128 |2060397| 263419|
| TX_PKT_256 | 1920669| 245601| RX_PKT_256
...
rmon counters are similar to the counters detailed on the N5k ports, helpful for error tracking and finding packets of a certain size
These update immediately; show counters on the N5k waits for the statsclient
FEX Logs
attach fex
dbgexec rw/prt (rw=2148, prt=2248)
show ctx: driver information
show oper: link states for L1 status
show elog: event log chronicling hardware and software interaction, helpful for L1 issues
show ints: interrupt counters
show bootlog: bootup messages
show log: any other logs
Thank you.