Nexus Troubleshooting

  • 7/26/2019 Nexus Troubleshooting

    1/127

    BRKCRS-3145

    Troubleshooting the Cisco Nexus 5000 / 2000 Series Switches

    2011 Cisco and/or its affiliates. All rights reserved. Cisco Public BRKCRS-3145

    Objectives

    Be able to quickly isolate problematic nodes in the datacenter

    Become familiar with troubleshooting in NX-OS

    Understand Nexus 5000 and Nexus 2000 platform details

    Gain comfort using Nexus 5000 and Nexus 2000 day to day

    Problem Isolation

    A problem well stated is a problem half solved

    Source: Charles F. Kettering, Engineer and Inventor

    Troubleshooting Tool #1

    A current, accurate diagram

    Physical ports

    Logical ports

    Spanning-tree root and blocked ports

    Helpful to use standard formats

    .jpg, .bmp, .pdf

    If you cannot describe how your network should be operating, time may be wasted

    [Figure: example network diagram showing N7k-1, N7k-2, and N5k-1 through N5k-5, with labels for vPC po1 and Po2, vPC peer-links Po100, Po101, and Po102, vPC domains 100, 101, and 102, the vPC peer-keepalive link (e1/1 - e1/1), the RSTP root, and an STP-blocked port]

    Which show tech?

    As of 5.0(3), there are 68

    N5k-1# show tech-support ?

    aaa Display aaa information

    aclmgr ACL commands

    adjmgr Display Adjmgr information

    arp Display ARP information

    ascii-cfg Show ascii-cfg information for technical support personnel

    assoc_mgr Gather detailed information for assoc_mgr troubleshooting

    bcm-usd Gather detailed information for BCM USD troubleshooting

    bootvar Gather detailed information for bootvar troubleshooting

    brief Display the switch summary

    btcm Gather detailed information for BTCM component

    callhome Callhome troubleshooting information

    cdp Gather information for CDP trouble shooting

    ...

    session-mgr Gather information for troubleshooting session manager

    snmp Gather info related to snmp

    sockets Display sockets status and configuration

    spm Service Policy Manager

    stp Gather detailed information for STP troubleshooting

    sysmgr Gather detailed information for sysmgr troubleshooting

    time-optimized Gather tech-support faster, requires more memory & disk space

    track Show track tech-support information

    vdc Gather detailed information for VDC troubleshooting

    vpc Gather detailed information for VPC troubleshooting

    vtp Gather detailed information for vtp troubleshooting

    xml Gather information for xml trouble shooting

    Log your output

    Redirect and Append

    N5k-1# show clock >bootflash:debug-file.txt

    N5k-1# show mac address-table >>bootflash:debug-file.txt

    N5k-1# show running-config | count >>bootflash:debug-file.txt

    N5k-1# show file bootflash:debug-file.txt

    Mon Apr 4 02:39:41 UTC 2011

    When to call TAC

    You think you have found a bug, but a quick search of defects or release notes on cisco.com may be faster

    Most efficient if you have the following:

    A description of the problem observed, with evidence / clues, along with time and scope

    A current network diagram

    All parties involved in the problem

    show tech is not necessary, but if you must make drastic changes such as reloading or replacing hardware, grab this first

    Any targeted outputs, especially around the time of the event in question

    NX-OS Operation Tips

    CLI list and grep

    N5k-3# show cli list | grep switchport

    show system default switchport san

    show interface switchport

    show interface switchport

    ctrl-c terminates output

    N5k-3# show tech-support

    ---- show tech-support ----

    ctrl-c

    N5k-3#

    NX-OS File Structure

    volatile: filesystem is virtual, use as scratch if needed

    Obviously volatile, will not survive a reload

    log: filesystem is in root /

    N5k-1# debug logfile CiscoLive_debugs

    N5k-1# show debug

    Output forwarded to file CiscoLive_debugs (size: 4194304 bytes)

    Debug level is set to Minor(1)

    N5k-1# dir log:

    0 Apr 04 01:14:01 2011 CiscoLive_debugs

    31 Mar 11 11:38:35 2011 dmesg

    0 Mar 11 11:38:57 2011 libfipf.4365

    79101 Apr 04 00:34:02 2011 messages

    6670 Apr 04 00:06:01 2011 startupdebug

    N5k-1# copy log:CiscoLive_debugs tftp:

    Enter vrf: management

    Enter hostname for the tftp server: 10.91.42.134

    Trying to connect to tftp server......

    Connection to Server Established.

    |

    TFTP put operation was successful

    N5k-1# clear debug-logfile CiscoLive_debugs

    -OR-

    N5k-1# undebug all

    Troubleshooting Nexus 5000 / 2000

    Problem Isolation

    Platform Overview

    NX-OS Operation

    FSM

    MTS

    Crashes

    Nexus 5000

    Nexus 2000

    Platform Overview and troubleshooting

    Redundancy operation and troubleshooting

    NX-OS MTS

    recv queue should not grow old

    SAP 0 is an invalid identifier; messages addressed to it have queued up (300 and growing).

    Observed impact is various show commands timing out, such as show log and show run

    N5k-1# show system internal mts buffers details

    Node/Sap/queue Age(ms) SrcNode SrcSAP DstNode DstSAP OPC MsgId MsgSize

    sup/32/recv 319672424 0x101 25330 0x101 0 7662 1221952768 192

    sup/32/recv 319669986 0x101 25336 0x101 32 188 1221953842 328

    sup/32/recv 319609082 0x101 25344 0x101 0 7663 1221971222 2452

    ...

    sup/32/recv 227324 0x101 32550 0x101 32 188 1301415915 328

    sup/32/recv 165509 0x101 32560 0x101 0 7663 1301432732 2452

    sup/32/recv 101893 0x101 32565 0x101 0 7662 1301448663 192
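
To make the "recv queue should not grow old" check repeatable, the output above can be summarized offline. The following is a minimal Python sketch, not a Cisco tool: the function names, the SAMPLE text, and the 60-second age threshold are assumptions chosen for illustration, and the column positions are taken from the header row shown on the slide.

```python
from collections import Counter

# Rows copied from the slide output; the "..." row is left out.
SAMPLE = """\
Node/Sap/queue Age(ms) SrcNode SrcSAP DstNode DstSAP OPC MsgId MsgSize
sup/32/recv 319672424 0x101 25330 0x101 0 7662 1221952768 192
sup/32/recv 319669986 0x101 25336 0x101 32 188 1221953842 328
sup/32/recv 319609082 0x101 25344 0x101 0 7663 1221971222 2452
sup/32/recv 227324 0x101 32550 0x101 32 188 1301415915 328
sup/32/recv 165509 0x101 32560 0x101 0 7663 1301432732 2452
sup/32/recv 101893 0x101 32565 0x101 0 7662 1301448663 192
"""


def parse_mts_buffers(text):
    """Parse the data rows of 'show system internal mts buffers details'."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 9 columns and a numeric Age(ms) in the second one;
        # this skips the header row and any "..." lines.
        if len(fields) != 9 or not fields[1].isdigit():
            continue
        rows.append({
            "queue": fields[0],
            "age_ms": int(fields[1]),
            "dst_sap": int(fields[5]),
            "opcode": int(fields[6]),
        })
    return rows


def stale_by_dst_sap(rows, max_age_ms=60_000):
    """Count messages older than max_age_ms (assumed threshold) per DstSAP."""
    return Counter(r["dst_sap"] for r in rows if r["age_ms"] > max_age_ms)


if __name__ == "__main__":
    for sap, count in sorted(stale_by_dst_sap(parse_mts_buffers(SAMPLE)).items()):
        print(f"DstSAP {sap}: {count} stale message(s)")
```

Grouping by DstSAP is what makes messages addressed to SAP 0 stand out from those addressed to a valid SAP such as 32.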

    NX-OS MTS

    MTS messages have been addressed to SAP 0 due to a bug.

    Reload was needed to clear this scenario

    N5k-1# sh system internal mts sup sap 0 description

    Not implemented

    N5k-1# sh system internal mts sup sap 32 description

    Syslog Sup Node Cfg

    N5k-1# show system internal sysmgr service name syslogd

    Service "syslogd" ("syslogd", 75):

    UUID = 0x21, PID = 3924, SAP = 32

    State: SRV_STATE_HANDSHAKED (entered at time Sat May 15 05:01:20 2010).

    Restart count: 1

    Time of last restart: Sat May 15 05:01:20 2010.

    The service never crashed since the last reboot.

    Tag = N/A

    Plugin ID: 0

    NX-OS Crashes

    NX-OS attempts to create a core file with information helpful to aid in finding and fixing the problem

    stack trace

    memory contents

    Some processes in NX-OS are able to be restarted in a stateful manner.

    Nexus 5000 is a single-supervisor platform; critical processes require a system restart upon a crash.

    A syslog message is sent just before crash and system restart

    2010 Sep 10 16:19:27.411 N5k-1 %$ VDC-1 %$ %SYSMGR-2-SERVICE_CRASHED: Service "fwm" (PID 2723) hasn't caught signal 6 (core will be saved).

    NX-OS Crashes

    show process log

    View status of all processes, including if a core was created

    N5k-1# show process log

    Process PID Normal-exit Stack Core Log-create-time

    --------------- ------ ----------- ----- ----- ---------------

    eth_port_channel 2743 N Y N Wed Mar 17 17:20:57 2010

    eth_port_channel 2761 N Y N Tue Aug 3 19:14:58 2010

    fwm 2703 N Y N Fri Oct 8 19:24:12 2010

    ...

    N5k-1# show process log pid 2703

    ======================================================

    Service: fwm

    Description: Forwarding manager Daemon

    Started at Thu Oct 7 14:51:51 2010 (151707 us)

    Stopped at Fri Oct 8 19:24:12 2010 (203577 us)

    Uptime: 1 days 4 hours 32 minutes 21 seconds

    Start type: SRV_OPTION_RESTART_STATELESS (23)

    Death reason: SYSMGR_DEATH_REASON_FAILURE_SIGNAL (2)

    ...
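
When the show process log table is long, it can help to filter it for abnormal exits before drilling into a PID. The sketch below is an illustration in Python under stated assumptions: the function name and the SAMPLE text are hypothetical, and the column order (Process, PID, Normal-exit, Stack, Core, then the timestamp) is taken from the header row on the slide.

```python
# Rows copied from the slide output above.
SAMPLE = """\
Process          PID    Normal-exit Stack Core  Log-create-time
---------------  ------ ----------- ----- ----- ---------------
eth_port_channel 2743   N           Y     N     Wed Mar 17 17:20:57 2010
eth_port_channel 2761   N           Y     N     Tue Aug 3 19:14:58 2010
fwm              2703   N           Y     N     Fri Oct 8 19:24:12 2010
"""


def abnormal_exits(text):
    """Return one dict per process-log row whose Normal-exit column is N."""
    results = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header, the dashed separator, and "..." lines:
        # only data rows have a numeric PID in the second column.
        if len(fields) < 5 or not fields[1].isdigit():
            continue
        name, pid, normal_exit, stack, core = fields[:5]
        if normal_exit == "N":
            results.append({
                "process": name,
                "pid": int(pid),
                "stack": stack == "Y",
                "core": core == "Y",
                "time": " ".join(fields[5:]),
            })
    return results


if __name__ == "__main__":
    for entry in abnormal_exits(SAMPLE):
        print(entry["process"], entry["pid"], "core saved:", entry["core"])
```

Each PID it reports is a candidate for show process log pid, as done for fwm (PID 2703) on the slide.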

    Hardware overview

    To talk about forwarding errors and troubleshooting, drops are usually part of this discussion

    We have to know a basic hardware layout in order to know where to look for problems

    The following hardware overview is a preview of BRKARC-3452 Cisco Nexus 5000/5500 and 2000 Switch Architecture

    Nexus 5000 Hardware Overview

    Data Plane Elements

    Nexus 5000 is a distributed forwarding architecture

    Unified Port Controller (UPC) ASIC interconnected by a single stage Unified Crossbar Fabric (UCF)

    Unified Port Controllers provide distributed packet forwarding capabilities

    All port to port traffic passes through the UCF (Fabric)

    Four switch ports managed by each UPC

    14 UPC in Nexus 5020

    7 UPC in Nexus 5010

    [Figure: block diagram of Unified Port Controllers, each serving SFP ports, connected to the Unified Crossbar Fabric]

    Nexus 5000/5500 Hardware Overview

    Control Plane Elements

    In-band traffic is identified by the UPC and punted to the CPU via two dedicated UPC interfaces, 5/0 and 5/1, which are in turn connected to eth3 and eth4 interfaces in the CPU complex

    Eth3 handles Rx and Tx of low priority control pkts

    IGMP, CDP, TCP/UDP/IP/ARP (for management purpose only)

    Eth4 handles Rx and Tx of high priority control pkts

    STP, LACP, DCBX, FC and FCoE control frames (FC packets come to Switch CPU as FCoE packets)

    There is a built-in control-plane policer to limit the amount of traffic punted to CPU

    [Figure: control plane block diagram: CPU, South Bridge, NICs (eth3, eth4, mgmt0), Unified Port Controller]

    Nexus 5000 Hardware Overview

    Control Plane Elements

    CLI view of in-band control plane data

    Monitoring of in-band traffic via NX-OS built-in ethanalyzer (sniffer)

    Eth3 is equivalent to inbound-low

    Eth4 is equivalent to inbound-hi

    [Figure: control plane block diagram: CPU (Intel LV Xeon, 1.66 GHz), South Bridge, NICs (eth3, eth4), Unified Port Controller]

    N5k-2# ethanalyzer local sniff-interface ?

    inbound-hi Inbound(high priority) interface

    inbound-low Inbound(low priority) interface

    mgmt Management interface

    N5k-2# sh hardware internal cpu-mac inband counters

    eth3 Link encap:Ethernet HWaddr 00:0D:EC:B2:0C:83

    UP BROADCAST RUNNING PROMISC ALLMULTI MULTICAST MTU:2200 Metric:1

    RX packets:3 errors:0 dropped:0 overruns:0 frame:0

    TX packets:630 errors:0 dropped:0 overruns:0 carrier:0

    collisions:0 txqueuelen:1000

    RX bytes:252 (252.0 b) TX bytes:213773 (208.7 KiB)

    Base address:0x6020 Memory:fa4a0000-fa4c0000

    eth4 Link encap:Ethernet HWaddr 00:0D:EC:B2:0C:84

    UP BROADCAST RUNNING PROMISC ALLMULTI MULTICAST MTU:2200 Metric:1

    RX packets:85379 errors:0 dropped:0 overruns:0 frame:0

    TX packets:92039 errors:0 dropped:0 overruns:0 carrier:0

    collisions:0 txqueuelen:1000

    RX bytes:33960760 (32.3 MiB) TX bytes:25825826 (24.6 MiB)

    Base address:0x6000 Memory:fa440000-fa460000

    Nexus 5000 Forwarding

    cut-through vs. store and forward

    Store and forward switching is still utilized when the ingress data rate is slower than the egress data rate.

    Cut-through switching is utilized to achieve low latency through the switch fabric.

    Bits are serialized in from the ingress port until enough of the packet header has been received to perform a forwarding and policy lookup

    Once a lookup decision has been made and the fabric has granted access to the egress port, bits are forwarded through the fabric

    Egress port performs any header rewrite (e.g. CoS marking) and MAC begins serialization of bits out the egress port

    A drop cannot happen on ingress due to any switching logic or even a CRC error. Only faulty hardware or connections can cause a drop on ingress.

    Discards can occur on ingress due to queuing configuration and traffic patterns.

    Finding the source of CRC errors

    CRC errors are introduced in 3 ways:

    Bad physical connection

    copper, fiber, transceiver, phy

    Stomping due to intentionally originated errors

    Received bad CRC stomped from neighboring cut-through switch.

    Start by finding any RX CRC counters.

    If none, then this switch is responsible for originating

    Use interrupt counters to find the reason and port, if intentional

    Log in to next switch upstream of CRC counters, check for RX CRC there.

    Use the above logic to determine if this switch is originating any errors.

    Finally, inspect optics/pluggables, fiber/cables and troubleshoot as a Layer 1 issue. Change cable and port to find where the problem follows.
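
The hop-by-hop procedure above can be written down as a small decision routine. This is a minimal Python sketch of one reading of the slide, not a Cisco algorithm: the Hop model, its field names, and the verdict strings are hypothetical, and the counters are assumed to have been collected by hand from each switch (RX CRC from show interface, stomp counts from the interrupt counters).

```python
from dataclasses import dataclass


@dataclass
class Hop:
    """One switch on the path, as recorded by the troubleshooter (hypothetical)."""
    name: str
    rx_crc: int            # RX CRC errors on the port facing the next upstream hop
    stomp_interrupts: int  # CRC-stomp interrupts counted on this switch


def locate_crc_origin(path):
    """Walk from the switch where errors were noticed toward the source.

    `path` is ordered from the reporting switch to the furthest upstream one.
    Returns a (location, verdict) tuple.
    """
    if not path:
        raise ValueError("path must contain at least one hop")
    downstream = None
    for hop in path:
        if hop.rx_crc > 0:
            # Bad frames arrived from upstream: keep following the counters.
            downstream = hop
            continue
        if hop.stomp_interrupts > 0:
            # No RX CRC here, but this switch stomped frames on purpose.
            return hop.name, "originating: intentional stomp, check interrupt counters"
        if downstream is not None:
            # Sender looks clean, receiver saw CRC errors: suspect the link.
            return f"{hop.name} -> {downstream.name}", "Layer 1: check optics, cable, port"
        return hop.name, "no RX CRC and no stomp interrupts on this switch"
    return path[-1].name, "RX CRC at every hop: continue further upstream"


if __name__ == "__main__":
    print(locate_crc_origin([Hop("N7k-1", 1, 0), Hop("N5k-1", 0, 1)]))
```

The routine only encodes the order of checks; swapping cables and ports to see where the problem follows remains a manual Layer 1 step.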

    Finding the source of CRC errors

    Scenario #1: Physical Issue

    [Figure: topology with N7k-1, N5k-1, and N5k-2; labeled ports e1/11, e1/12, e1/7, e1/1, e1/3, e1/4, e1/5; VLAN 7 and VLAN 8]

    Front Panel Internal

    e1/1 7:2

    e1/5 7:1

    e1/3 0:2

    N5k-2# show hardware internal gatos asic 0 counters interrupt

    Gatos 0 interrupt statistics:

    Interrupt name |Count |ThresRch|ThresCnt|Ivls

    -----------------------------------------------+--------+--------+--------+----

    gat_fw2_INT_eg_pkt_err_cb_bm_eof_err |1 |0 |0 |0

    gat_fw2_INT_eg_pkt_err_eth_crc_stomp |1 |0 |0 |0

    gat_fw2_INT_eg_pkt_err_e802_3_len_err |1 |0 |0 |0

    Interrupt counters increment upon transmit of errored frame

    Finding the source of CRC errors

    Observations, scenario #2

    [Figure: topology with N7k-1, N5k-1, and N5k-2; labeled ports e1/11, e1/12, e1/7, e1/1, e1/3, e1/4, e1/5; VLAN 7 and VLAN 8]

    N7k-1# show interface e1/11

    RX

    4 unicast packets 0 multicast packets 0 broadcast packets

    4 input packets 5672 bytes

    0 jumbo packets 0 storm suppression packets

    0 runts 0 giants 1 CRC 0 no buffer

    1 input error 0 short frame 0 overrun 0 underrun 0 ignored

    Finding the source of CRC errors

    Scenario #2: MTU Exceeded

    [Figure: topology with N7k-1, N5k-1, and N5k-2; labeled ports e1/11, e1/12, e1/7, e1/1, e1/3, e1/4, e1/5; VLAN 7 and VLAN 8; a 4000B frame is transmitted]

    Front Panel Internal

    e1/1 7:2

    N5k-1# show hardware internal gatos port e1/1 counters rx

    RX_PKT_SIZE_IS_1519_TO_2047 | 0

    RX_PKT_SIZE_IS_2048_TO_4095 | 1

    RX_PKT_SIZE_IS_4095_TO_8191 | 0

    RX_PKT_SIZE_IS_8192_TO_9216 | 0

    RX_PKT_SIZE_GT_9216 | 0

    Hardware counters keep track of size ranges.
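
Because the hardware keeps one counter per size range, a quick way to spot frames larger than the configured maximum is to look for non-zero buckets that start above it. The Python sketch below illustrates that check; the function name, the SAMPLE text, and the 1518-byte limit used in the example are assumptions, since the slide does not state the configured MTU.

```python
import re

# Counter rows copied from the slide output above.
SAMPLE = """\
RX_PKT_SIZE_IS_1519_TO_2047 | 0
RX_PKT_SIZE_IS_2048_TO_4095 | 1
RX_PKT_SIZE_IS_4095_TO_8191 | 0
RX_PKT_SIZE_IS_8192_TO_9216 | 0
RX_PKT_SIZE_GT_9216 | 0
"""

# Matches either "IS_<low>_TO_<high>" or "GT_<limit>", then "| <count>".
BUCKET = re.compile(r"RX_PKT_SIZE_(?:IS_(\d+)_TO_(\d+)|GT_(\d+))\s*\|\s*(\d+)")


def oversize_buckets(text, max_frame_bytes):
    """Return (range label, count) for non-zero buckets above max_frame_bytes."""
    hits = []
    for match in BUCKET.finditer(text):
        low, high, gt, count = match.groups()
        # For a "greater than N" bucket the smallest possible frame is N + 1.
        lower_bound = int(low) if low is not None else int(gt) + 1
        if int(count) > 0 and lower_bound > max_frame_bytes:
            label = f"{low}-{high}" if low is not None else f">{gt}"
            hits.append((label, int(count)))
    return hits


if __name__ == "__main__":
    # 1518 is an assumed maximum frame size, used only for this example.
    print(oversize_buckets(SAMPLE, max_frame_bytes=1518))
```

With the slide's counters, the single frame in the 2048 to 4095 range is consistent with the 4000B frame described in the scenario.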

    Finding the source of CRC errors

    Scenario #2: MTU Exceeded

    [Figure: topology with N7k-1, N5k-1, and N5k-2; labeled ports e1/11, e1/12, e1/7, e1/1, e1/3, e1/4, e1/5; VLAN 7 and VLAN 8]

    Front Panel Internal

    e1/1 7:2

    e1/7 0:1

    N5k-1# show hardware internal gatos asic 0 counters interrupt

    Gatos 0 interrupt statistics:

    Interrupt name |Count |ThresRch|ThresCnt|Ivls

    -----------------------------------------------+--------+--------+--------+----

    gat_fw1_INT_eg_pkt_err_cb_bm_eof_err |1 |0 |1 |0

    gat_fw1_INT_eg_pkt_err_eth_crc_stomp |1 |0 |1 |0

    gat_fw1_INT_eg_pkt_err_ip_pyld_len_err |1 |0 |1 |0

    gat_mm1_INT_rlp_tx_pkt_crc_err |1 |0 |1 |0

    Leaving the egress interface, the CRC has been stomped and other interrupts have fired.

    Note the egress interface will aggregate all frames from various source interfaces. Adding up counters can be tricky.

    N5k-1# show system resources

    Load average: 1 minute: 0.95 5 minutes: 1.54 15 minutes: 1.46

    Processes : 281 total, 4 running

    CPU states : 26.7% user, 26.7% kernel, 46.5% idle

    Memory usage: 2073408K total, 1412172K used, 661236K free

    N5k-1# show process cpu sort | exclude 0.0

    PID Runtime(ms) Invoked uSecs 1Sec Process

    ----- ----------- -------- ----- ------ -----------

    4230 398 5011881 0 22.0% snmpd

    4204 1467 84869127 0 20.2% gatosusd

    4226 433 5601856 0 5.5% statsclient

    4264 1380 391510 3 3.7% ethpm

    4302 254 103 2468 1.8% netstack

    Ethanalyzer and CPU

    Using Ethanalyzer to help identify external causes of high CPU utilization

    N5k-1# show process cpu history

    1 1

    754669098990899966777977656766876775178734455655456466545645

    006186077990796258300801881187120477641015900150830621684070

    100 ### ### ## #

    90 ########### #

    80 ########### # # # #

    70 # ##################### ##### ## ###

    60 # ################################# ### ## # ### #

    50 #################################### ### ###################

    40 #################################### ### ###################

    30 #################################### #######################

    20 ############################################################

    10 ############################################################

    0....5....1....1....2....2....3....3....4....4....5....5....

    0 5 0 5 0 5 0 5 0 5

    CPU% per second (last 60 seconds)

    # = average CPU%

    Ethanalyzer and CPU

    Observed spike in CPU (per second)

    Nexus 5000/5500 Queuing

    Nexus 5000/5500 utilize ingress queuing

    Ingress queuing is helpful for data flows where many ports talk to few; the load is spread across the sources

    Simple flowcontrol mechanism can be implemented

    end-to-end flowcontrol is necessary for FCoE

    Ingress queuing is implemented by Virtual Output Queuing (VOQ)

    VOQ prevents head of line blocking

    One egress interface can be congested, but the ingress buffer still accepts frames into other queues

    8 class-based unicast VOQ per egress interface on every ingress interface

    8 class-based multicast VOQ per ingress interface
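
The two VOQ rules above translate into simple arithmetic: each ingress interface holds 8 unicast queues for every egress interface, plus 8 multicast queues. The Python sketch below works that out; the function name is hypothetical and the port count in the example is an arbitrary assumption, not a platform specification.

```python
CLASSES = 8  # class-based queues, per the slide


def voqs_per_ingress_interface(egress_interfaces):
    """Total VOQs held on one ingress interface.

    Unicast: 8 queues for each egress interface.
    Multicast: 8 queues per ingress interface, independent of egress count.
    """
    if egress_interfaces < 0:
        raise ValueError("egress_interfaces must not be negative")
    unicast = CLASSES * egress_interfaces
    multicast = CLASSES
    return unicast + multicast


if __name__ == "__main__":
    # 56 egress interfaces is an assumed figure used only for illustration.
    print(voqs_per_ingress_interface(56))
```

Since the unicast queues are kept per egress interface, congestion toward one egress port fills only that port's queues on the ingress side, which is how head of line blocking is avoided.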

    Nexus 5000/5500 Queuing

    Ingress queuing implication on troubleshooting:

    Drops occur at INGRESS!

    You must think about where the flow originates on the switch to determine where you would like to look for drops.

    Nexus 5000/5500 Queuing

    Scenario

    [Figure: topology with N5k-1, N5k-2, Server A, Server B, and a trunk; labeled ports e1/1, e1/3, e1/5]

    Front Panel Internal

    e1/1 7:2

    e1/5 7:1

    N5k-1# show platform fwm info asic-errors 7

    Printing non zero Gatos error registers:

    N5k-1# show hardware internal gatos asic 7 counters interrupt

    Gatos 7 interrupt statistics:

    Interrupt name |Count |ThresRch|ThresCnt|Ivls

    These outputs are also clean

    Move on to the egress interface e1/5

    In this case, e1/5 is on the same ASIC, so we have already gathered the output needed

    Nexus 5000/5500 Queuing

    Scenario

    [Figure: topology with N5k-1, N5k-2, Server A, Server B, and a trunk; labeled ports e1/1, e1/3, e1/5]

    Front Panel Internal

    e1/1 7:2

    e1/5 7:1

    N5k-1# show platform fwm info pif e1/5 | grep stats

    Eth1/5 pd: tx stats: bytes 476497477 frames 0 discard 0 drop 0

    Eth1/5 pd: rx stats: bytes 232322392 frames 0 discard 0 drop 0

    Eth1/5 pd fcoe: tx stats: bytes 0 frames 0 discard 0 drop 0

    Eth1/5 pd fcoe: rx stats: bytes 0 frames 0 discard 0 drop 0

    These outputs are clean

    Nexus 5000/5500 Queuing

    Scenario

    [Figure: topology with N5k-1, N5k-2, Server A, Server B, and a trunk; labeled ports e1/1, e1/3, e1/5]

    N5k-1# show hardware internal gatos asic 7 counters interrupt

    ...

    gat_lu_lkup1_INT_func_lo_drop_src_vlan_mbr|74 |...

    Interrupt counters will agree that a given error has fired from the hardware

    The count is shown in hex, and not every interrupt is recorded because of the rate at which interrupts can hit the CPU. Generally this number will be somewhat less than the show platform fwm info pif number

    Front Panel Internal

    e1/1 7:2

    e1/5 7:1
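
Since the slide notes that the interrupt count is hexadecimal, it is worth converting it before comparing it with the decimal drop counters from show platform fwm info pif. A minimal Python sketch follows; the function names are hypothetical, and the row layout (name, then Count, separated by "|") is taken from the output shown above.

```python
def interrupt_count(field):
    """Convert a Count field from the interrupt table (hex text) to an int."""
    return int(field.strip(), 16)


def parse_interrupt_row(line):
    """Split 'name|count |...' into (name, decimal count)."""
    name, count = line.split("|")[:2]
    return name.strip(), interrupt_count(count)


if __name__ == "__main__":
    row = "gat_lu_lkup1_INT_func_lo_drop_src_vlan_mbr|74 |..."
    print(parse_interrupt_row(row))
```

For the row on the slide, a Count of 74 in hex corresponds to 116 in decimal.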

    Spanning-tree

    Checking all trees

    N5k-1# show spanning-tree internal event-history all

    -------------------- All the active STPs -----------

    VDC01 VLAN0001

    0) Transition at 848207 usecs after Thu Jan 13 05:05:54 2005

    Root: 0000.0000.0000.0000 Cost: 0 Age: 0 Root Port: none Port: none [STP_TREE_EV_UP]

    1) Transition at 367168 usecs after Thu Jan 13 05:05:57 2005

    Root: 8001.000d.ecd6.02fc Cost: 0 Age: 0 Root Port: none Port: Ethernet1/15 [STP_TREE_EV_UPDATE_TOPO_RCVD_SUP_BPDU]

    2) Transition at 373395 usecs after Thu Jan 13 05:05:57 2005

    Root: 2063.00d0.0362.4c00 Cost: 2 Age: 1 Root Port: Ethernet1/15 Port: none [STP_TREE_EV_MULTI_FLUSH_LOCAL]

    3) Transition at 434563 usecs after Thu Jan 13 05:06:00 2005

    Root: 2063.00d0.0362.4c00 Cost: 2 Age: 1 Root Port: Ethernet1/15 Port: Ethernet1/15 [STP_TREE_EV_MULTI_FLUSH_RCVD]

    Troubleshooting Nexus 5000 / 2000

    Problem Isolation

    Platform Overview and troubleshooting

    NX-OS Operation

    Crashes

    Nexus 5000

    Nexus 2000

    Management

    Queuing and forwarding

    Logs

    FEX Drops

    Network interface drops can be seen from the N5k with show queuing interface as of 5.0(3)N1(1)

    Best to attach to FEX to get detailed logs

    Similar to Cat 6k or Nexus 7k linecard commands

    Important to check here as FEX also have crash logs, have their own CPU, and are responsible for communicating link state and offloading some protocols like CDP.

    N5k-1# attach fex 100

    Attaching to FEX 100 ...

    To exit type 'exit', to abort type '$.'

    fex-100#

    FEX Drops

    2148

    fex-100# dbgexec rw

    rw> show ints

    ASIC: 0:

    +-------+--------------------------+--------------+-----------+-----------+-----------+

    | ASIC | Interrupt Bit Field | Count1 | Thresh1 | Count2 | Thresh2 |

    | Port | | | | | |

    +-------+--------------------------+--------------+-----------+-----------+-----------+

    | 0-NI1 | not_synced_lane_3 | 1 | 0 | 0 | 1 |

    | 0-NI1 | not_synced_lane_2 | 1 | 0 | 0 | 1 |

    | 0-NI1 | not_synced_lane_0 | 1 | 0 | 0 | 1 |

    | 0-NI1 | synced_lane_3 | 1 | 0 | 0 | 1 |

    | 0-NI1 | synced_lane_2 | 1 | 0 | 0 | 1 |

    | 0-NI1 | synced_lane_1 | 1 | 0 | 0 | 1 |

    | 0-NI1 | synced_lane_0 | 1 | 0 | 0 | 1 |

    | 0-NI1 | loc_fault | 1 | 0 | 0 | 1 |

    | 0-NI1 | not_aligned | 1 | 0 | 0 | 1 |

    | 0-NI1 | aligned | 1 | 0 | 0 | 1 |

    +-------+--------------------------+--------------+-----------+-----------+-----------+

    This output is clean: there are no wo_cr counters. Only non-zero counters are shown.

    wo_cr indicates the buffer is without credit

    FEX Drops

    2148

    rw> drops hi

    Dropped packet counters for 0-HI0:

    red_hix_cnt_rx_allow_vntag_drop : 0

    red_hix_cnt_rx_echannel_drop : 0

    red_hix_cnt_rx_fwd_drop : 0

    red_hix_cnt_rx_mc_drop : 0

    red_hix_cnt_rx_runt_pkt_drop : 0

    red_hix_cnt_rx_src_vif_out_of_range_drop: 0

    red_hix_cnt_tx_lb_drop : 11892

    0-SS0 DDROP counters:

    OQ0: Class0: 0 Class1: 0 Class2: 0 Class3: 0

    OQ1: Class0: 0 Class1: 0 Class2: 0 Class3: 0

    OQ2: Class0: 0 Class1: 0 Class2: 0 Class3: 0

    OQ3: Class0: 0 Class1: 0 Class2: 0 Class3: 0

    OQ4: Class0: 0 Class1: 0 Class2: 0 Class3: 0

    0-SS0 ECC1: 0 ECC2: 0

    0-SS0 wo_cr: 0 no cells: 0 mtu_vio: 0
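When this output is long, the non-zero counters are the ones worth reading first. The following is a minimal Python sketch, not a Cisco tool, that filters a captured "drops hi" listing down to its non-zero single-counter lines. The function name `nonzero_counters` and the `SAMPLE` text (abbreviated from the output above) are illustrative assumptions; lines that carry several counters, such as the wo_cr line, are deliberately skipped.

```python
import re

# Sample text abbreviated from the 2148 "drops hi" output above.
SAMPLE = """\
red_hix_cnt_rx_allow_vntag_drop : 0
red_hix_cnt_rx_fwd_drop : 0
red_hix_cnt_rx_src_vif_out_of_range_drop: 0
red_hix_cnt_tx_lb_drop : 11892
0-SS0 wo_cr: 0 no cells: 0 mtu_vio: 0
"""

# Matches a line holding exactly one "counter_name : value" pair.
COUNTER_RE = re.compile(r"^\s*(\w+)\s*:\s*(\d+)\s*$")


def nonzero_counters(text):
    """Return {counter_name: value} for single-counter lines with a non-zero value."""
    found = {}
    for line in text.splitlines():
        match = COUNTER_RE.match(line)
        if match and int(match.group(2)) != 0:
            found[match.group(1)] = int(match.group(2))
    return found


print(nonzero_counters(SAMPLE))  # {'red_hix_cnt_tx_lb_drop': 11892}
```

Running the same filter on two captures taken a few seconds apart shows which counters are still incrementing.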

    FEX Drops

    2248

    satctrl/qosctrl> show asic 0 0

    SS Statistics:

    SS  No Credit*  No Cells    MTU Error   OQ Discard  Free Cells
    ---+-----------+-----------+-----------+-----------+----------
    0   0           0           0           0           10213
    1   0           0           0           0           10213

    ...

    Dropped packets per CoS due to OQ head-drop, OQ is per 8 port group:

    OQ   CoS 0      CoS 1      CoS 2      CoS 3      CoS 4      CoS 5      CoS 6      CoS 7
    ----+----------+----------+----------+----------+----------+----------+----------+-----------
    NR0  0          0          0          0          0          0          0          0
    NR1  0          0          0          0          0          0          0          0
    NR2  0          0          0          0          0          0          0          0
    NR3  0          0          0          0          0          0          0          0
    NR4  0          0          0          0          0          0          0          0
    NR5  0          0          0          0          0          0          0          0
    ----+----------+----------+----------+----------+----------+----------+----------+-----------
    HR0  0          0          0          0          0          0          0          0
    HR1  0          0          0          0          0          0          0          0
    HR2  0          0          0          0          0          0          0          0
    HR3  0          0          0          0          0          0          0          0
    HR4  0          0          0          0          0          0          0          0
    HR5  0          0          0          0          0          0          0          0
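The per-CoS head-drop table above can be scanned mechanically for non-zero cells. The sketch below is an illustrative Python helper (the name `cos_drops` is hypothetical), assuming the rows keep the shape shown in the capture: a label of the form NRn or HRn followed by eight counters for CoS 0 through CoS 7.

```python
import re

# Two rows copied from the "show asic 0 0" capture above; all counters are zero.
SAMPLE = """\
OQ CoS 0 CoS 1 CoS 2 CoS 3 CoS 4 CoS 5 CoS 6 CoS 7
NR0 0 0 0 0 0 0 0 0
HR0 0 0 0 0 0 0 0 0
"""

# A data row: an NRn/HRn label followed by exactly eight integer counters.
ROW_RE = re.compile(r"^\s*((?:NR|HR)\d+)((?:\s+\d+){8})\s*$")


def cos_drops(text):
    """Return (queue_label, cos, drops) for every non-zero cell in the table."""
    hits = []
    for line in text.splitlines():
        match = ROW_RE.match(line)
        if not match:
            continue  # header, separator, or unrelated line
        for cos, value in enumerate(match.group(2).split()):
            if int(value) != 0:
                hits.append((match.group(1), cos, int(value)))
    return hits


print(cos_drops(SAMPLE))  # [] because the captured table is clean
```

A non-empty result names the output queue and CoS value that are being head-dropped, which narrows the search to one 8-port group.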

    FEX Drops

    2248

    fex130# dbgexec prt

    prt> drops

    PRT_SS_CNT_TAIL_DROP8 : 2 SS0

    prt> show rmon 0 ni

    +----------------------+----------------------+-----------------+----------------------+----------------------+-----------------+
    | TX                   |               Current|             Diff| RX                   |               Current|             Diff|
    +----------------------+----------------------+-----------------+----------------------+----------------------+-----------------+
    | TX_PKT_LT64          |                     0|                0| RX_PKT_LT64          |                     0|                0|
    | TX_PKT_64            |                     5|                1| RX_PKT_64            |                     8|                0|
    | TX_PKT_65            |               2062219|           264039| RX_PKT_65            |               4073560|           521532|
    | TX_PKT_128           |               2149866|           274780| RX_PKT_128           |               2060397|           263419|
    | TX_PKT_256           |               1920669|           245601| RX_PKT_256

    ...

    The rmon counters are similar to the counters detailed on the N5k ports; they are helpful for error tracking and for finding packets of a certain size.

    They update immediately, whereas the counters shown on the N5k wait for the statsclient.
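Because the rmon table pairs a Current and a Diff column for each size bucket, a captured copy can be reduced to the buckets that are moving. The following Python sketch is illustrative only: `parse_rmon` and `active_counters` are hypothetical names, and it assumes that Diff is the change since the previous read. Truncated rows, like the last one in the capture above, are skipped.

```python
# Rows copied from the "show rmon 0 ni" capture above.
SAMPLE = """\
| TX | Current | Diff | RX | Current| Diff |
| TX_PKT_LT64 | 0| 0| RX_PKT_LT64 |0| 0|
| TX_PKT_64 | 5| 1| RX_PKT_64 |8| 0|
| TX_PKT_65 | 2062219| 264039| RX_PKT_65 |4073560| 521532|
| TX_PKT_256 | 1920669| 245601| RX_PKT_256
"""


def parse_rmon(text):
    """Return {counter_name: (current, diff)} from six-cell rmon table rows."""
    counters = {}
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != 6:
            continue  # border, truncated, or unrelated line
        # Each row holds a TX triple followed by an RX triple.
        for name, current, diff in (cells[:3], cells[3:]):
            if current.isdigit() and diff.isdigit():
                counters[name] = (int(current), int(diff))
    return counters


def active_counters(counters):
    """Return the sorted names of counters whose Diff column is non-zero."""
    return sorted(name for name, (_, diff) in counters.items() if diff > 0)


print(active_counters(parse_rmon(SAMPLE)))
# ['RX_PKT_65', 'TX_PKT_64', 'TX_PKT_65']
```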

    Troubleshooting Nexus 5000 / 2000

    Problem Isolation

    Platform Overview and troubleshooting

    NX-OS Operation

    Crashes

    Nexus 5000

    Nexus 2000

    Management

    Queuing and forwarding

    Logs

    FEX Logs

    attach fex

    dbgexec rw/prt (rw=2148, prt=2248)

    show ctx: driver information

    show oper: link states for L1 status

    show elog: event log chronicling hardware and software interaction, helpful for L1 issues

    show ints: interrupt counters

    show bootlog: bootup messages

    show log: any other logs
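The sequence above can be written down as a checklist generator. This is a minimal Python sketch, assuming only what the slide states: the dbgexec shell is rw on a 2148 and prt on a 2248, and the listed show commands are run inside that shell. The function name `fex_log_commands` is hypothetical, and the sketch only builds the command strings; it does not connect to a switch.

```python
# Mapping taken from the slide: rw=2148, prt=2248.
DBGEXEC_SHELL = {"2148": "rw", "2248": "prt"}

# Show commands listed on the slide, in the same order.
SHOW_COMMANDS = [
    "show ctx",
    "show oper",
    "show elog",
    "show ints",
    "show bootlog",
    "show log",
]


def fex_log_commands(fex_id, model):
    """Return the ordered command list for collecting FEX logs by hand."""
    if model not in DBGEXEC_SHELL:
        raise ValueError(f"no dbgexec shell known for FEX model {model!r}")
    return [f"attach fex {fex_id}", f"dbgexec {DBGEXEC_SHELL[model]}", *SHOW_COMMANDS]


for command in fex_log_commands(100, "2148"):
    print(command)
```

Keeping the model-to-shell mapping in one table makes it easy to extend if other FEX models use a different shell, which the slides do not cover.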

    Complete Your Online Session Evaluation

    Receive 25 Cisco Preferred Access points for each session evaluation you complete.

    Give us your feedback and you could win fabulous prizes. Points are calculated on a daily basis. Winners will be notified by email after July 22nd.

    Complete your session evaluation online now (open a browser through our wireless network to access our portal) or visit one of the Internet stations throughout the Convention Center.

    Don't forget to activate your Cisco Live and Networkers Virtual account for access to all session materials, communities, and on-demand and live activities throughout the year. Activate your account at any Internet station or visit www.ciscolivevirtual.com.

    Visit the Cisco Store for Related Titles

    http://theciscostores.com

    http://theciscostore.com/
    Thank you.