JUNIPER JUNOS TMJ Troubleshooting


Juniper T/M/J JUNOS troubleshooting basics

Version 0.1

Author:
Department:
Date:
Version: 0.1

Troubleshooting guide for T/M/J JUNOS routers

1 Table of contents

Introduction
  Document Objective
  Scope
  1.1 Document History
  Related documents
2 Troubleshooting guidelines
  2.1 Basic troubleshooting for all events
  2.2 Common events
    2.2.1 Power supply failure
    2.2.2 Fan failure/Temperature alert
    2.2.3 Device reboot with unknown cause
    2.2.4 Chassis event (component failure)
    2.2.5 Routing-engine
    2.2.6 Link failure
    2.2.7 Management IP unreachable (ICMP)
    2.2.8 In-band Loopback IP unreachable (ICMP)
    2.2.9 BGP neighbor
    2.2.10 ISIS adjacency
    2.2.11 VRRP
    2.2.12 LDP neighbor/MPLS
    2.2.13 PIM neighbor/multicast
  2.3 Non-fault management alarms or undocumented events
    2.3.1 Undocumented event
    2.3.2 Network slow
    2.3.3 Reachability problem
    2.3.4 Complete service/product not working
  2.4 Disaster recovery
  2.5 Hardware maintenance verification


Introduction

Document Objective

This document gives basic instructions for handling certain types of alarms. The troubleshooting steps are categorized per event and are valid for JUNOS software running on M/T/J series models. For most events a reference to the vendor documentation is given, where additional information can be looked up. The vendor documentation is also available in PDF format and should be present at a common location for operational personnel (accompanying this document). The interpretation of command output can also be looked up in the vendor documentation: go to www.juniper.net and type the command in the search area; all command output reference information can be found there.

Scope

This document describes the initial troubleshooting for the most common events, along with a generic approach per fault. It assumes the following knowledge and capabilities from the operator:

- Basic topology knowledge of the network (what is core, distribution, access)
- Basic knowledge of Juniper T/M/J hardware (the generic architecture of the box: routing engine, FPC, PIC, SIB, etc.)
- Basic knowledge of JUNOS (can log in, can run commands)
- Basic knowledge of BGP/ISIS/LDP/PIM (what these protocols do in general)


1.1 Document History

Version | Reason for Change | Modified by | Date

Related documents

Vendor documentation at www.juniper.net


2 Troubleshooting guidelines

2.1 Basic troubleshooting for all events

The commands below should be run in all situations:

    show version
    show system uptime
    show log messages | last 100
    show chassis alarms
    show chassis hardware

Details:

show version -> shows the model you are working on
show system uptime -> shows the current system uptime and when the system was last configured. The load figures indicate how busy the system is.
show log messages | last 100 -> shows the last 100 events logged on the router
show chassis alarms -> shows any alarms currently active for the chassis
show chassis hardware -> shows which hardware is present
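The five baseline commands can be gathered in one pass before looking at the specific event. The following is a minimal Python sketch and not part of the original guide: `collect_baseline` and the `run_command` callable are hypothetical names, and how `run_command` reaches the router (SSH session, console server, and so on) is an assumption left to the reader.

```python
from typing import Callable, Dict, List

# Baseline commands from section 2.1, in the order the guide lists them.
BASELINE_COMMANDS: List[str] = [
    "show version",
    "show system uptime",
    "show log messages | last 100",
    "show chassis alarms",
    "show chassis hardware",
]


def collect_baseline(run_command: Callable[[str], str]) -> Dict[str, str]:
    """Run every baseline command and return {command: raw output}.

    run_command is a hypothetical caller-supplied function that sends one
    CLI command to the router and returns its text output.
    """
    results: Dict[str, str] = {}
    for command in BASELINE_COMMANDS:
        try:
            results[command] = run_command(command)
        except Exception as exc:
            # Keep going so one failed command does not hide the others.
            results[command] = f"ERROR: {exc}"
    return results


if __name__ == "__main__":
    # Stand-in transport for demonstration only; it just echoes the command.
    def fake_run(command: str) -> str:
        return f"(output of '{command}' would appear here)"

    for command, output in collect_baseline(fake_run).items():
        print(f"### {command}\n{output}\n")
```

Passing the transport in as a function keeps the sketch independent of any particular SSH or automation library, and storing the raw output per command makes it easy to attach the results to a ticket.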


2.2 Common events

2.2.1 Power supply failure

Diagnostics: Run the commands below in case of a power supply failure. Note that not all systems have a PEM module.

    show chassis environment
    show chassis environment pem

Impact: Most chassis have redundant power supplies.

Common causes:
- Power supply has failed
- External power interrupted

Solution:
- Replace power supply
- Fix external power

Further reference: For additional information: http://www.juniper.net/techpubs/software/nog/nog-hardware/html/noghardwareTOC.html

2.2.2 Fan failure/Temperature alert

Diagnostics: Run the command below in case of a fan failure:

    show chassis environment

Impact: Most chassis have redundant fans. Overheating can occur if the fan is not fixed.

Common causes:
- Fan has failed
- Air filter is dirty
- Housing location is too hot

Solution:
- Replace fan/clean air filters
- Contact housing location

Further reference: For additional information: http://www.juniper.net/techpubs/software/nog/nog-hardware/html/noghardwareTOC.html

2.2.3 Device reboot with unknown cause

Diagnostics: Run the command below in case of a reboot with unknown cause:

    show log messages

Impact: The impact depends on where the system sits in the topology. For systems with an access-related function, a reboot generally means a short outage has occurred. In the core, the impact should be minimal.

Common causes:
- Power failure
- Bug/crash
- Routing engine failure (can be hard-disk failure on RE)

Solution: Cases should be created with the vendor for analysis as soon as possible to establish whether it is a software issue or a hardware issue. If failing hardware is the cause, it must be replaced in a service window on redundant systems; on non-redundant systems it must be replaced as soon as possible. If a component is causing recurring failures, the component should be removed as soon as possible; if it is non-redundant, it should be replaced as soon as possible. If a software bug is causing recurring failures, it should be escalated to the next level of support.
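The decision rules in the solution above can be written down as a small lookup. This is a sketch only, under the assumption that the cause has already been narrowed to hardware, software, or still unknown; the function name `reboot_follow_up` and the cause labels are hypothetical and do not come from the guide.

```python
from typing import List


def reboot_follow_up(cause: str, redundant: bool, recurring: bool) -> List[str]:
    """Return the follow-up actions for a reboot with unknown cause.

    cause     -- "hardware", "software" or "unknown" (hypothetical labels)
    redundant -- True if the affected system or component is redundant
    recurring -- True if the same failure keeps coming back
    """
    if cause not in ("hardware", "software", "unknown"):
        raise ValueError(f"unknown cause: {cause!r}")

    # A vendor case always comes first, to establish hardware vs software.
    actions = ["Open a vendor case for analysis as soon as possible"]

    if cause == "hardware":
        if recurring and redundant:
            actions.append("Remove the failing component as soon as possible")
        elif recurring:
            actions.append("Replace the failing component as soon as possible")
        elif redundant:
            actions.append("Replace the failing hardware in a service window")
        else:
            actions.append("Replace the failing hardware as soon as possible")
    elif cause == "software" and recurring:
        actions.append("Escalate to the next level of support")

    return actions


if __name__ == "__main__":
    for step in reboot_follow_up("hardware", redundant=False, recurring=False):
        print(step)
```

The single recurring software case is the only one the guide escalates internally; every other branch runs through the vendor case and a hardware action.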

Further reference: For additional information: http://www.juniper.net/techpubs/software/nog/nog-hardware/html/noghardwareTOC.html


2.2.4 Chassis event (component failure)

Diagnostics: Run the commands below in case of component failures on a chassis:

    show chassis alarms
    show chassis craft-interface
    show chassis routing-engine
    show chassis fpc

See the further reference for your specific model (under the "monitoring model XXX components" section).

Impact:
- PIC -> This will cause interface problems
- FPC -> This will cause multiple PIC problems
- Other components -> Other components will be chassis related (see further information)

Common causes:
- Hardware failure

Solution: Replace the hardware via the vendor contract. In most cases there will be a service contract with a 3 hour time-to-fix. Open a ticket with this supplier as soon as possible and let them replace the hardware.

Further reference: For additional information: http://www.juniper.net/techpubs/software/nog/nog-hardware/html/noghardwareTOC.html
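When several hardware events arrive together (for example a power supply alarm and a temperature alert), the diagnostics from sections 2.2.1 to 2.2.4 overlap. The following minimal Python sketch merges them into one command list; the dictionary keys and the function name `commands_for` are hypothetical labels chosen for illustration, while the commands themselves are the ones listed in those sections.

```python
from typing import Dict, Iterable, List

# Event-specific diagnostics from sections 2.2.1 to 2.2.4.
# The keys are hypothetical labels chosen for this sketch.
EVENT_COMMANDS: Dict[str, List[str]] = {
    "power_supply_failure": [
        "show chassis environment",
        "show chassis environment pem",  # not every system has a PEM
    ],
    "fan_or_temperature": ["show chassis environment"],
    "unexplained_reboot": ["show log messages"],
    "component_failure": [
        "show chassis alarms",
        "show chassis craft-interface",
        "show chassis routing-engine",
        "show chassis fpc",
    ],
}


def commands_for(events: Iterable[str]) -> List[str]:
    """Merge the diagnostics for several events, keeping order and dropping repeats."""
    merged: List[str] = []
    for event in events:
        # An event that is not in the table raises KeyError on purpose.
        for command in EVENT_COMMANDS[event]:
            if command not in merged:
                merged.append(command)
    return merged


if __name__ == "__main__":
    for command in commands_for(["power_supply_failure", "fan_or_temperature"]):
        print(command)
```

These commands come in addition to the baseline commands of section 2.1, which should be run for every event.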


2.2.5 Routing-engine

Diagnostics: Below commands should be run in case of routing-engine failures: show chassis alarms show