Copyright © 2005 Juniper Networks, Inc. Proprietary and Confidential www.juniper.net
Operating Juniper Networks Routers in the Enterprise
Chapter 9: Troubleshooting
Copyright © 2007 Juniper Networks, Inc. Education Services
Chapter Objectives
After successfully completing this chapter, you will be able to:
• Describe the layered troubleshooting methodology
• Identify and use resources and troubleshooting tools
• List some best practices that promote troubleshooting
• Troubleshoot problems related to hardware, software, interfaces, and protocols on a Juniper Networks enterprise routing platform
Agenda: Troubleshooting
• Troubleshooting Methodology
• Resources and Troubleshooting Tool Kit
• Best Practices
• Troubleshooting Hardware
• Troubleshooting Software
• Troubleshooting Interfaces
• Troubleshooting Protocols (OSPF)
General Troubleshooting Tips
You must know what is normal for your system
• Baseline should be established during normal operations
Start with a visual inspection
• Check power, grounds, connections, and configurations
A divide-and-conquer approach is ideal when multiple faults can lead to a common symptom
• Reduce the system to the minimum components during test
Failure hypotheses should be testable; be definitive about what is or is not being tested with a given test
• Each test should reduce the number of possible causes for the problem regardless of pass/fail status
Do not be blinded by subjectivity; keep an open mind
A Layered Troubleshooting Approach
Modern communications networks are modeled around layered architectures
• Each layer depends on the services of the underlying layer(s)
Matching a symptom to the root-cause layer is a critical step in rapid diagnosis and restoration
• Numerous failure scenarios can result in a common symptom, such as no route to the remote host
• Allows escalation and hand-off to the appropriate group
Identifying the specific fault within the root-cause layer is icing on the cake!
• Problem resolution is above and beyond fault confirmation and root-cause layer determination
Layered Troubleshooting Case Study
[Figure: Subscriber Site 1 and Subscriber Site 2, each with CPE routers, connected across a provider network of PE and P routers. Layers shown on the diagram: Frame Relay, Ethernet, SONET/ATM; OSPF/BGP; application flows (HTTP).]
Symptom: No HTTP connectivity between subscriber sites
• Identify the layers that can account for this symptom, and indicate their scope on the diagram
• Identify specific faults that could lead to the symptom at each layer identified
The Control and Forwarding Planes
The control plane provides the signaling and routing intelligence needed to establish forwarding state
• Problems in the control plane often show up as a lack of routes
    • A high degree of independence exists between the control and forwarding planes
• Generally a good idea to begin diagnosis at the control plane
[Figure: the Routing Engine (control plane) above the Packet Forwarding Engine (forwarding plane), with ingress and egress FPC/PIC slots 0 through 3, the IP II, and the forwarding table (FT). Control plane: keepalives, IGP, BGP, policy, RSVP, LDP, etc. Forwarding plane: physical errors, MTU, firewall filters, policers, etc.]
Agenda: Troubleshooting
• Resources and Troubleshooting Tool Kit
Troubleshooting Resources
The troubleshooting resources include:
• Online documentation
    • Technical publications:
        – http://www.juniper.net/techpubs/
    • Network Operations Guide:
        – http://www.juniper.net/techpubs/software/nog/
• JTAC
    • Support Engineers
    • Knowledge Base
    • Bug search
    • Technical forums (J-Net Communities)
Troubleshooting Tool Kit
The troubleshooting tool kit includes:
• Visual indicators
• The JUNOS software CLI
    • Key commands
    • Process restart and hardware online/offline
    • Network and diagnostic utilities
• System logs and protocol tracing
• Core files
• Interactive UNIX shell and hidden commands
Visual Indicators
[Figure: front panel showing the POWER LED, ALARM LED, STATUS LED, and PIM Status LED]
Front panel indicators summarize platform status
• STATUS: Blinks green during kernel boot, solid green after boot, and blinks red on error
• ALARM: Red indicates a major alarm, yellow indicates a minor alarm
• POWER: Solid green when powered on, blinks green when powering off
• PIM Status: PIM status LEDs vary by interface type
The JUNOS Software CLI: Key Commands
Key operational mode commands include:
• show chassis
    • alarms, environment, hardware, routing-engine, fpc, craft-interface, etc.
• show system
    • statistics, storage, connections, users, etc.
• show interfaces
    • terse, detail, filters, policers, etc.
• show route
    • protocol, hidden, detail, advertising-protocol, receive-protocol, etc.
• monitor interface
• monitor traffic
• request support information
The JUNOS Software CLI: Restarting a Software Process (daemon) (1 of 2)
You can restart most software processes from the CLI
• Restart of other processes requires escape to a shell
user@host> restart ?
Possible completions:
  adaptive-services    Adaptive services process
  audit-process        Audit process
  autoinstallation     Autoinstallation process
  ....
  routing              Routing protocol process
  sampling             Traffic sampling control process
  sdk-service          SDK Service Daemon
  service-deployment   Service Deployment System (SDX) process
  service-pics         Service PICs process
  snmp                 Simple Network Management Protocol process
  soft                 Soft reset (SIGHUP) the process
  usb-control          USB supervise process
  vrrp                 Virtual Router Redundancy Protocol process
  web-management       Web management process
user@host> restart routing
Routing protocol daemon started, pid 5042
The JUNOS Software CLI: Restarting a Software Process (daemon) (2 of 2)
The routing protocol daemon (rpd) handles all routing protocols
• Bouncing rpd with a restart routing command disrupts all rpd components
• Use deactivate to bounce a specific rpd component; the example bounces BGP while leaving OSPF untouched:
[edit]
user@host# show protocols
bgp {
    group x65412 {
        peer-as 65412;
        neighbor 172.14.51.2;
    }
}
ospf {
    area 0.0.0.0 {
        interface fe-2/0/1.0;
        interface lo0.0;
    }
}
. . .
[edit]
user@host# deactivate protocols bgp
[edit]
user@host# commit
commit complete
[edit]
user@host# rollback 1
load complete
[edit]
user@host# commit
commit complete
The JUNOS Software CLI: Hardware Online and Offline
FPCs, PICs, and PIMs can be restarted or brought offline/online using the CLI:
user@host> request chassis fpc ?
Possible completions:
  offline              Turn an FPC offline
  online               Turn an FPC online
  restart              Restart an FPC
  slot                 FPC slot number (0..3)
user@host> request chassis fpc slot 0 restart
Restart initiated, use "show chassis fpc" to verify
user@host> show chassis fpc
                 Temp  CPU Utilization (%)   Memory    Utilization (%)
Slot State        (C)  Total  Interrupt      DRAM (MB) Heap     Buffer
  0  Starting      32      0          0         0         0         0
  1  Online        30      0          0         8        11        14
  2  Empty
  3  Empty
The JUNOS Software CLI: Network Utilities and Applications
Ping and traceroute utilities
• Optional switches available to help with fault isolation
    • source, do-not-fragment, size, tos, etc.
Telnet, SSH, and FTP support
• Ability to specify nonstandard ports
The monitor traffic command provides CLI access to the tcpdump utility
• Only displays traffic originating or terminating on local RE
    • The best way to perform analysis of Layer 2 protocols
    • Protocol filtering currently requires writing and reading from a file (hidden write-file and read-file options)
System Logs and Protocol Tracing: Review
System logging:
• Standard UNIX syslog configuration syntax
    • Primary syslog file is /var/log/messages
    • Most daemons also write to individual log files
• Numerous facilities and severity levels are supported
    • The facility defines the class of log message, while the severity level determines the level of logging detail
• Local and remote syslog support
    • Remote logging (and archiving) recommended for troubleshooting
Tracing decodes protocol packets and certain router events
• Referred to as debug by some other vendors
• Tracing operations include:
    • Global routing behavior
    • Router interfaces
    • Protocol-specific information
System Logs and Protocol Tracing: Key Commands
Use show log file-name to display contents
• Use the pipe (|) option to filter displayed output
• Monitor a log or trace file in real time with the CLI's monitor start command
    • Use the pipe (|) option to filter real-time output
    • Use Esc+q to enable or disable real-time output to screen
    • Issue a monitor stop to cease all real-time monitoring
To stop a tracing operation, delete a trace flag or the entire stanza
Log and trace file manipulation
• Use clear log to truncate (clear) log files
• Use file delete to delete log and trace files
System Logs and Protocol Tracing: Interpreting Syslog Messages
Standard log entries consist of the following fields:
• Timestamp, platform name, software process name/PID, a message code, and the message text
Apr 29 09:43:08 host chassisd[2320]: CHASSISD_FRU_EVENT: scb_recv_slot_detach: FPC 1 detach
• Use explicit-priority to alter the message format to include a numeric priority value
Apr 29 09:41:27 %DAEMON-5-CHASSISD_FRU_EVENT: host chassisd[2320]: scb_recv_slot_detach: FPC 1 detach
Consult the System Log Messages Reference documentation for details on log entries
• Use help syslog ? for help in decoding message codes
user@host> help syslog CHASSISD_IFDEV_DETACH_FPC
Name:          CHASSISD_IFDEV_DETACH_FPC
Message:       ifdev_detach(<fpc-slot-number>)
Help:          chassisd detached all PIC ifdevs on FPC
Description:   The chassis process (chassisd) detached the interface devices (ifdevs) for all PICs on the indicated FPC.
Type:          Event: This message reports an event, not an error
Severity:      notice
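The field layout of a standard entry can be sketched as a small parser. This is an illustrative Python sketch, not a Juniper tool: the regular expression is an assumption derived only from the single sample entry on this slide, and the names STANDARD_ENTRY and parse_entry are hypothetical. It does not handle the explicit-priority format.

```python
import re

# Assumed layout, taken from the sample entry on the slide:
# timestamp, platform name, process[PID], message code, message text.
STANDARD_ENTRY = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<process>[\w.-]+)\[(?P<pid>\d+)\]: "
    r"(?P<code>[A-Z][A-Z0-9_]+): "
    r"(?P<text>.*)$"
)


def parse_entry(line):
    """Return the fields of a standard-format entry, or None if it does not match."""
    match = STANDARD_ENTRY.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["pid"] = int(fields["pid"])
    return fields


sample = ("Apr 29 09:43:08 host chassisd[2320]: CHASSISD_FRU_EVENT: "
          "scb_recv_slot_detach: FPC 1 detach")
entry = parse_entry(sample)
print(entry["process"], entry["pid"], entry["code"])
```

The extracted message code is the value you would pass to help syslog to decode the entry.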
Core Files
Modern computing environments are complex and, therefore, have complex bugs
• Transient software failures are extremely hard to reproduce and, therefore, difficult to fix
    • Can also be triggered by hardware errors
• Well-written code dumps a core file for diagnostic analysis when a fatal fault (panic) occurs
    • The stack trace identifies the offending process's name, memory pointers, and register data at the time of the fault
• In JUNOS software numerous entities can dump a core at panic or upon command, including:
    • The JUNOS kernel, software daemons, and embedded hosts in the PFE
• The storage locations and handling of core files can vary
    • Core files are written to the /var/crash/ or /var/tmp/ directories
The Interactive Shell and Hidden Commands
Interactive UNIX shell and hidden command support
• Unless directed by JTAC, working in the shell and using hidden commands is unsupported and potentially dangerous
• CLI users can escape to an interactive shell only when permitted by their login class
Hidden Command Example
The commit function is optimized
• Goal is to avoid disruption to daemons and processes not affected by a configuration change
The hidden full switch shakes up the box
• Causes all processes including init to receive a SIGHUP
    • Forces reread of configuration, reactivating the entire configuration
• An excellent way to restart a process that is disabled because of thrashing
Callout (full): Hidden switch
[edit]
user@host# commit full
Mar 19 14:33:36 host mgd[2510]: UI_COMMIT: User 'user' performed commit: no comment
Mar 19 14:33:42 host init: product mask 0x70000, model 4
Mar 19 14:33:42 host rpd[2470]: RPD_OSPF_CFGNBR_P2P: Ignoring configured neighbors
. . .
Mar 19 14:33:43 host init: ntp (PID 3722) exit on SIGHUP, will be restarted
Mar 19 14:33:43 host init: ntp (PID 3957) started
Mar 19 14:33:43 host xntpd[3957]: ntpd 4.0.99b Thu Feb 26 03:07:34 GMT 2004 (1)
commit complete
Agenda: Troubleshooting
• Best Practices
Out-of-Band Management Network
An OoB management network is critical in times of network outage
• Console access recommended for maintenance activities
• Console access required for password recovery as well as other administrative tasks
[Figure: a management workstation (.100) and a terminal server attached to the routers' console ports, with a firewall/router on the management network]
Monitoring Devices Using SNMP
Configure SNMP monitoring at [edit snmp] hierarchy level
• SNMP communities allow central network management system to monitor router
    • Define authorization level and client list
• SNMP traps allow router to send notifications to network management system when significant events occur
    • Define trap categories and targets
[edit]
user@host# show snmp
community Juniper {
    authorization read-only;
    clients {
        10.210.9.189/32;
        0.0.0.0/0 restrict;
    }
}
trap-group trap-door {
    categories {
        chassis;
        link;
        routing;
    }
    targets {
        10.210.9.189;
    }
}
Callout (0.0.0.0/0 restrict): Restricts all other clients from polling local device
Backup Configuration Files
Configure system for automated configuration file backups at [edit system archival] hierarchy
• Perform regular backups at scheduled intervals or whenever a new configuration file is committed
[edit]
user@host# show system archival
configuration {
    transfer-on-commit;
    archive-sites {
        "ftp://[email protected]:/archive" password "$9…"; ## SECRET-DATA
        "scp://[email protected]:/archive" password "$9…"; ## SECRET-DATA
    }
}
Callout (transfer-on-commit): Backup occurs when commit is issued
Callout (archive-sites): First URL listed will be used unless transfer fails
Callout (archive-sites): Transfer options include both FTP and SCP
Recommended Syslog Settings
Where possible, your syslog should be configured to:
• Write entries to both a local file and to a remote host
    • Remote archiving is helpful if the local storage drive fails
    • Configure remote syslog service to retain log files for at least one month
• Use archive settings to maintain at least 20 archive files with a minimum 1-MB file size (resources permitting)
    • Default number of files is 10, default size is platform specific
        – 128-KB size on J-series routers
        – 1-MB size on all M-series routers
    • Especially important if remote syslog is not in effect
• Log interactive CLI commands and configuration changes
    • Achieved with the interactive-commands and change-log facilities using the info severity level
    • Provides an audit trail of who did what, and when
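To put the archive numbers above in perspective, the following Python sketch compares the log history retained under the J-series defaults with the recommended settings. It is a rough upper bound under stated assumptions: rotation is treated as purely size-based, compression and the active file are ignored, and the helper name retained_bytes is hypothetical.

```python
KB = 1024
MB = 1024 * KB


def retained_bytes(files, size_bytes):
    """Rough upper bound on log history kept by size-based archive rotation."""
    return files * size_bytes


# Figures from the slide: 10 files of 128 KB by default on J-series,
# versus the recommended 20 files of at least 1 MB.
j_series_default = retained_bytes(10, 128 * KB)
recommended = retained_bytes(20, 1 * MB)

print(j_series_default // KB, "KB retained with J-series defaults")
print(recommended // MB, "MB retained with the recommended settings")
print(recommended // j_series_default, "times more history")
```

Under these assumptions the recommended settings keep 16 times as much history as the J-series defaults, which matters most when no remote syslog host is configured.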
Clock Synchronization
Recommend synchronizing router clocks with NTP
• Correlated timestamps in log files assist fault analysis
    • Also useful in forensic analysis of security incidents
JUNOS software cannot provide primary time reference
• An external device is needed for synchronization
    • A simple UNIX box using an undisciplined local clock will suffice
• Support for client, server, or symmetric modes, with or without authentication
• Use the show ntp associations command to confirm synchronization status
A simple NTP client-mode configuration:
[edit system ntp]
user@host# show
boot-server 10.0.1.201;
server 10.0.1.201;
server 10.0.1.202;
Callout (boot-server): Boot server is used to set initial NTP time during boot
Callout (server): The configured list of possible synchronization sources
Lab 7—Parts 1–3: Troubleshooting
• Use the CLI troubleshooting tools.
• Establish a baseline of operation for your team's station.
• Add best-practice configuration that promotes troubleshooting and facilitates disaster recovery.
Agenda: Troubleshooting
• Troubleshooting Hardware
Hardware Troubleshooting Tools
Visual indicators:
• Red LEDs indicate failure
• Many individual components have their own status indicators
JUNOS software CLI:
• Interactive failure analysis using show commands
• Hardware components can be restarted or taken offline/online using request chassis commands
System logs (syslog):
• Log files contain a wealth of invaluable information
    • CLI show log log-file-name command
    • Remember to use pipe for added functionality
Hardware Troubleshooting Chart
[Flowchart: each question is paired with an action and the commands that support it]
Alarms active? Display/view alarms
• show chassis alarms
• show chassis craft-interface
HW-related log entries? Parse/view syslogs and act accordingly
• show log messages
• show log chassisd
• show log log-file-name
• monitor start [messages | chassisd]
LED indication of component failure? View LED status/display Craft Interface
• show chassis craft-interface
FPC/PIC/port operational? Display interface and hardware status
• show chassis hardware
• show chassis fpc
• show interfaces terse
• show interfaces interface-name detail
• show pfe statistics error
Final step: Investigate software faults
Hardware Case Study (1 of 4)
Case study background:
• You have received notification that two ATM links went down
• These ATM links are served by two OC12c PICs in an M120 router's FPC slot 1
What is wrong?
• What CLI commands help narrow down a possible cause?
Hardware Case Study (2 of 4)
Sample course of action:
1. Determine if any alarms are active (CLI method shown):
user@host> show chassis alarms
No alarms currently active
Callout: No alarms present
2. Parse system log files for related entries:
user@host> show log messages | match FPC
Mar 20 10:19:32 host chassisd[2308]: CHASSISD_FRU_EVENT: scb_recv_slot_detach: FPC 1 detach
Mar 20 10:19:32 host chassisd[2308]: CHASSISD_IFDEV_DETACH_FPC: ifdev_detach(1)
Mar 20 10:19:32 host chassisd[2308]: CHASSISD_SNMP_TRAP10: SNMP trap: FRU power off: jnxFruContentsIndex 7, jnxFruL1Index 2, jnxFruL2Index 0, jnxFruL3Index 0, jnxFruName FPC @ 1/*/*, jnxFruType 3, jnxFruSlot 2, jnxFruOfflineReason 14, jnxFruLastPowerOff 76879080, jnxFruLastPowerOn 69264045
Callout: Log entries indicate that FPC 1 was taken offline!
3. Confirm FPC status:
user@host> show chassis fpc
                 Temp  CPU Utilization (%)   Memory    Utilization (%)
Slot State        (C)  Total  Interrupt      DRAM (MB) Heap     Buffer
  0  Online        30      1          0         8        16        15
  1  Dormant       30      0          0         8        11        14
  2  Empty
  3  Empty
Callout: The FPC is offline!
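Step 3 can be scripted when many chassis must be checked. The Python sketch below scans captured show chassis fpc text for installed FPCs whose state is not Online. It is illustrative only: the function name offline_slots is hypothetical, the column layout is assumed from the output on this slide, and treating Empty as healthy is an assumption.

```python
def offline_slots(fpc_output):
    """Return (slot, state) pairs for installed FPCs that are not Online."""
    problems = []
    for line in fpc_output.splitlines():
        fields = line.split()
        # Data rows start with a slot number followed by a state word;
        # header rows do not start with a digit and are skipped.
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        slot, state = int(fields[0]), fields[1]
        # Assumption: an Empty slot is expected and not a fault.
        if state not in ("Online", "Empty"):
            problems.append((slot, state))
    return problems


sample = """\
Slot State        (C)  Total  Interrupt      DRAM (MB) Heap     Buffer
  0  Online        30      1          0         8        16        15
  1  Dormant       30      0          0         8        11        14
  2  Empty
  3  Empty
"""
print(offline_slots(sample))
```

For the case-study output this flags slot 1, matching the conclusion drawn from the log entries.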
Hardware Case Study (3 of 4)
Sample course of action (contd.):
4. Attempt to bring the FPC back online:
user@host> request chassis fpc online slot 1
Online initiated, use "show chassis fpc" to verify
user@host> show chassis fpc
                 Temp  CPU Utilization (%)   Memory    Utilization (%)
Slot State        (C)  Total  Interrupt      DRAM (MB) Heap     Buffer
  0  Online        30      1          0         8        16        15
  1  Probed        30      0          0         0         0         0
…
user@host> show chassis fpc
                 Temp  CPU Utilization (%)   Memory    Utilization (%)
Slot State        (C)  Total  Interrupt      DRAM (MB) Heap     Buffer
  0  Online        30      1          0         8        16        15
  1  Online        30      0          0         8        11        14
…
Hardware Case Study (4 of 4)
What problem sources can you eliminate?
What might have caused the FPC to go offline?
• Too bad CLI logging was not enabled…
Agenda: Troubleshooting
• Troubleshooting Software
Software Troubleshooting Tools
The JUNOS software CLI:
• Use show commands to narrow focus
• Use commit full to reapply entire configuration
• Use restart process-name to restart a process
System logs (syslog):
• Log files contain a wealth of invaluable information
    • Use the CLI show log log-file-name command
    • Remember to use pipe for added functionality
Core analysis
• Core files are stored in /var/tmp or /var/crash depending on the type of core
• Open a support ticket and work with JTAC for core-file analysis
Software Troubleshooting Chart
[Flowchart: starts from "Hardware is OK"; each question is paired with an action and the commands that support it]
SW-related log entries? Parse/view syslogs and act accordingly
• show log messages
• monitor start messages
Core files? Determine if core files are present
• show system core-dumps
• file list /var/tmp/*core*
• file list /var/crash/*core*
Software process running? Display running processes
• show system processes
• show system connections
• file show /etc/services
Final step: Investigate interface faults
Software Case Study (1 of 3)
Case study background:
• The people in the management group report that they have lost SNMP contact with your router
• No hardware alarms or malfunctions are evident
What is wrong?
• What CLI commands and fault analysis steps can help narrow down a possible cause?
Software Case Study (2 of 3)
Sample course of action:
1. Parse system log files for SNMP-related entries:
user@host> show log messages | match snmp | match core
Apr 25 00:33:26 host dumpd: Core and context for snmpd saved in /var/tmp/snmpd.core-tarball.0.tgz
Apr 25 00:33:29 host dumpd: Core and context for snmpd saved in /var/tmp/snmpd.core-tarball.1.tgz
Apr 25 00:33:34 host dumpd: Core and context for snmpd saved in /var/tmp/snmpd.core-tarball.2.tgz
. . . .
user@host> show log messages | match thrash
Apr 25 00:33:47 Sydney init: snmp is thrashing, not restarted
Callout: snmpd repeatedly crashed and was shut down to prevent thrashing
2. Determine if the snmpd process is running:
user@host> show system processes | match snmpd
user@host> file show /etc/services | match snmp
snmp            161/tcp
snmp            161/udp
user@host> show system connections | match 161
Callout: snmpd is not running: no surprise that management contact was lost
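The log search in step 1 can also be run offline against a saved copy of the messages file. The Python sketch below groups saved core files by process and lists processes that init stopped restarting. It is illustrative only: the two patterns are assumptions taken from the log lines on this slide, and the names CORE_SAVED, THRASHING, and summarize are hypothetical.

```python
import re

# Patterns assumed from the sample log lines on the slide.
CORE_SAVED = re.compile(r"dumpd: Core and context for (\S+) saved in (\S+)")
THRASHING = re.compile(r"init: (\S+) is thrashing, not restarted")


def summarize(lines):
    """Return (core files grouped by process, set of thrashing processes)."""
    cores = {}
    thrashing = set()
    for line in lines:
        saved = CORE_SAVED.search(line)
        if saved:
            cores.setdefault(saved.group(1), []).append(saved.group(2))
        stopped = THRASHING.search(line)
        if stopped:
            thrashing.add(stopped.group(1))
    return cores, thrashing


log = [
    "Apr 25 00:33:26 host dumpd: Core and context for snmpd saved in /var/tmp/snmpd.core-tarball.0.tgz",
    "Apr 25 00:33:29 host dumpd: Core and context for snmpd saved in /var/tmp/snmpd.core-tarball.1.tgz",
    "Apr 25 00:33:34 host dumpd: Core and context for snmpd saved in /var/tmp/snmpd.core-tarball.2.tgz",
    "Apr 25 00:33:47 Sydney init: snmp is thrashing, not restarted",
]
cores, thrashing = summarize(log)
print(len(cores["snmpd"]), "snmpd cores saved; thrashing:", sorted(thrashing))
```

Several cores for one daemon within seconds, followed by a thrashing message, is the pattern that points to a crash loop rather than a one-off fault.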
Software Case Study (3 of 3)
Sample course of action (contd.):
3. Confirm that core files are present:
user@host> show system core-dumps
/var/crash/*core*: No such file or directory
-rw-------  1 root  field   113825 Apr 25 00:33 /var/tmp/snmpd.core-tarball.0.tgz
-rw-------  1 root  field    70399 Apr 25 00:33 /var/tmp/snmpd.core-tarball.1.tgz
-rw-------  1 root  field    70380 Apr 25 00:33 /var/tmp/snmpd.core-tarball.2.tgz
-rw-------  1 root  field   100891 Apr 25 00:33 /var/tmp/snmpd.core-tarball.3.tgz
-rw-------  1 root  field   101109 Apr 25 00:33 /var/tmp/snmpd.core-tarball.4.tgz
-rw-rw----  1 root  field  1024000 Apr 25 00:33 /var/tmp/snmpd.core.0
-rw-rw----  1 root  field   704512 Apr 25 00:33 /var/tmp/snmpd.core.1
-rw-rw----  1 root  field   704512 Apr 25 00:33 /var/tmp/snmpd.core.2
-rw-rw----  1 root  field   958464 Apr 25 00:33 /var/tmp/snmpd.core.3
-rw-rw----  1 root  field   958464 Apr 25 00:33 /var/tmp/snmpd.core.4
total 10
4. Open a support case to have the core files and related context analyzed
Agenda: Troubleshooting
• Troubleshooting Interfaces
Interface Troubleshooting Considerations (1 of 2)
Understanding the demarcation:
• Europe typically excludes the CSU/DSU (CPE perspective) because equipment is owned by the telco
• North America typically includes the CSU/DSU (CPE perspective) because it is owned by the customer
Topology determines troubleshooting approach; three topology types to consider when troubleshooting:
• LAN/broadcast multiaccess (Fast/Gigabit Ethernet)
• Point-to-point (SONET/SDH, T3/E3, T1/E1, PPP, or Cisco HDLC)
• Point-to-multipoint (SONET/SDH, T3/E3, T1/E1, Frame Relay or ATM-VC)
Interface Troubleshooting Considerations (2 of 2)
Configuration details must be set correctly and in some cases match at both ends; consider both physical and logical settings
• Physical properties:
    • Clocking, scrambling, FCS, MTU, data-link-layer protocol, keepalives
    • Diagnostic capabilities (local, remote, and facility loopback, BERT)
• Logical properties:
    • Protocol family (Internet, ISO, MPLS)
    • Addresses (IP address, ISO NET address)
    • Virtual circuits (VCI/VPI, DLCI)
Fault isolation
• If settings are correct on both ends of the circuit and the problem persists, you must work with the telco
Interface Troubleshooting Tools
The JUNOS software CLI:
• Use the show interfaces commands to view interface details (add detail or extensive to view errors and alarms)
• Use monitor interface to view real-time statistics
• Use show arp to view ARP table details
Diagnostic tools:
• Use monitor traffic when troubleshooting Layer 2
• Use ping or BERT testing for circuit error detection and verification
    • Use the pattern option with ping utility when testing a circuit for errors
Loopback testing is the primary way to distinguish between interface and circuit faults
• For loopback testing details for the various interface types, see http://www.juniper.net/techpubs/software/nog/
Interface Troubleshooting Chart
[Flowchart: starts from "Chassis/software OK" and branches on interface status (Admin Down; Admin Up, Link Down; Admin Up, Link Up). Decision points: Errors or alarms? Local loop? Remote loop? Can L2 be looped? Ping remote end? Outcomes: Enable interface; Bad local port; Bad remote port; Bad telco; Bad L2 config; Suspect L2 config; Suspect bad IP config; Investigate protocol faults.]
Interface Case Study (1 of 4)
Case study background:
• Circuit between London and Amsterdam is down
    • Both routers are configured for cisco-hdlc encapsulation and show no chassis hardware alarms or software malfunctions
What is wrong?
• What CLI commands and fault analysis steps can help narrow down a possible cause?
[Figure: London (lo0: 192.168.36.1, se-1/0/0 .5) connected to Amsterdam (lo0: 192.168.32.1, se-1/0/0 .6) over the serial link 172.18.36.4/30. Each router also has an fe-2/0/1 interface (.1); LAN subnets shown: 10.222.101.0/24 and 10.222.104.0/24. Other labels: Wintermute, HARLIE.]
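The .5 and .6 labels on the serial link follow directly from the 172.18.36.4/30 prefix. As a quick check, this Python sketch uses the standard ipaddress module to list the usable host addresses and confirm that both ends fall in the same subnet. The variable names are illustrative, and the assignment of .5 to London and .6 to Amsterdam is taken from the diagram labels.

```python
import ipaddress

# Serial link prefix from the case-study diagram.
link = ipaddress.ip_network("172.18.36.4/30")

# A /30 leaves exactly two usable host addresses between the
# network address (.4) and the broadcast address (.7).
hosts = [str(host) for host in link.hosts()]
print(hosts)

london = ipaddress.ip_interface("172.18.36.5/30")
amsterdam = ipaddress.ip_interface("172.18.36.6/30")
same_subnet = london.network == link and amsterdam.network == link
print("Both ends on the link subnet:", same_subnet)
```

With addressing confirmed to be consistent, a mismatched IP configuration becomes a less likely cause than a physical or Layer 2 fault.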
Interface Case Study (2 of 4)
Sample course of action:
1. Determine interface status:
user@London> show interfaces terse se-1/0/0
Interface               Admin Link Proto    Local                 Remote
se-1/0/0                up    down
se-1/0/0.0              up    down inet     172.18.36.5/30
Callout: Administratively up, link level down
2. Any errors or alarms?
user@London> show interfaces se-1/0/0 extensive | find errors:
  Input errors:
    Errors: 0, Drops: 0, Framing errors: 0, Runts: 0, Giants: 0, Policed discards: 0, Resource errors: 0
  Output errors:
    Carrier transitions: 0, Errors: 0, Drops: 0, MTU errors: 0, Resource errors: 0
Callout: No input or output errors detected
Interface Case Study (3 of 4)
Sample course of action (contd.):
3. Configure a local loopback:
[edit]
user@London# set interfaces se-1/0/0 no-keepalives
[edit]
user@London# set interfaces se-1/0/0 serial-options loopback local
[edit]
user@London# commit and-quit
commit complete
Exiting configuration mode
Callout: Local loop is possible because of L2 configuration
4. Confirm local loop results:
user@London> show interfaces terse se-1/0/0
Interface               Admin Link Proto    Local                 Remote
se-1/0/0                up    up
se-1/0/0.0              up    up   inet     172.18.36.5/30
user@London> ping 172.18.36.6 count 1
PING 172.18.36.6 (172.18.36.6): 56 data bytes
36 bytes from 172.18.36.5: Time to live exceeded
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 0054 8e63   0 0000  01  01 8b16 172.18.36.5  172.18.36.6

--- 172.18.36.6 ping statistics ---
1 packets transmitted, 0 packets received, 100% packet loss
Callout: Link is up, traffic is passing (TTL expired)
Interface Case Study (4 of 4)
What can you eliminate given the results obtained thus far?
• What test should you perform next?
Assume the local loopback test also passes on Amsterdam.
• Where is the fault?
[Figure: the same London and Amsterdam topology as in part 1 of this case study: serial link 172.18.36.4/30 between se-1/0/0 (.5) on London and se-1/0/0 (.6) on Amsterdam.]
Agenda: Troubleshooting
• Troubleshooting Protocols (OSPF)
OSPF Troubleshooting Considerations
Neighbor states:
• No neighbor detected
    • Check physical and data link layer connectivity
    • Check mismatched IP subnet/mask (on multiaccess links), area number, area type, authentication, hello or dead interval, or network type
• Stuck in two-way state
    • Normal for DROther neighbors
• Stuck in exchange start
    • Mismatched IP MTU
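The neighbor-state guidance above amounts to a lookup from symptom to checks. The Python sketch below encodes it as a table so it can be reused in a runbook script. It is illustrative only: the dictionary keys are hypothetical labels chosen here (they are not the state strings printed by show ospf neighbor), and the check text is copied from this slide.

```python
# Symptom-to-checks table transcribed from the slide.
# Keys are hypothetical labels, not CLI output strings.
OSPF_CHECKS = {
    "no neighbor": [
        "Check physical and data link layer connectivity",
        "Check for mismatched IP subnet/mask (on multiaccess links), "
        "area number, area type, authentication, hello or dead interval, "
        "or network type",
    ],
    "two-way": [
        "Normal for DROther neighbors",
    ],
    "exchange start": [
        "Check for mismatched IP MTU",
    ],
}


def checks_for(symptom):
    """Return the suggested checks for a symptom label, or an empty list."""
    return OSPF_CHECKS.get(symptom.strip().lower(), [])


for item in checks_for("Exchange Start"):
    print(item)
```

A table like this keeps the slide's guidance in one place; extend it only with causes you have confirmed for your own network.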
OSPF Troubleshooting Tools
The JUNOS software CLI:
• Use the show ospf commands to view OSPF details such as neighbor state, statistics, and OSPF database
• Use the CLI to restart OSPF (or rpd if needed)
Use traceoptions to trace OSPF events and gain insight into what the protocol is doing
• A typical OSPF tracing configuration:
[edit protocols ospf]
user@host# show
traceoptions {
    file ospf-trace;
    flag error detail;
    flag hello detail;
    flag lsa-update detail;
}
• Use the monitor start or show log command to view the resulting log information
Protocol Troubleshooting Chart
[Flowchart: starts from "Chassis, software, interface, and transmission line are OK". Decision points: Route present and active? Route hidden? IGP route? Adjacencies up? BGP session established? Outcomes: Investigate forwarding faults; Suspect IGP config; Suspect policy or IGP config; Suspect config or IGP; Suspect remote peer policy.]
OSPF Case Study (1 of 3)
Case study background:
• Users from sites A and B complain that they cannot reach network resources located in the remote site
• All interfaces are functioning correctly and no chassis hardware alarms or software malfunctions are evident
What is wrong?
• What CLI commands and fault analysis steps can help narrow down a possible cause?
[Figure: London (lo0: 192.168.36.1) and Tokyo (lo0: 192.168.24.1) connected by a serial link, 10.222.2.0/30, with DCE and DTE ends, in OSPF Area 0. Site A is in OSPF Area 1 and Site B is in OSPF Area 2; LAN subnets shown: 10.222.1.0/24 and 10.222.3.0/24. Interface labels: se-1/0/1 (.2), se-1/0/0 (.1), fe-0/0/1 (.1), fe-0/0/1 (.2). Other labels: Wintermute, HARLIE.]
OSPF Case Study (2 of 3)
Sample course of action:
1. Determine if required routes are present and active:
user@London> show route 10.222.1.0/24
Callout: No route to remote network
2. Display OSPF neighbor status:
user@London> show ospf neighbor
  Address         Interface             State      ID              Pri  Dead
10.222.3.2       fe-0/0/1.0             Full       192.168.32.1    128   30
Callout: No OSPF neighbor for serial interface in Area 0
OSPF Case Study (3 of 3)
Sample course of action (contd.):
3. After verifying that the configurations are in place, use OSPF traceoptions to investigate cause:
[edit protocols ospf]
user@London# show
traceoptions {
    file ospf-trace;
    flag error detail;
    flag hello detail;
    flag lsa-update detail;
}
user@London> monitor start ospf-trace
*** ospf-trace ***
Jul 30 16:39:42 OSPF periodic xmit from (null) to 224.0.0.5 (IFL 71)
Jul 30 16:39:48 OSPF packet ignored: authentication type mismatch (0) from 10.222.2.1
Jul 30 16:39:48 OSPF periodic xmit from 10.222.29.1 to 224.0.0.5 (IFL 70)
Jul 30 16:39:52 OSPF periodic xmit from (null) to 224.0.0.5 (IFL 71)
Jul 30 16:39:56 OSPF packet ignored: authentication type mismatch (0) from 10.222.2.1
…
Callout: And the survey says…
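On a busy router the trace file grows quickly, so it helps to tally why packets are being ignored. The Python sketch below counts "packet ignored" reasons per source address in saved trace output. It is illustrative only: the pattern is an assumption based on the two ignored-packet lines in this trace, and the names IGNORED and ignored_reasons are hypothetical.

```python
import re
from collections import Counter

# Pattern assumed from the "OSPF packet ignored" lines in the trace above.
IGNORED = re.compile(
    r"OSPF packet ignored: (?P<reason>.+?) from (?P<source>\d+\.\d+\.\d+\.\d+)"
)


def ignored_reasons(lines):
    """Count (reason, source address) pairs for ignored OSPF packets."""
    counts = Counter()
    for line in lines:
        match = IGNORED.search(line)
        if match:
            counts[(match.group("reason"), match.group("source"))] += 1
    return counts


trace = [
    "Jul 30 16:39:42 OSPF periodic xmit from (null) to 224.0.0.5 (IFL 71)",
    "Jul 30 16:39:48 OSPF packet ignored: authentication type mismatch (0) from 10.222.2.1",
    "Jul 30 16:39:48 OSPF periodic xmit from 10.222.29.1 to 224.0.0.5 (IFL 70)",
    "Jul 30 16:39:52 OSPF periodic xmit from (null) to 224.0.0.5 (IFL 71)",
    "Jul 30 16:39:56 OSPF packet ignored: authentication type mismatch (0) from 10.222.2.1",
]
for (reason, source), count in ignored_reasons(trace).items():
    print(f"{count} x {reason} from {source}")
```

Here every ignored packet comes from 10.222.2.1 with the same reason, which points at the authentication settings on the serial link between the two routers.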
Review Questions
1. Describe the layered troubleshooting methodology.
2. List the tools available for troubleshooting.
3. Explain the purpose of the commit full command.
4. What are core files and why are they generated?
5. How can NTP facilitate troubleshooting?
6. What are some common problems with interface connectivity?
7. What is the difference between system logs and traceoptions? How can they each help with troubleshooting efforts?
Lab 7—Part 4: Troubleshooting
Given a general symptom, such as users within one OSPF area not being able to communicate with users in the remote OSPF areas, use the layered troubleshooting methodology, the troubleshooting tool kit, and the troubleshooting flowcharts to investigate and repair any contributing problems.
Work with the remote team, the telco (instructor), and JTAC (instructor) as needed to work towards a resolution.