
Transcend® Management Software
Network Troubleshooting Guide
'97 for Windows NT®

3Com Corporation, 5400 Bayfront Plaza, P.O. Box 58145, Santa Clara, CA 95052-8145

http://www.3com.com

© 1997 3Com Corporation. All rights reserved. Printed in the U.S.A.

09-1293-000


Transcend® Management Software

Network Troubleshooting Guide

http://www.3com.com/

Part No. 09-1293-000
Published September 1997

3Com Corporation, 5400 Bayfront Plaza, Santa Clara, California 95052-8145

Copyright © 1997, 3Com Corporation. All rights reserved. No part of this documentation may be reproduced in any form or by any means or used to make any derivative work (such as translation, transformation, or adaptation) without permission from 3Com Corporation.

3Com Corporation reserves the right to revise this documentation and to make changes in content from time to time without obligation on the part of 3Com Corporation to provide notification of such revision or change.

3Com Corporation provides this documentation without warranty of any kind, either implied or expressed, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. 3Com may make improvements or changes in the product(s) and/or the program(s) described in this documentation at any time.

UNITED STATES GOVERNMENT LEGENDS: If you are a United States government agency, then this documentation and the software described herein are provided to you subject to the following restricted rights:

For units of the Department of Defense: Restricted Rights Legend: Use, duplication, or disclosure by the Government is subject to restrictions as set forth in subparagraph (c) (1) (ii) for Restricted Rights in Technical Data and Computer Software Clause at 48 C.F.R. 52.227-7013. 3Com Corporation, 5400 Bayfront Plaza, Santa Clara, California 95052-8145.

For civilian agencies: Restricted Rights Legend: Use, reproduction, or disclosure is subject to restrictions set forth in subparagraph (a) through (d) of the Commercial Computer Software – Restricted Rights Clause at 48 C.F.R. 52.227-19 and the limitations set forth in 3Com Corporation’s standard commercial agreement for the software. Unpublished rights reserved under the copyright laws of the United States.

If there is any software on removable media described in this documentation, it is furnished under a license agreement included with the product as a separate document, in the hard copy documentation, or on the removable media in a directory file named LICENSE.TXT. If you are unable to locate a copy, please contact 3Com and a copy will be provided to you.

Unless otherwise indicated, 3Com registered trademarks are registered in the United States and may or may not be registered in other countries.

3Com, the 3Com logo, Boundary Routing, EtherDisk, EtherLink, EtherLink II, LANplex, LANsentry, LinkBuilder, LinkSwitch, NetAge, NETBuilder, NETBuilder II, Parallel Tasking, SmartAgent, SuperStack, TokenDisk, TokenLink, Transcend, and ViewBuilder are registered trademarks of 3Com Corporation. CoreBuilder, FDDILink, NetProbe, and Traffix are trademarks of 3Com Corporation. 3ComFacts is a service mark of 3Com Corporation.

AppleTalk and Macintosh are registered trademarks of Apple Computer Company. VINES is a registered trademark of Banyan Systems, Inc. CompuServe is a registered trademark of CompuServe, Inc. DECnet is a trademark of Digital Equipment Corporation. HP and OpenView are registered trademarks of Hewlett-Packard Co. AIX, IBM, and NetView are registered trademarks of International Business Machines Corporation. Zip is a trademark of Iomega. Windows and Windows NT are registered trademarks of Microsoft Corporation. Sniffer is a registered trademark of Network General Holding Corporation. Novell is a registered trademark of Novell, Inc. OpenWindows, SunNet Manager, and SunOS are trademarks of Sun Microsystems Inc. SPARCstation is a trademark and is licensed exclusively to Sun Microsystems Inc. UNIX is a registered trademark in the United States and other countries, licensed exclusively through X/Open Company Ltd.

Other brand and product names may be registered trademarks or trademarks of their respective holders.

Guide written by Patricia Johnson, Sarah Newman, Iain Young, and Adam Bell. Technical information provided by Dan Bailey, Bob McTague, Graeme Robertson, and Andrew Ward. Edited by Beth Britt and Bonnie Jo Collins. Production by Christine Zak.


CONTENTS

ABOUT THIS GUIDE

Finding Specific Information in This Guide 12
What to Expect from This Guide 12
Conventions 13
Related Documentation 15
Documents 15
Help Systems 15

PART I BEFORE TROUBLESHOOTING

NETWORK TROUBLESHOOTING OVERVIEW

Introduction to Network Troubleshooting 19
About Connectivity Problems 19
About Performance Problems 20
Solving Connectivity and Performance Problems 20
Network Troubleshooting Framework 21
Troubleshooting Strategy 23
Recognizing Symptoms 24
User Comments 24
Network Management Software Alerts 24
Analyzing Symptoms 25
Understanding the Problem 25
Identifying and Testing the Cause of the Problem 26
Sample Problem Analysis 27
Equipment for Testing 28
Solving the Problem 29

YOUR NETWORK TROUBLESHOOTING TOOLBOX

Transcend Applications 31
Transcend Central 32
Status View 32
Status Watch 32
MAC Watch 33
Web Reporter 33
LANsentry Manager 33
Traffix Manager 34
Device View 35
Network Management Platforms 35
3Com SmartAgent Embedded Software 36
Other Commonly Used Tools 38
Ping 38
Strategies for Using Ping 39
Tips on Interpreting Ping Messages 40
Telnet 40
FTP and TFTP 40
Analyzers 41
Probes 41
Cable Testers 42

STEPS TO ACTIVELY MANAGING YOUR NETWORK

Designing Your Network for Troubleshooting 43
Positioning Your SNMP Management Station 44
Using Probes 45
Monitoring Business-critical Networks 47
FDDI Backbone Monitoring 48
Internet WAN Link Monitoring 48
Switch Management Monitoring 48
Using Telnet, Serial Line, and Modem Connections 49
Using Communications Servers 50
Setting Up Redundant Management 51
Other Tips on Network Design 52
Management Station Configuration 52
More Tips 52

Preparing Devices for Management 53
Configuring Management Parameters 53
Configuring Traps 53
Configuring Transcend Software 54
Monitoring Devices 54
Setting Thresholds and Alarms 54
Setting Thresholds in Status Watch 55
Setting Thresholds and Alarms in LANsentry Manager 55
Refining Alarm Settings 56
Setting Alarms Based on a Baseline 57
Other Tips for Setting Thresholds and Alarms 57
Knowing Your Network 58
Knowing Your Network’s Configuration 58
Site Network Map 58
Logical Connections 60
Device Configuration Information 60
Other Important Data About Your Network 61
Identifying Your Network’s Normal Behavior 62
Baselining Your Network 62
Identifying Background Noise 63

PART II NETWORK CONNECTIVITY PROBLEMS AND SOLUTIONS

MANAGER-TO-AGENT COMMUNICATION

Manager-to-Agent Communication Overview 67
Understanding the Problem 67
Identifying the Problem 67
Solving the Problem 68
Checking Management Configurations 68
Manager-to-Agent Communication Reference 69
IP Address 69
Gateway Address 69
Subnetwork Mask 69
SNMP Community Strings 69
SNMP Traps 72

FDDI CONNECTIVITY

FDDI Connectivity Overview 73
Understanding the Problem 73
Identifying the Problem 75
Solving the Problem 76
Monitoring FDDI Connections 77
Status Watch 77
Making Your FDDI Connections More Resilient 77
Implementing Dual Homing 77
Installing an Optical Bypass Unit 79
FDDI Connectivity Reference 79
Peer Wrap Condition 79
Twisted Ring Condition 80
Undesired Connection Attempt Event 80

PART III NETWORK PERFORMANCE PROBLEMS AND SOLUTIONS

BANDWIDTH UTILIZATION

Bandwidth Utilization Overview 85
Understanding the Problem 85
Identifying the Problem 85
Solving the Problem 86
Identifying Utilization Problems 86
Status Watch 86
Generating Historical Utilization Reports 88
Web Reporter 88
Bandwidth Utilization Reference 89
ATM Utilization 89
Ethernet Utilization 89
FDDI Utilization 90
Token Ring Utilization 90

BROADCAST STORMS

Broadcast Storms Overview 93
Understanding the Problem 93
Identifying the Problem 93
Solving the Problem 94
Identifying a Broadcast Storm 94
Status Watch 94
Traffix Manager 96
Disabling the Offending Interface 97
MAC Watch 97
Correcting Spanning Tree Misconfigurations 98
Device View 98
Broadcast Storms Reference 99
Broadcast Packets 99
Multicast Packets 99

DUPLICATE ADDRESSES

Duplicate Addresses Overview 101
Understanding the Problem 101
Identifying the Problem 101
Solving the Problem 101
Finding Duplicate MAC Addresses 102
MAC Watch 102
Status Watch 103
Finding Duplicate IP Addresses 103
MAC Watch 104
LANsentry Manager 104
Duplicate Addresses Reference 105
Duplicate MAC Addresses 105
Duplicate IP Addresses 106

ETHERNET PACKET LOSS

Ethernet Packet Loss Overview 107
Understanding the Problem 107
Identifying the Problem 108
Solving the Problem 108
Checking for Packet Loss 109
Status Watch 109
LANsentry Manager Network Statistics Graph 111
Device View 114
Ethernet Packet Loss Reference 115
Alignment Errors 115
Collisions 115
CRC Errors 116
Excessive Collisions 116
FCS Errors 116
Late Collisions 116
Nonstandard Ethernet Problems 117
Receive Discards 118
Too Long Errors 118
Too Short Errors 118
Transmit Discards 118

FDDI RING ERRORS

FDDI Ring Errors Overview 119
Understanding the Problem 119
Identifying the Problem 119
Solving the Problem 120
Identifying Ring Errors 121
Status Watch 121
FDDI Ring Errors Reference 121
Elasticity Buffer Error Condition 121
Frame Error Condition 121
Frames Not Copied Condition 122
Link Error Condition 122
MAC Neighbor Change Event 122

NETWORK FILE SERVER TIMEOUTS

Network File Server Timeout Overview 123
Understanding the Problem 123
Identifying the Problem 124
Solving the Problem 124
Checking for Obvious Errors 124
Ping and Telnet 124
LANsentry Manager Alarms View 124
LANsentry Manager Statistics View 125
LANsentry Manager History View 125
Reproducing the Fault While Monitoring the Network 126
LANsentry Manager Top-N Graph 126
LANsentry Manager Packet Capture 126
LANsentry Manager Packet Decode 127
MAC Watch 128
LANsentry Manager Packet Decode 128
Correcting the Fault 129
Network File Server Timeouts Reference 129
Jabbering 129
Network File System (NFS) Protocol 129

PART IV REFERENCE

SNMP IN NETWORK TROUBLESHOOTING

SNMP Operation 133
Manager/Agent Operation 133
SNMP Messages 134
Trap Reporting 134
Security 135
SNMP MIBs 136
MIB Tree 136
MIB-II 138
RMON MIB 139
RMON2 MIB 140
3Com Enterprise MIBs 141

INFORMATION RESOURCES

Books 143
URLs 144

INDEX

ABOUT THIS GUIDE

About This Guide provides an overview of this guide, describes guide conventions, tells you where to look for specific information, and lists other publications that may be useful.

This guide helps you to troubleshoot connectivity and performance problems on your network using Transcend® software and other tools.

This guide is intended for network administrators who understand networking technologies and how to integrate networking devices. You should have a working knowledge of:

■ Transmission Control Protocol/Internet Protocol (TCP/IP)

■ Simple Network Management Protocol (SNMP)

■ Network management platforms (especially HP OpenView Network Node Manager from Hewlett-Packard)

■ 3Com devices on your network

You should also be familiar with the interface and features of the Transcend management software you have installed.

With subsequent releases of Transcend management software, this guide will be updated with new troubleshooting information and additional Transcend troubleshooting tools. The most current version of this guide is on the 3Com Web site: www.3Com.com.


Finding Specific Information in This Guide

This guide, which is available online (in PDF and HTML formats) and on paper, is designed to be used online. For the online version, cross-references to other sections are indicated with links in blue, underlined text, which you can click. You can print any pages as needed.

Table 1 provides guidelines for navigating through this document.

What to Expect from This Guide

This guide demonstrates how to troubleshoot problems on your network with the help of Transcend management software and other tools. It also shows you how to use Transcend software to move beyond day-to-day troubleshooting to proactive network management.

This guide does not help you identify and correct problems with installation and use of Transcend software. For that type of troubleshooting, see:

■ The Transcend Management Software Installation Guide (for help with installation and startup problems)

■ The help or user guide for a specific application (for information about troubleshooting application problems)

This guide focuses on technologies that are important for troubleshooting your network and shows how these technologies are applied using Transcend management software. For additional information, see the resources listed in Information Resources (page 143).

Table 1 Guidelines for Finding Specific Information in This Guide

| If you are looking for | See |
| --- | --- |
| An introduction to network troubleshooting, information about troubleshooting tools, and guidelines for getting ready for management | Part I: Before Troubleshooting (page 17). Note: This part is recommended reading for users who are new to network management. |
| Specific troubleshooting scenarios that will help you solve real network problems | Part II: Network Connectivity Problems and Solutions (page 65); Part III: Network Performance Problems and Solutions (page 83) |
| Useful background information to help you with troubleshooting tasks | Part IV: Reference (page 131) |

Conventions

Table 2, Table 3, and Table 4 list conventions that are used throughout this guide.

Table 2 Notice Icons

| Icon | Notice Type | Description |
| --- | --- | --- |
| [icon] | Information note | Important features or instructions |
| [icon] | Caution | Information to alert the user to potential damage to a program, system, or device |
| [icon] | Warning | Information to alert the user to potential personal injury |

Table 3 Troubleshooting Icons

| Icon | Type | Points out |
| --- | --- | --- |
| [icon] | Troubleshooting procedure | Where a troubleshooting procedure begins |
| [icon] | Troubleshooting tip | Tips and other useful information for performing a troubleshooting task or working with a Transcend management software tool |


Table 4 Text Conventions

Convention Description

Syntax

The word “syntax” means you must evaluate the syntax provided and supply the appropriate values. Placeholders for values you must supply appear in angle brackets. Example:

Enable RIPIP by using the following syntax:

SETDefault !<port> -RIPIP CONTrol = Listen

In this example, you must supply a port number for <port>.

Commands

The word “command” means you must enter the command exactly as shown in text and press the Return or Enter key. Example:

To remove the IP address, enter the following command:

SETDefault !0 -IP NETaddr = 0.0.0.0

Screen displays

This typeface represents information as it appears on the screen.

The words “enter” and “type”

When you see the word “enter” in this guide, you must type something, and then press the Return or Enter key. Do not press the Return or Enter key when an instruction simply says “type.”

[Key] names

Key names appear in text in one of two ways:

■ Referred to by their labels, such as “the Return key” or “the Escape key”

■ Written with brackets, such as [Return] or [Esc].

If you must press two or more keys simultaneously, the key names are linked with a plus sign (+). Example:

Press [Ctrl]+[Alt]+[Del].

Menu commands and buttons

Menu commands or button names appear in italics. Example:

From the Help menu, select Contents.

Words in italicized type

Italics emphasize a point or denote new terms at the place where they are defined in the text.

Words in boldface type

Bold text denotes key features.


Related Documentation

This guide is complemented by other 3Com documents and comprehensive help systems.

Documents

The following documents are shipped with your Transcend software on the compact disc entitled Transcend Enterprise Manager Online Documentation Set for Windows NT v1.0 and Windows v.6.1:

■ Transcend Management Software Installation Guide (A paper version is also shipped with the product.)

■ Transcend Management Software Getting Started Guide (A paper version is also shipped with the product.)

■ Transcend Management Software Transcend Central User Guide

■ Transcend Management Software Status View User Guide

■ Transcend Management Software LANsentry Manager User Guide

■ Transcend Management Software ATMvLAN Manager User Guide

■ Transcend Management Software Device View User Guide

Also, see the Transcend Traffix Manager User Guide, shipped with the Traffix Manager software.

Help Systems

Each Transcend application contains a help system that describes how to use all the features of the application. Help includes window descriptions, instructions, conceptual information, and troubleshooting tips for that application.

You can access help from:

■ The Help menu in any application by selecting Help Topics (in the Help Topics window, you can view the Contents and Index)

■ A Help button in windows and dialog boxes

■ Your 3Com/Transcnd/Help directory (or the directory that you have set for your Transcend software installation)


PART I BEFORE TROUBLESHOOTING

Network Troubleshooting Overview (page 19)

Your Network Troubleshooting Toolbox (page 31)

Steps to Actively Managing Your Network (page 43)

NETWORK TROUBLESHOOTING OVERVIEW

These sections introduce you to the concepts and practice of network troubleshooting:

■ Introduction to Network Troubleshooting (page 19)

■ Network Troubleshooting Framework (page 21)

■ Troubleshooting Strategy (page 23)

Introduction to Network Troubleshooting

Network troubleshooting means recognizing and diagnosing networking problems with the goal of keeping your network running optimally. As a network administrator, your primary concern is maintaining connectivity of all devices (a process often called fault management). You also continually evaluate and improve your network’s performance. Because serious networking problems can sometimes begin as performance problems, paying attention to performance can help you address issues before they become serious.

About Connectivity Problems

Connectivity problems occur when end stations cannot communicate with other areas of your local or wide-area network. Using management tools, you can often fix a connectivity problem before the user even notices it. Connectivity problems include:

■ Loss of connectivity — Immediately correct any connectivity breaks. When users cannot access areas of your network, your organization’s effectiveness is impaired.

■ Intermittent connectivity — If connectivity is erratic, investigate the problem immediately. Although users have access to network resources some of the time, they are still facing periods of downtime. Intermittent connectivity problems could indicate that your network is on the verge of a major break.

■ Timeout problems — Timeouts cause loss of connectivity, but are often associated with poor network performance.
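The distinction between lost and intermittent connectivity can be sketched in a few lines of Python. This is only an illustration: the function name and the idea of feeding it a list of probe results (for example, one entry per Ping attempt) are assumptions made for this example, not part of any Transcend tool.

```python
from typing import Sequence


def classify_connectivity(replies: Sequence[bool]) -> str:
    """Classify a series of reachability probes (True = reply received).

    Hypothetical helper: the categories mirror the list above.
    """
    if not replies:
        raise ValueError("need at least one probe result")
    answered = sum(replies)
    if answered == 0:
        return "loss of connectivity"
    if answered < len(replies):
        return "intermittent connectivity"
    return "connected"


if __name__ == "__main__":
    # Three of four probes answered, so the link is up only some of the time.
    print(classify_connectivity([True, False, True, True]))
```

A longer probe series gives a more trustworthy answer, because a single missed reply on a busy network does not necessarily mean the connection is erratic.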


About Performance Problems

Your network has performance problems when it is not operating as effectively as it should. For example, response times may be slow, the network may not be as reliable as usual, and users may be complaining that it takes them longer to do their work. Some performance problems are intermittent, like instances of duplicate addresses. Other problems can indicate a growing strain on your network, such as consistently high utilization rates.

If you regularly check your network for performance problems, you can extend the usefulness of your existing network configuration and plan network enhancements, instead of waiting for a performance problem to adversely affect the users’ productivity.

Solving Connectivity and Performance Problems

When troubleshooting your network, you employ tools and knowledge already at your disposal. With an in-depth understanding of your network, you can use network software tools, such as Ping (page 38), and network devices, such as Analyzers (page 41), to locate problems, and then make corrections, such as swapping equipment or reconfiguring segments, based on your analysis.

Transcend® management software provides another set of tools for network troubleshooting. These tools have graphical user interfaces that make managing and troubleshooting your network easier. With Transcend Applications (page 31), you can:

■ Baseline your network’s normal status so that you can use it as a basis for comparison when troubleshooting

■ Precisely monitor network events

■ Be immediately notified of critical problems on your network, such as a device losing connectivity

■ Establish alert thresholds that warn you of potential problems so that you can correct problems before they affect your network

■ Resolve problems by disabling ports or reconfiguring devices

See Your Network Troubleshooting Toolbox (page 31) for details about each troubleshooting tool.


Network Troubleshooting Framework

The International Organization for Standardization (ISO) Open Systems Interconnection (OSI) reference model is the foundation of all network communications. This seven-layer structure provides a clear picture of how network communications work.

Protocols (rules) govern communications between the layers of a single system and among several systems. In this way, devices made by different manufacturers or using different designs can use different protocols and still be able to communicate.

Understanding how network troubleshooting fits into the framework of the OSI model will help you to identify at what layer problems are located and which type of troubleshooting tools you might want to use. For example, unreliable packet delivery could be caused by a problem with the transmission media or with a router configuration. If you are receiving high rates of FCS Errors (page 116) and Alignment Errors (page 115), which you can monitor with Status Watch, then the problem is probably located at the physical layer and not the network layer. Figure 1 shows how to troubleshoot the layers of the OSI model.
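The FCS and alignment error example can be expressed as a simple rule of thumb. The sketch below is illustrative only: the function name, the counter arguments, and especially the 1 percent limit are assumptions chosen for the example. Use a limit that reflects your own network's baseline.

```python
def physical_layer_suspected(
    fcs_errors: int,
    alignment_errors: int,
    packets_received: int,
    limit: float = 0.01,  # assumed threshold, not a 3Com recommendation
) -> bool:
    """Return True when the combined error rate points at layer 1."""
    if packets_received <= 0:
        raise ValueError("packets_received must be positive")
    error_rate = (fcs_errors + alignment_errors) / packets_received
    return error_rate > limit


if __name__ == "__main__":
    # 200 errored packets out of 10,000 is a 2 percent error rate.
    print(physical_layer_suspected(150, 50, 10_000))
```

When the function returns False, the error counters do not implicate the physical layer, and router configuration or another higher-layer cause becomes the more likely suspect.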

The data that network management tools can collect as it relates to the OSI model layers is described in Table 5.

Table 5 Network Data and the OSI Model Layers

| Layer | Data Collected | Transcend Tool Used |
| --- | --- | --- |
| Application, Presentation, Session, Transport | Protocol information and other Remote Monitoring (RMON) and RMON2 data | LANsentry Manager (page 33); Traffix Manager (page 34) (for more detail) |
| Network | Routing information | Status Watch (page 32); LANsentry® Manager (for more detail); Traffix Manager (for more detail) |
| Data Link | Traffic counts and other packet breakdowns | Status Watch; LANsentry Manager (for more detail) |
| Physical | Error counts | Status Watch |


Figure 1 OSI Reference Model and Network Troubleshooting

[Figure: the seven OSI layers, from Physical (layer 1) to Application (layer 7), with example protocols (Ethernet, Token Ring, and FDDI with their LLC, MAC, PHY, and PMD sublayers; IP and IPX; TCP and UDP; Telnet, rlogin, FTP, and SNMP managers, agents, and proxy agents) and the troubleshooting tools that apply (cable testing tools, Status Watch, LANsentry Manager, Traffix Manager, probes, and analyzers).]

For information about network troubleshooting tools, see Your Network Troubleshooting Toolbox (page 31).


Troubleshooting Strategy

How do you know when you are having a network problem? The answer to this question depends on your site’s network configuration and on your network’s normal behavior. See Knowing Your Network (page 58) for more information.

If you notice changes on your network, ask the following questions:

■ Is the change expected or unusual?

■ Has this event ever occurred before?

■ Does the change involve a device or network path for which you already have a backup solution in place?

■ Does the change interfere with vital network operations?

■ Does the change affect one or many devices or network paths?

Once you have an idea of how the change is affecting your network, you can categorize it as critical or noncritical. Both of these categories need resolution (except for changes that are one-time occurrences); the difference between the categories is the time you have to fix the problem.
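One way to turn these questions into a repeatable triage step is sketched below. The guide does not prescribe a scoring rule, so the rule here is an assumption for illustration: an unexpected change with no backup in place is treated as critical when it interferes with vital operations or affects more than one device or path. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """Answers to the triage questions for one observed change."""
    expected: bool
    backup_in_place: bool
    affects_vital_operations: bool
    devices_affected: int


def categorize(change: Change) -> str:
    """Return 'critical' or 'noncritical' under the assumed rule."""
    if change.expected:
        return "noncritical"
    if change.backup_in_place:
        return "noncritical"
    if change.affects_vital_operations or change.devices_affected > 1:
        return "critical"
    return "noncritical"


if __name__ == "__main__":
    outage = Change(expected=False, backup_in_place=False,
                    affects_vital_operations=True, devices_affected=1)
    print(categorize(outage))
```

Both categories still need resolution. The category only tells you how much time you have.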

Using a strategy for network troubleshooting helps you to approach a problem methodically and resolve it with minimal disruption to the network users. A good approach to problem resolution is:

■ Recognizing Symptoms (page 24)

■ Understanding the Problem (page 25)

■ Identifying and Testing the Cause of the Problem (page 26)

■ Solving the Problem (page 29)


Recognizing Symptoms

The first step to resolving any problem is to identify and interpret the symptoms. You may discover network problems in several ways. You may have users complaining that the network seems slow or that they cannot connect to a server. You may pass your network management station and notice that a node icon is red. Your beeper may go off and display the message: WAN connection down.

User Comments

While you can often solve networking problems before users notice a change in their environment, you invariably get feedback from your users about how the network is running, such as:

“I can’t print.”

“I can’t access the application server.”

“It’s taking me much longer to copy files across the network than it usually does.”

“I can’t log on to a remote server.”

“When I send e-mail to our other site, I get a routing error message.”

“My system freezes whenever I try to Telnet.”

Network Management Software Alerts

Network management software, as described in Your Network Troubleshooting Toolbox (page 31), can alert you to areas of your network that need attention. For example:

■ The application displays red (Warning) icons.

■ Your weekly Top-N utilization report (which provides you with a table of the top ten ports showing the highest utilization rates) shows that one port is experiencing much higher utilization levels than normal.

■ You receive an e-mail message from your network management station that the threshold for broadcast and multicast packets has been exceeded.

These signs usually provide additional information about the problem, allowing you to focus on the right area.
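To make the idea of a Top-N utilization report concrete, here is a minimal sketch that ranks ports by utilization. It does not reproduce how any Transcend application builds its reports. The port names and the dictionary input are hypothetical.

```python
from typing import Dict, List, Tuple


def top_n(utilization: Dict[str, float], n: int = 10) -> List[Tuple[str, float]]:
    """Return the n ports with the highest utilization, highest first."""
    ranked = sorted(utilization.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


if __name__ == "__main__":
    # Hypothetical utilization figures, in percent.
    sample = {"hub1/port3": 12.5, "hub1/port7": 81.0, "sw2/port1": 40.2}
    for port, percent in top_n(sample, n=2):
        print(f"{port}: {percent:.1f}%")
```

Comparing this week's ranking with earlier weeks is what reveals a port whose utilization is much higher than normal.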


Analyzing Symptoms

When confronted with a symptom, ask yourself these types of questions to narrow the location of the problem and to get more data for analysis:

■ To what degree is the network not acting normally (for example, does it now take one minute to perform a task that normally takes five seconds)?

■ On what subnetwork is the user located?

■ Is the user trying to reach a server, end station, or printer on the same subnetwork or on a different subnetwork?

■ Are many users complaining that the network is operating slowly or that a specific network application is operating slowly?

■ Are many users reporting network logon failures?

■ Are the problems intermittent? For example, some files may print with no problems, while other printing attempts generate error messages, make users lose their connections, and cause systems to freeze.

Understanding the Problem

Networks are designed to move packets of data from a transmitting device to a receiving device. When communication becomes problematic, you must determine why packets are not traveling as expected and then find a solution. The two most common causes for packets not moving reliably from source to destination are:

■ The physical connection breaks (that is, a cable is unplugged or broken).

■ A network device is not working properly and cannot send or receive some or all packets.

Network management software can easily locate and report a physical connection break (layer 1 problem). You will find it harder to determine why a network device is not working as expected, which is often related to a layer 2 or a layer 3 problem.


When trying to determine why a network device is not working properly, check first for:

■ Valid service — Is the device configured properly for the type of service it is supposed to provide? For example, has Quality of Service (QoS), the definition of the transmission parameters, been established?

■ Restricted access — Is an end station supposed to be able to connect with a specific device or is that connection restricted? For example, is a firewall set up preventing that device from accessing certain network resources?

■ Correct configuration — Is there a misconfiguration of IP address, network mask, gateway, or broadcast address? Network problems are commonly caused by misconfiguration of newly connected or configured devices. See Manager-to-Agent Communication (page 67) for more information.
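The configuration check in the last item can be partly automated. The following sketch uses Python's standard ipaddress module to test whether a device's IPv4 address, network mask, gateway, and broadcast address are consistent with one another. The function name and the wording of the messages are assumptions for this example, and the check does not account for unusual cases such as point-to-point /31 links.

```python
import ipaddress
from typing import List


def check_ip_config(address: str, netmask: str, gateway: str,
                    broadcast: str) -> List[str]:
    """Return a list of inconsistencies found in an IPv4 configuration."""
    problems = []
    # strict=False lets us derive the subnetwork from a host address.
    network = ipaddress.ip_network(f"{address}/{netmask}", strict=False)
    host = ipaddress.ip_address(address)
    if host in (network.network_address, network.broadcast_address):
        problems.append("address is the network or broadcast address")
    if ipaddress.ip_address(gateway) not in network:
        problems.append("gateway is outside the local subnetwork")
    if ipaddress.ip_address(broadcast) != network.broadcast_address:
        problems.append("broadcast address does not match the mask")
    return problems


if __name__ == "__main__":
    # The gateway below sits on a different subnetwork than the host.
    for problem in check_ip_config("192.168.1.10", "255.255.255.0",
                                   "192.168.2.1", "192.168.1.255"):
        print(problem)
```

An empty list means only that the four values agree with each other. It does not prove that they match the addressing plan for your site.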

Identifying and Testing the Cause of the Problem

After you develop a possible theory about what is causing the problem, you must test your theory. The test must conclusively prove or disprove your theory.

A general rule of troubleshooting is that, if you cannot reproduce a problem, then no problem exists unless it happens again on its own. However, if the problem is intermittent and you cannot replicate it, you can configure your network management software to catch the event in progress.

For example, with LANsentry Manager (page 33), you can set alarms and automatic packet capture filters to monitor your network and inform you when the problem occurs again. See Configuring Transcend Software (page 54) for more information.
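The general idea behind such an alarm can be sketched as a pair of thresholds: the alarm fires when a sampled value climbs to the rising threshold, and it does not fire again until the value has dropped back to the falling threshold. This is a generic illustration in the style of RMON alarms, not a description of how LANsentry Manager is implemented. The function name and sample values are assumptions.

```python
from typing import Iterable, List, Tuple


def threshold_events(samples: Iterable[float], rising: float,
                     falling: float) -> List[Tuple[int, str]]:
    """Return (sample index, event) pairs for rising and falling crossings."""
    if falling > rising:
        raise ValueError("falling threshold must not exceed rising threshold")
    events = []
    armed_for_rising = True  # a rising event may fire only while armed
    for index, value in enumerate(samples):
        if armed_for_rising and value >= rising:
            events.append((index, "rising"))
            armed_for_rising = False
        elif not armed_for_rising and value <= falling:
            events.append((index, "falling"))
            armed_for_rising = True
    return events


if __name__ == "__main__":
    # Hypothetical utilization samples, in percent.
    print(threshold_events([10, 60, 70, 55, 20, 65], rising=60, falling=30))
```

Keeping the two thresholds apart prevents a value that hovers near a single limit from generating a stream of repeated alarms, which makes an intermittent problem easier to spot.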

Although network management tools can provide a great deal of information about problems and their general location, you may still need to swap equipment or replace components of your network setup until you locate the exact trouble spot.

After testing your theory, you should either fix the problem as described in Solving the Problem (page 29) or develop another theory to check.


Sample Problem Analysis

This section illustrates the analysis phase of a typical troubleshooting incident.

On your network, a user reports that she cannot access her mail server. You need to establish two areas of information:

■ What you know — In this case, the workstation cannot communicate with the server.

■ What you do not know and need to test —

■ Can the workstation communicate with the network at all, or is the problem limited to communication with the server? Test by sending a Ping (page 38) or by connecting to other devices.

■ Is the workstation the only device that is unable to communicate with the server, or do other workstations have the same problem? Test connectivity at other workstations.

■ If other workstations cannot communicate with the server, can they communicate with other network devices? Again, test the connectivity.

The analysis process follows these steps:

1 Can the workstation communicate with any other device on the subnetwork?

■ If no, then go to test 2.

■ If yes, determine if it is only the server that is unreachable.

■ If only the server cannot be reached, this suggests a server problem. Confirm by doing test 2.

■ If other devices cannot be reached, this suggests a connectivity problem in the network. Confirm by doing test 3.

2 Can other workstations communicate with the server?

■ If no, then most likely it is a server problem. Go to test 3.

■ If yes, then the problem is that the workstation is not communicating with the subnetwork. (This situation can be caused by workstation issues or a network issue with that specific station.)

3 Can other workstations communicate with other network devices?

■ If no, then the problem is likely a network problem.

■ If yes, the problem is likely a server problem.
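The three tests above form a small decision tree, which can be written out as follows. The sketch assumes that you have already gathered the four yes or no answers. The function and parameter names are hypothetical, and the returned strings simply paraphrase the conclusions in the steps.

```python
def analyze(workstation_reaches_others: bool,
            only_server_unreachable: bool,
            others_reach_server: bool,
            others_reach_devices: bool) -> str:
    """Walk the three tests and return the most likely problem area."""

    def test_3() -> str:
        # Can other workstations communicate with other network devices?
        return "server problem" if others_reach_devices else "network problem"

    def test_2() -> str:
        # Can other workstations communicate with the server?
        if others_reach_server:
            return "workstation problem"
        return test_3()  # probably the server; test 3 confirms it

    # Test 1: can the workstation communicate with any other device?
    if not workstation_reaches_others:
        return test_2()
    if only_server_unreachable:
        return test_2()
    return test_3()


if __name__ == "__main__":
    print(analyze(workstation_reaches_others=True,
                  only_server_unreachable=True,
                  others_reach_server=False,
                  others_reach_devices=True))
```

The second argument matters only when the workstation can reach other devices. When it cannot, the function goes straight to test 2, as step 1 directs.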


When you determine whether the problem is with the server, subnetwork, or workstation, you can further analyze the problem, as follows:

■ For a problem with the server, examine whether the server is running, whether it is properly connected to the network, and whether it is configured appropriately.

■ For a problem with the subnetwork, examine any device on the path between the users and the server.

■ For a problem with the workstation, examine whether the workstation can access other network resources and whether it is configured to communicate with that particular server.

Equipment for Testing

To help identify and test the cause of problems, have available:

■ A laptop computer equipped with a CD-ROM drive (for reading the online documentation) and loaded with a terminal emulator, an IP stack, a TFTP server, and some key network management applications, such as LANsentry Manager. With the laptop computer, you can plug into any subnetwork to gather and analyze data about the segment.

■ A spare managed hub to swap for any hub that does not have management. Swapping in a managed hub allows you to quickly spot which port is generating the errors.

■ A single port probe to insert in the network if you are having a problem where you do not have management capability.

■ Console cables for each type of connector, labeled and stored in a secure place.


Solving the Problem

Many device or network problems are straightforward to resolve, but others yield misleading symptoms. If one solution does not work, continue with another.

A solution often involves:

■ Upgrading software or hardware (for example, upgrading to a new version of agent software or installing Gigabit Ethernet devices)

■ Balancing your network load by analyzing:

■ What users communicate with which servers

■ What the user traffic levels are in different segments of your network

Based on these findings, you can decide how to redistribute network traffic.

■ Adding segments to your LAN (for example, adding a new switch where utilization is continually high)

■ Replacing faulty equipment (for example, replacing a module that has port problems or replacing a network card that has a faulty jabber protection mechanism)
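
The load-balancing analysis in the list above (which users communicate with which servers, and at what traffic levels) amounts to totaling traffic per conversation. The Python sketch below assumes that you have already exported conversation records as (source, destination, bytes) tuples from a probe or a reporting tool. The record layout and the function name are assumptions made for illustration only.

```python
from collections import Counter


def top_conversations(records, n=3):
    """Total the bytes per (source, destination) pair and return the n busiest.

    `records` is an iterable of (source, destination, byte_count) tuples.
    This layout is hypothetical; adapt it to whatever your tool exports.
    """
    totals = Counter()
    for source, destination, byte_count in records:
        totals[(source, destination)] += byte_count
    return totals.most_common(n)
```

The busiest pairs are the first candidates to consider when deciding how to redistribute traffic, for example by moving a heavily used server closer to its users.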

To help solve problems, have available:

■ Spare hardware equipment (such as modules and power supplies), especially for your critical devices

■ A recent backup of your device configurations to reload if flash memory gets corrupted (which can sometimes happen when there is a power outage)

The Transcend application suite Network Admin Tools allows you to save and reload your software configurations to devices.


YOUR NETWORK TROUBLESHOOTING TOOLBOX

A robust network troubleshooting toolbox consists of items (such as network management applications, hardware devices, and other software) essential for recognizing, diagnosing, and solving networking problems. It contains:

■ Transcend Applications (page 31)

■ Network Management Platforms (page 35)

■ 3Com SmartAgent Embedded Software (page 36)

■ Other Commonly Used Tools (page 38)

Transcend Applications

Transcend® management software is optimized for managing 3Com devices and their attached networks. However, some applications, such as LANsentry® Manager, can manage any vendor’s networking equipment that complies with the Remote Monitoring (RMON) MIB.

This section describes these Transcend applications, which you can use to troubleshoot your network:

■ Transcend Central (page 32)

■ Status View (page 32)

■ LANsentry Manager (page 33)


■ Traffix Manager (page 34)

■ Device View (page 35)

This guide primarily focuses on using these applications to troubleshoot your network.


Transcend Central

Transcend Central, an asset management and device grouping application, is your starting point for understanding what your network consists of and for controlling the Transcend network management troubleshooting tools. Transcend Central is available as both a native Windows application and a Java application that you can access using a browser.

Using Transcend Central for troubleshooting, you can:

■ Display an inventory of device, module, and port information.

■ Group devices to make your troubleshooting tasks easier. Managing a collection of devices allows you to simultaneously perform the same tasks on each device in a group and to locate physical or logical problems on your network.

■ Launch Transcend applications, including some of your primary Transcend troubleshooting tools:

■ Status View (page 32), which includes Status Watch and MAC Watch (from the native version) and Web Reporter (from the Java version)


■ Device View (page 35)

Status View

The Status View applications manage 3Com devices and their attached networks. Status View applications primarily poll for MIB-II (page 138) data.

Check the Status View help to see which 3Com devices are supported by each Status View application.

Status Watch

Status Watch is a performance monitoring application that allows you to monitor the operational status of your network devices and quickly identify any problems that require your attention.


MAC Watch

MAC Watch is an address collection and discovery application that:

■ Polls managed devices for all MAC addresses

■ Polls managed devices and routers for IP addresses to perform MAC-to-IP address translation

■ Allows you to disable troublesome ports

Web Reporter

Web Reporter is a data-reporting application that runs in a World Wide Web (WWW) browser. It generates reports from data collected by the Status Watch and MAC Watch applications, allowing you to compare network statistics against a baseline.

LANsentry Manager

LANsentry Manager is a set of integrated applications that displays and explores the real-time and historical data captured by RMON-compliant devices (probes) on the network. LANsentry Manager uses SNMP polling to gather RMON and RMON2 data from the probes.

Use LANsentry Manager to:

■ Monitor current performance of network segments

■ See trends over time

■ Spot signs of current problems

■ Configure alarms to monitor for specific events

■ Capture packets and display their contents

LANsentry Manager works with any device (from 3Com or other vendors) that supports the RMON MIB (page 139) or the RMON2 MIB (page 140).


Traffix Manager

Traffix™ Manager is a performance-monitoring application that provides information about layer 3 conversations between nodes. It helps you to assess traffic patterns on your network. Traffix Manager:

■ Monitors all the stations seen by the RMON2–compliant probes deployed on your network

■ Captures and stores RMON and RMON2 data for your network’s protocols and applications

■ Displays traffic between stations in user-defined views of the network

■ Graphs current or historical data on the devices selected

■ Delivers reports for user-specified stations and time periods as postscript to your printer or as HTML to your web server

■ Launches LANsentry Manager tools for in-depth analysis of a station or a conversation between stations

You can use Traffix Manager to:

■ Know your network — Understand overall flow patterns and interactions between systems and see how your network is really being used at the application level

■ Optimize your network — Gain an insight into traffic and application usage trends to help you optimize the use and placement of current network resources and make wise decisions about capacity planning and network growth

Traffix Manager works with any device (from 3Com or other vendors) that supports the RMON2 MIB (page 140).


Device View

The Device View application is a device configuration tool. When troubleshooting your network, you can use Device View to check or change a device’s configuration and upgrade a device’s agent software. You can also use Device View to look at a device’s statistics and to set alarms.

Device View manages only 3Com devices.

See the Device View help for which 3Com devices are supported by Device View.

You can also use Transcend Upgrade Manager, which is one of the Network Admin Tools applications, to perform bulk software upgrades on devices.

Network Management Platforms

As part of your troubleshooting toolbox, your network management platform is the first place that you go to view the overall health of your network. With the platform, you can understand the logical configuration of your network and configure views of your network to understand how devices work together and the role they play in the users’ work. The network management platform that supports your Transcend software installation can provide valuable troubleshooting tools.

For example, Transcend Enterprise Manager ‘97 for Windows NT software is integrated with HP OpenView Network Node Manager Version 5.01, which runs on Windows NT Version 4.0. Network Node Manager (NNM) provides a number of functions useful in troubleshooting.

It automatically discovers all the devices on your network and creates a database that contains information about each device. NNM updates the database when new devices are added or when existing devices are modified or deleted.

Using this device database, NNM creates a default map that displays a graphical representation of your network. Each device on your network appears as a symbol (icon) on the map. You can configure views of your network to show devices on the same subnetworks or floors.


You can use NNM to monitor network performance and to diagnose network performance and connectivity problems. You can:

■ Take a snapshot of your network in its normal state. The snapshot records the state of your network at a particular instant. If you later have network performance problems, you can compare the current state of your network to the snapshot.

■ Quickly determine the connectivity status of a device by noting the color of its map symbol. Red usually means a device disconnection.

■ Diagnose connectivity problems by determining whether two devices can communicate. If they can communicate, then examine the route between the devices, the number of packets sent and lost, and the roundtrip time between the two devices.

■ Manage MIB information (for example, collecting and storing MIB data for trend analysis and graphing) using MIB queries. NNM compiles MIBs and lets you navigate up and down the MIB Tree (page 136) to retrieve MIB objects from devices. You can set thresholds for MIB data and generate events when a threshold is exceeded.

■ Configure the software to act on certain events. The Event Categories window informs you of any unexpected events (which arrive in the form of traps).

For more information, see the HP documentation shipped with your software.

3Com SmartAgent Embedded Software

Traditional SNMP management places the burden of collecting network management information on the management station. In this traditional model, software agents collect information about throughput, record errors or packet overflows, and measure performance based on established thresholds. Through a polling process, agents pass this information to a centralized network management station whenever they receive an SNMP query. Management applications then make the data useful and alert the user if there are problems on the device.

For more information about traditional SNMP management, see SNMP Operation (page 133).


As a useful companion to traditional network management methods, 3Com’s SmartAgent® technology places management intelligence into the software agent that runs within a 3Com device. This scalable solution reduces the amount of computational load on the management station and helps minimize management-related network traffic.

SmartAgent software, which uses the RMON MIB (page 139), is self-monitoring, collecting and analyzing its own statistical, analytical, and diagnostic data. In this way, you can conduct network management by exception — that is, you are only notified if a problem occurs. Management by exception is unlike traditional SNMP management, in which the management software collects all data from the device through polling.
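
To illustrate management by exception, the Python sketch below models a threshold alarm with a rising and a falling threshold, in the general style of RMON alarms. It is a simplified illustration and not SmartAgent code. The class name, the function name, and the sample values are hypothetical. The point is that the agent looks at every sample locally but reports only the threshold crossings.

```python
class ThresholdAlarm:
    """Report only threshold crossings, not every sample (simplified model)."""

    def __init__(self, rising, falling):
        if falling >= rising:
            raise ValueError("falling threshold must be below rising threshold")
        self.rising = rising
        self.falling = falling
        self.in_alarm = False

    def sample(self, value):
        """Return 'rising', 'falling', or None for one monitored sample."""
        if not self.in_alarm and value >= self.rising:
            self.in_alarm = True
            return "rising"
        if self.in_alarm and value <= self.falling:
            self.in_alarm = False
            return "falling"
        # No crossing: nothing is sent to the management station.
        return None


def events_for(samples, rising, falling):
    """Collect the (event, value) pairs that an agent would report."""
    alarm = ThresholdAlarm(rising, falling)
    events = []
    for value in samples:
        event = alarm.sample(value)
        if event is not None:
            events.append((event, value))
    return events
```

With a rising threshold of 50 and a falling threshold of 30, the samples 10, 60, 70, 20, 65 produce three reports (rising at 60, falling at 20, rising at 65) instead of five polled values. The gap between the two thresholds keeps a value that hovers near one threshold from generating a stream of events.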

SmartAgent software works autonomously and reports to the network management station whenever an exceptional network event occurs. The software can also take direct action without involving the management station. Devices that contain SmartAgent software may be able to:

■ Perform broadcast throttling to minimize the flow of broadcast traffic on your network

■ Monitor the ratio of good to bad frames

■ Switch a resilient link pair to the standby path if the primary path corrupts frames

■ Report if traffic on vital segments drops below minimum usage levels

■ Disable a port for five seconds to clear problems, and then automatically reconnect it

To configure these advanced SmartAgent software features, see your device documentation.

The Transcend applications LANsentry Manager (page 33) and Traffix Manager (page 34) make RMON data collected by the SmartAgent software more usable by summarizing and correlating important information.


Other Commonly Used Tools

These commonly used tools can also help you troubleshoot your network:

■ Network software, such as Ping (page 38), Telnet (page 40), and FTP and TFTP (page 40). You can use these applications to troubleshoot, configure and upgrade your system.

■ Network monitoring devices, such as Analyzers (page 41) and Probes (page 41).

■ Tools, such as Cable Testers (page 42), for working on physical problems.

Many of the tools discussed in this section are only useful in TCP/IP networks.

Ping

Packet Internet Groper (Ping) allows you to quickly verify the connectivity of your network devices. Ping sends a packet from one device, attempts to transmit it to a station on the network, and listens for the response to ensure that it was correctly received. You can validate connections on different parts of your network by pinging different devices:

■ A successful response tells you that a valid network path exists between your station and the remote host and that the remote host is active.

■ Slower response times than normal can tell you that the path is congested or obstructed.

■ A failed response indicates that a connection is broken somewhere; use the message to help locate the problem. See Tips on Interpreting Ping Messages (page 40).

Some network devices, like the CoreBuilder® 5000, must be configured to be able to respond to Ping messages. If you are not receiving responses from a device, first check that it is set up to be a Ping responder.


Strategies for Using Ping

Follow these strategies for using Ping:

■ Ping devices when your network is operating normally so that you have a performance baseline for comparison. See Identifying Your Network’s Normal Behavior (page 62) for more information.

■ Ping by IP address when:

■ You want to test devices on different subnetworks. This method allows you to Ping your network segments in an organized way, rather than having to remember all the hostnames and locations.

■ Your DNS server is down and your system cannot look up host names properly. You can Ping with IP addresses even if you cannot access hostname information.

■ Ping by hostname when you want to identify DNS server problems.

■ To troubleshoot problems involving large packet sizes, Ping the remote host repeatedly, increasing the packet size each time.

■ To determine if a link is erratic, perform a continuous Ping (using PING -t on Windows NT or ping -s on UNIX), which provides you with the time that it took the device to respond to each Ping.

■ To determine a route taken to a destination, use the trace route function (tracert) on Windows 95 and Windows NT.

■ Consider creating a Ping script that periodically sends a Ping to all necessary networking devices. If a Ping failure message is received, the script can perform some action to notify you of the problem, such as paging you.

■ Use the Ping functions of your network management platform. For example, in your HP OpenView map, selecting a device and right-clicking provides access to Ping functions.
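
The Ping script suggested in the strategies above could look like the following Python sketch. It calls the operating system's own ping command (with -n 1 on Windows or -c 1 on UNIX to send a single request) and treats a zero exit status as success, which is an assumption that holds for most but not all ping implementations. The function names, the default interval, and the notification hook are hypothetical; replace the notification with whatever paging mechanism you use.

```python
import platform
import subprocess
import time


def ping_once(host, timeout_s=5):
    """Send one echo request with the system ping command.

    Returns True if the command exits with status 0 (assumed to mean success).
    """
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def find_unreachable(hosts, ping=ping_once):
    """Return the hosts that did not answer, in the order given."""
    return [host for host in hosts if not ping(host)]


def watch(hosts, rounds, interval_s=60, ping=ping_once,
          notify=print, sleep=time.sleep):
    """Ping every host once per round and notify about each failure."""
    for round_number in range(rounds):
        for host in find_unreachable(hosts, ping):
            notify(f"Ping failed: {host}")
        if round_number < rounds - 1:
            sleep(interval_s)
```

Listing devices by IP address rather than by hostname keeps such a script working when the DNS server is down. Passing the ping and notify functions as parameters also lets you try the script without sending any packets.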


Tips on Interpreting Ping Messages

Use the following Ping failure messages to troubleshoot problems:

■ No reply from <destination> — Shows that the destination routes are available but that there is a problem with the destination itself.

■ <destination> is unreachable — Shows that your system does not know how to get to the destination. This message means either that routing information to a different subnetwork is unavailable or that a device on the same subnetwork is down.

■ ICMP host unreachable from gateway — Indicates that your system can transmit to the target address using a gateway, but the gateway cannot forward the packet properly because either a device is misconfigured or the gateway is down.
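
If you collect Ping output in a script, the three messages above can be mapped to a first suspect automatically. The Python sketch below matches only the message wording quoted in this section. The exact text printed by your Ping implementation may differ, and the category names are hypothetical.

```python
# Checked in order, because the gateway message also contains "unreachable".
PING_PATTERNS = [
    ("icmp host unreachable from gateway", "gateway"),
    ("is unreachable", "routing"),
    ("no reply from", "destination"),
]

PING_HINTS = {
    "gateway": "The gateway cannot forward the packet: check for a "
               "misconfigured device or a gateway that is down.",
    "routing": "Your system does not know how to reach the destination: "
               "check routing information or a down device on the subnetwork.",
    "destination": "Routes are available: check the destination itself.",
}


def classify_ping_message(message):
    """Return 'gateway', 'routing', 'destination', or None if unrecognized."""
    text = message.lower()
    for pattern, category in PING_PATTERNS:
        if pattern in text:
            return category
    return None
```

For example, `PING_HINTS[classify_ping_message("No reply from 10.1.1.5")]` returns the hint to check the destination itself.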

Telnet

Telnet, which is a login and terminal emulation program for Transmission Control Protocol/Internet Protocol (TCP/IP) networks, is a common way to communicate with an individual device. You log into the device (a remote host) and use that remote device as if it were a local terminal.

If you have an out-of-band Telnet connection established with a device, you can use Telnet to communicate with that device even if the network goes down. This feature makes Telnet one of the most frequently used network troubleshooting tools. Usually, all device statistics and configuration capabilities are accessible by using Telnet to connect to the device’s console. For more information about setting up an out-of-band connection, see Using Telnet, Serial Line, and Modem Connections (page 49).

You can invoke the Telnet application on your local system and set up a link to a Telnet process running on a remote host. You can then run a program located on a remote host as if you were working on the remote system.

FTP and TFTP

Most network devices support either the File Transfer Protocol (FTP) or the Trivial File Transfer Protocol (TFTP) for downloading updates of system software. Updating system software is often the solution to networking problems that are related to agent problems. Also, new software features may help correct a networking problem.


FTP provides flexibility and security for file transfer by:

■ Accepting many file formats, such as ASCII and binary

■ Using data compression

■ Providing Read and Write access so that you can display, create, and delete files and directories

■ Providing password protection

TFTP is a simple version of FTP that does not list directories or require passwords. TFTP only transfers files to and from a remote server.
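
The simplicity of TFTP shows in its read request, which carries only an opcode, a file name, and a transfer mode, with no field for a user name or password. The Python sketch below builds such a request as laid out in RFC 1350. The function name and the example file name are hypothetical, and the sketch builds the packet only; it does not perform a transfer.

```python
import struct

OPCODE_RRQ = 1  # read request opcode defined in RFC 1350


def build_read_request(filename, mode="octet"):
    """Return the bytes of a TFTP read request (RRQ).

    Layout: 2-byte opcode, file name, zero byte, mode, zero byte.
    """
    return (
        struct.pack("!H", OPCODE_RRQ)
        + filename.encode("ascii") + b"\x00"
        + mode.encode("ascii") + b"\x00"
    )
```

Because nothing in the request identifies the user, restrict access to the TFTP server by other means, for example by running it on your troubleshooting laptop only while you upgrade a device.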

Analyzers

An analyzer, often called a Sniffer, is a network device that collects network data on the segment to which it is attached, a process called packet capturing. Software on the device analyzes this data, a process referred to as protocol analysis. Most analyzers can interpret different types of protocol traffic, such as TCP/IP, AppleTalk, and Banyan Vines traffic.

You usually use analyzers for reactive troubleshooting — you see a problem somewhere on your network and you attach an analyzer to capture and interpret the data from that area. Analyzers are particularly helpful in identifying intermittent problems. For example, if your network backbone has experienced moments of instability that prevent users from logging onto the network, you can attach an analyzer to the backbone to capture the intermittent problems when they happen again.

Probes

Like Analyzers (page 41), a probe is a network device that collects network data. Depending on its type, a probe can collect data from multiple segments simultaneously. It stores the collected data and transfers the data to an analysis site when requested. Unlike an analyzer, a probe does not interpret data.

A probe can be either a stand-alone device or an agent in a network device. The Transcend Enterprise Monitor 500 series and the SuperStack® II Monitor series are stand-alone RMON probes. LANsentry Manager and Traffix Manager use data from probes that are compliant with the RMON MIB (page 139) or the RMON2 MIB (page 140).


You can use a probe daily to check the health of your network. The Transcend applications can interpret and report this data, alerting you to possible problems so that you can proactively manage your network. For example, an RMON2 probe can help you to analyze traffic patterns on your network. Use this data to make decisions about reconfiguring devices and end stations as needed.

Cable Testers

Cable testers check the electrical characteristics of the wiring. They are most commonly used to ensure that building wiring and cables meet Category 5, 4, and 3 standards. For example, network technologies such as Fast Ethernet require the cabling to meet Category 5 requirements. Testers are also used to find defective and broken wiring in a building.

STEPS TO ACTIVELY MANAGING YOUR NETWORK

These sections describe the steps you can take to effectively troubleshoot your network when the need arises:

■ Designing Your Network for Troubleshooting (page 43)

■ Preparing Devices for Management (page 53)

■ Configuring Transcend Software (page 54)

■ Knowing Your Network (page 58)

Designing Your Network for Troubleshooting

Designing your network for troubleshooting facilitates your access to key devices on your network when your network is experiencing connectivity or performance problems. Having adequate management access depends on these design criteria:

■ Position of the management station so that it can gather the greatest amount of network data through SNMP polling

■ Position of probes for distributed management of critical networks

■ Ability to communicate with each device even when your management station cannot access the network

The following sections discuss how to design your network with the above criteria in mind:

■ Positioning Your SNMP Management Station (page 44)

■ Using Probes (page 45)

■ Monitoring Business-critical Networks (page 47)

■ Using Telnet, Serial Line, and Modem Connections (page 49)

■ Using Communications Servers (page 50)

■ Setting Up Redundant Management (page 51)

■ Other Tips on Network Design (page 52)


Positioning Your SNMP Management Station

In a typical LAN, it is best to locate your Windows NT or UNIX management station directly off the backbone where it can conduct SNMP polling and manage network devices. The backbone is usually the optimum location for the management station because:

■ The backbone is not subject to the failures of individual subnetworked routers or switches.

■ In a partial network outage, the information collected by a backbone management station is probably more accurate than the information collected by a station in a routed subnet.

■ The backbone is usually protected with redundant power and technologies, like FDDI, that correct their own problems. This redundancy ensures that the backbone remains operational, even when other areas of the network are having problems.

■ The backbone is typically faster and has a higher bandwidth than other areas of your network, making it a more efficient location for a management station.

Make sure that the capacity of your backbone can accommodate the SNMP traffic that is generated by the management applications.

Figure 2 shows a management station that is set up at the network backbone and polling network devices.

Figure 2 SNMP Management at the Backbone

[Figure: a management workstation attached to the FDDI backbone through an FDDI card or network device. x = Network devices that you want to poll.]


Although SNMP management from the backbone is a good way to keep track of what is happening on your network, do not rely on it exclusively. Because SNMP management occurs in-band (that is, SNMP traffic shares network bandwidth with data traffic), network troubleshooting using SNMP can become a problem in these ways:

■ Very heavy data traffic or a break in the network can make it difficult or impossible for the management station to poll a device.

■ Traffic added to the network by SNMP polling may contribute to networking problems.
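
To judge how much traffic SNMP polling adds, a rough estimate is often enough. The Python sketch below multiplies the number of devices, the number of request/response exchanges per polling cycle, and the bytes per exchange, and then divides by the polling interval. The function names and every number in the example are assumptions for illustration; measure your own management traffic for real planning.

```python
def polling_load_bps(devices, exchanges_per_poll, bytes_per_exchange, interval_s):
    """Estimate the average SNMP polling load in bits per second."""
    if interval_s <= 0:
        raise ValueError("polling interval must be positive")
    total_bits = devices * exchanges_per_poll * bytes_per_exchange * 8
    return total_bits / interval_s


def share_of_link(load_bps, link_bps):
    """Return the load as a fraction of the link capacity."""
    return load_bps / link_bps
```

For example, polling 200 devices with 10 exchanges of about 200 bytes each once a minute averages roughly 53 kbps, a small share of a 100 Mbps FDDI backbone but a much larger share of a slow WAN link. The estimate is an average; polls that are sent in bursts load the network more heavily for short periods.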

Using Probes

To minimize the frequency of SNMP traffic on your network, set up one or more Probes (page 41) to collect Remote Monitoring (RMON) data from the network devices. In the distributed model illustrated in Figure 3, the management station using SNMP polling collects data from the probes rather than from all the network devices. Distributing the management over the network ensures some continued data collection even if you have network problems.

Many management applications support data from MIBs other than the RMON MIBs. For this reason, even if you are using RMON probes, some SNMP polling to individual devices from a key management station is always useful for a complete picture of your network.


Figure 3 Management at the Backbone with an Attached Probe

To extend your remote monitoring capabilities, use embedded RMON probes or roving analysis (monitoring one port for a period of time, moving on to another port for a while, and so on). However, with roving analysis, you cannot see a historical analysis of the ports because the probe is moving from one port to another.

Some probes, like 3Com’s Enterprise Monitor, are designed to support the large number of interfaces found in switched environments. The probe’s high port density supports this multi-segmented switched environment. The probe’s interfaces can also be used to monitor mirror (or copy) ports on the switch, which means that all data received and transmitted on a port is also sent to the probe.

Probes will not indicate which port has caused an error. Only a managed hub (a hub or switch with an onboard management module) can provide that level of detail. Probes and a hub’s own management module complement each other.

[Figure 3 diagram labels: FDDI backbone, probes, FDDI card or network device, network devices marked x.]


Monitoring Business-critical Networks

On business-critical networks, you need to increase your level of management by dedicating probes to the essential areas of your network. For detailed network management, it is not enough to gather raw performance figures — you need to know, at the network and conversation level, who is generating the traffic and when it is being generated. For this type of analysis, use reporting tools, such as Traffix Manager (page 34), and low-level, fault diagnostic tools, such as LANsentry Manager (page 33).

The three critical areas on this type of network that you should monitor are discussed in these sections and shown in Figure 4:

■ FDDI Backbone Monitoring (page 48)

■ Internet WAN Link Monitoring (page 48)

■ Switch Management Monitoring (page 48)

Figure 4 Probes Monitoring a Business-critical Network

[Figure: an FDDI backbone monitored by a SuperStack® II Enterprise Monitor with FDDI module, which has a direct connection to the management workstation; a SuperStack II WAN Monitor 700 on the WAN link; and a SuperStack II Enterprise Monitor providing inline monitoring on Fast Ethernet. Marked points show possible probe attachment to a switch’s roving analysis port.]


FDDI Backbone Monitoring

On the FDDI backbone, you need to continually monitor whether it is being overutilized, and, if so, by what type of traffic. By placing the SuperStack® II Enterprise Monitor with an FDDI media module directly at the backbone, you can gather utilization and host matrix information. This data is used by Traffix™ Manager to provide regular segment utilization reports and Top-N host reports. In addition, the probe provides a full range of FDDI performance statistics that can be recorded with LANsentry® Manager or reported to the management station by way of SNMP traps.

To ensure management access to the probe, provide a direct connection to the probe from your management station. This connection allows you to access probe data even if the ring is unusable and keeps management traffic off the main ring.

Internet WAN Link Monitoring

The Internet link is a concern for dedicated network management because it represents an external cost to the company that requires budgeting and because it is a possible security problem. In a way similar to monitoring the FDDI backbone, Traffix Manager reports can indicate whether you are paying for too much bandwidth or whether you need to purchase more. It can also indicate the level of use on a workgroup basis for internal billing and highlight the top sites visited by users. Similarly, you can monitor for unexpected conversations and protocols.

You also need to know the error rates on this link and whether you are experiencing congestion because of circumstances on the Internet provider’s network. LANsentry Manager can record and display these statistics and provide a detailed real-time view.

Switch Management Monitoring

The third area of interest in this network is the large number of switch-to-end station links. When detailed analysis of these devices is required (for example, if one of the ports on the network suddenly reports much higher traffic than normal), you need to track the source of the problem and decide whether you can optimize the traffic path. In this case, you need a way to view the traffic on the switch port at a conversation level.


By placing a SuperStack II Enterprise Monitor in a central location, you can easily attach it to the switches that have the most Ethernet ports as the need arises. Using the roving analysis feature of many 3Com devices, data from a monitored port can be copied to the port on the switch to which the SuperStack II Enterprise Monitor is connected. When a problem arises, roving analysis is activated for a particular switch and LANsentry Manager or Traffix Manager collects the data from the SuperStack II Enterprise Monitor. These applications can then monitor the network data for the devices connected to that switch.

Using Telnet, Serial Line, and Modem Connections

To minimize your dependency on SNMP management, set up a way to reach the console of your key networking devices. Through the console, you can often view Ethernet, FDDI, ATM, and token ring statistics, view routing and bridging tables, and check and modify device configurations.

These console connections are also key to network troubleshooting because they can be out-of-band (that is, management using a dedicated line to a device). If the network goes down, your console connections are still available.

The types of console connections include:

■ Telnet (page 40) — Out-of-band and in-band access using a network connection. For example, on 3Com’s CoreBuilder 6000 switch, using Telnet you can access the management console by using a dedicated Ethernet connection to the management module (out-of-band) and from any network attached to the device (in-band).

■ Serial line — Direct, out-of-band access using a terminal connection. This type of connection allows you to maintain your connections to a device if it reboots.

■ Modem — Remote, out-of-band access using a modem connection.

Figure 5 shows management of a device through the serial line and modem ports.


Figure 5 Out-of-band Management Using the Serial and Modem Ports

Sometimes, direct access to network devices through out-of-band management is the only way to examine a network problem. For example, if your network connections are down, you can Telnet (page 40) to one of your key routers and examine its routing table. The routing table shows the devices that the router can reach, allowing you to narrow the area of the problem. You can also Ping (page 38) from this device to further investigate which areas of the network are down.

Using Communications Servers

While out-of-band management keeps you in contact with a particular device during a network problem, it does not inform you about all the areas of your network from a central point. You must access each device separately. To make device management more central, you can set up a communications server (often called a comm server), through which you can easily manage all devices configured to that server from one management station. See Figure 6. 3Com communications servers include the C/S 2500 and C/S 3500.


Figure 6 Out-of-band Management with a Communications Server

For optimal benefit, provide two management connections to the comm server:

■ Connect the comm server to the network (an in-band connection) so that you can access the devices from anywhere on the network using reverse Telnet.

■ Connect your management workstation directly to one of the serial ports of the comm server (an out-of-band connection) so that you can access the devices when the network is down.

Setting Up Redundant Management

To add redundancy to your management strategy so that a management station can always access the backbone, set up a “buddy system” of management. In this setup, management applications (often different ones) run on separate management workstations, which are connected to the backbone through separate network devices or by using a network card.

This setup allows the management workstations to check on each other and report any problems with their attached network devices. The buddy system also provides a backup management connection to your network if one management station loses connectivity.



Other Tips on Network Design

This section provides some additional tips for designing your network for troubleshooting.

Management Station Configuration

■ Configure the management station to run without any network connection — including NIS, NFS, and DNS lookups. Because your management station should run with all network cables pulled out, do not install Transcend® Enterprise Manager on a network drive.

■ Have more than one interface available on the management station, an arrangement called dual hosting. Connect vital probes to the second interface to create a private monitoring LAN (one without regular network traffic) on which network problems will not impair communication.

■ Do not give the management station privileges on the network, such as the ability to log in with no passwords (rsh). Hackers can easily spot management stations.

■ Connect the management station to an uninterruptible power supply (UPS) to protect the station from events that interrupt power, such as blackouts, power surges, and brownouts.

■ Regularly back up the management station.

More Tips

■ Provide remote access through a modem to the management station so that you can keep track of your network’s activity remotely.

■ Use managed hubs to narrow which link is causing an error. Even if your budget does not allow you to manage all hubs, strategically install one managed hub for error tracking.

■ Keep copies of all configurations on a file server and on the management station. See Knowing Your Network’s Configuration (page 58) for more information.


Preparing Devices for Management

Before Transcend management software (or any other management software) can work with the devices on your network, make sure that the devices are configured appropriately for management communication.

If you have a problem establishing a management connection, see Manager-to-Agent Communication (page 67) for more information about solving this problem.

Configuring Management Parameters

Before attempting to manage the supported devices with Transcend applications, check these prerequisites for each device:

■ The device must have an IP hostname and IP address. When you are managing modular devices, use the IP address of the device’s management module, if one is present.

■ The device and your network management platform must use the same SNMP read (get) and write (set) community strings. See Security (page 135) for more information about community strings.

Configuring Traps

SNMP trap reporting means that management agents send unsolicited messages to management stations, relaying events that have occurred at the device, such as a system reboot. Traps include an object identifier (OID) that passes integer values or strings that are decoded by the management software.

Configure each device to send the SNMP traps that are required by the network management applications to the management station. You can set SNMP traps using the device’s console program or Device View (page 35), a Transcend application.

For more information about traps, see Trap Reporting (page 134).


Configuring Transcend Software

Configure your Transcend management software to monitor your network most effectively, identify when thresholds are exceeded, and alert you to problems or potential problems.

Monitoring Devices

For Transcend management software to monitor your devices:

■ Use your platform’s autodiscovery feature to detect all manageable devices on your network and to create a network map. Transcend applications use this data for their operation. For Transcend applications to recognize 3Com devices from the platform, the device icons must be 3Com device icons.

■ Add 3Com devices to an inventory database using Transcend Central (page 32). You can import devices from your platform’s database. The Transcend Central database defines the devices to be managed by many of the Transcend applications and allows you to group devices for easier management and faster troubleshooting.

■ Create logical and physical groups of the devices in your database using Transcend Central.

Setting Thresholds and Alarms

Thresholds are the upper and lower limits that you set for the network conditions and events that you are monitoring with network management software. When these limits are exceeded, the management software reports that a threshold has been exceeded (usually by icons changing color). Alarms add to this reporting functionality by allowing you to configure an action to be taken (such as disabling ports or sending e-mail) if the threshold is exceeded.

Alarms are powerful tools that, when configured correctly, can be used to prevent inconvenient or even catastrophic network failures. The main advantage of alarms is that you can specify at exactly which point an action should take place, and you can tailor them to suit the normal operating conditions of your network.

The first time you are using the Transcend applications, you should use the default thresholds to see how they apply to your network. After assessing your network’s normal behavior, you can adjust the thresholds and alarms to make them more useful for your particular network. See Identifying Your Network’s Normal Behavior (page 62) for more information.


Setting Thresholds in Status Watch

You can set a rising threshold and a falling threshold for most Status Watch (page 32) tools. The rising threshold triggers a status severity change when the threshold is exceeded. The falling threshold causes a status severity change when the excessive activity or abnormal condition has returned to normal.

For example, your Ethernet network may normally accommodate 50 percent utilization. If it exceeds 60 percent for an extended time, your network slows considerably. You want to know when and for how long your network exceeds the threshold of 60 percent.

Status Watch also allows you to set status severity levels for events in the FDDI Status and the System Status tools. You can set the severity level setting for the conditions and events. For some conditions and events, you can specify severity level settings for the individual values of the variables.

For more information about setting thresholds in Status Watch, see the Status View User Guide and Status Watch help.

Setting Thresholds and Alarms in LANsentry Manager

Much of network management involves monitoring for specific network events. LANsentry Manager (page 33) lets you specify these events in advance and then lets you know as soon as they occur. This process is known as setting alarms.

Consider the following examples of alarms:

■ Example A: The router on your network, which is capable of forwarding data at 3,000 packets per second (pps), appears to have problems forwarding at the top of its specification. You configure an alarm to tell you as soon as the traffic approaches this rate.

■ Example B: Your network is running at 1,400 pps. Typically, a Cyclic Redundancy Check (CRC) error rate of more than 1 percent of network traffic is considered excessive. You configure an alarm to tell you as soon as the CRC error rate climbs above the threshold of 14 pps.

Over time, you build up a library of alarms tailored to your own network.
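The arithmetic behind Example B can be sketched as follows. This is only an illustration: the function name and the 1 percent default are assumptions made for the example, not LANsentry Manager settings.

```python
def error_rate_threshold(traffic_pps, max_error_percent=1):
    """Return the error rate (in pps) equal to a percentage of total traffic."""
    if traffic_pps < 0 or not 0 <= max_error_percent <= 100:
        raise ValueError("traffic must be >= 0 and percent between 0 and 100")
    return traffic_pps * max_error_percent / 100


# Example B: 1 percent of 1,400 pps
print(error_rate_threshold(1400))  # 14.0
```

If your normal traffic rate changes, recalculate the threshold rather than reusing the old value.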


Refining Alarm Settings

You can refine your alarms for more exact monitoring by setting the hysteresis zone and defining Start and Stop events.

Hysteresis zone

For more control over the conditions that trigger an alarm, you can also specify a hysteresis zone around the specified value. The hysteresis zone ensures that alarms are not triggered due to small fluctuations around the threshold value. The hysteresis zone is the area where a value has fallen below the upper threshold (also called the rising threshold) but has not yet reached a lower threshold (also called the falling threshold). After a rising threshold generates an alarm, the value must fall below the falling threshold before another alarm is generated. For alarms set on falling thresholds, the rule is reversed. An example of this alarm mechanism is shown in Figure 5-7.

Figure 5-7 Alarm Triggering Mechanism
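The rising-threshold rule described above can be sketched in a few lines. This is a minimal illustration, not the LANsentry Manager implementation; the function name is hypothetical, and treating "exceeded" as strictly greater than the threshold is an assumption.

```python
def rising_alarms(samples, rising, falling):
    """Return the indexes of the samples that generate a rising-threshold alarm.

    After an alarm, no further alarm is generated until a sample falls
    below the falling threshold (the value has left the hysteresis zone).
    """
    if falling > rising:
        raise ValueError("falling threshold must not exceed rising threshold")
    armed = True          # True when the next crossing may raise an alarm
    alarms = []
    for index, value in enumerate(samples):
        if armed and value > rising:
            alarms.append(index)
            armed = False
        elif not armed and value < falling:
            armed = True  # value dropped below the falling threshold
    return alarms


# Utilization samples in percent; rising threshold 60, falling threshold 50.
print(rising_alarms([50, 62, 58, 61, 45, 63], rising=60, falling=50))  # [1, 5]
```

The sample at index 3 (61 percent) does not generate a second alarm because the value never fell below 50 percent after the first alarm.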



Stop and Start events

As well as using alarms on their own, in LANsentry Manager, you can use them as Start or Stop events when capturing packets with the Capture application. In Example A, you could start capturing all packets transmitted by the router whenever the traffic rate rose above 2,800 packets per second and then stop capturing when it dropped below this level. By combining alarms and the Capture application, you have powerful troubleshooting capabilities.
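As a rough sketch of how Start and Stop events bracket a capture, the following example finds the sample intervals during which a capture would run. The function name and the list-of-samples input are assumptions made for illustration; the Capture application itself is configured through LANsentry Manager, not through code like this.

```python
def capture_windows(rates_pps, threshold_pps=2800):
    """Return (start_index, stop_index) pairs for each capture window.

    A window starts when the rate rises above the threshold and stops
    when the rate drops below it. stop_index is None if the rate is
    still above the threshold at the end of the samples.
    """
    windows = []
    start = None
    for index, rate in enumerate(rates_pps):
        if start is None and rate > threshold_pps:
            start = index                     # Start event
        elif start is not None and rate < threshold_pps:
            windows.append((start, index))    # Stop event
            start = None
    if start is not None:
        windows.append((start, None))
    return windows


print(capture_windows([2500, 2850, 2900, 2700, 2600, 2950]))
# [(1, 3), (5, None)]
```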

For more information about setting alarms with LANsentry Manager, see the LANsentry Manager User Guide and help.

Setting Alarms Based on a Baseline

When you have determined the baselines of your network’s normal activity with Traffix Manager (page 34) and LANsentry Manager, you can use the Alarms View in LANsentry Manager to set alarms that trigger when network activity deviates from the baseline. See Baselining Your Network (page 62) for more information.

When determining the baseline for setting utilization alarms, use either of these approaches:

■ Set alarms for any peaks in network utilization — Pick a baseline value that covers most of your network traffic, ignoring any obvious one-time-only peaks. For example, as users log on at the start of the day, you would expect to see a large peak in network utilization. The alarm is triggered whenever such peaks occur.

■ Set alarms for exceptional peaks in network utilization — Pick a baseline value that covers the highest possible peak seen when service was still provided. The alarm is triggered at levels higher than this peak, alerting you to the most serious utilization on your network.

When you choose the baseline for error alarms, pick the lowest possible baseline so that the alarm is triggered by any peaks.
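The two approaches to choosing a utilization baseline can be sketched as follows. Both function names are hypothetical, and using a nearest-rank percentile for "covers most of your network traffic" is an assumption; choose whatever coverage suits your network.

```python
def typical_baseline(samples, coverage_percent=95):
    """Value at or below which coverage_percent of the samples fall
    (nearest-rank percentile), so that occasional peaks are ignored."""
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)
    # Integer ceiling of coverage_percent * n / 100, with a minimum rank of 1.
    rank = max(1, -(-coverage_percent * len(ordered) // 100))
    return ordered[rank - 1]


def exceptional_baseline(samples):
    """Highest peak seen while service was still provided."""
    return max(samples)


# Utilization samples in percent, collected while the network was healthy.
utilization = [22, 25, 31, 28, 35, 40, 38, 30, 27, 85]
print(typical_baseline(utilization, coverage_percent=90))  # 40
print(exceptional_baseline(utilization))                   # 85
```

With the first value, the alarm triggers on any peak above normal traffic. With the second, it triggers only on utilization higher than anything seen so far.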

Other Tips for Setting Thresholds and Alarms

For SNMP traps to be effective, their thresholds must be high enough that they do not generate false alarms. On the other hand, high thresholds also mean that small numbers of errors can escape detection.


A very small error rate that regularly occurs (such as four per minute) can cause major problems with protocols with large retry delays. For example, some MAC-level errors corrupt packets so that a switch does not forward them.

Knowing Your Network

You can better troubleshoot the problems on your network by:

■ Knowing Your Network’s Configuration (page 58)

■ Identifying Your Network’s Normal Behavior (page 62)

Knowing Your Network’s Configuration

Part of understanding how your network normally looks is in knowing its physical and logical configuration. You should know which devices are on your network, how the devices are configured, which devices are attached to the backbone, and which devices connect your network to the outside world (WAN). To keep track of your network’s configuration, gather the following information:

■ Site Network Map (page 58)

■ Logical Connections (page 60)

■ Device Configuration Information (page 60)

■ Other Important Data About Your Network (page 61)

This data, when kept up to date, is extremely helpful for locating information when you experience network or device problems.

Site Network Map

A network map helps you to:

■ Know exactly where each device is physically located

■ Easily identify the users and applications that are affected by a problem

■ Systematically check each part of your network for problems

You can create a network map using any drawing or flow chart application. Store your network map online. In addition, make sure that you always have a current version on paper in case you cannot access the online version. Figure 8 shows a simple example of a network map consisting of 3Com devices.


Figure 8 Example of a Site Network Map

Consider including the following information on your network map:

■ Location of important devices and workgroups (by floor, building, or area)

■ Location of the network backbone, data center, and wiring closets, as appropriate for your network

■ Location of your network management stations

■ Location and type of remote connections

■ IP subnetwork addresses for all managed switches and hubs

■ Other subnetwork addresses, such as Novell IPX and AppleTalk, if appropriate for your network


■ Type of media (by actual name, such as 10BASE-T, or by grouping, such as Ethernet), which can be shown with callouts, colors, line weights, or line styles

■ Virtual workgroups, which can be shown with colors or shaded areas

■ Redundant links, which can be shown with gray or dashed lines

■ Types of network applications used in different areas of your network

■ Types of end stations connected to the switches and hubs

Complete data about end station connections is usually too detailed for the network map. Instead, maintain tables that detail which end stations are connected to which devices, along with the MAC addresses of each end station. Use tools like MAC Watch (page 33) to generate the MAC address information.

Logical Connections

With the advent of virtual LANs (VLANs), you need to know how your devices are connected logically as well as physically. For example, if you have connected two devices through the same physical switch, you can assume that they can communicate with each other. However, the devices could be in separate VLANs that restrict their communication.

Knowing the setup of your VLANs can help you to quickly narrow the scope of a problem to a VLAN instead of to a network connection.

The Transcend application ATMvLAN Manager allows you to view the logical makeup of your network. Depending on the complexity of your network and VLAN configurations, you can use colors to show the VLANs graphically on your network map.

Device Configuration Information

Maintain online and paper copies of device configuration information. Make sure that all online data is stored with your site’s regular data backup. If your site does not have a backup system, you should copy the information onto a backup disc (CD, Zip disk, and the like) and store it offsite.

The Transcend Network Admin Tools includes applications that allow you to save device configurations.


Follow these guidelines for saving configuration information:

■ Because the easiest way to recover a device’s configuration is to use FTP or TFTP, save the configuration settings of each device that supports this method of uploading.

■ For other devices, Telnet in and save the session (which contains configuration details) to a file. If you cannot print the configuration of a device, then create a quick “rebuild” guide that explains the quickest way to configure the device from a fresh install.

■ For devices that store information to diskette, store this data as part of your site’s regular backup.

■ For routers and other important devices with text configuration files, store this data online in a revision control system. Keep the most recent version on paper. Keep previous versions.

■ For PCs, keep a recovery disk for each type of PC. For any device that you use as a server, store all startup scripts and copies of registries.

Other Important Data About Your Network

For a complete picture of your network, have the following information available:

■ All passwords — Store passwords in a safe place. Keep previous passwords in case you restore a device to a previous software version and need to use the old password that was valid for that version.

■ Device inventory — The inventory allows you to see the device type, IP address, ports, MAC addresses, and attached devices at a glance. Software tools, such as Transcend Central (page 32), can help you keep track of the 3Com devices on your network. Using Transcend Central, you can group devices by type and location and have this information on hand for troubleshooting.

■ MAC address-to-port number list — If your hubs or switches are not managed, you must keep a list of the MAC addresses that correlate to the ports on your hubs and switches. Using MAC Watch (page 33), generate a paper copy of this list and keep it on hand. The list is crucial for deciphering captured packets.


Do not rely on getting an up-to-date list of MAC addresses from MAC Watch because the network may be down, which prevents SNMP polling. If the network is down, an exported copy of MAC Watch’s data is invaluable (online or on paper).

■ Log book — Document your interactions, no matter how trivial, with each device that is critical to your network’s operation (that is, routers, remote access devices, security servers). For example, document that you noticed a fan making noise one morning (which is probably not a problem). Your note may help you to identify why a device is over temperature a week later (because the fan stopped working).

■ Change control — Maintain a change control system for all critical systems. Permanently store change control records.

■ Contact details — Store, online and on paper, the details of all support contracts, support numbers, engineer details, and telephone and fax numbers.
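If you keep the MAC address-to-port list online as well as on paper, a simple lookup table makes it quicker to identify the source of a captured packet. The sketch below is an assumption about how you might store the list; the device names, port numbers, and addresses are made up, and nothing here reflects the MAC Watch export format.

```python
def normalize_mac(mac):
    """Return a MAC address as lowercase, colon-separated octets."""
    digits = "".join(ch for ch in mac.lower() if ch in "0123456789abcdef")
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def build_lookup(entries):
    """Map each MAC address to the (device, port) it is attached to."""
    return {normalize_mac(mac): (device, port) for device, port, mac in entries}


# Hypothetical entries: (device, port, MAC address)
lookup = build_lookup([
    ("hub-floor1", 3, "00-00-5E-00-53-01"),
    ("hub-floor2", 7, "00:00:5e:00:53:02"),
])
print(lookup[normalize_mac("00:00:5E:00:53:01")])  # ('hub-floor1', 3)
```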

To be ready to remotely access your network, store the network maps, contact details, and important network addresses at the homes of those who support the network.

Identifying Your Network’s Normal Behavior

By monitoring your network over a long period, you begin to understand its normal behavior. You begin to see a pattern in the traffic flow, such as which servers are typically accessed, when peak usage times occur, and so on. If you are familiar with your network when it is fully operational, you will be more effective at troubleshooting problems that arise.

Baselining Your Network

You can use a baseline analysis, an important indicator of overall network health, to identify problems. A baseline can serve as a useful reference of network traffic during normal operation, which you can then compare to captured network traffic while troubleshooting network problems. A baseline analysis speeds the process of isolating network problems.

By running tests on a healthy network, you compile “normal” data to compare against the results you get when your network is in trouble. For example, Ping (page 38) each node to discover how long it typically takes you to receive a response from devices on your network.
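One way to use the Ping results is to record response times while the network is healthy and compare later measurements against them. The following sketch assumes that you have already collected round-trip times in milliseconds; the function names and the rule of flagging anything more than three standard deviations above the mean are illustrative assumptions, not a 3Com recommendation.

```python
from statistics import mean, stdev


def rtt_baseline(samples_ms):
    """Return (mean, standard deviation) of healthy round-trip times."""
    if len(samples_ms) < 2:
        raise ValueError("at least two samples are required")
    return mean(samples_ms), stdev(samples_ms)


def is_abnormal(rtt_ms, baseline, tolerance=3.0):
    """True if a response time is well above the healthy baseline."""
    average, spread = baseline
    return rtt_ms > average + tolerance * spread


# Round-trip times (ms) recorded for one device on a healthy network.
baseline = rtt_baseline([2.0, 2.2, 1.9, 2.1, 2.3])
print(is_abnormal(2.4, baseline))  # False
print(is_abnormal(9.0, baseline))  # True
```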


Applications such as Status Watch (page 32), LANsentry Manager (page 33), and Traffix Manager (page 34) allow you to collect days and weeks of data and set a baseline for comparison. Through the reporting mechanisms in the following list, you can continuously assess the data from your network and ensure that its performance is optimal:

■ Web Reporter (page 33) generates daily or weekly reports from data collected by Status Watch.

■ Traffix Manager generates weekly reports from collected data and calculates the baselines for you. Set up Utilization History and Error History reports with data resolution set to Weekly.

■ LANsentry Manager History View generates daily utilization graphs, sampled every 30 minutes, for each day over one week. Use these graphs to calculate your network baselines manually.

Identifying Background Noise

Know your network’s background noise so that you can recognize “real” data flow. For example, one evening after everyone is gone, no backups are running, and most nodes are on, analyze the traffic on your network using the Traffix Manager (page 34) application. The traffic you see is mostly broadcast and multicast packets. Any errors you see are the result of very faulty devices (trace). This traffic is the background noise of your network — traffic that occurs for little value. If background noise is high, redesign your network.
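To put a number on background noise from a list of captured destination addresses, you can count the share of frames sent to broadcast or multicast addresses. This sketch assumes that the addresses are written in canonical (Ethernet-style) notation, where the low-order bit of the first octet marks a group address; the function names are hypothetical.

```python
def classify_destination(mac):
    """Classify a destination MAC address as broadcast, multicast, or unicast."""
    octets = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(octets) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    if octets == b"\xff" * 6:
        return "broadcast"
    if octets[0] & 0x01:      # group bit set in the first octet
        return "multicast"
    return "unicast"


def noise_fraction(destination_macs):
    """Fraction of frames addressed to broadcast or multicast destinations."""
    if not destination_macs:
        return 0.0
    noisy = sum(1 for mac in destination_macs
                if classify_destination(mac) != "unicast")
    return noisy / len(destination_macs)


print(classify_destination("ff:ff:ff:ff:ff:ff"))  # broadcast
print(classify_destination("01:00:5e:00:00:01"))  # multicast
print(classify_destination("00:00:5e:00:53:01"))  # unicast
```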


II
NETWORK CONNECTIVITY PROBLEMS AND SOLUTIONS
Manager-to-Agent Communication (page 67)

FDDI Connectivity (page 73)

MANAGER-TO-AGENT COMMUNICATION

Use these sections to identify and correct problems with communication between the management station and network devices:

■ Manager-to-Agent Communication Overview (page 67)

■ Checking Management Configurations (page 68)

See Manager-to-Agent Communication Reference (page 69) for additional conceptual and problem analysis detail.

Manager-to-Agent Communication Overview

If your management workstation cannot communicate with devices on the network, check your management configurations for the devices and your management station configurations.

For more information about SNMP, see SNMP Operation (page 133).

Understanding the Problem

If your management station or the devices you are managing are incorrectly configured for management, then the management station, which includes your Transcend applications, cannot perform autodiscovery, polling, or SNMP Get and Set requests on the device.

If you have not configured port connections (including a possible out-of-band serial or modem connection) and created an administration password for access to the management agent, then do so before continuing.

Identifying the Problem

Check your management configurations for any device that your management station cannot reach. Also check your management station setup. If you can reach a device but are not receiving traps, first check the trap configurations (the trap destination address and the traps configured to send). See Configuring Traps (page 53) for more information.


Solving the Problem

Either modify the device configurations so that they match those of your management station, or modify the management station so that it matches the configurations of your devices.

Checking Management Configurations

Check the following management configurations:

■ IP Address (page 69)

■ Gateway Address (page 69)

■ Subnetwork Mask (page 69)

■ SNMP Community Strings (page 69)

■ SNMP Traps (page 72)

How these parameters are configured can vary by device. For more information, see the user guide provided with each device.

Follow these steps:

1 Ping the device.

If the device is accessible by Ping, then its IP address is valid and you may have a problem with the SNMP setup. Go to step 5.

If the device is not accessible by Ping, then there is a problem with either the path or the IP address.

2 To test the IP address, Telnet into the device using an out-of-band connection.

If Telnet works, then your IP address is working.

3 If Telnet does not work, connect to the device’s console using a serial line connection and check your device’s IP address setting.

If your management station is on a separate subnetwork, make sure that the gateway address and subnetwork mask are set correctly.

4 Using a management application, perform an SNMP Get and an SNMP Set (that is, try to poll the device or change a configuration using management software).

5 If you cannot reach the device using SNMP, access the device’s console and make sure that your SNMP community strings and traps are set correctly.

You can access the console using Telnet, a serial connection, or a web management interface.
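The order of checks in the preceding steps can be summarized as a small decision function. This is only a sketch of the decision order: you supply the results of each test yourself, and the function name and message text are assumptions made for illustration.

```python
TEST_TELNET = "Telnet into the device to test its IP address"
CHECK_IP = ("Connect through the serial line and check the IP address, "
            "gateway address, and subnetwork mask")
TEST_SNMP = "Perform an SNMP Get and an SNMP Set from a management application"
CHECK_SNMP = "Check the SNMP community strings and traps at the device console"
WORKING = "Management communication is working"


def next_action(ping_ok, telnet_ok=None, snmp_ok=None):
    """Return the next check to make, given the results known so far.

    Pass None for a test that you have not yet run.
    """
    if not ping_ok:
        if telnet_ok is None:
            return TEST_TELNET
        if not telnet_ok:
            return CHECK_IP
    # The device is reachable by Ping or Telnet, so test SNMP next.
    if snmp_ok is None:
        return TEST_SNMP
    if not snmp_ok:
        return CHECK_SNMP
    return WORKING


print(next_action(ping_ok=False))
print(next_action(ping_ok=True, snmp_ok=False))
```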


Manager-to-Agent Communication Reference

This section explains terms relevant to management configurations and provides additional conceptual and problem analysis detail.

IP Address

Devices use IP addresses to communicate (that is, to talk to the management station and to perform routing tasks). Assign a unique IP address to each device in your network. Choose each IP address from the range of addresses assigned to your organization.

Gateway Address

The default gateway IP address identifies the gateway (for example, a router) that receives and forwards those packets whose addresses are unknown to the local network. The agent uses the default gateway address when sending alert packets to the management workstation on a network other than the local network. Assign the gateway address on each device.

Subnetwork Mask

The subnetwork mask is a 32-bit number in the same format and representation as IP addresses. The subnetwork mask determines which bits in the IP address are interpreted as the network number, which as the subnetwork number, and which as the host number. Each IP address bit that corresponds to a 1 in the subnetwork mask is in the network/subnetwork part of the address. This group of numbers is also called the Network ID. Each IP address bit that corresponds to a 0 is in the host part of the IP address.

The default subnetwork mask depends on the address class (A, B, or C) of the IP address. The subnetwork mask that you set on the device must match the subnetwork mask that you used when you configured your TCP/IP software.
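To see how a subnetwork mask divides an address, and to check whether a device and a management station are on the same subnetwork (and therefore whether the gateway address matters), you can use a short script. This sketch uses Python's standard ipaddress module; the function names and the addresses are illustrative assumptions.

```python
import ipaddress


def split_address(ip, mask):
    """Return (Network ID, host number) for an IP address and subnetwork mask."""
    interface = ipaddress.ip_interface(f"{ip}/{mask}")
    network_id = interface.network.network_address
    # Bits that correspond to a 0 in the mask form the host part.
    host_number = int(interface.ip) & int(interface.network.hostmask)
    return str(network_id), host_number


def same_subnetwork(device_ip, station_ip, mask):
    """True if both addresses have the same Network ID under the mask."""
    device_net = ipaddress.ip_interface(f"{device_ip}/{mask}").network
    station_net = ipaddress.ip_interface(f"{station_ip}/{mask}").network
    return device_net == station_net


print(split_address("138.6.12.20", "255.255.255.0"))   # ('138.6.12.0', 20)
print(same_subnetwork("138.6.12.20", "138.6.13.5", "255.255.255.0"))  # False
print(same_subnetwork("138.6.12.20", "138.6.13.5", "255.255.0.0"))    # True
```

When the function returns False, traffic between the device and the management station must pass through a gateway, so the gateway address on the device must be set correctly.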

SNMP Community Strings

An SNMP community string is a text string that acts as a password. It is used to authenticate messages sent between the management station (the SNMP manager) and the device (the SNMP agent). The community string is included in every packet transmitted between the SNMP manager and the SNMP agent.


After receiving an SNMP request, the SNMP agent compares the community string in the request to the community strings that are configured for the agent. The requests are valid under these circumstances:

■ Only SNMP Get and Get-next requests are valid if the community string in the request matches the read-only community.

■ SNMP Get, Get-next, and Set requests are valid if the community string in the request matches the agent’s read-write community.

For more information about SNMP requests and community strings, see SNMP Operation (page 133).
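The comparison that the agent makes can be sketched as follows. This is a simplified model of the rules listed above, not the code of any 3Com agent; the function name is hypothetical, and the community strings in the example are placeholders.

```python
READ_REQUESTS = {"get", "get-next"}
WRITE_REQUESTS = {"set"}


def request_is_valid(request_type, community, read_only, read_write):
    """Apply the community string rules to one SNMP request.

    Community strings are compared exactly, so the check is case-sensitive.
    """
    request_type = request_type.lower()
    if community == read_write:
        return request_type in READ_REQUESTS | WRITE_REQUESTS
    if community == read_only:
        return request_type in READ_REQUESTS
    return False


agent = {"read_only": "public", "read_write": "private"}
print(request_is_valid("get", "public", **agent))   # True
print(request_is_valid("set", "public", **agent))   # False
print(request_is_valid("set", "private", **agent))  # True
print(request_is_valid("get", "Public", **agent))   # False (case differs)
```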

A device is difficult or impossible to manage if:

■ The device is not using the correct community strings.

■ Your management station uses community strings that do not match those of the devices it manages.

If community strings do not match, either modify the community string at the device so it is the string expected by the management station, or modify the management station so that it uses the device’s community strings.

Table 6 lists the default community strings for some common 3Com devices. Modify these default strings when you install a new device. You can use Device View (page 35) to change community strings of most 3Com devices.

Community string settings are case-sensitive for all devices.


Although community strings are SNMP’s way to secure management communication, these strings appear in the SNMP packet header unencrypted and are visible if the packet data is analyzed. For this reason, change community string settings frequently to improve management security.

Table 6 Default Security Settings for Common 3Com Devices

Device                                            Read-Only Community   Read-Write Community

AccessBuilder® 7000 BRI Card and PRI Card         public                private
CoreBuilder™ 2500                                 public                private
NETBuilder®                                       public                *
NETBuilder II®                                    public                *
OfficeConnect® products                           monitor               security
OfficeConnect® Remote 511, 521, and 531           public                private
Online™ hubs                                      public                *
SuperStack® II Desktop Switch                     public                security
SuperStack® II Hub TR Network Management Module   public                private
SuperStack® II Enterprise Monitor                 public                admin
SuperStack® II PS Hub                             monitor               security
SuperStack® II Switch 1000                        public                security
SuperStack® II Switch 2000 TR                     public                private
SuperStack® II Switch 2200                        public                private
SuperStack® II Switch 3000 (all variations)       public                security
SuperStack® II Token Ring Monitor                 public                admin
SuperStack® II WAN 700 Monitor                    public                admin
Transcend® Enterprise Monitor 540                 public                admin

* By default, no setting exists or is needed for initial access on this device.




SNMP Traps

If your platform or management applications do not report events for some devices, then SNMP trap reporting may not be configured correctly for those devices.

If you find that traps are overwhelming your management workstation, you can filter out (disable) some common traps so that the management station does not receive them. Most devices allow you to select which traps to send to a management station IP address.

You can use Device View (page 35) to change the trap reporting configuration of most 3Com devices.

See Trap Reporting (page 134) for more information.

FDDI CONNECTIVITY

Use these sections to identify and correct connectivity errors on an FDDI ring:

■ FDDI Connectivity Overview (page 73)

■ Monitoring FDDI Connections (page 77)

See FDDI Connectivity Reference (page 79) for additional conceptual and problem analysis detail.

FDDI Connectivity Overview

FDDI, a self-healing technology, automatically corrects ring faults to maintain connectivity throughout most of the network. However, you should monitor your FDDI connections for wrapped rings and other problems with ring connectivity.


As shown in Figure 9, in a thru FDDI LAN, no stations on the trunk ring have a Configuration State (SMTConfigurationState) of Wrap or Isolated. However, users complaining about network performance may have lost connectivity to other stations on the network because the FDDI network is wrapped or segmented.

Figure 9 Thru Ring



Wrapped ring

By monitoring the Peer Wrap Condition (page 79), you can see when the Configuration State changes. In a wrapped ring (Figure 10), two stations on the LAN are in a wrapped Configuration State. This condition may or may not affect the connectivity of certain stations. Although operational, your network may have a cabling problem or a problem with a link.

Figure 10 Wrapped LAN

Segmented ring

In a segmented ring (Figure 11), more than two stations are wrapped on the trunk ring. Although this mode of operation is a valid FDDI LAN configuration, your LAN is probably experiencing a degraded or degrading condition.

Figure 11 Segmented Ring
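Given the Configuration State reported by each station on the trunk ring, the thru, wrapped, and segmented cases can be told apart by counting wrapped stations. The sketch below is an illustration only: the function name is hypothetical, the state strings (thru, wrap_A, wrap_B, isolated) are assumed spellings, and treating a single wrapped station as a wrapped ring is an assumption.

```python
def classify_ring(station_states):
    """Classify a trunk ring from the Configuration State of its stations."""
    states = [state.lower() for state in station_states]
    wrapped = sum(1 for state in states if state.startswith("wrap"))
    isolated = sum(1 for state in states if state == "isolated")
    if wrapped == 0:
        # A thru ring has no station in a Wrap or Isolated state.
        return "thru" if isolated == 0 else "indeterminate"
    # Two wrapped stations: wrapped ring. More than two: segmented ring.
    return "wrapped" if wrapped <= 2 else "segmented"


print(classify_ring(["thru", "thru", "thru", "thru"]))          # thru
print(classify_ring(["wrap_A", "thru", "thru", "wrap_B"]))      # wrapped
print(classify_ring(["wrap_B", "wrap_A", "wrap_B", "wrap_A"]))  # segmented
```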

When a network connection has excessively high link errors, Station Management (SMT) shuts down the connection and tries to bring it up again. A dual-attachment trunk ring station with an A or B connection that is shut down is one of the wrap points in the network. See Making Your FDDI Connections More Resilient (page 77) for information about keeping a dual-attachment station connection from wrapping.

Isolated station Sometimes a network wraps a particular station out of the ring. Stations on either side of a problem station can be wrapped. This effectively isolates the station or links that have problems, as shown in Figure 12.

Figure 12 Wrapped Ring with Isolated Station

If a ring was already wrapped when a network wraps a station out of the ring, then a segmented ring results, as shown in Figure 13.

Figure 13 Segmented Ring with Isolated Stations
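The thru, wrapped, and segmented states described above can be sketched as a small classifier. This is an illustrative sketch only, not part of the Transcend software: the function name and the state spellings ("thru", "wrap_A", "wrap_B", "isolated", taken from the figure labels) are assumptions, and any ring with one or two wrapped stations, or with an isolated station but no more than two wrap points, is treated here as wrapped.

```python
def classify_ring(config_states):
    """Classify a trunk ring from the Configuration State of each station.

    config_states: iterable of strings such as "thru", "wrap_A", "wrap_B",
    or "isolated" (hypothetical spellings based on the figure labels).
    """
    states = [s.lower() for s in config_states]
    wrapped = sum(1 for s in states if s.startswith("wrap"))
    isolated = sum(1 for s in states if s == "isolated")

    if wrapped == 0 and isolated == 0:
        return "thru"        # no station is in a Wrap or Isolated state
    if wrapped > 2:
        return "segmented"   # more than two stations are wrapped
    return "wrapped"         # assumption: everything in between


if __name__ == "__main__":
    print(classify_ring(["thru", "thru", "thru", "thru"]))
    print(classify_ring(["wrap_B", "thru", "isolated", "wrap_A", "thru"]))
    print(classify_ring(["wrap_B", "wrap_A", "isolated", "wrap_B", "wrap_A"]))
```

A segmented result is the one to act on first, because stations in different segments can no longer reach each other.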

Twisted ring In a twisted ring, an A port is connected to an A port and a B port is connected to a B port instead of the normal A-to-B connections. A twisted ring, which always has two twist points (stations), can exist in either a Thru or Wrap state. You can monitor the Twisted Ring Condition (page 80) and Undesired Connection Attempt Event (page 80) for evidence of twisted ring and other connection problems.


To identify the problem, follow this process:

1 At the FDDI LAN level, verify that your network is up.

If the network is up, the FDDI ring may be segmented, and therefore an FDDI station or an Ethernet station on an Ethernet link may have lost connectivity to other nodes on the network.

2 Determine if a ring is in a Thru, Wrap, or Segmented state. See Monitoring FDDI Connections (page 77) for more information.

If the FDDI ring is segmented or wrapped, look for a problem with a link somewhere in the network or for a nonfunctioning node on your trunk ring. If the ring is up and is not segmented, or if it is segmented but you still have connectivity to the stations in question, move to a more specific level in your network.

3 Determine if the poorly performing station is an Ethernet or FDDI station.

If the problem is an FDDI station, find out if it is congested (that is, if the station is so busy that it cannot accept all the network traffic directed to it) by checking its Bandwidth Utilization (page 85). You must also determine if the station has a high frame error rate by checking the FDDI Ring Errors (page 119).

If the problem is an Ethernet station, check for congestion by examining Ethernet Packet Loss (page 107) and Bandwidth Utilization (page 85).

Solving the Problem Identify the station that is causing the disconnection and take the appropriate steps:

■ If the disconnection is caused by a wrapped ring, then fix the hardware or cabling problem at that station.

■ If the station is congested, you have a device problem rather than a network problem. For example, if the congested station is a file server and every other machine on the network is retrieving and saving files using that server, consider upgrading your server or adding additional servers to the network. A variety of devices from different vendors may be communicating on an FDDI or Ethernet network; some are faster and more capable, and some slower and more prone to congestion.

■ If the station is an Ethernet station attached to an Ethernet segment, reevaluate the setup of your Ethernet network and make some changes to improve its performance.

You can also make FDDI connections more resilient by implementing dual homing or installing an Optical Bypass Unit (OBU) where FDDI connections are prone to fail. See Making Your FDDI Connections More Resilient (page 77) for more information.

Monitoring FDDI Connections

Monitor your FDDI devices for Warning or Critical alerts in the FDDI Status tool.

Status Watch Use Status Watch to identify these FDDI connectivity errors:

■ Peer Wrap Condition (page 79)

■ Twisted Ring Condition (page 80)

■ Undesired Connection Attempt Event (page 80)

Follow these steps:

1 In the Device area, select the device that is located where you suspect an FDDI ring connectivity problem.

2 Monitor the FDDI Status tool for the currently selected device.

Here are some pointers for monitoring:

■ If the Peer Wrap Configuration State variable is Isolated, the device is not connected to the FDDI trunk ring. If you intend the device to remain isolated, this indication is not a serious condition. However, if the device is supposed to be connected on a trunk ring, a serious problem may exist. The device is no longer transmitting packets to the larger trunk ring.

■ If the Peer Wrap flag (SMTPeerWrapFlag) is set, the device is one of the wrap points. The cause of the wrapped ring is somewhere in the portion of the network between the two stations reporting the peer wrap condition.

Making Your FDDI Connections More Resilient

When devices are removed from an FDDI ring, there is a break in the fiber path that causes the ring to wrap until the ring is made whole again. To prevent the break in the FDDI connection, you can implement dual homing or install an Optical Bypass Unit (OBU).

Implementing Dual Homing

When the operation of a dual attachment node is critical to your network, dual homing adds reliability by providing a backup connection if the primary link fails. Because a dual attachment station (DAS) has two attachments to the FDDI ring (A-to-M and B-to-M), it is possible to use one of them as a “standby” link if the active link fails. Using dual homing, only one of the two attachments is active at a time. In this sense, a DAS acts as if it is a single attachment station (SAS) by using its A port as the standby link.

Through SMT, a DAS can be dual homed to the same concentrator or, more commonly, to two concentrators. This arrangement provides a more stable trunk ring of concentrators. If one concentrator fails, the DAS enables the standby link to another concentrator to become the active link. See Figure 14.

If the station is a dual path or dual path/dual MAC station, the dual-homed station can be configured in one of two ways:

■ With both links active

■ With one link active and the other withheld as a backup that becomes active only if the active link fails

Figure 14 Dual Homing Configuration (figure labels: Concentrator #1, Concentrator #2, FDDI dual ring, SAS, SAS server, dual-homed switch, active link, standby link set by SMT configuration policy)

Installing an Optical Bypass Unit

You can insert an Optical Bypass Unit (OBU) into the FDDI ring as if it were a node and then plug your device into it. To use an OBU, your device needs an optical bypass interface. This interface lets the bypass know if your device is still on the ring or not. See Figure 15.

If your device is removed or if it fails, the bypass unit acts by diverting the optical path away from your device, keeping the ring whole. You can use a bypass on devices that are prone to failure or are likely to be removed often, such as diagnostic equipment.

Figure 15 Optical Bypass Unit Configuration

FDDI Connectivity Reference

This section explains terms relevant to FDDI connectivity and provides additional conceptual and problem analysis detail.

Peer Wrap Condition A Peer Wrap (wrapped ring) condition occurs when a dual-attachment station detects a fault (often a lost connection) and reconfigures the network by wrapping the dual trunk rings to form a single ring. Normally, the two stations that are adjacent to the fault wrap to maintain full connectivity. However, if a second fault occurs before the first is repaired, the network partitions itself into two or more rings and stations lose connectivity.

When a station reports a Peer Wrap condition, locate and repair the problem that caused the station to wrap the rings. Potential causes include faulty FDDI port hardware, faulty cables or connectors, unplugged connectors, and powered-down stations. You can expect to find the cause of the problem somewhere in the portion of the network between the two stations reporting the Peer Wrap condition.

Twisted Ring Condition

A Twisted Ring condition occurs when certain undesirable connection types exist.

See Table 7 for more information. Although similar to the Undesired Connection Attempt, the Twisted Ring condition provides specific Station Management (SMT) and port information for diagnosis.

Undesired Connection Attempt Event

An Undesired Connection Attempt event occurs when a port tries to connect to another port of a type that may result in an undesirable network topology. Whether the connection attempt is successful depends on the current setting of the station’s connection policies.

Connections that the FDDI standard defines as undesirable are described in Table 7. The managed devices may or may not permit these connections, depending on their FDDI station configurations.

Table 7 Undesirable Connection Types

Connection Type*    Reason That the Connection Is Undesirable
A-A                 Twisted primary and secondary rings
A-S                 A wrapped ring
B-B                 Twisted primary and secondary rings
B-S                 A wrapped ring
S-A                 A wrapped ring
S-B                 A wrapped ring
M-M                 A tree of rings topology (illegal connection)

* SuperStack II Monitor series and Transcend Enterprise Monitor series use type 1 to represent connection type A and type 2 to represent connection type B.

FDDI connections that create valid topologies are described in Table 8.

Table 8 Valid Connection Types

Connection Type     Reason That the Connection Is Valid
A-B                 A normal trunk peer connection
A-M                 A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M.
B-A                 A normal trunk ring peer connection
B-M                 A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M.
S-S                 A single ring of two slave stations
S-M                 A normal tree connection
M-A                 A tree connection that provides possible redundancy
M-B                 A tree connection that provides possible redundancy
M-S                 A normal tree connection
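Tables 7 and 8 together cover every pairing of the four port types (A, B, S, and M), so they can be expressed as a lookup. The following is a minimal sketch under that assumption; the function and constant names are hypothetical and do not correspond to any Transcend or SMT interface. As noted above, whether a device actually refuses an undesirable connection depends on its connection policies.

```python
PORT_TYPES = {"A", "B", "S", "M"}

# Pairings from Table 7 (local port, remote port) -> reason.
UNDESIRABLE = {
    ("A", "A"): "twisted primary and secondary rings",
    ("B", "B"): "twisted primary and secondary rings",
    ("A", "S"): "wrapped ring",
    ("B", "S"): "wrapped ring",
    ("S", "A"): "wrapped ring",
    ("S", "B"): "wrapped ring",
    ("M", "M"): "tree of rings topology (illegal connection)",
}


def check_connection(local_port, remote_port):
    """Return (is_valid, reason) for a local-to-remote port pairing."""
    pair = (local_port.upper(), remote_port.upper())
    if not set(pair) <= PORT_TYPES:
        raise ValueError(f"unknown port type in {pair}")
    if pair in UNDESIRABLE:
        return False, UNDESIRABLE[pair]
    # Every remaining pairing is listed in Table 8.
    return True, "valid topology"


if __name__ == "__main__":
    for pair in [("A", "B"), ("A", "A"), ("M", "M"), ("S", "M")]:
        print(pair, check_connection(*pair))
```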


III
NETWORK PERFORMANCE PROBLEMS AND SOLUTIONS
Bandwidth Utilization (page 85)

Broadcast Storms (page 93)

Duplicate Addresses (page 101)

Ethernet Packet Loss (page 107)

FDDI Ring Errors (page 119)

Network File Server Timeouts (page 123)

BANDWIDTH UTILIZATION

Use these sections to identify and correct problems that are indicated by changes in bandwidth utilization:

■ Bandwidth Utilization Overview (page 85)

■ Identifying Utilization Problems (page 86)

■ Generating Historical Utilization Reports (page 88)

See Bandwidth Utilization Reference (page 89) for additional conceptual and problem analysis detail.

Bandwidth Utilization Overview

To determine how your network is operating on a day-to-day basis, examine its bandwidth utilization. Changes in utilization can alert you to actual or potential problems.


Utilization varies depending on the media and on how your network is configured and used. Become aware of your network’s normal behavior so that you know when to examine utilization levels more closely. See the section Identifying Your Network’s Normal Behavior (page 62) for more information.


Check the current utilization of all media on your network (Ethernet, FDDI, token ring, and ATM) to see whether utilization rates are exceeding thresholds that you have set in the management software.

On most networks, utilization gradually increases as users begin using more network resources, such as electronic mail, network printing, and file sharing. You should be concerned with utilization peaks that do not follow this pattern of use.

The process of identifying immediate utilization levels is discussed in Identifying Utilization Problems (page 86).

Examine your network’s historical trends (its typical utilization over time) and note whether your network has experienced a gradual or sudden increase in utilization. Here are ways to assess trends:

■ A sharp increase in utilization indicates an abnormal condition. Check the area of the network where the increase occurred. For example, a device could be causing Broadcast Storms (page 93).

■ A sustained high or low level of utilization indicates an increasing or decreasing load on your network. Balance your network’s load by adding or redistributing segments.

The process of identifying historical trends is discussed in Generating Historical Utilization Reports (page 88).

A high rate of utilization can lead to high rates of packet fragments. As utilization exceeds the alarm threshold, packet fragments become common. See Ethernet Packet Loss (page 107) for information about identifying when packet fragments are occurring.

Solving the Problem Narrow the utilization problem to the ports that have excessively high or low utilization. If necessary, redistribute network traffic accordingly by segmenting your LAN with a bridge, router, or switch.

Sometimes, a hardware problem can cause abnormal utilization rates. In this case, see Ethernet Packet Loss (page 107) and FDDI Ring Errors (page 119) for troubleshooting information.

Identifying Utilization Problems

First, check utilization levels on your current network. Try to locate the segments that are experiencing high or low utilization levels.

When checking for bandwidth utilization, use Status Watch, which collects MIB-II data using SNMP polling.

Status Watch The Status Watch utilization tools monitor the amount of traffic on network segments and show how the bandwidth is being allocated. These tools provide a real-time report of utilization data on the selected device or group of devices.

Table 9 shows the Status Watch tools that monitor your network’s utilization.

Follow these steps:

1 Select the group that you suspect has a performance problem.

The color-coded icons (for groups, devices, and tools) can guide you to the areas of your network that are experiencing problems. For example, red icons mean you should examine a problem immediately. If a group is red, click the group to see all devices in that group, and locate the device that is red. Select the device and examine which tool icons are also red.

2 Select the utilization tool icon that indicates a problem.

The tool report displays all the interfaces in the group or device. Check to see which interfaces reflect high rates. If some interfaces are experiencing excessively high utilization rates, check for broadcast storms and other conditions that cause packet loss, as described in:

■ Broadcast Storms (page 93)

■ Ethernet Packet Loss (page 107)

Table 9 Status Watch Tools Used for Examining Utilization

Tool Icon                           What It Tells You
Ethernet Utilization (page 89)      The aggregate percentage of utilization of an Ethernet segment (calculated by tracking the receive and transmit utilizations of Ethernet ports)
FDDI Utilization (page 90)          The percentage of utilization of the primary, secondary, and local FDDI rings (calculated by tracking the percent utilization of FDDI ports)
Token Ring Utilization (page 90)    The percentage of utilization of a token ring segment
ATM Utilization (page 89)           The percentage of utilization of supported ATM interfaces


If an increase in utilization causes an increase in Error rates (other than collisions), look for MAC and physical layer problems (for example, faulty network cards, illegal repeater hops, and cables that are too long). Additionally, monitor Collision rates as utilization rises, looking for large increases that are out of the ordinary. In particular, check devices on the segment for Excessive Collisions (page 116). Although Collisions are normal, Excessive Collisions mean network delays.

Generating Historical Utilization Reports

Use real-time utilization data to see how your network is operating at the moment. To gauge whether utilization is at a critical point for your network, look at historical data. Use Web Reporter to generate a historical report that shows the utilization trends for a specific set of devices on your network.

Web Reporter Using Web Reporter, you can save days and weeks of network data, save a baseline week of “normal” data, and determine when utilization is constantly high.

Follow these steps:

1 Access Web Reporter. For the uniform resource locator (URL), use the directory where you installed Transcend® Enterprise Manager on the Web.

2 Generate a weekly Historical report to see utilization rates for the whole week.

3 Compare your weekly Historical report to a baseline of historical utilization data.

See Identifying Your Network’s Normal Behavior (page 62) and the Web Reporter help for more information about setting a baseline.

Bandwidth Utilization Reference

This section explains terms relevant to bandwidth utilization and provides additional conceptual and problem analysis detail.

ATM Utilization Over time, if a port has experienced increased, sustained utilization levels, then you need to balance the load of your ATM segments.

Status Watch calculates ATM utilization in this way:

greater of (in_util, out_util)

where:

■ in_util = ( ((rate of ifInOctets)*8) / ((linespeed)*0.9875) )*100

■ out_util = ( ((rate of ifOutOctets)*8) / ((linespeed)*0.9875) )*100

The 8 factor converts octets to bits.

The 0.9875 factor offsets the interframe gap.
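The ATM calculation above can be sketched in a few lines. This is an illustrative sketch, not the Status Watch implementation: the function names are hypothetical, linespeed is assumed to be in bits per second, and the "rate" of an octet counter is assumed to be the difference between two polls divided by the polling interval, with allowance for a 32-bit counter wrapping between polls.

```python
COUNTER32_MODULUS = 2 ** 32  # MIB-II octet counters are 32-bit and wrap


def octet_rate(counter_then, counter_now, interval_s):
    """Octets per second between two polls of ifInOctets or ifOutOctets."""
    delta = (counter_now - counter_then) % COUNTER32_MODULUS
    return delta / interval_s


def direction_util(octets_per_sec, linespeed_bps):
    """Percent utilization for one direction, per the formula above."""
    return (octets_per_sec * 8) / (linespeed_bps * 0.9875) * 100


def atm_util(in_rate, out_rate, linespeed_bps):
    """ATM utilization: the greater of the inbound and outbound figures."""
    return max(direction_util(in_rate, linespeed_bps),
               direction_util(out_rate, linespeed_bps))


if __name__ == "__main__":
    # Example figures are made up for illustration.
    in_rate = octet_rate(1_000_000, 51_000_000, 10)   # 5,000,000 octets/s
    out_rate = octet_rate(2_000_000, 12_000_000, 10)  # 1,000,000 octets/s
    print(round(atm_util(in_rate, out_rate, 155_520_000), 2))
```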

Ethernet Utilization Over time, if a port has experienced increased utilization levels (often a sustained level of over 40 percent), then you need to rebalance the load of your Ethernet segments.

Typically, the larger the frame size, the more utilization your network can accommodate.

You may recognize utilization problems with certain protocols before other protocols because some protocols have less tolerance for high rates of traffic. When utilization becomes a problem also depends on users. For example, you may allow higher utilization rates on an engineering network, yet you want greater bandwidth availability on a financial network where data delivery is critical.

As general guidelines, your network is healthy in these conditions:

■ Utilization is running up to 15 percent most of the time.

■ Utilization is peaking at 30 to 35 percent for a few seconds at a time, with large gaps of time between peaks.


■ Utilization is peaking at 60 percent for a few seconds, with large gaps of time between peaks. However, in this instance, locate the reason for the peak. Determine if the problem will get worse or if you can isolate it.

If the 30 percent utilization peaks start occurring very close together, your network will start showing signs of degraded performance.

Status Watch calculates Ethernet utilization in this way:

in_util + out_util

where:

■ in_util = ( ((rate of ifInOctets)*8) / ((linespeed)*0.9875) )*100

■ out_util = ( ((rate of ifOutOctets)*8) / ((linespeed)*0.9875) )*100


The 0.9875 factor offsets the interframe gap.
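Unlike the ATM figure, the Ethernet figure adds the two directions together. A minimal sketch follows, assuming that the octet rates have already been converted to octets per second and that linespeed is in bits per second; the function names are hypothetical.

```python
def direction_util(octets_per_sec, linespeed_bps):
    """Percent utilization for one direction, per the formula above."""
    return (octets_per_sec * 8) / (linespeed_bps * 0.9875) * 100


def ethernet_util(in_rate, out_rate, linespeed_bps):
    """Ethernet utilization: inbound plus outbound utilization."""
    return (direction_util(in_rate, linespeed_bps)
            + direction_util(out_rate, linespeed_bps))


if __name__ == "__main__":
    # Made-up example: a 10 Mbps port receiving 150,000 octets/s
    # and transmitting 100,000 octets/s.
    util = ethernet_util(150_000, 100_000, 10_000_000)
    print(f"{util:.1f} percent")
    if util > 40:
        print("sustained levels like this suggest rebalancing the segment")
```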

FDDI Utilization FDDI accepts utilization levels that are equivalent to its rated speed. Unlike Ethernet, FDDI does not have delays and problems caused by collisions.

The best way to determine high FDDI utilization is to know the normal capacity of your FDDI network. Generally, if your FDDI network is consistently reporting 90 percent or more utilization, plan to balance the load on your network.

Status Watch calculates FDDI utilization in this way:

(1 - (delta(token_count)*latency) / delta(time) )*100
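One way to read this formula: each token counted during the interval represents one ring latency of idle time, so the utilization is whatever fraction of the interval remains. The sketch below follows that reading. The function name is hypothetical, and the units (latency and interval both in seconds) are an assumption, because the formula above does not state them.

```python
def fddi_util(token_count_then, token_count_now, ring_latency_s, interval_s):
    """Percent FDDI utilization from two token-count samples."""
    delta_tokens = token_count_now - token_count_then
    idle_fraction = (delta_tokens * ring_latency_s) / interval_s
    return (1 - idle_fraction) * 100


if __name__ == "__main__":
    # Made-up example: 20,000 tokens seen in 10 seconds on a ring
    # with an assumed latency of 100 microseconds.
    print(round(fddi_util(0, 20_000, 100e-6, 10.0), 1))
```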

Token Ring Utilization

Token ring media accepts utilization levels equivalent to its rated speed. Unlike Ethernet, token ring does not have delays and problems caused by collisions.

The best way to determine high token ring utilization is to know the normal capacity of your token ring network. Generally, if your token ring network is consistently reporting 90 percent or more utilization, plan to balance the load on your network.

Status Watch calculates token ring utilization in this way:

( ( rate*8) / (speed) )*100

where:

■ rate = ifInOctets / delta(time)

■ speed = line speed of 4 or 16 Mbps
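The token ring calculation can be sketched as follows. The function name is hypothetical. The sketch assumes that ifInOctets in the rate term means the change in the counter between two polls, and that the line speed of 4 or 16 is converted to bits per second before dividing.

```python
def token_ring_util(in_octets_then, in_octets_now, interval_s, speed_mbps):
    """Percent token ring utilization from two ifInOctets samples."""
    if speed_mbps not in (4, 16):
        raise ValueError("token ring line speed must be 4 or 16 Mbps")
    rate = (in_octets_now - in_octets_then) / interval_s  # octets per second
    speed_bps = speed_mbps * 1_000_000
    return (rate * 8) / speed_bps * 100


if __name__ == "__main__":
    # Made-up example: 18,000,000 octets received in 10 seconds at 16 Mbps.
    print(token_ring_util(0, 18_000_000, 10, 16))
```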



BROADCAST STORMS

Use these sections to identify and eliminate broadcast storms:

■ Broadcast Storms Overview (page 93)

■ Identifying a Broadcast Storm (page 94)

■ Disabling the Offending Interface (page 97)

■ Correcting Spanning Tree Misconfigurations (page 98)

See Broadcast Storms Reference (page 99) for additional conceptual and problem analysis detail.

Broadcast Storms Overview

A broadcast storm means that your network is overwhelmed with constant broadcast or multicast traffic. Broadcast storms can eventually lead to a complete loss of network connectivity as the packets proliferate.

Some devices, like the CoreBuilder™ 2500 and CoreBuilder 3500, have firewall protection against broadcast storms. If a certain broadcast transmit threshold is reached, the port drops all broadcast traffic. Firewalls are one of the best ways to protect your network against broadcast storms. Check to see if your network devices support this functionality.


Broadcast Packets (page 99) and Multicast Packets (page 99) are a normal part of your network’s operation. To recognize a storm, you must be able to identify when broadcast and multicast traffic is abnormal for your network.


You may start to suspect that a broadcast storm is occurring when your network response times become extremely slow and network operations are timing out. As a broadcast storm progresses, users cannot log in to servers or access e-mail. As the storm worsens, the network becomes unusable.

When your network is operating normally, monitor the percentage of broadcast and multicast traffic. You can then use this data as a baseline to determine when broadcast and multicast traffic is too high.

The process of identifying the problem is discussed in Identifying a Broadcast Storm (page 94).

Solving the Problem Storms can occur if network equipment is faulty or configured incorrectly, if the Spanning Tree Protocol is not implemented correctly, or if poorly designed programs that generate broadcast or multicast traffic are used.

The process for solving the problem is discussed in these sections:

■ Disabling the Offending Interface (page 97)

■ Correcting Spanning Tree Misconfigurations (page 98)

Identifying a Broadcast Storm

When identifying broadcast storms, use the following applications:

■ Status Watch (page 94) — To recognize when broadcast and multicast traffic exceeds the normal rates for your network

■ Traffix Manager (page 96) — To monitor all broadcast traffic over time

Status Watch Using the Status Watch tools in Table 10, you can identify when and where a broadcast storm is occurring.

For the Broadcast Receive and Broadcast Transmit tools, if the value for receive utilization is less than ten percent, Status Watch ignores the high rate of broadcast traffic. This way, a broadcast problem is not falsely triggered in Status Watch for a segment on which a majority of traffic is spanning tree or Routing Information Protocol (RIP) packets.

Follow these steps:

1 Use the Summary View window to check the Broadcast Transmit tool and Broadcast Receive tool to see if any thresholds have been exceeded on your monitored devices.

These tools work together in this way:

■ If the thresholds for both the Broadcast Transmit tool and Broadcast Receive tool are exceeded on a device, then a broadcast storm is occurring on your network, and this device is receiving and transmitting the broadcast traffic.

■ If the threshold for the Broadcast Receive tool is exceeded but the Broadcast Transmit tool reports normal data on a device, then a broadcast storm is probably occurring on the segment attached to the interface reporting the excessive traffic, but this device might have a filter (such as a multicast packet firewall) that prevents the storm from propagating.

■ If the threshold for the Broadcast Transmit tool is exceeded but the Broadcast Receive tool reports normal data on a device, then the device is responsible for the broadcast storm.
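The three cases above, together with the ten percent rule, can be sketched as a small decision function. This is an illustrative sketch only: the function names, parameters, and result labels are hypothetical and are not Status Watch interfaces.

```python
def threshold_exceeded(broadcast_pct, threshold_pct, receive_util_pct):
    """True if a broadcast tool should report an exceeded threshold.

    Mirrors the rule above: when receive utilization is below ten
    percent, a high broadcast percentage is ignored.
    """
    if receive_util_pct < 10:
        return False
    return broadcast_pct > threshold_pct


def interpret(receive_exceeded, transmit_exceeded):
    """Map the state of the two broadcast tools to a likely diagnosis."""
    if receive_exceeded and transmit_exceeded:
        return "storm_passing_through_device"
    if receive_exceeded:
        return "storm_on_attached_segment"   # device may be filtering it
    if transmit_exceeded:
        return "device_is_source"
    return "normal"


if __name__ == "__main__":
    rx = threshold_exceeded(45, 20, receive_util_pct=35)
    tx = threshold_exceeded(5, 20, receive_util_pct=35)
    print(interpret(rx, tx))
```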

Table 10 Status Watch Tools Used for Identifying Broadcast Storms

Tool Icon                         Description
Broadcast Receive                 The percentage of broadcast and multicast traffic received on an Ethernet or token ring port
Broadcast Transmit                The percentage of broadcast and multicast traffic transmitted from an Ethernet or token ring port
Ethernet Utilization (page 89)    The aggregate percentage of utilization of an Ethernet segment as calculated by tracking the receive and transmit utilizations of Ethernet ports

2 Check the ATM, Ethernet, FDDI, and token ring utilization tools to see if their reported rates are abnormally high. Abnormally high rates mean that traffic is flooding the network. See Bandwidth Utilization (page 85) for more information.

3 Check for Ethernet Packet Loss (page 107) as an additional indicator that a broadcast storm is occurring. Increased collisions occur as the network becomes saturated.

After baselining your normal network, you can set the Broadcast Transmit tool and Broadcast Receive tool thresholds to alert you when broadcast and multicast traffic is heavier than normal.

Traffix Manager Using Traffix Manager, you can monitor all broadcast traffic to identify exactly which devices are generating broadcast traffic.

Follow these steps:

1 Using the Select Database Traffic to Load dialog box, retrieve data to the Map using the 6-Hourly or Hourly data resolution.

Finer resolutions take longer to load from the database to the Map. However, they are more suitable for in-depth analysis of network traffic than the daily or weekly resolutions. For quicker retrieval of finer resolution data, consider selecting a shorter time range.

2 Launch the Protocol Selection dialog box and set all protocols to appear as Other:

a Click Clear All to deselect all protocols.

b Click the Other square to select it without selecting any child protocols.

c Set the Protocol Filter Mode to Unselected protocols are added to parent.

3 In the Map, select MAC Labels to display devices by their MAC addresses.

4 Use the Find Objects tool to locate the broadcast MAC address ff:ff:ff:ff:ff:ff and select it from the Object List or Map.

5 From the Display menu, select Show Conversations To and From to display all traffic going to and from the broadcast MAC address. Set the Map all objects button to Map connected objects.

6 To create a list of the devices that are sending broadcast traffic to the broadcast address, right-click the Traffix group and select Visible Device List….

7 To generate a baseline of broadcast traffic:

a Right-click the Traffix root group and select Protocol Distribution.

b Select Packets and the timeline graph format.

8 To generate a list of the Top-N sources of broadcast traffic:

a Right-click the Traffix root group and select Child Top N.

b Select Packets and the bar graph format.

c Set Top N to an appropriate value.

The Top-N list can indicate what interface is starting the storm and what interfaces are propagating the storm.
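Outside Traffix Manager, the idea behind the Top-N list can be sketched with ordinary conversation records. The record layout (source MAC, destination MAC, packet count) and the function name are assumptions made for illustration; they are not a Traffix Manager export format.

```python
from collections import Counter

BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"


def top_broadcast_sources(conversations, n):
    """Return the n sources that sent the most packets to the broadcast MAC.

    conversations: iterable of (src_mac, dst_mac, packets) tuples.
    """
    totals = Counter()
    for src, dst, packets in conversations:
        if dst.lower() == BROADCAST_MAC:
            totals[src.lower()] += packets
    return totals.most_common(n)


if __name__ == "__main__":
    sample = [  # made-up data
        ("00:a0:24:11:22:33", "ff:ff:ff:ff:ff:ff", 120),
        ("00:a0:24:44:55:66", "ff:ff:ff:ff:ff:ff", 98000),
        ("00:a0:24:11:22:33", "00:a0:24:44:55:66", 5000),
        ("00:a0:24:77:88:99", "FF:FF:FF:FF:FF:FF", 300),
    ]
    for mac, packets in top_broadcast_sources(sample, 2):
        print(mac, packets)
```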

Disabling the Offending Interface

Because broadcast storms can ultimately cause your whole network to go down, you must immediately take action to disable the offending interface. You can enable the interface again after you have corrected the problem.

MAC Watch Use MAC Watch to track down and disable the interface causing the broadcast storm.

Follow these steps:

1 In the MAC Watch Global Find window, enter the address of the interface that seems to be receiving the broadcast traffic.

You can copy the MAC or IP address from the Status Watch report and paste it into MAC Watch’s Locate Address field.

2 Click Find.

3 When the Find Results dialog box appears, click Disable Port.

Disabling the port stops the broadcast storm before it interferes with all vital network traffic. You can bring this interface back up using MAC Watch or the device’s console at a later time.

Correcting Spanning Tree Misconfigurations

Spanning tree does not cause broadcast storms, but a loop in your spanning tree topology can create data that looks like a storm. A loop can occur in your topology if:

■ Someone disables spanning tree on a port

■ You set up your spanning tree configuration incorrectly

Device View Use Device View to disable any spanning tree port that has a repeater attached to it and to correct spanning tree misconfigurations.

To correct spanning tree misconfigurations, you can use Device View to disable STP for a port on a SuperStack II Switch 1000, Switch 3000, Switch 3000 10/100, Switch 9000SX, Desktop Switch, LinkBuilder FMS II Bridge/Management Module, or CoreBuilder 6000.

To disable the STP port state for a port on a SuperStack II switch:

1 Select a port and click the right mouse button.

2 From the shortcut menu, select Configure.

3 In the Port section, click the STP tab.

4 From the STP Port State list box, select Disabled.

5 Click Apply.

To disable the STP port state for a port on a LinkBuilder FMS II Bridge/Management Module:

1 Double-click on the module.

2 From the shortcut menu, select Configure Bridge.

3 In the Port section, click the STP tab.

Broadcast Storms Reference

This section explains terms relevant to broadcast storms and provides additional conceptual and problem analysis detail.

Broadcast Packets Broadcast packets, which are a normal part of network operation, are transmitted by a device to a broadcast address. For example, IP networks use broadcasts to resolve network addresses using Address Resolution Protocol (ARP); IPX networks use a large number of broadcast packets to operate most effectively.

Problems arise when broadcast packets endlessly propagate throughout the network, increasing the traffic volume on your network and the CPU time that each host spends processing and discarding unwanted broadcast packets.

Multicast Packets Multicast packets, which are a normal part of network operation, are transmitted by a device to a multicast group address. Hosts that want to receive the packets indicate that they want to be members of the multicast group, and then multicast packets are distributed to that group. For example, multicast packets are used to support the Spanning Tree Protocol. Multicast applications and underlying multicast protocols control multimedia traffic and shield hosts from processing unnecessary broadcast traffic. However, multicast traffic can also cause storms that saturate your network.
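When reading packet captures, the destination MAC address shows whether a frame is broadcast, multicast, or unicast. The sketch below assumes addresses written in canonical (Ethernet) notation, where the broadcast address is all ones and a group address has the least significant bit of the first octet set. The function names are hypothetical.

```python
def parse_mac(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' (or '-' separated) to six bytes."""
    octets = bytes(int(part, 16) for part in mac.replace("-", ":").split(":"))
    if len(octets) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return octets


def classify_destination(mac):
    """Return 'broadcast', 'multicast', or 'unicast' for a destination MAC."""
    octets = parse_mac(mac)
    if octets == b"\xff" * 6:
        return "broadcast"
    if octets[0] & 0x01:      # group bit set
        return "multicast"
    return "unicast"


if __name__ == "__main__":
    # 01:80:c2:00:00:00 is the group address used by the Spanning Tree Protocol.
    for mac in ("ff:ff:ff:ff:ff:ff", "01:80:c2:00:00:00", "00:a0:24:11:22:33"):
        print(mac, classify_destination(mac))
```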

DUPLICATE ADDRESSES

Use these sections to identify and correct problems caused by duplicate MAC and IP addresses:

■ Duplicate Addresses Overview (page 101)

■ Finding Duplicate MAC Addresses (page 102)

■ Finding Duplicate IP Addresses (page 103)

See Duplicate Addresses Reference (page 105) for additional conceptual and problem analysis detail.

Duplicate Addresses Overview

Networks sometimes generate duplicate MAC and IP addresses. Because duplicate addresses can cause problems with packet delivery, resolve them as soon as possible.


Duplicate MAC addresses are caused by data link layer problems with FDDI media and the passing of tokens on the FDDI ring. Duplicate IP addresses are caused by network layer problems. See these sections for more information about causes of duplicate addresses:

■ Duplicate MAC Addresses (page 105)

■ Duplicate IP Addresses (page 106)


Identify duplicate MAC and IP addresses by following the instructions in these sections:

■ Finding Duplicate MAC Addresses (page 102)

■ Finding Duplicate IP Addresses (page 103)

Solving the Problem Identify the cause of the duplicate address (such as user error or a hardware problem), and fix the problem, if possible.

Finding Duplicate MAC Addresses

To find out if duplicate MAC addresses are occurring, monitor your network using these applications:

■ MAC Watch (page 104) — To find duplicate MAC addresses on 3Com devices and their attached networks

■ Status Watch (page 32) — To identify duplicate FDDI MAC addresses

MAC Watch MAC Watch can help you determine when and where duplicate MAC addresses occur.

Follow these steps:

1 From within MAC Watch, start polling.

2 Select a group and check the MAC Watch graph for Duplicates.

Duplicates is the count of MAC addresses that appear on the same poll sample but at more than one device, slot, and port location.

3 If the MAC Watch graph shows duplicate MAC addresses, launch the MAC Change Report window by double-clicking the polling interval in the Summary Report window, located below the graph.

4 Check the Duplicates tab for information about the duplicate MAC addresses.

If the duplicate MAC addresses are caused by the setup of managed groups (that is, a group that includes cascading devices or devices that are managed by multiple IP addresses), use the Transcend® Central application to change the group setup.

To identify where a MAC address resides in a cascading environment, look for the device, slot, and interface in your sample that has the fewest MAC addresses listed.
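The Duplicates count that MAC Watch reports can be illustrated with a short sketch. This is not MAC Watch's implementation; the function name `find_duplicates` and the sample tuples of address, device, slot, and port are hypothetical and only mirror the definition given in step 2.

```python
from collections import defaultdict

def find_duplicates(sample):
    """Return addresses seen at more than one (device, slot, port)
    location within a single poll sample."""
    locations = defaultdict(set)
    for address, device, slot, port in sample:
        # Normalize case so the same address written two ways is one key.
        locations[address.lower()].add((device, slot, port))
    return {addr: sorted(locs) for addr, locs in locations.items() if len(locs) > 1}

# Hypothetical poll sample: (MAC address, device, slot, port)
sample = [
    ("00:A0:24:11:22:33", "switch-a", 1, 4),
    ("00:a0:24:11:22:33", "switch-b", 2, 7),
    ("00:A0:24:44:55:66", "switch-a", 1, 5),
]
duplicates = find_duplicates(sample)
for address, places in duplicates.items():
    print(address, places)
```

In a cascading environment the same address legitimately appears on several uplink ports, which is why the guide suggests looking for the port with the fewest addresses before treating a result as a true duplicate.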

Status Watch The Status Watch FDDI Status tool identifies duplicate FDDI MAC addresses. A Duplicate Address condition occurs and is reported in Status Watch when two or more MACs on the same ring have the same MAC address.

Follow these steps:

1 In the Status Watch Summary View window, check to see if any FDDI Status conditions are reported. If there are, double-click the table cell value to display the Device List window.

Another approach is to check only the devices that you know reside on your FDDI ring. In the Status Watch main window, see if those device icons appear red, which indicates that a threshold has been exceeded.

2 Select a device.

If you selected the device from the Device List window, the real-time report for that device appears in the Status Watch main window.

If you selected the device from the main window, also select the FDDI Status tool to view the real-time report.

3 Check to see if a Duplicate Address condition caused the FDDI Status tool to trigger a Critical or Warning status for that device.

In Status Watch, you can specify the status severity level that should be applied to a Duplicate Address condition.

Finding Duplicate IP Addresses

To find out if duplicate IP addresses are occurring, monitor your network using these applications:

■ MAC Watch (page 104) — To find duplicate IP addresses on 3Com devices and their attached networks

■ LANsentry Manager (page 104) — To find duplicate IP addresses that are collected by probes gathering RMON2 SmartAgent® data from the Enterprise Communications Analysis Module (ECAM) downloaded on your network devices


MAC Watch Use MAC Watch to determine when and where duplicate IP addresses occur.

Follow these steps:

1 Within MAC Watch, start polling.

2 Select a group and check the MAC Watch graph for IP Duplicates.

IP Duplicates is the count of IP addresses that appear on the same poll sample but at more than one device, slot, and port location.

3 If the MAC Watch graph shows duplicate IP addresses, select Show Global IP Duplicates from the Options menu.

4 Check the Global IP Duplicates dialog box for information about the duplicate IP addresses, such as when the duplicate was detected and the device to which the IP address belongs.

LANsentry Manager Use the Duplicates table in LANsentry® Manager to compile a list of all stations with duplicate IP addresses. This table is available only on probes that have downloaded RMON2 (ECAM) SmartAgent software.

Follow these steps:

1 From the Address Map menu in the LANsentry Manager menu bar, select Duplicates. Address Map data will be retrieved from the probe and displayed as a table.

2 To export the contents of the table, click Export to launch the Data Export dialog box.

Duplicate Addresses Reference

This section explains terms relevant to duplicate addresses and provides additional conceptual and problem analysis detail.

Duplicate MAC Addresses

Each device on your network has a unique MAC address. This address identifies a single device on the network, allowing packets to be delivered to correct destinations.

Packets are delivered to their destinations by means of IP-address-to-MAC-address translation that is provided by the Address Resolution Protocol (ARP). Therefore, if MAC addresses are duplicated on the network, ARP caches of routing devices contain erroneous destinations. In FDDI, devices monitor network traffic, checking for their own MAC address in each packet to determine whether to decode the packet. If MAC addresses are not unique, two stations cannot be distinguished from each other.

Duplicate MAC addresses can occur for the following reasons:

■ Someone has manually configured a MAC address for a device instead of using the address supplied by the vendor or allowing it to be assigned dynamically, and this address is the same as one assigned to a different device.

■ In rare circumstances, loops in a bridged network can cause a MAC hardware problem or an address learning problem that creates a duplicate MAC address entry in the bridging address table.

■ On DECnet Phase 4 networks, MAC addresses are set from the DECnet address. A duplicate NET address can cause a duplicate MAC address.

A router mapping the same MAC address to more than one IP address is creating a valid network configuration. These MAC address assignments are not considered duplicate MAC addresses.
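The distinction above can be made concrete with a small sketch over ARP-style (IP address, MAC address) pairs. The function name `duplicate_ips` and the sample entries are hypothetical. One MAC address that answers for several IP addresses is left alone, while one IP address claimed by several MAC addresses is reported.

```python
from collections import defaultdict

def duplicate_ips(arp_entries):
    """Return IP addresses that are claimed by more than one MAC address.

    A single MAC address mapped to several IP addresses (a router, for
    example) is a valid configuration and is deliberately not reported.
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in arp_entries:
        macs_by_ip[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}

# Hypothetical observations: (IP address, MAC address)
entries = [
    ("10.0.1.1", "00:a0:24:00:00:01"),   # router interface
    ("10.0.2.1", "00:a0:24:00:00:01"),   # same router MAC, second IP: valid
    ("10.0.1.50", "00:a0:24:aa:aa:aa"),
    ("10.0.1.50", "00:a0:24:bb:bb:bb"),  # same IP from a second station
]
conflicts = duplicate_ips(entries)
print(conflicts)
```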

Burnt-in addresses (BIAs), which are those MAC addresses permanently given to a device by the vendor, are always unique.


Duplicate IP Addresses

Because IP addresses are critical for transmission of packets on TCP/IP networks, resolve duplicate IP addresses immediately.

Duplicate IP addresses can occur when someone has configured an IP address that is identical to an IP address assigned to a different device. Address assignments, although possible for you to configure manually, are usually made using one of these protocols:

■ Dynamic Host Configuration Protocol (DHCP) — Allows your network to dynamically assign IP addresses to nodes. With this protocol, a DHCP server temporarily assigns an IP address to a node, or you can statically configure addresses as needed.

■ Bootstrap Protocol (BootP) — Allows you to statically assign IP addresses to nodes. This protocol is more efficient than RARP.

■ Reverse ARP (RARP) — Allows you to statically assign IP addresses to nodes. However, because this protocol relies on the MAC address to identify the node, it cannot be used on networks that dynamically assign hardware addresses.

ETHERNET PACKET LOSS

Use these sections to identify and correct Ethernet packet loss:

■ Ethernet Packet Loss Overview (page 107)

■ Checking for Packet Loss (page 109)

See Ethernet Packet Loss Reference (page 115) for additional conceptual and problem analysis detail.

Ethernet Packet Loss Overview

If your Ethernet network is showing signs of congestion, it may be experiencing packet loss. When your network is congested, utilization is usually high, packets are discarded because buffers are full, and collision rates are up. Problems related to Collisions (page 115) are often at the heart of packet loss.


Collisions are normal in Ethernet networks. In many cases, Collision rates of 50 percent will not cause a large decrease in throughput. The Collision rate helps mark the upper limit on your network (the maximum percentage of collisions that your network can bear), which is usually around 70 percent. If Collisions increase above this upper limit, your network can become unreliable.

When the Collision rates increase, so do Excessive Collisions (page 116), which means that there is a delay in transmitting data. An increase in Collisions also means that network utilization and network errors, such as FCS Errors (page 116), are probably increasing.

The real packet problems to watch for, however, are undetected collisions that show up as Late Collisions (page 116).

If small packets are colliding, you do not necessarily see a rise in utilization, but you may still have a problem. You can capture packets to determine their size.

Identifying the Problem

To identify that your network’s problem is related to packet loss, verify that frames are being dropped on your network by checking this packet loss data:

■ Alignment Errors (page 115)

■ Collisions (page 115)

■ Excessive Collisions (page 116)

■ FCS Errors (page 116)

■ CRC Errors (page 116)

■ Late Collisions (page 116)

■ Receive Discards (page 118)

■ Too Long Errors (page 118)

■ Too Short Errors (page 118)

■ Transmit Discards (page 118)

The process of identifying the problem is discussed in Checking for Packet Loss (page 109).

Solving the Problem If you notice that packet loss data is consistently high, then your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network’s utilization is the same, then your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include:

■ Faulty connectors or improper cabling

■ Excessive numbers of repeaters between network devices

■ Defective Ethernet transceivers or controllers

Possible solutions to these problems are explained in the procedures in Checking for Packet Loss (page 109).
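The reasoning in Solving the Problem can be sketched as a simple decision function. The function name `diagnose`, its three boolean inputs, and the returned strings are hypothetical; deciding what counts as "consistently high" or "rising" depends on the baseline for your own network.

```python
def diagnose(packet_loss_consistently_high, collisions_rising, utilization_rising):
    """Map the observations described above to a likely cause.

    All three inputs are judgments you make from your own monitoring data;
    this sketch only encodes the order in which the guide weighs them.
    """
    if packet_loss_consistently_high:
        return "congested: segment the network with a switch or router"
    if collisions_rising and not utilization_rising:
        return "possible physical problem: check cabling, connectors, repeaters, transceivers"
    return "no clear indication: keep monitoring"

print(diagnose(True, True, True))
print(diagnose(False, True, False))
```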

Checking for Packet Loss

When checking for packet loss, use the following applications:

■ Status Watch (page 109) — For Ethernet and MIB-II data collection using SNMP polling

■ LANsentry Manager Network Statistics Graph (page 111) — For RMON data collection using an RMON probe

■ Device View (page 35) — On a per-device basis, you can evaluate statistics for any port on the device.

Status Watch Status Watch monitors:

■ Alignment Errors (page 115)

■ Excessive Collisions (page 116)

■ FCS Errors (page 116)

■ Receive Discards (page 118)

■ Transmit Discards (page 118)

Follow these steps:

1 Check to see if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded.

Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see Nonstandard Ethernet Problems (page 117).

Table 16 Alignment Errors (page 115), FCS Errors (page 116), and CRC Errors (page 116) Data

Possible Problem: Faulty cabling
Possible Action: Check the cable and cable connections for breaks or damage.

Possible Problem: Network noise
Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to equipment that emits electromagnetic interference (lamps, for example).

Possible Problem: Faulty transceiver
Possible Action: Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: Fault at the transmitting end station
Possible Action:

1 Locate the source of the errors by looking at the module and port statistics.

2 Check the transceiver or adapter card of the device connected to the problem port.

3 If the card appears to be operating correctly, check the cable and cable connections for breaks or damage.

Possible Problem: Station powering up or down
Possible Action: None required.

Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network.

When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.

Possible Problem: Faulty adapter
Possible Action: Replace the adapter.

2 Check to see if the Excessive Collisions tool threshold is being exceeded.

Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17 Collisions (page 115) and Excessive Collisions (page 116) Data

Possible Problem: Busy network
Possible Action: Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.

Possible Problem: Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions.
Possible Action: Isolate each adapter to see if the problem stops.

Possible Problem: Network loop
Possible Action: Ensure that no redundant connections to the same station have both connections active simultaneously.


3 Check to see if the Receive Discards and Transmit Discards tools thresholds are being exceeded.

If these errors are high in conjunction with the data that you checked in steps 1 and 2, then your network is overloaded. Segment your network.

LANsentry Manager Network Statistics Graph

Use the LANsentry® Manager Network Statistics graph to view data for:

■ Collisions (page 115)

■ Late Collisions (page 116)

■ Bandwidth Utilization (page 85)

■ CRC Errors (page 116)

■ Too Long Errors (page 118)

■ Too Short Errors (page 118)

Follow these steps:

1 Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2 Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

If Utilization rates are high, then the collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.


If Utilization rates are stable and appear normal, then the collisions are probably caused by faulty components. In this case, check the following:

■ If the network consists of repeaters, compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters “repeat” traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment showing dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions and repair the station. Packets transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

■ On other networks, check the segment cable length.

3 Check the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18 Late Collisions (page 116) Data

Possible Problem: Cabling problems:

■ Segment too long

■ Failing cable

■ Segment not grounded properly (noise)

■ Improper termination

■ Taps too close (10BASE-5 and 10BASE-2 only)

■ Noisy cable

Possible Action: Correct the cabling problem by doing one or more of the following:

■ Reduce the segment length.

■ Replace the cable.

■ Ground the cable.

■ Terminate the cable correctly.

■ Check the taps.

■ Check for cables too close to equipment that emits electromagnetic interference.

Possible Problem: Component problems:

■ Deaf or partially deaf node

■ Failing repeater, transceiver, or controller cards

Possible Action: Correct the component problem by doing one of the following:

■ Trace the failing component and replace it.

■ Replace the NIC or the transceiver.


4 Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Table 19 Too Long Errors (page 118) and Too Short Errors (page 118) Data

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.
Possible Action:

1 Use a network analyzer to identify the problematic transceiver.

2 If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.
Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable

Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.

Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example).

If your network card autodetects the network speed, and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected and the router is not enforcing proper frame size restrictions)
Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver
Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder™ 5000 FastModule

If you use maximum-sized, 1518-byte Ethernet frames, the device’s VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error.

These frames are passed successfully but will create the Too Long Frame error message.

Possible Action: If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.


Device View Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. These statistics and their use in troubleshooting are described in Table 20.

Table 20 Activity and Error Statistics in Device View

Statistics Group: Activity
Description: Displays the total network activity and errors on the selected port.
Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions (page 115), total errors, and runts, which cause Too Short Errors (page 118). You can interpret this data in the following ways:

■ The presence of runts can often be caused by Collisions; however, if the values increase at specific times of the day, it may indicate you need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).

■ Runts can also be caused by a badly terminated coax cable.

■ Large numbers of runts, not associated with high levels of collisions, could indicate a transmission problem (check the cable).

■ Particularly high numbers of Collisions, compared to the total number of readable packets, could point to a hardware problem (a bad adapter) or to a data loop.

■ A high proportion of Broadcast Packets (page 99) (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors
Description: Displays the number of frames with errors on the selected port.
Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:

■ Alignment Errors (page 115), Table 16

■ FCS Errors (page 116), Table 16

■ Too Long Errors (page 118), Table 19

■ Too Short Errors (page 118) or runts, Table 19

■ Late Collisions (page 116), Table 18

To display Activity and Errors statistics for a device or port, follow these steps:

1 Select the required port or device.

2 From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.

You may not be able to access these statistics on some devices using Device View. Check the Device View documentation for additional information.

Ethernet Packet Loss Reference

This section explains terms relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors An Alignment Error indicates a received frame for which both of the following are true:

■ The number of bits received is an uneven byte count (that is, not an integral multiple of 8)

■ The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors (page 116) for more information about interpreting Alignment Errors.

Collisions Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions, or 16 consecutive collisions occur and the packets are discarded.
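The retry behavior described above can be sketched as follows. The guide does not name the algorithm; this sketch assumes the truncated binary exponential backoff defined for CSMA/CD in IEEE 802.3, with a 51.2 µs slot time for 10 Mbps Ethernet. The function names and the `collides` callback are hypothetical.

```python
import random

SLOT_TIME_US = 51.2   # 512 bit times at 10 Mbps (assumed link speed)
MAX_ATTEMPTS = 16     # the packet is discarded after 16 consecutive collisions
BACKOFF_LIMIT = 10    # the random range stops doubling after the 10th collision

def backoff_slots(collision_count, rng=random):
    """Pick a random delay, in slot times, after the given collision."""
    exponent = min(collision_count, BACKOFF_LIMIT)
    return rng.randrange(2 ** exponent)   # 0 .. 2**exponent - 1

def transmit(collides, rng=random):
    """Try to send one packet.

    collides(attempt) returns True if that attempt collides.
    Returns (delivered, attempts_used, total_backoff_in_microseconds).
    """
    total_backoff_us = 0.0
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if not collides(attempt):
            return True, attempt, total_backoff_us
        if attempt < MAX_ATTEMPTS:
            total_backoff_us += backoff_slots(attempt, rng) * SLOT_TIME_US
    return False, MAX_ATTEMPTS, total_backoff_us

# A packet that collides on its first three attempts, then gets through.
print(transmit(lambda attempt: attempt <= 3))
# A packet that always collides is dropped and counted as an excessive collision.
print(transmit(lambda attempt: True))
```

Because each station picks its delay independently, two stations that collided are unlikely to choose the same delay again, and the growing range makes repeat collisions less likely on a busy segment.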


CRC Errors A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors (page 116) and Alignment Errors (page 115). These errors indicate that packets were received with:

■ A bad FCS and an integral number of octets (FCS Errors)

■ A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station’s network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.

Excessive Collisions Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (or after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network’s performance. See Knowing Your Network (page 58) for more information.

FCS Errors Frame Check Sequence (FCS) Errors, a type of CRC Error, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame’s bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.
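As an illustration of how an FCS check works, the sketch below uses Python's `zlib.crc32`, which implements the same CRC-32 polynomial that Ethernet uses for the FCS. The helper names are hypothetical, and storing the 4-byte trailer in little-endian order is an assumption about how a captured frame presents the FCS.

```python
import struct
import zlib

def append_fcs(frame_without_fcs: bytes) -> bytes:
    """Return the frame with a 4-octet CRC-32 trailer appended."""
    fcs = zlib.crc32(frame_without_fcs) & 0xFFFFFFFF
    return frame_without_fcs + struct.pack("<I", fcs)

def fcs_ok(frame: bytes) -> bool:
    """Recompute the CRC over the frame body and compare it to the trailer."""
    if len(frame) < 4:
        return False
    body, trailer = frame[:-4], frame[-4:]
    return struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF) == trailer

good = append_fcs(bytes(60))                 # 60 octets of zeros + 4-octet FCS
damaged = bytes([good[0] ^ 0x01]) + good[1:]  # flip a single bit in transit
print(fcs_ok(good), fcs_ok(damaged))
```

The receiver never sees the original frame; it only recomputes the checksum and compares, which is why a single flipped bit is enough to count the frame as an FCS Error.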

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a 1 in 10^8 bit error rate, typical Ethernet performance is 1 in 10^12 or better.
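To relate a bit error rate to the frame errors a monitor reports, the following sketch estimates the probability that a frame contains at least one bad bit. It reads the rates quoted above as 1 in 10^8 and 1 in 10^12, and it assumes bit errors are independent, which is a simplification. The function name is hypothetical.

```python
import math

def frame_error_probability(bit_error_rate, frame_octets=1518):
    """Probability that at least one bit in the frame is corrupted,
    assuming independent bit errors."""
    bits = frame_octets * 8
    # 1 - (1 - p)**bits, computed in a numerically stable way for tiny p.
    return -math.expm1(bits * math.log1p(-bit_error_rate))

allowed = frame_error_probability(1e-8)    # rate the standard allows
typical = frame_error_probability(1e-12)   # typical performance
print(f"{allowed:.3e} {typical:.3e}")
```

For maximum-sized frames, the allowed rate works out to roughly one damaged frame in ten thousand, and the typical rate to roughly one in a hundred million, so a sustained FCS Error rate well above these levels points to a fault rather than to normal line noise.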

Late Collisions Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other’s transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.
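The timing argument above can be worked through numerically. The sketch assumes 10 Mbps Ethernet and compares the time needed to put a packet on the wire with the round-trip delay of the segment; the function names and the 60 µs delay are hypothetical values chosen for illustration.

```python
BIT_RATE_BPS = 10_000_000   # assumed 10 Mbps Ethernet

def transmit_time_us(frame_octets, bit_rate_bps=BIT_RATE_BPS):
    """Time, in microseconds, to put the whole frame on the network."""
    return frame_octets * 8 / bit_rate_bps * 1_000_000

def collision_can_go_undetected(frame_octets, round_trip_delay_us,
                                bit_rate_bps=BIT_RATE_BPS):
    """True if the sender finishes transmitting before a collision
    at the far end of the network could be sensed."""
    return round_trip_delay_us > transmit_time_us(frame_octets, bit_rate_bps)

# Hypothetical over-length segment with a 60 microsecond round-trip delay.
delay_us = 60.0
print(transmit_time_us(64), collision_can_go_undetected(64, delay_us))
print(transmit_time_us(1518), collision_can_go_undetected(1518, delay_us))
```

On such a segment a minimum-sized packet is fully transmitted before the collision can be sensed, so it is lost silently, while a maximum-sized packet is still being sent and the collision is counted as a Late Collision.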

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21 Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors (page 116) and Alignment Errors (page 115) increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate.

If the monitor can track runts, also called Too Short Errors (page 118), the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters.

Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet.

Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.


Receive Discards Receive Discards indicate that received packets could not be delivered to a high-layer protocol because of congestion or packet errors.

Too Long Errors A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors A Too Short Error, also called a runt, indicates that a packet is less than 64 octets long (including FCS octets) but otherwise well formed.
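The error definitions in this reference section can be combined into one classification sketch. The function name `classify_frame` and its inputs (the number of bits received and whether the FCS check passed) are hypothetical; real monitors apply these rules in hardware or in the RMON agent.

```python
MIN_OCTETS = 64     # shortest well-formed packet, including FCS octets
MAX_OCTETS = 1518   # longest well-formed packet, including FCS octets

def classify_frame(bit_count, fcs_valid):
    """Name the error category a received frame falls into, if any."""
    octets, leftover_bits = divmod(bit_count, 8)
    if not fcs_valid:
        # Bad FCS with a non-integral octet count is an Alignment Error;
        # bad FCS with an integral octet count is an FCS Error.
        return "Alignment Error" if leftover_bits else "FCS Error"
    if octets > MAX_OCTETS:
        return "Too Long Error"
    if octets < MIN_OCTETS:
        return "Too Short Error"
    return "OK"

print(classify_frame(1518 * 8, True))      # maximum-sized frame
print(classify_frame(1522 * 8, True))      # maximum-sized frame plus a 4-byte tag
print(classify_frame(60 * 8, True))        # runt
print(classify_frame(100 * 8 + 3, False))  # extra bits and a bad FCS
```

The second call shows why a 4-byte VLT tag added to a maximum-sized frame is reported as Too Long even though the frame is passed successfully.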

Transmit Discards Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI RING ERRORS

Use these sections to identify and correct FDDI ring errors:

■ FDDI Ring Errors Overview (page 119)

■ Identifying Ring Errors (page 121)

See FDDI Ring Errors Reference (page 121) for additional conceptual and problem analysis detail.

FDDI Ring Errors Overview

FDDI often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.


FDDI ring errors that you should monitor include:

■ Elasticity Buffer Error Condition (page 121)

■ Frame Error Condition (page 121)

■ Frames Not Copied Condition (page 122)

■ Link Error Condition (page 122)


First determine the type of FDDI ring errors and where they are occurring. As with other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

■ The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

■ When link errors reach the threshold defined by the variable PORTlerCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition (page 122) is also generated.
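The two-threshold behavior for port-related errors can be sketched as follows. The function name `port_action` is hypothetical, and the sketch expresses all three values as plain error rates (errors per bit); a device may report these values in a different encoding, so treat the numbers as illustrative only.

```python
def port_action(link_error_rate, alarm_rate, cutoff_rate):
    """Return what SMT does for a port at the given link error rate.

    alarm_rate must be lower than cutoff_rate so that a notification
    is sent before the port is removed from the ring.
    """
    if alarm_rate >= cutoff_rate:
        raise ValueError("alarm threshold must be lower than cutoff threshold")
    if link_error_rate >= cutoff_rate:
        return "cutoff"   # connection is broken, Link Error Condition generated
    if link_error_rate > alarm_rate:
        return "alarm"    # Status Report Frame is sent
    return "ok"

# Illustrative thresholds, not values taken from this guide.
ALARM, CUTOFF = 1e-8, 1e-7
for rate in (1e-10, 5e-8, 2e-7):
    print(rate, port_action(rate, ALARM, CUTOFF))
```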

FDDI deals with MAC-related errors as follows:

■ When MAC frame errors reach a certain threshold, a Frame Error Condition (page 121) is generated. Because the actual error could be further upstream than the immediate connection, the connection remains intact.

■ For a large network, the worst case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.

See Identifying Ring Errors (page 121) for more information.
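Because frame error ratios can spike briefly during network configuration, it helps to separate a short burst from a sustained problem. The sketch below flags only a run of consecutive samples above 0.1 percent. The function name and the choice of three consecutive samples are assumptions, not values from the FDDI standard.

```python
def sustained_frame_error(ratios, threshold=0.001, min_consecutive=3):
    """Return True if the frame error ratio stays above the threshold
    for at least min_consecutive samples in a row."""
    run = 0
    for ratio in ratios:
        run = run + 1 if ratio > threshold else 0
        if run >= min_consecutive:
            return True
    return False

# A brief spike during reconfiguration, then normal operation.
spike = [0.5, 0.0002, 0.0001, 0.0003]
# A ratio that stays above 0.1 percent.
steady = [0.002, 0.003, 0.004, 0.002]
print(sustained_frame_error(spike), sustained_frame_error(steady))
```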

Solving the Problem To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors

Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch Use Status Watch to identify FDDI ring errors.

Follow these steps:

1 Monitor the FDDI Status tool for the currently selected device.

2 Check to see if Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port’s elasticity buffer overflows or underflows. This condition usually indicates that a port’s hardware is not operating within the tolerances specified by the FDDI standard. Look for the problem in the hardware of either the port that is reporting the condition or the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames containing errors exceeds a preset threshold. In the situation when a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space and packets are being dropped from the device’s backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

■ Add more capacity to the station

■ Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch

■ Filter out certain traffic

Link Error Condition The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (page 122) (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC’s upstream or downstream neighbor changes.

This event indicates either:

■ A network reconfiguration

■ Another station leaving or joining the ring

NETWORK FILE SERVER TIMEOUTS

Use these sections to identify and correct timeouts on network file servers:

■ Network File Server Timeout Overview (page 123)

■ Checking for Obvious Errors (page 124)

■ Reproducing the Fault While Monitoring the Network (page 126)

■ Correcting the Fault (page 129)

See Network File Server Timeouts Reference (page 129) for additional conceptual and problem analysis detail.

Network File Server Timeout Overview

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server or copying files from or to the server. To help you to understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.


When users log in, their station makes network file server calls, either to check quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users are noticing that it takes a long time — over 30 seconds in some cases — to log into any machine. Some machines are reporting network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

■ Can you access the network file server with Telnet?

■ Have any alarms been triggered?

■ Are there any new errors?

The process of identifying the problem is developed in Checking for Obvious Errors (page 124).

Solving the Problem

To determine the cause, reproduce the fault while monitoring the network. Once you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

■ Reproducing the Fault While Monitoring the Network (page 126)

■ Correcting the Fault (page 129)

Checking for Obvious Errors

To check for obvious errors, use these applications:

■ Ping and Telnet (page 124) — To check for connectivity to the network file server nodes

■ LANsentry Manager Alarms View (page 124) — To check for triggered alarms

■ LANsentry Manager Statistics View (page 125) — To check for errors

■ LANsentry Manager History View (page 125) — To check for trends

Ping and Telnet

Check whether you can contact network file server nodes using Ping (page 38) and Telnet (page 40). If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to see whether packets are being lost or ignored.
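If you want to time the connection check instead of judging it by eye, a short script can help. The following sketch measures how long a TCP connection takes to open. It is a stand-in for a manual Telnet attempt, not a replacement for Ping, and the function name, host name, and timeout are assumptions made for the example.

```python
import socket
import time
from typing import Optional


def tcp_connect_time(host: str, port: int, timeout: float = 5.0) -> Optional[float]:
    """Return the seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None  # refused, unreachable, unresolvable, or timed out


# Example use (placeholder host name; 23 is the Telnet port):
#   elapsed = tcp_connect_time("fileserver.example.com", 23, timeout=2.0)
#   print("no connection" if elapsed is None else f"connected in {elapsed:.3f} s")
```

A connection that opens quickly suggests that basic connectivity is normal, which matches the reasoning above: look elsewhere for the delay.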

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can see if any configured alarms have been triggered.

Check the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network on which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a low rate of background errors may persist just below the alarm threshold. Because RMON alarms are based on maximum and minimum values, they may miss constant, periodic, or low-level errors.
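The following sketch illustrates why a threshold alarm can stay silent while errors accumulate. The sample values and the threshold of 5 errors per sample are hypothetical, and the check is a plain comparison, not the LANsentry Manager alarm logic.

```python
def alarmed_samples(samples, threshold):
    """Return the indexes of samples whose error count reaches the threshold."""
    return [i for i, count in enumerate(samples) if count >= threshold]


# Hypothetical per-sample error counts for one segment.
background = [4, 4, 4, 4, 4, 4, 4, 4]   # constant, just under the threshold
burst = [0, 0, 0, 12, 0, 0, 0, 0]       # one short burst

THRESHOLD = 5  # assumed alarm threshold, in errors per sample

print(alarmed_samples(background, THRESHOLD), sum(background))  # [] 32
print(alarmed_samples(burst, THRESHOLD), sum(burst))            # [3] 12
```

The constant background produces more errors in total than the burst, yet only the burst raises an alarm.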

Before monitoring your network with LANsentry Manager, you should have already set up alarms for obvious errors related to MAC events and loading problems. See Setting Thresholds and Alarms in LANsentry Manager (page 55) for more information.

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1 Set up a graph showing utilization and errors on all your major segments.

2 Check to see if any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors (page 118) and FCS Errors (page 116) roughly every second sample. While the error count is not higher than normal, it is currently higher than on any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table running for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for two hours) to more easily spot recent errors.
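The two sampling schemes mentioned above differ in how many samples the probe must hold. The sketch below works out both figures and uses a fixed-length queue as a stand-in for a rolling history table. The helper name is hypothetical and the queue is only an analogy for how a probe might store samples.

```python
from collections import deque


def buckets_needed(interval_seconds: int, span_seconds: int) -> int:
    """Number of samples a history table holds for a given span."""
    return span_seconds // interval_seconds


coarse = buckets_needed(30 * 60, 2 * 24 * 3600)  # 30-minute samples, two days
fine = buckets_needed(30, 2 * 3600)              # 30-second samples, two hours
print(coarse, fine)  # 96 240

# A rolling table: once full, each new sample pushes out the oldest one.
history = deque(maxlen=fine)
for sample_number in range(300):
    history.append(sample_number)
print(len(history), history[0])  # 240 60
```

The finer scheme needs two and a half times as many samples as the coarse one, which is why it depends on the probe having the resources.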

EXAMPLE: The history table shows that no error rate remained constant throughout the day. However, the errors that did occur were Too Long Errors and FCS Errors on the device HUB3.

If you notice low error rates that are not triggering alarms, use a recent history of the network to see if the errors occur in regular bursts and to estimate the average number of errors.
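Given a recent history exported as one error count per sample, the burst spacing and the average can be estimated as in this sketch. The sample values are hypothetical and the function names are illustrative only.

```python
from statistics import mean


def burst_indexes(samples):
    """Indexes of samples in which at least one error was recorded."""
    return [i for i, count in enumerate(samples) if count > 0]


def burst_spacing(samples):
    """Gaps, in samples, between consecutive samples that contain errors."""
    hits = burst_indexes(samples)
    return [later - earlier for earlier, later in zip(hits, hits[1:])]


# Hypothetical error counts from a recent history table (one value per sample).
history = [0, 3, 0, 2, 0, 4, 0, 3, 0, 2]

print(burst_spacing(history))  # [2, 2, 2, 2]
print(mean(history))           # 1.4
```

A constant spacing, as in this data, points to errors that recur in regular bursts and not at random.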


Reproducing the Fault While Monitoring the Network

While the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while monitoring the network by using these applications:

■ LANsentry Manager Top-N Graph (page 126) — To locate a quiet node to use for reproducing the fault

■ LANsentry Manager Packet Capture (page 126) — To capture packets from the hub to which the quiet node is attached

■ LANsentry Manager Packet Decode (page 127) — To analyze the packets to assess network file server traffic and delays

■ MAC Watch (page 128) — To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when trying to isolate the problem.

EXAMPLE: You see that the node, Monolith, which has the same NFS mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol (page 129) for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1 Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2 Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time you repeat the procedure.

3 When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1 Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

When you Telnet into a node, the traffic that the Telnet operation generates appears in the capture buffer. Expect this traffic when reading the buffer.

2 Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, a point where the node sent a request and then resent it after an interval approximately equal to the delay that you experienced when recreating the problem).

3 Repeat the test to see if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
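The search for a gap in the conversation can also be expressed in code. The sketch below assumes that the decoded packets have been exported as simple tuples. That format, the node names, and the request identifiers are all hypothetical and are not a LANsentry Manager export format.

```python
from collections import defaultdict

# Hypothetical decoded packets: (time in seconds, source, destination, request id).
packets = [
    (0.00, "monolith", "server1", 101),
    (0.01, "server1", "monolith", 101),   # answered promptly
    (0.50, "monolith", "server2", 102),   # never answered...
    (30.52, "monolith", "server2", 102),  # ...so the request is repeated
    (30.53, "server2", "monolith", 102),
]


def find_retries(packets, client, min_gap=1.0):
    """Return (destination, request id, gap) for requests the client repeated."""
    sent = defaultdict(list)
    for timestamp, source, destination, request_id in packets:
        if source == client:
            sent[(destination, request_id)].append(timestamp)
    retries = []
    for (destination, request_id), times in sent.items():
        for first, second in zip(times, times[1:]):
            if second - first >= min_gap:
                retries.append((destination, request_id, round(second - first, 2)))
    return retries


print(find_retries(packets, "monolith"))  # [('server2', 102, 30.02)]
```

A repeated request with a gap close to the login delay is the pattern to look for. Running the same check against captures from other nodes shows whether more than one destination is affected.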


MAC Watch

Use MAC Watch, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® management software can poll, it does have managed switches. When polling the switches, MAC Watch displays the switch ports on which addresses were last seen. This information tells you the hub (but not the hub port) on which the device is located.
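The reasoning in this example, from MAC address to switch port to hub, amounts to a pair of table lookups. The sketch below assumes that you have recorded which hub is attached to each switch port. The addresses, port numbers, and data layout are hypothetical and do not represent MAC Watch output.

```python
from collections import defaultdict

# Hypothetical MAC Watch results: MAC address -> (switch, port) where last seen.
last_seen = {
    "08:00:20:aa:00:01": ("switch1", 3),
    "08:00:20:aa:00:02": ("switch1", 3),
    "08:00:20:aa:00:03": ("switch1", 5),
    "08:00:20:aa:00:04": ("switch1", 3),
}

# Hypothetical record of which hub is attached to each switch port.
hub_on_port = {("switch1", 3): "HUB3", ("switch1", 5): "HUB5"}

problem_nodes = ["08:00:20:aa:00:01", "08:00:20:aa:00:02", "08:00:20:aa:00:04"]


def hubs_for(nodes):
    """Group the given MAC addresses by the hub behind their switch port."""
    groups = defaultdict(list)
    for mac in nodes:
        groups[hub_on_port[last_seen[mac]]].append(mac)
    return dict(groups)


print(hubs_for(problem_nodes))
```

If every problem node falls under a single hub, as it does in this data, that hub is the place to continue monitoring.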

If you need to take immediate action to resolve this problem for your users, move all the network file servers to different hubs. This quick fix reduces the number of timeouts.

LANsentry Manager Packet Decode

Once you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1 To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise described in LANsentry Manager Packet Capture (page 126). Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2 Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You are hoping that the nodes will be on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors are occurring at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node’s reply had an FCS Error even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault

Without a managed hub, you may find it very difficult to further track down a network file server timeout error. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering (page 129) node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.
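To see why an overwritten frame fails its frame check sequence, consider this sketch. It uses Python's `zlib.crc32` as a stand-in for the FCS calculation. The payload, the position of the damage, and the helper names are assumptions made for the example.

```python
import struct
import zlib


def append_fcs(frame: bytes) -> bytes:
    """Append a CRC-32 frame check sequence (least significant byte first)."""
    return frame + struct.pack("<I", zlib.crc32(frame))


def fcs_ok(frame_with_fcs: bytes) -> bool:
    """Recompute the CRC-32 over the frame body and compare with the trailer."""
    body, trailer = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return struct.pack("<I", zlib.crc32(body)) == trailer


good = append_fcs(b"NFS reply payload " * 4)

# Simulate another node writing a repeated pattern over part of the frame.
damaged = bytearray(good)
damaged[10:14] = b"\xaa\x55\xaa\x55"
damaged = bytes(damaged)

print(fcs_ok(good))     # True
print(fcs_ok(damaged))  # False
```

A receiver that performs this check discards the damaged frame, so the requesting node never sees the reply and eventually times out.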

Network File Server Timeouts Reference

This section explains terms relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

Jabbering occurs when a node transmits illegal length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than two hundred companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.


IV
REFERENCE
SNMP in Network Troubleshooting (page 133)

Information Resources (page 143)

SNMP IN NETWORK TROUBLESHOOTING

SNMP and the MIBs it uses are important for troubleshooting your network. These sections provide information about:

■ SNMP Operation (page 133)

■ SNMP MIBs (page 136)

SNMP Operation

Simple Network Management Protocol (SNMP), one of the most widely used management protocols, allows management communication between network devices and your management workstation across TCP/IP internets.

Most management applications, including Status View (page 32) applications, require SNMP to perform their management functions.

Manager/Agent Operation

SNMP communication requires a manager (the station that is managing network devices) and an agent (the software in the devices that talks to the management station). SNMP provides the language and the rules that the manager and agent use to communicate.

Managers can discover agents:

■ Through autodiscovery tools on Network Management Platforms (page 35) (such as HP OpenView Network Node Manager)

■ When you manually enter IP addresses of the devices that you want to manage

For agents to discover their managers, you must provide the agent with the IP address of the management station or stations.

Managers send requests to agents (either to send information or to set a parameter), and agents provide the requested data or set the parameter. Agents can also talk to the managers (without being asked by the managers) through trap messages, which tell the manager that certain events have occurred.

SNMP Messages

SNMP supports queries (called messages) that allow the protocol to transmit information between the managers and the agents. Types of SNMP messages:

■ Get and Get-next — The management station asks an agent to report information.

■ Set — The management station asks an agent to change one of its parameters.

■ Get Responses — The agent responds to a Get, Get-next, or Set operation.

■ Trap — The agent sends an unsolicited message informing the management station that an event has occurred.

Management Information Bases (MIBs) define what can be monitored and controlled within a device (that is, what the manager can Get and Set). An agent can implement one or more groups from one or more MIBs. See SNMP MIBs (page 136) for more information.
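The Get, Get-next, and Set operations can be pictured with a toy agent that keeps its objects in memory. The sketch below involves no network traffic and no real SNMP encoding. The class, its method names, and the stored values are hypothetical. The two object identifiers are the MIB-II sysDescr and sysName instances.

```python
class ToyAgent:
    """In-memory stand-in for an SNMP agent (no network, no real encoding)."""

    def __init__(self, objects, community="public"):
        # OIDs are tuples of integers so that they sort in MIB-tree order.
        self.objects = dict(objects)
        self.community = community

    def _check(self, community):
        if community != self.community:
            raise PermissionError("community string mismatch")

    def get(self, community, oid):
        self._check(community)
        return self.objects.get(oid)  # None when the OID is not supported

    def get_next(self, community, oid):
        self._check(community)
        following = sorted(o for o in self.objects if o > oid)
        if not following:
            return None
        return following[0], self.objects[following[0]]

    def set(self, community, oid, value):
        self._check(community)
        if oid not in self.objects:
            return False
        self.objects[oid] = value
        return True


SYS_DESCR = (1, 3, 6, 1, 2, 1, 1, 1, 0)
SYS_NAME = (1, 3, 6, 1, 2, 1, 1, 5, 0)
agent = ToyAgent({SYS_DESCR: "example hub", SYS_NAME: "hub3"})

print(agent.get("public", SYS_DESCR))             # example hub
print(agent.get_next("public", SYS_DESCR)[1])     # hub3
print(agent.set("public", SYS_NAME, "hub3-lab"))  # True
```

Get-next returns the object that follows the supplied OID in tree order, which is how a manager can walk an agent's MIB without knowing in advance which objects the agent supports.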

Trap Reporting

Traps are unsolicited, asynchronous events generated by devices to indicate status changes. Every agent supports some trap reporting. You must configure trap reporting at the devices so that these events are reported to your management station to be used by the Network Management Platforms (page 35) (such as HP OpenView Network Node Manager) and the Transcend Applications (page 31).

Not all traps are important for your management tasks. To decrease the burden on the management station and on your network, you can limit the traps reported to the management station.

MIBs are not required to document traps. SNMP supports the limited number of traps defined in Table 11. More traps may be defined in vendors’ private MIBs.

To minimize SNMP traffic on your network, you can implement trap-based polling. Trap-based polling allows the management station to start polling only when it receives certain traps. Your management applications must support trap-based polling for you to take advantage of this feature.
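One way to picture trap-based polling is as a small state machine: a device is polled only after it has sent a trap of interest. The sketch below is an illustration under assumptions. The class, the choice of trigger traps, and the use of a link-up trap to stop polling are hypothetical and do not describe how any particular management application behaves.

```python
class TrapDrivenPoller:
    """Poll a device only after it has sent a trap of interest."""

    def __init__(self, poll, triggers=("linkDown", "authenticationFailure")):
        self.poll = poll              # callable that polls one device
        self.triggers = set(triggers)
        self.watching = set()         # devices currently being polled

    def on_trap(self, device, trap):
        if trap in self.triggers:
            self.watching.add(device)
        elif trap == "linkUp":
            self.watching.discard(device)  # assumed "all clear" signal

    def poll_cycle(self):
        return {device: self.poll(device) for device in sorted(self.watching)}


poller = TrapDrivenPoller(poll=lambda device: f"polled {device}")
print(poller.poll_cycle())            # {}
poller.on_trap("hub3", "coldStart")   # not a trigger, so still no polling
poller.on_trap("hub3", "linkDown")
print(poller.poll_cycle())            # {'hub3': 'polled hub3'}
poller.on_trap("hub3", "linkUp")
print(poller.poll_cycle())            # {}
```

While no trap of interest has arrived, the poll cycle generates no SNMP requests at all, which is the traffic saving that trap-based polling aims for.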

Security

SNMP uses community strings as a form of management security. To enable management communication, the manager must use the same community strings that are configured on the agent. You can define both read and read/write community strings.

Because community strings are carried unencrypted in SNMP messages, which travel in User Datagram Protocol (UDP) packets, packet capture tools can easily read this information. As with any password, change the community strings frequently.

See SNMP Community Strings (page 69) for more information.
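To show how visible a community string is on the wire, the sketch below builds the bytes of an SNMPv1 Get request by hand and then searches them for the community string. The encoder is deliberately minimal: it handles only short lengths and small numbers, the helper names are hypothetical, and `public` is used purely as an example value.

```python
def ber(tag: int, payload: bytes) -> bytes:
    """Encode one BER element; short-form lengths only (payload < 128 bytes)."""
    assert len(payload) < 128, "sketch handles short lengths only"
    return bytes([tag, len(payload)]) + payload


def ber_int(value: int) -> bytes:
    """Encode a small non-negative INTEGER (0..127) in one byte."""
    assert 0 <= value < 128
    return ber(0x02, bytes([value]))


def ber_oid(dotted: str) -> bytes:
    """Encode an OBJECT IDENTIFIER whose arcs after the first two are below 128."""
    arcs = [int(part) for part in dotted.split(".")]
    assert all(arc < 128 for arc in arcs[2:])
    return ber(0x06, bytes([40 * arcs[0] + arcs[1]] + arcs[2:]))


def get_request(community: str, oid: str, request_id: int = 1) -> bytes:
    """Build an SNMPv1 GetRequest message as raw bytes."""
    varbind = ber(0x30, ber_oid(oid) + ber(0x05, b""))   # OID plus NULL value
    pdu = ber(0xA0, ber_int(request_id) + ber_int(0) + ber_int(0)
              + ber(0x30, varbind))                      # GetRequest PDU
    return ber(0x30, ber_int(0) + ber(0x04, community.encode()) + pdu)


packet = get_request("public", "1.3.6.1.2.1.1.1.0")
print(packet.hex(" "))
print(b"public" in packet)  # True
```

The community string appears as plain ASCII bytes near the start of the message, so anyone who can capture the packet can read it.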

Table 11 Traps Supported by SNMP

Trap                    Indication
Cold Start              The agent has started or been restarted.
Warm Start              The agent’s configuration has changed.
Link Down               The status of an attached communication interface has changed from up to down.
Link Up                 The status of an attached communication interface has changed from down to up.
Authentication Failure  The agent received a request from an unauthorized manager.
EGP Neighbor Loss       In routers running the Exterior Gateway Protocol (EGP), an EGP Neighbor has changed to a down state.


SNMP MIBs

SNMP MIBs include MIB-II, other standard MIBs (such as the RMON MIB), and vendors’ private MIBs (such as enterprise MIBs from 3Com). These MIBs and their objects are part of the MIB tree.

MIB Tree

The MIB tree is a structure that groups MIB objects in a hierarchy and uses an abstract syntax notation to define manageable objects. Each item on the tree is assigned a number (shown in parentheses after each item), which creates the path to objects in the MIB. See Figure 22. This path of numbers is called the object identifier (OID). Each object is uniquely and unambiguously identified by the path of numeric values.

When you perform an SNMP Get operation, the manager sends the OID to the agent, which in turn checks to see if the OID is supported. If the OID is supported, the agent returns information about the object.

For example, to retrieve an object from the RMON MIB, the software uses this OID:

1.3.6.1.2.1.16

which indicates this path:

iso(1).ident-org(3).dod(6).internet(1).mgmt(2).mib(1).RMON(16)
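The relationship between the numeric OID and the named path can be reproduced with a small lookup table. The sketch below lists only the branch discussed above. The table, the function names, and the use of `?` for unlisted arcs are assumptions made for the example.

```python
# Names for the arcs on the path discussed above; only this branch is listed.
# "ident-org" stands for the identified-organization arc.
MIB_TREE = {
    (1,): "iso",
    (1, 3): "ident-org",
    (1, 3, 6): "dod",
    (1, 3, 6, 1): "internet",
    (1, 3, 6, 1, 2): "mgmt",
    (1, 3, 6, 1, 2, 1): "mib",
    (1, 3, 6, 1, 2, 1, 16): "RMON",
}


def parse_oid(dotted: str) -> tuple:
    """Turn '1.3.6' into (1, 3, 6)."""
    return tuple(int(part) for part in dotted.split("."))


def describe(dotted: str) -> str:
    """Render an OID as name(number) pairs, using '?' for unlisted arcs."""
    oid = parse_oid(dotted)
    labels = []
    for depth in range(1, len(oid) + 1):
        name = MIB_TREE.get(oid[:depth], "?")
        labels.append(f"{name}({oid[depth - 1]})")
    return ".".join(labels)


def is_under(dotted: str, subtree: str) -> bool:
    """True when the OID lies inside the given subtree."""
    oid, root = parse_oid(dotted), parse_oid(subtree)
    return oid[:len(root)] == root


print(describe("1.3.6.1.2.1.16"))
print(is_under("1.3.6.1.2.1.16.1.1", "1.3.6.1.2.1.16"))  # True
```

The prefix test in `is_under` mirrors what an agent does when it decides whether a requested OID belongs to a MIB that it supports.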

Figure 22 MIB Tree Showing Key SNMP MIBs

ROOT
  ccitt(0)
  iso(1)
    standard(0)
    reg-authority(1)
    member-body(2)
    ident-org(3)
      dod(6)
        internet(1)
          directory(1)
          mgmt(2)
            mib(1)
              MIB-II (1-11): system(1), interfaces(2), at(3), ip(4), icmp(5), tcp(6), udp(7), egp(8), transmission(10), snmp(11)
              RMON(16)
              RMON2(17)
          experimental(3)
          private(4)
            enterprises(1)
              3Com enterprise MIBs: a3Com(43), chipcom(49), retix(72), synernetics(114), onstream(135), startek(260)
  joint(2)


MIB-II

MIB-II defines various groups of manageable objects that contain device statistics as well as information about the device, device status, and the number and status of interfaces.

The MIB-II data is collected from network devices using SNMP. As collected, this data is in its raw form. To be useful, data must be interpreted by a management application, such as Status Watch.

MIB-II, the only MIB that has reached Internet Engineering Task Force (IETF) standard status, is the one MIB that all SNMP agents are likely to support.

Table 12 lists the MIB-II object groups. The number following each group indicates the group’s branch in the MIB subtree.

MIB-I supports groups 1 through 8; MIB-II supports groups 1 through 8, plus two additional groups.

Table 12 SNMP MIB-II Group Descriptions

MIB-II Group      Purpose
system(1)         Operates on the managed node
interfaces(2)     Operates on the network interface (for example, a port or MAC) that attaches the device to the network
at(3)             Was used for address translation in MIB-I but is no longer needed in MIB-II
ip(4)             Operates on the Internet Protocol (IP)
icmp(5)           Operates on the Internet Control Message Protocol (ICMP)
tcp(6)            Operates on the Transmission Control Protocol (TCP)
udp(7)            Operates on the User Datagram Protocol (UDP)
egp(8)            Operates on the Exterior Gateway Protocol (EGP)
transmission(10)  Applies to media-specific information (implemented in MIB-II only)
snmp(11)          Operates on SNMP (implemented in MIB-II only)

RMON MIB

RMON is an SNMP MIB that enables the collection of data about the network itself, rather than about devices on the network.

A typical RMON system consists of two components:

■ Probe — Connects to a LAN segment, examines all the LAN traffic on that segment, and keeps a summary of statistics (including historical data) in the probe’s local memory. The probe can stand alone or be embedded within the agent software. See Other Commonly Used Tools (page 38) and 3Com SmartAgent Embedded Software (page 36) for more information.

■ Management station — Communicates with the probe and collects the summarized data from it. The station can be on a different network from the probe and can manage the probe through either in-band or out-of-band connections.

The IETF definition for the RMON MIB specifies several groups of information. These groups are described in Table 13.

Table 13 RMON Group Descriptions

RMON Group      Description
Statistics(1)   Total LAN statistics
History(2)      Time-based statistics for trend analysis
Alarm(3)        Notices that are triggered when statistics reach predefined thresholds
Hosts(4)        Statistics stored for each station’s MAC address
HostTopN(5)     Stations ranked by traffic or errors
Matrix(6)       Map of traffic communication among devices (that is, who is talking to whom)
Filter(7)       Packet selection mechanism
Capture(8)      Traces of packets according to predefined filters
Event(9)        Reporting mechanisms for alarms
Token Ring(10)  Ring Station: Statistics and status information associated with each token ring station on the local ring, which also includes status information for each ring being monitored
                Ring Station Order: Location of stations on monitored rings
                Source Routing Statistics: Utilization statistics derived from source routing information optionally present in token ring packets
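The Alarm group's threshold behavior can be sketched with a pair of thresholds, one rising and one falling, so that a value hovering near a single threshold does not raise the same alarm repeatedly. The code below is a simplified illustration and not a probe implementation. The threshold values and samples are hypothetical, and the sketch assumes that the alarm starts out armed for a rising event only.

```python
def alarm_events(samples, rising, falling):
    """Return (index, kind) events using rising and falling thresholds.

    After a rising event, no further rising event is reported until the
    value has dropped to the falling threshold, and vice versa.
    """
    armed = "rising"
    events = []
    for index, value in enumerate(samples):
        if armed == "rising" and value >= rising:
            events.append((index, "rising"))
            armed = "falling"
        elif armed == "falling" and value <= falling:
            events.append((index, "falling"))
            armed = "rising"
    return events


# Hypothetical error counts per sampling interval.
samples = [2, 9, 11, 8, 12, 3, 1, 10]
print(alarm_events(samples, rising=10, falling=2))
# [(2, 'rising'), (6, 'falling'), (7, 'rising')]
```

The sample at index 4 exceeds the rising threshold but produces no second alarm, because the value has not yet dropped to the falling threshold.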


RMON2 MIB

RMON and RMON2 are complementary MIBs. The RMON2 MIB extends the capability of the original RMON MIB to include protocols above the MAC level. Because network-layer protocols (such as IP) are included, a probe can monitor traffic through routers attached to the local subnetwork.

Use RMON2 data to identify traffic patterns and slow applications. The RMON2 probe can monitor:

■ The sources of traffic arriving through a router from another network

■ The destinations of traffic leaving through a router to another network

Because it includes higher-layer protocols (such as those at the application level), an RMON2 probe can provide a detailed breakdown of traffic by application.

Table 14 shows the additional MIB groups available with RMON2.

Table 14 RMON2 Group Descriptions

RMON2 Group                   Description
Protocol Directory(11)        Lists the inventory of protocols that the probe can monitor
Protocol Distribution(12)     Collects the number of octets and packets for protocols detected on a network segment
Address Map(13)               Lists MAC-address-to-network-address bindings discovered by the probe, and the interface on which the bindings were last seen
Network Layer Host(14)        Counts the amount of traffic sent from and to each network address discovered by the probe
Network Layer Matrix(15)      Counts the amount of traffic sent between each pair of network addresses discovered by the probe
Application Layer Host(16)    Counts the amount of traffic, by protocol, sent from and to each network address discovered by the probe
Application Layer Matrix(17)  Counts the amount of traffic, by protocol, sent between each pair of network addresses discovered by the probe
User History(18)              Periodically samples user-specified variables and logs the data based on user-defined parameters
Probe Configuration(19)       Defines standard configuration parameters for RMON probes

3Com Enterprise MIBs

3Com Enterprise MIBs allow you to manage unique and advanced functionality of 3Com devices. MIB names and numbers are usually retained when organizations restructure their businesses; therefore, many of the 3Com Enterprise MIB names do not contain the word “3Com.” Figure 22 shows some of the 3Com Enterprise MIB names and numbers.


INFORMATION RESOURCES

This section lists the information resources you can use to help troubleshoot problems with your network. It contains:

■ Books (page 143)

■ URLs (page 144)

Books

The books listed in Table 15 can help you with network troubleshooting.

Table 15 Reference Books

IBM’s Token-Ring Networking Handbook (J. Ranade Series on Computer Communications)

Author: George C. Sackett

Publisher: McGraw Hill Text

ISBN: 0070544182

Publish Date: June 1993

Interconnections: Bridges and Routers (Addison-Wesley Professional Computing Series)

Author: Radia Perlman

Publisher: Addison-Wesley Publishing Co.

ISBN: 0201563320

Publish Date: May 1992

Internetworking with TCP/IP: Design, Implementation, and Internals

Authors: Douglas E. Comer, David L. Stevens

Publisher: Prentice Hall

Edition: 2nd

ISBN: 0131255274


Internetworking with TCP/IP: Principles, Protocols, and Architecture

Author: Douglas E. Comer

Publisher: Prentice-Hall

Edition: 3rd

ISBN: 0132169878

Publish Date: April 1995

Managing Switched Local Area Networks

Author: Darryl Black

Publisher: Addison Wesley Longman, Inc.

ISBN: 0201185547

Publish Date: November 1997

Network Management Standards: SNMP, CMIP, TMN, MIBs, and Object Libraries (McGraw-Hill Computer Communications)

Author: Uyless Black

Publisher: McGraw Hill Text

Edition: 2nd

ISBN: 007005570X

Publish Date: November 1994

The Complete Guide to Netware LAN Analysis

Authors: Laura A. Chappell, Dan E. Hakes

Publisher: Sybex

Edition: 3rd

ISBN: 0782119034

Publish Date: July 1996

The Simple Book: An Introduction to Networking Management

Author: Marshall Rose

Publisher: Prentice-Hall

Edition: 2nd

ISBN: 0134516591

Publish Date: 1996

Token Ring Network Design (Data Communications and Networks)

Author: David Bird

Publisher: Addison-Wesley Publishing Co.

ISBN: 0201627604

Publish Date: July 1994

Troubleshooting TCP/IP (Network Troubleshooting Library)

Author: Mark A. Miller

Publisher: M & T Books

Edition: 2nd

ISBN: 1558514503

URLs

The following uniform resource locators (URLs) lead to Web sites that are useful for network troubleshooting:

■ www.3Com.com — 3Com Corporation’s web site, which contains:

  ■ The latest release notes and documentation for all 3Com products. Documents are organized in the Support area by product type.

  ■ White papers and other technical documents about networking technology and solutions.

  ■ 3Com product information.

  ■ The 3Com Shopping Network.

■ wwwhost.ots.utexas.edu/ethernet/ethernet-home.html — Charles Spurgeon’s Ethernet Web Site, which includes Ethernet troubleshooting information.

■ techweb.cmp.com/nc/netdesign/series.htm — Network Computing Online’s Interactive Network Design Manual, which helps you to design and troubleshoot networks.

■ www.nmf.org — Network Management Forum (NMF), a nonprofit global consortium that promotes and accelerates the worldwide acceptance and implementation of a common, service-based approach to the management of networked information systems.

■ www.ovforum.org — HP OpenView Forum’s web site. HP OpenView Forum is a nonprofit corporation formed by the largest licensees of Hewlett-Packard OpenView to represent the interests of HP OpenView users and developers world-wide. The Forum is an independent corporation, not affiliated with Hewlett-Packard Company.

■ hpcc920.external.hp.com/openview/index.html — HP OpenView home page.

■ www.iol.unh.edu/index.html — University of New Hampshire InterOperability Lab (IOL) web site. Information on IOL consortiums, test suites, and technology tutorials.

■ www.3com.com/nsc/500251.html — Location of the document RMON Methodology: Towards Successful Deployment for Distributed Enterprise Management by John McConnell of McConnell Consulting, published in 1997.

These URLs are known to work; however, URLs are subject to change without notice.

146 INFORMATION RESOURCES

INDEX

Numbers3Com enterprise MIBs 141

AAddress Resolution Protocol

role in duplicate MAC addresses 105alarms

defined 54defining Start and Stop events 57setting against a baseline 57setting in LANsentry Manager 55tips for setting 57

alignment errorscauses and actions 109defined 115See also FCS errors

analyzersdefined 41use in troubleshooting 41See also probes

analyzing symptoms 25application layer 21ARP (Address Resolution Protocol)

role in duplicate MAC addresses 105ATM (Asynchronous Transfer Mode) utilization 89ATMvLAN Manager

VLAN map 60audience description, About This Guide 11

Bbackbone

checking utilization 85location of management station 44monitoring with probes 48

background noise 63balancing network load 29bandwidth utilization

ATM parameters 89Ethernet parameters 89FDDI parameters 90problems with 85token ring parameters 90

baselinescreating 63defined 62setting alarms from 57

book resources 143BootP

defined 106BOOTstrap protocol 106broadcast packets

defined 99See also broadcast storms

broadcast stormsbroadcast packets 99defined 93disabling the offending interface 97first clues 93identifying with Traffix Manager 96monitoring with Status Watch 94multicast packets 99troubleshooting 93

Ccable testers 42cabling

faulty 109problems 74, 112testing 42too long 118too short 118

collisionscauses and actions 111defined 115excessive 107late 107related to packet loss 107when normal 107See also excessive collisions and late collisions

communications serversconnecting on the network 51defined 50

community stringsdefault settings for 3Com devices 71defined 135device configuration 53

148 INDEX

congested station 76connections

adding redundancy 77undesirable for FDDI 80valid for FDDI 81

connectivity problemsdefined 19FDDI ring disconnections 73manager-to-agent communication 67

conventionsnotice icons, About This Guide 13text, About This Guide 14troubleshooting icons, About This Guide 13

CRC (Cyclic Redundancy Check) errorscauses and actions 112defined 116

Ddata link layer 21DECnet Phase 4 networks 105default community strings 71default thresholds 54designing a network 43device configurations

for management 53misconfigured 26Ping responder 38storing 60

Device Viewchecking packet loss statistics 114correcting spanning tree configurations 98defined 35using to set traps 53

devicesconfiguration information 60configuring for management 53default community strings 71faulty 110grouping 32, 54inventory 32, 61monitoring with probes 48monitoring with Transcend software 54

DHCP (Dynamic Host Configuration Protocol)defined 106

diagnostic equipment on FDDI 79disabling an interface 97DNS server problems 39dropped packets. See Ethernet packet lossdual homing

configuration 78defined 77

dual hosting 52

duplicate adressescauses 101troubleshooting 101with IP addresses 103with MAC addresses 102

Dynamic Host Configuration Protocol 106

E
ECAM 103, 104
elasticity buffer errors
    causes 119
    defined 121
Enterprise Communications Analysis Module 103
enterprise MIBs 141
equipment
    backups 29
    for testing 28
    replacing 29
Ethernet
    cabling problems 112
    network problems 76
    nonstandard cabling problems 117
    segment problems 76
    station problems 75
    utilization 89
Ethernet packet loss
    checking with LANsentry Manager 111
    checking with Status Watch 109
    Ethernet standard violations 117
    troubleshooting 107
excessive collisions
    causes and actions 110
    defined 116
    related to packet loss 107

F
fault management. See connectivity problems
FCS (Frame Check Sequence) errors
    defined 116
    related to packet loss 107
FCS errors
    See also alignment errors
FDDI
    identifying problems with Status Watch 121
    MAC errors 120
    ring errors 119
    station problems 76
    utilization 90
FDDI backbone
    monitoring with probes 48
    position of management station 44

FDDI connectivity
    adding redundancy 77
    dual homing 77
    Optical Bypass Unit 79
    SMT role 74
    troubleshooting 73
    undesired connections 80
    valid connections 81
Fiber Distributed Data Interface. See FDDI entries
file servers
    correcting timeouts 124
File Transfer Protocol. See FTP
firewalls
    protection against broadcast storms 93
    restricting access 26
Frame Check Sequence. See FCS errors
frame errors 119
frame loss. See Ethernet packet loss
frames not copied
    defined 122
FTP (File Transfer Protocol)
    compared to TFTP 40
    defined 40

G
gateway address
    defined 69

H
hardware
    backups 29
    upgrading 29
historical reports 88
HP OpenView NNM
    defined 35
hysteresis zone
    controlling alarms 56

I
IETF
    MIB-II MIB 138
    RMON MIB 139
in-band management 49
information resources 143
installation problems 12
intermittent connectivity 19
International Standards Organization. See ISO
Internet Engineering Task Force. See IETF
internet link
    monitoring with probes 48
IP addresses
    causes of duplicates 101, 106
    defined 69
    device configuration 53
    dynamically assigned 106
    identifying duplicates 103
    Pinging 39
IP hostnames
    device configuration 53
    Pinging 39
ISO (International Standards Organization) 21
isolated stations
    defined 74

J
jabbering
    defined 129
    protection mechanism failure 113

L
LAN driver, faulty 113
LANsentry Manager
    analyzing file server timeouts 124, 126
    checking Ethernet packet loss 111
    decoding packets 127
    defined 33
    identifying duplicate IP addresses 104
    setting alarms 55
    setting thresholds 55
late collisions
    causes and actions 112
    defined 116
    related to packet loss 107
LER cutoff 120
link errors 74
log book
    maintaining 62
logical network configuration 60
loss of connectivity
    overview 19

M
MAC addresses
    causes of duplicate 101, 105
    finding 33
    identifying duplicate 102
    storing 61
MAC neighbor change events 122

MAC Watch
    defined 33
    finding duplicate IP addresses 104
    finding duplicate MAC addresses 102
    stopping a broadcast storm 97
    troubleshooting file server timeouts 128
MACFrameErrorRatio variable 120
MAC-to-IP address translation 33
managed hubs
    defined 46
    in troubleshooting 52
    troubleshooting file server timeouts 128
management configurations
    checking 68
    design of network 43
    gateway address 69
    IP address 69
    SNMP community strings 69
    SNMP traps 72
Management Information Base. See MIB entries
management software
    ATMvLAN Manager 60
    Device View 35
    LANsentry Manager 33
    MAC Watch 33
    Network Admin Tools 29
    Status Watch 32
    Traffix Manager 34
    Transcend Central 32
    Upgrade Manager 35
    Web Reporter 33
management station
    configuration 52
    connecting to UPS 52
    dual hosting 52
    location on network 44
    RMON MIB 139
    security 52
MIB browser
    in NNM 36
    viewing the tree 136
MIB-II
    defined 138
    objects 138
MIBs
    enterprise 141
    example of OID 136
    in SNMP management 134
    MIB-II 138
    RMON 139
    RMON2 140
    tree representation 137
    tree structure 136
misconfigurations
    in newly connected devices 26
modem
    accessing the device console 49
    out-of-band connections 49
multicast packets
    defined 99
    See also broadcast storms

N
Network Admin Tools 29
network changes
    interpreting 23
network configuration
    device configurations 60
    site map 58
    VLAN setup 60
network congestion. See bandwidth utilization, broadcast storms, and Ethernet packet loss
network design
    console connections 49
    criteria 43
    for business-critical networks 47
    position of management station 44
    redundant management 51
    tips 52
    using communications servers 50
    using probes 45
network file server timeouts
    checking for errors 124
    correcting the problem 129
    decoding packets 127
    description 123
    overview 123
    reproducing the fault 126
Network File System. See NFS
Network ID 69
network layer 21
network load
    balancing 29
network loop 110
network management
    position of management station 44
network management platforms
    defined 35
    in troubleshooting 36
network map
    content 59
    creating in NNM 36
    defined 58
    example 59
Network Node Manager. See NNM
network noise 109
NFS
    defined 129
    in file server timeouts 126

NNM
    creating a network map 36
    defined 35
    MIB browser 36
normal networks
    baselining 62
    collision rates 107
    defined 62
    identifying background noise 63
    setting thresholds and alarms 54

O
object identification. See OID
OBU
    configuration 79
    defined 79
OID
    example 136
    MIB tree 137
    use in trap reporting 53
ONC 129
Open Network Computing 129
Open Systems Interconnect. See OSI reference model
OpenView 35
Optical Bypass Unit. See OBU
OSI reference model
    and network troubleshooting 21
    graphical representation 22
    layers and troubleshooting tools 21
out-of-band connections
    defined 49
    with Telnet 40

P
packet capturing
    using analyzers 41
Packet Internet Groper. See Ping
packet loss. See Ethernet packet loss
passwords
    community strings 135
    storing 61
peer wrap condition
    causes 74
    defined 79
    evaluating 77
performance problems 119
    checking utilization 85
    correcting duplicate addresses 101
    correcting FDDI ring errors 119
    defined 20
    Ethernet congestion problems 107
    solving file server timeouts 123
    stopping broadcast storms 93
physical connection break 25
physical layer 21
Ping
    checking file server response 124
    creating a script 39
    defined 38
    device configuration 38
    interpreting messages 40
    strategies for using 39
Ping responder 38
platforms 35
presentation layer 21
probes
    defined 41
    in troubleshooting 41
    on business-critical networks 47
    placement on a network 45
    RMON MIB 139
    roving analysis 46
    See also analyzers 41
problems
    analysis example 27
    device configuration 25
    identifying causes 26
    physical connection break 25
    recognizing symptoms 24
    software installation 12
    solving 29
    testing causes 26
    Transcend software errors 12
    understanding 25
protocol analysis 41

Q
QoS (Quality of Service) 26

R
RARP 106
receive discards 118
redundant connections
    dual homing 77
    Optical Bypass Unit 79
redundant management 51
replacing faulty equipment 29
reporting
    with Web Reporter 33
reports
    historical 88
    utilization 88
resources
    books 143
    URLs 144

Reverse ARP 106
RIP packets 95
RMON
    groups 139
    LANsentry Manager 33
    MIB definition 139
    probes 41
    SmartAgent software 37
    Traffix Manager 34
RMON2
    groups 140
    LANsentry Manager 33
    MIB definition 140
    probes 41
    purpose 140
    Traffix Manager 34
routers, faulty 113
Routing Information Protocol 95
routing table
    examining 50
roving analysis
    in business-critical networks 49
    with probes 46

S
security
    of management station 52
    SNMP community strings 71, 135
segmented ring
    defined 74
    identifying 75
serial line
    accessing the device console 49
    out-of-band connections 49
servers
    comm 50
    timeouts 123
session layer 21
Simple Network Management Protocol. See SNMP entries
site network map. See network map
SmartAgent software
    defined 36
    use in troubleshooting 37
SMT (Station Management)
    role in FDDI connectivity 74
SMTConfigurationState variable 73
SMTPeerWrapFlag variable 77
Sniffer. See analyzers
SNMP
    defined 133
    messages 134
SNMP agent
    defined 133
    troubleshooting communication problems 67
SNMP community strings
    3Com defaults 71
    defined 69, 135
    device configuration 53
SNMP Get
    defined 134
    when valid 70
SNMP Get Responses
    defined 134
SNMP Get-next
    defined 134
    when valid 70
SNMP management
    location of station on network 45
    problems with 45
SNMP manager
    defined 133
    troubleshooting communication problems 67
SNMP Set
    defined 134
    when valid 70
SNMP traps
    defined 72, 134
    device configuration 53
    message description 134
    supported objects 135
software
    alerts 24
    backups 29
    problems 12
    upgrading 29
solving problems
    balancing network load 29
    overview 20
    replacing equipment 29
    upgrading software and hardware 29
spanning tree
    causing broadcast storms 94
    correcting configurations 98
    traffic not monitored 95
Spanning Tree Protocol. See spanning tree
Status Watch
    checking for Ethernet packet loss 109
    checking utilization 86
    defined 32
    identifying a broadcast storm 94
    identifying duplicate FDDI MAC addresses 103
    identifying FDDI ring errors 121
    setting thresholds 55
Stop and Start events 57
storm. See broadcast storms
subnetwork mask
    defined 69

switches. See devices
symptoms
    analyzing 25
    recognizing 24
    software alerts 24
    user comments 24

T
Telnet
    accessing the device console 49
    checking file server response 124
    defined 40
    examining a routing table 50
    out-of-band connections 40, 49
    use in troubleshooting 40
testing
    equipment 28
    proving a theory 26
TFTP
    compared to FTP 41
    defined 41
thresholds
    defined 54
    hysteresis zone 56
    setting in LANsentry Manager 55
    setting in Status Watch 55
    tips for setting 57
thru ring 73
timeout problems
    network file servers 123
    overview 19
token ring utilization 90
too long errors
    causes and actions 113
    defined 118
too short errors
    causes and actions 113
    defined 118
traffic patterns
    evaluating 34
    RMON2 MIB 140
Traffix Manager
    defined 34
    identifying broadcast storms 96
transceiver, faulty 109
Transcend Central
    3Com inventory database 54
    defined 32
    grouping devices 54
Transcend Software
    Upgrade Manager 35
Transcend software
    ATMvLAN Manager 60
    Device View 35
    LANsentry Manager 33
    MAC Watch 33
    monitoring devices 54
    Network Admin Tools 29
    Status Watch 32
    Traffix Manager 34
    Transcend Central 32
    troubleshooting toolbox 31
    Web Reporter 33
transmit discards
    defined 118
transport layer 21
trap reporting
    defined 134
    device configuration 53
trap-based polling 135
Trivial File Transfer Protocol. See TFTP
troubleshooting strategy 23
twisted ring
    defined 75, 80
    evaluating 77

U
undesired connection attempt
    defined 80
    evaluating 77
uninterruptible power supply 52
upgrading software
    to solve problems 29
    using FTP 40
    using TFTP 41
UPS 52
URL resources 144
user complaints 24
utilization
    ATM parameters 89
    Ethernet parameters 89
    FDDI parameters 90
    historical reports 88
    problems with 85
    token ring parameters 90

V
valid service 26
VLANs (virtual LANs) 60

W
WAN Link
    monitoring with probes 48
Web Reporter
    defined 33
    historical utilization reports 88
wiring
    testing 42
World Wide Web. See WWW browser
wrapped ring
    defined 74
    identifying 75
    peer wrap condition 74, 79
WWW browser
    with Web Reporter 33