Upload
oscar-dumont-de-andrade
View
229
Download
0
Embed Size (px)
Citation preview
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 1/93N E T W O R K S U P E R V I S I O N
ApplicationTroubleshooting Guide
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 2/93
Table of contents
Introduction
Background The TCP Protocol The Li e o a Packet Other Troubleshooting Factors 7Why UDP is so Di erent Why is Understanding Troubleshooting Important 9
Application Flow The Amazing Sequence o Events in Connectingto a Server DHCP DNS Lookups ARP Broadcasting The Route to the Server 24Establishing the Connection to the Server 26Sender/Receiver Interaction 30 The Flow o Data Closing the Connection 40Understanding How Applications Fail 42 Failures with DNS Failures with ARP Failures with Routing 47 Problems in Establishing a Connection 51 Slow Responses rom Servers 53 Closing the Connection 54
Troubleshooting Applications Baselining
Five Key Steps to Success ul Application Troubleshooting 58 Determine the domain o the problem and
exonerate the network 58 Validating Connectivity to the Application Server 65 Determine the Network Path 70 Application Flow Analysis 74 Using OptiView ® Protocol Expert 78 Fix the Problem Validate the Fix
Document the Fix 8Case Studies Case Study 1: Obtaining Switch Statistics 87 Case Study 2: Investigating WAN Link Per ormance 89
Summary
www fukenetworks com1
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 3/93
Fluke Networks 2
Guide to Troubleshooting Application Problems
IntroductionControversy began early on when computers were connected toeach other and programmers began to write applications that
sent in ormation back and orth between them. The controversy iswhether the apparent slowness in per ormance o the applicationis due to how the program is executing, or the slowness in thenetwork. When networks were rst used, links were very slow whencompared to direct connections between computers, printers anddisk drives. So, networks o ten were blamed or the waiting time
or the system to produce output.
However, today’s networks operate at vastly higher speeds. Whenusers get rustrated with the per ormance o an application, it isdi cult to isolate whether the lack o per ormance is in the pro-cessing in the network attached devices, or in the network itsel .
In this document, we will address this problem. In order to dothis we will need to discuss how applications work. In particular,we will ocus on how they use or ail to use the network. This will include both ailure to send in ormation in a timely manner and
ailure to respond to requests coming rom the network. We’ll also give you a guide in what to look or when troubleshooting
applications.
BackgroundCompanies have become almost totally dependent on computersto conduct their business. How many times have you been in aretail store or government o ce and heard a sta member say,
“We can’t help you because the system is down.” Companiesdepend on computers or a variety o tasks. Revenue andexpenditures are recorded in transaction processing systems.
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 4/93
www fukenetworks com3
Application Troubleshooting Resource Center: www fukenetworks com/appts
Background
Orders are taken and inventories are adjusted in similar systems.Most written communications are now done by email ratherthan by letters that are typed and mailed. Even instant written
communications such as instant messaging is moving rom adstatus to serious message communications where business isnegotiated and deals are closed.
Beginning about 2000, businesses began to seriously evolvevoice communications over to data networks with VoIP (Voice overIP). Most communications carriers and a majority o large enter-prise companies use VoIP exclusively or voice messaging. Withthe explosion o YouTube and Internet TV, video is beginning tocome into enterprise networks as well. Very recently, a new areais collaboration. Under the label o Web 2.0, it represents a groupo tools that allow users to share and exchange voice, data andvideo or the purpose o collaborating on projects. These toolsand services are rapidly becoming popular with lawyers, marketinggroups, consulting rms and service providers such as architectsand engineers.
In TCP/IP networks, applications can be distinguished bythe protocol over which they run. Nearly 90% run over TCP(transmission control protocol). That is, they send and receivetheir messages in packets that have been built by TCP. The other10% use the UDP (user datagram protocol). These two protocolswere devised nearly orty years ago but have proven to beextremely robust and enduring. While modi cations in TCP havebeen made over the years, its essential operation has remainedalmost constant. In our document we’ll ocus on TCP applications,
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 5/93
Fluke Networks 4
Guide to Troubleshooting Application Problems
since they represent the large majority o the instances in whichusers are reporting slowness in application per ormance.
The TCP ProtocolI you segment the tra c on a corporate network into categoriesdetermined by the purpose o that tra c, you would probably
nd a break down something like this: email (10%), transactionprocessing (25%), http web tra c (35%), broadcast and control (10%) and other tra c (20%). The only part o the mix that is
based primarily on UDP is in the “other” category. The remainderuse mostly TCP.
TCP is designed to be adaptive to the network. This means when itworks on behal o the application,it modi es how it is interacting with
the network. This is based on howthe network seems to be respond-ing to the requests and commandsit is sending into the network. I the network seems to be ast, it will become more aggressive in sending
in ormation. I it senses a slow-down or problem with packets beingdropped, it will quickly decrease therate at which it sends packets.
TCP is based on a well documented algorithm ( ormula or opera-tional procedure). However, individual operating systems such asMicroso t® Windows® XP and a particular version o Linux make
TCP has three main unctions:
(1) It handles the rate o fowbetween the two end applica-tions. (2) It assures reliability
or the delivery o all data. (3)It provides or the detection o errors and correction o data byusing retransmission o packetsthat are corrupted or dropped.
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 6/93
www fukenetworks com5
Application Troubleshooting Resource Center: www fukenetworks com/appts
Background
slight variations in how they implement the algorithm. A detailedunderstanding o the algorithm is not necessary to troubleshoota network. Yet, its e ect is pro ound and later, we’ll see some
examples o how it works.
The adaptive nature o TCP is what makes it very use ul. It can beused over 56 kbps dial-up links and 10 Gbps links. The applicationprogrammer doesn’t have to deal with that signi cant di erence inlink speeds - TCP deals with it. Onthe other hand, this adaptive natureo TCP means that slowness in theapplication per ormance might be theway the application is coded or theway in which the application interactswith the network. I the applica-tion processing is slow, the data tobe transmitted isn’t sent to the TCPstack in a timely manner. This alsomeans TCP can’t deliver packets to thenetwork. In the other case TCP sensesa slow network and decreases the rate at which it o ers packets tothe network.
So, when TCP is the protocol being used, both network problemsand server problems are hidden by TCP’s adaptive nature.
IP Ne tw ork
Transpo rt
App lica t ion
Figure 1 TCP Holds the Applicationand Network Together
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 7/93
Fluke Networks 6
Guide to Troubleshooting Application Problems
The Li e o a PacketLet’s consider a relatively simple model o client/server interactionwhere TCP is being used. Let’s say we have data to send rom the
client to the server. (It works almost identically when the serversends data to the client.) The client application creates a blocko data and sends it to the TCP so tware in the client’s computer.It’s placed in an output bu er and timers are started thatare associated with that block o data. TCP nds out rom theoperating system how much data it can send in each outgoing
TCP block called a segment . Usually it is 1460 bytes maximum.This segment is delivered to the IP so tware and the IP addresseso the sender and receiver are added. Then the IP so twarecontacts the operating system to say an outgoing message isready to be sent. When the operating system gives the okay,the outgoing packet is moved to the network inter ace card (NIC)
to be sent. In this entire process, the per ormance o the clientprocessor is critical to how ast this all can take place.
A ter the packet is sent rom the inter ace card, the networkper ormance is the critical actor. Packets can ollow a short, astroute, a slow, long route or any combination o links. It is common
or a packet to traverse 10-20 individual network links when goingover an Internet connection. Like the adage about the strengtho a chain depending on the weakest link, the speed through thenetwork depends on the slowest link. More o ten than not, theslowest link is the access link into the network and the egress linkout o the network.
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 8/93
www fukenetworks com7
Application Troubleshooting Resource Center: www fukenetworks com/appts
Background
At the server, the TCP segment is received on the NIC, deliveredto the IP so tware and the process described above is reversed.During this process, it is the speed o the processing in the server
that a ects how quickly the data is received and the acknowledge-ment is prepared to be sent back.
In conclusion, there are three critical phases necessary or thesuccess ul delivery o the application message to the server’sapplication:
1. Processing o the message in the client to prepare it to be sent.
2. Network delivery.
3. Processing in the server when the message arrives.
Other Troubleshooting FactorsThere are some other things that make it di cult to troubleshootTCP applications. The TCP protocol isn’t very well understood. Justrecently in ormation about TCP’s operation has begun to appearin networking textbooks and industry whitepapers. Yet, it is thesingle layer o the seven layer model which has a signi cantinfuence on the per ormance o the application-to-networkinter ace. Because it isn’t well understood, network engineers andtechnicians usually don’t look at it as a source o explaining whatis causing a problem. O ten, they look at server con gurations,cabling issues, or system memory as explanations when actuallythese have little relationship to the underlying problem.
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 9/93
Fluke Networks 8
Guide to Troubleshooting Application Problems
Also, TCP isn’t an especially e cient protocol. For years, datacommunications experts considered an overhead amount o 20%or more to be excessive. But, TCP routinely operates with 50% or
more overhead when the actual application data bytes and thetotal bytes are used to calculate the overhead.
Finally, as we will learn later in this document, TCP reacts veryaggressively to packets that are dropped within the network.When a single packet is dropped, a le trans er may decrease itsdelivery rate by 70% or more.
Why UDP is so Di erentWhile we won’t examine UDP in detail in this paper, it is worth-while to consider how it compares with TCP. That will ampli y whatwe have learned about TCP.
UDP makes no provision or fow control, error detection or correc-tion, or the reliability o data delivery. These responsibilities arepushed up to the application. There ore, since UDP involves muchless processing, poor per ormance is more o ten dependent on theper ormance o the network or the slowness o the processor unit.I the processor is slow, other applications will show the sameslowness. This makes the isolation o the cause o slowness lesscomplicated than in the case o TCP applications.
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 10/93
www fukenetworks com9
Application Troubleshooting Resource Center: www fukenetworks com/appts
Background
Why is Understanding TroubleshootingImportantWhen applications appear to be slow, rustrated users waste time
and energy. Slow applications cause lost time which, in turn,means lost money. Also, IT sta time is misspent. In someinstances, help desk or PC analysts expend much o their timerebooting client stations or checking IP con gurations andcabling connections.
When an e ort is made to learn about troubleshooting applicationsand the skills are applied to the system to improve the per or-mance o both the network and the applications, it’s an investmentin the company. It pays o in dividends the same way that savingon shipping costs, reducing uel consumption or eliminating anyother ine ciency, pay o .
But troubleshooting TCP applications isn’t easy. Unlike voice orvideo, which are typically UDP based, application per ormancedegradation isn’t obvious. I the sound is bad or the TV screenshows distortion, everyone sees it immediately. But with TCPapplications like transaction processing systems, per ormancetends to be judged based on how the system behaved yesterday.I it is slower today than it was yesterday, users begin tocomplain. Even worse, i the application degrades slowly, thechange may not be detected at all and the company begins tolose money through the ine ciency.
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 11/93
Fluke Networks 10
Guide to Troubleshooting Application Problems
Understanding Types o Applications
There are many examples o applications that will all under ourscrutiny. Web applications include both applications that actuallycommunicate over the Internet and any that use a browser suchas Internet Explorer® or Fire ox.® The latter are o ten calledweb enabled applications . It is becoming more common orapplications to be described as “browser-based.” This generallymeans that the user begins by opening a browser and connectingto the server application rom within that browser window. Mosto us have used these applications when we do on-line banking,order a movie, or reserve an airline ticket. What is important to usis that the underlying tra c will be transmitted between the clientand the server using HTTP (hypertext trans er protocol) or HTTPS(secure hypertext trans er protocol). Both o these are TCP basedapplication program inter aces.
SQL(structured query language) applications involve calls roma client application to a server database application or recordsor tables rom a SQL database. These can be, or don’t have to be,web enabled and use HTTP or HTTPS. For example, the clientapplication can present a result to a user that asks or a eld to
be completed on the screen. A ter lling in a value, it is sent tothe server to do a computation. The server takes the result anddetermines that another record needs to be returned to theclient. All o this interaction takes place under the control o theTCP protocol and will be a ected by TCP’s operation, the individual processors’ per ormances and the network’s per ormance.
Closely associated to database activity are transaction processing systems. These o ten overlap SQL database applications.
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 12/93
www fukenetworks com11
Application Troubleshooting Resource Center: www fukenetworks com/appts
Background
In transaction processing systems, the emphasis is on takingrelatively small amounts o data and doing a calculation in a shortperiod o time. ERP systems that have individual components such
as an accounts payable module, inventory module, or order entrymodule are examples o transaction processing systems. When auser changes a customer’s telephone number, a transaction hasoccurred. The data to make such a change and the in ormationthat shows that the change occurred are trans erred betweenthe client and the server under the control o TCP.
Imaging systems , such as those that inventory photos or x-rays,are also very o ten TCP based. They are usually very large les andmoving them rom a server to a client is a bulk trans er. Such bulktrans ers are the reason TCP was originally designed. One majorapplication program inter ace that is used to do this is FTP( le trans er protocol). It can be executed rom the commandline, rom within a le trans er program or rom within a browser.A user clicks on a thumbnail o the image they want to view andthe FTP protocol is invoked to retrieve the image rom the server.FTP is one o the best examples o simple application o the TCPprotocol. The client identi es the le to be retrieved (in this casean image), and a separate connection is opened or the trans er.The emphasis in the operation is to send as much data as possibleas quickly as the network can handle it. These applications cansaturate links on networks more quickly than almost any other typeo application. As a result, they appear to be more sensitive to themethod o operation dictated within TCP. Small network problemsbecome very evident in a le trans er, especially i it is a large le.
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 13/93
Fluke Networks 12
Guide to Troubleshooting Application Problems
Data warehousing systems are also dependent on bulk letrans ers. A data warehouse is generally considered to be a highlyorganized, indexed repository o a company’s electronic documents.
When a document needs to be retrieved, the client applicationindicates the document or documents or parameters that uniquelyde ne the in ormation needed. When the server applicationreceives this in ormation, it retrieves the in ormation and sendsit as a bulk le trans er. Consequently they are nearly always TCPbased. And, as a result, they behave very much like other le
trans er based processes.
While VoIP payload is trans erred between the phones using UDP,the call set-up o ten uses TCP. In the exchange that takes placewhen a phone goes o -hook, only small amounts o data aretrans erred. However, the process can involve ARP, DNS and theother process that typically de ne a connection to a server.Poor per ormance or ailure in this connection won’t a ect thequality o the call. It will be apparent in the time it takes to makethe connection or the lack o signaling tones to which we havebecome accustomed. For example, the user might dial but not heareither a busy signal or a ringing tone.
Another category o applications that use TCP are CRM(customer relationship management) systems . They are usuallya combination o transaction oriented and database systems.They also depend on TCP.
Be ore we delve more deeply into the application fow process,
we need to consider one more item.
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 14/93
www fukenetworks com13
Application Troubleshooting Resource Center: www fukenetworks com/appts
Application Flow
Application Flow
We’re going to consider six issues that a ect application fow:
1. DNS lookups
2. ARP resolution
3. Establishing the TCP connection
4. Sender/receiver (interaction)
5. Data fow
6. Closing the TCP connection
In this section we will not consider UDP applications because theytypically represent 15% or less o the tra c on the network and
it is easier to isolate the cause o the poor per ormance. We will discuss UDP later in the paper.
The Amazing Sequence o Events inConnecting to a ServerVery ew o us ever stop to consider what is actually taking
place when we click on an icon that tells our application to getsomething rom a server. But understanding the process can bevery help ul. Let’s start when our user sits down at their desk tobegin working with the application. We’ll assume the computer wasturned o over night and the rst thing they want to do is checkthe company’s home page or noti cations about weather related
cancellations.
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 15/93
Fluke Networks 14
Guide to Troubleshooting Application Problems
Here is a typical sequence:
1. Computer o .
2. Computer powered up.
3. OS loaded, NIC detected.
4. TCP/IP stack checked: IP address, mask, de ault router(gateway) and DNS server known.
5. I DHCP is enabled, the DHCP server is contacted to receive
these values.
6. User indicates to start connection to server (e.g. open IE).
7. IE request is created or the de ault home page(e.g. www.google.com).
8. IP so tware looks up a DNS server IP address
(which we assume is not local).
9. IP so tware sends ARP request to local net to get MAC addresso router.
10. The router receives the broadcast, recognizes that it is thetarget o the query and sends a response containing its MACaddress onto the network as a broadcast.
11. IP so tware sends IP packet with DNS request to router,which orwards to DNS server.
12. DNS server returns IP address or www.google.com.
13. TCP/IP stack sends connection request to server at Google. TM
14. Network delivers connection request.
15. Connection established.
16. (Story to be continued later).
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 16/93
www fukenetworks com15
Application Troubleshooting Resource Center: www fukenetworks com/appts
Application Flow
It’s rather easy to see that there is much that can go wrong.We’ll be re erring back to this sequence later in our discussion.
With the dependence that companies have on TCP basedapplications, we now turn our attention to the six phases o the TCP connection between the client and server applications.
DHCPThere are two ways a user can make sure that a client has the
appropriate con guration parameters to connect to the network.One way is to assign the values through the operating system.In Windows,® a screen in the Network Settings allows entry o theIP address, subnet mask, de ault router, and DNS server.On the other hand, in the same window, the user can check theradio buttons to obtain these automatically. In this case, the
dynamic host confguration protocol (DHCP) will be used.Figure 2 illustrates the process.
client server
DHCPREQUEST
DHCPOFFER
DHCPDISCOVER
DHCPACK
Figure 2 DHCP Process
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 17/93
Fluke Networks 16
Guide to Troubleshooting Application Problems
To begin, the client sends a local broadcast indicating it needsservice rom a DHCP server. This rame is called aDHCP Discover .Typically, all DHCP servers that hear the broadcast will respond
with a broadcast, which is called a DHCP O er . The o er will contain the con guration parameters the client needs. More thanone o er may be received since there may be multiple DHCPservers. The client chooses one o the o ers by sending a broad-cast called a DHCP Request . The servers that sent the o ers thatweren’t accepted realize they are no longer part o the process.
The server that made the accepted o er completes the assignmento the parameters by sending a broadcast called the DHCP
Acknowledgement .
In the event that the DHCP server is on a di erent network thanthe client, the local router that separates their networks mustknow to orward DHCP broadcasts. This orwarding action is calledDHCP relay. Routers do not normally orward broadcasts.
DNS LookupsBe ore, we briefy mentioned DNS. But now let’s consider it inmore detail. DNS re ers to both the protocol domain name services
and the device called the domain name server . While it seems tocontain a redundancy, a DNS server is a domain name servicesserver. The purpose o DNS is to match IP addresses o hosts tothe name that resides in that host. For example, i Apple BakerCompany, Inc. has the domain name abc.com registered with theInternet community and they have also registered the address
221.221.13.0, we say that abc.com maps to 221.221.13.0.
221.221.13.0 <-> abc.com
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 18/93
www fukenetworks com17
Application Troubleshooting Resource Center: www fukenetworks com/appts
Application Flow
When a process queries with a name and learns the associatedaddress, we say the name was resolved . Most o ten names areresolved to provide the address. However, it is sometimes
necessary to resolve the address to learn the name. This is knownas a reverse lookup. DNS server can do both o these. The DNSserver is a computer with a stored table that contains the mapping(or association) o many names and IP addresses. Note that theDNS table does not contain the hardware addresses o the devicethat has a particular IP address. That’s in a di erent table and
we’ll consider that later.
In the sequence o events on page 14, the client browser knew thehome page was www.google.com. In steps 10 and 11, you see thatthe client browser asked to be connected to the server at Google TM But the client’s TCP/IP so tware didn’t know the IP address orwww.google.com. As a result a query was sent to the DNS serverand the IP address was returned.
On some rare occasions, another method is used to resolve namesto addresses. Clients will store a host name table. This table actslike a local DNS table and also the administrator o the local computer to create xed, permanent associations between namesand IP addresses. This is occasionally used in manu acturingoperations where there is a need, or desire, to avoid a name server.This technique is not adequate or situation where the user mightconnect to the public Internet because the names that need to beresolved are unpredictable.
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 19/93
Fluke Networks 18
Guide to Troubleshooting Application Problems
Now that we know something about how DNS works, let’s considerwhat can go wrong. First, it is important to understand thatapplications are rarely written to send messages or data to an
IP address. It is almost always sent to a server with a name.So, DNS is almost always invoked. Second, the address o the DNSserver that is con gured in the client or o ered by DHCP is o tenthe wrong server or one that isn’t easily reached as in Figure 3.I it is the wrong server, the name query will be relayed to anotherserver, and potentially to more servers until it reaches one that
can resolve the name. Consequently, to the client application,it appears the query was handled slowly by the local DNS server,when, in act, the query was resolved by a device that was much
urther away. In addition, or e ciency and redundancy purposes,many large enterprises have many DNS servers. In one largeuniversity, there are ten DNS servers. Users are emailed a list and
they may pick two to con gure their client. In a case like this, thelikelihood that the user picks the best two servers, assuming thereare two best, is less than 3%! So, in most cases i there are manyDNS servers, you can probably assume the best two haven’t beencon gured in the client.
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 20/93
www fukenetworks com19
Application Troubleshooting Resource Center: www fukenetworks com/appts
Application Flow
Figure 3 The Location o DNS
The packet that contains the query name is called a command.The response, coming rom the DNS server that contains the IPaddress, is called the response. In Figure 4 you can see three DNScommands and three corresponding responses.
DNS Serverclient
Local Network
Ideal
DNS query
client DNS ServerRemote NetworkInefficient
DNS query
14 router hops
Local Network
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 21/93
Fluke Networks 20
Guide to Troubleshooting Application Problems
Figure 4 DNS Commands and Responses
So, in conclusion, DNS can be slow due to network slowness,improper or poor choice o DNS server, or a lost query. In thelatter case, usually the client automatically sends a second or thirdrequest a ter it times out on the rst request. DNS can also ail,i the query contains a name that can’t be recognized or i noDNS server can be ound. O these two possibilities, the second ismore common because the administrator o the computer lists anonexistent DNS server or provides an invalid address in the TCP/IPcon guration. For example, you might be surprised by how manyusers con gure their TCP/IP stack by inserting the company’s email or web server in the DNS server eld.
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 22/93
www fukenetworks com21
Application Troubleshooting Resource Center: www fukenetworks com/appts
Application Flow
ARP BroadcastingARP (address resolution protocol) is used when a device knowsthe IP address but doesn’t know the MAC address o the same
device. In our description o connection to the server on page 14,step 9 indicates what happens. The client was con gured with theIP address o the router. But sending the packet requires the MAC(hardware) address o the device. The ARP broadcast containsthe query and the response to the query lls the need and theaddresses are resolved. When the client learns the MAC address,
it will temporarily store it in a table called its arp cache(it is not common to use upper case letters in this case.)
Figure 5 The ARP Command
Figure 5 shows the arp table o a client. This table is available inWindows® clients by typing the command arp –a on the commandline. The entries in the arp table are usually stored or abouttwo minutes in Windows.® Then it is discarded. Other operatingsystems store the entries or di erent lengths o time. All operat-ing systems supporting TCP/IP will have such a table. The arp tableis a good place to look to rst to see i a device knows the addresso a particular local station. It is important to remember that thearp table contains only devices rom the local network (subnet).
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 23/93
Fluke Networks 22
Guide to Troubleshooting Application Problems
Devices beyond a router will not be listed because they are onanother network.
Another view o this problem is illustrated here:
IP A <--> IP B
MAC x MAC y
Suppose the device with IP address A wants to send a message tothe device with IP address B. It knows its own MAC address but not
the MAC address o the other station. Its application directed themessage to be sent to IP address B. Or, possibly, its applicationdirected it be sent to a name at B and DNS provided the IP addressB. Either way, it still needs the MAC address o B to complete itstask. ARP will be invoked and the entry B and y will be added tothe arp table or temporary storage.
The ARP protocol is a necessary part o TCP network (in the caseo IPv4). ARP is also used or some more discretionary tasks.Sometimes re erred to as discovery techniques, ARP is used whena device like a server wants to see who is actually on a local network. By sequentially sending an ARP query to every possibleaddress and then watching who responds, it can build a list o devices that are actually unctioning on the network. Windows®browsing and many network troubleshooting tools implement thistechnique.
Figures 6 and 7 show an ARP query and the correspondingresponse.
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 24/93
www fukenetworks com23
Application Troubleshooting Resource Center: www fukenetworks com/appts
Application Flow
Figure 6 The ARP Query
Figure 7 The ARP Response
The Internet standard that speci es the ARP protocol also makesprovision or a process called reverse ARP. This allows a device tobroadcast a MAC address to nd out who has the corresponding IPaddress. While it might seem this could be use ul, it is rarely usedin practice.
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 25/93
Fluke Networks 24
Guide to Troubleshooting Application Problems
The Route to the ServerSuppose that the client has now obtained the IP address o theserver and knows it isn’t on the local network. The client also
knows the MAC address o the router. The client has a connectionrequest that needs to reach the server. O course it will send it tothe router which will in turn orward it to another router, and so
orth until it arrives at the router on the server’s local network.Keep in mind, that when it reaches the destination network, thatrouter may need to invoke ARP to get the server’s MAC address.
But how many steps (links) will the connection request passthrough and how ast and reliable is each link? And, is the routeselected the optimal route? Usually that is determined by twothings: how the network was designed and what routing protocol was used. I a particular link is slow and was used to create thepath rom the client to the server, a response to the connection
request will be slow. Had a aster link been used to create the paththe response will be quicker.
In today’s IP networks, routers establish a set o links betweenthe potential sending and receiving networks using a routing
protocol. Investigating how such protocols work is beyond the
scope o this document, but it is important to understand a ewthings that are common to all routing protocols. First, they aredynamic and automatic. That is, they will build a set o linksamong themselves so that all devices can be reached by otherdevices. Second, they periodically update each other so that i alink ails, a new set o paths are created. Finally, they all make
provisions or static routes , or routes that are manually created and
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 26/93
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 27/93
Fluke Networks 26
Guide to Troubleshooting Application Problems
Establishing the Connection to the ServerThe method o creating the connection or all TCP applicationsis the same. It is widely known as the three-way handshake.
Here’s what happens.
Be ore data can be trans erred over the connection, certainparameters o operation must be agreed upon by the sending andreceiving TCP so tware modules. Recall that TCP is responsible orfow control, reliable delivery, and error control. In order to agree
on how this will be handled, each TCP stack exchanges certainin ormation during this three-way handshake. For example, in orderto establish the maximum number o bytes that will be permittedin each packet, the sender includes a statement o the intendedMTU (maximum transmission unit). While this is normally 1460bytes, it isn’t mandatory that value be used. I the other TCP
module receives this indication and doesn’t like the value andthinks it should be lower, it simply doesn’t acknowledge that itreceived it. The sender must take the hint and lower the value.
A second thing that must be established is how many bytes thereceiver can receive in total in its receive bu er. This value, calledthe window advertisement , is also indicated in the rst packetexchanged in the handshake. The other end records this value andany adjustments it receives so that it never overwhelms its partnerwith too much data. The exchange o these two values provide orfow control.
Another value that must be established is the point at which eachend will begin to count bytes o data. This is a random value calledthe initial sequence number and each announces their value in the
rst two steps o the handshake. From that initial value, every
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 28/93
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 29/93
Fluke Networks 28
Guide to Troubleshooting Application Problems
Port numbers or server processes generally are well-knownorregistered . That means that everyone agrees on which applicationprocess the number is associated with. For example, email o ten
uses ports 25 or 110. HTTP uses port 80. There are hundreds o others. It’s a good idea to learn most o the common ones. Likeso many other network topics, using your avorite search engineand the phrase “TCP port numbers” will lead you to many lists thatare available. In this step, the client will normally use a randomlychosen port. Windows® o ten selects a value in the range o 2000-
2500.
The SYN packet also contains the sender’s window advertisement.Also, it contains a request or the maximum transmission unit(MTU). Finally, it indicates the act that it is a SYN packet bysetting a single bit equal to one. That particular bit is never setto one unless SYN is being indicated.
In the second step in the three-way handshake, the server sendsa response called the SYN/ACK packet to the client. It containsits window advertisement, initial sequence number and the portnumbers o the two application processes. It also takes thesequence number it received, increases the value by one andreturns it as its acknowledgement value. This is how it tells theother TCP entity that it is now synchronized to its sequencenumbers. It will also change the value o a single bit, theacknowledgement bit to one. There ore, in its packet both theSYN bit and the ACK bit are one and that is why the packet iscalled the SYN/ACK packet.
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 30/93
www fukenetworks com29
Application Troubleshooting Resource Center: www fukenetworks com/appts
Application Flow
In the third and last step o the handshake, the client sendsa packet with the sequence number increased by one (That’scheating because it didn’t actually send data bytes. It’s the only
exception to the rule about sequence numbers.) This third packetallows the client to acknowledge that it received the server’sacceptance o the opening o the connection. The connection isnow prepared to allow data to fow between the two applicationsrepresented by the port numbers.
A common metric o per ormance o the network application is theround trip time (RTT). This is the time rom when the SYN is sentand the SYN/ACK is returned. This value is one o the things thesending TCP stack uses to determine how much data should beo ered to the network. I the RTT is low, the network appears tobe ast. I the RTT is high, the network or the server appears tobe slow and the TCP stack will adjust accordingly.
In Figure 10, an HTTP Get command is sent to the server in therst step. 65 ms. later that packet is acknowledged. This time
is the RTT. It is important to note that this value isn’t theapplication response time. This is the acknowledgement o theTCP protocol in the server. The application responds in the thirdstep shown in Figure 10.
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 31/93
Fluke Networks 30
Guide to Troubleshooting Application Problems
Sender/Receiver InteractionThe most common model or interaction in the TCP/IP environ-ment is the client server model. In this model, the client makes a
request or some orm o service and the server ul lls this request.For example, the client browser may request the objects on a homepage. An ERP client may request a eld rom within a data recordon the server. As illustrated be ore, a client process may invoke FTPto receive a large le. In each case, the client is making a requestthat it hopes the server will ul ll. The server applications that
ul ll these requests are sometimes called services. So, we may saythat a particular service is or isn’t available on the server. The waya client knows whether or not a service (process) is available isby the response to the SYN packet. I the server responds to theSYN with a SYN/ACK, it is in orming the client that the service isavailable. For example, that’s why requests or web pages are sent
to port 80. That port identi es a server that is acting as a webserver. While it might also act as an email server, i it doesn’trespond to SYNs sent to port 25, it is in orming the client itdoesn’t do email services using the SMTP (simple mail trans erprotocol) system. That service is not available.
So, a ter a connection is established, a client makes a requestor in ormation. Say the request is or a web page element. The
server may acknowledge receipt o the request with an ACK packetcontaining no data. Then, it may send the le containing therequested in ormation. In act that is what is being illustratedin Figure 10.
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 32/93
www fukenetworks com31
Application Troubleshooting Resource Center: www fukenetworks com/appts
Application Flow
Figure 10 A Request and the Response
TCP acknowledgements are governed by a airly complicated seto rules. However, there is a basic set o behaviors we need tounderstand. First, not every packet needs to be acknowledged.The TCP stack, written or the operating system involved, mayimplement a procedure in which every packet is acknowledged,only occasional packets are acknowledged, or some combinationo the two rules. What is rather typical o all TCP stacks is thatdata is sent in a TCP segment and then the receiver waits until one o two things occurs. Either it receives another segment, or200 ms. transpire. I the 200 ms. pass and no additional segmentsare received, it will acknowledge the one it received. I a secondor third segment is received, it may acknowledge them at anypoint that it is ready to do so. Sometimes when the packets areexamined, you will see a very homogeneous pattern: two segmentssent, acknowledgement received, two more sent, acknowledgementreceived and so orth.
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 33/93
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 34/93
www fukenetworks com33
Application Troubleshooting Resource Center: www fukenetworks com/appts
Application Flow
• It uses a technique called slow start . In this procedure one ortwo segments are sent and the acknowledgements are awaited.I they return quickly, the sender increases the block size (that is,
the number o segments it sends simultaneously). It will continueto increase the block size until one o two things happens: eitherthere is a time out because an acknowledgement isn’t received(such as when a segment has been discarded) or the total datasent is one-hal the receiver’s advertised window value.
Let’s consider some examples. Suppose the sender has a hundredor so segments o data to transmit. We show the interaction inFigure 11. However, rather than complicate things by using theactual acknowledgement values, we simply use consecutiveintegers that correspond to the number o blocks o data.
Step one starts a ter the three-way handshake has been completedand the client is ready to up load its in ormation. The client sendsa rame and awaits an acknowledgement. The acknowledgementreturns quickly as Ack1. In step 2, because the client’s TCP stackis using slow start and it has received one acknowledgement, itnow increases the send group to two segments and sends segment2 and segment 3. Again, the server responds quickly with Ack3.In step 3, it increases the outgoing group size to 4 because itreceived acknowledgements or two more segments. It sends all
our segments. Once again, the server quickly responds withAck7, assuring the client that it received all our segments.
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 35/93
Fluke Networks 34
Guide to Troubleshooting Application Problems
1
client
time
2
server
1
3
4
Ack 1
2
3
4
7
8
Ack 12
Ack 3
Ack 7
12
}
Figure 11 TCP Operation
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 36/93
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 37/93
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 38/93
www fukenetworks com37
Application Troubleshooting Resource Center: www fukenetworks com/appts
Application Flow
In step 1, the client sends segment 1 and the server quicklyresponds with Ack1. In step 2, using slow start, the client sendstwo blocks o data and the server quickly responds with Ack3.
In step three, again because o slow start, the client sends oursegments, segments 4, 5, 6 and 7. But segment 6 is discardedby the network. When the server receives segments 4, 5 and 7,it doesn’t acknowledge receipt o 7. Instead, it acknowledgessegment 5, the last one that was success ully received in order .In step our, the client has received Ack5 but isn’t sure i six was
lost or is still wandering around the network waiting to arrive. So,the client sends just segment 8. Since, segment 6 isn’t wanderingaround but has been lost, the server quickly acknowledges receipto data but it acknowledges segment 5 again with Ack5. In step5, the client still isn’t permitted to assume that segment 6 is lostso it again sends one segment, segment 9. The server sends Ack5
again. Finally, because the client has received three duplicateacknowledgements (steps 3, 4 and 5), it is permitted to assumethat segment 6 has been lost and it retransmits that segment.Now that the server has segments 4 though 9, it sends Ack9. Asillustrated in the nal step, the client continues as it begins andsends segment 10. It will continue to implement slow start with
sending succeeding segments.
This illustration showed some rather remarkable acts. First,because a single rame was dropped, the sending TCP entityreacted very strongly by cutting back to sending one rame at atime until the problem was resolved by the retransmission. Keep inmind that i the segments each contained 1000 bytes, or example,this reaction may have been caused by a single bit error!
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 39/93
Fluke Networks 38
Guide to Troubleshooting Application Problems
This clearly demonstrates that TCP does not behave well in errorprone environments such as those created by wireless or poorlycabled networks. When comparing the rst and second examples,
it was clear that a single bit error could reduce the throughput bythousands o bytes in the time interval illustrated.
Our third example involves a clean network but a severely over-whelmed server. It’s illustrated in Figure 13.
1
2
3
client
time
2
server
200 ms.
230 ms.
280 ms.
1
3
4
}}
}
Ack 1
Ack 3
Ack 5
4
5
6
7
Figure 13 Overwhelmed Server
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 40/93
www fukenetworks com39
Application Troubleshooting Resource Center: www fukenetworks com/appts
Application Flow
The client begins by sending a single segment in step 1. In orderto comply with delayed acknowledgement, the server waits 200ms. to see i more data will arrive. It does not. So, it sends Ack1.
In step 2, the client increases the outgoing bu er to two segmentsand sends segments two and three. The server’s reaction isimportant to understand. I the server were quickly processingthe incoming data, it would respond on or be ore the 200 ms.wait time. Instead, the server uses 230 ms. to acknowledge thatit received the two segments. Then it sends Ack3. In step three,
the client’s calculation o round trip time (RTT) tells TCP that thissituation involves either a slow network or slow server. One o the two is not operating optimally. So, rather than increase itsoutgoing number o segments as it did in the other examples, theclient sends two segments again, segments our and ve. In step3, the server is still operating sub optimally, so it takes it 280 ms.
to determine that it received two more segments. Then the serversends Ack5. In the nal step illustrated, step 4, the client still realizes the RTT is high, so it proceeds to send only two segmentsagain. A ter these segments are delivered, i the server wouldhappen to increase its per ormance (maybe it was busy withanother large computational task), the RTT would be a lower value.
The client’s calculation would indicate that it can increase thenumber o simultaneous segments it is sending.
This example points out an additional important point. A subtlepoint is illustrated by the act that the client never aggressivelyincreased the number o segments being simultaneously sent. Thisindicated slowness somewhere. The turn-around time at the server
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 41/93
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 42/93
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 43/93
Fluke Networks 42
Guide to Troubleshooting Application Problems
In some o the early implementations o TCP applications, theconnection was closed by issuing a segment with the RESET bit setto one. While the other TCP stack might see this as an indication
the session was over, it might also correctly interpret it as anindication that the timers and sequence numbers were con usedand needed to be corrected. Issuing resets occasionally causesservers to retain resources or the TCP session long a ter they areno longer being used by the applications. O course, nearly thesame thing might be expected i the FIN segment were to be lost.
But in this case, the other end would not acknowledge receipt o the FIN segment and it would eventually be retransmitted. So, the
our-way handshake is the proper way or a session to be closed.
Understanding How Applications Fail
Failures with DNSIn order to understand how DNS can ail we need to look moreclosely at how it works. When a client wants to make a requesto a server, it usually has the name o the server such aswww.psu.edu or www.fukenetworks.com. DNS is a distributeddatabase that stores the names and the corresponding IP
addresses. The database in DNS is a hierarchical tree whichdenotes two things. First, every node (DNS server) is above orbelow another DNS server. Second, there is a single path betweenany two servers. The nodes close to the root o the tree are calledroot servers. They keep track o who knows all the names in theirown branch which is below them. This can be seen in Figure 15.
For example, the .com server knows who is responsible or a namelike fukenetworks.com because the su x is .com. Likewise, thenode fukenetworks.com knows who can resolve names containingsupport.fukenetworks.com.
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 44/93
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 45/93
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 46/93
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 47/93
Fluke Networks 46
Guide to Troubleshooting Application Problems
server 1 server 2
firewall
Where isServer 1?
Here Iam!
Figure 17 Proxy ARP
In this case when a station wants to communicate with Server 1,it will send an ARP broadcast. The rewall will respond with aunicast response. This creates the impression in the station thatthe rewall is, in act, Server 1. Network administrators like thistechnique because it completely hides the act that the two
servers are on a separate, protected network.
But proxy ARP is prone to being miscon gured. Especially in thecase o servers that need multiple IP addresses on a physical NICand in cases where it is used in mobile IP applications.
On some occasions con guration mistakes can lead to incorrect
ARP responses. Consider the example in Figure 18. Suppose theserver has been accidentally con gured with the incorrect address
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 48/93
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 49/93
Fluke Networks 48
Guide to Troubleshooting Application Problems
and most servers that act as routers use the RIP (routing in orma-tion protocol). That protocol is over thirty years old. Routers andservers that use the protocol choose the destination path based on
the number o hops (links between routers). However, no provisionis made to assess the speed or throughput o a link. Consequently,links might be in a path that is very slow when an alternative pathis not being used. The best way to spot a slow link is probably touse a tool like trace route. In Figure 8 (page 25) you can see thatlink 8 had a higher than normal RTT with 88 ms. While that may be
an aberration, repeated use o trace route might con rm that it isalways the slow link.
One actor that can be important is the network router discardingpackets. Among routers, there is no provision or retransmission.So, the client or the server must determine that the routers havediscarded packets. When they do, they will generally retransmit thepackets. But you will recall that we pointed out that this retrans-mission can happen signi cantly later and cause TCP to slow downits rate o delivery by a large actor. So, why would routers discardpackets? There are two main reasons. First, a packet arrives at arouter and it is corrupted. That is, it has been damaged and therouter detects that some in ormation it contains is incorrect.It won’t try to determine the correct in ormation. That woulduse valuable processor cycles. It discards the packet. In certaininstances, it will send a packet called an ICMP (Internet Control Message Protocol) packet back to the source indicating thediscard. But the packet is lost. The second thing that can happenis that the router simply becomes too congested (that is, busy).Nearly all routers will reach a threshold where they begin to
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 50/93
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 51/93
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 52/93
www fukenetworks com51
Application Troubleshooting Resource Center: www fukenetworks com/appts
Application Flow
or the cause and miss the actual source o the degradation inper ormance. Other causes o packet loss can be an over subscribedquality o service technique, bad NIC cards in routers, or
intermediate wireless links (such as wireless bridges).
Problems in Establishing a Connection
Applications communicate between ports. In act, in TCP andUDP discussions, we say the TCP session is de ned or uniquelycharacterized by these our parameters: the sending IP address,the sending TCP port number, the receiving IP address, and thereceiving TCP port. Programmers re er to the set o our parametersalong with the protocol as a socket . These our values
(IP address A, port x) <--> (IP address B, port y)
uniquely identi y the logical communications channel between theclient and the server.
client server
IP A IP B
APPLIC APPLIC APPLIC APPLIC APPLIC APPLIC APPLIC APPLIC APPLIC APPLIC
IP Destination Address BIP Source Address ASource Port X
Destination Port YFigure 20 Defning a Session
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 53/93
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 54/93
www fukenetworks com53
Application Troubleshooting Resource Center: www fukenetworks com/appts
Application Flow
Most servers have several open ports which represent servicesit will provide. For example, a server with ports 25 and 80 openwould be available to provide email and web page serving. I the
server actively announces these services, we say it advertises theservices. So, o ten a network support tool will scan the network tosee which devices have open ports and then send a SYN to veri ythe status o the port. Among the bad guys are hackers that areattempting to do reconnaissance on your network. They also wantto know which devices are servers and which ports are open.
But it isn’t likely that they have honorable purposes.
Slow Responses rom Servers
In the client-server relationship, when a client sends a request toa server, it is usually represented by a packet that has a unique
ormat which is determined by the application program inter ace.For example, with HTTP, when a client requests a home page orpart o a page rom a web server, an HTTP GET packet is sent to theserver. It contains a signi cant number o parameters such as theversion o HTTP being used, the date and time, the le requestedand other values. The GET is sent only a ter the client and sessionserver has been established by the three-way handshake.
How quickly the service request is ul lled may depend on several actors. The server may be slow to respond because it needs to
compute something or search or the le. In the case o HTTP,it may even send a response that indicates acknowledgemento the request without supplying the le to ul ll the request.It’s like saying, “I’m working on it.” The request may get lost or be
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 55/93
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 56/93
www fukenetworks com55
Application Troubleshooting Resource Center: www fukenetworks com/appts
Troubleshooting Applications
This may orce the operating system to keep memory resourcesand times active when they are no longer needed. Keeping theresources viable will take additional computational cycles rom
the server and decrease its overall per ormance.
Troubleshooting ApplicationsO ten troubleshooting is challenging and a “best” method isn’tobvious. Some technicians use the ew tools they are amiliar withto try to solve every problem. The adage, “When the only tool you
have is a hammer, everything begins to look like a nail, “ comesto mind. In network troubleshooting, technicians who know thephysical level sometimes attempt to solve a problem with cabletesters and meters. Technicians who have an RF communicationsbackground gravitate towards spectrum analyzers.
On the other hand, team members working on a problem mayeach have a di erent view o the same problem. The systemsanalyst will think o CPU per ormance, insu cient memory, and
ragmented disk space as the cause o the problem. The networkarchitect might see it as routing issues. Or, the manager o network in rastructure might decide to test the WAN links or
throughput or replace copper links with ber connections.
Good troubleshooting requires having a broad base o experience,a solid knowledge o the technology involved, and the availabilityo the proper tools.
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 57/93
Fluke Networks 56
Guide to Troubleshooting Application Problems
BaseliningOne o the techniques that is used the least is baselining thenetwork. Yet, it is one o the most use ul techniques. Baselining isrecording data based on the “normal” per ormance o the network.By knowing how the network per orms when users are satis ed,you have a point o comparison when problems are reported. Havethe level o broadcasts gone up? Are there new protocols on the
network? Is utilization higher? Is the ERP application responsetime slower? Questions like these are impossible to answer withoutthe corresponding data gathered under normal operating
OptiView® Analyzers: Portableand Rack Models.
Fluke Networks’ OptiView Series IIINetwork Analyzers are available in twoorm actors to give you a clear view o
your entire enterprise – providing visibilityinto every piece o hardware, everyapplication, and every connectionon your network.
Choose the Integrated Network Analyzeror portable, all-in-one analysis or the
Workgroup Analyzer or permanent orsemi-permanent deployment that acts
as a “virtual network engineer 24/7” in the core or at remotesites. Or, use them together to create a power ul combination– a troubleshooting tool or the access layer and an analyzerwatching the core, remote site or critical network point. Networkpro essionals can conduct all the necessary tests at the remote sitewithout ever leaving the headquarters site.
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 58/93
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 59/93
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 60/93
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 61/93
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 62/93
www fukenetworks com61
Application Troubleshooting Resource Center: www fukenetworks com/appts
Troubleshooting Applications
An easier approach is to use the same screen that we used totest DHCP. In gure 23 we can see that the OptiView has beencon gured to resolve the name DBase02.appsvr04. net.com. The
test was success ul and the IP address was returned in 140 ms.By clicking on the edit button on the right any speci c name canbe queried. By clicking on the Add DNS Server button, any speci cserver can be tested.
Figure 23 DNS Test using OptiView
A test which we use less requently is the reverse lookup.By submitting an IP address, the DNS server should return thecorresponding server that uses that name. While the command linetool nslookup can be used, you can add a reverse name lookup tothe OptiView by just entering the IP address in the screen thatpops up when you select a DNS Server and click Add . This is shownin Figure 24.
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 63/93
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 64/93
www fukenetworks com63
Application Troubleshooting Resource Center: www fukenetworks com/appts
Troubleshooting Applications
and ping allowing up to N hops (ping – i N). You can see the rsto these illustrated in Figure 25. The continuous ping is usuallystopped with CNTL-C. While there are slight variations o the ping
command across di erent operating systems, most support theusage described here.
Figure 25 Ping with Designated Payload
So, suppose we believe that we have a route that is droppingpackets that are over 1000 bytes because the MTU isn’t thestandard value 1460. We can send a ping that speci es a payloado 900 bytes ollowed by a ping that speci es a payload o 1100bytes. This will give us an indication. Keep in mind that thisdepends on the act that nothing in the path is blocking ICMPpackets or security reasons.
Ping connectivity tests are easy to con gure and run in theOptiView. In Figure 26 you can see the screen that appears whenyou select a device on the Discovery screen and click on Device
Detail. From here you can run ping tests and other tests that wewill discuss later.
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 65/93
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 66/93
www fukenetworks com65
Application Troubleshooting Resource Center: www fukenetworks com/appts
Troubleshooting Applications
Figure 28 The Ping Report
Validating Connectivity to the Application Server
There are three tasks to be accomplished here. First, we should seei we can connect to the application o interest. Second, we shouldmeasure the response time o the application. Third, we shouldcheck the vital statistics o the server.
Application Connectivity . As we discussed earlier, applicationscommunicate through ports. In particular, server applications usewell-known ports. Once the server has been selected, we can usethe same screen that we used to generate a ping test. The dropdown menu allows us to choose the TCP connectivity test. This isshown in Figure 29.
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 67/93
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 68/93
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 69/93
Fluke Networks 68
Guide to Troubleshooting Application Problems
were, we get a list o the transactions, a Bounce Chart and atable o Transport Statistics about those transactions. From thetransport statistics we can see what the transaction was about
and how quickly the server responded. This is shown in Figure 34.
Figure 32 Server Overview Screen
Figure 33 HTTP Servers
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 70/93
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 71/93
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 72/93
www fukenetworks com71
Application Troubleshooting Resource Center: www fukenetworks com/appts
Troubleshooting Applications
From the command line o the operating system trace route (layer3 only) is a utility that is actually an extension o the ping utility.In Windows,® i you type the command tracert x.x.x.x, a series o
ping packets are sent to the target. In the rst packet, the hopcount (TTL) is set to zero. There ore, the rst routing device thatreceives the packet is required to discard it and report its action tothe source. Then the utility increases the hop count in the packetto 1 and sends the ping. This packet passes thought the rstrouter. But that router is required to decrease the hop count by
one. So, when it arrives at the second router, the hop counthas been reduced to zero and the second router drops thepacket. However, like the rst router, it reports to the sourcethat it dropped the packet. You can see what is happening.Each successive ping packet goes one more hop. Also, each time apacket is discarded, the source gets a report telling it who dropped
the packet. By using a timer on each ping, the source can build areport like the one in Figure 36.
Figure 36 Trace Route
This creates a listing o the path that was ollowed rom the sourceto the target x.x.x.x. The trace route tool also gives us important
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 73/93
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 74/93
www fukenetworks com73
Application Troubleshooting Resource Center: www fukenetworks com/appts
Troubleshooting Applications
in ormation about how long it takes the ping to travel acrosseach link.
Most technicians either don’t know or don’t remember the switchesused with trace route. So, OptiView makes trace route an easy-to-use tool. In addition, it determines the layer two and layer threeroutes on a single screen and allows you to move on to router andswitch queries and more. Figure 37 shows a trace route analysis
rom OptiView.
Figure 37 OptiView Trace Route
When you nd a link that is slow or appears to have a problemon one o its attached links, you can query the device to lookat the link rom the perspective o the switch or router. Later,
we will consider a case study in which solving the problem usessuch queries.
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 75/93
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 76/93
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 77/93
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 78/93
www fukenetworks com77
Application Troubleshooting Resource Center: www fukenetworks com/appts
Troubleshooting Applications
Troubleshooting Intermittent Problems
When intermittent problems occur, most IT pro essionals turn to packetcapture to diagnose the problem. The OptiView analyzers make it easy totroubleshoot intermittent problems with advanced triggering and lter-ing to capture the tra c be ore, a ter or around the event occurrenceand ensures the event is captured the rst time to avoid doing randomtra c captures that may not contain anything o interest. To capture aspeci c event the analyzer must inspect the contents o each packet tosee i it matches a pattern or error message string which is indicativeo the event occurring. The string matching is per ormed in hardware
in real-time with line-rate gigabit capture to ensure all the relevantpackets are captured – a ter all, you can’t analyze packets that werenever captured. A total o eight sets o triggers or lters can be de nedto trigger a capture unattended or later analysis, allowing analysiswhen you have time, not when the event occurred.
Free String Match
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 79/93
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 80/93
www fukenetworks com79
Application Troubleshooting Resource Center: www fukenetworks com/appts
Troubleshooting Applications
Figure 41 The Capture Tab
This shows the details o the packets on the le t side o the screenand a bounce chart on the right side o the screen. The bouncechart shows the interaction and the time and size o packets inthe interaction. The packet detail on the le t is broken into threesections. The top o the screen shows a listing o the rames asthey were captured. The middle section shows the detailed decodeo the rame highlighted and the bottom section shows both theASCII value and the hex value o each byte in order.
From a portion o the bounce chart shown in Figure 42, you cansee the TCP connection being made between port 1080 on theclient and port 80 (HTTP) on the server. It takes 73.193 ms.
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 81/93
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 82/93
www fukenetworks com81
Application Troubleshooting Resource Center: www fukenetworks com/appts
Troubleshooting Applications
Figure 43 Server In ormation
The screen that appears will look like Figure 44. It contains a
wealth o in ormation about the exchange between the clientand the server you picked. Here you can look or evidence o poorserver per ormance, small payload sizes and other issues.
Reporting and documenting
Fluke Networks’ OptiView® Reporter works with the OptiView IntegratedNetwork Analyzers and Workgroup Analyzers to provide documenting andtrending capabilities rom one central location. The OptiView hardwareagents automatically per orm detailed device, VLAN and networkdiscovery, problem identi cation, and in rastructure device analysis.This in ormation, in turn, is automatically imported into OptiViewReporter or reporting, trending, and event noti cation.
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 83/93
Fluke Networks 82
Guide to Troubleshooting Application Problems
Figure 44 Server Details
Starting rom the Overview screen, i you click on the Transactionstab, you can see all the transactions or a particular server. This isshown in Figure 45.
Figure 45 The Transactions Tab
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 84/93
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 85/93
Fluke Networks 84
Guide to Troubleshooting Application Problems
From the Overview screen, you can click on the Issues Tab.That will display all o the issues that OptiView ound in theconnection. In Figure 48 you see a list o retransmissions thatwere discovered. As we discussed in the section on TCP operation,retransmissions will cause the TCP timing algorithm to assume thelink is not robust. As a result it will slow down the rate at whichit o ers packets to the connection.
Figure 47 The Bounce Chart or Slow Server
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 86/93
www fukenetworks com85
Application Troubleshooting Resource Center: www fukenetworks com/appts
Troubleshooting Applications
Figure 48 The Issues Tab
Fix the Problem
The third step in our ve step process is to x the problem.Because there can be a great number o causes, we can onlyassume you’ve ound it by this time. However, you should knowwhat single or what ew items combined to cause the inadequateper ormance o the application. Sometimes this step will be easy.
However, it can be both time consuming and expensive. Forexample, it could be as easy as changing the con guration o arouter. On the other hand, you might need a larger server or a
aster WAN link. However, it could also be the application itsel – as we pointed out earlier, TCP routinely operates with 50% ormore overhead, so imagine what is happening when an application
uses minimum length rames o 64 bytes. The MAC rame head-ers, IP and TCP headers and rame check sequences can occupy
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 87/93
Fluke Networks 86
Guide to Troubleshooting Application Problems
Application Troubleshooting Resource Center: www fukenetworks com/appts
46 bytes o the rame, leaving only 18 bytes available to trans erdata. Badly written applications could require thousands o ramesexchanged to complete a single simple transaction.
Validate the Fix
I the problem a ected one particular user, spend some time to besure the user sees a di erence a ter the problem has been xed.I the entire network was down, thoroughly check that all partsare now up and unctioning well. I connectivity to a serverapplication was the issue, don’t just check connectivity to theserver. Check that the application can be reached by making surea complete transaction can occur. I a particular link was slow, runa series o ping tests over time and check back to make sure thatall o the response times are good. Taking a little extra care a ter a
x is made can lead to work time saved because the problem won’toccur again.
Document the Fix
It’s a very good idea to keep written records o what you x and tohave the records well organized. This is important to other individ-uals who may need to x a similar problem i you aren’t availableto consult. Also, records are necessary or documenting the needto acquire larger servers or test equipment. Documentation shouldinclude screen shots, notes stored in text documents, spreadsheets with incidents and time on task and a host o other kindso documents. What is important is to create a record or your
uture re erral and or others who may have need o such history.
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 88/93
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 89/93
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 90/93
www fukenetworks com89
Application Troubleshooting Resource Center: www fukenetworks com/appts
Case Study 2: Investigating WAN LinkPer ormance
In this case study, we’ll see how the OptiView can be used toanalyze a WAN link that appears to be a bottleneck in the path tothe server. In this scenario, access and per ormance across a WANlink has been slow. The service provider is saying that the companyneeds an additional T-1 circuit.
In Figure 51 below, we begin with the Device Detail Inter ace
screen that is similar to the one in Case Study 1. This time we arequerying a router.
Figure 51 Inter aces on the Router
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 91/93
Fluke Networks 90
Guide to Troubleshooting Application Problems
By highlighting the inter ace with the suspected WAN link, we cansee some o the inter ace statistics and see that it is, in act at T-1circuit running at 1.544 Mbps and that the MTU is 1500 bytes.
This means that IP packets with the standard maximum size o 1500 bytes will not be ragmented in order to traverse the link.By clicking on the bar graph button next to View . the screen inFigure 52 is produced.
Figure 52 Utilization o the WAN Link
This screen makes it obvious that the utilization o the WAN linkis only slightly under 50%. This screen is periodically updated,so you could study it over a period o time to be sure about yourconclusion. As we mentioned be ore, it would be wise to moveon to an analysis o the server or the application running on
the server.
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 92/93
www fukenetworks com91
Guide to Troubleshooting Application Problems
Application Troubleshooting Resource Center: www fukenetworks com/appts
SummaryIn this document, we have provided in ormation on how networkapplications work. We have discussed the typical sequence or a
client to use when it makes a connection to a server and asks ora service to be provided. As part o that analysis we considered theDNS lookup, the role o ARP, how the TCP connection is opened,how the request is sent and the response is received, and howthe connection is closed. We considered the major categories o applications such as web applications, database applications and
transaction processing systems. During that explanation we notedthat 90% o the applications are based on the TCP protocol.
In analyzing how the client interacts with the server, we consid-ered the route to the server and how it can hide problems thata ect per ormance. We also did a brie tutorial on the operation
o TCP since its operation has such a signi cant infuence on howwell the application appears to per orm. We discovered that whenpackets are lost in the network, the impact on TCP per ormance issigni cant. This led us to the conclusion that the quality o thenetwork is very important.
In considering TCP operation, we described the three-wayhandshake that is used to create the connection, the methodo assuring reliability o the trans er o data and the our-wayhandshake that closes the connection.
8/8/2019 Trouble Shoot in Guide
http://slidepdf.com/reader/full/trouble-shoot-in-guide 93/93
Guide to Troubleshooting Application Problems
We moved on to a discussion o a Five Step Process to troubleshootapplications. We stress that a logical approach would include:(1) determining the domain o the problem; (2) per orming an
application fow analysis; (3) xing the problem; (4) validatingthat the problem is xed; and (5) documenting that was xed andhow it was xed.
Using the techniques suggested in this paper should reduce ngerpointing between your application developers and your networksta . And, it will increase your success in solving networkproblems, reducing costs to your company.
For more in ormation on Network Troubleshooting and ApplicationDiagnostics visit the Application Troubleshooting Resource Centerat
www.fukenetworks.com/apptsto request a ree 5-day network
analyzer trial, sign-up or upcoming events and downloadtechnical content