SUMMARY: Looking for advice on trouble-shooting network problems

From: Marlys.A.Nelson@uwrf.edu
Date: Wed Aug 21 1991 - 09:18:14 CDT


About a week ago I asked for sources of information explaining how to solve
network problems. In my specific case, I had just had a problem with a bad
cable and solved it via the trial-and-error method. This made me realize I
needed better methods or more knowledge for the next time.

I got lots of good ideas from various people. Some hardware solutions for
finding cable problems and other network things were mentioned, though some
said this can become expensive so you may need a larger network to justify
buying them. In my case, while the Sun network is small, there are several
independent PC ethernets on campus so counting everything we will probably
have enough gear to justify the cost. Also, I neglected to mention in my
original posting that there are also DEC machines on the same ethernet as the
Suns. These seem less tolerant of network cable problems so there is now more
justification for buying tools to support our network.

Several software solutions for looking at the network were suggested:
etherfind, nfswatch, netstat, nnstat and SNMP. These seem to be standard with
Sun or are available via anonymous FTP. I have since also done a little
exploration and found some similar software that runs under VMS on DEC
machines. I have been told that some software exists for PCs as well.
In my original posting, I mentioned spray; from those who commented on this,
most seem to say forget it: spray often loses packets and its results vary
with CPU speed.

Several books, RFCs or other information available on the net were mentioned.
The specifics are in the summaries below.

Thanks to everybody who responded. I have several ideas on things to follow
up on.

Marlys A. Nelson System Manager, Network Manager,
Academic Computing Systems Programmer, etc., etc.
Univ. of WI - River Falls
River Falls WI 54022
Internet: Marlys.A.Nelson@uwrf.edu

------------------------Summary of responses follows------------------------
From: Stefan Mochnacki <stefan@centaur.astro.utoronto.ca>

Re "spray", try the "-d nnn" parameter, where nnn is the delay in
microseconds (default=0). A small number like 20 should reduce lost
packets substantially, especially by PC's.

I'm afraid with hardware you need hardware solutions. We recently had trouble
on our "little" network (3 Suns, 10 PC's approx., 1 repeater), and not
even the University's network gurus had the proper equipment. So we are
reduced to making guesses, and if really desperate we sequentially
disconnect segments until the problem disappears, thereby indicating the
offending segment. Bad connectors (or rather, badly attached connectors)
are a common problem; also BEWARE of GROUND LOOPS; the jackets of connectors
should not be touching any conductors such as PC cases or other connectors
on the back of a computer. If you reduce or eliminate ground loops you can
save yourself a lot of hassle, although like everything with thin ether
there's a lot of tolerance for slop (until it all builds up and gets ya).

--------------------
From: Steve Simmons <scs@wotan.iti.org>

For the type of trouble you describe, sweat work and trial-and-error
are the only final solutions. Physical cabling problems are the number
1 tough problem of ethernets, and no sniffer will help.

For the problems you probably had, the only tool that really helps is a
TDR (stands for Time-Domain Reflectometer, I think). This beast puts a
signal on your ethernet and "listens" for reflections. Whenever there
is a physical change in the net, there is a reflection. This includes
transceivers, barrel connectors, splices, breaks, etc. With experience
one can read the echoes and do a fair amount of trouble-shooting. I've
never used a TDR myself, so speak only from rumor and reading.

There is one half-way decent book on physical ethernet issues. It's
called "Keeping The Link" and is written by Nemzow (I think) [sorry, my
copy is at home]. Published in 1988. It's not great (and in places
it's not good) and is already severely dated (ignore *everything* the
man says about broadband and fibre). But it's better than nothing:
copiously illustrated, and full of tables of all those little ethernet
issues like cable lengths, impedance, etc.

--------------------
From: stern@sunne.east.sun.com (Hal Stern - Consultant)

i cover most of your questions in "Managing NFS & NIS", published
by O'Reilly & Associates. you can order from them directly
or probably do better at your local technical bookstore (or univ.
store).

some comments:
(a) the sniffer is probably my favorite network analyzer. it is
        useful if you have a low-level problem -- like part of the
        network that generates lots of bad packets -- or a problem
        with mixing protocols, and you're trying to figure out where
        all of those XNS or DECnet packets are coming from. with
        all suns, you may not need a sniffer -- the sunos tools should
        tell you enough to find out where the bottlenecks are.
(b) you can also use etherfind and something like "nfswatch" to
        determine where requests are coming from on your network
(c) spray is not a very reliable test -- it tests the network interface
        in conjunction with the network. there's no flow control in it,
        so if the remote CPU is a little loaded, you'll just drop
        spray packets. looks like the "network" is slow, but it could
        be that the remote host is busy.

--------------------
From: bob@kahala.soest.hawaii.edu (Bob Cunningham)

You'll probably get more substantial responses, but briefly, you need
to do two things: have good physical network gear (and know how to take
care of and troubleshoot it); and--especially as your net grows--have
(and learn to effectively use) some software monitoring tools.

If you can, plan to use 10baseT (aka "unshielded twisted pair") ethernet
cabling in the future rather than 10base5 (thickwire) or 10base2
(thinwire). You're probably using one of the latter now. There
are a number of advantages, the primary one being that 10baseT hubs
are "smarter" and allow you to do most of the troubleshooting you're
doing now via software tools rather than having to manually measure
continuity and such. And the overall setup is such that an open connection
to one system will only disrupt that system, not everything else
on your net.

If 10baseT isn't feasible for you (it might not be, for a variety of
reasons), and you're stuck with 10base2 (thinwire, the RG58 cable
stuff), then see if you can get a thinwire multiport repeater (from,
say Cabletron or Allied Telesis) from which you can run a number
(typically 8) of thinwire segments with a limited number of systems on
each segment. The multiport repeaters will isolate strands that have
opens or shorts, and tell you which one by lighting up an LED. This
not only isolates failures, it tells you where to start looking.

If you need to check a thinwire (RG58) segment for physical problems,
pull the BNC T connector from a system in the middle of the segment and
use a multimeter to measure the resistance "seen" each way. You should
see 50 to 80 ohms each way. If you don't, then you have a problem in
that direction. Repeat the procedure in that direction (go in that
direction to a system halfway between the one you're at now and the
terminator). This "binary search" procedure should quickly zero in on
the cable, connector, or whatever where the problem is. Yes, you can
troubleshoot a "live" cable this way without zapping anything (though a
live cable's impedances will vary more than a cable without anything
active on it).
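
A minimal sketch of the "binary search" above, in Python, may make the
procedure easier to follow. The function names, the station numbering
(0 at one terminator, n-1 at the other), the simulated segment and the
single-fault assumption are all illustrative and not part of the original
advice; in practice measure() would be a person with a multimeter.

```python
from typing import Callable, Optional, Tuple

GOOD_OHMS = (50.0, 80.0)  # acceptable range quoted in the text above


def reading_ok(ohms: float) -> bool:
    """True if a multimeter reading falls inside the acceptable range."""
    low, high = GOOD_OHMS
    return low <= ohms <= high


def locate_fault(
    n_stations: int,
    measure: Callable[[int, str], float],
) -> Optional[Tuple[int, int]]:
    """Narrow a single cable fault down to two adjacent stations.

    measure(station, direction) is a hypothetical callback returning the
    resistance seen from that station looking "left" or "right".
    Assumes the segment is already known to be faulty; returns None if a
    midpoint reads fine in both directions.
    """
    lo, hi = 0, n_stations - 1  # the fault lies somewhere between lo and hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        left_ok = reading_ok(measure(mid, "left"))
        right_ok = reading_ok(measure(mid, "right"))
        if left_ok and right_ok:
            return None  # nothing wrong as seen from here
        if not left_ok:
            hi = mid  # problem is on the left side of mid
        else:
            lo = mid  # problem is on the right side of mid
    return lo, hi


def simulated_segment(fault_span: int) -> Callable[[int, str], float]:
    """Fake segment with an open between stations fault_span and fault_span+1."""
    def measure(station: int, direction: str) -> float:
        if direction == "left":
            broken = fault_span < station
        else:
            broken = fault_span >= station
        return float("inf") if broken else 52.0  # 52 ohms: made-up good reading
    return measure


if __name__ == "__main__":
    print(locate_fault(9, simulated_segment(fault_span=5)))
```

With nine stations and the simulated open between stations 5 and 6, the
search takes three measurement stops instead of walking the whole cable.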

Other hints re working with thinwire: 1) don't ever, ever, use RG59
(75 ohm) cable, always use RG58 (50 ohm); 2) if you make up your own
cables, be very careful about the BNC connectors (be especially
careful about exactly how far the inner gold pin extends).

Bad connections are your worst enemy, regardless of how extensive
your network is, but they become more and more of a headache
as it gets larger. Traffic problems are typically only a problem
as your network gets much larger (ballpark figure: 30-40 Suns).

If you have excessive traffic, look (using some sort of monitor,
see below) at where most of the traffic is, and partition your
net using bridges (again, I'd recommend Cabletron or Allied Telesis).

The ultimate in traffic monitoring (and indeed, a great troubleshooting
tool in general) is something like the Sniffer Network Analyzer. But
those gadgets are quite expensive and until your network gets quite
large (ballpark: over 100 Suns) you probably don't really need one.

For a good overall picture of network traffic, and as a general tool
to discover broadcast problems and also to tie down which specific
systems are really the sources and sinks of your traffic, you can
run the nnstat monitoring program (anonymous ftp, venera.isi.edu).
Indeed, it's useful to run something like this periodically just
to get a feel for what the traffic is on your net. What is or
is not acceptably high traffic depends a lot on your situation,
though. If nothing else, it's likely to suggest where you'll
find it useful to place a bridge or two to filter traffic
between different parts of your net.

Also useful is to simply check the "netstat -i" statistics for
each of your systems. A high percentage of errors or collisions (where
high might be something like (Collis)/(Ipkts + Opkts) > 5%) is a sign
that something needs attention.
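
As a rough illustration of that rule of thumb, here is a small Python
sketch that applies the (Collis)/(Ipkts + Opkts) > 5% check to
"netstat -i" style output. The sample text and its numbers are made up
for the example, and the parser simply assumes a header row containing
Name, Ipkts, Opkts and Collis columns; real output may differ between
systems, so treat this as a starting point only.

```python
from typing import Dict, List

THRESHOLD = 0.05  # the 5% ballpark figure from the text above

# Made-up sample in the general shape of "netstat -i" output.
SAMPLE = """\
Name Mtu  Net/Dest Address   Ipkts  Ierrs Opkts Oerrs Collis Queue
le0  1500 campus   sun1      120000 3     80000 1     14000  0
lo0  1536 loopback localhost 5000   0     5000  0     0      0
"""


def collision_ratio(ipkts: int, opkts: int, collis: int) -> float:
    """Collis / (Ipkts + Opkts), or 0.0 for an idle interface."""
    total = ipkts + opkts
    return collis / total if total else 0.0


def parse_netstat_i(text: str) -> Dict[str, Dict[str, int]]:
    """Map interface name to its Ipkts, Opkts and Collis counters."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    stats: Dict[str, Dict[str, int]] = {}
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip rows that do not line up with the header
        row = dict(zip(header, fields))
        stats[row["Name"]] = {
            key: int(row[key]) for key in ("Ipkts", "Opkts", "Collis")
        }
    return stats


def busy_interfaces(text: str, threshold: float = THRESHOLD) -> List[str]:
    """Names of interfaces whose collision ratio exceeds the threshold."""
    return [
        name
        for name, s in parse_netstat_i(text).items()
        if collision_ratio(s["Ipkts"], s["Opkts"], s["Collis"]) > threshold
    ]


if __name__ == "__main__":
    print(busy_interfaces(SAMPLE))
```

In the made-up sample, le0 comes out at 14000/200000 = 7% and is flagged,
while the loopback interface is not.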

Others will hopefully give you better "rules of thumb" to determine
how much traffic is too much.

For monitoring gateways and/or collecting interface statistics from
bunches of machines, some sort of SNMP software monitoring tool
can be handy. There are several decent public domain packages
available, and some really nice ones (that cost $$$) and give
graphical displays, like Sun's SunNet Manager. With more than,
say 30 Suns, you'll start to find SNMP more and more useful.
If you do ever run 10baseT, then monitoring the hubs via SNMP
will tell you virtually everything you'll need to know about
traffic and problems on your net.

--------------------
From: alexl@daemon.cna.tek.com (alex;923-4483)

We use a TDR to do cable checking. It can help to identify problems
such as what you had. Each transceiver has its signature on the
graphical output. We make the Tektronix TDRs here.

--------------------
From: tim@prism.nersc.gov (Tim Voss)

I recommend a book titled, "Managing NFS and NIS" by Hal Stern,
O'Reilly & Associates, Inc. Publisher, c.1991.
You can call the publisher for this and other works at: 1-800-338-6887

--------------------
From: louis@tots.logicon.com

No, this is not the response you're looking for, but rather a hint or
three that I've found useful.

First, spend some time (5-50 min/day) "messing around" with stuff when
everything appears to be working normally. This will help you train
your intuition for those occasions when something breaks or bends.
This is worth a great deal.

The spray(8) command often drops a lot of packets, especially when a
faster host (like an IPC) sprays a slower host (like a 3/50). See
previous paragraph.

The etherfind(8) command is a rather complex thing, but worth learning
about. It will probably use a fair amount of the messing around time
you spend. There's also ping(8), which will tell you if the hardware
on both machines works and that you have the IP addressing correct.
Ping and spray could be used with etherfind to get you started.

--------------------
From: keves@meaddata.com (Brian Keves - Consultant)

Depending on how much money you have to spend you have a few different routes
you can take.

If you have lots of money I would suggest you buy a network analyser like
"The Sniffer". This has many features and variants that will allow you to
track down many problems.

If you don't have lots of money there are a couple of programs on the SUN
that can help you. These are "traffic" and "etherfind", both of which
have manual pages. They won't TELL you what is wrong but using them you
might be able to discover some problems. Also "netstat" will give you
info on routing and transfer problems. Unlike a more costly solution you
will need to develop a working knowledge base when using these programs.
Eventually you will be able to look at some output and figure out what
is wrong, but this is definitely not a user friendly way of doing
things.

O'Reilly has some books devoted specifically to networks. You might try
them.

--------------------
From: cadreor!neil@nosun.west.sun.com (Neil Van Dyke)

A few quick ideas: Try "Unix Networking" by Kochan & Wood, Hayden
Books; "Local Area Networking" by Naugle, McGraw-Hill; the Sniffer
network protocol analyzer; and an Ethernet cable checker.

--------------------
From: DENIS@EVAX.P-E-T.Mankato.MSUS.EDU

I have found in my setting that most of the problems were caused by
problems in the physical media - i.e. bad connectors on the thinnet cables,
transceiver cables not totally plugged in, etc. The tool that finally
got our cabling straightened out was to buy a Time Domain Reflectometer
from Tektronix. This shoots a signal down the line and graphically
displays what happens to the signal as it goes down the segment.
Transceiver taps have a certain signature and if the signal looks funny
around the transceiver you can change the connector and try again.
Shorts or opens also have signatures, plus if you know some physical
characteristics about your cable the TDR can give you the location.
Cost about $5k.
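
The location calculation itself is simple: the pulse travels to the fault
and back, so distance = (velocity factor x speed of light x round-trip
time) / 2. A small Python sketch follows; the 0.66 velocity factor is an
assumed, typical figure for solid-polyethylene coax and is not from the
original post, so check the data sheet for your actual cable.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0

# Assumption: typical velocity factor for solid-polyethylene coax.
# Use the value from your cable's data sheet instead.
ASSUMED_VELOCITY_FACTOR = 0.66


def fault_distance_m(
    round_trip_s: float,
    velocity_factor: float = ASSUMED_VELOCITY_FACTOR,
) -> float:
    """Distance in metres to a reflection, from its round-trip time."""
    if round_trip_s < 0:
        raise ValueError("round-trip time cannot be negative")
    if not 0 < velocity_factor <= 1:
        raise ValueError("velocity factor must be in (0, 1]")
    # Halve the path because the pulse goes out and comes back.
    return velocity_factor * SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0


if __name__ == "__main__":
    # A reflection seen 500 nanoseconds after the pulse was sent.
    print(round(fault_distance_m(500e-9), 1), "metres")
```

Under that assumed velocity factor, an echo arriving 500 ns after the
pulse puts the fault a little under 50 metres down the cable.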

If this sounds like it is interesting - do not substitute the graphical
display model with the cheaper digital readout models. The graphical
readouts allow you to see if anything peculiar is occurring on the
segment where the digital readout will just deal with threshold values.
The printout feature is also nice because you can get a "picture" of
your individual cables or segments and compare with later testing of
the cable or segment. Also the Tek model has features that allow you
to test the cabling while the net is in use.

As far as software - I did not find the SUN stuff that useful but
heard of software called nfswatch (from an internet ftp site) that
might be interesting. Other stuff seemed too expensive.

--------------------
From: ktk@nas.nasa.gov (Katy T. Kislitzin)

check out the following RFC, available from nic.ddn.mil:

1147 Stine, R.H.,ed. FYI on a network management tool catalog: Tools for
      monitoring and debugging TCP/IP internets and interconnected devices.
      1990 April; 126 p. (Format: TXT=336906, PS=555225 bytes) (Also FYI 2)

--------------------
From: Andrew Luebker <aahvdl@eye.psych.umn.edu>

My suggestion: Buy a DEC DEMPR if you are still running a single-wire
network! You can then subdivide your building into eight separate
segments. If there is a bad problem on any of the segments, a little
red indicator will light-up near the BNC jack for that segment, but
the rest of your network will keep functioning.

--------------------
From: Steve Riley <pacacc!steve@sacto.west.sun.com>

From what I know (like I said, not much) spray will lose packets when
spraying a slower computer.

Try to get the program nfswatch3.0 (from usenet, or elsewhere). This is
supposed to be a good network program.

SunNet Manager provides network status type info, plus the ability for you
to write your own "agents" and "managers" to do whatever you wish over the
network, where SunNet manager does all the communications over the network
for you.

--------------------
From: spurgeon@sirius.cc.utexas.edu (Charles Spurgeon)

I have a Network Reading List available on archives at UTexas that
lists resources that you might find useful. Included among them are
sources of information on Ethernet troubleshooting. A new version of
the reading list is imminent. Meanwhile, you can find the list via
anonymous FTP on host ftp.utexas.edu. It's in the pub/netinfo/docs
directory as network-reading-list.txt, and in the pub/netinfo/ps
directory as network-reading-list.ps, a PostScript file.

--------------------
From: leonid@amil.co.il (Leonid Rosenboim)

From your description I assume that you are using 10base2 media (Thin
Ethernet or Cheapernet) hardware. When one of these cables has gone bad
there is the "binary search" method to save you. The only piece of
equipment that can make this process faster is a reflectometer. The
best (and expensive) LAN analyzers have such a feature (I think the
Sniffer does).

However, if you want advice from an experienced Ethernet maintainer -
here it is: Go straight to 10baseT - Ethernet that runs on your
existing AT&T PDS cabling. If you have none, it's quite cheap to
install them. The topology there is "star" which is the easiest thing
on earth to maintain. If a cable or a plug goes bad, the station that
it connects gets disconnected from the net, the rest of them work, and
you can see an LED going off for that particular port in the hub or the
MAU.

--------------------
From: dorgival@dccbhz.ufmg.br (Dorgival Olavo G Neto)

Only one more thing: a book I've found quite helpful was:

"Internetworking with TCP/IP", by Douglas Comer.

It has a thorough description of TCP/IP and all its main
protocols, and has been quite illuminating so far.
------------------------------End of Summary------------------------------


