ypserv dies (SUMMARY)

From: Daniel Berglund (db@fy.chalmers.se)
Date: Mon Apr 22 1991 - 06:55:29 CDT


The original question was this:

>Our ypservs (both the master and the slaves) keep dying. We are running
>4.1.1 (sun4's) and have uncommented the "-b" flag in /var/yp/Makefile.
>This worked well in 4.0.3 but the problems seem to have started after the
>upgrade. dbx says:
[dbx output omitted]

I got four responses. Three of them indicated the availability
of patches from Sun. However, Sun Hotline did not know about ypserv
crashes last time we called them, and we have already applied
patch 100141 in order to cut down the number of "nres_gethostbyaddr: foo.bar
!= 192.16.3.21" messages. (The unpatched version is not better, in case
you are wondering.)

Kennedy Lemke <Kennedy_J_Lemke@Princeton.EDU> suggested that a host
with an excessive number of aliases might cause ypserv to crash.
I have not been able to confirm this (unfortunately :)

One workaround is to restart ypserv as needed from a cron job, but the
problem is harder than that, since ypbind wants no less than 3 minutes
to rebind and this causes login to time out (sigh). Programs like
amd and rarpd don't behave very well either.
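
For anyone who wants to try the cron approach anyway, here is a rough
sketch of such a watchdog. It is not what we run. It assumes a ps that
accepts the POSIX "-e -o comm=" options (the stock 4.1.1 ps may not),
and the ypserv path is a guess to adjust for your system. The names
is_running, check_and_restart, list_processes and start_ypserv are made
up for this example.

```python
import os
import subprocess

YPSERV = "/usr/etc/ypserv"  # assumed location of the daemon; adjust as needed


def is_running(ps_output, name="ypserv"):
    """Return True if any line of `ps -e -o comm=` output names `name`."""
    return any(
        os.path.basename(line.strip()) == name
        for line in ps_output.splitlines()
    )


def check_and_restart(list_processes, start, name="ypserv"):
    """Call start() if `name` is not running; return True if a restart was made."""
    if is_running(list_processes(), name):
        return False
    start()
    return True


def list_processes():
    """Return one command name per line (assumes a POSIX-style ps)."""
    result = subprocess.run(
        ["ps", "-e", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def start_ypserv():
    """Start the daemon without waiting for it to exit."""
    subprocess.Popen([YPSERV])
```

A cron job would call check_and_restart(list_processes, start_ypserv)
every few minutes. This only brings the server back; it does nothing
about the clients' slow rebinding described above.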

We are now experimenting with an older (4.0.3) version of ypbind. I believe
it rebinds faster when a server goes down. This might solve the problem
temporarily. We will also try an older ypserv.

We tried to trace(1) ypserv (thanks to Kennedy Lemke) and got a
possibly interesting result: the last thing it did was to send one of
these "nres_gethostbyaddr" messages to syslog, then it got a SIGSEGV
and died somewhere in nres_dorecv().

Thanks to:
ohnielse@ltf.dth.dk (Ole Holm Nielsen)
Kennedy Lemke <Kennedy_J_Lemke@Princeton.EDU>
zjat02@trc.amoco.com (Jon A. Tankersley)
phaneuf@ireq.hydro.qc.ca (Daniel Phaneuf)

-- 
Daniel Berglund                                     db@fy.chalmers.se
Chalmers Univ. of Technology, Göteborg, Sweden


