SUMMARY: yp_all - RPC clnt_call (transport level) failure

From: Erwin Fritz (efritz@GLJA.com)
Date: Wed Nov 10 1999 - 09:10:23 CST


I'd like to thank the following people for helping with my problem, the
description of which is at the bottom of this message:

Thomas Carter
Timothy Newman
Kris Briscoe
Ian Durkacz
Mike Allmen
Thomas Lewis
Karl Vogel (especially, because his answer solved it)

Suggestions were:

1. See if rpc.nisd is dying and taking a while to be restarted by inetd. You
could also try restarting it during one of these episodes.

It wasn't dying, and restarting it didn't help.

2. Checking the configuration of the NIS master, which might be in trouble.

The problem occurs on the NIS master. Redoing its configuration made no
difference.

3. Ensuring that the server is using the same duplex setting as the switch it's
on.

I checked that, and they're both 100Mbps full-duplex.

4. Increasing the value of the system variable rlim_fd_cur and/or rlim_fd_max.

This turned out to work. The rlim_fd_cur and rlim_fd_max variables are the soft
and hard limits on file descriptors, respectively. I set these variables to 512
and 768, respectively, and rebooted. I haven't had the problem occur since.

-------- Original Message --------
I'm running into a strange problem. I have an E450 running Solaris 7 with the
latest patch cluster. It is the NIS master for my network of Solaris boxes, and
also is my main Samba (version 2.0.4b) file server for my Win95 PCs.

Every once in a while, at seemingly random times, I get a whole bunch of
messages like this:

thor smbd[7858]: yp_all - RPC clnt_call (transport level) failure: RPC: Timed out

I get these for smbd, lp, cron, login, and anything else that uses NIS. When
this occurs, the E450 slows to a crawl, and my phone starts ringing off the
hook. All my other UNIX boxes (all Solaris) also slow to a crawl, because they
can't contact the E450.

I've searched through SunSolve, and it mentions a documented bug, 4011531,
describing a situation where file descriptor limits are being exceeded. The
workaround is to put the command 'ulimit -n 256' into the ypstart script.

I tried that, but it didn't help.

During one of these episodes, I ran snoop to see what was going on, and the only
thing I could tell was that the RPC calls were taking a long time to return.

I posted this query to comp.unix.solaris, but got no replies.

Does anyone have any ideas?

-- 
Erwin Fritz
Gilbert Laustsen Jung Associates Ltd.
http://www.glja.com


