SUMMARY: Killing processes locked on disk accesses.

From: Sandra Harimoto (kddlab!nanko.digital.co.jp!sandra@uunet.uu.net)
Date: Fri Aug 09 1991 - 01:13:13 CDT


My original posting was:
> We have occasionally had processes lock on disk access -
> a couple of times it was nemacs, a couple of times it was
> an NFS disk read. Each time the STAT field of ps was 'D'
> (Process in disk (or other short term) waits). But the
> process never seems to come out of it. It is also unkillable.
> So far, the only way I've found to get rid of such processes
> is to reboot. Is there any other solution?
>
> Vital statistics:
> System: Sun4/370 server + several sparcstation1's
> O/S: SunOS 4.0.3c
>
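As the original posting notes, a hung process shows up in ps with a STAT field beginning with 'D'. The helper below is a minimal sketch, assuming BSD-style `ps aux` output with a header line that names the PID and STAT columns. It only lists the stuck processes so you can see which ones (and how many) are affected; it does not, and cannot, kill them. The function name `find_disk_wait` is hypothetical.

```python
def find_disk_wait(ps_output):
    """Return (pid, stat) for every process whose STAT begins with 'D'.

    Assumes BSD-style `ps aux` text whose first line is a header naming
    the PID and STAT columns. Columns are located by header name, not by
    fixed position, because layouts differ between ps versions.
    """
    lines = ps_output.strip().splitlines()
    if not lines:
        return []
    header = lines[0].split()
    pid_col = header.index("PID")
    stat_col = header.index("STAT")
    hung = []
    for line in lines[1:]:
        # Split only up to STAT, so later columns (START, COMMAND)
        # may contain spaces without shifting the fields we need.
        fields = line.split(None, stat_col + 1)
        if len(fields) <= stat_col:
            continue  # malformed or truncated line
        stat = fields[stat_col]
        if stat.startswith("D"):
            hung.append((int(fields[pid_col]), stat))
    return hung


sample = """USER       PID %CPU %MEM   SZ  RSS TT STAT START  TIME COMMAND
sandra    1234  0.0  1.2  300  200 p1 D    10:02  0:01 nemacs foo.txt
root       101  0.0  0.5  100   80 ?  IW   09:00  0:00 /usr/etc/nfsd 8
"""
print(find_disk_wait(sample))  # [(1234, 'D')]
```

The sample ps output above is made up for illustration; feed the function the real text of `ps aux` on your own machine.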

Well, just as I feared, the only way to get rid of the process
seems to be to reboot. Someone suggested trying to kill the
rpc.lockd's on both machines. Haven't had a chance to try it yet.

I think I did find the cause of the problem, though.
From jstewart@rodan.acs.syr.edu:
> Oftentimes, especially with emacs flavours, we've found it is because
> the partition is full, or the account using it is full. If that's so,
> then freeing up space works very well.

And from stern@sunne.east.sun.com:
> there are many NFS bugs in 4.0.3[c] that cause processes
> to hang -- most of them have to do with the NFS client code
> going to sleep waiting for a page that was already freed up.
>

From jan@eik.ii.uib.no:
> - take much greater care to keep filesystems < 90% full.
> (this may be worth checking. I cannot remember seeing the
> nfsd's run into disk-wait except when there were very full
> filesystems.)
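Following the advice to keep filesystems below 90% full, a periodic check is easy to script. This is a minimal sketch, assuming Python's `shutil.disk_usage`; the 90% limit comes from the advice above, while the function names and the example mount points are hypothetical. Percent full is computed as used / (used + free), where "free" is the space available to ordinary users, which roughly mirrors the capacity column of BSD-style df (blocks reserved for root are left out).

```python
import shutil


def percent_full(used, free):
    """Percentage in use, computed as used / (used + free).

    `free` is space available to ordinary users, so root-reserved
    blocks are excluded, roughly like df's capacity column.
    """
    total = used + free
    return 100.0 * used / total if total else 0.0


def check_filesystems(paths, limit=90.0):
    """Return (path, percent) for each path whose filesystem exceeds `limit`."""
    report = []
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free
        pct = percent_full(usage.used, usage.free)
        if pct > limit:
            report.append((path, round(pct, 1)))
    return report


if __name__ == "__main__":
    # "/" and "/tmp" are placeholder mount points; substitute your own.
    for path, pct in check_filesystems(["/", "/tmp"]):
        print(f"WARNING: {path} is {pct}% full")
```

Run from cron, this gives early warning before a partition reaches the >98% level at which the hangs described here were observed.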

The problem is very probably related to the disk being full. As far as
I can tell, the locked processes coincided with the disk going above
98% full.

The suggested solutions are:
        Upgrading to 4.1.1
        Getting the "NFS Jumbo Patch for 4.0.3" from Sun, which
         includes about 17 different bug fixes for this
         and related problems.
        Getting the lockd patch
        Freeing up disk space.
I'm looking into these options now.

Upgrading to 4.1.1 may help, although at least one site reports
similar problems with it:

> Hi Sandra. We just installed three new Sparc2 fileservers running 4.1.1
> and have begun to experience the same problem. Occasionally, NFS reads
> will cause the nfsd processes to go into disk wait; one by one, all 8
> of our nfsd's succumb. Our installation is about as vanilla as it
> comes - pre-installed 4.1.1B. I don't have any solutions for you,
> except to say that we haven't seen the problem in a couple of days...
> We never experienced any of these troubles before we started playing with
> automount -- possibly connected? I don't know.

Thanks to the following people for responding:
        holle@asc.slb.com
        jan@eik.ii.uib.no
        jstewart@rodan.acs.syr.edu
        jnapier@ucsd.edu
        stern@sunne.east.sun.com
        oconnor!sbcoc.com!miker@oddjob.uchicago.edu
        sheryl@gwusun.gwu.edu
        tyen@mundo.eco.utexas.edu
        kevin.sheehan@fourx.aus.sun.com
        cdr@acc.stolaf.edu
        brett@den.mmc.com
        hermit@pcs.cnc.edu
        kirk@zabriskie.berkeley.edu
        sdb%hotmomma@uunet.uu.net

Thanks, sandra
email: sandra@digital.co.jp (We are not DEC!)
