SUMMARY: NFS reporting "filesystem full" when it isn't

From: Brent Chapman (chapman@alc.com)
Date: Mon Jan 21 1991 - 13:08:48 CST


Thanks to everyone who responded. Here's the original query:

    We have a Sun 4/65 NFS client running SunOS 4.1 and a Sun 4/280 NFS
    server running SunOS 4.0.3 that frequently exhibit a bizarre behavior.
    Customers have reported similar problems from within our software, so
    we're anxious to figure out what's going on here.

    The problematic behavior is that the client periodically reports
    "filesystem full" when we attempt to write to a filesystem on the
    server, even though the filesystem _isn't_ full, according to the
    "avail" column of "df" output on both the client and server, both
    before and after the write attempt. The filesystem often _was_ full a
    minute or two before the write attempt, but sufficient space was freed
    (again, according to "df") so that the attempt should have succeeded.
    After an indeterminate period of time, or if we delete some more files,
    the problem clears up.

    Does anybody know what's going on here? Is filesystem information
    being cached somewhere within the NFS client code? If so, why doesn't
    "df" reflect this cached information? Have we stumbled on some other
    weirdness of the UNIX or NFS filesystem? Is this normal behavior,
    or is it a bug?

A number of folks suggested using "df -i" to check if the problem was,
in fact, a lack of inodes rather than a lack of file space. That's a
good idea, but it isn't the problem here; only 3% of the inodes on
the filesystem in question have been consumed.
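
For what it's worth, the same check can be made programmatically. Below
is a minimal sketch using the POSIX statvfs(3) interface; SunOS 4 of
this era exposed the equivalent numbers through statfs(2), so treat the
field names here as the modern spelling rather than exactly what these
machines run:

    #include <stdio.h>
    #include <sys/statvfs.h>

    /* Report inode usage for the filesystem holding "path", roughly
     * what "df -i" prints.  Sketch only, assuming a POSIX statvfs(). */
    int main(int argc, char *argv[])
    {
        struct statvfs vfs;
        const char *path = (argc > 1) ? argv[1] : ".";

        if (statvfs(path, &vfs) != 0) {
            perror("statvfs");
            return 1;
        }

        printf("inodes: %lu total, %lu free (%.0f%% used)\n",
               (unsigned long)vfs.f_files,
               (unsigned long)vfs.f_ffree,
               vfs.f_files ? 100.0 * (vfs.f_files - vfs.f_ffree)
                                   / vfs.f_files
                           : 0.0);
        return 0;
    }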

Several people hit on what seems to be the real answer: interactions
with "biod" on the client side. I think Dave Hitz of Auspex
(uunet!auspex!hitz) describes it best (a sketch of the suggested
workaround follows his explanation):

    The following mechanism definitely *could* cause it. (But other
    things probably could cause it as well.)

    When a process does a write, it often hands the write off to a biod
    to do, instead of doing it itself. (This prevents the process from
    having to wait for the slow, stateless NFS write.) The problem is,
    if the write fails, there is no way to report the failure to the user
    process because the user's write system call has already returned.

    What the code does instead is mark the file (the rnode r_error field to
    be precise) with the failing error. This causes all future writes to
    the same file to fail with the same error. The idea is that the
    actual write that failed may not get reported, but the *next* time
    the program tries to do a write, that write will receive the error.
    Better late than never, one might argue.

    I'm afraid I couldn't figure out what it takes to clear this r_error
    field, but obviously it eventually does get cleared.

    If you care (and it is your own application that's failing), you might
    try using synchronous writes. This will force the write to go
    immediately and eliminate the ugliness with bio daemons. It will
    probably also slow your code down.
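
To make that concrete, here is a minimal sketch of the synchronous-write
workaround in C. The O_SYNC open flag and the error checks on fsync()
and close() are standard POSIX; the filename and buffer are made up for
illustration. Note that even with ordinary asynchronous writes, the
return values of fsync() and close() are the likeliest places for a
deferred "filesystem full" (ENOSPC) error to finally show up, so they
are always worth checking:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Write synchronously so that an NFS "filesystem full" (ENOSPC)
     * comes back on the write() itself instead of being deferred by
     * a biod.  Sketch only; "example.dat" is a made-up name. */
    int main(void)
    {
        const char buf[] = "some data\n";
        int fd = open("example.dat", O_WRONLY | O_CREAT | O_SYNC, 0644);

        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* With O_SYNC the call blocks until the data is on the
         * server, so a failure is reported here, immediately. */
        if (write(fd, buf, sizeof buf - 1) < 0) {
            perror("write");        /* e.g. ENOSPC */
            close(fd);
            return 1;
        }

        /* Even without O_SYNC, errors deferred by the biods tend to
         * surface on fsync() or close(); always check both. */
        if (fsync(fd) < 0)
            perror("fsync");
        if (close(fd) < 0)
            perror("close");

        return 0;
    }

As Dave notes, the cost is that every write now waits out a full round
trip to the server.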

Many thanks to all who responded:

    "Anthony A. Datri" <datri@concave.convex.com>
    "Bill Eshleman" <wde@angus.agen.ufl.edu>
    Phil Kearns <kearns@cs.wm.edu>
    barnum@pluto.crd.ge.com (Maria A. Barnum)
    cook@stout.atd.ucar.EDU (Forrest Cook)
    era@niwot.scd.ucar.EDU (Ed Arnold)
    fernwood!uunet.UU.NET!auspex!hitz (Dave Hitz)
    fernwood!uunet.UU.NET!mdisea!edm (Ed Morin)
    lbd@alux5.att.com
    mike@inti.lbl.gov (Michael Helm)
    stern@East.Sun.COM (Hal Stern - Consultant)
    vasey@mcc.com (Ron Vasey)

Thanks for your help!

-Brent

--
Brent Chapman                                   Ascent Logic Corporation
Computer Operations Manager                     180 Rose Orchard Way, Suite 200
chapman@alc.com                                 San Jose, CA  95134
                                                Phone:  408/943-0630


