SUMMARY(2): File system full

From: Ulla Fischer (ulla@dmi.min.dk)
Date: Mon Jan 30 1995 - 06:16:23 CST


Dear managers,

I received many answers to my question, so I will summarize them here
and tell you what caused my problem.

What happened was this:
The machine is a print spool server. One client sends large files to
the server as root. When the files are printed, the temporary spool
files on the server are not removed, because of a bug in the
implementation of the lp protocol. Then someone tried to rm the spool
files, and rm hung. Rm was killed, and that left the file system in a
mess. I had to run fsck on / about six times before it stopped
complaining. I have installed the newest printer patch, 101317-12, on
the server and hope it fixes the problem.

Thanks to

Jim Wright <jwright@phy.ucsf.edu>
Ed Haggerty <haggerty_edward@jpmorgan.com>
jerryr@gcm.com (Jerry Ratner)
"Jan L. Peterson" <jlp@math.byu.edu>
Wolli Steiner <Wolli.Steiner@Rhein.DE>
John Benjamins <johnb@blas.cis.mcmaster.ca>
konc@fnts07.fnal.gov (John Konc - Fermi National Accelerator Lab.)
stlee@alc.com
Steven Overhauser <spo@ee.duke.edu>
Graeme Robertson <graemer@unisys.co.nz>
Stuart.Roe@ncl.ac.uk (Stuart Roe)
Mike Rembis 66520 <ebumfr@ebu.ericsson.se>
thomas@wiwi.hu-berlin.de (Thomas Koetter)
mike@trdlnk.com (Michael Sullivan)
Ric Anderson <ric@seagull.rtd.com>
lar@trib.com (Larry Ash)
vic@raven1.imatron.com (Victor Churchill)
rscott@otter.wsipc.wednet.edu (Rob Scott)
George Pallas <gpallas@freenet.columbus.oh.us>
dav@ipc.litronic.com (David L. Markowitz)
Gene Rackow <rackow@mcs.anl.gov>
John Goggin - LTX Tech Support <jgoggin@ltx.com>
stuart@TO.mobil.com (Stuart Pearlman - RDR)

They gave me the following advice. If you are interested in the
complete answers, I'll be glad to send them to you.

1)
If a file is removed while another process still has it open, its disk
space is not released until that process closes the file or
terminates. To fix this, find the process and kill it, or reboot. More
on finding the process follows at the end of this summary.
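To see the effect described in point 1 for yourself, here is a minimal
Python sketch (not from the original replies), assuming a Unix-like
system that allows an open file to be unlinked. The function name
unlinked_but_open_demo is hypothetical.

```python
import os
import tempfile


def unlinked_but_open_demo(size=1024 * 1024):
    """Remove a file while it is still open and report what remains.

    The name disappears at once, but the inode and its data blocks
    survive until the last open descriptor is closed.
    """
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * size)
        f.flush()                      # make sure st_size reflects the data
        os.unlink(path)                # what rm does: drop the directory entry
        st = os.fstat(f.fileno())      # the inode is still reachable via the fd
        result = {
            "name_exists": os.path.exists(path),
            "link_count": st.st_nlink,  # 0: no directory refers to it any more
            "size": st.st_size,         # the blocks are still allocated
        }
    # Leaving the with-block closes the descriptor; only now can the
    # kernel free the blocks.
    return result
```

On such a system, unlinked_but_open_demo() should report that the name
no longer exists and the link count is 0, while the size is still the
full amount written: exactly the "removed but still using space" state
that df sees and du does not.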

2)
In SunOS 4, files sometimes lose the connection to their parent
directory. The OpenWindows file manager is known for this. To fix it,
run fsck -f /dev/sd0a (or sd0g or .....).

3)
About files with holes, I got two pieces of advice:
--If I had sparse files (files with holes), my problem would be
reversed: df would report *less* space than du.
--Files with holes would not cause this effect. df, du and ls -ls all
correctly report the actual disk usage of files with holes.
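A quick way to see the difference between a file's apparent size and
the space it actually occupies is to create a file with a hole. This
Python sketch is an illustration, not from the replies; it assumes a
Unix-like system where st_blocks counts 512-byte units (true on Linux
and most Unix variants) and a file system that supports holes. The
function name sparse_file_sizes is hypothetical.

```python
import os
import tempfile


def sparse_file_sizes(hole=10 * 1024 * 1024):
    """Create a file with a hole; return (apparent_size, allocated_bytes).

    apparent_size is st_size, what ls -l shows. allocated_bytes is
    st_blocks * 512, the figure du and ls -ls are based on.
    """
    with tempfile.TemporaryFile() as f:
        f.seek(hole)      # skip `hole` bytes without writing them
        f.write(b"x")     # one real byte at the end
        f.flush()
        st = os.fstat(f.fileno())
        return st.st_size, st.st_blocks * 512
```

On a file system that supports holes, the apparent size is just over
10 MB while the allocated figure is typically only a few kilobytes,
which is consistent with the second answer: du and ls -ls report the
blocks actually used, not the apparent size.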

4)
A remote printer was not working; as a result, huge numbers of messages
were being written to /var/lp/logs/lpNet and /var/lp/logs/lpSched,
filling up the file system. Removing the files did not free up the
disk space! It wasn't until I killed the lpNet and lpSched daemons
and restarted lpSched that the disk space reappeared. If this is the
problem you're having, a simple reboot should fix it.

5)
Memory problems:
I had a SPARC II do something similar, but I was running 4.1.3. It
turned out to be a memory problem. I swapped the RAM with that of a
different motherboard, and the problem went away and never came back
in either machine. Really weird.

6)
The file system reserves some space for its anti-fragmentation
(block allocation) algorithms etc., and for root, which is why root can
usually fill a file system to more than 100%. The size of this reserve
(*minfree*) can be set with 'tunefs -m <arg>', where arg is a
percentage. The default is 10%. df does not report this reserve as
available space.
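To make the "more than 100%" remark concrete, here is a worked sketch
of the arithmetic in Python. It assumes the BSD-style df convention
that capacity = used / (used + avail), where avail already excludes the
minfree reserve; the function name df_view and the choice to round the
percentage up are assumptions for illustration.

```python
def df_view(total_kb, free_kb, minfree_pct=10):
    """Return (used, avail, capacity_pct) as a BSD-style df would show them.

    Assumes the reserve is minfree_pct percent of total_kb (minfree_pct
    must be below 100) and that capacity is used / (used + avail),
    rounded up here.
    """
    reserve = total_kb * minfree_pct // 100
    used = total_kb - free_kb
    avail = free_kb - reserve   # goes negative once root eats into the reserve
    # used + avail == total_kb - reserve, so df measures against the
    # non-reserved space only.
    capacity = -(-100 * used // (used + avail))  # ceiling division
    return used, avail, capacity
```

For a 100000 KB file system with 10% minfree, df_view(100000, 10000)
returns (90000, 0, 100): df calls the disk full although 10000 KB are
still free, and only root may use them. If root keeps writing until
5000 KB are free, df_view(100000, 5000) returns (95000, -5000, 106).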

7)
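Nonexistent device:
The number one usurper of space on / that I know of is people with
root access writing to a device that is not there. Writing to a
device name that does not exist in /dev simply creates an ordinary
file there, on the root file system.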

Try this:
        ls -ls /dev | sort -n | tail
There should not be any files larger than the file MAKEDEV. If there
are, they will be the last file or two in the list, and they are very
likely to be the whole problem. I will even bet that they have names
like sto (almost st0) or fdo (almost fd0). Just blow those accidents
away and / should be fine again.
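The ls | sort | tail check above can also be written so that it looks
only at regular files, which is what an accidental write produces.
This Python sketch is an illustration, not from the replies; the
function name stray_regular_files is hypothetical.

```python
import os
import stat


def stray_regular_files(dev_dir="/dev", limit=10):
    """Return up to `limit` (size_in_bytes, name) pairs, largest first,
    for the *regular* files in dev_dir.

    Real device entries are character or block special files, so a
    large regular file here is usually an accidental write to a
    mistyped device name. Expect MAKEDEV (a shell script) in the list
    on systems that ship it.
    """
    found = []
    with os.scandir(dev_dir) as entries:
        for entry in entries:
            try:
                st = entry.stat(follow_symlinks=False)
            except OSError:
                continue  # entry vanished or cannot be examined
            if stat.S_ISREG(st.st_mode):
                found.append((st.st_size, entry.name))
    found.sort(reverse=True)
    return found[:limit]
```

Anything listed above MAKEDEV, such as a large file called sto, is a
candidate for removal.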

8)
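Check whether any files are hidden under a mount point: files left in
a directory before another file system was mounted on top of it still
use space that df counts but du cannot see.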

About finding a process holding an open file:
---------------------------------------------

First scan the file system to identify the removed file:
        fsck -n /dev/rsd0a
This should report an unreferenced file. Note the inode number. The
-n option is very important since the file system is mounted and must
not be modified.

Now run the command:
        lsof /
lsof is a freely distributable utility available via anonymous FTP
from ftp.cc.purdue.edu. It will list open files on the root file
system. Match the inode number reported by fsck to identify the
guilty process. Based on the command the process is running, you can
then decide whether there is a way to get the process to close the
file, or if you should just kill the process. In either case, the disk
space occupied by the unreferenced file will then be freed.
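On SunOS 4 the matching is done with lsof (or ofiles or fuser). For
comparison, here is a minimal Python sketch of the same inode-matching
idea on a modern Linux system. This is an assumption, not from the
replies: it depends on the /proc/<pid>/fd links, which SunOS 4 lacks.
The function name processes_holding_inode is hypothetical. Pass it the
inode number that fsck -n reported, and optionally the device number,
since inode numbers are only unique within one file system.

```python
import os


def processes_holding_inode(inode, device=None, proc="/proc"):
    """Return (pid, fd) pairs whose open descriptor refers to `inode`.

    Linux-only sketch: each /proc/<pid>/fd/<n> entry can be stat()ed to
    reach the open file, even after its name has been removed.
    Processes we are not allowed to inspect are skipped, so run it as
    root to see everything.
    """
    holders = []
    for pid in os.listdir(proc):
        if not pid.isdigit():
            continue
        fd_dir = os.path.join(proc, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:  # permission denied, or the process exited
            continue
        for fd in fds:
            try:
                st = os.stat(os.path.join(fd_dir, fd))  # follows link to open file
            except OSError:  # descriptor closed meanwhile
                continue
            if st.st_ino == inode and (device is None or st.st_dev == device):
                holders.append((int(pid), int(fd)))
    return holders
```

Each returned pid is a process still holding the removed file; once it
closes the file or exits, the space is freed.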

Others suggest the programs ofiles or fuser instead of lsof.

One respondent says that running "fsck -n" >>might<< show you what's happening.


