SUMMARY: panic: assertion failed (ufs_bmap problem)

From: Andrew Patrick (andrew@calvin.dgbt.doc.ca)
Date: Tue Dec 10 1991 - 20:18:27 CST


On Oct 29, I wrote:

| This one is driving me crazy and things are getting worse, so it is
| time for Sun-Managers :-)
|
| I am running a SPARCstation 1+ with the following configuration:
|
| - 32 or 64 MEG RAM (problems occur with either configuration)
| made up of third-party 4 MEG SIMMs
| - no internal hard drives
| - 1 Wren VI disk setup as boot disk
| - 1 Wren VI and 1 Wren VII as additional disks
|
| Filesystem kbytes used avail capacity Mounted on
| /dev/sd1a 13651 5456 6829 44% /
| /dev/sd1e 91339 78173 4032 95% /usr
| /dev/sd1d 93131 48036 35781 57% /var
| /dev/sd1f 377255 298817 40712 88% /home
| /dev/sd2c 959294 465370 397994 54% /cgd
| /dev/sd3c 607938 211069 336075 39% /bld
|
| I am running SunOS 4.1.1B and OpenWindows 2.
|
| Periodically, the system is panicking and will not reboot without
| intervention. This was happening very rarely in the past (every 3-4
| months) but is now happening more frequently (every 3-4 days).
| However, it seems to occur randomly and is not associated with any
| particular program or activity.
|
| The message on the screen during the panic is:
|
| assertion failed: *bnp != UFS_HOLE, file ../../ufs/ufs_bmap.c, line 326
| panic: assertion failed
|
| It then does a kernel dump and tries to reboot. During the reboot, it
| fails while displaying the Size: line with the message "Truncated
| file".
|
| Booting from the monitor and a complete power-down of the CPU both
| fail now. The only way to get the machine to reboot, and here is the weird
| part, is to open the "pizza box" and wiggle the PROMS and the SIMMs.
| For some reason, once I have poked around a bit, the machine will boot
| and run, for a little while.
|
| I take it that UFS is related to the Unix file system (4.2). Not
| having source code, I can't look up the line referred to in the error
| message.
|
| Does anyone know what is going on? Does anyone have a suggested method
| of attack for this problem?
}-- End of excerpt from Andrew Patrick

I think I have a solution to this problem. The problem was intermittent
enough that I wanted to make sure before I sent out a summary.

My best guess is that the problem is due to bad RAM. I was using
GENERIC 4 MB SIMMs (80ns) and these seem to not be up to snuff for the
SPARCstation 1+. Further research suggests that this RAM will work in
386 PCs and SPARCstation 1's, but fails in SPARCstation 2's. I
have replaced this RAM with 4 X 4 MB of Toshiba SIMMs and have been
running fine ever since.

I understand that there are now some specifications for SPARCstation 2
RAM, and it looks like this also applies to SPARCstation 1+'s. The
specifications I have been told are important are:

    "the CAS precharge access time (tCP) is 10 ns and the access time
    from CAS precharge (tCPA) is 45 ns"

I have ordered some RAM that meets these specs and I hope that things
will be OK.

Most of the replies suggested that I had problems with my swap space,
but I don't think that was the case. I include these replies below for
anyone who has similar problems.

Thanks to the following people for replies:

        bit!jayl (Jay Lessert)
        erueg@cfgauss.uni-math.gwdg.de (Eckhard Rueggeberg)
        kla!brandari%sunra@Sun.COM (Paul Brandariz x6546)
        maklinm%cognos.uucp@cunews.carleton.ca (Maxwell Maklin)

A digest of their replies follows:

--------
>From @cse.ogi.edu:bit!jayl@cse.ogi.edu Tue Oct 29 18:27:10 1991
Date: Tue, 29 Oct 91 14:39:57 PST
From: bit!jayl (Jay Lessert)
Subject: Re: panic: assertion failed (ufs_bmap problem)

Are you sure that wiggling is fixing it, or is cooling down for 10 minutes
while you open the case, futz around, and close the case what is *really*
doing the trick?

I'd guess bad (temperature-dependent) hardware, and I'd guess it's inside
the pizza-box, if you have not seen any disk-specific (rsd0, etc.) error
messages.

Jay Lessert {decwrl,cse.ogi.edu,sun,verdix}!bit!jayl
Bipolar Integrated Technology, Inc.
503-629-5490 (fax)503-690-1498

--------
>From @ibm.gwdg.de:erueg@cfgauss.uni-math.gwdg.de Wed Oct 30 03:33:11 1991
Date: Wed, 30 Oct 91 09:31:54 +0100
From: erueg@cfgauss.uni-math.gwdg.de (Eckhard Rueggeberg)
Subject: Re: panic: assertion failed (ufs_bmap problem)

The only thing I can say is the only thing the AnswerBook finds
for UFS_hole (in the SunOS 4.1.1 Rev B Release Manual) :

Do you by any chance have an additional swap file, which was created
by "mkfile -n ..." ?

Eckhard R"uggeberg
erueg@cfgauss.uni-math.gwdg.de

--------
>From @ibm.gwdg.de:erueg@cfgauss.uni-math.gwdg.de Wed Oct 30 15:27:45 1991
Date: Wed, 30 Oct 91 21:29:21 +0100
From: erueg@cfgauss.uni-math.gwdg.de (Eckhard Rueggeberg)
Subject: Re: panic: assertion failed (ufs_bmap problem)

I was too lazy to spell it out, sorry.

The problem is that SunOS (UNIX ?) allows files to be indicated with
a certain length in the inode without allocating the physical disk space
for it. This is e.g. part of Sun's copy protection of the AnswerBook, which
tars to a tarfile more than 1 GB using only 220 MB on the disk, so you can't
tar it to a tape somewhere and untar it somewhere else, you have to install
it with a CD.

When a write/read is attempted into the "hole" in the file, the missing
block is in fact allocated. This works for Client swap files, because
the NFS daemons handle it properly, but it definitely fails for local swap,
because the swapper doesn't triple check what it writes.
So I believe you have a swap file with a hole in it, and when it is finally
touched by huge tools, the panic comes.

You could check the two different sizes of the swapfile with ls -l on
the inode side and du (or ls -s) on the allocation side. If they are
different once converted to the same units, I am right. If not, consider
the other answers you get.
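
The same comparison can be sketched in a few lines of Python. This is only an
illustration of the check described above, not anything from the SunOS tools:
the function names are made up here, and it assumes a POSIX system where
st_blocks counts 512-byte units. Treat the result as a hint, since some
filesystems report allocation differently.

```python
import os

def has_holes(size_bytes, blocks_512):
    """Return True when fewer bytes are allocated than the inode length claims.

    size_bytes is the length recorded in the inode (what ls -l shows);
    blocks_512 is the allocated block count in 512-byte units (what du
    or ls -s is derived from).
    """
    return blocks_512 * 512 < size_bytes

def file_has_holes(path):
    """Apply the check to a real file (hypothetical helper, POSIX only)."""
    st = os.stat(path)
    return has_holes(st.st_size, st.st_blocks)
```

For example, file_has_holes("/path/to/swapfile") would be expected to return
True for a swap file made with mkfile -n and never written to (the path is a
placeholder).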

BTW: This is not in the manual. There they simply say that if a non-client
swap file is created by mkfile -n ..., you can expect a characteristic
error message.

Eckhard R"uggeberg
erueg@cfgauss.uni-math.gwdg.de

--------
>From @ibm.gwdg.de:erueg@cfgauss.uni-math.gwdg.de Thu Oct 31 09:41:41 1991
Date: Thu, 31 Oct 91 12:35:50 +0100
From: erueg@cfgauss.uni-math.gwdg.de (Eckhard Rueggeberg)
Subject: Re: panic: assertion failed (ufs_bmap problem)

The problem is that you probably need more. I have 32 MB RAM + 128 MB
swap in two partitions on different disks (Sun says this would speed
up things a bit) for my server which serves 3 diskless and 2 dataless
clients, which seems to be sufficient, and I had 32 MB RAM + 48 MB swap
in one partition on my SS2, which proved not to be enough, so I added
a swap file of 64 MB.

But fortunately, the OS behaviour is quite graceful when running out of
swap space (as well as when running out of disk space, even on the /
partition).

You can watch current swap level with "pstat -s" (in case you didn't know).

Eckhard R"uggeberg
erueg@cfgauss.uni-math.gwdg.de

--------
>From kla!brandari%sunra@Sun.COM Wed Oct 30 12:06:59 1991
Date: Wed, 30 Oct 91 08:19:32 PST
From: kla!brandari%sunra@Sun.COM (Paul Brandariz x6546)
Subject: Re: panic: assertion failed (ufs_bmap problem)

Andrew:
        If you created a dynamic swap file with the following command:

        mkfile -n swafile

        and then mounted it as a secondary swap partition, a panic
of this type will happen, since a swap file must have its data already
allocated on the disk. The "Truncated file" message might appear when it
attempts to use the secondary swap space with no blocks allocated to it.
Don't know why wiggling proms alleviates the problem though (bad memory ?).
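
To illustrate what "data already allocated on the disk" means, here is a rough
Python sketch that creates a file by writing every byte explicitly, so that no
holes are left behind. It is not how mkfile is implemented; the function name
and chunk size are assumptions chosen for the example, and a filesystem that
compresses zeros may still allocate less than expected.

```python
import os

def write_full_file(path, size_bytes, chunk=64 * 1024):
    """Create path holding size_bytes of zeros, writing each byte explicitly.

    Hypothetical helper: unlike seeking past the end of the file, every
    block is actually written, so the filesystem has to allocate it.
    Returns the resulting file length in bytes.
    """
    zeros = bytes(chunk)
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(zeros[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # push the data out to the disk
    return os.path.getsize(path)
```

A file made this way should show matching sizes on the inode side and the
allocation side.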

Hope this info helps.....
___________________________________________________________________________
Paul R. Brandariz E-mail Internet: kla!brandari%sunra@sun.com
KLA Instruments Corp
P.O. Box 49055 Voice: (408) 456-6546
San Jose, CA 95161-9055 Fax: (408) 434-4273
___________________________________________________________________________

--------
>From maklinm%cognos.uucp@cunews.carleton.ca Wed Oct 30 16:15:06 1991
Date: Wed, 30 Oct 91 15:32:08 EST
From: maklinm%cognos.uucp@cunews.carleton.ca (Maxwell Maklin)
Subject: Re: panic: assertion failed (ufs_bmap problem)

I believe there is a patch for this; try 100338-02.
There may be a newer patch, so check it out.

Most of these patches and others are kindly made available
by Tim Ramsey at Kansas State University:

        ftp ftp.math.ksu.edu /pub/sunos.4.1.1.patches

P.S. Nice to see someone from the same geographical location
     with real world problems :^)

Max

--------

-- 
Andrew Patrick, Ph.D.       Communications Research Centre, Ottawa, CANADA
andrew@calvin.dgbt.doc.CA
            "Making computers do the things they do on television."
            "Making television do the things that computers do."


