SUMMARY: panic: writeback error

From: johnb@edge.cis.mcmaster.ca
Date: Thu Jun 27 1991 - 01:00:53 CDT


On Jun 25, 10:05am, I wrote:
 
} We have a Sun 3/280S, running SunOS 4.1 which has been crashing 2-3 times a
} day for about the last week. Here's the autoconfig output when vmunix boots:
}
} SunOS Release 4.1 (PHYSUN) #1: Tue Nov 6 14:01:30 EST 1990
} Copyright (c) 1983-1990, Sun Microsystems, Inc.
} [ ... configuration messages deleted for brevity ... ]
}
} It keeps dying with the following error:
}
} panic: writeback error
} syncing file systems ....
} MEMORY ERROR! Status C4, DVMA-BIT 0, Context 4,
} Vaddr: 2677C, Paddr: 0000077C, Type 0 at 00000000
}
} Break FFFFFFFF at 0E05C710
}
} I have run the extended memory tests from the PROM, which show no errors.
} I've run the SunDiag tests, which also show no error. Is this a bad memory
} board, or something else?
}
}-- End of excerpt of Jun 25, 10:05am
 
Well, the response was great! I got my first reply before my own message
was mailed back to me! The responses fall into 2 camps: a) a bad CPU cache
(i.e. replace the CPU board :-( ), or b) having the default (primary) swap
partition listed in /etc/fstab as well. Detailed replies follow. I have taken the
 
/dev/xd0b swap swap rw 0 0
 
line out of fstab, since xd0 is my boot disk (/ == /dev/xd0a). If the
problem persists, we'll get the CPU board replaced. As Chris Drake points
out, this problem did just suddenly start to happen with no other real
changes, and so it probably is a CPU board problem. Since commenting out
the line in /etc/fstab is cheaper, I'll try that first:-)
 
Thank you very much to all who replied:
 
    Chris.Drake@Corp.Sun.COM (Chris Drake)
    "William (Bill) Gray" <bill%wintermute.utcc.utk.edu@utkux1.utk.edu>
    trr@lpi.liant.com (Terry Rasmussen)
    carlson@frith.egr.msu.edu (Jackie Carlson)
    riess@csq.uta.edu (Bill Riess)
    ddull@Rational.COM (David Dull)
    cam@janus.Berkeley.EDU (Carol Martin)
    Paul Quare <pq@computer-science.manchester.ac.uk>
    liz@neit.cgd.ucar.EDU (Liz Coolbaugh)
 
 
From: Chris.Drake@Corp.Sun.COM (Chris Drake)
 
} If this just started happening magically, without relation to any other
} changes, then I'd say hardware. The 'writeback' refers to the CPU cache;
} while there were a few odd cases where software could cause a panic: writeback,
} these appear to have been in SunOS 3.4 or 3.5, and should not affect a 4.1
} system. If this is repeatable (like, whenever you run your application..)
} then there is possibly software involvement. One way to check is to look at
} the traceback information from coredumps, if you can save any: if the stack
} seems to be pretty random, and the user process which was running isn't the
} same one every time, then that's a good indication that your CPU board is
} starting to flake out.
}
} Chris Drake
} US Answer Center
} Sun Microsystems Software Support
 
This did just start to happen. The only thing I can remember changing at
the time this started is setting up the automounter, and running it.
 
From: "William (Bill) Gray" <bill%wintermute.utcc.utk.edu@utkux1.utk.edu>
 
} Sun has a bug that can cause writeback errors. I had the problem on
} a 4/280 running 4.1.1 and a 3/260 running 4.1. It is bug #1039410.
} It is caused by having the primary swap area selected in the rc file(s)
} via "swapon -a" and having it also in /etc/fstab. The workaround
} is to NOT have it in both places. Here is what I did on the 3/260:
}
} bill mathsun1> tail -5 /etc/fstab
} /dev/xy1g /export/sun4 4.2 rw 1 4
} # Per Sun the primary swap area must NOT be in /etc/fstab :bug #1039410 17May91
} #/dev/xd0b swap swap rw 0 0
} /dev/xy0b swap swap rw 0 0
 
I have changed my /etc/fstab!
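
The workaround quoted above (commenting out the primary swap line in
/etc/fstab) can be sketched as a small text filter. This is only an
illustration, not anything Sun supplied: the function name, the sample text,
and the choice of /dev/xd0b as the primary swap device are assumptions taken
from the examples in this thread.

```python
def comment_out_primary_swap(fstab_text, primary_swap):
    """Return fstab text with active swap entries for primary_swap commented out.

    Assumes whitespace-separated fstab fields: device, mount point, type, ...
    """
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        # A line is active if it is non-empty and not already a comment.
        is_active = bool(fields) and not fields[0].startswith("#")
        if (is_active and len(fields) >= 3
                and fields[0] == primary_swap and fields[2] == "swap"):
            out.append("#" + line)
        else:
            out.append(line)
    return "\n".join(out) + ("\n" if out else "")


# Hypothetical sample modelled on the fstab excerpt quoted above.
SAMPLE = """\
/dev/xy1g /export/sun4 4.2 rw 1 4
/dev/xd0b swap swap rw 0 0
/dev/xy0b swap swap rw 0 0
"""

if __name__ == "__main__":
    print(comment_out_primary_swap(SAMPLE, "/dev/xd0b"), end="")
```

Only the entry for the named device is touched; additional swap partitions
such as /dev/xy0b stay active, and running the filter a second time changes
nothing.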
 
From: trr@lpi.liant.com (Terry Rasmussen)
 
} Have you tried any of the following tests:
}
} A) Swapping memory boards with another machine.
}
} B) Exchanging the cards around on the back plane
} (which of course would mean playing around
} with jumpers, no doubt for #A above as well...)
}
} C) Pulling a board and running with less memory for
} a while, if the problem persists, then swap out
} a memory board for the one you originally pulled.
} Needless to say this can be a time consuming and
} frustrating procedure.
}
} Lastly, I will bet that the problem is on the CPU board and
} that the PMMU has gone bad in some "wonderful and strange
} way" that is not easily or reasonably reproducible.
}
} Any way it goes I wish you much luck. We have a machine on
} site where ultimately everything was replaced (it kept having
} memory problems and "eating" system disks.) When I say everything
} was replaced I mean that the only thing factory installed
} on the machine is the cabinet; over time everything had been
} replaced, even the backplane. This machine is now a "stereo rack"
} for our UPS's and we are using its system disk as data disk on
} another system a few feet away. You can't win them all, but you
} can sure try!
 
Haven't tried any of these, though I have done similar things before, and
may try pulling/swapping memory boards. I have replaced 2 memory boards,
and the Fujitsu M2361 SMD disk in this machine already this year:-( I have a
sinking feeling it's the CPU though as you also point out.
 
From: carlson@frith.egr.msu.edu (Jackie Carlson)
 
} The one and only time I saw this message was when I had
} mounted, by including in the /etc/fstab, the root swap partition.
} It's okay to mount additional swap partitions, but not the root swap
} in fstab.
}
 
 
From: riess@csq.uta.edu (Bill Riess)
 
} A problem which looks like a memory error but isn't
} (because memory checks good), and which involves the disks, is
} very likely a DMA problem, meaning the disk controller or
} motherboard. In our case we had to replace the
} motherboard.
 
 
From: ddull@Rational.COM (David Dull)
 
} DVMA is a virtual memory construct. First suspect is the disk drive, second
} is the MMU. Third is the RAM. Definitely time for a hardware call.
 
 
From: cam@janus.Berkeley.EDU (Carol Martin)
 
} We've had the same problem on one of our 3/280s. We changed
} the first memory board to no effect and decided that the
} problem must be in the cache on the cpu (also suggested
} by "writeback" error). We've now changed the cpu and the
} verdict is still out.
 
Let me know if this solved your problem. If mine goes away with the fstab
changes, I will let you know.
 
From: Paul Quare <pq@computer-science.manchester.ac.uk>
 
} Check that you don't have a line in /etc/fstab for your primary
} swap device.
 
 
From: liz@neit.cgd.ucar.EDU (Liz Coolbaugh)
 
} Panic writeback error: Very familiar! Look in your /etc/fstab file
} for entries like:
}
} #/dev/xd0b swap swap rw 0 0
} #/dev/xd1b swap swap rw 0 0
}
} If you also have the same partitions configured in your kernel:
}
} config vmunix swap on xd0b swap on xd1b
}
} this may be the cause of your writeback error. Try commenting the lines
} out of your fstab and rebooting. It worked for us.
}
} Credit goes to Sun support who responded quickly with this information
} once I called them ...
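
The check described above (the same partition named both in the kernel config
line and in an active /etc/fstab swap entry) can be sketched as follows. This
is a rough illustration under stated assumptions: the function names are
hypothetical, the config line is assumed to use "swap on <device>" clauses as
in the example quoted above, and the kernel device name is assumed to map to
/dev/<device> in fstab.

```python
import re


def kernel_swap_devices(config_line):
    """Extract device names from 'swap on <dev>' clauses of a config line."""
    return re.findall(r"\bswap\s+on\s+(\w+)", config_line)


def duplicated_swap(config_line, fstab_text):
    """List active fstab swap devices that the kernel config also names."""
    # Assumption: kernel device 'xd0b' corresponds to '/dev/xd0b' in fstab.
    kernel = {"/dev/" + dev for dev in kernel_swap_devices(config_line)}
    dupes = []
    for line in fstab_text.splitlines():
        fields = line.split()
        if (len(fields) >= 3 and not fields[0].startswith("#")
                and fields[2] == "swap" and fields[0] in kernel):
            dupes.append(fields[0])
    return dupes


if __name__ == "__main__":
    config = "config vmunix swap on xd0b swap on xd1b"
    fstab = "/dev/xd0b swap swap rw 0 0\n#/dev/xd1b swap swap rw 0 0\n"
    for dev in duplicated_swap(config, fstab):
        print(dev, "is configured in the kernel and active in fstab")
```

A config line that says "swap generic" names no explicit device, so this
sketch reports nothing for it; in that case the primary swap partition has to
be identified some other way before comparing against fstab.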
 
My config line was:
 
config vmunix root on xd0b swap generic
 
(or something close to that; this is from memory, and mine is more faulty
than this machine's, I'm sure :-)
 
Again thanks to all who replied.
 
 

--
E. John Benjamins                   | BITNET: JOHNB@MCMASTER
Computing and Information Services, | Internet: johnb@edge.cis.mcmaster.ca
ABB 132, McMaster University,       |
Hamilton, Ontario, Canada.          | Bus error (core dumped)panic: freeing free inode ...


