SUMMARY: eeeek !!!! where is my system

From: Chris Wozniak (KAW@wapet.com.au)
Date: Wed Jun 26 1996 - 16:05:00 CDT


Esteemed Gurus,
Here is the (late) summary of replies to the following query:
 
> Hi Gurus
>
> The settings:
> Sun Sparc 1000E, 6 processors, 140 Gb of disks;
> Solaris 2.4;
> File and Database server
>
> The DRAMA:
> This morning I found out that various system utilities, like
> ls, more, expr, exec, etc., some rc?.d scripts (and God knows
> what else) went missing overnight !!!???
> There was no crash, no hangup, no hacker ???, no error messages in
> the logs.
>
> What gives !!!!????
> Has anyone any idea ???
> Please, I need some distraction during the N hours restore process...
>
> Chris Wozniak
> System Administrator
> Wapet
> kaw@wapet.com.au
 
This summary is late because the problem has been investigated very
thoroughly. The results were inconclusive, but I was able to
exclude several of the possible causes.
I have received 7 answers suggesting:
 a hacker (1);
 run-away cron (2);
 user (ignorant, or running amok :) ) (2);
 path error;
 volume management problem (1);
 buggy scsi card driver (1);
 disk problem (2);
and most offering empathy and consolation.
 
We were able to eliminate a hacker, cron, a user, volume management and
disk as possible causes.
I'm 80% certain that it wasn't the SCSI driver, but at the time we were
swapping SCSI cards and shuffling disks around, so I guess that accounts
for the other 20%.
I have since discovered, in the SMCC Open Issues Supplement for Solaris
2.5 Hardware 1/96, the description of bug 231531, which can cause
"cp -p" to delete source files under some conditions.
Since we had just connected a couple of new Ultras (Solaris 2.5) to that
server, the bug may have had something to do with our problem.
 
I strongly recommend patch 103162-01 to all of you running Solaris 2.5.
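
Since nothing showed up in the logs when the files vanished, one way to
catch this sort of thing earlier is to keep a baseline list of system files
and compare against it regularly. The following is only a minimal sketch
in Python, not what we actually ran: the function names snapshot() and
missing_files() are made up for illustration, and the directories to watch
are an assumption you would adjust for your own system.

```python
import os

def snapshot(dirs):
    """Return the set of file paths found directly under each directory.

    Directories that are missing or unreadable are skipped, so every file
    previously recorded under them will show up as missing.
    """
    found = set()
    for d in dirs:
        try:
            names = os.listdir(d)
        except OSError:
            continue  # the directory itself is gone or unreadable
        for name in names:
            path = os.path.join(d, name)
            # Count regular files and symlinks (many utilities are links).
            if os.path.isfile(path) or os.path.islink(path):
                found.add(path)
    return found

def missing_files(baseline, current):
    """Return a sorted list of paths present in baseline but not in current."""
    return sorted(baseline - current)
```

For example, one could call snapshot(["/usr/bin", "/etc/rc2.d"]) while the
system is known to be good, store the result off the machine, and have a
nightly job report missing_files(baseline, snapshot(...)). This only tells
you that files disappeared and when, not what removed them.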
 
Thanks to:
Steve Madden Email: smadden@csu.edu.au
Kris Briscoe Email: hxktb0@svho1ds_1.supervalu.com
Phil Poole poole@ncifcrf.gov
Liew Chee Wah E-Mail : cwliew@bass.com.my
Herbert hwe@uebemc.siemens.de
Gary Merinstein gmerin@panix.com
 
And "bolshe spasiba" to Fedor Gnuchev for the following reply, which
helped me convince my bosses that the OS files can be chewed up by
something else on the system. I'm including it here as it may be useful
to someone out there.
 
> ------------------------------------------------------------------------------
> Dear Chris,
>
> this is not a recipe or diagnosis - just to keep you distracted - besides,
> it has something to do with tapes:
>
> On FreeBSD I had a dreadful time with a client who had several machines
> equipped with ADAPTEC 2740 SCSI cards, a 2GB disk and a Wangtek 525 tape.
> And despite all claims that FreeBSD loves ADAPTEC and works with *ANY*
> SCSI tape drives this setup was repeatedly crashing.
> A simple tar to tape caused a panic and, after reboot, the machine was
> left without half of /usr/bin, /usr/local was hit like Stalingrad,
> occasionally /etc/shadow was missing, etc.
>
> And - mark my words - this naughty beast was crashing only when I was absent
> (I have to admit that I failed to reproduce the result - even with the most
> naughty tricks like power cycling during tar or dump, pressing eject on
> the tape drive during dumps, etc.)
>
> It turned out to be a small (!?) bug in the code for this particular kind of
> ADAPTEC controller. The driver turned out to be extremely sensitive to
> errors on the SCSI bus - and the Wangtek 525 had shaky firmware causing them.
>
> Well, that's about it.
>
> With best regards
>
> Fedor Gnuchev
> (hm, or Ted - in this English-typing world...)
>
>  mailto:qwe@ht.eimb.rssi.ru

Chris Wozniak
System Administrator
Wapet
kaw@wapet.com.au



This archive was generated by hypermail 2.1.2 : Fri Sep 28 2001 - 23:11:02 CDT