SUMMARY: Disk problems on 3/260

From: J. Porter Clark (jpc@avdms8.msfc.nasa.gov)
Date: Wed Jan 23 1991 - 09:00:58 CST


My original question was:

>>I've got an oldish 3/260 with a single Fujitsu 2333 drive and a Xylogics
>>472 controller. It's currently running 4.0.3, when it runs, which is not
>>very long at a time. Every 2-6 weeks it conks out with disk problems.
>>The non-Sun service people have replaced everything in this computer except
>>the disk drive power supply and the CPU backplane in pursuit of this and
>>other problems. The folks who own it might upgrade it if it didn't fail so
>>often. When it goes bad, it seems to get more than one partition at a time.
>>Then it won't boot. Am I looking at a drive problem, a controller
>>problem, or something else?
>>
>>Jan 21 18:48:02 deputy vmunix: xy0a: write retry (header not found) -- blk #135, abs blk #135
>>Jan 21 18:48:03 deputy vmunix: xy0d: write retry (header not found) -- blk #10870, abs blk #60450
>>Jan 21 18:48:03 deputy vmunix: xy0d: write retry (header not found) -- blk #10864, abs blk #60444
>>Jan 21 18:48:03 deputy vmunix: xy0d: write failed (header not found) -- blk #10872, abs blk #60452
>>Jan 21 18:48:04 deputy vmunix: xy0d: read retry (hard ecc error) -- blk #10864, abs blk #60444
>>Jan 21 18:48:04 deputy vmunix: xy0d: read retry (hard ecc error) -- blk #10864, abs blk #60444
>>Jan 21 18:48:04 deputy vmunix: xy0d: read failed (hard ecc error) -- blk #10864, abs blk #60444
>>Jan 21 18:48:04 deputy vmunix: xy0d: read retry (hard ecc error) -- blk #10864, abs blk #60444
>>Jan 21 18:48:04 deputy vmunix: xy0d: read retry (hard ecc error) -- blk #10864, abs blk #60444
>>Jan 21 18:48:04 deputy vmunix: xy0d: read failed (hard ecc error) -- blk #10864, abs blk #60444
>>
>>and on and on and on...

I received many replies. Most people thought that the disk itself
either is a goner already or is in the process of becoming one. There
was a distinct note of dissatisfaction with a certain vintage of Fujitsu
drives. The disk controller came in for a fair share of blame also.
Other possible suspects/things to check, in no particular order:

1. disk cables
2. disk terminator
3. other devices in the system
4. ground wire
5. drive power supply
6. disk controller firmware
7. bad sectors that need to be remapped
8. disk in (possibly periodic) need of being reformatted
9. environmental conditions (ambient temperature)
10. flaky VME timing (reshuffle the cards)
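
For item 7, one way to see whether the errors cluster on a few blocks
(candidates for remapping) or are spread across the disk is to tally the
console messages by absolute block. The following is only a rough sketch
in Python: the regular expression is inferred from the handful of log
lines quoted above, not from any driver documentation, and the names
XY_LINE, SAMPLE and tally are made up for this illustration.

```python
import re
from collections import Counter

# Pattern inferred from the quoted log lines only (an assumption), e.g.
# "xy0d: read failed (hard ecc error) -- blk #10864, abs blk #60444"
XY_LINE = re.compile(
    r"(?P<part>xy\d+[a-h]): (?P<op>read|write) (?P<result>retry|failed) "
    r"\((?P<reason>[^)]*)\) -- blk #(?P<blk>\d+), abs blk #(?P<abs>\d+)"
)

# A few of the messages from the original question, copied verbatim.
SAMPLE = [
    "Jan 21 18:48:02 deputy vmunix: xy0a: write retry (header not found) -- blk #135, abs blk #135",
    "Jan 21 18:48:03 deputy vmunix: xy0d: write retry (header not found) -- blk #10870, abs blk #60450",
    "Jan 21 18:48:03 deputy vmunix: xy0d: write failed (header not found) -- blk #10872, abs blk #60452",
    "Jan 21 18:48:04 deputy vmunix: xy0d: read retry (hard ecc error) -- blk #10864, abs blk #60444",
    "Jan 21 18:48:04 deputy vmunix: xy0d: read failed (hard ecc error) -- blk #10864, abs blk #60444",
]


def tally(lines):
    """Count xy messages per absolute block and per partition."""
    by_block = Counter()
    by_part = Counter()
    for line in lines:
        m = XY_LINE.search(line)
        if m is None:
            continue  # not an xy disk message; skip it
        by_block[int(m.group("abs"))] += 1
        by_part[m.group("part")] += 1
    return by_block, by_part


if __name__ == "__main__":
    by_block, by_part = tally(SAMPLE)
    for blk, count in by_block.most_common():
        print(f"abs blk {blk}: {count} message(s)")
    for part, count in sorted(by_part.items()):
        print(f"{part}: {count} message(s)")
```

In practice the lines would come from the messages file rather than a
hard-coded list. Errors that keep landing on the same few absolute
blocks point toward bad sectors; errors scattered across partitions
point more toward the cabling, power, or controller items in the list.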

As for replacing the drive and controller, that's been done...at least
twice each. We're looking into all of the other possibilities, and also
at changing to a different make and model of disk drive and controller.
I have accepted the fact that the solution might not be obvious.

I confess to what should probably have been a glaring error: the disk
controller was a Xylogics 451, not a 472 as stated. Although several
people noticed the error, they were apparently too polite to rub my
nose in it. Thanks! If the different model number triggers more
responses, I'll re-summarize.

Praise be to these kind respondents (in alphabetical order):

bit!jayl (Jay Lessert)
bit!markm (Mark Morrissey)
brossard@sasun1.epfl.ch (Alain Brossard EPFL-SIC/SII)
cdr@acc.stolaf.edu (Craig D. Rice)
curt@ecn.purdue.edu (Curt Freeland)
dmorse@sun-valley.Stanford.EDU (Dennis Morse)
librainc!ho@uunet.UU.NET (Alan K. Ho)
mmikulska@UCSD.EDU (Margaret Mikulska)
nieusma@cs.Colorado.EDU (Jeff Nieusma)
rcsmith@anagld.analytics.com (Ray Smith)
riess@evax.uta.edu (Bill Riess)
rodney@snowhite.cis.uoguelph.ca (barking at airplanes)
shj@ultra.com (Steve Jay)
stpeters@dawn.crd.ge.com (Dick St.Peters)

rcsmith also scoured the sun-spots archives for me and sent me a
summary. Thanks, Ray.

selig@xanth.msfc.nasa.gov (Bill Selig) called me with his suggestion.
Thanks, Bill.

------------------------------------------------------------------------
J. Porter Clark                       Phone: (205)544-3661
NASA Marshall Space Flight Center     FAX: (205)544-9582
Communications Systems Branch/EB33    Internet: jpc@avdms8.msfc.nasa.gov
Huntsville, AL 35812
------------------------------------------------------------------------


