SUMMARY: Data Access Exception

From: David Beard (beard@spam.ua.oz.au)
Date: Wed Jul 24 1991 - 16:10:56 CDT


The original problem:

I was having trouble getting a SparcStation 1+ to boot after attempting to
upgrade from SunOS 4.0.3 to SunOS 4.1.1.

On powerup, the machine displays the SUN logo, and the ROM revision (1.3).
It also passes the "testing" stage. Next it says
Booting from sd(0,0,0)vmunix
Data Access Exception
Type b (boot), c (continue), or n (new command mode)

The cause:

Seems to have been an incorrectly configured external scsi disk which was
trashing the internal disk.

Isolating the problem:

Disconnect all the external scsi devices except the cd-rom drive. Make sure the
device is properly terminated and that the cables are seated firmly. Boot from
the cd, install mini-root, boot from mini-root etc etc. If everything goes ok,
shut down, connect another scsi device, and try rebooting from sd(0,0,0).
Keep adding the scsi devices until something stops working :-). I ran into my
problems the moment I connected both Wren VII's.
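
The add-one-device-at-a-time procedure above can be sketched as a small loop.
This is only an illustration of the idea, not something I ran on the Sun: the
function name, the device labels and the simulated boot check are all
hypothetical, and in real life "boots_ok" is you watching the console after
each power cycle.

```python
from typing import Callable, List, Optional


def first_failing_device(devices: List[str],
                         boots_ok: Callable[[List[str]], bool]) -> Optional[str]:
    """Attach devices one at a time and return the first device whose
    addition makes the boot check fail, or None if the full chain boots."""
    attached: List[str] = []
    for dev in devices:
        attached.append(dev)
        # Re-test the whole chain after every addition, as in the text.
        if not boots_ok(attached):
            return dev
    return None


if __name__ == "__main__":
    # Hypothetical stand-in for a real boot attempt: the chain fails as
    # soon as the second Wren is connected.
    def simulated_boot(chain: List[str]) -> bool:
        return "wren7-b" not in chain

    order = ["cdrom", "wren7-a", "wren7-b"]
    print(first_failing_device(order, simulated_boot))
```

Note that this only finds the device whose addition triggers the failure. As
the rest of this summary shows, the real fault can still be a terminator or a
jumper rather than the drive itself.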

Fixing the problem:

Taking the Wren VII's out of their cosy little enclosures and fiddling with the
drive numbers and the terminations seems to have fixed the problem. (Although
the disks are actually set up exactly as they were when they were connected to
the SS1+ running 4.0.3!).

Thanks for very quick and helpful responses from:
  earl@division.cs.columbia.edu (Earl Smith)
  jmcrowell@ucdavis.edu (John M. Crowell)
  matt@wbst845e.xerox.com (Matt Goheen)
  johnb@edge.CIS.McMaster.CA (John Benjamins)
  alexl@daemon.cna.tek.com (alex;923-4483)
  deltam!flyer!mark@murtoa.cs.mu.oz@uunet.uu.net (mark galbraith)
  synergy!sun!Aus!kevins@Sun.COM (Kevin Sheehan {Consulting Poster Child})
  sdb%hotmomma@uunet.UU.NET (Scott Ballantyne)
  rcsmith@anagld.analytics.com (Ray Smith)
  kevinmac@ll.mit.edu (Kevin McElearney (x3556))
Apologies if I've missed anyone.

Other useful suggestions:
It appears that this sort of error can happen for any number of reasons.
I've quoted and paraphrased some of the above responses.

1. Make sure that the drive is spinning after powering up.

        I didn't really want to open the pizza box, so I assured myself that
        this wasn't the problem by doing some disk i/o e.g. installing and using
        miniroot :-)

2. Watch out for `bad' SCSI cables.
    It seems the diagnostics use different wires in the cable than the device
    driver, so that even if probe-scsi tells you it sees the drive, the device
    driver might not be able to see it.

        This was my first thought. One of the side effects of carting around
        the cd-rom drive was that I probably inadvertently swapped a couple of
        cables. I ruled this out by trying out every cable I had ...

        It's also worth checking that the cables are seated firmly.

3. There may be a problem with selecting the correct unit number for the disk
    on your system. Your probe-scsi command finds only units 3 and 6. If your
    OS is on the Quantum at unit 3, you should be booting sd(0,3,0)vmunix

        This missed the mark, as scsi addresses 0 and 3 are translated to
        devices sd3 and sd0 respectively.

4. Try powering off and back on again (with a 30 second or more delay).

5. Check that the SCSI chain is not too long. If possible try some shorter
    cables.

6. A problem with installboot.
    Try rebooting from cd-rom, boot up miniroot and reinstall the boot block.

        This wouldn't work for me, as the sd0a file partition was being trashed.
        An fsck on this partition reported a few weird errors that confirmed
        that I no longer had a valid filesystem on this partition.

7. Wrong architecture?
    Someone managed to install Sun4 OS instead of Sun4c.

8. Incorrectly creating swap files using `mkfile -n ...'.
    You should not use "-n" for swap files.

9. A faulty SCSI terminator.

10. Be careful about the order you switch things on. I am told that Sun
    machines will generate some `noise' on the scsi the moment the system is
    powered up. Avoid the problem by powering up the external scsi devices a few
    seconds after switching on the sun. Hopefully your disks will spin up
    while the sun is still doing its internal tests.
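
To make the address translation from suggestion 3 concrete, here is a minimal
sketch of the swap described there (scsi address 3 is sd0, address 0 is sd3).
The helper names are hypothetical, and reading the boot string as
sd(controller,unit,partition) is my assumption. Only the two addresses
mentioned in this summary are covered; anything else is rejected rather than
guessed.

```python
# Swap taken from the summary: scsi address 3 -> sd0, scsi address 0 -> sd3.
# No other addresses are discussed there, so none are listed here.
TARGET_TO_SD_UNIT = {3: 0, 0: 3}


def sd_unit_for_target(target: int) -> int:
    """Return the sd unit number for a scsi address covered by the summary."""
    if target not in TARGET_TO_SD_UNIT:
        raise ValueError(f"scsi address {target} is not covered by the summary")
    return TARGET_TO_SD_UNIT[target]


def boot_spec(target: int, partition: int = 0, controller: int = 0,
              kernel: str = "vmunix") -> str:
    """Build a boot string, ASSUMING the form sd(controller,unit,partition)."""
    unit = sd_unit_for_target(target)
    return f"sd({controller},{unit},{partition}){kernel}"


if __name__ == "__main__":
    # A disk at scsi address 3 is sd0, which is why sd(0,0,0)vmunix was the
    # right thing to boot here and sd(0,3,0)vmunix was not.
    print(boot_spec(3))  # sd(0,0,0)vmunix
```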

And finally - how we really fixed the problem ... (don't laugh).
While trying to isolate which SCSI device was actually trashing sd0, I managed
to get the machine up and running while connecting everything except the WREN
VIIs. Ok, so one of the WRENs was the problem. One of these disks was
terminated (internally), so I tried this disk first. No problem. Ok, this
means that it must be the other disk. To check, we opened the enclosure to
terminate the disk (borrowing the terminations from the first disk), and in the
process one of the jumper plugs DROPPED OUT. We replaced the jumper and now
everything works.

What amazes me is that this disk worked at all under 4.0.3. The disks
themselves have not been opened or moved since they were installed many months
ago. They have also performed faultlessly since I've had them. I don't know
_which_ jumper was replaced (someone else did this for me).

Thanks again to everyone for all the help.

--
beard@spam.ua.oz.au


