PARTIAL SUMMARY: Exabyte Jumbo Question

From: Peter Allan (peter.allan@aea.orgn.uk)
Date: Tue Aug 23 1994 - 07:42:12 CDT


Partial summary for my "Exabyte" 8mm jumbo question, issued in mid-July (14th ?) 1994.
I'm sending this out now so you get something. I am being sent away for a while, so I am
unlikely to reach a full solution soon.

> Table of contents
> =================
> 1 Description of drive
> 2 Request for device descriptions
> 3 Request for dump parameters
> 4 Crashes - causes and cures
> 4.1 history (including WS description)
> 4.2 crash description
> 4.3 So what do I do ?
> 5 Patches
> 6 Appendix (script)

Thanks to all who responded.

There were contributions from:
       Paul.Bellan-Boyer@dss.fw.gs.com (Paul Bellan-Boyer)
       cbrehm@tictac.fs.ford.com (Clay Brehm)
       deltam!dm!jt@uunet.uu.net (Jim Wills)
       kkaempf@didymus.rmi.de (Klaus Kaempf)
       libby@helios.harwell.aea.orgn.uk (Elizabeth Thick x2688)
       louis@meg.meg.saic.com (Dances on keyboards (Louis Brune))
       smithdr@qsss08.eq.gs.com (David R. Smith)
       worsham@aer.com (Robert D. Worsham)
and a policy-based protest (polite flame) from:
       anderson@neon.mitre.org (Mark S. Anderson)

Summary (Full answers available for those who ask.)
=======

> 2 Request for device descriptions

Jim Wills of Delta was quick off the mark with a man page for smt0, which I didn't have previously.
JW also passed the question to colleagues for their consideration, but nothing came of that.
Paul Bellan-Boyer and David R. Smith provided Delta's address.

David R. Smith's answer included:
> The names are actually meaningless. What is important are the major/minor
> device numbers for these "special" files. For example nrsmt0 on your
> machine points to major 104 and minor 4. These are sort of hooks into
> the device drivers. See P32 of "Writing Device Drivers" in the SUN manual
> set for a better description. To find out what will happen when you access
> nrsmt0 look in /usr/sys/sun/conf.c for the bdevsw/cdevsw structures. The
> major number is an index into this structure (cdevsw for char. special
> files, bdevsw for block special files). This points at the routines that the
> device driver uses. These routines should be listed in /usr/sys/`arch -k`/conf/files.
> This is used when building the kernel to get the right source code. Then,
> much as I hate to say it, it is time to grovel through code.
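
To illustrate David's point, here is a minimal sketch in Python (chosen only for
illustration; it is not something the workstation in question ran). It splits a device
number into its major/minor pair and uses the major number to index a toy table, much as
the kernel indexes cdevsw. The names TOY_CDEVSW, split_device_number and describe_device,
and the routine names in the table, are hypothetical; only the 104/4 pair for nrsmt0
comes from David's reply.

```python
import os
import stat

# Hypothetical stand-in for the kernel's cdevsw table: the major number
# selects the driver entry. Only the number 104 comes from the reply above;
# the driver and routine names are invented for illustration.
TOY_CDEVSW = {
    104: {"driver": "smt", "open": "smt_open", "read": "smt_read"},
}


def split_device_number(rdev):
    """Return (major, minor) for a raw device number such as st_rdev."""
    return os.major(rdev), os.minor(rdev)


def describe_device(path):
    """Stat a special file and report which toy driver entry it selects."""
    info = os.stat(path)
    if stat.S_ISCHR(info.st_mode):
        kind = "char"
    elif stat.S_ISBLK(info.st_mode):
        kind = "block"
    else:
        raise ValueError(path + " is not a device special file")
    major, minor = split_device_number(info.st_rdev)
    entry = TOY_CDEVSW.get(major)  # None when the toy table has no such driver
    return kind, major, minor, entry
```

On a machine that has the file, describe_device("/dev/nrsmt0") would report the pair
behind the name. Two differently named special files with the same pair select the same
driver entry, which is why the names themselves carry no meaning.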

> 3 Request for dump parameters

Louis Brune, Elizabeth Thick and Clay Brehm quoted default figures from the man page.
Mine says the same, but I'd taken it for granted dump has rotten defaults so you always lie to it.
I was told that on a Sun course. Clearly an experiment would have been in order before asking.

> 4.2 crash description
 
This was recognised by some of you as a SCSI problem affecting swap.
Klaus Kaempf put it like this:
> This is clearly a SCSI bus deadlock and the machine can't swap. Processes in memory
> (typically 'active' processes like clock and perfmeter which aren't swapped out)
> keep running, processes swapped out (most shells) or new programs (which must be
> loaded from disk) can't be run.
> This happens on my system when I try to access the QIC 1/4" drive as /dev/rst1
> (SCSI address 5) when it's set to SCSI address 4 (/dev/rst0 would be right).

> 4.3 So what do I do ?

Louis:
> Is this new and recent behavior? If so, the standard question is
> "What changed?" Note that this includes cables, terminators, order of
> devices on the SCSI chain, etc. Do make sure everything is tight.
It is not new. There have been no changes to attribute it to.
All is tight, but I did notice a couple of sharpish bends, which I have straightened to some extent now.

Klaus:
> - look at /usr/sys/scsi/targets/st_conf.c, is there an entry for EXABYTE ?
Yes.

Klaus:
> What makes you sure it's an exabyte ? Open up the drive case and have a look inside !
Clay:
> First, I noticed that your tape drive was set to target 6. The usual Sun standard is to
> use targets 4 and 5 for tape drives. Target 6 is the standard port for the CD-ROM.
Louis:
> And why the (censored) *are* you using a non-standard driver?

Looking inside reveals it is an Exabyte 8200 with a sticker from Exabyte Corp.
I also blew away a fair handful of dust.

Well spotted, Clay. Our kernel has target 6 configured for this smt0 tape. I decided to
change it to 5 anyway, as part of following the advice received step by step.

Once the tape drive was fiddled onto number 5 with the jumpers I rebooted the workstation
and found it recognised the drive immediately as st1. That saved me fiddling with files. :)
It also explains why we were on 6 before, as a previous SA must have chosen to preserve
the settings for 5. The smt0 driver is provided by Delta.
It works as st1, but *still* crashes !
David Smith says when he used an exabyte he used /dev/rst* and didn't bother with the
Delta setup.
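
The target/device pairing that came up in the replies can be written down as a small
lookup. This is only a sketch of the convention as reported here (target 4 as st0,
target 5 as st1, target 6 conventionally the CD-ROM); the real mapping lives in the
kernel configuration and may differ on another machine. The function names are
hypothetical, and the /dev/nrst* no-rewind names are an assumption by analogy with
/dev/rst*.

```python
# Tape conventions as reported in the replies above:
# SCSI target 4 -> st0, target 5 -> st1. Target 6 is conventionally the CD-ROM.
TAPE_UNIT_BY_TARGET = {4: 0, 5: 1}
CDROM_TARGET = 6


def expected_tape_device(target, rewind=True):
    """Return the raw st device name expected for a SCSI target ID."""
    if target == CDROM_TARGET:
        raise ValueError("target 6 is conventionally the CD-ROM, not a tape")
    if target not in TAPE_UNIT_BY_TARGET:
        raise ValueError("no conventional tape unit for target %d" % target)
    # Assumed naming: /dev/rstN rewinds on close, /dev/nrstN does not.
    prefix = "/dev/rst" if rewind else "/dev/nrst"
    return prefix + str(TAPE_UNIT_BY_TARGET[target])


def check_access(jumpered_target, device_path):
    """Return True when device_path matches the target jumpered on the drive."""
    return device_path in (
        expected_tape_device(jumpered_target, rewind=True),
        expected_tape_device(jumpered_target, rewind=False),
    )
```

Klaus's deadlock case corresponds to check_access(4, "/dev/rst1") returning False: the
drive is jumpered to 4 but addressed through the name that belongs to 5.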

Next I tried the patch (see below).

> In fact, the mt driver in 4.1.1 might just be able to handle the damned thing.
Haven't tried, YET.

Some suggested an upgrade, to either 4.1.2 or 4.1.3. I'd need a CD for 4.1.3.
It sounds like a good idea except that I have a hard time convincing
my boss we need to spend anything. My users are also anti-upgrade.
    [The SGI users next door recently got a new WS with video camera, so they can watch
    themselves on screen. What do they do that I don't ?]

> Finally, it may just be getting old. You just might be able to use it
> as an excuse for some vendor to give you a price break on a new one.
Maybe. I'll try this last.

> 5 Patches

David sent the patch 100330-6.
Robert said "current version is 100330-06 and is available from sun's ftp server..."
Does he mean sunsite.unc.edu ? Can't reach it. Didn't find it at ic.ac.uk or ftp.uu.net.
The 'README' descriptions of the various bugs didn't quite match what we get, but I tried it anyway...
The config worked, but the make produced a warning: an illegal pointer combination at
line 35 of /usr/kvm/sys/sun/kern_wrapxxx.c. kern_wrapxxx.o was still produced, but because
of the warning I did not run with the new kernel. (Wimp!)
(This was using a copy of a kernel text file that DID work without complaint recently.)
I might tinker with this a bit more, when there is some time.

> 6 Appendix (script)

Two people offered their own scripts.

This remark deserves quoting, if only so I can reply to it:
> If the fear-and-superstition is allowed to enter the fray, stick "mt -f /dev/nrsmt0 rewind"
> before you start the dump in your script.
I am an advanced practitioner of fear-and-superstition. I already had all the rewinds you could get.
That is, everywhere except between the 1st and 2nd dumps.
Also did anyone notice all those full pathnames ?
And the sleep statements between any pairs of commands that had the least conceivable scope
for interfering with each other ?
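
For anyone who did not see the original script, the pattern described above can be
sketched as follows. This is not the script itself, only an illustration in Python of
its shape: full pathnames, a rewind at each end (none between consecutive dumps onto
the same tape) and a sleep between every pair of commands. The paths /usr/bin/mt and
/usr/etc/dump, the dump key "0uf", the pause length and the function names are all
assumptions; only "mt -f /dev/nrsmt0 rewind" comes from the remark quoted above.

```python
MT = "/usr/bin/mt"      # assumed full path to mt
DUMP = "/usr/etc/dump"  # assumed full path to dump
TAPE = "/dev/nrsmt0"    # no-rewind device named in the quoted remark


def build_plan(filesystems, pause=30):
    """Return a list of steps, each ("run", argv) or ("sleep", seconds)."""
    rewind = ("run", [MT, "-f", TAPE, "rewind"])
    plan = [rewind]
    for fs in filesystems:
        plan.append(("sleep", pause))
        # "0uf" is an assumed dump key: level 0, update dumpdates, write to file.
        plan.append(("run", [DUMP, "0uf", TAPE, fs]))
    plan.append(("sleep", pause))
    plan.append(rewind)
    return plan


def render(plan):
    """Render the plan as shell-like lines for a dry run; nothing is executed."""
    lines = []
    for kind, value in plan:
        if kind == "sleep":
            lines.append("sleep %d" % value)
        else:
            lines.append(" ".join(value))
    return lines
```

Printing render(build_plan(["/", "/usr"])) gives a dry run to read before trusting
anything to the tape.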

Mark's protest:
> Your submission is quite inappropriate for this list. It is not of an urgent nature.
> You should submit your question to comp.sun.admin, or some other place.
Good point, but you don't know how much I've AVOIDED posting here !

And Finally,
There was a problem with my mail on the afternoon of Fri 15th July,
while some of you were answering. It was unimportant - my mail script couldn't
sound the gong because the machine it mounts the sounds from was down for backup....

PA


