SUMMARY: Adding/Removing Devices on the SCSI Bus (LONG)

From: Kevin McElearney (kevinmac@ll.mit.edu)
Date: Wed Sep 04 1991 - 18:42:34 CDT


My original post:

There are certain instances in which we would like to add, remove, or power
cycle devices on a running machine's SCSI bus. Before doing this I make sure
nothing is active on the bus, which includes removing tapes and unmounting
disks. For the most part, disks are on a separate SCSI bus and are not
touched; also, if a disk is being used for swap, we do not touch the SCSI bus
with the machine on. Do any of you SCSI people :-) know first-hand whether
this is OK, or whether there is a better way of doing it?

Most people use a procedure similar to the one in this first message:

Kevin
============================================================================
From: mikem@juliet.ll.mit.edu (name unknown)

If you're actually going to remove a disk drive from the bus, (obviously)
make absolutely sure it's not mounted anywhere. (and if you're swapping
to any partition on that disk, you're stuck; you have no choice but to
halt the machine first.)
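Mike's two preconditions (no partition of the disk mounted, none used for
swap) can be checked mechanically before anyone reaches for L1-A. The sketch
below is a hypothetical helper, not a Sun-supplied tool. It assumes
mount-table lines in the "fsname dir type opts freq passno" layout of
/etc/mtab, disks named like sd0/sd1 with partitions a through h, and a list
of swap partitions that you supply yourself from your own configuration.

```python
import re
from typing import Iterable, List


def partitions_in_use(mtab_text: str, swap_devices: Iterable[str], disk: str) -> List[str]:
    """Return reasons why `disk` (e.g. "sd1") must not be pulled from the bus.

    An empty list only means no partition of the disk appears in the given
    mount table or swap list; it says nothing about tape or CD-ROM activity.
    """
    # Assumed naming convention: partitions are /dev/<disk>a .. /dev/<disk>h.
    # The trailing $ keeps "sd1" from matching "sd10a".
    part_re = re.compile(r"^/dev/" + re.escape(disk) + r"[a-h]$")
    reasons = []
    for line in mtab_text.splitlines():
        fields = line.split()
        # Skip blank lines and comments; field 0 is the device, field 1 the mount point.
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        if part_re.match(fields[0]):
            reasons.append(f"{fields[0]} is mounted on {fields[1]}")
    for dev in swap_devices:
        if part_re.match(dev):
            # Swapping to the disk: per the advice above, halt the machine instead.
            reasons.append(f"{dev} is used for swap: halt the machine instead")
    return reasons
```

For example, with /dev/sd0a and /dev/sd0g mounted and /dev/sd0b listed as
swap, asking about "sd0" returns three reasons, while asking about "sd1"
returns an empty list.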

Use L1-A (stop-A) to interrupt the CPU. This guarantees that there will be
no SCSI traffic while you're meddling with devices on the bus. Then, you
can freely add, remove, or just power-cycle devices on the bus without any
worry of causing a bus problem. When you've frobbed the bus to your heart's
content, type "c" (continue) at the console, and the CPU will pick up where
it left off, oblivious to anything that happened while it was halted.

I've had absolute and unqualified success with this method, whereas when I
have turned off a device while the CPU is active (as you suggest doing), I
have more than once gotten a "SCSI Bus Timeout" error; occasionally it has
even proved fatal (caused the machine to panic and reboot).

Mike Maciolek
============================================================================
From: dan@breeze.bellcore.com (Daniel Strick)

You can often get away with this, but you should be aware that there
are some non-obvious difficulties.

Some or all of the devices on the SCSI bus may see a "bus reset".
This may cause them to reset their state in a variety of ways.
The state may include operating parameters which were set to
non-default values by the host adapter during initial device
configuration. If the host adapter does not reinitialize the
devices appropriately, various strange problems may result.
The SCSI bus may even wedge (or perhaps worse, just run slow).
NB: most tape drives rewind after a bus reset.

Dan Strick, aka dan@bellcore.com or bellcore!dan, (201)829-4624
============================================================================
From: stern@sunne.East.Sun.COM (Hal Stern - Consultant)

you can take tapes and cd-roms on/off line with
no problems. however, anytime you unplug cables
with "live" devices still on the bus you always
run the risk of zapping something with a small
capacitive discharge. rare, but it can happen.

--hal
============================================================================
From: feldt@phyast.nhn.uoknor.edu (Andy Feldt)

Kevin,

   I don't know if there is a better way. I do the same thing periodically
with a cartridge tape I move around from machine to machine - haven't had
a problem yet...

Andy Feldt
============================================================================
From: brand@lll-winken.llnl.gov (Russel Brand)

i power cycle the tape drive on my sparc II external scsi bus
all the time. I get some warnings in the console window but no bad effects.

/w
============================================================================
From: pilotti@Proto.SAIC.Com (Fri Aug 30 18:19:47 1991)

My experience is that it is never safe to remove or attach devices to a running
system. Although sometimes things seem to work (power on a new drive, and it is
seen and accessible), screwy things can happen later (all of a sudden some block
is unreadable, and you have to power cycle ALL devices, including the CPU, to fix it).

I would LOVE TO HAVE a way to do this, particularly with the availability of
removable drives (e.g. ZMicro Tranzpac). Please summarize any answers you
receive.

+Keith
============================================================================
From: turtle@sciences.sdsu.edu (Andrew Scherpbier)

I frequently remove my CDROM from the bus of my SS-1 while the machine
is running. This is how I do it:

        unmount the CDROM disk if it is mounted.
        type 'sync;sync;sync;sync;sync' just to be sure.
        L1-A the system.
        remove the drive from the SCSI chain.
        type 'c' at the monitor prompt.
        refresh the screen (assuming you run some sort of windowing system)

If you get good at this, then the time between the L1-A and the 'c'
takes only 15 to 30 seconds and does not cause any problems with the
system services. If you keep it down for longer than 5 minutes, you
are not only likely to get angry phone calls from users, but tcp
connections may have timed out. I have never had any problems doing
this (except once when I typed 'b' instead of 'c'...)

(On a Sun4, you need to type 'g' instead of 'c')

I use the same procedure for putting the drive back into the chain, but
here you have to be careful to make sure the drive is powered up
completely before you continue the system.

--Andrew
============================================================================
From: phillips@athena.Qualcomm.COM (Marc Phillips)

If you have a true Sun workstation you can hit the L1 key and the "a" key
in combination to pause the machine. This stops everything. Add or remove
unmounted disks or tapes. Then type "c" to continue. Everything should
go smoothly.

Marc Phillips
============================================================================
From: George A. Planansky <gplan@aer.com>

I've removed, etc., stuff from a scsi bus on our sun3's, and the worst
that has happened was lots of scsi error messages. Once I plugged everything
back together, it all ran fine again. I think the main thing is to
sync disks before shutting down, if you shut down ... . Let me know
if you hear any horror stories -- I do cross my fingers sometimes :-) .

George Planansky
============================================================================
From: trr@lpi.liant.com (Terry Rasmussen)

Sounds like a good practice for doing things "on the fly."

-terry
============================================================================
From: Matthew Donaldson <matthew@cs.adelaide.edu.au>

Well, for a long time now we've added and removed disks on a running system
with few problems, like this: (it needs two people)

one person presses the break key on the console of the machine
the other person quickly removes/adds a scsi device
the first person then types 'c' or whatever to continue

The worst that generally happens is that we typed break in the middle of
a scsi request, in which case a message appears on the screen to the effect
that the request failed and is being retried. If a device is removed
and you try to access it, the kernel will mark the device as offline.

DISCLAIMER: I know little about the internal workings of the scsi protocol
             and how the kernel deals with it, so doing this may have
             effects I am not aware of.

                        -Matthew
============================================================================
From: Alastair Young <alastair@eucad.co.uk>

On SS1 and SS1+ machines you can blow the bus by unplugging things with the
machine powered up. Or at least that is what the engineer said we did to one
of ours.

Alastair Young
============================================================================
From: Wilson N G <noel@essex.ac.uk>

We adopt the regime of pressing L1-A, fiddling with the bus, and continuing
the machine. This seems to keep all devices happy.
============================================================================
From: kpc!kpc.com!cdr@uunet.UU.NET (Carl Rigney)

It's still risky - you're taking a gamble each time, from what I've heard.

However, Andataco offers a product that lets you move disks & tapes in
and out with the power live and even change target IDs around without
needing to boot a changed kernel. You still have to unmount a disk
before you pull it out, of course. They probably have an 800 number.

I've never used Andataco products myself, but their demo at Sun Expo was
very convincing.

--
Carl Rigney
============================================================================
From: mis@seiden.com (Mark Seiden)

power cycling devices should work unless the device is active or you use internally powered termination (i.e. on the device).

removing tapes should work without any power cycling.

unmounting disks requires the cooperation and knowledge of the operating system.

unplugging wires will not work since it will cause all the signals in the bus to bounce around and will upset any semblance of termination.

also you might blow a fuse on some sparcstation cpu boards.

there are a couple of scsi switches and (effectively) routers which let you share devices among several scsi busses without ill effects.

-- mark seiden, mis@seiden.com, 1-(203) 329 2722 (voice), 1-(203) 322 1566 (fax)



This archive was generated by hypermail 2.1.2 : Fri Sep 28 2001 - 23:06:19 CDT