Summary: SCSI transport "reset" errors

From: Art Hebert (art@infinity.com)
Date: Wed Jul 19 1995 - 11:56:11 CDT


 Original question:

> I am getting these errors when attempting to dump a disk to an
> external Seagate ST32550N 2gb disk drive. The system then panics.
>
> Solaris 2.4 all recommended patches installed, Sparc 20. It also
> is happening on configurations other than Solaris.

 Possible Solution:

 I modified the /etc/system file as follows:

 forceload: drv/esp
 set sd:sd_max_throttle=10

 This fixed the problem.

 I haven't tried any of the suggestions below, but I will test them.

 Many Thanks to:

Dan Razzell <razzell@cs.ubc.ca>
poffen@San-Jose.ate.slb.com (Russ Poffenberger)
Travis <tehoyt01@msuacad.morehead-st.edu>
Steve_Pyrczak@racesmtp.afsc.noaa.gov
dzambon@hawkeye (Dan A. Zambon)
Adrian Lee <adrian@cs.uq.oz.au>

 Their comments follow:

From: Dan Razzell <razzell@cs.ubc.ca>
To: Art Hebert <art@infinity.com>
Subject: SCSI transport "reset" errors

There is a problem in the implementation of Tagged Command Queueing
in Seagate Barracudas. Seagate steadfastly pretends surprise at this.
However, it has been widely reported, not only on Sun hosts, but also
DEC, HP, and IBM. I have accumulated quite a collection of messages
on this problem.

The SPARC Drivers group at Sun tells me that Seagate TCQ is flaky.
Indeed, Sun puts its own firmware on Seagate drives that it resells.

Whether a given drive is affected depends on what firmware revision it
is running. You can determine the rev of your drive by running format
and giving the "inquiry" command. I believe that for the ST32550N,
rev 17 is known to be good, and some people have found rev 13 to work
also. The information is somewhat imprecise because not everyone fully
identifies their drive model when reporting a problem, and Seagate rev
numbers differ among every drive type and model.

Whatever, you can easily determine for yourself whether your drives have
a TCQ problem by disabling TCQ in the host. The simplest way to do this
in Solaris 2.4 is to put the following line in /etc/system and reboot:

  set scsi_options=0x378

This disables TCQ globally. The meaning of these bits is described in
/usr/include/sys/scsi/conf/autoconf.h.
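
As a rough illustration of how such a mask breaks down, here is a minimal
Python sketch that decodes a scsi_options value into named features. The bit
names and values are recalled from sys/scsi/conf/autoconf.h and should be
treated as assumptions; check them against the header on your own system. The
function name decode_scsi_options is made up for this example.

```python
# Bit values as recalled from Solaris' sys/scsi/conf/autoconf.h.
# These are assumptions for illustration; verify them in your own header.
SCSI_OPTIONS = {
    0x008: "DR (disconnect/reconnect)",
    0x010: "LINK (linked commands)",
    0x020: "SYNC (synchronous transfer)",
    0x040: "PARITY",
    0x080: "TAG (tagged command queueing)",
    0x100: "FAST (fast SCSI)",
    0x200: "WIDE (wide SCSI)",
}


def decode_scsi_options(mask):
    """Return (enabled, disabled) lists of option names for a mask."""
    enabled = [name for bit, name in SCSI_OPTIONS.items() if mask & bit]
    disabled = [name for bit, name in SCSI_OPTIONS.items() if not mask & bit]
    return enabled, disabled


if __name__ == "__main__":
    enabled, disabled = decode_scsi_options(0x378)
    print("enabled: ", ", ".join(enabled))
    print("disabled:", ", ".join(disabled))
```

If the assumed bit table is right, the only feature that 0x378 leaves
switched off is tagged command queueing, which matches the description above.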

Depending on your SCSI host adapter, you may also be able to disable TCQ
for individual adapters or drives. That's unnecessary to prove the
existence of a problem with the drive, but might be nice if some drives
will tolerate TCQ and others not. Consult the man page for that adapter.

Seagate claims in its product literature that the Barracuda drives support
TCQ. If your drive has a TCQ problem, you are therefore in a strong legal
position to insist that Seagate remedy it.

Seagate will not disclose to us nor to our reseller what its warranty policy
is with respect to these drives. However, without admitting that there is
a problem, it recently replaced two of our drives. This did not happen
because it was clear that we knew what we were talking about; it happened
only when it became evident that we were preparing to take legal action.

I hope you won't find things quite so difficult.

     .^.^. Dan Razzell <razzell@cs.ubc.ca>
    . o o . Laboratory for Computational Intelligence
    . >v< . University of British Columbia
_____mm.mm_____ http://www.cs.ubc.ca/nest/lci

********

Seagate has a web server with product specs and jumper layouts. It is
accessed via "http://www.seagate.com".

However, I looked, and they don't mention a jumper to disable the cache. It
may only be changeable via the SCSI interface. You would need a program to do
this; I know that the Adaptec EZ-SCSI utilities on a PC can modify these
settings.

--
Russ Poffenberger               DOMAIN: poffen@San-Jose.ate.slb.com
Schlumberger Technologies ATE   UUCP:   {uunet,decwrl,amdahl}!sjsca4!poffen
1601 Technology Drive           CIS:    72401,276
San Jose, Ca. 95110             Voice: (408)437-5254  FAX: (408)437-5246

********

I had the exact same error. My problem was that I had an Exabyte tape
drive between my CPU and a Seagate 9.0 GB disk drive. The disk drive can
talk at 10 MB/s but the tape drive cannot, and it gets confused and
offlines itself.

The solution to my problem was to move the disk drive closer to the CPU,
i.e. to put the tape drive (or any other slow device, such as an external
CD-ROM) at the end of the chain. If you cannot move it, you can create the
file /kernel/drv/esp.conf and put this in it:

scsi-options=0x178;

This will slow down the chain to 5 MB/s yet still support Fast SCSI-2.
DON'T forget the ";". I was told that it is VERY important. I don't know
what will happen if it's not there, but I didn't try, and I'm sure you
wouldn't want to either. :) Good luck!!
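
To check what a value like this actually switches off before putting it in
esp.conf, a small Python sketch along these lines may help. The three bit
values are assumptions recalled from sys/scsi/conf/autoconf.h, and the helper
names clear_options and format_esp_conf are hypothetical. If the assumed bits
are right, 0x178 leaves fast SCSI on and clears tagged queueing and wide, so it
is worth confirming against your own header what effect on transfer speed to
expect.

```python
# Assumed bit values (from memory of sys/scsi/conf/autoconf.h; verify locally).
TAG, FAST, WIDE = 0x080, 0x100, 0x200


def clear_options(mask, *bits):
    """Return mask with each of the given option bits switched off."""
    for bit in bits:
        mask &= ~bit
    return mask


def format_esp_conf(mask):
    """Render the esp.conf property line, including the trailing ';'."""
    return "scsi-options=0x%x;" % mask


value = 0x178
print(format_esp_conf(value))
print("tagged queueing on:", bool(value & TAG))
print("fast SCSI on:      ", bool(value & FAST))
print("wide SCSI on:      ", bool(value & WIDE))
```

Generating the line from the mask, rather than typing it by hand, also makes
it harder to leave off the ";".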

Travis

*******



This archive was generated by hypermail 2.1.2 : Fri Sep 28 2001 - 23:10:29 CDT