SUMMARY: SCSI reconnect/reset error

From: Bradley Beattie (beattie@poincare.mskcc.org)
Date: Wed May 03 1995 - 09:23:24 CDT


Thanks to all who replied. A full 50% of the respondents correctly
identified the problem as being in the drive's firmware. This was a great
surprise to me because the problem disk had a rev level of 14, whereas the
otherwise identical, correctly functioning drive had a rev level of 13.
Several respondents said that firmware revisions below 17 are reportedly
problematic (13 may be an exception). Three respondents correctly
associated the errors with tagged queueing. I wish to extend special
thanks to Zoltan, who sent a detailed description of how to disable tagged
queueing; I have summarized it below. Perhaps someone can comment on exactly
what tagged queueing is and what impact disabling it will have.

Thanks again.
Brad

--- Respondents ---

vzspa@calgary.chevron.com (zoltan s. palmai)
Birger.Wathne@vest.sdata.no (Birger A. Wathne)
Mr T Crummey (DIJ) <tom@sees.bangor.ac.uk>
Todd Pfaff <todd@water.eng.mcmaster.ca>
Kevin.Sheehan@uniq.com.au (Kevin Sheehan {Consulting Poster Child})
carlo@hub.eng.wayne.edu (Carlo Musante)
Al.Venz@seag.fingerhut.com (Al Venz)
Andrew Weston <andreww@adacel.com.au>
irana@hydres.co.uk
Nino Margetic <nino@well.ox.ac.uk>

--- Original Question ---

> Hello SCSI Experts,
>
> We're having some problems with one of our SEAGATE-ST15150N drives. When
> under moderately heavy access (e.g. ufsrestore, du -s), the drive causes
> a series of
>
> WARNING: /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000 (esp0):
> No command for reconnect of Target 1 Lun 0
> WARNING: /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@1,0 (sd1):
> SCSI transport failed: reason 'reset':
> retrying command
>
> messages to be sent to the console, ultimately resulting in a system crash.
>
> I strongly suspect a hardware problem, since our other ST15150N works
> fine and I've tried a variety of cable/termination and device combinations,
> all with no effect. I returned the drive (still under warranty) and they
> adjusted its power supply, replaced its fan and gave it a clean bill of
> health. Connected to our SPARC 20, however, it is still a sick puppy.
>
> I'd appreciate it if someone could explain the meaning of the above messages,
> perhaps providing some direction in identifying the source of the hardware
> problem (if it is one).
>
> We're running Solaris 2.3

--- To Turn Off Tagged Queueing ---

1- To check the current option settings, run adb as root and type the
   following (the final $q exits adb):

        # adb -k /kernel/unix /dev/mem
        scsi_options/X
        $q

The "scsi_options/X" command to adb causes a hexadecimal value to be
displayed which identifies the currently set scsi options (see table below).

SCSI option              Value   Bit set to 1 (bits numbered from 0)
Disconnect/reconnect     0x008   bit 3
Linked commands          0x010   bit 4
Synchronous transfer     0x020   bit 5
Parity                   0x040   bit 6
Tagged queuing           0x080   bit 7
Fast SCSI                0x100   bit 8
Wide SCSI                0x200   bit 9
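Since each option occupies one bit, the hex value printed by adb can be
decoded mechanically by testing it against each mask in the table. The
Python sketch below is only an illustration and is not part of the
procedure respondents described; the dictionary and function names are
hypothetical.

```python
# Bit masks for scsi_options, taken from the table of option values.
# The names are descriptive labels only, not Solaris identifiers.
SCSI_OPTION_BITS = {
    0x008: "Disconnect/reconnect",
    0x010: "Linked commands",
    0x020: "Synchronous transfer",
    0x040: "Parity",
    0x080: "Tagged queuing",
    0x100: "Fast SCSI",
    0x200: "Wide SCSI",
}


def decode_scsi_options(value):
    """Return the names of the options whose bits are set in value, lowest bit first."""
    return [name for mask, name in sorted(SCSI_OPTION_BITS.items()) if value & mask]


if __name__ == "__main__":
    # 0x3f8 is the Solaris 2.3 default quoted in this summary.
    for name in decode_scsi_options(0x3F8):
        print(name)
```

Running it with 0x378 instead lists every option except tagged queuing.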
 
2- To disable tagged queueing, add (or modify) the following line in
   /etc/system and reboot:
 
        set scsi_options=0x378

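  Note: 0x3f8, the default for Solaris 2.3, enables every option in the
  table; 0x378 is the same value with the tagged queuing bit (0x080) cleared.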
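Instead of typing a new value from scratch, the setting for step 2 can be
derived by clearing only the tagged queuing bit (0x080) in whatever value adb
reported, which leaves every other option unchanged. A minimal Python sketch,
assuming the current value is the 0x3f8 default; the function name is
hypothetical:

```python
TAGGED_QUEUING = 0x080  # bit 7, from the table of scsi_options values


def without_tagged_queuing(value):
    """Clear the tagged queuing bit and leave all other option bits as they are."""
    return value & ~TAGGED_QUEUING


current = 0x3F8  # assumed: the Solaris 2.3 default reported by adb
new_value = without_tagged_queuing(current)
print(f"set scsi_options=0x{new_value:x}")  # prints: set scsi_options=0x378
```

If adb reported a different current value, substitute it for 0x3F8 so the
other options keep their existing settings.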


