SUMMARY: SCSI errors

From: Mark C. Farone (farone@gainesville.fl.us)
Date: Thu Nov 20 1997 - 11:30:22 CST


SUMMARY OF PROBLEM:

SCSI tagged queuing cmd timeout errors (see original post below).

ATTEMPTS AT A SOLUTION:

 o Disable tagged queuing (TQ) for the entire system:
   -Add this to /etc/system:
         set scsi_options=0x80

Temporarily turning off TQ provided a quick solution, but substantially
degraded performance. Solaris also sent 10 warning messages to politely
acknowledge that TQ was disabled.

 o Throttle the number of TQ commands:
   -Add this to /etc/system:
        forceload: drv/esp
        set sd:sd_max_throttle=10

This reduced the number of allowed TQ commands, but I still received
timeout errors.

 o Disable TQ for a specific target or controller:
   -Add this to /kernel/drv/esp.conf for a specific target:
         name="esp" parent="/iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000"
         reg=0xf,0x800000,0x40
         target1-scsi-options=0x58
         scsi-options=0x178;

This option was supposed to turn off TQ for the specific target, but it
turned off TQ for the entire controller. I tried tweaking it, but I wasn't
able to turn off TQ for a specific target. I don't know if it was my poor
kung-fu or the extent of the problem.
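
For reference, here is a rough sketch for decoding the scsi_options masks
used above (0x80, 0x58 and 0x178). The bit values are an assumption: they
are the ones I recall from the Solaris header <sys/scsi/conf/autoconf.h>,
so check them against your own release. The function name
decode_scsi_options is made up for this example.

```python
# Bit values as I recall them from Solaris's <sys/scsi/conf/autoconf.h>.
# Treat them as an assumption and verify against your own release.
SCSI_OPTION_BITS = {
    0x008: "disconnect/reconnect",
    0x010: "linked commands",
    0x020: "synchronous transfer",
    0x040: "parity",
    0x080: "tagged queuing",
    0x100: "fast SCSI",
    0x200: "wide SCSI",
}


def decode_scsi_options(mask):
    """Return the names of the capability bits set in a scsi_options mask."""
    return [name for bit, name in sorted(SCSI_OPTION_BITS.items()) if mask & bit]


# Decode the three masks that appear in this summary.
for mask in (0x80, 0x58, 0x178):
    print(f"{mask:#05x}: {', '.join(decode_scsi_options(mask))}")
```

If those bit values are right, 0x178 and 0x58 both leave the tagged queuing
bit clear, while 0x80 sets only that bit and clears everything else
(disconnect, synchronous transfer, and so on), which may account for some
of the performance loss seen with the first attempt.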

BOTTOM LINE:
   1. One of the disks did not seem to properly support tagged queuing.
Turning off TQ on the entire controller was necessary to support this
not-fully-SCSI-2 disk.
   2. The DAT drive (which I didn't suspect at first) is having SCSI-level
hardware trouble. To say it another way, this DAT causes the same TQ
errors on *other targets* when attached to my test system (an SS10).

KUDOS TO:
David Schiffrin <daves@adnc.com> (thanks for the resend, too)
Joel Lee <jlee@thomas.com>
Sanjay Srivastava <sanjays@netcom.com>
bismark@alta.Jpl.Nasa.Gov (Bismark Espinoza)

At 2:35 PM -0500 11/12/97, Mark C. Farone wrote:
>Hello, all.
>
>I have a SparcStation20 running Sol2.5.1, primarily as a host for Sybase
>SQL Server.
>
>Periodically for the past 2 weeks, when writing into the raw disk used by
>Sybase at c1t5d0s5, I get this message:
>
>Nov 12 13:42:23 sun3 unix: WARNING: /iommu@f,e0000000/sbus@f,e0001000/dma@0,81000/esp@0,80000 (esp1):
>Nov 12 13:42:23 sun3 unix: Disconnected tagged cmds (8) timeout for Target 5.
>Nov 12 13:42:24 sun3 unix: 0
>Nov 12 13:42:24 sun3 unix: WARNING: /iommu@f,e0000000/sbus@f,e0001000/dma@0,81000/esp@0,80000/sd@5,0 (sd20):
>Nov 12 13:42:24 sun3 unix: SCSI transport failed: reason 'timeout': re
>Nov 12 13:42:25 sun3 unix: trying command
>Nov 12 13:42:25 sun3 unix: WARNING: /iommu@f,e0000000/sbus@f,e0001000/dma@0,81000/esp@0,80000/sd@5,0 (sd20):
>Nov 12 13:42:25 sun3 unix: SCSI transport failed: reason 'reset': retr
>Nov 12 13:42:25 sun3 unix: ying command
>
>
>What I have tried:
> 1. Upgraded to new harddisks (which I had planned to do anyway).
> 2. Tried new cables.
> 3. Tried new active terminators.
> 4. Tried reseating the card.
> 5. Tried putting the disks on another controller (c0). In this case, I
>get the same error, just specific to c0. In fact, I moved everything off c1
>and put them all on c0 (which, btw is where the / fs lives).
>
>For what it's worth, currently I have a DAT drive at c1t4, and harddisks at
>c0t3, c1t1 and c1t5.
>
>It appears that regardless of the controller, disks, or cables, I get this
>error which points to the raw disk used by Sybase.
>
>One impact of this problem is that Sybase blocks all other spids until the
>SCSI command times out (1 to 2 minutes!), during which the spid is unkillable.
>
>Of course, this isn't happening on any other machines with exactly the same
>hardware and software setup.
>
>Thanks for *any* help.
>
>--
>Mark C. Farone                               Hooked on fishin'
>Systems Analyst, Gainesville Sun             Not drugs.
>farone@gvillesun.com

--
Mark C. Farone                               Why read when you can
Systems Analyst, Gainesville Sun             Just sit and stare at things?
farone@gainesville.fl.us                 			-schwa


