SUMMARY - SCSI timeouts on SS1+ causing kernel panic.

From: kean@talon.ucs.orst.edu
Date: Wed Jun 12 1991 - 11:20:43 CDT


Sorry for the great delay in summarizing. Being the only Unix
admin/informed-type in a group is hell on task follow-through.
 
In late March 1991 I wrote:

>I have a Sparcstation 1+ running the 4.1.1 GENERIC kernel
>(no patches) with 2 105Mb internal Quantums and two CDC
>Wren V's in an external shoebox. Recently, the 1+ has been
>dying with a BAD TRAP error message as shown below. The
>panic usually (14 out of 17 times) follows the scsi driver
>message "esp0: Disconnected command timeout for Target 2 Lun 0",
>where target 2 is a Wren V. This is the only device on the
>scsi bus that times out. There are times when the timeout
>hasn't caused a kernel panic but they are few and far between.

>All of this points to problems with the Wren V. Has anyone
>else seen something similar to this? What are some possible
>causes for the command timeout?

Bzzzt. The symptoms were caused by two problems. The Wren V was fine,
but the scsi cabling wasn't. The 4.1.1 scsi driver also has problems with
marginal scsi busses. The ribbon cable connecting the Wren to the DB-50 plug
on the shoebox was frayed at the Wren end. I ordered a replacement cable and
installed patch 100243-01 (which fixed the immediate problem). Thanks to:

Kevin Sheehan synergy!kevin@Sun.COM
Randy Holt randy@den.mmc.com
Ron Gaug ron@sarah.lerc.nasa.gov
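
For anyone seeing the same message, it helps to confirm that only one
target is timing out before blaming the drive. Here is a minimal sh sketch
that tallies the timeouts per target. It assumes kernel messages are logged
to /var/adm/messages (pass another file if yours go elsewhere) and that the
message text matches the one quoted above. The function name
tally_esp_timeouts is just an illustrative choice, not part of any Sun tool.

```shell
#!/bin/sh
# Count esp "Disconnected command timeout" messages per SCSI target/LUN.
# The default log path is an assumption; pass a different file as $1.
tally_esp_timeouts() {
    logfile=${1:-/var/adm/messages}
    # Keep only the timeout lines and reduce each to "target N lun M",
    # then let sort/uniq do the counting.
    sed -n 's/.*esp[0-9]*: Disconnected command timeout for Target \([0-9][0-9]*\) Lun \([0-9][0-9]*\).*/target \1 lun \2/p' "$logfile" |
        sort | uniq -c
}
```

Run it as "tally_esp_timeouts /var/adm/messages". Each output line is a
count followed by the target and LUN it belongs to. If every target on the
bus shows up, suspect the cabling or termination rather than a single drive.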

A useful tool is Sun's patch/problem report system. It can be reached at
1-800-477-4768 (login guest) and has a simple menu interface that lets you
look up known bugs, order patches on disk, on tape, or by email, and report
new bugs.

Here is the README for patch 100243-01:

Patch-ID# 100243-01
Keywords: esp scsi recovery
Synopsis: SunOS 4.1.1 sun4c:esp host adapter can cause panic during error recovery
Date: 11-Mar-91
 
SunOS release: 4.1.1
  
Unbundled Product:
 
Unbundled Release:
 
Topic: scsa/esp host adapter
 
BugId's fixed with this patch: 1046580,1048141,1046305

Architectures for which this patch is available: sun4c

Patches which may conflict with this patch:

Obsoleted by: SVR4, 4.1.2

Problem Description:

 1046580:
     During some portions of SCSI error recovery, the target driver
     can attempt to get the host adapter driver to send either
     a BUS DEVICE RESET message or an ABORT OPERATION message to
     a target that appears to have had a command time out while
     disconnected.

     The problem is that the code in esp.c that forms a proxy command
     to send to the target has a bug in it which can write random values
     over a random portion of the esp's softc structure. This can wipe
     out portions of important data in the softc structure, including
     putting a garbage value into a pointer to the DMA gate array CSR.
 
 1048141: esp does not always recognize a marginal SCSI bus
 1046305: some XXgetcap cases reversed. Only affects 3rd party SCSI target
           drivers.
        
 
INSTALL:
    
    as root:

    mv /sys/sun4c/OBJ/esp.o /sys/sun4c/OBJ/esp.o.orig
    cp sun4c/esp.o /sys/sun4c/OBJ/esp.o
    chmod 444 /sys/sun4c/OBJ/esp.o

  Rebuild and install a new kernel and reboot the system.
  Please refer to the Systems and Networking Administration
  Manual on building and installing a custom kernel.
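
If you would rather not type the three commands by hand, here is a slightly
more defensive sketch of the same backup-and-replace step. It is not part of
the patch. The function name install_esp_patch and its two arguments are my
own, and it assumes the kernel object lives in /sys/sun4c/OBJ as the mv
line in the README suggests. It only swaps the object file; rebuilding and
installing the kernel is still up to you.

```shell
#!/bin/sh
# Back up the stock esp.o and drop in the patched one.
# usage: install_esp_patch PATCH_OBJ OBJ_DIR
#   PATCH_OBJ  patched esp.o from the patch distribution (e.g. sun4c/esp.o)
#   OBJ_DIR    kernel object directory (assumed: /sys/sun4c/OBJ)
install_esp_patch() {
    patch_obj=$1
    obj_dir=$2

    [ -f "$patch_obj" ] || { echo "missing $patch_obj" >&2; return 1; }
    [ -f "$obj_dir/esp.o" ] || { echo "no esp.o in $obj_dir" >&2; return 1; }

    # Refuse to clobber an existing backup, so a second run cannot
    # replace the saved stock object with an already-patched one.
    if [ -f "$obj_dir/esp.o.orig" ]; then
        echo "$obj_dir/esp.o.orig already exists; not overwriting" >&2
        return 1
    fi

    mv "$obj_dir/esp.o" "$obj_dir/esp.o.orig" &&
    cp "$patch_obj" "$obj_dir/esp.o" &&
    chmod 444 "$obj_dir/esp.o"
}
```

As root, from the directory where the patch was unpacked:
install_esp_patch sun4c/esp.o /sys/sun4c/OBJ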

Kean

Kean Stump (503)-737-4740
OSSHE Network Operations Center DOMAIN: kean@ucs.orst.edu
Oregon State System of Higher Education UUCP: hplabs!hp-pcd!orstcs!kean


