SUMMARY: Xylogics 753 controller not responding

From: Dave Meer (meer@sun-valley.stanford.edu)
Date: Wed Mar 20 1991 - 15:57:15 CST


First, a copy of my original posting:
______________________________________________________________________________
        Once in January and twice in the last 2 days, our 4/370 server has
        died or come very close to dying due to problems with our disk
        controller. The output into /var/adm/messages reads, typically:
        
Mar 9 23:05:24 sun-valley vmunix: xdc0: returned unmatched iopb addr fff003cc
Mar 9 23:05:32 sun-valley vmunix: xdc0: controller not responding
Mar 9 23:07:34 sun-valley last message repeated 5 times
Mar 9 23:08:18 sun-valley vmunix: xdc0: controller not responding
Mar 9 23:14:49 sun-valley last message repeated 12 times
Mar 9 23:15:19 sun-valley vmunix: xdc0: controller not responding
Mar 9 23:21:06 sun-valley last message repeated 12 times
Mar 9 23:21:24 sun-valley vmunix: xdc0: controller not responding
Mar 9 23:28:10 sun-valley last message repeated 10 times

        The unmatched iopb addr message appeared 2 out of 3 times. The other
        occurrence simply started with the controller not responding message.

        This continues for about seven hours. By this time, system performance
        has degraded to the point where logging in to the server is impossible.
        The first time, a user who was here early in the morning managed
        to L1-A and reboot the server. This morning, a different user
        reported that L1-A had no effect and he had to resort to power-cycling
        the machine. Unfortunately, I haven't been around with the system
        in this state to see exactly what's happening on the server.
        
        The machine is a 4/370 running SunOS 4.1. The controller,
        sitting in slot 7, is a Xylogics 753 with PROM E2186 2.22.
        The 753 has 3 Fujitsu M2382K disks on it.
        
        We've been running this setup since November without any problems.
        Has anyone run into this problem before? Is my controller going bad?
        Is this a problem with the software driver? Do I have the right
        PROM rev?
        
        Any solutions/suggestions/wild guesses gratefully accepted.

                                                        -Dave
______________________________________________________________________________

Next, thanks to all those who replied:
        
        tsacas@issy.ilog.fr (Stephane Tsacas)
        Paul Quare <pq@computer-science.manchester.ac.uk>
        sundev!fletch!kevin@Sun.COM (Kevin Sheehan {Consulting Poster Child})
        curt@ecn.purdue.edu (Curt Freeland)
        Stuart McRobert <sm@doc.imperial.ac.uk>
        
Finally, a summary of the responses:

        A couple of people suggested checking the bus grant jumper on the
        backplane, particularly if we had just installed any new
        boards. We had not installed any new boards, so this did not
        apply to us. Curt Freeland pointed out a few problems with the
        xd disk driver in his response:
>Welcome to the world of the "unfinished" xd disk driver. If you have
>source code, look at xd.c sometime. We have seen this problem, as have
>others. We are working on instrumenting the driver to see how we get into
>those states. It looks like it will happen on 3/XXX 4/XXX machines with
>enough load (we have seen it on everything but a 3/100 system). We have
>inserted halts at the point where those errors occur (and some printf's
>so we can see what is happening). I recommend you do not do a sync
>(or even a g0 to force a sync), as we usually end up with a trashed
>super block upon doing so after one of these errors. We typically do an
>L1-A and k2.
        Others suggested possible problems with the disk controller
        or the cabling.
        
Our solution:
        Since the 753 is under maintenance, we had a new one sent
        out. It featured PROM rev 2.3. We also switched one of
        the 3 disks to a 7053 controller. So far, the problem has not
        recurred. We'll just have to wait and see.
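
        For anyone who wants to watch for a recurrence, here is a rough
        sketch (not something we actually ran) of tallying the xdc
        messages in /var/adm/messages. It assumes the log lines look
        like the excerpt in my original posting, and that each "last
        message repeated N times" line stands for N further copies of
        the line before it. The function name tally_xdc_messages and
        both regular expressions are my own, made up for illustration.

```python
import re

# Matches driver lines such as:
#   Mar 9 23:05:32 sun-valley vmunix: xdc0: controller not responding
MSG_RE = re.compile(r"vmunix: (xdc\d+): (.+)$")

# Matches syslog's run-length line:
#   Mar 9 23:07:34 sun-valley last message repeated 5 times
REPEAT_RE = re.compile(r"last message repeated (\d+) times?$")


def tally_xdc_messages(lines):
    """Count xdc messages per (controller, text), expanding repeat lines.

    Assumption: a "last message repeated N times" line adds N to the
    count of the immediately preceding message.
    """
    counts = {}
    last_key = None  # most recent xdc message; None if the last line was unrelated
    for line in lines:
        line = line.rstrip()
        msg = MSG_RE.search(line)
        if msg:
            last_key = (msg.group(1), msg.group(2))
            counts[last_key] = counts.get(last_key, 0) + 1
            continue
        rep = REPEAT_RE.search(line)
        if rep and last_key is not None:
            # The repeat line refers to the xdc message we just saw.
            counts[last_key] += int(rep.group(1))
        elif not rep:
            # Some other message intervened; later repeats are not ours.
            last_key = None
    return counts
```

        Fed the nine log lines quoted in my original posting, this
        counts 43 "controller not responding" messages and 1 unmatched
        iopb message for xdc0.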
        
        
                                        Thanks for all the help,
                                        
                                                -Dave
David Meer
Aerospace Robotics Laboratory
Durand Building, Room 017
Stanford University
Stanford, CA 94305
e-mail: meer@sun-valley.stanford.edu


