SUMMARY: Number crunching hangs up SS5 with SunOS 4.1.3U1 [Firmware Bug]

From: Born Again Hacker (janaka@ee.pdx.edu)
Date: Tue Dec 20 1994 - 07:45:11 CST


Quite a while back, I posted a question (duplicated at the end of this
message) about mysterious system hangs that had crippled the SS5s in our
brand new electronic instrumentation lab. Whenever Labview (from National
Instruments) did serious number crunching, the SS5 hung so hard that it
required a power cycle to come back up.

After a long and tortured process, we discovered that our SS5s did indeed
have a firmware problem (BugID 1151654). If a certain rarely used quad
floating point op was attempted under the right conditions, the machine
would hang hard. Labview was triggering this pretty much at will. The bug
is apparently present in all SS5s that have a firmware rev lower than 3.2.
(All of ours were 2.5.)

We limped along with some Sun loaners while Sun tried either to get hold
of fixed modules or to convince National Instruments to code around the
bug.

Finally, months after the initial bug report, we received our properly
up-revved modules. Labview worked fine on them. So, our SS5s are happy
once more, although the first outing of our lab was a complete disaster.
The time spent reconfiguring machine after machine to isolate the problem,
then reconfiguring around the loaners, and then reconfiguring it all back
again was really costly for us.

After all this and the MBONE audio problems we've had with SS5s, I hate 'em!!

-janaka

[Here's the original message. Many thanks to everyone who replied. Although
the replies didn't point directly to the problem, many of them helped us
along the path to tracking down the bug.]

> From: janaka@ee.pdx.edu
> To: sun-managers@eecs.nwu.edu
> Cc: sun-managers@eecs.nwu.edu, cat@ee.pdx.edu, marcins@ee.pdx.edu
> Subject: Number crunching hangs up SS5 with SunOS 4.1.3U1
>
> We recently got a number of SS5s to equip a lab to be used as a computer
> controlled electronics lab. We are running a software product called
> Labview (from National Instruments) which uses either a sbus based GPIB
> card or an ethernet based GPIB gateway to control the instruments.
>
> We started noticing that the SS5s can be made to routinely hang up (lock up
> so hard that only a power cycle can wake it up again) by running Labview and
> trying to read information from a scope. If various number crunching
> elements are removed from the input filters in Labview, it seems to work
> okay. The minute the number crunching elements are added to the labview
> pipeline, the SS5 hangs up. This happens with both the GPIB and ethernet
> based communications media to the instruments.
>
> The same configurations work without any problems on Sparc IPCs in the
> same lab.
>
> I tried both generic 4.1.3U1 and 4.1.3U1 with the ms2 patch (on the CD).
>
> Anyone run into a problem like this? Found a miracle patch? We'd love
> to know.
>
> Thanks in advance.
>
> -janaka
>
                                                           Janaka Jayawardena
LOCAL: janaka Director of Computer Systems CS/EE
INTERNET:janaka@ee.pdx.edu (503)-725-5410
USNAIL: Portland State University (EE), P.O.Box 751, Portland, OR 97207
=============================================================================


